
Automatically determine the environment #43

Open · louis030195 opened this issue May 16, 2023 · 3 comments

louis030195 commented May 16, 2023

Hi!

This is a wonderful lib that I use in https://github.com/different-ai/embedbase/tree/main/sdk/embedbase-js

I was wondering if it would be possible to automatically determine the environment and import accordingly (e.g. browser, Next.js, Vercel Edge, etc.)? It would be really great and would reduce verbosity. Happy to contribute!

dqbd (Owner) commented May 18, 2023

Hello @louis030195!

I believe it could be possible through the exports field in package.json, where we could specify a different entrypoint for each engine: https://runtime-keys.proposal.wintercg.org/
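
A sketch of what the exports map could look like, using runtime keys from that proposal (the dist paths are made up for illustration):

{
  "exports": {
    ".": {
      "edge-light": "./dist/lite.js",
      "workerd": "./dist/lite.js",
      "browser": "./dist/lite.js",
      "node": "./dist/node.js",
      "default": "./dist/node.js"
    }
  }
}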

I do think it would necessitate an API change, exposing a magic init() function which would perform the import() of WASM files when needed (the ?module suffix for Vercel, etc.).
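
One possible shape (the names here are assumptions, not a final API): init() caches a single instantiation, and the caller supplies the platform-specific way of getting the compiled WASM:

let instance: WebAssembly.Instance | undefined;

export async function init(
    instantiate: (imports: WebAssembly.Imports) => Promise<WebAssembly.Instance>
): Promise<void> {
    if (!instance) {
        // the real library would pass its generated import object here
        instance = await instantiate({});
    }
}

// On Vercel Edge, WASM assets must be imported with a ?module suffix:
//   import wasm from "@dqbd/tiktoken/tiktoken_bg.wasm?module";
//   await init((imports) => WebAssembly.instantiate(wasm, imports));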

At the moment I'm swamped with work, but if you want to take a crack at it, I would love to help 😃

louis030195 (Author) commented

Interesting. By the way, do you know how to unit test across different environments? Maybe https://edge-runtime.vercel.app/packages/vm?
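
For example, @edge-runtime/vm exposes an EdgeVM class that evaluates code inside an edge-like sandbox, so a test could assert against it; rough, untested sketch:

import { EdgeVM } from "@edge-runtime/vm";

const vm = new EdgeVM();
// code runs with edge Web APIs (fetch, Response, ...) instead of Node built-ins
const hasFetch = vm.evaluate<string>("typeof fetch");
console.log(hasFetch); // "function"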

At the moment I'm using this hack I found somewhere (probably from you), which works in both Node and Vercel:

// Assumed imports: the constructor matches the js-tiktoken JS port
// (BPE ranks object + optional special tokens), per the discussion below.
import { Tiktoken, TiktokenBPE, TiktokenEncoding } from "js-tiktoken/lite";

// Cache the in-flight downloads so each encoding's ranks are fetched once.
const cache: Record<string, Promise<TiktokenBPE>> = {};

// In the original, getFetch() resolves an environment-appropriate fetch;
// a minimal stand-in using the global fetch:
const getFetch = (): typeof fetch => fetch;

export async function getEncoding(
    encoding: TiktokenEncoding,
    options?: {
        signal?: AbortSignal;
        extendedSpecialTokens?: Record<string, number>;
    }
) {
    if (!(encoding in cache)) {
        cache[encoding] =
            getFetch()(`https://tiktoken.pages.dev/js/${encoding}.json`, {
                signal: options?.signal,
            })
                .then((res) => res.json())
                .catch((e) => {
                    // drop the failed promise so a later call can retry
                    delete cache[encoding];
                    throw e;
                });
    }
    const enc = new Tiktoken(await cache[encoding], options?.extendedSpecialTokens);
    // // @ts-ignore
    // const registry = new FinalizationRegistry((heldValue) => {
    //     heldValue.free()
    // });
    // registry.register(enc);
    return enc;
}

I'm not 100% sure where the .free() went; I'm a bit afraid of memory leaks.

dqbd (Owner) commented May 18, 2023

Some preliminary work has been done here: https://github.com/dqbd/package-exports-edge, where different dev environments are set up and tested using Vitest with Puppeteer, etc.
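
Vitest also has an edge-runtime test environment (it needs @edge-runtime/vm installed), which runs the whole suite inside the edge sandbox; a minimal config sketch:

// vitest.config.ts
import { defineConfig } from "vitest/config";

export default defineConfig({
    test: {
        // run tests inside an edge-like VM instead of Node
        environment: "edge-runtime",
    },
});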

The snippet was most likely from the recent PR on LangChain, where I swapped the WASM bindings for a JS port (js-tiktoken). The rationale was the excessive bundle size and the relative complexity of getting started with LangChainJS: langchain-ai/langchainjs#1239. That is why no .free() is needed 😄
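
With the JS port, the encoder is a plain JavaScript object, so the garbage collector takes care of it:

import { getEncoding } from "js-tiktoken";

const enc = getEncoding("cl100k_base");
console.log(enc.encode("hello world"));
// no enc.free(): nothing is allocated outside the JS heap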

You might want to consider swapping the implementation as well, weighing trade-offs such as worse performance (around 5× slower, although a PR can reduce the difference to about 2.5×).

Regardless, I think the general goal would be to create an isomorphic library that loads the JS version on edge runtimes and WASM in Node.
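
A minimal sketch of that shape, assuming the EdgeRuntime global as the detection heuristic and js-tiktoken / @dqbd/tiktoken as the two backends (both expose encode/decode, though the exact return types differ):

export async function getEncoder(name: "cl100k_base") {
    if (typeof (globalThis as any).EdgeRuntime === "string") {
        // edge: pure-JS port, no WASM loading ceremony
        const { getEncoding } = await import("js-tiktoken");
        return getEncoding(name);
    }
    // Node: WASM bindings for better performance
    const { get_encoding } = await import("@dqbd/tiktoken");
    return get_encoding(name);
}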
