
Automatically determine the environment #43

Open · louis030195 opened this issue May 16, 2023 · 3 comments

louis030195 commented May 16, 2023

Hi!

This is a wonderful lib that I use in https://github.com/different-ai/embedbase/tree/main/sdk/embedbase-js

I was wondering if it would be possible to automatically determine the environment and import accordingly (e.g. browser, Next.js, Vercel Edge, etc.)? It would be really great and would reduce verbosity. Happy to contribute!

dqbd (Owner) commented May 18, 2023

Hello @louis030195!

I believe it could be possible through the exports field in package.json, where we could specify a different entrypoint for each engine: https://runtime-keys.proposal.wintercg.org/
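
A sketch of what the exports map could look like, using runtime keys from that proposal (the dist paths are made up for illustration):

{
  "exports": {
    ".": {
      "edge-light": "./dist/lite.js",
      "workerd": "./dist/lite.js",
      "browser": "./dist/lite.js",
      "node": "./dist/node.js",
      "default": "./dist/node.js"
    }
  }
}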

I do think it would necessitate an API change, exposing a magic init() function which would perform the import() of WASM files when needed (the ?module suffix for Vercel, etc.).
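
One possible shape (the names here are assumptions, not a final API): init() caches a single instantiation, and the caller supplies the platform-specific way of getting the compiled WASM:

let instance: WebAssembly.Instance | undefined;

export async function init(
    instantiate: (imports: WebAssembly.Imports) => Promise<WebAssembly.Instance>
): Promise<void> {
    if (!instance) {
        // the real library would pass its generated import object here
        instance = await instantiate({});
    }
}

// On Vercel Edge, WASM assets must be imported with a ?module suffix:
//   import wasm from "@dqbd/tiktoken/tiktoken_bg.wasm?module";
//   await init((imports) => WebAssembly.instantiate(wasm, imports));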

At the moment I'm swamped with work, but if you want to take a crack at it, I would love to help 😃

louis030195 (Author) commented

Interesting. By the way, do you know how to unit test across different environments? Maybe https://edge-runtime.vercel.app/packages/vm?
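
For example, @edge-runtime/vm exposes an EdgeVM class that evaluates code inside an edge-like sandbox, so a test could assert against it; rough, untested sketch:

import { EdgeVM } from "@edge-runtime/vm";

const vm = new EdgeVM();
// code runs with edge Web APIs (fetch, Response, ...) instead of Node built-ins
const hasFetch = vm.evaluate<string>("typeof fetch");
console.log(hasFetch); // "function"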

At the moment I'm using this hack I found somewhere (probably from you), which works in both Node and Vercel:

// Assumed imports: the constructor matches the js-tiktoken JS port
// (BPE ranks object + optional special tokens), per the discussion below.
import { Tiktoken, TiktokenBPE, TiktokenEncoding } from "js-tiktoken/lite";

// Cache the in-flight downloads so each encoding's ranks are fetched once.
const cache: Record<string, Promise<TiktokenBPE>> = {};

// In the original, getFetch() resolves an environment-appropriate fetch;
// a minimal stand-in using the global fetch:
const getFetch = (): typeof fetch => fetch;

export async function getEncoding(
    encoding: TiktokenEncoding,
    options?: {
        signal?: AbortSignal;
        extendedSpecialTokens?: Record<string, number>;
    }
) {
    if (!(encoding in cache)) {
        cache[encoding] =
            getFetch()(`https://tiktoken.pages.dev/js/${encoding}.json`, {
                signal: options?.signal,
            })
                .then((res) => res.json())
                .catch((e) => {
                    // drop the failed promise so a later call can retry
                    delete cache[encoding];
                    throw e;
                });
    }
    const enc = new Tiktoken(await cache[encoding], options?.extendedSpecialTokens);
    // // @ts-ignore
    // const registry = new FinalizationRegistry((heldValue) => {
    //     heldValue.free()
    // });
    // registry.register(enc);
    return enc;
}

I'm not 100% sure where the .free() went; I'm a bit afraid of memory leaks.

dqbd (Owner) commented May 18, 2023

Some preliminary work has been done here: https://github.com/dqbd/package-exports-edge, where different dev environments are set up and tested using Vitest with Puppeteer, etc.
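
Vitest also has an edge-runtime test environment (it needs @edge-runtime/vm installed), which runs the whole suite inside the edge sandbox; a minimal config sketch:

// vitest.config.ts
import { defineConfig } from "vitest/config";

export default defineConfig({
    test: {
        // run tests inside an edge-like VM instead of Node
        environment: "edge-runtime",
    },
});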

The snippet was most likely from the recent PR on LangChain, where I swapped the WASM bindings for a JS port (js-tiktoken). The rationale was the excessive bundle size and the relative complexity of getting started with LangChainJS: langchain-ai/langchainjs#1239. That is why no .free() is needed 😄
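
With the JS port, the encoder is a plain JavaScript object, so the garbage collector takes care of it:

import { getEncoding } from "js-tiktoken";

const enc = getEncoding("cl100k_base");
console.log(enc.encode("hello world"));
// no enc.free(): nothing is allocated outside the JS heap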

You might want to consider swapping the implementation as well, weighing trade-offs such as worse performance (around 5× slower, although a PR can reduce the difference to about 2.5×).

Regardless, I think the general goal would be to create an isomorphic library that loads the JS version on edge runtimes and WASM in Node.
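
A minimal sketch of that shape, assuming the EdgeRuntime global as the detection heuristic and js-tiktoken / @dqbd/tiktoken as the two backends (both expose encode/decode, though the exact return types differ):

export async function getEncoder(name: "cl100k_base") {
    if (typeof (globalThis as any).EdgeRuntime === "string") {
        // edge: pure-JS port, no WASM loading ceremony
        const { getEncoding } = await import("js-tiktoken");
        return getEncoding(name);
    }
    // Node: WASM bindings for better performance
    const { get_encoding } = await import("@dqbd/tiktoken");
    return get_encoding(name);
}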
