[shortfin] Add C++ tokenizer wrapper library. #610
Conversation
stellaraccident
commented
Nov 26, 2024
•
edited
- This is gated by SHORTFIN_ENABLE_TOKENIZERS (presently off).
- I'd like to either take over the wrapper or get mlc-ai/tokenizers-cpp#50 ("Abort is not a great error handling strategy") resolved before putting much weight on this.
- There is no great C++ option for this component, so we go to the trouble of integrating a Rust component. We will need to do a bit of prep on our CI systems to enable this by default.
- Python API will be added in a subsequent commit. This should be more efficient than the tokenizers Python API since we will allow direct access to the tokens vs doing a lot of conversions.
- Size analysis: Prior to this patch, libshortfin was 1.8MB, which gave us an entire GPU and CPU runtime stack. After this patch (stripped) it is 8.4MB. Given how important the use case is, I'm willing to tolerate this for the moment. It seems like there is room for something better here, which is why I did not expose the underlying vendor'd API directly (edit: by switching to a nightly rust and activating a bunch of options from https://github.com/johnthagen/min-sized-rust, I was able to produce a binary that is 4.2MB, which is more reasonable).
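For reference, the size reduction mentioned above is in line with the techniques catalogued in the linked min-sized-rust guide. The following profile settings are illustrative of that guide's recommendations, not the exact flags used in this PR (the nightly-only `-Z build-std` / `panic_immediate_abort` options mentioned there go further still):

```toml
# Illustrative Cargo.toml release profile for minimizing binary size,
# per https://github.com/johnthagen/min-sized-rust (not this PR's config).
[profile.release]
opt-level = "z"     # optimize for size rather than speed
lto = true          # whole-program link-time optimization
codegen-units = 1   # trade parallel codegen for better optimization
panic = "abort"     # drop unwinding machinery
strip = true        # strip symbols from the final binary
```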
Generally LGTM. Thanks for filing those upstream issues to improve the binary size and error handling.
I think Scott has already pointed out everything I might have said. No additional comments.
PTAL
Progress on #130.

The manylinux2014 image includes gcc 10.2.1 by default while manylinux_2_28 includes gcc 12.2.1. At one point we had warnings/errors building on the newer gcc version, but that is no longer the case.

With the new Rust dependency coming from #610, we will likely want to revive https://github.com/nod-ai/base-docker-images/blob/main/dockerfiles/manylinux_x86_64.Dockerfile, add more dependencies there, then switch from the upstream `quay.io/...` image to that `ghcr.io/nod-ai/...` image.

Tested locally with `OUTPUT_DIR="/tmp/wheelhouse" sudo -E ./build_tools/build_linux_package.sh`. If the nightly package build fails for some reason, we can easily revert this.
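A custom image along those lines would presumably just layer a Rust toolchain onto the manylinux_2_28 base. The sketch below is hypothetical: the base image tag is the standard upstream one, but the toolchain choice and paths are assumptions, not taken from the actual nod-ai Dockerfile:

```dockerfile
# Hypothetical sketch: extend manylinux_2_28 with a Rust toolchain so the
# tokenizers-cpp Rust component can build in CI. Versions are illustrative.
FROM quay.io/pypa/manylinux_2_28_x86_64

# Install rustup non-interactively; a nightly toolchain would be needed if
# the nightly-only min-sized-rust options are used.
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs \
    | sh -s -- -y --default-toolchain nightly
ENV PATH="/root/.cargo/bin:${PATH}"
```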