Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[shortfin] Add C++ tokenizer wrapper library. #610

Merged
merged 2 commits into from
Nov 27, 2024

Conversation

stellaraccident
Copy link
Contributor

@stellaraccident stellaraccident commented Nov 26, 2024

  • This is gated by SHORTFIN_ENABLE_TOKENIZERS (presently off).
  • I'd like to either take over the wrapper or get Abort is not a great error handling strategy mlc-ai/tokenizers-cpp#50 before putting much weight on this.
  • There is no great C++ option for this component, so we go to the trouble of integrating a Rust component. We will need to do a bit of prep on our CI systems to enable this by default.
  • Python API will be added in a subsequent commit. This should be more efficient than the tokenizers Python API since we will allow direct access to the tokens vs doing a lot of conversions.
  • Size analysis: Prior to this patch, libshortfin was 1.8MB, which gave us an entire GPU and CPU runtime stack. After this patch (stripped) it is 8.4MB. Given how important the use case is, I'm willing to tolerate this for the moment. It seems like there is room for something better here, which is why I did not expose the underlying vendor'd API directly (edit: by switching to a nightly rust and activating a bunch of options from https://github.com/johnthagen/min-sized-rust, I was able to produce a binary that is 4.2MB, which is more reasonable).

* This is gated by SHORTFIN_ENABLE_TOKENIZERS (presently off).
* I'd like to either take over the wrapper or get mlc-ai/tokenizers-cpp#50 before putting much weight on this.
* There is no great C++ option for this component, so we go to the trouble of integrating a Rust component. We will need to do a bit of prep on our CI systems to enable this by default.
* Python API will be added in a subsequent commit. This should be more efficient than the tokenizers Python API since we will allow direct access to the tokens vs doing a lot of conversions.
* Obligatory language flame bait: Use Rust, they said. It's super efficient. Prior to this patch, libshortfin was 1.8MB, which gave us an entire GPU and CPU runtime stack. After this patch (stripped) it is 8.4MB. Given how important the use case is, I'm willing to tolerate this for the moment. It seems like there is room for something better here, which is why I did not expose the underlying vendor'd API directly.
Copy link
Member

@ScottTodd ScottTodd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM. Thanks for filing those upstream issues to improve the binary size and error handling.

shortfin/CMakeLists.txt Show resolved Hide resolved
shortfin/CMakeLists.txt Show resolved Hide resolved
shortfin/CMakeLists.txt Outdated Show resolved Hide resolved
shortfin/CMakeLists.txt Outdated Show resolved Hide resolved
shortfin/CMakeLists.txt Outdated Show resolved Hide resolved
shortfin/build_tools/cmake/shortfin_testing.cmake Outdated Show resolved Hide resolved
shortfin/src/shortfin/components/tokenizers/CMakeLists.txt Outdated Show resolved Hide resolved
shortfin/src/shortfin/components/tokenizers/CMakeLists.txt Outdated Show resolved Hide resolved
Copy link
Collaborator

@marbre marbre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Scott has already pointed out everything I might have said. No additional comments.

@stellaraccident
Copy link
Contributor Author

PTAL

@stellaraccident stellaraccident merged commit cdb4ccd into main Nov 27, 2024
13 of 19 checks passed
@stellaraccident stellaraccident deleted the shortfin_bundle_tokenizers branch November 27, 2024 03:01
ScottTodd added a commit that referenced this pull request Dec 2, 2024
Progress on #130.

The manylinux2014 image includes gcc 10.2.1 by default while
manylinux_2_28 includes gcc 12.2.1. At one point we had warnings/errors
building on the newer gcc version, but that is no longer the case.

With the new Rust dependency coming from
#610, we will likely want to
revive
https://github.com/nod-ai/base-docker-images/blob/main/dockerfiles/manylinux_x86_64.Dockerfile,
add more dependencies there, then switch from the upstream `quay.io/...`
image to that `ghcr.io/nod-ai/...` image.

Tested locally with `OUTPUT_DIR="/tmp/wheelhouse" sudo -E
./build_tools/build_linux_package.sh`. If the nightly package build
fails for some reason we can easily revert this.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants