
Fix for vLLM cache kernel signature update (fp8 attention) #223

Merged
masahi merged 1 commit into octoml:batch-serving from cache-dtype-fix on Feb 27, 2024

Conversation

@masahi (Member) commented on Feb 27, 2024

To comply with the vLLM kernel change in https://github.com/octoml/tvm/pull/48.
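
For context, a call-site update for this kind of kernel signature change might look like the sketch below. The function name `reshape_and_cache` and the extra `kv_cache_dtype` argument are assumptions based on vLLM's fp8 KV-cache work, not copied from this PR's diff; the actual kernel change lives in the linked octoml/tvm pull request.

```python
# Hedged sketch: adapt a call site to a cache kernel that grew an extra
# cache-dtype argument. `reshape_and_cache` and `kv_cache_dtype` are
# assumed names for illustration only.
def call_reshape_and_cache(cache_ops, key, value, key_cache, value_cache,
                           slot_mapping, kv_cache_dtype="auto"):
    try:
        # New signature: callers pass the KV-cache dtype (e.g. "fp8_e5m2")
        # so the kernel can select the fp8 (de)quantization path.
        cache_ops.reshape_and_cache(key, value, key_cache, value_cache,
                                    slot_mapping, kv_cache_dtype)
    except TypeError:
        # Old signature: no dtype argument; fall back for older kernels.
        # (A sketch-level shortcut; a real shim would inspect the
        # signature rather than catch TypeError broadly.)
        cache_ops.reshape_and_cache(key, value, key_cache, value_cache,
                                    slot_mapping)
```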

@masahi merged commit 4b59cfa into octoml:batch-serving on Feb 27, 2024
1 check passed
@masahi deleted the cache-dtype-fix branch on Feb 27, 2024 at 08:10
Lunderberg pushed a commit to Lunderberg/mlc-llm that referenced this pull request on Feb 27, 2024. The referenced commit's message:
This PR adds the following:

* A Python chat module with the same functionality defined in the CLI (note that this requires a module without tvm_runtime dependency, see changes to CMakeLists.txt)
* A REST API that supports some common endpoints for interacting with Vicuna and RedPajama with streaming support
* A sample client that shows how to use the endpoints (see the sketch after this list)
* Some documentation on how to run the server and client
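
For illustration, a streaming request to such a REST API might look like the sketch below. The server URL, endpoint path, and JSON fields are assumptions, not the actual API surface added by that commit; consult its documentation for the real routes.

```python
import json
import requests

# Hedged sketch: stream a chat completion from a locally running REST
# server. The route and payload/response fields are illustrative
# assumptions only.
resp = requests.post(
    "http://127.0.0.1:8000/chat/completions",
    json={"prompt": "What is MLC LLM?", "stream": True},
    stream=True,
)
resp.raise_for_status()
for line in resp.iter_lines():
    if not line:
        continue
    chunk = json.loads(line)
    # Print tokens as they arrive instead of waiting for the full reply.
    print(chunk.get("text", ""), end="", flush=True)
print()
```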