Releases · li-plus/chatglm.cpp
v0.4.2
- Apply flash attention to the vision encoder for lower first-token latency.
- Fix Metal compilation error on Apple Silicon chips.
v0.4.1
- Support GLM4V, the first vision-language model in the GLM series
- Fix NaN/Inf logits by rescheduling attention scaling
v0.4.0
- Allocate memory dynamically on demand to fully utilize device memory; no preset scratch or memory sizes anymore.
- Drop Baichuan/InternLM support since they have been integrated into llama.cpp.
- API changes:
  - CMake CUDA option `-DGGML_CUBLAS` was renamed to `-DGGML_CUDA` (see the build sketch below).
  - CMake CUDA architecture option `-DCUDA_ARCHITECTURES` was renamed to `-DCMAKE_CUDA_ARCHITECTURES`.
  - `num_threads` was removed from `GenerationConfig`: the optimal thread settings are now selected automatically.
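For reference, a minimal sketch of a CUDA build using the renamed options. The architecture value `86` and the `Release` config are illustrative placeholders, not defaults taken from these notes:

```sh
# Before v0.4.0 (no longer valid):
#   cmake -B build -DGGML_CUBLAS=ON -DCUDA_ARCHITECTURES="86"
# From v0.4.0 on: use the ggml CUDA flag and the standard CMake architecture variable.
cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="86"
cmake --build build -j --config Release
```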
v0.3.4
- Fix regex negative lookahead for code input tokenization
- Fix OpenAI API server by using `apply_chat_template` to calculate tokens
v0.3.3
- Support ChatGLM4 conversation mode
v0.3.2
- Support P-Tuning v2 fine-tuned models for the ChatGLM family
- Fix convert.py for LoRA models & chatglm3-6b-128k
- Fix RoPE theta config for 32k/128k sequence lengths
- Better CUDA CMake script that respects the nvcc version
v0.3.1
- Support function calling in the OpenAI API server
- Faster repetition penalty sampling
- Support the `max_new_tokens` generation option
v0.3.0
- Full functionality of ChatGLM3, including system prompts, function calls, and the code interpreter
- Brand-new OpenAI-style chat API (see the request sketch after this list)
- Add token usage information to the OpenAI API server for compatibility with the LangChain frontend
- Fix conversion error for chatglm3-6b-32k
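As a rough illustration of the OpenAI-style chat API, here is a minimal request sketch assuming a locally running server; the 127.0.0.1:8000 address, the `/v1/chat/completions` route, and the `"model": "default"` placeholder follow the standard OpenAI wire format and are assumptions, not values taken from these notes:

```sh
# Assumes an OpenAI-compatible server is already running (hypothetical address and port).
curl http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "default",
        "messages": [{"role": "user", "content": "Hello"}],
        "temperature": 0.7
      }'
```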
v0.2.10
- Support ChatGLM3 in conversation mode.
- Coming soon: a new prompt format for system messages and function calls.
v0.2.9
- Support InternLM 7B & 20B model architectures