
Context and model enhancements #510

Merged

Conversation

brittlewis12 (Contributor)

As mentioned in #505 (comment), various enhancements to Context & Model capabilities (see the sketch after this list):

  • mirostat v1 sampling
  • llama_token_is_eog equivalent
  • llama_kv_cache_seq_rm Context method for kv cache manipulation
  • flash_attn, offload_kqv context parameters
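To make the `llama_token_is_eog` point concrete: "end of generation" is a set of special tokens (EOS, EOT, ...), not just the single EOS id, so a generation loop should test set membership rather than compare against one constant. A minimal, self-contained sketch of that idea; the token ids and the helper name `is_eog` are hypothetical, not the crate's actual API:

```rust
/// Hypothetical stand-in for a llama_token_is_eog equivalent: a token ends
/// generation if it is any of the model's end-of-generation special tokens.
fn is_eog(token: i32, eog_tokens: &[i32]) -> bool {
    eog_tokens.contains(&token)
}

fn main() {
    // Placeholder ids for EOS and EOT; real values come from the model vocab.
    let eog = [2, 32000];

    let mut generated = Vec::new();
    for token in [15, 7, 32000, 9] {
        if is_eog(token, &eog) {
            break; // stop at the first end-of-generation token, whichever it is
        }
        generated.push(token);
    }
    assert_eq!(generated, [15, 7]);
}
```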

@MarcusDunn (Contributor) left a comment:

Looks good. I'd like a signature change to reflect how negative numbers work. This is a great addition.

Review thread on llama-cpp-2/src/context/kv_cache.rs (outdated, resolved)
* enable interesting cache manipulation use cases, from removing recent messages, truncating non-special-token stop sequences, & more
* express logic relying on negative values as `Option`s of `u16`, to ensure positive values fit into `i32` with safe conversion (see the sketch after this list)
  - this means sequence and llama_pos values above ~65k will not be addressable directly, and will need to use the `None` semantics
* return `Result`s to handle failed u32 -> i32 conversion
* unify kv cache seq rm methods
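A minimal sketch of the conversion rule these bullets describe, assuming positions are `Option<u16>` and sequence ids are `Option<u32>`; the helper names `pos_to_c` and `seq_to_c` are illustrative, not the crate's actual functions:

```rust
use std::num::TryFromIntError;

/// `None` stands in for llama.cpp's negative sentinel: for positions,
/// `p0 < 0` means "from the start of the cache" and `p1 < 0` means
/// "through the end". A `Some(p)` position always fits in the C-side
/// `i32` because `u16 -> i32` is lossless, so no `Result` is needed here.
fn pos_to_c(pos: Option<u16>) -> i32 {
    match pos {
        Some(p) => i32::from(p),
        None => -1,
    }
}

/// Sequence ids arrive as `u32`, which can exceed `i32::MAX`, so this
/// conversion is fallible and surfaces as a `Result` instead of a panic.
/// `None` maps to -1, which llama_kv_cache_seq_rm treats as "any sequence".
fn seq_to_c(seq: Option<u32>) -> Result<i32, TryFromIntError> {
    match seq {
        Some(s) => i32::try_from(s),
        None => Ok(-1),
    }
}

fn main() {
    assert_eq!(pos_to_c(Some(42)), 42);
    assert_eq!(pos_to_c(None), -1);             // "open" end of the range
    assert_eq!(seq_to_c(None).unwrap(), -1);    // all sequences
    assert!(seq_to_c(Some(u32::MAX)).is_err()); // would overflow i32
}
```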
@brittlewis12 force-pushed the context-and-model-enhancements branch from 3fc30eb to 7d1b2d5 on September 27, 2024 02:09
@MarcusDunn (Contributor)

Looks good. If Linux tests and Mac build pass I'll merge.

@MarcusDunn merged commit 1466f7e into utilityai:main on Sep 28, 2024
2 of 5 checks passed