
Context and model enhancements #510

Merged

Conversation

brittlewis12 (Contributor)

As mentioned in #505 (comment), various enhancements to Context & Model capabilities (see the sketch after this list):

  • mirostat v1 sampling
  • llama_token_is_eog equivalent
  • llama_kv_cache_seq_rm Context method for kv cache manipulation
  • flash_attn, offload_kqv context parameters
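To make the `llama_token_is_eog` point concrete: "end of generation" is a set of special tokens (EOS, EOT, ...), not just the single EOS id, so a generation loop should test set membership rather than compare against one constant. A minimal, self-contained sketch of that idea; the token ids and the helper name `is_eog` are hypothetical, not the crate's actual API:

```rust
/// Hypothetical stand-in for a llama_token_is_eog equivalent: a token ends
/// generation if it is any of the model's end-of-generation special tokens.
fn is_eog(token: i32, eog_tokens: &[i32]) -> bool {
    eog_tokens.contains(&token)
}

fn main() {
    // Placeholder ids for EOS and EOT; real values come from the model vocab.
    let eog = [2, 32000];

    let mut generated = Vec::new();
    for token in [15, 7, 32000, 9] {
        if is_eog(token, &eog) {
            break; // stop at the first end-of-generation token, whichever it is
        }
        generated.push(token);
    }
    assert_eq!(generated, [15, 7]);
}
```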

@MarcusDunn (Contributor) left a comment:

Looks good. I'd like a signature change to reflect how negative numbers work. This is a great addition.

Review thread on llama-cpp-2/src/context/kv_cache.rs (outdated, resolved)
* enable interesting cache manipulation use cases, from removing recent messages, truncating non-special-token stop sequences, & more
* express logic relying on negative values as `Option`s of `u16`, to ensure positive values fit into `i32` with safe conversion (see the sketch after this list)
  - this means sequence and llama_pos values above ~65k will not be addressable directly, and will need to use the `None` semantics
* return `Result`s to handle failed u32 -> i32 conversion
* unify kv cache seq rm methods
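A minimal sketch of the conversion rule these bullets describe, assuming positions are `Option<u16>` and sequence ids are `Option<u32>`; the helper names `pos_to_c` and `seq_to_c` are illustrative, not the crate's actual functions:

```rust
use std::num::TryFromIntError;

/// `None` stands in for llama.cpp's negative sentinel: for positions,
/// `p0 < 0` means "from the start of the cache" and `p1 < 0` means
/// "through the end". A `Some(p)` position always fits in the C-side
/// `i32` because `u16 -> i32` is lossless, so no `Result` is needed here.
fn pos_to_c(pos: Option<u16>) -> i32 {
    match pos {
        Some(p) => i32::from(p),
        None => -1,
    }
}

/// Sequence ids arrive as `u32`, which can exceed `i32::MAX`, so this
/// conversion is fallible and surfaces as a `Result` instead of a panic.
/// `None` maps to -1, which llama_kv_cache_seq_rm treats as "any sequence".
fn seq_to_c(seq: Option<u32>) -> Result<i32, TryFromIntError> {
    match seq {
        Some(s) => i32::try_from(s),
        None => Ok(-1),
    }
}

fn main() {
    assert_eq!(pos_to_c(Some(42)), 42);
    assert_eq!(pos_to_c(None), -1);             // "open" end of the range
    assert_eq!(seq_to_c(None).unwrap(), -1);    // all sequences
    assert!(seq_to_c(Some(u32::MAX)).is_err()); // would overflow i32
}
```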
@brittlewis12 force-pushed the context-and-model-enhancements branch from 3fc30eb to 7d1b2d5 on September 27, 2024 02:09
@MarcusDunn (Contributor)

Looks good. If Linux tests and Mac build pass I'll merge.

@MarcusDunn merged commit 1466f7e into utilityai:main on Sep 28, 2024
2 of 5 checks passed