Add option to reduce context window #5193
base: main
Conversation
- Add max_input_tokens parameter to AgentController to allow setting a lower context window
- Add token count checking and truncation when adding new events
- Improve handling of first user message preservation
- Add test case for context window parameter truncation
- Remove max_input_tokens parameter from AgentController constructor
- Use the LLM configuration system to set max_input_tokens through config.toml or environment variables
- Update test to set max_input_tokens directly in the LLM config

Users can now set max_input_tokens in two ways:
1. Through config.toml: `[llm]` section, `max_input_tokens = 20000`
2. Through environment variables: `export LLM_MAX_INPUT_TOKENS=20000`
# Create temporary history with new event
temp_history = self.state.history + [event]
try:
    token_count = self.agent.llm.get_token_count(temp_history)
I'm afraid this line doesn't really do what we want. We need to do the token counting on the messages that are sent to the LLM API. OpenHands works with events, and these are events, which it then 'translates' into messages and sends to the LLM API. You can see, as you try to make it work on your machine for your use case, that the number of tokens will not match; some events differ more than others, but they all differ. 😅
An alternative is to define a custom exception, let's say TokenLimitExceeded, move this check to the LLM class, and raise the exception when the token comparison fails. Then maybe the exception can be treated the way ContextWindowExceededError is. What do you think, does that make sense?
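A minimal sketch of that idea, assuming a hypothetical `TokenLimitExceeded` exception and an LLM wrapper whose config carries `max_input_tokens` (names follow the discussion above, not necessarily the actual OpenHands classes):

```python
class TokenLimitExceeded(Exception):
    """Raised when the prompt would exceed the configured max_input_tokens."""


class LLM:
    def __init__(self, config):
        self.config = config  # assumed to expose max_input_tokens

    def get_token_count(self, messages) -> int:
        # Counting happens on the messages actually sent to the API, not on
        # raw events; a real implementation would delegate to the provider's
        # tokenizer.
        raise NotImplementedError

    def completion(self, messages, **kwargs):
        limit = self.config.max_input_tokens
        if limit is not None and self.get_token_count(messages) > limit:
            # The controller could catch this the same way it handles
            # ContextWindowExceededError and truncate the history.
            raise TokenLimitExceeded(f'prompt exceeds max_input_tokens={limit}')
        ...  # call the underlying provider here
```

The point of the sketch is only that the check lives next to the API call, where the token count is meaningful.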
Ahh, I see, I think I understand now. I was getting unpredictable behavior last night; that makes more sense, and this definitely doesn't work. I'll take another crack at it when I get a chance today.
I do have a dumb question, but I can't recreate it predictably: how is the config.toml file loaded? Is it only with certain commands? I can't tell if I broke something on my machine or if I'm assuming incorrect behavior.
Let's figure it out! There is something a bit unpredictable about it: when you set values there and then run either with the UI, with main.py, or with cli.py, config.toml is loaded. However, if running with the UI, any settings defined in the UI (in the Settings window) will override the toml values.
Could that be what happens?
And it's absolutely a good question! We're still trying to document it properly; the fact that something this simple turns out to be a bit difficult does suggest that something's wrong and we should rethink some of it.
For reference: #3220
This PR adds functionality that allows users to set a lower context window than their LLM's maximum context window size. This is useful when the model's performance degrades significantly with larger context windows, letting users optimize the tradeoff between context length and model performance.
Background
With PR #4977 being merged in version 0.14, we now support dynamic context window sizes. This PR builds on that by allowing users to manually set a lower context window size than their LLM's maximum, which can be beneficial in cases where the model's performance degrades with larger context windows.
Changes
Configuration
Users can set `max_input_tokens` in two ways (see the sketch below for how the two sources might be resolved):
1. Through `config.toml`, under the `[llm]` section: `max_input_tokens = 20000`
2. Through an environment variable: `export LLM_MAX_INPUT_TOKENS=20000`
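As an illustration only, not the project's actual config loader, the precedence between the environment variable and the toml file could be resolved like this (here the environment variable is assumed to win):

```python
import os
import tomllib  # Python 3.11+


def resolve_max_input_tokens(toml_path: str = 'config.toml') -> int | None:
    """Illustrative helper: prefer LLM_MAX_INPUT_TOKENS, fall back to config.toml.

    The real OpenHands loader has its own precedence rules (and UI settings add
    another layer, as discussed above); this only shows the general idea.
    """
    env_value = os.environ.get('LLM_MAX_INPUT_TOKENS')
    if env_value is not None:
        return int(env_value)

    try:
        with open(toml_path, 'rb') as f:
            return tomllib.load(f).get('llm', {}).get('max_input_tokens')
    except FileNotFoundError:
        return None
```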
Implementation Details
Truncation of the event history is handled in `_apply_conversation_window`.
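A rough sketch of the kind of logic involved, assuming events are dropped from the oldest end while the first user message is preserved (per the commit notes above); `events_to_messages` is a hypothetical stand-in for the event-to-message conversion, and the actual method in the PR may differ:

```python
def _apply_conversation_window(self, events: list) -> list:
    """Drop the oldest events until the prompt fits under max_input_tokens,
    always keeping the first user message so the original task stays in context.

    Sketch only: `events_to_messages` stands in for whatever converts events
    into the messages actually sent to the LLM API.
    """
    limit = self.agent.llm.config.max_input_tokens
    if limit is None:
        return events

    first_user_msg = next(
        (e for e in events if getattr(e, 'source', None) == 'user'), None
    )
    truncated = list(events)
    while len(truncated) > 1:
        messages = events_to_messages(truncated)  # hypothetical conversion step
        if self.agent.llm.get_token_count(messages) <= limit:
            break
        # Remove the oldest event that is not the preserved first user message.
        for i, event in enumerate(truncated):
            if event is not first_user_msg:
                del truncated[i]
                break
    return truncated
```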
Testing
Added a new test case, `test_context_window_parameter_truncation`, that verifies the truncation behavior.

This implements and enhances the changes from PR #5079.
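For illustration, a sketch of what such a test could check, with a mocked token counter so truncation triggers deterministically; `apply_conversation_window` and `FakeEvent` are hypothetical stand-ins, not the PR's actual fixtures:

```python
from dataclasses import dataclass
from unittest.mock import MagicMock


@dataclass
class FakeEvent:
    source: str
    content: str


def test_context_window_parameter_truncation():
    """Sketch: with a small max_input_tokens, older events are dropped while
    the first user message is preserved."""
    llm = MagicMock()
    llm.config.max_input_tokens = 20
    # Pretend each event costs 10 tokens, so four events exceed the limit.
    llm.get_token_count = lambda messages: 10 * len(messages)

    events = [
        FakeEvent('user', 'do the task'),
        FakeEvent('agent', 'step 1'),
        FakeEvent('agent', 'step 2'),
        FakeEvent('agent', 'step 3'),
    ]

    truncated = apply_conversation_window(llm, events)  # hypothetical wrapper

    assert llm.get_token_count(truncated) <= llm.config.max_input_tokens
    assert truncated[0].content == 'do the task'  # first user message kept
```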