Using latest PMAT with db_cache #536
Conversation
Walkthrough

This pull request introduces several modifications across multiple files, primarily focused on replacing in-memory caching with database-backed caching (db_cache).
Force-pushed from c445cac to 5742f8f
Force-pushed from 5742f8f to c04227b
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (6)
prediction_market_agent/tools/web_scrape/markdown.py (2)

Line range hint 13-21: LGTM: Clean separation of concerns

Good architectural decision to move caching to the higher-level web_scrape function while maintaining retry logic here. This separation allows for more flexible caching strategies while ensuring robust HTTP requests.
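A minimal sketch of the pattern being praised — caching at the high level while retries stay in the low-level fetch — using functools.lru_cache and a hand-rolled retry loop as stand-ins for the project's db_cache and retry decorators (the function names and placeholder fetch body here are illustrative, not the actual code):

```python
import functools
import time


def with_retries(tries: int = 3, delay: float = 0.1):
    """Retry decorator for transient failures (stand-in for the real retry logic)."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            last_exc = None
            for _attempt in range(tries):
                try:
                    return fn(*args, **kwargs)
                except Exception as exc:  # broad catch: sketch only
                    last_exc = exc
                    time.sleep(delay)
            raise last_exc
        return wrapper
    return decorator


@with_retries(tries=3, delay=0)
def fetch(url: str) -> str:
    """Low-level fetch: owns retry logic, knows nothing about caching."""
    return f"<html>content of {url}</html>"  # placeholder for the HTTP call


@functools.lru_cache(maxsize=128)  # stand-in for the db_cache decorator
def web_scrape(url: str) -> str:
    """High-level entry point: owns caching, delegates fetching."""
    return fetch(url)
```

Because each layer owns exactly one concern, the caching backend can be swapped (in-memory, database, TTL-based) without touching the retry behavior, and vice versa.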
Line range hint 36-44: Consider optimizing DOM manipulations

The current implementation uses multiple separate list comprehensions for element removal. This could be optimized by:

- Combining selectors into a single query
- Using a more efficient selector method

```diff
-[x.extract() for x in soup.findAll("script")]
-[x.extract() for x in soup.findAll("style")]
-[x.extract() for x in soup.findAll("noscript")]
-[x.extract() for x in soup.findAll("link")]
-[x.extract() for x in soup.findAll("head")]
-[x.extract() for x in soup.findAll("image")]
-[x.extract() for x in soup.findAll("img")]
+for element in soup.find_all(['script', 'style', 'noscript', 'link', 'head', 'image', 'img']):
+    element.extract()
```

scripts/benchmark_ofv_resolver.py (3)
Line range hint 15-18: Add docstring to explain caching behavior.

The cached wrapper function would benefit from documentation explaining:

- The caching strategy
- Cache invalidation behavior
- When None results are cached

Add a docstring like this:

```diff
 @persistent_inmemory_cache
 def ofv_answer_binary_question_cached(question: str) -> bool | None:
+    """Cached wrapper for ofv_answer_binary_question.
+
+    Results are persistently cached in memory based on the question string.
+    None results are also cached to avoid repeated API calls for invalid questions.
+
+    Args:
+        question: The question to be answered
+
+    Returns:
+        bool: True for YES, False for NO
+        None: When resolution is not possible
+    """
     result = ofv_answer_binary_question(question, APIKeys())
     return result.factuality if result is not None else None
```
Line range hint 15-18: Optimize APIKeys instantiation.

Currently, APIKeys() is instantiated for each question, even though the result is cached. Consider moving the instantiation outside the function to improve performance.

```diff
+_API_KEYS = APIKeys()
+
 @persistent_inmemory_cache
 def ofv_answer_binary_question_cached(question: str) -> bool | None:
-    result = ofv_answer_binary_question(question, APIKeys())
+    result = ofv_answer_binary_question(question, _API_KEYS)
     return result.factuality if result is not None else None
```
Line range hint 33-34: Add error handling for file operations.

The script should gracefully handle file-related errors when reading the input TSV and writing the output files.

```diff
-df = pd.read_csv(data_path, sep="\t")
+try:
+    df = pd.read_csv(data_path, sep="\t")
+except FileNotFoundError:
+    print(f"Error: Input file '{data_path}' not found")
+    raise typer.Exit(1)
+except pd.errors.EmptyDataError:
+    print(f"Error: Input file '{data_path}' is empty")
+    raise typer.Exit(1)
+except Exception as e:
+    print(f"Error reading input file: {e}")
+    raise typer.Exit(1)
```

scripts/llm_randomness.py (1)
Line range hint 33-35: Consider making the number parsing more robust.

The current implementation assumes the LLM will always return properly formatted comma-separated numbers. Consider adding error handling for malformed responses.

Here's a more robust implementation:

```diff
-completion = [
-    int(x) for x in str(llm.invoke(messages, max_tokens=512).content).split(",")
-]
+def parse_numbers(content: str) -> list[int]:
+    try:
+        # Strip whitespace and filter out empty strings
+        numbers = [x.strip() for x in content.split(",") if x.strip()]
+        return [int(x) for x in numbers]
+    except ValueError as e:
+        raise ValueError(f"Failed to parse LLM response: {content}") from e
+
+content = str(llm.invoke(messages, max_tokens=512).content)
+completion = parse_numbers(content)
```
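The suggested parse_numbers helper is self-contained enough to exercise directly without an LLM call; a runnable, stdlib-only version of the same idea:

```python
def parse_numbers(content: str) -> list[int]:
    """Parse a comma-separated list of integers, tolerating stray whitespace
    and empty fields; raise a descriptive error on non-numeric tokens."""
    try:
        # Strip whitespace and filter out empty strings before converting.
        tokens = [x.strip() for x in content.split(",") if x.strip()]
        return [int(x) for x in tokens]
    except ValueError as e:
        raise ValueError(f"Failed to parse LLM response: {content!r}") from e


# Well-formed and sloppy-but-parseable responses both succeed:
parse_numbers("1, 2, 3")      # [1, 2, 3]
parse_numbers(" 4 ,, 5 , ")   # [4, 5]
```

Non-numeric tokens (e.g. "1, x") raise a ValueError whose message includes the raw response, which makes malformed LLM output far easier to debug than a bare int() traceback.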
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (2)
- poetry.lock is excluded by !**/*.lock
- pyproject.toml is excluded by !**/*.toml

📒 Files selected for processing (7)
- prediction_market_agent/agents/microchain_agent/market_functions.py (0 hunks)
- prediction_market_agent/agents/prophet_agent/deploy.py (0 hunks)
- prediction_market_agent/agents/think_thoroughly_agent/think_thoroughly_agent.py (0 hunks)
- prediction_market_agent/tools/prediction_prophet/research.py (0 hunks)
- prediction_market_agent/tools/web_scrape/markdown.py (2 hunks)
- scripts/benchmark_ofv_resolver.py (1 hunks)
- scripts/llm_randomness.py (1 hunks)
💤 Files with no reviewable changes (4)
- prediction_market_agent/agents/microchain_agent/market_functions.py
- prediction_market_agent/agents/prophet_agent/deploy.py
- prediction_market_agent/agents/think_thoroughly_agent/think_thoroughly_agent.py
- prediction_market_agent/tools/prediction_prophet/research.py
🔇 Additional comments (4)
prediction_market_agent/tools/web_scrape/markdown.py (2)

1-1: LGTM: Import changes align with new caching strategy

The new imports support the transition from in-memory caching to database-backed caching with TTL support.

Also applies to: 8-8
25-26: 🛠️ Refactor suggestion

Consider enhancing cache error handling and performance

While the switch to DB-based caching with TTL is good, consider these improvements:
- Add error handling for cache operations
- Consider implementing cache warmup for frequently accessed URLs
- Add compression for cached content to reduce DB storage
Let's verify the cache implementation:
scripts/benchmark_ofv_resolver.py (1)

3-5: Verify consistent usage of the new import path across the codebase.

The import path has been updated to be more specific. Let's ensure this change is consistent across all files.
scripts/llm_randomness.py (1)

7-9: LGTM! Import path update looks good.

The import path change reflects a better module organization by moving cache-related functionality to a dedicated caches module.
No description provided.