
Unify variable name and use gpt-4o for cheaper runs #556

Merged · 2 commits · Nov 21, 2024
12 changes: 6 additions & 6 deletions prediction_market_agent_tooling/deploy/agent.py
@@ -300,10 +300,10 @@ class DeployablePredictionAgent(DeployableAgent):
    def __init__(
        self,
        enable_langfuse: bool = APIKeys().default_enable_langfuse,
-       store_prediction: bool = True,
+       store_predictions: bool = True,
    ) -> None:
        super().__init__(enable_langfuse=enable_langfuse)
-       self.store_prediction = store_prediction
+       self.store_predictions = store_predictions
Comment on lines +303 to +306
💡 Codebase verification

Update required: Rename store_prediction method across market classes

The parameter rename from store_prediction to store_predictions in DeployablePredictionAgent is incomplete. Additional changes are needed:

  • The store_prediction method needs to be renamed to store_predictions in:
    • AgentMarket base class (agent_market.py)
    • MetaculusAgentMarket class (metaculus.py)
    • OmenAgentMarket class (omen.py)
  • The script scripts/store_prediction.py should be renamed to store_predictions.py

This ensures consistent naming throughout the codebase, as the functionality deals with storing multiple predictions.
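The rename the comment asks for can be pictured end to end. A simplified, hypothetical sketch — the real classes carry far more state, only the naming pattern is shown:

```python
class AgentMarket:
    """Stand-in for the market base class; concrete markets would persist
    predictions to their backend instead of printing."""

    def store_predictions(self, processed_market, agent_name: str) -> None:
        print(f"storing predictions for {agent_name}")


class DeployablePredictionAgent:
    def __init__(self, store_predictions: bool = True) -> None:
        # The constructor flag and the market method share one name.
        self.store_predictions = store_predictions

    def after_process_market(self, market: AgentMarket, processed_market) -> None:
        if self.store_predictions:
            market.store_predictions(processed_market, agent_name="agent")
```

With both the flag and the method called `store_predictions`, a grep for the singular form coming back empty is a quick completeness check for the rename.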

🔗 Analysis chain

Verify parameter rename impact across the codebase

The rename from store_prediction to store_predictions improves naming consistency. However, based on past reviews, there are several places that still use store_prediction (singular) that need to be updated:

  • Method calls in agent.py
  • Method names in market classes (agent_market.py, omen.py, metaculus.py)
  • Script name and references in scripts/store_prediction.py

Let me check the specific occurrences of store_prediction to verify if they are related to the parameter rename.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any remaining instances of old parameter name
# and verify all constructor calls are updated

# Check for any remaining "store_prediction" instances
rg "store_prediction" -l

# Find all instantiations of DeployablePredictionAgent
ast-grep --pattern 'DeployablePredictionAgent($$$)'

Length of output: 326


Script:

#!/bin/bash
# Check the context of store_prediction usage in each file
echo "=== agent.py ==="
rg "store_prediction" prediction_market_agent_tooling/deploy/agent.py -B 2 -A 2

echo -e "\n=== agent_market.py ==="
rg "store_prediction" prediction_market_agent_tooling/markets/agent_market.py -B 2 -A 2

echo -e "\n=== metaculus.py ==="
rg "store_prediction" prediction_market_agent_tooling/markets/metaculus/metaculus.py -B 2 -A 2

echo -e "\n=== omen.py ==="
rg "store_prediction" prediction_market_agent_tooling/markets/omen/omen.py -B 2 -A 2

echo -e "\n=== store_prediction.py ==="
rg "store_prediction" scripts/store_prediction.py -B 2 -A 2

Length of output: 2355


    def initialize_langfuse(self) -> None:
        super().initialize_langfuse()

@@ -455,13 +455,13 @@ def after_process_market(
        Executed after processing of each market.
        """
        keys = APIKeys()
-       if self.store_prediction:
+       if self.store_predictions:
            market.store_prediction(
                processed_market=processed_market, keys=keys, agent_name=self.agent_name
            )
        else:
            logger.info(
-               f"Prediction {processed_market} not stored because {self.store_prediction=}."
+               f"Prediction {processed_market} not stored because {self.store_predictions=}."
Comment on lines +458 to +464
⚠️ Potential issue

Fix inconsistent method name

While the parameter has been renamed to store_predictions, the method call still uses the old name store_prediction. This should be updated for consistency.

-            market.store_prediction(
+            market.store_predictions(
                 processed_market=processed_market, keys=keys, agent_name=self.agent_name
             )

Committable suggestion skipped: line range outside the PR's diff.

)

def before_process_markets(self, market_type: MarketType) -> None:
@@ -530,12 +530,12 @@ class DeployableTraderAgent(DeployablePredictionAgent):
    def __init__(
        self,
        enable_langfuse: bool = APIKeys().default_enable_langfuse,
-       store_prediction: bool = True,
+       store_predictions: bool = True,
        store_trades: bool = True,
        place_trades: bool = True,
    ) -> None:
        super().__init__(
-           enable_langfuse=enable_langfuse, store_prediction=store_prediction
+           enable_langfuse=enable_langfuse, store_predictions=store_predictions
        )
        self.store_trades = store_trades
        self.place_trades = place_trades
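Because `store_prediction` was a public constructor argument, downstream callers passing it by keyword will break on upgrade. A common softening pattern — not part of this PR, purely illustrative — is to accept the old keyword for one release and emit a `DeprecationWarning`:

```python
import warnings


class DeployablePredictionAgent:
    def __init__(self, store_predictions: bool = True, **kwargs) -> None:
        # Accept the pre-rename keyword for one release, warning loudly.
        if "store_prediction" in kwargs:
            warnings.warn(
                "`store_prediction` is deprecated; use `store_predictions`.",
                DeprecationWarning,
                stacklevel=2,
            )
            store_predictions = kwargs.pop("store_prediction")
        if kwargs:
            # Reject anything else so typos still fail fast.
            raise TypeError(f"unexpected keyword arguments: {sorted(kwargs)}")
        self.store_predictions = store_predictions
```

The bumped minor version (0.56.3 → 0.57.0) signals the break either way; the alias just gives consumers a migration window.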
@@ -17,7 +17,7 @@ def rewrite_question_into_image_generation_prompt(question: str) -> str:
            "openai not installed, please install extras `langchain` to use this function."
        )
    llm = ChatOpenAI(
-       model="gpt-4-turbo",
+       model="gpt-4o-2024-08-06",

💡 Codebase verification

⚠️ Potential issue

Non-standard GPT model names require standardization across multiple files

The codebase contains several instances of non-standard OpenAI model names that need to be updated:

  • gpt-4o-2024-08-06 in market_thumbnail_gen.py
  • gpt-4o in relevant_news_analysis.py
  • Similar patterns likely exist in is_predictable.py and is_invalid.py (based on the regex search)

These non-standard model names will cause runtime errors as they don't match OpenAI's official model naming convention. Update all instances to use standard OpenAI model names like gpt-4-1106-preview or gpt-4-0613.

🔗 Analysis chain

Invalid OpenAI model name detected

The model name "gpt-4o-2024-08-06" appears to be non-standard and will likely cause runtime errors. OpenAI's official model names don't follow this format.

Apply this fix:

-        model="gpt-4o-2024-08-06",
+        model="gpt-4-1106-preview",  # Latest GPT-4 Turbo model
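Whichever model name is correct here, disagreements like this surface earlier with a startup check against an explicit allow-list instead of a runtime API error. A minimal sketch — the `SUPPORTED_ENGINES` entries are illustrative, not an authoritative model catalog:

```python
# Illustrative allow-list; a real deployment would source this from
# configuration or from the provider's models endpoint.
SUPPORTED_ENGINES = frozenset({
    "gpt-4-turbo",
    "gpt-4-1106-preview",
    "gpt-4o",
    "gpt-4o-2024-08-06",
})


def validate_engine(engine: str) -> str:
    """Fail fast with a clear message if the engine name is not recognized."""
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(
            f"Unknown engine {engine!r}; expected one of {sorted(SUPPORTED_ENGINES)}"
        )
    return engine
```

Calling `validate_engine` once at import time for each default would have turned this review thread into a single failing check.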

Let's verify if this non-standard model name is used elsewhere:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Search for similar non-standard model names across the codebase
# Expected: No other occurrences of this pattern should exist

echo "Searching for non-standard gpt-4o model names:"
rg -l "gpt-4o-[0-9]{4}-[0-9]{2}-[0-9]{2}"

echo -e "\nSearching for all GPT-4 model references for comparison:"
rg -l "gpt-4"

Length of output: 834


Script:

#!/bin/bash
# Let's check the actual model names used in these files
echo "Checking model names in is_predictable.py:"
rg "model\s*=.*gpt-4" prediction_market_agent_tooling/tools/is_predictable.py -A 1

echo -e "\nChecking model names in is_invalid.py:"
rg "model\s*=.*gpt-4" prediction_market_agent_tooling/tools/is_invalid.py -A 1

echo -e "\nChecking model names in relevant_news_analysis.py:"
rg "model\s*=.*gpt-4" prediction_market_agent_tooling/tools/relevant_news_analysis/relevant_news_analysis.py -A 1

echo -e "\nChecking model names in utils.py:"
rg "model\s*=.*gpt-4" prediction_market_agent_tooling/tools/utils.py -A 1

echo -e "\nChecking model names in benchmark/utils.py:"
rg "model\s*=.*gpt-4" prediction_market_agent_tooling/benchmark/utils.py -A 1

Length of output: 1048


Script:

#!/bin/bash
# Let's check for any model variable assignments or configurations
echo "Checking all model configurations in is_predictable.py:"
rg "model.*=.*" prediction_market_agent_tooling/tools/is_predictable.py

echo -e "\nChecking all model configurations in is_invalid.py:"
rg "model.*=.*" prediction_market_agent_tooling/tools/is_invalid.py

echo -e "\nChecking for any model configurations in utils.py:"
rg "model.*=.*" prediction_market_agent_tooling/tools/utils.py

echo -e "\nChecking for any model configurations in benchmark/utils.py:"
rg "model.*=.*" prediction_market_agent_tooling/benchmark/utils.py

# Let's also check for any configuration files that might define these models
echo -e "\nChecking for potential config files:"
fd -e yaml -e json -e toml -e ini

Length of output: 2114


Script:

#!/bin/bash
# Let's check the content of pyproject.toml for any model configurations
echo "Checking pyproject.toml for model configurations:"
cat pyproject.toml

# Let's also check if there's an 'engine' variable defined that might be setting these models
echo -e "\nChecking for engine variable definitions:"
rg "engine\s*=.*gpt" -A 2

Length of output: 2302

        temperature=0.0,
        api_key=APIKeys().openai_api_key_secretstr_v1,
    )
2 changes: 1 addition & 1 deletion prediction_market_agent_tooling/tools/is_invalid.py
@@ -60,7 +60,7 @@
@db_cache
def is_invalid(
    question: str,
-   engine: str = "gpt-4o",
+   engine: str = "gpt-4o-2024-08-06",
    temperature: float = LLM_SUPER_LOW_TEMPERATURE,
    seed: int = LLM_SEED,
    prompt_template: str = QUESTION_IS_INVALID_PROMPT,
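One side effect worth noting: `is_invalid` is wrapped in `@db_cache`, and if the cache key is derived from the bound arguments — which such decorators typically do — changing the `engine` default effectively invalidates earlier cache entries. A rough in-memory stand-in for that keying behavior (hypothetical; the real decorator presumably persists results to a database):

```python
import functools
import hashlib
import inspect
import json


def db_cache(func):
    """In-memory sketch: key on the function name plus all bound arguments,
    with defaults applied, so the engine default is part of the key."""
    cache = {}
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()  # defaults (engine, seed, ...) join the key
        key = hashlib.sha256(
            json.dumps(
                {"fn": func.__name__, **bound.arguments},
                sort_keys=True,
                default=str,
            ).encode()
        ).hexdigest()
        if key not in cache:
            cache[key] = func(*args, **kwargs)
        return cache[key]

    return wrapper
```

Under this scheme, calls made against the old `"gpt-4o"` default simply stop hitting the cache after the bump, so stale answers from the previous model are never served.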
11 changes: 8 additions & 3 deletions prediction_market_agent_tooling/tools/is_predictable.py
@@ -7,7 +7,10 @@
    get_langfuse_langchain_config,
    observe,
)
-from prediction_market_agent_tooling.tools.utils import LLM_SUPER_LOW_TEMPERATURE
+from prediction_market_agent_tooling.tools.utils import (
+    LLM_SEED,
+    LLM_SUPER_LOW_TEMPERATURE,
+)

# I tried to make it return a JSON, but it didn't work well in combo with asking it to do chain of thought.
QUESTION_IS_PREDICTABLE_BINARY_PROMPT = """Main signs about a fully qualified question (sometimes referred to as a "market"):
@@ -81,7 +84,7 @@
@db_cache
def is_predictable_binary(
    question: str,
-   engine: str = "gpt-4-1106-preview",
+   engine: str = "gpt-4o-2024-08-06",

💡 Codebase verification

Invalid model identifier detected

The model identifier "gpt-4o-2024-08-06" appears to be non-standard and likely incorrect:

  1. The format deviates from OpenAI's standard naming convention (no 'o' suffix is used in official models)
  2. The date format used (2024-08-06) is in the future
  3. There's no model validation or configuration that defines this as a valid model
  4. The model is not documented in any configuration files

The previous model "gpt-4-1106-preview" should be retained or updated to a currently available OpenAI model.

🔗 Analysis chain

Verify the availability and compatibility of the new model version

The model has been changed from "gpt-4-1106-preview" to "gpt-4o-2024-08-06". Let's verify:

  1. If this is a valid model identifier
  2. If it maintains compatibility with the existing prompts and token limits

Also applies to: 125-125


Let me gather more information about model validation and configuration in the codebase.


Let me try one more search to find any model-related configurations or imports.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Search for model version references and verify consistency

# Check for any documentation or configuration files that might list valid models
fd -t f -e md -e yaml -e json . | xargs rg -l "gpt-4o-2024-08-06"

# Check if there are any model-specific configurations or tests
rg -A 5 "gpt-4o-2024-08-06"

# Look for any model validation logic
ast-grep --pattern 'if $model == "gpt-4o-2024-08-06" { $$$ }'

Length of output: 2379


Script:

#!/bin/bash
# Check for model validation or configuration logic
rg -l "SUPPORTED_MODELS|ALLOWED_MODELS|MODEL_CONFIG" .

# Look for any model-related constants or configurations
ast-grep --pattern 'MODEL = $value'
ast-grep --pattern 'MODELS = $value'

# Check for any OpenAI client initialization or model validation
rg -A 5 "openai\..*Client"

Length of output: 162


Script:

#!/bin/bash
# Search for OpenAI imports and configurations
rg -l "from openai|import openai"

# Look for any model-related constants
rg "DEFAULT_MODEL|GPT_|MODEL_NAME"

# Check configuration files
fd -t f "config|settings" -e py -e yaml -e json | xargs cat

Length of output: 8456

    prompt_template: str = QUESTION_IS_PREDICTABLE_BINARY_PROMPT,
    max_tokens: int = 1024,
) -> bool:
@@ -98,6 +101,7 @@ def is_predictable_binary(
    llm = ChatOpenAI(
        model=engine,
        temperature=LLM_SUPER_LOW_TEMPERATURE,
+       seed=LLM_SEED,
        api_key=APIKeys().openai_api_key_secretstr_v1,
    )
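The `seed=LLM_SEED` addition pairs with the near-zero temperature as a best-effort determinism knob — OpenAI documents `seed` as making sampling mostly, not strictly, repeatable. One way to keep these settings from drifting across the several `ChatOpenAI(...)` call sites is a shared parameter builder; names and values below are illustrative stand-ins for the repo's constants:

```python
# Illustrative constants mirroring LLM_SUPER_LOW_TEMPERATURE / LLM_SEED;
# the actual values live in prediction_market_agent_tooling.tools.utils.
LLM_SUPER_LOW_TEMPERATURE = 0.0
LLM_SEED = 0


def deterministic_llm_params(model: str, **overrides) -> dict:
    """Shared kwargs so temperature/seed never drift between call sites."""
    params = {
        "model": model,
        "temperature": LLM_SUPER_LOW_TEMPERATURE,
        "seed": LLM_SEED,
    }
    params.update(overrides)  # allow per-call-site exceptions, explicitly
    return params
```

A call site would then read `ChatOpenAI(**deterministic_llm_params(engine), api_key=...)`, and a forgotten `seed=` in one function (the bug this hunk fixes) could not happen.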

@@ -118,7 +122,7 @@ def is_predictable_binary(
def is_predictable_without_description(
    question: str,
    description: str,
-   engine: str = "gpt-4-1106-preview",
+   engine: str = "gpt-4o-2024-08-06",
    prompt_template: str = QUESTION_IS_PREDICTABLE_WITHOUT_DESCRIPTION_PROMPT,
    max_tokens: int = 1024,
) -> bool:
@@ -137,6 +141,7 @@ def is_predictable_without_description(
    llm = ChatOpenAI(
        model=engine,
        temperature=LLM_SUPER_LOW_TEMPERATURE,
+       seed=LLM_SEED,
        api_key=APIKeys().openai_api_key_secretstr_v1,
    )
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "prediction-market-agent-tooling"
-version = "0.56.3"
+version = "0.57.0"
description = "Tools to benchmark, deploy and monitor prediction market agents."
authors = ["Gnosis"]
readme = "README.md"
2 changes: 1 addition & 1 deletion tests/tools/test_is_predictable.py
@@ -84,7 +84,7 @@ def test_is_predictable_binary(question: str, answerable: bool) -> None:
        (
            "Will an AI get gold on any International Math Olympiad by 2025?",
            "Resolves to YES if either Eliezer or Paul acknowledge that an AI has succeeded at this task.",
-           True,  # True, because description doesn't provide any extra information.
+           False,  # False, because description says that either `Eliezer or Paul` needs to acknowledge it.
        ),
    ],
)