Update openai runners #219

wongjingping · 2024-09-13T02:22:38Z

Support evaluation of latest openai o1-* models. We branch out the code paths in the openai query generator by checking if the model is from the o1- family, as the o1 model doesn't support some parameters currently.

Tested o1-mini model:

$ python main.py \
  -db postgres \
  -q "data/questions_gen_postgres.csv" "data/instruct_basic_postgres.csv" "data/instruct_advanced_postgres.csv" \
  -o results/openai_o1mini_classic.csv results/openai_o1mini_basic.csv results/openai_o1mini_advanced.csv \
  -g oa \
  -f prompts/prompt_openai_o1.json \
  -m o1-mini \
  -p 1 \
  -t 120 \
  -c 0

(works, still running)

Previous gpt-4o request still works:

$ python main.py \
  -db postgres \
  -q "data/questions_gen_postgres.csv" "data/instruct_basic_postgres.csv" "data/instruct_advanced_postgres.csv" \
  -o results/openai_classic.csv results/openai_basic.csv results/openai_advanced.csv \
  -g oa \
  -f prompts/prompt_openai.json \
  -m gpt-4o \
  -p 10 \
  -c 0
/Users/jp/workspace/miniconda3/lib/python3.11/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
'NoneType' object has no attribute 'cadam32bit_grad_fp32'
Using prompt file prompts/prompt_openai.json
Preparing questions...
Using all question(s) from data/questions_gen_postgres.csv
Correct so far: 186/210 (88.57%): 100%|███████████████████████████████████████████████████████████████████████████| 210/210 [00:25<00:00,  8.30it/s]
   query_category  num_rows  mean_correct  mean_error_db_exec
0  date_functions        35      0.771429            0.057143
1        group_by        35      0.971429            0.000000
2        instruct        35      0.914286            0.000000
3        order_by        35      0.914286            0.000000
4           ratio        35      0.771429            0.028571
5      table_join        35      0.971429            0.000000
Average correct rate: 0.89
Using prompt file prompts/prompt_openai.json
Preparing questions...
Using all question(s) from data/instruct_basic_postgres.csv
Correct so far: 39/40 (97.50%): 100%|███████████████████████████████████████████████████████████████████████████████| 40/40 [00:05<00:00,  7.39it/s]
                      query_category  num_rows  mean_correct  mean_error_db_exec
0            basic_group_order_limit         8         1.000                 0.0
1  basic_join_date_group_order_limit         8         1.000                 0.0
2                basic_join_distinct         8         1.000                 0.0
3       basic_join_group_order_limit         8         0.875                 0.0
4                    basic_left_join         8         1.000                 0.0
Average correct rate: 0.97
Using prompt file prompts/prompt_openai.json
Preparing questions...
Using all question(s) from data/instruct_advanced_postgres.csv
Correct so far: 55/64 (85.94%): 100%|███████████████████████████████████████████████████████████████████████████████| 64/64 [00:10<00:00,  5.83it/s]
                 query_category  num_rows  mean_correct  mean_error_db_exec
0         instructions_cte_join        16        0.9375               0.000
1       instructions_cte_window         8        0.7500               0.125
2        instructions_date_join        16        0.7500               0.000
3  instructions_string_matching         8        1.0000               0.000
4            keywords_aggregate         8        0.8750               0.000
5                keywords_ratio         8        0.8750               0.000
Average correct rate: 0.86

- support evaluation of latest openai o1-* models

rishsriv

Thank you! We had previously intentionally not included table aliases in the prompt as that seemed to hurt performance for OpenAI's GPT4 models. But YMMV with the O1 models -- so up to you whether you want to keep them or not

wendy-aw

thanks for the speedy additions!

wongjingping · 2024-09-13T02:31:23Z

Thank you! We had previously intentionally not included table aliases in the prompt as that seemed to hurt performance for OpenAI's GPT4 models. But YMMV with the O1 models -- so up to you whether you want to keep them or not

Good point, I'll update after the results come in, depending on which turns out better 👌🏼

wongjingping · 2024-09-13T03:25:31Z

It performed slightly worse (91.40%) than what Rishabh got earlier (92.04%) so gonna remove the {table_aliases} placeholder.

Update openai runners

2ebb83d

- support evaluation of latest openai o1-* models

wongjingping requested review from rishsriv and wendy-aw September 13, 2024 02:22

rishsriv approved these changes Sep 13, 2024

View reviewed changes

wendy-aw approved these changes Sep 13, 2024

View reviewed changes

remove table aliases placeholder

2e50d60

wongjingping merged commit ae1831a into main Sep 13, 2024
2 checks passed

wongjingping deleted the jp/o1 branch September 13, 2024 07:32

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update openai runners #219

Update openai runners #219

wongjingping commented Sep 13, 2024 •

edited

Loading

rishsriv left a comment

wendy-aw left a comment

wongjingping commented Sep 13, 2024

wongjingping commented Sep 13, 2024

Update openai runners #219

Update openai runners #219

Conversation

wongjingping commented Sep 13, 2024 • edited Loading

rishsriv left a comment

Choose a reason for hiding this comment

wendy-aw left a comment

Choose a reason for hiding this comment

wongjingping commented Sep 13, 2024

wongjingping commented Sep 13, 2024

wongjingping commented Sep 13, 2024 •

edited

Loading