update
Huanshere committed Sep 19, 2024
1 parent 91834b9 commit 25d56bc
Showing 2 changed files with 10 additions and 10 deletions.
10 changes: 5 additions & 5 deletions README.md
@@ -97,10 +97,10 @@ In the model names, the "+" after indicates a prompt, while the rest are unpromp
| 4o | 30 | 9 | 39 | 67% |
| 4omini | 30 | 9 | 39 | 67% |
| sonnet | 30 | 12 | 42 | 72% |
-| sonnet + so1 | 35 | 10 | 45 | 77%🥉 |
+| **sonnet + so1** | 35 | 10 | 45 | **77%🥉** |
| sonnet + g1 * | 30 | 5 | 35 | 60% |
-| o1 mini | 37 | 16 | 53 | 91%🥇 |
-| o1 preview | 38 | 12 | 50 | 86%🥈|
+| **o1 mini** | 37 | 16 | 53 | **91%🥇** |
+| **o1 preview** | 38 | 12 | 50 | **86%🥈**|

> Note: sonnet+g1 tends to stop after giving only the first step of reasoning, marked as ⚠️. In scoring, it is simply counted as incorrect, but its actual performance is similar to so1.
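
The scoring rule in the note above (a ⚠️ answer is counted as incorrect) can be sketched as a small tally over one row of marks. This is only an illustration: the function name `tally` and the bucket names are hypothetical, the diff does not say how marks map to the numeric scores in the summary table, and the meaning of 👍 is not stated here, so it lands in an "other" bucket rather than being scored.

```python
from collections import Counter

CORRECT = "\u2705"                  # ✅
INCORRECT = {"\u274c", "\u26a0"}    # ❌, and ⚠ (truncated answer, counted as incorrect per the note)
VARIATION_SELECTOR = "\ufe0f"       # invisible code point that often follows ⚠


def tally(marks: str) -> Counter:
    """Count correct / incorrect / other marks in a string of result emoji."""
    counts = Counter()
    for ch in marks:
        # Skip whitespace, table pipes, and the emoji variation selector.
        if ch.isspace() or ch == "|" or ch == VARIATION_SELECTOR:
            continue
        if ch == CORRECT:
            counts["correct"] += 1
        elif ch in INCORRECT:
            counts["incorrect"] += 1
        else:
            # Marks whose scoring is not described (for example 👍).
            counts["other"] += 1
    return counts


if __name__ == "__main__":
    # The "sonnet + g1" row from the per-question table in this diff.
    row = "✅✅✅ | ✅✅✅ | ✅✅✅ | ✅❌✅ | ✅✅⚠️ | ⚠️✅❌ | ✅✅✅ | ❌✅❌"
    print(dict(tally(row)))
```

Treating ⚠️ the same as ❌ is what makes sonnet + g1 look weaker in the totals than the note says it really is; moving `"\u26a0"` into its own bucket would show how many answers were truncated rather than wrong.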
@@ -110,7 +110,7 @@ In the model names, the "+" after indicates a prompt, while the rest are unpromp
| 4o | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅❌❌ | ✅✅✅ | ✅✅✅ | ❌✅❌ | ✅❌❌ |
| 4omini | ✅✅✅ | ✅❌✅ | ✅✅✅ | ❌✅✅ | ✅✅✅ | ✅❌✅ | ✅✅✅ | ❌❌❌ |
| sonnet | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅❌❌ | ✅✅✅ | ✅❌✅ | ✅✅✅ | ❌❌❌ |
-| sonnet + so1 | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅✅❌ | ✅✅✅ | ❌❌✅ |
+| **sonnet + so1** | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅✅❌ | ✅✅✅ | ❌❌✅ |
| sonnet + g1 | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅❌✅ | ✅✅⚠️ | ⚠️✅❌ | ✅✅✅ | ❌✅❌ |
| o1 mini | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅❌❌ | ✅✅✅ | ✅✅✅ |
| o1 preview | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅✅❌ | ✅✅✅ | ✅✅✅ |
@@ -121,7 +121,7 @@ In the model names, the "+" after indicates a prompt, while the rest are unpromp
| 4o | ✅✅✅ | 👍👍❌ | ❌❌👍 |
| 4omini | ✅✅✅ | ❌👍👍 | ❌❌👍 |
| sonnet | ✅✅✅ | 👍👍❌ | 👍✅👍 |
-| sonnet + so1 | ✅✅✅ | ❌❌👍 | 👍👍👍 |
+| **sonnet + so1** | ✅✅✅ | ❌❌👍 | 👍👍👍 |
| sonnet + g1 | ✅❌⚠️ | ⚠️❌✅ | ⚠️❌👍 |
| o1 mini | ✅✅✅ | ✅✅✅ | ❌✅✅ |
| o1 preview | ✅✅✅ | ✅✅✅ | ❌❌❌ |
10 changes: 5 additions & 5 deletions README.zh.md
@@ -99,10 +99,10 @@ Prompt reference: [g1](https://github.com/bklieger-groq/g1)
| 4o | 30 | 9 | 39 | 67% |
| 4omini | 30 | 9 | 39 | 67% |
| sonnet | 30 | 12 | 42 | 72% |
-| sonnet + so1 | 35 | 10 | 45 | 77%🥉 |
+| **sonnet + so1** | 35 | 10 | 45 | **77%🥉** |
| sonnet + g1 * | 30 | 5 | 35 | 60% |
-| o1 mini | 37 | 16 | 53 | 91%🥇 |
-| o1 preview | 38 | 12 | 50 | 86%🥈|
+| **o1 mini** | 37 | 16 | 53 | **91%🥇** |
+| **o1 preview** | 38 | 12 | 50 | **86%🥈**|

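> Note: sonnet+g1 tends to stop after giving only the first step of reasoning, marked as ⚠️. In scoring, it is simply counted as incorrect, but its actual performance is similar to so1.
@@ -112,7 +112,7 @@ Prompt reference: [g1](https://github.com/bklieger-groq/g1)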
| 4o | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅❌❌ | ✅✅✅ | ✅✅✅ | ❌✅❌ | ✅❌❌ |
| 4omini | ✅✅✅ | ✅❌✅ | ✅✅✅ | ❌✅✅ | ✅✅✅ | ✅❌✅ | ✅✅✅ | ❌❌❌ |
| sonnet | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅❌❌ | ✅✅✅ | ✅❌✅ | ✅✅✅ | ❌❌❌ |
-| sonnet + so1 | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅✅❌ | ✅✅✅ | ❌❌✅ |
+| **sonnet + so1** | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅✅❌ | ✅✅✅ | ❌❌✅ |
| sonnet + g1 | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅❌✅ | ✅✅⚠️ | ⚠️✅❌ | ✅✅✅ | ❌✅❌ |
| o1 mini | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅❌❌ | ✅✅✅ | ✅✅✅ |
| o1 preview | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅✅❌ | ✅✅✅ | ✅✅✅ |
@@ -123,7 +123,7 @@ Prompt reference: [g1](https://github.com/bklieger-groq/g1)
| 4o | ✅✅✅ | 👍👍❌ | ❌❌👍 |
| 4omini | ✅✅✅ | ❌👍👍 | ❌❌👍 |
| sonnet | ✅✅✅ | 👍👍❌ | 👍✅👍 |
-| sonnet + so1 | ✅✅✅ | ❌❌👍 | 👍👍👍 |
+| **sonnet + so1** | ✅✅✅ | ❌❌👍 | 👍👍👍 |
| sonnet + g1 | ✅❌⚠️ | ⚠️❌✅ | ⚠️❌👍 |
| o1 mini | ✅✅✅ | ✅✅✅ | ❌✅✅ |
| o1 preview | ✅✅✅ | ✅✅✅ | ❌❌❌ |