feat: Use gpt-4o-realtime for audio conversation

microsoft · Nov 20, 2024 · 31b8622
1 parent 9db9752

Showing 25 changed files with 1,358 additions and 5,673 deletions.
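This commit replaces `phone_silence_timeout_sec` with two voice activity detection features, `vad_threshold` and `vad_silence_timeout_ms`, in the README table. With the realtime model, server-side voice activity detection is configured through the `turn_detection` object of a Realtime API `session.update` event. The sketch below is a hypothetical helper, `build_session_update`, that assumes these two features map onto the `threshold` and `silence_duration_ms` fields. The diff does not show that mapping, and the function name is invented for illustration.

```python
import json


def build_session_update(vad_threshold: float = 0.5, vad_silence_timeout_ms: int = 500) -> str:
    """Return a Realtime API session.update event enabling server-side VAD.

    Hypothetical helper: the mapping from the README features to the
    turn_detection fields is an assumption, not code from this commit.
    """
    # The Realtime API documents the VAD threshold as a value from 0.0 to 1.0.
    if not 0.0 <= vad_threshold <= 1.0:
        raise ValueError("vad_threshold must be between 0.0 and 1.0")
    # Silence, in milliseconds, required before the server ends the user's turn.
    if vad_silence_timeout_ms <= 0:
        raise ValueError("vad_silence_timeout_ms must be positive")
    event = {
        "type": "session.update",
        "session": {
            "turn_detection": {
                "type": "server_vad",
                "threshold": vad_threshold,
                "silence_duration_ms": vad_silence_timeout_ms,
            }
        },
    }
    return json.dumps(event)


if __name__ == "__main__":
    # Uses the README defaults: threshold 0.5, 500 ms of silence.
    print(build_session_update())
```

Called with no arguments, the helper uses the README defaults (0.5 and 500 ms). Validating the values locally stops an out-of-range configured value from ever reaching the API.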
diff --git a/README.md b/README.md
@@ -416,15 +416,14 @@ cognitive_service:
   endpoint: https://xxx.cognitiveservices.azure.com
 
 llm:
-  fast:
+  realtime:
     mode: azure_openai
     azure_openai:
-      context: 16385
-      deployment: gpt-4o-mini-2024-07-18
+      context: 128000
+      deployment: gpt-4o-realtime-preview-2024-10-01
       endpoint: https://xxx.openai.azure.com
-      model: gpt-4o-mini
-      streaming: true
-  slow:
+      model: gpt-4o-realtime-preview
+  sequential:
     mode: azure_openai
     azure_openai:
       context: 128000
@@ -632,10 +631,10 @@ Conversation options are represented as features. They can be configured from Ap
 | `answer_hard_timeout_sec` | The hard timeout for the bot answer in seconds. | `int` | 180 |
 | `answer_soft_timeout_sec` | The soft timeout for the bot answer in seconds. | `int` | 30 |
 | `callback_timeout_hour` | The timeout for a callback in hours. | `int` | 3 |
-| `phone_silence_timeout_sec` | The timeout for phone silence in seconds. | `int` | 1 |
+| `recognition_retry_max` | The maximum number of retries for voice recognition. | `int` | 2 |
 | `recording_enabled` | Whether call recording is enabled. | `bool` | false |
-| `slow_llm_for_chat` | Whether to use the slower LLM for chat. | `bool` | true |
-| `voice_recognition_retry_max` | The maximum number of retries for voice recognition. | `int` | 2 |
+| `vad_silence_timeout_ms` | The timeout for phone silence in milliseconds. | `int` | 500 |
+| `vad_threshold` | The threshold for voice activity detection. | `float` | 0.5 |
 
 ### Use an OpenAI compatible model for the LLM