interop-testing: update the Interop-test-descriptions doc to reflect …

…the soak concurrency (grpc#38126)

- Update the Interop-test-descriptions doc to reflect the concurrency improvement in the rpc_soak and channel_soak tests.
- PTAL @apolcyn

Closes grpc#38126

COPYBARA_INTEGRATE_REVIEW=grpc#38126 from zbilun:soak-doc-update e8ae5ae
PiperOrigin-RevId: 699251174
paulosjca · Nov 22, 2024 · 7826ddc
1 parent c6dccd4
commit 7826ddc
Showing 1 changed file with 29 additions and 10 deletions.
diff --git a/doc/interop-test-descriptions.md b/doc/interop-test-descriptions.md
@@ -1005,14 +1005,21 @@ Client asserts:
 ### rpc_soak
 
 The client performs many large_unary RPCs in sequence over the same channel.
-The client records the latency and status of each RPC in some data structure.
-If the test ever consumes `soak_overall_timeout_seconds` seconds and still hasn't
-completed `soak_iterations` RPCs, then the test should discontinue sending RPCs
-as soon as possible. After performing all RPCs, the test should examine
-previously recorded RPC latency and status results in a second pass and fail if
-either:
+The total number of RPCs to execute is controlled by the `soak_iterations` 
+parameter, which defaults to 10. The number of threads used to execute RPCs 
+is controlled by `soak_num_threads`. By default, `soak_num_threads` is set to 1. 
 
-a) not all `soak_iterations` RPCs were completed
+The client records the latency and status of each RPC in
+thread-specific data structures, which are later aggregated to form the overall
+results. If the test ever consumes `soak_overall_timeout_seconds` seconds 
+and still hasn't completed `soak_iterations` RPCs, then the test should 
+discontinue sending RPCs as soon as possible. Each thread should independently 
+track its progress and stop once the overall timeout is reached.
+
+After performing all RPCs, the test should examine the previously aggregated RPC
+latency and status results from all threads in a second pass and fail if either:
+
+a) not all `soak_iterations` RPCs were completed across all threads
 
 b) the sum of RPCs that either completed with a non-OK status or exceeded
    `max_acceptable_per_rpc_latency_ms` exceeds `soak_max_failures`
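The flow described in this hunk (per-thread recording, aggregation, then a second-pass check of criteria a and b) can be sketched roughly as follows. This is an illustrative Python sketch, not the implementation used by any gRPC language: `run_soak`, `ThreadResults`, and the `do_rpc` callable are hypothetical names, and the default values other than `soak_iterations=10` and `soak_num_threads=1` are placeholders chosen for the example.

```python
import threading
import time
from dataclasses import dataclass, field


@dataclass
class ThreadResults:
    """Per-thread record of RPC outcomes (hypothetical structure)."""
    latencies_ms: list = field(default_factory=list)
    failures: int = 0


def run_soak(do_rpc, soak_iterations=10, soak_num_threads=1,
             soak_max_failures=0, max_acceptable_per_rpc_latency_ms=1000,
             soak_overall_timeout_seconds=10):
    """Return True if the soak run passes criteria (a) and (b).

    do_rpc is a stand-in for one large_unary RPC; it returns True on OK status.
    """
    per_thread = soak_iterations // soak_num_threads
    deadline = time.monotonic() + soak_overall_timeout_seconds
    results = [ThreadResults() for _ in range(soak_num_threads)]

    def worker(res):
        for _ in range(per_thread):
            if time.monotonic() >= deadline:
                break  # overall timeout reached: stop sending RPCs
            start = time.monotonic()
            ok = do_rpc()
            elapsed_ms = (time.monotonic() - start) * 1000
            res.latencies_ms.append(elapsed_ms)
            if not ok or elapsed_ms > max_acceptable_per_rpc_latency_ms:
                res.failures += 1

    # Each thread writes only to its own ThreadResults, so no lock is needed.
    threads = [threading.Thread(target=worker, args=(r,)) for r in results]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Second pass: aggregate the per-thread results and apply both checks.
    completed = sum(len(r.latencies_ms) for r in results)
    failures = sum(r.failures for r in results)
    return completed == soak_iterations and failures <= soak_max_failures
```

For example, `run_soak(lambda: True, soak_iterations=8, soak_num_threads=4)` runs two RPCs on each of four threads and passes, while a zero-second overall timeout fails criterion (a) because no RPC completes.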
@@ -1029,10 +1036,15 @@ results of each iteration (i.e. RPC) in a format that matches the following
 regexes:
 
 - Upon success:
-  - `soak iteration: \d+ elapsed_ms: \d+ peer: \S+ succeeded`
+  - `thread_id: \d+ soak iteration: \d+ elapsed_ms: \d+ peer: \S+ server_uri: 
+  \S+ succeeded`
 
 - Upon failure:
-  - `soak iteration: \d+ elapsed_ms: \d+ peer: \S+ failed:`
+  - `thread_id: \d+ soak iteration: \d+ elapsed_ms: \d+ peer: \S+ server_uri: 
+  \S+ failed`
+
+- Each log line includes the `thread_id`, which helps track performance
+  across threads.
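As a quick way to check log output against the updated regexes, something like the following Python sketch could be used. It assumes the line break inside each regex in the doc stands for a single space; the `classify` helper and the sample log lines are hypothetical and only illustrate the expected shape.

```python
import re

# Regexes from the updated doc, with the wrapped line joined by one space
# (an assumption about how the Markdown line break should be read).
SUCCESS_RE = re.compile(
    r"thread_id: \d+ soak iteration: \d+ elapsed_ms: \d+ "
    r"peer: \S+ server_uri: \S+ succeeded")
FAILURE_RE = re.compile(
    r"thread_id: \d+ soak iteration: \d+ elapsed_ms: \d+ "
    r"peer: \S+ server_uri: \S+ failed")


def classify(line):
    """Return 'succeeded', 'failed', or None for a single log line."""
    if SUCCESS_RE.search(line):
        return "succeeded"
    if FAILURE_RE.search(line):
        return "failed"
    return None


# Hypothetical log lines, made up for illustration only.
ok_line = ("thread_id: 0 soak iteration: 3 elapsed_ms: 12 "
           "peer: ipv4:127.0.0.1:8080 server_uri: localhost:8080 succeeded")
bad_line = ("thread_id: 1 soak iteration: 4 elapsed_ms: 950 "
            "peer: ipv4:127.0.0.1:8080 server_uri: localhost:8080 failed")
old_line = "soak iteration: 3 elapsed_ms: 12 peer: ipv4:127.0.0.1:8080 succeeded"
```

Note that a line in the old format (without `thread_id` and `server_uri`) no longer matches either regex, so log parsers written against the previous format would need updating.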
 
 This test must be configurable via a few different command line flags:
 
@@ -1057,6 +1069,14 @@ This test must be configurable via a few different command line flags:
 * `soak_min_time_ms_between_rpcs`: The minimum time in milliseconds between
   consecutive RPCs. Useful for limiting QPS.
 
+* `soak_num_threads`: Specifies the number of threads to use for concurrently 
+  executing the soak test. Each thread performs `soak_iterations / soak_num_threads`
+  RPCs.
+
+  This value defaults to 1 (i.e., no concurrency) but can be
+  increased for concurrent execution. The total `soak_iterations` must be
+  divisible by `soak_num_threads`.
+
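The divisibility requirement above could be enforced at flag-parsing time. The following is a minimal Python sketch under that assumption; `iterations_per_thread` is a hypothetical helper name, and the doc does not say how an implementation should react to an invalid combination (raising an error here is just one option).

```python
def iterations_per_thread(soak_iterations=10, soak_num_threads=1):
    """Return how many RPCs each thread performs.

    Raises ValueError when the flags cannot be split evenly across threads.
    """
    if soak_num_threads < 1:
        raise ValueError("soak_num_threads must be at least 1")
    if soak_iterations % soak_num_threads != 0:
        raise ValueError(
            "soak_iterations (%d) must be divisible by soak_num_threads (%d)"
            % (soak_iterations, soak_num_threads))
    return soak_iterations // soak_num_threads
```

With the documented defaults, a single thread performs all 10 iterations; with `soak_num_threads=5`, each thread performs 2.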
 The following is optional but encouraged to improve debuggability:
 
 * Implementations should log the number of milliseconds that each RPC takes.
@@ -1078,7 +1098,6 @@ included in that latency measurement (channel teardown semantics differ widely
 between languages). This latency measurement should also be the value that is
 logged and recorded in the latency histogram.
 
-
 ### orca_per_rpc
 [orca_per_rpc]: #orca_per_rpc