Replies: 10 comments 27 replies
-
Hey @dorantor. 98% of the performance is determined by your worker (if we don't take OS, hardware, etc. into account). You may read this thread -> #799, it may give you some ideas.

```yaml
http:
  address: 127.0.0.1:15389
  middleware: [ "http_metrics" ]
  # .....
metrics:
  # Server metrics address
  #
  # Required for production. Default: 127.0.0.1:9091; metrics are served at 127.0.0.1:9091/metrics
  address: 127.0.0.1:3001
```

Then you can use Grafana/Prometheus/curl to read them.
That alone says very little, because your workers might be blocked by some IO operations and become slow to respond. So it's nearly impossible to say what's wrong with your environment (or there might be nothing wrong). Try to share your config, your env, your worker, as many details as you can. Also, take into account that just multiplying response time by the number of workers is not a valid formula -> https://www.davepacheco.net/blog/2019/performance-puzzler-the-slow-server/.
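The linked "slow server" puzzler can be sketched numerically: if every request also touches a shared, serialized resource (a database lock, a slow downstream, etc.), throughput is capped by that resource, no matter how many workers you add. The numbers below are hypothetical, only the 230-worker / 20ms figures come from this thread.

```python
# Sketch: why workers * (1/latency) overestimates throughput when part of each
# request is serialized on a shared resource. shared_service_ms is hypothetical.
workers = 230
latency_ms = 20            # per-request latency seen by the worker
shared_service_ms = 0.25   # hypothetical serialized time per request on a shared resource

naive_rps = workers * 1000 / latency_ms     # 11500.0 -- the optimistic estimate
shared_cap_rps = 1000 / shared_service_ms   # 4000.0 -- the shared resource's hard ceiling
actual_ceiling = min(naive_rps, shared_cap_rps)
print(actual_ceiling)  # 4000.0
```

With such a bottleneck, the observed peak sits at the shared resource's ceiling rather than at the per-worker estimate.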
-
Also, try a smaller number of workers; there should be a good reason to use such a big number. Try 10, then 20, and so on, and check the queue size at the same time. If you see that with 10 workers the queue size is 0, there is no sense in increasing the number of workers (under high load, for sure). Also, 10 workers give approx. 25MB/sec of throughput (network), with a simple
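The "start small and watch the queue" advice follows from Little's law: the average number of busy workers is the arrival rate times the service time. Using the figures from this thread (6k RPS, ~20ms per request), and assuming latency stays flat:

```python
# Little's law sketch: average concurrency = arrival rate x service time.
# If this is well below num_workers, the queue stays near zero and adding
# workers cannot raise throughput.
target_rps = 6000
service_time_ms = 20

busy_workers = target_rps * service_time_ms / 1000
print(busy_workers)  # 120.0 -- far fewer than the 230 configured workers
```

So at 6k RPS only ~120 of the 230 workers are busy on average, which matches the observation that CPU load peaks around 50%.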
-
Thx for the response. Response time is quite consistent, so I made this rough assumption with some extra added.
You see, under 6k RPS response time is ~20ms. When there are more (than 6k) requests it is still ~20ms (according to Blackfire). My understanding is that it should be able to handle a higher RPS. With rr, lots of data moved from memcached to local (array) storage, thus reducing network-induced latency.

rr config:

```yaml
server:
  # optional, only for linux under sudo user
  user: "exampleuser"
  command: "php -dopcache.enable=1 -dopcache.enable_cli=1 /home/exampleuser/example.com/current/rr2-worker.php"

http:
  address: 10.42.2.11:8080
  pool:
    num_workers: 230
    max_jobs: 20480
  middleware: http_metrics

rpc:
  # TCP address:port for listening.
  #
  # Default: "tcp://127.0.0.1:6001"
  listen: tcp://127.0.0.1:6001

metrics:
  # prometheus client address (path /metrics added automatically)
  address: 10.42.2.11:2112

logs:
  mode: production
  level: warn
  channels:
    http:
      level: warn
      output: /var/log/rr.log
    server:
      output: /var/log/rr.log
```

Worker:

```php
<?php
/**
 * @var Goridge\RelayInterface $relay
 */
use Spiral\RoadRunner;
use Nyholm\Psr7;

ini_set('display_errors', 'stderr');
require_once __DIR__ . '/vendor/autoload.php';

$worker = RoadRunner\Worker::create();
$psrFactory = new Psr7\Factory\Psr17Factory();
$worker = new RoadRunner\Http\PSR7Worker($worker, $psrFactory, $psrFactory, $psrFactory);

$framework = new Project\Framework();
$framework->registerDebugHandlers();

// let's create a blackfire client
$config = new \Blackfire\ClientConfiguration();
$config->setClientId('....');
$config->setClientToken('....');
$blackfire = new \Blackfire\Client($config);

while ($req = $worker->waitRequest()) {
    try {
        $cookies = $req->getCookieParams();
        $probe = null;
        if (\array_key_exists('blackfire', $cookies) && '1' == $cookies['blackfire']) {
            // start blackfire probe
            try {
                $probe = $blackfire->createProbe();
            } catch (\Blackfire\Exception\ExceptionInterface $e) {
                // @see https://blackfire.io/docs/php/integrations/sdk#error-management
            }
        }

        // Send request to framework
        $frameworkResponse = $framework->processHttpServerRequest($req);

        $resp = new Psr7\Response(
            $frameworkResponse->getStatusCode(),
            $frameworkResponse->getHeaders(),
            (string) $frameworkResponse->getBody()
        );
        $worker->respond($resp);

        // $probe is only non-null when the blackfire cookie was set above
        if ($probe) {
            $profile = $blackfire->endProbe($probe);
        }
    } catch (\Throwable $e) {
        $worker->getWorker()->error((string) $e);
    }
}
```
-
Unexpected turn of events. My understanding is that
-
Maybe I should rephrase my question: how can I profile rr itself? I have profiling for my app, and I have profiling of response time in nginx. But I don't have profiling in two places:
My suspicion is that the problem is somewhere between the app response and nginx, but I'm not sure how to profile it in order to find the bottleneck.
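One hedged way to measure the gap between nginx and the app, without profiling rr directly: have nginx stamp each request with its receive time (e.g. a header built from `$msec`), read the clock when the app first sees the request, and subtract. The header name and the helper below are conventions for illustration, not an rr feature.

```python
# Sketch: locating latency between nginx and the app. nginx_start_ms would come
# from a request header nginx adds (e.g. a hypothetical X-Request-Start built
# from $msec); app_start_ms is the worker's clock when it picks up the request.
def queue_delay_ms(nginx_start_ms: int, app_start_ms: int) -> int:
    """Time spent between nginx accepting the request and the app starting on it."""
    return app_start_ms - nginx_start_ms

print(queue_delay_ms(1700000000000, 1700000000015))  # 15 -- 15ms unaccounted for
```

If this delta grows under load while the app's own processing time stays flat, the time is being spent queueing in front of the workers rather than inside the app.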
-
@dorantor Friendly ping 😃 Any updates?
-
Hello @dorantor, I'm curious about your experiments, do you have any updates on this topic? Hi @rustatian, I discovered that OPcache is enabled on our system, but it doesn't seem to optimize or enhance anything; some important OPcache metrics are always 0 (note that I already set RR's max_jobs to 100 and ran multiple requests).
One thing I found is that you mentioned all the workers' source code is kept in memory; does that mean we don't need OPcache anymore? Thanks.
-
There is an article I found comparing performance between pure PHP and [RoadRunner + PHP] here: https://discourse.world/h/2019/10/22/Trying-preload(PHP-7.4)and-RoadRunner I'm looking forward to more advice from @rustatian @dorantor on whether you can enable OPcache when running with RoadRunner. Thanks.
-
Context:
Q: Why could peak performance be lower than expected?
Under average load, response time is less than 20ms (nginx `$request_time`). The same values can be seen with Blackfire (3-10ms). Some network overhead and something for rr itself; 20-30ms is fine. There are 230 workers and CPU load is 50% at max. Expected RPS: 7500-11500 (1000/responseTime_ms * workers). But the peak RPS I can see in our VictoriaMetrics, reported by rr, is 5k-6k RPS, i.e. 2-4 times less than expected. Why could that be? Where should I look?
I understand why response time grows under load: requests start to queue (btw, how can I monitor the size of this queue?). But I don't understand why peak performance is so low (about the same as we had with fpm, even though a lot of network requests were removed with rr, thanks to the local in-memory cache), and what to do in order to fix it.
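For reference, the estimate in the question above can be made explicit (all numbers are taken from this thread):

```python
# The expected-RPS formula from the question: 1000/responseTime_ms * workers,
# compared against the peak actually reported by rr's metrics.
response_time_ms = 20
workers = 230

expected_rps = 1000 / response_time_ms * workers  # 11500.0
observed_peak_rps = 6000
shortfall = expected_rps / observed_peak_rps
print(shortfall)  # roughly 1.9x below the optimistic estimate
```

As discussed earlier in the thread, this formula assumes every worker is purely CPU-bound and nothing in the request path is serialized, so the ~2x shortfall usually points at a shared bottleneck rather than at rr itself.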