Load testing SMT and NMT in preparation for 50 projects #22

Closed
johnml1135 opened this issue Aug 23, 2022 · 59 comments
@johnml1135
Collaborator

Here are some ways to increase the number of simultaneous users on Machine:

  • Throw more memory at Machine? How much do we have? 32GB? How long will that last? Assuming 300MB/project, we could support 100 simultaneous projects...
  • If we can't get more, then C# can look at the amount of memory being used and, if it is too high, offload the oldest model (the one last used longest ago) - see the sketch below.
  • If that is too slow, we can then look at horizontal scaling...
  • Pull the machine inferencing out into a separate microservice?
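
A minimal sketch of the offloading idea in the second bullet, assuming a hypothetical EngineCache class (none of these names come from the actual Machine codebase): track when each loaded model was last used and unload the least recently used one once the process crosses a memory threshold.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;

// Hypothetical sketch of memory-pressure-based offloading; not the real Machine code.
public class EngineCache
{
    private readonly ConcurrentDictionary<string, (IDisposable Model, DateTime LastUsed)> _loaded = new();
    private const long MaxBytes = 24L * 1024 * 1024 * 1024; // leave headroom below a 32GB box

    public void Touch(string engineId, IDisposable model)
    {
        // Record the access time; at ~300MB/project, memory rather than count is the limit.
        _loaded[engineId] = (model, DateTime.UtcNow);

        while (Environment.WorkingSet > MaxBytes && _loaded.Count > 1)
        {
            // Offload the model that was last used longest ago.
            string oldest = _loaded.OrderBy(kvp => kvp.Value.LastUsed).First().Key;
            if (_loaded.TryRemove(oldest, out var entry))
                entry.Model.Dispose(); // persist/unload the model and free its memory
        }
    }
}
```
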
@ddaspit ddaspit added this to Serval May 22, 2023
@ddaspit ddaspit moved this to 📋 Backlog in Serval May 22, 2023
@johnml1135
Collaborator Author

johnml1135 commented Jul 18, 2023

This issue is about verifying initial scalability in 2 parts:

  • Calculations and other evidence that we can be responsive enough across projects to deliver NMT drafts
  • Load tests to prove response times for SMT and MongoDB performance
  • MongoDB - assume 3000 files per project (one file per chapter of the Bible) and 100 projects (50 forward and 50 back). Make sure that normal workflow response times do not become too slow (no more than 20 seconds) and that getting pretranslations and word graphs stays fast (under 2 seconds).

@johnml1135 johnml1135 changed the title Machine Scaling to 100 projects Machine Scaling to 50 projects Jul 18, 2023
@johnml1135
Collaborator Author

8 hours per training, about 24 trainings per week per GPU - so we can handle 50 projects per week on 2 GPUs.

@johnml1135 johnml1135 added this to the Mother Tongue MVP milestone Jul 19, 2023
@johnml1135 johnml1135 moved this from 📋 Backlog to 🏗 In progress in Serval Jul 19, 2023
@johnml1135 johnml1135 changed the title Machine Scaling to 50 projects Load testing SMT and NMT in preparation for 50 projects Jul 27, 2023
@Enkidu93 Enkidu93 moved this from 🏗 In progress to 🔖 Ready in Serval Aug 9, 2023
@Enkidu93 Enkidu93 moved this from 🔖 Ready to 🏗 In progress in Serval Aug 11, 2023
@Enkidu93
Collaborator

Enkidu93 commented Aug 29, 2023

Two questions, @johnml1135:

  1. Where should I check this script in? (i.e., what directory and what repo?)
  2. This is the process I have so far. Let me know if this works for you or if you'd like any other endpoints bombarded, settings changed, etc.:
  • Bombard get-all-engines for 60s with 50 connections at 10 req/s and print stats.
    (e.g.
    Statistics Avg Stdev Max
    Reqs/sec 10.01 2.96 18.35
    Latency 35.06ms 14.05ms 59.06ms
    Latency Distribution
    50% 42.61ms
    75% 47.74ms
    90% 49.39ms
    95% 50.54ms
    99% 52.94ms
    HTTP codes:
    1xx - 0, 2xx - 601, 3xx - 0, 4xx - 0, 5xx - 0
    others - 0
    Throughput: 2.42MB/s
    )
  • Post 30,000 new engines.
  • Again, bombard get-all-engines for 60s with 50 connections at 10 req/s and print stats.
  • Build an SMT engine, bombard get-word-graph with settings as above, print stats.
  • Build an NMT engine, bombard get-pretranslations with settings as above, print stats.
  • Clean up (delete everything created in this process)

As soon as I hear from you, I will commit and we can test and close this issue.

@johnml1135
Collaborator Author

Great - let's see if we can test against the internal QA, now at https://qa-int.serval-api.org/swagger/index.html (VPN only). We can monitor the CPU and memory load.

@Enkidu93
Collaborator

@johnml1135 OK, great! What kind of additional setup will I need to bombard that (besides being on the VPN and changing the URL)? Anything?

@johnml1135
Collaborator Author

The right Auth0 client - use Machine API (Test Application) from the sil-appbuilder tenant.

@Enkidu93
Collaborator

Enkidu93 commented Sep 1, 2023

OK, sweet. I will paste those results when I get them. Unfortunately, the queue has been full for much of today, so it hasn't been convenient to get results. Hopefully tomorrow morning 🤞. (Update: the queue is still full - waiting. Maybe we should consider routing tests to a higher-priority queue.)

@Enkidu93
Collaborator

Enkidu93 commented Sep 8, 2023

Finally managed to get some results in:

Fetching authorization token...
Setting up bombardier...
Bombarding get all translation engines endpoint...
Statistics Avg Stdev Max
Reqs/sec 9.94 8.23 135.91
Latency 426.49ms 220.51ms 2.14s
Latency Distribution
50% 359.85ms
75% 467.92ms
90% 575.44ms
95% 730.49ms
99% 1.55s
HTTP codes:
1xx - 0, 2xx - 600, 3xx - 0, 4xx - 0, 5xx - 0
others - 0
Throughput: 279.77KB/s
Posting engines to DB...
Bombarding get all translation engines endpoint after adding docs...
Statistics Avg Stdev Max
Reqs/sec 3.39 7.48 63.32
Latency 11.96s 327.39ms 13.19s
Latency Distribution
50% 11.87s
75% 12.03s
90% 12.37s
95% 12.78s
99% 13.05s
HTTP codes:
1xx - 0, 2xx - 0, 3xx - 0, 4xx - 0, 5xx - 0
others - 253
Errors:
timeout - 252
read tcp 10.255.255.71:58762->10.3.0.94:443: i/o timeout - 1
Throughput: 65.98KB/s

Adding corpus to smt engine...
Building an SMT engine for bombardment...
Bombarding word graph endpoint...
Statistics Avg Stdev Max
Reqs/sec 9.97 5.42 45.57
Latency 230.25ms 174.04ms 1.73s
Latency Distribution
50% 180.13ms
75% 201.33ms
90% 310.88ms
95% 527.24ms
99% 0.94s
HTTP codes:
1xx - 0, 2xx - 0, 3xx - 0, 4xx - 601, 5xx - 0
others - 0
Throughput: 19.15KB/s
Creating NMT engine...
Building NMT engine...
Bombarding pretranslation endpoint...
Statistics Avg Stdev Max
Reqs/sec 9.98 5.28 36.64
Latency 213.27ms 112.77ms 1.10s
Latency Distribution
50% 169.65ms
75% 219.19ms
90% 292.70ms
95% 467.60ms
99% 707.70ms
HTTP codes:
1xx - 0, 2xx - 601, 3xx - 0, 4xx - 0, 5xx - 0
others - 0
Throughput: 22.45KB/s
Cleaning up...

@Enkidu93
Collaborator

Enkidu93 commented Sep 8, 2023

These are results of running the same script locally:

Fetching authorization token...
Setting up bombardier...
Bombarding get all translation engines endpoint...
Statistics Avg Stdev Max
Reqs/sec 9.99 7.44 171.65
Latency 34.42ms 188.31ms 1.83s
Latency Distribution
50% 5.29ms
75% 5.50ms
90% 5.67ms
95% 6.00ms
99% 1.23s
HTTP codes:
1xx - 0, 2xx - 600, 3xx - 0, 4xx - 0, 5xx - 0
others - 0
Throughput: 13.29KB/s
Posting engines to DB...
Bombarding get all translation engines endpoint after adding docs...
Statistics Avg Stdev Max
Reqs/sec 4.28 4.21 18.34
Latency 9.83s 1.21s 10.01s
Latency Distribution
50% 10.00s
75% 10.01s
90% 10.01s
95% 10.01s
99% 10.01s
HTTP codes:
1xx - 0, 2xx - 6, 3xx - 0, 4xx - 0, 5xx - 0
others - 300
Errors:
timeout - 300
Throughput: 1.18MB/s

Adding corpus to smt engine...
Building an SMT engine for bombardment...
Bombarding word graph endpoint...
Statistics Avg Stdev Max
Reqs/sec 10.02 3.14 36.42
Latency 19.09ms 14.12ms 232.40ms
Latency Distribution
50% 14.04ms
75% 16.39ms
90% 37.10ms
95% 41.28ms
99% 44.63ms
HTTP codes:
1xx - 0, 2xx - 601, 3xx - 0, 4xx - 0, 5xx - 0
others - 0
Throughput: 33.51KB/s
Creating NMT engine...
Building NMT engine...
Bombarding pretranslation endpoint...
Statistics Avg Stdev Max
Reqs/sec 10.02 2.80 18.31
Latency 8.68ms 4.08ms 36.05ms
Latency Distribution
50% 10.12ms
75% 12.67ms
90% 12.89ms
95% 13.02ms
99% 13.54ms
HTTP codes:
1xx - 0, 2xx - 601, 3xx - 0, 4xx - 0, 5xx - 0
others - 0
Throughput: 20.81KB/s
Cleaning up...

@Enkidu93
Collaborator

Enkidu93 commented Sep 8, 2023

Everything looks good except Mongo's response to the 30,000 docs and the 400 codes on the word-graph endpoint (which don't appear locally). I'd like to investigate more as to why it's failing. That really shouldn't be such an overwhelming number for MongoDB, so I wonder if it's still processing the added docs and maybe adding a sleep would help. I'll investigate. Any thoughts? @johnml1135

@Enkidu93 Enkidu93 moved this from 🏗 In progress to 👀 In review in Serval Sep 8, 2023
@Enkidu93
Collaborator

Enkidu93 commented Sep 8, 2023

I'm 99% sure that the issue causing the timeouts after adding the files is that some requests get canceled at the end of the 60-second bombardment window, and these are causing something akin to the cancellation subscription issue we'd seen earlier. Wasn't that issue addressed? Do we need to investigate further solutions?

@johnml1135
Collaborator Author

On the 30,000 docs:

  • This is how SF is using Serval. We need to account for this.
  • What is the bombardment rate? At what rate does it break? Can we do 5 per second?
    • Listing all the files is rarely needed, if at all. It puts a big strain on Mongo but has few use cases.
  • Could/should we add a filtering endpoint for files and engines based upon names?
  • Could/should we add paging for files (pages of 300...)?

On the 601s:

  • What is happening? I have no idea...

@ddaspit
Contributor

ddaspit commented Sep 8, 2023

@Enkidu93 I'm trying to interpret the results. Are all of the requests to the get-all-engines endpoint timing out?

@Enkidu93
Collaborator

Enkidu93 commented Sep 8, 2023

On the 30,000 docs:

  • This is how SF is using Serval. We need to account for this.
  • What is the bombardment rate? At what rate does it break? Can we do 5 per second?
    • Listing all the files is rarely needed, if at all. It puts a big strain on Mongo but has few use cases.
  • Could/should we add a filtering endpoint for files and engines based upon names?
  • Could/should we add paging for files (pages of 300...)?

On the 601s:

  • What is happening? I have no idea...

Right, so like I mentioned above (way up there ^), this is 50 concurrent connections at 10 requests per second. In other words, I imagine this is significantly more traffic than we should expect (but I figured I ought to push it). I can experiment and see what a sufficiently slow rate would be.

One option is to do what I've seen elsewhere and, like you said, allow filtering on the 'get-all' endpoints as well as adding a max-results parameter that we default to a 'safe' value.
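
Roughly what I have in mind, as a hedged sketch only (hypothetical controller, route, and service names, not the actual Serval API surface): a name filter plus skip/limit paging with a conservative default, so a get-all call can never return 30,000 documents at once.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;

// Hypothetical types for illustration only.
public record EngineSummary(string Id, string Name, string SourceLanguage, string TargetLanguage);

public interface IEngineQueryService
{
    Task<IReadOnlyList<EngineSummary>> GetAllAsync(string? nameFilter, int skip, int limit, CancellationToken ct);
}

[ApiController]
[Route("api/v1/translation/engines")]
public class EnginesController : ControllerBase
{
    private readonly IEngineQueryService _engines;

    public EnginesController(IEngineQueryService engines) => _engines = engines;

    // GET api/v1/translation/engines?name=foo&skip=0&limit=300
    [HttpGet]
    public async Task<ActionResult<IReadOnlyList<EngineSummary>>> GetAllAsync(
        [FromQuery] string? name = null,
        [FromQuery] int skip = 0,
        [FromQuery] int limit = 300, // "safe" default page size
        CancellationToken cancellationToken = default)
    {
        limit = Math.Clamp(limit, 1, 1000); // never return an unbounded result set
        skip = Math.Max(skip, 0);
        return Ok(await _engines.GetAllAsync(name, skip, limit, cancellationToken));
    }
}
```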

As for the 601 400s on the word-graph endpoint, I'm really unsure. Given that I can't recreate the problem locally, I'm probably gonna need to dig into the logs on the QA server (what's the best way to do that?).

@Enkidu93
Collaborator

Enkidu93 commented Sep 8, 2023

@Enkidu93 I'm trying to interpret the results. Are all of the requests to the get-all-engines endpoint timing out?

Yes, that's right. They're all timing out.

@Enkidu93
Collaborator

Enkidu93 commented Sep 8, 2023

Should we spin off separate issues for addressing these problems and close/leave open this?

@Enkidu93
Collaborator

Enkidu93 commented Sep 11, 2023

400s on word-graph are the result of running out of memory => being unable to build more engines:

{"log":"\u001B[41m\u001B[30mfail\u001B[39m\u001B[22m\u001B[49m: Grpc.AspNetCore.Server.ServerCallHandler[6]\n Error when executing service method 'Create'.\n System.IO.IOException: No space left on device : '/var/lib/machine/engines/64fa1c9ae06e7b98dd799db0'\n at System.IO.FileSystem.CreateDirectory(String fullPath)\n at System.IO.Directory.CreateDirectory(String path)\n at SIL.Machine.AspNetCore.Services.ThotSmtModelFactory.InitNew(String engineId) in /app/src/SIL.Machine.AspNetCore/Services/ThotSmtModelFactory.cs:line 58\n at SIL.Machine.AspNetCore.Services.SmtTransferEngineState.InitNew() in /app/src/SIL.Machine.AspNetCore/Services/SmtTransferEngineState.cs:line 36\n at SIL.Machine.AspNetCore.Services.SmtTransferEngineService.CreateAsync(String engineId, String engineName, String sourceLanguage, String targetLanguage, CancellationToken cancellationToken) in /app/src/SIL.Machine.AspNetCore/Services/SmtTransferEngineService.cs:line 41\n at SIL.Machine.AspNetCore.Services.SmtTransferEngineService.CreateAsync(String engineId, String engineName, String sourceLanguage, String targetLanguage, CancellationToken cancellationToken) in /app/src/SIL.Machine.AspNetCore/Services/SmtTransferEngineService.cs:line 41\n at SIL.Machine.AspNetCore.Services.ServalTranslationEngineServiceV1.Create(CreateRequest request, ServerCallContext context) in /app/src/SIL.Machine.AspNetCore/Services/ServalTranslationEngineServiceV1.cs:line 28\n at Grpc.Shared.Server.UnaryServerMethodInvoker`3.ResolvedInterceptorInvoker(TRequest resolvedRequest, ServerCallContext resolvedContext)\n at Grpc.Shared.Server.UnaryServerMethodInvoker`3.ResolvedInterceptorInvoker(TRequest resolvedRequest, ServerCallContext resolvedContext)\n at SIL.Machine.AspNetCore.Services.UnimplementedInterceptor.UnaryServerHandler[TRequest,TResponse](TRequest request, ServerCallContext context, UnaryServerMethod`2 continuation) in /app/src/SIL.Machine.AspNetCore/Services/UnimplementedInterceptor.cs:line 21\n at Grpc.Shared.Server.InterceptorPipelineBuilder`2.<>c__DisplayClass5_0.<<UnaryPipeline>b__1>d.MoveNext()\n --- End of stack trace from previous location ---\n at Grpc.Shared.Server.InterceptorPipelineBuilder`2.<>c__DisplayClass5_0.<<UnaryPipeline>b__1>d.MoveNext()\n --- End of stack trace from previous location ---\n at Grpc.AspNetCore.Server.Internal.CallHandlers.UnaryServerCallHandler`3.HandleCallAsyncCore(HttpContext httpContext, HttpContextServerCallContext serverCallContext)\n at Grpc.AspNetCore.Server.Internal.CallHandlers.ServerCallHandlerBase`3.<HandleCallAsync>g__AwaitHandleCall|8_0(HttpContextServerCallContext serverCallContext, Method`2 method, Task handleCall)\n","stream":"stdout","time":"2023-09-07T18:55:22.764485472Z","type":"fail","source":"Grpc.AspNetCore.Server.ServerCallHandler"}

@ddaspit
Contributor

ddaspit commented Sep 11, 2023

This looks like it is running out of disk space, not memory.

@Enkidu93
Collaborator

Enkidu93 commented Sep 15, 2023

How long does it take for Serval to respond to a single get-all-engines request?

About 30 seconds. (Note, this is with ~20,000 engines). I can try again after deleting them.

Getting a single engine is a little more than half a second.

@Enkidu93
Collaborator

Yes, pretranslations should be deleted when an engine is deleted.

OK, I'll investigate further, but it looks like that hadn't happened on the internal QA: at one point there were nearly no engines while there were hundreds of thousands of pretranslations. I'll verify, though, before opening an issue.
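
For reference, the expected cleanup is something like the sketch below (collection and field names are illustrative, not the actual Serval data layer): deleting an engine should also delete every pretranslation document that references it, otherwise orphaned pretranslations pile up the way they appear to have on the internal QA.

```csharp
using System.Threading;
using System.Threading.Tasks;
using MongoDB.Driver;

// Hypothetical document shapes, for illustration only.
public class EngineDoc { public string Id { get; set; } = ""; }
public class PretranslationDoc { public string Id { get; set; } = ""; public string EngineRef { get; set; } = ""; }

public class EngineCleanup
{
    private readonly IMongoCollection<EngineDoc> _engines;
    private readonly IMongoCollection<PretranslationDoc> _pretranslations;

    public EngineCleanup(IMongoCollection<EngineDoc> engines, IMongoCollection<PretranslationDoc> pretranslations)
    {
        _engines = engines;
        _pretranslations = pretranslations;
    }

    public async Task DeleteEngineAsync(string engineId, CancellationToken ct = default)
    {
        await _engines.DeleteOneAsync(e => e.Id == engineId, ct);
        // Without this second step, deleted engines leave orphaned pretranslations behind.
        await _pretranslations.DeleteManyAsync(p => p.EngineRef == engineId, ct);
    }
}
```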

@Enkidu93
Collaborator

How long does it take for Serval to respond to a single get-all-engines request?

About 30 seconds. (Note, this is with ~20,000 engines). I can try again after deleting them.

Getting a single engine is a little more than half a second.

Something fishy is definitely going on, and I think it's user error. A couple of days ago, I did manage to run the load testing script without issue, and the numbers that came back were good. I wanted to tweak one of the values, came back to it, and now this. It's almost as though there's a delay between my posting engines and their actually being created. I'll delete them all again and try once more.

@Enkidu93
Collaborator

It's still intermittent for me, and I haven't found a way to successfully run my test and paste results here because of the timeout issues. Is it possible I'm triggering some kind of security protocol meant to protect the server from a DoS attack or something? Any ideas? @johnml1135

@ddaspit
Contributor

ddaspit commented Sep 18, 2023

If it takes 30 seconds to respond to a single get-all-engines request with 20,000 engines, I would like to understand what is taking so much time. Is it the Mongo query, deserializing the query results into model objects, serializing the results into JSON, etc.? @Enkidu93, you said this does not seem to happen on your dev machine, so using a profiler might not be an option. Maybe we could add some debug log entries that benchmark different parts of the request? At the very least, we should be able to see how much time it takes ASP.NET Core to handle the request using the existing logs. That should tell us if it is the actual Serval service that is slow or something external, such as the reverse proxy. It would also be good to understand how much memory is needed to fulfill the request.
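
One hedged way to get those numbers (hypothetical helper, not the real Serval service code): wrap the Mongo query and the in-memory mapping in separate stopwatches and emit a debug log entry per request, so the logs show whether the time goes to the query, to materializing 20,000+ documents, or to something outside the service entirely.

```csharp
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Logging;
using MongoDB.Driver;

// Hypothetical document type and helper, for illustration only.
public class EngineDoc
{
    public string Id { get; set; } = "";
    public string Name { get; set; } = "";
}

public class EngineQueryBenchmark
{
    private readonly IMongoCollection<EngineDoc> _engines;
    private readonly ILogger<EngineQueryBenchmark> _logger;

    public EngineQueryBenchmark(IMongoCollection<EngineDoc> engines, ILogger<EngineQueryBenchmark> logger)
    {
        _engines = engines;
        _logger = logger;
    }

    public async Task<IReadOnlyList<EngineDoc>> GetAllTimedAsync(CancellationToken ct = default)
    {
        var sw = Stopwatch.StartNew();
        List<EngineDoc> docs = await _engines.Find(FilterDefinition<EngineDoc>.Empty).ToListAsync(ct);
        long queryMs = sw.ElapsedMilliseconds;

        sw.Restart();
        // Any DTO mapping / response shaping would go here so it is timed separately.
        long mapMs = sw.ElapsedMilliseconds;

        _logger.LogDebug("get-all-engines: query={QueryMs}ms, map={MapMs}ms, count={Count}",
            queryMs, mapMs, docs.Count);
        return docs;
    }
}
```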

@Enkidu93
Collaborator

Yes, or is it just an issue on my end? We don't know. That's correct, @ddaspit, I can't recreate the issue locally. @johnml1135, I can follow the instructions from the README to redeploy the internal QA with modified Serval-Machine code for debugging this, right?

@johnml1135
Collaborator Author

@Enkidu93 - yes, you may (that is what it is for). Another way to try to reproduce it without having to create Docker images and deploy (at least a 10-minute cycle) is to use CPU limits in Docker Compose: https://stackoverflow.com/questions/42345235/how-to-specify-memory-cpu-limit-in-docker-compose-version-3.

@Enkidu93
Collaborator

Enkidu93 commented Sep 20, 2023

Good idea! I can now recreate the issue locally. I'll be debugging more tomorrow.

@johnml1135
Collaborator Author

johnml1135 commented Sep 20, 2023 via email

The settings are in the k8s yaml files. Did you also limit the mongo database to the same limits?

@Enkidu93
Collaborator

Enkidu93 commented Sep 20, 2023

The settings are in the k8s yaml files. Did you also limit the mongo database to the same limits?

On Tue, Sep 19, 2023 at 8:34 PM Eli C. Lowry wrote: Good idea! I tried limiting each service to 0.5 cpus and 128MB and wasn't able to recreate the problem locally. Do you know off-hand what the limits are on the int-qa?

Sure, OK. Thank you! I did. I am now able to recreate the issue regardless (I edited my comment). Working on debugging now.

@ddaspit
Contributor

ddaspit commented Sep 20, 2023

Are we only allocating 128MB of memory to the MongoDB instance?

@Enkidu93
Collaborator

Are we only allocating 128MB of memory to the MongoDB instance?

I believe it's set up with a 1500MB limit. I'm investigating this as we speak.

@Enkidu93
Collaborator

I tried timing different elements of the logic, and the majority of the time is spent outside the visible code. Serval and MongoDB were maxing out CPU usage. I tried bumping up the allotment without too much luck. I did notice something odd, though: Serval seems to use a huge amount of memory that scales with whatever I allot it (e.g., if I give it 1000MB, it'll use 90% of it after creating all the engines; same with 2000MB). I'm not sure what's going on there - thoughts?
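
That pattern by itself isn't necessarily a leak: the .NET garbage collector sizes its heap against the memory limit it sees for the container, so usage that grows to fill whatever is allotted can simply mean the GC hasn't needed to collect yet. A small hedged diagnostic (not from the Serval codebase) that would help distinguish the two is to log the GC's own view of memory:

```csharp
using System;

// Illustrative helper: log heap size vs. what the GC believes is available.
public static class GcDiagnostics
{
    public static void LogGcView(Action<string> log)
    {
        GCMemoryInfo info = GC.GetGCMemoryInfo();
        log($"heap={info.HeapSizeBytes / (1024 * 1024)}MB, " +
            $"available={info.TotalAvailableMemoryBytes / (1024 * 1024)}MB, " +
            $"highLoadThreshold={info.HighMemoryLoadThresholdBytes / (1024 * 1024)}MB");
    }
}
```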

@Enkidu93
Collaborator

Is there some way that Serval might be keeping all of the engines in memory even when not querying?

@Enkidu93
Collaborator

So, I can get it to work by increasing the memory and CPU allotment for Serval and Mongo. I'm now experimenting with how tightly I can bound those. Is there a reason we can't afford more resources to them? What's our actual limit? I don't seem to have access to view information about our nodes' specs, etc.

@johnml1135
Collaborator Author

We have 4 CPUs for Mongo and Serval - for all engines. We can likely increase the memory if needed, but increasing the CPU would be more $$$. Let's see if we can get acceptable behavior without increasing the levels first.

@johnml1135
Collaborator Author

Also, Machine is hanging out at 65% and Mongo at 15% in NLP. Why could that be? I just restarted the services and they are back down again...

@johnml1135
Collaborator Author

Memory is around 30-40%, but that should be OK.

@ddaspit
Contributor

ddaspit commented Sep 20, 2023

65% and 15% CPU usage?

@johnml1135
Collaborator Author

johnml1135 commented Sep 20, 2023

Yes - continuous for weeks. There is nothing in the logs that would indicate what is happening.

@Enkidu93
Collaborator

We have 4 CPU's for Mongo and Serval - for all engines. We can likely increase the memory if needed, but increasing the CPU would be more $$$. Let's see if we can get acceptable behavior without increasing the levels first.

So we should pursue paging? @johnml1135

@ddaspit
Contributor

ddaspit commented Sep 20, 2023

Here is some info on allocating sufficient CPU and memory to Mongo.

@johnml1135
Collaborator Author

I'm checking if we can give more CPU's to Mongo...

@johnml1135
Collaborator Author

Working with @g3mackay on allocating CPUs more dynamically - https://stackoverflow.com/questions/52487333/how-to-assign-a-namespace-to-certain-nodes. We should be able to ensure that Mongo gets up to (or more than) 2 CPUs without starving the Serval API.

@Enkidu93
Collaborator

What has yet to be done in this issue? Where do we go from here, @johnml1135?

@johnml1135
Collaborator Author

Please add the scripts to the Serval repo so we can refer to them in the future as needed.

@Enkidu93
Collaborator

Addressed here - at least for the time being

@github-project-automation github-project-automation bot moved this from 👀 In review to ✅ Done in Serval Oct 13, 2023