Performance / scalability issues #1927
Replies: 3 comments
-
There are definitely areas where we can improve performance, especially around the queries. However, scalability depends on many factors, and the most important is probably the hardware you are running on. Obviously a sharded, horizontally scaled array of database servers with plenty of RAM will perform better, and it is difficult to create a solution that handles every case, but even vertically scaling a single database should provide some improvement. Usually scaling is outside the scope of the application because it is largely a hardware concern. There are also potential issues with Prisma itself. For instance, if you are using pgbouncer, you need to add extra parameters to your database connection string. That being said, we are looking at some additional solutions like ClickHouse and TimescaleDB.
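For example, a connection string might look like this (hypothetical host and credentials; pgbouncer=true is the parameter the Prisma docs describe for pgBouncer setups, and connection_limit is optional):

```
DATABASE_URL=postgresql://umami:secret@pgbouncer-host:6432/umami?pgbouncer=true&connection_limit=10
```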
-
Update: we are currently working on ClickHouse support.
-
Sounds interesting! Let me know if I can help by being a beta tester.
-
First of all, thanks for providing this excellent tool as open source!
I have been using it for about half a year now, but I'm starting to run into more and more performance issues.
To give some context on the amount of data:
I have multiple websites in the database, but 95%+ of the data is from a single website.
I'm running umami 1.33.1 on a Linode VM with SSD and dedicated 4 core CPU option. Both umami and postgres are running in docker with the docker compose file provided with umami.
I noticed a lot of "Something went wrong" messages in the dashboards, especially if you pick longer periods. Currently, last 7 days works, last 30 days starts to fail on a few of the items, and beyond 30 days most things start to fail.
I checked the server and saw a high load from the postgres database. The connections seem to time out, so I tried adding
pool_timeout=60
to the database connection string. This resulted in a similar situation. With long periods such as all-time, it crashed with an out-of-memory error (using over 2 GB of memory on the umami container). This indicates that, on top of the database query performance issues, scaling also seems to be limited by memory, as part of the data is probably processed in memory in a non-streaming way?
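To illustrate what I mean by non-streaming processing, here is a minimal sketch (hypothetical code, not umami's actual implementation; the Prisma model and column names are assumptions based on the tables discussed below):

```typescript
import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();

// Memory-heavy pattern: every pageview row for the period is materialized in
// the Node.js heap before being reduced, so memory grows with traffic volume.
async function pageviewsPerDayInMemory(websiteId: number, start: Date, end: Date) {
  const rows = await prisma.pageview.findMany({
    where: { website_id: websiteId, created_at: { gte: start, lte: end } },
    select: { created_at: true },
  });
  const counts = new Map<string, number>();
  for (const row of rows) {
    const day = row.created_at.toISOString().slice(0, 10);
    counts.set(day, (counts.get(day) ?? 0) + 1);
  }
  return counts;
}

// Database-side aggregation: only one row per day ever reaches the Node.js
// process, so memory use stays flat no matter how many pageviews there are.
async function pageviewsPerDayInDatabase(websiteId: number, start: Date, end: Date) {
  return prisma.$queryRaw`
    select date_trunc('day', created_at) as day, count(*) as pageviews
    from pageview
    where website_id = ${websiteId}
      and created_at between ${start} and ${end}
    group by 1
    order by 1
  `;
}
```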
I took a quick look at the running queries; the first one I saw is:
This query has an inefficient and seemingly superfluous join with the pageview table; removing it makes the query over 10 times faster.
I assume this is the matching nodejs code:
The inefficient IN join seems to be there to allow a generic way of filtering the metrics. Perhaps using a separate query without the join on pageview would be better when no filters are provided?
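To make that concrete, here is a rough sketch of what I have in mind (simplified and hypothetical, not the actual umami code; it reuses the prisma client from the sketch above):

```typescript
// Hypothetical sketch: only add the pageview subquery when a url filter is
// actually set, and skip it entirely for the common unfiltered case.
function getBrowserMetrics(websiteId: number, start: Date, end: Date, url?: string) {
  if (url) {
    // Filtered case: restrict sessions to those that viewed the given url.
    return prisma.$queryRaw`
      select browser, count(*) as count
      from session
      where website_id = ${websiteId}
        and created_at between ${start} and ${end}
        and session_id in (
          select session_id from pageview
          where website_id = ${websiteId}
            and created_at between ${start} and ${end}
            and url = ${url}
        )
      group by 1
      order by 2 desc
    `;
  }
  // Unfiltered case: no reference to pageview at all; this is the variant
  // that ran more than 10 times faster when I tested it by hand.
  return prisma.$queryRaw`
    select browser, count(*) as count
    from session
    where website_id = ${websiteId}
      and created_at between ${start} and ${end}
    group by 1
    order by 2 desc
  `;
}
```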
In any case, this is just the first query I looked at. There seem to be many more queries that don't perform well.
I did see a branch brian/um-13-query-optimizations. I took one of the optimizations from that branch, changing
to:
date_trunc('${unit}', ${field}),
and applied that to one of the other slow queries, which seemed to cut its runtime by 50% or so. So applying that optimization throughout the entire application would certainly be helpful.
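For reference, the shape of the change is roughly the following (illustrative only; I'm paraphrasing the expressions rather than quoting the exact code from the branch):

```typescript
// Illustrative only: not the exact expressions from the umami codebase.
// Grouping by a string-formatted date runs a formatting function on every row
// and produces a text group key:
function getDateQueryOld(field: string, unit: string): string {
  return `to_char(date_trunc('${unit}', ${field}), 'YYYY-MM-DD')`;
}

// Grouping directly by the truncated timestamp skips the per-row formatting
// and lets Postgres group on a native timestamp instead of a string:
function getDateQueryNew(field: string, unit: string): string {
  return `date_trunc('${unit}', ${field})`;
}
```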
But even if all the queries were faster, wouldn't there still be the out-of-memory issue? I didn't manage to trace that down to a specific piece of code, though.
Do you plan to make umami scale to larger volumes of pageviews in the future?
Just a thought: umami has charts based on hours and on days. The day-based charts are currently calculated on demand from all the individual sessions/pageviews. For high-traffic websites this can easily mean hundreds of thousands of records to process for each day in the report. Perhaps it would be an idea to periodically create/update aggregated data at a daily (and perhaps also hourly) granularity, persisted in the database? Reports could then be generated much faster.
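A rough sketch of what I mean (table and column names are made up, not a proposal for the actual schema; it assumes a unique constraint on website_id + day):

```typescript
import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();

// Hypothetical rollup: refresh one aggregate row per website per day, e.g.
// from a cron job or right after the day ends. Day-granularity charts could
// then read a handful of pre-aggregated rows instead of scanning every
// individual pageview for the period.
async function refreshDailyPageviewStats(day: Date) {
  await prisma.$executeRaw`
    insert into pageview_daily_stats (website_id, day, pageviews, sessions)
    select website_id,
           date_trunc('day', created_at) as day,
           count(*) as pageviews,
           count(distinct session_id) as sessions
    from pageview
    where created_at >= date_trunc('day', ${day}::timestamptz)
      and created_at <  date_trunc('day', ${day}::timestamptz) + interval '1 day'
    group by 1, 2
    on conflict (website_id, day)
    do update set pageviews = excluded.pageviews,
                  sessions  = excluded.sessions
  `;
}
```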
I'm not very familiar with NodeJS, but I am proficient in JavaScript (I'm an experienced Java / TypeScript developer). I'm willing to help.