Efficient table structure #618
Replies: 5 comments 1 reply
-
Yes, that's another area of optimization: collect exact view data first and then move it into aggregated tables. I didn't suggest it because it would require quite a lot of work to implement. Optimizing the table structure is way easier: change int to smallint here, replace strings with ids there. All these little cheap optimizations count when you have millions of rows. The smaller the table size on disk, the faster it is scanned.
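The "collect exact views first, then move them to aggregated tables" idea can be sketched in a few lines. This uses sqlite3 and hypothetical table and column names (pageview, pageview_daily), not umami's actual schema:

```python
# Sketch of rolling exact pageview rows up into a daily aggregate table.
# Table/column names are illustrative assumptions, not umami's real schema.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE pageview (
    website_id INTEGER,
    url        TEXT,
    created_at TEXT                    -- timestamp of the exact view
);
CREATE TABLE pageview_daily (
    website_id INTEGER,
    url        TEXT,
    day        TEXT,
    views      INTEGER,
    PRIMARY KEY (website_id, url, day)
);
""")

con.executemany(
    "INSERT INTO pageview VALUES (?, ?, ?)",
    [(1, "/home", "2021-01-01 10:00:00"),
     (1, "/home", "2021-01-01 11:30:00"),
     (1, "/about", "2021-01-01 12:00:00")],
)

# Collapse exact rows into one row per (website, url, day),
# then drop the detailed rows that have been aggregated.
con.execute("""
INSERT INTO pageview_daily (website_id, url, day, views)
SELECT website_id, url, date(created_at), COUNT(*)
FROM pageview
GROUP BY website_id, url, date(created_at)
""")
con.execute("DELETE FROM pageview")

print(con.execute(
    "SELECT url, views FROM pageview_daily ORDER BY url").fetchall())
# [('/about', 1), ('/home', 2)]
```

The trade-off is that per-view detail (exact timestamps, per-visit referrers) is gone after the rollup, which is why it takes more design work than the cheap column-type tweaks.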
-
Everything is a trade-off. When you normalize data, you exchange storage savings for potential performance loss. Moving the url and referrer to another table now requires two lookups and two potential inserts for every page view. You would also be introducing more joins, which would further slow down the queries.

Not saying these things can't be done, but there is a bigger issue here, and that is the migration path. Database migrations are very difficult and error-prone. You can't simply update to the next version and have everything work. We'd have to provide a migration script that works across many different setups, including docker, and have it run without screwing up people's data. This change may have to be reserved for a future 2.0 version so that people at least know what they are getting into.

@joshinat0r I'm well aware of the performance issues once data gets very large. Remember, this is not an enterprise product, and for most people the common setup works fine, so we have to strike a balance. We can start exploring things like aggregate tables as @smartpunter suggested, caching with redis or memcache, or a third-party solution like https://www.timescale.com/. But again, these should be extensions to the current app and not involve major migrations.
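The per-view cost of normalization (two lookups plus up to two extra inserts before the pageview row itself) looks roughly like this. A minimal sketch with sqlite3 and illustrative table names, not umami's real schema:

```python
# Sketch of the get-or-create pattern normalization would force on every
# page view. Table names (url, pageview) are assumptions for illustration.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE url (id INTEGER PRIMARY KEY, value TEXT UNIQUE);
CREATE TABLE pageview (url_id INTEGER, referrer_id INTEGER);
""")

def get_or_create_url(con, value):
    """One lookup, plus an insert the first time a string is seen."""
    row = con.execute("SELECT id FROM url WHERE value = ?", (value,)).fetchone()
    if row:
        return row[0]
    return con.execute("INSERT INTO url (value) VALUES (?)", (value,)).lastrowid

def record_view(con, url, referrer):
    # Two lookups and up to two inserts before the pageview row itself.
    con.execute(
        "INSERT INTO pageview (url_id, referrer_id) VALUES (?, ?)",
        (get_or_create_url(con, url), get_or_create_url(con, referrer)),
    )

record_view(con, "/home", "https://google.com")
record_view(con, "/home", "https://google.com")  # strings stored only once

print(con.execute("SELECT COUNT(*) FROM url").fetchone()[0])  # prints 2
```

Reading the data back would then need a join against the url table for every displayed url and referrer, which is the query-side cost mentioned above.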
-
Yes, just realized this after getting it all set up. Everything else is just so good, but ugh, it's not going to take much traffic to really slow things down, especially when you have multiple sites. timescaledb is probably the best direction to go. Guess I need to set it up again with postgres instead of mysql. lol. I already have postgres/timescaledb installed for other purposes.
-
Any luck with improvements here? I keep revisiting hoping the storage structure has improved. |
-
With 10k+ daily views, the amount of data umami will store in the 'pageview' table becomes quite an issue after several months of collecting. You store url and referrer as strings, which is not very efficient. I have just started collecting data and, as you can see, your table structure uses around 5-6 times more space than it could if you deduplicated the most storage-demanding 'url' and 'referrer' fields. You can have two separate tables for these values, store the strings there and the ids in the 'pageview' table; this will greatly reduce storage size, improve performance, and is quite easy to implement.
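As a rough back-of-envelope for where a several-fold saving could come from: assume an average url/referrer string of ~60 bytes versus a 4-byte integer id (both numbers are assumptions for illustration, not measurements from umami):

```python
# Back-of-envelope row-size estimate under ASSUMED sizes, not real umami data.
AVG_STRING_BYTES = 60      # assumed average length of a url / referrer string
ID_BYTES = 4               # plain INT foreign key into a lookup table
FIELDS = 2                 # url + referrer
OTHER_COLUMNS_BYTES = 24   # assumed size of the remaining pageview columns

before = FIELDS * AVG_STRING_BYTES + OTHER_COLUMNS_BYTES  # 144 bytes/row
after = FIELDS * ID_BYTES + OTHER_COLUMNS_BYTES           # 32 bytes/row

print(round(before / after, 1))  # prints 4.5
# i.e. rows ~4.5x smaller, ignoring the (deduplicated) lookup tables themselves
```

The actual ratio depends on how long your urls are and how often they repeat; the saving only materializes when the same strings recur across many rows.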