Add PostgreSQL backend for performance analysis #26
To add some consistency checks to the database, vocabularies should also be put into the backend:

-- technically a PPN is an integer with a checksum, so this could be improved later
CREATE DOMAIN ppn AS TEXT CHECK (VALUE ~* '^[0-9]+[0-9X]$');

CREATE TABLE IF NOT EXISTS Vocabulary (
  key text NOT NULL,
  jskos json NOT NULL DEFAULT '{}'::json,
  PRIMARY KEY (key),
  CONSTRAINT valid_key CHECK (key ~* '^[a-z]+$')
);

CREATE TABLE IF NOT EXISTS Subject (
  ppn ppn NOT NULL,
  voc text NOT NULL,
  notation text NOT NULL,
  FOREIGN KEY (voc) REFERENCES Vocabulary (key)
);
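For illustration, a few inserts that show how these constraints behave (the vocabulary key 'rvk', the PPNs, and the notations are made-up sample values, not taken from the actual data):

INSERT INTO Vocabulary (key) VALUES ('rvk');                                   -- ok
INSERT INTO Vocabulary (key) VALUES ('RVK 1');                                 -- rejected by valid_key
INSERT INTO Subject (ppn, voc, notation) VALUES ('83954X', 'rvk', 'ST 110');   -- ok
INSERT INTO Subject (ppn, voc, notation) VALUES ('83954Y', 'rvk', 'ST 110');   -- rejected by the ppn domain
INSERT INTO Subject (ppn, voc, notation) VALUES ('123456789', 'bk', '54.65');  -- rejected: 'bk' not in Vocabulary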
Batch import can also be optimized. There are still some issues to be sorted out with this, though, and I think we can do something similar for SQLite as well (although SQLite is, for some reason, much faster at inserting data).
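Purely as a sketch of what batching means on the PostgreSQL side (the mechanism actually used in the code is not named in this thread; table and column names follow the schema above, the values are made up, and the referenced vocabulary is assumed to exist):

-- many rows per statement instead of one INSERT per row
INSERT INTO Subject (ppn, voc, notation) VALUES
  ('83954X', 'rvk', 'ST 110'),
  ('123456789', 'rvk', 'ST 120'),
  ('98765432X', 'rvk', 'AP 15840');

-- for very large imports, COPY from a stream or file is usually faster still
COPY Subject (ppn, voc, notation) FROM STDIN WITH (FORMAT csv);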
I think I'll be able to push my changes with batch import soon. Then we can compare the performance of the two backends on our server. For some reason, on my local machine, SQLite is significantly faster than PostgreSQL, although in the past I had the opposite experience. I think much depends on disk performance, which is fairly bad on our server...
Changes are now pushed to Dev. I'll do the performance comparison soon.
Okay, while I didn't do any scientific performance tests, my results are fairly clear (these were all run on our server with the current Dev version of occurrences-api):
Overall, SQLite seems to be about 2x faster for the usual queries and seems to have a higher cache limit. This is not what I expected, to be honest, especially since our dataset has over 80 million rows. It seems like staying with SQLite is the better choice in our case, even though I expect PostgreSQL could still be optimized further. Also, in both cases, performance is severely limited by our server's slow disk; my laptop (which has a fast NVMe SSD) is about 4-5x faster.
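For what it's worth, this is the kind of thing "optimized for PostgreSQL more" could mean; these indexes are purely an assumption, since the actual query patterns are not shown in this thread:

CREATE INDEX IF NOT EXISTS subject_ppn_idx ON Subject (ppn);                      -- look up subjects of a record by PPN
CREATE INDEX IF NOT EXISTS subject_voc_notation_idx ON Subject (voc, notation);   -- look up occurrences by notation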
b5c3a41 added backend method
The question is whether we really want to support both SQLite and PostgreSQL in the long run. In theory, there won't be much to add, so it might not be a lot of work, but if we decide to stay with SQLite anyway, it might be better to remove PostgreSQL support again. What do you think?
We may later drop PostgreSQL, but SPARQL and in particular SRU are needed, so we will have multiple backends with different capabilities anyway. Let's keep it as experimental.
Originally posted by @stefandesu in #17 (comment)