Implement Reader Performance Queries for /datafiles and /datasets #488

MRichards99 · 2024-09-20T13:58:09Z

This PR will close #{issue number}

Description

These changes allow efficiency gains when datafiles or datasets are queried with a WHERE filter on the ID of the parent entity (e.g. WHERE dataset.id = 4 on /datafiles). The changes have been implemented for the count endpoints of datafiles and datasets.
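To illustrate which requests qualify, here is a minimal sketch of spotting a parent-ID filter. It assumes WHERE filters arrive as dicts shaped like `{"dataset.id": {"eq": 4}}`; the `PARENT_ID_FIELDS` mapping and `find_parent_id` are hypothetical names for illustration, not the code in this PR.

```python
from typing import List, Optional

# Hypothetical mapping: endpoint entity -> field holding its parent's ID.
# (In the ICAT schema a Datafile belongs to a Dataset, and a Dataset to an
# Investigation; how the PR stores this lookup is an assumption.)
PARENT_ID_FIELDS = {
    "Datafile": "dataset.id",
    "Dataset": "investigation.id",
}


def find_parent_id(entity_name: str, where_filters: List[dict]) -> Optional[int]:
    """Return the parent ID if an equality filter on it is present, else None."""
    parent_field = PARENT_ID_FIELDS.get(entity_name)
    if parent_field is None:
        # Entity is not covered by the reader optimisation
        return None
    for where in where_filters:
        condition = where.get(parent_field)
        # Only a plain equality on the parent ID makes the query eligible
        if isinstance(condition, dict) and "eq" in condition:
            return int(condition["eq"])
    return None


print(find_parent_id("Datafile", [{"dataset.id": {"eq": 4}}]))  # 4
print(find_parent_id("Datafile", [{"name": {"like": "raw"}}]))  # None
```

Any query without such a filter would simply take the normal code path.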

If a relevant request comes in, the API first queries the parent entity directly to check whether the user has permission to see it. If they do, the original query is sent to ICAT (the JPQL query is constructed by Python ICAT as normal) but executed as the 'reader' user (as configured in config.yaml). This should bypass the additional complexity that ICAT Server adds to queries, which is causing performance issues. As I don't have rules set up in my ICAT, I wasn't able to test this, but the testing others are due to do next week should cover it.
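The two-step flow can be sketched roughly as follows. This is an illustration under assumptions, not the handler in this PR: `search_with_reader` is a hypothetical name, the two callables stand in for the `search` method of a user session and a reader session, and falling back to the user's own session when the access check returns nothing is an assumed behaviour (the description above does not say what happens in that case).

```python
from typing import Callable, List

# Stand-in type for "run this JPQL string and return the results"
SearchFn = Callable[[str], List]


def search_with_reader(
    user_search: SearchFn,
    reader_search: SearchFn,
    parent_entity: str,
    parent_id: int,
    original_query: str,
) -> List:
    """Run original_query as the reader only if the user can see the parent."""
    # Step 1: look up the parent entity as the requesting user, so ICAT's
    # rules decide whether this user is allowed to see it
    access_check = f"SELECT o.id FROM {parent_entity} o WHERE o.id = {int(parent_id)}"
    if user_search(access_check):
        # Step 2: the user has access, so execute the unchanged query as reader
        return reader_search(original_query)
    # Assumed fallback: behave as normal and let ICAT apply the user's rules
    return user_search(original_query)


# Usage with in-memory fakes instead of real ICAT sessions
calls = []


def fake_user_search(query: str) -> List:
    calls.append(("user", query))
    return [4] if query.startswith("SELECT o.id FROM Dataset") else [0]


def fake_reader_search(query: str) -> List:
    calls.append(("reader", query))
    return [123]


result = search_with_reader(
    fake_user_search,
    fake_reader_search,
    "Dataset",
    4,
    "SELECT COUNT(o) FROM Datafile o WHERE o.dataset.id = 4",
)
print(result)  # [123]
```

The point of the design is that the cheap single-entity lookup still goes through the user's permissions, while the expensive count runs without them.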

This functionality is completely optional - if the config isn't in config.yaml or it is disabled, the API will behave as normal (i.e. pass user queries onto ICAT as the user of the API request).
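A minimal sketch of that opt-in check, assuming the parsed config.yaml is available as a dict. The section name `reader_performance` and its `enabled` key are hypothetical; the real key names are not given in this description.

```python
def reader_queries_enabled(config: dict) -> bool:
    """Return True only if the (hypothetical) reader section exists and is enabled."""
    section = config.get("reader_performance")  # hypothetical section name
    if not isinstance(section, dict):
        # Section missing entirely: behave as normal
        return False
    return bool(section.get("enabled", False))


print(reader_queries_enabled({}))                                          # False
print(reader_queries_enabled({"reader_performance": {"enabled": False}}))  # False
print(reader_queries_enabled({"reader_performance": {"enabled": True}}))   # True
```

Treating a missing section the same as a disabled one means existing deployments need no config changes.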

Testing Instructions

Add set-up instructions describing how the reviewer should test the code

Review code
Check GitHub Actions build
If the icatdb Generator Script Consistency Test CI job fails, is this because of a deliberate change made to the script to change generated data (which isn't actually a problem) or is there an underlying issue with the changes made?
Review changes to test coverage
Does this change mean a new patch, minor or major version should be made? If so, does one of the commit messages feature fix:, feat: or BREAKING CHANGE: so a release is automatically made via GitHub Actions upon merge?
{more steps here}

Agile Board Tracking

Connect to #{issue number}

codecov · 2024-09-25T08:28:18Z

Codecov Report

Attention: Patch coverage is 7.36842% with 88 lines in your changes missing coverage. Please review.

Project coverage is 43.45%. Comparing base (a9f336c) to head (b3807ae).
Report is 5 commits behind head on main.

Files with missing lines	Patch %	Lines
...i/src/datagateway_api/icat/reader_query_handler.py	0.00%	57 Missing ⚠️
...atagateway_api/src/datagateway_api/icat/helpers.py	0.00%	28 Missing ⚠️
...atagateway_api/src/datagateway_api/icat/backend.py	0.00%	2 Missing ⚠️
datagateway_api/src/common/filters.py	50.00%	1 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (a9f336c) and HEAD (b3807ae).

HEAD has 3 uploads less than BASE

Flag	BASE (a9f336c)	HEAD (b3807ae)
	4	1

Additional details and impacted files

@@             Coverage Diff             @@
##             main     #488       +/-   ##
===========================================
- Coverage   96.66%   43.45%   -53.22%     
===========================================
  Files          39       40        +1     
  Lines        3242     3325       +83     
  Branches      317      326        +9     
===========================================
- Hits         3134     1445     -1689     
- Misses         80     1872     +1792     
+ Partials       28        8       -20

Flag	Coverage Δ
	`43.45% <7.36%> (?)`

kevinphippsstfc

Having reviewed the code, made a number of changes to it and tested it on preprod, I'm happy that it is functioning as required. I think just a few things are needed to bring the code up to a "production" level: complete the TODOs, add docstrings to each function, and add some tests for the new functionality. That last one may not be trivial, given that it probably requires starting up a new instance of the API using a different config.

MRichards99 · 2024-10-02T16:32:49Z

This is ready for review. I've addressed the TODOs, added documentation in the form of docstrings and added a test class to test the new functionality. Quite a bit of time went into fixing the CI which was broken, presumably because a PR hasn't gone through this repo for a little while. I've also edited a couple of the commit messages so a new release will be made.

The generator script CI job fails, but this is expected: the final diff in the CI still fails because main doesn't contain the two dropped columns that I've added to an SQL script, while the initial diff, which checks that two runs produce identical data, passes. This is what I expected to happen :)

kevinphippsstfc

Looks good!

Thanks for tidying up the code and adding the tests.

And thanks for implementing this bit of functionality which appears to have been successful in reducing database load.

MRichards99 added 2 commits September 18, 2024 12:23

Add config section for reader performance query option

720dbf4

Add class to handle queries which should be done by reader account

c9449f4

kevinphippsstfc force-pushed the reader-performance-query branch from 475e1a1 to d7f74c0 Compare September 26, 2024 08:24

kevinphippsstfc requested changes Sep 30, 2024

MRichards99 and others added 16 commits October 2, 2024 16:01

feat: Implement ReaderQueryHandler for entity and count endpoints

60c5254

Add repr for WHERE filter to make logging them more readable

350480b

Ensure API works when reader config isn't present in config file

292cfef

Use clients from the pool for reader queries

b47391a

Add entity type lookup when checking user access

6eed052

Move reader_client into ReaderQueryHandler

6a5d1e5

Remove unused imports

11f4a7b

Ensure reader client only initialised once

7b4d5af

Do config check before ReaderQueryHandler creation

dca658f

Move import and remove whitespace

c5325b0

Improve exception handling on reader account login

74185f9

- If there's an issue when logging in (e.g. due to invalid credentials), the user will now receive a 500, not a 403

Code cleanup

a36b607

- Add docstrings
- Remove client_pool kwargs
- Add type hints
- Fix any outstanding linting issues

Upgrade dependencies and ignore vulnerabilities

642ea8b

- As ever, the vulnerabilities that are being ignored are ones we cannot upgrade away from because we are tied to Python 3.6

build: Fix installation issues in modern Python 3.8+

88d6390

Add columns to drop list for SQL dump diff

b7b2b07

- The final diff in the CI will still fail because `main` doesn't contain these dropped columns, but the initial diff to check that two runs produce identical data should pass

Add tests for reader performance queries

e361e2b

MRichards99 force-pushed the reader-performance-query branch from cad8b4f to e361e2b Compare October 2, 2024 16:01

MRichards99 marked this pull request as ready for review October 2, 2024 16:32

MRichards99 requested a review from kevinphippsstfc October 2, 2024 16:32

kevinphippsstfc approved these changes Oct 7, 2024

MRichards99 merged commit 7fba647 into main Oct 8, 2024
17 of 18 checks passed

MRichards99 deleted the reader-performance-query branch October 8, 2024 07:14
