Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

QueryBuilder: Implementation of contains Filter Operator applied to JSON Fields for SQLite backend #6619

Open
wants to merge 33 commits into
base: main
Choose a base branch
from

Conversation

rabbull
Copy link
Contributor

@rabbull rabbull commented Nov 19, 2024

This PR introduces the contains filter operator for the SQLite backend, implemented using custom functions rather than SQLAlchemy's high-level abstractions. Comprehensive tests have been included to cover all potential use cases for contains on the SQLite backend, providing a detailed demonstration of its behavior across various scenarios.

The primary goal of this PR is to replicate the functionality of the contains operator from PostgreSQL on SQLite. However, due to the complex and sometimes unexpected behavior of PostgreSQL's containment operator (#6618), a pure SQL implementation would be highly complex and difficult to maintain. To address this, custom functions were chosen, leveraging the flexibility of high-level programming languages (Python, in this case) to handle complex control flows more effectively.

While invoking functions from other languages introduces potential performance limitations, this trade-off is acceptable compared to the alternative of lacking this functionality entirely. Benchmarks will be added to assess the performance impact and ensure it remains within acceptable bounds.

This implementation serves as a proof of concept to evaluate the feasibility of using custom functions for the contains operator. While functional, the current code structure is still preliminary and requires significant cleanup. If this approach proves successful, a subsequent refactor will address technical debt and improve the maintainability of the codebase. This PR is marked as WIP while the approach is evaluated, benchmarks are added, and the necessary refactor is performed.

Closes #6494

@rabbull rabbull requested a review from GeigerJ2 November 19, 2024 17:35
@rabbull rabbull changed the title [WIP] ORM: Implementation of contains Filter Operator applied to JSON Fields for SQLite backend [WIP] QueryBuilder: Implementation of contains Filter Operator applied to JSON Fields for SQLite backend Nov 19, 2024
Copy link

codecov bot commented Nov 20, 2024

Codecov Report

Attention: Patch coverage is 88.00000% with 3 lines in your changes missing coverage. Please review.

Project coverage is 77.86%. Comparing base (c532b34) to head (7e78b22).

Files with missing lines Patch % Lines
src/aiida/storage/sqlite_zip/utils.py 91.67% 2 Missing ⚠️
src/aiida/storage/sqlite_zip/orm.py 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6619      +/-   ##
==========================================
- Coverage   77.92%   77.86%   -0.05%     
==========================================
  Files         563      563              
  Lines       41671    41695      +24     
==========================================
- Hits        32467    32463       -4     
- Misses       9204     9232      +28     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@GeigerJ2 GeigerJ2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's maybe shelve this for now, until we have resolved contains for PSQL? Also, just out of curiosity, any reason for going through the custom implementation, rather than through SQLA, apart from the weird behavior for PSQL?

If we agree that the behavior for PSQL is fine, I'd vouch that we also go the SQLA route here. We just need to make sure the behavior is consistent for both backends, as well as document it somewhere.

src/aiida/storage/sqlite_zip/utils.py Show resolved Hide resolved
@rabbull
Copy link
Contributor Author

rabbull commented Nov 22, 2024

any reason for going through the custom implementation, rather than through SQLA, apart from the weird behavior for PSQL?

The main reason for opting for a custom implementation over SQLA is it's too hard to implement such an operator in SQL, as well as in SQLA, due to the insufficient support of recursive operations and effective branching in SQL. Even if a viable solution could be carried out, it is most likely to be unreadable and unmaintainable.

@rabbull
Copy link
Contributor Author

rabbull commented Nov 26, 2024

Hi @GeigerJ2,

I’ve updated the tests for this feature by combining all tests for JSON filters dedicated on the SQLite backend into those on the PostgreSQL backend, and removed requires_psql marks for related test cases. This migration also helped uncover and fix a few bugs in the implementation of JSON filters for the PostgreSQL backend.

As for the placement of the custom function _json_contains, I believe its current position is acceptable for now. Since everything related to sqlite_dos is currently located in the sqlite_zip package, which doesn't seem a proper structure for me and is planned to be refactored in the near future (#6607), a partial refactor doesn't look worthwhile.

Please kindly review the new modification when you are available.

@rabbull rabbull changed the title [WIP] QueryBuilder: Implementation of contains Filter Operator applied to JSON Fields for SQLite backend QueryBuilder: Implementation of contains Filter Operator applied to JSON Fields for SQLite backend Nov 26, 2024
@GeigerJ2
Copy link
Contributor

GeigerJ2 commented Dec 5, 2024

Hi @rabbull, will review this PR soon. Sorry for the delay!

@rabbull
Copy link
Contributor Author

rabbull commented Dec 5, 2024

Hi @rabbull, will review this PR today. Sorry for the delay!

No worries. Take your time.

I was just trying to figure out why it is blocked by the codecov/patch. It seems including files that I didn't changed. I assumed it was because my branch is out-of-date so that comparing to the main branch a lot files looks modified, but it seems not the case because it is still blocking even if I merged main branch :(

Copy link
Contributor

@GeigerJ2 GeigerJ2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As always, great work, @rabbull! Let's figure out why codecov fails, and then I'd say we get it merged!

src/aiida/storage/psql_dos/orm/querybuilder/main.py Outdated Show resolved Hide resolved
@rabbull rabbull force-pushed the sqlite-json-contains branch from 322c3d6 to fbb7ee3 Compare December 10, 2024 16:05
@rabbull rabbull self-assigned this Dec 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

QueryBuilder: 'contains' operator not implemented for SQLite storage backends
2 participants