Cv2 5085 move get items with similar text to presto #2023
Merged
DGaffney merged 6 commits into epic/cv2-5050-text-vectorization-via-presto from cv2-5085-move-get-items-with-similar-text-to-presto on Sep 11, 2024
* CV2-5087 move Articles side effecting saves to to it via presto
* CV2-5082 move article indexing to presto
* resolve test errors
* updates for broken tests
* small tweak
* set to sync
* more fixes
* rename function and revert request
* add response suppression and move to specific path for side effecting requests
* extend similar media to allow for temporary texts
* fix broken test fixture
* revert back to async
* fix another test
* fixes per PR review
* fixes per PR review
* more fixes after review

* CV2-5080 update request model alegre calls to use presto-based alegre querying
* move to sync
* update for bypassing async calls in tests
caiosba reviewed Sep 6, 2024
@DGaffney is it expected that this PR is against develop and not the epic branch?
DGaffney changed the base branch from develop to epic/cv2-5050-text-vectorization-via-presto on September 9, 2024 13:19
Good catch, fixed!
…085-move-get-items-with-similar-text-to-presto
caiosba approved these changes Sep 10, 2024
melsawy approved these changes Sep 10, 2024
DGaffney merged commit c361771 into epic/cv2-5050-text-vectorization-via-presto on Sep 11, 2024
4 checks passed
DGaffney deleted the cv2-5085-move-get-items-with-similar-text-to-presto branch on September 11, 2024 13:55
DGaffney added a commit that referenced this pull request on Oct 24, 2024:
* Cv2 5082 article indexing to presto (#1994)
* CV2-5087 move Articles side effecting saves to to it via presto
* CV2-5082 move article indexing to presto
* resolve test errors
* updates for broken tests
* small tweak
* set to sync
* more fixes
* rename function and revert request
* add response suppression and move to specific path for side effecting requests
* extend similar media to allow for temporary texts
* fix broken test fixture
* revert back to async
* fix another test
* fixes per PR review
* fixes per PR review
* more fixes after review
* Cv2 5080 request model to presto (#2015)
* CV2-5080 update request model alegre calls to use presto-based alegre querying
* move to sync
* update for bypassing async calls in tests
* Cv2 5086 smooch nlu to presto 2 (#2019)
* Cv2 5082 article indexing to presto (#1994)
* CV2-5087 move Articles side effecting saves to to it via presto
* CV2-5082 move article indexing to presto
* resolve test errors
* updates for broken tests
* small tweak
* set to sync
* more fixes
* rename function and revert request
* add response suppression and move to specific path for side effecting requests
* extend similar media to allow for temporary texts
* fix broken test fixture
* revert back to async
* fix another test
* fixes per PR review
* fixes per PR review
* more fixes after review
* CV2-5086 second attempt on clean Smooch NLU to Presto branch
* fix broken test stubs
* fix typo brought over from previous PR
* alias and rename per caio review
* fix syntax
* Mayyyyybe its the alias?
* fix old reference
* move another stale function reference
* Replace with alias
* symbolize aliased method names
* Revert to proper function
* Cv2 5085 move get items with similar text to presto (#2023)
* Cv2 5082 article indexing to presto (#1994)
* CV2-5087 move Articles side effecting saves to to it via presto
* CV2-5082 move article indexing to presto
* resolve test errors
* updates for broken tests
* small tweak
* set to sync
* more fixes
* rename function and revert request
* add response suppression and move to specific path for side effecting requests
* extend similar media to allow for temporary texts
* fix broken test fixture
* revert back to async
* fix another test
* fixes per PR review
* fixes per PR review
* more fixes after review
* Cv2 5080 request model to presto (#2015)
* CV2-5080 update request model alegre calls to use presto-based alegre querying
* move to sync
* update for bypassing async calls in tests
* CV2-5085 move get_items_from_similar_text calls to use sync endpoint
* review and resolve broken tests
* update stub
* Cv2 5084 Update Reindexing to use async presto endpoint (#2031)
* CV2-5084 update reindexing strategy to use singular requests for now
* switch to async
* mrege in latest changes on epic branch
* update fixture
* fix stub path
* Update reindex_alegre_workspace.rb
* CV2-5081 switch text to presto-based querying (#2034)
* CV2-5081 switch text to presto-based querying
* more test stub updates
* fix more stubs
* update stub and typo
* CV2-5050 add explicit callback for text
* update stub
* more tweaking during testing
* Bot events for test endpoint
* move async query to sync
* CV2-5324: create a method for create relationship and use it everywhere (#2053)
* A couple of improvements for shared feeds (#2056)
* Making sure that the way the feed Cluster.last_request_date field is calculated is the same as the ClusterTeam.last_request_date
* Make sure that if a parent item is tagged, all child items are also included in the cluster, even if, individually, they are not tagged
  Reference: CV2-5331.
* CV2-5371: fix sentry issue (#2058)
* Update setuptools module and pin to known good version. (#2059)
* 5120 – Dont create duplicate tags and clean up `#` (#2054)
  Context
  While looking into a tags issue I noticed a few things:
  - when we made a request with duplicate tags:
    - we got an error, so the job was retried
    - the tag was added twice to the FactCheck
    - the tag is added once to the ProjectMedia
  - when we made a request with a tag with a #
    - we got an error, so the job was retried
    - there seem to have been two errors related to this:
      - ActiveRecord::RecordInvalid: Tag already exists
      - ActiveRecord::RecordInvalid: Text has already been taken
    - the tag with the # is added to the FactCheck
    - the tag is not added to the ProjectMedia
  What was happening
  There are a few things happening at the same time:
  - Creation of a ProjectMedia with tags
  - Creation of a FactCheck with tags
  - Tags are an Object from the Tag Class for ProjectMedia, but are a simple array for FactCheck
  For the ProjectMedia:
  - It would create the tag, then it would try to create the same tag again, and then it would fail and retry again, and so on
  - This happen because we have validations in place for the Tag class
  For FactCheck:
  - It would just create the tags twice
  - Because we have no validations for the tags array
  TLDR: There were some issues in the tags clean-up before we create them for ProjectMedia and FactCheck, and there was a mismatch between them.
  How it was fixed
  - I created a helper to clean up the tags before creating them
  - We need to make sure tags are:
    - stripped
    - unique
    - don't have a prepending `#`
  - We use this helper both in FactCheck.rb and Tag.rb
  References: 5120
  PR: 2054
* CV2-5005: Sentry issue related to ES (#2057)
* CV2-5005: limit ES date to updated fields only
* CV2-5005: fix tests
* CV2-5005: test coverage
* CV2-5371: fix sentry error (#2062)
* Fix setuptools version pin for check-api builds (#2063)
* Pin version for this distribution.
* CV2-5391: use save instead of save! to avoid raising error (#2061)
* CV2-5391: use save instead of save! to avoid raising error
* CV2-5391: fix archived validation error
* Reset item status to default one when claim/fact-check is detached. (#2064)
  Fixes CV2-4502.
* CV2-5418: fix sentry issue (#2066)
* Add ukrainian translation (#2065)
* Add ukrainian translation
* Add 'uk' to config.i18n.available_locales
* CV2-5420: include cached fields that require ES or PG updates (#2067)
* :create_project_media_tags should be able to ignore tag already added to item (#2068)
  A small fix to how tags are created in the background to make sure :create_project_media_tags is able to ignore tag already added to item.
  References: 5426, 5120
  PR: 2068
* CV2-5392: export cluster description (#2070)
* Setting initial value for `last_request_date` for feed clusters. (#2072)
  There was a change introduced in CV2-5331 that normalized how the `last_request_date` field for a shared feed cluster is calculated. But there is an issue: if the cluster doesn't have any request, no value is set. The fix needed here is to be sure that there is an initial value, which can be the same date as the last item that joined the cluster, when this item has no requests.
  Fixes: CV2-5446.
* set the annotator to be the “Smooch Bot” (#2069)
  Set the annotator to "Smooch Bot" so that the bot user is displayed in the content warning cover when a user is blocked.
  Reference: CV2-5142
* CV2-5419: rescue ActiveRecord::RecordNotUnique for relationship save (#2071)
* CV2-5419: rescue ActiveRecord::RecordNotUnique for relationship save
* CV2-5419: delete suggested relation before create confirmed one
* CV2-5451: set confirmed before creation based on relationship type (#2073)
* Make sure that an item can't be related to itself. (#2075)
  We already had an Active Record validation for that, but it was bypassed for straight updates or race conditions. This PR makes it more robust by adding a database constraint.
  Fixes: CV2-5437.
* Report bad relationship structure only if relationship is not nil. (#2074)
  Fixes: CV2-5454.
* Revert rejecting suggestion if relationship creation fails. (#2076)
  When a confirmed relationship is created, if there is any suggestion between the same two items, the suggestion should be delete. We had logic in two different places for that. This PR keeps this logic in only one place (the relationship model), so I removed the logic from the `create_unless_exists` method, which now also takes the relationship type into account. This way we have the logic in only one place, and having this logic as a `before_create` that contains a transaction means that the `destroy` of the suggestion will be rolled back by PostgreSQL if the `create` fails.
  Fixes: CV2-5436.
* Fixing Sentry error
* CV2-5434: skip process the TeamTask background job if there is a more recent one (#2078)
* Setting retry interval for GenericWorker. (#2080)
  This change tries to avoid a single case reported by Sentry: A race condition situation where the related object was still not fully persisted in the database when the job executed. Setting a longer retry interval which should avoid this case.
  Fixes: CV2-5459.
* Adding a log line for outgoing Smooch requests. (#2081)
  Adding a log line for the outgoing Smooch requests. This can help debugging some issues.
  Reference: CV2-5378.
* Do not send report if search results were already received. (#2082)
  I noticed this regression introduced by CV2-5451. Now that `confirmed_by` and `confirmed_at` are set for all relationships, even the ones created as confirmed matches, we have a regression here. In order to know if a report should be sent for an accepted suggestion, we can't rely solely on the existence of a value for `confirmed_by` or `confirmed_at`. We know that a suggestion will happen after the relationship was created, so, if `confirmed_at` happens before or at the same time as `created_at`, we know that this is not a suggestion that was accepted, but a relationship that was already created as confirmed match.
  Reference: CV2-5451.
* CV2-5348: refactor ES cached field calling and remove retry_on_conflict (#2083)
* CV2-5348: set retry_on_conflict to zero and pass id instead of the object
* CV2-5348: skip blank obj
* CV2-5348: apply PR comments
* CV2-5190: Create a Link and Claim from tipline message that contain both link and long text (#2084)
* CV2-5190: create a link and claim from tipline message that contaion link and long text
* CV2-5190: handle link and short text
* CV2-5190: fix tests
* CV2-5190: fix CC
* Request/5424 add tagalog translations (#2085)
* Add Tagalog hardcoded strings
* Bump rails-i18n
* Request/5424 add tagalog translations (#2086)
* Add Tagalog hardcoded strings
* Bump rails-i18n
* Bump rails-i18n again. Changed long date format instead of the default one
* update fixtures on broken tests
* add webmock
* review and resolve missing line errors
* resolve changes from Sawy
* gut source and id from any tests and responses
* more tweaking to resolve broken tests

---------

Co-authored-by: Caio <[email protected]>
Co-authored-by: Mohamed El-Sawy <[email protected]>
Co-authored-by: Martin Peck <[email protected]>
Co-authored-by: Manu Vasconcelos <[email protected]>
Co-authored-by: Alexandre Amoedo Amorim <[email protected]>
Co-authored-by: Daniele Valverde <[email protected]>
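The tag clean-up described in #2054 (tags must be stripped, unique, and free of a prepending `#`) can be sketched as a single normalizing helper. This is a minimal sketch under stated assumptions, not the helper actually added to FactCheck.rb and Tag.rb: the name `clean_tags` is hypothetical, and removing *all* leading `#` characters and discarding tags that end up blank are assumptions beyond what the commit message states.

```ruby
# Hypothetical helper sketching the tag clean-up described in #2054.
# The real helper used by FactCheck.rb and Tag.rb may be named and written differently.
def clean_tags(tags)
  Array(tags)                                             # accept nil, a single tag, or a list
    .map { |tag| tag.to_s.strip.sub(/\A#+/, '').strip }   # trim whitespace and drop leading '#'s
    .reject(&:empty?)                                     # assumption: discard tags that end up blank
    .uniq                                                 # remove duplicates, keeping first-seen order
end

p clean_tags([' misinfo ', '#misinfo', 'health', 'health'])
```

Running the same normalization on both the ProjectMedia path (Tag records, which have uniqueness validations) and the FactCheck path (a plain array) is what would keep the two lists from diverging, since neither side would then see duplicates or `#`-prefixed variants of the same tag.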
Description
Move the `get_items_from_similar_text` functionality to use the sync endpoint on Presto.
References: CV2-5079, CV2-5083, CV2-5085
How has this been tested?
Not tested yet; all tests will intentionally be broken first to highlight the fixes that are needed.
Things to pay attention to during code review
Nothing in particular! One question for @caiosba: are we comfortable using sync calls for these things, generally speaking? Perhaps we could do another refactor at the end of the epic to make everything async?
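To make the sync-versus-async question concrete, here is a minimal Ruby sketch of how the calling code changes shape between the two styles. Everything in it is hypothetical: `FakeSimilarityService`, `similar_sync`, `similar_async` and `drain` are illustrative names, not the check-api, Alegre or Presto API, and a naive word-overlap (Jaccard) score stands in for real vector similarity.

```ruby
require 'set'

# Hypothetical in-memory stand-in for a similarity service; not the real Alegre/Presto API.
class FakeSimilarityService
  def initialize(corpus, threshold: 0.5)
    @corpus = corpus        # { item_id => text }
    @threshold = threshold  # minimum score for an item to count as similar
    @queue = []             # pending async requests as [text, callback] pairs
  end

  # Sync style: the caller blocks and gets matching item ids back directly.
  def similar_sync(text)
    matches(text)
  end

  # Async style: the request is only queued; the caller gets nothing back now
  # and must handle the results later in a callback.
  def similar_async(text, &callback)
    raise ArgumentError, 'callback required' unless callback
    @queue << [text, callback]
    nil
  end

  # Simulates the service working through its queue and firing the callbacks.
  def drain
    until @queue.empty?
      text, callback = @queue.shift
      callback.call(matches(text))
    end
  end

  private

  # Ids of corpus items whose word-overlap score with the query meets the threshold.
  def matches(text)
    query = words(text)
    @corpus.select { |_id, candidate| jaccard(query, words(candidate)) >= @threshold }.keys
  end

  # Naive Jaccard similarity, a stand-in for comparing text embeddings.
  def jaccard(a, b)
    union = a | b
    return 0.0 if union.empty?
    (a & b).size.fdiv(union.size)
  end

  def words(text)
    text.downcase.scan(/\w+/).to_set
  end
end
```

With the sync style, a caller such as `get_items_from_similar_text` receives its matches in the same call, which keeps the code simple but holds the calling process until the service answers. The async style frees the caller immediately, at the cost of splitting the logic into a request step and a callback step that has to find its way back to the original context.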
Checklist