Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

also cast doc_id to str #3031

Merged
merged 3 commits into from
Dec 16, 2024
Merged

Conversation

seanstory
Copy link
Member

@seanstory seanstory commented Dec 13, 2024

Closes https://github.com/elastic/search-team/issues/8956

We were ensuring that the ingested doc[id] was a string, but at the time we computed:

                if doc_id in existing_ids:

doc_id could be numeric, but existing_ids were guaranteed to be strings.

Checklists

Pre-Review Checklist

  • this PR does NOT contain credentials of any kind, such as API keys or username/passwords (double check config.yml.example)
  • this PR has a meaningful title
  • this PR links to all relevant github issues that it fixes or partially addresses
  • Covered the changes with automated tests
  • Tested the changes locally
  • Added a label for each target release version (example: v7.13.2, v7.14.0, v8.0.0)

Release Note

Fixes a bug where full syncs may delete documents they just ingested if the document ID when fetched from the 3rd party was numeric.

@jedrazb
Copy link
Member

jedrazb commented Dec 16, 2024

doc_id could be numeric, but existing_ids were guaranteed to be strings

Is it because we are checking if doc["_id"] is present in the set of doc["id"]s (existing ids) ... ? As a followup, we might want to improve the logic to always operate on _ids rather than ids.

@seanstory seanstory merged commit 21c0edb into main Dec 16, 2024
2 checks passed
@seanstory seanstory deleted the seanstory/always-treat-doc-id-as-str branch December 16, 2024 14:36
github-actions bot pushed a commit that referenced this pull request Dec 16, 2024
github-actions bot pushed a commit that referenced this pull request Dec 16, 2024
Copy link

💔 Failed to create backport PR(s)

Status Branch Result
8.16 Commit could not be cherrypicked due to conflicts
8.17 #3034
8.x #3035

Successful backport PRs will be merged automatically after passing CI.

To backport manually run:
backport --pr 3031 --autoMerge --autoMergeMethod squash

@seanstory
Copy link
Member Author

As a followup, we might want to improve the logic to always operate on _ids rather than ids.

Yep, covered in #2776

Is it because we are checking if doc["_id"] is present in the set of doc["id"]s (existing ids) ... ?

Right. We have some connectors that emit docs with _id in their dict

seanstory added a commit that referenced this pull request Dec 16, 2024
(cherry picked from commit 21c0edb)

# Conflicts:
#	connectors/es/sink.py
#	tests/test_sink.py
@seanstory
Copy link
Member Author

💚 All backports created successfully

Status Branch Result
8.16

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

seanstory added a commit that referenced this pull request Dec 16, 2024
seanstory added a commit that referenced this pull request Dec 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants