Enable Filestream input to change file identity to fingerprint without re-ingesting files #41762

Draft
wants to merge 3 commits into
base: main

@belimawr belimawr commented Nov 22, 2024

Proposed commit message

DRAFT

The Filestream input has the ability to update file identifiers, but this never worked as expected: changing the file identity led to full data duplication. This commit fixes it, so the file identity can be changed from native (inode + device ID) or path to fingerprint without any data duplication.
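
To illustrate why the identity matters, here is a minimal sketch of the three identities mentioned above. The `fileInfo` type, the `identityKey` helper and the key strings are hypothetical and only meant for illustration; they are not the Filestream implementation. The fingerprint is assumed to be a SHA-256 hash of the first bytes of the file.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// fileInfo is a hypothetical stand-in for what an input knows about a file.
type fileInfo struct {
	Path   string
	Inode  uint64
	Device uint64
	Head   []byte // first bytes of the file, hashed for the fingerprint
}

// identityKey returns an illustrative identity string for a strategy.
// The string layout is an assumption, not the real registry format.
func identityKey(strategy string, f fileInfo) (string, error) {
	switch strategy {
	case "native":
		return fmt.Sprintf("native::%d-%d", f.Inode, f.Device), nil
	case "path":
		return "path::" + f.Path, nil
	case "fingerprint":
		sum := sha256.Sum256(f.Head)
		return "fingerprint::" + hex.EncodeToString(sum[:]), nil
	default:
		return "", fmt.Errorf("unknown file identity %q", strategy)
	}
}

func main() {
	f := fileInfo{
		Path:   "/var/log/app.log",
		Inode:  42,
		Device: 7,
		Head:   []byte("first line\n"),
	}
	for _, s := range []string{"native", "path", "fingerprint"} {
		key, err := identityKey(s, f)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(key)
	}
}
```

In this sketch the same file yields a different key under each identity, so state stored under one identity cannot be found under another unless it is migrated.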

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Disruptive User Impact

Author's Checklist

  • [ ]

How to test this PR locally

Related issues

Use cases

Screenshots

Logs

`sourceStore.UpdateIdentifiers` has always been part of
`fileProspector.Init`. Its purpose is to update the identifiers in the
registry when the file identity has changed. However, it was generating
the wrong key and not updating the in-memory
registry (`store.ephemeralStore`).

This commit fixes it and also removes `sourceStore.FixUpIdentifiers`,
because it was just a working version of
`sourceStore.UpdateIdentifiers`. Now there is a single method to
manipulate identifiers in the `sourceStore`.

This commit checks whether 'source' matches the real file by calculating
the registry key using the old identifier. If they match, the registry
is updated.
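
The check described in these commit messages can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code in this PR: the `state` and `file` types, `registryKey`, `migrate` and the key layout are all hypothetical, and a single map stands in for both the persisted registry and the in-memory store.

```go
package main

import "fmt"

// state is a hypothetical registry entry; only the read offset matters here.
type state struct {
	Offset int64
}

// file carries both identities of a file found on disk (hypothetical type).
type file struct {
	OldID string // identity under the previous file identity, e.g. native
	NewID string // identity under the new file identity, e.g. fingerprint
}

// registryKey builds an illustrative key; the layout is an assumption.
func registryKey(inputID, identity string) string {
	return "filestream::" + inputID + "::" + identity
}

// migrate re-keys existing entries: for every file on disk it computes the
// key under the old identity and, if the store holds state for that key,
// moves the state to the key under the new identity. It returns the number
// of migrated entries.
func migrate(store map[string]state, inputID string, files []file) int {
	migrated := 0
	for _, f := range files {
		oldKey := registryKey(inputID, f.OldID)
		st, ok := store[oldKey]
		if !ok {
			continue // no state under the old identity, nothing to migrate
		}
		newKey := registryKey(inputID, f.NewID)
		if _, exists := store[newKey]; exists {
			continue // already migrated, keep the existing state
		}
		store[newKey] = st // the offset is preserved, so nothing is re-read
		delete(store, oldKey)
		migrated++
	}
	return migrated
}

func main() {
	store := map[string]state{
		registryKey("my-input", "native::42-7"): {Offset: 128},
	}
	files := []file{{OldID: "native::42-7", NewID: "fingerprint::abc"}}
	n := migrate(store, "my-input", files)
	fmt.Println("migrated entries:", n)
	fmt.Println("offset under new key:", store[registryKey("my-input", "fingerprint::abc")].Offset)
}
```

Because the offset travels with the entry to the new key, the input can resume from where it stopped instead of ingesting the file again.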
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Nov 22, 2024

mergify bot commented Nov 22, 2024

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @belimawr? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit


mergify bot commented Nov 22, 2024

backport-8.x has been added to help with the transition to the new branch 8.x.
If you don't need it please use backport-skip label and remove the backport-8.x label.

@mergify mergify bot added the backport-8.x Automated backport to the 8.x branch with mergify label Nov 22, 2024
@belimawr belimawr changed the title 40197 filestream migrate file identity Fix file identity migration on Filestream input Nov 25, 2024
@belimawr belimawr added the bug label Nov 25, 2024
@belimawr belimawr changed the title Fix file identity migration on Filestream input Enable Filestream input to change file identity to fingerprint without re-ingesting files Nov 25, 2024
@pierrehilbert pierrehilbert added the Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team label Nov 25, 2024
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Nov 25, 2024
Labels
  • backport-8.x (Automated backport to the 8.x branch with mergify)
  • bug
  • Team:Elastic-Agent-Data-Plane (Label for the Agent Data Plane team)
Development

Successfully merging this pull request may close these issues.

Use fingerprint file identity by default and migrate all existing filestream inputs to it
2 participants