Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add store_full_path to converters (2/3) #8573

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

Amnah199
Copy link
Contributor

@Amnah199 Amnah199 commented Nov 22, 2024

Related Issues

Proposed Changes:

This PR is similar to #8566

Add a new parameter to the __init__ method to allow users to toggle whether the file_path in metadata stores the complete file path or only the base file name. A DeprecationWarning is generated about the upcoming change in file_path storage behavior in the next release.

How did you test it?

Added new test for checking the parameter function
CI

Notes for the reviewer

This is 2/3 PRs that will add the same change to other converters. PR has many files but they all include the same change.

Checklist

@github-actions github-actions bot added topic:tests type:documentation Improvements on the docs labels Nov 22, 2024
@coveralls
Copy link
Collaborator

coveralls commented Nov 22, 2024

Pull Request Test Coverage Report for Build 11974098350

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 40 unchanged lines in 4 files lost coverage.
  • Overall coverage decreased (-0.02%) to 90.26%

Files with Coverage Reduction New Missed Lines %
components/converters/tika.py 3 93.65%
components/converters/txt.py 3 92.11%
components/converters/json.py 12 85.56%
components/converters/markdown.py 22 37.5%
Totals Coverage Status
Change from base Build 11973695900: -0.02%
Covered Lines: 7914
Relevant Lines: 8768

💛 - Coveralls

@Amnah199 Amnah199 marked this pull request as ready for review November 22, 2024 14:05
@Amnah199 Amnah199 requested review from a team as code owners November 22, 2024 14:05
@Amnah199 Amnah199 requested review from dfokina and davidsbatista and removed request for a team November 22, 2024 14:05
for text, extra_meta in data:
merged_metadata = {**bytestream.meta, **metadata, **extra_meta}

if not self._store_full_path and "file_path" in bytestream.meta:
file_path = bytestream.meta.get("file_path")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can do this with the walrus operator, merging if the assignment in one go

                    if file_path := bytestream.meta.get("file_path"):
                        merged_metadata["file_path"] = os.path.basename(file_path)


if not self.store_full_path and "file_path" in bytestream.meta:
file_path = bytestream.meta.get("file_path")
if file_path: # Ensure the value is not None for pylint
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something as above


if not self.store_full_path and "file_path" in bytestream.meta:
file_path = bytestream.meta.get("file_path")
if file_path: # Ensure the value is not None for pylint
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something here


if not self.store_full_path and "file_path" in bytestream.meta:
file_path = bytestream.meta.get("file_path")
if file_path: # Ensure the value is not None for pylint
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something here


if not self.store_full_path and "file_path" in bytestream.meta:
file_path = bytestream.meta.get("file_path")
if file_path: # Ensure the value is not None for pylint
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something here


if not self.store_full_path and "file_path" in bytestream.meta:
file_path = bytestream.meta.get("file_path")
if file_path: # Ensure the value is not None for pylint
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idem


deprecations:
- |
The default value of the `store_full_path` parameter will be changed to `False` in Haysatck 2.9.0 to enhance privacy.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The default value of the `store_full_path` parameter will be changed to `False` in Haysatck 2.9.0 to enhance privacy.
The default value of the `store_full_path` parameter will change to `False` in Haysatck 2.9.0 to enhance privacy.

@davidsbatista
Copy link
Contributor

@Amnah199 looks good, left a comment regarding small improvement

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic:tests type:documentation Improvements on the docs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants