Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Consider each site_package #2920

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

BennettRand
Copy link

Why are the changes needed?

site.getsitepackages() was returning

[
    '/home/username/project/python/.venv/lib/python3.10/site-packages',
    '/home/username/project/python/.venv/local/lib/python3.10/dist-packages',
    '/home/username/project/python/.venv/lib/python3/dist-packages',
    '/home/username/project/python/.venv/lib/python3.10/dist-packages'
]

which when run through os.path.commonpath is

'/home/username/project/python/.venv'

This makes it so that any mod_file compared to the site packages set, will never be a string in the site packages set, thus being included in the list of filed to be packaged for fast serialization.

What changes were proposed in this pull request?

Compare the common path of the module file to each site package in the site package set to see if the mosule file should be skipped.

How was this patch tested?

No tests added, all tests passed in Python 3.12.7

Manually inspected the file tree generated by a pyflyte -v run --remote ... command and the contents of the resulting .tar.gz file.

Setup process

Screenshots

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

Docs link

Copy link

welcome bot commented Nov 11, 2024

Thank you for opening this pull request! 🙌

These tips will help get your PR across the finish line:

  • Most of the repos have a PR template; if not, fill it out to the best of your knowledge.
  • Sign off your commits (Reference: DCO Guide).

@wild-endeavor
Copy link
Contributor

any chance you can think of a simple unit test for this?

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BennettRand I think the simplest unit test is to monkeypatch site.getsitepackages and check that list_imported_modules_as_files does the right thing.

Signed-off-by: Bennett <[email protected]>
@BennettRand
Copy link
Author

@thomasjpfan @wild-endeavor Just added it.
Should cover the general case and the corner case I ran into.

thomasjpfan
thomasjpfan previously approved these changes Nov 12, 2024
Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the test! LGTM

pingsutw
pingsutw previously approved these changes Nov 13, 2024

def test_list_imported_modules_as_files():

source_path = "/home/username/project"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For windows CI to pass, I think we'll either need to use os.path.join or pathlib.Path to define the paths.

(Windows uses \ instead of /)

Copy link

codecov bot commented Nov 13, 2024

Codecov Report

Attention: Patch coverage is 14.28571% with 6 lines in your changes missing coverage. Please review.

Project coverage is 51.11%. Comparing base (51f9a3e) to head (71369ed).
Report is 21 commits behind head on master.

Files with missing lines Patch % Lines
flytekit/tools/script_mode.py 14.28% 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2920      +/-   ##
==========================================
+ Coverage   46.91%   51.11%   +4.19%     
==========================================
  Files         199      200       +1     
  Lines       20840    20839       -1     
  Branches     2681     2688       +7     
==========================================
+ Hits         9778    10651     +873     
+ Misses      10582     9693     -889     
- Partials      480      495      +15     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


🚨 Try these New Features:

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BennettRand The failing windows test looks related to this PR.

@BennettRand
Copy link
Author

@thomasjpfan Yeah, I believe my unit test is showing that list_imported_modules_as_files is not behaving as expected in Windows. I'm working on launching a Windows VM to debug it properly.

@BennettRand
Copy link
Author

os.path.commonpath was raising a ValueError mid-generator, leading it to hit the pass statement in the except block and falsely including some modules.

flytekit/tools/script_mode.py Outdated Show resolved Hide resolved
flytekit/tools/script_mode.py Outdated Show resolved Hide resolved
Comment on lines 251 to 252
root_dir = Path("C:\\") if platform.system() == "Windows" else Path("/")
source_path = root_dir / "home" / "username" / "project"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To keep this simple, can we have the source_path be Path.cwd()?

Comment on lines 306 to 307
for m, p in zip(modules, module_paths):
m.__file__ = p
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on two list being aligned feels a little brittle. Can have the modules be a list of tuples? Something like this:

local_modules = [
    (ModuleType("local_module_1"), str(source_path / "package_a" / "module_1.py")),
    ...
]

all_modules = local_modules + flyte_modules + ...

for mod, mod_path in all_modules:
    mod.__file__  = mod_path

# to be passed in later
modules = [mod for mod, _ in all_modules]

with mock.patch("site.getsitepackages", new=lambda: site_packages):
    file_list = list_imported_modules_as_files(str(source_path), modules)

local_module_paths = [path for _, path in local_modules]
assert sorted(file_list) == sorted(local_module_paths)

Signed-off-by: Bennett <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants