Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Move all modules into root namespace. #862

Closed
wants to merge 1 commit into from

Conversation

csadorf
Copy link
Contributor

@csadorf csadorf commented Nov 18, 2022

Description

Closes #756 .

Motivation and Context

Checklist:

@csadorf csadorf added expertise-needed Extra attention is needed refactor Code refactoring labels Nov 18, 2022
@csadorf csadorf linked an issue Nov 18, 2022 that may be closed by this pull request
@codecov
Copy link

codecov bot commented Nov 18, 2022

Codecov Report

Merging #862 (d22c0d5) into next (52ff017) will decrease coverage by 0.75%.
The diff coverage is 85.51%.

@@            Coverage Diff             @@
##             next     #862      +/-   ##
==========================================
- Coverage   86.32%   85.57%   -0.76%     
==========================================
  Files          51       42       -9     
  Lines        4687     4498     -189     
  Branches     1022      977      -45     
==========================================
- Hits         4046     3849     -197     
- Misses        456      471      +15     
+ Partials      185      178       -7     
Impacted Files Coverage Δ
signac/_synced_collections/__init__.py 100.00% <ø> (ø)
signac/_synced_collections/_caching.py 0.00% <0.00%> (ø)
signac/_synced_collections/backends/__init__.py 100.00% <ø> (ø)
..._synced_collections/backends/collection_mongodb.py 90.90% <ø> (ø)
...c/_synced_collections/backends/collection_redis.py 65.62% <ø> (ø)
...ac/_synced_collections/backends/collection_zarr.py 88.00% <ø> (ø)
signac/_synced_collections/buffers/__init__.py 100.00% <ø> (ø)
..._collections/buffers/memory_buffered_collection.py 100.00% <ø> (ø)
signac/_synced_collections/data_types/__init__.py 100.00% <ø> (ø)
signac/_synced_collections/data_types/attr_dict.py 100.00% <ø> (ø)
... and 49 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

Copy link
Member

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR should be targeting master. We rebased next and merged it into master. At present, next is out of date and can be archived/deleted (unless it preserves some history we want to keep).

@@ -28,6 +28,7 @@ repos:
- id: pyupgrade
args:
- --py38-plus
exclude: '_vendor'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like this to use the same (?x)^( syntax as the other exclusions.

^signac/common/configobj/|
^signac/common/deprecation/|
^signac/_vendor/configobj/|
^signac/_vendor/deprecation/|
^signac/db/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this line be removed?

@@ -31,7 +31,7 @@ concurrency = thread,multiprocessing
parallel = True
source = signac
omit =
*/signac/common/configobj/*.py
*/signac/_vendor/configobj/*.py
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this broader?

Suggested change
*/signac/_vendor/configobj/*.py
*/signac/_vendor/*

@@ -637,7 +635,7 @@ def main_update_cache(args):
# UNCOMMENT THE FOLLOWING BLOCK WHEN THE FIRST MIGRATION IS INTRODUCED.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized this PR is targeting next. The next branch is no longer up to date, and this PR should be targeting master.

@vyasr
Copy link
Contributor

vyasr commented Nov 26, 2022

Here's what the directory structure currently looks like in this PR:

signac
├── config.py
├── dict_manager.py
├── diff.py
├── errors.py
├── filterparse.py
├── h5store.py
├── hashing.py
├── import_export.py
├── __init__.py
├── job.py
├── linked_view.py
├── __main__.py
├── migration
│   ├── __init__.py
│   ├── v0_to_v1.py
│   └── v1_to_v2.py
├── project.py
├── schema.py
├── _searchindexer.py
├── sync.py
├── syncutil.py
├── testing.py
├── utility.py
├── validate.py
├── _vendor
│   ├── configobj
│   │   ├── __init__.py
│   │   ├── LICENSE
│   │   ├── validate.py
│   │   └── _version.py
│   └── deprecation
│       ├── __init__.py
│       ├── LICENSE
│       └── README
└── version.py

Overall I would say that this is pretty good. Merging in master and retargeting this PR will remove some modules that are no longer needed. Here are some suggestions for improvement (I'll bold the items which I think should be done in this PR and leave unbolded those items that should be done in follow-ups), curious to hear what others think about them:

  • Having some subpackage for internal functionality would be helpful. When I originally suggested having more at the top level, I was thinking of functions/classes that the user might interact with directly. There are some pieces that should remain internal and could be hidden in an internal subpackage, such as:
    • _searchindexer.py
    • dict_manager.py (although I'm not convinced it even needs a separate module, see below)
    • filterparse.py
    • utility.py
  • In general, I would like to avoid unnecessary levels of indirection caused by having single-use functions defined in different modules than where they are primarily used. There are a number of such examples, and I doubt that this list is exhaustive:
    • syncutil.py should be merged into sync.py. Yes, it makes the module larger, but 600 vs 900 LOC isn't that big a deal IMO when the difference is a largely arbitrary division of a single module's functionality.
    • Many of the functions in utility.py are used only by the CLI. They should be moved to __main__.py.
    • Some of the functions in import_export.py are reused by linked_view.py and should probably be moved to utility.py
    • The id computed by hashing.calc_id is fundamental to the data layout defined by project/job, and hashing.py is a tiny module. I would rather define calc_id in job.py where it is immediately obvious what ids it computes, and then import that into project.py as needed. I have also considered whether it would make sense to make this method private, but since the nature of the project/job data layout is part of the public API I think it would be OK to leave this public. What do others think?
    • The dict_manger.py module could probably be merged into h5store.py. It is theoretically more general, but if we're not exposing it publicly then it should be fine for us to reevaluate its placement in a separate module some time in the future if we find additional uses for it.
  • There are some public free functions that are implementations of Job or Project methods. These are in modules like diff.py, linked_view.py, and import_export.py. For the 2.0 release we should consider a couple of questions:
    • Do we want to organize this into some shared subpackage, e.g. signac.functions?
    • Do we want to continue supporting calling this directly, or should we funnel people towards the class methods? Alternatively, should we remove the class methods and always have people use the free functions? I don't think we've ever communicated to users the cost/benefits of the alternate approaches. I think we're better off just settling on one to simplify the presented API (see the next bullet for some additional considerations here).

Thinking ahead to a world where we support different layouts than the current project/job layout, we should make sure that the current organization would support such an API (with no expectation that this gets implemented any time soon, just trying to future-proof the design). Here's one example of how this is relevant now. If we view signac's future as supporting arbitrary (or at least much more general) layouts of directories, APIs like sync_projects don't really make sense. However, it's also not clear that every directory layout can or should support all of the functions. Taking sync as one such API, one way we could think about this level of flexibility is to implement abstract methods like Directory.sync and then require different data models to be subclasses of a Directory that provide specific implementations. Alternatively, we could implement this as something like C++'s "pointer to implementation" strategy where Directory.sync is concrete and dispatches to an appropriate internal function depending on the current Directory's selected data layout. The choices we make here are closely intertwined with the previous question I raised about class methods vs free functions.

@bdice
Copy link
Member

bdice commented Nov 26, 2022

I largely agree with your assessments @vyasr. For the places you asked for feedback:

  • I’m happy to leave calc_id public.
  • I’m -1 on signac.functions but could be persuaded to remove them in favor of only exposing Project methods
  • I’m not a fan of reworking the connection between data model and Project any further at this time. I concur more generality could be achieved but I think the use cases are very limited right now (no users are clamoring for this) and it is out of scope for 2.0. I don’t want to spend further time in that direction.

@vyasr
Copy link
Contributor

vyasr commented Nov 26, 2022

I’m -1 on signac.functions but could be persuaded to remove them in favor of only exposing Project methods

To clarify, you're saying that if we keep these functions public you would prefer that they stay at the top level of the package? I don't particularly like that we could end up littering the top level of the package with files like import_export.py etc since they are really subsets of e.g. project.py/job.py logic, but I could be persuaded otherwise as well. On the other hand if we remove them and only leave Project/Job methods then we could move to signac._internals and satisfy everybody.

I’m not a fan of reworking the connection between data model and Project any further at this time. I concur more generality could be achieved but I think the use cases are very limited right now (no users are clamoring for this) and it is out of scope for 2.0. I don’t want to spend further time in that direction.

I don't want to spend much time, only as much as is required to address the other questions I raised. If we have a decision that we already have to make about other APIs as part of the package reorg, I would like us to make the decision that would most easily allow implementing the more general layouts in the future without API breaks (or at least weigh that factor in). I also don't want to spend a lot of time thinking about it, but I do think it's worth considering in our discussions.

@bdice
Copy link
Member

bdice commented Nov 27, 2022

To clarify, you're saying that if we keep these functions public you would prefer that they stay at the top level of the package?

Yes.

we could end up littering the top level of the package with files like import_export.py etc since they are really subsets of e.g. project.py/job.py logic

I know we debated this when designing signac.diff_jobs. It is possible to diff any jobs, regardless of whether they belong to the same project or not. For import/export, the Project methods just call the top level function. However, there is only a top level signac.diff_jobs and no method Project.diff_jobs.

I'm lightly biased towards the status quo -- I don't think the proposed motivation to change (reducing friction for an arbitrarily general future that isn't hotly demanded / receiving current-era developer attention) is sufficiently strong, especially as package development slows. Unless you have other reasons in mind or think we can get a consensus of developers/users to agree that this is worth considering further, I would try to follow the minimal/expedient/least-breaking path here.

@vyasr
Copy link
Contributor

vyasr commented Nov 27, 2022

Unless you have other reasons in mind or think we can get a consensus of developers/users to agree that this is worth considering further, I would try to follow the minimal/expedient/least-breaking path here.

That goal is secondary, you may have missed the primary one:

I think we're better off just settling on one to simplify the presented API (see the next bullet for some additional considerations here).

I think having a single way to do things is easier to teach. To take a specific example, if we move import_export to the top level of the package and make it publicly visible, it's almost as easily discoverable as Project.export_to. If a user of Project.export_to discovers signac.export_jobs and asked me which one to use I think I would just shrug and say it doesn't matter. Not really Python Zen:

There should be one-- and preferably only one --obvious way to do it.

On a tangentially related note, after seeing how signac-flow's FlowProject has evolved I am somewhat wary of adding methods to a class only to see it turn into a container for far too much functionality and implementation. signac's current approach of having free function implementations that are called by class methods helps, but from a simplicity perspective I would prefer to have more functionality in free functions that users are encouraged to call. It improves encapsulation (assuming that the free functions only rely on the public APIs of the class) and makes the code more composable. So that's another consideration for why I'm proposing we reconsider having both APIs available.

Having said that, this is a pretty minor point. I'm also pretty OK with leaving both visible, I'm just nitpicking while we have the opportunity before anything gets set in stone for 2.0.

@bdice bdice mentioned this pull request Dec 3, 2022
6 tasks
@bdice
Copy link
Member

bdice commented Dec 3, 2022

Closing in favor of #868.

@bdice bdice closed this Dec 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
expertise-needed Extra attention is needed refactor Code refactoring
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reorganize modules for signac 2.0
3 participants