Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ Add Collection to Project and Reference #6

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

Zethson
Copy link
Member

@Zethson Zethson commented Nov 25, 2024

Coming out of a discussion with Sunny.

  1. References are defined on a Collection level for CxG.
  2. We don't have a way to sync References between Artifact and Collection. Therefore we have decided not to also add them on the Collection level.

This PR is blocked until we have discussed this topic further.

@Zethson Zethson added bug Something isn't working BLOCKED labels Nov 25, 2024
@sunnyosun sunnyosun added documentation Improvements or additions to documentation and removed bug Something isn't working labels Nov 25, 2024
@falexwolf
Copy link
Member

I think that's perfectly reasonable, I'd just do it.

@sunnyosun
Copy link
Member

I think that's perfectly reasonable, I'd just do it.

I think you forgot about our discussion when we removed all links to collections except ULabel. How would you resolve 2?

@falexwolf
Copy link
Member

falexwolf commented Nov 25, 2024

I think you forgot about our discussion when we removed all links to collections except ULabel. How would you resolve 2?

No, I didn't forget about it.

I'd use the same rationale that we have for ULabel: https://docs.lamin.ai/lamindb.collection#lamindb.Collection.ulabels

Labeling a collection is simply completely independent from labeling artifacts. The ulabels of the artifacts won't be ulabels of the collection and vice versa. If one wants to traverse the ManyToMany and access the other entity's annotations, one has to write the corresponding query, e.g. the following would give all collections that are annotated with T cells via their artifacts:

ln.Collection.filter(artifacts__cell_types__name="T cell").df()

All this being said I can also understand if we don't want to link Collection to Reference and Project and want to rely on the relationship with Artifact for these.

For Project, I can totally see that a new project groups artifacts from an old project and these annotations make sense independent from one another. So, I'd add projects: Project to Collection.

Adding Reference seems dangerous because then people only add it to the collection and not to the artifact or the other way round and end up being confused where it is. So, I'm leaning towards not adding Reference just like we don't add cell_types to Collection for this reason.

What do you guys think?

@sunnyosun
Copy link
Member

I agree with you, but what's urgent here is the reference. If we don't add the reference m2m, I'd postpone this PR until we need project.

@falexwolf
Copy link
Member

Moved forward with adding both: 370a869

The references link doesn't anchor on features and thereby won't be confusing, I think. In particular if this is used for study reports, it's helpful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
BLOCKED documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants