Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ Add SpatialDataCurator #2290

Merged
merged 40 commits into from
Dec 19, 2024
Merged

✨ Add SpatialDataCurator #2290

merged 40 commits into from
Dec 19, 2024

Conversation

Zethson
Copy link
Member

@Zethson Zethson commented Dec 17, 2024

  • Adds a SpatialDataCurator that allows the curation of SpatialData objects
  • Currently depends on PR 806 in SpatialData and expects the sample level (or dataset level) metadata to be stored in a single zattrs key (here sample)
import lamindb as ln
import bionty as bt
import pandas as pd
from spatialdata.datasets import blobs

sdata = blobs()
sdata.attrs["sample"] = {
    "assay": "Visium Spatial Gene Expression",
    "disease": "Alzheimer's dementia",
    "developmental_stage": "very early",
}
sdata.tables["table"].var.index = [
    "ENSG00000139618",  # BRCA2
    "ENSG00000157764",  # BRAF
    "ENSG00000999999"   # Does not exist - to test add new 
]
sdata.tables["table"].obs["region"] = pd.Categorical(
        ["region 1"] * 13 + ["region 2"] * 13
    )

curator = ln.Curator.from_spatialdata(
    sdata,
    var_index={"table": bt.Gene.ensembl_gene_id},
    categoricals={
        "sample": {"assay": bt.ExperimentalFactor.name,
                   "disease": bt.Disease.name,
                   "developmental_stage": bt.DevelopmentalStage.name},
        "table": {"region": ln.ULabel.name},
    },
    organism="human",
)

curator.validate()
curator.add_new_from_var_index("table")
curator.add_new_from(key="developmental_stage", accessor="sample")
curator.add_new_from(key="region", accessor="table")

curator.standardize(key="disease", accessor="sample")

artifact = curator.save_artifact(description="blob spatialdata")
artifact.describe()

image

  • Moves the docs to Python 3.12 because 3.10 caused issues
  • Introduces a new pattern in the build CI using dorny/paths-filter@v3 to condition the groups that are executed. I tested several options and I can only say now that it executes the spatial group but I can't test with this PR the cases where it should NOT execute the spatial group. I'll follow up if necessary.

Signed-off-by: zethson <[email protected]>
Signed-off-by: zethson <[email protected]>
Signed-off-by: zethson <[email protected]>
Signed-off-by: zethson <[email protected]>
Signed-off-by: zethson <[email protected]>
Signed-off-by: zethson <[email protected]>
Signed-off-by: zethson <[email protected]>
Copy link

codecov bot commented Dec 17, 2024

Codecov Report

Attention: Patch coverage is 91.84549% with 19 lines in your changes missing coverage. Please review.

Project coverage is 92.81%. Comparing base (2a17099) to head (dc67a5a).
Report is 11 commits behind head on main.

Files with missing lines Patch % Lines
lamindb/curators/_spatial.py 91.98% 17 Missing ⚠️
lamindb/curators/__init__.py 86.66% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2290      +/-   ##
==========================================
- Coverage   92.82%   92.81%   -0.02%     
==========================================
  Files          55       56       +1     
  Lines        7202     7442     +240     
==========================================
+ Hits         6685     6907     +222     
- Misses        517      535      +18     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Signed-off-by: zethson <[email protected]>
Signed-off-by: zethson <[email protected]>
Signed-off-by: zethson <[email protected]>
@Zethson
Copy link
Member Author

Zethson commented Dec 17, 2024

Resolved 139 packages in 2.07s
  × Failed to download and build `llvmlite==0.36.0`
  ╰─▶ Build backend failed to determine requirements with `build_wheel()`
      (exit status: 1)

      [stderr]
      Traceback (most recent call last):
        File "<string>", line 14, in <module>
        File
      "/home/runner/.cache/uv/builds-v0/.tmpHwd7nz/lib/python3.10/site-packages/setuptools/build_meta.py",
      line 334, in get_requires_for_build_wheel
          return self._get_build_requires(config_settings, requirements=[])
        File
      "/home/runner/.cache/uv/builds-v0/.tmpHwd7nz/lib/python3.10/site-packages/setuptools/build_meta.py",
      line 304, in _get_build_requires
          self.run_setup()
        File
      "/home/runner/.cache/uv/builds-v0/.tmpHwd7nz/lib/python3.10/site-packages/setuptools/build_meta.py",
      line 522, in run_setup
          super().run_setup(setup_script=setup_script)
        File
      "/home/runner/.cache/uv/builds-v0/.tmpHwd7nz/lib/python3.10/site-packages/setuptools/build_meta.py",
      line 320, in run_setup
          exec(code, locals())
        File "<string>", line 55, in <module>
        File "<string>", line 52, in _guard_py_ver
      RuntimeError: Cannot install on Python version 3.10.15; only versions
      >=3.6,<3.10 are supported.

  help: `llvmlite` (v0.36.0) was included because `lamin-spatial` (v0.1.0)
        depends on `spatialdata` (v0.2.6) which depends on `numba` (v0.53.1)
        which depends on `llvmlite`

docs job

@Koncopd
Copy link
Member

Koncopd commented Dec 17, 2024

We had this many times, it is a uv problem, sometimes its resolver goes mad and tries to install very old package versions. One solution is to install with pip, another is to find the package causing the problem and add uv install manually at the beginning with an appropriate lower bound.

@sunnyosun
Copy link
Member

No I feel almost the entire curator file should be just another repo, it feels so heavy to run all the CI with tiledbsoma and spatialdata and many more in the future if I want to make an unrelated change.

I get that Alex is a huge fan of monorepo, but if we continue supporting many 100 more formats, do we really want to run all these CI in every PR push?

Signed-off-by: zethson <[email protected]>
@falexwolf falexwolf changed the title ✨ Add SpatialDataCurator ✨ Add SpatialDataCurator Dec 18, 2024
@falexwolf
Copy link
Member

Sorry for being slow here.

My view is that we can't keep adding jobs to the matrix but we should switch to a different model of managing CI. If we keep adding jobs @sunnyosun's concern up there is 100% justified, so we have to something better.

Cross-pasting what I suggested on Slack las Monday:

I think we should start to introduce conditional working directories like we do in laminhub (e.g. here) so that we don’t run all tests on every little change to every part of the codebase which will ultimately be huge.
So, the spatialdata related code could be in its own private submodule and the job only fires when a change is made to the corresponding folder.

I'll now look through the changes to make suggestions on organizing the code.

@falexwolf
Copy link
Member

I think we also should eliminate lamin_spatial which in my mind should never have seen the light of day. 😅

@falexwolf
Copy link
Member

Folders

curators/
   __init__.py. # this has the whole slew of existing curators
   _spatialdata/  # this has what's currently in lamin-spatial and triggers the corresponding tests that are also in lamin-spatial
       __init__.py

Docs:

curators/
- DataFrameCurator
- AnnDataCurator
- SpatialDataCurator

New suggested idiom:

ln.curators.SpatialDataCurator(...)

instead of

ln.Curator.from_spatialdata(...)

Signed-off-by: zethson <[email protected]>
Signed-off-by: zethson <[email protected]>
Copy link

github-actions bot commented Dec 19, 2024

@github-actions github-actions bot temporarily deployed to pull request December 19, 2024 17:08 Inactive
Signed-off-by: zethson <[email protected]>
Signed-off-by: zethson <[email protected]>
Signed-off-by: zethson <[email protected]>
@github-actions github-actions bot temporarily deployed to pull request December 19, 2024 18:24 Inactive
Signed-off-by: zethson <[email protected]>
Signed-off-by: zethson <[email protected]>
Signed-off-by: zethson <[email protected]>
Signed-off-by: zethson <[email protected]>
@Zethson Zethson marked this pull request as ready for review December 19, 2024 21:01
@Zethson
Copy link
Member Author

Zethson commented Dec 19, 2024

Followup in #2303

With the next release we can yank lamin-spatial from pypi and I'll remove the SpatialDataCurator code there.

@github-actions github-actions bot temporarily deployed to pull request December 19, 2024 21:05 Inactive
@Zethson Zethson merged commit 3ce9edd into main Dec 19, 2024
15 checks passed
@Zethson Zethson deleted the feature/spatialdata_curator branch December 19, 2024 21:07
@falexwolf
Copy link
Member

With the next release we can yank lamin-spatial from pypi and I'll remove the SpatialDataCurator code there.

Great! :D -- One PyPI package less!

@sunnyosun
Copy link
Member

This PR broke cellxgene-lamin because the import filename was changed from ._curate to curators...

@Zethson
Copy link
Member Author

Zethson commented Dec 20, 2024

This PR broke cellxgene-lamin because the import filename was changed from ._curate to curators...

Sorry - I'll fix that

@falexwolf
Copy link
Member

I think the cellxgene-lamin package should also disappear and be part of lamindb just like spatial curator

@sunnyosun
Copy link
Member

I'm already doing it!

@falexwolf
Copy link
Member

☺️

@sunnyosun
Copy link
Member

Also this fix is now gone since this PR: #2299

It's probably due to you deleting the _curator.py file before rebasing...

@Zethson
Copy link
Member Author

Zethson commented Dec 20, 2024

Sorryyyy also on me then, yes..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants