Methods section for metadata and ontologies #65

Merged
allyhawkins merged 9 commits into main from allyhawkins/ontology-methods
Mar 7, 2024

Conversation

allyhawkins
Member

Closes #50
Stacked on #61

Here I'm adding a section to the methods on metadata, including ontology assignments. I added this section below data processing and generation and above the processing-related methods.

I mention that metadata is standardized as much as possible across projects and list the ontology terms that were assigned. I provided the relevant details on how exactly we assigned terms, but let me know if we should include more detail.

I'm requesting @jaclyn-taroni since she was the most involved other than me in the ontology assignment and metadata cleaning.

Member

@jaclyn-taroni jaclyn-taroni left a comment

The level of detail for the individual fields looks like a good start. Returning some comments.

Comment on lines 15 to 22
Additionally, ontology term identifiers were assigned to the following metadata categories for each sample:
- Age: Ontology term obtained from HsapDv [@url:https://www.ebi.ac.uk/ols4/ontologies/hsapdv]. For ages 0-11 months, the HsapDv for age in months was used. For ages 12 months and greater, the HsapDv for age in years was used.
- Sex: Ontology term obtained from PATO, either male (PATO:0000384), female (PATO:0000383), or unknown [@url:https://www.ebi.ac.uk/ols4/ontologies/pato].
- Organism: NCBI taxonomy term for organism. All current samples available on the Portal are from Homo sapiens or NCBITaxon:9606 [@url:https://www.ncbi.nlm.nih.gov/taxonomy].
- Diagnosis: The most appropriate MONDO term based on the provided diagnosis [@url:https://www.ebi.ac.uk/ols4/ontologies/mondo]. An exact match was identified for most samples, but in a handful of cases, the most closely related term was used.
- Tissue of origin: The most appropriate UBERON term based on the provided tissue of origin [@url:https://www.ebi.ac.uk/ols4/ontologies/uberon]. An exact match was identified for most samples, but in a handful of cases, the most closely related term was used.
- Ethnicity (if applicable): If the submitter provided ethnicity, the associated Hancestro term [@url:https://www.ebi.ac.uk/ols4/ontologies/hancestro]. If ethnicity is unavailable, `unknown` is used.
The human-readable metadata and the associated ontology term identifiers are available on the Portal for all samples.
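
As an illustration only, the assignment rules in the list above could be sketched as follows. The PATO and NCBITaxon identifiers are the ones quoted in the list; the function names and the lookup table are hypothetical and are not taken from the project's code. The age helper only decides which family of HsapDv terms applies and does not return a specific HsapDv identifier.

```python
from typing import Optional

# Identifiers quoted in the list above; all names below are hypothetical.
SEX_TERMS = {"male": "PATO:0000384", "female": "PATO:0000383"}
ORGANISM_TERM = "NCBITaxon:9606"  # Homo sapiens


def sex_ontology_term(sex: Optional[str]) -> str:
    """Return the PATO identifier for a submitted sex, or 'unknown'."""
    if sex is None:
        return "unknown"
    return SEX_TERMS.get(sex.strip().lower(), "unknown")


def age_term_unit(age_months: int) -> str:
    """Choose the HsapDv term family: months for 0-11 months, years from 12 months on."""
    if age_months < 0:
        raise ValueError("age must be non-negative")
    return "months" if age_months < 12 else "years"
```

Under these assumptions, a 9-month-old sample would be matched against the month-based HsapDv terms and an 18-month-old sample against the year-based terms.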
Member

This is going to bioRxiv first, so we don't have to worry about "journal-ready" formatting yet. Can we ignore any main display item limits and make this a table instead? I think that will be much more scannable/digestible.

Member

If you don't use a table, this should still follow one sentence per line; there should be no impact on the bullet formatting.


Submitters were required to submit the age, sex, organism, diagnosis, subdiagnosis (if applicable), and tissue of origin for each sample.
The submitted metadata was standardized across projects, including converting all ages to years, removing abbreviations used in diagnosis, subdiagnosis, or tissue of origin, and using standard terms across projects as much as possible.
Additionally, ontology term identifiers were assigned to the following metadata categories for each sample:
Member

Explain (and cite!) why, i.e., the CELLxGENE schema.

### Metadata

Submitters were required to submit the age, sex, organism, diagnosis, subdiagnosis (if applicable), and tissue of origin for each sample.
The submitted metadata was standardized across projects, including converting all ages to years, removing abbreviations used in diagnosis, subdiagnosis, or tissue of origin, and using standard terms across projects as much as possible.
Member

I would say more about what terms tended to get standardized (e.g., disease timing). We should be able to figure that out from metadata cleaning PRs.

Member Author

@allyhawkins allyhawkins Mar 5, 2024

Do you mean which of the terms get standardized, like diagnosis, subdiagnosis, disease timing, and tissue type? Or do you mean mentioning specific examples, e.g., all samples collected at diagnosis were labeled with Initial diagnosis?
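
To make the second reading concrete, here is a minimal sketch of what harmonizing a free-text field to one standard label might look like. Only the target label `Initial diagnosis` and the conversion of ages to years come from this discussion; the submitted spellings in the lookup table and both function names are invented for illustration and do not reflect the actual metadata cleaning code.

```python
# Hypothetical lookup: the submitted spellings on the left are invented examples.
DISEASE_TIMING_SYNONYMS = {
    "at diagnosis": "Initial diagnosis",
    "diagnosis": "Initial diagnosis",
    "dx": "Initial diagnosis",
}


def standardize_disease_timing(value: str) -> str:
    """Map a submitted disease timing value to its standard label, if one is known."""
    # Collapse repeated whitespace and ignore case before looking up the value.
    key = " ".join(value.split()).lower()
    # Values without a known standard label are returned with outer whitespace trimmed.
    return DISEASE_TIMING_SYNONYMS.get(key, value.strip())


def age_months_to_years(age_months: float) -> float:
    """Convert a submitted age in months to years."""
    return age_months / 12
```

A lookup table like this keeps the standard terms in one place, so adding a new project only means adding its spellings to the mapping.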

@allyhawkins
Member Author

@jaclyn-taroni I updated this to include a table rather than the bulleted list. I added some column titles for the table, but I'm 50/50 on them and think we could also just do without.
I also added a reference to the CZI schema and mentioned the specific metadata terms that were standardized across projects and provided an example.

This should be ready for another look.

Member

@jaclyn-taroni jaclyn-taroni left a comment

Returning some comments with the expectation that my suggestions will be taken. I don't need to see this again 👍🏻

Base automatically changed from allyhawkins/cell-type-methods to main March 7, 2024 14:59

github-actions bot commented Mar 7, 2024

Click the link below to download the manuscript build as a ZIP file.
This build is associated with commit f0e0802.

Manuscript build

github-actions bot commented Mar 7, 2024

Click the link below to download the manuscript build as a ZIP file.
This build is associated with commit 4c0e41a.

Manuscript build

github-actions bot commented Mar 7, 2024

Click the link below to download the manuscript build as a ZIP file.
This build is associated with commit be094de.

Manuscript build

@allyhawkins allyhawkins merged commit bb3732d into main Mar 7, 2024
1 check passed
@allyhawkins allyhawkins deleted the allyhawkins/ontology-methods branch March 7, 2024 15:49
Development

Successfully merging this pull request may close these issues.

Ontology methods