Suggestions on: Keep ensembl gene ID at read-in, automate non-unique gene renaming, add default rownames(), accept .h5 format :) #4
Comments
My two cents: Also, wouldn't it be faster to do something like :
I have another question: are we sure that each duplicated gene always has only one duplicate? What if there are more? It might be better to generalize this with a suffix that increases from 1 to n, where n is the number of duplicates for each duplicated rowname. As for the .h5 matrix, it would be nice to be able to load it instead of the csv matrix. Dario |
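A minimal sketch of the 1-to-n numbering raised in this comment, assuming the gene symbols are held in a plain character vector. Base R's `make.unique()` leaves the first occurrence of a name unchanged and appends `.1`, `.2`, and so on to later occurrences. The toy symbols below are illustrative and are not read from a Visium dataset.

```r
# Toy gene symbols with one name repeated three times (illustrative only).
symbols <- c("TBCE", "HSPA14", "TBCE", "TMSB15B", "TBCE")

# make.unique() keeps the first occurrence and numbers the later ones.
unique_symbols <- make.unique(symbols)

print(unique_symbols)
# "TBCE" "HSPA14" "TBCE.1" "TMSB15B" "TBCE.2"
```

Because the numbering depends on row order, the same symbol could receive a different suffix if the feature order changes, so keeping the Ensembl ID alongside the renamed symbol is still useful.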
FYI: base R
...and yes, I'd be in favor of 1) keeping both gene symbols and ensembl IDs as |
Hi Estella! @estellad BiocManager::install("waldronlab/TENxIO") |
@drighelli |
Note. By default, I'm reading in the first column in the |
Hi Marcel @LiNk-NY , I tried to install
The imported Visium object now has the gene Ensembl ID as the default rownames and the gene Symbol stored in rowData(). 👍 There are still some action items 😄 :
Thank you! |
Hi Estella, @estellad
This is actually implemented here https://github.com/waldronlab/TENxIO/blob/7b86cc13a75a3bebc4def176ac5ca334bcadf9a9/R/TENxH5-class.R#L338-L343
Do you have a reproducible example? I have tested this with the examples in the package. See : Lines 175 to 180 in 25324bb
Thanks! -Marcel |
Hi Marcel,
Thank you for the package! There has been a long-lasting issue with reading in Visium /outs, where there are non-unique gene names. There are 18085 genes, but 18082 unique gene names. Some of the downstream analyses do require unique gene names.
In Seurat, this was solved by adding a ".1" to one of the duplicates in the following genes "HSPA14", "TBCE", "TMSB15B". They are distinguishable based on their Ensembl IDs, which were preserved during the read-in process.
Previously, with `SpatialExperiment::read10xVisium()`, where the Ensembl ID was preserved, I had a workaround for this issue that mimicked Seurat's approach.

With `VisiumIO::TENxVisium()`, however, I noticed that the Ensembl ID is lost, or not retrieved from the gene count matrix folder, during the read-in. Therefore, I cannot rename the genes based on Ensembl ID anymore. Could you please automate this renaming process internally, like Seurat, so that users do not have to do this manually every time? Given that these 3 genes are not highly expressed, I have excluded them from my analysis for now.
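A sketch of what a Seurat-style rename could look like once the Ensembl IDs are available, assuming a feature table with hypothetical columns `ensembl` and `symbol`. The Ensembl IDs are placeholders rather than real annotations, and this is not the code used by Seurat, SpatialExperiment, or VisiumIO.

```r
# Toy feature table; column names and Ensembl IDs are placeholders.
features <- data.frame(
  ensembl = c("ENSG_PLACEHOLDER_1", "ENSG_PLACEHOLDER_2",
              "ENSG_PLACEHOLDER_3", "ENSG_PLACEHOLDER_4"),
  symbol = c("HSPA14", "TBCE", "HSPA14", "ACTB"),
  stringsAsFactors = FALSE
)

# Rows whose symbol has already appeared earlier in the table.
is_repeat <- duplicated(features$symbol)

# Inspect which Ensembl IDs share a symbol before renaming anything.
shared <- features[features$symbol %in% features$symbol[is_repeat], ]

# Seurat-style rename: append ".1" to the later occurrence.
# This handles only one extra copy per symbol.
features$symbol[is_repeat] <- paste0(features$symbol[is_repeat], ".1")

# The symbols are now unique and can serve as row names.
rownames(features) <- features$symbol
```

The `ensembl` column is left untouched, so each renamed symbol can still be traced back to its original identifier.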
In addition, a small inconvenience is that the `rownames()` of the object are left as NULL. I would prefer the default to be something, either Ensembl ID or Symbol.

I also noticed that `VisiumIO::TENxVisium()` cannot take a gene count .h5 file as an alternative to the gene count matrix folder. I find this alternative helpful, and I made it flexible in my package in `SpatialExperimentIO::readXeniumSXE()`. Would you consider implementing this alternative again, as `SpatialExperiment::read10xVisium()` did before?

Also looping in @drighelli and @HelenaLC
Thank you for your time!
Sincerely,
Estella