how to filter a Biom file in the Microbiome Explorer environment #8

Open
Sandra-ctrl opened this issue Jul 25, 2021 · 7 comments

@Sandra-ctrl

Sandra-ctrl commented Jul 25, 2021

Hi,
I used Microbiome Explorer to generate a relative abundance plot for my data and got this bar graph (please find attached).
[Image: April 2018 RA top 10]

I am looking for a way to filter "no match" out of the bar graph so I can understand the data better without that category. However, my BIOM file does not contain the term "no match" anywhere, only terms like "unassignable".
I think Microbiome Explorer labels a feature "no_match" if its taxonomy entry is blank at the level in question. I aggregated at the genus level (X6), so any feature that is blank at the genus level will be called "no_match", even if it is not listed as unassignable.

I tried to fix this in the Data Input section of Microbiome Explorer by going to the Features tab and using Annotate Blank Values (choosing "Roll down taxonomy" as the method and then clicking Assign), but that did not remove the "no match" category. I also tried "unknown" as the method, and it only renamed "no match" to "unknown".

I would appreciate it if you could assist me with this.

Thanks

Sandra Nnadi

@zoecastillo
Owner

Hi Sandra,

I am currently out of the office and traveling without access to my laptop. I think I have encountered the same issue before and would be happy to help as much as possible. Unfortunately, it will take around two more weeks before I can look into this!

@Sandra-ctrl
Author

Hi,
Great news. I look forward to your assistance when you get back.
Have a happy vacation.
Thanks,

Sandra

@zoecastillo
Owner

Hi,

Turns out this is a different issue.

You are correct: no_match is assigned to any feature that is blank at the chosen taxonomy level.
In theory, the "Roll down taxonomy" method should fix this, but only for features that are assigned at some higher level. A feature that is entirely unassigned, i.e. blank even at Kingdom level, has nothing to roll down.
You could look at the abundance at the highest taxonomy level to see whether it shows the same proportion of no_match as the genus level with rolled-down taxonomy. Or just look at the feature table to see whether a large number of OTUs are unassigned.

If that is not the case, would you be able to share the structure of the feature data? A screenshot of the feature table would work or a list of the columns in order.

If all you need is to remove the no_match data from the plot, you could just click on "no_match" in the legend, which hides it from view. But of course, that is only a visual adjustment.
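
To make the roll-down idea and the "entirely unassigned" check concrete, here is a rough base R sketch. It is an illustration only, not Microbiome Explorer's actual implementation: the toy table, its column names, and the helper functions `is_blank` and `roll_down` are all assumptions made up for this example, and the app may label rolled-down values differently.

```r
# Toy taxonomy table (hypothetical values and column names).
tax <- data.frame(
  Kingdom = c("Bacteria", "Bacteria", NA),
  Phylum  = c("Firmicutes", "Proteobacteria", NA),
  Genus   = c("Bacillus", NA, NA),
  row.names = c("OTU1", "OTU2", "OTU3"),
  stringsAsFactors = FALSE
)

# A value counts as blank if it is NA or an empty string.
is_blank <- function(x) is.na(x) | x == ""

# Features that are blank at every level cannot be rescued by rolling down.
blank <- is.na(tax) | tax == ""
entirely_unassigned <- rowSums(!blank) == 0

# Hypothetical roll-down: fill each blank level with the value from the
# level to its left, working from the highest level to the lowest.
roll_down <- function(tax) {
  out <- tax
  for (j in seq_along(out)[-1]) {
    fill <- is_blank(out[[j]]) & !is_blank(out[[j - 1]])
    out[[j]][fill] <- out[[j - 1]][fill]
  }
  out
}

rolled <- roll_down(tax)
print(rolled)
print(entirely_unassigned)
```

In this toy table, OTU2 picks up its phylum name at the genus level, while OTU3 stays blank everywhere and would still be counted as no_match.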

Hope this helps,

Janina

@Sandra-ctrl
Author

Hi,
Thanks for your reply.
It has been a month (I went on vacation and then resumed lectures).
I looked at the abundance at the Kingdom, Phylum and Class levels and found that "no match" appears at the class level (please find pictures attached).
Please also find attached a screenshot of the feature table. Out of 2548 OTUs, "unassignable" occurred 175 times and "unidentified" occurred 1345 times. I wonder what could be causing the "no match" values in this case.
Thanks for your usual cooperation.

Sandra

[Image: April 2018 RA at Kingdom level]

[Image: April 2018 RA at phylum level]

[Image: April 2018 RA at class level]

[Image: Apr 2018 feature snapshot]

@zoecastillo
Owner

Hi,

The feature table is not processed correctly, because values of the confidence column are included in the abundance analysis.

How do you split the feature data?

I think what happens here is that the value of the confidence column is shifted as far to the left as possible. So for, say, OTU40, it ends up in the phylum column, whereas for OTU34 it ends up in the family column. I suspect this is why the roll down does not work correctly and why there are so many no_match values in the data.

There is a function in Microbiome Explorer that can split the data for you, but I just found a small bug in it that crashes the app. I will correct this and update the repo.
However, if you take out the confidence column before splitting the feature table, your approach should work as well.
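
As a rough sketch of "take out the confidence column before splitting", the base R snippet below drops the confidence column and then splits the taxonomy string into a fixed set of rank columns, padding missing ranks with NA so nothing can shift into them. The column names `Taxon` and `Confidence`, the `;` separator, and the three rank names are assumptions for illustration; the real feature table may be laid out differently.

```r
# Toy feature data (hypothetical layout: one taxonomy string plus a
# confidence score per OTU).
feat <- data.frame(
  Taxon = c("Bacteria;Firmicutes;Bacilli",
            "Bacteria;Proteobacteria",
            "Unassignable"),
  Confidence = c(0.99, 0.87, 0.52),
  row.names = c("OTU1", "OTU2", "OTU3"),
  stringsAsFactors = FALSE
)
ranks <- c("Kingdom", "Phylum", "Class")

# 1. Drop the confidence column so it cannot end up in a rank column.
taxon_only <- feat[, setdiff(names(feat), "Confidence"), drop = FALSE]

# 2. Split each taxonomy string and pad it to one entry per rank.
parts <- strsplit(taxon_only$Taxon, ";", fixed = TRUE)
padded <- lapply(parts, function(p) {
  length(p) <- length(ranks)  # pads with NA (or truncates extra ranks)
  p
})

# 3. Bind the padded vectors into a table with one column per rank.
tax <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(tax) <- ranks
rownames(tax) <- rownames(feat)
print(tax)
```

Each OTU then has exactly one column per rank, with NA where the assignment stops, which is the shape a roll-down step would expect.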

Hope this helps!

Janina

@Sandra-ctrl
Author

Hi Janina,
I hope research is going well.
Following up on your last response: were you able to fix the bug?
I think manually removing the rows that contain "unassignable" may help to filter the data.
I have searched online for how to filter rows in R but only found code for filtering columns.
If I can get that step done, the data may be more informative.
Also, if the function in ME that splits the data is fixed, I could use that too.
Thanks,

Sandra

@zoecastillo
Owner

Hi Sandra,

I had already made an update last month, but I just bumped the version number again to make sure it is higher than the latest Bioconductor release, and I also added another change to ensure it works with non-character values as well.
You would need to install directly from GitHub:

BiocManager::install("zoecastillo/microbiomeExplorer", 
                         ref = "master")

Once you import your data, you should see an option on the Feature tab to select the column to split your taxonomy data from. Once that is done, you have to click Save to apply the changes for analysis.

If you are working with MRExperiments in R, you could try to filter out the unassignable features this way:
mD[fData(mD)[["Taxon"]] != "Unassignable"]
(assuming that your MRExperiment is called mD and the column of the feature data which holds the "Unassignable" is called "Taxon")
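
For row filtering in plain R, the same idea can be sketched on an ordinary data frame. This is a minimal example under assumptions: the toy `feat` and `counts` objects and the column name `Taxon` are made up, and matching is done case-insensitively because the thread mentions both "unassignable" and "Unassignable". Using `%in%` instead of `!=` keeps rows with a missing taxonomy rather than producing NA in the filter.

```r
# Toy feature table and matching count matrix (hypothetical data).
feat <- data.frame(
  Taxon = c("Bacteria;Firmicutes", "Unassignable", NA,
            "Bacteria;Proteobacteria"),
  row.names = paste0("OTU", 1:4),
  stringsAsFactors = FALSE
)
counts <- matrix(1:8, nrow = 4,
                 dimnames = list(rownames(feat), c("S1", "S2")))

# TRUE for every row to keep; NA taxonomy entries are kept here.
keep <- !(tolower(feat$Taxon) %in% "unassignable")

# Apply the same row filter to the feature table and the counts
# so that the two stay aligned.
feat_kept   <- feat[keep, , drop = FALSE]
counts_kept <- counts[keep, , drop = FALSE]

print(rownames(feat_kept))
```

A logical vector built this way from `fData(mD)[["Taxon"]]` should also work as the row index of an MRExperiment, in the same way as the one-liner above.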

I hope this helps! Let me know if you run into issues with the new version.

Best,
Janina
