Cell-by-isoform matrix #115

Open
sameer-aryal opened this issue Oct 25, 2023 · 2 comments

Comments

@sameer-aryal

sameer-aryal commented Oct 25, 2023

I wanted to ask if it was possible to generate a barcode-by-isoform count matrix (instead of gene-level counts) using simpleaf; thanks very much.

@rob-p
Contributor

rob-p commented Oct 25, 2023

Hi @sameer-aryal,

Yes, this is possible. To do this, you'd want to replace the t2g or t2g_3col file, which maps transcripts to genes (or transcripts to gene + splicing status), with a corresponding file that maps each transcript to itself. You can, of course, decide how you want to handle the splicing status in this case (e.g. consider each merged intronic span as a separate transcript, or group them all together into a single intronic supertranscript for the gene).
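As a rough illustration of the mapping swap described above, here is a minimal Python sketch that rewrites the rows of an existing transcript-to-gene map so that each transcript points to itself. It is not part of simpleaf; the function name `identity_t2g`, the `group_unspliced` option, and the assumption that the file is tab-separated with an optional third column holding a splicing status such as `S` or `U` are all illustrative assumptions.

```python
def identity_t2g(lines, group_unspliced=False):
    """Rewrite t2g rows so each transcript maps to itself.

    lines: iterable of tab-separated rows, either
           "transcript<TAB>gene" or "transcript<TAB>gene<TAB>status".
    group_unspliced: if True, rows whose status is "U" (assumed here to
           mark intronic/unspliced entries) keep their gene as the target,
           so all intronic spans of a gene stay grouped together.
    """
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        txp, gene = fields[0], fields[1]
        status = fields[2] if len(fields) > 2 else None
        if status == "U" and group_unspliced:
            target = gene  # keep intronic entries grouped per gene
        else:
            target = txp   # map the transcript to itself
        new_fields = [txp, target]
        if status is not None:
            new_fields.append(status)
        out.append("\t".join(new_fields))
    return out


# Hypothetical identifiers, for illustration only.
rows = ["T1\tG1\tS", "T2\tG1\tS", "G1-I\tG1\tU"]
for row in identity_t2g(rows, group_unspliced=True):
    print(row)
```

The two settings of `group_unspliced` correspond to the two choices mentioned above: treating each intronic entry as its own "transcript", or leaving the intronic entries of a gene grouped under that gene.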

However, the big caveat here is that while this is easy to do technically, current 3' tag-based protocols are unlikely to give you reliable isoform-level information. This is because they sequence in a strongly biased way from the 3' end of the transcripts, so, at most, you may be able to distinguish families of transcripts that have different terminal exons. Likewise, the per-cell depth of coverage is very low, so there is not much information to help with resolving ambiguous reads (I'm guessing you'd want to use a UMI resolution method in this case that turns on the EM to help avoid losing too many reads to multimapping).
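Since the resolution is likely limited to families of transcripts distinguished by their terminal exons, one option is to map each transcript to a family label rather than to itself. The sketch below is one hedged way to derive such labels; the input tuple layout, the function name `terminal_exon_families`, and the rule that two transcripts belong to the same family only when their 3' terminal exons have identical coordinates are all assumptions made for illustration, and a real annotation would likely need a more tolerant rule (e.g. overlapping terminal exons).

```python
def terminal_exon_families(exons):
    """Assign each transcript a label based on its 3' terminal exon.

    exons: iterable of (transcript_id, gene_id, strand, start, end),
           one tuple per exon, with start <= end in genomic coordinates.
    Returns a dict: transcript_id -> "gene|strand|start-end".
    """
    last = {}
    for txp, gene, strand, start, end in exons:
        # The 3' terminal exon is the rightmost exon on the '+' strand
        # and the leftmost exon on the '-' strand.
        key = end if strand == "+" else -start
        if txp not in last or key > last[txp][0]:
            last[txp] = (key, gene, strand, start, end)
    return {
        txp: f"{gene}|{strand}|{start}-{end}"
        for txp, (_, gene, strand, start, end) in last.items()
    }


# Hypothetical exon records, for illustration only.
exons = [
    ("T1", "G1", "+", 100, 200), ("T1", "G1", "+", 300, 400),
    ("T2", "G1", "+", 150, 200), ("T2", "G1", "+", 300, 400),
    ("T3", "G1", "+", 100, 200), ("T3", "G1", "+", 500, 600),
]
families = terminal_exon_families(exons)
for txp, fam in sorted(families.items()):
    print(f"{txp}\t{fam}")  # rows of a transcript-to-family map
```

Here T1 and T2 end up with the same label because they share a terminal exon, while T3 gets its own.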

Anyway, we're happy to help you out if you want to give this a try. I'm pinging @DongzeHE so he can chime in here as well if he wants.

Best,
Rob

@sameer-aryal
Author

Dear @rob-p,

Thanks very much for the guidance, as well as for creating and maintaining this excellent tool.

> …so, at most, you may be able to distinguish families of transcripts that share different terminal exons.

This is exactly the case I wish to use this approach for.

> you'd want to replace the t2g or t2g_3col file, which is a mapping from transcripts to gene (or transcript to gene + splicing-status) with a corresponding file that maps transcripts to themselves.

I will give this a try; thanks very much again.
