Possible redundant rowname assignment in tximport.r #1365

pmoris · 2024-08-29T10:14:43Z

I was running the tximport.r code manually while trying to wrap my head around how it works and how it relates to the different types of salmon quant files and gene_count files generated by the pipeline (see https://nfcore.slack.com/archives/CE8SSJV3N/p1724863202019919?thread_ts=1686156473.934039&cid=CE8SSJV3N). While doing so, I stumbled upon the following line of code that seems to be doing nothing?

rnaseq/bin/tximport.r

Line 114 in b89fac3

rownames(gene_info) <- gene_info[["tx"]]

As far as I can tell, gene_info is derived from transcript_info which contains the columns "gene_id", "gene_name", but not tx. So, at least when I was trying to reproduce the steps manually, the assignment did nothing (NULL was being assigned to the row names, changing nothing).

Related, the line just above it

rnaseq/bin/tximport.r

Line 113 in b89fac3

    
           gene_info <- transcript_info$gene[match(rownames(gi[[1]]), transcript_info$gene[["gene_id"]]),]

also appears to be redundant. I believe gene_info just ends up holding the contents of transcript_info$gene. Or are there cases where the genes resulting from summarizeToGene can differ from those in transcript_info (which are ultimately derived from the tx2gene file)?

Caveat to all of the above: since I'm actively trying to understand the code and tximport methodology, I might be completely off the mark here ;)

If anyone can confirm, I'd be happy to make a PR for this.

The text was updated successfully, but these errors were encountered:

pinin4fjords · 2024-08-29T10:49:41Z

Note that all of this logic is now moved to an nf-core module (see the dev branch). But the lines you highlight are still there, so I'll explain:

gene_info <- transcript_info\$gene[match(rownames(gi[[1]]), transcript_info\$gene[["gene_id"]]),]

This fetches the gene-level metadata table from the transcript_info list object, and also sorts it to match the gene order in the gi object. The gene metadata table is a uniquified subset of the gene-wise columns from the input metadata table.

So that line is not redundant.

But yes, this line probably is, maybe a relic of prior development. It's not doing any harm, but feel free to PR on nf-core/modules, and once it's in we can bump the module here.

pmoris · 2024-08-29T13:21:27Z

Cheers, thanks for the explainer!

pmoris closed this as completed Aug 29, 2024

pmoris mentioned this issue Aug 29, 2024

Remove redundant rowname assignment nf-core/modules#6466

Merged

17 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Possible redundant rowname assignment in tximport.r #1365

Possible redundant rowname assignment in tximport.r #1365

pmoris commented Aug 29, 2024

pinin4fjords commented Aug 29, 2024 •

edited

Loading

pmoris commented Aug 29, 2024

Possible redundant rowname assignment in tximport.r #1365

Possible redundant rowname assignment in tximport.r #1365

Comments

pmoris commented Aug 29, 2024

pinin4fjords commented Aug 29, 2024 • edited Loading

pmoris commented Aug 29, 2024

pinin4fjords commented Aug 29, 2024 •

edited

Loading