NCBIgff2chrom -> gxf2chrom #66

alejandrogzi · 2024-03-04T04:13:23Z

I developed gxf2chrom while thinking about your NCBIgff2chrom.py script!

In short, it is a CLI tool written in Rust that does essentially the same thing as your script, with some additional features:

Can accept GTF files and GTF.gz files
Ensures that no proteins with length < 1 are written to the output file
Instead of sending the output to stdout, directly writes everything to a file the user specifies
Supports "custom" GTF/GFF files, not only GENCODE/Ensembl. Instead of only looking for "protein_id", one can use --feature to specify the name of the attribute to parse (e.g., -f proteinName)
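
To illustrate the gzip and custom-attribute features listed above, here is a minimal Python sketch. It is not gxf2chrom's actual implementation (which is in Rust); the helper names `open_maybe_gzip` and `extract_attribute` are hypothetical. It assumes the usual conventions for the ninth column: `key "value";` pairs in GTF and `key=value;` pairs in GFF3. It also assumes values contain no semicolons.

```python
import gzip
import re

# Hypothetical helper: open a plain or gzip-compressed annotation file as text.
def open_maybe_gzip(path):
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")

# Key, then either "=" (GFF3 style) or whitespace (GTF style), then the value.
_PAIR = re.compile(r"^(\S+?)(?:=|\s+)(.*)$")

# Hypothetical helper: look up one attribute in the ninth column of a record.
def extract_attribute(attributes, key):
    """Return the value stored under `key`, or None if the key is absent."""
    for field in attributes.strip().split(";"):
        field = field.strip()
        if not field:
            continue
        match = _PAIR.match(field)
        if match and match.group(1) == key:
            # GTF values are quoted; GFF3 values are not.
            return match.group(2).strip().strip('"')
    return None
```

With this sketch, `extract_attribute('gene_id "G1"; proteinName "P1";', "proteinName")` returns the value under the user-chosen attribute name instead of a hard-coded "protein_id".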

Here is a quick benchmark:

Format	odp	gxf2chrom	fold
gff3	4.30 +/- 0.03	1.88 +/- 0.01	x2.29
gff3.gz	6.27 +/- 0.18	2.05 +/- 0.01	x3.06
gtf	---	1.83 +/- 0.01	---
gtf.gz	---	1.94 +/- 0.01	---

The main advantage of this new tool is its speed (which may become noticeable at large scale). On top of that, the good thing about Rust is that the tool does not depend on external packages, so the only thing needed is Rust itself. This makes it easier to attach to any pipeline/tool/etc. through any configuration step or script.

Please let me know what you think!

Best,
Alejandro


conchoecia · 2024-03-07T13:57:28Z

Hi @alejandrogzi - I appreciate that you took the time to do this! I have been meaning to learn some Rust, so I enjoyed going through the code to see how you structured it. The addition of the parser for custom fields is also useful given the huge diversity in how people structure their GTF/GFF files.

At the moment I do not plan to incorporate Rust dependencies into this software, but I would be happy to point to your repository in the documentation.

alejandrogzi · 2024-03-07T16:27:30Z

@conchoecia,

Definitely! I would also be more than happy to have it included in the docs; I made this hoping to contribute in some way.

Let me know if you have any questions regarding gxf2chrom or additional ideas!

Best,
Alejandro
