Hi @conchoecia!
I developed gxf2chrom while thinking about your NCBIgff2chrom.py script!
In short, this is a CLI tool written in Rust that does essentially the same thing as your script, with some additional features:
Can accept GTF files and GTF.gz files
Ensures that no proteins with length < 1 are written to the output file
Instead of sending the output to stdout, directly writes everything to a file the user specifies
Supports "custom" GTF/GFF files, not only GENCODE/Ensembl ones. Instead of looking only for "protein_id", one can use --feature to specify the name of the attribute to parse (e.g., -f proteinName).
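To illustrate the custom-attribute idea, here is a minimal Python sketch. It is not the code of gxf2chrom or NCBIgff2chrom.py. The function names (`open_annotation`, `get_attribute`, `collect_spans`) are hypothetical. The choice to merge all lines sharing an attribute value into one (seqid, strand, start, end) span is an assumption made for illustration. It relies only on the standard nine-column GTF/GFF layout, with attributes in column 9.

```python
import gzip
import re

# One attribute pair: GTF style (key "value") or GFF3 style (key=value).
ATTR = re.compile(r'^([^\s=]+)(?:=|\s+)"?(.*?)"?$')

def open_annotation(path):
    """Open a plain-text or gzip-compressed GTF/GFF file in text mode."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path)

def get_attribute(column9, name):
    """Return the value of attribute `name` from column 9, or None if absent."""
    for part in column9.split(";"):
        match = ATTR.match(part.strip())
        if match and match.group(1) == name:
            return match.group(2)
    return None

def collect_spans(lines, name="protein_id"):
    """Map each attribute value to (seqid, strand, min start, max end).

    Assumption: every line carrying the attribute contributes to the span.
    """
    spans = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete feature line
        key = get_attribute(cols[8], name)
        if key is None:
            continue
        start, end = int(cols[3]), int(cols[4])
        if key in spans:
            _, _, lo, hi = spans[key]
            start, end = min(lo, start), max(hi, end)
        spans[key] = (cols[0], cols[6], start, end)
    return spans

# Two made-up GTF lines that use a non-standard attribute name.
gtf = [
    'chr1\tsrc\tCDS\t100\t200\t.\t+\t0\tgene_id "g1"; proteinName "p1";\n',
    'chr1\tsrc\tCDS\t300\t400\t.\t+\t0\tgene_id "g1"; proteinName "p1";\n',
]
print(collect_spans(gtf, name="proteinName"))
```

The same `collect_spans` call works on a real file by passing `open_annotation(path)` as `lines`. Note that quoted values containing a semicolon would need a more careful splitter than this sketch uses.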
Here is a quick benchmark:
| Format | odp | gxf2chrom | fold |
| --- | --- | --- | --- |
| gff3 | 4.30 +/- 0.03 | 1.88 +/- 0.01 | x2.29 |
| gff3.gz | 6.27 +/- 0.18 | 2.05 +/- 0.01 | x3.06 |
| gtf | --- | 1.83 +/- 0.01 | --- |
| gtf.gz | --- | 1.94 +/- 0.01 | --- |
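The fold values appear to be the odp figure divided by the gxf2chrom figure for the same format. A short check under that assumption, using the reported values as given (the `timings` dictionary is just a container for this illustration):

```python
# Values copied from the benchmark above as (odp, gxf2chrom) pairs.
timings = {"gff3": (4.30, 1.88), "gff3.gz": (6.27, 2.05)}

# Assumption: fold = odp value / gxf2chrom value, rounded to two decimals.
fold = {fmt: round(odp / gxf, 2) for fmt, (odp, gxf) in timings.items()}
print(fold)  # {'gff3': 2.29, 'gff3.gz': 3.06}
```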
The main advantage of this new tool is its speed (which may become noticeable at large scale). On top of that, a nice thing about Rust is that the tool does not depend on external packages, so the only requirement is Rust itself. This makes it easier to attach to any pipeline, tool, etc. through any configuration step or script.
Please let me know what you think!
Best,
Alejandro
Hi @alejandrogzi - I appreciate that you took the time to do this! I have been meaning to learn some Rust, so I enjoyed going through the code to see how you structured it. The addition of the parser for custom fields is also useful given the huge diversity in how people structure their GTF/GFF files.
At the moment I do not plan to incorporate Rust dependencies into this software, but I would be happy to point to your repository in the documentation.