NCBIgff2chrom -> gxf2chrom #66

Open
alejandrogzi opened this issue Mar 4, 2024 · 2 comments


alejandrogzi commented Mar 4, 2024

Hi @conchoecia!

I developed gxf2chrom while thinking about your NCBIgff2chrom.py script!

In short, it is a CLI tool written in Rust that does essentially the same thing as your script, with some additional features:

  • Can accept GTF files and GTF.gz files
  • Ensures that no proteins with length < 1 are written to the output file
  • Instead of sending the output to stdout, directly writes everything to a file the user specifies
  • Supports "custom" GTF/GFF files, not only GENCODE/Ensembl. Instead of looking only for "protein_id", one can use --feature to specify the name of the attribute to parse (e.g., -f proteinName)
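To illustrate the last point, here is a minimal sketch of how a user-chosen attribute could be pulled out of either format. It is not taken from the gxf2chrom source; the function names `attribute_value` and `feature_from_line` are hypothetical, and the only assumptions are the usual conventions that GFF3 writes attributes as `key=value;` and GTF writes them as `key "value";` in the ninth tab-separated column.

```rust
/// Hypothetical helper (not from gxf2chrom): return the value of `key` from
/// an attribute column written as GFF3 (`key=value;`) or GTF (`key "value";`).
/// Returns None when the key is absent or its value is empty.
fn attribute_value<'a>(attributes: &'a str, key: &str) -> Option<&'a str> {
    for field in attributes.split(';') {
        let field = field.trim();
        if field.is_empty() {
            continue;
        }
        // Split at the first '=' (GFF3) or the first whitespace (GTF),
        // whichever comes first in the field.
        let (name, raw) = match field.split_once(|c: char| c == '=' || c.is_whitespace()) {
            Some(pair) => pair,
            None => continue,
        };
        if name == key {
            // GTF values are quoted; GFF3 values are not.
            let value = raw.trim().trim_matches('"');
            if !value.is_empty() {
                return Some(value);
            }
        }
    }
    None
}

/// Hypothetical helper: look up `key` in the ninth column of one GTF/GFF line.
/// Comment lines and lines with fewer than nine columns yield None.
fn feature_from_line<'a>(line: &'a str, key: &str) -> Option<&'a str> {
    if line.starts_with('#') {
        return None;
    }
    let attributes = line.split('\t').nth(8)?;
    attribute_value(attributes, key)
}
```

Calling `feature_from_line(line, "proteinName")` on a GTF line and `feature_from_line(line, "protein_id")` on a GFF3 line goes through the same code path, which is the kind of flexibility the --feature option describes.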

Here is a quick benchmark:

Format    odp             gxf2chrom       fold
gff3      4.30 +/- 0.03   1.88 +/- 0.01   x2.29
gff3.gz   6.27 +/- 0.18   2.05 +/- 0.01   x3.06
gtf       ---             1.83 +/- 0.01   ---
gtf.gz    ---             1.94 +/- 0.01   ---
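
The fold column appears to be the odp time divided by the gxf2chrom time for the same input format. A small sketch of that arithmetic follows; the function name `fold_speedup` is hypothetical, and the interpretation of the column is an assumption based on the numbers reported here.

```rust
/// Hypothetical helper: how many times faster the candidate is than the
/// baseline, given one timing for each in the same units.
fn fold_speedup(baseline: f64, candidate: f64) -> f64 {
    baseline / candidate
}

/// Format a fold value the way the benchmark reports it, e.g. "x2.29".
fn fold_label(baseline: f64, candidate: f64) -> String {
    format!("x{:.2}", fold_speedup(baseline, candidate))
}
```

With the gff3 row, `fold_label(4.30, 1.88)` gives "x2.29", and with the gff3.gz row, `fold_label(6.27, 2.05)` gives "x3.06", matching the reported values.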

The main advantage of this new tool is its speed, which may be noticeable at large scale. On top of that, a nice thing about Rust is that it does not depend on external packages, so the only requirement is Rust itself. This makes the tool easier to attach to any pipeline or tool through a configuration step or script.

Please let me know what you think!

Best,
Alejandro

@conchoecia
Owner

Hi @alejandrogzi - I appreciate that you took the time to do this! I have been meaning to learn some Rust, so I enjoyed going through the code to see how you structured it. The addition of the parser for custom fields is also useful given the huge diversity in how people structure their GTF/GFF files.

At the moment I do not plan to incorporate Rust dependencies into this software, but I would be happy to point to your repository in the documentation.

@alejandrogzi
Author

@conchoecia,

Definitely! I would be more than happy to have it included in the docs; I made this hoping to contribute in some way.

Let me know if you have any questions regarding gxf2chrom or additional ideas!

Best,
Alejandro
