attributes_to_columns() raises ValueError on some valid GFFs #10

Open
joelb123 opened this issue Oct 7, 2020 · 0 comments

joelb123 commented Oct 7, 2020

  • gffpandas version: 1.2.0
  • Python version: 3.8
  • Operating System: Linux

Description

I got the following failure when I read this GFF file:

  File "gffpandas/gffpandas.py", line 133, in <lambda>
    lambda attributes: dict([key_value_pair.split('=') for
ValueError: dictionary update sequence element #6 has length 1; 2 is required

This lambda is one of three in gffpandas. In my opinion, all three should be refactored into map() calls or stand-alone functions. Lambdas are hard to read and even harder to make defensive.
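A minimal sketch of what a stand-alone, defensive replacement could look like, assuming the attributes column is available as a plain list of strings. `parse_attributes` is a hypothetical name, not part of gffpandas. Unlike the lambda, a named function can report exactly which segment of which attribute string was malformed.

```python
from typing import Dict


def parse_attributes(atts: str) -> Dict[str, str]:
    """Parse a GFF column-9 string, naming the offending segment on failure."""
    parsed = {}
    for segment in atts.split(";"):
        # partition() splits on the first "=" only, so "=" inside a value survives
        key, sep, value = segment.partition("=")
        if not sep:
            # Strict on purpose: fail loudly, but with a message that points
            # at the bad segment instead of "dictionary update sequence ..."
            raise ValueError(f"malformed attribute {segment!r} in {atts!r}")
        parsed[key] = value
    return parsed


rows = ["ID=gene1;Name=abcA", "ID=gene2;Name=abcB"]
# A named function plugs straight into map() (or pandas' Series.map)
parsed_rows = list(map(parse_attributes, rows))
```

This strict variant still rejects a trailing ";". The point is only that a named function makes the failure easy to diagnose; a lenient variant would skip empty segments instead of raising.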

This failure occurs because a handful of features in this file have an attributes string that ends with ";". Splitting on ";" then yields an empty final piece with no "=", which cannot become a key/value pair. This is somewhat rare, but I also see it in some NCBI GFFs. Like many ugly things in GFF-land, I don't know whether it's strictly valid, but plenty of GFFs that pass GFF validators (e.g., the one in GenomeTools) contain it. There is an additional (even rarer) problem: someone can put an equals sign inside an attribute value, so splitting on "=" produces more than two pieces.
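Both failure modes can be reproduced with a small stand-in for the lambda. The traceback cuts the expression off after `for`, so `naive_split` below is an assumed reconstruction (split on ";", then on "="), not the exact gffpandas code. The `#N` in the error message is the zero-based index of the offending piece, so `#6` in the traceback above is consistent with six well-formed attributes followed by a trailing ";".

```python
def naive_split(attributes):
    # Hypothetical approximation of the lambda at gffpandas.py line 133;
    # the full expression is truncated in the traceback.
    return dict([key_value_pair.split("=") for key_value_pair in attributes.split(";")])


print(naive_split("ID=gene1;Name=abcA"))  # {'ID': 'gene1', 'Name': 'abcA'}

for bad in ("ID=gene1;Name=abcA;", "ID=gene1;Note=x=y"):
    try:
        naive_split(bad)
    except ValueError as err:
        print(f"{bad!r}: {err}")
# 'ID=gene1;Name=abcA;': dictionary update sequence element #2 has length 1; 2 is required
# 'ID=gene1;Note=x=y': dictionary update sequence element #1 has length 3; 2 is required
```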

Here's a function that fixes the problem and can be substituted for the lambda expression:

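def _split_atts(atts):
    """Split an attributes string into a dict of tag -> value."""
    # Skip pieces without "=" (e.g. the empty piece left by a trailing ";")
    pairs = (a.split("=", 1) for a in atts.split(";") if "=" in a)
    # Split only on the first "=", so any "=" inside a value is kept intact
    return {key: value for key, value in pairs}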

joelb123 mentioned this issue Dec 8, 2020