`augur curate passthru` can add double quotations #1312
Comments
The escaped double quotes ( |
Hmm, yes. I guess the issue comes up when running through
|
OH! Yeah, that's a misconfiguration of the parser/producer then. |
Shows the undesired behavior with internal quotes described in <#1312> comes from the `write_records_to_tsv` function.
According to the TSV spec,¹ the only restriction on special characters is that tabs are not allowed within a field. This differs from the CSV spec,² which requires double quotes around fields that contain special characters. Since this function only produces TSVs, follow the TSV spec and stop adding quotes.
Resolves <#1312>
¹ <https://www.iana.org/assignments/media-types/text/tab-separated-values>
² <https://datatracker.ietf.org/doc/html/rfc4180#page-2>
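A minimal sketch of what "stop adding quotes" could look like with Python's `csv` module. The helper name `write_tsv_unquoted` and the sample record are hypothetical, and the dialect settings are an assumption about the approach rather than a copy of the actual commit.

```python
import csv
import io


def write_tsv_unquoted(records):
    """Write dict records as TSV without any quoting (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(records[0]),
        delimiter="\t",
        lineterminator="\n",
        quoting=csv.QUOTE_NONE,  # never wrap fields in quotes
        quotechar=None,          # treat '"' as an ordinary character
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Internal double quotes are written through unchanged.
print(write_tsv_unquoted([{"strain": 'A "quoted" name'}]))
```

With these settings the writer has no way to represent a field that itself contains a tab, which is the trade-off the TSV spec implies.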
I think we want to revisit the change here, i.e. 915672e. When I said:
What I meant was, "we should fix the configuration of the parser/producer to not endlessly add additional quotes, but we should still use RFC-compliant CSV-like quoting in our TSVs rather than no quoting". With Relatedly: #1563 (comment) |
For example:
$ augur curate passthru --output-metadata - <<<'{"x":"\u0009"}'
x
Traceback (most recent call last):
  File "/home/tom/nextstrain/augur/augur/__init__.py", line 69, in run
    return args.__command__.run(args)
  File "/home/tom/nextstrain/augur/augur/curate/__init__.py", line 243, in run
    write_records_to_tsv(validated_output_records, args.output_metadata)
  File "/home/tom/nextstrain/augur/augur/io/metadata.py", line 556, in write_records_to_tsv
    tsv_writer.writerow(first_record)
  File "/home/tom/.conda/envs/augur/lib/python3.9/csv.py", line 154, in writerow
    return self.writer.writerow(self._dict_to_list(rowdict))
_csv.Error: need to escape, but no escapechar set
An error occurred (see above) that has not been properly handled by Augur.
To report this, please open a new issue including the original command and the error above:
<https://github.com/nextstrain/augur/issues/new/choose>
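The same failure can be reproduced outside Augur with the standard library alone. This is a sketch that assumes the writer was configured with `quoting=csv.QUOTE_NONE` and no `escapechar`; the helper `try_write` is hypothetical.

```python
import csv
import io

# Assumed writer configuration: no quoting and no escape character.
writer = csv.writer(
    io.StringIO(),
    delimiter="\t",
    quoting=csv.QUOTE_NONE,
    quotechar=None,
)


def try_write(value):
    """Return "ok" if the value can be written, else the csv.Error text."""
    try:
        writer.writerow([value])
        return "ok"
    except csv.Error as error:
        return f"csv.Error: {error}"


print(try_write('internal "quotes"'))  # plain characters under this dialect
print(try_write("internal\ttab"))      # the delimiter cannot be represented
```

A field containing the delimiter needs either quoting or an escape character, so a no-quoting writer has to reject it.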
|
Ah gotcha. I was wondering why |
Resolves <#1312> We are expecting the CSV-like double quoting when there are internal quotes. If the field value is already correctly double quoted, then there should not be any additional quotes.
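The expected behavior can be sketched as a round trip: with CSV-like minimal quoting on the writer and quote-aware parsing on the reader, writing, reading, and writing again should give identical text. The helpers `write_tsv` and `read_tsv` and the sample rows are hypothetical, not Augur's functions.

```python
import csv
import io


def write_tsv(rows):
    """Write rows as TSV, quoting only the fields that need it."""
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        delimiter="\t",
        lineterminator="\n",
        quoting=csv.QUOTE_MINIMAL,
    )
    writer.writerows(rows)
    return buffer.getvalue()


def read_tsv(text):
    """Parse TSV text, interpreting CSV-style double quoting."""
    return list(csv.reader(io.StringIO(text, newline=""), delimiter="\t"))


rows = [["strain", "note"], ["A", 'He said "hi"'], ["B", "tab\there"]]
first_pass = write_tsv(rows)
second_pass = write_tsv(read_tsv(first_pass))

# No extra quotes accumulate, and the values survive intact.
print(first_pass == second_pass, read_tsv(first_pass) == rows)
```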
Current Behavior
When using the `--metadata` input, field values with double quotes in them can result in additional double quotes in the output.
Since the metadata is read through `csv.DictReader`, we can probably tweak this behavior through the `csv.Dialect` attributes:
augur/augur/io/metadata.py
Line 183 in 961cb00
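To illustrate how the reader's dialect could matter, here is a sketch that passes one already-quoted value through `csv.DictReader` and `csv.DictWriter` under two reader settings. The function `passthru` and the sample text are hypothetical, and this makes no claim about what the referenced line in `metadata.py` actually sets.

```python
import csv
import io


def passthru(tsv_text, reader_quoting):
    """Read TSV with the given reader quoting, then rewrite it with default quoting."""
    reader = csv.DictReader(
        io.StringIO(tsv_text, newline=""),
        delimiter="\t",
        quoting=reader_quoting,
    )
    records = list(reader)
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=reader.fieldnames,
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


original = 'note\n"He said ""hi"""\n'

# Reader interprets the quoting, so the writer reproduces the input.
stable = passthru(original, csv.QUOTE_MINIMAL)

# Reader keeps the quote characters as data, so the writer quotes them again.
growing = passthru(original, csv.QUOTE_NONE)

print(stable == original, len(growing) > len(original))
```

When the reader treats quotes as literal data while the writer applies CSV-style quoting, each pass adds another layer of quotes, which is consistent with the behavior described here.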
Additional context
This was first observed in nextstrain/mpox#179