Currently, all output appears to be escaped by org.apache.commons.lang.StringEscapeUtils::escapeJava, which appears to be designed to escape strings for use in Java code (i.e. strings escaped this way could be copy-pasted directly into a .java file). Apparently this includes encoding non-ASCII characters in a \u[codepoint] format, which the CSV reader of our choice did not expect. I propose adding an option to not escape the output in this way. If no double quotes or line breaks appear in the original string, unescaped output is perfectly fine when dealing with CSV files.
Additionally, all instances of PrintStream are created using a single-argument constructor. A PrintStream constructed this way apparently reduces all non-ASCII characters to question marks (?), presumably because it uses the platform default charset. To allow for UTF-8 output, these could simply be replaced by three-parameter constructors using the following substitution:
new PrintStream(param) -> new PrintStream(param, false, StandardCharsets.UTF_8.name());
where false is the autoflush setting, matching the default of the single-parameter constructor.
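To illustrate the difference, here is a minimal sketch that writes the same string through a PrintStream with an explicit charset. The class name `Utf8PrintStreamSketch` and the helper `encode` are hypothetical and not part of the project. US-ASCII stands in for a platform default charset that cannot represent the characters.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not taken from the project.
class Utf8PrintStreamSketch {

    // Prints the text through a PrintStream using the named charset
    // and returns the bytes that reached the underlying stream.
    static byte[] encode(String text, String charsetName) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (PrintStream out = new PrintStream(sink, false, charsetName)) {
            out.print(text);
        } catch (UnsupportedEncodingException e) {
            // The three-argument constructor declares this checked exception.
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file itself pure ASCII.
        String text = "Gr\u00fc\u00dfe";

        byte[] utf8 = encode(text, StandardCharsets.UTF_8.name());
        byte[] ascii = encode(text, StandardCharsets.US_ASCII.name());

        // UTF-8 round-trips the original string.
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(text));
        // US-ASCII cannot represent the two non-ASCII characters; each becomes '?'.
        System.out.println(new String(ascii, StandardCharsets.US_ASCII));
    }
}
```

Note that the three-argument constructor throws a checked UnsupportedEncodingException, so call sites would need to handle or declare it.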
It would be even better to allow type-specific escapes (in the case of CSV: escape double quotes by doubling them), but this could be a separate effort.
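A CSV-specific escape could look roughly like the sketch below. The class `CsvEscapeSketch` and the method `escapeCsvField` are hypothetical names, and the sketch assumes a comma delimiter and the common convention of wrapping a field in double quotes whenever it contains a quote, a delimiter, or a line break.

```java
// Hypothetical demo class, not taken from the project.
class CsvEscapeSketch {

    // Escapes a single CSV field: doubles embedded double quotes and wraps
    // the field in quotes if it contains a quote, comma, or line break.
    // Assumes a comma is the field delimiter.
    static String escapeCsvField(String field) {
        boolean needsQuoting = field.contains("\"")
                || field.contains(",")
                || field.contains("\n")
                || field.contains("\r");
        if (!needsQuoting) {
            return field; // left untouched, including any non-ASCII characters
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(escapeCsvField("plain"));
        System.out.println(escapeCsvField("say \"hi\""));
        System.out.println(escapeCsvField("a,b"));
    }
}
```

Unlike escapeJava, this leaves non-ASCII characters as they are, so it would only be useful together with a UTF-8 capable output stream.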
I would be happy to create a merge-request.