Drop control characters from output #140
Thanks for the PR! I left a couple of comments.
func (l line) SafeText() string {
	return strings.Map(func(r rune) rune {
		if unicode.IsControl(r) && !unicode.IsSpace(r) {
Some characters considered whitespace by unicode.IsSpace are illegal in the XML spec (for example \v and \f). And unicode.IsControl doesn't exclude other illegal character ranges either.
Let's use the exact character range defined by the spec in this function instead.
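To make the suggestion concrete, here is a minimal sketch of a predicate for the Char production in section 2.2 of the XML 1.0 spec. The name isInCharacterRange is borrowed from the unexported helper in Go's encoding/xml package; this is an illustration, not the code that ended up in the PR.

```go
package main

import (
	"fmt"
	"unicode"
)

// isInCharacterRange reports whether r is allowed by the Char production
// of XML 1.0 (section 2.2):
//   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
// The name is an assumption modeled on the helper in encoding/xml.
func isInCharacterRange(r rune) bool {
	return r == 0x09 ||
		r == 0x0A ||
		r == 0x0D ||
		(r >= 0x20 && r <= 0xD7FF) ||
		(r >= 0xE000 && r <= 0xFFFD) ||
		(r >= 0x10000 && r <= 0x10FFFF)
}

func main() {
	// Compare the unicode predicates with the XML range for a few runes.
	for _, r := range []rune{'\t', '\n', '\v', '\f', 0x1b, 'a'} {
		fmt.Printf("%q: IsSpace=%t IsControl=%t XMLChar=%t\n",
			r, unicode.IsSpace(r), unicode.IsControl(r), isInCharacterRange(r))
	}
}
```

Running it shows the mismatch described above: \v and \f pass unicode.IsSpace but fall outside the XML range.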
Great point. Unfortunately, xml.Escape does not handle whitespace as I'd expect and does not translate \t into spaces.
I would caution you against just dropping the invalid Unicode characters. I ran into this problem with tests written specifically for code that uses some of the Unicode characters that are invalid in the CDATA section. Removing them will alter the output from the test results. I found a way to replace the Unicode character with its quoted escape code.
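The comment does not show how the quoting was done, so the following is only a guess at the idea: each rune outside the XML 1.0 character range is replaced with its Go escape sequence via strconv.QuoteRune. The function names quoteIllegal and inXMLRange are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// inXMLRange reports whether r is a legal XML 1.0 character (hypothetical helper).
func inXMLRange(r rune) bool {
	return r == 0x09 || r == 0x0A || r == 0x0D ||
		(r >= 0x20 && r <= 0xD7FF) ||
		(r >= 0xE000 && r <= 0xFFFD) ||
		(r >= 0x10000 && r <= 0x10FFFF)
}

// quoteIllegal keeps legal runes as-is and writes illegal ones as a
// visible escape sequence, so no information is lost from the output.
func quoteIllegal(s string) string {
	var b strings.Builder
	for _, r := range s {
		if inXMLRange(r) {
			b.WriteRune(r)
			continue
		}
		q := strconv.QuoteRune(r)      // a single-quoted rune literal
		b.WriteString(q[1 : len(q)-1]) // strip the surrounding quotes
	}
	return b.String()
}

func main() {
	fmt.Println(quoteIllegal("before\x1bafter\v"))
}
```

The trade-off is that the escaped text is no longer byte-for-byte what the test printed, but unlike dropping the characters it stays recoverable.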
@TomTardigradeSEL thanks! Yeah, I agree we probably shouldn't just drop the illegal characters. From what I've seen, I don't think we can just escape these illegal characters, unless I'm missing something? I had a look at the
As for the ANSI escape sequences, removing just the illegal character(s) will still leave some things behind. Detecting and removing these sequences is something we should do, but not in this PR.
Hey, thanks for the feedback!
I'm not sure how to handle
The tests are failing because we're escaping too much. One of the reasons I switched to using CDATA in the generated XML is to avoid having to escape so many characters, which made the output hard to read.
I think your initial approach using strings.Map was good; we just need to make sure to return 0xfffd for every rune that falls outside of the character range as defined in the XML 1.0 standard, like what is done in this part of xml.EscapeText.
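A minimal sketch of that approach, assuming a free function safeText in place of the SafeText method and a hypothetical helper isXMLChar for the range check. It is not the code that was merged.

```go
package main

import (
	"fmt"
	"strings"
)

// isXMLChar reports whether r is in the XML 1.0 character range
// (hypothetical helper, same ranges as the spec's Char production).
func isXMLChar(r rune) bool {
	return r == 0x09 || r == 0x0A || r == 0x0D ||
		(r >= 0x20 && r <= 0xD7FF) ||
		(r >= 0xE000 && r <= 0xFFFD) ||
		(r >= 0x10000 && r <= 0x10FFFF)
}

// safeText maps every rune outside the XML 1.0 range to U+FFFD
// (the Unicode replacement character) and leaves all others untouched.
func safeText(s string) string {
	return strings.Map(func(r rune) rune {
		if isXMLChar(r) {
			return r
		}
		return '\uFFFD'
	}, s)
}

func main() {
	fmt.Printf("%q\n", safeText("ok\t\x1b[31mred\v\f"))
}
```

Returning U+FFFD instead of a negative value matters here: strings.Map drops a rune when the mapping returns a negative value, whereas the replacement character keeps a visible marker where the illegal rune was.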
7eac4f5 to 7309f18
@stefan-zh I updated the PR.
junit/junit_test.go
Outdated
Name:      "TestEscapeOutput",
Classname: "package/name",
Time:      "0.000",
SystemOut: &Output{Data: "�\v\f \t\\"},
\v and \f are also considered illegal characters; they should also map to \uFFFD.
Looks good!
Thanks! :)
Fixes: #138