Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drop control characters from output #140

Merged

Conversation

janisz
Copy link
Contributor

@janisz janisz commented Aug 4, 2022

Fixes: #138

Copy link
Owner

@jstemmer jstemmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! I left a couple of comments.

parser/gotest/internal/collector/collector.go Outdated Show resolved Hide resolved

func (l line) SafeText() string {
return strings.Map(func(r rune) rune {
if unicode.IsControl(r) && !unicode.IsSpace(r) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some characters considered whitespace by unicode.IsSpace are illegal in the XML spec (for example \v and \f). And unicode.IsControl doesn't exclude other illegal character ranges either.

Let's use the exact character range defined by the spec in this function instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point. Unfortunately xml.Escpe does not handle whitespaces as I'd expect and not translate \t into spaces

testdata/src/whitespace/whitespace_test.go Outdated Show resolved Hide resolved
@TomTardigradeSEL
Copy link

I would caution you on just doping the invalid Unicode characters. I ran into this problem with test that are specifically for code that uses some of the Unicode characters that are invalid in the cdata section. Removing them will alter the output from the test results.

I found a way to replace the Unicode character with its quoted escape code.
For the case of ANSI color codes, I also found a way to strip the entire color code from the output.
By first striping the ANSI code, then reformatting the output for the Unicode characters, I have been able to get junit reports
that are valid for XML while retaining any Unicode that is not allowed as their folded equivalent.

@jstemmer
Copy link
Owner

@TomTardigradeSEL thanks! Yeah, I agree we probably shouldn't just drop the illegal characters. From what I've seen I don't think we can't just escape these illegal characters, unless I'm missing something?

I had a look at the encoding/xml package in the standard library, they replace illegal characters with the unicode replacement character 0xfffd, which renders as �. Let's do the same here when we encounter illegal characters. At this point it's clear from the output that there were some characters that couldn't be displayed, and we can leave it up to the test author to do something about it (for example, change how they print their data).

As for the ANSI escape sequences, removing just the illegal character(s) will still leave some things behind. Detecting and removing these sequences is something we should do, but not in this PR.

@janisz janisz requested a review from jstemmer August 16, 2022 09:51
@janisz
Copy link
Contributor Author

janisz commented Aug 16, 2022

Hey, thanks for feedback!
I moved code to formatOutput and used xml.EscapeText

@janisz
Copy link
Contributor Author

janisz commented Aug 23, 2022

I'm not sure how to handle

=== RUN   TestRun/035-whitespace.txt
    go-junit-report_test.go:84: Unexpected report diff (-want, +got):
          strings.Join({
          	... // 983 identical bytes
          	"assname=\"package/whitespace\" time=\"0.000\">\n\t\t\t<system-out><![CDA",
          	"TA[    whitespace_test.go:31: no-tab\n    whitespace_test.go:32: ",
        - 	`	`,
        + 	"&#x9;",
          	"one-tab\n    whitespace_test.go:33: ",
        - 	"\t\ttwo-tab\nno-tab\n\tone-tab\n\t\t",
        + 	"&#x9;&#x9;two-tab\nno-tab\n&#x9;one-tab\n&#x9;&#x9;",
          	"two-tab]]></system-out>\n\t\t</testcase>\n\t\t<testcase name=\"TestWith",
          	"NewlinesFlat\" classname=\"package/whitespace\" time=\"0.000\">\n\t\t\t<s",
          	... // 1118 identical bytes
          	"assname=\"package/whitespace\" time=\"0.000\">\n\t\t\t<system-out><![CDA",
          	"TA[    whitespace_test.go:31: no-tab\n    whitespace_test.go:32: ",
        - 	`	`,
        + 	"&#x9;",
          	"one-tab\n    whitespace_test.go:33: ",
        - 	"\t\ttwo-tab\nno-tab\n\tone-tab\n\t\t",
        + 	"&#x9;&#x9;two-tab\nno-tab\n&#x9;one-tab\n&#x9;&#x9;",
          	"two-tab]]></system-out>\n\t\t</testcase>\n\t\t<testcase name=\"TestSubT",
          	`ests/TestWithNewlinesFlat" classname="package/whitespace" time="`,
          	... // 3[43](https://github.com/jstemmer/go-junit-report/runs/7925604213?check_suite_focus=true#step:5:44) identical bytes
          }, "")
=== RUN   TestRun/036-benchfail.txt
=== RUN   TestRun/037-legacy-fail.txt
    go-junit-report_test.go:84: Unexpected report diff (-want, +got):
          strings.Join({
          	... // 393 identical bytes
          	"d\"></failure>\n\t\t</testcase>\n\t\t<testcase name=\"TestTwo\" classname",
          	"=\"package/name\" time=\"0.130\"></testcase>\n\t\t<system-out><![CDATA[",
        - 	`	`,
        + 	"&#x9;",
          	"file_test.go:11: Error message\n",
        - 	`	`,
        + 	"&#x9;",
          	"file_test.go:11: Longer\n",
        - 	"\t\terror\n\t\t",
        + 	"&#x9;&#x9;error\n&#x9;&#x9;",
          	"message.\nexit status 1]]></system-out>\n\t</testsuite>\n</testsuite",
          	"s>\n",
          }, "")
=== RUN   TestRun/100-pass.gojson.txt
=== RUN   TestRun/101-fail.gojson.txt
=== RUN   TestRun/102-broken.gojson.txt
=== RUN   TestRun/103-subtests.gojson.txt
    go-junit-report_test.go:84: Unexpected report diff (-want, +got):
          strings.Join({
          	... // [56](https://github.com/jstemmer/go-junit-report/runs/7925604213?check_suite_focus=true#step:5:57)4 identical bytes
          	`estMultiple/Single" classname="package/name/subtest" time="0.000`,
          	"\">\n\t\t\t<failure message=\"Failed\"><![CDATA[    pkg_test.go:20: Do(",
        - 	`"a"`,
        + 	"&#34;a&#34;",
          	"): got aaaaaaaaaa, want a]]></failure>\n\t\t</testcase>\n\t\t<testcase",
          	` name="TestMultiple/Multi" classname="package/name/subtest" time`,
          	... // 102 identical bytes
          }, "")

Copy link
Owner

@jstemmer jstemmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests are failing because we're escaping too much. One of the reasons I switched to using CDATA in the generated XML is to avoid having to escape so many characters, which made the output hard to read.

I think your initial approach using strings.Map was good, we just need to make sure to return 0xfffd for every rune that falls outside of the character range as defined in the XML 1.0 standard, like what is done in this part of xml.EscapeText.

junit/junit.go Show resolved Hide resolved
@stefan-zh
Copy link

Hey @janisz are you planning on completing this pull request? My team is also facing this issue #138 and your fix will be very helpful

@janisz janisz force-pushed the remove_control_characters_from_output branch from 7eac4f5 to 7309f18 Compare September 15, 2022 09:09
@janisz
Copy link
Contributor Author

janisz commented Sep 15, 2022

@stefan-zh I updated PR
@jstemmer PTAL

Name: "TestEscapeOutput",
Classname: "package/name",
Time: "0.000",
SystemOut: &Output{Data: "�\v\f \t\\"},
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

\v and \f are also considered illegal characters, they should also map to \uFFFD.

junit/junit.go Outdated Show resolved Hide resolved
Copy link
Owner

@jstemmer jstemmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@jstemmer jstemmer merged commit 84a5190 into jstemmer:master Sep 17, 2022
@jstemmer
Copy link
Owner

Thanks! :)

@stefan-zh
Copy link

Thank you @janisz @jstemmer. I confirm that the GitLab CI pipeline for my team was fixed by this change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Need proper handling for invalid character for CDATA field
4 participants