Drop control characters from output #140

janisz · 2022-08-04T09:32:04Z

Fixes: #138

jstemmer

Thanks for the PR! I left a couple of comments.

parser/gotest/internal/collector/collector.go

jstemmer · 2022-08-07T13:28:58Z

parser/gotest/internal/collector/collector.go

+
+func (l line) SafeText() string {
+	return strings.Map(func(r rune) rune {
+		if unicode.IsControl(r) && !unicode.IsSpace(r) {


Some characters considered whitespace by unicode.IsSpace are illegal in the XML spec (for example \v and \f). And unicode.IsControl doesn't exclude other illegal character ranges either.

Let's use the exact character range defined by the spec in this function instead.
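The character range referred to here is the `Char` production in section 2.2 of the XML 1.0 specification. Below is a minimal Go sketch of a check against it. The name `isXMLChar` is hypothetical and does not come from the PR.

```go
package main

import "fmt"

// isXMLChar reports whether r matches the XML 1.0 Char production:
//
//	Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
//
// \v (0x0B) and \f (0x0C) fall outside every range, even though
// unicode.IsSpace reports them as whitespace.
func isXMLChar(r rune) bool {
	return r == 0x09 ||
		r == 0x0A ||
		r == 0x0D ||
		(r >= 0x20 && r <= 0xD7FF) ||
		(r >= 0xE000 && r <= 0xFFFD) ||
		(r >= 0x10000 && r <= 0x10FFFF)
}

func main() {
	for _, r := range []rune{'\t', '\v', '\f', 'a', 0x1B, 0xFFFE} {
		fmt.Printf("%U legal=%v\n", r, isXMLChar(r))
	}
}
```

The check accepts `\t` but rejects `\v`, `\f`, ESC and U+FFFE. Combining `unicode.IsControl` and `unicode.IsSpace` cannot express that distinction.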

Great point. Unfortunately, xml.Escape doesn't handle whitespace as I'd expect: it doesn't translate \t into spaces.
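The following Go sketch reproduces the standard-library behavior. It is not code from the PR, and the helper name `escape` is hypothetical. `xml.EscapeText` rewrites tabs and double quotes as character references, which matches the `&#x9;` and `&#34;` differences in the failing tests later in this thread.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
)

// escape runs s through xml.EscapeText and returns the result.
// Besides <, >, &, ' and ", EscapeText also escapes \t, \n and \r
// as numeric character references.
func escape(s string) string {
	var buf bytes.Buffer
	if err := xml.EscapeText(&buf, []byte(s)); err != nil {
		panic(err) // writes to a bytes.Buffer do not fail
	}
	return buf.String()
}

func main() {
	fmt.Println(escape("\tone-tab")) // &#x9;one-tab
	fmt.Println(escape(`Do("a")`))   // Do(&#34;a&#34;)
}
```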

testdata/src/whitespace/whitespace_test.go

TomTardigradeSEL · 2022-08-14T05:59:10Z

I would caution against just dropping the invalid Unicode characters. I ran into this problem with tests written specifically for code that uses some of the Unicode characters that are invalid in a CDATA section. Removing them alters the test output.

I found a way to replace each invalid Unicode character with its quoted escape code.
For ANSI color codes, I also found a way to strip the entire color code from the output.
By first stripping the ANSI codes and then reformatting the output for the remaining invalid characters, I have been able to get JUnit reports that are valid XML while keeping any disallowed Unicode characters in their quoted form.
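Here is a minimal Go sketch of the two-step approach described in this comment. It rests on a few assumptions. The regular expression covers only ANSI SGR color/style sequences such as `\x1b[31m`, not every ANSI escape. Go's `strconv.QuoteRune` stands in for the "quoted escape code". The names `ansiColor`, `isXMLChar` and `quoteIllegal` are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// ansiColor matches ANSI SGR sequences such as "\x1b[31m" and "\x1b[0m".
// Assumption: other kinds of ANSI escape sequences are not handled.
var ansiColor = regexp.MustCompile(`\x1b\[[0-9;]*m`)

// isXMLChar reports whether r is allowed by the XML 1.0 Char production.
func isXMLChar(r rune) bool {
	return r == 0x09 || r == 0x0A || r == 0x0D ||
		(r >= 0x20 && r <= 0xD7FF) ||
		(r >= 0xE000 && r <= 0xFFFD) ||
		(r >= 0x10000 && r <= 0x10FFFF)
}

// quoteIllegal strips color codes, then rewrites every remaining character
// that XML forbids as its Go escape sequence (for example \x00 or \v).
func quoteIllegal(s string) string {
	s = ansiColor.ReplaceAllString(s, "")
	var b strings.Builder
	for _, r := range s {
		if isXMLChar(r) {
			b.WriteRune(r)
			continue
		}
		q := strconv.QuoteRune(r)      // e.g. `'\v'`
		b.WriteString(q[1 : len(q)-1]) // drop the surrounding single quotes
	}
	return b.String()
}

func main() {
	fmt.Println(quoteIllegal("\x1b[31mFAIL\x1b[0m: got \x00"))
}
```

This keeps the report valid XML, but the text is no longer byte-for-byte what the test printed. The reply below weighs exactly that trade-off.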

jstemmer · 2022-08-14T21:25:41Z

@TomTardigradeSEL thanks! Yeah, I agree we probably shouldn't just drop the illegal characters. From what I've seen, though, I don't think we can escape these illegal characters either, unless I'm missing something?

I had a look at the encoding/xml package in the standard library. It replaces illegal characters with the Unicode replacement character 0xfffd, which renders as �. Let's do the same here when we encounter illegal characters. That way it's clear from the output that some characters couldn't be displayed, and we can leave it up to the test author to do something about it (for example, change how they print their data).
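The following short Go program checks the standard-library behavior described above. `xml.EscapeText` writes U+FFFD in place of any character outside the XML 1.0 range. The wrapper `escapeText` is a hypothetical helper for the demo.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
)

// escapeText returns s as escaped by xml.EscapeText.
func escapeText(s string) string {
	var buf bytes.Buffer
	if err := xml.EscapeText(&buf, []byte(s)); err != nil {
		panic(err) // writes to a bytes.Buffer do not fail
	}
	return buf.String()
}

func main() {
	// ESC (0x1B) and \v are not allowed anywhere in an XML 1.0 document,
	// so EscapeText substitutes U+FFFD for both.
	fmt.Println(escapeText("a\x1bb\vc")) // a�b�c
}
```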

As for the ANSI escape sequences, removing just the illegal character(s) will still leave some things behind. Detecting and removing these sequences is something we should do, but not in this PR.
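The next Go sketch shows what is "left behind" when only the illegal characters are replaced. The helper names are hypothetical. In a color sequence only the ESC byte is illegal, so after replacement the `[31m` and `[0m` parts remain in the report as visible noise.

```go
package main

import (
	"fmt"
	"strings"
)

// isXMLChar reports whether r is allowed by the XML 1.0 Char production.
func isXMLChar(r rune) bool {
	return r == 0x09 || r == 0x0A || r == 0x0D ||
		(r >= 0x20 && r <= 0xD7FF) ||
		(r >= 0xE000 && r <= 0xFFFD) ||
		(r >= 0x10000 && r <= 0x10FFFF)
}

// replaceIllegal maps every character outside the XML 1.0 range to U+FFFD.
func replaceIllegal(s string) string {
	return strings.Map(func(r rune) rune {
		if isXMLChar(r) {
			return r
		}
		return '\uFFFD'
	}, s)
}

func main() {
	// A red "FAIL" as a colorizing test helper might print it.
	fmt.Println(replaceIllegal("\x1b[31mFAIL\x1b[0m")) // �[31mFAIL�[0m
}
```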

janisz · 2022-08-16T09:54:01Z

Hey, thanks for the feedback!
I moved the code to formatOutput and used xml.EscapeText.

janisz · 2022-08-23T08:20:23Z

I'm not sure how to handle these test failures:

=== RUN   TestRun/035-whitespace.txt
    go-junit-report_test.go:84: Unexpected report diff (-want, +got):
          strings.Join({
          	... // 983 identical bytes
          	"assname=\"package/whitespace\" time=\"0.000\">\n\t\t\t<system-out><![CDA",
          	"TA[    whitespace_test.go:31: no-tab\n    whitespace_test.go:32: ",
        - 	`	`,
        + 	"&#x9;",
          	"one-tab\n    whitespace_test.go:33: ",
        - 	"\t\ttwo-tab\nno-tab\n\tone-tab\n\t\t",
        + 	"&#x9;&#x9;two-tab\nno-tab\n&#x9;one-tab\n&#x9;&#x9;",
          	"two-tab]]></system-out>\n\t\t</testcase>\n\t\t<testcase name=\"TestWith",
          	"NewlinesFlat\" classname=\"package/whitespace\" time=\"0.000\">\n\t\t\t<s",
          	... // 1118 identical bytes
          	"assname=\"package/whitespace\" time=\"0.000\">\n\t\t\t<system-out><![CDA",
          	"TA[    whitespace_test.go:31: no-tab\n    whitespace_test.go:32: ",
        - 	`	`,
        + 	"&#x9;",
          	"one-tab\n    whitespace_test.go:33: ",
        - 	"\t\ttwo-tab\nno-tab\n\tone-tab\n\t\t",
        + 	"&#x9;&#x9;two-tab\nno-tab\n&#x9;one-tab\n&#x9;&#x9;",
          	"two-tab]]></system-out>\n\t\t</testcase>\n\t\t<testcase name=\"TestSubT",
          	`ests/TestWithNewlinesFlat" classname="package/whitespace" time="`,
          	... // 343 identical bytes
          }, "")
=== RUN   TestRun/036-benchfail.txt
=== RUN   TestRun/037-legacy-fail.txt
    go-junit-report_test.go:84: Unexpected report diff (-want, +got):
          strings.Join({
          	... // 393 identical bytes
          	"d\"></failure>\n\t\t</testcase>\n\t\t<testcase name=\"TestTwo\" classname",
          	"=\"package/name\" time=\"0.130\"></testcase>\n\t\t<system-out><![CDATA[",
        - 	`	`,
        + 	"&#x9;",
          	"file_test.go:11: Error message\n",
        - 	`	`,
        + 	"&#x9;",
          	"file_test.go:11: Longer\n",
        - 	"\t\terror\n\t\t",
        + 	"&#x9;&#x9;error\n&#x9;&#x9;",
          	"message.\nexit status 1]]></system-out>\n\t</testsuite>\n</testsuite",
          	"s>\n",
          }, "")
=== RUN   TestRun/100-pass.gojson.txt
=== RUN   TestRun/101-fail.gojson.txt
=== RUN   TestRun/102-broken.gojson.txt
=== RUN   TestRun/103-subtests.gojson.txt
    go-junit-report_test.go:84: Unexpected report diff (-want, +got):
          strings.Join({
          	... // 564 identical bytes
          	`estMultiple/Single" classname="package/name/subtest" time="0.000`,
          	"\">\n\t\t\t<failure message=\"Failed\"><![CDATA[    pkg_test.go:20: Do(",
        - 	`"a"`,
        + 	"&#34;a&#34;",
          	"): got aaaaaaaaaa, want a]]></failure>\n\t\t</testcase>\n\t\t<testcase",
          	` name="TestMultiple/Multi" classname="package/name/subtest" time`,
          	... // 102 identical bytes
          }, "")

jstemmer

The tests are failing because we're escaping too much. One of the reasons I switched to using CDATA in the generated XML is to avoid having to escape so many characters, which made the output hard to read.
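The CDATA trade-off can be seen with encoding/xml's `,cdata` struct tag. This sketch assumes an `Output` type with a single CDATA field. It is modeled on the `Output{Data: ...}` value in this PR's test but not copied from the project. Tabs and quotes are written verbatim. An illegal ESC character is also copied through, because encoding/xml's CDATA output only splits `]]>` and otherwise writes the text unchanged. That is the problem reported in #138.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Output is an assumed stand-in for a <system-out>-style element whose
// text is emitted as a CDATA section.
type Output struct {
	Data string `xml:",cdata"`
}

// marshal encodes an Output holding data and returns the XML as a string.
func marshal(data string) string {
	out, err := xml.Marshal(Output{Data: data})
	if err != nil {
		panic(err)
	}
	return string(out)
}

func main() {
	// Tabs and quotes stay readable inside CDATA.
	fmt.Println(marshal("\tone-tab Do(\"a\")"))
	// ESC (0x1B) is copied through unchanged, which yields invalid XML.
	fmt.Printf("%q\n", marshal("\x1b[31mFAIL"))
}
```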

I think your initial approach using strings.Map was good; we just need to make sure it returns 0xfffd for every rune that falls outside the character range defined in the XML 1.0 standard, as is done in this part of xml.EscapeText.
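Here is a minimal Go sketch of the suggested approach. It uses strings.Map and returns U+FFFD for any rune outside the XML 1.0 range, leaving everything else untouched so the text can go straight into CDATA. The names `isInCharacterRange` and `safeText` are illustrative. The former mirrors the logic of the unexported encoding/xml helper of the same name. This is not the merged code.

```go
package main

import (
	"fmt"
	"strings"
)

// isInCharacterRange reports whether r is allowed by the XML 1.0 Char
// production (same ranges as the unexported helper in encoding/xml).
func isInCharacterRange(r rune) bool {
	return r == 0x09 || r == 0x0A || r == 0x0D ||
		(r >= 0x20 && r <= 0xD7FF) ||
		(r >= 0xE000 && r <= 0xFFFD) ||
		(r >= 0x10000 && r <= 0x10FFFF)
}

// safeText replaces illegal runes with U+FFFD. Unlike xml.EscapeText, it
// leaves tabs and quotes as-is, so CDATA output stays readable.
func safeText(s string) string {
	return strings.Map(func(r rune) rune {
		if isInCharacterRange(r) {
			return r
		}
		return '\uFFFD'
	}, s)
}

func main() {
	// \v and \f are illegal; space, tab and backslash are kept.
	fmt.Println(safeText("\v\f \t\\") == "\uFFFD\uFFFD \t\\") // true
	fmt.Println(safeText(`Do("a")`) == `Do("a")`)              // true
}
```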

junit/junit.go

stefan-zh · 2022-09-09T12:28:55Z

Hey @janisz, are you planning on completing this pull request? My team is also facing issue #138, and your fix would be very helpful.


janisz · 2022-09-15T09:14:28Z

@stefan-zh I updated the PR.
@jstemmer PTAL

jstemmer · 2022-09-15T21:36:35Z

junit/junit_test.go

+						Name:      "TestEscapeOutput",
+						Classname: "package/name",
+						Time:      "0.000",
+						SystemOut: &Output{Data: "�\v\f \t\\"},


\v and \f are also illegal characters, so they should map to \uFFFD as well.

junit/junit.go

jstemmer

Looks good!

jstemmer · 2022-09-17T21:27:19Z

Thanks! :)

stefan-zh · 2022-09-19T09:34:39Z

Thank you @janisz @jstemmer. I can confirm that this change fixed my team's GitLab CI pipeline.

janisz mentioned this pull request Aug 5, 2022

Need proper handling for invalid character for CDATA field #138

Open

jstemmer requested changes Aug 7, 2022


janisz requested a review from jstemmer August 16, 2022 09:51

jstemmer requested changes Aug 23, 2022


junit/junit.go

janisz added 4 commits September 14, 2022 18:10

Drop control characters from output

bb3ad9f

Fixes: jstemmer#138

Use xml.EscapeText

2409244

Revert "Drop control characters from output"

517787c

This reverts commit bc70670.

Allow whitespaces

7309f18

janisz force-pushed the remove_control_characters_from_output branch from 7eac4f5 to 7309f18 September 15, 2022 09:09

janisz requested a review from jstemmer September 15, 2022 09:09

jstemmer reviewed Sep 15, 2022


junit/junit.go (outdated)

Do not include \v\f\r

c585335

janisz requested a review from jstemmer September 16, 2022 09:26

jstemmer approved these changes Sep 17, 2022


jstemmer merged commit 84a5190 into jstemmer:master Sep 17, 2022

jan-kantert mentioned this pull request Jun 26, 2023

New release #164

Open

osmaczko mentioned this pull request Jan 19, 2024

Nightly tests output isn't parsed properly status-im/status-go#4587

Closed
