
ZestGuidance does not correctly update total coverage when the test fails under every input #234

Open
shuaiwang516 opened this issue Jun 23, 2023 · 3 comments


@shuaiwang516
Contributor

shuaiwang516 commented Jun 23, 2023

What happened:

When fuzzing a test that fails under every input, ZestGuidance does not correctly compute the total coverage: it always shows as 0. A related consequence is that valid coverage and total coverage are always identical, because both metrics are only updated for valid inputs.

How to reproduce:

Try to fuzz the following example test with Zest:

    @Fuzz
    public void exampleTest(String str) {
        int a = 0;
        a = a + 1;
        assertTrue(str.length() == -1);
    }

This test always fails because str.length() will never be -1, but the reported coverage should not be 0.
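
For reference, a complete, runnable version of this test might look like the following (the class name ExampleTest and the Maven invocation are illustrative; it assumes the standard JQF/JUnit imports):

    import edu.berkeley.cs.jqf.fuzz.Fuzz;
    import edu.berkeley.cs.jqf.fuzz.JQF;
    import org.junit.runner.RunWith;
    import static org.junit.Assert.assertTrue;

    @RunWith(JQF.class)
    public class ExampleTest {
        @Fuzz
        public void exampleTest(String str) {
            int a = 0;
            a = a + 1;
            assertTrue(str.length() == -1); // never true, so every input fails
        }
    }

Fuzzing it (e.g. with mvn jqf:fuzz -Dclass=ExampleTest -Dmethod=exampleTest) produces a status screen like the one below.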

Example output:

Elapsed time:         3s (no time limit)
Number of executions: 44 (no trial limit)
Valid inputs:         0 (0.00%)
Cycles completed:     0
Unique failures:      2
Queue size:           0 (0 favored last cycle)
Current parent input: <seed>
Execution speed:      14/sec now | 11/sec overall
Total coverage:       0 branches (0.00% of map)
Valid coverage:       0 branches (0.00% of map)

How to improve:

In ZestGuidance.java, the else-if branch of the handleResult() method that handles failing results should also update totalCoverage, for example:

  } else if (result == Result.FAILURE || result == Result.TIMEOUT) {
      String msg = error.getMessage();

      // We should also increase the totalCoverage but not validCoverage
      totalCoverage.updateBits(runCoverage);
      int nonZeroAfter = totalCoverage.getNonZeroCount();
      if (nonZeroAfter > maxCoverage) {
          maxCoverage = nonZeroAfter;
      }
      ...
  }

Also, because completeCycle() contains a coverage responsibility check, the summed responsibilities of the saved inputs should be compared against the valid coverage count there instead of the total coverage (because Zest does not save invalid inputs):

        if (sumResponsibilities != validCoverageCount) {
            if (multiThreaded) {
                infoLog("Warning: other threads are adding coverage between test executions");
            } else {
                throw new AssertionError("Responsibility mismatch");
            }
        }
@rohanpadhye
Owner

Thanks for the report! You are right that the total coverage numbers don't take into account failing tests. This is by design. If we update totalCoverage with new coverage X discovered during failing tests, then we won't have inputs that cover X that can also be used for mutation in the fuzzing queue. So, we only save non-failing inputs for new coverage.

I understand that this makes reporting a bit weird. We usually don't expect targets to have only failing inputs. One thing we could possibly add is another map that tracks the coverage of failing inputs only, and report it on the status screen as another line.
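
As a rough sketch of that idea (an illustrative model using java.util.BitSet, not JQF's actual coverage classes; the names failureCoverage and FailureCoverageSketch are hypothetical):

    import java.util.BitSet;

    // Illustrative model: keep coverage from failing runs in its own map,
    // so it can be reported without affecting which inputs are saved for mutation.
    public class FailureCoverageSketch {
        private final BitSet totalCoverage   = new BitSet(); // non-failing runs only
        private final BitSet failureCoverage = new BitSet(); // failing/timeout runs only

        public void handleResult(boolean failed, BitSet runCoverage) {
            if (failed) {
                failureCoverage.or(runCoverage); // reporting only; nothing is queued
            } else {
                totalCoverage.or(runCoverage);
            }
        }

        public String statusLines() {
            return "Total coverage:   " + totalCoverage.cardinality() + " branches\n"
                 + "Failure coverage: " + failureCoverage.cardinality() + " branches";
        }
    }

On the example above, the failure-coverage line would then be non-zero even though total coverage stays at 0.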

@hwang-pku

Hi @rohanpadhye. I think there could be an elegant solution to this that also makes the reporting less confusing. The current coverage report can be very misleading when, for example, every path through the test ultimately results in a failure. In my understanding, the total coverage in this case should be at least non-zero once all the failures have been triggered. However, the current implementation of JQF reports a very small coverage, which is counterintuitive.

As for input selection, I assume we could treat coverage triggered by failing inputs differently. For example, we could mark that coverage as "failure-induced" and let valid inputs "steal" it later if they cover the same branches. Fixing this would give a more accurate coverage number and also prevent improper comparisons when, e.g., comparing between different tools and guidances.
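
Purely as an illustration of the "stealing" idea (again a BitSet model with hypothetical names, not JQF code):

    import java.util.BitSet;

    // Model of "failure-induced" coverage that valid inputs can later claim.
    public class FailureInducedCoverageSketch {
        private final BitSet validCoverage  = new BitSet(); // backed by saved inputs
        private final BitSet failureInduced = new BitSet(); // only seen in failing runs

        public void onRun(boolean failed, BitSet runCoverage) {
            if (failed) {
                // Branches not yet covered by any valid input are marked failure-induced
                BitSet newBits = (BitSet) runCoverage.clone();
                newBits.andNot(validCoverage);
                failureInduced.or(newBits);
            } else {
                // A valid input "steals" any failure-induced branch it also covers
                validCoverage.or(runCoverage);
                failureInduced.andNot(runCoverage);
            }
        }

        public int reportedTotal() {
            // Reported total = coverage backed by saved inputs + failure-only coverage
            return validCoverage.cardinality() + failureInduced.cardinality();
        }
    }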

@rohanpadhye
Owner

Thanks for the input @hwang-pku. Marking specific parts of the coverage as "failure-induced" sounds a lot like keeping track of "coverage achieved by failing runs" and "coverage achieved by normal runs" separately, which I think is acceptable for reporting purposes. Feel free to open a PR for this; otherwise I will try to work on it in my free time.
