Mutation score is changing and not stable enough #62

Open
vmassol opened this issue May 22, 2018 · 47 comments

Comments

@vmassol
Contributor

vmassol commented May 22, 2018

Hi, I've updated our pitest/descartes setup to use Descartes 1.2, PIT 1.4.0 and pitest-junit5-plugin 0.5, and today I found that many mutation scores changed without any change to the sources. I don't know why, but it looks bad.

I suspect it could be related to the introduction of pitest-junit5-plugin in xwiki/xwiki-commons@0a7d7ee

Some examples:

Any idea?

vmassol added a commit to xwiki/xwiki-commons that referenced this issue May 22, 2018
…and the drop and no code/test were changes in that module). I've now raised STAMP-project/pitest-descartes#62
@vmassol vmassol changed the title Mutation score is changing Mutation score is changingn and not stable enough Oct 19, 2018
@vmassol
Contributor Author

vmassol commented Oct 19, 2018

One guess is that timeouts are in play here and when there's a timeout, it's counted as a bad score.

We could increase the timeout, I guess, but that would also increase the build time a lot (we build all of XWiki with Descartes).

@vmassol vmassol changed the title Mutation score is changingn and not stable enough Mutation score is changing and not stable enough Oct 19, 2018
@oscarlvp
Member

When PIT performs the mutation analysis, it derives each mutant's timeout from the original execution time of the tests, multiplied by a factor that defaults to 1.25 (PIT also adds a fixed timeoutConstant on top). This timeout can be configured for the mutation analysis phase only, instead of changing the original timeout: check the timeoutFactor property. See here.
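To make the trade-off concrete, here is a minimal Java sketch of the time budget a test gets when run against a mutant. It assumes the formula behind PIT's timeoutFactor and timeoutConstant options (allowed time = normal test time × timeoutFactor + timeoutConstant, with defaults 1.25 and 4000 ms); the class name PitTimeout is hypothetical, and PIT's internal rounding may differ slightly.

```java
/** Hypothetical helper illustrating how a per-mutant time budget is derived (not PIT's own code). */
final class PitTimeout {
    // Assumed defaults of PIT's timeoutFactor and timeoutConstant options
    static final double DEFAULT_FACTOR = 1.25;
    static final long DEFAULT_CONSTANT_MS = 4000L;

    private PitTimeout() {
    }

    /**
     * @param normalMillis time the covering test took on the unmutated code
     * @return time the same test may take against a mutant before being reported as TIMED_OUT
     */
    static long allowedMillis(long normalMillis, double timeoutFactor, long timeoutConstantMs) {
        // Scale the normal duration, then add the fixed allowance
        return (long) (normalMillis * timeoutFactor) + timeoutConstantMs;
    }
}
```

With these defaults, a test that takes 1 s on the original code may take up to 5.25 s against a mutant; with timeoutFactor set to 10 (as tried later in this thread) the budget grows to 14 s. Only mutants that actually hang consume the whole budget, which is why a larger factor mostly lengthens builds that have many such mutants.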

@surli

surli commented Nov 9, 2018

Hi @oscarlvp

we recently configured some thresholds to detect mutation score regressions, but we got really stuck with those timeouts even when setting the timeoutFactor.

For example, our build reported a mutation score of 80 instead of 88, which I was not able to reproduce on my own computer. When setting the timeoutFactor to 10 on an offline CI agent, I successively got the following mutation scores: 90, 88, 83, 85 and 88 again.

It's really hard to stabilize a threshold in this case. Any advice on how we could improve that? Is there a limit value for the timeoutFactor? Should we use a timeoutConstant instead?

@vmassol
Contributor Author

vmassol commented Jan 31, 2019

FTR here's what we did with the timeoutFactor: xwiki/xwiki-commons@10dd48f

@vmassol
Contributor Author

vmassol commented Jan 31, 2019

Recent case of timeout: https://github.com/xwiki/xwiki-platform/pull/946/files

@oscarlvp
Member

oscarlvp commented Feb 19, 2019

Report for xwiki/xwiki-commons@88f8b64 included here
Tests are probably unstable when using the mutated values.
xwiki/xwiki-commons@574c333 corresponds to the same issue.

@oscarlvp
Member

oscarlvp commented Feb 19, 2019

Report for xwiki/xwiki-commons@85c9c98 included here
The mutation score was stable after 100 executions. No mutant is killed by a timeout.
@vmassol @surli How was the first score determined for this module?

@surli

surli commented Feb 19, 2019

I guess you meant xwiki/xwiki-commons@85c9c98 (I got a 404 with your link).

AFAIK we computed the score by running the following command line and going to the pitest report:

mvn clean install -Pquality -Dxwiki.pitest.skip=false -Djacoco.skip=true -Dxwiki.checkstyle.skip=true

@oscarlvp
Member

@surli This is the same way I'm computing the score to reproduce the issues. However, there is no change for this particular commit and module over 100 executions. So I'm wondering whether, by any chance, there was a mistake with the first score. Is the issue still happening on your side?

@surli

surli commented Feb 19, 2019

I don't know about this module; maybe @vmassol has more information about that one. What I do know is that for https://github.com/xwiki/xwiki-platform/pull/946/files, even executing it dozens of times on the same machine, I didn't get a change. I only spotted the difference when executing it on another machine with different specs (less memory / fewer processors).

@oscarlvp
Member

As for xwiki/xwiki-commons@2032022, the analysis produces a 0 mutation score because no mutant is reported as covered. There is a configuration problem: xwiki-commons is using the JUnit 5 PIT plugin, but JUnit 5 is not in the classpath of this particular module, so the plugin fails to execute and therefore finds no test class. Fixing the configuration should fix the issue. Detailed report here

@oscarlvp
Member

oscarlvp commented Feb 19, 2019

@vmassol @surli There is indeed a difference in how the initial scores were computed. PIT was configured in xwiki/xwiki-commons@cb0ff74 to use JUnit 4. In xwiki/xwiki-commons@0a7d7ee the configuration switched to JUnit 5. This was done on 22-05-2018, the same day the significant score differences listed in the initial description of this issue were reported.
Here is a summary of the issues and the outcome when trying to reproduce it.

Commit | Score change | Outcome
--- | --- | ---
xwiki/xwiki-commons@88f8b64 | from 68 to 40 | Score unstable but with no significant change. Might be due to some tuned test cases.
xwiki/xwiki-commons@85c9c98 | from 67 to 36 | Score stable at 36% after 100 executions.
xwiki/xwiki-commons@574c333 | from 40 to 39 | Same as xwiki/xwiki-commons@88f8b64
xwiki/xwiki-commons@f5394b4 | from 21 to 11 | Score stable at 11% after 100 executions.
xwiki/xwiki-commons@2032022 | from 20 to 0 | Score stable at 0% (no coverage) after 30 executions.

As hinted by the configuration issue for xwiki/xwiki-commons@2032022, the drastic score changes could be explained by the switch of PIT's test plugin from JUnit 4 to JUnit 5. The JUnit 5 plugin is not as stable as PIT's JUnit 4 support and may miss some tests.
xwiki/xwiki-commons@2032022 should be solved by adding JUnit 5 to the module configuration.
The genuinely unstable scores of xwiki/xwiki-commons@88f8b64 and xwiki/xwiki-commons@574c333 should be solved by removing the test cases identified as problematic (entropy generation).

@surli

surli commented Feb 25, 2019

Thanks for this report @oscarlvp! To continue on this topic, this weekend we got a report from our pit-descartes CI job with this:

xwiki-platform-webjars-api: Mutation score of 22 is below threshold of 24

The job is now back to normal. What's interesting is that nothing was committed this weekend on master in any of our repositories, and we didn't change anything in our configuration. So this shows that we are still experiencing this timeout issue.
To reproduce, the three repositories used for the build were at the following commits: xwiki/xwiki-commons@12c3203, xwiki/xwiki-rendering@2a8a41b and xwiki/xwiki-platform@d2b8e20

You can find the job report here on our CI

@oscarlvp
Copy link
Member

You're welcome @surli.
I haven't checked the xwiki-platform issues you have reported yet; I started with the issues from xwiki-commons. Given the small percentage change, it is likely to be a timeout issue (in the same sense as xwiki/xwiki-commons@88f8b64). I will run the same experiment with the commit you are reporting.

@oscarlvp
Copy link
Member

@surli
I can't build xwiki-platform on my side, so I inspected the console output from Jenkins and there is indeed a variation.
The score for xwiki-platform-webjars-api is 11/45 -> 24% for all builds except for the following 4:

  • build 132: 10/45 -> 22%
  • build 146: 12/45 -> 27%
  • build 147: 10/45 -> 22%
  • build 148: 12/45 -> 27%
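These jumps are exactly what a single mutant changing outcome produces with 45 mutants: each one is worth 100/45 ≈ 2.2 points, so 11/45 rounds to 24, 10/45 to 22 and 12/45 to 27. The sketch below reproduces that arithmetic, assuming PIT's convention that KILLED and TIMED_OUT mutants both count as detected and that the score is rounded to the nearest integer (which matches the values above); the MutationScore class and its reduced Status enum are hypothetical.

```java
import java.util.List;

/** Hypothetical helper reproducing the arithmetic behind a PIT-style mutation score. */
final class MutationScore {
    private MutationScore() {
    }

    /** Reduced set of mutant statuses; 'detected' follows PIT's convention as we understand it. */
    enum Status {
        KILLED(true), TIMED_OUT(true), SURVIVED(false), NO_COVERAGE(false);

        final boolean detected;

        Status(boolean detected) {
            this.detected = detected;
        }
    }

    /** Percentage of detected mutants, rounded to the nearest integer (0 when there are no mutants). */
    static int score(int detected, int total) {
        if (total == 0) {
            return 0;
        }
        return Math.round(100f * detected / total);
    }

    /** Counts detected outcomes and delegates to the percentage computation. */
    static int score(List<Status> outcomes) {
        int detected = 0;
        for (Status status : outcomes) {
            if (status.detected) {
                detected++;
            }
        }
        return score(detected, outcomes.size());
    }
}
```

Under this convention, a mutant that flips between KILLED and TIMED_OUT leaves the score unchanged; only a flip to or from SURVIVED (or NO_COVERAGE) moves it.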

From the details in the console output, it can be seen that the variation concerns two mutants: a true mutant affecting a boolean method, and a void mutant affecting, of course, a void method.
No mutant is killed by a timeout, so the timeout factor will not solve this particular case.
Is it possible to get the report files generated by PIT during those builds? That would shed light on which methods are affected by those mutants and on the tests involved.

@surli
Copy link

surli commented Feb 25, 2019

Is it possible to get the report files generated by PIT during those builds?

I don't think so: we run it with the mvn clean goal and the workspaces are not archived. However, the last report is available in the target directory.

@vmassol
Contributor Author

vmassol commented Feb 25, 2019

@oscarlvp
Copy link
Member

Thanks @vmassol
The mutations.json file in the workspace actually shows some TIMED_OUT mutants, one of which is a void mutant. Of the three true mutants, one is marked as SURVIVED. Both of these mutants are covered by the test case org.xwiki.webjars.internal.FilesystemResourceReferenceSerializerTest.serializeCSSResourceWithURLsInIt, whose code can be checked here.
Now, it seems that this test case, and others in the same module, deal with actual files. The assertions seem to verify whether some files exist. How are these files created? Could the outcome of another test influence this test case? When the code is mutated, there may be unexpected results reflected in the files created/erased/modified. Could this be the case?
The methods being mutated are org.xwiki.webjars.internal.FilesystemResourceReferenceCopier.processCssfile and org.xwiki.webjars.internal.FilesystemResourceReferenceCopier.isRelativeURL
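If files written by one test (or by a mutant) can leak into the next test, giving every test its own scratch directory removes that shared state. A minimal sketch using only java.nio.file; IsolatedDir is a hypothetical helper, JUnit 5.4+ offers the @TempDir annotation for the same purpose, and whether the webjars tests can be pointed at such a directory is an assumption we have not checked.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

/** Hypothetical helper: a fresh temporary directory that is deleted recursively on close. */
final class IsolatedDir implements AutoCloseable {
    final Path path;

    IsolatedDir(String prefix) {
        try {
            this.path = Files.createTempDirectory(prefix);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() {
        if (!Files.exists(path)) {
            return;
        }
        // Delete children before their parents by visiting paths in reverse order
        try (Stream<Path> walk = Files.walk(path)) {
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.deleteIfExists(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Each test would create its IsolatedDir in a try-with-resources block, so whatever a mutant writes is gone before the next test runs.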

@surli

surli commented Mar 7, 2019

@oscarlvp Hi, we got another example of a changing mutation score. This time on module xwiki-platform-observation-remote: the threshold is set at 80 and one build reached only 75, see there

This time I saved the reports before re-triggering the build. You can find them attached: build 160 was failing, 161 was OK.
platform-observation-remote-pitreport-build-161.zip
platform-observation-remote-pitreport-build-160.zip

vmassol added a commit to xwiki/xwiki-commons that referenced this issue Mar 12, 2019
@vmassol
Contributor Author

vmassol commented Feb 12, 2020

Note that today (2020-02-12) we had an instability again with no code change AFAICS:

[ERROR] Failed to execute goal org.pitest:pitest-maven:1.4.10:mutationCoverage (pitest-check) on project xwiki-commons-extension-repository-maven: Mutation score of 85 is below threshold of 88 -> [Help 1]

I updated the mutation score to the current value at xwiki/xwiki-commons@ad888d7#diff-4e06773273323f2703b4c3a54f9afd47R36 and it passed on the CI yesterday and on the days before; today it suddenly failed in https://ci.xwiki.org/job/xwiki-commons_pitest/724/console

@vmassol
Contributor Author

vmassol commented Feb 12, 2020

And another one found today: `[ERROR] Failed to execute goal org.pitest:pitest-maven:1.4.10:mutationCoverage (pitest-check) on project xwiki-platform-mailsender: Mutation score of 26 is below threshold of 44 -> [Help 1]`

I tried it locally and I get a different mutation score every time I run it: sometimes 9%, sometimes 34%, sometimes 18%, etc.

In the logs I see plenty of the following:

1:46:24 PM PIT >> INFO : Created  5 mutation test units
stderr  : Exception in thread "smtp:127.0.0.1:3025" java.lang.IllegalStateException: Can not open server socket for smtp:127.0.0.1:3025
        at com.icegreen.greenmail.server.AbstractServer.initServerSocket(AbstractServer.java:115)
        at com.icegreen.greenmail.server.AbstractServer.run(AbstractServer.java:86)
Caused by: java.net.BindException: Address already in use (Bind failed)
        at java.net.PlainSocketImpl.socketBind(Native Method)
        at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
        at java.net.ServerSocket.bind(ServerSocket.java:390)
        at java.net.ServerSocket.bind(ServerSocket.java:344)
        at com.icegreen.greenmail.server.AbstractServer.openServerSocket(AbstractServer.java:71)
        at com.icegreen.greenmail.server.AbstractServer.initServerSocket(AbstractServer.java:110)
        ... 1 more
stderr  : Exception in thread "smtp:127.0.0.1:3025" java.lang.IllegalStateException: Can not open server socket for smtp:127.0.0.1:3025
        at com.icegreen.greenmail.server.AbstractServer.initServerSocket(AbstractServer.java:115)
        at com.icegreen.greenmail.server.AbstractServer.run(AbstractServer.java:86)
Caused by: java.net.BindException: Address already in use (Bind failed)
        at java.net.PlainSocketImpl.socketBind(Native Method)
        at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
        at java.net.ServerSocket.bind(ServerSocket.java:390)
        at java.net.ServerSocket.bind(ServerSocket.java:344)
        at com.icegreen.greenmail.server.AbstractServer.openServerSocket(AbstractServer.java:71)
        at com.icegreen.greenmail.server.AbstractServer.initServerSocket(AbstractServer.java:110)
        ... 1 more

However, this happens only when pitest/descartes executes.

Note: I've tried to check the generated mutations but it doesn't work. When using the EXPORT feature, I get an error:

[INFO] Defaulting target classes to match packages in build directory
5:46:19 PM PIT >> INFO : Verbose logging is disabled. If you encounter a problem, please enable it before reporting an issue.
5:46:20 PM PIT >> INFO : Sending 8 test classes to minion
5:46:20 PM PIT >> INFO : Sent tests to minion
5:46:20 PM PIT >> INFO : MINION : 5:46:20 PM PIT >> INFO : Checking environment

5:46:20 PM PIT >> INFO : MINION : 5:46:20 PM PIT >> INFO : Found  14 tests

5:46:20 PM PIT >> INFO : MINION : 5:46:20 PM PIT >> INFO : Dependency analysis reduced number of potential tests by 0

5:46:20 PM PIT >> INFO : MINION : 5:46:20 PM PIT >> INFO : 14 tests received
5:46:23 PM PIT >> INFO : Calculated coverage in 3 seconds.
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  27.643 s
[INFO] Finished at: 2020-02-12T17:46:23+01:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.pitest:pitest-maven:1.4.10:mutationCoverage (pitest-check) on project xwiki-platform-mailsender: Execution pitest-check of goal org.pitest:pitest-maven:1.4.10:mutationCoverage failed: Type javax/mail/Multipart not present: javax.mail.Multipart -> [Help 1]
[ERROR] 

I've also made sure that the mail SMTP socket is closed after each test. Didn't help. FTR here's what I did in MailSenderApiTest:

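    // Requires java.net.InetAddress, java.net.InetSocketAddress, java.net.ServerSocket
    // and org.junit.jupiter.api.AfterAll; mailserver is the test's mail server field (not shown)
    @AfterAll
    public static void afterAll() throws Exception
    {
        // Stop the SMTP server, then poll until port 3025 can be bound again
        mailserver.stop();
        while (!isTcpPortAvailable(3025)) {
            Thread.sleep(100L);
        }
    }

    // Returns true if a server socket can currently be bound to the given localhost port
    private static boolean isTcpPortAvailable(int port)
    {
        try (ServerSocket serverSocket = new ServerSocket()) {
            // setReuseAddress(false) is required only on OSX,
            // otherwise the code will not work correctly on that platform
            serverSocket.setReuseAddress(false);
            serverSocket.bind(new InetSocketAddress(InetAddress.getByName("localhost"), port), 1);
            return true;
        } catch (Exception ex) {
            // Binding failed: the port is still in use
            return false;
        }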
    }
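Polling with no upper bound, as in afterAll() above, can itself run into the timeout PIT applies to each test. A variant that gives up after a deadline fails fast instead. This is a sketch under assumptions: PortWaiter is a hypothetical name, and it probes the loopback address rather than resolving "localhost".

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.time.Duration;

/** Hypothetical helper: bounded wait for a TCP port to become free on the loopback interface. */
final class PortWaiter {
    private PortWaiter() {
    }

    /** Returns true if a listening socket can currently be bound to the given loopback port. */
    static boolean isTcpPortAvailable(int port) {
        try (ServerSocket probe = new ServerSocket()) {
            // Do not reuse the address, so a port still held by another socket is reported as busy
            probe.setReuseAddress(false);
            probe.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), port), 1);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    /** Polls until the port is free; returns false once maxWait has elapsed instead of looping forever. */
    static boolean waitForPort(int port, Duration maxWait, Duration pollInterval) {
        long deadline = System.nanoTime() + maxWait.toNanos();
        while (!isTcpPortAvailable(port)) {
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                // Preserve the interrupt for the caller and stop waiting
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

An afterAll() could then fail with an explicit message when waitForPort returns false, which is easier to diagnose than a mutant silently reported as TIMED_OUT.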

Any idea is most welcome.

@vmassol
Contributor Author

vmassol commented Feb 13, 2020

Another one: [ERROR] Failed to execute goal org.pitest:pitest-maven:1.4.10:mutationCoverage (pitest-check) on project xwiki-platform-vfs-api: Mutation score of 62 is below threshold of 67 -> [Help 1]

@oscarlvp
Member

@vmassol the case of xwiki-platform-mailsender is definitely interference between the mutants. Remember that a mutant may corrupt the state of the test. If the test code has global effects, as seems to be the case with this one, then it is very difficult to ensure the integrity of the system for the execution of the next mutant. Also, waiting for the port to become available may make the test fail: remember that PIT always sets a timeout.
On my side I will try to reproduce the issues with xwiki-commons-extension-repository-maven and xwiki-platform-vfs-api.
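Since the mail test binds the fixed port 3025, a socket left open after a previous mutant breaks the next run. One way to remove that shared state, assuming the mail server used by the test can be started on an arbitrary port (we have not checked how GreenMail exposes this), is to ask the OS for a free ephemeral port. FreePort is a hypothetical helper name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;

/** Hypothetical helper: obtains a currently free TCP port on the loopback interface. */
final class FreePort {
    private FreePort() {
    }

    /**
     * Binds to port 0 so the OS picks an unused ephemeral port, then releases it.
     * Another process could in theory grab the port before the test server binds it,
     * but the window is small compared with sharing one fixed port across all runs.
     */
    static int find() {
        try (ServerSocket socket = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            return socket.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The test would pick a port with FreePort.find() when it starts its server and use that port in its mail configuration, instead of relying on 3025 being released in time.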

@oscarlvp
Member

Executing the following goal on xwiki-commons-extension-repository-maven

mvn clean install -Pquality -Dxwiki.pitest.skip=false -Djacoco.skip=true

ends in the following error:

[INFO] 
[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running org.xwiki.extension.repository.xwiki.internal.XWikiExtensionRepositoryTest
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.32 s - in org.xwiki.extension.repository.xwiki.internal.XWikiExtensionRepositoryTest
[INFO] Running org.xwiki.extension.repository.xwiki.internal.SystemHTTPProxyTest
[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.054 s <<< FAILURE! - in org.xwiki.extension.repository.xwiki.internal.SystemHTTPProxyTest
[ERROR] testProxy  Time elapsed: 1.054 s  <<< ERROR!
org.xwiki.component.manager.ComponentLookupException: Failed to lookup component [org.xwiki.extension.repository.xwiki.internal.XWikiExtensionRepositoryFactory] identified by type [interface org.xwiki.extension.repository.ExtensionRepositoryFactory] and hint [xwiki]
	at org.xwiki.component.embed.EmbeddableComponentManager.getInstance(EmbeddableComponentManager.java:204)
	at org.xwiki.test.mockito.MockitoComponentMockingRule.getComponentUnderTest(MockitoComponentMockingRule.java:243)
	at org.xwiki.extension.repository.xwiki.internal.SystemHTTPProxyTest.testProxy(SystemHTTPProxyTest.java:56)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:567)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.xwiki.test.mockito.MockitoComponentMockingRule$1.evaluate(MockitoComponentMockingRule.java:188)
	at com.github.tomakehurst.wiremock.junit.WireMockRule$1.evaluate(WireMockRule.java:62)
	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
	at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
	at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
	at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
	at org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:43)
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
	at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
	at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
	at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
	at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
	at org.junit.vintage.engine.VintageTestEngine.executeAllChildren(VintageTestEngine.java:82)
	at org.junit.vintage.engine.VintageTestEngine.execute(VintageTestEngine.java:73)
	at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:248)
	at org.junit.platform.launcher.core.DefaultLauncher.lambda$execute$5(DefaultLauncher.java:211)
	at org.junit.platform.launcher.core.DefaultLauncher.withInterceptedStreams(DefaultLauncher.java:226)
	at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:199)
	at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:132)
	at org.apache.maven.surefire.junitplatform.JUnitPlatformProvider.invokeAllTests(JUnitPlatformProvider.java:150)
	at org.apache.maven.surefire.junitplatform.JUnitPlatformProvider.invoke(JUnitPlatformProvider.java:124)
	at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
	at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
Caused by: org.xwiki.component.phase.InitializationException: Failed to create JAXB context
	at org.xwiki.extension.repository.xwiki.internal.XWikiExtensionRepositoryFactory.initialize(XWikiExtensionRepositoryFactory.java:67)
	at org.xwiki.component.embed.InitializableLifecycleHandler.handle(InitializableLifecycleHandler.java:39)
	at org.xwiki.component.embed.EmbeddableComponentManager.createInstance(EmbeddableComponentManager.java:365)
	at org.xwiki.component.embed.EmbeddableComponentManager.getComponentInstance(EmbeddableComponentManager.java:451)
	at org.xwiki.component.embed.EmbeddableComponentManager.getInstance(EmbeddableComponentManager.java:201)
	... 50 more
Caused by: javax.xml.bind.JAXBException: Implementation of JAXB-API has not been found on module path or classpath.
 - with linked exception:
[java.lang.ClassNotFoundException: com.sun.xml.internal.bind.v2.ContextFactory]
	at javax.xml.bind.ContextFinder.newInstance(ContextFinder.java:131)
	at javax.xml.bind.ContextFinder.find(ContextFinder.java:318)
	at javax.xml.bind.JAXBContext.newInstance(JAXBContext.java:478)
	at javax.xml.bind.JAXBContext.newInstance(JAXBContext.java:435)
	at javax.xml.bind.JAXBContext.newInstance(JAXBContext.java:336)
	at org.xwiki.extension.repository.xwiki.internal.XWikiExtensionRepositoryFactory.initialize(XWikiExtensionRepositoryFactory.java:65)
	... 54 more
Caused by: java.lang.ClassNotFoundException: com.sun.xml.internal.bind.v2.ContextFactory
	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:602)
	at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
	at javax.xml.bind.ServiceLoaderUtil.nullSafeLoadClass(ServiceLoaderUtil.java:92)
	at javax.xml.bind.ServiceLoaderUtil.safeLoadClass(ServiceLoaderUtil.java:125)
	at javax.xml.bind.ContextFinder.newInstance(ContextFinder.java:128)
	... 59 more

[INFO] 
[INFO] Results:
[INFO] 
[ERROR] Errors: 
[ERROR]   SystemHTTPProxyTest.testProxy:56 » ComponentLookup Failed to lookup component ...
[INFO] 
[ERROR] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0

@vmassol
Contributor Author

vmassol commented Feb 13, 2020

@oscarlvp that's weird, it works fine on our CI + locally (just retested now). Maybe try with "-U" to make sure you have recent deps?

I ran it from xwiki-commons/xwiki-commons-core/xwiki-commons-extension/xwiki-commons-extension-repositories/xwiki-commons-extension-repository-maven.

What JDK are you using?

I'm on:

$ java -fullversion
openjdk full version "1.8.0_202-b08"

@oscarlvp
Member

I'm using JDK 13
openjdk full version "13.0.1+9"
so that might be the problem, right?

@vmassol
Contributor Author

vmassol commented Feb 13, 2020

so, that might be the problem right?

Yes, it could well be. It's possible that some things were removed or made optional. On our side we build with Java 8.

@vmassol
Contributor Author

vmassol commented Feb 16, 2020

Got a new one (from https://ci.xwiki.org/job/xwiki-platform_pitest/558/console):

[ERROR] Failed to execute goal org.pitest:pitest-maven:1.4.10:mutationCoverage (pitest-check) on project xwiki-platform-observation-remote: Mutation score of 78 is below threshold of 80 -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.pitest:pitest-maven:1.4.10:mutationCoverage (pitest-check) on project xwiki-platform-observation-remote: Mutation score of 78 is below threshold of 80

Same as #62 (comment)

@vmassol
Contributor Author

vmassol commented Feb 26, 2020

Another flicker from today:

[ERROR] Failed to execute goal org.pitest:pitest-maven:1.4.10:mutationCoverage (pitest-check) on project xwiki-platform-user-default: Mutation score of 87 is below threshold of 88

@vmassol
Contributor Author

vmassol commented Mar 20, 2020

Another flicker today:

[ERROR] Failed to execute goal org.pitest:pitest-maven:1.4.10:mutationCoverage (pitest-check) on project xwiki-platform-webjars-api: Mutation score of 22 is below threshold of 24 -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.pitest:pitest-maven:1.4.10:mutationCoverage (pitest-check) on project xwiki-platform-webjars-api: Mutation score of 22 is below threshold of 24

Unless the following change could cause the mutation score to change but then I'd be curious to know why: xwiki/xwiki-platform@9ebbeb6

Thanks!

@vmassol
Contributor Author

vmassol commented Apr 5, 2020

New flicker today:

[ERROR] Failed to execute goal org.pitest:pitest-maven:1.4.10:mutationCoverage (pitest-check) on project xwiki-commons-extension-repository-maven: Mutation score of 83 is below threshold of 85 -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.pitest:pitest-maven:1.4.10:mutationCoverage (pitest-check) on project xwiki-commons-extension-repository-maven: Mutation score of 83 is below threshold of 85

@oscarlvp
Member

oscarlvp commented Apr 6, 2020

Would it be possible to mention this issue in all the commits that lower the mutation score, as has been done in some cases above, so that we have a direct link to the related information?
Thanks.

@vmassol
Contributor Author

vmassol commented Apr 6, 2020

Would it be possible to mention this issue in all the commits that lower the mutation score, as has been done in some cases above, so that we have a direct link to the related information?

Indeed, I don't do it systematically. I'll try to remember it.

However, I reference the issue in the content of all these commits (I've been doing it for some time now, though I wasn't doing it initially when this issue was created). For example, you can see all the cases here:

@oscarlvp
Member

oscarlvp commented Apr 6, 2020

@vmassol Thanks a lot!

@oscarlvp
Member

oscarlvp commented Apr 6, 2020

@vmassol I get the following error while trying to build xwiki-commons-extension-repository-maven

Failed to execute goal fr.inria.gforge.spoon:spoon-maven-plugin:3.2:check (default) on project 
xwiki-commons-extension-repository-maven: 
Exception during the spoonify of the target project.: 
There is no [/usr/target/classes/META-INF/components.txt] file and thus Component 
[org.xwiki.extension.repository.aether.internal.components.PlexusContainerProvider] 
isn't declared!
Consider adding a components.txt file or if it is normal use the 
"staticRegistration" parameter as in "@Component(staticRegistration = false)"

I used the following command

mvn  clean install -Pquality -Dxwiki.pitest.skip=false -Djacoco.skip=true

I have the XWiki repositories configured in settings.xml, and I'm using Java 8 in a Docker image.

@vmassol
Contributor Author

vmassol commented Apr 6, 2020

@oscarlvp on which branch are you? master? Did you git pull the repo?

@oscarlvp
Member

oscarlvp commented Apr 6, 2020

@vmassol Yes I'm on master after git pull

@vmassol
Contributor Author

vmassol commented Apr 6, 2020

@oscarlvp Just tried it and it worked fine. I've used mvn clean install -Pquality -Dxwiki.pitest.skip=false -Djacoco.skip=true -U to make sure that it doesn't use any artifact from my local repo.

@vmassol
Contributor Author

vmassol commented Apr 6, 2020

@oscarlvp maybe you're running from a different directory? The /usr/target/classes/META-INF/components.txt part of the message seems to indicate that your current directory is /usr/. We recently added some checks to the build, and the current directory matters.

@oscarlvp
Member

oscarlvp commented Apr 6, 2020

@vmassol Indeed I made a mistake while mounting the working directory in docker, I realised that from your comment. The project is building OK now, so I should be able to see what is happening.

@oscarlvp
Member

oscarlvp commented Apr 6, 2020

@vmassol
I executed the mutation analysis on xwiki-commons-extension-repository-maven 7 times. In all of them I got an 88% mutation score.
However, for some methods the analysis didn't produce the same outcome in all attempts.
The following transformations were reported by PIT sometimes as TIMED_OUT and sometimes as KILLED:

Method | Transformation | TIMED_OUT
--- | --- | ---
AetherUtils.createArtifact(String,String) | { return null; } | 1 of 7
AetherExtensionRepositoryFactory.createRepository(ExtensionRepositoryDescriptor) | { return null; } | 6 of 7
AetherExtensionRepository.toAetherDependencies(Collection,RepositorySystemSession) | { return null; } | 1 of 7
AetherExtensionRepository.resolveVersionConstraint(String, VersionConstraint, RepositorySystemSession) | { return null; } | 3 of 7
AetherExtensionRepository.newResolutionRepositories(RepositorySystemSession) | { return null; } | 6 of 7
AetherExtensionRepository.getAllMavenRepositories() | { return null; } | 2 of 7

So, for example AetherExtensionRepository.resolveVersionConstraint was reported 3 times as TIMED_OUT and 4 times as KILLED in the 7 attempts I executed.
TIMED_OUT means that no test finished executing with the mutant within the time given by PIT, which by default is 1.25 multiplied by the execution time of the same tests on the original code, plus PIT's timeoutConstant.

The following are transformations always reported as TIMED_OUT in the 7 attempts:

Method | Transformation
--- | ---
NexusXWikiOrgExtensionRepositorySource.getExtensionRepositoryDescriptors() | { return null; }
AetherUtils.createExtensionId(Artifact, ExtensionFactory) | { return null; }
AetherExtensionRepositoryFactory.createTemporaryFile(String, String) | { return null; }
AetherExtensionRepository.AetherExtensionFileInputStream.close() | { }
AetherExtensionRepository.createRepositorySystemSession() | { return null; }
AetherExtensionRepository.convertToAether(Dependency,ArtifactTypeRegistry) | { return null; }
AetherExtensionFile.openStream() | { return null; }
AetherExtensionDependency.getAetherDependency() | { return null; }

Given that the outcome of the transformations in the first table above is erratic, I wouldn't be surprised if some of the cases in the second table were erratic too, and if some of them were occasionally reported as SURVIVED, lowering the mutation score in random executions. To confirm this, it would be nice to save the PIT reports whenever such a score drop occurs.
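
One possible way to keep the reports around for comparison, sketched under the assumption that the build uses the standard `pitest-maven` parameters: `timestampedReports` writes each run into its own dated subdirectory instead of overwriting the previous one, and adding the `XML` output format produces a `mutations.xml` file listing the status of every mutant, which is easier to compare between runs than the HTML pages.

```xml
<configuration>
  <!-- Write each run into its own timestamped subdirectory of the reports directory -->
  <timestampedReports>true</timestampedReports>
  <!-- Keep the HTML report and also produce a machine-readable mutations.xml -->
  <outputFormats>
    <param>HTML</param>
    <param>XML</param>
  </outputFormats>
</configuration>
```

On CI, the job would still need to archive the reports directory (`target/pit-reports` by default) as a build artifact so that it survives the agent's workspace cleanup.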

The tests related to the methods above use mocks and also deal with files. Both might contribute to this kind of erratic outcome. Do these tests affect the content of external files?

@oscarlvp
Member

oscarlvp commented Apr 6, 2020

Since the nature of these score drops may differ quite a lot from one to another, I propose to open a separate issue for each one of them. That way it is easier for me to track how often they happen and to check whether they get solved.

@vmassol
Contributor Author

vmassol commented Apr 6, 2020

> To confirm this, it would be nice to save the PIT reports whenever such a score drop occurs.

Noted, I'll start providing the PIT reports when I see flickers. This will only apply to new flickers, since we have already lowered the thresholds for the ones reported so far.

> TIMED_OUT means that no test finished executing against the mutant within the time allowed by PIT, which by default is 1.25 times the execution time of the same tests on the original code.

I see that @surli put the following in our pom:

    <!-- Default Pitest timeout factor. Increasing the value will increase the duration of a build
         but might change the mutation score: some mutations might take longer than the timeout to be killed
         on some machines. See: https://github.com/STAMP-project/pitest-descartes/issues/62 -->
    <xwiki.pitest.timeoutFactor>1.25</xwiki.pitest.timeoutFactor>

Does it mean that 1.25 is too low? Is there a way to have pitest or descartes (since it might be harder to change pitest) provide a stable score that is independent of timeouts? I'm asking because CI agents can always be slow when lots of jobs are building in parallel, and it would be hard to find a perfect timeout value. I fear it would need to be so high to be safe that the build would take a lot longer (hours and hours more). WDYT?

> The tests related to the methods above use mocks and also deal with files. Both might contribute to this kind of erratic outcome. Do these tests affect the content of external files?

Not sure what you mean by "content of external files". Do you mean: do these tests call Java code, located in classes other than the test class, that uses the File API (or the NIO API) directly or indirectly?

> Since the nature of these score drops may differ quite a lot from one to another, I propose to open a separate issue for each one of them. That way it is easier for me to track how often they happen and to check whether they get solved.

OK, I can start doing that if you prefer. I wanted to keep everything in the same place since the topic is the same (and the cause might be too), and separate issues are harder to relate to each other. Maybe introduce a label for that?

@vmassol
Contributor Author

vmassol commented Apr 6, 2020

@oscarlvp many thanks for looking into this! :)

@oscarlvp
Member

oscarlvp commented Apr 6, 2020

> Does it mean that 1.25 is too low? Is there a way to have pitest or descartes (since it might be harder to change pitest) provide a stable score that is independent of timeouts? I'm asking because CI agents can always be slow when lots of jobs are building in parallel, and it would be hard to find a perfect timeout value. I fear it would need to be so high to be safe that the build would take a lot longer (hours and hours more). WDYT?

This feature is entirely on PIT's side. Its main purpose is to detect mutants that may cause an infinite loop. For example, if a method that should return false is mutated to return true, a loop's terminating condition may never be met and the code enters an infinite loop. The timeout catches cases like this. A middle-ground solution could be to avoid mutating the methods known to produce the flickering by configuring exclusions in the module.
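
A minimal sketch of that middle-ground option, assuming the standard `pitest-maven` `excludedMethods` parameter. Per PIT's documentation it takes globs matched against method names, so the two names below (picked from the first table above as examples) would skip every method with that name in the analyzed classes, not only the one in `AetherExtensionRepository`. `excludedClasses` is the coarser alternative when a whole class flickers.

```xml
<configuration>
  <!-- Globs matched against method names: matching methods are not mutated at all -->
  <excludedMethods>
    <param>resolveVersionConstraint</param>
    <param>newResolutionRepositories</param>
  </excludedMethods>
</configuration>
```

The trade-off is that excluded methods simply disappear from the score, so any real test gap in them would go unnoticed.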

> Not sure what you mean by "content of external files". Do you mean: do these tests call Java code, located in classes other than the test class, that uses the File API (or the NIO API) directly or indirectly?

Yes, maybe that. What I simply meant was whether they touch files, and whether the mutated code could corrupt those files and affect other tests.

> OK, I can start doing that if you prefer. I wanted to keep everything in the same place since the topic is the same (and the cause might be too), and separate issues are harder to relate to each other. Maybe introduce a label for that?

A label could be nice. My point is that if these score drops keep piling up, it will be harder to keep track of them in a single issue.

> many thanks for looking into this! :)

Don't mention it :)

@vmassol
Contributor Author

vmassol commented May 28, 2020

@oscarlvp Hi. Hope you're doing well. I have some bad news: on the xwiki project we've decided to remove pitest/descartes from our build for the moment. We had too many false positives and the developers no longer had faith in the results. The cost of maintaining it was outweighing the perceived benefits. And unfortunately we were not ready to take ownership of the development of pitest/descartes ourselves.

I feel that without the false positives, we would have continued using it. But if a future version of pitest/descartes fixes the issue, we'll be able to revisit it and put back everything we had set up. I've been careful to list all the places involved and to make it easy to roll back, see https://jira.xwiki.org/browse/XCOMMONS-1960

So I apologize, because I know this is your baby and you may feel that we (and I) are letting you down. I personally believe in the mutation testing concept, and there's probably not that much work remaining to use it the way we wanted to. I acknowledge that there are other ways of using it, and we'll continue using pitest/descartes in those ways, namely as an aid inside the IDE when writing tests.

Let's keep in touch. Feel free to contact me if you wish to discuss more.

I'm also taking this opportunity to thank you again for your great work and support all along.

@oscarlvp
Member

@vmassol Bad news indeed. IMHO you could remove it only from xwiki-platform, since most of the score issues come from that project. The mutation testing strategy definitely conflicts with the way you test the platform modules.
On my side, I was working on a way to automate the diagnosis of these score-flickering issues.
As you say, let's keep in touch.
Thank you for the fruitful collaboration.
