Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate slow test: Lucene.Net.Tests.Index.TestAddIndexes::TestAddIndexesWithCloseNoWait() #308

Closed
NightOwl888 opened this issue Jul 8, 2020 · 3 comments · May be fixed by #312
Closed

Comments

@NightOwl888
Copy link
Contributor

The test Lucene.Net.Tests.Index.TestAddIndexes::TestAddIndexesWithCloseNoWait() (in Lucene.Net.Tests._E-I) is usually (around 90% of the time) correctly completing in about 2 seconds. However, occasionally it can take much longer. The test only checks for an invalid condition (adding index to a disposed IndexWriter) and is supposed to exit early because of the exception. However, there seems to be contention between the threads that makes it extremely slow to exit on some runs (sometimes up to 5 minutes or more).

No doubt, this is one of the primary contributors to the tests in the CI environment taking excessive time to complete.

@NightOwl888 NightOwl888 added this to the 4.8.0 milestone Jul 8, 2020
@NightOwl888 NightOwl888 added help-wanted Extra attention is needed performance up-for-grabs This issue is open to be worked on by anyone labels Jul 9, 2020
@Shazwazza Shazwazza self-assigned this Jul 9, 2020
@NightOwl888 NightOwl888 removed help-wanted Extra attention is needed up-for-grabs This issue is open to be worked on by anyone labels Jul 28, 2020
@Shazwazza
Copy link
Contributor

Just adding some notes here for now, with the change here #312 this fixes a deadlocking scenario which i was encountering when trying to debug this. Now that there is no deadlock the tests will work as expected however using a custom nunit test runner attribute (FindFirstFailingSeed) I still see that the test runs slow. While debugging it and stepping through the code we no longer see deadlocks so I can actively step through the code but when using the Parallel Stacks view in VS or just the Threads window we can see there's always many threads waiting on a single lock.

Some main things to note:

  • Quite a lot of threads call into IndexWriter.CopySegmentAsIs and then wait on
    • Debug.Assert(!SlowFileExists(directory, newFileName), "file \"" + newFileName + "\" already exists; siFiles=" + string.Format(J2N.Text.StringFormatter.InvariantCulture, "{0}", siFiles)); ... Since this only runs in Debug builds, i wonder if the unit tests on the build server are running in debug mode? This assertion actually comes with a lot of overhead!! It also takes a lock on the same lock that we are waiting on within MockDirectoryWrapper.locker.
    • And then later on info.Info.Dir.Copy(directory, file, newFileName, context); ... This also takes a lock on the same lock we are waiting within MockDirectoryWrapper.locker because it calls into MockDirectoryWrapper.Copy which takes a lock
      • ... Actually, it turns out that MockDirectoryWrapper.Copy will force a recursive lock on MockDirectoryWrapper.locker even with our changes/fixes to recursive locking. This is because: MockDirectoryWrapper.Copy (Lock!) -> Directory.Copy -> IOUtils.DisposeWhileHandlingException(priorException, os, @is) (where os == MockIndexOutputWrapper) -> MockIndexOutputWrapper.Dispose -> MockDirectoryWrapper.RemoveIndexOutput (Recursive lock!)

I think the above few things probably plays a role in the performance of this test so need to investigate this a little more. Since there is a recursive lock via indirect references that general means there can be deadlocks again but seeing as though there are so many locks trying to be taken anyways this will slow things down quite a lot.

@NightOwl888
Copy link
Contributor Author

NightOwl888 commented Jul 30, 2020

Debug.Assert(!SlowFileExists(directory, newFileName), "file "" + newFileName + "" already exists; siFiles=" + string.Format(J2N.Text.StringFormatter.InvariantCulture, "{0}", siFiles)); ... Since this only runs in Debug builds, i wonder if the unit tests on the build server are running in debug mode? This assertion actually comes with a lot of overhead!! It also takes a lock on the same lock that we are waiting on within MockDirectoryWrapper.locker.

This goes to the heart of one of a few dozen issues that I have written down in a notebook that I haven't reported on GitHub yet.

In Java, it is possible to turn on and off asserts in a production build, they aren't simply compiled out of the build. They are turned on during testing. What this effectively means is that there are a whole suite of tests (namely anything that is using System.Diagnostics.Debug.Assert() currently) that we are completely skipping. To make matters even more complicated, some parts of the test framework are designed to catch the AssertionError that is thrown from those asserts when they fail and ignore them, and other parts are designed to fail the test in those cases.

I recently "fixed" a related issue (#299) by throwing InvalidOperationException, but I see that may have been the wrong approach, since the test framework has different behavior for AssertionException and InvalidOperationException in some cases.

I have been considering ways of reproducing the Java assertion behavior without producing negative performance impacts in production. But one of the main things to note is that Debug.Assert() is implemented as a regular function in .NET, meaning that both parameters are resolved first before it is called. Putting an expensive function call and/or expensive string building operation there is what is causing this problem in Debug builds. In Java, the asserts are not implemented as a function, and I suspect the compiler doesn't run the string building operation unless the assert fails, and I am sure neither of them are run if assertions are disabled.

What is needed is to come up with a solution that allows us to turn on asserts during testing in a way that doesn't hamper debug or runtime performance. One option I have been considering is to create a wrapper for Debug.Assert, something like:

internal static class Debugging
{

    public static bool AssertsEnabled { get; set; } = SystemProperties.GetPropertyAsBoolean("assert", false);

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static void Assert(Func<bool> conditionFactory, Func<string> messageFactory)
    {
        if (AssertsEnabled)
        {
            if (!conditionFactory())
                throw new AssertionException(messageFactory());
        }
        else
        {
            Debug.Assert(conditionFactory(), messageFactory()); // Note this line is completely removed from Release builds
        }
    }
}

Which can be used like:

Debugging.Assert(() => !SlowFileExists(directory, newFileName), () => "file \"" + newFileName + "\" already exists; siFiles=" + string.Format(J2N.Text.StringFormatter.InvariantCulture, "{0}", siFiles));

I suspect to get optimal production performance, we will probably also have to duplicate the AssertsEnabled check, even though it is not DRY. That will completely cut off the execution path to the fallback Debug.Assert() call in debug mode, but being that it is implemented as a function, it is probably best that we don't include it for debugging anyway and just rely on turning assertions "on" or "off".

if (Debugging.AssertsEnabled)
    Debugging.Assert(() => !SlowFileExists(directory, newFileName), () => "file \"" + newFileName + "\" already exists; siFiles=" + string.Format(J2N.Text.StringFormatter.InvariantCulture, "{0}", siFiles));

Do note that the AssertionException already exists in the test framework. I have been trying to avoid putting testing code in the release, but it appears in order to duplicate this behavior we will either need to or come up with a solution that involves injecting a class for testing purposes or include it in the release code. Certainly to turn "on" and "off" asserts in production, it would be easier to follow the former approach.

@NightOwl888
Copy link
Contributor Author

#326 fixed the Debug.Assert() issue some time ago. It is difficult to tell exactly which commit fixed this test, but it no longer seems to be running slow.

I will leave #312 open for now in case we need to borrow some ideas from it, but this issue has now been addressed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants