Support for Ordering of Indexing with SeqNo #886

Open · 1 task done
Jeevananthan-23 opened this issue Nov 4, 2023 · 4 comments · May be fixed by #940
Comments

@Jeevananthan-23 (Contributor) commented Nov 4, 2023

Is there an existing issue for this?

  • I have searched the existing issues

Is your feature request related to a problem? Please describe the problem.

IndexWriter can determine the order of concurrent index operations but does not provide this information to the user.

Describe the solution you'd like

IndexWriter already knows the order in which concurrent operations were executed; by changing these methods to return a long sequence number instead of void, that order can be exposed to callers. Callers that do not need this information can simply ignore the returned long. Additionally, this change would allow the TrackingIndexWriter wrapper class to be removed, since it provides similar functionality (a long per operation) but with weaker guarantees.

Additional context

For example, the basic test below verifies that indexing operations return monotonically increasing sequence numbers:

[Test]
public void TestBasic()
{
    Directory dir = NewDirectory();
    IndexWriter w = new IndexWriter(dir, NewIndexWriterConfig(TEST_VERSION_CURRENT, new MockAnalyzer(Random)));
    long a = w.AddDocument(new Document()); // returns the operation's sequence number
    long b = w.AddDocument(new Document());
    assertTrue(b > a); // later operations receive higher sequence numbers
    w.Dispose();
    dir.Dispose();
}
@NightOwl888 (Contributor)
So, this is something we don't want to do. The reason is that the DocumentsWriter in Lucene 4.8.0 writes segments concurrently, not sequentially. However, we were getting test failures (I don't recall which tests) when attempting to do the same in .NET, possibly due to a missing lock or to very subtle locking behavior in Java that doesn't work with the same syntax in .NET. 00d3942 is interesting and may help to address the problem, although we almost always strictly follow the way the tests are written in Java unless there is a good reason to change a test (and there may be one here).

963e10c is the hack that we put in place to make it run sequentially for the time being, but our intention is to fix the bug rather than change the API like this, which would render it unfixable.

That being said, nobody is currently working on trying to get the concurrent document writing to function and it is considered low priority since it can most likely be addressed without any breaking API change after the release. However, you seem to have a knack for this, so you are welcome to attempt to roll back those changes and work on fixing the concurrency bug.

Do note that DocumentsWriter is in an inconsistent state somewhere between Lucene 4.8.0 and 4.8.1, which may be contributing to the issue, so it may require upgrading to 4.8.1 in order to properly patch the bug. I ran a diff some time ago and fewer than 100 files changed between the two versions (and since several of the modules were ported from 4.8.1, there are fewer changes than that to deal with).

@Jeevananthan-23 (Contributor, Author) commented Nov 18, 2023

Regarding 00d3942: after fixing all of the tests, this one failed because Java supports volatile long fields but .NET doesn't. It is important to note that the returned long can simply be ignored. I am also currently working on the NRT feature in 02ed5d3; kindly take a look.
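To illustrate the language gap mentioned above, here is a minimal sketch (not code from either port): Java permits volatile long fields and guarantees atomic 64-bit reads and writes for them, whereas the C# compiler rejects a volatile long field (error CS0677), so a .NET port would have to fall back on Interlocked.Read/Interlocked.Exchange or an AtomicLong-style wrapper instead.

```java
import java.util.concurrent.atomic.AtomicLong;

public class SeqNoDemo {
    // Legal in Java: volatile guarantees atomic 64-bit reads and writes.
    // The equivalent C# declaration `volatile long seqNo;` does not compile
    // (CS0677); .NET code would use Interlocked operations instead.
    static volatile long seqNo;

    public static void main(String[] args) {
        seqNo = 41L;
        seqNo++; // the increment as a whole is not atomic, only the read and the write are
        System.out.println(seqNo); // prints 42

        // AtomicLong provides the atomic increment as well.
        AtomicLong atomicSeqNo = new AtomicLong(seqNo);
        System.out.println(atomicSeqNo.incrementAndGet()); // prints 43
    }
}
```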

As an aside, Lucene.NET 4.8 is based on a release that is now 10 years old, and I am eagerly looking forward to seeing more improvements in the project. Vector search was recently added to Azure Search, but it never leveraged Lucene.NET.

@Jeevananthan-23 (Contributor, Author)

> 963e10c is the hack that we put in place to make it run sequentially for the time being, but our intention is to fix the bug rather than change the API like this which would render it unfixable.
>
> That being said, nobody is currently working on trying to get the concurrent document writing to function and it is considered low priority since it can most likely be addressed without any breaking API change after the release. However, you seem to have a knack for this, so you are welcome to attempt to roll back those changes and work on fixing the concurrency bug.

@NightOwl888 Can you please provide more details about the concurrency issue? This will help me understand the problem better and work on finding a solution.

@NightOwl888 (Contributor)
@Jeevananthan-23 - 963e10c points to #325 where the original error report is. I followed up with another stack trace on the same failing test.

I have reverted the relevant changes from 963e10c in this branch: https://github.com/NightOwl888/lucenenet/tree/fix/documentswriter-concurrency. I ran the tests 30 times on Azure DevOps and ran the TestMultiThreadedSnapshotting test locally 30,000 times and couldn't get a failure. That is the good news. The bad news is that another test TestRollingUpdates.TestUpdateSameDoc fails, but very rarely. I got it to fail locally on both .NET 5.0 and .NET 6.0, but not on .NET 7.0.

So, we cannot merge the patch until we have a fix for the failing test. I am attaching the log from the test failure. I got it to fail on net5.0 on Windows (the original failure was on Linux). I used the [Repeat(1000)] attribute on the test, and it failed after about 3 runs.

I also used the assembly attributes as specified in the test failure. This ensures the same random components are plugged into the test during each run, which may help narrow down which component is faulty. On the other hand, these may have nothing to do with the exception at all; it is hard to determine this when the failure happens so rarely. Do note that we have our own random class, so these attributes work consistently across target frameworks and operating systems.

[assembly: Lucene.Net.Util.RandomSeed("0xe6dee1082501680d")]
[assembly: NUnit.Framework.SetCulture("sat-Olck")]

TestUpdateSameDoc-638362856625826917.zip

If you could pull down the branch to investigate why the test is failing, that would be great.

TestTargetFramework.props is where the target framework for the tests can be specified.

NightOwl888 added a commit to NightOwl888/lucenenet that referenced this issue May 23, 2024
…ntLock.tryLock() method barges to the front of the queue instead of returning false like Monitor.TryEnter(). Use Monitor.Enter(object, ref bool) instead, which always returns true. We get locks in a different order, but I am not sure whether that matters. Fixes apache#935. Closes apache#886.
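The semantic difference the commit message describes can be shown with a minimal Java sketch (an illustration, not code from either repository): ReentrantLock.tryLock() never blocks — it reports failure immediately while another thread holds the lock, and when the lock is free it can barge ahead of queued waiters even on a fair lock — whereas .NET's Monitor.Enter(object, ref bool lockTaken) blocks until the lock is acquired and then sets lockTaken to true.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

public class TryLockDemo {
    public static void main(String[] args) throws InterruptedException {
        ReentrantLock lock = new ReentrantLock();
        CountDownLatch acquired = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        Thread holder = new Thread(() -> {
            lock.lock();
            try {
                acquired.countDown(); // signal that the lock is now held
                release.await();      // hold the lock until told to let go
            } catch (InterruptedException ignored) {
            } finally {
                lock.unlock();
            }
        });
        holder.start();
        acquired.await();

        // tryLock() does not wait: it fails immediately while another thread
        // holds the lock. Monitor.Enter(object, ref bool) in .NET would block here.
        System.out.println(lock.tryLock()); // prints false

        release.countDown();
        holder.join();

        // With the lock free, tryLock() acquires it immediately, without
        // regard to any wait queue (barging).
        System.out.println(lock.tryLock()); // prints true
        lock.unlock();
    }
}
```

The commit's fix replaces the TryEnter-style port with Monitor.Enter(object, ref bool lockTaken), which blocks until the lock is acquired, at the cost of acquiring locks in a different order than the Java code.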