Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Java 21 causes TLC's BufferedRandomAccessFileTest and OffHeapDiskFPSetTest to fail #835

Open
lemmy opened this issue Oct 23, 2023 · 3 comments · May be fixed by #907
Open

Java 21 causes TLC's BufferedRandomAccessFileTest and OffHeapDiskFPSetTest to fail #835

lemmy opened this issue Oct 23, 2023 · 3 comments · May be fixed by #907
Labels
bug error, glitch, fault, flaw, ... Tools The command line tools - TLC, SANY, ...

Comments

@lemmy
Copy link
Member

lemmy commented Oct 23, 2023

Some methods such as tlc2.util.BufferedRandomAccessFile#readLong that used to throw IOException have started to throw ArrayIndexOutOfBoundsExceptions.

@lemmy lemmy added bug error, glitch, fault, flaw, ... Tools The command line tools - TLC, SANY, ... labels Oct 23, 2023
@Calvin-L
Copy link
Collaborator

Calvin-L commented Dec 7, 2023

I'm looking at putting together a fix for this.

I'm very impressed with the quality and clarity of BufferedRandomAccessFile, but it seems to contain a critical incorrect assumption:

"curr(f)" is an index in the closed interval [0, len(f)].

But, in reality, "The offset may be set beyond the end of the file".

In JDK<21, readLong() and friends were implemented using repeated calls to read(). In JDK>=21 they are implemented using a bulk array read. That change in behavior seems to have revealed the incorrect assumption above. When seeking past the end of the file, some of the checks and math in BufferedRandomAccessFile don't work as intended. It ends up breaking its own internal invariants (in particular, it enters a state where this.curr > this.hi).

I think fixing this issue will require relaxing the invariant that lo <= curr <= hi:

@Calvin-L
Copy link
Collaborator

Calvin-L commented Jan 2, 2024

@lemmy I discovered something else interesting while testing my changes. The tests include:

	@Test
	public void testReadSeek() throws IOException {
		final File tmpFile = File.createTempFile("BufferedRandomAccessFileTest_testReadSeek", ".bin");
		tmpFile.deleteOnExit();
		final java.io.RandomAccessFile raf = new BufferedRandomAccessFile(tmpFile, "rw");
		
		raf.setLength(BufferedRandomAccessFile.BuffSz + 1L);
		raf.seek(1);
		
		for (int i = 0; i < BufferedRandomAccessFile.BuffSz / 8; i++) {
			assertEquals(0L, raf.readLong());
		}
		raf.close();
	}

That test checks that setLength fills the new content with zeros when it extends the file. But that isn't promised by the underlying RandomAccessFile:

If the present length of the file as returned by the length method is smaller than the newLength argument then the file will be extended. In this case, the contents of the extended portion of the file are not defined.

Do you know if any clients of BufferedRandomAccessFile actually require the fill-with-zeros behavior? We could certainly add that as a guarantee. But, since it isn't promised by RandomAccessFile.setLength, we might incur some performance penalty doing it ourselves.

@lemmy
Copy link
Member Author

lemmy commented Jan 3, 2024

I've reviewed the different fingerprint sets to refresh my memory. I recall that the OffHeapDiskFPSet needs its in-memory representation to be zeroed. However, when using disk files, there's no need to zero the files. If I remember correctly, the FPSets always write data up to the full length of the file. I also don't recall any clients in the liveness package to require zeroed files.

Calvin-L added a commit that referenced this issue Apr 10, 2024
This commit makes major alterations to `BufferedRandomAccessFile` to
bring it in line with the general `RandomAccessFile` contract:

 - Invariant V1 has been strengthened to include a contract for the
   `length` field.

 - Invariants V2-V5 have been altered to account for cases where
   `curr > length`, which the `RandomAccessFile` contract allows.

 - Additionally, invariant V5 has been adjusted to clarify that
   `dirty=true` does not imply that there are unflushed bytes.  The
   implication is only one-way: if there are unflushed bytes, then
   `dirty=true`.

 - The `hi` and `maxHi` fields have been removed.  These largely
   duplicate information that can be quickly recomputed, meaning they
   add additional complexity to the class invariants.  Without those
   fields, invariant V6 can be removed.

 - The `close()` method now closes the underlying file descriptor even
   if `flush()` throws an exception.

Due to the complexity of this change, I have added a TLA+ specification
for `BufferedRandomAccessFile` to enable model checking of its design.

To ensure that the code is correct, this commit also adds a substantial
amount of test infrastructure:

 - Several new regression tests have been added.  These are a little
   cryptic because they were randomly generated, but they do reflect
   true behaviors of the `RandomAccessFile` contract.

 - A fuzz test has been added.  This test ensures that many sequences
   of random operations behave correctly.  It also minimizes failures
   so they can be easily converted to regression tests.

This commit also makes some whitespace and formatting adjustments.

Fixes #835.  I have verified that the tests now pass under Java 21.
@Calvin-L Calvin-L linked a pull request Apr 10, 2024 that will close this issue
Calvin-L added a commit that referenced this issue Apr 10, 2024
This commit makes major alterations to `BufferedRandomAccessFile` to
bring it in line with the general `RandomAccessFile` contract:

 - Invariant V1 has been strengthened to include a contract for the
   `length` field.

 - Invariants V2-V5 have been altered to account for cases where
   `curr > length`, which the `RandomAccessFile` contract allows.

 - Additionally, invariant V5 has been adjusted to clarify that
   `dirty=true` does not imply that there are unflushed bytes.  The
   implication is only one-way: if there are unflushed bytes, then
   `dirty=true`.

 - The `hi` and `maxHi` fields have been removed.  These largely
   duplicate information that can be quickly recomputed, meaning they
   add additional complexity to the class invariants.  Without those
   fields, invariant V6 can be removed.

 - The `close()` method now closes the underlying file descriptor even
   if `flush()` throws an exception.

Due to the complexity of this change, I have added a TLA+ specification
for `BufferedRandomAccessFile` to enable model checking of its design.

To ensure that the code is correct, this commit also adds a substantial
amount of test infrastructure:

 - Several new regression tests have been added.  These are a little
   cryptic because they were randomly generated, but they do reflect
   true behaviors of the `RandomAccessFile` contract.

 - A fuzz test has been added.  This test ensures that many sequences
   of random operations behave correctly.  It also minimizes failures
   so they can be easily converted to regression tests.

This commit also makes some whitespace and formatting adjustments.

Fixes #835.  I have verified that the tests now pass under Java 21.

Signed-off-by: Calvin Loncaric <calvin.loncaric@oracle.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug error, glitch, fault, flaw, ... Tools The command line tools - TLC, SANY, ...
Development

Successfully merging a pull request may close this issue.

2 participants