-
-
Notifications
You must be signed in to change notification settings - Fork 187
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rework BufferedRandomAccessFile
#907
base: master
Are you sure you want to change the base?
Conversation
This commit makes major alterations to `BufferedRandomAccessFile` to bring it in line with the general `RandomAccessFile` contract: - Invariant V1 has been strengthened to include a contract for the `length` field. - Invariants V2-V5 have been altered to account for cases where `curr > length`, which the `RandomAccessFile` contract allows. - Additionally, invariant V5 has been adjusted to clarify that `dirty=true` does not imply that there are unflushed bytes. The implication is only one-way: if there are unflushed bytes, then `dirty=true`. - The `hi` and `maxHi` fields have been removed. These largely duplicate information that can be quickly recomputed, meaning they add additional complexity to the class invariants. Without those fields, invariant V6 can be removed. - The `close()` method now closes the underlying file descriptor even if `flush()` throws an exception. Due to the complexity of this change, I have added a TLA+ specification for `BufferedRandomAccessFile` to enable model checking of its design. To ensure that the code is correct, this commit also adds a substantial amount of test infrastructure: - Several new regression tests have been added. These are a little cryptic because they were randomly generated, but they do reflect true behaviors of the `RandomAccessFile` contract. - A fuzz test has been added. This test ensures that many sequences of random operations behave correctly. It also minimizes failures so they can be easily converted to regression tests. This commit also makes some whitespace and formatting adjustments. Fixes #835. I have verified that the tests now pass under Java 21. Signed-off-by: Calvin Loncaric <calvin.loncaric@oracle.com>
cd7cf92
to
297c4c8
Compare
Impressive! |
tlatools/org.lamport.tlatools/src/tlc2/util/BufferedRandomAccessFile.tla
Show resolved
Hide resolved
tlatools/org.lamport.tlatools/src/tlc2/util/BufferedRandomAccessFile.tla
Show resolved
Hide resolved
tlatools/org.lamport.tlatools/src/tlc2/util/BufferedRandomAccessFile.tla
Show resolved
Hide resolved
tlatools/org.lamport.tlatools/src/tlc2/util/BufferedRandomAccessFile.tla
Outdated
Show resolved
Hide resolved
ArraySlice(a, startInclusive, endExclusive) == [elems |-> SubSeq(a.elems, startInclusive + 1, endExclusive)] | ||
ArrayConcat(a1, a2) == [elems |-> a1.elems \o a2.elems] | ||
|
||
WriteToFile(file, offset, data_to_write) == |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is interesting to me that most of these functions are about Arrays. Even WriteToFile
and ExtendFile
are relatively easy make generic array operations, maybe OverlapAt
and TruncateOrExtendWith
, where both take a placeholder as an argument. It may simply be worth making this an Array
module, and then Common
stays more focused on the spec itself WriteToFile(file, offset, data_to_write) == Array.Overlap(file, data_to_write, offset, ArbitarySymbol)
(I have no idea why I felt compelled to reverse arguments).
I suppose, otherwise I was going to suggest that we add brief documentation on what the semantics of WriteToFile
and ExtendFile
were, but in many ways something like ExtendFile == Array.TurncateOrExtendWith(file, new_length, ArbitrarySymbol)
does this for you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am probably overthinking these utility functions, but it struck me that maybe the most intuitive implementation involves slice & concat.
TruncateOrExtend(a, T, len) == IF len > ArrayLen(a) THEN ArrayConcat(a, Array(T, len - ArrayLen(a))) ELSE ArraySlice(a, 0, len)
OverlayAt(a1, a2, T, i) == ArrayConcat(TruncateOrExtend(a2, T, i), a2)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should any of those utility operators be contributed as an Array module to our CommunityModules?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some simple documentation to these operators.
but it struck me that maybe the most intuitive implementation involves slice & concat
I agree that the current definitions are a little ugly. I wrote them after one or two failed attempts to come up with simpler definitions.
The ones you've given are certainly cleaner than my early attempts, and I will adopt your definition for TruncateOrExtend
. But I might keep the definition I wrote for WriteToFile
. The equivalent definition in terms of other operators is actually fairly cryptic (your definition for OverlayAt
needs to include the portion of the contents after the written data if the contents were not written at the end of the array):
WriteToFile(file, offset, data_to_write) ==
ArrayConcat(ArrayConcat(
TruncateOrExtendFile(file, offset),
data_to_write),
ArraySlice(file, offset + ArrayLen(data_to_write), ArrayLen(file)))
Should any of those utility operators be contributed as an Array module to our CommunityModules?
Yes! I'll look into doing that soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
your definition for OverlayAt needs to include the portion of the contents after the written data if the contents were not written at the end of the array
Whoops! Easy to make things simpler if you forget the critical parts 😆
tlatools/org.lamport.tlatools/src/tlc2/util/BufferedRandomAccessFile.tla
Outdated
Show resolved
Hide resolved
tlatools/org.lamport.tlatools/src/tlc2/util/BufferedRandomAccessFile.java
Show resolved
Hide resolved
tlatools/org.lamport.tlatools/src/tlc2/util/BufferedRandomAccessFile.java
Outdated
Show resolved
Hide resolved
tlatools/org.lamport.tlatools/src/tlc2/util/BufferedRandomAccessFile.java
Show resolved
Hide resolved
@@ -0,0 +1,384 @@ | |||
\* Copyright (c) 2024, Oracle and/or its affiliates. | |||
|
|||
----------------------- MODULE BufferedRandomAccessFile ----------------------- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wanted to say that I have absolutely found it useful to have a spec of the class for review, and especially as a refinement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This spec could also be referenced at tlaplus/examples
tlatools/org.lamport.tlatools/src/tlc2/util/BufferedRandomAccessFile.java
Show resolved
Hide resolved
tlatools/org.lamport.tlatools/src/tlc2/util/BufferedRandomAccessFile.java
Outdated
Show resolved
Hide resolved
tlatools/org.lamport.tlatools/src/tlc2/util/BufferedRandomAccessFile.java
Outdated
Show resolved
Hide resolved
tlatools/org.lamport.tlatools/src/tlc2/util/BufferedRandomAccessFile.java
Show resolved
Hide resolved
tlatools/org.lamport.tlatools/src/tlc2/util/BufferedRandomAccessFile.java
Show resolved
Hide resolved
tlatools/org.lamport.tlatools/src/tlc2/util/BufferedRandomAccessFile.tla
Show resolved
Hide resolved
Signed-off-by: Calvin Loncaric <calvin.loncaric@oracle.com>
Signed-off-by: Calvin Loncaric <calvin.loncaric@oracle.com>
This commit also updates the specification to model EXACTLY what the `RandomAccessFile` docs say about the interaction between `setLength` and the file pointer. Signed-off-by: Calvin Loncaric <calvin.loncaric@oracle.com>
Signed-off-by: Calvin Loncaric <calvin.loncaric@oracle.com>
Signed-off-by: Calvin Loncaric <calvin.loncaric@oracle.com>
Signed-off-by: Calvin Loncaric <calvin.loncaric@oracle.com>
Signed-off-by: Calvin Loncaric <calvin.loncaric@oracle.com>
Thanks a ton @IamfromSpace for the feedback! I think I've addressed all the major comments so far. |
These updates look great! For the sake of thoroughness I still have a few comments coming through on tests, but nothing of real concern. Just wrapped up the unit tests, but it's taking me a bit longer to digest the fuzzing file :) |
tlatools/org.lamport.tlatools/test/tlc2/util/BufferedRandomAccessFileTest.java
Outdated
Show resolved
Hide resolved
tlatools/org.lamport.tlatools/test/tlc2/util/BufferedRandomAccessFileTest.java
Outdated
Show resolved
Hide resolved
tlatools/org.lamport.tlatools/test/tlc2/util/BufferedRandomAccessFileTest.java
Show resolved
Hide resolved
tlatools/org.lamport.tlatools/test/tlc2/util/BufferedRandomAccessFileTest.java
Show resolved
Hide resolved
tlatools/org.lamport.tlatools/test/tlc2/util/BufferedRandomAccessFileTest.java
Show resolved
Hide resolved
tlatools/org.lamport.tlatools/test/tlc2/util/BufferedRandomAccessFileFuzzTest.java
Show resolved
Hide resolved
tlatools/org.lamport.tlatools/test/tlc2/util/BufferedRandomAccessFileFuzzTest.java
Outdated
Show resolved
Hide resolved
tlatools/org.lamport.tlatools/test/tlc2/util/BufferedRandomAccessFileFuzzTest.java
Outdated
Show resolved
Hide resolved
tlatools/org.lamport.tlatools/test/tlc2/util/BufferedRandomAccessFileFuzzTest.java
Outdated
Show resolved
Hide resolved
tlatools/org.lamport.tlatools/test/tlc2/util/BufferedRandomAccessFileFuzzTest.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alrighty, that's 100% everything from me :) I think there's some test comments that are worth a look over, but there's certainly no deal-breakers in there, so I've got the approval on there now. Not sure if the repo is configured to reset on changes or not, but if so, I'll keep it active for any commits that come in.
Nice work on this!
Signed-off-by: Calvin Loncaric <calvin.loncaric@oracle.com>
Signed-off-by: Calvin Loncaric <calvin.loncaric@oracle.com>
In particular, "legal" is now "well-defined". Signed-off-by: Calvin Loncaric <calvin.loncaric@oracle.com>
Signed-off-by: Calvin Loncaric <calvin.loncaric@oracle.com>
I won't have time to review this PR anytime soon and I don't want to be a bottleneck. Given @IamfromSpace's review, I feel confident about this PR. @Calvin-L, feel free to merge or consider inviting additional reviewers at the TLA+ Community Meeting in two weeks. Reviewing the spec offers a great opportunity for those unfamiliar with or not fond of Java to contribute to the TLA+ tools. |
tlatools/org.lamport.tlatools/src/tlc2/util/BufferedRandomAccessFile.java
Show resolved
Hide resolved
tlatools/org.lamport.tlatools/src/tlc2/util/BufferedRandomAccessFile.java
Show resolved
Hide resolved
// But do not be tempted! That call incurs a performance penalty WITHOUT OFFERING A STRONGER CONTRACT! There | ||
// are still cases where we rely on "super" to extend the file---meaning we inherit the weakest possible | ||
// contract from RandomAccessFile. In particular, this happens when extending the file using setLength() or | ||
// when the client makes a giant forward seek followed by a write. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment could imply a possible completeness bug in DiskFPSet
s.t. the arbitrary data matches a fingerprint that is probed. This particular bug would vanish if the content of the buffer was zeroed.
tlatools/org.lamport.tlatools/src/tlc2/util/BufferedRandomAccessFile.java
Show resolved
Hide resolved
\* is generally restored by calling `seeek(curr)`. But, that behavior is | ||
\* difficult to model in straight TLA+ because each method may modify variables | ||
\* multiple times. So instead, this spec treats Inv2 as a precondition for the | ||
\* methods and verifies that it is indeed restored by `Seek(pos)`, which should |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inv2CanAlwaysBeRestored
suggests that this should mention FlushBuffer
in addition to Seek
.
|
||
\* Inv2 is a precondition for many actions; it should always be possible to | ||
\* restore Inv2. | ||
Inv2CanAlwaysBeRestored == |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I translated the prose comment into the following property:
Inv2CanAlwaysBeRestoredPart2 ==
[][IF dirty THEN FlushBuffer => Inv2' ELSE Seek(curr) => Inv2']_vars
, which TLC says is violated by the following trace:
Finished computing initial states: 1 distinct state generated at 2024-06-03 15:53:42.
Action property Inv2CanAlwaysBeRestored is violated.
The behavior up to this point is:
1: <Initial predicate>
/\ BuffSz = 2
/\ MaxOffset = 4
/\ dirty = FALSE
/\ length = 0
/\ curr = 0
/\ lo = 0
/\ buff = [elems |-> <<ArbitrarySymbol, ArbitrarySymbol>>]
/\ diskPos = 0
/\ file_content = [elems |-> <<>>]
/\ file_pointer = 0
/\ abstract_contents = [elems |-> <<>>]
2: <WriteAtMost([elems |-> <<A, A>>]) line 193, col 5 to line 202, col 60 of module BufferedRandomAccessFile>
/\ BuffSz = 2
/\ MaxOffset = 4
/\ dirty = TRUE
/\ length = 2
/\ curr = 2
/\ lo = 0
/\ buff = [elems |-> <<A, A>>]
/\ diskPos = 0
/\ file_content = [elems |-> <<>>]
/\ file_pointer = 0
/\ abstract_contents = [elems |-> <<A, A>>]
3: <FlushBuffer line 109, col 5 to line 119, col 43 of module BufferedRandomAccessFile>
/\ BuffSz = 2
/\ MaxOffset = 4
/\ dirty = FALSE
/\ length = 2
/\ curr = 2
/\ lo = 0
/\ buff = [elems |-> <<A, A>>]
/\ diskPos = 2
/\ file_content = [elems |-> <<A, A>>]
/\ file_pointer = 2
/\ abstract_contents = [elems |-> <<A, A>>]
Revisiting https://github.com/tlaplus/tlaplus/pull/907/files#r1625145343, this property, perhaps, too strong. This hypothesis is supported because the following holds:
Inv2CanAlwaysBeRestoredPart2 ==
[][Seek(curr) => Inv2']_vars
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a fantastic point. I will look into strengthening this property on Monday. Perhaps it can be fully specified using \cdot
to compose FillBuffer
and Seek(curr)
.
WriteToFile(file, offset, data_to_write) == | ||
LET file_len == ArrayLen(file) IN | ||
LET data_len == ArrayLen(data_to_write) IN | ||
LET length == Max(file_len, offset + data_len) IN |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick and you probably know this, but I prefer a single LET/IN:
LET file_len == ArrayLen(file)
data_len == ArrayLen(data_to_write)
length == Max(file_len, offset + data_len) IN
/\ numReadableWithoutSeeking >= 0 | ||
/\ LET numToRead == Min(ArrayLen(data), numReadableWithoutSeeking) IN | ||
LET buffOff == curr - lo IN | ||
/\ data = ArraySlice(buff, buffOff, buffOff + numToRead) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neither Read
nor Read1
are enabled if the buffer contains ArbitrarySymbol
, i.e., the following invariant holds:
DebugReadsWithArbitrarySymbol ==
\E i \in 1..Len(buff.elems): buff.elems[i] = ArbitrarySymbol =>
/\ \A data \in UNION{ Array(Symbols, len): len \in 1..MaxOffset }:
~ENABLED Read(data)
/\ \A symbol \in Symbols:
~ENABLED Read1(symbol)
Why this restriction, given that BufferedRandomAccessFile.java
does not impose such a restriction?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right; I believe that restriction can and should be relaxed. It was not intentional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lemmy There is an associativity problem in your definition; I think it should be
DebugReadsWithArbitrarySymbol ==
(\E i \in 1..Len(buff.elems): buff.elems[i] = ArbitrarySymbol) => \* NOTE parens around \E expression
/\ ...
Using that modified definition shows that Read
actually is enabled when the read does not cover ArbitrarySymbol
, even if there are ArbitrarySymbol
s in the buffer.
Even so, I do think it makes sense to allow reads of ArbitrarySymbol
in the next-relation.
BufferedIndexes == lo .. (Min(lo + BuffSz, length) - 1) | ||
|
||
Inv1 == | ||
/\ diskPos = file_pointer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This differs from V1
in BufferedRandomAccessFile.java.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good job! I haven't got to the end yet, I've found a read after close issue and add some nits.
tlatools/org.lamport.tlatools/src/tlc2/util/BufferedRandomAccessFile.java
Outdated
Show resolved
Hide resolved
tlatools/org.lamport.tlatools/src/tlc2/util/BufferedRandomAccessFile.java
Show resolved
Hide resolved
tlatools/org.lamport.tlatools/src/tlc2/util/BufferedRandomAccessFile.java
Show resolved
Hide resolved
tlatools/org.lamport.tlatools/src/tlc2/util/BufferedRandomAccessFile.java
Show resolved
Hide resolved
tlatools/org.lamport.tlatools/src/tlc2/util/BufferedRandomAccessFile.java
Show resolved
Hide resolved
tlatools/org.lamport.tlatools/src/tlc2/util/BufferedRandomAccessFile.java
Show resolved
Hide resolved
tlatools/org.lamport.tlatools/src/tlc2/util/BufferedRandomAccessFile.java
Show resolved
Hide resolved
tlatools/org.lamport.tlatools/src/tlc2/util/BufferedRandomAccessFile.java
Show resolved
Hide resolved
return new BigInteger(b); | ||
// If we read past the buffer, restore our internal invariants | ||
if (this.curr >= this.lo + BuffSz) { | ||
seeek(this.curr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seek in this case would be better as the return value is not used
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am inclined to factor this out into a helper method with a name like restoreInv2
, since an identical check and seeek
call appears other places as well. I will make that change soon. (When I do, I will probably keep the call to seeek
and not seek
, since it avoids an extra call stack entry.)
System.arraycopy(availBuffs, 0, newBuffs, 0, numAvailBuffs); | ||
availBuffs = newBuffs; | ||
try { | ||
// Assert.check(!this.closed); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
currently, the implementation allows read after close. We close the underlying file, give back the buffer to the pool but the buffer is not nullified, and subsequent reads that don't access the file will succeed.
Sample test adopted from the testRead (see last three rows)
@Test
public void testRead() throws IOException {
final File tmpFile = File.createTempFile("BufferedRandomAccessFileTest_testRead", ".bin");
tmpFile.deleteOnExit();
final java.io.RandomAccessFile raf = new BufferedRandomAccessFile(tmpFile, "rw");
for (long i = 0L; i < BufferedRandomAccessFile.BuffSz / 8; i++) {
raf.writeLong(i);
}
raf.seek(0);
for (long i = 0l; i < BufferedRandomAccessFile.BuffSz / 8; i++) {
assertEquals(i, raf.readLong());
}
raf.seek(0); //seek back to zero
raf.close(); // closed
assertEquals(0, raf.readLong()); // still works
}
This could be a problem in case the buffer is reassigned to a different BRAF. To reduce the state, it would be sufficient to call "isClosed" on the underlinying RAF. Because there is no isClosed method, getFilePointer could be used for its side effect of failing if the underlying object is closed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have mixed feelings about this.
the implementation allows read after close
Yes and no. The implementation does not throw an exception if you do a read after close, so in that sense it "allows" read after close. But, the implementation also doesn't specify behavior if you read after close, so in that sense it "disallows" read after close. In other words, clients are responsible for ensuring that they do not interact with closed files.
In this case, I would peronsally rather push responsibility to the clients---but it's true that the official RandomAccessFile contract does require IOException
to be thrown for some read and write methods if the file is closed (but weirdly, not all of them). In your specific example there is no specified behavior, and therefore the current behavior ("return an arbitrary long without throwing an exception") is legal. But if you replace the final readLong()
with read(byte[])
, it is indeed a demonstration of a bug, since that method requires an exception if the file is closed.
Also, because you called this out I noticed that this close()
method is not idempotent, and it must be since this class implements Closeable
. A duplicated close()
call will put buff
into the availBuffs
pool a second time, which could be catastrophic.
I think all this means that there is a strong case for reinstating the commented-out closed
field, which would make it very easy to fix these problems. I will look into doing this on Monday.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside: it is also irksome to me that the official contract for RandomAccessFile
is to throw IOException
if the file is closed, when IllegalStateException
exists and was designed for exactly this purpose:
Signals that a method has been invoked at an illegal or inappropriate time.
But, to my surprise, RandomAccessFile
is older than IllegalStateException
, so I suppose it makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@FedericoPonzi @lemmy On second thought, I do not plan to address this comment in this PR. There is no new bug here, so I think it makes sense to address this in a separate PR.
Signed-off-by: Calvin Loncaric <calvin.loncaric@oracle.com>
Signed-off-by: Calvin Loncaric <calvin.loncaric@oracle.com>
This commit makes major alterations to
BufferedRandomAccessFile
to bring it in line with the generalRandomAccessFile
contract:length
field.curr > length
, which theRandomAccessFile
contract allows.dirty=true
does not imply that there are unflushed bytes. The implication is only one-way: if there are unflushed bytes, thendirty=true
.hi
andmaxHi
fields have been removed. These largely duplicate information that can be quickly recomputed, meaning they add additional complexity to the class invariants. Without those fields, invariant V6 can be removed.close()
method now closes the underlying file descriptor even ifflush()
throws an exception.Due to the complexity of this change, I have added a TLA+ specification for
BufferedRandomAccessFile
to enable model checking of its design.To ensure that the code is correct, this commit also adds a substantial amount of test infrastructure:
RandomAccessFile
contract.This commit also makes some whitespace and formatting adjustments.
Fixes #835. I have verified that the tests now pass under Java 21.