BlobInputStream improvements #3148
base: master
Conversation
PG 8.4: This failure is due to the lower max blob size, causing an exception when requesting to seek too far. There doesn't seem to be any way to determine the DB version inside the … That also reveals the bug (already extant in mark/reset) I've just noticed, where …
Both can be solved in the same way! In the constructor, call …
```java
long maxBlobSize;
try {
  this.absolutePosition = lo.tell64();
  maxBlobSize = 4L * 1024 * 1024 * 1024 * 1024 - 2 * 1024;
```
Can you provide a link or explanation for this math? I realize the docs say 4 TB. A link to that would be good.
The docs indeed say 4TB, but you cannot actually get that big (with PG14 anyway).
I don't know why the real limit is 2kB less than 4TB.
What is the point of having the math in the driver? What if PG 19 lifts the limit to 4 exabytes?
How do you suggest determining the correct value otherwise?
If the limit is raised to 4 exabytes, then the LargeObject API will also have to change to take a value larger than a long, so all this code would need changing to support it anyway.
Signed Long.MAX_VALUE is 9223372036854775807L (9 exabytes), and unsigned long is ~18 exabytes, so the current seek64(long pos) is more or less enough for multi-exabyte values. I'm not sure what the point is of artificially limiting the values at the driver level.
> Signed Long.MAX_VALUE is 9223372036854775807L (9 exabytes)

That's 8 EiB, but you're right, I miscounted the digits.

> what is the point of artificially limiting the values at the driver level

If you don't, then it throws. If you prefer that behaviour then I can remove maxBlobSize, but that would be a breaking change for skip().
> Can you provide a link or explanation for this math? I realize the docs say 4TB. Link to that would be good

Here is a link that you might be able to use:
https://www.enterprisedb.com/postgres-tutorials/postgresql-toast-and-working-blobsclobs-explained
It seems crazy, but I think the only mention of the 4TB upper limit on blobs on postgresql.org is in the release notes for 9.3 under "E.26.3.5. Data Types": https://www.postgresql.org/docs/9.3/release-9-3.html
@esabol: https://www.postgresql.org/docs/16/lo-intro.html
The problem is that it's not 4 TB (TiB). It's slightly less.
> The problem is that it's not 4 TB (TiB). It's slightly less.

I'm aware of that. @davecramer asked for a link (presumably to be added as a comment in the code), so I thought I'd offer what I could find.
I found it: https://github.com/postgres/postgres/blob/REL_16_2/src/include/storage/large_object.h#L76
That's (2^31 - 1) * 2048 == 4 * 1024^4 - 2048.
I'll change the code here to match, with a link.
Although, you could compile pg with a different BLCKSZ, which would change the resulting limit. That brings us back to: should we just ignore it entirely, or should we SHOW block_size; somewhere?
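For reference, a sketch of how that limit falls out of the constants in large_object.h (LOBLKSIZE = BLCKSZ / 4, pages counted by a signed 32-bit int); the `SHOW block_size;` idea would supply blockSize at runtime instead of assuming the default 8 kB. Names here are illustrative, not the driver's:

```java
public class MaxBlobSize {
    // From src/include/storage/large_object.h: a large object is addressed
    // by a signed 32-bit page number, each page holding LOBLKSIZE bytes,
    // where LOBLKSIZE = BLCKSZ / 4.
    static long maxBlobSize(int blockSize) {
        long loBlkSize = blockSize / 4;               // LOBLKSIZE
        return (long) Integer.MAX_VALUE * loBlkSize;  // (2^31 - 1) * LOBLKSIZE
    }

    public static void main(String[] args) {
        // The default 8 kB block size gives the familiar "4 TiB minus 2 kB":
        long max = maxBlobSize(8192);
        System.out.println(max == 4L * 1024 * 1024 * 1024 * 1024 - 2048); // true
    }
}
```

A server compiled with a different BLCKSZ would simply feed a different blockSize into the same formula.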
```java
  maxBlobSize = 4L * 1024 * 1024 * 1024 * 1024 - 2 * 1024;
} catch (SQLException e1) {
  try {
    this.absolutePosition = lo.tell();
```
Document what this is doing?
This line specifically, or the whole added section?
Seems to me that we are switching from LO to bytea here?
The whole section. Frankly, the code is non-obvious, so it makes sense to document the considered edge cases.
It is sad we increase the number of roundtrips for a common case of "blob is used only once". Can we avoid adding extra db calls in the constructor?
> seems to me that we are switching from LO to bytea here ?

No? On old versions of Postgres (which you still test against), large objects only use an int for their position, with a limit of 2GB: https://www.postgresql.org/docs/9.2/lo-intro.html
See also the existing implementation of reset(), which checks whether it can use seek instead of seek64.
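The fallback being described (try the 64-bit call, drop to the 32-bit one where the server lacks lo_lseek64) can be sketched against a stand-in interface. `Lo`, `seekTo`, and the exception type are all illustrative, not pgjdbc's real API:

```java
public class SeekFallback {
    // Hypothetical stand-in for the two positioning calls on LargeObject.
    interface Lo {
        void seek64(long pos); // may be unsupported on pre-9.3 servers
        void seek(int pos);
    }

    // Prefer the 64-bit seek; fall back to the 32-bit variant when the
    // server rejects it and the position still fits in an int.
    static void seekTo(Lo lo, long pos) {
        try {
            lo.seek64(pos);
        } catch (UnsupportedOperationException e) {
            if (pos > Integer.MAX_VALUE) {
                throw new IllegalStateException("position needs lo_lseek64: " + pos);
            }
            lo.seek((int) pos);
        }
    }
}
```

The real driver signals the missing server function with a SQLException rather than an unchecked exception; the shape of the fallback is the same.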
> Can we avoid adding extra db calls in the constructor?

Not really. Someone might call mark() before anything else, and the position needs to be correctly initialised before then.
I suppose you could add extra logic to defer the initialisation to the first call to either read(), mark(), reset(), or skip(). That didn't seem like a good idea to me though. Better to guarantee that it's been done before anything else can be called.
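The deferred-initialisation alternative dismissed here would look roughly like the following guard pattern (a sketch with stand-in names; tell64() here just fakes the DB round trip):

```java
public class LazyInit {
    private boolean initialised;
    private long absolutePosition;

    // Stand-in for lo.tell64(); the real call is a database round trip.
    private long tell64() {
        return 42L;
    }

    // Every public operation would have to call this first, which is easy
    // to forget; doing the work in the constructor avoids that hazard.
    private void ensureInitialised() {
        if (!initialised) {
            absolutePosition = tell64();
            initialised = true;
        }
    }

    public long position() {
        ensureInitialised();
        return absolutePosition;
    }
}
```

The trade-off is exactly the one stated above: one saved round trip for never-used streams versus a guard that every entry point must remember to call.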
Looks easier than I thought
```java
long maxBlobSize;
try {
  this.absolutePosition = lo.tell64();
  maxBlobSize = 4L * 1024 * 1024 * 1024 * 1024 - 2 * 1024;
```
What are these magic numbers? Could you add clarifications?
```java
    }
  } catch (SQLException e) {
    throw new IOException(
        GT.tr("Can not skip stream for large object {0} by {1}",
```
Wdyt of adding the current position to the error message?
```java
try (LargeObject blob = lom.open(loid, LargeObjectManager.WRITE)) {
  byte[] buf = new byte[1024];
  for (byte i = 0; i < 96; i++) {
    Arrays.fill(buf, i);
```
I would prefer random data, since 11111 22222 333 might mask bugs in case read() accesses the wrong buffer item.
The test has to know which value is expected. Random data would be difficult to verify by hand.
As there are only 256 possible byte values, there will always be duplicates in the data, so you cannot prove it wasn't accidentally reading a different byte that happened to have the same value.
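A sketch of the patterned layout under discussion (assuming createMediumLargeObject writes 96 consecutive 1 KiB blocks, block i filled with byte value i, as the quoted loop suggests), which makes any expected byte derivable by hand as position / 1024:

```java
import java.util.Arrays;

public class PatternedData {
    // 96 KiB of data where the byte at offset p has value p / 1024.
    static byte[] mediumObject() {
        byte[] data = new byte[96 * 1024];
        byte[] buf = new byte[1024];
        for (byte i = 0; i < 96; i++) {
            Arrays.fill(buf, i);
            System.arraycopy(buf, 0, data, i * 1024, 1024);
        }
        return data;
    }

    public static void main(String[] args) {
        byte[] data = mediumObject();
        // The expected value at any position is simply position / 1024:
        System.out.println(data[65 * 1024]); // 65
    }
}
```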
```java
assertThrows(IOException.class, () -> bis.skip(1));
}
```

```java
try (LargeObject blob = lom.open(loid, LargeObjectManager.READ)) {
```
Can you please use parameterized tests or at least individual test methods with their own names? Currently, it is hard to understand the intention behind the added tests
@davecramer @vlsi can I get your input to finish resolving all the above? If necessary, I can split the bug fix commit to a separate PR.
@davecramer @vlsi I've not had any responses for two months now.
```java
// initialise current position for mark/reset
this.absolutePosition = lo.tell64();
// https://github.com/postgres/postgres/blob/REL9_3_0/src/include/storage/large_object.h#L76
maxBlobSize = (long) Integer.MAX_VALUE * LO_BLOCK_SIZE;
```
I understand the backend might have its own limits. What is the purpose of adding extra validation at the driver level though? Why can't we assume the max limit is Long.MAX_VALUE, and expect an error from the DB if the specific DB can't handle the given size?
Because that is a breaking change to the behaviour of skip(long). #3148 (comment)
What it currently does is skip to the end of the current input for all values larger than the remaining input. This includes returning 0 if it's already at the end.
I had another idea recently. We could provide a setter to allow users to change it if they have built postgres with a custom block size.
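The clamping contract being defended can be sketched as follows (requested, position, and maxSize are illustrative names, not the driver's actual fields):

```java
public class ClampedSkip {
    // Sketch of skip() clamping at an upper bound instead of letting the
    // server throw: skip at most to maxSize, returning the number of bytes
    // actually skipped.
    static long skip(long requested, long position, long maxSize) {
        if (requested <= 0) {
            return 0; // InputStream.skip may skip nothing for n <= 0
        }
        // Compute the remaining room first; position + requested could overflow.
        long remaining = maxSize - position;
        return Math.min(requested, remaining); // 0 if already at the end
    }

    public static void main(String[] args) {
        System.out.println(ClampedSkip.skip(Long.MAX_VALUE, 10, 100)); // 90
        System.out.println(ClampedSkip.skip(5, 100, 100));             // 0: already at the end
    }
}
```

Removing the clamp means the second case throws from the server instead of returning 0, which is the breaking change referred to above.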
I suggest we defer the call to the database. I do not see value in artificially adding a limit on the driver's side.
Do I therefore need to split the PR so that the bug fix doesn't have to wait for v43?
Of course older postgres behaves differently, looking at the 8.4 test failure. I was expecting it to throw due to seek64 not being available.
Edit: it does, but that exception does not fail the transaction. I guess I can just change the test to expect different behaviour for the older versions?
```java
assertEquals(0, bis.read());
assertEquals(1024L, bis.skip(1024));
assertEquals(1, bis.read());
assertEquals(0L, bis.skip(-1));
assertEquals(1, bis.read());
assertEquals(64 * 1024L, bis.skip(64 * 1024));
assertEquals(65, bis.read());
assertEquals(expectedBigSkip, bis.skip(Long.MAX_VALUE));
assertEquals(-1, bis.read());
```
Please add the corresponding messages describing the reason why we expect certain values. If something goes wrong, it would be hard to tell why 65 was expected at line 278.
I'm not really sure how to phrase it. What exactly do you want?
The reason why it should be 65 etc. is obvious if you know the structure of the file. I can add a comment to each test describing it, if the code in createMediumLargeObject is insufficient?
Or do you want the aspect of the implementation that is being tested? Is there any preference on phrasing, given that it's supposed to describe the failure, not describe what it was testing?
No other assertions in this project use messages, so I have nothing to go on.
> The reason why it should be 65 etc. is obvious if you know the structure of the file.

I am afraid it is not obvious.

> No other assertions in this project use messages

I am afraid that is not quite right.
Please check here:
pgjdbc/pgjdbc/src/test/java/org/postgresql/util/LazyCleanerTest.java, lines 63 to 103 in d12b625:
```java
assertEquals(
    list.size(),
    t.getWatchedCount(),
    "All objects are strongly-reachable, so getWatchedCount should reflect it"
);
assertTrue(t.isThreadRunning(),
    "cleanup thread should be running, and it should wait for the leaks");
cleaners.get(1).clean();
assertEquals(
    list.size() - 1,
    t.getWatchedCount(),
    "One object has been released properly, so getWatchedCount should reflect it"
);
list.set(0, null);
System.gc();
System.gc();
Await.until(
    "One object was released, and another one has leaked, so getWatchedCount should reflect it",
    ofSeconds(5),
    () -> t.getWatchedCount() == list.size() - 2
);
list.clear();
System.gc();
System.gc();
Await.until(
    "The cleanup thread should detect leaks and terminate within 5-10 seconds after GC",
    ofSeconds(10),
    () -> !t.isThreadRunning()
);
assertEquals(
    Arrays.asList("LEAK", "NO LEAK", "LEAK").toString(),
    Arrays.asList(collected).toString(),
    "Second object has been released properly, so it should be reported as NO LEAK"
```
pgjdbc/pgjdbc/src/test/java/org/postgresql/test/jdbc2/BatchedInsertReWriteEnabledTest.java, lines 421 to 433 in d12b625:

```java
Assert.assertEquals(
    "Insert with " + nBinds + " binds should be rewritten into multi-value insert"
        + ", so expecting Statement.SUCCESS_NO_INFO == -2",
    Arrays.toString(new int[]{Statement.SUCCESS_NO_INFO, Statement.SUCCESS_NO_INFO}),
    Arrays.toString(pstmt.executeBatch()));
} else {
  Assert.assertEquals(
      "Insert with " + nBinds + " binds can't be rewritten into multi-value insert"
          + " since write format allows 65535 binds maximum"
          + ", so expecting batch to be executed as individual statements",
      Arrays.toString(new int[]{1, 1}),
      Arrays.toString(pstmt.executeBatch()));
}
```
> Is there any preference on phrasing, given that it's supposed to describe the failure, not describe what it was testing?

https://github.com/pgjdbc/pgjdbc/blob/master/CONTRIBUTING.md#test
CONTRIBUTING.md#test: "General rule: failing test should look like a good bug report."
In other words, the message should describe the reason you expect something to equal something else.
Note: if the test class and test method give enough information, it might be fine to omit the message. However, in your case you put ~10 assertions one after another, so it is important to make the failures distinct.

> No other assertions in this project use messages, so I have nothing to go on

There's maintenance cost. If the test fails sometime in the future, the one who observes the failure would have to pay the cost of analyzing the failure and deciphering the intentions behind the test. Of course the current code is not all perfect; however, it is important to keep the added code maintainable.
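Following that guideline, one of the assertions from the skip test could carry a message like this (a sketch: the local assertEquals merely mirrors JUnit 5's (expected, actual, message) ordering so the snippet runs standalone, and the wording is only a suggestion):

```java
public class MessageExample {
    // Local stand-in with JUnit 5's (expected, actual, message) signature,
    // so the snippet runs without the junit dependency.
    static void assertEquals(long expected, long actual, String message) {
        if (expected != actual) {
            throw new AssertionError(message + " ==> expected " + expected + ", was " + actual);
        }
    }

    public static void main(String[] args) {
        long skipped = 64 * 1024L; // stand-in for bis.skip(64 * 1024)
        int next = 65;             // stand-in for the following bis.read()
        assertEquals(64 * 1024L, skipped,
            "skip(64 * 1024) should skip the full amount, crossing the internal buffer");
        assertEquals(65, next,
            "each 1 KiB block is filled with its index, so after the skip the stream should be in block 65");
        System.out.println("ok");
    }
}
```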
It might be that some of the assertions could be split into their own test methods. Then the failures would be easier to analyze and reproduce.
> I am afraid it is not obvious.

I don't know how to write a good explanation because it really is obvious to me. If you know a stream goes [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4], and you do read(); skip(5); read(), then how is it not obvious that the results should be 1 and 3?
I cannot put the full structure of the file into every assert message. I can put things like "failed to skip beyond internal buffer"; that describes what would be wrong. The value returned is arbitrary, but as previously discussed I chose this structure of values so that it's easy to see what the expected value should be, and to minimise the chance of a fluke pass.
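For what it's worth, java.io's own streams agree with that reading; a tiny self-contained check (ByteArrayInputStream standing in for the blob stream):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;

public class SkipDemo {
    // Returns {first read, read after skip(5)} for the given data.
    static int[] readSkipRead(byte[] data) throws IOException {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        int first = in.read();   // consumes index 0
        in.skip(5);              // skips indices 1..5 (the two remaining 1s and two 2s and one more 2)
        int after = in.read();   // reads index 6
        return new int[] {first, after};
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4};
        int[] r = readSkipRead(data);
        System.out.println(r[0] + " then " + r[1]); // 1 then 3
    }
}
```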
> It might be some of the assertions could be split into their own test methods. Then the failures would be easier to analyze and reproduce.

I've done that a little, now that seeking too far throws and breaks the transaction. I was following the style of the existing tests here, which all have multiple assertions. Especially with streams, you need multiple previous operations to set up the state you're testing, e.g. having had the buffer reallocate, or being in a position where it might go backwards.
I'll push the next updates without messages and with more separate methods, and then see what you think.
A BlobInputStream will start reading from the current position of the LargeObject used to create it, but the current/marked position does not account for this. Initialising current position correctly will fix this. Closes pgjdbc#3149
Using seek is more efficient than reading and discarding the bytes. The only behavioural change is it now allows skipping past the natural end of the stream, but the method contract already allows for this. It should not break anything. Closes pgjdbc#3144
Force-pushed from 51ad2f7 to d4a8c7e
BlobInputStream.seek will now throw if attempting to skip beyond the max size for a large object. This is a breaking change. The previous implementation was imperfect as it did not account for the server being compiled with a different limit. The state of the stream after the exception is undefined, but hopefully sane. Further exceptions may be thrown on future operations.
fix: inconsistent behaviour of BlobInputStream.reset()
A BlobInputStream will start reading from the current position of the LargeObject used to create it, but the current/marked position does not account for this.
Initialising current position correctly will fix this.
Closes BlobInputStream does not reset() to correct position if LargeObject was not at position 0. #3149
feat: implement efficient skipping in BlobInputStream
Using seek is more efficient than reading and discarding the bytes. The only behavioural change is it now allows skipping past the natural end of the stream, but the method contract already allows for this. It should not break anything.
Closes BlobInputStream should implement skip #3144
All Submissions:
New Feature Submissions:
- Does ./gradlew styleCheck pass?
- Are entries in alphabetical order?
Changes to Existing Features: