
BlobInputStream improvements #3148

Open · wants to merge 3 commits into base: master
Conversation

@OrangeDog (Contributor) commented Feb 29, 2024

  • fix: inconsistent behaviour of BlobInputStream.reset()

    A BlobInputStream starts reading from the current position of the LargeObject used to create it, but the current/marked position does not account for this. Initialising the current position correctly fixes this.

    Closes BlobInputStream does not reset() to correct position if LargeObject was not at position 0. #3149

  • feat: implement efficient skipping in BlobInputStream

    Using seek is more efficient than reading and discarding the bytes. The only behavioural change is that it now allows skipping past the natural end of the stream, but the method contract already allows for this, so it should not break anything.

    Closes BlobInputStream should implement skip #3144
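The skip-via-seek idea described above can be sketched as follows. This is a minimal sketch with hypothetical names, not the actual pgjdbc patch: SeekableBlobStream, seekAbsolute, and limit are stand-ins for BlobInputStream, the LargeObject seek call, and the server's size limit.

```java
import java.io.IOException;
import java.io.InputStream;

/** Minimal sketch of skip-via-seek for a large-object stream (assumed API). */
abstract class SeekableBlobStream extends InputStream {
    protected long absolutePosition; // current position in the large object
    protected long limit;            // max addressable offset (e.g. 4 TiB - 2 KiB)

    /** Seek the underlying large object to an absolute position. */
    protected abstract void seekAbsolute(long pos) throws IOException;

    @Override
    public long skip(long n) throws IOException {
        if (n <= 0) {
            return 0; // InputStream.skip may skip nothing for non-positive n
        }
        // Clamp so we never ask the server to seek past its hard limit.
        long target = Math.min(absolutePosition + n, limit);
        long skipped = target - absolutePosition;
        seekAbsolute(target);
        absolutePosition = target;
        return skipped;
    }
}
```

A single seek round trip replaces reading and discarding n bytes, and the clamp preserves the InputStream.skip contract of possibly skipping fewer bytes than requested.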

All Submissions:

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

  1. Does your submission pass tests?
  2. Does ./gradlew styleCheck pass ?
  3. Have you added your new test classes to an existing test suite in alphabetical order?

Changes to Existing Features:

  • Does this break existing behaviour? If so please explain.
  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your core changes, as applicable?
  • Have you successfully run tests with your changes locally?

@OrangeDog (Contributor, Author) commented Feb 29, 2024

PG 8.4: This failure is due to the lower max blob size, causing an exception when requesting to seek too far.

There doesn't seem to be any way to determine the DB version inside the BlobInputStream.
The only alternative would be to seek to the end of the blob and then back in the constructor, to get its full size.

That also reveals the bug (already extant in mark/reset) I've just noticed, where absolutePosition is assumed to be 0 on construction, even if the LargeObject is currently in a different place.

@OrangeDog (Contributor, Author):

Both can be solved in the same way!

In the constructor, call tell64(); if that throws, fall back to tell().
Then you know both the current position and the max blob length.
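That fallback could look roughly like the following. This is a sketch under assumed names: Lo is a hypothetical stub for the LargeObject calls, and the two limits follow the values discussed in this thread (4 TiB minus 2 KiB with the default block size, 2 GiB on servers without lo_tell64).

```java
import java.sql.SQLException;

/** Hypothetical stub of the LargeObject calls used below. */
interface Lo {
  long tell64() throws SQLException; // 64-bit position (PG 9.3+)
  int tell() throws SQLException;    // 32-bit position (older servers)
}

/** Sketch of the constructor-time fallback described above (assumed names). */
final class PositionInit {
  final long absolutePosition;
  final long maxBlobSize;

  PositionInit(Lo lo) throws SQLException {
    long pos;
    long max;
    try {
      pos = lo.tell64();
      // Server understands 64-bit offsets: limit is (2^31 - 1) * 2048 = 4 TiB - 2 KiB
      max = 4L * 1024 * 1024 * 1024 * 1024 - 2 * 1024;
    } catch (SQLException e) {
      // lo_tell64 is unavailable, so positions are 32-bit: limit is 2 GiB
      pos = lo.tell();
      max = Integer.MAX_VALUE;
    }
    absolutePosition = pos;
    maxBlobSize = max;
  }
}
```

A single tell64 call in the happy path both initialises the position and reveals (by not throwing) that the server supports 64-bit offsets.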

@OrangeDog changed the title from "feat: implement efficient skipping in BlobInputStream" to "BlobInputStream improvements" on Feb 29, 2024
    long maxBlobSize;
    try {
      this.absolutePosition = lo.tell64();
      maxBlobSize = 4L * 1024 * 1024 * 1024 * 1024 - 2 * 1024;
davecramer (Member):

Can you provide a link or explanation for this math ? I realize the docs say 4TB. Link to that would be good

@OrangeDog (Contributor, Author):

The docs indeed say 4TB, but you cannot actually get that big (with PG14 anyway).
I don't know why the real limit is 2kB less than 4TB.

Member:

What is the point of having the math in the driver? What if PG 19 lifts the limit to 4 exabytes?

@OrangeDog (Contributor, Author):

How do you suggest determining the correct value otherwise?

If the limit is raised to 4 exabytes, then the LargeObject API will also have to change to take a value larger than a long, so all this code would need changing to support it anyway.

Member:

Signed Long.MAX_VALUE is 9223372036854775807L (9 exabytes), and an unsigned long is ~18 exabytes, so the current seek64(long pos) is more or less enough for multi-exabyte values. I'm not sure what the point is of artificially limiting the values at the driver level.

@OrangeDog (Contributor, Author), Mar 5, 2024:

Signed Long.MAX_VALUE is 9223372036854775807L (9 exabytes)

That's 8 EiB, but you're right, I miscounted the digits.

what is the point of artificially limiting the values at the driver level

If you don't then it throws. If you prefer that behaviour then I can remove maxBlobSize, but that would be a breaking change for skip().

@esabol:

Can you provide a link or explanation for this math ? I realize the docs say 4TB. Link to that would be good

Here is a link that you might be able to use:

https://www.enterprisedb.com/postgres-tutorials/postgresql-toast-and-working-blobsclobs-explained

It seems crazy, but I think the only mention of the 4TB upper limit on blobs on postgresql.org is in the release notes for 9.3 under "E.26.3.5. Data Types": https://www.postgresql.org/docs/9.3/release-9-3.html

@OrangeDog (Contributor, Author):

@esabol: https://www.postgresql.org/docs/16/lo-intro.html

The problem is that it's not 4 TB (TiB). It's slightly less.

@esabol, Mar 16, 2024:

The problem is that it's not 4 TB (TiB). It's slightly less.

I'm aware of that. @davecramer asked for a link (presumably to be added as a comment in the code), so I thought I'd offer what I could find.

@OrangeDog (Contributor, Author), Mar 18, 2024:

I found it: https://github.com/postgres/postgres/blob/REL_16_2/src/include/storage/large_object.h#L76
That's (2^31 - 1) * 2048 == 4 * 1024^4 - 2048
I'll change the code here to match, with a link.

Although, you could compile pg with a different BLCKSZ, which would change the resulting limit.
That brings us back to: should we just ignore it entirely, or should we run SHOW block_size; somewhere?
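The block-size-aware alternative mentioned above could be sketched like this. The formula mirrors MAX_LARGE_OBJECT_SIZE = (int64) INT_MAX * LOBLKSIZE from src/include/storage/large_object.h, where LOBLKSIZE = BLCKSZ / 4; LoLimit is a hypothetical helper, and the server's BLCKSZ would be fetched at runtime with SHOW block_size.

```java
/**
 * Sketch: derive the large-object size limit from the server's block size,
 * mirroring MAX_LARGE_OBJECT_SIZE = (int64) INT_MAX * LOBLKSIZE in
 * src/include/storage/large_object.h (LOBLKSIZE = BLCKSZ / 4).
 * The server's block size could be obtained with "SHOW block_size".
 */
final class LoLimit {
  static long maxBlobSize(long serverBlockSize) {
    long loBlkSize = serverBlockSize / 4;          // LOBLKSIZE
    return (long) Integer.MAX_VALUE * loBlkSize;   // max pg_largeobject offset
  }
}
```

With the default 8192-byte block size this yields exactly 4 TiB minus 2 KiB, matching the constant in the diff; a custom BLCKSZ would change the result accordingly.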

      maxBlobSize = 4L * 1024 * 1024 * 1024 * 1024 - 2 * 1024;
    } catch (SQLException e1) {
      try {
        this.absolutePosition = lo.tell();
Member:

Document what this is doing ?

@OrangeDog (Contributor, Author), Mar 5, 2024:

This line specifically, or the whole added section?

Member:

seems to me that we are switching from LO to bytea here ?

Member:

The whole section. Frankly, the code is non obvious, so it makes sense to document the considered edge cases

Member:

It is sad we increase the number of roundtrips for a common case of "blob is used only once". Can we avoid adding extra db calls in the constructor?

@OrangeDog (Contributor, Author), Mar 5, 2024:

seems to me that we are switching from LO to bytea here ?

No? On old versions of Postgres (which you still test against), large objects only use an int for their position, with a limit of 2GB: https://www.postgresql.org/docs/9.2/lo-intro.html

See also the existing implementation of reset(), which checks whether it can use seek instead of seek64.

@OrangeDog (Contributor, Author), Mar 5, 2024:

Can we avoid adding extra db calls in the constructor?

Not really. Someone might call mark() before anything else, and the position needs to be correctly initialised before then.

I suppose you could add extra logic to defer the initialisation to the first call to either read(), mark(), reset(), or skip(). That didn't seem like a good idea to me though. Better to guarantee that it's been done before anything else can be called.
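The deferred-initialisation alternative dismissed above would look roughly like this (hypothetical names; the PR deliberately initialises eagerly in the constructor instead):

```java
/**
 * Sketch of deferring the position round trip until first use.
 * Every public operation must remember to call ensureInitialised() first,
 * which is the fragility the comment above argues against.
 */
abstract class LazyPositionStream {
  private long absolutePosition;
  private boolean initialised;

  /** Stands in for the lo.tell64()/lo.tell() round trip. */
  protected abstract long tellCurrentPosition();

  private void ensureInitialised() {
    if (!initialised) {
      absolutePosition = tellCurrentPosition(); // deferred round trip, runs once
      initialised = true;
    }
  }

  public void mark(int readLimit) {
    ensureInitialised();
    // ... record absolutePosition as the mark ...
  }

  public long position() {
    ensureInitialised();
    return absolutePosition;
  }
}
```

The trade-off is one saved round trip in the "blob is read once" case against having to thread ensureInitialised() through read(), mark(), reset(), and skip() without missing one.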

@vlsi (Member) left a comment:

Looks easier than I thought

    long maxBlobSize;
    try {
      this.absolutePosition = lo.tell64();
      maxBlobSize = 4L * 1024 * 1024 * 1024 * 1024 - 2 * 1024;

Member:

What are these magic numbers? Could you add clarifications?

      }
    } catch (SQLException e) {
      throw new IOException(
          GT.tr("Can not skip stream for large object {0} by {1}",
Member:

Wdyt of adding the current position to the error message?

    try (LargeObject blob = lom.open(loid, LargeObjectManager.WRITE)) {
      byte[] buf = new byte[1024];
      for (byte i = 0; i < 96; i++) {
        Arrays.fill(buf, i);
Member:

I would prefer random data since 11111 22222 333 might mask bugs in case read accesses a wrong buffer item

@OrangeDog (Contributor, Author):

The test has to know which value is expected. Random data would be difficult to verify by hand.

As there are only 256 possible byte values, there will always be duplicates in the data, so you cannot prove it wasn't accidentally reading a different byte that happened to have the same value.
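The deterministic layout makes the expected byte at any offset computable by hand: 1 KiB block n is filled with the value n, so the byte at offset p is simply p / 1024. A sketch of the same pattern built in memory (PatternDemo is a hypothetical stand-in for the test's write loop):

```java
import java.util.Arrays;

class PatternDemo {
  /** Builds the 96 KiB pattern the test writes: 1 KiB blocks filled with 0, 1, 2, ... */
  static byte[] pattern() {
    byte[] data = new byte[96 * 1024];
    for (int block = 0; block < 96; block++) {
      Arrays.fill(data, block * 1024, (block + 1) * 1024, (byte) block);
    }
    return data;
  }
}
```

With this layout, data[0] is 0, data[1024] is 1, and data[65 * 1024] is 65, which is why the assertions below can predict their values.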

      assertThrows(IOException.class, () -> bis.skip(1));
    }

    try (LargeObject blob = lom.open(loid, LargeObjectManager.READ)) {
Member:

Can you please use parameterized tests or at least individual test methods with their own names? Currently, it is hard to understand the intention behind the added tests

@OrangeDog (Contributor, Author):

@davecramer @vlsi can I get your input to finish resolving all the above?

If necessary, I can split the bug fix commit to a separate PR.

@OrangeDog (Contributor, Author):

@davecramer @vlsi I've not had any responses for two months now

    // initialise current position for mark/reset
    this.absolutePosition = lo.tell64();
    // https://github.com/postgres/postgres/blob/REL9_3_0/src/include/storage/large_object.h#L76
    maxBlobSize = (long) Integer.MAX_VALUE * LO_BLOCK_SIZE;
Member:

I understand the backend might have its own limits. What is the purpose of adding extra validation at the driver level though? Why can't we assume the max limit is Long.MAX_VALUE, and expect an error from the DB if the specific DB can't handle the given size?

@OrangeDog (Contributor, Author):

Because that is a breaking change to the behaviour of skip(long).
#3148 (comment)

What it currently does is skip to the end of the current input for all values larger than the remaining input. This includes returning 0 if it's already at the end.

I had another idea recently. We could provide a setter to allow users to change it if they have built postgres with a custom block size.

Member:

I suggest we defer the call to the database. I do not see value in artificially adding a limit on the driver's side.

@OrangeDog (Contributor, Author), May 13, 2024:

Do I therefore need to split the PR so that the bug fix doesn't have to wait for v43?

@OrangeDog (Contributor, Author), May 13, 2024:

Of course older postgres behaves differently, looking at the 8.4 test failure.
I was expecting it to throw due to seek64 not being available.

Edit: it does, but that exception does not fail the transaction. I guess I can just change the test to expect different behaviour for the older versions?

Comment on lines 272 to 398
assertEquals(0, bis.read());
assertEquals(1024L, bis.skip(1024));
assertEquals(1, bis.read());
assertEquals(0L, bis.skip(-1));
assertEquals(1, bis.read());
assertEquals(64 * 1024L, bis.skip(64 * 1024));
assertEquals(65, bis.read());
assertEquals(expectedBigSkip, bis.skip(Long.MAX_VALUE));
assertEquals(-1, bis.read());
Member:

Please add corresponding messages describing why certain values are expected. If something goes wrong, it would be hard to tell why 65 was expected at line 278.

@OrangeDog (Contributor, Author):

I'm not really sure how to phrase it. What exactly do you want?

The reason why it should be 65 etc. is obvious if you know the structure of the file. I can add a comment to each test describing it, if the code in createMediumLargeObject is insufficient?

Or do you want the aspect of the implementation that is being tested? Is there any preference on phrasing, given that it's supposed to describe the failure, not describe what it was testing?

No other assertions in this project use messages, so I have nothing to go on.

Member:

The reason why it should be 65 etc. is obvious if you know the structure of the

I am afraid it is not obvious.

No other assertions in this project use messages

I am afraid that is not quite right.
Please check here:

    assertEquals(
        list.size(),
        t.getWatchedCount(),
        "All objects are strongly-reachable, so getWatchedCount should reflect it"
    );
    assertTrue(t.isThreadRunning(),
        "cleanup thread should be running, and it should wait for the leaks");
    cleaners.get(1).clean();
    assertEquals(
        list.size() - 1,
        t.getWatchedCount(),
        "One object has been released properly, so getWatchedCount should reflect it"
    );
    list.set(0, null);
    System.gc();
    System.gc();
    Await.until(
        "One object was released, and another one has leaked, so getWatchedCount should reflect it",
        ofSeconds(5),
        () -> t.getWatchedCount() == list.size() - 2
    );
    list.clear();
    System.gc();
    System.gc();
    Await.until(
        "The cleanup thread should detect leaks and terminate within 5-10 seconds after GC",
        ofSeconds(10),
        () -> !t.isThreadRunning()
    );
    assertEquals(
        Arrays.asList("LEAK", "NO LEAK", "LEAK").toString(),
        Arrays.asList(collected).toString(),
        "Second object has been released properly, so it should be reported as NO LEAK"

    Assert.assertEquals(
        "Insert with " + nBinds + " binds should be rewritten into multi-value insert"
            + ", so expecting Statement.SUCCESS_NO_INFO == -2",
        Arrays.toString(new int[]{Statement.SUCCESS_NO_INFO, Statement.SUCCESS_NO_INFO}),
        Arrays.toString(pstmt.executeBatch()));
    } else {
      Assert.assertEquals(
          "Insert with " + nBinds + " binds can't be rewritten into multi-value insert"
              + " since write format allows 65535 binds maximum"
              + ", so expecting batch to be executed as individual statements",
          Arrays.toString(new int[]{1, 1}),
          Arrays.toString(pstmt.executeBatch()));
    }

Is there any preference on phrasing, given that it's supposed to describe the failure, not describe what it was testing?

https://github.com/pgjdbc/pgjdbc/blob/master/CONTRIBUTING.md#test

CONTRIBUTING.md#test: General rule: failing test should look like a good bug report

In other words, the message should describe the reason you expect something to equal to a something else.

Note: if the test class and test method gives enough information, it might be fine to omit the message.
However, in your case you put like ~10 assertions one after another, so it is important to make the failures distinct.


No other assertions in this project use messages, so I have nothing to go on

There's maintenance cost. If the test fails sometime in the future, the one who observes the failure would have to pay the cost of analyzing the failure and deciphering the intentions behind the test. Of course the current code is not all perfect, however, it is important to keep the added code maintainable.

Member:

It might be some of the assertions could be split into their own test methods. Then the failures would be easier to analyze and reproduce.

@OrangeDog (Contributor, Author):

I am afraid it is not obvious.

I don't know how to write a good explanation because it really is obvious to me.

If you know a stream goes [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4], and you do read(); skip(5); read(), then how is it not obvious that the results should be 1 and 3?

I cannot put the full structure of the file into every assert message. I can put things like "failed to skip beyond internal buffer", describing what would be wrong on failure. The value returned is arbitrary, but as previously discussed I chose this structure of values so that it's easy to see what the expected value should be, and to minimise the chance of a fluke pass.
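The read/skip arithmetic above can be checked with a plain in-memory stream, independent of pgjdbc (SkipDemo and readSkipRead are hypothetical names for this illustration):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class SkipDemo {
  /** Runs read(); skip(5); read() on the given data and returns the two results. */
  static int[] readSkipRead(byte[] data) throws IOException {
    InputStream in = new ByteArrayInputStream(data);
    int first = in.read();  // consumes the byte at offset 0
    in.skip(5);             // advances past offsets 1..5
    int second = in.read(); // consumes the byte at offset 6
    return new int[] {first, second};
  }
}
```

For the stream [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4], offset 0 holds 1 and offset 6 holds 3, so the two reads return 1 and 3 as stated above.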

@OrangeDog (Contributor, Author):

It might be some of the assertions could be split into their own test methods. Then the failures would be easier to analyze and reproduce.

I've done that a little, now that seeking too far throws and breaks the transaction. I was following the style of the existing tests here, which all have multiple assertions. Especially with streams you need multiple previous operations to set up the state you're testing. e.g. having had the buffer reallocate, being in a position where it might go backwards, etc.

I'll push the next updates without messages and more separate methods, and then see what you think.

A BlobInputStream will start reading from the current position of the LargeObject used to create it, but the current/marked position does not account for this.
Initialising current position correctly will fix this.

Closes pgjdbc#3149
Using seek is more efficient than reading and discarding the bytes.
The only behavioural change is it now allows skipping past the natural end of the stream, but the method contract already allows for this. It should not break anything.

Closes pgjdbc#3144
@OrangeDog force-pushed the patch-1 branch 3 times, most recently from 51ad2f7 to d4a8c7e on May 13, 2024 15:35
BlobInputStream.seek will now throw if attempting to skip beyond the max size for a large object. This is a breaking change. The previous implementation was imperfect as it did not account for the server being compiled with a different limit.

The state of the stream after the exception is undefined, but hopefully sane. Further exceptions may be thrown on future operations.