
BlobInputStream improvements #3148

Open · wants to merge 3 commits into base: master
Conversation

@OrangeDog (Contributor) commented Feb 29, 2024

  • fix: inconsistent behaviour of BlobInputStream.reset()

    A BlobInputStream starts reading from the current position of the LargeObject used to create it, but the current/marked position does not account for this. Initialising the current position correctly fixes this.

    Closes BlobInputStream does not reset() to correct position if LargeObject was not at position 0. #3149

  • feat: implement efficient skipping in BlobInputStream

    Using seek is more efficient than reading and discarding the bytes. The only behavioural change is that it now allows skipping past the natural end of the stream, but the method contract already allows for this, so it should not break anything.

    Closes BlobInputStream should implement skip #3144
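The skip-via-seek idea described above can be sketched as follows. This is a minimal sketch with hypothetical names, not the actual pgjdbc patch: SeekableBlobStream, seekAbsolute, and limit are stand-ins for BlobInputStream, the LargeObject seek call, and the server's size limit.

```java
import java.io.IOException;
import java.io.InputStream;

/** Minimal sketch of skip-via-seek for a large-object stream (assumed API). */
abstract class SeekableBlobStream extends InputStream {
    protected long absolutePosition; // current position in the large object
    protected long limit;            // max addressable offset (e.g. 4 TiB - 2 KiB)

    /** Seek the underlying large object to an absolute position. */
    protected abstract void seekAbsolute(long pos) throws IOException;

    @Override
    public long skip(long n) throws IOException {
        if (n <= 0) {
            return 0; // InputStream.skip may skip nothing for non-positive n
        }
        // Clamp so we never ask the server to seek past its hard limit.
        long target = Math.min(absolutePosition + n, limit);
        long skipped = target - absolutePosition;
        seekAbsolute(target);
        absolutePosition = target;
        return skipped;
    }
}
```

A single seek round trip replaces reading and discarding n bytes, and the clamp preserves the InputStream.skip contract of possibly skipping fewer bytes than requested.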

All Submissions:

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

  1. Does your submission pass tests?
  2. Does ./gradlew styleCheck pass ?
  3. Have you added your new test classes to an existing test suite in alphabetical order?

Changes to Existing Features:

  • Does this break existing behaviour? If so please explain.
  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your core changes, as applicable?
  • Have you successfully run tests with your changes locally?

@OrangeDog (Contributor, Author) commented Feb 29, 2024

PG 8.4: This failure is due to the lower max blob size, causing an exception when requesting to seek too far.

There doesn't seem to be any way to determine the DB version inside the BlobInputStream.
The only alternative would be to seek to the end of the blob and then back in the constructor, to get its full size.

That also reveals the bug (already extant in mark/reset) I've just noticed, where absolutePosition is assumed to be 0 on construction, even if the LargeObject is currently in a different place.

@OrangeDog (Contributor, Author):

Both can be solved in the same way!

In the constructor, call tell64(); if that throws, fall back to tell().
Then you know both the current position and the max blob length.
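That fallback could look roughly like the following. This is a sketch under assumed names: Lo is a hypothetical stub for the LargeObject calls, and the two limits follow the values discussed in this thread (4 TiB minus 2 KiB with the default block size, 2 GiB on servers without lo_tell64).

```java
import java.sql.SQLException;

/** Hypothetical stub of the LargeObject calls used below. */
interface Lo {
  long tell64() throws SQLException; // 64-bit position (PG 9.3+)
  int tell() throws SQLException;    // 32-bit position (older servers)
}

/** Sketch of the constructor-time fallback described above (assumed names). */
final class PositionInit {
  final long absolutePosition;
  final long maxBlobSize;

  PositionInit(Lo lo) throws SQLException {
    long pos;
    long max;
    try {
      pos = lo.tell64();
      // Server understands 64-bit offsets: limit is (2^31 - 1) * 2048 = 4 TiB - 2 KiB
      max = 4L * 1024 * 1024 * 1024 * 1024 - 2 * 1024;
    } catch (SQLException e) {
      // lo_tell64 is unavailable, so positions are 32-bit: limit is 2 GiB
      pos = lo.tell();
      max = Integer.MAX_VALUE;
    }
    absolutePosition = pos;
    maxBlobSize = max;
  }
}
```

A single tell64 call in the happy path both initialises the position and reveals (by not throwing) that the server supports 64-bit offsets.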

@OrangeDog changed the title from "feat: implement efficient skipping in BlobInputStream" to "BlobInputStream improvements" on Feb 29, 2024
    long maxBlobSize;
    try {
      this.absolutePosition = lo.tell64();
      maxBlobSize = 4L * 1024 * 1024 * 1024 * 1024 - 2 * 1024;
davecramer (Member):

Can you provide a link or explanation for this math ? I realize the docs say 4TB. Link to that would be good

@OrangeDog (Contributor, Author):

The docs indeed say 4TB, but you cannot actually get that big (with PG14 anyway).
I don't know why the real limit is 2kB less than 4TB.

Member:

What is the point of having the math in the driver? What if PG 19 lifts the limit to 4 exabytes?

@OrangeDog (Contributor, Author):

How do you suggest determining the correct value otherwise?

If the limit is raised to 4 exabytes, then the LargeObject API will also have to change to take a value larger than a long, so all this code would need changing to support it anyway.

Member:

Signed Long.MAX_VALUE is 9223372036854775807L (9 exabytes), and an unsigned long is ~18 exabytes, so the current seek64(long pos) is more or less enough for multi-exabyte values. I'm not sure what the point is of artificially limiting the values at the driver level.

@OrangeDog (Contributor, Author), Mar 5, 2024:

Signed Long.MAX_VALUE is 9223372036854775807L (9 exabytes)

That's 8 EiB, but you're right, I miscounted the digits.

what is the point of artificially limiting the values at the driver level

If you don't then it throws. If you prefer that behaviour then I can remove maxBlobSize, but that would be a breaking change for skip().

@esabol:

Can you provide a link or explanation for this math ? I realize the docs say 4TB. Link to that would be good

Here is a link that you might be able to use:

https://www.enterprisedb.com/postgres-tutorials/postgresql-toast-and-working-blobsclobs-explained

It seems crazy, but I think the only mention of the 4TB upper limit on blobs on postgresql.org is in the release notes for 9.3 under "E.26.3.5. Data Types": https://www.postgresql.org/docs/9.3/release-9-3.html

@OrangeDog (Contributor, Author):

@esabol: https://www.postgresql.org/docs/16/lo-intro.html

The problem is that it's not 4 TB (TiB). It's slightly less.

@esabol, Mar 16, 2024:

The problem is that it's not 4 TB (TiB). It's slightly less.

I'm aware of that. @davecramer asked for a link (presumably to be added as a comment in the code), so I thought I'd offer what I could find.

@OrangeDog (Contributor, Author), Mar 18, 2024:

I found it: https://github.com/postgres/postgres/blob/REL_16_2/src/include/storage/large_object.h#L76
That's (2^31 - 1) * 2048 == 4 * 1024^4 - 2048
I'll change the code here to match, with a link.

Although, you could compile pg with a different BLCKSZ, which would change the resulting limit.
That brings us back to: should we just ignore it entirely, or should we run SHOW block_size; somewhere?
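The block-size-aware alternative mentioned above could be sketched like this. The formula mirrors MAX_LARGE_OBJECT_SIZE = (int64) INT_MAX * LOBLKSIZE from src/include/storage/large_object.h, where LOBLKSIZE = BLCKSZ / 4; LoLimit is a hypothetical helper, and the server's BLCKSZ would be fetched at runtime with SHOW block_size.

```java
/**
 * Sketch: derive the large-object size limit from the server's block size,
 * mirroring MAX_LARGE_OBJECT_SIZE = (int64) INT_MAX * LOBLKSIZE in
 * src/include/storage/large_object.h (LOBLKSIZE = BLCKSZ / 4).
 * The server's block size could be obtained with "SHOW block_size".
 */
final class LoLimit {
  static long maxBlobSize(long serverBlockSize) {
    long loBlkSize = serverBlockSize / 4;          // LOBLKSIZE
    return (long) Integer.MAX_VALUE * loBlkSize;   // max pg_largeobject offset
  }
}
```

With the default 8192-byte block size this yields exactly 4 TiB minus 2 KiB, matching the constant in the diff; a custom BLCKSZ would change the result accordingly.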

      maxBlobSize = 4L * 1024 * 1024 * 1024 * 1024 - 2 * 1024;
    } catch (SQLException e1) {
      try {
        this.absolutePosition = lo.tell();
Member:

Document what this is doing ?

@OrangeDog (Contributor, Author), Mar 5, 2024:

This line specifically, or the whole added section?

Member:

seems to me that we are switching from LO to bytea here ?

Member:

The whole section. Frankly, the code is non obvious, so it makes sense to document the considered edge cases

Member:

It is sad we increase the number of roundtrips for a common case of "blob is used only once". Can we avoid adding extra db calls in the constructor?

@OrangeDog (Contributor, Author), Mar 5, 2024:

seems to me that we are switching from LO to bytea here ?

No? On old versions of Postgres (which you still test against), large objects only use an int for their position, with a limit of 2GB: https://www.postgresql.org/docs/9.2/lo-intro.html

See also the existing implementation of reset(), which checks whether it can use seek instead of seek64.

@OrangeDog (Contributor, Author), Mar 5, 2024:

Can we avoid adding extra db calls in the constructor?

Not really. Someone might call mark() before anything else, and the position needs to be correctly initialised before then.

I suppose you could add extra logic to defer the initialisation to the first call to either read(), mark(), reset(), or skip(). That didn't seem like a good idea to me though. Better to guarantee that it's been done before anything else can be called.
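The deferred-initialisation alternative dismissed above would look roughly like this (hypothetical names; the PR deliberately initialises eagerly in the constructor instead):

```java
/**
 * Sketch of deferring the position round trip until first use.
 * Every public operation must remember to call ensureInitialised() first,
 * which is the fragility the comment above argues against.
 */
abstract class LazyPositionStream {
  private long absolutePosition;
  private boolean initialised;

  /** Stands in for the lo.tell64()/lo.tell() round trip. */
  protected abstract long tellCurrentPosition();

  private void ensureInitialised() {
    if (!initialised) {
      absolutePosition = tellCurrentPosition(); // deferred round trip, runs once
      initialised = true;
    }
  }

  public void mark(int readLimit) {
    ensureInitialised();
    // ... record absolutePosition as the mark ...
  }

  public long position() {
    ensureInitialised();
    return absolutePosition;
  }
}
```

The trade-off is one saved round trip in the "blob is read once" case against having to thread ensureInitialised() through read(), mark(), reset(), and skip() without missing one.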

@vlsi (Member) left a comment:

Looks easier than I thought

    long maxBlobSize;
    try {
      this.absolutePosition = lo.tell64();
      maxBlobSize = 4L * 1024 * 1024 * 1024 * 1024 - 2 * 1024;

Member:

What are these magic numbers? Could you add clarifications?

      }
    } catch (SQLException e) {
      throw new IOException(
          GT.tr("Can not skip stream for large object {0} by {1}",
Member:

Wdyt of adding the current position to the error message?

    try (LargeObject blob = lom.open(loid, LargeObjectManager.WRITE)) {
      byte[] buf = new byte[1024];
      for (byte i = 0; i < 96; i++) {
        Arrays.fill(buf, i);
Member:

I would prefer random data since 11111 22222 333 might mask bugs in case read accesses a wrong buffer item

@OrangeDog (Contributor, Author):

The test has to know which value is expected. Random data would be difficult to verify by hand.

As there are only 256 possible byte values, there will always be duplicates in the data, so you cannot prove it wasn't accidentally reading a different byte that happened to have the same value.
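The deterministic layout makes the expected byte at any offset computable by hand: 1 KiB block n is filled with the value n, so the byte at offset p is simply p / 1024. A sketch of the same pattern built in memory (PatternDemo is a hypothetical stand-in for the test's write loop):

```java
import java.util.Arrays;

class PatternDemo {
  /** Builds the 96 KiB pattern the test writes: 1 KiB blocks filled with 0, 1, 2, ... */
  static byte[] pattern() {
    byte[] data = new byte[96 * 1024];
    for (int block = 0; block < 96; block++) {
      Arrays.fill(data, block * 1024, (block + 1) * 1024, (byte) block);
    }
    return data;
  }
}
```

With this layout, data[0] is 0, data[1024] is 1, and data[65 * 1024] is 65, which is why the assertions below can predict their values.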

      assertThrows(IOException.class, () -> bis.skip(1));
    }

    try (LargeObject blob = lom.open(loid, LargeObjectManager.READ)) {
Member:

Can you please use parameterized tests or at least individual test methods with their own names? Currently, it is hard to understand the intention behind the added tests

@OrangeDog (Contributor, Author):

@davecramer @vlsi can I get your input to finish resolving all the above?

If necessary, I can split the bug fix commit to a separate PR.

@OrangeDog (Contributor, Author):

@davecramer @vlsi I've not had any responses for two months now

    // initialise current position for mark/reset
    this.absolutePosition = lo.tell64();
    // https://github.com/postgres/postgres/blob/REL9_3_0/src/include/storage/large_object.h#L76
    maxBlobSize = (long) Integer.MAX_VALUE * LO_BLOCK_SIZE;
Member:

I understand the backend might have its own limits. What is the purpose of adding extra validation at the driver level though? Why can't we assume the max limit is Long.MAX_VALUE, and expect an error from the DB if the specific DB can't handle the given size?

@OrangeDog (Contributor, Author):

Because that is a breaking change to the behaviour of skip(long).
#3148 (comment)

What it currently does is skip to the end of the current input for all values larger than the remaining input. This includes returning 0 if it's already at the end.

I had another idea recently. We could provide a setter to allow users to change it if they have built postgres with a custom block size.

Member:

I suggest we defer the call to the database. I do not see value in artificially adding a limit on the driver's side.

@OrangeDog (Contributor, Author), May 13, 2024:

Do I therefore need to split the PR so that the bug fix doesn't have to wait for v43?

@OrangeDog (Contributor, Author), May 13, 2024:

Of course older postgres behaves differently, looking at the 8.4 test failure.
I was expecting it to throw due to seek64 not being available.

Edit: it does, but that exception does not fail the transaction. I guess I can just change the test to expect different behaviour for the older versions?

Comment on lines 272 to 398
assertEquals(0, bis.read());
assertEquals(1024L, bis.skip(1024));
assertEquals(1, bis.read());
assertEquals(0L, bis.skip(-1));
assertEquals(1, bis.read());
assertEquals(64 * 1024L, bis.skip(64 * 1024));
assertEquals(65, bis.read());
assertEquals(expectedBigSkip, bis.skip(Long.MAX_VALUE));
assertEquals(-1, bis.read());
Member:

Please add corresponding messages describing why certain values are expected. If something goes wrong, it would be hard to tell why 65 was expected at line 278.

@OrangeDog (Contributor, Author):

I'm not really sure how to phrase it. What exactly do you want?

The reason why it should be 65 etc. is obvious if you know the structure of the file. I can add a comment to each test describing it, if the code in createMediumLargeObject is insufficient?

Or do you want the aspect of the implementation that is being tested? Is there any preference on phrasing, given that it's supposed to describe the failure, not describe what it was testing?

No other assertions in this project use messages, so I have nothing to go on.

Member:

The reason why it should be 65 etc. is obvious if you know the structure of the

I am afraid it is not obvious.

No other assertions in this project use messages

I am afraid that is not quite right.
Please check here:

    assertEquals(
        list.size(),
        t.getWatchedCount(),
        "All objects are strongly-reachable, so getWatchedCount should reflect it"
    );
    assertTrue(t.isThreadRunning(),
        "cleanup thread should be running, and it should wait for the leaks");
    cleaners.get(1).clean();
    assertEquals(
        list.size() - 1,
        t.getWatchedCount(),
        "One object has been released properly, so getWatchedCount should reflect it"
    );
    list.set(0, null);
    System.gc();
    System.gc();
    Await.until(
        "One object was released, and another one has leaked, so getWatchedCount should reflect it",
        ofSeconds(5),
        () -> t.getWatchedCount() == list.size() - 2
    );
    list.clear();
    System.gc();
    System.gc();
    Await.until(
        "The cleanup thread should detect leaks and terminate within 5-10 seconds after GC",
        ofSeconds(10),
        () -> !t.isThreadRunning()
    );
    assertEquals(
        Arrays.asList("LEAK", "NO LEAK", "LEAK").toString(),
        Arrays.asList(collected).toString(),
        "Second object has been released properly, so it should be reported as NO LEAK"

    Assert.assertEquals(
        "Insert with " + nBinds + " binds should be rewritten into multi-value insert"
            + ", so expecting Statement.SUCCESS_NO_INFO == -2",
        Arrays.toString(new int[]{Statement.SUCCESS_NO_INFO, Statement.SUCCESS_NO_INFO}),
        Arrays.toString(pstmt.executeBatch()));
    } else {
      Assert.assertEquals(
          "Insert with " + nBinds + " binds can't be rewritten into multi-value insert"
              + " since write format allows 65535 binds maximum"
              + ", so expecting batch to be executed as individual statements",
          Arrays.toString(new int[]{1, 1}),
          Arrays.toString(pstmt.executeBatch()));
    }

Is there any preference on phrasing, given that it's supposed to describe the failure, not describe what it was testing?

https://github.com/pgjdbc/pgjdbc/blob/master/CONTRIBUTING.md#test

CONTRIBUTING.md#test: General rule: failing test should look like a good bug report

In other words, the message should describe the reason you expect something to equal to a something else.

Note: if the test class and test method gives enough information, it might be fine to omit the message.
However, in your case you put like ~10 assertions one after another, so it is important to make the failures distinct.


No other assertions in this project use messages, so I have nothing to go on

There's maintenance cost. If the test fails sometime in the future, the one who observes the failure would have to pay the cost of analyzing the failure and deciphering the intentions behind the test. Of course the current code is not all perfect, however, it is important to keep the added code maintainable.

Member:

It might be some of the assertions could be split into their own test methods. Then the failures would be easier to analyze and reproduce.

@OrangeDog (Contributor, Author):

I am afraid it is not obvious.

I don't know how to write a good explanation because it really is obvious to me.

If you know a stream goes [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4], and you do read(); skip(5); read(), then how is it not obvious that the results should be 1 and 3?

I cannot put the full structure of the file into every assert message. I can put things like "failed to skip beyond internal buffer", describing what would be wrong on failure. The value returned is arbitrary, but as previously discussed I chose this structure of values so that it's easy to see what the expected value should be, and to minimise the chance of a fluke pass.
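The read/skip arithmetic above can be checked with a plain in-memory stream, independent of pgjdbc (SkipDemo and readSkipRead are hypothetical names for this illustration):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class SkipDemo {
  /** Runs read(); skip(5); read() on the given data and returns the two results. */
  static int[] readSkipRead(byte[] data) throws IOException {
    InputStream in = new ByteArrayInputStream(data);
    int first = in.read();  // consumes the byte at offset 0
    in.skip(5);             // advances past offsets 1..5
    int second = in.read(); // consumes the byte at offset 6
    return new int[] {first, second};
  }
}
```

For the stream [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4], offset 0 holds 1 and offset 6 holds 3, so the two reads return 1 and 3 as stated above.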

@OrangeDog (Contributor, Author):

It might be some of the assertions could be split into their own test methods. Then the failures would be easier to analyze and reproduce.

I've done that a little, now that seeking too far throws and breaks the transaction. I was following the style of the existing tests here, which all have multiple assertions. Especially with streams you need multiple previous operations to set up the state you're testing. e.g. having had the buffer reallocate, being in a position where it might go backwards, etc.

I'll push the next updates without messages and more separate methods, and then see what you think.

A BlobInputStream will start reading from the current position of the LargeObject used to create it, but the current/marked position does not account for this.
Initialising current position correctly will fix this.

Closes pgjdbc#3149
Using seek is more efficient than reading and discarding the bytes.
The only behavioural change is it now allows skipping past the natural end of the stream, but the method contract already allows for this. It should not break anything.

Closes pgjdbc#3144
@OrangeDog force-pushed the patch-1 branch 3 times, most recently from 51ad2f7 to d4a8c7e on May 13, 2024 15:35
BlobInputStream.seek will now throw if attempting to skip beyond the max size for a large object. This is a breaking change. The previous implementation was imperfect as it did not account for the server being compiled with a different limit.

The state of the stream after the exception is undefined, but hopefully sane. Further exceptions may be thrown on future operations.