
LargeObject support via OID #255

Open · fzoli opened this issue Mar 5, 2020 · 18 comments
Labels: status: ideal-for-contribution · type: enhancement



fzoli commented Mar 5, 2020

Feature Request

PostgreSQL supports two types of large objects:

  • byte array (the R2DBC SPI supports it via Blob)
  • OID as a reference (the SPI has, of course, no such support)

Here are the differences:

Characteristic       BYTEA                            OID
Max. allowed space   1 GB                             2 GB
Data access          As a whole                       Stream-style
Storage              In the defined table             In the pg_largeobject system table
Data manipulation    Using SQL and escape sequences   Only within a transaction block, via special functions
Loading              Preload                          On demand

Stream-style makes it possible to implement a server that supports HTTP range requests.
Is it possible to create an extension for r2dbc-postgresql to support LargeObject with OID?

Describe the solution you'd like

Blob from the SPI cannot be used because data manipulation of a LargeObject is not possible with insert/update clauses.

io.r2dbc.postgresql.api.PostgresqlConnection could provide something like:

  • Publisher<ByteBuffer> readLargeObject(long oid)
  • Publisher<ByteBuffer> readLargeObject(long oid, Range range)
  • Mono<Long> createLargeObject(Publisher<ByteBuffer>)
  • Mono<Void> deleteLargeObject(long oid)

The Range can be described with two attributes:

  • start index: a long, starting from zero
  • size limit: a long, the maximum number of requested bytes (optional)
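
A rough sketch of how that proposal could look as a driver-specific interface. All type and method names below are hypothetical; they only mirror the list above:

```java
import java.nio.ByteBuffer;

import org.reactivestreams.Publisher;
import reactor.core.publisher.Mono;

// Hypothetical extension API for io.r2dbc.postgresql.api.PostgresqlConnection;
// names are illustrative only.
public interface LargeObjectOperations {

    // Stream the whole object in chunks.
    Publisher<ByteBuffer> readLargeObject(long oid);

    // Stream a byte range, e.g. to back an HTTP range request.
    Publisher<ByteBuffer> readLargeObject(long oid, Range range);

    // Consume the chunks, create the object, and emit its OID.
    Mono<Long> createLargeObject(Publisher<ByteBuffer> content);

    // Unlink the object.
    Mono<Void> deleteLargeObject(long oid);

    final class Range {

        private final long start;     // start index, starting from zero
        private final Long sizeLimit; // maximum number of requested bytes; null means unbounded

        public Range(long start, Long sizeLimit) {
            this.start = start;
            this.sizeLimit = sizeLimit;
        }

        public long getStart() {
            return start;
        }

        public Long getSizeLimit() {
            return sizeLimit;
        }
    }
}
```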

mp911de commented Mar 5, 2020

Thanks for your proposal. How does this work right now with PGJDBC? Ideally, we can reuse Blob as the streaming interface is already available there.

mp911de added the type: enhancement label Mar 5, 2020

fzoli commented Mar 5, 2020

Here is the API of PGJDBC

The PGJDBC Connection provides a LargeObjectManager via connection.unwrap(PGConnection.class).getLargeObjectAPI().

Methods of LargeObjectManager:

  • LargeObject open(long oid, int mode) opens an existing large object, based on its OID
  • long create(int mode) creates a large object, returning its OID
  • void delete(long oid) deletes the large object

The mode argument can be READ, WRITE, or READWRITE.
LargeObject is a stream-like object with seek support.

Methods of LargeObject:

  • OutputStream getOutputStream() returns an OutputStream to the large object
  • InputStream getInputStream(long limit) returns an InputStream that will limit the amount of data that is visible
  • InputStream getInputStream() returns an InputStream from the large object
  • byte[] read(int length) reads some data from the object and returns it as a byte[] array
  • int read(byte[] buffer, int offset, int length) reads some data from the object into an existing array
  • void write(byte[] buffer) writes an array to the object
  • void write(byte[] buffer, int offset, int length) writes some data from an array to the object
  • void seek(long position, int reference) sets the current position within the object
  • long tell() returns the current position within the object
  • long size() the size of the large object - inefficient, as the only way to find out the size of the object is to seek to the end, record the current position, then return to the original position

The reference argument can be SEEK_SET, SEEK_CUR, or SEEK_END.

Internally, LargeObject uses the Fastpath API.
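
For reference, a minimal blocking sketch of this PGJDBC API, based on the methods listed above (the connection parameters are placeholders; large objects must be accessed with autocommit disabled):

```java
import java.sql.Connection;
import java.sql.DriverManager;

import org.postgresql.PGConnection;
import org.postgresql.largeobject.LargeObject;
import org.postgresql.largeobject.LargeObjectManager;

public class LargeObjectDemo {

    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "user", "secret")) {

            // The large object API only works inside a transaction block.
            conn.setAutoCommit(false);

            LargeObjectManager manager =
                    conn.unwrap(PGConnection.class).getLargeObjectAPI();

            // Create a new large object and write some bytes to it.
            long oid = manager.createLO(LargeObjectManager.READWRITE);
            LargeObject lo = manager.open(oid, LargeObjectManager.WRITE);
            try {
                lo.write(new byte[] {1, 2, 3, 4});
            } finally {
                lo.close();
            }

            // Re-open it, skip the first byte, and read the rest.
            lo = manager.open(oid, LargeObjectManager.READ);
            try {
                lo.seek64(1, LargeObject.SEEK_SET);
                byte[] tail = lo.read(3);
                System.out.println(tail.length + " bytes read");
            } finally {
                lo.close();
            }

            conn.commit();
        }
    }
}
```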


fzoli commented Mar 6, 2020

I reconsidered my request. LargeObject is memory efficient because the client code uses the same byte array in a loop. With reactive streams, data is immutable, so we can only consume ByteBuffer chunks. With a large file, maybe the overhead of the chunks is worse than the memory requirement of a new thread that can use the blocking API. Or not?


mp911de commented Mar 6, 2020

Thanks for the details. The Large Object API is pretty different from a plain Blob, so it seems using io.r2dbc.Blob isn't necessarily a suitable approach.

> is memory efficient because the client code uses the same byte array in a loop

With this API being a driver-specific API, we can use Netty's ByteBuf, which can be pooled, so we have GC pressure covered. Looking at PGJDBC's LargeObject, we should be able to mirror the same API using reactive types (e.g. Mono<ByteBuf> read(int length), ByteBufFlux read() corresponding with getInputStream(), Mono<Long> write(Flux<ByteBuf>) corresponding with getOutputStream()).
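
A sketch of what such a mirrored API could look like (a plain Flux<ByteBuf> stands in for ByteBufFlux; all names are illustrative, not an existing driver API):

```java
import io.netty.buffer.ByteBuf;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

// Hypothetical reactive mirror of PGJDBC's LargeObject.
public interface ReactiveLargeObject {

    Mono<ByteBuf> read(int length);                // ~ byte[] read(int length)

    Flux<ByteBuf> read();                          // ~ getInputStream()

    Mono<Long> write(Flux<ByteBuf> data);          // ~ getOutputStream(); emits the number of bytes written

    Mono<Void> seek(long position, int reference); // ~ seek(long, int)

    Mono<Long> tell();                             // ~ tell()

    Mono<Long> size();                             // ~ size()
}
```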

The FastPath API is largely deprecated in favor of parameterized statements. Maybe @davecramer can shed a bit of light here.

mp911de added the status: ideal-for-contribution label Mar 6, 2020
davecramer commented

@mp911de what is the question?
see https://www.postgresql.org/docs/current/libpq-fastpath.html for the reasoning for the deprecation


mp911de commented Mar 6, 2020

The actual question is what the replacement should look like. I wasn't able to find an example of how to call the functionality required for the Large Object API, such as lo_close, loread, lo_tell, and others, via SQL.

davecramer commented

Interesting. It looks like the driver still uses FastPath, and we don't have any tests for it. Honestly, I don't use it, but I could probably figure it out when I have some time. If they happen to figure it out first, that would be awesome.


mp911de commented Mar 6, 2020

Thanks, Dave. I marked this ticket as ideal for contribution.

It makes sense to add such a feature to our library, and we can work out the actual API when we receive a design proposal in the form of a PR.


Squiry commented Mar 7, 2020

@fzoli You can call any of those functions without the driver's explicit support. Take a look here.
But there's one bad thing about reading: the backend will send the data in 65536-byte chunks, but our driver won't decode it until it has received the full packet. So we either have to rework our receiving process to support lazy decoding, or we have to call lo_get multiple times with a reasonable length parameter until the whole file is here.
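
To illustrate the second workaround: a rough sketch that pages through a large object by calling the server-side lo_get(oid, offset, length) function (available since PostgreSQL 9.4) repeatedly with a fixed chunk size. The class and method names are made up, and it assumes bytea values can be read as ByteBuffer:

```java
import java.nio.ByteBuffer;

import io.r2dbc.postgresql.api.PostgresqlConnection;
import reactor.core.publisher.Flux;

public final class LargeObjectReader {

    private static final int CHUNK = 64 * 1024; // fetch 64 KiB per round trip

    // Emits the object's content as ByteBuffer chunks, issuing one
    // lo_get() call per chunk until a short read signals the end.
    public static Flux<ByteBuffer> read(PostgresqlConnection connection, long oid) {
        return readFrom(connection, oid, 0);
    }

    private static Flux<ByteBuffer> readFrom(PostgresqlConnection connection, long oid, long offset) {
        return connection
                .createStatement("SELECT lo_get($1::oid, $2, $3)")
                .bind("$1", oid)
                .bind("$2", offset)
                .bind("$3", CHUNK)
                .execute()
                .concatMap(result -> result.map((row, meta) -> row.get(0, ByteBuffer.class)))
                .concatMap(chunk -> chunk.remaining() < CHUNK
                        ? Flux.just(chunk) // short read: this was the last chunk
                        : Flux.just(chunk).concatWith(
                                Flux.defer(() -> readFrom(connection, oid, offset + CHUNK))));
    }
}
```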

OrangeDog commented

In JDBC, the OID-based large objects are exposed as java.sql.Blob. I've not really looked at R2DBC, but I don't see why the io.r2dbc.Blob interface wouldn't be suitable for the common case (read/write a stream).


mp911de commented Mar 12, 2020

That was my first thought, too. The large object API allows for several additional operations, such as seeking, truncation, and length introspection. We need a bit of code to see how this enhancement goes and how the API shapes up.

davecramer commented

So is LO support a necessity? I don't really see many people using this in JDBC?


mp911de commented Mar 14, 2020

I don't know, as I'm not that involved in how folks use Postgres with large objects. I don't mind making sure our driver is able to work with the server-side functions, so that this feature could also live outside of the driver.

davecramer commented

Makes sense I guess. I probably wouldn't prioritize it.

OrangeDog commented

As an actual Postgres user, blob support is crucial. A bytea is not a blob.

This is like arguing you don’t need to support char because you don’t see many people using it.

davecramer commented

Speaking with some authority as the primary maintainer of the PostgreSQL JDBC driver, a major contributor to Postgres, and a long-time user (since 1999): I honestly don't see a lot of use of large objects. Now perhaps I am myopic, I don't know. That said, I didn't say don't support it; I said I wouldn't prioritize it.

OrangeDog commented

If by "large objects" you mean LargeObjects, then perhaps that's fair, but most people will use large objects in Java via Blob.

davecramer commented

Yes, I was referring to LargeObjects.
