Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How do I fetch COPY output result ? #332

Open
unoexperto opened this issue Sep 19, 2020 · 11 comments
Open

How do I fetch COPY output result ? #332

unoexperto opened this issue Sep 19, 2020 · 11 comments
Labels
status: waiting-for-feedback We need additional information before we can continue type: enhancement A general enhancement

Comments

@unoexperto
Copy link

Is there official way of getting COPY output result ? I see that CopyData messages are coming in PostgresqlResult.messages but there is no way for me to get it because PostgresqlResult.map is hardcoded to get regular rows only.

Thank you.

@unoexperto unoexperto added the status: waiting-for-triage An issue we've not yet triaged label Sep 19, 2020
@mp911de mp911de added the status: waiting-for-feedback We need additional information before we can continue label Sep 21, 2020
@mp911de
Copy link
Collaborator

mp911de commented Sep 21, 2020

There's no direct API for CopyData consumption. PostgresReplicationStream uses CopyData that lets you consume the replication stream of START_REPLICATION SLOT … LOGICAL ….

Care to elaborate on your use-case so we can understand what you want to achieve to discuss how to make it possible?

@unoexperto
Copy link
Author

@mp911de My use-case is data replication. So It's possible that single COPY query will return 100s of GB of data. As result I need to parse response in streaming fashion.

@mp911de
Copy link
Collaborator

mp911de commented Oct 1, 2020

Data replication as in logical decode? How would you consume COPY though the pgjdbc driver? Maybe we can come up with something similar.

@davecramer
Copy link
Member

I read it as he just wants to consume the copy stream. Data replication is orthogonal to the request

@unoexperto
Copy link
Author

unoexperto commented Oct 1, 2020

Dave is correct. I want to consume very long stream of binary data. Typically COPY output is either CSV or BINARY. So my client code will take of parsing it.

What's also important is to know that the stream is fully consumed. For that I need to get CopyDone.

If I were to modify existing code with minimal intrusion I'd probably have something like CopyDataStreaming where data field is of Flux<ByteBuf> type.

@mp911de
Copy link
Collaborator

mp911de commented Oct 1, 2020

Is that something we could map onto PostgresqlResult in the form of e.g.

/**
 * Return a stream of mapped objects from {@link CopyData}. The stream terminates when the underlying stream receives  {@link CopyDone}.
 */
<T> Flux<T> mapCopy(Function<CopyData, ? extends T> mappingFunction);

Since I'm not terribly familiar with Copy (and CopyIn, CopyOut), I'm happy for suggestions. Also an example how the calling code could look like would be helpful to find a suitable approach. CopyData holds a ByteBuf that can be consumed by the caller.

@mp911de mp911de added type: enhancement A general enhancement and removed status: waiting-for-triage An issue we've not yet triaged labels Oct 1, 2020
@unoexperto
Copy link
Author

Thank you, Mark. I'm afraid returning response in ByteBuf won't cut it. It can easily be terabyte of data. Let me fork and experiment with implementation options and I'll raise PR to discuss if solution is feasible.

@mp911de
Copy link
Collaborator

mp911de commented Oct 1, 2020

Sure, happy to continue the discussion over the actual code.

Note that the driver is built with the assumption to decode an entire frame (message) (see BackendMessageDecoder) before a consumer can process the frame. According to the Postgres protocol spec, a single frame may be up to 2GB which is a lot of memory.

@davecramer
Copy link
Member

No, copy is a stream it could be TB as mentioned.

@mp911de
Copy link
Collaborator

mp911de commented Oct 1, 2020

Does the server send multiple CopyData frames or is that part of a single frame?

@davecramer
Copy link
Member

Good question I actually haven't looked. you are probably correct though it does probably send a frame

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: waiting-for-feedback We need additional information before we can continue type: enhancement A general enhancement
Projects
None yet
Development

No branches or pull requests

3 participants