Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: delimitedMethods option #460

Open
wants to merge 13 commits into
base: main
Choose a base branch
from
Open

Conversation

AlaaZorkane
Copy link

@AlaaZorkane AlaaZorkane commented Dec 31, 2021

Related issue(s)

#233

This PR adds the following:

  • encodeDelimited generator
  • decodeDelimited generator
  • delimitedMethods option to generate above methods
  • delimiatedMethods usage documentation
  • Tests for above methods
  • Tests for encoding/decoding a stream of messages

Notes

I tried to find an API where I don't have to copy paste redundant generation logic but I failed to make it backward-compatible, per example, one of the approaches is to have an option instead of having specific functions like this:

decode(input: _m0.Reader | Uint8Array, length?: number, isDelimited = false) {}
encode(message: JarAccountState, writer: _m0.Writer = _m0.Writer.create(), isDelimited = false) {}

But this would appose these problems:

  • The length will be ignored but needs to be passed on (or `null) explicitly.
  • You have to pass a writer or null explicitly

I also couldn't find a way to not have redundant code for adding cases for each incoming field due to my unfamiliarity with the codebase. 431c595

Useful links

Protobuf streaming: https://developers.google.com/protocol-buffers/docs/techniques#streaming
Protobuf-js implementations: encodeDelimited and decodeDelimited

Workarounds

This is what I currently do as a workaround, very verbose and loosely typed:

export const encoderDelimited = <T = any>(proto: any, message: T) => {
  const encoded: Writer = proto.encode(message);

  encoded.ldelim();

  return encoded.finish();
};

export const decoderDelimited = <T = any>(proto: any, encoded: Uint8Array) => {
  const reader = new Reader(encoded);
  const length = reader.int32();
  const decoded: T = proto.decode(reader, length);

  return decoded;
};

Copy link
Owner

@stephenh stephenh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AlaaZorkane thanks for the PR!

I had a few questions, and then also if you could make a new integration-tests/delimited-methods directory with a minimal proto and a test that round trips like:

const foo1 = FooMessage.fromPartial({ ... });
const foo2 = FooMessage.decodeDelimited(FooMessage.encodeDelimited(foo1));
expect(foo2).toEqual(foo1);

That'd be great. Thanks!

src/main.ts Outdated
input: ${Reader} | Uint8Array,
): ${fullName} {
const reader = input instanceof ${Reader} ? input : new ${Reader}(input);
let end = reader.uint32();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of repeating the entire contents of decode, could this be something like:

const reader = ...
let end = reader.uint32();
return decode(reader, end);

I.e. decode already accepts an optional length argument, which I think? is the only thing that has changed here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right! Since delimitedMethods depends on outputEncodeMethods we can definitely reuse that logic.

src/main.ts Show resolved Hide resolved
@stephenh
Copy link
Owner

stephenh commented Jan 1, 2022

Also @AlaaZorkane , I'm just curious, I've not personally used delimited messages before, but I get the concept and was reading the docs:

https://developers.google.com/protocol-buffers/docs/techniques#streaming

One thing I'm curious about is, if there are multiple messages in a stream coming in, how do we know which message is whic? Like if a stream of [FooMessage, BarMessage] is then with delimited lengths it might be [200, FooMessage, 300, BarMessage], i.e. we know to read 200 bytes for foo and 300 bytes for bar; but how do we know, after reading FooMessage, that BarMessage is what comes next?

I.e. I kinda assumed that delimited messages would have to encode both the length of the next message, as well as the type of the message, using some sort of schema registry / message id type setup...

Or is the assumption that the caller of decodeDelimited knows that there will be a specific / fixed order of messages?

@AlaaZorkane
Copy link
Author

AlaaZorkane commented Jan 4, 2022

Hey @stephenh - I added the integration tests and the reusing refactor! Would appreciate if you can review them please 🙏

Also, about streaming multiple messages, I think implementations can vary from application to another? And it's mostly a higher level topic that won't be included and/or abstracted in decoding/encoding primitives (like decodeDelimited).

There isn't really a set of best-practices on how to deal with the case of a stream of different types of messages AFAIK, and the assumption I made when I read the streaming docs is for use cases like [50, FooMessage, 100, FooMessage, ...] (there will always only be FooMessage)

P.S: One way I could do what you mentioned is communicate the specific order of messages pre-streaming (?)
P.S (2): A good article I found about streaming protobufs: https://sureshjoshi.com/development/streaming-protocol-buffers


EDIT: Hey @stephenh just letting you know that I added a custom implementation test for streaming different types of messages via:

it('decodes a stream of different length-delimited messages', () => {

Fair to mention that this is one implementation of many and might vary from app to another.

},

encodeDelimited(message: Simple, writer: Writer = Writer.create()): Writer {
return this.encode(message, writer).ldelim();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm looking at some of the code we create today and it does this:

      Nested_InnerMessage.encode(message.message, writer.uint32(18).fork()).ldelim();

And the docs for ldelim say "Resets to the last state and appends the fork state's current write length as a varint followed by its operations."

This makes me think we need to ensure a fork exists, and change this to be:

return this.encode(message, writer.fork()).ldelim();

Does that seem right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right! It seems like it's the case in the protobuf-js implementation here. Thanks!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a feeling that this should have been caught by a test. Would it be possible to change the tests to write and read 2 messages instead of 1? Then we will really test the main use-case of this feature @AlaaZorkane

Comment on lines 20 to 25
it('decodes length-delimited messages', () => {
const encoded = Simple.encodeDelimited(message).finish();
const decoded = Simple.decodeDelimited(encoded);

expect(decoded).toEqual(message);
});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about changing this test (or adding one) to encode and decode 2 messages?

It will give us more confidence that the main use case is working, and also demonstrate how to use the feature in practice.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely! Just added streaming tests via 0ec5e53, please let me know what do you think. Thanks!

Copy link
Owner

@stephenh stephenh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small thing about message vs. _ but otherwise looks great!

src/main.ts Outdated
// create the basic function declaration
chunks.push(code`
encodeDelimited(
${messageDesc.field.length > 0 ? 'message' : '_'}: ${fullName},
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really minor, but I think this message || _ variable name was for encode whose for loop would actually never use message if there are no fields. For encodeDelimited, the function body always uses message, so this can be just message: ${fullName}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed in b0f5e38, thanks!

src/main.ts Outdated
Comment on lines 830 to 831
chunks.push(code`const message = this.decode(reader, length);`);
chunks.push(code`return message;`);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just return this.decode(...)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, definitely!

- removed unused message condition
- return decoding directly
Copy link
Collaborator

@boukeversteegh boukeversteegh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice tests added! Looks good to me 👍

@stephenh
Copy link
Owner

@AlaaZorkane looks great! Looks like just the build is failing, maybe b/c some output needs updated after getting rebased:

https://github.com/stephenh/ts-proto/runs/4766206207?check_suite_focus=true

@AlaaZorkane
Copy link
Author

AlaaZorkane commented Jan 11, 2022

@AlaaZorkane looks great! Looks like just the build is failing, maybe b/c some output needs updated after getting rebased:

https://github.com/stephenh/ts-proto/runs/4766206207?check_suite_focus=true

Hey @stephenh, sorry can you walk me through it? I already ran the codegen commands and updated the .bin/.ts outputs to conform with last changes.

P.S: I took a look at the workflow logs and I am not sure I completely got what the diff is about ^^'

@stephenh
Copy link
Owner

@AlaaZorkane sure, np; I pulled your PR down locally, and AFAICT just rebasing on main and then re-running codegen for your new delimited-methods test (i.e. yarn bin2ts delimited-methods) did update the delimited-methods.ts file to match what the CI build expected.

I'd push my change up to this PR just to be sure, but I can't push to your main branch; so probably try that on your side and then I think the build will be clean?

Thanks!

@stephenh stephenh force-pushed the main branch 4 times, most recently from cdf2835 to 7017d4c Compare May 28, 2022 17:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants