Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Difficulties using TAIL from a NodeJS backend #7160

Closed
petrosagg opened this issue Jun 24, 2021 · 10 comments
Closed

Difficulties using TAIL from a NodeJS backend #7160

petrosagg opened this issue Jun 24, 2021 · 10 comments
Assignees
Labels
C-feature Category: new feature or request

Comments

@petrosagg
Copy link
Contributor

I encountered the following issues trying to build a realtime application based on TAIL:

  • TAIL outputs rows that can have conflicting names if the target view contains columns named timestamp, diff, or progressed
    Timestamp in particular is a pretty common name. Libraries that auto convert rows into hashmaps (like node-postgres) will only keep one of the two. While it is possible to use alternative APIs and access the row as an array, it's cumbersome.
    Proposal: We could choose less likely names for the system columns like mz_timestamp, mz_diff, and mz_progressed

  • My application was listening on the TAIL stream and was reconstructing the collection according to the diffs. I wanted to only release well-formed collections to the rest of the application so the receiving code needed to know when a timestamp is complete. TAIL doesn't provide a consistent mechanism for that, even when you do WITH (PROGRESS). The timestamps progress if you get a row where it has progressed=true OR if you receive a row with a timestamp higher than the previous row. This also implies that data might be buffered on the materialize side to make this happen.
    Proposal: Always send progress messages when a timestamp closes and don't hold back any row data, at least in the WITH (PROGRESS) mode

  • The diff field is declared as a i64 which makes the javascript client to provide it as a string since JS only has 64bit floats.
    Proposal: Make the diff field i32. It's extremely unlikely that any system will produce a single row with cardinality over 2 billion

@petrosagg petrosagg added the C-feature Category: new feature or request label Jun 24, 2021
@benesch
Copy link
Member

benesch commented Jun 25, 2021

Point 1 is #5908, and the proposal you suggest here is the one agreed upon by everyone!

cc @mjibson for thoughts on point 2.

Re point 3: I'm really loathe on the Materialize side to do anything to work around JS's numeric woes. We had this problem at Cockroach too, and, well, folks who use PostgreSQL seriously from JS are used to dealing with this, and usually have some workarounds. JS is slowly getting a BigInt type that may one day be supported by node-pg: brianc/node-pg-types#78. In the meantime you can use pg.defaults.parseInt8(true) to YOLO it if you're sure you'll never see an int8 that exceeds the maximum safe integer.

@petrosagg
Copy link
Contributor Author

oh, great about #5908! amazing that we ended up in an identical proposal

re point 3: If the rule is "we don't cut corners because of JS", then I fully agree with it and I will argue that this change does not violate it. The argument that i64 is excessive for the diff field I think stands on its own regardless of the minor JS inconvenience that made me realize it. I also saw that repr::Diffis defined as isize in the code, is that on purpose?

In my demo I did the even YOLOer thing of using == instead of === to let JS do its type coercion abominations :P

@philip-stoev
Copy link
Contributor

philip-stoev commented Jun 25, 2021

Every time a 32-bit has been used for anything internal a the database system it has overflowed and had to be changed to a 64-bit integer. record IDs, message IDs, anything , always. I suggest we keep this a 64-bit thing.

If you have a source with 2 billion NULLs in it, you can potentially get a diff with such a large value if you are very unlucky.

@petrosagg
Copy link
Contributor Author

Alright, I'm convinced. I withdraw point 3 :)

@maddyblue
Copy link
Contributor

I thought that progress=true produced periodic messages even if there were no rows. There's no code that buffers in the "after dataflow" side. We even specifically care about the case where there's no rows and thus just a progress event

// batch. All of the batches might have zero rows, so we do not depend on
. So I'm not sure what's going on there? It might be a dataflow-side thing which I don't know about.

@benesch benesch added this to Icebox in Storage (Old) via automation Jun 28, 2021
@benesch benesch added this to Icebox in SQL and Coordinator via automation Jun 28, 2021
@benesch benesch moved this from Icebox to To do in SQL and Coordinator Jun 28, 2021
@benesch benesch moved this from Icebox to To Do in Storage (Old) Jun 28, 2021
@benesch
Copy link
Member

benesch commented Jun 28, 2021

Adding to To Do in both SQL and Coordinator and Sources and Sinks so we don't lose track of this. There's definitely something strange here. @petrosagg, would you be able to provide repro instructions for the missing progress messages?

@maddyblue
Copy link
Contributor

maddyblue commented Jun 28, 2021

This is easily reproducible though:

materialize=> create table t (i int);
CREATE TABLE
materialize=> copy (tail t with (progress=true)) to stdout;

has no output, when I would expect it to. It perhaps did at one point and this regressed.

@maddyblue
Copy link
Contributor

I was not able to produce empty progress messages for any version since this was released in 0.5.2 so I assume I misremember having them ever. We just wrote the code such that they might exist, but they never did.

@maddyblue
Copy link
Contributor

I've opened #7257 to track the empty progress messages.

@benesch
Copy link
Member

benesch commented Jun 29, 2021

Ok, sounds like #7257 covers it then!

@benesch benesch closed this as completed Jun 29, 2021
SQL and Coordinator automation moved this from To do to Done Jun 29, 2021
Storage (Old) automation moved this from To Do to Landed Jun 29, 2021
@nmeagan11 nmeagan11 removed this from Landed in Storage (Old) Aug 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-feature Category: new feature or request
Projects
Development

No branches or pull requests

4 participants