
quic: add quic #38233

Closed
wants to merge 6 commits into from

Conversation


@jasnell jasnell commented Apr 13, 2021

Working on adding QUIC back. This is a bit of a rewrite of the original and is not yet complete (thus the draft status).

There are several important bits here:

  1. The internal classes are Endpoint, EndpointWrap, Session, and Stream (originally these were named QuicSocket, QuicSession, and QuicStream).

  2. Endpoint and EndpointWrap are a bit special...

  • Endpoint wraps a uv_udp_t but does not wrap UDPWrap. The reason is to give us finer control over how the UDP port is used and configured, and how data flows.
  • EndpointWrap is the JS wrapper object around Endpoint. It is designed to be cloneable via MessagePort. What that means is that an Endpoint (and its associated uv_udp_t) can essentially be shared across multiple worker threads while still being owned by a single event loop. It makes the implementation a bit more complicated but opens up the ability for processing QUIC sessions across multiple worker threads.
  3. Stream is no longer a StreamBase. In fact, while the QUIC implementation will be capable of sending from a Readable stream, and will support being read as a Readable, the QUIC JavaScript Stream object will not be a Node.js stream object. This is a major API change from the original implementation. More details on this later, but the usage pattern will be closer to what we see in browser environments (e.g. stream.blob(), stream.readable(), stream.text(), etc).

  4. There's only a partial JS API exposed here yet. That's coming as I work through the rewrite.

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. labels Apr 13, 2021
@richardlau richardlau added the semver-minor PRs that contain new features and should be released in the next minor version. label Apr 13, 2021

@addaleax addaleax left a comment


EndpointWrap is the JS wrapper object around Endpoint. It is designed to be cloneable via MessagePort. What that means is that an Endpoint (and its associated uv_udp_t) can essentially be shared across multiple worker threads while still being owned by a single event loop. It makes the implementation a bit more complicated but opens up the ability for processing QUIC sessions across multiple worker threads.

This doesn't really seem future-proof in a world in which we will eventually be able to transfer handles between threads. Using another thread's I/O facilities kind of breaks the mental model that people have of Workers and renders them fairly pointless in general.

Stream is no longer a StreamBase. In fact, while the QUIC implementation will be capable of sending from a Readable stream, and will support being read as a Readable, the QUIC JavaScript Stream object will not be a Node.js stream object. This is a major API change from the original implementation. More details on this later, but the usage pattern will be closer to what we see in browser environments (e.g. stream.blob(), stream.readable(), stream.text(), etc).

Gonna be honest, this feels like something that's going to come back to bite us, both from an API perspective (special-casing QUIC vs making the API as close as possible to the TCP one is going to make everything harder for users, not easier) and from an internals perspective (having some consistent stream API on the C++ side does make sense!).

static std::shared_ptr<SocketAddress> FromSockName(const uv_udp_t& handle);
static std::shared_ptr<SocketAddress> FromSockName(const uv_tcp_t& handle);
static std::shared_ptr<SocketAddress> FromPeerName(const uv_udp_t& handle);
static std::shared_ptr<SocketAddress> FromPeerName(const uv_tcp_t& handle);


Can you explain why? SocketAddress is copyable, and copying it would be expected to have lower performance overhead than a separate allocation

(And even if not ... why std::shared_ptr instead of std::unique_ptr?)


jasnell commented Apr 13, 2021

This doesn't really seem future-proof in a world in which we will eventually be able to transfer handles between threads.

Will we? It's been discussed but I hadn't seen any activity to that end so I didn't really want to bank on that arriving at any time in the near future.

If/when that time does come, it can be accommodated under this model by modifying EndpointWrap to allow for transfers as well as clones. A transferred EndpointWrap would transfer ownership of the underlying inner Endpoint and its corresponding uv_udp_t handle to the other event loop without breaking the assumptions here.

Using another thread's I/O facilities kind of breaks the mental model that people have of Workers and renders them fairly pointless in general.

Why? Can you explain? It's not far off from the model we use in Piscina, for instance, where we keep the i/o on the main thread and dispatch the compute to workers.

Gonna be honest, this feels like something that's going to come back to bite us, both from an API perspective (special-casing QUIC vs making the API as close as possible to the TCP one is going to make everything harder for users, not easier) and from an internals perspective (having some consistent stream API on the C++ side does make sense!).

Having been through all this code many times, I disagree entirely. And I don't plan on special casing the QUIC code -- I plan on revisiting both the HTTP/2 and HTTP/1 APIs to align them closer with this model. In other words, I'm intentionally moving away from the existing streams API (in both internal and external APIs).

@addaleax

This doesn't really seem future-proof in a world in which we will eventually be able to transfer handles between threads.

Will we? It's been discussed but I hadn't seen any activity to that end so I didn't really want to bank on that arriving at any time in the near future.

I mean, "nobody's working on it so it's not going to happen" is ... Yes, it's unlikely that we're going to get a nice API for this on the libuv side anytime soon, something that I had been waiting/hoping for, but we can always choose to do the same things we do for child processes. That's not trivial either, but it's still way less complex than adding special casing for each class that wraps a libuv handle.

Using another thread's I/O facilities kind of breaks the mental model that people have of Workers and renders them fairly pointless in general.

Why? Can you explain? It's not far off from the model we use in Piscina, for instance, where we keep the i/o on the main thread and dispatch the compute to workers.

Because in that case the thread whose event loop owns the handle also performs any active I/O calls -- it's the exact opposite of what piscina does.

As an extreme example, when the handle-owning thread becomes blocked, other threads won't be able to actually make use of the handle.

This is a blocker, imo.

Gonna be honest, this feels like something that's going to come back to bite us, both from an API perspective (special-casing QUIC vs making the API as close as possible to the TCP one is going to make everything harder for users, not easier) and from an internals perspective (having some consistent stream API on the C++ side does make sense!).

Having been through all this code many times, I disagree entirely. And I don't plan on special casing the QUIC code -- I plan on revisiting both the HTTP/2 and HTTP/1 APIs to align them closer with this model. In other words, I'm intentionally moving away from the existing streams API (in both internal and external APIs).

You can feel free to disagree, but please don't add APIs without having a plan for how to integrate them with the existing streams API that has consensus among collaborators beforehand.

@jasnell

jasnell commented Apr 13, 2021

... that has consensus among collaborators beforehand.

That's why this is all being done in the open with visibility through a draft PR.

@addaleax

... that has consensus among collaborators beforehand.

That's why this is all being done in the open with visibility through a draft PR.

Well... in that case, what's the state that you eventually want to reach in terms of internal streams API?

I mostly really want to avoid a scenario in which one chunk of the code base uses one streams model and another uses another streams model. It's bad enough that we have streams like zlib or fs streams that don't align with other code, but let's not make it worse.

@jasnell

jasnell commented Apr 14, 2021

Well... in that case, what's the state that you eventually want to reach in terms of internal streams API?

Let's start with a look at the external API then work our way in.

Let's look at server-side once a session has been created and we've received a stream from the peer...

stream.respondWith({ body: fs.promises.open('file') });  // Promise<FileHandle>
stream.respondWith({ body: await fs.promises.open('file') });  // FileHandle
stream.respondWith({ body: fs.openReadStream('...') });  // any stream.Readable
stream.respondWith({ body: blob });  // buffer.Blob
stream.respondWith({ body: buffer });  // Buffer/TypedArray/ArrayBuffer
stream.respondWith({ body: 'hello world' });  // string

stream.blob() // get the stream contents as a Blob
stream.readable() // get the stream contents as a stream.Readable
stream.text() // get the stream contents as a string
// ...

On the open stream side:

session.openStream({ unidirectional: true, body: fs.promises.open('file') });  // Promise<FileHandle>
session.openStream({ unidirectional: true, body: fs.openReadStream('file') });  // stream.Readable
session.openStream({ unidirectional: true, body: 'hello world' });  // string
// ...

This is largely a mockup currently because I'm still working through the details and ergonomics, but it gives a basic idea. Rather than the QUIC Stream being a stream itself, the API will accept any Node.js Readable as a source and will be capable of being read as a Readable for folks who want to stick with that API.

From here, on the internals, the implementation will be capable of consuming from any Readable and from any StreamBase. It will just also be capable of consuming from TypedArray, Blob, strings, etc.

As you are already aware, internally, the QUIC implementation already has to be different because of the way ngtcp2 works. Outbound data has to be queued internally, with potentially multiple passes at attempting to read. Whenever a QUIC packet is serialized, we pass ngtcp2 a pointer to the available data and it decides how much it wishes to consume; we are then required to retain the sent data in memory in case ngtcp2 determines that there has been packet loss. Any single chunk of outbound data could be passed multiple times into ngtcp2, and any packet serialization could be smaller or larger than any single uv_buf_t that is passed through via a StreamBase instance. We can only free the retained data once successful receipt has been acknowledged. This requires functionality that does not exist in the current StreamBase. Inbound data is easier to handle, and when user code chooses to consume that as a stream.Readable, it will essentially just pass through a StreamBase instance and out via stream.Readable, but there will be other options (such as "consuming" the data as a Blob, in which case the data may never actually pass up into the JS realm at all).
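The retain-until-acknowledged behavior described above can be modeled with a small toy queue. This is a sketch only; RetainedSendQueue and its method names are invented for illustration and this is not the actual C++ internals. The packetizer pulls data without freeing it, may rewind and re-read on loss, and only acknowledged bytes are released:

```javascript
'use strict';

// Toy model of the ngtcp2-style outbound queue: data stays retained in
// memory until the peer acknowledges it, even after it has been "sent".
class RetainedSendQueue {
  constructor() {
    this.chunks = [];     // retained Buffers, in order
    this.base = 0;        // absolute stream offset of chunks[0] start
    this.readOffset = 0;  // bytes already handed to the packetizer
    this.ackOffset = 0;   // bytes acknowledged by the peer
  }

  push(buf) {
    this.chunks.push(buf);
  }

  // The packetizer decides how much it wants; nothing is freed here.
  pull(maxBytes) {
    const end = this.base + this.chunks.reduce((n, c) => n + c.length, 0);
    const avail = Math.min(maxBytes, end - this.readOffset);
    const all = Buffer.concat(this.chunks);
    const start = this.readOffset - this.base;
    const data = all.subarray(start, start + avail);
    this.readOffset += avail;
    return data;
  }

  // On detected packet loss, retained data can be re-read.
  rewind(toOffset) {
    if (toOffset >= this.ackOffset) this.readOffset = toOffset;
  }

  // Only fully acknowledged chunks may actually be released.
  ack(upTo) {
    this.ackOffset = Math.max(this.ackOffset, upTo);
    while (this.chunks.length &&
           this.base + this.chunks[0].length <= this.ackOffset) {
      this.base += this.chunks.shift().length;
    }
  }
}

const q = new RetainedSendQueue();
q.push(Buffer.from('hello'));
console.log(q.pull(4).toString()); // prints "hell"
```

This is exactly the capability the comment says StreamBase lacks: a write callback model cannot express "sent but not yet freeable".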

Further, there is a bottleneck currently built into the JS-to-StreamBase design in that it will queue outbound writes at the JS side until the previous write callback is invoked. However, once invoked, the Buffer reference is released, allowing it to be garbage collected. That means we cannot invoke the callback until after the data is acknowledged, which bottlenecks our outbound data flow. StreamBase currently does not give us access to the Buffer to force it to be retained -- it only gives us a set of uv_buf_t's. This means that, when using Stream, the outbound data ends up needlessly being buffered twice and our QUIC stream is starved of data until all outstanding acks come in. So while the API will support consuming the existing JS-to-StreamBase model, it won't be limited by it. When reading from a native StreamBase instance, such as a FileHandle, we thankfully won't have that limitation. Reading from those will be straightforward, pretty simple, and very performant -- though it still needs a bridge into the pull model required by ngtcp2.

@dead-claudia dead-claudia mentioned this pull request Feb 19, 2022
@splitice

@jasnell may I make a suggestion?

That before the code base becomes too different, you take the time to review the fixes we implemented. I know there's a bit there; however, they are required to get the previous code base to run reasonably well (we did both demos with it and now have a prototype fleet running it in development).

@jasnell

jasnell commented Feb 23, 2022

Hey @splitice ! Yeah I actually have all of your fixes open in a set of tabs that I'm going to start going through later this week :-)

@splitice

@jasnell then I apologise in advance for all my sins :)

@jasnell

jasnell commented Feb 23, 2022

Heh, don't apologize! Based on all the work you've done I owe you the beverage of your choice if we ever end up in the same location face to face.

@splitice

splitice commented Apr 4, 2022

@jasnell I haven't merged your squashed build changes; however, 2 more for the list.

  1. Session close() on the client side takes a long time to resolve; it appears to wait until the server starts its own closure (due to timeout)

  2. Closing the endpoint with an active session results in:

node: ../deps/uv/src/unix/udp.c:103: uv__udp_finish_close: Assertion `!uv__io_active(&handle->io_watcher, POLLIN | POLLOUT)' failed.
Aborted (core dumped)
Thread 1 "node" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff75d5535 in __GI_abort () at abort.c:79
#2  0x00007ffff75d540f in __assert_fail_base (fmt=0x7ffff7737ee0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
    assertion=0x55555760ed98 "!uv__io_active(&handle->io_watcher, POLLIN | POLLOUT)", file=0x55555760efd0 "../deps/uv/src/unix/udp.c", line=103, function=<optimized out>) at assert.c:92
#3  0x00007ffff75e3102 in __GI___assert_fail (assertion=assertion@entry=0x55555760ed98 "!uv__io_active(&handle->io_watcher, POLLIN | POLLOUT)",
    file=file@entry=0x55555760efd0 "../deps/uv/src/unix/udp.c", line=line@entry=103, function=function@entry=0x55555760f240 <__PRETTY_FUNCTION__.9212> "uv__udp_finish_close")
    at assert.c:101
#4  0x0000555556871071 in uv__udp_finish_close (handle=handle@entry=0x5555598bcf98) at ../deps/uv/src/unix/udp.c:103
#5  0x0000555556860f18 in uv__finish_close (handle=0x5555598bcf98) at ../deps/uv/src/unix/core.c:295
#6  uv__run_closing_handles (loop=0x5555597505c0 <default_loop_struct>) at ../deps/uv/src/unix/core.c:321
#7  uv_run (loop=0x5555597505c0 <default_loop_struct>, mode=UV_RUN_ONCE) at ../deps/uv/src/unix/core.c:399
#8  0x0000555555d13ae0 in node::Environment::CleanupHandles() () at ../deps/uv/src/unix/core.c:863
#9  0x0000555555d185ab in node::Environment::RunCleanup() () at ../deps/uv/src/unix/core.c:863
#10 0x0000555555ccd240 in node::FreeEnvironment(node::Environment*) () at ../deps/uv/src/unix/core.c:863
#11 0x0000555555dd44d3 in node::NodeMainInstance::Run(node::EnvSerializeInfo const*) () at ../deps/uv/src/unix/core.c:863
#12 0x0000555555d549ee in node::Start(int, char**) () at ../deps/uv/src/unix/core.c:863
#13 0x00007ffff75d709b in __libc_start_main (main=0x555555cc3b60 <main>, argc=2, argv=0x7fffffffbf38, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>,
    stack_end=0x7fffffffbf28) at ../csu/libc-start.c:308
#14 0x0000555555cc612a in _start () at ../deps/uv/src/unix/core.c:863

The following test script is supplied:

const { Endpoint } = require('node:net/quic'),
        crypto = require('crypto')

class DumpCertificate{
    constructor(ip, port){
        this._ip = ip
        this._port = port
    }

    async retrieveCert() {
        let peerCertificate = null
        const client = new Endpoint()
        try {
            let session = await this._clientConnect(client)
            peerCertificate = session.peerCertificate
// for 20 await session.close() here
        } finally {
            await this._closeClient(client)
        }
        return peerCertificate
    }

    async _clientConnect(client){
        const session = client.connect({
            address: this._ip,
            port: this._port
        }, { alpn: 'stuff', hostname: 'certificate' })
        await session.handshake;
        return session
    }

    async _closeClient(client){
        try {
            await client.close()
        } catch (err) {
            console.error(`Failed to close connection to ${this._ip}: ${err}`)
        }
    }
}

async function main(){
    const ip = process.argv[2]  // argv[0] is node, argv[1] is the script path
    if (!ip) {
        console.log('No ip')
        process.exit(1)
    }

    const port = parseInt(process.argv[3])
    if (!port) {
        console.log('No port')
        process.exit(2)
    }

    const dc = new DumpCertificate(ip, port)
    const start = Date.now()/1000
    let peerCertificate
    try {
        peerCertificate = await dc.retrieveCert()
    } catch(ex){
        const elapsed = Date.now()/1000 - start
        console.log(JSON.stringify({error: ex.message, elapsed}))
        return
    }
    let result, elapsed = Date.now()/1000 - start
    if(peerCertificate){
        const validTo = new Date(peerCertificate.validTo).getTime()/1000
        result = {
            validTo,
            validRemains: validTo - (Date.now() / 1000),
            elapsed
        }
    } else {
        result = {
            error: 'No peer certificate retrieved',
            elapsed
        }
    }
    console.log(JSON.stringify(result))
}

main()

Feedback / gut feelings welcome @jasnell . I can spend some time investigating tomorrow and any direction you can supply would be very helpful.

@splitice
Copy link

splitice commented Apr 4, 2022

A setTimeout to delay process exit after client.close() prevents the segfault with item 21.

So perhaps the crash is the destruction of the handle by process exit while the handle is still being closed. Maybe no reference is being maintained for the close callback? That would make sense, I think. Also, that close callback should resolve the closed promise (which is currently resolved earlier!).

@splitice

splitice commented Apr 5, 2022

Fix for issue 13

HalleyAssist@8d1503b

Glad I'm not paid by line of code for that one. I'm embarrassed to admit how long it took to find.

@jasnell

jasnell commented May 23, 2022

Unfortunately, I think I need to accept the fact that I'm not likely to get back to this PR any time in the near future (or likely this year). Try as I might, this is just going to take way too much time to finish up... Therefore, I'd like to ask if there is anyone willing to take this activity over and get the QUIC implementation to completion.

@flowck

flowck commented Jun 9, 2022

Unfortunately, I think I need to accept the fact that I'm not likely to get back to this PR any time in the near future (or likely this year). Try as I might, this is just going to take way too much time to finish up... Therefore, I'd like to ask if there is anyone willing to take this activity over and get the QUIC implementation to completion.

Hi @jasnell,
Is there a roadmap / backlog tied to the work you did, so that people can continue where you left off?

I am not a C/C++ developer, but I would be open to contributing anything non-C/C++ related.

@splitice

splitice commented Jun 9, 2022

I know it's hard to do but a roadmap to (potential) merge would be good. Particularly if the new project leader for this contribution was not a regular Node contributor.

Unfortunately it's not likely to be me. The client I consulted with for the work to date does not have a sufficient budget for it. However, I'm happy to continue to report and/or fix any bugs discovered in our prototype client. Currently, despite the few minor known issues remaining, it actually works pretty well. Our prototype has gotten quite a bit of use internally.

@jasnell
Member Author

jasnell commented Jun 9, 2022

In the next couple of weeks I will put something together. I've been talking to @mcollina this week about this, and there likely is at least some part that I can continue pushing forward, but help will be needed to finish it. So either way I'll need to put together a roadmap. Give me a week or two to get that together.

@flowck

flowck commented Jun 9, 2022

@splitice I appreciate your reply. :)

Hi @jasnell, thanks for the update!

@jasnell

jasnell commented Jun 10, 2022

Ok, so update. I am going to split this PR into two. The first will contain only the c++ pieces, the other will contain the JavaScript apis, tests, and JavaScript docs. I will keep iterating on the c++ pieces myself starting next week, but will have to leave the JavaScript pieces to someone else to help. By the end of next week I will have the roadmap documented on the remaining todos so folks can help.

Initially, to make reviewing easier and land this early, I suggest that we not worry about making sure that every bug is squashed before this lands. We can iterate afterward to make sure everything gets done as we go.

As before, I'm happy to jump on calls describing what is here, but I will also work on detailed documentation of the design.

@flowck

flowck commented Jun 10, 2022

Hi @jasnell, I like the strategy you’ve outlined, it will help split the work into smaller chunks. 🙂🙂🙂

@CMCDragonkai

@jasnell looking forward to the plan!

I was wondering whether just using the msquic (https://github.com/microsoft/msquic) library would speed up development?

Or is it that using a big external library would introduce inconsistencies in the Node.js C/C++ codebase?

@splitice

@CMCDragonkai why would it be a better fit than ngtcp2 (which is already in use)?

@jasnell

jasnell commented Aug 8, 2022

Just FYI... I've started working on this again. It'll be slow going but I am making progress.

@jasnell jasnell mentioned this pull request Aug 21, 2022
@jasnell

jasnell commented Aug 21, 2022

This PR is superseded by #44325
