Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow ALPN fallback #45056

Closed
pimterry opened this issue Oct 18, 2022 · 11 comments
Closed

Allow ALPN fallback #45056

pimterry opened this issue Oct 18, 2022 · 11 comments
Labels
feature request Issues that request new features to be added to Node.js. tls Issues and PRs related to the tls subsystem.

Comments

@pimterry
Copy link
Member

What is the problem this feature will solve?

I want to create a TLS server that uses ALPN but accepts all protocols from the client.

There's all sorts of interesting cases for this, but the simplest is when the server is really a TLS-terminating proxy of some kind that's just forwarding data upstream. ALPN is necessary in these cases because some clients (e.g. GRPC) will completely fail to connect to a server that does not explicitly support their protocol of choice. I suspect that'll become even more common in future as ALPN becomes widespread outside HTTP/2.

My case is a little more specific: my server is a proxy that supports network debugging, and I specifically want to accept all protocols so that I can capture and debug unexpected traffic, rather than rejecting it at the door. I think this applies more widely though.

What is the feature you are proposing to solve the problem?

An additional TLS server option like allowALPNFallback: <boolean>.

If set, at the end of the SelectALPNCallback callback if status == OPENSSL_NPN_NEGOTIATED then SSL_TLSEXT_ERR_OK should be returned instead of SSL_TLSEXT_ERR_ALERT_FATAL, thereby accepting the out value set by SSL_select_next_proto in the lines above.

I believe this works because the feature described is already the default behaviour of OpenSSL's SSL_select_next_proto function, which sets out in this callback to the client's first protocol preference by default if no matching server protocol is found. Docs here: https://www.openssl.org/docs/man3.0/man3/SSL_select_next_proto.html. The only reason that this doesn't work already is the additional check the Node adds, which sends an alert and kills the connection instead.

What alternatives have you considered?

I've opened this because this behaviour was recently changed in Node 19 in #44031. Previously in many cases it was possible to at least detect and in some cases fudge these connections because the connection continued (without an ALPN response but without failure) despite the lack of ALPN protocol agreement. Now it will always be rejected explicitly, which makes this use case more complicated.

I do agree the new behaviour in #44031 is correct as the default, but I just don't think that the RFC quoted there mandates this strict behaviour for all TLS servers. As quoted in that PR, the RFC says:

It is expected that a server will have a list of protocols that it supports, in preference order, and will only select a protocol if the client supports it. In that case, the server SHOULD select the most highly preferred protocol that it supports and that is also advertised by the client. In the event that the server supports no protocols that the client advertises, then the server SHALL respond with a fatal "no_application_protocol" alert.

In my case, the server effectively supports all protocols (i.e. it doesn't care, it's just proxying raw traffic) and so it's not required to send the alert - it's reasonable to accept the client's preference for unknown protocols.

A more general alternative would be an ALPN callback, to allow TLS servers to implement arbitrary logic like this or other ways to pick the ALPN protocol from JS directly. Imo that's interesting and all very cool, but not really necessary, and quite a bit more complicated, so probably not worthwhile. Could be worth considering if others see a good use cases for this though.

@pimterry pimterry added the feature request Issues that request new features to be added to Node.js. label Oct 18, 2022
@bnoordhuis
Copy link
Member

Pull request welcome but you'll want to read through #44755 because the question asked there is very similar to your feature request.

Your proxy example wouldn't work (not unless you know in advance what your backend supports) because the SSL_select_next_proto() callback must return an answer immediately - you can't tell openssl "stand by while I look that up for you."

@bnoordhuis bnoordhuis added the tls Issues and PRs related to the tls subsystem. label Oct 18, 2022
@pimterry
Copy link
Member Author

Opened a PR: #45075

It does still sound like it'd be possible (and maybe even useful) to have an ALPNCallback option that could take the list sent by the client and basic socket info and synchronously return the server's preference. I think there might be interesting cases there, but it still doesn't matter for me, so I've stuck with the simpler change here to just support this fallback behaviour.

@tniessen
Copy link
Member

My case is a little more specific: my server is a proxy that supports network debugging, and I specifically want to accept all protocols so that I can capture and debug unexpected traffic, rather than rejecting it at the door. I think this applies more widely though.

@pimterry I agree with @bnoordhuis that #45075 does not fully address your specific case. If the client sends [a, b, c], #45075 would fix the protocol to a, regardless of whether the remote end supports a. If the remote end only supports [b, c, d, e], the remote end will then reject it at the door, as you put it.

@pimterry
Copy link
Member Author

I agree with bnoordhuis that #45075 does not fully address your specific case. If the client sends [a, b, c], #45075 would fix the protocol to a, regardless of whether the remote end supports a. If the remote end only supports [b, c, d, e], the remote end will then reject it at the door, as you put it.

In my specific case, my app (httptoolkit.com - HTTP-focused but not exclusive) is a general-purpose intercepting proxy, not a reverse proxy. I usually don't know who the remote end will be until after TLS negotiation has completed (e.g. in the HTTPS case I want to look at the Host header). There's typically no way I can ask the remote end which protocols will work before the handshake completes, and in the cases where I can it would be asynchronous, which means it's not implementable here anyway.

In this example, negotiating a and failing, but being able to see and debug the resulting connection and initial incoming data in the proxy (and then reconfigure the proxy accordingly to handle this better in future) is the best possible result.

All I really want to do is accept all TLS connections, and also use ALPN at the same time. Before that was sort-of possible, ish, but with Node 19 it will become effectively impossible.

For more context: in my case what actually happens after the connection is a fun game with many options, but some examples:

  • The proxy could be configured to accept but then intentionally timeout all incoming connections for any protocol sent to a specific server name to test connection issues. This works for any protocol. With ALPN, accepting the first ALPN protocol offered and then just ignoring the resulting stream does exactly this. "Wait X then connection reset" works similarly.
  • The proxy could be configured to forward all traffic raw, for all protocols, using the hostname from SNI - in this case I do have an upstream server, but it's dynamic so there will never be a way to check protocol support synchronously. Accepting the first protocol, logging the data, and attempting to negotiate the same upstream is the best synchronous option, and will usually work fine (because most clients - excluding web browsers & HTTP - that request a specific protocol will request it from a server that supports that protocol).
  • For recognized protocols (e.g. HTTP) the proxy can parse the content fully once TLS is negotiated, and then handle the requests with more complex rules (maybe forward according to the Host header, maybe inject an immediate response, maybe close the connection or timeout, etc).
  • For completely unrecognized protocols with no configuration or SNI, I still want to receive the initial data, log that, and prompt the user with 'We got a request for protocol X containing "ABC..." - what do you want to do with this in future?'

There's a lot of other options here too, but much of this becomes impossible if you can never ever use ALPN without rejecting unrecognized protocols.

@pimterry
Copy link
Member Author

pimterry commented Oct 20, 2022

Separately, details of my own use case aside, this discussion is making me realize that there really could be interesting improvements available in some of these cases if there was an ALPN callback instead, despite the complexity.

That would especially help in the traditional reverse proxy case. You can't check upstream dynamically, but you could often have that upstream protocol compatibility data available synchronously, either by querying upstream at intervals, by caching results from incompatible ALPN negotiations, or by allowing protocols to be configured per-hostname manually. Adding a callback would support this, and could provide a neat extension point for all sorts of other advanced connection negotiation logic:

const reverseProxy = tls.createServer({
  // ...
  ALPNCallback: ({ clientALPN, servername }) => {
    return firstMatch(clientALPN, serverProtocols[servername]);
    // Return string -> send that result. Return undefined -> fatal alert.
  }
});

Is there any appetite for that? This would also solve my issue, and potentially support these use cases, so I'm happy to explore a PR for this if so.

The main questions I see are:

  • Can we do that without creating a performance hit when it's not used? I'm not familiar enough with the options here to know how to best handle that.
  • What are the arguments given to the callback? I think the server name & the client's ALPN protocols are essential. I can imagine cases where the incoming socket local & remote addresses might be useful too. I'm not sure if there's anything else or what the best/most performant way to expose that info to the callback would be.

@bnoordhuis
Copy link
Member

The general idea seems okay to me but I have two questions about how it interacts with other features:

  1. how does it interact with the existing SNICallback? That runs at the same time.

  2. how will it interact with ECH (Encrypted ClientHello - basically: encrypted SNI)? The RFC draft is making progress and I expect to see it finalized sometime next year.

Can we do that without creating a performance hit when it's not used?

Yes. Reading JS properties from w->object() is slow but a w->have_alpn_callback property on the C++ TLSWrap object that acts as a guard is fast.

It's okay to read JS properties as long as you don't regress performance of the common case, hence the need for a guard.

What are the arguments given to the callback?

Well... in a perfect world node would have a generalized way of making fields from the ClientHello available but we don't live in that world. ALPN and SNI are hopefully good enough for a first pass but with SNI in particular ECH is relevant.

@pimterry
Copy link
Member Author

how does it interact with the existing SNICallback? That runs at the same time.

It looks like ALPN always runs after the cert CB, so SNICallback will always have completed already. That's confirmed in the docs at https://www.openssl.org/docs/man3.0/man3/SSL_select_next_proto.html#NOTES:

The ALPN callback is executed after the servername callback; as that servername callback may update the SSL_CTX, and subsequently, the ALPN callback.


how will it interact with ECH (Encrypted ClientHello - basically: encrypted SNI)? The RFC draft is making progress and I expect to see it finalized sometime next year.

In my understanding, ECH wouldn't affect the ALPNCallback API described here. If you're handling an incoming TLS client hello with an ECH inner hello, you have to decrypt the inner hello before doing anything, and then you'd just call ALPNCallback & SNICallback with the decrypted inner parameters, just as we do now with non-encrypted parameters.

Effectively there's extra steps internally before & after, but AFAICT the core ALPN/SNI/etc logic remains the same once you've successfully unwrapped the ECH.

The SNICallback API might need to change to support more complex logic for ECH retries - i.e. what should it do if ECH decryption fails, since there'll almost always be an outer SNI extension available regardless, and a callback to set the cert & other params will still be useful in this case. That could reasonably be a separate ECHFailureCallback or something though. SNICallback could also change to expose both the inner & outer servernames to the callback, but I'd be surprised if that was particularly useful.

None of that affects ALPN either way - if ECH decryption fails, there's no point doing ALPN. All you can do is return server hello with a cert for the outer SNI, and that will either trigger a handshake retry (either with different ECH keys or with ECH disabled) or it'll fail the handshake entirely.

That's just the theory ofc - it's hard to say what the actual OpenSSL APIs for this will look like in practice. I don't think any work has started on that yet, and the protocol is technically still just a draft that could change anyway.

The rest of those details make sense, thanks. I'll take a look at building out a new PR shortly for ALPNCallback, either today or early next week.

@pimterry
Copy link
Member Author

New PR opened with the callback approach: #45190

@github-actions
Copy link
Contributor

There has been no activity on this feature request for 5 months and it is unlikely to be implemented. It will be closed 6 months after the last non-automated comment.

For more information on how the project manages feature requests, please consult the feature request management document.

@github-actions github-actions bot added the stale label Apr 25, 2023
@pimterry
Copy link
Member Author

This is actually already fixed - there's a good solution in #45190 that AFAIK is good to go, and it's just been stuck there waiting to merge for a few months. There are some questions in the PR description at the top of that PR to confirm before merging, but I think 'all of that is fine' would be a reasonable answer to those.

It would be great to see this implemented, since this resolves some breakage caused by changes in Node 19, which made certain ALPN use cases impossible with Node. I'm currently avoiding upgrading my application beyond Node 18 because of this, and obviously I can't do that forever.

Is there anything we can do to wrap this up?

(For context: I've actually opened two independent PRs to fix this - #45075 and #45190 - but the latter is the more flexible and complete solution with no real downsides, so imo we should merge that and close the other)

@github-actions github-actions bot removed the stale label Apr 26, 2023
@pimterry
Copy link
Member Author

Fixed by #45190

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request Issues that request new features to be added to Node.js. tls Issues and PRs related to the tls subsystem.
Projects
Status: Pending Triage
Development

Successfully merging a pull request may close this issue.

3 participants