Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] use llhttp as default HTTP parser #9033

Closed
wants to merge 17 commits into from

Conversation

derekargueta
Copy link
Member

Description: Uses llhttp to parse HTTP/1.1 traffic. A compile-time switch is provided to use the legacy Node http-parser. Preprocessor macros were selected due to global naming conflicts between the two libraries. The alternative to make it a runtime switch would require #includeing the libraries within separate namespaces (I can backtrack to this option if it's more desirable).
TODO

  • resolve lenient_http_headers issue as mentioned on the ticket
  • fix BadRequestNoStream test to not open a new stream

There are other places that still use the http-parser, notably the http_inspector network filter and the url parsing utilities.

Risk Level: medium
Testing: existing
Docs Changes:
Release Notes:
Fixes #5155
Deprecated: http-parser

Signed-off-by: Derek Argueta <dereka@pinterest.com>
Signed-off-by: Derek Argueta <dereka@pinterest.com>
Signed-off-by: Derek Argueta <dereka@pinterest.com>
Signed-off-by: Derek Argueta <dereka@pinterest.com>
Signed-off-by: Derek Argueta <dereka@pinterest.com>
Signed-off-by: Derek Argueta <dereka@pinterest.com>
@lizan lizan self-requested a review November 14, 2019 20:22
@indutny
Copy link

indutny commented Nov 14, 2019

Fantastic work, @derekargueta ! I'm curious if this will bring any noticeable performance improvements to envoyproxy!

sendProtocolError();
throw CodecProtocolException("http/1.1 protocol error: " +
std::string(http_errno_name(HTTP_PARSER_ERRNO(&parser_))));
std::string(parser_errno_name(parser_get_errno(&parser_))));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a new opportunity for detailed errors here. One could potentially use llhttp_get_error_reason to get the detailed reason for the error.

@@ -2,6 +2,7 @@ licenses(["notice"]) # Apache 2

load(
"//bazel:envoy_build_system.bzl",
"envoy_cc_binary",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not used?

ENVOY_CONN_LOG(trace, "message complete", connection_);
if (handling_upgrade_) {
// If this is an upgrade request, swallow the onMessageComplete. The
// upgrade payload will be treated as stream body.
ASSERT(!deferred_end_stream_headers_);
ENVOY_CONN_LOG(trace, "Pausing parser due to upgrade.", connection_);
#ifdef ENVOY_ENABLE_LEGACY_HTTP_PARSER
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we always return HPE_PAUSED and let the calling site call http_parser_pause? i.e. here: https://github.com/envoyproxy/envoy/pull/9033/files#diff-2ba39613a1b78b994ec31ddd03cf42f5R342

same for other places.

nread = llhttp_get_error_pos(parser) - slice;
if (err == HPE_PAUSED_UPGRADE) {
err = HPE_OK;
llhttp_resume_after_upgrade(parser);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't always handle upgrade today? don't we need to check handling_upgrade_ here before resume?

@htuch
Copy link
Member

htuch commented Nov 14, 2019

@envoyproxy/security-team for review as well.

@mattklein123
Copy link
Member

@derekargueta awesome stuff. I put this comment in the issue, but is it possible to compile both in for a period of time so we can feature flag which parser to use?

/wait-any

@alyssawilk
Copy link
Contributor

+1 to what Matt said. I'm super excited for the switch and it'd be even better if we can runtime switch this for a release or two!

@indutny
Copy link

indutny commented Nov 18, 2019

FWIW, I've created a PR to introduce lenient parsing to llhttp: nodejs/llhttp#33 .

@derekargueta
Copy link
Member Author

derekargueta commented Nov 18, 2019

@mattklein123 @alyssawilk yeah I started working on a runtime switch. The catch is that llhttp.h and http_parser.h can't be in the same compilation unit due to conflicting enum definitions (ODR \o/) so the solution I have working so far uses the pimpl pattern to contain those libraries in source files. I'm a bit concerned about introducing pointer-indirection in the hot path, but am planning on writing a few benchmark tests to justify the change. Doesn't seem we have codec benchmark tests yet, and it could also give perf feedback on the difference between the two libraries. If I'm going overboard or there's a better pattern I'd be happy to hear it 🙂

@lizan
Copy link
Member

lizan commented Nov 18, 2019

@derekargueta from a quick glance it doesn't seems have conflicting enum between llhttp and http_parser, seems everything in llhttp is prefixed with llhttp_. I don't think pimpl solves the ODR because you still link them together at the end.

@derekargueta
Copy link
Member Author

@lizan sorry I misspoke, it's not the enum's themselves (which you are correct, llhttp is prefixed with llhttp_) but the enum values since they are plain C enums and not C++ class-enums.

When the 2 are included into the same context (directly or transitively):

...
In file included from source/common/http/http1/comparison.cc:7:
external/com_github_nodejs_http_parser/http_parser.h:210:19: error: redefinition of enumerator 'HTTP_REBIND'
  HTTP_METHOD_MAP(XX)
                  ^
external/com_github_nodejs_llhttp/include/llhttp.h:124:3: note: previous definition is here
  HTTP_REBIND = 17,
  ^
In file included from source/common/http/http1/comparison.cc:7:
external/com_github_nodejs_http_parser/http_parser.h:210:19: error: redefinition of enumerator 'HTTP_UNBIND'
  HTTP_METHOD_MAP(XX)
                  ^
external/com_github_nodejs_llhttp/include/llhttp.h:125:3: note: previous definition is here
  HTTP_UNBIND = 18,
  ^

@jmarantz
Copy link
Contributor

jmarantz commented Nov 18, 2019

Can we change llhttp to prefix it's enum values? I am not sure that is necessary though.

We should have an abstraction layer with its own enum to enable the switch, right?

@lizan
Copy link
Member

lizan commented Nov 19, 2019

@derekargueta thanks that makes sense, yeah then you need two classes implement same interface in different compilation unit to make it work.

@stale
Copy link

stale bot commented Nov 26, 2019

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Nov 26, 2019
@alyssawilk alyssawilk added no stalebot Disables stalebot from closing an issue and removed stale stalebot believes this issue/PR has not been touched recently labels Nov 26, 2019
Signed-off-by: Derek Argueta <dereka@pinterest.com>
Signed-off-by: Derek Argueta <dereka@pinterest.com>
Signed-off-by: Derek Argueta <dereka@pinterest.com>
Signed-off-by: Derek Argueta <dereka@pinterest.com>
@derekargueta
Copy link
Member Author

Still very much a WIP, not yet ready for full review but high-level comments welcome. Wrestling with getting the static-casting right from void* to ParserCallbacks*, then likely debugging a few tests, and cleaning up misc comments & cruft. Also still massaging the abstractions a bit.

This introduces quite a few abstractions with the aim of completely decoupling the HTTP parser from the codec/connectionimpl which is necessary for the 2 libraries to coexist. Main components are

  • ParserCallbacks, an interface the {Server|Client}ConnectionImpl classes implement to hook into the parser callbacks
  • ParserFactory for constructing the parser
  • Parser/LlHttpParserImpl/LegacyHttpParserImpl which is where the PImpl pattern mentioned earlier is being used to encapsulate the libraries in separate compilation units.

Also keeping an eye on #8667 which looks like will have conflicts with this PR

Signed-off-by: Derek Argueta <dereka@pinterest.com>
Signed-off-by: Derek Argueta <dereka@pinterest.com>
Signed-off-by: Derek Argueta <dereka@pinterest.com>
Signed-off-by: Derek Argueta <dereka@pinterest.com>
Copy link
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huzzah for progress!
some high level comments while you're doing clean up :-)

@@ -12,6 +12,7 @@ Version history
* health check: gRPC health checker sets the gRPC deadline to the configured timeout duration.
* http: added the ability to sanitize headers nominated by the Connection header. This new behavior is guarded by envoy.reloadable_features.connection_header_sanitization which defaults to true.
* http: support :ref:`auto_host_rewrite_header<envoy_api_field_config.filter.http.dynamic_forward_proxy.v2alpha.PerRouteConfig.auto_host_rewrite_header>` in the dynamic forward proxy.
* http: use llhttp as default http parser
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drive by comment as you clean things up, I wouldn't default this on until we've had some fuzzing aimed at it.
I'd be inclined to land default off, make sure we have comparable fuzzing coverage, encourage folks to smoke test it on, fuzz for a few weeks, and then switch the default.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will flip this off-by-default for now

@@ -314,47 +314,13 @@ void RequestStreamEncoderImpl::encodeHeaders(const HeaderMap& headers, bool end_
StreamEncoderImpl::encodeHeaders(headers, end_stream);
}

http_parser_settings ConnectionImpl::settings_{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While you're doing your clean-up, WDYD of landing some of the pure refactor as its own PR?
It'd let us land some of this ASAP, make for easier and smaller reviews, etc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure I can break this up a bit (I was admittedly surprised how large the delta was once I got the refactor building)


bool ParserFactory::usesLegacyParser() { return use_legacy_parser_; }

void ParserFactory::useLegacy(bool use_legacy_parser) { use_legacy_parser_ = use_legacy_parser; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as we're able to link both, why not make this based on runtime?
I could imagine finding an issue where things don't work as expected, and switching all new connections over is safer than changing command line flags and doing fleet-wide restarts

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you suggesting just runtime value and not introducing a flag? Or keeping the flag but allow runtime to override? I'm guessing the former since the latter could be confusing to someone seeing the flag enabled but not knowing the runtime value is overriding it, but just want to clarify.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely think it's worth having runtime. Neutral about flags - last I heard from Matt most non-google companies don't use command line flags heavily so if you folks don't need it I'd lean towards sticking with one configuration method for consistency, as you say.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I would probably just do runtime.

if (strict_header_validation_) {
if (!Http::HeaderUtility::headerIsValid(header_value)) {
ENVOY_CONN_LOG(debug, "invalid header value: {}", connection_, header_value);
error_code_ = Http::Code::BadRequest;
sendProtocolError();
throw CodecProtocolException("http/1.1 protocol error: header value contains invalid chars");
}
} else if (header_value.find('\0') != absl::string_view::npos) {
} else if (ParserFactory::usesLegacyParser() &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this an optimization because we're confident the new parser handles this for us? I'd think we wouldn't want to allow \0 for the new version either.
(calling out because if we make this a runtime switch we can't depend on the previously stable value)

sendProtocolError();
throw CodecProtocolException("http/1.1 protocol error: " +
std::string(http_errno_name(HTTP_PARSER_ERRNO(&parser_))));
throw CodecProtocolException("http/1.1 protocol error: " + std::string(parser_->errnoName()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

absl::StrCat would be a good think to use here.

virtual ~Parser() = default;
virtual int execute(const char* slice, int len) PURE;
virtual void resume() PURE;
virtual int pause() PURE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to return ParserStatus here? Or would it be too much work to translate from implementation specific status to the ParserStatus?

@@ -607,12 +569,13 @@ void ServerConnectionImpl::onEncodeComplete() {
void ServerConnectionImpl::handlePath(HeaderMapImpl& headers, unsigned int method) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to make method to be type Method?


// Currently, CONNECT is not supported, however; http_parser_parse_url needs to know about
// Currently, CONNECT is not supported, however; llhttp_parse_url needs to know about
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do both parsers need to know about CONNECT?

current_header_map_.reset();
header_parsing_state_ = HeaderParsingState::Done;

// Returning 2 informs llhttp to not expect a body or further data on this connection.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this code common to both http_parser and llhttp? What would http_parser do with 2?

Comment on lines 189 to 190
strip_prefix = "llhttp-release-v2.0.1",
urls = ["https://github.com/nodejs/llhttp/archive/release/v2.0.1.tar.gz"],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

llhttp 2.0.4 was just released - https://github.com/nodejs/llhttp/releases/tag/v2.0.4

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Signed-off-by: Derek Argueta <deargueta@tesla.com>
@mattklein123 mattklein123 removed the no stalebot Disables stalebot from closing an issue label Apr 22, 2020
@stale
Copy link

stale bot commented Apr 29, 2020

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Apr 29, 2020
Signed-off-by: Derek Argueta <darguetap@gmail.com>
@stale stale bot removed the stale stalebot believes this issue/PR has not been touched recently label May 6, 2020
Signed-off-by: Derek Argueta <darguetap@gmail.com>
@stale
Copy link

stale bot commented May 13, 2020

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label May 13, 2020
@stale
Copy link

stale bot commented May 20, 2020

This pull request has been automatically closed because it has not had activity in the last 14 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot closed this May 20, 2020
@repokitteh-read-only repokitteh-read-only bot added the deps Approval required for changes to Envoy's external dependencies label Oct 28, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
deps Approval required for changes to Envoy's external dependencies stale stalebot believes this issue/PR has not been touched recently waiting
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Switch from http-parser to llhttp
9 participants