Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Accept certificates with invalid serial number encodings #232

Open
benjyw opened this issue May 7, 2021 · 41 comments
Open

Accept certificates with invalid serial number encodings #232

benjyw opened this issue May 7, 2021 · 41 comments

Comments

@benjyw
Copy link

benjyw commented May 7, 2021

We've noticed an issue which we think is due to webpki failing to parse a cert that was rewritten by an SSL inspection system.

I wrote up a PR that exposes the issue, and provides more details: #231

See pantsbuild/pants#12000 for context. Note that none of the participants on that issue, including yours truly, are experts at TLS (or Rust). So this is about as far as we were able to get.

Thanks!

@benjyw
Copy link
Author

benjyw commented May 7, 2021

cc @cjntaylor @tdyas

@ctz
Copy link
Contributor

ctz commented May 7, 2021

I think what is happening here is: the serial number is incorrectly encoded, and so is negative. This is illegal -- RFC5280:

   The serial number MUST be a positive integer assigned by the CA to
   each certificate.  It MUST be unique for each certificate issued by a
   given CA (i.e., the issuer name and serial number identify a unique
   certificate).  CAs MUST force the serialNumber to be a non-negative
   integer.

In the case of the test given here, the serial number is -0x3FC1DAC87904ACD759C456F2816694AC. It's worth noting that if this certificate were issued by a real CA, it wouldn't be allowed by the baseline requirements and would be a misissuance:

CAs SHALL generate non‐sequential Certificate serial numbers greater than zero (0)

Though naturally a private CA is not subject to these requirements.

@ctz
Copy link
Contributor

ctz commented May 7, 2021

@benjyw
Copy link
Author

benjyw commented May 7, 2021

Thanks for the quick response! Makes total sense.

Are you amenable to the argument from the golang case that "This buggy software, sadly, appears to be common enough that we should let these errors pass"? :)

@briansmith
Copy link
Owner

@cjntaylor @benjyw The most correct solution is to fix the broken certificate, assuming the serial number is malformed as was explained to me. Previously people have reported bugs against the webpki project because webpki's parser is really strict and we've been successful in convincing people, e.g. the AWS certificate authority team, to fix the bug on their end. Can you say which buggy software created the malformed certificate?

@briansmith
Copy link
Owner

According to the PR, the system is Focepoint NGFW Enterprise Firewall (https://www.forcepoint.com/product/ngfw-next-generation-firewall). Let's find a contact with them.

@briansmith briansmith changed the title Cert rewritten by SSL inspection system is not parseable Cert rewritten by Forcepoint NGFW Enterprise Firewall is rejected May 7, 2021
@benjyw
Copy link
Author

benjyw commented May 7, 2021 via email

@sayrer
Copy link

sayrer commented May 7, 2021

My 2 cents: tell users that encounter this problem run with system TLS on their own fork. It will be less uniform across platforms, less reproducible, and subject to a variety of issues that Rustls/webpki doesn't have. This is the line to change:

https://github.com/pantsbuild/pants/blob/af5d5a297ab4f513dd98850d7fab229021d847aa/src/rust/engine/Cargo.toml#L126

You'll get whatever version of OpenSSL is on a Linux system, for instance.

@cjntaylor
Copy link

I agree that we should definitely report this bug to them. The problem is that this is enterprise software. So even if they agree that this needs fixing, we'd be waiting on their release, which could take months, followed by the enterprise IT department agreeing to and then executing on the upgrade, which could take months or even years longer.

On Fri, May 7, 2021 at 1:03 PM Brian Smith @.***> wrote: According to the PR, the system is Focepoint NGFW Enterprise Firewall ( https://www.forcepoint.com/product/ngfw-next-generation-firewall). Let's find a contact with them. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#232 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAD5F7HF5PPEB2IV37XMASLTMRBR7ANCNFSM44KT2EBQ .

@benjyw beat me to it. Forcepoint is managed at an enterprise scale. It's not a tool I *choose* to use - it's forced upon me anytime I make ANY ssl connection inside my company's intranet. Optimistically, if you can get Forcepoint to agree to fix this issue (if they even agree) in 6 months, and I can get a whole different department I'm not even a part of with a whole different release cycle to redeploy what amounts to a critical piece of security infrastructure for more than 8000 staff in 6 months (both of those time periods are incredibly generous, especially the latter with all of the testing and verification that would absolutely have to happen), we're looking at a year, at best, before I could even realistically hope to have this fixed.

In the meantime, I'm dead in the water. I can't do anything that requires using webpki inside our intranet because EVERY SSL connection is rewritten this way. There is no workaround. No mitigation. No alternative. Thats the point. This is security software - it's used to do stateful inspection of SSL traffic to prevent malicious transmission - there can't be any alternative or it won't do what it's intended to do. And believe me, even though I'm not involved in it's management, I'm in touch enough with the folks involved to know that we've had known malicious actors in the network - that's a big part of why it's in place.

I guess I just don't understand why there can't be any flexibility here. I perfectly understand that the RFC makes the negative serial number invalid. I also know that Forcepoint isn't the only tool that generates negative serial numbers, it's incredibly common, especially amongst Microsoft SSL tooling. There is a point at which following the RFC is detrimental to the practical usage of a tool. As it stands, I'll have to put out a warning to our entire staff not to use webpki for any purpose (The Johns Hopkins Applied Physics Laboratory, feel free to look us up). That's what you're choosing here; the grand majority of the remaining ecosystem of SSL systems (based on OpenSSL) are permissive to this RFC violation so you're choosing to stand alone on this issue.

All this said, I don't disagree that we should fix this the right way. All I'm asking for is a bit of consideration. An option? A flag? Some way to run in "slightly less strict mode"? At least during the transition period while we wait what will be years for the right fix to be put in place.

@cjntaylor
Copy link

My 2 cents: tell users that encounter this problem run with system TLS on their own fork. It will be less uniform across platforms, less reproducible, and subject to a variety of issues that Rustls/webpki doesn't have. This is the line to change:

https://github.com/pantsbuild/pants/blob/af5d5a297ab4f513dd98850d7fab229021d847aa/src/rust/engine/Cargo.toml#L126

You'll get whatever version of OpenSSL is on a Linux system, for instance.

"tell users that encounter this problem to build, manage and distribute an entire ecosystem of the tool they want to use to fix a non-obvious mostly invented problem of a downstream package that is unique to a second-order transitive dependency of the tool" FTFY

Telling users to build from scratch and manage a complex ecosystem of packages to fix your own issues isn't a solution. Way to kick the can. Thanks. I hate it.

@sayrer
Copy link

sayrer commented May 7, 2021

Hey, I think our messages overlapped, and I didn't realize you were so frustrated.

That said, all Cargo packages are built from scratch, and I think you would just need to maintain a patch file (either in Cargo or external to it) to remove the "rustls-tls" feature from reqwest. That approach has the downsides I mentioned, but I believe it would work.

@briansmith
Copy link
Owner

@cjntaylor In case it isn't clear, I didn't close this issue. It's still open. So, implementing a workaround hasn't been rejected.

@tdyas
Copy link

tdyas commented May 7, 2021

Looking at the golang PR golang/go#8265, they note that RFC 5280 states in Section 4.1.2.2:

The serial number MUST be a positive integer assigned by the CA to
each certificate. It MUST be unique for each certificate issued by a
given CA (i.e., the issuer name and serial number identify a unique
certificate). CAs MUST force the serialNumber to be a non-negative
integer.

Given the uniqueness requirements above, serial numbers can be
expected to contain long integers. Certificate users MUST be able to
handle serialNumber values up to 20 octets. Conforming CAs MUST NOT
use serialNumber values longer than 20 octets.

Note: Non-conforming CAs may issue certificates with serial numbers
that are negative or zero. Certificate users SHOULD be prepared to
gracefully handle such certificates.

The last paragraph suggests providing a way to deal with the gracefully would be a good option to have.

To that point, would webpki be willing to either:

  1. Expose an opt-in config option that would allow callers to relax the strictness requirement around positive serial numbers?
  2. Is there a way to delegate that decision to the caller via a trait defined in webpki?

@cjntaylor
Copy link

Hey, I think our messages overlapped, and I didn't realize you were so frustrated.

That said, all Cargo packages are built from scratch, and I think you would just need to maintain a patch file (either in Cargo or external to it) to remove the "rustls-tls" feature from reqwest. That approach has the downsides I mentioned, but I believe it would work.

Apologies, they did a bit, and I'll rein it in a bit. It's a little more complicated than that with pants given how it's built and maintained, and would put a great deal of burden on anyone wanting to use the fork. Yes, that would work - it's just the principle of the thing. Of course I can fix this problem by removing the dependency on webpki. But isn't that "curing the disease by killing the patient"?

@tdyas
Copy link

tdyas commented May 7, 2021

As relevant background, Pants uses Rust internally by embedding a shared object file into a Python wheel. This Python wheel is then installed into a Python virtual environment that is then used to run Pants as the build tool for a user's repository. Given this distribution model, it is a hard ask to have users build custom versions of Pants.

@tdyas
Copy link

tdyas commented May 7, 2021

That said, Pants has had to fork certain crates in the past and could do so here, but I would rather find an alternative before taking on that maintenance burden.

@briansmith
Copy link
Owner

We don't need more advocacy discussion for or against here.

@briansmith
Copy link
Owner

I don't think it makes sense to add a "strict mode" vs "non-strict" mode flag for this. It's more work, more code, but the strict mode doesn't seem like more security, so it seems like a net loss. If the default were strict mode then every user that would run into such issues would have to do work to cope with it. So I think the main choice in the short term is to be lenient always or strict always. I am open to the possibility of matching what Chromium and Firefox do, since generally when we compromise on strictness that's the upper limit on our compromise.

I am working on some testing infrastructure that will make it easier to accept a PR that would implement a workaround for this. That will need to be done first.

@benjyw
Copy link
Author

benjyw commented May 7, 2021

@briansmith Thanks! Appreciate the effort and the flexibility.

@ctz
Copy link
Contributor

ctz commented May 8, 2021

I think what is happening here is: the serial number is incorrectly encoded

Just to confirm this: I cleared the sign bit in the serial number and webpki was able to parse it. This is likely the only problem with this cert rather than the tip of an unpleasant iceberg.

@benjyw
Copy link
Author

benjyw commented May 11, 2021

Just to confirm this: I cleared the sign bit in the serial number and webpki was able to parse it. This is likely the only problem with this cert rather than the tip of an unpleasant iceberg.

That's a relief!

@benjyw
Copy link
Author

benjyw commented May 21, 2021

Hi, just checking in to see what it would take to get this resolved, and how I/we might be able to help. There was mention of needing some testing infrastructure , not sure what the status is on that. Thanks!

@benjyw
Copy link
Author

benjyw commented Jun 7, 2021

Hi, gentle reminder that we'd love to see this resolved and are happy to help make it happen. Thanks!

@briansmith
Copy link
Owner

An ASN.1 integer that has its highest bit set is a negative value. Serial numbers cannot be negative. Serial numbers are used at least for revocation (CRL, OCSP) and duplicate detection and maybe other things. We don't implement revocation checks yet but that is planned. With that in mind, we need to have a clear understanding of what to do when the serial number appears to be negative. Here's my proposal:

  • Do not consider negative serial numbers to be a thing. Instead, assume that if a serial number is negative according to ASN.1 encoding rules, that really the producer intended it to be positive, but forgot a leading 0x00. That is, we would only deal with positive serial numbers. We would continue to reject the serial number 0. Our parser for serial numbers would consider the leading 0x00 to be optional if and only if the next byte is 0x80 or higher. If the leading byte is zero and the next byte is not 0x80 or higher then we would reject the certificate as we do today.

  • If/when in the future we expose an accessor for serial numbers, we should expose the serial number as a type SerialNumber. The SerialNumber type reference/contain the serial number with any optional leading 0x00 already stripped. Then PartialEq and Eq can be derived.

  • If/when we implement revocation mechanisms or any other serial number parsing, we'll need to use this special serial number parser and do equality tests using the PartialEq implementation of SerialNumber. In particular, it is important that we consider for revocation checking that one serial number encoded with the leading 0x00 and the other encoded without the leading 0x00 to be equal so that one can use an OCSP/CRL response with correctly-encoded serial numbers to revoke an incorrectly-encoded certificate.

  • RFC 5280 says that the maximum length of a serial number is 20 octets. Different people have interpreted this to include or exclude leading zeros. We would (continue to) interpret this as disregarding any leading zero.

The above proposal assumes that its serial number comparison rules are sufficient to make it impossible for an attacker to confuse us, so that there would be no security value in a "strict" mode that rejects serial numbers that are, according to standard ASN.1 encoding rules, negative.

In order to implement this, we would start by copy-pasting ring's io::der::positive_integer (and nonnegative_integer) and its tests into webpki in an initial PR. Then we'd follow that PR with one that renames positive_integer to serial_number and implements the rules above, modifying the tests accordingly. I will work to get the additional testing framework into webpki ahead of time so that the tests for the latter PR would also include tests of validation of complete certificate chains containing certs with negative (acording to ASN.1 rules) and zero serial numbers.

WDYT?

@est31
Copy link

est31 commented Jun 16, 2021

@briansmith I've read your proposal and I generally like the idea of supporting negative serials, even though the spec forbids them. But I'm a bit critical of the proposal to strip the sign bit.

How does your proposal try to issue OCSP requests? Will it now issue two OCSP requests, one with a stripped leading 0 and one with a leading 0 added? If the webpki internal representation of the serial has all leading 0s stripped, does this mean that webpki will have to always issue two OCSP requests, even when the certificate has originally been encoded validly, because webpki can't distinguish between validly and invalidly encoded certificates?

If, as hypothesized in your proposal, webpki ever gains an accessor and exposes the serial with the sign bit stripped, tools that use this data might create inconsistent output. E.g. think of a CA that uses two different systems for issuance and CRL creation of certificates. The issuance side has the bug that it creates negative serial certificates. This appears to happen in the real world. The CRL creation side then uses webpki to parse an incoming list of certificates the module should integrate into the CRL. Then it uses that parsed result to integrate the serial into the CRL. The CRL would now contain a different serial than originally intended. I think your proposal tries to handle this case, but it can also partake in creating this case.

Personally I feel that webpki should preserve the raw bytes used to encode the serial, including whether it contains a leading zero or not. The case where CAs output different signs for their CRLs and for their certificates should be rare to non existent, and especially I think one can't blame webpki if something goes wrong if it opaquely handles the serial. If one wants to be safe, one can have OCSP handling, CRL checking etc all fail if it's enabled and a negative serial is encountered.

@briansmith
Copy link
Owner

@est31 wrote:

But I'm a bit critical of the proposal to strip the sign bit.

I'm not proposing the strip the sign bit. Instead, I'm proposing to, effectively, insert the missing 0x00 and continue on as if it were there.

For example, if the value is encoded 0xff, then that is a negative value according to ASN.1 syntax. But, instead, for serial numbers only, my proposal is to pretend it was [0x00, 0xff], which would normally parse to a serial number value of [0xff] (255).

@est31
Copy link

est31 commented Jun 16, 2021

@briansmith yeah I understand, and I read up extra on how integer DER/BER encoding works (two's complement) before I wrote the comment you replied to. My argument still holds if you substitute "stripping the sign bit" with "adding a leading 0x00", this was mainly just a figure of speech.

I think webpki shouln't try to improve the data here, but instead just either fully support negative serials, or fully reject them. Not try to do a smart trick during parsing.

For transparency: just recently I got a bug repoort in rcgen where it couldn't (validly) sign some certs... as it turns out, rcgen's internal data representation is not powerful enough (yet) to fully preserve the subject name. It turned a printable into a utf8 string, invalidating the subject name: rustls/rcgen#59 . The same issue applies here.

@briansmith
Copy link
Owner

I think webpki shouln't try to improve the data here, but instead just either fully support negative serials, or fully reject them. Not try to do a smart trick during parsing.

If an entry in a CRL (a revoked one) is encoded correctly with the leading zero, but the certificate isn't, or vice versa, then we should consider the certificate to be revoked. If we required the encoding in the CRL to also be a negative number then we'd be potentially forcing the CRL software maker to have to add buggy behavior in order to revoke a certificate created by a buggy serializer, which seems like a non-starter to me, especially because certificates are revoked in emergency situations.

In other words, I think it is a hard requirement that a syntactically-correct CRL should be able to revoke a certificate with a syntactically-incorrect serial number, for any form of serial number we support.

@briansmith
Copy link
Owner

For transparency: just recently I got a bug repoort in rcgen where it couldn't (validly) sign some certs... as it turns out, rcgen's internal data representation is not powerful enough (yet) to fully preserve the subject name. It turned a printable into a utf8 string, invalidating the subject name: est31/rcgen#59 . The same issue applies here.

The rcgen issue is quite different, I think, for reasons that are specific to how name encodings need to be handled in certificates.

@briansmith
Copy link
Owner

How does your proposal try to issue OCSP requests? Will it now issue two OCSP requests, one with a stripped leading 0 and one with a leading 0 added?

OK, I see now why you are concerned about my proposal. Yes, I guess we'd have to request both if we were to ever implement OCSP fetching. I expect that we'll likely only have built-in support for OCSP stapling, not fetching, though.

@est31
Copy link

est31 commented Jun 17, 2021

OK, I see now why you are concerned about my proposal.

Oh right yeah we've talked past each other 🤣

Indeed it would work with OCSP stapling.

crlite might run into problems tho, as it was only designed to support serials from a pool of known ones, and it might regard a serial as revoked if it's not known to the bloom filter cascade. Querying the crlite structure twice might thus create false negatives.

Btw, openssl shows the serial as negative:

$ openssl x509 -in rewritten_cert.der -inform der -text
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
             (Negative)3f:c1:da:c8:79:04:ac:d7:59:c4:56:f2:81:66:94:ac
        Signature Algorithm: sha256WithRSAEncryption
        Issuer: C = US, O = Johns Hopkins University Applied Physics Laboratory, CN = JHUAPL SSL Inspection

go does the same (using certinfo to print it):

HiCertificate:
    Data:
        Version: 3 (0x2)
        Serial Number: -84747914476526358753480801622063223980 (-0x3fc1dac87904acd759c456f2816694ac)
    Signature Algorithm: SHA256-RSA
        Issuer: C=US,O=Johns Hopkins University Applied Physics Laboratory,CN=JHUAPL SSL Inspection

@briansmith
Copy link
Owner

Yes, great points. We might have to rethink this when we actually do revocation checking then.

@stuhood
Copy link

stuhood commented Jun 28, 2021

Hey folks: thanks a lot for the thoughtful design work!

If someone is able to mentor this ticket and add some instructions to help implement it, we'd be able to get somebody to tackle it... without that though I think we won't know where to start.

@briansmith
Copy link
Owner

Yes, great points. We might have to rethink this when we actually do revocation checking then.

OK, I think I'm ready to make a decision. Consider that nobody really wants to do OCSP fetching, maintaining and using CRLSet/crlite/etc. infrastructure isn't practical for many people, and we don't have any revocation checking implemented now. Also Rustls doesn't provide an interface for customizing certificate verification that could be used to provide a lenient/strict switch that would be useful for most Rustls users.

The most likely next step for revocation checking would be OCSP stapling, and that's the revocation checking mechanism I would add first. We could add support for OCSP stapling using the lenient approach I suggested above. We could defer the decision of what to do in the conflict between the mis-encoded certificates and revocation checking for quite a while. In the meantime products that mis-encode certificates can hopefully be fixed and have their fixed versions deployed, and/or we can provide better configuration interfaces in users of webpki (Rustls, etc.). When we get to the point when the lenient approach would be practically troublesome, we can decide what to do. That might mean releasing a new major version of webpki that stops being lenient or provides an option to be strict or lenient, or something else.

So, for now, let's go with the approach I outlined above.

@est31
Copy link

est31 commented Jul 13, 2021

@briansmith any update on the attempts to contact the vendor of that firewall?

I think this issue is a blocker for rustup adoption of rustls. The maintainer asked me to use the builtin certificate store to support precisely such firewalls. Right now rustls can be turned on optionally, but is not enabled by default.

@briansmith briansmith changed the title Cert rewritten by Forcepoint NGFW Enterprise Firewall is rejected Accept certificates with invalid serial number encodings Dec 10, 2021
@briansmith
Copy link
Owner

I've decided to morph this issue from being specific to Forcepoint NGFW to being about accepting certificates with serial numbers that are missing the leading zero that would indicate that the serial number is non-negative.

Regarding the above discussion about the fact that it would be hard for every possible revocation mechanism to properly handle the ambiguity that arises from accepting such certificates: Rather than storing the certificate serial number as a slice of bytes with no other information, instead we can model it like so:

enum SerialNumber<'a> {
    // The 
    SyntacticallyValid((&'a [u8]),

    /// The certificate serial number is syntactically a negative number, which is problematic
    /// and ambiguous but unfortunately common.
    ///
    /// More details about how revocation logic should deal with this, based on the 
    /// discussion above.
    SyntacticallyValidInvalid(&'a [u8])
}

Then revocation checking logic, when added, would accept a SerialNumber as input. If the revocation checking mechanism is incompatible with the ambiguity described above, then it would probably have to reject the certificate. If the revocation checking mechanism isn't incompatible with the ambiguity then it would handle it. Probably most certificates with invalid encoding of the serial number are not ever going to have any real revocation checking done on them anyway.

WDYT?

@briansmith
Copy link
Owner

Further, whether to accept invalid serial numbers at all is probably a function of the trust anchor. So we should probably have per-trust-anchor configuration for whether to accept a bad serial number for that trust anchor. Then we'd accept the invalid serial number during parsing, but then path building would only succeed if a chain to a trust anchor with that "allow bad serial numbers" bit is found.

@est31
Copy link

est31 commented Dec 11, 2021

@briansmith your suggestion sounds good. A per trust anchor setting is probably the best.

@benjyw
Copy link
Author

benjyw commented Dec 12, 2021

Nothing sensible to add, just wanted to say thanks for your ongoing work on this issue!

@benjyw
Copy link
Author

benjyw commented Mar 28, 2022

Hi folks, I just wanted to check in on this issue. As @stuhood mentioned above, given some guidance we could help tackle this if that would help resolve it sooner.

@benjyw
Copy link
Author

benjyw commented Dec 30, 2022

Hi folks, checking in on this issue again. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants