
See the Technology Overview for details on the tools.

FAQ

How much does CRLite compress data?

CRLite promises substantial compression of the dataset. In our staging environment, the binary form of all unexpired certificate serial numbers occupies about 16 GB of memory in Redis; the binary form of all enrolled and unexpired certificate serial numbers occupies about 1.7 GB on disk; and the resulting binary Bloom filter cascade compresses to approximately 5 MB.

These artifacts are in the mlbf folder for a given run, available in the published data sets. See "Where can I get the CRLite data that is used to make filters?"

Why is CRLite able to compress so much data?

Bloom filters are probabilistic data structures with a false-positive rate caused by hash collisions. However, if you know the whole universe of data that might be tested against the filter, you can compute all of the false positives and build another layer to resolve them, then repeat until no false positives remain. In practice this converges within 25 to 30 layers, which results in substantial compression.
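To make the cascade concrete, here is a minimal, hypothetical Python sketch. The single-hash layers and sizing are simplified for illustration; the real filters are built with the rust-cascade crate.

import hashlib

class BloomLayer:
    # A single-hash Bloom filter layer, simplified for illustration.
    def __init__(self, items, size, level):
        self.size = size
        self.level = level
        self.bits = bytearray(size)
        for item in items:
            self.bits[self._index(item)] = 1

    def _index(self, item):
        digest = hashlib.sha256(bytes([self.level]) + item).digest()
        return int.from_bytes(digest[:8], "big") % self.size

    def __contains__(self, item):
        return self.bits[self._index(item)] == 1

def build_cascade(revoked, valid):
    # Layer 0 encodes the revoked serials; each later layer encodes the
    # false positives of the layer before it, until none remain.
    layers, include, exclude, level = [], set(revoked), set(valid), 0
    while include:
        layer = BloomLayer(include, max(8 * len(include), 8), level)
        layers.append(layer)
        # Items wrongly matched by this layer go into the next one.
        include, exclude = {x for x in exclude if x in layer}, include
        level += 1
    return layers

def is_revoked(layers, item):
    # The side an item falls on alternates with each layer it passes.
    for depth, layer in enumerate(layers):
        if item not in layer:
            return depth % 2 == 1
    return len(layers) % 2 == 1

Note that querying a serial that was in neither input set can return either answer; this is why a client must first establish, via Certificate Transparency, that the certificate was part of the filter's input.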

Bloom filters have a false-positive rate; how can CRLite be relied upon?

The key innovation for CRLite is that Certificate Transparency (CT) data can be used as a stand-in for "all the certificates in the Web PKI". It's reasonably easy to tell if a certificate is in Certificate Transparency: Was it delivered with a Signed Certificate Timestamp (SCT) from a CT log? Similarly, it's reasonably easy to tell that a certificate was known to a CT log at the time that the CRLite filter was constructed: Was the SCT at least one Maximum Merge Delay older than the CRLite filter?
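As a hedged sketch of that timing check (the 24-hour MMD value and the function shape here are illustrative, not Firefox's actual implementation):

from datetime import timedelta

MMD = timedelta(hours=24)  # a typical CT log Maximum Merge Delay

def covered_by_filter(sct_timestamps, filter_timestamp):
    # True if at least one SCT is old enough that the certificate must
    # have been visible in CT when the filter was built.
    return any(ts + MMD <= filter_timestamp for ts in sct_timestamps)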

The remaining issue is whether the issuer is included (enrolled) in the CRLite filter set; enrollment status is provided as a flag alongside the Firefox Intermediate Preloading data.

How large are the delta updates for CRLite?

They tend to be between 20 kB and 50 kB, in a form we call "stashes". You can use the crlite-status tool to investigate the sizes of recent runs. Similarly, you can use rust-query-crlite to read and evaluate certificates against the filter-plus-stash sets.

You can see example output of the crlite-status tool, showing filter statistics by date, here: https://gist.github.com/jcjones/1fd9f63f93c7b85f87f4ac9b0f134905

How do you pick what CAs are included in CRLite?

All CAs that publish fresh Certificate Revocation Lists (CRLs) at the URLs encoded into their issued certificates are included in CRLite. "Fresh" means that the CRL's signature is valid and that the CRL is not past its NextUpdate time.
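For illustration, those two freshness checks could look like the following in Python, using the cryptography package (version 42 or later for next_update_utc); this is a sketch, not the actual aggregate-crls code.

from datetime import datetime, timezone
from cryptography import x509

def crl_is_fresh(crl_der, issuer_cert):
    crl = x509.load_der_x509_crl(crl_der)
    # 1. The CRL's signature must verify under the issuer's public key.
    if not crl.is_signature_valid(issuer_cert.public_key()):
        return False
    # 2. The CRL must not be past its NextUpdate time.
    next_update = crl.next_update_utc
    return next_update is not None and datetime.now(timezone.utc) < next_update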

We initially thought we would hand-pick some issuing CAs, but automation was simpler.

Analysis of why issuers become unenrolled from CRLite is ongoing, but the usual culprit in the logs is that the next CRL simply can't be downloaded by the CRLite aggregate-crls tooling, which has limited retry and resume functionality. Audit data is available via the crlite-status tool's --crl options to analyze when issuers are enrolled or unenrolled in CRLite.

What happens if a certificate is too new?

Firefox will use OCSP (stapled or actively queried) if the certificate's Signed Certificate Timestamps are too new for the current filter.

What happens if an issuer is unknown?

CRLite won't be used. If the issuer is truly unknown, Firefox will show its usual unknown-issuer warning; nothing there changes. If the issuer is not in the Mozilla Root Program, it is not eligible for CRLite.

How can you know if a given issuer has its data in CRLite?

Each CRLite filter is published with a list of enrolled issuers. The easiest way to check whether an issuer is enrolled is to query a certificate from that issuer against a filter using the rust-query-crlite tool with verbose logging (-v). If the issuer is not enrolled, the tool will output NotEnrolled. For more detailed instructions, see "How can I query my CRLite filter?"

What happens if CRLite says a certificate is revoked but OCSP says it's valid?

At Internet scale, this is likely a common occurrence: Certificate Authorities generally lag in updating revocation information, and there's no requirement that CRLs and OCSP update together. Firefox can be configured to double-check revoked results from CRLite against OCSP by setting security.pki.crlite_mode = 3 in about:config. In this mode, if CRLite says a certificate is revoked but OCSP says it is valid, the OCSP result is used. This is currently the default behavior on the Firefox Beta and Nightly channels; CRLite is not yet enabled on the Release channel.

Where can I get CRLite data that Firefox uses?

The CRLite filters are published via Firefox Remote Settings. You can examine the records using JSON tooling at this URL: https://firefox.settings.services.mozilla.com/v1/buckets/security-state/collections/cert-revocations/records
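As an example, this stdlib-only Python snippet lists the attachment metadata for each record in that collection (the field names follow the standard Remote Settings record shape):

import json
import urllib.request

URL = ("https://firefox.settings.services.mozilla.com/v1/buckets/"
       "security-state/collections/cert-revocations/records")

with urllib.request.urlopen(URL) as response:
    records = json.load(response)["data"]

for record in records:
    attachment = record.get("attachment", {})
    print(attachment.get("filename"), attachment.get("size"))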

The rust-query-crlite tool can be used to download a filter and store it in the format used by Firefox. See "How can I query my CRLite filter?" below.

How can I query my CRLite filter?

Install the rust-query-crlite program from the CRLite repository by running cargo install --path ./crlite/rust-query-crlite.

Your Firefox profile contains a subdirectory called security_state. To query the CRLite filter used by your Firefox profile, run:

rust-query-crlite -vvv --db /path/to/security_state x509 /path/to/certificate1 /path/to/certificate2 [...]

or

rust-query-crlite -vvv --db /path/to/security_state https host1.example.com host2.example.com [...]

The provided security_state directory does not have to be in a Firefox profile. To download the current CRLite filter pass --update prod. (Note that this cannot currently be used to populate an empty Firefox security_state directory because Firefox requires additional metadata about the freshness of filters which is not populated by rust-query-crlite.)

Where can I get the CRLite data that is used to make filters?

The production data is hosted in Google Cloud Storage in a bucket named crlite-filters-prod. The web interface for the files is public, though browsing it requires a Google login: https://console.cloud.google.com/storage/browser/crlite-filters-prod

The staging environment, which contains only a fraction of the WebPKI, is here: https://console.cloud.google.com/storage/browser/crlite-filters-stage

The Google gsutil tool is handy for downloading entire datasets (~7 GB each). These commands would download all the files:

mkdir crlite-dataset/
gsutil -m cp -r gs://crlite-filters-prod/20200101-0 crlite-dataset/

The known folder contains one JSON file per enrolled issuing CA, listing the DER-encoded serial numbers of all of that CA's unexpired certificates. The revoked folder contains files in the same per-issuer format, but lists the DER-encoded serial numbers of revoked certificates. The serials in revoked are not guaranteed to be a subset of known, as many revoked certificates have likely expired, so set math is required to derive the known-revoked set from the two directories.
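As a sketch of that set math, assuming for illustration that each per-issuer file parses to a flat list of serials (the exact JSON shape may differ; inspect the files before relying on this):

import json
from pathlib import Path

def load_serials(path):
    return set(json.loads(path.read_text()))

def known_revoked(dataset):
    # For each issuer present in both folders, the known-revoked serials
    # are the intersection of the known and revoked sets.
    result = {}
    for known_file in (dataset / "known").iterdir():
        revoked_file = dataset / "revoked" / known_file.name
        if revoked_file.exists():
            result[known_file.name] = (load_serials(known_file)
                                       & load_serials(revoked_file))
    return result

known_revoked(Path("crlite-dataset/20200101-0"))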

The mlbf folder contains the filter and its metadata as-generated.

The log folder contains all the logs for the runs. As of this writing, many errors and warnings are still emitted that require bug fixes of one fashion or another. There are also many pointers to potential CRL problems at CAs, though few are compliance issues, and at least some are known to be innocuous.

How can I access statistics about the available filters?

The crlite-status tool is probably what you're looking for. You can get it from PyPI:

pip3 install crlite-status
crlite_status 8

How can I produce my own CRLite filter?

Install the rust-create-cascade program from the CRLite repository by running cargo install --path ./crlite/rust-create-cascade.

With a full dataset at hand from the above gsutil command:

rust-create-cascade -vv --known ./20200101-0/known/ --revoked ./20200101-0/revoked/

How can I run the CRLite backend infrastructure myself?

See the main README.md.

Why don't you also scrape OCSP?

It's extremely inefficient, as it requires a very large number of OCSP queries. While the original paper's implementation did it, as did casebenton/certificate-revocation-analysis (our initial proof-of-principle), downloading CRLs scales much better. If CRLite gains traction, OCSP bandwidth savings and speedups may prove to be reasons for CAs to issue CRLs.

What are the "stashes"?

They're flat, binary-encoded lists; each entry pairs an issuer's Subject Public Key Information (SPKI) hash with a list of that issuer's newly revoked serial numbers.

The read_keys.py script can read stash files.
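For illustration, a stash reader might look like the following; the exact field widths (a little-endian 4-byte serial count and 1-byte length prefixes) are assumptions inferred from read_keys.py, which remains the authoritative reference.

import struct

def read_stash(data):
    # Yields (issuer_spki_hash, [serials]) pairs from a stash blob.
    offset = 0
    while offset < len(data):
        (num_serials,) = struct.unpack_from("<I", data, offset)
        offset += 4
        hash_len = data[offset]
        offset += 1
        issuer_hash = data[offset:offset + hash_len]
        offset += hash_len
        serials = []
        for _ in range(num_serials):
            serial_len = data[offset]
            offset += 1
            serials.append(data[offset:offset + serial_len])
            offset += serial_len
        yield issuer_hash, serials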

What determines whether a new filter gets distributed, or a new stash distributed?

Currently, CRLite uses a heuristic: end-users collect stashes until the total size of the collected stashes would exceed the size of a new filter. At that point, the infrastructure switches over to a new filter and clears all existing stashes.
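In Python terms, the heuristic amounts to something like this (the function and parameter names are illustrative):

def should_publish_full_filter(collected_stash_bytes, new_stash_bytes, new_filter_bytes):
    # Once clients would download more stash data than a fresh filter,
    # ship a new filter and clear the stashes.
    return collected_stash_bytes + new_stash_bytes > new_filter_bytes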

The contract between CRLite clients and the infrastructure allows the infrastructure to adjust this heuristic at will. Most likely, this will be modified over time to optimize client-side searches, as searching the stashes is slower than searching the Bloom filter cascade, and purely choosing to update the filter on file-size does not account for those speed differences.

What CT logs are monitored?

A CT log is monitored if its crlite_enrolled flag is set in the ct-logs Remote Settings collection. This collection is periodically updated with entries from Google's log list, but the crlite_enrolled flag is only set after manual review by a Mozilla engineer.

What gets stored in Redis?

ct-fetch stores certificate serial numbers and CRL distribution points in the Redis database.

Serial numbers are stored as Redis sets, with keys named in the form serials::<expiration date and hour>::<issuer>; each key's Redis expiration is set so the set is automatically expunged when that expiration day-and-hour is reached.

CRL distribution points are also stored as Redis sets, with keys of the form crls::<issuer>. CRL DPs do not expire: once discovered, CRLite assumes they will continue to be updated until the issuer is retired.
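As an illustration of this layout using redis-py (the key names follow the scheme above, but the helper functions are hypothetical, not ct-fetch's actual code):

from datetime import datetime, timezone
import redis

r = redis.Redis()

def record_serial(issuer_id, serial, expiry_hour):
    # expiry_hour is an illustrative "YYYY-MM-DD-HH" UTC stamp.
    key = f"serials::{expiry_hour}::{issuer_id}"
    r.sadd(key, serial)
    when = datetime.strptime(expiry_hour, "%Y-%m-%d-%H").replace(tzinfo=timezone.utc)
    r.expireat(key, int(when.timestamp()))  # expunge at the expiration hour

def record_crl_dp(issuer_id, url):
    r.sadd(f"crls::{issuer_id}", url)  # CRL DPs are never expired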