minidump-pipeline

This project attempts to be a whole-system pipeline demonstrating how to use the various minidump-based crash-reporting tools written in Rust, across the entire development cycle:

  1. Build: Compile Your Project And Collect Symbols
  2. Symbols: Host Your Symbols
  3. Client: Crash On The Client And Send A Report
  4. Processor: Receive A Report And Process It (With Symbols)
  5. Reporter: Aggregate/Search/Display Processed Reports

Because there are 5 completely different environments, and lots of configuration you might want to do for each, the ecosystem is split across various projects, even though we're all kind of working together on this stuff.

Don't know what a minidump is? Check out our docs for some background!

minidump-pipeline Usage

When you cargo run this repo, it will:

  • build the crash-client example app
  • collect its symbols with dump_syms
  • ask the app for its crash suite
  • for each crash suite item:
    • run the app (via minidumper-test), which will crash, and produce a minidump with minidumper
    • process the minidump with minidump-stackwalk, producing reports and logs
  • compare the results against expectations (just run-pass kind of checking, no assertions yet on the final output's contents)
  • emit a big full-report.json linking all the intermediate artifacts and results
  • set non-zero status if any tests unexpectedly failed

Configuration is done through pipeline.toml. This file can be edited locally, or you can make your own config and pass it in with --config. You can also specify the RUN_NAME with --run, which will be useful later.

Several sections of the config specify how minidump-pipeline will install the different binaries it uses. For each one you can fetch from crates.io, git, or a local path. This lets you easily test out the whole pipeline with different toolchains swapped out.

If the tool runs successfully, then under runs/$RUN_NAME you will find:

  • syms/ containing the .syms emitted by dump_syms
  • dumps/ containing the .dmps emitted by minidumper
  • reports/ containing the processed crash reports (and logs) emitted by minidump-stackwalk
  • build/ containing the raw build dir
  • install/ containing the final binaries

If you do two different runs with different RUN_NAMEs, then you can theoretically diff the results to compare how two different toolchains work.

TODO

  • integrate socc-pair for diffing two runs?
  • integrate minidump-debugger for inspecting the processed dump?
  • build crash-client inside of the run dir, so the native artifacts are part of it?
  • make it easier to swap out the rust toolchain crash-client is built with?

Production: Fully Streamlined Workflow (Pay Someone Else)

Just use sentry.io.

Yeah uh, doing this all yourself is a ton of work, and you can just give people money to deal with a lot of the details for you? A lot of the libraries in The Pipeline are comaintained by them, and they use all this stuff in their own tooling, so, really that is the easiest way to use this stuff.

I don't even work for them, I just wanted to be honest about the path of least resistance. But ok let's dig into what to use if you want to do It All Yourself (or you know, work for one of the companies that actually maintains these libraries/applications).

Production: Streamlined Workflow

The most streamlined summary of what to do is:

  1. Build with dump_syms
  2. Host Your Symbols With ??? ("just" need a static file server)
  3. Deploy with minidumper
  4. Process with minidump-stackwalk
  5. Display with ??? (on your own... minidump-debugger?)

Ok that's a bit too summarized, let's bust open those steps. Several of these steps currently require you to write some glue code, which will be indicated [like this].

Step 1: Your Build (Server/Task/Script)

  1. [build glue] build your application
  2. run dump_syms on your build dir to convert native symbols to the breakpad .sym format
  3. [http glue] upload the resulting .sym to your symbol server
  4. [http glue] upload your binaries to be deployed to clients

This stage of the pipeline is largely covered by the symbolic family of crates, with dump_syms wrapping all that functionality up into one CLI application.

All that really needs to happen in your build is to grab the "debuginfo" or "symbols" for your application. (Everyone has different names for different subsets of this stuff, but "symbols" is kind of the common terminology in the minidump world, so let's go with that).

dump_syms does this by converting the native versions of that info into a platform-agnostic format with all the details your crash-reporting infrastructure doesn't need ripped out.
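
To give a feel for it, here are a few hand-written lines in the style of the breakpad .sym text format that dump_syms emits. The module names and ids here are completely made up for illustration:

MODULE windows x86_64 5A9832E5287241C1838ED98914E9B7FF1 my_app.pdb
INFO CODE_ID 63F2D1AB8F000 my_app.exe
FILE 0 src\main.rs
FUNC 1010 54 0 my_app::do_the_thing
1010 14 123 0
STACK CFI INIT 1010 54 .cfa: $rsp 8 + .ra: .cfa 8 - ^

Roughly: a MODULE/INFO header identifying the build, FILE/FUNC records (plus per-address line entries) for symbolication, and STACK CFI records for unwinding.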

With that information isolated and packaged up, you are now free to ship fully "stripped" binaries, as your crash report processor will be able to retrieve that information from your symbol servers (Step 2).

Technically speaking, dump_syms is optional, but it simplifies a lot of things and amortizes a lot of expensive steps, so we'll be assuming you want to use it. That said, it's still useful to understand the underlying infrastructure that dump_syms simplifies away, as some details of that system still remain.
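
If it helps, here's a rough Rust sketch of what that [build glue] around dump_syms might look like. The assumption that dump_syms writes the .sym text to stdout when you don't pass an output path is mine; double-check the real CLI flags:

use std::path::{Path, PathBuf};
use std::process::Command;

// [build glue sketch] run dump_syms over a freshly built binary and save the
// resulting .sym next to it (assumes dump_syms prints the .sym to stdout).
fn dump_symbols(binary: &Path) -> std::io::Result<PathBuf> {
    let output = Command::new("dump_syms").arg(binary).output()?;
    assert!(output.status.success(), "dump_syms failed");

    let sym_path = binary.with_extension("sym");
    std::fs::write(&sym_path, &output.stdout)?;
    // [http glue] next step: upload sym_path to your symbol server (Step 2).
    Ok(sym_path)
}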

What Are Symbols And How Are They Used?

Your symbols are enormous tables (like 100s of MB or even many GB) that your compiler emits as part of the build (depending on various flags). These tables map every single address in your binary to information like:

  • "what source file/function/line is this address"
  • "how do you unwind the stack from here" ("CFI", "call frame information", "unwind tables")
  • "what functions were inlined here"
  • "how do you recover function arguments from here"

For crash reporting, the first two are really important, the third is nice to have, and the last isn't very important. This information allows us to produce human-readable backtraces for all your threads at the time of the crash.

These tables are generally represented in native platform-specific formats like DWARF or PE32+, and can be stored in your executable, shared libraries, intermediate artifacts, or dedicated debug files like PDBs.

In the terminology of minidumps, your application contains several modules, and each module can have a code file and debug file. Your executable is one module, and every dynamic/shared library it loads is another module. Those native binaries (before stripping) are their own code files. Things like PDBs are debug files.

Compilers generally have some system for generating timestamps/hashes/guids/uuids that uniquely identify the code/debug files for a particular build. These ids are your code ids, debug ids, and versions, and should be available in your binaries/libraries at runtime.

When a crash happens, the minidump generator (Step 3) will gather up those various ids and store them in your minidump as Module Records. When processing a minidump (Step 4), these records are used to query your symbol server (Step 2) for the symbols it needs.

Keen-eyed readers at this point may notice a problem: the specific shared libraries that your application will link to on the client aren't available at build time! This is what we in the biz call A Huge Fucking Pain In The Ass, and different platform vendors do better/worse jobs of helping you solve it.

Some vendors make system symbols available locally, so moving processing (Step 4) onto the client has some merits, but this isn't currently supported by our pipeline. It's also inherently sketchy because we are talking about a system that just crashed our application. It's best to not overstay our welcome and to just do the minimum amount of work to submit the report.

Some vendors host system symbols online. For instance, Microsoft's symbol servers host the code files and debug files for every system library they've ever shipped. We'll discuss that in more detail in Step 2, as Microsoft's protocol is the entire basis for it.

Other vendors tell you to go fuck yourself and make it impossible or difficult to get system symbols.

Companies like Mozilla and Sentry have tons of random workflows for trying to scrape and process the symbols for platforms we care about however we can. This includes things like "yeah this one person just runs a bunch of scripts on various macos/xcode versions". This Sucks.

Step 2: Your Symbol Server

  1. [http glue] receive symbols from the build
  2. [storage glue] store the symbols
  3. [http glue] host a symbol server endpoint for Step 4 to query
    1. For native symbols, we support the Microsoft symbol server protocol
    2. For .syms, we support the Tecken symbol server protocol

TLDR: run a static file server that hosts a .sym for a module at DEBUG_FILE/DEBUG_ID/DEBUG_FILE.sym (DWrite.pdb/c10250ffba478e770798871932c7d8c51/DWrite.sym).

In the previous section we mentioned how Microsoft's symbol servers host symbols for basically every system library they've ever shipped, which fucking rules. Here's another fun fact: the Minidump format itself was also created by Microsoft.

Given these two facts, it may not surprise you to learn that these two pieces of technology were built to be slapped together. Whether you open an Official Windows Minidump in an Official Microsoft Tool like windbg, or process an extended breakpad-style minidump with something like minidump-stackwalk, the symbol server protocol is at the heart of it.

Thankfully this protocol is incredibly simple: it's just a static file server with files hosted at the appropriate path schema. Let's look at a particular copy of the Microsoft system library DWrite.dll:

code_file        = "C:\Windows\System32\DWrite.dll"
code_identifier  = "29a9e8ad27f000"
debug_file       = "DWrite.pdb"
debug_identifier = "c10250ff-ba47-8e77-0798-871932c7d8c5-1"

Microsoft's servers host the native code_file and debug_file at /FILE_NAME/ID/FILE_NAME:

DWrite.dll/29a9e8ad27f000/DWrite.dll
DWrite.pdb/c10250ffba478e770798871932c7d8c51/DWrite.pdb

(Note how code_file has the directory removed, and debug_identifier has the -'s removed!)

Mozilla's servers host the .sym as if it was the debug_file, but with the second .pdb changed to .sym:

DWrite.pdb/c10250ffba478e770798871932c7d8c51/DWrite.sym

(Note that dump_syms merges the info from the code_file and debug_file into one sym!)

Our tooling for processing minidumps (Step 4) technically supports both of these protocols, although the .sym format is preferred, and the native format is disabled by default (it's hackily implemented by downloading the binaries and running dump_syms to get a .sym, but in the future we'd like to cut out the middle-man and handle binary formats natively).

In either case you give it a URL prefix to append the query paths to (https://symbols.mozilla.org/, https://msdl.microsoft.com/download/symbols/). Only the full query paths need to resolve to anything. We recommend making symbol servers case-insensitive, but some third-party servers aren't. Our tooling tries to preserve casing at all points to interoperate with all systems.
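
To make the schema concrete, here's a small Rust sketch that builds the .sym query path for a module. The .pdb-to-.sym swap (and appending .sym to everything else) is my reading of the convention described above:

// [sketch] build the breakpad-style .sym query path: DEBUG_FILE/DEBUG_ID/SYM_FILE
fn sym_query_path(debug_file: &str, debug_identifier: &str) -> String {
    // The id in the path has the -'s stripped.
    let debug_id: String = debug_identifier.chars().filter(|c| *c != '-').collect();
    // A .pdb extension gets swapped for .sym; anything else just gets .sym appended.
    let sym_file = match debug_file.strip_suffix(".pdb") {
        Some(stem) => format!("{stem}.sym"),
        None => format!("{debug_file}.sym"),
    };
    format!("{debug_file}/{debug_id}/{sym_file}")
}

fn main() {
    // The DWrite example from above:
    let path = sym_query_path("DWrite.pdb", "c10250ff-ba47-8e77-0798-871932c7d8c5-1");
    assert_eq!(path, "DWrite.pdb/c10250ffba478e770798871932c7d8c51/DWrite.sym");
    // Prepend your server's URL prefix to get the full query URL.
    println!("https://symbols.mozilla.org/{path}");
}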

If we do move towards handling more native debuginfo, then we may also want to support things like the elfutils debuginfod protocol, which various linux distros are increasingly hosting their own system symbols on, but that's far-future work.

Step 3: Your App (Running On The Client, Crashing, And Reporting)

  1. spawn two processes: your app, and the crash-monitor (either ship two binaries, or spawn two instances of one)
  2. in your app, attach a crash handler (signal handler) and create a minidumper::Client that forwards crashes to the crash-monitor
  3. in your crash-monitor, create a minidumper::Server, which will create a minidump for your app
  4. [http glue] send the report+minidump to your servers

This stage of the pipeline is handled by the crash-handling family of crates, with minidumper wrapping it all up into one convenient library.

This is the part I'm fuzziest on, so there will be fewer details for now, but I'll try my best.

It may be surprising/annoying to learn that this system requires you to have two binaries running on the client. In the past we tried to do everything in one application, and it kind of works ok, but it's an inherently messy and problematic thing to do.

Even though a minidump is mini, generating one still involves a fair amount of data collection, and then the result needs to be sent over an http connection to your servers. This isn't that big of a deal for a normal application to do, but it's a lot messier to do in an application that literally just crashed, possibly due to a memory corruption bug.

The multi-process architecture is simply more reliable, and it's the only approach supported by this pipeline. That said, some work does still need to be done in your app (sketched just after this list):

  • detecting a crash (handling signals)
  • gathering a crash-context (some minimal core details)
  • sending the crash-context to the crash-monitor process
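
Here's roughly what that app-side work looks like with minidumper + crash-handler, reconstructed from memory of their documented examples, so treat the exact signatures as an assumption and check the crate docs:

// [sketch] app-side setup: connect to the crash-monitor and install a crash
// handler that forwards the crash-context to it. Assumes the monitor process
// is already running and listening under the name "my-app" (spawning it is your glue).
fn install_crash_reporting() -> crash_handler::CrashHandler {
    let client = minidumper::Client::with_name("my-app")
        .expect("failed to connect to crash-monitor");

    crash_handler::CrashHandler::attach(unsafe {
        crash_handler::make_crash_event(move |ctx: &crash_handler::CrashContext| {
            // Ask the monitor to write a minidump for us, then report whether it worked.
            crash_handler::CrashEventResult::Handled(client.request_dump(ctx).is_ok())
        })
    })
    .expect("failed to attach crash handler")
}

The crash-monitor side is the mirror image: a minidumper::Server that runs a loop, writes the .dmp when a client requests one, and hands it to a handler you implement (which is where the next bit of glue goes).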

At this point, the crash-monitor process needs to:

  • gather all the other details about the system and crashed process
  • write a minidump
  • send that minidump (and potentially other details) to your crash reporting server

This is mostly just a bunch of platform-specific code for enumerating threads, dumping registers, getting stack memory, getting system info, and so on.
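
And here's a sketch of the last bit of [http glue]: submitting the finished dump to your server. I'm using reqwest (with the blocking and multipart features) purely as an example HTTP client, and the endpoint and form field names are made up:

// [http glue sketch] what the "http-dump-sender" box in the diagram below might
// look like: POST the finished .dmp (plus any extra metadata you collect) to
// your processor server.
fn send_report(dump_path: &std::path::Path) -> Result<(), Box<dyn std::error::Error>> {
    let form = reqwest::blocking::multipart::Form::new()
        .text("app_version", "1.2.3")
        .file("upload_file_minidump", dump_path)?;

    reqwest::blocking::Client::new()
        .post("https://crash-reports.example.com/submit")
        .multipart(form)
        .send()?
        .error_for_status()?;
    Ok(())
}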

Step 4: Your Crash Report Processor Server

  1. [http glue] receive a report+minidump
  2. run minidump-stackwalk --json --symbols-url="https://your.symbol.server.com/" to produce a json report
  3. [http glue] send the json report to crash reporting application (Step 5)

This stage of the pipeline is handled by the rust-minidump family of crates, with minidump-stackwalk wrapping it all up into one CLI application, which itself is just a thin wrapper around the minidump-processor library.

I'm emphasizing minidump-stackwalk and the json report here because they're stable interfaces with proper schemas/docs, but if you don't like that stuff and are fine with potential API breakage in new versions, then you can also totally use minidump-processor directly and convert the in-memory structures into whatever you want.

If you use minidump-stackwalk, fetching symbols from a symbol-server over http will be enabled by default, while it will be disabled by default in minidump-processor. Either way, this functionality is currently provided by the breakpad-symbols library, although we have a very Dread Pirate Roberts relationship with it, in that every night we promise ourselves that we'll kill it Soon (and replace it with something based on symbolic).

Mozilla likes the minidump-stackwalk + http + json-report workflow, Sentry likes the minidump-processor + hackily-intercept-and-fulfill-symbol-queries-with-their-own-backend + produce-their-own-custom-reports workflow. It takes all kinds. :)

minidump-stackwalk is primarily designed to get parallelism by spawning multiple instances that all share an on-disk cache, but we're starting to teach it to be more async and concurrent within one process.
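
As a sketch of the glue around step 2 above: shell out to minidump-stackwalk and capture the json report. I'm assuming the report is written to stdout when you don't pass an output file, and the symbols url is a placeholder:

use std::path::Path;
use std::process::Command;

// [glue sketch] process one received .dmp into a json report.
fn process_dump(dump: &Path) -> std::io::Result<Vec<u8>> {
    let output = Command::new("minidump-stackwalk")
        .arg("--json")
        .arg("--symbols-url=https://your.symbol.server.com/")
        .arg(dump)
        .output()?;
    assert!(output.status.success(), "minidump-stackwalk failed");
    Ok(output.stdout) // the json report, ready to hand off to Step 5
}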

Step 5: Your Processed Crash Report Display

  1. digest the json crash report into a database
  2. build some kind of CRUD app on top of that database
  3. build a web-based UI for viewing crash reports

At this point you're kind of outside of this project's purview, although maybe it shouldn't be?
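
If you do roll your own, the first bite of that work is just json digestion. Here's a sketch that pulls a couple of headline fields out of the report with serde_json; the field names are what I remember of the json schema, so verify them against the actual schema docs:

// [glue sketch] summarize a json crash report before stuffing it into a database.
fn summarize(report_bytes: &[u8]) -> serde_json::Result<()> {
    let report: serde_json::Value = serde_json::from_slice(report_bytes)?;

    // Headline crash info (field names assumed).
    let crash_type = &report["crash_info"]["type"];
    let crash_address = &report["crash_info"]["address"];
    println!("crash: {crash_type} @ {crash_address}");

    // Top frame of the crashing thread, as a crude signature.
    if let Some(idx) = report["crash_info"]["crashing_thread"].as_u64() {
        let top_frame = &report["threads"][idx as usize]["frames"][0];
        println!("signature: {}", top_frame["function"]);
    }
    Ok(())
}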

Some tooling from Step 4 provides basic human-readable views, like minidump-stackwalk's default human-readable output.

More complex stuff like search/aggregation/monitoring puts you more in the territory of "just pay for sentry.io", although Mozilla has its own infra, so it's not impossible to build this stuff yourself. Just, a lot of work.

A Rough Diagram Of The Pipeline

On the off-chance that this helps you visualize all the moving parts, here's a vague diagram of everything involved. I've broken the architecture into 3 major sections:

  • your build
  • your client (user)
  • your server (you)

The client and server are further broken up into separate processes.

Rough legend:

  • solid arrows: code invocation
  • dashed arrows: data flow
  • diamond boxes: data (files/structs)
  • square boxes: specific rust-minidump/crash-handling/symbolic crates
  • round boxes: "glue" (your problem for now)
graph TD;
subgraph build
  cargo
  dump_syms
  http-symbol-sender(http-symbol-sender)
  binary{.exe/.pdb}
  sym{.sym}
end

subgraph client
  subgraph app
    YOUR_ACTUAL_APP(YOUR ACTUAL APP)
    minidumper::Client
    crash-handler
  end

  subgraph crash-monitor
    minidumper::Server
    minidump-writer
    http-dump-sender(http-dump-sender)
    context{CrashContext}
    dump{.dmp}
  end
end

subgraph server
  subgraph processor-server
    http-dump-receiver(http-dump-receiver)
    minidump-processor
    minidump
    structMinidump{Minidump}
    report{report.json}
  end
  subgraph symbol-server
    tecken(static-file-server)
    http-symbol-receiver(http-symbol-receiver)
  end
  subgraph display
    socorro(database?)
    webapp(webapp?)
  end
end

cargo -.-> binary
binary -.-> dump_syms
binary -.-> deploy(deploy)
dump_syms -.-> sym
sym -.-> http-symbol-sender

crash-handler ---> minidumper::Client
minidumper::Client ---> minidumper::Server
minidumper::Server ---> minidump-writer
minidump-writer -.-> dump
dump -.-> http-dump-sender
minidumper::Server ---> http-dump-sender
minidumper::Server -.-> context
context -.-> minidump-writer

http-dump-receiver ---> minidump-processor
http-dump-receiver ---> minidump
minidump -.-> structMinidump
structMinidump -.-> minidump-processor
tecken -. .sym .-> minidump-processor
minidump-processor -.-> report
report -.-> http-dump-receiver
http-dump-receiver -. report.json .-> socorro

http-symbol-sender -. .sym .-> http-symbol-receiver
http-dump-sender -. .dmp .-> http-dump-receiver
