Run fuzzing tools to discover safety holes #30

mre · 2018-07-08T15:05:34Z

In a discussion on Reddit, user /u/Shnatsel suggested the use of fuzzing to test hyperjson for safety issues and panics. There are a few tools around, which could be helpful:

cargo-fuzz — currently supports libfuzzer
hongfuzz-rs — helped discover issues in crates such as hyper and h2
afl.rs — discovered issues in rustc and serde
Rust Fuzz Trophy Case — lists issues found in crates and the fuzzers used

(List assembled by @memoryruins taken from here)

If somebody wants to take a stab at it, please comment here with any questions.

Shnatsel · 2018-07-08T15:56:31Z

To clarify, I have listed a bunch of approaches to software verification, and I do not expect fuzzing specifically to discover many interesting bugs, unless there is gross negligence in FFI. Still, for something that takes half an hour to set up, it provides a really good return on investment, so there's no reason not to do it.

https://github.com/rust-fuzz/targets already provides a fuzzing target for serde_json deserialization, including roundtrip equivalence check. It does not come with a starting corpus, though.

Do not forget to use address and memory sanitizers if you're looking for bugs in unsafe code, otherwise they will not be discovered. You can omit sanitizers when looking for panics or building the initial corpus.

Shnatsel · 2018-07-08T17:21:53Z

All unsafe blocks in serde seem to be in serialization, not in deserialization. So fuzzing is unlikely to help you find those, since non-specialized fuzzers are unable to systematically produce interesting Python objects to serialize.

I've looked at what kind of unsafes there are in serde and serde-json, here's what I've found:

There is a number of str::from_utf8_unchecked() calls in both serde and serde-json. If the input is not actually utf8, the result is undefined behavior. This should be reasonably easy to verify since Python3 has a built-in unicode string type. You just need to make sure that the FFI library does not cast Python binary strings to Rust strings behind your back, and you're set.

serialize_display_bounded_length() in serde core is trickier because in addition to from_utf8_unchecked() it also trusts the Display trait implementation on an arbitrary input object to correctly report the length of data it has written.

dtolnay · 2018-07-08T17:39:27Z

I don't think you have accurately characterized the assumptions around utf8 in Serde:

it also trusts the Display trait implementation on an arbitrary input object to correctly report the length of data it has written.

The code only involves the Display impls of 4 specific types from std::net in the standard library, although this is not relevant for safety and the code is safe for an arbitrary implementation of Display.
A Display impl does not "report the length of data it has written".

The key observation, which is called out in a comment, is that there is no way to emit non-utf8 encoded bytes through any API in std::fmt. You can see this by looking at the methods of std::fmt::Write which the module is built around.

Shnatsel · 2018-07-08T18:30:35Z

@dtolnay than you for the clarification! I'd really appreciate if you could add a code comment explaining why a poorly implemented Display trait would not be a security issue. Here's the relevant snippet:

macro_rules! serialize_display_bounded_length {
    ($value:expr, $max:expr, $serializer:expr) => {{
        let mut buffer: [u8; $max] = unsafe { mem::uninitialized() };
        let remaining_len = {
            let mut remaining = &mut buffer[..];
            write!(remaining, "{}", $value).unwrap();
            remaining.len()
        };
        let written_len = buffer.len() - remaining_len;
        let written = &buffer[..written_len];
        //  <snip>

By reading this code I observe that:

This macro can be invoked on an arbitrary input object with an arbitrary, potentially incorrect, trait implementations
This macro creates an array with uninitialized memory in it
Said array is populated and its new length of it is determined via a call to write! macro.
If the remaining slice length after calling write! macro is not set to a correct value, this function contains an information disclosure vulnerability.
The write! macro invokes the (potentially incorrect) implementation of Display trait internally.

This leads me to believe that in order to verify that this is safe, I need to inspect the implementation of the write! macro and ensure that it always writes the bytes and modifies input slice length as advertised for an arbitrary implementation of Display trait; failing that, I need to audit the implementation of Display trait for all objects that macro can be invoked on.

This makes some critical invariants non-local. So an explanation in a code comment on why write! always writes the bytes as advertised and mutates the length of the input slice correctly for an arbitrary and possibly incorrect implementation of Display trait would be very helpful.

dtolnay · 2018-07-08T19:11:36Z

Thanks for the feedback @Shnatsel. I would love to get to the root of where our perspective differs. In response to your observations:

in order to verify that this is safe, I need to inspect the implementation of the write! macro and ensure that it always writes the bytes and modifies input slice length as advertised

I guess I don't see this use of write! as fundamentally different from any other place that Serde relies on standard library functionality to behave "as advertised". If our threat model includes write! possibly dumping garbage bytes into the output and returning a made-up integer then we are going to have a bad time no matter what, even in code that does not visibly use unsafe nearby. Practically everything is implemented in terms of something unsafe. Just calling to_string() on an integer or inserting into a BTreeMap involve dozens of lines of unsafe code. The only way to stay sane is assume the standard library behaves "as advertised" and inspect the implementation if there is any doubt, whether or not the callers are safe code.

[Emphasis mine in the following quote]

code comment on *why* write! always writes the bytes as advertised and mutates the length of the input slice correctly

As above, I'm not sure there is anything we could write in Serde that would justify why write! does what it says it does, beyond noting this is what it is supposed to do. The rest of Serde needs to trust every other standard library API (String, Vec, BTreeMap, etc) just as much as we trust write! here.

for an arbitrary and possibly incorrect implementation of Display trait

I've tried not to distinguish between "correct" and "incorrect" implementations of Display because the reasoning applies equally well to any possible way that Display could be implemented. Maybe I'm missing the sort of incorrectness you have in mind -- could you give an example of what you would consider an incorrect implementation of Display in this context?

Shnatsel · 2018-07-09T10:13:28Z

I was wondering if it was possible for an implementation of Display to actually write a different amount of bytes than is eventually reported to this code via slice mutation in write!.

Turns out this is not possible because Display passes a string to the Write implementation in the formatter, which handles writing and updating the length. Display cannot mess up the length of the string it passes without abusing unsafe code, so this code is safe for any safe implementation of Display.

Shnatsel · 2018-07-09T10:31:11Z

Thanks a lot to @dtolnay for clarifications and for bearing with me. This attitude to safety is commendable.

Apologies to mre for somewhat unrelated discussion on the issue.

mre · 2018-07-09T10:50:37Z

I quite enjoyed it tbh. 😃

RSabet · 2018-07-09T23:04:38Z

We definetly need more tests and fuzzing is a good start, in Haskell i really love quickcheck and i was happy to see Rust-Quickcheck

what about pyo3?
In pyo3-0.2.7:

grep -rn unsafe . | wc -l
742

so maybe add hypothesis-python

Shnatsel · 2018-07-09T23:44:35Z

pyo3 worries me both due to inherent unsafety of FFI and past track record of some of its developers. I would love to see it audited, but I have to admit my competence is insufficient for a meaningful audit.

mre · 2018-07-16T22:23:14Z

Thanks to @RSabet for integrating our first hypothesis-based tests. This should make the codebase quite a bit more robust with regards to edge-cases.

Shnatsel · 2018-07-19T00:41:23Z

I've checked pyo3 bugtracker, and it seems the library needs some serious work before you can rely on it to be safe for passing arbitrary data. Check this out:

PyO3/pyo3#159
PyO3/pyo3#145
PyO3/pyo3#94
PyO3/pyo3#71

Some of these issues are over a year old. And these are just the bugs that have been sufficiently dramatic to cause a segfault. You need something to go very, very wrong to trigger a segfault - most memory corruption is not detected unless you build the binary with address sanitizer.

TL;DR: you would need to audit the relevant parts of pyo3 before you can claim any kind of safety.

davidhewitt · 2020-05-03T22:17:03Z

Two years later, drive by hello from a new (ish) pyo3 maintainer. 😄

I treat the soundness of pyo3 as the number one priority of the library. IMO using Rust, a stricter language than C++, gives you design constraints which C++ doesn't have. Also, the C++ pybind11 library is already a very convenient choice for Python extensions written in a compiled language. So in my opinion most people who are using pyo3 really care about Rust's safety aspects.

It's a shame that pyo3 has historical safety issues; I have been working to see them removed. As of today I think I have submitted PRs to resolve the last of the issues labelled Unsound in the pyo3 bug tracker 🎉

I wouldn't be surprised if there are still skeletons in the closet though - if you do end up doing any fuzzing and find any issues related to pyo3, please do post them: I'll work to clean them up asap!

RSabet mentioned this issue Jul 16, 2018

first hypotests-python tests #35

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Run fuzzing tools to discover safety holes #30

Run fuzzing tools to discover safety holes #30

mre commented Jul 8, 2018

Shnatsel commented Jul 8, 2018

Shnatsel commented Jul 8, 2018

dtolnay commented Jul 8, 2018

Shnatsel commented Jul 8, 2018 •

edited

dtolnay commented Jul 8, 2018

Shnatsel commented Jul 9, 2018

Shnatsel commented Jul 9, 2018

mre commented Jul 9, 2018

RSabet commented Jul 9, 2018

Shnatsel commented Jul 9, 2018

mre commented Jul 16, 2018

Shnatsel commented Jul 19, 2018

davidhewitt commented May 3, 2020 •

edited

Run fuzzing tools to discover safety holes #30

Run fuzzing tools to discover safety holes #30

Comments

mre commented Jul 8, 2018

Shnatsel commented Jul 8, 2018

Shnatsel commented Jul 8, 2018

dtolnay commented Jul 8, 2018

Shnatsel commented Jul 8, 2018 • edited

dtolnay commented Jul 8, 2018

Shnatsel commented Jul 9, 2018

Shnatsel commented Jul 9, 2018

mre commented Jul 9, 2018

RSabet commented Jul 9, 2018

Shnatsel commented Jul 9, 2018

mre commented Jul 16, 2018

Shnatsel commented Jul 19, 2018

davidhewitt commented May 3, 2020 • edited

Shnatsel commented Jul 8, 2018 •

edited

davidhewitt commented May 3, 2020 •

edited