You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The unsafe behavior of Raw is trivially shown to be unsound. str is required to be valid UTF-8, and its methods assume that it contains valid UTF-8 for the purpose of, for example, scanning for code points. I'm able to produce UB by creating a simple Serializer that serializes strings as an array of chars, and then feeding it malformed UTF-8 data:
/// Serializer that serializes strings as a list of charstructCharListSerializer{output:Vec<char>,}impl ser::Serializerfor&mutCharListSerializer{fnserialize_str(self,s:&str) -> Result<Self::Ok,Self::Error>{self.output.extend(s.chars());Ok(())}// Other methods simply panic}fnmain(){letmut ser = CharListSerializer{output:Vec::new()};let data = "ABC";
data.serialize(&mut ser).unwrap();assert_eq!(ser.output,['A','B','C']);
ser.output.clear();// An emoji: 😣let data = [0xf0,0x9f,0x98,0xa3];let data = Raw::from_utf8(data.as_slice().to_owned());
data.serialize(&mut ser).unwrap();assert_eq!(ser.output,['😣']);
ser.output.clear();// Uh oh. 0xf0 signals a 4-byte UTF-8 code point,// but there are only two bytes in this buffer.let data = [0xf0,0x9f];let data = Raw::from_utf8(data.as_slice().to_owned());
data.serialize(&mut ser).unwrap();}
When I run this normally, I get a crash:
error: process didn't exit successfully: `target\debug\rust-playground.exe` (exit code: 0xc000001d, STATUS_ILLEGAL_INSTRUCTION)
And when I run with cargo miri run, it immediately detects the undefined behavior:
error: Undefined Behavior: entering unreachable code
--> C:\Users\Lucre\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\core\src\hint.rs:51:14
|
51 | unsafe { intrinsics::unreachable() }
| ^^^^^^^^^^^^^^^^^^^^^^^^^ entering unreachable code
|
= help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
= help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
= note: inside `std::hint::unreachable_unchecked` at C:\Users\Lucre\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\core\src\hint.rs:51:14
= note: inside `std::option::Option::<&u8>::unwrap_unchecked` at C:\Users\Lucre\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\core\src\option.rs:877:30
= note: inside `core::str::validations::next_code_point::<std::slice::Iter<u8>>` at C:\Users\Lucre\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\core\src\str\validations.rs:56:27
= note: inside `<std::str::Chars as std::iter::Iterator>::next` at C:\Users\Lucre\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\core\src\str\iter.rs:44:18
= note: inside `std::vec::Vec::<char>::extend_desugared::<std::str::Chars>` at C:\Users\Lucre\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\alloc\src\vec\mod.rs:2696:35
= note: inside `<std::vec::Vec<char> as std::vec::spec_extend::SpecExtend<char, std::str::Chars>>::spec_extend` at C:\Users\Lucre\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\alloc\src\vec\spec_extend.rs:18:9
= note: inside `<std::vec::Vec<char> as std::iter::Extend<char>>::extend::<std::str::Chars>` at C:\Users\Lucre\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\alloc\src\vec\mod.rs:2670:9
note: inside `<&mut CharListSerializer as serde::Serializer>::serialize_str` at src\main.rs:86:9
--> src\main.rs:86:9
|
86 | self.output.extend(s.chars());
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
= note: inside `<rmp_serde::Raw as serde::Serialize>::serialize::<&mut CharListSerializer>` at C:\Users\Lucre\.cargo\registry\src\github.com-1ecc6299db9ec823\rmp-serde-1.0.0\src\lib.rs:190:9
note: inside `main` at src\main.rs:211:5
--> src\main.rs:211:5
|
211 | data.serialize(&mut ser).unwrap();
| ^^^^^^^^^^^^^^^^^^^^^^^^
note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace
error: aborting due to previous error
error: process didn't exit successfully: `C:\Users\Lucre\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\bin\cargo-miri.exe target\miri\x86_64-pc-windows-msvc\debug\rust-playground.exe` (exit code: 1)
This type should be either:
Removed,
Redesigned to use serialize_bytes
Marked as unsafe and changed to require valid utf-8.
To be frank, it's not at all clear to me why this type exists in the first place? It isn't used by RMP anywhere, and I don't really understand why a user would ever be allowed to call serialize_str on non-UTF8 data, when the entire point of that method is that valid UTF-8 can be assumed. It appears to exist for no other purpose than to create this kind of soundness hole.
I think this functionality could be implemented without the invalid transmute. Serde doesn't support specialization directly, but it is possible to hack it by abusing newtype serialization with a custom name. This is hack is already used for ExtSerializer injecting _ExtStruct((_,_)). So similarly, the library could tell serde to serialize invalid UTF-8 string as _InvalidStringStruct(&[u8]) and have msgpack serializer react to this appropriately.
I don't have time to implement this workaround, and I also don't want to break existing users even if they rely on the bad implementation, so for now I'll deprecate the Raw/RawRef newtypes.
The
unsafe
behavior ofRaw
is trivially shown to be unsound.str
is required to be valid UTF-8, and its methods assume that it contains valid UTF-8 for the purpose of, for example, scanning for code points. I'm able to produce UB by creating a simple Serializer that serializes strings as an array of chars, and then feeding it malformed UTF-8 data:When I run this normally, I get a crash:
And when I run with
cargo miri run
, it immediately detects the undefined behavior:This type should be either:
serialize_bytes
unsafe
and changed to require valid utf-8.To be frank, it's not at all clear to me why this type exists in the first place? It isn't used by RMP anywhere, and I don't really understand why a user would ever be allowed to call
serialize_str
on non-UTF8 data, when the entire point of that method is that valid UTF-8 can be assumed. It appears to exist for no other purpose than to create this kind of soundness hole.My full reproduction code is available as a gist.
The text was updated successfully, but these errors were encountered: