HashMap should be more widely usable #11

Ericson2314 · 2018-03-07T19:54:05Z

Right now it can only be used on regular platforms due to the way the default hasher is initialized. This is a such a silly little thing. Yes, it's a one-of issue, but I view it as a good litmus test.

jethrogb · 2018-03-07T20:12:46Z

I think there's an issue for better handling of default type parameters, but I can't find it right now.

Ericson2314 · 2018-03-07T20:58:00Z

There is. But I rather not wait for such language change when we can just move it into alloc and wrap in a new type. We could have done that years ago.

jethrogb · 2018-03-07T21:03:39Z

That would mean that users of std and alloc are seeing a different type, no?

Ericson2314 · 2018-03-09T17:05:23Z

@jethrogb Yes. But that's no worse than today when its not even available in alloc. While alloc is unstable, a stop-gap solution like the newtype need not tie our hands.

WildCryptoFox · 2018-11-26T14:30:24Z

@Ericson2314 @jethrogb A newtype could have worked, but now the default type alias should work today. Is there anything blocking this today?

Ericson2314 · 2018-11-27T21:15:44Z

type aliases cannot have default type parameters.

WildCryptoFox · 2018-11-28T02:16:07Z

@Ericson2314 Er. I use this feature often.

pub type Result<T, E = Error> = core::result::Result<T, E>;

Ericson2314 · 2018-11-28T02:28:35Z

Woah I didn't know that was even allowed in stable Rust. @eddyb?

Still there are issues:

pub type Result<T, E = ()> = core::result::Result<T, E>;

fn main() {
    Ok(());
    // fn asdf<E = ()>() -> Result<(), E> { Ok(()) }
    //asdf();
    println!("Hello, world!");
}

Ok is ambiguous, and parameter defaults are not accepted on functions.

WildCryptoFox · 2018-11-28T02:38:22Z

@Ericson2314 Of course that Ok is ambiguious, it is Result<(), _>::Ok.

Indeed functions don't get this treat, but for all that it matters for HashMap you just need a impl<K: Eq + Ord, V> HashMapStd for HashMap<K, V> { fn new; fn with_capacity; }; where the HashMap type alias in std includes the default. See my suggestion in rust-lang/rust#56192

WildCryptoFox · 2018-11-28T02:59:48Z

@Ericson2314 I must admit I was surprised to find the following doesn't work.

type Result<T, E = ()> = std::result::Result<T, E>;

// doesn't build
//use Result::*;

enum XResult<T, E = ()> {
    XOk(T),
    XErr(E),
}

// works as expected
use XResult::*;

Edit: Although you can't use XOk(()); either as it doesn't know the E type concretely; thus no difference or loss of features. Although it would be nice if it defaulted the type and adopted the concrete types when made concrete later.

Ericson2314 · 2018-11-28T16:53:48Z

To not break compat, we have to default those method type vars taken from the inherent impl. This isn't possible. We don't have the same problem with variant constructurs since HashMap has none exposed, but if it did that would be the same problem.

Amanieu · 2018-11-28T17:12:58Z

I would recommend anyone wanting to use HashMap on no_std to just use the hashbrown crate, which only depends on liballoc.

Ericson2314 · 2018-11-28T23:02:33Z

That's a good idea if you don't need to worry about other libraries consuming a hash map. My RFC rust-lang/rfcs#2492, which will hopefully get more consideration after the Rust 2018 crunch, would get us a way to defer the definition of the default so the main one can be moved. (And I'd hope the general thrust of the portability push would allows @Amanieu's alternative hashmaps and locks to be substituted in custom-built standard libraries easily so that they can be used more widely.)

cbeck88 · 2019-01-23T01:49:39Z

I hope that I'm not out-of-line in contributing to this conversation given that
I'm not part of the working group, or an active rust contributor.

Let me just suggest that it's helpful to the users if we can avoid letting perfect
be the enemy of good.

Even if lang_items are ugly and the proliferation of lang_items
is "a bad thing", it would be have been nice if 6+ months ago when this issue was
first raised (WG issue 11 Mar 27, 2018 #11,
PR June 27, 2018 rust-lang/rust#51846)
the developers could have agreed that e.g. a #[core_entropy_fcn], #[hashmap_random_keys], whatever,
lang_item was acceptable in the near term, with the expectation that it would be removed when
existential types RFC was eventually (if ever?) merged, so that there is a "long term solution"
and a "quick fix to tide us over, which is self-contained and without negative
consequences for other parts of the system".

From the point of view of my company:

We are trying to use rust to build Intel SGX enclaves.
These are ``trusted computing'' environments, where access to the OS is limited.
There are several teams at other companies that have worked on creating
"enclave frameworks" in rust that provide basic std like functionality,
piercing the enclave in Intel's prescribed way to make specific sets of system calls.
(rust-sgx-sdk: https://github.com/baidu/rust-sgx-sdk, fortanix: https://github.com/fortanix/rust-sgx)
(fortanix is also contributing patching directly to rust stdlib to support an sgx target)
We believe that these frameworks are very complex and somewhat defeat the purpose
of the enclave, which is to isolate the sensitive code from the OS, which is viewed as untrusted.
Additionally, there are papers that recommend that the SGX OCALL's on which these
frameworks are based when they call out across the enclave to the OS, are fundamentally
a source of vulnerabilities and should be avoided.
(https://www.blackhat.com/docs/us-17/thursday/us-17-Swami-SGX-Remote-Attestation-Is-Not-Sufficient-wp.pdf)
We prefer to build our enclave target as #[no_std] for this reason. Our enclave
only does some serialization and cryptography, and nothing else, and the
untrusted layer of our program outside the enclave is what does network and disk I/O.
Not being able to use the std::HashMap in a no_std crate even with alloc
is really bad for us, we need to be able to have efficient key value stores
in the enclave, and we need to be able to use third party serialization libs,
which all want to provide support for HashMap out of the box.
In fact, the dependency of the serde lib on std, solely for HashMap afaict,
is the source of quite a bit of pain for projects lke ours, because
if a target has dependencies some of which select serde/std and some of which
select serde/alloc, nasty build failures ensue where rust says that certain
traits are not implemented (although they plainly are, it's just for the other
version of serde).
To get the exact same hashmap implementation, we can copy-paste the stdlib
Hashmap in tree, much as hashmap_core crate does. And then swap out the
references to ThreadRng with calls to RDRAND intrinsics on x86.
But this is a significant maintanence burden, and since all of our code is
security-critical and will have to be audited externally, this means that
the auditors will have to audit the entire 20k+ lines of rust standard hashmap
that we copied in tree.
Using hashbrown crate as mentioned above is only a partial solution because
it too would need to be audited, and it seems to have less test coverage than
the rust standard hashmap (as would be expected).
While the rust-sgx-sdk etc. options can be reconsidered at this point since
they make it easy to use std::HashMap the truth is that these sdk's would
also need to be audited, and they are very large and by their nature circumvent
some of the security of the enclave by attempting to make it easy to talk to
the OS.

The inability to move HashMap out of std has led to fallout for the Rust
ecosystem at large, because it has significantly complicated serde and the use
of serde in no_std crates -- even though serde is merely a struct-field
reflection / serialization framework, consists entirely of traits / generic code,
and should not need to talk to the OS at all (!)

Intel SGX Enclaves seems at a high level to be the perfect application
for Rust -- almost anything that someone does in an enclave is security sensitive
and therefore stronger memory safety guarantees have an almost compelling appeal.
And Rust's serde is a significantly nicer in-language serialization framework
than what can be done in other systems-languages like C and C++ which usually
require external code-gen tools integrated with the build-system.
Almost the only sensitive part that actually needs to be in the enclave is the
actual cryptography and serialization routines, which seemingly should be trivial
to perform without the OS and in a no_std environment.
But if you can't use rust in practice for this without doing really sketchy stuff, e.g.

can't use open-source third-party serialization libraries without patching them for no_std support,
can't use standard hashmap without bringing in tree / patching it,
have to commit to using a nightly compiler for the foreseeable future

then there's a point at which engineers start to say it would be simpler and
better for the company if we just used C++ instead. In C++ the hashmaps don't use
randomness. It's trivial to replace things like malloc and free by interposing
symbols at link time. And if I want to use anything from the standard library
template headers, I can just include those headers, and as long as I tell the
linker not to link to libstdc++, pthreads, etc. we won't get any of the
OS-specific stuff.

Anyways -- I hope I've made my point without being too annoying. I want to thank
you all for your hard work. And I want to let you
know that we are keenly watching the resolution of this issue and the
RFC (rust-lang/rfcs#2492)!

Ericson2314 · 2019-01-23T04:44:35Z

@cbeck88 the biggest delay is not due to the scope of that RFC but due to the limit bandwidth of the core team. Even some sort of lang item fix would have been a long process (RFC, PR, etc...).

I'm totally with you that pure things need to work, and big "fameworks" that drag everything do contradict the ideal of fine grained composition and use-only-what-you-need.

Hopefully now that 2018 is out, we can get things moving with the RFC. Please comment sharing your usecase there!

jethrogb · 2019-01-23T06:45:22Z

@cbeck88

I hope that I'm not out-of-line in contributing to this conversation given that I'm not part of the working group, or an active rust contributor.

Not at all, all comments are welcome!

Let me just suggest that it's helpful to the users if we can avoid letting perfect be the enemy of good.

In my experience Rust generally favors the status quo in favor of ad-hoc solutions if a general solution is in sight.

it would be have been nice if ... the developers could have agreed that e.g. a #[core_entropy_fcn], #[hashmap_random_keys], whatever, lang_item was acceptable in the near term, with the expectation that it would be removed when existential types RFC was eventually (if ever?) merged, so that there is a "long term solution" and a "quick fix to tide us over, which is self-contained and without negative consequences for other parts of the system".

It would've been nice too if we had const generics already. The Rust development process is community-driven and as @Ericson2314 mentions the priorities are set by the core team.

This issue wasn't first raised 10 months ago (it's just that the Portability WG was formed around that time) but rather many years ago by myself and other people facing the exact same problems you described. I first released the core_collections crate almost 3 years ago for the exact purpose you want it for now. I don't think the arguments in support of this feature have changed much and I also don't think your usecase is a new one that hasn't been considered.

So, the only reason no progress has been made is because of prioritization. Please add your voice to the demand for this feature and hopefully something can be done in the future!

(people who don't care about SGX can stop reading here)

I'm the author of the Fortanix Rust EDP and the x86_64-fortanix-unknown-sgx target. I think it perfectly fits your usecase and you should use it. I'll respond to some of your points here, but the discussion regarding Rust & SGX usage is entirely off topic for this issue I ask it be continued elsewhere. I'd be happy to talk about your specific security concerns, and if possible make changes to our platform address them. We can setup a call, we can talk on the #rust-sgx slack channel, or communicate in whatever other forum you suggest.

Intel SGX Enclaves seems at a high level to be the perfect application for Rust -- almost anything that someone does in an enclave is security sensitive and therefore stronger memory safety guarantees have an almost compelling appeal.

I agree! That's why I've been working on accomplishing exactly this for over 3 years and have started a company that has made this a reality. For a long time, Fortanix was the only company running Rust & SGX in production. Our considerable experience actually writing and using enclaves has been invaluable in designing the Fortanix Rust EDP and the x86_64-fortanix-unknown-sgx target. I hope you can learn from our experience instead of having to re-encounter all the problems yourself by building from scratch.

piercing the enclave in Intel's prescribed way to make specific sets of system calls [...] SGX OCALL's on which these frameworks are based

I'm not sure what you mean by this exactly, but the Fortanix EDP isn't based on the Intel SGX SDK.

Additionally, there are papers that recommend that the SGX OCALL's [...], are fundamentally a source of vulnerabilities and should be avoided.

I think you are reading too much into the Swami paper. What he describes (but doesn't cite) are known as Iago attacks and yes careful interface design is needed to avoid such attacks. However, there are even more damning attacks on the userspace-enclave interface, such as https://arxiv.org/abs/1710.09061 .

Because getting this interface right is so important (it's the most direct attack surface an attacker can use!), I highly recommend that you do not design it from scratch, lest you run into the same pitfalls as everyone else before you.

The x86_64-fortanix-unknown-sgx target uses a minimal interface that has been carefully reviewed (by us, academia, 3rd party security auditors) and refined over time, and that we believe is secure. The implementation uses Rust language features specifically designed to prevent misuse of the interface.

We believe that these frameworks are very complex and somewhat defeat the purpose of the enclave, which is to isolate the sensitive code from the OS, which is viewed as untrusted.

[...]

by their nature circumvent some of the security of the enclave by attempting to make it easy to talk to the OS.

I don't think this is a fair characterization. It sounds like your enclave will largely consist of the memory allocator, the alloc crate which you apparently intend to use and serde. The complexity of those parts is far greater than anything you'd be pulling in alongside it from std. The “talking to the OS” part of std is rather minimal, especially on x86_64-fortanix-unknown-sgx. Besides, if you don't call TcpStream::new you will not have to add that part of the “framework” to your complexity budget. Your auditors should be able to confirm that there is no use of e.g. std::io/std::net.

we can copy-paste the stdlib Hashmap in tree [...] But this is a significant maintanence burden, and since all of our code is security-critical and will have to be audited externally, this means that the auditors will have to audit the entire 20k+ lines of rust standard hashmap that we copied in tree.

I don't understand this argument. Either you need to audit the HashMap implementation, or you don't, but whether you pull it from std or alloc or some fork of the source code shouldn't matter for a security audit.

have to commit to using a nightly compiler for the foreseeable future

x86_64-fortanix-unknown-sgx is on track to land in 1.33.0 stable.

if we just used C++ instead

It's not clear to me if you intend to use the the Intel SGX SDK here or not, but the “trusted” parts of that cloc in at about 60000 CLOC of unsafe code! The x86_64-fortanix-unknown-sgx-specific parts of std are only 3800 CLOC.

In C++ the hashmaps don't use randomness.

Depending on your usecase, this could in and of itself be a security vulnerability!

cbeck88 · 2019-01-26T19:30:03Z

Hi Jethro,

Thanks very much for your comments. We are still discussing internally several
of the things that you mentioned.

It is great to hear that the fortanix sgx target is on track to hit stable in
1.33.0!

XVilka · 2020-08-26T05:02:27Z

We recently met the same problem in petrgaph, it makes graph code that uses Hash* looking a bit messy with all these feature switches. For now indexmap and hashbrown are helpful, but it would be amazing to get fully-features Hash* available for everyone.

joshlf · 2021-02-06T00:07:17Z

Fuchsia needs this in order to make Netstack3 no_std.

Amanieu · 2021-02-06T01:19:33Z

I'll just repeat my comment from earlier in this thread:

I would recommend anyone wanting to use HashMap on no_std to just use the hashbrown crate, which only depends on liballoc.

The standard library HashMap is a wrapper around hashbrown::HashMap, so you're not missing out on any features.

joshlf · 2021-02-06T02:57:15Z

Ah I missed your comment there, thanks!

Stargateur · 2021-11-10T18:38:35Z

I'll just repeat my comment from earlier in this thread:

I would recommend anyone wanting to use HashMap on no_std to just use the hashbrown crate, which only depends on liballoc.

The standard library HashMap is a wrapper around hashbrown::HashMap, so you're not missing out on any features.

But that only useful for application, If I use this in a library, people can't put the HashMap of std in place of the HashBrown HashMap.

jethrogb added the std label Mar 7, 2018

jethrogb mentioned this issue Nov 26, 2018

Move HashMap to alloc::collections rust-lang/rust#56192

Closed

WildCryptoFox mentioned this issue Nov 29, 2018

No std (extends 2018 edition) orlp/slotmap#18

Closed

josephlr mentioned this issue Jun 26, 2019

Use getrandom crate rust-lang/rust#62082

Closed

alexpovel mentioned this issue Aug 23, 2023

Make crate no_std-compatible alexpovel/decompound#3

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

HashMap should be more widely usable #11

HashMap should be more widely usable #11

Ericson2314 commented Mar 7, 2018 •

edited

jethrogb commented Mar 7, 2018

Ericson2314 commented Mar 7, 2018 •

edited

jethrogb commented Mar 7, 2018

Ericson2314 commented Mar 9, 2018

WildCryptoFox commented Nov 26, 2018

Ericson2314 commented Nov 27, 2018

WildCryptoFox commented Nov 28, 2018

Ericson2314 commented Nov 28, 2018 •

edited

WildCryptoFox commented Nov 28, 2018

WildCryptoFox commented Nov 28, 2018 •

edited

Ericson2314 commented Nov 28, 2018

Amanieu commented Nov 28, 2018

Ericson2314 commented Nov 28, 2018 •

edited

cbeck88 commented Jan 23, 2019 •

edited

Ericson2314 commented Jan 23, 2019

jethrogb commented Jan 23, 2019

cbeck88 commented Jan 26, 2019

XVilka commented Aug 26, 2020

joshlf commented Feb 6, 2021

Amanieu commented Feb 6, 2021

joshlf commented Feb 6, 2021

Stargateur commented Nov 10, 2021

HashMap should be more widely usable #11

HashMap should be more widely usable #11

Comments

Ericson2314 commented Mar 7, 2018 • edited

jethrogb commented Mar 7, 2018

Ericson2314 commented Mar 7, 2018 • edited

jethrogb commented Mar 7, 2018

Ericson2314 commented Mar 9, 2018

WildCryptoFox commented Nov 26, 2018

Ericson2314 commented Nov 27, 2018

WildCryptoFox commented Nov 28, 2018

Ericson2314 commented Nov 28, 2018 • edited

WildCryptoFox commented Nov 28, 2018

WildCryptoFox commented Nov 28, 2018 • edited

Ericson2314 commented Nov 28, 2018

Amanieu commented Nov 28, 2018

Ericson2314 commented Nov 28, 2018 • edited

cbeck88 commented Jan 23, 2019 • edited

Ericson2314 commented Jan 23, 2019

jethrogb commented Jan 23, 2019

cbeck88 commented Jan 26, 2019

XVilka commented Aug 26, 2020

joshlf commented Feb 6, 2021

Amanieu commented Feb 6, 2021

joshlf commented Feb 6, 2021

Stargateur commented Nov 10, 2021

Ericson2314 commented Mar 7, 2018 •

edited

Ericson2314 commented Mar 7, 2018 •

edited

Ericson2314 commented Nov 28, 2018 •

edited

WildCryptoFox commented Nov 28, 2018 •

edited

Ericson2314 commented Nov 28, 2018 •

edited

cbeck88 commented Jan 23, 2019 •

edited