Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking issue for std::num::NonZero* types #49137

Closed
5 of 7 tasks
Centril opened this issue Mar 18, 2018 · 49 comments
Closed
5 of 7 tasks

Tracking issue for std::num::NonZero* types #49137

Centril opened this issue Mar 18, 2018 · 49 comments
Labels
B-RFC-implemented Feature: Approved by a merged RFC and implemented. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this PR / Issue. T-lang Relevant to the language team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@Centril
Copy link
Contributor

Centril commented Mar 18, 2018

This is a tracking issue for the RFC "Add std::num::NonZeroU32 and friends, deprecate core::nonzero" (rust-lang/rfcs#2307).

Steps:

Unresolved questions:

  • Should the signed types be included?
  • Decide on naming -- NonZeroU32 or PositiveU32? Should the U be dropped?
  • Should the memory layout of e.g. Option be a language guarantee?
  • Settle issues wrt. const generics before stabilization.
    -- We should either end up with a generic type or with a clear explanation of why a generic type can't work or doesn't work well.
@Centril Centril added B-RFC-approved Feature: Approved by a merged RFC but not yet implemented. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. labels Mar 18, 2018
@SimonSapin
Copy link
Contributor

Implementation PR: #48265

Documentation: @Centril, do doc-comments in this PR look sufficient to you?


My opinion on unresolved questions:

Signed types (e.g. NonZeroI32): as discussed in the RFC thread, they are included in the RFC and implementation PR because it was easy and because core::nonzero::NonZero<T> supports them too. I see a downside to having them, but I also haven’t had a use case for them. And they are easy to build outside of std in a crate that uses the unsigned types internally and does the conversion in new, new_unchecked, and get. (Unlike e.g. non-zero integers of different sizes if only NonZeroUsize were available.)

Positive* names: The fact that zero is not positive in English is subtle, and it is not shared by some other languages. (For example zero is both positive and negative in French, and the correct translation for “nombre positif” is “non-negative number”.) I don’t think relying on such a subtlety to communicate the core property of these types is a good idea.

Dropping the U in names: this is only possible if we also remove the signed types. But even so, I think that including the full name of the type accepted by new and returned by get is desirable.

Separate types v.s. one generic type: this was discussed at length in the RFC. The former has precedent with std::sync::atomic::Atomic*. The latter brings other unresolved questions that I feel don’t have satisfactory answers: trait bounds on that type parameter, and implementations of relevant traits for composite for types outside of std.

@SimonSapin
Copy link
Contributor

SimonSapin commented Mar 18, 2018

Making the memory layout of Option<num::NonZero*> a language guarantee: I explicitly made the RFC not make this guarantee in order to keep the RFC within the realm of the standard library (and hopefully easier to get accepted). But if @rust-lang/lang feels comfortable making that guarantee, that sounds good to me.

@Centril
Copy link
Contributor Author

Centril commented Mar 18, 2018

@SimonSapin

Documentation: @Centril, do doc-comments in this PR look sufficient to you?

Pretty much, yeah =)


On the unresolved questions:

  1. Maybe we should skip the signed types but add them when someone asks for them with an issue or PR or on IRC (don't need to go to another RFC then...)?

  2. I agree that NonZeroU32 is best.

@Centril Centril added B-RFC-implemented Feature: Approved by a merged RFC and implemented. and removed B-RFC-approved Feature: Approved by a merged RFC but not yet implemented. labels Mar 18, 2018
@SimonSapin
Copy link
Contributor

For what it’s worth the implementation of the 6 signed types costs exactly an additional 111 characters in a macro invocation. They’re not a maintenance burden.

@Centril
Copy link
Contributor Author

Centril commented Mar 18, 2018

For what it’s worth the implementation of the 6 signed types costs exactly an additional 111 characters in a macro invocation. They’re not a maintenance burden.

This is true; but they cost in compile times -- but probably not a noteworthy amount of time.

@clarfonthey
Copy link
Contributor

Honestly, compile time of end-user crates matters a lot more than compile time of libstd and co. because for most users libstd will already be compiled. So, even adding a minute of compile time to libstd isn't super bad.

@scottmcm
Copy link
Member

Sounds like I'm the only one who likes Positive, so let's mark that resolved and stick with NonZeroU32.

Maybe just put the signed ones under a different feature gate? That way we'll find out which ones people use, and there's still nobody "attached" to them, stabilize/remove them separately. (Personally I think 0 is a weird hole in a signed type, and would much rather have nice named types for symmetric integers [not INT_MIN] than ones without zero. [Until we get const generics and can make our own holes.])

@clarfonthey
Copy link
Contributor

I honestly think that Positive8 is a bit ambiguous considering how it could mean only 1 to 127 instead of 255.

NonZeroed8 may be a bit (heh) cleaner as it indicates that the bits aren't all zeroed, rather than the fact that 0 is an excluded value.

frewsxcv added a commit to frewsxcv/rust that referenced this issue Mar 23, 2018
Add 12 num::NonZero* types for primitive integers, deprecate core::nonzero

RFC: rust-lang/rfcs#2307
Tracking issue: ~~rust-lang#27730 rust-lang#49137
Fixes rust-lang#27730
alexcrichton added a commit to alexcrichton/rust that referenced this issue Mar 23, 2018
Add 12 num::NonZero* types for primitive integers, deprecate core::nonzero

RFC: rust-lang/rfcs#2307
Tracking issue: ~~rust-lang#27730 rust-lang#49137
Fixes rust-lang#27730
@SimonSapin SimonSapin changed the title Tracking issue for RFC 2307, "Add std::num::NonZeroU32 and friends, deprecate core::nonzero" Tracking issue for std::num::NonZero* types Mar 23, 2018
@SimonSapin
Copy link
Contributor

The implementation #48265 has landed, it should reach Nightly in a few hours.

@Centril
Copy link
Contributor Author

Centril commented Mar 23, 2018

@SimonSapin how do you feel about @scottmcm's note that we should:

Maybe just put the signed ones under a different feature gate?

@SimonSapin
Copy link
Contributor

How useful is this to do now? (v.s. changing the status later of only a subset)

My thoughts lately on the signed types is that there isn’t really a strong case against them, but there isn’t much of a use case either. Adding that they can be easily be built out of tree on top of the unsigned types, I’d be ok with removing or deprecating them and see if anyone objects. They’re easy to add (back) later if we decide to do so.

@Centril
Copy link
Contributor Author

Centril commented Mar 23, 2018

@SimonSapin

I’d be ok with removing or deprecating them and see if anyone objects. They’re easy to add (back) later if we decide to do so.

👍

SimonSapin added a commit to SimonSapin/serde that referenced this issue Mar 24, 2018
… gated on the `unstable` Cargo feature.

These are new standard library types. Tracking issue:
rust-lang/rust#49137
SimonSapin added a commit to SimonSapin/serde that referenced this issue Mar 24, 2018
… gated on the `unstable` Cargo feature.

These are new standard library types. Tracking issue:
rust-lang/rust#49137
@SimonSapin
Copy link
Contributor

Deprecation of NonZeroI* #49324

Support for NonZeroU* in serde: serde-rs/serde#1191

@Kixunil
Copy link
Contributor

Kixunil commented Aug 25, 2018

Would using an unsafe trait to specify invalid ranges be viable? I think traits are a great way to specify interfaces, so that idea appeals to me at least from this point of view.

Example:

/// Allows for a type to specify invalid bit pattern in order to allow enum layout optimization.
#[some_magic_lang_item_annotation]
unsafe trait InvalidValues: Sized {
    /// How many invalid values can be written into the same memory Self fits in.
    const INVALID_VALUE_COUNT: usize;

    /// Must map discriminant to some invalid bit pattern and write it to `dest`.
    /// The caller must guarantee that `discriminant < Self::INVALID_VALUE_COUNT`.
    /// The caller must also provide a valid pointer. The pointer must point to uninitialized memory.
    const unsafe fn write_invalid_value(dest: NonNull<Self>, discriminant: usize);

    /// This function must inspect the memory pointed to by `value` and return the same discriminant as previously written by `write_invalid_value()`. In case the memory contains valid value, this function must return `Self::INVALID_VALUE_COUNT`.
    const unsafe fn get_discriminant(value: NonNull<Self>) -> usize;
}

It would allow users to specify arbitrary invalid bit patterns. That way we could even have things like:

#[repr(transparent)] // not necessary, but why not?
struct LessThan253(u8);

impl InvalidValues for LessThan254 {
    const INVALID_VALUE_COUNT: usize = 2;

    unsafe fn write_invalid_value(dest: NonNull<Self>, discriminant: usize) {
        mem::write(dest.as_ptr() as *mut u8, discriminant as u8 + 254);
    }

    unsfe fn get_discriminant(value: NonNull<Self>) -> usize {
        let value = mem::read(value.as_ptr() as *mut u8) as usize;
        if value < 254 {
            Self::INVALID_VALUE_COUNT
        } else {
            value - 254
        }
    }
}

fn test() {
    enum Foo {
        Value(LessThan254),
        Bar,
        Baz,
    }

    assert_eq!(mem::size_of<Foo>() == mem::size_of<LessThan254>());
    assert_eq!(mem::size_of<u8>() == mem::size_of<LessThan254>());
}

What do you think?

@eddyb
Copy link
Member

eddyb commented Aug 26, 2018

@Kixunil Using just associated consts could maybe work, but I really really do not want to involve the trait system during layout computation and risk weird inconsistencies because of how people wrote the impls, or have to do complex checking ahead of time, when the impls are first written.

Using methods is a no-no, it means you're getting no LLVM optimization (), no nested reuse (for the same reason Option<Option<(&T, &U)>> is not the same size as Option<(&T, &U)>), miri wouldn't work, and probably a myriad of other things that escape me right now.

IMO the only reliable way to get this is const generics and a wrapper type that removes a values of a certain width (or just full width), at a certain location (or just offset 0), from the type.
The name is (obviously) up for bikeshed, but WithInvalid<u8, 254..=255> is one option.

I think traits are a great way to specify interfaces

Sure. But this is not an interface, it's a fundamental representational property of the data type.
You can totally emulate Option<T> using some chosen value of T for None, in a library, but this is not that - the invalid values the compiler knows about are UB to exist.

In your case, who's stopping me from creating LessThan254(255)? You have to write checks in unsafe code, and manually maintain the invariants, whereas a wrapper type would do all that for you.

@Kixunil
Copy link
Contributor

Kixunil commented Sep 9, 2018

@eddyb Could you give an example of weird inconsistency that might happen? People can already shoot themselves in the foot in unsafe code. I don't see a problem with careful use of unsafe when it's really needed.

Using methods is a no-no, it means you're getting no LLVM optimization

Why? If LLVM can optimize some functions, what's preventing it from optimizing other functions?

miri wouldn't work

Why one const fn works in miri but not other const fn?

the invalid values the compiler knows about are UB to exist.

would using &mut MaybeUninitialized<Self> in the signature help?

In your case, who's stopping me from creating LessThan254(255)

What's preventing me to implement e.g. Mutex incorrectly? Nothing. I have to use the unsafe keyword, though. If someone impls that trait for their type incorrectly, it's the same thing as screwing up other unsafe code. I don't understand, why this unsafe code should be special.

WithInvalid could be implemented in terms of InvalidValues. Additionally, InvalidValues can be used for things like OddNumber, which can't be achieved with simple range.

@eddyb
Copy link
Member

eddyb commented Sep 9, 2018

Could you give an example of weird inconsistency that might happen?

The problem is a trait impl is not "inherent" to a type. We'd have to pile on a bunch of checks, to ensure impls of this trait are "boring" wrt to when they hold. Most importantly, the implementation must not depend on type parameters, nor introduce relationships between lifetime parameters (or just not have generic parameters at all). In a way, this is similar to the restrictions of the Drop trait.

With all of that work, and const generics on the horizon, I prefer an "obviously correct" solution using const generics, instead.

Why? If LLVM can optimize some functions, what's preventing it from optimizing other functions?

I mean we can't tell LLVM anything about value validity, if you have functions doing encoding and decoding. Our existing contiguous "(in)valid ranges" actually lower to LLVM !range metadata.

Why one const fn works in miri but not other const fn?

We don't have const fn in traits yet.

would using &mut MaybeUninitialized<Self> in the signature help?

Sure, but you actually want a custom union between Self and a type with the same primitive field but no validity semantics, because otherwise you can't access the primitive.

Additionally, InvalidValues can be used for things like OddNumber, which can't be achieved with simple range.

I personally have no intention of supporting those kinds of optimizations. they can significantly complicate discriminant decoding (which happens at runtime, unlike encoding), whereas, currently, all of our enums, even when using invalid values, take at most one subtraction and one comparison, to decode the discriminant.

@Kixunil
Copy link
Contributor

Kixunil commented Sep 16, 2018

That makes a lot more sense to me now, thanks for explaining!

@frankmcsherry
Copy link
Contributor

Somewhere else (the PR that deprecated them) one is requested to drop in here and indicate if you use (or would use, I guess) signed NonZero variants. I do / would, lots. The most common use case is in differential dataflow (but it is probably more common) where you record changes to things. You don't ever need to record a zero change, and indeed often you want to use types to communicate the non-zero'd ness.

For example, I'd love to be using a trait

trait Addition {
    /// returns the result of the summation if non-zero.
    fn add(self, other: Self) -> Option<Self>;
}

The important point for me is that I'd like the opportunity for the zero value to not inhabit the types, so that e.g. () could implement GF(2), which is important for zero-overhead implementation of incremental updates to collections with set semantics (vs tracking a 0/1 multiplicity as an associated integer).

Without non-zero signed integers, the above Option<isize> ends up being a bit bigger and probably a bit more complicated logically (vs simply addition with the zero value interpreted as None).

Anyhow, I can work around forever, obvs, but I thought you all might want to know! :D

@eddyb
Copy link
Member

eddyb commented Nov 25, 2018

@frankmcsherry FWIW this should be sufficient:

#[derive(Copy, Clone)]
pub struct NonZeroI32(NonZeroU32);

impl NonZeroI32 {
    pub fn new(x: i32) -> Option<Self> {
        NonZeroU32::new(x as u32).map(NonZeroI32)
    }
    pub fn get(self) -> i32 {
        self.0.get() as i32
    }
}

@andrewhickman
Copy link
Contributor

There is also a crate implementing signed NonZero type

@frankmcsherry
Copy link
Contributor

Yup, thank you! I'm def happy working around, but equally happy to let you folks know about use cases (and if other people have them, put something like this next to NonZeroU32 in the stdlib).

@SimonSapin
Copy link
Contributor

Given the existence of a work-around that preserves the enum layout optimization, this doesn’t seem very pressing. On the other hand, adding them is very easy, and not really any additional maintenance burden. So I’m really neutral on this one. Consider nagging some other member of the libs team :)

@glaebhoerl
Copy link
Contributor

This is probably more amusing than significant, but another real-world use case of signed-numbers-without-zero is years.

@yoshuawuyts
Copy link
Member

Another use case for signed non-zero numbers: I'm currently working with a C library where the number 0 is significant. Representing this as an enum would probably work well I reckon.

libnghttp2_sys priority_spec.stream_id

API Example

An example of how to represent the libnghttp API above:

pub type StreamId = NonZeroI32;

pub enum StreamDependency {
  Independent,
  Dependent(StreamId),
}

@clarfonthey
Copy link
Contributor

For now, can't signed nonzero types just be implemented as a separate crate? Then people can judge inclusion into libstd based upon how many people use it.

@SimonSapin
Copy link
Contributor

Yes, https://crates.io/crates/nonzero_signed was mentioned upthread.

@clarfonthey
Copy link
Contributor

...oh. In that case, perhaps this issue could be locked from future comments on that and a larger note could be made to mention that crate?

@SimonSapin
Copy link
Contributor

#57475 Add signed num::NonZeroI* types

SimonSapin added a commit to SimonSapin/rust that referenced this issue Jan 17, 2019
Multiple people have asked for them, in
rust-lang#49137.
Given that the unsigned ones already exist,
they are very easy to add and not an additional maintenance burden.
Centril added a commit to Centril/rust that referenced this issue Jan 18, 2019
Add signed num::NonZeroI* types

Multiple people have asked for them in rust-lang#49137. Given that the unsigned ones already exist, they are very easy to add and not an additional maintenance burden.
bors added a commit that referenced this issue Jan 22, 2019
Add signed num::NonZeroI* types

Multiple people have asked for them in #49137. Given that the unsigned ones already exist, they are very easy to add and not an additional maintenance burden.
VardhanThigle pushed a commit to jethrogb/rust that referenced this issue Jan 31, 2019
Multiple people have asked for them, in
rust-lang#49137.
Given that the unsigned ones already exist,
they are very easy to add and not an additional maintenance burden.
@elichai
Copy link
Contributor

elichai commented Sep 26, 2019

Will this ever be available to users?
i.e. https://play.rust-lang.org/?gist=5c8f2f2438da9952963f88449bf0239e

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
B-RFC-implemented Feature: Approved by a merged RFC and implemented. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this PR / Issue. T-lang Relevant to the language team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests