Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is rustc guaranteed to reject code which "turns a pointer into raw bytes" in const code? #401

Open
Tracked by #170
joshlf opened this issue May 20, 2023 · 23 comments

Comments

@joshlf
Copy link

joshlf commented May 20, 2023

In google/zerocopy#171 (comment), we're adding the ability to safely convert between byte slices/arrays and raw pointers. Currently, we have a test that is successfully failing if we try to perform this operation in a const context. What I want to know is: Is it guaranteed that code that performs any such conversion in a const context will always be rejected, or is this a best-effort analysis? If it's the former, then we can soundly emit the trait impls proposed in that PR, but if not, those impls would be unsound.

@Lokathor
Copy link
Contributor

I have been cautioned before about treating pointers as "just funny integers", because actual integers don't carry provenance. I have been told that the most likely path for Rust will eventually involve any plain transmutation of integer data to pointer data will end up producing an invalid pointer.

I don't want to overstate things, I don't think this is a settled matter, but I suspect these impls would be ill advised to add right now, even if they're only used at runtime.

@thomcc
Copy link
Member

thomcc commented May 20, 2023

I think you've misinterpreted the question (or at least I don't see the connection). They're asking if they can rely on a const-eval failure here, which I believe the answer from the const_ub RFC is no, that we don't guarantee const evaluation to fail for UB. It's possible that this case could be strengthened though, but it might require lang decision.

@digama0
Copy link

digama0 commented May 20, 2023

The docs on these types didn't make it clear whether they are expected to be round-tripping, maybe it was assumed that this is the case? As far as I know, both FromBytes and AsBytes are (separately) legal for pointers, in the sense that you can safely convert a pointer to a byte sequence (which I guess means [u8; N]?) and you can safely convert any byte sequence to a pointer value. However, the roundtrip is not lossless, and the resulting pointers cannot be dereferenced unless you have previously exposed the affected pointers beforehand.

@thomcc I think this is relevant not for the question that was asked but the claim that "If it's the former, then we can soundly emit the trait impls proposed in that PR, but if not, those impls would be unsound", because I'm not sure whether it would be sound even in the former case. Or at least, the impls are sound but foot-gunny, since most code that would use the impls is not sound (although one would have to write additional downstream unsafe to use the pointers).

@joshlf
Copy link
Author

joshlf commented May 20, 2023

Yeah so my question is roughly: "As we all know, it is sound to convert [u8; N] to a raw pointer and vice-versa at runtime. We also know that it is unsound to do either at const eval time, and that sometimes, if you try doing such a conversion at const eval time, the compiler will catch it and emit an error. Is it guaranteed that the compiler will always catch such a conversion?"

IIUC, @Lokathor is questioning the premise of the question and pointing out that it may not even be sound to do a pointer-to-bytes or bytes-to-pointer conversion at runtime, let alone at const eval time.

IIUC, @thomcc is saying that there is no guarantee that the compiler will catch this mistake at const eval time.

IIUC, @digama0 is saying that while it might be sound to do a pointer-to bytes or bytes-to-pointer conversion at runtime in the general case, it would only be sound to actually dereference such a pointer under certain cases. As a result, even if the trait impls I'm proposing are technically sound, they likely constitute a footgun.

Do I understand everyone's points correctly?

@digama0
Copy link

digama0 commented May 20, 2023

At const eval time, I don't think there is any danger of immediate UB, although there is certainly the danger of a compile failure. The compiler is allowed to evaluate ptr-to-int conversions, it just can't in most cases. In particular, if the pointer is a ptr::invalid(N), you may well be able to evaluate a ptr-to-int conversion to get N back.

@joshlf
Copy link
Author

joshlf commented May 20, 2023

At const eval time, I don't think there is any danger of immediate UB, although there is certainly the danger of a compile failure. The compiler is allowed to evaluate ptr-to-int conversions, it just can't in most cases. In particular, if the pointer is a ptr::invalid(N), you may well be able to evaluate a ptr-to-int conversion to get N back.

Okay so if I'm reading this correctly, then we could be in one of two possible situations. Is that right?

Situation 1: Any pointer-to-int or int-to-pointer conversion is valid (although of course dereferencing such a pointer might be invalid). If this is the case, then any such conversion at const eval time is either unsupported (const eval will fail) or is a sound operation at const eval time. There's no possibility to introduce UB at const eval time just by enabling such a conversion.

Situation 2: Some pointer-to-int or int-to-pointer conversions are unsound/instant UB. If this is the case, then performing such a conversion is UB at both runtime and const eval time.

My take-away from this is that either we're in Situation 1, and zerocopy can support these conversions without worrying about it having different soundness at runtime vs const eval time, or we're in Sitaution 2, and it would be unsound for zerocopy to support these conversions at all, even at runtime.

@digama0
Copy link

digama0 commented May 20, 2023

AFAIK we have not gone through the relevant stabilization process to determine what the situation is, but I would put a good probability on situation 1. I don't know any in flight proposals that would require making ptr-to-int or int-to-ptr conversions immediate UB (except for some early versions of the strict provenance model).

@saethlin
Copy link
Member

Unless I have misunderstood the motivating context of this question, I think this has become sidetracked into a question about UB and the validity or semantics of pointer-integer conversions. I don't see any question about UB in the original question, and all I see in motivation is a concern about people using these APIs to I guess smuggle a pointer through a usize. I don't understand why specifically const eval detecting this is a great help... But anyway.

I am not sure this is a Rust semantics question or a question about UB detection in const eval.

All that being said, I also do not think that we guarantee this const evaluation error. I would certainly not rely on it for soundness. But in the current interpreter implementation, I would not classify this detection as "best-effort". For sure the UB detection is best-effort, but this is not the same.

@saethlin
Copy link
Member

saethlin commented May 20, 2023

Also cc @oli-obk (since this is an interpreter question and by asking a question on this repo you have already summoned the rest of the Miri team)

@joshlf
Copy link
Author

joshlf commented May 20, 2023

I think that the implicit (and maybe wrong) assumption in my original question has to do with this error that was encountered while testing the PR:

error[E0080]: evaluation of constant value failed
  --> tests/ui/transmute-illegal.rs:10:1
   |
10 | const POINTER_VALUE: usize = zerocopy::transmute!(&0usize as *const usize);
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^ unable to turn pointer into raw bytes
   |
   = help: this code performed an operation that depends on the underlying bytes representing a pointer
   = help: the absolute address of a pointer is not known at compile-time, so such operations are not supported

I assumed that this error represented the compiler complaining about UB, and so if the compiler were to fail to generate an error for this case or a similar one, it would mean UB slipping by. From the discussion, however, it seems like maybe the error doesn't represent UB, but just represents the compiler saying "hey, I can't do that at const eval time", and if the error were to go away, it would just mean that the compiler can now do a new thing, so all is fine.

If it's the former case, then it would be unsound for zerocopy to produce an API (namely the FromBytes and AsBytes traits) that allow users to safely perform an operation that could introduce UB during const eval. If it's the latter case, then there's no issue.

@saethlin
Copy link
Member

Yeah, thanks for substantiating my guess.

At const eval time, a pointer is not an address. It is an allocation ID and an offset into the allocation. Allocations don't have a base address, memory is just a HashMap<AllocId, Allocation>. I think the implementation currently insists on not giving allocations a base address (as opposed to Miri which bolts on extra runtime state to invent base addresses for allocations) in order to prevent implementation details of how the addresses would be selected from leaking into the const value. But the other interpreter experts would have to check me on that.

@bjorn3
Copy link
Member

bjorn3 commented May 20, 2023

For const eval we can't expose any addresses as those won't be known until the dynamic linker has loaded the program. Instead we have to keep pointers symbolic values whose real value gets filled in by the dynamic linker through relocations. These relocations can only represent a very limited set of values, so we currently don't allow any operations on the address of pointers other than pointer offsets to ensure they stay representable.

@oli-obk
Copy link

oli-obk commented May 21, 2023

There could be a theoretical const evaluator implementation that preserves provenance in ptr2int casts, and thus makes your program compile and allow round tripping with int2ptr at compile and run-time.

If (and that's a very unlikely if) we get that, then yes, it will be sound to use. So either you get this error due to an architectural limitation of the current implementation, or your code suddenly starts compiling correctly after a rustc update.

I would recommend not making your conversion function const. I'm assuming you can't use https://doc.rust-lang.org/std/primitive.pointer.html#method.expose_addr anywhere due to the zero copy nature of the thing? Could you use MaybeUninit<u8> or a similar union of your own definition instead of u8?

@RalfJung
Copy link
Member

RalfJung commented May 21, 2023

As we all know, it is sound to convert [u8; N] to a raw pointer and vice-versa at runtime.

This is not true for some notions of "sound" and depending on what exactly the conversion does. So the reason the discussion got side-tracked is that it uncovered some flawed underlying assumptions. Note that Miri does not currently correctly detect the flaw in your reasoning here, that is issue rust-lang/miri#2182. That issue also contains examples of code that performs such conversions and causes UB:

use std::mem;

fn main() {
    let ptrs = [&42];
    let ints: [usize; 1] = unsafe { mem::transmute(ptrs) };
    let ptr = ints.as_ptr().cast::<&i32>();
    let _val = unsafe { *ptr.read() }; //~ERROR
}

This uses [usize; 1] instead of [u8; 8] but that makes no difference. I assume you assume this code to be UB-free, but that is not currently guaranteed and in fact it is unlikely to be UB-free (see #286).

If you replace u8 by MaybeUninit<u8>, your assumption does become true.


Regarding your original question: I don't think we are ready to make any guarantees regarding const-eval definitely triggering an error in some situation.

@joshlf
Copy link
Author

joshlf commented May 21, 2023

It might help for me to give a bit more context.

The traits in question are FromBytes and AsBytes, which are unsafe marker traits. T: FromBytes indicates that it is sound to transmute any [u8; size_of::<T>()] into T. T: AsBytes indicates that it is sound to transmute any T into [u8; size_of::<T>()]. (There's more than that, but that's the core of it.)

We also provide the transmute! macro. It's just syntactic sugar around a) ensuring that T: AsBytes and U: FromBytes and then, b) calling core::mem::transmute::<T, U>() on its argument. transmute! can be used in a const context. Note that transmute! is not special - given FromBytes and AsBytes and the promises that those traits make about type layout, anyone could write their own transmute!.

The PR that started this discussion simply implements FromBytes and AsBytes for *const T and *mut T (where T: Sized). That PR caused an existing test to fail. The test in question attempts to transmute a raw pointer in a const context. Here's the error:

error[E0080]: evaluation of constant value failed
  --> tests/ui/transmute-illegal.rs:10:1
   |
10 | const POINTER_VALUE: usize = zerocopy::transmute!(&0usize as *const usize);
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^ unable to turn pointer into raw bytes
   |
   = help: this code performed an operation that depends on the underlying bytes representing a pointer
   = help: the absolute address of a pointer is not known at compile-time, so such operations are not supported

My question is: Is it sound for me to implement those traits for *const T and *mut T?

The reason I asked about const eval in particular is that I was worried that, even though the conversions may be sound at runtime, maybe they aren't sound at const eval time? The consensus as I understand it in this discussion is that the error I got wasn't telling me that the conversion was UB at const eval time, but rather just unsupported, and that if it becomes supported at some point in the future, it will not be UB.

However, the discussion also seems to call into question my assumption that the conversions in question are sound even at runtime. Note that I do not support dereferencing raw pointers. IIUC, the trait impls are only unsound if creating invalid raw pointers is in and of itself UB even if those pointers are never dereferenced.

So my new question is: Am I guaranteed that converting between raw pointers and byte arrays/usizes/etc is sound, or is it possible that the conversion itself (not the dereference operation, just the act of producing a pointer value) might be considered UB at some point in the future?

@Lokathor
Copy link
Contributor

So the question you need to answer is if people expect to be able to turn a value into bytes and then back "losslessly", and use that new value exactly like it's the old value.

If people expect this then you shouldn't make the impls for pointers.

The conversion itself is fine, but it is not lossless: the output is always an invalid pointer. Turning a valid pointer into an array of u8 and back into a pointer will always give back an invalid pointer. Even if the pointer's address is correct, the provenance is lost so the output pointer is invalid.

@joshlf
Copy link
Author

joshlf commented May 21, 2023

The conversion itself is fine

Yeah, this is my new question: I know this is roughly true today (I say roughly because it's technically not well-defined, rust doesn't have a memory model, etc etc), but am I safe to assume that this will always be true, or is there a reasonable chance that some future clarification of the memory model would make such a conversion instant UB?

@Lokathor
Copy link
Contributor

I think you're safe from instant ub even into the future, yes.

Still a footgun you probably don't want to add though.

@digama0
Copy link

digama0 commented May 22, 2023

am I safe to assume that this will always be true

This is, unfortunately, exactly the question that stabilization is meant to answer. So until stabilization we really can't say for sure that this is a safe assumption now and in the future. So far it's looking like it will be true, but I would rather you not read more into this than a weather prediction.

@CAD97
Copy link

CAD97 commented May 22, 2023

One extra wrinkle to note is that the transmute! macro doesn't (or at least doesn't have to) do transmute::<[u8; N], Dst>(transmute::<Src, [u8; N]>(src)), it does (or at least can do) just transmute::<Src, Dst>(src). This means that zerocopy::transmute! has the same semantics as std::mem::transmute (because it's just a call to the function) but safe because it checks (a sufficient condition for) the safety prerequisite of the function.

Thus, transmuting pointers instead of doing an as pointer cast doesn't lose provenance, because the trip through raw bytes is logical only and doesn't actually happen (or rather, the raw byte intermediary is exactly the raw abstract bytes that the memory is, and not a byte-typed copy, and even if it were, it could be in the domain of MU<u8> instead of u8).

So, as stated, it comes down to a question of managing expectations. transmuteing a pointer to an integer and back dropping provenance and making an invalid-to-dereference pointer is a footgun1 but it's not strictly unsound, so it's a question just of how strict you want to be with unsafe.

The important part to me that makes me say no is that any safe wrapper around pointers would obviously be unsound to implement FromBytes for. So that makes it like Send/Sync for me; raw pointers are technically thread-safe on their own, and it's only usage of them which isn't, but the pointer types are not Send/Sync because the typical usage shouldn't be, and can easily say that it actually should be rather than having to remember to opt out for soundness2. FromBytes is a bit different due to having a derive rather than being automatic, but I'd still apply the same logic that deriving an implementation is safe and thus ideally shouldn't rely on the lack of unsafe elsewhere to be sound3, and therefore only provide AsBytes for raw pointers, if anything4.

But technically speaking, it's indeed not guaranteed yet that transmuting from pointer to integer is actually a sound operation, and earlier provenance models made doing so UB. It's unlikely that it will be UB given current knowledge, but still possible. Weather forecast for next week rather than next month.

Footnotes

  1. To the point it's been mentioned offhand that by-value transmute could be special-cased to see pointers in the input type and expose their addresses, and pointers in the output type and make them from exposed addresses (instead of just being invalid). Indirect transmutation (pointer casting) necessarily can't really do this (at least not without undoing the benefit of integers not carrying provenance), though, so doing it in transmute only papers over the problem (and makes it easier to not realize it's an issue) rather than actually addressing it.

  2. And the lack of a negative Send/Sync impl is both one of the worst and most subtle issues possible and one of the few soundness bugs which made it to be stable in std (to be fixed shortly thereafter).

  3. Obviously all sound code relies on the lack of unsound unsafe for soundness, because unsafe could do anything. But it's still good practice to do useful encapsulation, and the case where pinning puts unsafe requirements onto the safe Drop trait is pretty universally disliked, up through considering making the trait more magic to allow implementing as fn drop(self: Pin<&mut Self>) instead and perhaps requiring such for not-Unpin types in a future edition. (But any such discussion has only been thought-experiment-tier so far.)

  4. Not providing either is maybe a bit nicer, but one point in favor of implementing just AsBytes without FromBytes is that it gives a clear place to put a comment explaining why FromBytes would be a poor idea to implement.

@joshlf
Copy link
Author

joshlf commented May 22, 2023

Thanks for the thorough writeup, @CAD97! A lot to think about there. There's also a discussion on the zerocopy repo if anyone wants to participate over there and avoid polluting the thread here.

@RalfJung
Copy link
Member

RalfJung commented May 22, 2023

But technically speaking, it's indeed not guaranteed yet that transmuting from pointer to integer is actually a sound operation, and earlier provenance models made doing so UB. It's unlikely that it will be UB given current knowledge, but still possible. Weather forecast for next week rather than next month.

Indeed, I don't think we can commit to this yet.

However, the discussion also seems to call into question my assumption that the conversions in question are sound even at runtime. Note that I do not support dereferencing raw pointers. IIUC, the trait impls are only unsound if creating invalid raw pointers is in and of itself UB even if those pointers are never dereferenced.

This is technically correct. However, I assume you would consider a FromBytes implementation for u8 to be buggy if it always returned 0. (I'm aware these are marker traits but let's just imagine that the transmutation worked but returned 0.) So by that standard, FromBytes for *const T is 'buggy' in the sense that it does not produce the pointer that the user probably expected.

Of course you could be defining these traits to be about the conversion to/from arrays of MaybeUninit<u8>. But in that case, every type implements AsBytes, and FromBytes for u8 would be wrong since the input might be uninit, so that does not seem to be what is happening. Maybe the actual contract you want is that AsBytes means "for valid values every byte is initialized" and FromBytes means "if every input byte is initialized then the input array transmutes to a valid value". We don't have a type that represents "an arbitrary byte with provenance that is definitely initialized", but that is the type you would want here. (You could define this as a newtype around MaybeUninit<u8>.) But then FromBytes for u8 is still questionable since we are not (yet) guaranteeing that transmuting a pointer with provenance to an integer is allowed -- that's exactly the point in the first paragraph I quoted. Under what we can currently guarantee, FromBytes for u8 can only be justified if the array truly has type u8, i.e., does not have provenance.

@RalfJung
Copy link
Member

So I think the original const-eval question is answered and the other discussion hopefully clarified (and we already have issues tracking those provenance transmutation questions).

What remains for me is a question to you: how useful would it be to have these AsBytes/FromBytes for raw pointers, if a roundtrip from pointers to integers to pointers (so not what your transmute! macro does, but e.g. a sequence of two transmutes) is lossy? We have some theoretical/aesthetic reasons to make this roundtrip lossy rather than UB, but it wouldn't hurt to also have a more pragmatic argument. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants