Should `format!("{:p}", ptr)` leak provenance? #322

CAD97 · 2022-03-20T09:46:58Z

The case for it working:

I print the address. I then parse the formatted text, and convert the parsed integer into a pointer. I then dereference that constructed pointer. The constructed pointer is "derived from" the original pointer, so it should be valid.
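A minimal sketch of the first half of that round trip, assuming the usual `0x`-prefixed hexadecimal output of `{:p}`. The helper names `parse_pointer_text` and `round_trip_address` are made up for illustration. The sketch deliberately stops at the integer: whether turning that integer back into a pointer and dereferencing it is allowed is exactly the question here.

```rust
/// Hypothetical helper: parse text produced by `{:p}` back into an address.
/// Assumes the text is "0x" followed by hexadecimal digits.
fn parse_pointer_text(text: &str) -> Option<usize> {
    let hex = text.strip_prefix("0x")?;
    usize::from_str_radix(hex, 16).ok()
}

/// Hypothetical helper: format a pointer, then recover its address from the text.
fn round_trip_address<T>(ptr: *const T) -> Option<usize> {
    let text = format!("{:p}", ptr);
    parse_pointer_text(&text)
}
```

The recovered `usize` carries only the address; anything that casts it back to a pointer is relying on the formatting step having behaved like a ptrtoint.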

The case for it not working:

Wow that's annoying. Plus, it'd be very nice to be able to implement e.g. tagged pointers in the obvious way, using future provenance-aware APIs, such as

unsafe fn tag<T: ?Sized>(ptr: *mut T, tag: u8) -> *mut T {
    let addr: usize = ptr.addr();
    debug_assert_eq!(addr & (tag as usize), 0);
    ptr.with_addr(addr | (tag as usize))
}

Once we have provenance-aware APIs, it makes sense to allow manipulation of addresses without leaking provenance, as the developer has indicated that they intend to maintain the pointer provenance, not try to reconstruct it from just the address.

But what about all of the ptr as usize in the standard library today? How much of it can avoid the ptrtoint, how much can potentially be ptr.addr() in the future, and how much will need to be (the moral equivalent of a) ptr::leak to maintain compatibility with today's semantics?

Printing the address of a pointer is the obvious example of a ptrtoint which does not necessarily lead to an inttoptr. I don't actually know whether the standard library has any other obscure ways to effectively get a ptrtoint without writing ptr as usize.


RalfJung · 2022-03-23T14:36:47Z

"leaking provenance" means wildly different things in different context to different people, so I have no idea what you are even asking, or what the "it" is that might be working or not. ;)

CAD97 · 2022-03-23T15:44:40Z

I mean in the exact way that ptrtoint behaves.

If I do inttoptr(ptrtoint(_)), that's not considered a noöp, right, because of provenance. The way we model ptrtoint, once it has happened it's fair game for anyone to inttoptr any equivalent int and get back the ptr.

What I'm interested in is whether format!("{:p}", ptr) necessarily uses ptrtoint, or if it could use a (at this point IIUC hypothetical) weaker form which does not make inttoptr valid.

RalfJung · 2022-03-23T16:29:22Z

So far I have been considering two kinds of models:

1. Full ptrtoint support, where a ptrtoint has a "broadcast" side-effect that allows future inttoptr. This is a mess, (a) because compilers like to assume casts don't have side-effects such as a "broadcast", and (b) even with such a "broadcast" there is lots of ambiguity if multiple "broadcasts" happened on the same location.
2. Basically rule out inttoptr casts (a la "WIP PROOF-OF-CONCEPT: experiment with very strict pointer provenance", rust#95199).

I guess you are saying we could have a variant of (1) where two kinds of ptrtoint casts exist -- those that "broadcast" and those that do not. Sure, we could. However that solves none of the things that make (1) a mess so it doesn't really help I think.

CAD97 · 2022-03-23T17:14:52Z

Although I very much like (2) and agree that it would be a better model, all other things being equal, I worry about the fact that ptr as usize as ptr is de facto allowed today, and retroactively saying it produces an unusable pointer even on existing editions might be too breaking.

So what I'm wondering is if we can't have (2) for new code, but (1) for old code, even if old code is necessarily pessimised by such.

The model in Rust land would roughly be (modulo naming)

ptr::leak(ptr) -> usize: broadcast side effect; assume this SB tag always aliased.
ptr::unleak(usize) -> ptr: access to any broadcast provenance; treat as "unknown" provenance.
ptr::addr(ptr) -> usize: get addr without broadcast.
ptr::with_addr(ptr, usize) -> ptr: set addr with given provenance.
ptr as usize, edition old: ptr::leak.
usize as ptr, edition old: ptr::unleak.
ptr as usize, edition new: error or ptr::addr.
usize as ptr, edition new: error.
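A rough sketch of how the four operations above could be spelled in today's Rust, purely to make the shape concrete. All four function names follow the hypothetical naming in the list, not any existing API. Plain `as` casts stand in for the broadcasting operations, and `addr` also has to use an `as` cast here because this sketch has no way to express "no broadcast" itself. `with_addr` is written as an offset from the original pointer, so the result is derived from `ptr` rather than conjured from an integer.

```rust
/// Hypothetical `ptr::leak`: ptrtoint with the "broadcast" side effect.
fn leak<T>(ptr: *mut T) -> usize {
    ptr as usize
}

/// Hypothetical `ptr::unleak`: inttoptr picking up some broadcast provenance.
fn unleak<T>(addr: usize) -> *mut T {
    addr as *mut T
}

/// Hypothetical `ptr::addr`: read only the address.
/// Assumption: the `as` cast is a stand-in; the intended operation would not broadcast.
fn addr<T>(ptr: *mut T) -> usize {
    ptr as usize
}

/// Hypothetical `ptr::with_addr`: keep `ptr`'s provenance, replace its address.
/// Implemented as a wrapping byte offset from `ptr`, so no inttoptr is involved.
fn with_addr<T>(ptr: *mut T, new_addr: usize) -> *mut T {
    let offset = (new_addr as isize).wrapping_sub(ptr as usize as isize);
    (ptr as *mut u8).wrapping_offset(offset) as *mut T
}
```

Under this reading, only `leak` and `unleak` would need edition-dependent treatment; `addr` and `with_addr` never ask the compiler to reconstruct provenance from an integer.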

The remaining question in such a world is whether existing ptrtoint in the stdlib need to continue to have the broadcast side effect for compatibility, or if they can remove it (and say relying on it was UB).

There are definitely complications I haven't considered, but I'm just ((un)reasonably?) scared of retroactively saying any lib using usize as ptr is UB.

And of course, whatever model is chosen is predicated on what backends (i.e. LLVM) have support for; if LLVM doesn't have a way for a ptrtoint equivalent to broadcast, there's effectively no way for Rust to support ptr::leak either.

(Now I guess I should go rewrite ptr-union's alignment tagging to use wrapping_offset rather than usize bitops...)
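For illustration, a minimal sketch of what alignment tagging via wrapping pointer arithmetic could look like. This is not ptr-union's actual code; `tag_ptr` and `untag_ptr` are hypothetical names, and the sketch assumes the tag fits in the low bits left free by `T`'s alignment. The `as usize` cast in `untag_ptr` is only used to read the low bits and is never turned back into a pointer.

```rust
use std::mem::align_of;

/// Hypothetical helper: store `tag` in the low alignment bits of `ptr`.
/// The result is derived from `ptr` by a byte offset, not rebuilt from an integer.
fn tag_ptr<T>(ptr: *mut T, tag: usize) -> *mut T {
    debug_assert!(tag < align_of::<T>(), "tag must fit in the alignment bits");
    debug_assert_eq!(ptr as usize & (align_of::<T>() - 1), 0, "ptr must be aligned");
    (ptr as *mut u8).wrapping_add(tag) as *mut T
}

/// Hypothetical helper: split a tagged pointer into the original pointer and its tag.
fn untag_ptr<T>(ptr: *mut T) -> (*mut T, usize) {
    let mask = align_of::<T>() - 1;
    // Only the address is inspected here; the integer is not cast back to a pointer.
    let tag = ptr as usize & mask;
    ((ptr as *mut u8).wrapping_sub(tag) as *mut T, tag)
}
```

Because both directions go through `wrapping_add` / `wrapping_sub` on the pointer itself, the untagged pointer is still derived from the original and does not depend on any inttoptr.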

RalfJung · 2022-03-23T17:35:27Z

I agree with these concerns, but as long as (1) exists at all, it has all the bad effects I described.

The only cop-out I can imagine is to say "the formal spec doesn't cover code on old editions that still uses the deprecated casts; such code is supported on a best-effort basis". That would actually help the formal efforts.

But anyway that is wildly off-topic from the original question in this thread.^^

CAD97 · 2022-03-23T17:46:57Z

I mean, it's a question of if the OP question is moot. And I think we have reasonable consensus (at least between the two of us, and that's enough for me to close the issue) that

Under (1), the question is moot (ptrtoint always broadcasts)
Under (2), the question is moot (ptrtoint never broadcasts)
Under a mixed (1) and (2) system, support of old edition (1) code is deprecated and best-effort, and thus the stdlib can and should be exclusively (2), never broadcasts, and thus the question is moot.

So to answer the OP question:

No, parsing and then dereferencing the address printed by format!("{:p}", ptr) is UB.

Diggsey · 2022-03-23T23:54:24Z

@CAD97 I'm confused by your conclusion. Surely under (1) parsing and then dereferencing the address is not UB: the pointer was broadcast, because a ptrtoint is presumably needed before the int-to-string conversion.

CAD97 · 2022-03-24T00:07:32Z

Yes, you're correct, it would be defined under (1), I sort of (accidentally) assumed that (1) wasn't the case in the conclusion.

(The question is still moot under that model, though, as there's no way for the answer to be different.)

CAD97 changed the title ~~Should format!("{p:p}") leak provenance?~~ Should format!("{:p}", ptr) leak provenance? Mar 20, 2022

CAD97 closed this as completed Mar 23, 2022
