Should format!("{:p}", ptr) leak provenance? #322

Closed
CAD97 opened this issue Mar 20, 2022 · 8 comments
Comments

@CAD97

CAD97 commented Mar 20, 2022

The case for it working:

I print the address. I then parse the formatted text, and convert the parsed integer into a pointer. I then dereference that constructed pointer. The constructed pointer is "derived from" the original pointer, so it should be valid.
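For concreteness, the round trip described above might look like the following sketch. The helper name `print_and_parse` is made up for illustration, and the sketch assumes `{:p}` prints a `0x`-prefixed hexadecimal address for a thin raw pointer. It deliberately stops before the dereference, since whether that last step is allowed is exactly what this issue asks.

```rust
/// Hypothetical helper: format a pointer with `{:p}`, then parse the
/// printed hex text back into an integer.
fn print_and_parse<T>(ptr: *const T) -> usize {
    let text = format!("{:p}", ptr); // assumed shape: "0x" + hex digits
    let digits = text.trim_start_matches("0x");
    usize::from_str_radix(digits, 16).expect("pointer text should be hexadecimal")
}

fn main() {
    let value = 42_i32;
    let ptr: *const i32 = &value;

    // The parsed integer equals the pointer's address.
    let parsed = print_and_parse(ptr);
    assert_eq!(parsed, ptr as usize);

    // Converting the integer back yields a pointer with the same address.
    // Whether it may be dereferenced is the open question, so this sketch
    // only compares addresses.
    let rebuilt = parsed as *const i32;
    assert_eq!(rebuilt, ptr);
}
```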

The case for it not working:

Wow that's annoying. Plus, it'd be very nice to be able to implement e.g. tagged pointers in the obvious way, using future provenance-aware APIs, such as

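unsafe fn tag<T: ?Sized>(ptr: *mut T, tag: u8) -> *mut T {
    // Read the address without a ptrtoint cast.
    let addr: usize = ptr.addr();
    // The tag bits must be free (zero) in the original address.
    debug_assert_eq!(addr & (tag as usize), 0);
    // Keep the provenance of `ptr`; only the address changes.
    ptr.with_addr(addr | (tag as usize))
}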

Once we have provenance-aware APIs, it makes sense to allow manipulation of addresses without leaking provenance, as the developer has indicated that they intend to maintain the pointer provenance, not try to reconstruct it from just the address.
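A minimal sketch of how such a tagging scheme could be used end to end, assuming a toolchain where the provenance-aware methods `addr` and `map_addr` are available on raw pointers. The names `set_tag` and `split_tag` are hypothetical, and the sketch assumes the pointee type has at least 4-byte alignment so the low two bits of the address are free.

```rust
/// Hypothetical: set tag bits in the low bits of an aligned pointer,
/// keeping the original provenance.
fn set_tag<T>(ptr: *mut T, bits: usize) -> *mut T {
    debug_assert_eq!(ptr.addr() & bits, 0, "tag bits must be free");
    ptr.map_addr(|addr| addr | bits)
}

/// Hypothetical: split a tagged pointer into the untagged pointer and the tag.
fn split_tag<T>(ptr: *mut T, mask: usize) -> (*mut T, usize) {
    let bits = ptr.addr() & mask;
    (ptr.map_addr(|addr| addr & !mask), bits)
}

fn main() {
    let mut value = 7_u32; // 4-byte aligned, so the low two bits are zero
    let raw: *mut u32 = &mut value;

    let tagged = set_tag(raw, 0b01);
    let (clean, bits) = split_tag(tagged, 0b11);
    assert_eq!(bits, 0b01);
    assert_eq!(clean, raw);

    // `clean` was derived from `raw` without any integer-to-pointer cast,
    // so it carries the same provenance and can be used for access.
    unsafe { *clean += 1 };
    assert_eq!(value, 8);
}
```

No `ptr as usize` or `usize as ptr` cast appears anywhere, so nothing here depends on the address being "leaked".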

But what about all of the ptr as usize in the standard library today? How much of it can avoid the ptrtoint, how much can potentially be ptr.addr() in the future, and how much will need to be (the moral equivalent of a) ptr::leak to maintain compatibility with today's semantics?

Printing the address of a pointer is the obvious example of a ptrtoint which does not necessarily lead to an inttoptr. I don't actually know if any other obscure ways to effectively get ptrtoint without writing ptr as usize exist in the standard library.

@CAD97 CAD97 changed the title Should format!("{p:p}") leak provenance? Should format!("{:p}", ptr) leak provenance? Mar 20, 2022
@RalfJung
Member

"leaking provenance" means wildly different things in different contexts to different people, so I have no idea what you are even asking, or what the "it" is that might be working or not. ;)

@CAD97
Author

CAD97 commented Mar 23, 2022

I mean in the exact way that ptrtoint behaves.

If I do inttoptr(ptrtoint(_)), that's not considered a noöp, right, because of provenance. The way we model ptrtoint is that, once it has happened, it's fair game for anyone to inttoptr any equivalent int and get back the ptr.

What I'm interested in is whether format!("{:p}", ptr) necessarily uses ptrtoint, or if it could use a (at this point IIUC hypothetical) weaker form which does not make inttoptr valid.

@RalfJung
Member

RalfJung commented Mar 23, 2022

So far I have been considering two kinds of models:

  1. Full ptrtoint support, where a ptrtoint has a "broadcast" side-effect that allows future inttoptr. This is a mess, (a) because compilers like to assume casts don't have side-effects such as a "broadcast", and (b) even with such a "broadcast" there is lots of ambiguity if multiple "broadcasts" happened on the same location.
  2. Basically rule out inttoptr casts (à la rust#95199, "WIP PROOF-OF-CONCEPT: experiment with very strict pointer provenance").

I guess you are saying we could have a variant of (1) where two kinds of ptrtoint casts exist -- those that "broadcast" and those that do not. Sure, we could. However that solves none of the things that make (1) a mess so it doesn't really help I think.

@CAD97
Author

CAD97 commented Mar 23, 2022

Although I very much like (2) and agree that it would be a better model, all other things being equal, I worry about the fact that ptr as usize as ptr is de facto allowed today, and retroactively saying it produces an unusable pointer even on existing editions might be too breaking.

So what I'm wondering is if we can't have (2) for new code, but (1) for old code, even if old code is necessarily pessimised by such.

The model in Rust land would roughly be (modulo naming)

  • ptr::leak(ptr) -> usize: broadcast side effect; assume this SB tag always aliased.
  • ptr::unleak(usize) -> ptr: access to any broadcast provenance; treat as "unknown" provenance.
  • ptr::addr(ptr) -> usize: get addr without broadcast.
  • ptr::with_addr(ptr, usize) -> ptr: set addr with given provenance.
  • ptr as usize, edition old: ptr::leak.
  • usize as ptr, edition old: ptr::unleak.
  • ptr as usize, edition new: error or ptr::addr.
  • usize as ptr, edition new: error.
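The four operations in the list above could be sketched as thin wrappers like the ones below. The wrapper names `leak` and `unleak` follow the "modulo naming" placeholders from the list and are hypothetical. Mapping them onto `expose_provenance` and `std::ptr::with_exposed_provenance` is an assumption on my part: it is one reading of "broadcast" and "access to any broadcast provenance", not something this thread specifies, and it assumes a toolchain that ships those functions.

```rust
use std::ptr;

/// Hypothetical `ptr::leak`: return the address and "broadcast" the provenance.
fn leak<T>(p: *const T) -> usize {
    p.expose_provenance()
}

/// Hypothetical `ptr::unleak`: build a pointer that may pick up any
/// previously broadcast provenance.
fn unleak<T>(addr: usize) -> *const T {
    ptr::with_exposed_provenance(addr)
}

/// `ptr::addr`: return the address without any broadcast.
fn addr<T>(p: *const T) -> usize {
    p.addr()
}

/// `ptr::with_addr`: new address, provenance taken from `p`.
fn with_addr<T>(p: *const T, a: usize) -> *const T {
    p.with_addr(a)
}

fn main() {
    let x = 5_u32;
    let p: *const u32 = &x;

    // Old-edition style round trip: leak, then unleak.
    let q: *const u32 = unleak(leak(p));
    assert_eq!(unsafe { *q }, 5);

    // New-style round trip: the provenance is carried explicitly by `p`.
    let r = with_addr(p, addr(p));
    assert_eq!(unsafe { *r }, 5);
}
```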

The remaining question in such a world is whether existing ptrtoint in the stdlib need to continue to have the broadcast side effect for compatibility, or if they can remove it (and say relying on it was UB).

There are definitely complications I haven't considered, but I'm just ((un)reasonably?) scared of retroactively saying any lib using usize as ptr is UB.

And of course, whatever model is chosen is predicated on what backends (i.e. LLVM) have support for; if LLVM doesn't have a way for a ptrtoint equivalent to broadcast, there's effectively no way for Rust to support ptr::leak either.

(Now I guess I should go rewrite ptr-union's alignment tagging to use wrapping_offset rather than usize bitops...)
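As a rough illustration of that last idea (not ptr-union's actual code), alignment tagging can be expressed with wrapping byte offsets instead of integer bit operations on a cast-back address. The function names are hypothetical, the pointee is assumed to be at least 4-byte aligned, and the sketch still uses `ptr as usize` to read the tag bits, but never casts an integer back to a pointer.

```rust
/// Hypothetical: add `bits` to the address by offsetting the pointer by
/// whole bytes. On an aligned pointer this sets the low tag bits.
fn set_low_bits<T>(ptr: *mut T, bits: usize) -> *mut T {
    (ptr as *mut u8).wrapping_add(bits) as *mut T
}

/// Hypothetical: remove the tag by offsetting back by however many tag
/// bits are currently set.
fn clear_low_bits<T>(ptr: *mut T, mask: usize) -> *mut T {
    let bits = (ptr as usize) & mask; // the integer is only inspected
    (ptr as *mut u8).wrapping_sub(bits) as *mut T
}

fn main() {
    let mut value = 10_u32; // 4-byte aligned, so the low two bits are zero
    let raw: *mut u32 = &mut value;

    let tagged = set_low_bits(raw, 0b01);
    assert_eq!(tagged as usize, raw as usize + 1);

    let clean = clear_low_bits(tagged, 0b11);
    assert_eq!(clean, raw);

    // `clean` was produced purely by pointer offsets from `raw`.
    unsafe { *clean = 11 };
    assert_eq!(value, 11);
}
```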

@RalfJung
Member

I agree with these concerns, but as long as (1) exists at all, it has all the bad effects I described.

The only cop-out I can imagine is to say "the formal spec doesn't cover code on old editions that still uses the deprecated casts; such code is supported on a best-effort basis". That would actually help the formal efforts.

But anyway that is wildly off-topic from the original question in this thread.^^

@CAD97
Author

CAD97 commented Mar 23, 2022

I mean, it's a question of if the OP question is moot. And I think we have reasonable consensus (at least between the two of us, and that's enough for me to close the issue) that

  • Under (1), the question is moot (ptrtoint always broadcasts)
  • Under (2), the question is moot (ptrtoint never broadcasts)
  • Under a mixed (1) and (2) system, support of old edition (1) code is deprecated and best-effort, and thus the stdlib can and should be exclusively (2), never broadcasts, and thus the question is moot.

So to answer the OP question:

No, parsing and then dereferencing the address printed by format!("{:p}", ptr) is UB.

@CAD97 CAD97 closed this as completed Mar 23, 2022
@Diggsey

Diggsey commented Mar 23, 2022

@CAD97 I'm confused by your conclusion. Surely under (1) parsing and then dereferencing the address is not UB: the pointer was broadcast, since a ptrtoint is presumably needed before the int-to-string conversion.

@CAD97
Author

CAD97 commented Mar 24, 2022

Yes, you're correct, it would be defined under (1); I sort of (accidentally) assumed in the conclusion that (1) wasn't the case.

(The question is still moot under that model, though, as there's no way for the answer to be different.)

3 participants