Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add socketpair shim #3609

Open
wants to merge 45 commits into
base: master
Choose a base branch
from
Open

Add socketpair shim #3609

wants to merge 45 commits into from

Conversation

tiif
Copy link
Contributor

@tiif tiif commented May 16, 2024

Fixes #3442
Design proposal: https://hackmd.io/@tiif/Skhc1t0-C

src/shims/unix/socket.rs Outdated Show resolved Hide resolved
src/shims/unix/socket.rs Outdated Show resolved Hide resolved
@tiif
Copy link
Contributor Author

tiif commented May 19, 2024

This is not entirely ready, and still waiting for discussion to happen, but I will tag it with waiting-on-review for more visibility.

@rustbot ready

@rustbot rustbot added the S-waiting-on-review Status: Waiting for a review to complete label May 19, 2024
Copy link
Contributor

@oli-obk oli-obk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will also need to handle the case where the other socket was dropped. What do the manpages say about that case? Will we need to report https://doc.rust-lang.org/std/io/enum.ErrorKind.html#variant.ConnectionAborted or something?

src/shims/unix/socket.rs Outdated Show resolved Hide resolved
src/shims/unix/socket.rs Outdated Show resolved Hide resolved
src/shims/unix/socket.rs Outdated Show resolved Hide resolved
src/shims/unix/socket.rs Outdated Show resolved Hide resolved
src/shims/unix/socket.rs Outdated Show resolved Hide resolved
src/shims/unix/socket.rs Show resolved Hide resolved
src/shims/unix/socket.rs Outdated Show resolved Hide resolved
Copy link
Member

@RalfJung RalfJung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I think the general strategy works, I left some comments to complement Oli's feedback.

There is one issue we need to still consider: the interaction with data race detection. Since this can be used to send data from one thread to another, arguably a write to a buffer should be considered a "release" operation, and a read should be considered an "acquire". The way to test for this is to have a test with -Zmiri-preemption-rate=0 and then

  • have a global static mut VAL: u8 = 0;
  • in main, create a socketpair and then spawn a thread
  • in the main thread, go on by setting VAL to 1 and then sending a byte onto one socket. then call thread::yield_now.
  • in the other thread, read one byte from the socket, then assert_eq!(VAL, 1).

In your PR as-is, this should report UB due to a data race. However, arguably there is no data race here, since the other thread waited to receive a signal from the main thread.

To fix that, you should move the VecDeque<u8> into a new type, Buffer. This buffer needs a second field clock: VClock.

  • When doing a write, you need to do something like clock.join(data_race.release_clock(ecx.get_active_thread(), ecx.machine.current_span())), where data_race is in ecx.machine.data_race but it can be None (in which case you don't need to do anything).
  • When doing a read, you need to data_race.acquire_clock(&buffer.clock, ecx.get_active_thread()).

(We should really have some nicer wrapper functions for this...)

src/shims/unix/socket.rs Outdated Show resolved Hide resolved
src/shims/unix/socket.rs Outdated Show resolved Hide resolved
src/shims/unix/socket.rs Outdated Show resolved Hide resolved
src/shims/unix/socket.rs Show resolved Hide resolved
src/shims/unix/socket.rs Outdated Show resolved Hide resolved
src/shims/unix/socket.rs Outdated Show resolved Hide resolved
tests/pass-dep/libc/libc-socketpair.rs Outdated Show resolved Hide resolved
tests/pass-dep/libc/libc-socketpair.rs Outdated Show resolved Hide resolved
tests/pass-dep/libc/libc-socketpair.rs Outdated Show resolved Hide resolved
tests/pass-dep/libc/libc-socketpair.rs Outdated Show resolved Hide resolved
@tiif
Copy link
Contributor Author

tiif commented May 23, 2024

Hmm, it seems like I got into some rebase trouble. Sorry if anyone get pinged on this PR.

@RalfJung
Copy link
Member

@rustbot author
(at least I think that reflects the current state)

@rustbot rustbot added S-waiting-on-author Status: Waiting for the PR author to address review comments and removed S-waiting-on-review Status: Waiting for a review to complete labels May 24, 2024
src/shims/unix/socket.rs Outdated Show resolved Hide resolved
src/shims/unix/socket.rs Outdated Show resolved Hide resolved
@tiif
Copy link
Contributor Author

tiif commented May 25, 2024

It turns out that x86_64-apple-darwin does not support SOCK_NONBLOCK and SOCK_CLOEXEC. Details are posted here #3442 (comment)

But we have quite a few tasks that should be handled first, maybe we can delay the support x86_64-apple-darwin to another PR?

@tiif
Copy link
Contributor Author

tiif commented May 25, 2024

Don't we have to indicate EOF at some point in read? Like, when the other side got closed and all data has been read, or so?

This will also need to handle the case where the other socket was dropped. What do the manpages say about that case? Will we need to report https://doc.rust-lang.org/std/io/enum.ErrorKind.html#variant.ConnectionAborted or something?

Yes we need special handling for these cases1, this is the new specification:

For non-blocking

  • read
    • if write end open but no data available
      • fail with EAGAIN/EWOULDBLOCK
    • if write end close
      • if buffer is not empty
        • read as much as we need/can and return number of bytes read
      • else (buffer is empty )
        • return 0
    • if read size < available data
      • read as requested and return
    • if read size >= available data
      • read as much as we can and return (partial read)
  • write
    • if read end open
      • if write size <= buffer size
        • if there is available space to write all requested bytes
          • write and return
        • else (no available space to write all requested bytes)
          • fail with EAGAIN/EWOULDBLOCK
      • if write size > buffer size
        • if available space in buffer is 0
          • fail with EAGAIN/EWOULDBLOCK
        • else (available space in buffer != 0)
          • write as much as we can (partial write)
    • if read end closed
      • fail with EPIPE

For blocking

  • read
    • if write end open but no data available
      • block
    • if write end close
      • if buffer is not empty
        • read as much as we need/can and return number of bytes read
      • else (buffer is empty )
        • return 0
    • if read size < available data
      • read as requested and return
    • if read size >= available data
      • read as much as we can and return
  • write
    • if read end open
      • if write size <= buffer size
        • block until there is sufficient space to write all requested bytes
      • if write size > buffer size
        • block until there is sufficient space to write all requested bytes
    • if read end close
      • fail with EPIPE

Footnotes

  1. https://man7.org/linux/man-pages/man7/pipe.7.html PIPE_BUF section, I can't find exact specification for sockets, I assumed behaviour of NONBLOCKING file descriptors (sockets/pipe/FIFO) are the same here

@tiif
Copy link
Contributor Author

tiif commented May 25, 2024

We also need to implement close.

We can add in a new field that is being shared between 2 socketpair struct:

struct SocketPair {
    readbuf: Rc<RefCell<VecDeque<u8>>>,
    writebuf: Rc<RefCell<VecDeque<u8>>>,
    is_nonblock: bool,
    both_end_open: Rc<RefCell<bool>>,
}

This is the initial state where both side is opened,

write to fd[0]  -----> write to buffer0 ----> read from fd[1] 
read from fd[0] <----- write to buffer1 <---- write to fd[1]

When fd[1] is closed, this will be the state of the socketpair:

write to fd[0]  -----> write to buffer0 ----> X
read from fd[0] <----- write to buffer1 <---- X

fd[1] should set both_end_open to false to notify fd[0] that the other end has been dropped.

@oli-obk
Copy link
Contributor

oli-obk commented May 25, 2024

How is nonblocking EOF differentiated from no available bytes?

RefCell<bool>

Prefer Cell when the wrapped type is simple and Copy like bool

@oli-obk
Copy link
Contributor

oli-obk commented May 25, 2024

Blocking write/read will require some refactorings to allow us to park the thread and continue later. We already have the concept for sleep, something similar will be needed here. I would recommend doing so in a separate PR and just keep emitting unsupported where it would block

@tiif

This comment was marked as resolved.

@oli-obk

This comment was marked as resolved.

@oli-obk
Copy link
Contributor

oli-obk commented May 25, 2024

We can add in a new field that is being shared between 2 socketpair struct:

That's not going to work with duplicated file descriptors, we'd need another way to detect that the last copy of one end of the pair was dropped.

What we could do instead is to use Weak instead of Rc for the writer. Once the corresponding reader is dropped, the weak cannot be upgraded to an rc anymore, signalling that the other end is gone.

This does mean you need to upgrade before every write, but that seems ok

@tiif
Copy link
Contributor Author

tiif commented May 25, 2024

What we could do instead is to use Weak instead of Rc for the writer. Once the corresponding reader is dropped, the weak cannot be upgraded to an rc anymore, signalling that the other end is gone.

Ah, that's neat! Since not being able to upgrade to Rc means the other side is dropped, we only need

struct SocketPair {
    readbuf: Weak<RefCell<VecDeque<u8>>>,
    writebuf: Weak<RefCell<VecDeque<u8>>>,
    is_nonblock: bool,
}

(Note: this won't be the final design, we still need to consider data race detection)

@oli-obk
Copy link
Contributor

oli-obk commented May 25, 2024

readbuf needs to be Rc, not Weak, one side always needs to be Rc, otherwise both Weaks are dangling and can't be upgraded. When thr last Rc gets dropped, all weak become invalid

@tiif
Copy link
Contributor Author

tiif commented May 25, 2024

readbuf needs to be Rc, not Weak, one side always needs to be Rc, otherwise both Weaks are dangling and can't be upgraded.

I see.

Let's say fd[1] is dropped,

(type: Weak) write to fd[0]  -----> write to buffer0 ----> X
(type: Rc) read from fd[0] <----- write to buffer1 <----  X <-- consider this direction

we could use Rc::weak_count to track if the write end is dropped?

@oli-obk
Copy link
Contributor

oli-obk commented May 25, 2024

we could use Rc::weak_count to track if the write end is dropped?

Yea, that would work

@RalfJung
Copy link
Member

@tiif your list above is not entirely right -- when all writers are closed but there is data left in the buffer, that data should still be returned. Only then should you indicate EOF.

@RalfJung
Copy link
Member

RalfJung commented May 26, 2024

Checking the counts is suspicious, those are pretty hard to use as a signal for anything... if we ever add some other reference to this buffer then suddenly this logic will break.

Maybe it's better to just have our own has_writer flag that we track explicitly? close on a Socketpair can then set has_writer on its write buffer to false. (close will only be called when the last FD for this file description is closed.)

@tiif
Copy link
Contributor Author

tiif commented May 26, 2024

when all writers are closed but there is data left in the buffer, that data should still be returned.

Yes, I didn't consider this case. The list is now updated.

Maybe it's better to just have our own has_writer flag that we track explicitly? close on a Socketpair can then set has_writer on its write buffer to false. (close will only be called when the last FD for this file description is closed.)

I have been thinking about this for a whlie, but not sure how this is going to work yet.

Initial state, fd[0] is the writer of its writebuf, and fd[1] is the writer of fd[0]'s readbuf.

write to fd[0]  -----> write to buffer0 ----> read from fd[1] 
read from fd[0] <----- write to buffer1 <---- write to fd[1]

If we dup fd[0] , and only consider buffer0

write to fd[0]  ----->    |----------------------|
                          | buffer0              |----> read from fd[1] 
write to newfd  ----->    |----------------------|

Closing fd[0] is a valid action and it will just make the state transit to:

write to newfd  -----> write to buffer0 ----> read from fd[1] 

So I am not sure how close can only be called when the last FD for this file description is closed.

I am thinking of letting the socketpair having refcount for writers of its writebuf and readbuf.

struct SocketPair {
    readbuf: Rc<RefCell<VecDeque<u8>>>,
    writebuf: Weak<RefCell<VecDeque<u8>>>,
    is_nonblock: bool,
    // number of writer for readbuf
    readbufWriterCount: Rc<Cell<i64>>,
    // number of writer for writebuf
    writebufWriterCount: Rc<Cell<i64>>
}
write to fd[0]  ----->    |----------------------|
                          | buffer0              |----> read from fd[1] 
write to newfd  ----->    |----------------------|

For this case, writebufWriterCount of fd[0] and newfd hold the same reference as readbufWriterCount of fd[1]
When fd[0] is dup to newfd, writebufWriterCount of fd[0] and newfd will increase by 1 (So readbufWriterCount of fd[1] will increase by 1 too). To check if all the write end of the readbuf is dropped, we only need to check if the value of readbufWriterCount is 0.

Since these counters is only used to track writer of buffers, it won't break when the buffer is referred for other purpose.

Not sure if I over-complicated this problem, I feel a more elegant solution exists somewhere.

Copy link
Contributor

@oli-obk oli-obk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RalfJung this looks ready to me now. Once you've approved, too, let's squash all commits into one

Copy link
Member

@RalfJung RalfJung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot! I have a bunch of comments but they're mostly minor. :)

struct Buffer {
buf: VecDeque<u8>,
clock: VClock,
buf_has_writer: bool,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This field is also interesting enough that it should have a comment.

@@ -20,17 +38,97 @@ impl FileDescription for SocketPair {
self: Box<Self>,
_communicate_allowed: bool,
) -> InterpResult<'tcx, io::Result<()>> {
// This is used to signal socketfd of other side that there is no writer to it's readbuf.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// This is used to signal socketfd of other side that there is no writer to it's readbuf.
// This is used to signal socketfd of other side that there is no writer to its readbuf.

@@ -20,17 +38,97 @@ impl FileDescription for SocketPair {
self: Box<Self>,
_communicate_allowed: bool,
) -> InterpResult<'tcx, io::Result<()>> {
// This is used to signal socketfd of other side that there is no writer to it's readbuf.
// If the upgrade fails, there is no need to update as all read ends has been dropped.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// If the upgrade fails, there is no need to update as all read ends has been dropped.
// If the upgrade fails, there is no need to update as all read ends have been dropped.

if data_size == 0 {
if !readbuf_has_writer {
// Socketpair with no writer and empty buffer.
return Ok(Ok(0));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return Ok(Ok(0));
// 0 bytes successfully read indicates end-of-file.
return Ok(Ok(0));

}
}

// Prevent false alarm in data race detection when doing synchronisation via socketpair.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Prevent false alarm in data race detection when doing synchronisation via socketpair.
// Synchronize with all previous writes to this buffer.
// FIXME: this over-synchronizes; a more precise approach would be to
// only sync with the writes whose data we will read.

throw_unsup_format!("socketpair write: blocking isn't supported yet");
}
}
// Prevent false alarm in data race detection when doing synchronisation via socketpair.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Prevent false alarm in data race detection when doing synchronisation via socketpair.
// Remember this clock so `read` can synchronize with us.

}
}
// Prevent false alarm in data race detection when doing synchronisation via socketpair.
writebuf.clone().unwrap().borrow_mut().clock.join(&ecx.release_clock().unwrap());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You cannot unwrap this, that can ICE. Instead do something like

if let Some(clock) = ecx.release_clock() {
  // ...
}


// Parse and remove the type flags that we support. If type != 0 after removing,
// unsupported flags are used.
// FIXME: SOCK_DRAM that supposed to have message boundaries is not supported
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// FIXME: SOCK_DRAM that supposed to have message boundaries is not supported

This is more like "TODO: support SOCK_DGRAM", but I don't think that's something we want/need in the code. It can be an issue if needed.

Comment on lines +156 to +157
// FIXME: To support macos for tokio socketpair, extra handling on the libc::O_NONBLOCK
// flag is needed because macos doesn't have SOCK_CLOEXEC and SOCK_NONBLOCK flag.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// FIXME: To support macos for tokio socketpair, extra handling on the libc::O_NONBLOCK
// flag is needed because macos doesn't have SOCK_CLOEXEC and SOCK_NONBLOCK flag.
// SOCK_NONBLOCK only exists on Linux.

Please also then adjust the check to be == "linux". Currently you re saying that all OSes except macOS support SOCK_NONBLOCK. Have you checked that FreeBSD, Solaris, OpenBSD, and so on all support it? If not, then please only add support on OSes where we are use support exists. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I don't think these tests are necessary. We don't have tests for all unsupported situations, that would quickly get out of hand. OTOH now that you already wrote the tests we may as well keep them. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Waiting for a review to complete
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Complete socketpair support
6 participants