SIGSEGV in file_server #9
Comments
After some debugging it looks more like the address of service gets corrupted somehow. |
Ok, they resulted in a similar error when compiling with g++-10. I have to say that they used to work -_-|| Will investigate it more tomorrow. |
I haven't actually tested the echo server. I also tried to compile using clang, but that had other issues on my end (it couldn't find its C++ header files, like string). |
Reinstalling libc++-dev may resolve this issue |
If it were a normal clang installation for the host itself, then maybe, but in this case it's a cross-platform toolchain (OSELAS-Toolchain). I'll look into what went wrong while building it another day. |
Regarding this issue, have you tried compiling the library with g++-11? When working on a coroutine library I noticed that g++-10 had a number of small bugs in the way it handled coroutines. These bugs didn't appear in clang 10, but they were still annoying to work around. |
I used g++-11 and it still segfaults when running echo_server. Here's the backtrace.
@codeinred are you able to get them running? What compiler did you use? |
It seems the only working compiler is clang-9. The stack gets corrupted as soon as multiple coroutines run in parallel. The crashes can be easily avoided by removing Line 24; in that case only one connection can be accepted at a time. liburing4cpp/demo/echo_server.cpp Line 24 in a9b422b
|
All implementations of C++20 coroutine libraries I can find on the Internet are lazy: the coroutine body won't be executed before it is co_awaited, which means coroutines can never run in parallel without threads. That makes io_uring mostly useless. I don't bother to do that. |
I'd suggest that you'd better use stackful coroutines for now, for both stability and performance:
|
With regard to coroutine libraries being lazy: in the typical case, an executor can also start a coroutine before it's co_await'd on. So the coroutine is lazy because all the coroutines can be registered with an executor before they're started.

As for allocations with promise objects: each operation would otherwise need a heap-allocated object to carry its completion result. We can avoid this allocation entirely by changing the way value resolution occurs. Rather than a promise type holding the result, we can use a small resolver struct:

```cpp
struct cqe_resolver {
    std::coroutine_handle<> handle;
    int result {};

    void resolve(int result) {
        this->result = result;
        handle.resume();
    }
};
```

We can then write a non-allocating, trivially destructible awaitable type that represents a submission queue entry:

```cpp
struct sqe_awaitable {
    cqe_resolver resolver {};
    io_uring_sqe* sqe = nullptr;
    io_uring& ring;
    int& cqe_count;
    uint8_t iflags = 0;

    // Create the awaitable. This doesn't actually register anything, as that
    // occurs once the coroutine suspends. Doing it *before* the coroutine
    // suspends has the potential to result in a race condition, but we can
    // just register it inside await_suspend().
    sqe_awaitable(io_uring& ring, int& cqe_count, uint8_t iflags = 0)
        : ring(ring)
        , cqe_count(cqe_count)
        , iflags(iflags) {}

    constexpr bool await_ready() const noexcept { return false; }

    void await_suspend(std::coroutine_handle<> handle) {
        sqe = io_uring_get_sqe(&ring);
        if (!sqe) {
            // SQ is full: flush pending cqe(s) and submit
            io_uring_cq_advance(&ring, cqe_count);
            cqe_count = 0;
            io_uring_submit(&ring);
            sqe = io_uring_get_sqe(&ring);
            assert(sqe && "sqe should not be NULL");
        }
        // Register the handle with the resolver, so that it can be called
        // once resolution occurs. The resolver contains the data needed to
        // resume the coroutine once the result has been obtained: the
        // coroutine will be resumed when resolver.resolve() is invoked with
        // the io_uring_cqe::res value.
        resolver.handle = handle;
        // Register the resolver with the io_uring Submission Queue Entry,
        // so that the completion handler knows which resolver to resume
        io_uring_sqe_set_flags(sqe, iflags);
        io_uring_sqe_set_data(sqe, &resolver);
    }

    constexpr int await_resume() const noexcept { return resolver.result; }

    void cancel() {
        io_uring_prep_cancel(sqe, &resolver, 0);
    }
};
```

Whenever you co_await one of these, the resolver lives inside the coroutine frame, so no separate allocation is needed. Coincidentally, this also happens to avoid the bug I mentioned above in gcc-10: that bug only applies to awaitables created in a co_await expression. |
Hi everyone. Given the recent code cleanups and the addition of sqe_awaitable, I'm not sure this bug still exists. I've attached the echo client I'm using for the test below. When I run the echo server, I get the following output:
This comes from running the echo client, which receives the response it's supposed to:
The echo server accepts multiple successive incoming connections without issue. Shown below is the code for the echo client, which I took from the Python Network Programming Cookbook:

```python
#!/usr/bin/env python2
# coding=utf8
# Python Network Programming Cookbook -- Chapter 1
# This program is optimized for Python 2.7. It may run on any
# other Python version with/without modifications.
import socket
import sys
import argparse

host = 'localhost'

def echo_client(port):
    """ A simple echo client """
    # Create a TCP/IP socket
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Connect the socket to the server
    server_address = (host, port)
    print "Connecting to %s port %s" % server_address
    sock.connect(server_address)
    try:
        # Send data
        message = "Test message. This will be echoed"
        print "Sending %s" % message
        sock.sendall(message)
        # Look for the response
        amount_received = 0
        amount_expected = len(message)
        while amount_received < amount_expected:
            data = sock.recv(16)
            amount_received += len(data)
            print "Received: %s" % data
    except socket.error, e:  # was socket.errno, which is a module, not an exception class
        print "Socket error: %s" % str(e)
    except Exception, e:
        print "Other exception: %s" % str(e)
    finally:
        print "Closing connection to the server"
        sock.close()

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Socket Server Example')
    parser.add_argument('--port', action="store", dest="port", type=int, required=True)
    given_args = parser.parse_args()
    port = given_args.port
    echo_client(port)
```

Could I have some guidance on how to recreate this bug, if it still exists? I also tried running the echo server with the memory sanitizer enabled, and in valgrind, and both of those were happy with it. |
Thanks for the effort, @codeinred! I can still reproduce the bug using g++-11 after reverting the temporary workaround introduced in 324f21c#diff-df6aad85015654783d708d610b1aec0ede8a3c8e29cf0057f3c7dd4ede9cbfcd |
I see, thank you! This gives me a good starting point! |
I think the problem before the workaround may be caused by the lambda coroutine? liburing4cpp/demo/echo_server.cpp Line 24 in a9b422b
Before the workaround, the lambda object on this line captured two variables, but the lambda's lifetime ends soon after it has been called and suspended: once the inner while-loop scope exits, the closure object may be gone while the suspended coroutine still refers to its captures. The same thing happens in the file_server. |
Based on the configuration and workaround from #8 I get a SIGSEGV:
This looks to me like the coroutine created in https://github.com/CarterLi/liburing4cpp/blob/async/demo/file_server.cpp#L102 gets destroyed before the code reaches https://github.com/CarterLi/liburing4cpp/blob/async/demo/file_server.cpp#L114.