-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Resource-safe C interface, part 2 #8997
base: trunk
Are you sure you want to change the base?
Conversation
Thanks for the label, the PR is currently waiting for me to get back to it and rebase. Comments about the text of the PR are still welcome, whereas it is useless to comment on the implementation at the moment. |
3fb0706
to
22ebc9c
Compare
This PR is no longer an RFC. It has been simplified to add only what I found necessary in the API, and I fixed the leaks that I had identified previously. It is ready for review. I think this can interest @xavierleroy, @stedolan and @jhjourdan. It fixes a couple of real bugs and I think that the API additions will be consensual. There are a couple of CI failures that I need to investigate. |
22ebc9c
to
39bece0
Compare
(Note: the description at the top has been updated.) I have added more patches resulting from a full audit of async exceptions in io.c, fixing several leaks, and also the channel state invalidation noticed by @stedolan at #9722. I expect that there are no more leaks/state invalidation in io.c after this. |
590b64a
to
a62da4b
Compare
(Appveyor MSVC64 build is failing.) |
Add new variants caml_*_blocking_section_exn for the user who want to clean-up resources in case of an asynchronous exception.
Simplify for next commit.
Introduce variants of the functions that do not immediately raise the exception, allowing for resource cleanup. This is used in this way in an ulterior commit to avod duplicating cleanup code. Another motivating example is given at ocaml#7423: > caml_invalid_argument(str) is no return and does not free it's argument. So calling it with a string constructed dynamically will mean it'll never get freed. This led at the time to the introduction of caml_invalid_argument_value and caml_failwith_value that accept an OCaml string as argument. The present commit gives a general solution that fits with the rest of the resource-safe API.
Add non-raising variants of caml_enter/leave_blocking_section. The intention is to mimick masking in the release path, when one already holds an exception. Masking is not there yet, so we introduce these functions temporarily. They remain internal, to avoid redundancy in the API.
Prepares for resource-safety audit of io.c
Only stdin is global in Stdlib, and it is not seekable, so its state remained valid on exception. However it does not cost much to keep the offset and the max pointer up to date in all situtations when an interrupt arises. In addition, one can be suspicious that channel locks may leak, which can cause deadlocks. However, this is already dealt with Unlock_exn.
Similar to the previous commit
053ed5e
to
9a751be
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, I think this is a good change: it fixes many resource leaks and real bugs in several functions of the runtime, and introduce a good practice for handling exception in C code.
However, I have some general remarks:
- Having
caml_leave_blocking_section
run async callbacks makes it difficult to free resources/raise the right exception in code that calls it. As discussed in EINTR-based signals, again #9722, I think that not running them would simplify much the code, without delaying too much async callbacks. - You will need to rebase this PR on top of EINTR-based signals, again #9722 once it will be merged, since both PR touch the smae piece of code.
- Stylistically, I prefer not to use
goto
s when possible. This makes it much easier to review the resource flow in the function. So could you please only use them when this indeed factorizes the cleanup code?
CAMLextern void caml_enter_blocking_section (void); // raises | ||
CAMLextern void caml_leave_blocking_section (void); // raises | ||
|
||
CAMLextern value caml_enter_blocking_section_exn (void); | ||
/* Same as [caml_enter_blocking_section], but return the exception if | ||
any. The runtime lock is held upon return if and only if the result | ||
satisfies Is_exception_result. */ | ||
|
||
CAMLextern value caml_leave_blocking_section_exn (void); | ||
/* Same as [caml_leave_blocking_section], but return the exception if | ||
any. The runtime lock is held upon return in all cases. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the discussion in #9722, I am now more and more convinced that caml_leave_blocking_section
should not execute async callbacks (hence no need for caml_leave_blocking_section_exn
). The reason is that, first, as @stedolan explained, it is almost certain that the blocking section returned a result that we need to exploit before doing anything else which might interrupt the call, and second, we will run polling code soon, so there is no risk to loose async callbacks.
On the other hand, polling in caml_enter_blocking_section
is necessary because we will probably block for a some time and and should avoid waiting for async callbacks during this time. However, in order to get exception safety without having to many versions of these functions, I would prefer the approach in #9722 where we use a function like caml_enter_blocking_section_no_pending
. Then, one should simply call caml_process_pending_actions_exn
before if one wants to both poll async callbacks and releasing the runtime lock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review, I just notice it now.
Indeed, I will rebase on top of #9722 and follow whatever decision is made there. However, I mentioned at #9722 a serious breakage of not polling in caml_leave_blocking_section
that I just noticed: it serves to poll after a system call returns EINTR, just before Sys_error
/Unix_error(EINTR,_)
is raised. Once the exception is raised (changing a function raising Sys.Break
to a function raising Sys_error
, say), the argument that one will poll soon enough no longer applies for reasons I explained there. And since it is possible that this pattern could be found in user C libraries, fixing the Sys and Unix modules by hand would not be enough. I propose to discuss this at #9722.
@@ -139,7 +140,10 @@ static void open_connection(void) | |||
if (dbg_socket == -1) | |||
caml_fatal_error("_open_osfhandle failed"); | |||
#endif | |||
dbg_in = caml_open_descriptor_in(dbg_socket); | |||
dbg_socket_in = dup(dbg_socket); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I know, dup
is not available on Windows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to use _dup
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest, I have no idea.
If we want to actually call dup
on unix, then the best is probably to ask @dra27 about what is the right thing to do here.
But see #9786: the original implementation corresponds to a common pattern in OCaml programming with the Unix module, which, I think, should not be considered a bug. So, I think I would prefer keeping the initial implementation, bug guard somewhere else about the double close of file descriptors used in channels.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dup
is aliased to _dup
, so it'll work - it would be better to use _dup
on Windows, but goodness knows if/when Microsoft will actually remove the POSIX aliases.
What you're doing should work fine on Windows; I have no comment on whether it's necessarily what's wanted on either platform.
{ | ||
struct channel * channel; | ||
|
||
channel = (struct channel *) caml_stat_alloc(sizeof(struct channel)); | ||
channel->fd = fd; | ||
caml_enter_blocking_section(); | ||
*exn = caml_enter_blocking_section_exn(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we avoid writing to *exn
when no exception is called?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two possible conventions:
- Only write to *exn if there is an exception.
- Always write to *exn; i.e. treat exn as an output parameter.
In the latter case, the caller does not need to initialise *exn. Question for seasoned C programmers: are there any coding guidelines saying that one is better or more expected than the other?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO, a common pattern for out parameters is to not write them when there is no corresponding result, and signal by another channel (i.e., the return value of the function) that the out parameter has been (or not) written.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok but then I would think that exn is the error code, and the channel is the output param. I.e. the signature should be
CAMLexport value caml_open_descriptor_in_exn(int fd, struct channel ** chan)
instead of:
CAMLexport struct channel * caml_open_descriptor_in_exn(int fd, value * exn)
I thought about doing this, but preferred a change that would be perceived as smaller. It seems that we will have to do it anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather not change the return type of the function. I think this is misleading. Why cannot you guarantee that [exn] is written iff the return value is NULL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could, but in addition I would prefer if there is a convention that for _exn
functions one has to first check for Is_exception_result
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. But this is a convention to be followed by the caller side. This does not prevent you to follow the usual convention that out parameters are only written in case it is needed. Of course, there will be some redundancy, but is that an issue?
|
||
p = caml_stat_strdup_to_os(String_val(filename)); | ||
caml_enter_blocking_section(); | ||
exn = caml_enter_blocking_section_exn(); | ||
if (Is_exception_result(exn)) goto cleanup1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (Is_exception_result(exn)) goto cleanup1; | |
if (Is_exception_result(exn)) { | |
caml_stat_free(p); | |
caml_raise(Extract_exception(exn)); | |
} |
Avoiding goto
when possible is always good. In this case, I don't think the code becomes more complex.
dlhandle = caml_dlopen(p, 1, Int_val(global)); | ||
caml_leave_blocking_section(); | ||
caml_stat_free(p); | ||
exn = caml_leave_blocking_section_exn(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exn = caml_leave_blocking_section_exn(); | |
exn = caml_leave_blocking_section_exn(); | |
caml_stat_free(p); |
Why not freeing p
here? This would make the change smaller, the lifetime of p
shorter and the resources of the function easier to track.
Of course, you then have to free it in the exceptional case above.
@@ -180,6 +180,11 @@ CAMLexport value caml_enter_blocking_section_exn(void) | |||
return Val_unit; | |||
} | |||
|
|||
void caml_enter_blocking_section_noexn(void) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please synchronize with @stedolan for the name of the function. As far as I'm concerned, I have a slight preference for _nopending
, since it makes it clear that we are missing a polling point and that it should be replaced if possible by another polling point.
runtime/dynlink_nat.c
Outdated
// TODO: blocking section with masking | ||
if (dlhandle) | ||
if (dlhandle) { | ||
caml_enter_blocking_section_noexn(); | ||
caml_dlclose(dlhandle); | ||
caml_leave_blocking_section_noexn(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other calls to caml_dlclose
are not guarded by blocking sections. Not that I really care about it, but if we decide that this call is blocking, then it should be uniform.
In case of error, sets [exn] to an encoded exception value | ||
[Sys_error], [Sys_blocked_io] or an exception raised from a signal | ||
handler, if any. If the return value is -1 then [exn] contains an | ||
encoded exception. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This interface is rather weird. Usually, we would expect that when there is an error, no byte is read.
Please make it clear that it is possible that an error is returned and some byte is read. Moreover, is there a point in returning -1 in the case of an error when no byte is read? Why not returning 0 in this case?
Anyway, note that we would not have this weird interface if caml_leave_blocking_section
did not run async callbacks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, these functions write (Val_unit
) to *exn
even if no errror appears. Please either make this clear in the comment or fix this behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, as far as I understand, if caml_leave_blocking_section
did not run async callbacks, then all these changes would not be necessary, because it would be safe to raise exceptions from io.c
functions.
runtime/io.c
Outdated
/* caml_do_read is exported for Cash */ | ||
// raises | ||
static int read_fd_into_channel(struct channel * channel, char * buf) | ||
{ | ||
int n; | ||
value exn = Val_unit; | ||
n = caml_read_fd_exn(channel->fd, channel->flags, buf, | ||
channel->end - buf, &exn); | ||
if (n <= 0) { | ||
caml_raise_if_exception(exn); | ||
CAMLassert(n == 0); | ||
return n; | ||
} | ||
channel->offset += n; | ||
channel->max = buf + n; | ||
caml_raise_if_exception(exn); | ||
return n; | ||
} | ||
|
||
/* caml_do_read is exported for Cash. Raises. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is only necessary because caml_leave_blocking_section
runs async callbacks.
runtime/io.c
Outdated
//raises | ||
static void write_channel_to_fd(struct channel * channel, int towrite) | ||
{ | ||
int written; | ||
value exn = Val_unit; | ||
written = caml_write_fd_exn(channel->fd, channel->flags, | ||
channel->buff, towrite, &exn); | ||
if (written < 0) { | ||
caml_raise_if_exception(exn); | ||
CAMLassert(0); | ||
} | ||
if (written < towrite) | ||
memmove(channel->buff, channel->buff + written, towrite - written); | ||
channel->offset += written; | ||
channel->curr = channel->buff + towrite - written; | ||
caml_raise_if_exception(exn); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same remark as in previous commit: this change would not be necessary if caml_leave_blocking_section
did not run async callbacks.
Is there any desire to have this in for 4.12 and if so what would be the time frame? I saw comments on the goto style on another thread so I wanted to address this now:
This PR indeed explicitly advocates in favour of using the goto style for resource clean-up.
We will see how it ends up with the simplification at #9722, but I believe that the codebase will gain in readability if exception-handling code starts being written in a systematic fashion. |
My impression from reading the code here is that a great deal of it was working around the fact that For instance, there is quite a lot of refactoring here to thread exceptions back from In particular, are the non-raising versions of the exception-raising functions (e.g. |
There are still real bugs, but we will see better when I get back to it :) (which can happen faster if I can please have an idea of the schedule.)
Surely, but keep in mind that this was written as part of an evolution path meant to carefully maintain backwards compatibility, which is a good enough reason for the added complexity.
The PR is first meant as extending the public API. As the commit log explains, the I could keep this part of the new api in my freezer until it is useful for the compiler codebase, as I did until now, but the |
Where do we stand with this PR? Obviously, we missed the 4.12 feature freeze and the fact that #9722 was merged should greatly simplify this one, right? |
We are waiting that I rewrite it after #9722, but which should give a simpler PR indeed. It was not clear to me what was the time frame for 4.12 (see above), so I put it on my plate for 4.13. Maybe that your reminder will provide me with the nudge needed to get back to it. |
This PR continues with the resource-safe C interface to allow writing resource-safe C functions in the presence of asynchronous exceptions. The ultimately goal is to make runtime correct in the face of asynchronous exceptions. It introduces new
_exn
variants of current functions to allow cleaning-up after asynchronous exceptions, in the public interface:caml_{enter,leave}_blocking_section_exn
.caml/fail.h
that raise exceptions, e.g.caml_failwith_exn
and many others.In the second part, these are used to fix leaks in several places of the runtime. The general strategy is to make the exception raising explicit through error codes / exception monad by replacing raising functions by functions that return encoded exceptions. This has been made possible by the work on making polling predictable in the C API.
This contains a proposal for a public-facing interface. This is best read commit-per-commit, with further details in the commit logs.
Left for future work:
// raises
. This PR fixes quite a few leaks but those only were obvious ones.