Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure C global symbols are consistently prefixed #10926

Merged
merged 19 commits into from Jun 13, 2022

Conversation

dra27
Copy link
Member

@dra27 dra27 commented Jan 20, 2022

This PR starts out from upstreaming a change which has been carried for some time in opam-repository-mingw. The compilers there add a prefix to the list_ functions in winlist.c.

  • As per https://github.com/ocaml/ocaml/pull/9260/files#r371792438, the Windows runtime functions win_multi_byte_to_wide_char and win_wide_char_to_multi_byte have the prefix win_ changed to caml_. They're internal - the actual conversions should be done using macros.
  • All the remaining functions in the Unix implementation of the Unix library are now prefixed unix_ with the exception of uerror.
  • The same thing is done to most of the remaining functions in the Windows implementation of the Unix library, except that primitives and C functions which were prefixed win_ have stayed that way (for now).
  • tools/check-symbol-names has been updated with these changes.

We could change the last remaining functions in win32unix if we wanted. They are: win_CRT_fd_of_filedescr, win_filedescr_of_channel, win_handle_fd, win_inchannel_of_filedescr, win_outchannel_of_filedescr, win_clear_close_on_exec, win_set_close_on_exec, win_create_process, win_create_process_native, win_terminate_process, win_cleanup, win_startup, win_system, win_alloc_handle, win_alloc_socket, win_set_inherit, win_findclose, win_findfirst, win_findnext, win_waitpid. At present, win32unix specifically is allowed to have win_ as a prefix.

Thoughts on this so far welcomed. I'm very happy to split off bits which are acceptable (e.g. the two win_ functions in the runtime) as approved. It won't be a zero-impact change as there are libraries which hack into the internals of the Unix library, but it's a very detectable change - if we're happy to go ahead, we can do some pre-emptive checking in opam-repository for affected packages.

Changes Outdated Show resolved Hide resolved
@xavierleroy
Copy link
Contributor

Thank you for the clean-up work, it's definitely useful!

You wrote that uerror is too widely used to be renamed. Is that really so?

In perfect hindsight, for the Unix library, it would make sense to use caml_unix_ and caml_win32_ as prefixes instead of unix_ and win32_, just like the Bigarray code now uses caml_ba_.

@dra27 dra27 added this to the 5.0.0 milestone Apr 24, 2022
@xavierleroy
Copy link
Contributor

This PR needs to be rebased and adapted to the recent removal of otherlibs/win32unix. I'm still mildly in favor of this PR, but the 5.0 window is closing, and we need to focus our changes.

@dra27
Copy link
Member Author

dra27 commented May 24, 2022

Rebased - updates to follow.

@dra27
Copy link
Member Author

dra27 commented May 25, 2022

I've added some commits which eliminate the use of win_ as a prefix completely (@MisterDA, @nojb - thoughts + opinions?). In a few cases, we have identically-behaving primitives between what was win32unix and unix but with different names.

Separately, I'll kick off a health-check run to how widespread uerror really is and also just how bad renaming all the C symbols to be caml_unix_ actually would be.

@MisterDA
Copy link
Contributor

Seems reasonable to me. I also support the change from unix_ to caml_unix, or more generally to use the caml_ prefix for all non-static functions.

@shindere
Copy link
Contributor

shindere commented May 25, 2022 via email

@damiendoligez
Copy link
Member

Even if uerror is too widely used to be removed, what about having caml_uerror as the official function and uerror as a backward-compatible alias that we hope to remove someday?

@MisterDA
Copy link
Contributor

And perhaps (I don't think we're doing it already) explain in the manual that the caml_ prefix is reserved for C symbols if that makes sense? I've seen some C stubs in the wild using it.

Copy link
Member

@damiendoligez damiendoligez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had a quick look at the changes, nothing amiss except yacc.

My vote also goes to caml_unix although this means the PR will touch 150 files instead of 83.

yacc/wstr.c Show resolved Hide resolved
@shindere
Copy link
Contributor

shindere commented May 25, 2022 via email

@dra27
Copy link
Member Author

dra27 commented May 26, 2022

So, just before I pause for a week:

  • An opam-health-check run of this PR convinced me that there should be compatibility macros in socketaddr.h which I pushed. lwt will require a patch (see dra27/lwt@58437ae) because it cribs the socket_domain_table and socket_domain_type values in C. Only one package fails, but more on that in a sec.
  • dra27@768df92 (not at present in this PR) renames uerror to caml_uerror (with unixsupport.h defining an alias) and dra27@fc4f8c3 then goes the whole hog renaming all the symbols to caml_unix_. I tested these separately - only an additional 3 packages break, all for the same reason - it looks like there may be a Mirage library which declares uerror itself rather than using caml/unixsupport.h and so doesn't pick up the compatibility macro. Lwt needs one extra fix for the full caml_unix_ rename, because it uses the unix_exit primitive (see dra27/lwt@12813c0)

However, ocluster.0.1 fails with all three and it does reveal the limitation of these tests - when we build libraries (both C and OCaml) we don't necessarily try linking programs with them! ocluster reveals internal use of cst_to_constr in the extunix package which is otherwise building fine.

I'm not sure this means adding one or both of those additional commits is a bad idea (especially as the fixes where needed in libraries is really quite easy), but a different strategy for bulk testing might be in order. It would be relatively simple to hack Asmlibrarian and Bytelibrarian temporarily to check for the old primitive names, but I won't get to that before 6 June.

Either way, there's still a little bit of time to decide whether this wants to be in alpha1 or not.

@damiendoligez
Copy link
Member

I think we should do it.

I agree.

Copy link
Member

@damiendoligez damiendoligez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good except for one misplaced macro.

#define get_sockaddr caml_unix_get_sockaddr
#define alloc_sockaddr caml_unix_alloc_sockaddr
#define alloc_inet_addr caml_unix_alloc_inet_addr
#define alloc_inet6_addr caml_unix_alloc_inet6_addr
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd move this one inside the #ifdef HAS_IPV6 on line 84.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, yes!

diff --git a/otherlibs/unix/socketaddr.h b/otherlibs/unix/socketaddr.h
index 66e7b02aac..768669d0cc 100644
--- a/otherlibs/unix/socketaddr.h
+++ b/otherlibs/unix/socketaddr.h
@@ -69,7 +69,6 @@ extern "C" {
 #define get_sockaddr caml_unix_get_sockaddr
 #define alloc_sockaddr caml_unix_alloc_sockaddr
 #define alloc_inet_addr caml_unix_alloc_inet_addr
-#define alloc_inet6_addr caml_unix_alloc_inet6_addr
 #endif /* CAML_BUILDING_UNIX */

 extern void caml_unix_get_sockaddr (value mladdr,
@@ -84,7 +83,12 @@ extern value caml_unix_alloc_inet_addr (struct in_addr * inaddr);
 #ifdef HAS_IPV6
 extern value caml_unix_alloc_inet6_addr (struct in6_addr * inaddr);
 #define GET_INET6_ADDR(v) (*((struct in6_addr *) (v)))
-#endif
+
+/* Compatibility definition for the pre-5.0 name of this function */
+#ifndef CAML_BUILDING_UNIX
+#define alloc_inet6_addr caml_unix_alloc_inet6_addr
+#endif /* CAML_BUILDING_UNIX */
+#endif /* HAS_IPV6 */

 #ifdef __cplusplus
 }

@dra27
Copy link
Member Author

dra27 commented Jun 13, 2022

Thanks, @damiendoligez - I also added you to the Changes entry on the last push. Good to go with CI, therefore and to 5.0 and alpha0 🥳

@damiendoligez
Copy link
Member

Before merging, wait for @Octachron to finish his own review.

@dra27
Copy link
Member Author

dra27 commented Jun 13, 2022

ACK

@hannesm
Copy link
Member

hannesm commented Jun 13, 2022

This is a great line of work, thanks for working on it.

Copy link
Member

@Octachron Octachron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can confirm that the changes look good even with a third pair of eyes.

@Octachron Octachron merged commit 000d15d into ocaml:trunk Jun 13, 2022
@@ -34,7 +34,7 @@
UDP (datagram) sockets.
Returns 0 if OK, a nonzero error code if error. */

static int unix_check_stream_semantics(int fd)
static int caml_unix_check_stream_semantics(int fd)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one doesn't need the caml_unix_ prefix, it's static, and the Windows file doesn't have it.
(oops, too late).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fortunately, since the function is static, renaming it is not a breaking change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There're a few static symbols which could get simplified names in future, yes! (there's caml_unix_error_exn as well). Wood for the trees, 'n all that 🙂

Octachron added a commit that referenced this pull request Jun 13, 2022
Ensure C global symbols are consistently prefixed

(cherry picked from commit 000d15d)
dra27 added a commit that referenced this pull request Jun 13, 2022
dra27 added a commit that referenced this pull request Jun 13, 2022
(cherry picked from commit 0dc4924)
@hannesm
Copy link
Member

hannesm commented Jun 13, 2022

Would it be sensible to have a CI job that ensures all global symbols in (e.g.) libasmrun.a contain a caml_ prefix (by using readelf and grep -v)? To avoid any accidental inclusion of symbols without such a prefix?

@dra27
Copy link
Member Author

dra27 commented Jun 13, 2022

Thanks, @hannesm! Thank you also to everyone for the reviewing time on this - these changes in 5.0 are obviously insignificant next to the headline features, but it's very pleasing to have been able to use the major version bump to correct them.

Would it be sensible to have a CI job that ensures all global symbols in (e.g.) libasmrun.a contain a caml_ prefix (by using readelf and grep -v)? To avoid any accidental inclusion of symbols without such a prefix?

There already is 🙂 It was updated as part of this PR. It's done nm rather than readelf but has the same effect.

djs55 added a commit to djs55/opam-repository that referenced this pull request Jun 19, 2022
CHANGES:

* Ensure compatibility with OCaml 5.0, after `uerror` change (ocaml/ocaml#10926) (mirage/mirage-block-unix#115, @dra27)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants