Support ruby 3.2 + 3.3+ #17

jpcamara · 2023-12-28T16:13:30Z

This is still WIP, but I wanted to open it up so the general approach is clear. Main oddities for me right now:

Having to free the data at the end of tracing. I guess that's ok? It was the only way I could figure out how to get the new rb_internal_thread_specific_* api working properly without a memory leak
Name is not going to work correctly in ruby 3.3+ as this code is right now, since the events don't guarantee to fire from their associated thread. Ideally we'd get the name from the thread, but that is kind of tricky since ruby does not expose a method for it, and Thread.list doesn't contain any threads that have exited at the end of the tracing
I know object_id is considered inefficient (Reduce usage of object_id in core gems ruby/ruby#9276), but the overhead here seems minimal enough? I like it better as an identifier than a serial id and it lets MN threads work the same as native threads

Commit notes

There are lots of changes to how Ruby 3.3+ works with thread instrumentation.
There is no guarantee which thread will trigger hooks. For instance, in Ruby 3.3 the RUBY_INTERNAL_THREAD_EVENT_STARTED event is always called from a different thread
As a result, you have to use the event_data->thread member in conjunction with rb_internal_thread_specific_get/set to emulate "thread local" data tracking. You can't use actual thread locals anymore because the hooks are not fired on the same thread
As a result, we can't reliably retrieve the native thread id because ruby does not expose a way to do this and using system calls won't work on different threads. Now uses object_id in ruby 3.3+
These changes also work better with the M:N thread scheduler, since even if we could access native thread ids the M:N scheduler mostly reuses threads so the associated thread ids would be wrong
Because the events can fire from different threads, the current way of getting the thread name (pthread_getname_np), will probably be wrong
Using rb_thread_current() in the instrumentation hook calls causes a segfault, but it seems to work fine in the GC methods and when starting/stopping tracing
Because we malloc the thread data in ruby 3.3, we free it up when we stop tracing, or whenever a thread exits

* There are lots of changes to how Ruby 3.3+ works with thread instrumentation. * There is no guarantee which thread will trigger hooks. For instance, in Ruby 3.3 the RUBY_INTERNAL_THREAD_EVENT_STARTED event is always called from a different thread * As a result, you have to use the `event_data->thread` member in conjunction with rb_internal_thread_specific_get/set to emulate "thread local" data tracking. You can't use actual thread locals anymore because the hooks are not fired on the same thread * As a result, we can't reliably retrieve the native thread id because ruby does not expose a way to do this and using system calls won't work on different threads. Now uses object_id in ruby 3.3+ * These changes also work better with the M:N thread scheduler, since even if we could access native thread ids the M:N scheduler mostly reuses threads so the associated thread ids would be wrong * Because the events can fire from different threads, the current way of getting the thread name (pthread_getname_np), will probably be wrong * Using rb_thread_current() in the instrumentation hook calls causes a segfault, but it seems to work fine in the GC methods and when starting/stopping tracing * Because we malloc the thread data in ruby 3.3, we free it up when we stop tracing, or whenever a thread exits

jpcamara · 2023-12-28T16:38:41Z

~~Just noticed a mistake I'll fix around the thread access in Ruby 3.3, will push that up later today~~

Windows is still broken which I thought was turned off since it would never work but I'll check it another time

jpcamara · 2023-12-31T23:04:57Z

Any chance you could weigh in on the approach @byroot?

In terms of the need to manually free, I've started trying to use TypedData_Make_Struct, but don't have it fully working yet and I'm not sure it's the right approach since that seems more meant for externally exposed structs?

Also not sure the best approach for getting thread names to display. Most of my testing involves the threads exiting before the block can pull them from Thread.list so the names are always either wrong or empty

amomchilov · 2024-01-02T17:28:08Z

ext/gvl_tracing_native_extension/gvl_tracing.c

+    thread_meta = (thread_meta_t*)malloc(sizeof(thread_meta_t));
+    memset(thread_meta, 0, sizeof(thread_meta_t));
+    thread_meta->thread = thread;
+    thread_meta->thread_object_id = RB_NUM2INT(rb_obj_id(thread));


Could Thread#native_thread_id work here?

Although there's a stipulation that might make it unusable here:

Note
If the thread is not associated yet or already deassociated with a native thread, it returns nil. If the Ruby implementation uses M:N thread model, the ID may change depending on the timing.

Thanks for the input @amomchilov !

There are a couple obstacles, unfortunately.

First, as far as I can tell there is no way to safely access the native_thread_id method from C without the GVL. The thread api exposed to C extension code is pretty limited, that is safe to run without the GVL. With the way this code works it is always updating the file directly during the hook events, which have no GVL guarantees. Part of me is hoping you'll tell me i'm wrong here and there is a way to invoke thread methods safely in this context :D

Second is like you mentioned - ideally we'd solve M:N support here as well. The M:N model is actually even more strict than that note, it always returns nil in the current implementation.

Makes sense! I’m just squinting and doing loose pattern matching here. I have no expertise in this domain :)

There aren’t all that many threads. Touching their object IDs would be totally fine.

Since for M:N threads the notion of a native thread id no longer exists in a lot of cases, I think it's a good idea to use the object id instead.

Having said that, I expect that for a while most folks will not be using M:N threads for Ruby 3.3 (e.g. most people won't be using ractors + won't be using M:N scheduling for the main ractor).

It would be nice to still use the native thread for such cases, methinks!

Having said all that, I realize there's already quite a few moving parts for this PR so I'm perfectly fine with starting with object id for all on 3.3 and punt on doing a fancier thing ;)

casperisfine

I haven't reviewed too deeply, but the changes overall make sense to me.

For a sec I was worried the state allocation in the callback may be racy, but actually it's fine because a given thread can't process two events concurrently, so seems OK to me.

casperisfine · 2024-01-11T12:43:46Z

ext/gvl_tracing_native_extension/gvl_tracing.c

+    thread_meta = (thread_meta_t*)malloc(sizeof(thread_meta_t));
+    memset(thread_meta, 0, sizeof(thread_meta_t));


Suggested change

thread_meta = (thread_meta_t*)malloc(sizeof(thread_meta_t));

memset(thread_meta, 0, sizeof(thread_meta_t));

thread_meta = ZALLOC(thread_meta_t);

casperisfine · 2024-01-11T12:45:51Z

ext/gvl_tracing_native_extension/gvl_tracing.c

+  #elif HAVE_GETTID
+    native_thread_id = gettid();
+  #else
+    native_thread_id = thread_local_meta->current_thread_serial; // TODO: Better fallback for Windows?


Not sure if you know, but the GVL API is a noop on Windows

Uh, I didn't know about that!

In that case, maybe go with removing current_thread_serial and adding an ifdef to make the gem become a no-op on windows may be the way to go.

(I think it's still nice if the gem builds/installs, so that folks don't need to have conditions on their Gemfiles)

jpcamara · 2024-01-13T17:16:04Z

I haven't reviewed too deeply, but the changes overall make sense to me.

For a sec I was worried the state allocation in the callback may be racy, but actually it's fine because a given thread can't process two events concurrently, so seems OK to me.

@casperisfine Thanks for the feedback! I'll take a look at GVL tracing and see if I can clean this up a bit based on that.

Do you have any thoughts on a way to track thread names that die before we can call Thread.list at the end?

byroot · 2024-01-14T07:52:18Z

ext/gvl_tracing_native_extension/gvl_tracing.c

 // Used to mark function arguments that are deliberately left unused
 #ifdef __GNUC__
  #define UNUSED_ARG  __attribute__((unused))
 #else
  #define UNUSED_ARG
 #endif

+typedef struct {
+  VALUE thread;


I just noticed this. If you store a VALUE reference in that struct, you MUST mark it. Otherwise it can be GCed and the object slot re-used, or GC compaction could move that object in another slot.

But if you do that, it also solves your question about how to get the thread name.

I just noticed this. If you store a VALUE reference in that struct, you MUST mark it. Otherwise it can be GCed and the object slot re-used, or GC compaction could move that object in another slot.

@byroot Thanks for the feedback here! Can you provide an example of how to do this in the context of this code? I've tried piecing it together hopping between gvltools and cruby, but hours in and I haven't gotten anything working. I can keep chipping away, but tbh the appropriate ways of when and where to mark in cruby are still pretty opaque to me.

Same goes for things like TypedData_Make_Struct. I thought that might be a good approach because of the free/mark options it has, but I landed on this current version because I just couldn't get that working.

But if you do that, it also solves your question about how to get the thread name.

I think this lines up with @ivoanjo's comment:

(E.g. keep it in a C array, that we need to mark but otherwise can operate on without the GVL)

So i'm assuming a combo of mark + a large-ish C array (maybe 1024 like the CRuby instrumentation tests?) would suffice to hold onto the metadata I need so I can get the thread names once the GvlTracing block finishes.

I am curious - for something like vernier, which presumably would be used to wrap long running ruby processes, how would you expect it to efficiently keep track of thread names? For gvl-tracing, you're not terribly likely to be running 1k+ threads. But for something like vernier there really are no limits to what the code running inside of it might do.

@jpcamara Not sure if you know this guide -- https://silverhammermba.github.io/emberb/c/#data -- it has a pretty good guide of how to setup a TypedData object, including the marking.

Currently, gvl-tracing is all class methods ("static"), but I guess to be able to mark we'd need to introduce a TypedData object, so maybe something like a GvlTracing::State object, which would contain inside it the threads array, that would then get marked.

I am curious - for something like vernier, which presumably would be used to wrap long running ruby processes, how would you expect it to efficiently keep track of thread names? For gvl-tracing, you're not terribly likely to be running 1k+ threads. But for something like vernier there really are no limits to what the code running inside of it might do.

This is a good point. I haven't looked at how Vernier does it but we actually don't need to necessarily do it at the end of the tracing. We could check from time to time if a name is available (e.g. to avoid doing it on every event), and then render it and move on, rather than waiting until the end.

The "wait until the end" strategy was just a bit of a simplification ;)

ivoanjo

Hey!

First of all, big thanks for working on this! And for the patience with my sloooow review haha >_>

Having to free the data at the end of tracing. I guess that's ok? It was the only way I could figure out how to get the new rb_internal_thread_specific_* api working properly without a memory leak

This seems very fine! In fact, it's better than the previous design, where the data would "live forever" as part of the native thread's thread-local stuff.

Name is not going to work correctly in ruby 3.3+ as this code is right now, since the events don't guarantee to fire from their associated thread. Ideally we'd get the name from the thread, but that is kind of tricky since ruby does not expose a method for it, and Thread.list doesn't contain any threads that have exited at the end of the tracing

Yeah... This is a bit annoying. Maybe something similar to @casperisfine suggested would work: we could keep references to all the threads we've seen (even those that die) and then in stop, we'd use that instead of Thread.list to iterate and get names from.

(E.g. keep it in a C array, that we need to mark but otherwise can operate on without the GVL)

I know object_id is considered inefficient (Reduce usage of object_id in core gems ruby/ruby#9276), but the overhead here seems minimal enough? I like it better as an identifier than a serial id and it lets MN threads work the same as native threads

I think if you create enough threads for object_id calls to add any measurable overhead, you're probably already holding it wrong 🤣 .

There is no guarantee which thread will trigger hooks. For instance, in Ruby 3.3 the RUBY_INTERNAL_THREAD_EVENT_STARTED event is always called from a different thread

I'm curious -- is that the case even when not using M:N scheduling on the main ractor? What thread does it get called from? 🤔

ivoanjo · 2024-01-20T12:58:44Z

ext/gvl_tracing_native_extension/gvl_tracing.c

+  void *data = rb_internal_thread_specific_get(thread, gvl_tracing_key);
+  thread_meta_t* thread_meta = NULL;


Minor: Assigning directly to thread_meta simplifies a few things below I think :)

Suggested change

void *data = rb_internal_thread_specific_get(thread, gvl_tracing_key);

thread_meta_t* thread_meta = NULL;

thread_meta_t* data = (thread_meta_t *) rb_internal_thread_specific_get(thread, gvl_tracing_key);

ivoanjo · 2024-01-20T13:10:24Z

ext/gvl_tracing_native_extension/gvl_tracing.c

+    thread_meta = (thread_meta_t*)malloc(sizeof(thread_meta_t));
+    memset(thread_meta, 0, sizeof(thread_meta_t));
+    thread_meta->thread = thread;
+    thread_meta->thread_object_id = RB_NUM2INT(rb_obj_id(thread));


Since for M:N threads the notion of a native thread id no longer exists in a lot of cases, I think it's a good idea to use the object id instead.

Having said that, I expect that for a while most folks will not be using M:N threads for Ruby 3.3 (e.g. most people won't be using ractors + won't be using M:N scheduling for the main ractor).

It would be nice to still use the native thread for such cases, methinks!

Having said all that, I realize there's already quite a few moving parts for this PR so I'm perfectly fine with starting with object id for all on 3.3 and punt on doing a fancier thing ;)

ivoanjo · 2024-01-20T13:11:17Z

ext/gvl_tracing_native_extension/gvl_tracing.c

+    thread_meta->thread = thread;
+    thread_meta->thread_object_id = RB_NUM2INT(rb_obj_id(thread));


I'm thinking there's a few places where we're initializing the thread-local structure. What do you think of adding a function to initialize it that can be used throughout?

ivoanjo · 2024-01-20T16:16:23Z

ext/gvl_tracing_native_extension/gvl_tracing.c

+static inline void initialize_thread_id(thread_meta_t *thread_meta) {
+  thread_meta->current_thread_seen = true;
+  thread_meta->thread_object_id = RB_NUM2INT(rb_obj_id(thread_meta->thread));
+}


This seems somewhat redundant/duplicate from current_thread_meta above?

ivoanjo · 2024-01-20T16:21:11Z

ext/gvl_tracing_native_extension/gvl_tracing.c

+  #elif HAVE_GETTID
+    native_thread_id = gettid();
+  #else
+    native_thread_id = thread_local_meta->current_thread_serial; // TODO: Better fallback for Windows?


Uh, I didn't know about that!

In that case, maybe go with removing current_thread_serial and adding an ifdef to make the gem become a no-op on windows may be the way to go.

(I think it's still nice if the gem builds/installs, so that folks don't need to have conditions on their Gemfiles)

ivoanjo · 2024-01-20T16:21:24Z

ext/gvl_tracing_native_extension/gvl_tracing.c

+static thread_meta_t* current_thread_meta(VALUE thread) {
+  return &thread_local_meta;
+}


I really like this approach!

ivoanjo · 2024-01-20T16:27:24Z

lib/gvl-tracing.rb

@@ -49,6 +49,10 @@ def start(file)

    def stop
      thread_list = Thread.list
+      thread_list.each { |t|
+        GvlTracing._clean(t)
+        thread_label(t)


Why is calling thread_label needed here? 🤔

byroot · 2024-01-20T16:43:40Z

I'm curious -- is that the case even when not using M:N scheduling on the main ractor? What thread does it get called from? 🤔

Yes, it's fired from the parent thread. That simplified the implementation quite a bit.

jpcamara · 2024-01-21T04:43:19Z

Hey!

First of all, big thanks for working on this! And for the patience with my sloooow review haha >_>

Having to free the data at the end of tracing. I guess that's ok? It was the only way I could figure out how to get the new rb_internal_thread_specific_* api working properly without a memory leak

This seems very fine! In fact, it's better than the previous design, where the data would "live forever" as part of the native thread's thread-local stuff.

Name is not going to work correctly in ruby 3.3+ as this code is right now, since the events don't guarantee to fire from their associated thread. Ideally we'd get the name from the thread, but that is kind of tricky since ruby does not expose a method for it, and Thread.list doesn't contain any threads that have exited at the end of the tracing

Yeah... This is a bit annoying. Maybe something similar to @casperisfine suggested would work: we could keep references to all the threads we've seen (even those that die) and then in stop, we'd use that instead of Thread.list to iterate and get names from.

(E.g. keep it in a C array, that we need to mark but otherwise can operate on without the GVL)

I know object_id is considered inefficient (Reduce usage of object_id in core gems ruby/ruby#9276), but the overhead here seems minimal enough? I like it better as an identifier than a serial id and it lets MN threads work the same as native threads

I think if you create enough threads for object_id calls to add any measurable overhead, you're probably already holding it wrong 🤣 .

There is no guarantee which thread will trigger hooks. For instance, in Ruby 3.3 the RUBY_INTERNAL_THREAD_EVENT_STARTED event is always called from a different thread

I'm curious -- is that the case even when not using M:N scheduling on the main ractor? What thread does it get called from? 🤔

Thanks for the feedback! I'll make some changes in the next few days so we can get this buttoned up 🎉

jpcamara · 2024-02-19T03:33:44Z

Closing in favor of #18

@byroot

Fix #17 Paired with @byroot This PR adds support for Ruby 3.3 by using the new `rb_internal_thread_specific_get` which allows accessing data specific to a Ruby thread without requiring the GVL to be locked. It also handles N:M threads automatically. More info here: ruby/ruby@352a885 In Ruby 3.3, the thread events don't necessarily run from the thread they correspond to, so the thread corresponding to the event is sent in `event_data`, and now `GT_LOCAL_STATE` needs to take the thread too. To handle GC, the thread specific data is wrapped into a TypedData struct and stored in the Ruby thread as a instance variable. This requires getting the GVL, so we have an `allocate` flag on `GT_LOCAL_STATE` too, which is false when we're not running from the thread and can't allocate. The hook only runs from its corresponding thread for the `RUBY_INTERNAL_THREAD_EVENT_STARTED` and `RUBY_INTERNAL_THREAD_EVENT_RESUMED` events, so we can allocate there. If the local state hasn't been initialized yet, we just return early. This means we would skipping events from already existing threads that haven't been resumed yet. To avoid this, we initialize all threads when tracing starts. We do this from Ruby, which ensures we have the GVL. We added a spec for that, which passes on main on 3.2, but not on 3.3. To support N:M threads (and differentiate events from different threads but the same native thread), instead of returning the native thread id, we know return the address of the Ruby thread in `tid` (on Ruby 3.3).

VALUE thread is needed for ruby 3.3, accidentally removed it

e1bf824

amomchilov reviewed Jan 2, 2024

View reviewed changes

casperisfine mentioned this pull request Jan 11, 2024

Fix compatibility with Ruby 3.3 Shopify/gvltools#12

Merged

casperisfine reviewed Jan 11, 2024

View reviewed changes

byroot reviewed Jan 14, 2024

View reviewed changes

ivoanjo reviewed Jan 20, 2024

View reviewed changes

etiennebarrie mentioned this pull request Feb 6, 2024

Support for 3.3 and N:M threads #18

Merged

jpcamara closed this Feb 19, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support ruby 3.2 + 3.3+ #17

Support ruby 3.2 + 3.3+ #17

jpcamara commented Dec 28, 2023 •

edited

jpcamara commented Dec 28, 2023 •

edited

jpcamara commented Dec 31, 2023

amomchilov Jan 2, 2024

jpcamara Jan 3, 2024

amomchilov Jan 3, 2024 •

edited

ivoanjo Jan 20, 2024 •

edited

casperisfine left a comment

casperisfine Jan 11, 2024

casperisfine Jan 11, 2024

ivoanjo Jan 20, 2024

jpcamara commented Jan 13, 2024

byroot Jan 14, 2024

jpcamara Feb 3, 2024 •

edited

ivoanjo Feb 18, 2024

ivoanjo left a comment

ivoanjo Jan 20, 2024

ivoanjo Jan 20, 2024 •

edited

ivoanjo Jan 20, 2024

ivoanjo Jan 20, 2024

ivoanjo Jan 20, 2024

ivoanjo Jan 20, 2024

ivoanjo Jan 20, 2024

byroot commented Jan 20, 2024

jpcamara commented Jan 21, 2024

jpcamara commented Feb 19, 2024

		thread_meta = (thread_meta_t*)malloc(sizeof(thread_meta_t));
		memset(thread_meta, 0, sizeof(thread_meta_t));

	thread_meta = (thread_meta_t*)malloc(sizeof(thread_meta_t));
	memset(thread_meta, 0, sizeof(thread_meta_t));
	thread_meta = ZALLOC(thread_meta_t);

		void *data = rb_internal_thread_specific_get(thread, gvl_tracing_key);
		thread_meta_t* thread_meta = NULL;

	void *data = rb_internal_thread_specific_get(thread, gvl_tracing_key);
	thread_meta_t* thread_meta = NULL;
	thread_meta_t* data = (thread_meta_t *) rb_internal_thread_specific_get(thread, gvl_tracing_key);

		thread_meta->thread = thread;
		thread_meta->thread_object_id = RB_NUM2INT(rb_obj_id(thread));

Support ruby 3.2 + 3.3+ #17

Support ruby 3.2 + 3.3+ #17

Conversation

jpcamara commented Dec 28, 2023 • edited

Commit notes

jpcamara commented Dec 28, 2023 • edited

jpcamara commented Dec 31, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

amomchilov Jan 3, 2024 • edited

Choose a reason for hiding this comment

ivoanjo Jan 20, 2024 • edited

Choose a reason for hiding this comment

casperisfine left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jpcamara commented Jan 13, 2024

Choose a reason for hiding this comment

jpcamara Feb 3, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ivoanjo left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ivoanjo Jan 20, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

byroot commented Jan 20, 2024

jpcamara commented Jan 21, 2024

jpcamara commented Feb 19, 2024

jpcamara commented Dec 28, 2023 •

edited

jpcamara commented Dec 28, 2023 •

edited

amomchilov Jan 3, 2024 •

edited

ivoanjo Jan 20, 2024 •

edited

jpcamara Feb 3, 2024 •

edited

ivoanjo Jan 20, 2024 •

edited