Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bring DTrace/eBPF support back #44550

Closed
RafaelGSS opened this issue Sep 7, 2022 · 16 comments
Closed

Bring DTrace/eBPF support back #44550

RafaelGSS opened this issue Sep 7, 2022 · 16 comments
Labels
feature request Issues that request new features to be added to Node.js. stale

Comments

@RafaelGSS
Copy link
Member

RafaelGSS commented Sep 7, 2022

Hello folks!

At Diagnostics WG we are assessing all the supported diagnostic tooling following https://github.com/nodejs/node/blob/main/doc/contributing/diagnostic-tooling-support-tiers.md. The assessment is almost done, the remaining piece is the nodejs/diagnostics#535 (which will be partially finished on #44549 merge).

However, I have talked directly to @No9 and we share the same strong feeling that Node.js should really keep the support of eBPF available while it can certainly be a game changer when dealing with large applications. For context, eBPF was discussed previously on nodejs/diagnostics#386 and it got no traction, then, closed. Recently, we’ve dropped the support to DTrace on #43652 making the eBPF support out-of-date.

I would like to revert the #43652 and make the eBPF support officially supported by including a few other tracepoints, and obviously, help with any build problems that we might have with this support(maybe, create a @nodejs/ebpf team?).

To move towards official support, what would this feature need?

cc: @nodejs/diagnostics

@tony-go
Copy link
Member

tony-go commented Sep 7, 2022

Thanks for this initiative @RafaelGSS 👏

Regarding this issue (description), maybe could you:

  • explain why "it can certainly be a game changer"
  • add a few links regarding eBPF itself

Regarding the feature, the only criteria I'd like to advocate for is a guide/doc within the Node.js website, but I think you already have that in mind.

@bnoordhuis
Copy link
Member

When you say "eBPF" you mean the static USDT probes, right?

I removed them because they were an ongoing maintenance drag and because, to a first approximation, no one uses them. I literally couldn't find anyone who did.

Functionality-wise, the V8 inspector protocol can do everything the probes did, and much more besides.

@RafaelGSS
Copy link
Member Author

RafaelGSS commented Sep 7, 2022

When you say "eBPF" you mean the static USDT probes, right?

For the current functionality, yes. But, I'd include dynamic probes too.

Functionality-wise, the V8 inspector protocol can do everything the probes did, and much more besides.

V8 Inspector can do it locally, but production-wise the V8 inspector protocol adds a considerable overhead when enabled. When diagnosing production apps, USDT probes are definitely a good way to go.

I removed them because they were an ongoing maintenance drag and because, to a first approximation, no one uses them. I literally couldn't find anyone who did.

Could you list a few examples? I'm happy to help with that maintenance.

@Qard
Copy link
Member

Qard commented Sep 7, 2022

The inspector protocol is not at all suitable for production use. The overhead of just turning it on is enormous. Static probes, on the other hand, have basically zero impact until something listens to them.

I'd agree about the maintenance issue because the previous implementation lacked any coherence and we should probably have some sort of cross-platform compatibility layer for it to work across platforms without so much duplication.

@bnoordhuis
Copy link
Member

I'd include dynamic probes too

Those Just Work, right? That's what the node-dtrace-provider package is all about.

Could you list a few examples?

Sure. They broke like clockwork with every major (and sometimes minor) V8 upgrade.

I'm happy to help with that maintenance.

Be careful what you're volunteering for! Go over to the node-v8 repo to see what a massive pain every upgrade is. Shout out to unsung hero @targos.

@bnoordhuis
Copy link
Member

By the way, one more indicator that no one used the static probes: the introduction of V8's Ignition bytecode interpreter in 2017 broke v8ustack.d, the thingy that prints JS stack frames. No one complained.

@RafaelGSS
Copy link
Member Author

Honestly, I don't think it's a good decision to just drop it because it's breaking for some weird reason. I can certainly agree with removing it if no one helps in that maintenance, that's totally fair.

However, it's not fair to say that no one is using it, the reason might be just because there is no content about it using Node.js. Developers might not be using it, because they are unaware of this possibility.

Would be great to have some examples of breaking builds due to DTrace support. Collecting all the content we have in order to measure what would need to bring that support back in a stable way (obviously, not adding extra work on each release) is reasonable IMHO.

@bnoordhuis
Copy link
Member

it's not fair to say that no one is using it, the reason might be just because there is no content about it

Microsoft advertised ETW support when it landed (which piggybacks on the dtrace infrastructure) but that doesn't seem to have lead to a measurable uptick. The Windows user base is big so that's indicative.

@RafaelGSS
Copy link
Member Author

Microsoft advertised ETW support when it landed (which piggybacks on the dtrace infrastructure) but that doesn't seem to have lead to a measurable uptick. The Windows user base is big so that's indicative.

IMHO a really small piece of the market uses Windows + Node.js on a production server, which is the real value of this support.

@Qard
Copy link
Member

Qard commented Sep 10, 2022

Yeah, lots of people dev on Windows, very few run production workloads on Windows. Prod is where static probes matter, because turning on the inspector in prod will take down the server in most heavy workloads.

@jasnell
Copy link
Member

jasnell commented Sep 10, 2022

As @bnoordhuis points out the stuff that was removed was removed because it was simply unmaintained. I'm perfectly fine with that stuff coming back in so long as it is going to be actively maintained, otherwise it's just tech debt as soon as it lands.

@RafaelGSS
Copy link
Member Author

Exactly @jasnell, that's the goal of this issue. I'm trying to collect all the pain of maintaining DTrace and discuss better ways to support it without being a tech debt. If we bring this support back, I want to make sure that at the very least we'll have guides and content about it.

@Qard
Copy link
Member

Qard commented Sep 10, 2022

Before bringing it back, I think we need a compatibility layer to have one consistent interface across platforms. Something that is for static probes what libuv is for async I/O. Without such a thing I don't see this being stable or maintained enough to warrant its return.

I think we also would need a more clear vision on exactly what we should have probes for, what data should be shared through them, and document all that extensively. The past stuff was so poorly documented that it was essentially useless to anyone that hadn't dug through the code themselves to figure out what was actually there.

@tniessen tniessen added the feature request Issues that request new features to be added to Node.js. label Sep 10, 2022
@RafaelGSS
Copy link
Member Author

RafaelGSS commented Oct 2, 2022

UPDATE:

We've talked about it at the Node.js Collab Summit and to make it happen we would need to create a proposal covering at least the following points:

  • How the current pain points would be solved?
  • What are the real-world use cases?
  • What are the needed static probes?
  • Who will support it? Is there any company behind the support?

@github-actions
Copy link
Contributor

github-actions bot commented Apr 1, 2023

There has been no activity on this feature request for 5 months and it is unlikely to be implemented. It will be closed 6 months after the last non-automated comment.

For more information on how the project manages feature requests, please consult the feature request management document.

@github-actions github-actions bot added the stale label Apr 1, 2023
@RafaelGSS
Copy link
Member Author

While I really believe we should bring it back, I don't have the bandwidth to work on this for the next months. Therefore, I'll close it. If someone wants to take lead on that, please see my previous comment and feel free to ping me for guidance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request Issues that request new features to be added to Node.js. stale
Projects
Development

No branches or pull requests

6 participants