Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cost of Key, Value abstractions #1642

Open
cijothomas opened this issue Mar 25, 2024 · 2 comments
Open

Cost of Key, Value abstractions #1642

cijothomas opened this issue Mar 25, 2024 · 2 comments
Assignees

Comments

@cijothomas
Copy link
Member

cijothomas commented Mar 25, 2024

Benchmarks from here shows that the cost of creating abstractions like Key,Value in opentelemetry crate is non-trivial.
Take the example below:

c.bench_function("CreateOTelKeyValue", |b| {
        b.iter(|| {
            let _v1 = black_box(KeyValue::new("attribute1", "value1"));
        });
    });

    c.bench_function("CreateTupleKeyValue", |b| {
        b.iter(|| {
            let _v1 = black_box(("attribute1", "value1"));
        });
    });

Its 7.7199 ns vs 786.74 ps (10x!)

It may be very-hard or even impossible to avoid the overhead. Opening an issue to facilitate further discussions. Some options include

  1. Do nothing, and accept this as the cost ("OTel tax"). This is simply listed only, not really an option we can afford to take!
  2. Make Key,Value etc. as Traits with no-op implementation in opentelemetry crate, with actual implementations moved to the sdk crate. This should avoid/reduce perf hit when just opentelemetry is used. Alternatively, this may also be achieved via some feature-flag with no-ops as default feature in API. The SDK enables the feature-flag in the API, so users without SDK enabled will get high perf.
  3. See if we can steal/borrow idea from tracing crate, which uses macros that avoid evaluating the attributes completely, unless enabled. The prior art here is definitely worth exploring further.

I also want to emphasize why is this issue is relevant/important! For an application owner who uses OTel, this may not be a significant concern. But this would be a serious (probably blocker) for a library author to bet on OpenTelemetry as the instrumentation API. This is because, by simply enabling OTel API calls, the library's own performance will degrade, even when nobody is interested in the telemetry (i.e the final app owner does not enable OTel itself). Whether the few nanosecs overhead is significant or not depends on the library's own latency without OTel! If it is in the order of milliseconds, then probably not a big issue, but Rust is used in perf sensitive scenarios where it won't be surprising to find libraries that perform its core operation in nanosecs order.

@cijothomas
Copy link
Member Author

@lalitb has agreed to dig more into where exactly are the overheads arising from, and see if we can bring it to more acceptable levels, before attempting suggestions 2,3 from above or something-else .

@cijothomas
Copy link
Member Author

@lalitb has agreed to dig more into where exactly are the overheads arising from, and see if we can bring it to more acceptable levels, before attempting suggestions 2,3 from above or something-else .

From SIG/Community meeting yesterday:
Lalit has confirmed that the perf hit is not due to any accidental copying/cloning or heap allocations. We are not doing these.
These are costs added up by having several redirections/enums (eg: Value(enum)->StringValue->OtelString(enum)->str).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants