Overview of Major blockers for GA/Stable Release #1572

Open
cijothomas opened this issue Feb 23, 2024 · 0 comments

A lot has happened over the last few months in this repo: we added support for the Logs signal, rewrote Metrics to match the new spec, achieved plenty of performance improvements, spun up the contrib repo, and a lot more. Big thanks to everyone who made it happen!

Still, we are not in a position to declare any signal stable yet, and this issue is meant to highlight the challenges/issues that are blocking a stable release. This is not a comprehensive list of all the issues, but rather an overview that highlights only the key challenges, organized by signal. Potential timelines are not discussed in this issue and will be announced separately.

Logs

API:

  • The opentelemetry crate does not provide an end-user-facing logging API but instead offers a logging-bridge API. This approach simplifies the process of stabilizing the logging signal, although it requires addressing issues related to the API making overly opinionated decisions.
  • For end-users, the tracing crate is recommended for logging, and a logging-bridge for tracing is planned to be maintained within the repository itself, though the final decision can be influenced by the outcome of OTel Tracing API vs Tokio-Tracing API.

SDK:

  • The primary challenge is ensuring excellent performance. While there's no universal benchmark for this, the goal is to minimize the overhead introduced by the OpenTelemetry SDK itself. We should aim for performance comparable to using tracing with a custom subscriber that publishes to OTLP, bypassing the OTel SDK.
  • A specific issue arises with log correlation with traces, especially when the tracing crate is used instead of the OTel Tracing API. This issue also affects the Traces signal and will likely be resolved with a unified solution. OTel Tracing API vs Tokio-Tracing API is intermixed with this to a certain extent.
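
To make the correlation problem concrete, here is a minimal, std-only sketch of the mechanism: a log record can only carry trace and span IDs if something has made the active span visible at the point where the log is emitted. All names here (`SpanContext`, `enter_span`, `emit_log`) are hypothetical and are not the opentelemetry crate's API. The ID sizes follow the W3C Trace Context format.

```rust
use std::cell::Cell;

/// Hypothetical span identifiers (16-byte trace ID, 8-byte span ID).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SpanContext {
    pub trace_id: [u8; 16],
    pub span_id: [u8; 8],
}

thread_local! {
    // The span that is "current" on this thread, if any.
    static CURRENT_SPAN: Cell<Option<SpanContext>> = Cell::new(None);
}

/// Guard that restores the previously current span when dropped.
pub struct SpanGuard {
    previous: Option<SpanContext>,
}

impl Drop for SpanGuard {
    fn drop(&mut self) {
        CURRENT_SPAN.with(|current| current.set(self.previous));
    }
}

/// Marks `ctx` as the current span until the returned guard is dropped.
pub fn enter_span(ctx: SpanContext) -> SpanGuard {
    let previous = CURRENT_SPAN.with(|current| current.replace(Some(ctx)));
    SpanGuard { previous }
}

/// Hypothetical log record: correlated only if a span was current.
#[derive(Debug)]
pub struct LogRecord {
    pub body: String,
    pub span: Option<SpanContext>,
}

/// Builds a log record, attaching the current span context if one exists.
pub fn emit_log(body: &str) -> LogRecord {
    LogRecord {
        body: body.to_string(),
        span: CURRENT_SPAN.with(|current| current.get()),
    }
}
```

In this sketch, a span created through some other mechanism that never calls `enter_span` leaves `emit_log` with nothing to attach. That is roughly the situation when spans come from the tracing crate while the bridge looks for an OTel context.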

Metrics

API:

  • The API is believed to have no significant challenges remaining, but a comprehensive review is necessary to confirm that it meets performance objectives and complies with the spec.

SDK:

  • Substantial work is required to achieve performance targets. Comparisons with OTel .NET Metrics reveal a large performance gap (3-4M/sec throughput for Histograms vs. 16M/sec in OTel .NET), indicating the need for optimizations such as eliminating attribute de-duplication and avoiding sorting. Reducing contention between concurrent updates, and between updates and collection, is also crucial, as shown here.
  • Beyond performance, memory efficiency tests are needed, particularly for Delta temporality. The goal should be to ensure predictable memory overhead, with pre-allocation of necessary memory based on cardinality limits. A feedback mechanism is also needed so users can see whether they are about to hit the limits or are over-allocating and wasting memory.
  • Additional requirements include support for metrics without relying on an async runtime such as tokio, and addressing shutdown-related issues, which may require non-trivial changes.
  • Testing coverage must be significantly improved to identify and resolve any bugs when using advanced capabilities (like temporality conversion, views etc.) through more targeted testing.
  • Implementing advanced features like Exemplars can wait until after the first stable release.
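
The memory-efficiency goal above (pre-allocation based on a cardinality limit, plus feedback to the user) could look roughly like the following std-only sketch. It is an illustration under assumptions, not the SDK's design: `BoundedCounters` and its methods are hypothetical names, and attribute sets are simplified to a single string key.

```rust
use std::collections::HashMap;

/// Hypothetical counter store with a fixed cardinality limit.
pub struct BoundedCounters {
    limit: usize,
    points: HashMap<String, u64>,
    overflow: u64,
}

impl BoundedCounters {
    /// Pre-allocates room for `limit` distinct attribute sets up front.
    pub fn with_cardinality_limit(limit: usize) -> Self {
        let limit = limit.max(1); // avoid a zero-sized store
        Self {
            limit,
            points: HashMap::with_capacity(limit),
            overflow: 0,
        }
    }

    /// Adds `value` to the series for `attributes`; once the limit is
    /// reached, new series are folded into a single overflow bucket.
    pub fn add(&mut self, attributes: &str, value: u64) {
        if let Some(existing) = self.points.get_mut(attributes) {
            *existing += value;
        } else if self.points.len() < self.limit {
            self.points.insert(attributes.to_string(), value);
        } else {
            self.overflow += value;
        }
    }

    /// Feedback: fraction of the cardinality limit currently in use.
    pub fn utilization(&self) -> f64 {
        self.points.len() as f64 / self.limit as f64
    }

    /// Feedback: total value recorded after the limit was hit.
    pub fn overflowed(&self) -> u64 {
        self.overflow
    }

    /// Delta-style collection: returns the points recorded since the last
    /// collect and resets state. `drain` keeps the allocated capacity.
    pub fn collect_delta(&mut self) -> Vec<(String, u64)> {
        self.overflow = 0;
        let mut out: Vec<(String, u64)> = self.points.drain().collect();
        out.sort(); // deterministic order for the caller
        out
    }
}
```

A utilization that stays near zero suggests the limit (and the pre-allocation) is larger than needed, while a non-zero overflow count shows the limit has been hit.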

Traces

API:

  • The Tracing API is relatively mature but requires improved interoperability with the tracing crate. Addressing this interoperability issue is a top priority before the Tracing API can be considered stable. OTel Tracing API vs Tokio-Tracing API describes this in detail.

SDK:

  • General performance improvements are necessary. The Tracing SDK also faces shutdown-related bugs, suggesting that solutions may be broadly applicable across all signals.
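
As a rough illustration of what a runtime-independent, cleanly shutting-down background task might involve, the sketch below runs a periodic export loop on a plain OS thread and performs a final flush on shutdown. It uses only the standard library. `PeriodicWorker` and the `export` callback are hypothetical and do not reflect the SDK's actual processor or reader types.

```rust
use std::sync::mpsc::{self, RecvTimeoutError, Sender};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical background worker that calls `export` once per interval
/// on a dedicated thread, with no async runtime involved.
pub struct PeriodicWorker {
    stop: Option<Sender<()>>,
    handle: Option<JoinHandle<u32>>,
}

impl PeriodicWorker {
    pub fn start<F>(interval: Duration, mut export: F) -> Self
    where
        F: FnMut() + Send + 'static,
    {
        let (stop, stop_rx) = mpsc::channel::<()>();
        let handle = thread::spawn(move || {
            let mut exports = 0u32;
            loop {
                match stop_rx.recv_timeout(interval) {
                    // No stop request within the interval: regular export.
                    Err(RecvTimeoutError::Timeout) => {
                        export();
                        exports += 1;
                    }
                    // Stop requested (or owner dropped): flush once, exit.
                    Ok(()) | Err(RecvTimeoutError::Disconnected) => {
                        export();
                        exports += 1;
                        break;
                    }
                }
            }
            exports
        });
        Self {
            stop: Some(stop),
            handle: Some(handle),
        }
    }

    /// Requests a stop, waits for the final flush, and returns the number
    /// of exports performed. Returns `None` if already shut down.
    pub fn shutdown(&mut self) -> Option<u32> {
        let stop = self.stop.take()?;
        let _ = stop.send(());
        self.handle.take()?.join().ok()
    }
}

impl Drop for PeriodicWorker {
    fn drop(&mut self) {
        // Best-effort flush if the user never called `shutdown`.
        let _ = self.shutdown();
    }
}
```

Making `shutdown` idempotent and having it block until the final flush completes are design choices of this sketch. The same pattern could in principle be shared across signals, which matches the observation that a fix here may apply broadly.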

OTLP Exporter (All Signals)

  • The OTLP Exporter, covering all signals, is expected to be relatively straightforward to stabilize once the API and SDK are stable or nearly so. Given limited resources, focusing on the OTLP Exporter and treating other exporters (e.g., Zipkin, Prometheus) as post-stable release tasks could be reasonable.

Context API

Performance concerns have been raised about this API, and we may need to special-case a lot to make sure it does not become a bottleneck. Baggage may need some attention as well, but I believe it can be treated as lower priority: although OTel's Baggage API is stable, the W3C Baggage propagation format is not yet final.
