Open Telemetry Support (Cloud Monitoring / Cloud Trace) #8366

Open
oluatte opened this issue Apr 22, 2022 · 20 comments
Labels
api: cloudtrace Issues related to the Cloud Trace API. priority: p2 Moderately-important priority. Fix may not be included in next release. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@oluatte

oluatte commented Apr 22, 2022

Is your feature request related to a problem? Please describe.
We would like to use Open Telemetry on GCP but cannot do so easily due to various issues with Open Telemetry support for dotnet in Google libraries. This has put us in a tough spot, with two options:

  • Manually instrument our libraries with the Google-specific bits, knowing full well that the industry (and Google itself) is converging on Open Telemetry OR
  • Standardize on open telemetry and find another sink / backend for our logs.

Describe the solution you'd like
Please support open telemetry for the dotnet stack.

Additional context
Mixing and matching libraries has not been successful. Even if we use google libraries for logging and open source libraries for traces (exporting to cloud trace), the trace ids don't match up.

@jskeet
Collaborator

jskeet commented Apr 22, 2022

Assigning to Amanda for a more detailed answer, but the TL;DR is that we're aware that this is an area we could do a lot more in, and we'd definitely like to. We're busy with other work at the moment, but this is definitely on our backlog. I know that's not much of a consolation at the moment. I hope that when we eventually get to it, we'll be able to provide a nice solution that provides tracing with all the client libraries "for free" etc.

@jskeet jskeet added the type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. label Apr 22, 2022
@oluatte
Author

oluatte commented Apr 22, 2022

Thanks, Jon.

Also want to point out that we would love to hear about any workarounds / alternative paths that we can use in the meantime. We've recently heard about the Open Telemetry Collector, for example, but we're not sure whether it is a valid workaround for this.

@amanda-tarafa
Contributor

@mayoatte Will try to get this started/done in the second half of the year, but that's a soft commitment for now. Again I'm sorry this is not a good answer for you at this moment.

Mixing and matching libraries has not been successful. Even if we use google libraries for logging and open source libraries for traces (exporting to cloud trace), the trace ids don't match up.

If the Google libraries that you are using for logging are the Google.Cloud.Diagnostics libraries, then those will attempt to use Google's own trace header by default to extract tracing context to include in the log entry. You can inject a transient Google.Cloud.Diagnostics.Common.ITraceContext that fetches the context from the Open Telemetry tracing context instead, and that will be used to append tracing information to log entries.
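
As a rough illustration of the "fetch the context from the Open Telemetry tracing context" part: OpenTelemetry for .NET represents a span as a System.Diagnostics.Activity, so the ambient trace can be read from Activity.Current. The sketch below uses only System.Diagnostics. TryReadAmbientTrace is a hypothetical helper, not part of any Google library. The assumption here is that an ITraceContext implementation would surface values like these (a trace ID, a span ID and a sampling flag); check the interface itself for its exact members before wiring this in.

```csharp
using System;
using System.Diagnostics;
using System.Globalization;

// Hypothetical helper (not part of any Google library): reads the ambient
// System.Diagnostics.Activity, which OpenTelemetry for .NET uses as its span.
static bool TryReadAmbientTrace(out string traceId, out ulong spanId, out bool sampled)
{
    traceId = "";
    spanId = 0;
    sampled = false;

    var current = Activity.Current;
    if (current is null || current.IdFormat != ActivityIdFormat.W3C)
    {
        return false; // No active span, or not a W3C-format one.
    }

    // W3C trace IDs are 32 hex characters; span IDs are 16, so they fit in a ulong.
    traceId = current.TraceId.ToHexString();
    spanId = ulong.Parse(current.SpanId.ToHexString(),
        NumberStyles.HexNumber, CultureInfo.InvariantCulture);
    sampled = current.Recorded;
    return true;
}

// Demo: start an activity the way instrumentation would, then read it back.
using (var demo = new Activity("demo-request").SetIdFormat(ActivityIdFormat.W3C).Start())
{
    if (TryReadAmbientTrace(out var id, out var span, out var isSampled))
    {
        Console.WriteLine($"trace={id} span={span:x16} sampled={isSampled}");
    }
}
```

A transient ITraceContext registered in DI could call a helper like this each time a log entry is written, so the IDs in the log entry match the ones your OpenTelemetry exporter sends to Cloud Trace.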

@oluatte
Author

oluatte commented Apr 22, 2022

Thanks, Amanda.

We will try setting the trace context manually. And look forward to your open telemetry efforts.

@amanda-tarafa amanda-tarafa added priority: p3 Desirable enhancement or fix. May not be included in next release. api: cloudtrace Issues related to the Cloud Trace API. labels Jun 6, 2022
@amanda-tarafa
Contributor

Reassigning to @Rishabh-V as he's looking into this.

@atrauzzi

I'd really like to be able to use the standard OpenTelemetry libraries not only for instrumenting my code, but also for the export format.

I talk more about this desire here: open-telemetry/community#984

Basically, I would like to see a way to do OpenTelemetry without having to install a single google-specific package in my application. Instead, I should be able to configure the standard otel wire protocol exporter with a hostname that is "well known" on all google compute offerings (Cloud Run!).

@amanda-tarafa
Contributor

@atrauzzi What we do in this repo is Google specific client and instrumentation libraries :), so we are unlikely to be of any help here. I do see where you are coming from with this request, but my point is that it is better suited for Cloud Run, AppEngine Flex etc.

Just to clarify, this issue covers one main aspect: how we are going to instrument the libraries we produce so that the telemetry they emit is OTEL compatible. The current plan is that we'll instrument using .NET standards like System.Diagnostics and Microsoft.Extensions.Logging, etc. This means that we are not immediately concerned with what exports the telemetry or how it is exported.

Let me know if this does not fully address your comment.

@bgenidy

bgenidy commented Jan 4, 2024

hi @Rishabh-V @amanda-tarafa was just curious if there was any progress on getting opentelemetry working out of the box with Google cloud logging in C#

@amanda-tarafa
Contributor

@bgenidy We are looking into this still. Without making any hard commitments, we expect to have at least some basic support in the next couple of months.

@atrauzzi

atrauzzi commented Jan 4, 2024

@amanda-tarafa -- Thanks for the followup! In terms of your response, yeah it makes sense -- although if I'm being pragmatic here, it's hard to say if it necessarily covers what I (and likely many others) want.

The issue is that if I use the standard otel library, I shouldn't need anything Google-specific in my .NET app as the work has already been accomplished by the otel project. You're just duplicating their efforts with the added tax of developers having to add a Google specific library to their applications.

For the issue trackers, the main complaint is that they are very poor as a communication and advocacy tool between Google and its customers. Most things go there to die, which is not a great feeling for paying customers. The quality of dialogue with developers there is nothing compared to what people can get here ❤️ .

So... Would I use a package that does some Google magic with the standard .NET abstraction in the interim? Yeah, I'd probably tinker with it.

But this is a topic I sincerely believe your team might still be able to have a positive impact on by proactively advocating through internal channels (see my point on duplication of effort above)....and maybe you are already! But again, customers get such poor visibility into what's cooking inside Google Cloud, we're kind of forced into this starvation mentality. 😢

@atrauzzi

atrauzzi commented Jan 4, 2024

On a separate note, I'd love to hear more on this from @jsuereth, who did put in a small remark on the other issue. Is this just a matter of getting people under the same roof aligned?

@amanda-tarafa
Contributor

@atrauzzi I'll try to break down in points what I understand from your last comment here and also from the discussion in open-telemetry/community#984 are your main sources of frustration. Then we can discuss each point individually.

  1. On the .NET Google Cloud client libraries instrumentation side, we won't be asking you to use any specific telemetry libraries, OTEL or otherwise. Our instrumentation won't even be OTEL specific. We'll rely on .NET's own standards for capturing client library telemetry (ActivitySource/Activity, ILogger, etc.), and then users of our libraries can decide where to send that telemetry to. There are two steps to this.
  2. "Something" in your application needs to actually capture all the information that the Google Cloud libraries have stored in, say, an Activity, and "translate it" to the telemetry technology you want to use. For OpenTelemetry you would use the OpenTelemetry NuGet package to indicate that the Google Cloud libraries are a source of telemetry. Simplifying much, you'd have something like this in your application:
using OpenTelemetry;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;
...
// When configuring OpenTelemetry...
// (Recent OpenTelemetry.Extensions.Hosting versions use AddOpenTelemetry().WithTracing(...)
// in place of the older AddOpenTelemetryTracing(...).)
builder.Services.AddOpenTelemetry().WithTracing(tracerProviderBuilder =>
{
    tracerProviderBuilder.SetResourceBuilder(ResourceBuilder.CreateDefault().AddService(s_serviceName))
        .AddSource("Google.Cloud.*"); // Capture telemetry from .NET standard telemetry APIs and use it as an OTEL source.
    // ... all your other OTEL configuration, including your application code telemetry sources, etc.
});
  3. The OpenTelemetry SDK also needs to know where to send that information to. And yes, you are right that since Google Cloud does not support native OTLP, you need a Google Cloud specific exporter.
  4. A last issue I gather from your comments is that, when running on Google Cloud, you'd also like runtime telemetry itself to be OTEL exportable/exported.
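
To make points 1 and 2 a bit more concrete, here is a minimal sketch of the producer/consumer split using only System.Diagnostics, with no OpenTelemetry or Google packages involved. The source name "Example.Library" and the operation name "ListBuckets" are hypothetical placeholders (the actual source names for the Google Cloud libraries are still under discussion). As far as I know, the OpenTelemetry SDK's AddSource(...) does roughly what the listener below does, and then hands the stopped activities to whatever exporter you configured.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Library side (point 1): the library only depends on .NET's own abstraction.
// "Example.Library" is a hypothetical source name used for illustration.
var librarySource = new ActivitySource("Example.Library");

// Application side (point 2): something must subscribe to the source and
// translate what it sees. Here we just collect operation names in a list.
var captured = new List<string>();
using var listener = new ActivityListener
{
    ShouldListenTo = source => source.Name.StartsWith("Example.", StringComparison.Ordinal),
    Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
        ActivitySamplingResult.AllDataAndRecorded,
    ActivityStopped = stopped => captured.Add(stopped.DisplayName),
};
ActivitySource.AddActivityListener(listener);

// The library performs an operation. StartActivity returns null when nobody
// is listening, so the library pays almost nothing if telemetry is unused.
using (var activity = librarySource.StartActivity("ListBuckets"))
{
    activity?.SetTag("example.tag", "value");
}

Console.WriteLine(string.Join(", ", captured));
```

The point of this design is that the library never references a telemetry vendor: swapping the listener for the OpenTelemetry SDK (or anything else that understands ActivitySource) requires no change on the library side.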

Does this breakdown seem accurate to you? I'll now try to identify your pain points on each of these. Note that I'm conscious that one of your pain points is that we don't seem to be working all together to make Google - OTEL integration seamless for users. We are very much working together, but when there are so many puzzle pieces it is only natural that priorities and resources are not always aligned. By breaking these down I'm trying to pinpoint issues that we on the client libraries side can address immediately, and issues where we need to work with the wider group.

Pain points for you to clarify/confirm:

  1. From, and I quote, "Would I use a package that does some Google magic with the standard .NET abstraction in the interim? Yeah, I'd probably tinker with it." I gather this is not your preferred approach. But I don't understand why. For us this is the best approach because it keeps the Cloud libraries as independent as possible from telemetry technologies, giving users of our libraries the chance to use whatever they prefer or are already using, and it's not tied to OpenTelemetry remaining an industry standard forever. We are, yes, assuming that telemetry technologies that aim to be relevant to the .NET community would offer integration with .NET telemetry standards. What are the issues you see with this approach? How would ideal instrumentation look for you?
  2. In principle this would be the minimum required for you to capture .NET Google Cloud libraries telemetry. We are still discussing how granular we want our library sources to be, how much information we'll initially capture, what our reliance on Grpc.Net.Client's own telemetry will be, etc. So I'd expect in reality that you'd have maybe a handful of similar lines of code but nothing else. What are the issues you see with this approach? How would ideal usage look for you?
  3. I think this is really your main pain point. And I agree with you. @jsuereth is probably in a better position than myself to talk about this point.
  4. I think this is already somewhat possible, although in my experience, what runtime telemetry is exported and how, and what can be configured or not, is uneven across runtimes. I'm not certain we could ever get to a fully standardized telemetry experience across runtimes, but what would probably help here is a list of major pain points per runtime and/or the most painful disparities across runtimes. That would be helpful for each of the runtimes' teams to evaluate, prioritize and coordinate the work.

Let me know if I'm misunderstanding or misrepresenting your concerns.

@atrauzzi

atrauzzi commented Jan 5, 2024

@amanda-tarafa - Thanks for the summary! I'll try to preserve the structure... 😄

Summary

  1. I think this is mostly true and nothing unexpected on my end. We're definitely on the same page here. Standard MS telemetry abstractions are good. The part that goes a little wobbly for me is the assertion that "we won't be asking you to use any specific telemetry libraries". That's contradicted by what you mention in item 3. I'm not calling out that the application has to bind to your abstraction. It's that the application has to take a dependency on something google specific, rather than being able to just speak otel everywhere. I think we'll come back to this in the next few responses.
  2. This is a bit confusing to me: "...all the information that the Google Cloud libraries have stored..." -- It's not google that makes traces, I do, as per the previous item. Maybe I'm conflating the tracing that the Google Cloud libraries generate with how to get the traces outside of the application process. If so, my focus is on the latter. 🙏
  3. Oof.
  4. Maybe? Might need more clarification on that for my understanding. But the North star here is that everything telemetry-related should be possible by only having a dependency on otel community libraries. Nothing google specific. Now if I end up getting google libraries in for google specific functionality for other reasons, that's okay. But in theory, by virtue of the community otel libraries and wire protocols & formats, I should be able to build and deploy an application that takes zero dependencies on google specific libraries and be able to get telemetry going. And that's also without having to include any sidecars in my deployments as well.

Pain Points

  1. I think if we apply point 2 from above, it might explain some of why what I'm mentioning is not lining up. At which point, you're totally right, and the way that the Google Cloud library does its own telemetry logging shouldn't depend on otel or anything. Just the MS abstraction. So yeah, all clear there if I'm describing it correctly now 🤝
  2. Obviated by item 1 I think?
  3. Yes! Although it's been a mysterious amount of time since I've seen any updates on this, from him or otherwise.
  4. Similar to items 1 and 2, I'm really only focused on getting telemetry from the standard MS abstraction out of the process using pure otel to a well-known endpoint visible from any process running on google cloud. Whether my component is a VM or a cloud run container, etc... Anything outside of that is indeed out of scope and up to each ecosystem to get to the point where it's producing standard otel wire data. But the goal for google should be getting in a position where it has a global ip or dns name that will catch otel standard telemetry and map it to the proprietary services.

Item 4 I think is a good summary. Thanks for sorting this all out with me!

@amanda-tarafa
Contributor

@atrauzzi thanks for that. Let me clarify a few points and I think we'll fully understand each other.

I think we can now merge summary and pain points:

  1. Good to know that we are on the same page here with standard .NET telemetry abstractions. I think the remaining confusion in "we won't be asking you to use any specific telemetry libraries" is because when I said "we" I meant .NET Google Cloud client libraries, but you read, and rightly so, just Google. So, my point is just that the .NET Google Cloud client libraries themselves are not making you use anything OTEL-specific or otherwise. For the rest, we have point 3.
  2. Yes, the word "stored" in "all the information that the Google Cloud libraries have stored in, say, an Activity" was a poor choice on my part. I meant what you ultimately understood, "the telemetry information the Google Cloud libraries produce and make available for further consumption through .NET standard telemetry abstractions". But other than that confusion, it seems we are in agreement here as well.
  3. This is definitely your main pain point then, and that's the lack of OTLP support for storing your telemetry in Google Cloud. Here is where you need something Google specific. I agree with you that this is the current situation and that it is less than ideal. Again, I'm not in the best position to give updates here, I'll try to find out more about it and see if there are updates to be shared.
  4. My point with 4 is that runtimes themselves (VMs, Cloud Run, etc.) act as producers of telemetry (describing the state of the runtime) and also as exporters of the telemetry they produce. The need for Google specific technology so that runtimes export their own produced telemetry is less of a pain point here, because that technology, as far as my experience goes, is provided by the runtime itself, so you don't have to do anything. But, if you wanted to use the runtime as a source of pure OTEL telemetry (to store telemetry somewhere other than Google Cloud, for instance) you probably couldn't. From some of your comments in open-telemetry/community#984 ("Advocate with cloud providers for pure OTLP endpoint implementations") I understood this to be a problem for you. But if it is not, then one less thing to worry about.

There's one last thing I want to clarify. This issue is about OTEL support in .NET Google Cloud Client libraries; this issue is not about OTLP Google support and it is not about OTEL support by Google runtimes. Basically, this issue covers points 1 and 2 from the summary but it does not cover points 3 and 4. This is just because this is the .NET Google Cloud client libraries repo, and it's what we on the .NET Google Cloud client libraries team can address ourselves. For points 3 and 4 (if it were to become a pain point), the responsibilities are spread across a wider group of teams. I will try to get some shareable updates for point 3, but I can't guarantee I will.

@amanda-tarafa
Contributor

I have an update around Google OTLP support. The expectation is that in H2 2024 there should be OTLP support for traces and logs (not yet for metrics). But these are plans; there are no hard guarantees, as plans may change at any time and without notice.

@atrauzzi

Ah wonderful, that's great to hear!

Is the idea there that I'll be able to simply point my application's OTLP output to some hostname visible to my Cloud Run app and Google will give me some awesome stuff to see? 😆

Also, regarding your previous response -- 100% I think we're all clarified up there. I also want to thank you for your assistance getting that update. 🙏

@amanda-tarafa
Contributor

Is the idea there that I'll be able to simply point my application's OTLP output to some hostname visible to my Cloud Run app and Google will give me some awesome stuff to see?

Yes, you'll configure your app through the standard OTEL library to send telemetry to a Google URI, and then in Google Cloud Trace and Google Cloud Logging you'll see:

  • All the telemetry your own code is producing.
  • All the telemetry your OTEL enabled dependencies are producing (like .NET Google Cloud client libraries).
  • All the telemetry the runtime (Cloud Run, etc.) is producing (this happens already as far as my experience goes).

@atrauzzi

atrauzzi commented Jan 12, 2024

image

@david-engelmann

david-engelmann commented Jan 19, 2024

Is the idea there that I'll be able to simply point my application's OTLP output to some hostname visible to my Cloud Run app and Google will give me some awesome stuff to see?

Yes, you'll configure your app through the standard OTEL library to send telemetry to a Google URI, and then in Google Cloud Trace and Google Cloud Logging you'll see:

  • All the telemetry your own code is producing.
  • All the telemetry your OTEL enabled dependencies are producing (like .NET Google Cloud client libraries).
  • All the telemetry the runtime (Cloud Run, etc.) is producing (this happens already as far as my experience goes).

@amanda-tarafa Do you know of any examples of this approach on Github that I could use as a reference?

@amanda-tarafa
Contributor

@david-engelmann see this comment above:

I have an update around Google OTLP support. The expectation is that in H2 2024 there should be OTLP support for traces and logs (not yet for metrics). But these are plans; there are no hard guarantees, as plans may change at any time and without notice.

This is yet to be supported, so there are no examples. .NET Google Cloud libraries may be OTEL enabled sometime in the next couple of months, at which point we will provide a couple of examples for configuring .NET libraries. For Google OTLP support examples you'll have to wait until at least H2 2024.

@amanda-tarafa amanda-tarafa added priority: p2 Moderately-important priority. Fix may not be included in next release. and removed priority: p3 Desirable enhancement or fix. May not be included in next release. labels Apr 5, 2024

7 participants