-
Notifications
You must be signed in to change notification settings - Fork 457
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
internal/telemetry: track tracer init time metric #1896
Conversation
…, so don't call unless the message is gauranteed to be flushed
… into evan.li/time-gauge-2
|
||
// copy over requests so we can do the actual submission without holding | ||
// the lock. Zero out the old stuff so we don't leak references | ||
for i, r := range c.requests { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have to populate the submissions slice before adding metrics events so that the sequence id's are in the right order. Sequence id represents the order in which telemetry events happened. newRequest
increments the client's sequence id, and newRequest
for metrics is only called inside the flush function.
system tests were failing because the submissions array had out-of-order sequence id's, for example:
[6, 1, 2, 3, 4, 5]
.
Telemetry system tests does not expect the events to arrive in a strict order, but it will assert that out-of-order events arrive within a certain timeframe of each other (0.3 seconds), and in this example event 6 was arriving way before event 5.
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { | ||
if r.Header.Get("DD-Telemetry-Request-Type") == string(RequestTypeAppClosing) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't rely on app-closing to flush the metrics anymore, since app-closing is sent before metrics. Now we'll just hard code the expected metric messages and wait until we've hit that count.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few questions and comments here and there, looking pretty good!
internal/telemetry/client.go
Outdated
@@ -144,6 +144,8 @@ type client struct { | |||
// metrics are sent | |||
metrics map[Namespace]map[string]*metric | |||
newMetrics bool | |||
// flush interval incremented every time flush() is called | |||
interval int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this used anywhere? It looks like it's always 0
in the tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not needed - I originally had this when I interpreted interval
for Gauge
to mean the flush interval - will remove it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can add interval
back somewhere appropriate when we discover a good use case for Gauge
and Rate
type metrics.
Co-authored-by: Andrew Glaude <andrew.glaude@datadoghq.com>
Co-authored-by: Andrew Glaude <andrew.glaude@datadoghq.com>
Co-authored-by: Andrew Glaude <andrew.glaude@datadoghq.com>
Co-authored-by: Andrew Glaude <andrew.glaude@datadoghq.com>
… into evan.li/time-gauge-2
Co-authored-by: Andrew Glaude <andrew.glaude@datadoghq.com>
… into evan.li/time-gauge-2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost there! just two small comments
client.Record(NamespaceTracers, MetricKindDist, "soobar", 1, nil, false) | ||
client.Record(NamespaceTracers, MetricKindDist, "soobar", 3, nil, false) | ||
client.mu.Lock() | ||
client.flush() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't stop trigger a flush? So is this needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually meant to remove client.stop()
here, thanks for pointing this out!
I wanted to replace stop
with a direct call to flush
so this unit test doesn't rely on the functionality of stop
. Implementation of stop
may change in the future because of our conversation about only sending app-closing
when SIGINT
and SIGQUIT
are intercepted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed QA notes!
…ch as queuename tags contrib: upgrade labstack/echo/v4 from v4.2.0 to v4.9.0 (#1891) ci: fix flaky lint job (#1892) contrib/elasticsearch: use naming schema (#1897) ci: introduce golangci (#1898) appsec: suspicious request blocking (#1797) Co-authored-by: Julio Guerra <julio@datadog.com> ci/golangci-lint: skip google.golang.org/grpc.v12 (#1899) .github/workflows: run ASM and RC system-tests scenarios (#1900) contrib/hashicorp/vault: use naming schema (#1868) contrib/database/sql: add WithIgnoreQueryTypes option (#1823) Co-authored-by: Zarir Hamza <zarir.hamza@datadoghq.com> Co-authored-by: Rodrigo Argüello <rodrigo.arguello@datadoghq.com> contrib/database/sql: use naming schema (#1895) internal/appsec: add server.request.method address (#1893) Signed-off-by: Eliott Bouhana <eliott.bouhana@datadoghq.com> Co-authored-by: François Mazeau <francois.mazeau@datadoghq.com> internal/appsec/dyngo: atomic instrumentation swapping (#1873) Co-authored-by: François Mazeau <francois.mazeau@datadoghq.com> go.mod: datadog-agent/pkg/remoteconfig/state 7.45.0-rc.1 (#1902) internal/version: bump to v1.51.0 (#1912) ddtrace/tracer: don't set empty tracestate propagation tag (#1910) go.mod: github.com/DataDog/datadog-agent/pkg/obfuscate 7.45.0-rc.1 (#1916) appsec: add blocking SDK body operation (#1901) * Modifying the appsec api: adding appsec.MonitorParsedHTTPBody an error as return value * Adding a call to the WAF to check for security event synchronously with a call to appsec.MonitorParsedHTTPBody on the body passed as parameter * Removing the call to the WAF done on the body an the end of a request because we moved it. * Refactoring the waf addresses storage and access Signed-off-by: Eliott Bouhana <eliott.bouhana@datadoghq.com> ddtrace/{opentelemetry,opentracer}: add telemetry (#1909) internal/appsec: fix user ID event detection (#1918) internal/telemetry: track tracer init time metric (#1896) Co-authored-by: Andrew Glaude <andrew.glaude@datadoghq.com> internal/appsec/remoteconfig: fix rules overrides (#1921) contrib/mongodb: use naming schema (#1908) contrib/syndtr/goleveldb/leveldb: use naming schema (#1914) contrib/tidwall/buntdb: use naming schema (#1913) internal/appsec: do not ignore the appsec events rate limiter (#1927) remoteconfig: remove empty products and don't override appsec rules data (#1925) contrib/kafka: refactor tests (#1907) contrib/google.golang.org/grpc: use naming schema (#1919) contrib/twitchtv/twirp: use naming schema (#1920) contrib/http: use naming schema (#1929) ddtrace/tracer: reset decision maker during fallback behavior of w3c header extraction (#1933) contrib/cassandra: use naming schema (#1911) Co-authored-by: Diana Shevchenko <40775148+dianashevchenko@users.noreply.github.com> contrib/redis: use naming schema (#1906) Co-authored-by: Andrew Glaude <andrew.glaude@datadoghq.com> ci/system-tests: more scenarios with parallel jobs (#1938) ci: update linter job and add bodyclose (#1942) contrib/redis/go-redis.v9: support v9 (#1730) Add support for new go-redis version v9. It does 2 things: Copy existing version 8 files to a new path, /redis/go-redis.v9. Make changes to support version 9. Fixes #1710 format and rerun go tidy get rid of prints add topLevelRegion assertions remove confusing named return values and todo comment ddtrace/tracer: ensure access to trace tags is concurrency-safe (#1948) Spancontext marshaling was accessing tracer internal structures without a lock, resulting in a data race and panic. This commit adds a few methods to trace to allow safe access to the tags and propagatingTags members of trace to the marshaling code. Fixes #1944 ddtrace/tracer: mark context updated when SetUser is called (#1949) Fixes a minor logic mistake when setting a user on a span lint and add default switch case refactor resourceNameKey and value assignments restructure functions to be left aligned use internal logger, be less verbose with function names go back to normal switch type and format Set keyTraceID128 on first span in the chunk only (#1946) go.mod: upgrade go-libddwaf to v1.2.0 (#1953) Co-authored-by: Julio Guerra <julio@datadog.com> contrib/database/sql: fix bug where options were always overwritten by register options (#1904) Co-authored-by: Diana Shevchenko <40775148+dianashevchenko@users.noreply.github.com> ci/smoke-tests: update the go.sum file after go get -u (#1957) contrib/net/http: don't set empty string values as span tags (#1956) Do not set span fields when they are not configured so the tracer can put the defaults in. use normal string then derefence rever go.mod and go.sum changes contrib/internal/httptrace: remove naming schema from init (#1960) contrib/graphql: use naming schema (#1926) internal/telemetry: trim the dependencies version prefix v (#1963) contrib/aws: use naming schema (#1931) contrib/cloud.google.com/go/pubsub.v1: use naming schema (#1937) go mod tidy lint and fix test
What does this PR do?
Implements
TimeGauge
telemetry function andDistributions
telemetry event to track tracer init time via thedd.instrumentation_telemetry_data.tracers.tracer_init_time
metric.Motivation
Describe how to test/QA your changes
Start a sample app with tracing and turn on the
DD_INSTRUMENTATION_TELEMETRY_DEBUG
flag.dd.instrumentation_telemetry_data.tracers.tracer_init_time
in the staging metrics explorer.service:tracer-telemetry-intake
@payload_dog_tag.request_type:distributions
in staging logsReviewer's Checklist