Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(sdk-trace-base): add diagnostic logging when spans are dropped #3610

Merged
merged 3 commits into from
Feb 22, 2023

Conversation

neoeinstein
Copy link
Contributor

@neoeinstein neoeinstein commented Feb 13, 2023

Which problem is this PR solving?

The current JS SDK does not provide any sort of indication that the batch span processor is dropping spans, resulting in difficult to identify missing spans.

Short description of the changes

Adds diagnostic logging when spans begin and stop being dropped. When spans are being dropped, a debug diagnostic is logged indicating that spans are now being dropped, and a count is maintained. When a span is received that is successfully enqueued, the number of spans that were dropped between successful enqueue attempts is reported as a warning diagnostic and the count is reset to 0.

An alternate would be to add a private event emitter that would emit an event any time a span were dropped, allowing the consumer to subscribe and perform whatever action they deemed appropriate in reaction. However, this would require changing the publicly exposed API, so the diagnostic logging mechanism is utilized instead.

Type of change

  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?

No new tests added, but verified that existing tests continue to pass with expected behavior. This change only adds diagnostic logging with minimal logic to prevent flooding diagnostic logs, which is not externally visible.

Checklist:

  • Followed the style guidelines of this project
  • Unit tests have been added
  • Documentation has been updated

@neoeinstein neoeinstein requested a review from a team as a code owner February 13, 2023 17:20
@codecov
Copy link

codecov bot commented Feb 13, 2023

Codecov Report

Merging #3610 (d9bd8a6) into main (47444f2) will increase coverage by 0.01%.
The diff coverage is 100.00%.

❗ Current head d9bd8a6 differs from pull request most recent head 008255b. Consider uploading reports for the commit 008255b to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3610      +/-   ##
==========================================
+ Coverage   94.01%   94.03%   +0.01%     
==========================================
  Files         268      268              
  Lines        7920     7927       +7     
  Branches     1641     1643       +2     
==========================================
+ Hits         7446     7454       +8     
+ Misses        474      473       -1     
Impacted Files Coverage Δ
...dk-trace-base/src/export/BatchSpanProcessorBase.ts 92.79% <100.00%> (+0.48%) ⬆️
...-trace-base/src/platform/node/RandomIdGenerator.ts 93.75% <0.00%> (+6.25%) ⬆️

@Flarna
Copy link
Member

Flarna commented Feb 13, 2023

Is it possible to add a test for this?

@neoeinstein
Copy link
Contributor Author

Updated with a commit that includes a test for this behavior.

Comment on lines +122 to +125
if (this._droppedSpansCount === 0) {
diag.debug('maxQueueSize reached, dropping spans');
}
this._droppedSpansCount++;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be very chatty in a system overloaded enough to be dropping spans. Do you think it's needed when the warn log on line 132 is already doing this job?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a given stretch of dropped spans, this message will only be reported once, just like the summary warning, so should have the same cardinality as that one. This debug diagnostic can be used to identify at what time those drops started. It's recorded as debug, which should be able to be filtered out if debug diagnostics aren't interesting or desired.

I found it useful as there are two knobs that can be adjusted to prevent dropping spans: queue size and the export interval. The warning message makes it easier to adjust the queue size, but the debug message can make it apparent how often one should be exporting if they don't want to carry a larger memory load associated with a larger queue.

I'm not particularly attached to this message, so if we want it removed, that's fine with me.

@legendecas legendecas merged commit 86973eb into open-telemetry:main Feb 22, 2023
@neoeinstein neoeinstein deleted the dropped-span-reporting branch February 28, 2023 17:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants