New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support LogProcessors chaining through mutable reference #1726
base: main
Are you sure you want to change the base?
Conversation
…try-rust into log-processor-optimize
Also the result of stress test. There is perf improvement as we no longer do clone in log_emitter.rs. Existing:
PR:
will compare and add bench too. |
Benchmark
|
Thanks for looking into this! The perf win is amazing! Would the below statements be accurate?
Robert's suggestion of offering the ability to run independent pipelines (one pipeline with redaction processor exporting redacted data, and another pipeline exporting without redaction is possible with this change also, but it requires cloning to ensure the changes are only affecting that pipeline only) |
Nicely summarised the changes :) Yes to all. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1726 +/- ##
=====================================
Coverage 73.6% 73.6%
=====================================
Files 124 124
Lines 19517 19612 +95
=====================================
+ Hits 14377 14447 +70
- Misses 5140 5165 +25 ☔ View full report in Codecov by Sentry. |
Have added unit-test to validate LogProcessor chaining and updating of LogData. This is ready for review now. Also added changelog and benchmark in the description. |
|
||
let mut data = LogData { | ||
record: log_record, | ||
instrumentation: self.instrumentation_library().clone(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@utpilla pointed out that instrumentation data shouldn't be allowed to be modified within processors. Will create a separate issue to track that.
//TODO :avoid cloning when logdata is borrowed? | ||
let owned_batch = batch | ||
.iter() | ||
.map(|&log_data| log_data.clone()) // Converts Cow to owned LogData |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no 🐮 now 🤣
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unfortunately, had to bring her
back in latest commit. Too much milking needed in batchprocessor without that.
/// Exports a batch of [`LogData`]. | ||
async fn export(&mut self, batch: Vec<LogData>) -> LogResult<()>; | ||
/// Exports a batch of reference to `LogData`. | ||
async fn export<'a>(&mut self, batch: &'a [&'a LogData]) -> LogResult<()>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the exported type a slice of ref to LogData? Previously, it was a concrete type Vec
. Please call it out in changelog.
(I agree with this change, just making sure the change is aptly reflected in changelog)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had to revert the slice changes as they complicated the handling in the batch processor. Previously, the batch processor directly moved the logs from the queue to the export method. Now, we pass a reference to the logs, track which logs have been exported, and then delete them from the queue. Since the queue can receive new logs while the export is happening, it is necessary to keep track of the logs that were exported. Also, didn't see any improvement in stress and batch tests with the changes.
Fixes #1650
As discussed here, this PR modifies LogProcessor to take mutable reference to LogData, and further pass it as Owned/Borrowed type to the Exporter.
Existing:
This PR:
LogData
before enqueueing them and passes(i.e moves) this cloned data to the exporterLogData
directly to the exporter as borrowed type, without cloning.LogData
references and process them accordingly.- Enables passing modified LogData between processors in a chain.
- Allows LogData to be passed from the SDK to the exporter without cloning, such as with
SimpleLogProcessor
and for exporters likeopentelemetry-user-events-logs
andopentelemetry-etw-logs
that use areentrant log processor
.This also adds certain constraints for implementing custom exporters and log-processors, else their compilation will fail:
Exporters:
The lifetime parameter
'a
inVec<Cow<'a, LogData>>
ensures that the exporter do not retain references to the logdata beyond its intended lifespan. If there is a need to modify or store LogData beyond the duration of the export call, the data must be cloned.Asynchronous Log-Processors:
When a LogProcessor processes
LogData
asynchronously, such as in the case ofBatchLogProcessor
, there is a risk that the data may outlive the scope from which it was passed. To prevent any issues, LogData should always be cloned before any asynchronous processing begins.Synchronous Log Processors:
For synchronous processing within a LogProcessor, as in
SimpleLogProcessor
, it is important to ensure that references to LogData are contained within the bounds of the emit function. TheLogData
should be passed as a reference if all processing is completed within the emit function.Benchmark:
with PR: full-log-with-attributes/with-context: 317.76 ns
main repo:full-log-with-attributes/with-context: 431.24 ns
Stress test
With PR:
Number of threads: 20
Throughput: 40,814,600 iterations/sec
Throughput: 40,845,400 iterations/sec
Throughput: 40,541,400 iterations/sec
Throughput: 40,676,800 iterations/sec
Throughput: 35,816,800 iterations/sec
main repo:
Number of threads: 20
Throughput: 26,506,800 iterations/sec
Throughput: 27,819,200 iterations/sec
Throughput: 28,112,400 iterations/sec
Throughput: 27,798,000 iterations/sec
Changes
Please provide a brief description of the changes here.
Merge requirement checklist
CHANGELOG.md
files updated for non-trivial, user-facing changes