
feat: add basic periodic exporting metric_reader #1603

Open · wants to merge 16 commits into main

Conversation

xuan-cao-swi (Contributor)

Description

I'd like to contribute the periodic exporting MetricReader. I think a periodic metric_reader is useful for asynchronous instruments that collect metrics (e.g. CPU, memory) at a fixed interval.
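For readers unfamiliar with the pattern, here is a rough, hypothetical sketch of what a periodic exporting reader does (not this PR's code; the exporter and collect names are stand-ins): a background thread wakes up on a fixed interval, collects from the instruments, and hands the batch to the exporter.

# Hypothetical sketch of the periodic-export pattern, not the PR's implementation.
class PeriodicReaderSketch
  def initialize(exporter:, interval_seconds: 60)
    @exporter = exporter
    @interval = interval_seconds
    @continue = true
    @thread = Thread.new do
      while @continue
        sleep(@interval)          # wait one interval between collections
        @exporter.export(collect) # push whatever the instruments accumulated
      end
    end
  end

  def close
    @continue = false       # the loop exits after its current iteration
    @thread.join(@interval) # give the last collection a chance to finish
  end

  private

  def collect
    [] # placeholder: the real reader pulls metric data from registered instruments
  end
end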

github-actions bot

👋 This pull request has been marked as stale because it has been open with no activity. You can: comment on the issue or remove the stale label to hold stale off for a while, add the keep label to hold stale off permanently, or do nothing. If you do nothing, this pull request will eventually be closed by the stale bot.

github-actions bot added the stale label on Mar 24, 2024

def close
  @continue = false # force termination in next iteration
  @thread.join(5)   # wait 5 seconds for collecting and exporting
Member

Why 5 seconds? Should it be configurable?

Contributor Author

I think I took this from how Java handles the thread waiting (here). I don't see the spec mention the thread waiting time, but it seems like it should be configurable.

Contributor

Thanks for the source, Xuan. Does it make sense for the close method to use the @interval_millis value?

Contributor Author

Yes, I like using @interval_millis in join as well. Updated.
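For context, the change being discussed amounts to roughly this (a sketch, assuming the reader stores the interval in @interval_millis):

def close
  @continue = false              # force termination in next iteration
  @thread.join(@interval_millis) # wait up to one interval instead of a hard-coded 5 seconds
end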

github-actions bot removed the stale label on Mar 26, 2024
@kaylareopelle (Contributor) left a comment

Thank you for putting this together, Xuan! I'm so glad you're making progress on the Metrics work.


module SDK
  module Metrics
    module Export
      # PeriodicMetricReader provides a minimal example implementation.
Contributor

Out of curiosity, what else should be added to take this beyond "a minimal example implementation"?

Contributor Author

I would think more advanced scheduling mechanisms like what Java has (e.g. a queue system), or stricter thread handling like the batch_span_processor you pointed out.

thread = lock do
  @keep_running = false
  @condition.signal
  @thread
end

Contributor

Nice, that makes sense. Would you be open to creating an issue for the advanced scheduling mechanisms and any further changes you'd see to make this more similar to the batch_span_processor?

Contributor Author

Yes, will do. Issue logged here

OpenTelemetry.handle_error(exception: e, message: 'Fail to close PeriodicMetricReader.')
end

# TODO: determine correctness: directly kill the reader without waiting for next metrics collection
Contributor

I didn't find any clues in the spec. Looking at our SDK, the BatchSpanProcessor also has some periodic functionality and calls force_flush during shutdown, skipping the periodic wait.

Maybe that's the right call here too?

# Shuts the consumer thread down and flushes the current accumulated buffer
# will block until the thread is finished.
#
# @param [optional Numeric] timeout An optional timeout in seconds.
# @return [Integer] SUCCESS if no error occurred, FAILURE if a
#   non-specific failure occurred, TIMEOUT if a timeout occurred.
def shutdown(timeout: nil)
  start_time = OpenTelemetry::Common::Utilities.timeout_timestamp
  thread = lock do
    @keep_running = false
    @condition.signal
    @thread
  end
  thread&.join(timeout)
  force_flush(timeout: OpenTelemetry::Common::Utilities.maybe_timeout(timeout, start_time))
  dropped_spans = lock { spans.shift(spans.length) }
  report_dropped_spans(dropped_spans, reason: 'terminating') if dropped_spans.any?
  @exporter.shutdown(timeout: OpenTelemetry::Common::Utilities.maybe_timeout(timeout, start_time))
end

Contributor Author

Thanks! Updated.
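In other words, the reader's shutdown can mirror that pattern: stop the loop, join briefly, then flush once immediately instead of waiting out the remaining interval. A rough sketch only; the method and instance variable names here are assumptions, not the PR's final code:

def shutdown(timeout: nil)
  @continue = false                           # stop the collection loop
  @thread&.join(@export_timeout)              # give any in-flight export a moment to finish
  @exporter.export(collect, timeout: timeout) # flush accumulated metrics right away
  @exporter.shutdown(timeout: timeout)
rescue StandardError => e
  OpenTelemetry.handle_error(exception: e, message: 'Fail to close PeriodicMetricReader.')
end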

counter.add(3, attributes: { 'b' => 'c' })
counter.add(4, attributes: { 'd' => 'e' })

sleep(8)
Contributor

If you're interested in an alternative to calling sleep, New Relic uses these methods to advance, freeze, and unfreeze time in our tests. It allows us to get the benefits of advancing time without slowing down the tests. We mostly use the Process-related methods now.

Incorporating timecop could be another option.
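For illustration, a minimal, hypothetical Timecop sketch (not from this PR). One caveat worth noting: Timecop fakes Time.now and friends, but it does not shortcut a real Kernel#sleep running inside the reader's background thread, which limits how much it can speed up this particular test:

require 'timecop'
require 'minitest/autorun'

class FakeClockTest < Minitest::Test
  def test_time_can_be_advanced_without_sleeping
    start = Time.now
    Timecop.travel(start + 8) do # pretend 8 seconds have passed, no real waiting
      assert_operator Time.now - start, :>=, 8
    end
  end
end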

Contributor Author

Thanks for the example. I have tried both the New Relic methods and timecop. I'm not sure if I set them up correctly (with timecop/the code block), but it didn't change the test time.

P.S. I think it would be great to have the New Relic methods for advancing, freezing, and unfreezing time available in the test_helpers gem.

xuan-cao-swi and others added 7 commits on April 5, 2024

exporter: nil)
super()

@interval_millis = interval_millis


What does _millis mean? I thought "milliseconds" but after experimenting with this, it seems to be seconds, not milliseconds.

Contributor Author

Thanks, the name is indeed confusing. It should be seconds. I have updated the name to export_interval and export_timeout.


Awesome, thanks! :)

@cdan-youdo

What is needed for this PR to get merged?

module Export
  # PeriodicMetricReader provides a minimal example implementation.
  class PeriodicMetricReader < MetricReader
    def initialize(export_interval: ENV.fetch('OTEL_METRIC_EXPORT_INTERVAL', 60),


According to the spec, the OTEL_METRIC_EXPORT_INTERVAL and OTEL_METRIC_EXPORT_TIMEOUT env variables should be in milliseconds. See https://opentelemetry.io/docs/specs/otel/configuration/sdk-environment-variables/#periodic-exporting-metricreader.

And according to https://opentelemetry.io/docs/specs/otel/metrics/sdk/#periodic-exporting-metricreader, the parameters should be named export_interval_millis and export_timeout_millis.

Furthermore, I think we should use Float for intervals, timeouts, etc., the same as in BatchSpanProcessor and other processors.

export_interval_millis: Float(ENV.fetch('OTEL_METRIC_EXPORT_INTERVAL', 60_000)),
export_timeout_millis: Float(ENV.fetch('OTEL_METRIC_EXPORT_TIMEOUT', 30_000)),

Contributor Author

Thanks for the review and reference on batch span processor. Updated.
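Presumably the update converts the millisecond env values to seconds internally for sleep and Timeout.timeout, along these lines (a sketch of the idea, not the exact diff):

def initialize(export_interval_millis: Float(ENV.fetch('OTEL_METRIC_EXPORT_INTERVAL', 60_000)),
               export_timeout_millis: Float(ENV.fetch('OTEL_METRIC_EXPORT_TIMEOUT', 30_000)),
               exporter: nil)
  super()
  @export_interval = export_interval_millis / 1000.0 # seconds, for Kernel#sleep
  @export_timeout  = export_timeout_millis / 1000.0  # seconds, for Timeout.timeout
  @exporter        = exporter
end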

while @continue
  sleep(@export_interval)
  begin
    Timeout.timeout(@export_timeout) { @exporter.export(collect) }


Should we also pass timeout: @export_timeout to the @exporter?
Same as here

result_code = @export_mutex.synchronize { @exporter.export(batch, timeout: timeout) }

Contributor Author

I think the Timeout.timeout(@export_timeout) { ... } will serve the purpose of timing out the exporter if it doesn't behave correctly (e.g. hangs).


Sure, I was just saying maybe we should also pass the timeout to the exporter to give it a chance to behave properly. We'd still keep the surrounding Timeout.timeout(@export_timeout) { ... } block.
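Concretely, the suggestion amounts to something like this (keeping the outer guard as a backstop):

Timeout.timeout(@export_timeout) do
  @exporter.export(collect, timeout: @export_timeout) # a well-behaved exporter can honor the deadline itself
end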

begin
  Timeout.timeout(@export_timeout) { @exporter.export(collect) }
rescue Timeout::Error => e
  OpenTelemetry.handle_error(exception: e, message: 'PeriodicMetricReader timeout.')


Does OpenTelemetry.handle_error(exception: e, message: 'PeriodicMetricReader timeout.') evaluate to FAILURE? If not, we should return FAILURE explicitly here.

Contributor Author

I think the periodic metric_reader shouldn't return anything (similar to the existing metric_reader); only the exporter returns either failure or success from export.


Ahh, you are absolutely right.
