Scaling and Tuning

Data Prepper for Trace Analytics in v0.8.x supports both vertical and horizontal scaling.

To scale vertically, simply adjust the size of your single Data Prepper instance to meet your workload's demands.

To scale horizontally, deploy multiple Data Prepper instances to form a cluster using the Peer Forwarder plugin. This plugin enables Data Prepper instances to communicate with one another in the cluster and is required for horizontally scaled deployments.
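For reference, a minimal sketch of how the Peer Forwarder plugin might be added as a prepper is shown below. The plugin name, the discovery_mode and static_endpoints options, and the host names are assumptions based on typical Peer Forwarder examples; verify them against the Peer Forwarder documentation for your Data Prepper version.

```yaml
otel-trace-pipeline:
  # ... source, buffer, and sink omitted ...
  prepper:
    # Assumed plugin and option names; adjust to match your Data Prepper version.
    - peer_forwarder:
        discovery_mode: "static"
        static_endpoints: ["data-prepper-1", "data-prepper-2"]
```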

Scaling Tips

Below are some tips for scaling. We recommend modifying these parameters based on your requirements, and monitoring the Data Prepper host metrics and Elasticsearch metrics to ensure the configuration is working as expected.

Buffer

The total number of trace requests that Data Prepper processes is equal to the sum of the buffer_size values in the otel-trace-pipeline and raw-trace-pipeline.

The total number of trace requests in flight to Elasticsearch is equal to the product of batch_size and workers in the raw-trace-pipeline.

We recommend that you:

  • use the same buffer_size in the otel-trace-pipeline and raw-trace-pipeline
  • keep buffer_size >= workers * batch_size in the raw-trace-pipeline
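For illustration, a minimal sketch of the buffer blocks under these recommendations, using the bounded_blocking buffer and the values from the benchmark below, might look like this; source, prepper, and sink entries are omitted.

```yaml
otel-trace-pipeline:
  # ... source and sink omitted ...
  buffer:
    bounded_blocking:
      buffer_size: 4096   # same buffer_size in both pipelines
      batch_size: 256

raw-trace-pipeline:
  # ... source, prepper, and sink omitted ...
  buffer:
    bounded_blocking:
      buffer_size: 4096   # 4096 >= workers (8) * batch_size (256) = 2048
      batch_size: 256
```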

Workers

The workers setting determines the number of threads that will be used by Data Prepper to process requests from the buffer.

We recommend setting workers based on observed CPU utilization. This value can be higher than the number of available processors because Data Prepper spends significant I/O time sending data to Elasticsearch.
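As a sketch, with the pipeline-level workers setting and the benchmark values below, the raw-trace-pipeline might declare:

```yaml
raw-trace-pipeline:
  workers: 8   # more threads than the 4 vCPUs of an r5.xlarge, since the Elasticsearch sink is I/O-bound
  # ... source, buffer, prepper, and sink omitted ...
```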

Heap

You can configure the heap of Data Prepper by setting the JVM_OPTS environment variable.

We recommend a minimum heap of 4 * batch_size * otel_send_batch_size * maximum size of an individual span.

As mentioned in the setup, set otel_send_batch_size to 50 in your OpenTelemetry Collector configuration.
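As a worked example, assuming a maximum individual span size of 64 KB (an assumption, not a measured value), the formula gives 4 * 256 * 50 * 64 KB ≈ 3.3 GB with the benchmark's batch_size of 256, so a heap of at least 4 GB (for example, JVM_OPTS="-Xms4g -Xmx4g") would satisfy the recommendation. The otel_send_batch_size value appears to correspond to the send_batch_size option of the OpenTelemetry Collector's batch processor; a minimal sketch of that processor configuration:

```yaml
# OpenTelemetry Collector configuration (sketch); only the batch processor is shown.
processors:
  batch:
    send_batch_size: 50   # matches the otel_send_batch_size recommendation above
```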

Disk

Data Prepper uses the disk to store metadata required for service-map processing; only the key fields traceId, spanId, parentSpanId, spanKind, spanName, and serviceName are stored. The service-map plugin ensures it stores only two files, each holding window_duration seconds of data. In our tests, we found that for a throughput of 3000 spans/second, the total disk usage was 4 MB.
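For reference, a sketch of a service-map pipeline entry with window_duration set explicitly is shown below; the pipeline name, the service_map_stateful plugin name, and the 180-second value are assumptions based on the standard Trace Analytics setup, so verify them against your pipeline configuration.

```yaml
service-map-pipeline:
  # ... source and sink omitted ...
  prepper:
    # Assumed plugin and option names; verify against your Data Prepper version.
    - service_map_stateful:
        window_duration: 180   # seconds of span data kept per on-disk window file
```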

Data Prepper uses the disk to write logs. In the current version, you can redirect the logs to the path of your preference.

AWS

The AWS EC2 CloudFormation template provides a user-friendly mechanism for configuring the above scaling attributes.

Kubernetes config files and EKS config files are available to configure these attributes in a cluster deployment.

Benchmark

We ran tests on an r5.xlarge instance with the following configuration:

  • buffer_size : 4096
  • batch_size : 256
  • workers : 8
  • Heap : 10GB

The above setup was able to handle a throughput of 2100 spans/second at 20 percent CPU utilization.