Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Arrow Exporter load prioritizer apparatus #178

Merged
merged 26 commits into from Apr 16, 2024
Merged

Conversation

jmacd
Copy link
Contributor

@jmacd jmacd commented Apr 12, 2024

Adds a configurable prioritization scheme to the Arrow exporter, leaving the existing "FIFO" policy the default and adding a new "leastloadedN" policy which considers the least-loaded among N randomly selected streams.

Adds a README.md explaining the internals of the Arrow streaming component. Updates the original diagram.

Adds a benchmark that exercises the different prioritizers and demonstrates different performance characteristics.

This required a substantial redesign of the existing prioritization scheme, which had some unnecessary complexity. The new design:

  1. Uses Context cancelation to implement the downgrade signal consistently across prioritizers
  2. Simplifies stream-shutdown and restart by re-using the streamWorkState object across streams
  3. Adds a drain() method to clean up after downgrade, as opposed to the formerly-complex synchronization used.

Extends most tests to cover all prioritizers.

Fixes #147.

Copy link
Contributor

@lquerel lquerel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation and diagram on this significant addition of a new load balancing policy between streams have been very useful. However, I did not see the benchmark results. Were they displayed somewhere?

Thank you for this PR.

@jmacd
Copy link
Contributor Author

jmacd commented Apr 15, 2024

goos: darwin
goarch: arm64
pkg: github.com/open-telemetry/otel-arrow/collector/exporter/otelarrowexporter/internal/arrow
BenchmarkFifo4-10                 	     235	   4996351 ns/op
BenchmarkFifo8-10                 	     198	   6138465 ns/op
BenchmarkFifo16-10                	     152	   7088242 ns/op
BenchmarkFifo32-10                	      79	  13435187 ns/op
BenchmarkFifo64-10                	      48	  20970645 ns/op
BenchmarkFifo128-10               	      90	  12018160 ns/op
BenchmarkLeastLoadedTwo4-10       	     217	   5209409 ns/op
BenchmarkLeastLoadedTwo8-10       	     194	   6073483 ns/op
BenchmarkLeastLoadedTwo16-10      	     138	   8168686 ns/op
BenchmarkLeastLoadedTwo32-10      	      88	  13806171 ns/op
BenchmarkLeastLoadedTwo64-10      	      55	  20729386 ns/op
BenchmarkLeastLoadedTwo128-10     	     162	   7124523 ns/op
BenchmarkLeastLoadedFour4-10      	     198	   5327549 ns/op
BenchmarkLeastLoadedFour8-10      	     183	   6305196 ns/op
BenchmarkLeastLoadedFour16-10     	     134	   8196523 ns/op
BenchmarkLeastLoadedFour32-10     	      58	  17414805 ns/op
BenchmarkLeastLoadedFour64-10     	      74	  18093378 ns/op
BenchmarkLeastLoadedFour128-10    	     142	   7061255 ns/op
PASS

TL;DR the first numeric column is number of repetitions to meet the benchmark time threshold (where larger numbers == better), the second column is timing per repetition (smaller == better). There is a lot of synthetic aspect to this test, many additional channels created by the test apparatus--with a specific and unrealistic latency curve, but it's still meaningful. For the case of 128 streams, FIFO does 90 reps @ 12ms, LeastLoadedTwo does 162 reps @ 7.1ms, LeastLoadedFour does 142 reps @ 7.0ms

@jmacd
Copy link
Contributor Author

jmacd commented Apr 16, 2024

@moh-osman3 Nice catch. The benchmarks improve. :-)

goos: darwin
goarch: arm64
pkg: github.com/open-telemetry/otel-arrow/collector/exporter/otelarrowexporter/internal/arrow
BenchmarkFifo4-10                 	     208	   4954205 ns/op
BenchmarkFifo8-10                 	     201	   5949269 ns/op
BenchmarkFifo16-10                	     152	   7282191 ns/op
BenchmarkFifo32-10                	      68	  15395150 ns/op
BenchmarkFifo64-10                	      50	  22943395 ns/op
BenchmarkFifo128-10               	     105	  11317888 ns/op
BenchmarkLeastLoadedTwo4-10       	     217	   5323038 ns/op
BenchmarkLeastLoadedTwo8-10       	     198	   5880872 ns/op
BenchmarkLeastLoadedTwo16-10      	     153	   6922781 ns/op
BenchmarkLeastLoadedTwo32-10      	     121	   8825361 ns/op
BenchmarkLeastLoadedTwo64-10      	     120	   9001932 ns/op
BenchmarkLeastLoadedTwo128-10     	     309	   3791125 ns/op
BenchmarkLeastLoadedFour4-10      	     192	   5625193 ns/op
BenchmarkLeastLoadedFour8-10      	     170	   6222414 ns/op
BenchmarkLeastLoadedFour16-10     	     146	   7166732 ns/op
BenchmarkLeastLoadedFour32-10     	     128	   8924360 ns/op
BenchmarkLeastLoadedFour64-10     	     122	   9992490 ns/op
BenchmarkLeastLoadedFour128-10    	     306	   3814781 ns/op

@jmacd
Copy link
Contributor Author

jmacd commented Apr 16, 2024

@moh-osman3 Please take another look.

@jmacd jmacd merged commit 85b184e into open-telemetry:main Apr 16, 2024
2 checks passed
@jmacd jmacd mentioned this pull request Apr 16, 2024
@jmacd jmacd deleted the jmacd/loadb branch April 16, 2024 19:53
jmacd added a commit that referenced this pull request Apr 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add load balancing logic to Arrow exporter
3 participants