Arrow Exporter load prioritizer apparatus #178

jmacd · 2024-04-12T16:16:55Z

Adds a configurable prioritization scheme to the Arrow exporter. The existing "FIFO" policy remains the default; a new "leastloadedN" policy picks the least-loaded stream among N randomly selected streams.
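
For readers unfamiliar with the idea, here is a minimal sketch of a "least loaded of N" choice. It is not the code in this PR: the `stream` type, its `pending` field, and `pickLeastLoaded` are hypothetical names chosen for illustration, and load is assumed to be a simple count of in-flight requests.

```go
package main

import (
	"fmt"
	"math/rand"
)

// stream is a hypothetical stand-in for an Arrow stream; pending counts
// the requests assumed to be in flight on it.
type stream struct {
	id      int
	pending int
}

// pickLeastLoaded samples n distinct streams at random and returns the one
// with the fewest pending requests. n is clamped to len(streams), because
// asking for more candidates than there are streams is meaningless.
func pickLeastLoaded(streams []*stream, n int) *stream {
	if len(streams) == 0 {
		return nil
	}
	if n > len(streams) {
		n = len(streams)
	}
	if n < 1 {
		n = 1
	}
	var best *stream
	// rand.Perm yields a random ordering of indices; the first n entries
	// are the randomly selected candidates.
	for _, idx := range rand.Perm(len(streams))[:n] {
		if s := streams[idx]; best == nil || s.pending < best.pending {
			best = s
		}
	}
	return best
}

func main() {
	streams := []*stream{{0, 5}, {1, 0}, {2, 9}, {3, 2}}
	// With n equal to the number of streams, every stream is a candidate,
	// so the globally least-loaded stream (id 1) is chosen.
	fmt.Println(pickLeastLoaded(streams, 4).id)
}
```

With small N (two or four, as in the benchmark names), each decision inspects only a few streams, which keeps the selection cheap while still avoiding the most heavily loaded ones.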

Adds a README.md explaining the internals of the Arrow streaming component. Updates the original diagram.

Adds a benchmark that exercises the different prioritizers and demonstrates different performance characteristics.

This required a substantial redesign of the existing prioritization scheme, which had some unnecessary complexity. The new design:

- Uses Context cancelation to implement the downgrade signal consistently across prioritizers.
- Simplifies stream shutdown and restart by re-using the streamWorkState object across streams.
- Adds a drain() method to clean up after downgrade, replacing the complex synchronization used previously.
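
The first and third points can be illustrated with a small sketch. This is an assumption-laden simplification, not the PR's implementation: `prioritizer`, `submit`, `errDowngraded`, and the string payload are hypothetical, and only the general pattern (a canceled Context as the downgrade signal, followed by a non-blocking drain of whatever is still queued) is taken from the description above.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// errDowngraded is a hypothetical sentinel returned after the downgrade signal.
var errDowngraded = errors.New("arrow streams downgraded")

// prioritizer is a hypothetical, simplified prioritizer: producers send on
// input until the downgrade context is canceled.
type prioritizer struct {
	downgrade context.Context
	cancel    context.CancelFunc
	input     chan string
}

func newPrioritizer(depth int) *prioritizer {
	ctx, cancel := context.WithCancel(context.Background())
	return &prioritizer{downgrade: ctx, cancel: cancel, input: make(chan string, depth)}
}

// submit enqueues an item, or fails once downgrade has been signaled.
func (p *prioritizer) submit(item string) error {
	// Check first so that a downgraded prioritizer never accepts new work,
	// even when the channel still has buffer space.
	if p.downgrade.Err() != nil {
		return errDowngraded
	}
	select {
	case p.input <- item:
		return nil
	case <-p.downgrade.Done():
		return errDowngraded
	}
}

// drain removes the items still queued at downgrade time without blocking,
// so the caller can fail or retry them elsewhere.
func (p *prioritizer) drain() []string {
	var left []string
	for {
		select {
		case item := <-p.input:
			left = append(left, item)
		default:
			return left
		}
	}
}

func main() {
	p := newPrioritizer(4)
	_ = p.submit("a")
	_ = p.submit("b")
	p.cancel() // signal downgrade
	fmt.Println(p.submit("c"))
	fmt.Println(p.drain())
}
```

Because every prioritizer watches the same kind of signal (a `Done()` channel), the downgrade path does not need per-policy synchronization.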

Extends most tests to cover all prioritizers.

Fixes #147.

lquerel

The documentation and diagram for this significant addition, a new load-balancing policy across streams, are very useful. However, I did not see the benchmark results. Were they posted somewhere?

Thank you for this PR.

collector/exporter/otelarrowexporter/config_test.go

collector/exporter/otelarrowexporter/internal/arrow/bestofn.go

collector/exporter/otelarrowexporter/internal/arrow/fifo.go

collector/exporter/otelarrowexporter/internal/arrow/bestofn.go

jmacd · 2024-04-15T23:28:31Z

goos: darwin
goarch: arm64
pkg: github.com/open-telemetry/otel-arrow/collector/exporter/otelarrowexporter/internal/arrow
BenchmarkFifo4-10                 	     235	   4996351 ns/op
BenchmarkFifo8-10                 	     198	   6138465 ns/op
BenchmarkFifo16-10                	     152	   7088242 ns/op
BenchmarkFifo32-10                	      79	  13435187 ns/op
BenchmarkFifo64-10                	      48	  20970645 ns/op
BenchmarkFifo128-10               	      90	  12018160 ns/op
BenchmarkLeastLoadedTwo4-10       	     217	   5209409 ns/op
BenchmarkLeastLoadedTwo8-10       	     194	   6073483 ns/op
BenchmarkLeastLoadedTwo16-10      	     138	   8168686 ns/op
BenchmarkLeastLoadedTwo32-10      	      88	  13806171 ns/op
BenchmarkLeastLoadedTwo64-10      	      55	  20729386 ns/op
BenchmarkLeastLoadedTwo128-10     	     162	   7124523 ns/op
BenchmarkLeastLoadedFour4-10      	     198	   5327549 ns/op
BenchmarkLeastLoadedFour8-10      	     183	   6305196 ns/op
BenchmarkLeastLoadedFour16-10     	     134	   8196523 ns/op
BenchmarkLeastLoadedFour32-10     	      58	  17414805 ns/op
BenchmarkLeastLoadedFour64-10     	      74	  18093378 ns/op
BenchmarkLeastLoadedFour128-10    	     142	   7061255 ns/op
PASS

TL;DR: the first numeric column is the number of repetitions needed to meet the benchmark time threshold (larger is better); the second column is the time per repetition (smaller is better). The test is largely synthetic: the test apparatus creates many additional channels and uses a specific, unrealistic latency curve. The results are still meaningful. For the case of 128 streams, FIFO does 90 reps @ 12.0ms, LeastLoadedTwo does 162 reps @ 7.1ms, and LeastLoadedFour does 142 reps @ 7.1ms.
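
To make the 128-stream comparison concrete, the ns/op figures can be converted to milliseconds and to a ratio against FIFO. The sketch below uses the three 128-stream rows from the output above; `nsToMs` and `speedup` are illustrative helper names, not part of the PR.

```go
package main

import "fmt"

// nsToMs converts a ns/op figure from `go test -bench` into milliseconds.
func nsToMs(ns int64) float64 {
	return float64(ns) / 1e6
}

// speedup reports how many times faster the candidate is than the baseline,
// based on time per operation (values above 1 favor the candidate).
func speedup(baselineNs, candidateNs int64) float64 {
	return float64(baselineNs) / float64(candidateNs)
}

func main() {
	// ns/op values copied from the 128-stream rows of the benchmark output.
	const fifo128, two128, four128 = 12018160, 7124523, 7061255

	fmt.Printf("FIFO:            %.1f ms/op\n", nsToMs(fifo128))
	fmt.Printf("LeastLoadedTwo:  %.1f ms/op (%.2fx vs FIFO)\n", nsToMs(two128), speedup(fifo128, two128))
	fmt.Printf("LeastLoadedFour: %.1f ms/op (%.2fx vs FIFO)\n", nsToMs(four128), speedup(fifo128, four128))
}
```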

jmacd · 2024-04-16T14:45:53Z

@moh-osman3 Nice catch. The benchmarks improved. :-)

goos: darwin
goarch: arm64
pkg: github.com/open-telemetry/otel-arrow/collector/exporter/otelarrowexporter/internal/arrow
BenchmarkFifo4-10                 	     208	   4954205 ns/op
BenchmarkFifo8-10                 	     201	   5949269 ns/op
BenchmarkFifo16-10                	     152	   7282191 ns/op
BenchmarkFifo32-10                	      68	  15395150 ns/op
BenchmarkFifo64-10                	      50	  22943395 ns/op
BenchmarkFifo128-10               	     105	  11317888 ns/op
BenchmarkLeastLoadedTwo4-10       	     217	   5323038 ns/op
BenchmarkLeastLoadedTwo8-10       	     198	   5880872 ns/op
BenchmarkLeastLoadedTwo16-10      	     153	   6922781 ns/op
BenchmarkLeastLoadedTwo32-10      	     121	   8825361 ns/op
BenchmarkLeastLoadedTwo64-10      	     120	   9001932 ns/op
BenchmarkLeastLoadedTwo128-10     	     309	   3791125 ns/op
BenchmarkLeastLoadedFour4-10      	     192	   5625193 ns/op
BenchmarkLeastLoadedFour8-10      	     170	   6222414 ns/op
BenchmarkLeastLoadedFour16-10     	     146	   7166732 ns/op
BenchmarkLeastLoadedFour32-10     	     128	   8924360 ns/op
BenchmarkLeastLoadedFour64-10     	     122	   9992490 ns/op
BenchmarkLeastLoadedFour128-10    	     306	   3814781 ns/op

jmacd · 2024-04-16T15:04:52Z

@moh-osman3 Please take another look.

collector/exporter/otelarrowexporter/internal/arrow/stream.go

collector/exporter/otelarrowexporter/internal/arrow/exporter.go

jmacd added 16 commits April 3, 2024 14:49

rebase

233e397

wip

c9e5dde

wip balance

da2d18d

tests w/ new balancer passing

a78ab13

almost

3b0dac0

passing tests w/ balancer

f618d4c

still debugging

d609504

rewrite fifo mechanism

25a97fa

add benchmark

0a62a22

wip

f09f022

wip

bba2fbd

wip

f8f4761

wip move cancel signal, add doneCancel

cf353a9

readme

6748b5b

comments

d9f44ed

more docs

0907adf

jmacd requested review from lquerel, moh-osman3 and codeboten as code owners April 12, 2024 16:16

jmacd added 5 commits April 12, 2024 09:22

Merge branch 'main' of github.com:open-telemetry/otel-arrow into jmac…

3e556d9

…d/loadb

chlog

05f8c0d

restore ci/cd

fa31c8b

regen sums

1fd416c

remove dead file

a8bddd2

lquerel approved these changes Apr 12, 2024

moh-osman3 reviewed Apr 15, 2024

fix randomization

5b83660

remove error from nextWriter

802a912

comment that N > num_streams is meaningless

53f73cf

moh-osman3 approved these changes Apr 16, 2024

collector/exporter/otelarrowexporter/internal/arrow/stream.go (outdated)

collector/exporter/otelarrowexporter/internal/arrow/exporter.go (outdated)

jmacd added 2 commits April 16, 2024 12:12

remove cancelFunc

b956b21

from moh

1111935

jmacd merged commit 85b184e into open-telemetry:main Apr 16, 2024
2 checks passed

jmacd mentioned this pull request Apr 16, 2024

Release v0.22.0 #179

Merged

jmacd deleted the jmacd/loadb branch April 16, 2024 19:53

jmacd added a commit that referenced this pull request Apr 16, 2024

Release v0.22.0 (#179)

4fdd23c

Includes #178.
