Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AVRO-3987 [Java] Concurrency improvement for ReflectData FieldAccessors #2900

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

ashley-taylor
Copy link

What is the purpose of the change

Improve the performance by removing synchronisation calls.

Running the ReflectRecordTest with 8 threads on a M2 Macbook pro

before change

    Benchmark                  Mode  Cnt        Score        Error  Units
    ReflectRecordTest.decode  thrpt    3  4,157,551.230 ± 831123.819  ops/s
    ReflectRecordTest.encode  thrpt    3  4,926,092.511 ± 455798.533  ops/s

after

    Benchmark                  Mode  Cnt         Score         Error  Units
    ReflectRecordTest.decode  thrpt    3  12,221,568.529 ± 1308392.259  ops/s
    ReflectRecordTest.encode  thrpt    3  12,569,201.802 ± 1898124.234  ops/s

Verifying this change

This change is already covered by existing tests, such as (please describe tests).

org.apache.avo.reflect already tests this code

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@github-actions github-actions bot added the Java Pull Requests for Java binding label May 8, 2024
@ashley-taylor
Copy link
Author

to recreate the benchmark, you need to change the threads value here to be greater than 1

.warmupIterations(warmupIterations).measurementIterations(measurementIterations).forks(1).threads(1)

bySchema.put(schema, result);
this.bySchema = bySchema;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without synchronization this.bySchema could be modified several times by now. Would that be a problem ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the object is derived It won't cause an issue. There could be duplicated generation across threads. Causing some wasted compute at the start of processing. But since this.bySchema is always replaced with a new map with no future modifications concurrent read are safe.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really a Java programmer but the this.bySchema = bySchema assignment seems OK, assuming that the volatile causes a memory barrier that ensures the effects of bySchema.put(schema, result) will be seen by other threads that read this.bySchema after the assignment.

However, I wonder about bySchema.get(schema). Is WeakHashMap.get formally documented to be safe for concurrent calls? Each time WeakHashMap.get is called, WeakHashMap.expungeStaleEntries can delete some entries, and although WeakHashMap.expungeStaleEntries synchronizes those deletions against each other, it doesn't seem to synchronize them against any reads that WeakHashMap.get may be doing on other threads.

@ashley-taylor
Copy link
Author

@martin-g anything you need to get this merged?

@martin-g
Copy link
Member

@martin-g anything you need to get this merged?

I'll leave it to a Java developer because I am not that familiar with the Java SDK.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Java Pull Requests for Java binding
Projects
None yet
4 participants