Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IOUringEventLoop nextWakeupNanos #148

Open
pveentjer opened this issue Mar 1, 2022 · 9 comments
Open

IOUringEventLoop nextWakeupNanos #148

pveentjer opened this issue Mar 1, 2022 · 9 comments

Comments

@pveentjer
Copy link
Contributor

pveentjer commented Mar 1, 2022

In the IOUringEventLoop the nextWakeupNanos is updated twice:

nextWakeupNanos.set(curDeadlineNanos);
...
if (nextWakeupNanos.get() == AWAKE || nextWakeupNanos.getAndSet(AWAKE) == AWAKE){  
   pendingWakeup = true;
}

The nextWakeupNanos is an AtomicLong and the set and getAndSet are not very cheap. On the X86 the trigger a memory fence which effectively causes the CPU to stop issuing loads till the store buffer has been drained. And this will not benefit performance.

Based on the code it seems that the nextWakeupNanos is not used by any other thread. Perhaps I'm missing something since I didn't study the code in-depth, but I think the nextWakeupNanos could be converted to a simple plain long.

THe IOUringSubmissionQueue already has the appropriate fences.

@franz1981
Copy link
Contributor

franz1981 commented Mar 1, 2022

Hi @pveentjer : nextWakeupNanos is used by

from outside the event loop (ie from another thread) if inEventLoop == false

I believe this could be changed in order to just use a plain get somewhere, but I believe using eventFd to wake up the EL is such costy that we rather prefer to pay the full barrier cost.

on https://www.scylladb.com/2018/02/15/memory-barriers-seastar-linux/ I see that there are ways to save a(n additional) full barrier there, so happy to review any PR to improve this

@pveentjer
Copy link
Contributor Author

Ok. In that case we need something more powerful than a plain long. But I think we can get rid of the fences. So a get/set opaque should be sufficient since there are no memory ordering effects required.

@pveentjer
Copy link
Contributor Author

pveentjer commented Mar 1, 2022

So the following should be sufficient for the first set:

nextWakeupNanos.lazySet(curDeadlineNanos);

It will provide a [StoreStore][LoadStore] which on the X86 you get for free since all stores are release stores. So it is just a compiler restriction.

setOpaque would be more appropriate, but not available on Java 8.

Need to think about the getAndSet.

@pveentjer
Copy link
Contributor Author

pveentjer commented Mar 1, 2022

With Scylla the mem barrier is removed for task exchange between threads. Memory ordering matters in that case. But afaik for the nextWakeupNanos memory ordering is irrelevant. So I don't see a good reason why a price needs to be paid for memory ordering if it is irrelevant.

@normanmaurer
Copy link
Member

@pveentjer PRs welcome

@pveentjer
Copy link
Contributor Author

I would be more than happy to make a PR :) But before I do, I want to make sure my understanding is correct.

Assumption: the memory ordering effects of nextWakeupNanos are irrelevant. We just need to make sure that the compiler doesn't optimize out the load or store.

So is this assumption correct?

@njhill
Copy link
Member

njhill commented Mar 1, 2022

Hey thanks @pveentjer. The IOUringEventLoop logic was derived from EpollEventLoop, and in an earlier iteration of that I did actually used lazySet here following similar reasoning to yours.

However I changed it to set after realizing that ordering is in fact important relative to the task queue submission. Otherwise I think a race / missed wake-up could be possible.

But it's possible I was wrong about this, I'm sure your understanding of the memory order modes is better than mine!

@franz1981
Copy link
Contributor

franz1981 commented Mar 1, 2022

@njhill
Oooops I didn't notice you pinged me on netty/netty#9561 (comment) :( I will use this chance to help @pveentjer to find why/if it hasn't worked

@njhill
Copy link
Member

njhill commented Mar 1, 2022

@franz1981 np I think it was just for info and it was quite a long time ago now!

@pveentjer you also probably noticed we guard the getAndSet with a volatile read to avoid its cost in most cases. If you think an AtomicInteger would be significantly cheaper than AtomicLong we may be able to do find a way to use that instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants