
POC =act Make use of externalSubmit to yield. #666

Draft: He-Pin wants to merge 1 commit into main
Conversation

@He-Pin (Member) commented Sep 20, 2023

refs: #661

Some previous work was done on this by @jrudolph years ago?

The current VirtualThread implementation in JDK 21 makes use of externalSubmit:

if (s == YIELDING) {   // Thread.yield
    setState(RUNNABLE);

    // notify JVMTI that unmount has completed, thread is runnable
    notifyJvmtiUnmount(/*hide*/false);

    // external submit if there are no tasks in the local task queue
    if (currentThread() instanceof CarrierThread ct && ct.getQueuedTaskCount() == 0) {
        externalSubmitRunContinuation(ct.getPool());
    } else {
        submitRunContinuation();
    }
}

I think this change will affect performance.

I think a better way would be an explicit yield after some amount of work, just like the cede of CE (cats-effect), which @armanbilge mentioned.
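
A minimal sketch of that idea (not this PR's implementation; the executor, batch size, and step function are made up for illustration): run a bounded batch of work, then re-submit the task so everything else in the queue gets a turn, similar in spirit to CE ceding after a number of iterations.

import java.util.concurrent.{ExecutorService, Executors}

// Cooperative-yielding sketch: process at most `batchSize` units of work,
// then go back through the pool's queue instead of hogging the worker.
final class YieldingTask(pool: ExecutorService, batchSize: Int)(step: () => Boolean)
    extends Runnable {
  def run(): Unit = {
    var i = 0
    var more = true
    while (more && i < batchSize) {
      more = step() // `step` returns false once all the work is done
      i += 1
    }
    if (more) pool.execute(this) // explicit yield: re-enqueue the remainder
  }
}

object YieldingTaskExample {
  def main(args: Array[String]): Unit = {
    val pool = Executors.newFixedThreadPool(2)
    // Only the single re-submitted task touches `remaining`, so no sharing issues.
    var remaining = 1000000
    pool.execute(new YieldingTask(pool, batchSize = 128)(() => {
      remaining -= 1
      remaining > 0
    }))
    // Shutdown is elided: the task keeps re-submitting itself until done.
  }
}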

-if (JavaVersion.majorVersion >= 17)
-pending
+// if (JavaVersion.majorVersion >= 17)
+// pending
@He-Pin (Member Author)

I tested it locally.

@jrudolph (Contributor)

Thanks for reviving this experiment.

In any case, this would need very close auditing of performance consequences (which may be different under various JDKs). It was quite a contentious issue originally whether this should be fixed at all.

Without looking into the current situation, I'd say let's not fix anything here without a very good reason to believe that something is broken in realistic cases. It seems likely that a fix like this will negatively affect throughput for everyone while improving fairness only in very specific cases. If we want to play with this, we would have to introduce a config setting that is off by default, so people could try it in various circumstances.

@He-Pin (Member Author) commented Sep 21, 2023

@jrudolph Yes, I think the current VirtualThread has the same problem: it only yields to the submission queue when there is no local work to do, and VirtualThread just does lazySubmit most of the time.
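
For reference, a sketch of the two submission paths (this assumes the public ForkJoinPool.externalSubmit and lazySubmit methods that newer JDKs (21+) expose; the task itself is a placeholder):

import java.util.concurrent.{ForkJoinPool, ForkJoinTask, TimeUnit}

object SubmissionPaths {
  def main(args: Array[String]): Unit = {
    val pool = new ForkJoinPool(2)
    def task(): ForkJoinTask[_] =
      ForkJoinTask.adapt(new Runnable { def run(): Unit = println("ran") })

    // Shared submission queue: the task is visible to all workers right away,
    // which is what gives Thread.yield on virtual threads its fairness.
    pool.externalSubmit(task())

    // Relaxed signalling: cheaper, but a worker may not be woken eagerly --
    // the throughput-oriented path used most of the time.
    pool.lazySubmit(task())

    pool.shutdown()
    pool.awaitTermination(5, TimeUnit.SECONDS)
  }
}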

It is very hard to get this right: the code changes from JDK release to release, and the current impl will hurt performance.

refs: typelevel/cats-effect#3070
refs: https://github.com/typelevel/cats-effect/releases

I asked @armanbilge, and he said the WSTP (cats-effect's work-stealing thread pool) can be used outside of CE too.

@pjfanning (Contributor)

Anyone know if there is anything in the sbt ecosystem to support Multi-Release jars? That would allow us to have tailored versions of some classes that are used when you run on a particular JDK/JRE version. This would be more efficient than having one version of the code and checking the JDK/JRE version using if statements.

@He-Pin (Member Author) commented Sep 21, 2023

@pjfanning I asked @Glavo about this once, and she said the toolchain is not in great shape; reactor-core is using a multi-release jar.

@mdedetrich (Contributor) commented Sep 21, 2023

> Anyone know if there is anything in the sbt ecosystem to support Multi-Release jars? That would allow us to have tailored versions of some classes that are used when you run on a particular JDK/JRE version. This would be more efficient than having one version of the code and checking the JDK/JRE version using if statements.

It's a requested feature that is yet to be implemented, see sbt/sbt#7174 (comment). I was thinking of working on this at some point, but there are already so many OSS things I am working on at the moment; however, I can re-prioritize to look into it (it does seem we will need this sooner or later). Having multi-release jars will definitely make a lot of this stuff much cleaner.

I don't think it should be too hard. I think sbt already has functionality to detect multiple JDKs; what needs to be fleshed out is how the matrix will work, i.e. will we have scala-java8 folders for sources only compiled for JDK 8, and will things like scala-2-java8/scala-2.13+-java11 work?
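
Until sbt grows native support, hand-rolling the packaging is possible. A rough sketch (Package.ManifestAttributes and the mappings key are existing sbt APIs; the classes-java11 output directory is a made-up convention, and compiling the JDK-11-only sources separately is left out):

// build.sbt (sketch): package pre-built JDK-11 class files into a
// Multi-Release jar. How the java11 sources get compiled is elided.

// The manifest attribute that makes the JVM honour META-INF/versions/.
Compile / packageBin / packageOptions +=
  Package.ManifestAttributes("Multi-Release" -> "true")

// Ship the JDK-11 variants under META-INF/versions/11/ so a Java 11+
// runtime prefers them over the baseline classes in the jar root.
Compile / packageBin / mappings ++= {
  val java11Out = target.value / "classes-java11" // assumed output location
  (java11Out ** "*.class").get.map { f =>
    f -> s"META-INF/versions/11/${f.relativeTo(java11Out).get.getPath}"
  }
}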

@pjfanning (Contributor)

> I don't think it should be too hard. I think sbt already has functionality to detect multiple JDKs; what needs to be fleshed out is how the matrix will work, i.e. will we have scala-java8 folders for sources only compiled for JDK 8, and will things like scala-2-java8/scala-2.13+-java11 work?

Sounds reasonable, but we might also need to support src/main/java11 too (Java code that should be used for Java 11+).

@mdedetrich (Contributor)

> Sounds reasonable, but we might also need to support src/main/java11 too (Java code that should be used for Java 11+).

Yes, that's implied.

@jrudolph (Contributor)

One thing to keep in mind is that Pekko/Akka is already somewhat biased against fairness by choosing relatively big batching values, both for the actor "throughput" setting and, in addition to that, for the stream "sync processing limit". Therefore, a lot of the scheduling overhead that other effect systems might see from pushing every single thunk through the general scheduler (fine-grained parallelism) might not apply to Pekko, especially when using streams. In general, this can lead to higher tail latencies in high-load scenarios, though it depends a lot on the actual setup of the streams etc.

In general, providing fairness to get more predictable latency is a worthwhile goal. My impression is that low-level optimizations will likely not cut it. A high-level reason is that under overload scenarios (the only ones where fairness is relevant), some work is more important than other work (work which drives the control flow vs. actual CPU-intensive workloads). However you structure your work queues, as long as you are not able to prioritize more important work over greedy CPU-intensive workload tasks (which the one in this example is, with a tight loop of actors sending messages back and forth), you will not be able to solve the issue of calculations getting stuck because control flow is starved of CPU resources. (See also how tight loops can prevent GCs from happening.) One approach is to bulkhead CPU-heavy but low-priority work out of the general-purpose queues that drive control flow (e.g. the introduction of the internal dispatcher, running CPU workloads outside of the streams, creating dispatchers for blocking work, creating extra OS threads to push fairness issues back to the OS scheduler). Unfortunately, these kinds of solutions require manual intervention and an understanding of the system.
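
A minimal sketch of the bulkhead approach (the dispatcher name and pool sizes are made up; `type`, `executor`, `throughput`, and `withDispatcher` are standard Pekko dispatcher configuration):

import com.typesafe.config.ConfigFactory
import org.apache.pekko.actor.{ActorSystem, Props}

object BulkheadSketch {
  // A dedicated dispatcher for CPU-heavy, low-priority work, so it cannot
  // starve the default dispatcher that drives control flow.
  private val config = ConfigFactory.parseString("""
    cpu-heavy-dispatcher {
      type = Dispatcher
      executor = "fork-join-executor"
      fork-join-executor {
        parallelism-min = 2
        parallelism-max = 2
      }
      # Small batch: yield the thread back after every message,
      # trading throughput for fairness.
      throughput = 1
    }
  """)

  def main(args: Array[String]): Unit = {
    val system = ActorSystem("bulkhead", config.withFallback(ConfigFactory.load()))
    // `CrunchingActor` is hypothetical; any CPU-bound actor would do:
    // val cruncher = system.actorOf(
    //   Props(new CrunchingActor).withDispatcher("cpu-heavy-dispatcher"))
    system.terminate()
  }
}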

@jrudolph (Contributor)

And to be clear, props to cats-effect (and also Tokio) for pushing the state of the art on this. Let's just not jump to conclusions regarding Pekko.

@mdedetrich (Contributor)

> In general, providing fairness to get more predictable latency is a worthwhile goal. My impression is that low-level optimizations will likely not cut it. A high-level reason is that under overload scenarios (the only ones where fairness is relevant), some work is more important than other work (work which drives the control flow vs. actual CPU-intensive workloads). However you structure your work queues, as long as you are not able to prioritize more important work over greedy CPU-intensive workload tasks, you will not be able to solve the issue of calculations getting stuck because control flow is starved of CPU resources. One approach is to bulkhead CPU-heavy but low-priority work out of the general-purpose queues that drive control flow. Unfortunately, these kinds of solutions require manual intervention and an understanding of the system.

If I understand correctly, is this basically asking for a higher-level API approach where we can assign certain priorities to flows within a Pekko stream (I guess via Attributes), so that in cases of CPU starvation the Pekko actor/stream runtime knows what is more critical to keep running and what isn't?

@jrudolph (Contributor)

No, I just want to prevent us from doing micro-optimizations based on the findings of different projects with potentially different premises, without having real-world problems to solve. ForkJoinPoolStarvationSpec is imo not enough to warrant making any changes here.

@mdedetrich (Contributor) commented Sep 22, 2023

> No, I just want to prevent us from doing micro-optimizations based on the findings of different projects with potentially different premises, without having real-world problems to solve. ForkJoinPoolStarvationSpec is imo not enough to warrant making any changes here.

No disagreement here; I think I got the wrong impression (I thought you were speaking generally rather than about this PR specifically).

Maybe we can open another general discussion thread if it's relevant, but I do agree we need to be careful about these cases.
