Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CDAP-20922] Reuse the Spark ClassLoader in app-fabric #15483

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

chtyim
Copy link
Contributor

@chtyim chtyim commented Dec 4, 2023

  • For non local spark program execution purposes, we don’t need to isolate the Spark classloader, since there is no creation of the SparkContext object.

@chtyim chtyim added build Triggers github actions build 6.10 labels Dec 4, 2023
// Never needs to rewrite yarn client in CDAP master, which is the only place using
// distributed program runner. Also wrap the classloader so that it can't be closed.
distributedRunnerClassLoader = new ClassLoader(createClassLoader(true, false, false)) {};
return distributedRunnerClassLoader;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of returning distributedRunnerClassLoader and incurring another expensive read from a volatile variable, can we write the new classloader to our existing local variable classLoader, assign it to distributedRunnerClassLoader, then return the local variable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not really expensive and it is just have it hit once due to the singleton. Just don't wanna clutter the code too much

@chtyim chtyim force-pushed the feature/cache-spark-cl branch 2 times, most recently from 9ddb843 to a4ecc5d Compare December 4, 2023 22:31
- For non local spark program execution purposes, we don’t need to isolate the Spark classloader, since there is no creation of the SparkContext object.
}
try {
// Never needs to rewrite yarn client in CDAP master, which is the only place using
// distributed program runner. Also wrap the classloader so that it can't be closed.
Copy link
Contributor

@arjan-bal arjan-bal Dec 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrapping it will also make it difficult for task workers to close the class loader. So we won't be able to solve the memory increases in task workers until we have a way to re-use this class loader in task workers also.

@arjan-bal arjan-bal changed the title [CDAP-20910] Reuse the Spark ClassLoader in app-fabric [CDAP-20922] Reuse the Spark ClassLoader in app-fabric Dec 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
6.10 build Triggers github actions build
Projects
None yet
3 participants