-
Notifications
You must be signed in to change notification settings - Fork 8.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
YARN-11687. Update CGroupsResourceCalculator to track usages using cgroupv2 #6780
Conversation
💔 -1 overall
This message was automatically generated. |
9b7e44c
to
39d7351
Compare
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @K0K0V0K for the patch, I had a comment.
...op/yarn/server/nodemanager/containermanager/linux/resources/CGroupsV2ResourceCalculator.java
Outdated
Show resolved
Hide resolved
🎊 +1 overall
This message was automatically generated. |
...op/yarn/server/nodemanager/containermanager/linux/resources/CGroupsV2ResourceCalculator.java
Outdated
Show resolved
Hide resolved
...op/yarn/server/nodemanager/containermanager/linux/resources/CGroupsV2ResourceCalculator.java
Outdated
Show resolved
Hide resolved
...op/yarn/server/nodemanager/containermanager/linux/resources/CGroupsV2ResourceCalculator.java
Outdated
Show resolved
Hide resolved
...op/yarn/server/nodemanager/containermanager/linux/resources/CGroupsV2ResourceCalculator.java
Outdated
Show resolved
Hide resolved
...op/yarn/server/nodemanager/containermanager/linux/resources/CGroupsV2ResourceCalculator.java
Outdated
Show resolved
Hide resolved
...op/yarn/server/nodemanager/containermanager/linux/resources/CGroupsV2ResourceCalculator.java
Outdated
Show resolved
Hide resolved
...op/yarn/server/nodemanager/containermanager/linux/resources/CGroupsV2ResourceCalculator.java
Outdated
Show resolved
Hide resolved
...op/yarn/server/nodemanager/containermanager/linux/resources/CGroupsV2ResourceCalculator.java
Outdated
Show resolved
Hide resolved
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @K0K0V0K for the update. In general it looks good to me, I had 4 small comments.
...op/yarn/server/nodemanager/containermanager/linux/resources/CGroupsV2ResourceCalculator.java
Outdated
Show resolved
Hide resolved
.../yarn/server/nodemanager/containermanager/linux/resources/TestCGroupsResourceCalculator.java
Outdated
Show resolved
Hide resolved
...yarn/server/nodemanager/containermanager/linux/resources/CGroupsMixedResourceCalculator.java
Outdated
Show resolved
Hide resolved
...n/server/nodemanager/containermanager/linux/resources/AbstractCGroupsResourceCalculator.java
Outdated
Show resolved
Hide resolved
* CGroups has its limitations. It can only be enabled, if both CPU and memory | ||
* cgroups are enabled with yarn.nodemanager.resource.cpu.enabled and | ||
* yarn.nodemanager.resource.memory.enabled respectively. This means that | ||
* memory limits are enforced by default. You can turn this off and keep | ||
* memory reporting only with yarn.nodemanager.resource.memory.enforced. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this removed by mistake, or is this limitation not true anymore? If it's no longer relevant the other limitation shouldn't start with "Another limitation".
@VisibleForTesting | ||
long jiffyLengthMs = SysInfoLinux.JIFFY_LENGTH_IN_MILLIS; | ||
@VisibleForTesting | ||
CpuTimeTracker cpuTimeTracker; | ||
@VisibleForTesting | ||
CGroupsHandler cGroupsHandler; | ||
@VisibleForTesting | ||
String procFs = "/proc"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since these are not heavily used in subclasses it would be nicer if these were private and have package private getter/setters. I don't think exposing them for easier test setting is worth it for having a bit less code in an abstract base class.
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
…int fails (apache#6723) HADOOP-19057 switched the hadoop-aws test bucket from landsat-pds to noaa-cors-pds This new bucket isn't accessible if the client configuration sets an fs.s3a.endpoint/region value other than us-east-1. Contributed by Viraj Jasani
…ons (apache#6776). Contributed by kulkabhay. Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
…etManager. (apache#6001). Contributed by Vikas Kumar." This reverts commit e283375.
Bumps [org.bouncycastle:bcprov-jdk18on](https://github.com/bcgit/bc-java) from 1.77 to 1.78. - [Changelog](https://github.com/bcgit/bc-java/blob/main/docs/releasenotes.html) - [Commits](https://github.com/bcgit/bc-java/commits) --- updated-dependencies: - dependency-name: org.bouncycastle:bcprov-jdk18on dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps org.apache.derby:derby from 10.14.2.0 to 10.17.1.0. --- updated-dependencies: - dependency-name: org.apache.derby:derby dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…NS protection (apache#6814). Contributed by wangzhihui. Signed-off-by: Vinayakumar B <vinayakumarb@apache.org> Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
Improve task commit resilience everywhere and add an option to reduce delete IO requests on job cleanup (relevant for ABFS and HDFS). Task Commit Resilience ---------------------- Task manifest saving is re-attempted on failure; the number of attempts made is configurable with the option: mapreduce.manifest.committer.manifest.save.attempts * The default is 5. * The minimum is 1; asking for less is ignored. * A retry policy adds 500ms of sleep per attempt. * Move from classic rename() to commitFile() to rename the file, after calling getFileStatus() to get its length and possibly etag. This becomes a rename() on gcs/hdfs anyway, but on abfs it does reach the ResilientCommitByRename callbacks in abfs, which report on the outcome to the caller...which is then logged at WARN. * New statistic task_stage_save_summary_file to distinguish from other saving operations (job success/report file). This is only saved to the manifest on task commit retries, and provides statistics on all previous unsuccessful attempts to save the manifests + test changes to match the codepath changes, including improvements in fault injection. Directory size for deletion --------------------------- New option mapreduce.manifest.committer.cleanup.parallel.delete.base.first This attempts an initial attempt at deleting the base dir, only falling back to parallel deletes if there's a timeout. This option is disabled by default; Consider enabling it for abfs to reduce IO load. Consult the documentation for more details. Success file printing --------------------- The command to print a JSON _SUCCESS file from this committer and any S3A committer is now something which can be invoked from the mapred command: mapred successfile <path to file> Contributed by Steve Loughran
…pache#6034). Contributed by ConfX. Signed-off-by: Ayush Saxena <ayushsaxena@apache.org> Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
…es not set stateID in RPC response header. (apache#6804)
Contributed by zhihui wang
* Use Yetus 0.14.1 from downloads.apache.org in yetus-wrapper * Use Maven 3.8.8 from downloads.apache.org in Win 10 Dockerfile * Point users to downloads.apache.org for JVSC * Use Solr 8.11.2 from downloads.apache.org in YARN Dockerfile Contributed by Christopher Tubbs
…SAdmin.testDecommissionDataNodesReconfig failed (apache#6812) Contributed by Zengqiang Xu. Reviewed-by: Vinayakumar B <vinayakumarb@apache.org> Signed-off-by: Shilun Fan <slfan1989@apache.org>
…#6534) Contributed by xuyu
+remove reference in LICENSE-binary as it is no longer shipped Contributed by Steve Loughran
…he get file attributes for S3A. (apache#6646) Contributed by: Mukund Thakur
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
Description of PR
Port the functionality CGroupsResourceCalculator to CgroupV2
This (and CGroupsResourceCalculator) can only be turned on if mapreduce.job.process-tree.class is set to one of them (or to CombinedResourceCalculator).
How was this patch tested?