scheduler: protect concurrent access to PodInfo.Pod #115777
Conversation
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the relevant label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@pohly do you have a PR with your local changes ready, so that I can take a look and reproduce this issue? I'm a bit skeptical about why DynamicResourceAllocation is unique in triggering this issue, before jumping to the decision of using atomic.Pointer or not. |
You can reproduce with my branch. As far as I can tell, none of the existing test cases involve the handling of pod scheduling failures. Therefore they don't hit this particular race. |
The pointer gets updated by informer callbacks in parallel to pod scheduling, without taking any locks. This showed up in a scheduler_perf test case involving dynamic resource allocation because there pod scheduling fails, which (for the first time in those benchmarks?) exercises additional code paths.

There's no obvious lock that could protect this field. Even if there were one, finding all code locations which currently access the field without holding it would be difficult. Therefore the field is converted to an atomic.Pointer to fix this problem: the type change causes compile errors in (most) users of the field which haven't been updated to call Load and Store, respectively. The one exception is when the field is passed to functions taking an interface{}: such code compiles and then fails at runtime.

The race report was (line numbers do not exactly match current master):

WARNING: DATA RACE
Write at 0x00c000baafc0 by goroutine 25750:
k8s.io/kubernetes/pkg/scheduler/framework.(*PodInfo).Update()
/nvme/gopath/src/k8s.io/kubernetes/pkg/scheduler/framework/types.go:216 +0xd2e
k8s.io/kubernetes/pkg/scheduler/internal/queue.updatePod()
/nvme/gopath/src/k8s.io/kubernetes/pkg/scheduler/internal/queue/scheduling_queue.go:903 +0x5a4
k8s.io/kubernetes/pkg/scheduler/internal/queue.(*PriorityQueue).Update()
/nvme/gopath/src/k8s.io/kubernetes/pkg/scheduler/internal/queue/scheduling_queue.go:620 +0x582
k8s.io/kubernetes/pkg/scheduler.(*Scheduler).updatePodInSchedulingQueue()
/nvme/gopath/src/k8s.io/kubernetes/pkg/scheduler/eventhandlers.go:140 +0x359
k8s.io/kubernetes/pkg/scheduler.(*Scheduler).updatePodInSchedulingQueue-fm()
<autogenerated>:1 +0x6d
k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate()
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache/controller.go:246 +0x8b
k8s.io/client-go/tools/cache.(*ResourceEventHandlerFuncs).OnUpdate()
<autogenerated>:1 +0x29
k8s.io/client-go/tools/cache.FilteringResourceEventHandler.OnUpdate()
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache/controller.go:311 +0xf7
k8s.io/client-go/tools/cache.(*FilteringResourceEventHandler).OnUpdate()
<autogenerated>:1 +0x8d
k8s.io/client-go/tools/cache.(*processorListener).run.func1()
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache/shared_informer.go:973 +0x21e
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1()
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:157 +0x48
k8s.io/apimachinery/pkg/util/wait.BackoffUntil()
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:158 +0xce
k8s.io/apimachinery/pkg/util/wait.JitterUntil()
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:135 +0x10d
k8s.io/apimachinery/pkg/util/wait.Until()
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:92 +0x78
k8s.io/client-go/tools/cache.(*processorListener).run()
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache/shared_informer.go:969 +0x18
k8s.io/client-go/tools/cache.(*processorListener).run-fm()
<autogenerated>:1 +0x39
k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:75 +0x73
Previous read at 0x00c000baafc0 by goroutine 25981:
k8s.io/kubernetes/pkg/scheduler/internal/queue.(*nominator).add()
/nvme/gopath/src/k8s.io/kubernetes/pkg/scheduler/internal/queue/scheduling_queue.go:1024 +0xda4
k8s.io/kubernetes/pkg/scheduler/internal/queue.(*nominator).AddNominatedPod()
/nvme/gopath/src/k8s.io/kubernetes/pkg/scheduler/internal/queue/scheduling_queue.go:841 +0x58
k8s.io/kubernetes/pkg/scheduler/internal/queue.(*PriorityQueue).AddNominatedPod()
<autogenerated>:1 +0x63
k8s.io/kubernetes/pkg/scheduler.(*Scheduler).handleSchedulingFailure()
/nvme/gopath/src/k8s.io/kubernetes/pkg/scheduler/schedule_one.go:916 +0x1a34
k8s.io/kubernetes/pkg/scheduler.(*Scheduler).handleSchedulingFailure-fm()
<autogenerated>:1 +0xcc
k8s.io/kubernetes/pkg/scheduler.(*Scheduler).scheduleOne()
/nvme/gopath/src/k8s.io/kubernetes/pkg/scheduler/schedule_one.go:97 +0x962
k8s.io/kubernetes/pkg/scheduler.(*Scheduler).scheduleOne-fm()
<autogenerated>:1 +0x4d
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1()
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:190 +0x4c
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1()
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:157 +0x48
k8s.io/apimachinery/pkg/util/wait.BackoffUntil()
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:158 +0xce
k8s.io/apimachinery/pkg/util/wait.JitterUntil()
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:135 +0x10d
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext()
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:190 +0xaa
k8s.io/apimachinery/pkg/util/wait.UntilWithContext()
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:101 +0x52
k8s.io/kubernetes/pkg/scheduler.(*Scheduler).Run.func1()
/nvme/gopath/src/k8s.io/kubernetes/pkg/scheduler/scheduler.go:357 +0x5a
Force-pushed from 14e12f2 to 0285030.
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: pohly The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest Unrelated test flake (TestChangeCRD). |
I need some time to check the root cause, and think more about the current version as it involves a lot of places.
The perf integration test suite |
By looking at the race log:
It seems it comes from one of the updatePod() calls inside:
@pohly could you elaborate on the DRA integration test: how is this update triggered, and which fields are updated? |
I'm not hitting the race with the current code from #116207 (= DRA testcases, no optimizations). I do see it with https://github.com/kubernetes/kubernetes/compare/master...pohly:dra-integration-tests?expand=1. Without those optimizations the timing is vastly different, so perhaps that is hiding the race? It could of course also mean that some of the additional changes cause it. I can't say for sure which pod update is the one that triggers the race. When I add
|
I also get a different race report for gRPC logging. Once #116207 is merged, I intend to investigate how some small-scale test scenarios can be run in a unit test. Then the normal unit testing should run with race detection and we have a chance to detect problems like the one I already fixed without having to check manually. |
I was able to trigger the same race also with PreemptionBasic. Not sure whether it matters, but I had added the following: diff --git a/pkg/scheduler/internal/queue/scheduling_queue.go b/pkg/scheduler/internal/queue/scheduling_queue.go
index 1f639b5c421..806d78f0a68 100644
--- a/pkg/scheduler/internal/queue/scheduling_queue.go
+++ b/pkg/scheduler/internal/queue/scheduling_queue.go
@@ -33,6 +33,7 @@ import (
"sync"
"time"
+ "github.com/google/go-cmp/cmp"
v1 "k8s.io/api/core/v1"
"k8s.io/apimachinery/pkg/labels"
"k8s.io/apimachinery/pkg/types"
@@ -897,6 +898,7 @@ func (p *PriorityQueue) calculateBackoffDuration(podInfo *framework.QueuedPodInf
func updatePod(oldPodInfo interface{}, newPod *v1.Pod) *framework.QueuedPodInfo {
pInfo := oldPodInfo.(*framework.QueuedPodInfo)
+ fmt.Printf("pod %s updated:\n%s\n", pInfo.Pod.Name, cmp.Diff(pInfo.Pod, newPod))
pInfo.Update(newPod)
return pInfo
}

$ go test -v -timeout=0 -bench=.*/PreemptionBasic -race -short=true -benchtime=1ns ./test/integration/scheduler_perf
...
pod pod-6t9mj updated:
&v1.Pod{
TypeMeta: {},
ObjectMeta: v1.ObjectMeta{
... // 3 identical fields
SelfLink: "",
UID: "02e593f8-903b-49ea-b19a-c7f2a63d5a46",
- ResourceVersion: "6685",
+ ResourceVersion: "7276",
Generation: 0,
CreationTimestamp: {Time: s"2023-03-08 17:53:17 +0100 CET"},
... // 4 identical fields
OwnerReferences: nil,
Finalizers: nil,
ManagedFields: []v1.ManagedFieldsEntry{
{Manager: "scheduler_perf.test", Operation: "Update", APIVersion: "v1", Time: s"2023-03-08 17:53:17 +0100 CET", ...},
+ {
+ Manager: "scheduler_perf.test",
+ Operation: "Update",
+ APIVersion: "v1",
+ Time: s"2023-03-08 17:53:17 +0100 CET",
+ FieldsType: "FieldsV1",
+ FieldsV1: s`{"f:status":{"f:conditions":{".":{},"k:{\"type\":\"PodScheduled\"}":{".":{},"f:lastProbeTime":{},"f:lastTransitionTime":{},"f:
me`...,
+ Subresource: "status",
+ },
},
},
Spec: {Containers: {{Name: "pause", Image: "registry.k8s.io/pause:3.9", Ports: {{ContainerPort: 80, Protocol: "TCP"}}, Resources: {Limits: {s"cpu": {i: {...}, s: "3"
, Format: "DecimalSI"}, s"memory": {i: {...}, s: "500Mi", Format: "BinarySI"}}, Requests: {s"cpu": {i: {...}, s: "3", Format: "DecimalSI"}, s"memory": {i: {...}, s: "500Mi",
Format: "BinarySI"}}}, ...}}, RestartPolicy: "Always", TerminationGracePeriodSeconds: &30, DNSPolicy: "ClusterFirst", ...},
Status: v1.PodStatus{
Phase: "Pending",
- Conditions: nil,
+ Conditions: []v1.PodCondition{
+ {
+ Type: "PodScheduled",
+ Status: "False",
+ LastTransitionTime: s"2023-03-08 17:53:17 +0100 CET",
+ Reason: "Unschedulable",
+ Message: "0/500 nodes are available: 500 Insufficient cpu.",
+ },
+ },
Message: "",
Reason: "",
- NominatedNodeName: "",
+ NominatedNodeName: "node-0-8vql9",
HostIP: "",
PodIP: "",
... // 7 identical fields
},
}
==================
WARNING: DATA RACE
Write at 0x00c00012db90 by goroutine 4837:
k8s.io/kubernetes/pkg/scheduler/framework.(*PodInfo).Update()
/nvme/gopath/src/k8s.io/kubernetes/pkg/scheduler/framework/types.go:155 +0xd2e
k8s.io/kubernetes/pkg/scheduler/internal/queue.updatePod()
/nvme/gopath/src/k8s.io/kubernetes/pkg/scheduler/internal/queue/scheduling_queue.go:902 +0x1a6
k8s.io/kubernetes/pkg/scheduler/internal/queue.(*PriorityQueue).Update()
/nvme/gopath/src/k8s.io/kubernetes/pkg/scheduler/internal/queue/scheduling_queue.go:620 +0xfd6
k8s.io/kubernetes/pkg/scheduler.(*Scheduler).updatePodInSchedulingQueue()
/nvme/gopath/src/k8s.io/kubernetes/pkg/scheduler/eventhandlers.go:140 +0x359
k8s.io/kubernetes/pkg/scheduler.(*Scheduler).updatePodInSchedulingQueue-fm()
<autogenerated>:1 +0x6d
k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate()
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache/controller.go:246 +0x8b
k8s.io/client-go/tools/cache.(*ResourceEventHandlerFuncs).OnUpdate()
<autogenerated>:1 +0x29
k8s.io/client-go/tools/cache.FilteringResourceEventHandler.OnUpdate()
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache/controller.go:311 +0xf7
k8s.io/client-go/tools/cache.(*FilteringResourceEventHandler).OnUpdate()
<autogenerated>:1 +0x8d
k8s.io/client-go/tools/cache.(*processorListener).run.func1()
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache/shared_informer.go:973 +0x21e
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1()
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:157 +0x48
k8s.io/apimachinery/pkg/util/wait.BackoffUntil()
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:158 +0xce
k8s.io/apimachinery/pkg/util/wait.JitterUntil()
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:135 +0x10d
k8s.io/apimachinery/pkg/util/wait.Until()
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:92 +0x78
k8s.io/client-go/tools/cache.(*processorListener).run()
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache/shared_informer.go:969 +0x18
k8s.io/client-go/tools/cache.(*processorListener).run-fm()
<autogenerated>:1 +0x39
k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:75 +0x73
Previous read at 0x00c00012db90 by goroutine 173610:
k8s.io/kubernetes/pkg/scheduler/framework.(*PodInfo).DeepCopy()
/nvme/gopath/src/k8s.io/kubernetes/pkg/scheduler/framework/types.go:141 +0x38
k8s.io/kubernetes/pkg/scheduler/internal/queue.(*nominator).NominatedPodsForNode()
/nvme/gopath/src/k8s.io/kubernetes/pkg/scheduler/internal/queue/scheduling_queue.go:851 +0x21a
k8s.io/kubernetes/pkg/scheduler/framework/runtime.(*frameworkImpl).NominatedPodsForNode()
<autogenerated>:1 +0x7d
k8s.io/kubernetes/pkg/scheduler/framework/runtime.addNominatedPods()
/nvme/gopath/src/k8s.io/kubernetes/pkg/scheduler/framework/runtime/framework.go:871 +0x105
k8s.io/kubernetes/pkg/scheduler/framework/runtime.(*frameworkImpl).RunFilterPluginsWithNominatedPods()
/nvme/gopath/src/k8s.io/kubernetes/pkg/scheduler/framework/runtime/framework.go:846 +0x174
k8s.io/kubernetes/pkg/scheduler/framework/plugins/defaultpreemption.(*DefaultPreemption).SelectVictimsOnNode()
/nvme/gopath/src/k8s.io/kubernetes/pkg/scheduler/framework/plugins/defaultpreemption/default_preemption.go:188 +0x6cc
k8s.io/kubernetes/pkg/scheduler/framework/preemption.(*Evaluator).DryRunPreemption.func1()
/nvme/gopath/src/k8s.io/kubernetes/pkg/scheduler/framework/preemption/preemption.go:568 +0x206
k8s.io/kubernetes/pkg/scheduler/framework/parallelize.Parallelizer.Until.func1()
/nvme/gopath/src/k8s.io/kubernetes/pkg/scheduler/framework/parallelize/parallelism.go:60 +0x1af
k8s.io/client-go/util/workqueue.ParallelizeUntil.func1()
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/client-go/util/workqueue/parallelizer.go:90 +0x1c1
Goroutine 4837 (running) created at:
k8s.io/apimachinery/pkg/util/wait.(*Group).Start()
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:73 +0xe4
k8s.io/client-go/tools/cache.(*sharedProcessor).run.func1()
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache/shared_informer.go:799 +0x1e7
k8s.io/client-go/tools/cache.(*sharedProcessor).run()
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache/shared_informer.go:803 +0x51
k8s.io/client-go/tools/cache.(*sharedProcessor).run-fm()
<autogenerated>:1 +0x44
k8s.io/apimachinery/pkg/util/wait.(*Group).StartWithChannel.func1()
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:58 +0x3e
k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:75 +0x73
Goroutine 173610 (finished) created at:
k8s.io/client-go/util/workqueue.ParallelizeUntil()
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/client-go/util/workqueue/parallelizer.go:76 +0x30f
k8s.io/kubernetes/pkg/scheduler/framework/parallelize.Parallelizer.Until()
/nvme/gopath/src/k8s.io/kubernetes/pkg/scheduler/framework/parallelize/parallelism.go:63 +0x1b6
k8s.io/kubernetes/pkg/scheduler/framework/preemption.(*Evaluator).DryRunPreemption()
/nvme/gopath/src/k8s.io/kubernetes/pkg/scheduler/framework/preemption/preemption.go:599 +0x755
k8s.io/kubernetes/pkg/scheduler/framework/preemption.(*Evaluator).findCandidates()
/nvme/gopath/src/k8s.io/kubernetes/pkg/scheduler/framework/preemption/preemption.go:244 +0x6ed
k8s.io/kubernetes/pkg/scheduler/framework/preemption.(*Evaluator).Preempt()
/nvme/gopath/src/k8s.io/kubernetes/pkg/scheduler/framework/preemption/preemption.go:171 +0x4d9
k8s.io/kubernetes/pkg/scheduler/framework/plugins/defaultpreemption.(*DefaultPreemption).PostFilter()
/nvme/gopath/src/k8s.io/kubernetes/pkg/scheduler/framework/plugins/defaultpreemption/default_preemption.go:99 +0x244
k8s.io/kubernetes/pkg/scheduler/framework/runtime.(*frameworkImpl).runPostFilterPlugin()
/nvme/gopath/src/k8s.io/kubernetes/pkg/scheduler/framework/runtime/framework.go:801 +0xd1
k8s.io/kubernetes/pkg/scheduler/framework/runtime.(*frameworkImpl).RunPostFilterPlugins()
/nvme/gopath/src/k8s.io/kubernetes/pkg/scheduler/framework/runtime/framework.go:776 +0x42f
k8s.io/kubernetes/pkg/scheduler.(*Scheduler).schedulingCycle()
/nvme/gopath/src/k8s.io/kubernetes/pkg/scheduler/schedule_one.go:154 +0x2a6
k8s.io/kubernetes/pkg/scheduler.(*Scheduler).scheduleOne()
/nvme/gopath/src/k8s.io/kubernetes/pkg/scheduler/schedule_one.go:95 +0x544
k8s.io/kubernetes/pkg/scheduler.(*Scheduler).scheduleOne-fm()
<autogenerated>:1 +0x4d
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1()
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:190 +0x4c
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1()
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:157 +0x48
k8s.io/apimachinery/pkg/util/wait.BackoffUntil()
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:158 +0xce
k8s.io/apimachinery/pkg/util/wait.JitterUntil()
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:135 +0x10d
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext()
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:190 +0xaa
k8s.io/apimachinery/pkg/util/wait.UntilWithContext()
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:101 +0x52
k8s.io/kubernetes/pkg/scheduler.(*Scheduler).Run.func1()
/nvme/gopath/src/k8s.io/kubernetes/pkg/scheduler/scheduler.go:358 +0x5a
================== |
I agree that there is a bug. But I disagree with the solution. Instead, we should instantiate a new QueuedPodInfo and put that one in the scheduling queue. We are not saving many allocations if we are calling DeepCopy anyway. |
The current approach is going to hide race conditions in other fields of PodInfo. |
Or you could protect |
Why that? If other fields suffer from the same problem, they will show up as a data race the same way as updating Pod did.
A new PodInfo or QueuedPodInfo? But either way I am not sure whether that will fix the problem completely. The
I do not fully understand what a different solution would look like yet, so help would definitely be welcome right now. Regarding backporting the current atomic.Pointer change, there will be a snag: "go vet" in older Kubernetes releases will complain about the |
It is possible that a race condition in other fields is harder to reproduce, but there is the same pattern: updatePod changes those fields, and they are read during scheduling.
and
That's why I suggest having a new QueuedPodInfo. We are already getting a lock for updating the scheduling queue. We are just not locking updates to things inside QueuedPodInfo (see kubernetes/pkg/scheduler/internal/queue/scheduling_queue.go, lines 604 to 606 at 3307b39).
This also allows us to cherrypick without relying on the generic pointer. |
When taking that lock, why is it necessary to handle PodInfo and QueuedPodInfo instances differently? |
So the argument was about the completeness of the fix, not whether it makes other similar problems harder to find. I agree, if such fields exist, then protecting PodInfo as a whole is necessary and better. |
Is the read path taking that lock already? This here:
|
I didn't understand the question.
Oh... The problem is that the nominator has its own lock, separate from the PriorityQueue's. That's clearly a problem. It's unclear to me if this is the only problem. |
I was asking about the read path locking because that question then needs to be answered for all code which reads the field. I started protecting the pod field with atomic.Pointer because then (in most cases) the compiler will reject incorrect code. Introducing some additional rules like "access |
Those are valid concerns. I think we need to focus on a hotfix for now, especially given that we can't backport the usage of atomic.Pointer. Pop is not a problem precisely because we are popping it out from the queue. So potential parallel updates while scheduling is happening would create a new object in the queue. |
#116395 fixes the same problem. |
What type of PR is this?
/kind bug
What this PR does / why we need it:
The pointer gets updated by informer callbacks in parallel to pod scheduling, without taking any locks. This showed up in a scheduler_perf test case involving dynamic resource allocation because there pod scheduling fails, which (for the first time in those benchmarks?) exercises additional code paths.
There's no obvious lock that could protect this field. Even if there were one, finding all code locations which currently access the field without holding it would be difficult. Therefore the field is converted to an atomic.Pointer to fix this problem: the type change causes compile errors in (most) users of the field which haven't been updated to call Load and Store, respectively.
The one exception is when the field is passed to functions taking an interface{}: such code compiles and then fails at runtime.
The race report was (line numbers do not exactly match current master):
Which issue(s) this PR fixes:
Fixes #115762
Special notes for your reviewer:
This contains one "govet" suppression because of google/go-cmp#310 (comment)
Does this PR introduce a user-facing change?