Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Placement: disseminate min Actor API version + reject nodes with version lower than minium #7134

Merged
merged 37 commits into from
Nov 13, 2023

Conversation

ItalyPaleAle
Copy link
Contributor

Fixes #6838

Completes the work that was started in 1.12:

  • The placement service collects all API versions from dapr (which were already reported starting with Dapr 1.12, but unused)
  • If a node tries to join a cluster and its actor API version is lower than the minimum observed in the cluster, it's rejected
  • It's possible to manually set a "ceiling" using -max-api-version when starting the placement service
  • The cluster API version is disseminated to all sidecars connected
  • Once a cluster reaches a given API level, it can never be downgraded (unless users pass -max-api-version). This is to prevent issues such as clusters with mixed versions, in which Placement is restarted, and the minimum version depends on the order the apps come back online at.

The behavior is validated with new ITs, as well as updated unit tests.

Note:

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>
Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>
Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>
Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>
Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>
Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>
…disseminate-min-version

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>
Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>
Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>
Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>
Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>
Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>
Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>
Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>
Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>
@ItalyPaleAle ItalyPaleAle requested review from a team as code owners November 1, 2023 17:44
Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>
@ItalyPaleAle ItalyPaleAle added the autoupdate DaprBot will keep the Pull Request up to date with master branch label Nov 1, 2023
Copy link

codecov bot commented Nov 2, 2023

Codecov Report

Attention: 32 lines in your changes are missing coverage. Please review.

Comparison is base (6bf645f) 65.07% compared to head (3ddc6a7) 64.98%.
Report is 2 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7134      +/-   ##
==========================================
- Coverage   65.07%   64.98%   -0.10%     
==========================================
  Files         221      221              
  Lines       20798    20847      +49     
==========================================
+ Hits        13534    13547      +13     
- Misses       6125     6156      +31     
- Partials     1139     1144       +5     
Files Coverage Δ
cmd/placement/options/options.go 94.44% <100.00%> (+0.32%) ⬆️
pkg/injector/patcher/sidecar_container.go 89.82% <100.00%> (+0.03%) ⬆️
pkg/placement/raft/fsm.go 76.82% <100.00%> (+0.87%) ⬆️
pkg/placement/raft/state.go 92.38% <89.47%> (-1.03%) ⬇️
pkg/placement/membership.go 48.38% <20.00%> (-5.39%) ⬇️
pkg/placement/placement.go 73.17% <66.66%> (-4.31%) ⬇️
pkg/operator/operator.go 0.00% <0.00%> (ø)
pkg/placement/tables.go 0.00% <0.00%> (ø)

... and 4 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@mukundansundar mukundansundar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should a minActorApiLevel also be a configurable parameter?
Do we consider the APILevel there in placement service as the min API level or should we have a separate value?
i.e. let's say that there is complete backwards compatibility between levels 10, 11 and 12 ....
So some daprd versions can join with 10 others with 11 others with 12 ... But the min level can be 10 and the cluster level can still be 11.

charts/dapr/README.md Outdated Show resolved Hide resolved
for _, entity := range req.Entities {
monitoring.RecordActorHeartbeat(req.Id, entity, req.Name, req.Pod, p.clock.Now())
monitoring.RecordActorHeartbeat(req.Id, entity, req.Name, req.Pod, now)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this a blocking call which takes some time, and if so should now be calculated again after the loop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a blocking call. It just made sense to get the current time once only, so it's the same everywhere, down to the nanoseconds.

@ItalyPaleAle
Copy link
Contributor Author

should a minActorApiLevel also be a configurable parameter?

I didn't think of that at the beginning, but I now can see a use case for this. I will add it.

Do we consider the APILevel there in placement service as the min API level or should we have a separate value

The API level reported by placement is the minimum observed API level in the cluster

i.e. let's say that there is complete backwards compatibility between levels 10, 11 and 12 ..

The design doc #6838 states that versions are never fully backwards-compatible. The version does not track the daprd version however, so if no change is made that requires updating the version in a Dapr release, it won't be updated.

@yaron2
Copy link
Member

yaron2 commented Nov 2, 2023

Sets the "max-api-level" flag which prevents the Actor API level for going above this value. The default value of -1 means no cap - this description isn't clear to users. lets make this very clear so users know why they're given this option, what it affects and why they care about it - otherwise this Helm chart setting needs to be removed.

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>
@ItalyPaleAle
Copy link
Contributor Author

Updated per review feedback. Added "min" option and clarified message on Helm chart.

Co-authored-by: Cassie Coyle <cassie.i.coyle@gmail.com>
Signed-off-by: Alessandro (Ale) Segala <43508+ItalyPaleAle@users.noreply.github.com>
daixiang0
daixiang0 previously approved these changes Nov 6, 2023
Copy link
Member

@daixiang0 daixiang0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -141,6 +141,7 @@ The Helm chart has the follow configuration options that can be supplied:
| `dapr_placement.cluster.logStoreWinPath` | Mount path for persistent volume for log store in windows when HA is true | `C:\\raft-log` |
| `dapr_placement.volumeclaims.storageSize` | Attached volume size | `1Gi` |
| `dapr_placement.volumeclaims.storageClassName` | storage class name | |
| `dapr_placement.maxApiLevel` | Sets the `max-api-level` flag which prevents the Actor API level for going above this value. The default value of -1 means no cap. | `-1` |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| `dapr_placement.maxApiLevel` | Sets the `max-api-level` flag which prevents the Actor API level for going above this value. The default value of -1 means no cap. | `-1` |
| `dapr_placement.maxApiLevel` | Sets the `max-api-level` flag which prevents the Actor API level from going above this value. The default value of -1 means no cap. | `-1` |

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also add a bit of blurb as to why a user would want to change this? It is not currently clear to me why a user would want to use this flag, and question why it needs to be exposed by the helm chart at all. Is this not an internal value which is hard coded into the versioned build?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about I add the more detailed explanation in the docs? I'd rather not write a long paragraph here.

Most users will not need to set this, but it can be necessary in some rare situations.

By default this is automatically set to the minimum observed value in the cluster. For example, say you have 3 apps, with these actor API levels: 10, 20, 20. The minimum observed is 10, so that will be the level used in the cluster.

Next, assume that the app with level 10 goes away, so the minimum level now becomes 20.

The level can never be decreased, only increased. Which means that if now the app with level 10 comes back, it will not be able to connect to Placement (and use actors).

In this very specific case, users can set maxApiLevel to 10 to enforce the reported level is never higher than 10.

I can see this possibly being the case during upgrades or if users have the need to run older versions of a sidecar alongside new ones. It should not be a common case, however.

tests/integration/suite/placement/apilevel/no_max.go Outdated Show resolved Hide resolved
tests/integration/suite/placement/apilevel/no_max.go Outdated Show resolved Hide resolved
tests/integration/suite/placement/apilevel/shared.go Outdated Show resolved Hide resolved
tests/integration/suite/placement/apilevel/shared.go Outdated Show resolved Hide resolved
tests/integration/suite/placement/apilevel/with_min.go Outdated Show resolved Hide resolved
tests/integration/suite/placement/apilevel/with_max.go Outdated Show resolved Hide resolved
tests/integration/suite/placement/apilevel/no_max.go Outdated Show resolved Hide resolved
pkg/placement/membership.go Outdated Show resolved Hide resolved
Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>
ItalyPaleAle and others added 5 commits November 6, 2023 17:35
Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>
Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>
Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>
@yaron2 yaron2 merged commit d3ded6a into dapr:master Nov 13, 2023
18 of 22 checks passed
ItalyPaleAle added a commit to ItalyPaleAle/dapr that referenced this pull request Nov 15, 2023
Follow-up from dapr#7134
Part of dapr#6838

Actor runtime can use this to change its behavior. Will allow completing dapr#7129

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>
yaron2 pushed a commit that referenced this pull request Nov 17, 2023
Follow-up from #7134
Part of #6838

Actor runtime can use this to change its behavior. Will allow completing #7129

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>
@ItalyPaleAle ItalyPaleAle added this to the v1.13 milestone Jan 31, 2024
cicoyle added a commit to cicoyle/dapr that referenced this pull request May 24, 2024
…ion lower than minium (dapr#7134)

* WIP: Placement service disseminates min version

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* Added max-api-level flag

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* Placement disseminates min API level

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* Some work on tests

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* Updated tests and fixes

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* Completed tests and fixed some bugs

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* Fixed tests

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* Some test fixes

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* Fixed tests

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* Added option to Helm chart

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* Persist API level and make sure it's never decreased

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* Updated some tests

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* Fixed updating on node deletion

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* Updated tests

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* Lint

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* Updated per review feedback

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* Update tests/integration/suite/placement/apilevel/no_max.go

Co-authored-by: Cassie Coyle <cassie.i.coyle@gmail.com>
Signed-off-by: Alessandro (Ale) Segala <43508+ItalyPaleAle@users.noreply.github.com>

* Review feedback pt 1

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* Handle maxApiLevel using a pointer

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* Use util.HTTPClient

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* Typos

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

---------

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>
Signed-off-by: Alessandro (Ale) Segala <43508+ItalyPaleAle@users.noreply.github.com>
Co-authored-by: Dapr Bot <56698301+dapr-bot@users.noreply.github.com>
Co-authored-by: Cassie Coyle <cassie.i.coyle@gmail.com>
Co-authored-by: Yaron Schneider <schneider.yaron@live.com>
Signed-off-by: Cassandra Coyle <cassie@diagrid.io>
cicoyle pushed a commit to cicoyle/dapr that referenced this pull request May 24, 2024
Follow-up from dapr#7134
Part of dapr#6838

Actor runtime can use this to change its behavior. Will allow completing dapr#7129

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>
Signed-off-by: Cassandra Coyle <cassie@diagrid.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
autoupdate DaprBot will keep the Pull Request up to date with master branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Include runtime version in Dapr placement data to ease with upgrades
7 participants