Readiness Checklist
I checked to make sure that this issue has not already been filed
I am reporting the issue to the correct repository (for multi-repository projects)
Expected Behavior
If a move plan cannot be calculated, we should detect that and bubble up the appropriate error rather than assume a plan was calculated. An example is outlined below.
Current Behavior
I updated to the latest cloud-sdk-go and ecctl versions locally (both v1.8.0) and got exactly the same stack trace as with older versions, so I don't think this is a new problem; a backport may be worth considering when we fix it. Here's the crux of the problem:
As a matter of good defensive programming, we should check that we have a calculated plan for everything rather than assume we'll get one. My recommendation is to check each stack component and cluster in its corresponding slice for a nil calculated plan and bubble the error up, here: https://github.com/elastic/cloud-sdk-go/blob/master/pkg/api/platformapi/allocatorapi/vacate.go#L633-L718.
At the end of the day, it's a chicken-and-egg problem: if Elasticsearch is in a really bad state, a component like Enterprise Search (entsearch) can't be checked, so we can't calculate a plan. Here's an example:
{
"cluster_type": "elasticsearch",
"details": "Could not make sure [ElasticsearchCluster(<redacted>)] is up and running",
"caused_by": "no.found.constructor.plan.apm.ClusterNotReachable: Unexpected response [401 Unauthorized, {\"error\":{\"root_cause\":[{\"type\":\"security_exception\",\"reason\":\"unable to authenticate user [cloud-internal-enterprise_search-server] for REST request [/]\",\"header\":{\"WWW-Authenticate\":[\"Basic realm=\\\"security\\\" charset=\\\"UTF-8\\\"\",\"Bearer realm=\\\"security\\\"\",\"ApiKey\"]}}],\"type\":\"security_exception\",\"reason\":\"unable to authenticate user [cloud-internal-enterprise_search-server] for REST request [/]\",\"header\":{\"WWW-Authenticate\":[\"Basic realm=\\\"security\\\" charset=\\\"UTF-8\\\"\",\"Bearer realm=\\\"security\\\"\",\"ApiKey\"]}},\"status\":401}]"
}
Possible Solution
Noted above, but to make parsing easier:
Check each stack component and cluster in its corresponding slice for a nil calculated plan and bubble the error up, instead of assuming a plan was calculated: https://github.com/elastic/cloud-sdk-go/blob/master/pkg/api/platformapi/allocatorapi/vacate.go#L633-L718.
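A minimal sketch of the proposed check, assuming simplified stand-in types. `plan`, `moveDetails`, `moveClustersDetails`, and `validateCalculatedPlans` are hypothetical names that only loosely mirror the one-slice-per-stack-component layout of the move details model; the real models use a distinct plan type per component, so this is not a drop-in patch for vacate.go.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// plan is a HYPOTHETICAL stand-in for a stack component's calculated plan.
type plan struct{}

// moveDetails is a HYPOTHETICAL stand-in for one cluster's move entry.
// CalculatedPlan is nil when the API could not compute a plan.
type moveDetails struct {
	ClusterID      string
	CalculatedPlan *plan
}

// moveClustersDetails is a HYPOTHETICAL stand-in that groups move entries
// into one slice per stack component.
type moveClustersDetails struct {
	Elasticsearch    []*moveDetails
	Kibana           []*moveDetails
	EnterpriseSearch []*moveDetails
}

// validateCalculatedPlans returns an error naming every cluster without a
// calculated plan, so the caller can bubble it up instead of dereferencing nil.
func validateCalculatedPlans(d *moveClustersDetails) error {
	if d == nil {
		return errors.New("cannot vacate: no move details were returned")
	}
	groups := []struct {
		kind    string
		entries []*moveDetails
	}{
		{"elasticsearch", d.Elasticsearch},
		{"kibana", d.Kibana},
		{"enterprise_search", d.EnterpriseSearch},
	}
	var problems []string
	for _, g := range groups {
		for _, m := range g.entries {
			switch {
			case m == nil:
				problems = append(problems, g.kind+" move entry is nil")
			case m.CalculatedPlan == nil:
				problems = append(problems, fmt.Sprintf("%s cluster %s: calculated plan is nil", g.kind, m.ClusterID))
			}
		}
	}
	if len(problems) == 0 {
		return nil
	}
	return fmt.Errorf("cannot vacate: %s", strings.Join(problems, "; "))
}

func main() {
	d := &moveClustersDetails{
		Elasticsearch:    []*moveDetails{{ClusterID: "es-1", CalculatedPlan: &plan{}}},
		EnterpriseSearch: []*moveDetails{{ClusterID: "ent-1"}}, // no plan computed
	}
	if err := validateCalculatedPlans(d); err != nil {
		fmt.Println(err)
	}
}
```

Running it prints `cannot vacate: enterprise_search cluster ent-1: calculated plan is nil`. Collecting every missing plan before returning, rather than stopping at the first one, lets the caller see all affected clusters at once. The messages are joined with `strings.Join` so the sketch also builds on Go 1.16; on Go 1.20+ `errors.Join` would be an alternative.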
I also don't mind working on this in the near future, but if someone wants to tackle it within the next few weeks, feel free.
Steps to Reproduce
Get Elasticsearch into a really bad state with another stack component running alongside it, then attempt to vacate that stack component.
Context
I think enough was provided above :).
Your Environment
Version used: v1.8.0 for all the things (cloud-sdk-go and ecctl)
Environment name and version (e.g. Go 1.9): Go 1.16
Additional findings from debugging locally:
The problem is indeed that the plan computation returns nil, and we don't check for nil values here: https://github.com/elastic/cloud-sdk-go/blob/master/pkg/api/platformapi/allocatorapi/vacate.go#L708, more specifically on this field:
https://github.com/elastic/cloud-sdk-go/blob/master/pkg/models/move_clusters_details.go#L54 -> https://github.com/elastic/cloud-sdk-go/blob/master/pkg/models/move_enterprise_search_details.go#L41