[Fleet]: On agent upgrade failure for first time, review error badge is not displayed #183243
Comments
Pinging @elastic/fleet (Team:Fleet)
@amolnater-qasource Kindly review
Secondary review for this ticket is Done.
@jillguyonnet - Could you weigh in on this? AFAIU, the "review errors" badge should appear when the polling request detects an error in this case, right?
@kpollich That's correct, with a caveat that the polling request only queries the last 35 seconds (this comment details the logic). It would be good to clarify a few details in order to understand this scenario.
Action status after failed upgrade for horde agent:
{"actionId":"1345158b-e460-462c-b480-48f691147bce","nbAgentsActionCreated":1,"nbAgentsAck":0,"version":"8.11.22","type":"UPGRADE","nbAgentsActioned":1,"status":"FAILED","expiration":"2024-06-13T16:39:46.846Z","creationTime":"2024-05-14T16:39:46.846Z","nbAgentsFailed":1,"hasRolloutPeriod":false,"completionTime":"0001-01-01T00:00:00.000Z","latestErrors":[{"agentId":"3d485f27-db35-41fb-af80-f4b122a254cc","error":"HTTP Fail","timestamp":"0001-01-01T00:00:00Z","hostname":"eh-Snakerowan-5Nbx"}]}
Action status after failed upgrade for 2 agents on Multipass:
{"actionId":"4bc043d8-026a-4e86-8907-8b4beb9f329a","nbAgentsActionCreated":2,"nbAgentsAck":2,"version":"8.12.9","startTime":"2024-05-14T16:24:16.988Z","type":"UPGRADE","nbAgentsActioned":2,"status":"COMPLETE","expiration":"2024-06-13T16:24:16.988Z","creationTime":"2024-05-14T16:24:30.324Z","nbAgentsFailed":0,"hasRolloutPeriod":false,"completionTime":"2024-05-14T16:39:24.952Z","latestErrors":[]}
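To make the 35-second caveat concrete, here is a minimal TypeScript sketch of a time-window check over action statuses shaped like the two payloads above. It is not the Kibana implementation: the function name `hasRecentUpgradeError`, the trimmed-down interfaces, and in particular the assumption that the window is applied to each entry's own `timestamp` in `latestErrors` are all hypothetical and only serve to illustrate how a narrow polling window can miss an error.

```typescript
// Hypothetical, trimmed-down shapes: only fields that appear in the payloads above.
interface ActionError {
  agentId: string;
  error: string;
  timestamp: string;
}

interface ActionStatus {
  actionId: string;
  type: string;
  status: string;
  nbAgentsFailed: number;
  latestErrors: ActionError[];
}

// The 35-second polling window mentioned in the comment above.
const POLL_WINDOW_MS = 35_000;

// Assumption: the window is applied to each error's own timestamp.
// Returns true when at least one UPGRADE action has an error inside the window.
function hasRecentUpgradeError(
  actions: ActionStatus[],
  now: Date,
  windowMs: number = POLL_WINDOW_MS
): boolean {
  const nowMs = now.getTime();
  const cutoff = nowMs - windowMs;
  return actions.some(
    (action) =>
      action.type === "UPGRADE" &&
      action.latestErrors.some((err) => {
        const t = Date.parse(err.timestamp);
        // Unparseable timestamps are treated as outside the window.
        return !Number.isNaN(t) && t >= cutoff && t <= nowMs;
      })
  );
}
```

Under this sketch's assumption, an error stamped `0001-01-01T00:00:00Z` would fall outside any 35-second window, whereas an error stamped a few seconds before the poll would be picked up. Whether the real polling request filters on that field is not stated in this thread.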
The horde implementation has diverged from the agent somehow, but it's not clear from reading this alone what the difference might be. What version did you use when you tested this? Depending on the exact format, it might hit different parts of the agent code. For example, if the version looked valid but didn't exist, I'd have expected the agent to attempt to download it and report recurring failures doing so.
For the real agent that is what I expected to see. It will retry the download until the download timeout expires, which by default is two hours. After that it should report the upgrade as failed.
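The retry-until-timeout behaviour described above can be sketched as follows. This is an illustration in TypeScript, not elastic-agent code: `downloadWithRetry`, the `Clock` interface, and the fixed 60-second retry interval are hypothetical, and only the two-hour default comes from the comment above. The clock is injected so the loop can be simulated without actually waiting.

```typescript
// Two hours, the default download timeout mentioned in the comment above.
const DEFAULT_DOWNLOAD_TIMEOUT_MS = 2 * 60 * 60 * 1000;

// Hypothetical clock abstraction so the loop can run against simulated time.
interface Clock {
  now(): number;
  sleep(ms: number): void;
}

type UpgradeOutcome =
  | { state: "downloaded"; attempts: number }
  | { state: "failed"; attempts: number; lastError: string };

// tryDownload returns null on success or an error message on failure.
// Assumption: a fixed retry interval; the real agent's retry policy is not given here.
function downloadWithRetry(
  tryDownload: () => string | null,
  clock: Clock,
  timeoutMs: number = DEFAULT_DOWNLOAD_TIMEOUT_MS,
  retryIntervalMs: number = 60_000
): UpgradeOutcome {
  const deadline = clock.now() + timeoutMs;
  let attempts = 0;
  while (true) {
    attempts += 1;
    const err = tryDownload();
    if (err === null) {
      return { state: "downloaded", attempts };
    }
    // Give up once the next retry would start after the deadline.
    if (clock.now() + retryIntervalMs > deadline) {
      return { state: "failed", attempts, lastError: err };
    }
    clock.sleep(retryIntervalMs);
  }
}
```

With a simulated clock, a download that always fails is only reported as failed once the whole timeout has elapsed, which is why a shorter, configurable timeout would make this scenario easier to test.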
@cmacknz Can we configure the download timeout? It would make testing this a lot easier.
I think the agent didn't respect it when it was sent from the Fleet override API, but it's been a while since I tested this: elastic/elastic-agent#4580
Kibana Build details:
Preconditions:
Steps to reproduce:
Expected Result:
On agent upgrade failure for first time, review error badge should display.
Screen Shot: