Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] master_type failover connection lost to master after each master_alive_interval and sends a new /start event at every interval #66497

Open
amalaguti opened this issue May 10, 2024 · 0 comments
Labels
Bug broken, incorrect, or confusing behavior needs-triage

Comments

@amalaguti
Copy link

Description
3006.8 Windows minion configured in multimaster failover, when minion starts, after each master_alive_interval it shows "Connection to master (second) lost", even when this master is still available, and tries to connect to the following master in the list, and this loop continues after each master_alive_interval, switching connection from master to master in the masters list

It also generates multiple /start events (on each master) (the minion service is not restarted, so there should not been /start events after the initial connection)

IMPORTANT: Don't see the same behavior in Windows minion 3006.7

salt/minion/minion-win-1/start	{
    "_stamp": "2024-05-10T00:33:48.353903",
    "cmd": "_minion_event",
    "data": "Minion minion-win-1 started at Fri May 10 00:33:47 2024",
    "id": "minion-win-1",
    "pretag": null,
    "tag": "salt/minion/minion-win-1/start"
}
salt/minion/minion-win-1/start	{
    "_stamp": "2024-05-10T00:34:17.309567",
    "cmd": "_minion_event",
    "data": "Minion minion-win-1 started at Fri May 10 00:34:16 2024",
    "id": "minion-win-1",
    "pretag": null,
    "tag": "salt/minion/minion-win-1/start"
}
salt/minion/minion-win-1/start	{
    "_stamp": "2024-05-10T00:34:46.306666",
    "cmd": "_minion_event",
    "data": "Minion minion-win-1 started at Fri May 10 00:34:45 2024",
    "id": "minion-win-1",
    "pretag": null,
    "tag": "salt/minion/minion-win-1/start"
}
master: 
  - 172.21.0.10
  - 172.21.0.11
auth_tries: 3
auth_timeout: 5
master_tries: 1 # same with -1
enable_legacy_startup_events: False
master_type: failover
master_alive_interval: 20
log_level_logfile: info

Minion log (log messages have been changed from log.debug to log.info in salt/minion.py module to avoid using debug logging)

2024-05-10 00:57:17,730 [salt.cli.daemons                         :280 ][INFO    ][5072] Setting up the Salt Minion "minion-win-1"
2024-05-10 00:57:17,967 [salt.cli.daemons                         :82  ][INFO    ][5072] Starting up the Salt Minion
2024-05-10 00:57:17,967 [salt.utils.event                         :1126][INFO    ][5072] Starting pull socket on 4511
2024-05-10 00:57:18,592 [salt.minion                                                              :1305][INFO    ][5072] Creating minion process manager
2024-05-10 00:57:18,639 [salt.minion                                                              :554 ][INFO    ][5072] Got list of available master addresses: ['172.21.0.10', '172.21.0.11']
2024-05-10 00:57:18,639 [salt.minion                                                              :641 ][CRITICAL][5072] 'master_type' set to 'failover' but 'retry_dns' is not 0. Setting 'retry_dns' to 0 to failover to the next master on DNS errors.
2024-05-10 00:57:18,639 [salt.minion                                                              :232 ][INFO    ][5072] Master URI: tcp://172.21.0.10:4506
2024-05-10 00:57:18,655 [salt.minion                                                              :232 ][INFO    ][5072] Master URI: tcp://172.21.0.11:4506
2024-05-10 00:57:18,655 [salt.minion                                                              :712 ][INFO    ][5072] Connecting to master. Attempt 1 of 1
2024-05-10 00:57:18,655 [salt.minion                                                              :232 ][INFO    ][5072] Master URI: tcp://172.21.0.10:4506
2024-05-10 00:57:18,842 [salt.minion                                                              :1402][INFO    ][5072] Connecting minion's long-running req channel
2024-05-10 00:57:19,452 [salt.loader.['172.21.0.10', '172.21.0.11'].int.module.cmdmod             :438 ][INFO    ][5072] Executing command tzutil in directory 'C:\Windows\system32\config\systemprofile'
2024-05-10 00:57:19,686 [salt.utils.schedule                                                      :415 ][INFO    ][5072] Added new job __mine_interval to scheduler
2024-05-10 00:57:19,686 [salt.minion                                                              :1481][INFO    ][5072] Added mine.update to scheduler
2024-05-10 00:57:19,686 [salt.utils.schedule                                                      :415 ][INFO    ][5072] Added new job __master_alive_172.21.0.10 to scheduler
2024-05-10 00:57:19,702 [salt.minion                                                              :2694][INFO    ][5072] Minion is starting as user 'SYSTEM'
2024-05-10 00:57:19,702 [salt.minion                                                              :3119][INFO    ][5072] Minion 'minion-win-1' trying to tune in
2024-05-10 00:57:19,702 [salt.minion                                                              :3129][INFO    ][5072] Minion is ready to receive requests!
2024-05-10 00:57:19,702 [salt.minion                                                              :2744][INFO    ][5072] Minion of '172.21.0.10' is handling event tag '/salt/minion/minion_schedule_delete_complete'
2024-05-10 00:57:19,702 [salt.minion                                                              :2744][INFO    ][5072] Minion of '172.21.0.10' is handling event tag '/salt/minion/minion_schedule_add_complete'
2024-05-10 00:57:19,702 [salt.minion                                                              :2744][INFO    ][5072] Minion of '172.21.0.10' is handling event tag '/salt/minion/minion_schedule_add_complete'
2024-05-10 00:57:19,702 [salt.minion                                                              :2744][INFO    ][5072] Minion of '172.21.0.10' is handling event tag '/salt/minion/minion_schedule_delete_complete'
2024-05-10 00:57:19,702 [salt.minion                                                              :2744][INFO    ][5072] Minion of '172.21.0.10' is handling event tag '__master_req_channel_payload/172.21.0.10'
2024-05-10 00:57:19,702 [salt.minion                                                              :893 ][INFO    ][5072] Minion return retry timer set to 9 seconds (randomized)
2024-05-10 00:57:20,703 [salt.utils.schedule                                                      :1792][INFO    ][5072] Running scheduled job: __mine_interval with jid 20240510005720703948
2024-05-10 00:57:40,714 [salt.utils.schedule                                                            :1792][INFO    ][5072] Running scheduled job: __master_alive_172.21.0.10 with jid 20240510005740714178
2024-05-10 00:57:42,266 [salt.minion                                                                    :2744][INFO    ][5072] Minion of '172.21.0.10' is handling event tag '__master_disconnected'
2024-05-10 00:57:42,266 [salt.minion                                                                    :2828][INFO    ][5072] Connection to master 172.21.0.10 lost
2024-05-10 00:57:42,266 [salt.minion                                                                    :2832][INFO    ][5072] Trying to tune in to next master from master-list
2024-05-10 00:57:42,282 [salt.minion                                                                    :616 ][INFO    ][5072] Moving possibly failed master 172.21.0.10 to the end of the list of masters
2024-05-10 00:57:42,282 [salt.minion                                                                    :187 ][WARNING ][5072] Master ip address changed from 172.21.0.10 to 172.21.0.11
2024-05-10 00:57:42,282 [salt.minion                                                                    :232 ][INFO    ][5072] Master URI: tcp://172.21.0.11:4506
2024-05-10 00:57:42,282 [salt.minion                                                                    :232 ][INFO    ][5072] Master URI: tcp://172.21.0.10:4506
2024-05-10 00:57:42,282 [salt.minion                                                                    :712 ][INFO    ][5072] Connecting to master. Attempt 1 of 1
2024-05-10 00:57:42,282 [salt.minion                                                                    :187 ][WARNING ][5072] Master ip address changed from 172.21.0.10 to 172.21.0.11
2024-05-10 00:57:42,282 [salt.minion                                                                    :232 ][INFO    ][5072] Master URI: tcp://172.21.0.11:4506
2024-05-10 00:57:42,313 [salt.minion                                                                    :2744][INFO    ][5072] Minion of '172.21.0.11' is handling event tag '/salt/minion/minion_schedule_delete_complete'
2024-05-10 00:57:42,391 [salt.minion                                                                    :2859][INFO    ][5072] Re-initialising subsystems for new master 172.21.0.11
2024-05-10 00:57:42,985 [salt.minion                                                                    :2880][INFO    ][5072] Minion is ready to receive requests!
2024-05-10 00:57:43,000 [salt.minion                                                                    :2744][INFO    ][5072] Minion of '172.21.0.11' is handling event tag '__master_connected'
2024-05-10 00:57:43,000 [salt.minion                                                                    :2744][INFO    ][5072] Minion of '172.21.0.11' is handling event tag '__master_req_channel_payload/172.21.0.11'
2024-05-10 00:57:43,000 [salt.minion                                                                    :893 ][INFO    ][5072] Minion return retry timer set to 8 seconds (randomized)
2024-05-10 00:58:03,717 [salt.utils.schedule                                                            :1792][INFO    ][5072] Running scheduled job: __master_alive_172.21.0.11 with jid 20240510005803717225
2024-05-10 00:58:05,248 [salt.minion                                                                    :2744][INFO    ][5072] Minion of '172.21.0.11' is handling event tag '__master_disconnected'
2024-05-10 00:58:05,248 [salt.minion                                                                    :2828][INFO    ][5072] Connection to master 172.21.0.11 lost
2024-05-10 00:58:05,264 [salt.minion                                                                    :2832][INFO    ][5072] Trying to tune in to next master from master-list
2024-05-10 00:58:05,264 [salt.minion                                                                    :616 ][INFO    ][5072] Moving possibly failed master 172.21.0.11 to the end of the list of masters
2024-05-10 00:58:05,264 [salt.minion                                                                    :187 ][WARNING ][5072] Master ip address changed from 172.21.0.11 to 172.21.0.10

Setup
3006.8 master and Windows minion

Steps to Reproduce the behavior
Configure minion as shown in multimaster failover, start minion and check minion log and event bus after each master_alive_interval

Expected behavior
The minion should stay connected to the second master when the master_alive_interval runs, no additional /start event should be seen after the initial minion start

Versions Report
3006.8

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug broken, incorrect, or confusing behavior needs-triage
Projects
None yet
Development

No branches or pull requests

1 participant