CLI dies because of mac_addr mismatch #3469

Open
nielsreijers opened this issue Apr 6, 2024 · 5 comments
nielsreijers commented Apr 6, 2024

Describe the bug
Using the lxd driver, after trying to start one of my VMs, the cli dies when executing a number of commands, while some others still work.

The cause turned out to be a mismatch in mac_addr between /var/snap/multipass/common/data/multipassd/lxd/multipassd-vm-instances.json and the actual address in lxd.
/var/snap/multipass/common/data/multipassd/lxd/multipassd-vm-instances.json contains:

{
    "mptest": {
        "deleted": false,
        "disk_space": "53687091200",
        "extra_interfaces": [
        ],
        "mac_addr": "52:54:00:3b:04:a7",
        "mem_size": "8589934592",
        "metadata": {
        },
        "mounts": [
        ],
        "num_cores": 2,
        "ssh_username": "ubuntu",
        "state": 4
    },
    "openstack": {
        "deleted": false,
        "disk_space": "53687091200",
        "extra_interfaces": [
        ],
        "mac_addr": "52:54:00:22:57:9f",
        "mem_size": "17179869184",
        "metadata": {
        },
        "mounts": [
        ],
        "num_cores": 4,
        "ssh_username": "ubuntu",
        "state": 2
    }
}

lxc network list-leases --project multipass mpbr0 shows

+-----------+-------------------+---------------------------------------+---------+
| HOSTNAME  |    MAC ADDRESS    |              IP ADDRESS               |  TYPE   |
+-----------+-------------------+---------------------------------------+---------+
| mptest    | 52:54:00:3b:04:a7 | 10.199.64.235                         | DYNAMIC |
+-----------+-------------------+---------------------------------------+---------+
| mptest    | 52:54:00:3b:04:a7 | fd42:6718:1331:ed30:5054:ff:fe3b:4a7  | DYNAMIC |
+-----------+-------------------+---------------------------------------+---------+
| openstack | 52:54:00:b8:67:1a | 10.199.64.200                         | DYNAMIC |
+-----------+-------------------+---------------------------------------+---------+
| openstack | 52:54:00:b8:67:1a | fd42:6718:1331:ed30:5054:ff:feb8:671a | DYNAMIC |
+-----------+-------------------+---------------------------------------+---------+

Note how the mac address is identical for the mptest VM, but differs for the openstack VM.
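The comparison above can be automated. The following is a minimal sketch, not part of multipass: `find_mac_mismatches` is a hypothetical helper, and the lease rows are assumed to be available as dictionaries with `hostname` and `hwaddr` keys (the shape seen in the LXD reply in the logs). The data is copied from this report.

```python
import json

def find_mac_mismatches(instances, leases):
    """Return {vm_name: (recorded_mac, leased_macs)} for every VM that has
    leases, none of which carry the MAC recorded in the instances file."""
    leased = {}
    for lease in leases:
        leased.setdefault(lease["hostname"], set()).add(lease["hwaddr"].lower())

    mismatches = {}
    for name, spec in instances.items():
        recorded = spec["mac_addr"].lower()
        macs = leased.get(name)
        # VMs without any lease (e.g. stopped ones) are skipped, not flagged.
        if macs and recorded not in macs:
            mismatches[name] = (recorded, sorted(macs))
    return mismatches

# Trimmed copy of multipassd-vm-instances.json from this report.
instances = json.loads("""
{
    "mptest":    {"mac_addr": "52:54:00:3b:04:a7"},
    "openstack": {"mac_addr": "52:54:00:22:57:9f"}
}
""")

# Rows from the lease table above (IPv4 entries only, for brevity).
leases = [
    {"hostname": "mptest", "hwaddr": "52:54:00:3b:04:a7", "address": "10.199.64.235"},
    {"hostname": "openstack", "hwaddr": "52:54:00:b8:67:1a", "address": "10.199.64.200"},
]

print(find_mac_mismatches(instances, leases))
# {'openstack': ('52:54:00:22:57:9f', ['52:54:00:b8:67:1a'])}
```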

The VM does start correctly, as shown by sudo lxc list --project multipass, but because of the mac_addr mismatch, many commands stop working, including a simple multipass ls. I think any command that requires multipassd to ssh into the broken VM would fail.

For those commands the lxd driver tries to determine the VM's IP by querying the lxd leases and looking for the VM's mac address. For example, for the 'ls' command the call path is: list in daemon.cpp -> get_all_ipv4 -> ssh_hostname -> ip_address_for -> get_ip_for.
get_ip_for queries the lxd leases, and can't find the mac address because the VM's actual mac is different from the address it's looking for. ip_address_for then keeps retrying for two minutes and fails with a timeout exception.
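To illustrate why the lookup can never succeed, here is a rough Python sketch of the pattern described above. It is not the multipass C++ code: `ipv4_for_mac` and `wait_for_ip` are hypothetical stand-ins for what get_ip_for and ip_address_for appear to do, and the timeout is shortened from two minutes so the example finishes quickly. The leases are taken from the LXD reply in the logs.

```python
import ipaddress
import time

def ipv4_for_mac(leases, mac):
    """Return the first IPv4 lease address whose hwaddr equals `mac`, else None."""
    for lease in leases:
        same_mac = lease["hwaddr"].lower() == mac.lower()
        if same_mac and ipaddress.ip_address(lease["address"]).version == 4:
            return lease["address"]
    return None

def wait_for_ip(fetch_leases, mac, timeout_s, interval_s=0.01):
    """Poll the leases until one matches `mac`, or raise TimeoutError."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        ip = ipv4_for_mac(fetch_leases(), mac)
        if ip is not None:
            return ip
        time.sleep(interval_s)
    raise TimeoutError(f"no lease found for {mac} within {timeout_s}s")

# Leases as returned by LXD for mpbr0 in this report.
leases = [
    {"address": "fd42:6718:1331:ed30:5054:ff:feb8:671a", "hostname": "openstack",
     "hwaddr": "52:54:00:b8:67:1a"},
    {"address": "10.199.64.200", "hostname": "openstack",
     "hwaddr": "52:54:00:b8:67:1a"},
]

# The MAC that LXD actually assigned resolves immediately.
print(wait_for_ip(lambda: leases, "52:54:00:b8:67:1a", timeout_s=0.1))

# The MAC recorded in multipassd-vm-instances.json never matches, so every
# retry fails in the same way until the timeout expires.
try:
    wait_for_ip(lambda: leases, "52:54:00:22:57:9f", timeout_s=0.1)
except TimeoutError as err:
    print(f"timed out: {err}")
```

Retrying only helps when the lease has not appeared yet. With a wrong MAC the result is deterministic, so the full timeout is always spent.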

I think there are two problems here:

  • It shouldn't have got into this state in the first place.
  • If it does get into this state, multipass should handle the situation a bit better.

To Reproduce
Unfortunately, I haven't been able to reproduce it using multipass commands yet (still trying though). But manually editing lxd/multipassd-vm-instances.json should reproduce it.

Expected behavior
a) I would expect mac_addr in lxd/multipassd-vm-instances.json to match the mac address in lxd.
b) In case it doesn't, I would expect multipass to handle the error more gracefully. If the IP can be found in a different way, everything should just work. If not, the cli should notify the user that there's a problem with one of the VMs, but still allow the user to work normally with everything but the offending VM.
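One possible shape for (b), sketched in Python rather than in multipass's C++: try the recorded MAC first, fall back to the lease hostname, and attach a per-VM warning instead of failing the whole listing. `resolve_ipv4` and `list_instances` are hypothetical names, and matching by hostname is an assumption that may not be reliable in general (a guest can change its own hostname).

```python
import ipaddress

def is_ipv4(address):
    return ipaddress.ip_address(address).version == 4

def resolve_ipv4(leases, name, recorded_mac):
    """Return (ip, warning). Prefer a lease with the recorded MAC, then fall
    back to a lease with a matching hostname and report the mismatch."""
    v4 = [lease for lease in leases if is_ipv4(lease["address"])]
    for lease in v4:
        if lease["hwaddr"].lower() == recorded_mac.lower():
            return lease["address"], None
    for lease in v4:
        if lease["hostname"] == name:
            return lease["address"], (
                f"recorded MAC {recorded_mac} differs from leased MAC {lease['hwaddr']}"
            )
    return None, "no lease found"

def list_instances(instances, leases):
    """Build one row per VM; a problem with one VM does not affect the others."""
    rows = []
    for name, spec in instances.items():
        ip, warning = resolve_ipv4(leases, name, spec["mac_addr"])
        rows.append({"name": name, "ipv4": ip, "warning": warning})
    return rows

# Data from this report.
instances = {
    "mptest": {"mac_addr": "52:54:00:3b:04:a7"},
    "openstack": {"mac_addr": "52:54:00:22:57:9f"},
}
leases = [
    {"hostname": "mptest", "hwaddr": "52:54:00:3b:04:a7", "address": "10.199.64.235"},
    {"hostname": "mptest", "hwaddr": "52:54:00:3b:04:a7",
     "address": "fd42:6718:1331:ed30:5054:ff:fe3b:4a7"},
    {"hostname": "openstack", "hwaddr": "52:54:00:b8:67:1a", "address": "10.199.64.200"},
    {"hostname": "openstack", "hwaddr": "52:54:00:b8:67:1a",
     "address": "fd42:6718:1331:ed30:5054:ff:feb8:671a"},
]

for row in list_instances(instances, leases):
    print(row)
```

With the data above, mptest resolves by MAC without a warning, while openstack resolves by hostname and carries a warning describing the mismatch.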

Logs
This is the most relevant part from `journalctl --unit snap.multipass.multipassd` when it's started with `--verbosity debug`. There's a lot more output of course, but this shows multipassd querying the lxd leases and getting a different mac than the one it's looking for. This process then repeats several times until it times out.

    GET unix://multipass/var/snap/lxd/common/lxd/unix.socket@1.0/networks/mpbr0/leases?project=multipass
Apr 02 14:17:37 wanheda multipassd[1241989]: Got reply: {
                                                 "error": "",
                                                 "error_code": 0,
                                                 "metadata": [
                                                     {
                                                         "address": "fd42:6718:1331:ed30:5054:ff:feb8:671a",
                                                         "hostname": "openstack",
                                                         "hwaddr": "52:54:00:b8:67:1a",
                                                         "location": "none",
                                                         "type": "dynamic"
                                                     },
                                                     {
                                                         "address": "10.199.64.200",
                                                         "hostname": "openstack",
                                                         "hwaddr": "52:54:00:b8:67:1a",
                                                         "location": "none",
                                                         "type": "dynamic"
                                                     }
                                                 ],
                                                 "operation": "",
                                                 "status": "Success",
                                                 "status_code": 200,
                                                 "type": "sync"
                                             }
Apr 02 14:17:38 wanheda multipassd[1241989]: Could not determine IP address within 1000ms

Additional info

  • OS: Fedora 39
  • multipass version
    multipass 1.13.1
    multipassd 1.13.1
  • multipass info
    Name: mptest
    State: Running
    IPv4: 10.199.64.235
    Release: Ubuntu 22.04.4 LTS
    Image hash: da10d667adf5 (Ubuntu 22.04 LTS)
    CPU(s): 2
    Load: 0.00 0.00 0.00
    Disk usage: 1.6GiB out of 48.4GiB
    Memory usage: 216.0MiB out of 7.7GiB
    Mounts: --

    Name: openstack
    State: Stopped
    IPv4: --
    Release: --
    Image hash: da10d667adf5 (Ubuntu 22.04 LTS)
    CPU(s): --
    Load: --
    Disk usage: --
    Memory usage: --
    Mounts: --

  • multipass get local.driver
    lxd

Additional context
There may be some other issues at play here. I'm quite new to multipass. Potentially a really great tool, but it's been quite unstable on my machine unfortunately.

I'm pretty sure this is an issue and at times it worked exactly as I expected given my current understanding. As long as my openstack VM was stopped, everything worked fine. After a multipass start openstack, the cli broke for many commands, but `lxc` showed the VM did start. And stopping the VM with `lxc` brought the multipass cli back to life.

However, as I'm typing this, there seems to be at least one other issue since stopping the offending VM in lxc didn't work and my multipass ls has been unresponsive for several minutes now, much longer than the 2 minute timeout exception I got previously. So at the moment things seem to be going wrong at a different level.

nielsreijers added the bug and needs triage labels Apr 6, 2024
@nielsreijers
Contributor Author

Some more information: after fixing the mac_addr in lxd/multipassd-vm-instances.json, starting the VM initially times out, but it does start in the end and the cli mostly works again (multipass ls no longer hangs).

The VM still doesn't work as it should though:

➜  multipass multipass exec openstack -- ls
exec failed: ssh failed to authenticate: 'Access denied for 'publickey'. Authentication that can continue: publickey'

But it works using lxc:

➜  multipass sudo lxc exec openstack --project multipass -- bash
root@openstack:~#

So it looks like there's some other state that's out of sync or not properly initialised (either in the VM or in multipass).

@luis4a0
Contributor

luis4a0 commented Apr 15, 2024

Hi @nielsreijers, thanks for taking the time to investigate this and report. Let me ask you if you did something from the LXD command-line to make those MACs change. Thanks!

luis4a0 removed the needs triage label Apr 15, 2024
@nielsreijers
Contributor Author

Not that I'm aware of. I'm pretty new to both LXD and multipass, so there's really not a lot I could do.

The context is that I was trying to spin up a VM to try Sunbeam and got into all sorts of issues with juju getting stuck, no networking because of the iptables issue, and multipass cli hanging.

I'm new to all of these (really enjoying the process of learning them) so I was messing around quite a bit, and I probably killed a few processes rather ungracefully when things got stuck. I really don't know much about LXD and learned just enough to be able to check multipass was creating the VMs and that they were indeed running. But I have no idea what I could have done from the command line to change the MAC address.

I did create and delete several VMs with the same name, and hard killed multipassd a number of times when things weren't responding, so I suspect that's where things got out of sync somehow. But unfortunately I haven't been able to recreate it.

I'm travelling at the moment, but I'll have another go at it when I'm back home in a week and a half or so.

@townsend2010
Collaborator

Hey @luis4a0!

Could you please follow up on this one? Thanks!

@nielsreijers
Contributor Author

Hi @luis4a0 and @townsend2010

I'm finally back from a long trip, and will have some free time to spend on fun/useful projects over the next weeks. I'll have another go at reproducing the issue for sure, but I'd also like to get involved in fixing this or other issues in multipass if that's possible.
But since I'm new to multipass and it's been a while since I used C++, it would be nice if there was a place to ask some newbie questions and discuss if/how to fix things?

That's provided I can figure out what's going wrong of course. But besides the question of how it got into this inconsistent state, it would also be nice if it handled the situation a bit more gracefully, don't you think?
