[BUG] VPC network: host fails to start when a VLAN tag is specified for the service port #19771

Open
zhfish opened this issue Mar 22, 2024 · 6 comments

zhfish commented Mar 22, 2024

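What happened:
The host service fails to start, so the host's network is of course shown as offline.
The layer-2 network page in the web console does show the host's IP.
The NICs are aggregated into a bond, and vnet is a VLAN interface on bond0.
OVS has already created br0, and vnet was correctly added to it.
One question: when using a VLAN, should OVS bind bond0 with a VLAN specified, or bind the VLAN interface directly?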

[root@gpu-4 ~]# nmcli dev status
vnet        vlan      connected     vlan-vnet
bond0       bond      connected     bond0
enp44s0f0   ethernet  connected     bond-slave-enp44s0f0
enp45s0f0   ethernet  connected     bond-slave-enp45s0f0
enp193s0f0  ethernet  disconnected  --
enp193s0f1  ethernet  unavailable   --
enp193s0f2  ethernet  unavailable   --
enp193s0f3  ethernet  unavailable   --
enp44s0f1   ethernet  unavailable   --
enp45s0f1   ethernet  unavailable   --
lo          loopback  unmanaged     --
[warning 2024-03-22 16:33:35 hostinfo.(*SHostInfo).isVirtualFunction(hostinfo.go:1650)] failed get nic enp45s0f1 phys_port_name: read /sys/class/net/enp45s0f1/phys_port_name: operation not supported
[info 2024-03-22 16:33:35 hostinfo.(*SHostInfo).doSendPhysicalNicInfo(hostinfo.go:1730)] upload physical nic: enp45s0f1(0c:42:a1:ec:b3:ab)
[info 2024-03-22 16:33:35 hostinfo.(*SHostInfo).doUploadNicInfoInternal(hostinfo.go:1747)] Upload NIC br: if:enp45s0f1
[info 2024-03-22 16:33:35 hostinfo.(*SHostInfo).doUploadNicInfoInternal(hostinfo.go:1747)] Upload NIC br:br0 if:vnet
[error 2024-03-22 16:33:35 hostinfo.(*SHostInfo).onFail(hostinfo.go:1105)] register failed: initHostNetworks: uploadNetworkInfo: doSyncNicInfo vnet: doUploadNicInfoInternal: modules.Hosts.PerformAction add-netif: {"error":{"class":"BadRequestError","code":400,"details":"addNetif: {\"error\":{\"class\":\"BadRequestError\",\"code\":400,\"data\":{\"fields\":[{}],\"id\":\"%!v(MISSING)\"},\"details\":\"hh.Attach2Network: net.GetFreeIP: getFreeIP: {\\\"error\\\":{\\\"class\\\":\\\"InsufficientResourceError\\\",\\\"code\\\":400,\\\"data\\\":{\\\"id\\\":\\\"Out of IP address\\\"},\\\"details\\\":\\\"Out of IP address\\\"}}\"}}","request":{"body":"{\"host\":{\"bridge\":\"br0\",\"interface\":\"vnet\",\"ip_addr\":\"10.106.75.4\",\"link_up\":true,\"mac\":\"08:c0:eb:3b...id\":875,\"wire\":\"bcast0\"}}","headers":{"Content-Length":"173","Content-Type":"application/json","User-Agent":"yunioncloud-go/201708","X-Auth-Token":"*","X-Yunion-Parent-Id":"","X-Yunion-Peer-Service-Name":"host","X-Yunion-Remote-Addr":"default-region:30888","X-Yunion-Span-Id":"0","X-Yunion-Span-Name":"","X-Yunion-Strace-Debug":"true","X-Yunion-Strace-Id":"2b2367c7"},"method":"POST","url":"https://default-region:30888/hosts/0d632ec4-2347-4fd6-8f6b-ea22314d131c/add-netif"}}}
panic: exit immediately for retry...

goroutine 1 [running]:
yunion.io/x/onecloud/pkg/hostman/hostinfo.(*SHostInfo).onFail(0xc0002e8580, {0x36244a0?, 0xc001f64138?})
        /root/go/src/yunion.io/x/onecloud/pkg/hostman/hostinfo/hostinfo.go:1108 +0x44a
yunion.io/x/onecloud/pkg/hostman/hostinfo.(*SHostInfo).register(0xc0002e8580)
        /root/go/src/yunion.io/x/onecloud/pkg/hostman/hostinfo/hostinfo.go:1082 +0xfb
yunion.io/x/onecloud/pkg/hostman/hostinfo.(*SHostInfo).StartRegister(0xc001043550?, 0xc000c32300?)
        /root/go/src/yunion.io/x/onecloud/pkg/hostman/hostinfo/hostinfo.go:1049 +0x32
yunion.io/x/onecloud/pkg/hostman.(*SHostService).RunService(0xc00022e398?)
        /root/go/src/yunion.io/x/onecloud/pkg/hostman/host_services.go:107 +0x2df
yunion.io/x/onecloud/pkg/cloudcommon/service.(*SServiceBase).StartService(0xc0002c0150)
        /root/go/src/yunion.io/x/onecloud/pkg/cloudcommon/service/services.go:58 +0xe4
yunion.io/x/onecloud/pkg/hostman.StartService(...)
        /root/go/src/yunion.io/x/onecloud/pkg/hostman/host_services.go:163
main.main()
        /root/go/src/yunion.io/x/onecloud/cmd/host/main.go:30 +0x10a
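
The root cause in the error line above is buried under several layers of JSON that have been re-encoded into each outer `details` string. A minimal Python sketch for unwrapping such a payload follows. It assumes each `details` value is a free-text prefix followed by an embedded `{"error": ...}` object, as in the log. The payload is synthetic, built to mirror the log's shape, because the real line is truncated. The helper name `innermost_error` is hypothetical.

```python
import json


def innermost_error(error_obj):
    """Follow {"error": {...}} objects embedded in "details" strings.

    Returns the deepest error dict that can still be decoded.
    """
    decoder = json.JSONDecoder()
    current = error_obj["error"]
    while True:
        details = current.get("details", "")
        start = details.find('{"error"')
        if start < 0:
            return current  # plain-text details: nothing further to unwrap
        try:
            nested, _ = decoder.raw_decode(details, start)
        except json.JSONDecodeError:
            return current  # truncated or malformed payload: stop here
        current = nested["error"]


# Synthetic payload shaped like the log line (assumption: not the real request).
level3 = {"error": {"class": "InsufficientResourceError", "code": 400,
                    "details": "Out of IP address"}}
level2 = {"error": {"class": "BadRequestError", "code": 400,
                    "details": "hh.Attach2Network: net.GetFreeIP: getFreeIP: "
                               + json.dumps(level3)}}
level1 = {"error": {"class": "BadRequestError", "code": 400,
                    "details": "addNetif: " + json.dumps(level2)}}

root = innermost_error(level1)
print(root["class"], "-", root["details"])
```

Read this way, the two outer `BadRequestError` layers are only wrappers; the innermost error reported by the region service is `InsufficientResourceError: Out of IP address`.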

Environment:

  • OS (e.g. cat /etc/os-release):
NAME="OpenCloudOS"
VERSION="8.8"
ID="opencloudos"
ID_LIKE="rhel fedora"
VERSION_ID="8.8"
PLATFORM_ID="platform:oc8"
PRETTY_NAME="OpenCloudOS 8.8"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:opencloudos:opencloudos:8"
HOME_URL="https://www.opencloudos.org/"
BUG_REPORT_URL="https://bugs.opencloudos.tech/"
  • Kernel (e.g. uname -a):
Linux gpu-4.cloud 5.4.119-20.0009.29 #1 SMP Mon Aug 14 20:03:28 CST 2023 x86_64 x86_64 x86_64 GNU/Linux
  • Host: (e.g. dmidecode | egrep -i 'manufacturer|product' |sort -u)
        idProduct: 0xffb0
        Manufacturer:
        Manufacturer: ACBEL
        Manufacturer: Advanced Micro Devices, Inc.
        Manufacturer: Micron Technology
        Memory Subsystem Controller Manufacturer ID: Unknown
        Memory Subsystem Controller Product ID: Unknown
        Module Manufacturer ID: Bank 1, Hex 0x2C
        Module Product ID: Unknown
        Product Name: 65MA32
        Product Name: X7840A0
  • Service Version (e.g. kubectl exec -n onecloud $(kubectl get pods -n onecloud | grep climc | awk '{print $1}') -- climc version-list):
    Could not retrieve it; after a while, all containers had exited.
The connection to the server 10.106.75.4:6443 was refused - did you specify the right host or port?
error: expected 'exec (POD | TYPE/NAME) COMMAND [ARG1] [ARG2] ... [ARGN]'.
POD or TYPE/NAME and COMMAND are required arguments for the exec command
See 'kubectl exec -h' for help and examples
[root@gpu-4 ~]# climc

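Following on from #19608: I reinstalled the operating system from scratch and configured only one IP, to rule out interference from multiple IPs.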

zhfish added the bug label (Something isn't working) Mar 22, 2024
swordqiu (Member) commented

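@zhfish Please refer to https://www.cloudpods.org/docs/guides/onpremise/network/examples

Your configuration looks like this scenario?
[image]

vnet is a VLAN interface used for the host's own communication, while the virtual machines can use other VLANs on bond0. Is that right?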

zhfish (Author) commented Mar 23, 2024

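Roughly, but with some differences.
Original plan:
Management port: eth0, no VLAN configured, using the trunk port's default VLAN ID.
Service port: bond0, with a specified VLAN.

Later, because of the problem above,
I tested with bond0 plus a specified VLAN directly as the management port, to avoid interference from multiple IPs.
The management port and the service port both communicate over the same VLAN.

Put differently, the expectation is at least a single-NIC VPC network (with a specified VLAN), or a dual-NIC VPC network (management port without a VLAN, service port with a specified VLAN).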

listen_interface: bond0.3001
networks:
- bond0/br0/bcast0

What I want is the opposite of this: I would like to be able to specify a VLAN ID for the bond0 entry in networks.

swordqiu (Member) commented

Is it this mode?
[image]

zhfish (Author) commented Mar 24, 2024

Ideal setup (dual-NIC VPC network)
[image]

Minimal setup (single-NIC VPC network)
[image]

zhfish changed the title from "[BUG] Host fails to start with port aggregation (bonding)" to "[BUG] VPC network: host fails to start when a VLAN tag is specified for the service port" Mar 24, 2024
swordqiu (Member) commented

First configuration:

You need to add an IP subnet on the platform that contains eth0's IP; bond0 does not need an IP.

listen_interface: eth0
networks:
- bond0/br1/bcast1
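
The requirement that the platform have an IP subnet containing the listen interface's IP can be sanity-checked offline. This is a minimal Python sketch, not Cloudpods code. The address `10.106.75.4` is the `ip_addr` from the add-netif request in the log earlier in this thread; the subnets are hypothetical placeholders to be replaced with whatever is defined on the platform.

```python
import ipaddress


def subnet_contains(subnet_cidr, host_ip):
    """Return True if host_ip lies inside subnet_cidr."""
    return ipaddress.ip_address(host_ip) in ipaddress.ip_network(subnet_cidr)


host_ip = "10.106.75.4"  # ip_addr reported in the add-netif request
candidates = ["10.106.75.0/24", "10.106.76.0/24"]  # hypothetical subnets

for cidr in candidates:
    print(cidr, "contains", host_ip, "->", subnet_contains(cidr, host_ip))
```

This only confirms address containment. Whether the platform's subnet range actually has that address free is a separate question that has to be checked on the platform itself.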

Second configuration:

You need to add an IP subnet on the platform that contains bond0's IP.

networks:
- bond0/br1/<ip_of_bond0>
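
To make the two `networks` forms in this thread easier to compare, here is a small Python sketch that splits an entry into its three fields. It assumes, based only on the examples in this thread and the `bridge`, `interface` and `wire` fields in the log, that the layout is `interface/bridge/third`, where the third field is either a wire name or an IP address. It is an illustration, not the parser the host service uses, and `NetworkEntry` and `parse_network_entry` are hypothetical names.

```python
import ipaddress
from typing import NamedTuple, Optional


class NetworkEntry(NamedTuple):
    interface: str
    bridge: str
    wire: Optional[str]  # set when the third field is a wire name
    ip: Optional[str]    # set when the third field is an IP address


def parse_network_entry(entry):
    """Split 'interface/bridge/wire-or-ip' into named fields."""
    parts = entry.split("/")
    if len(parts) != 3:
        raise ValueError(f"expected interface/bridge/wire-or-ip, got {entry!r}")
    interface, bridge, third = parts
    try:
        ipaddress.ip_address(third)
    except ValueError:
        # Not an IP literal: treat it as a wire name (assumption).
        return NetworkEntry(interface, bridge, wire=third, ip=None)
    return NetworkEntry(interface, bridge, wire=None, ip=third)


for raw in ("bond0/br0/bcast0", "bond0/br1/bcast1", "bond0/br1/10.106.75.4"):
    print(parse_network_entry(raw))
```

Note that neither form carries a VLAN ID, which is the gap raised earlier in the thread.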

zhfish (Author) commented Mar 26, 2024

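With the first configuration, will VPC virtual machine traffic go through bond0?

The second configuration does not really work. bond0 itself has no IP, and assigning one would not help, because a VLAN tag has to be specified; yet bridging the VLAN sub-interface produces an error.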
