QEMU-Devel Archive on lore.kernel.org
* [question]vhost-user: auto fix network link broken during migration
@ 2020-03-23  8:17 yangke (J)
  2020-03-24  5:49 ` Jason Wang
  0 siblings, 1 reply; 4+ messages in thread
From: yangke (J) @ 2020-03-23  8:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: jasowang, wangxin (U), quintela

We found an issue where a host MCE triggers an openvswitch (DPDK) restart on the source host during guest migration:
the VM's frontend is still link down after the migration, which causes the network in the VM to never come up again.

virtio_net_load_device:
    /* nc.link_down can't be migrated, so infer link_down according
     * to link status bit in n->status */
    link_down = (n->status & VIRTIO_NET_S_LINK_UP) == 0;
    for (i = 0; i < n->max_queues; i++) {
        qemu_get_subqueue(n->nic, i)->link_down = link_down;
    }

guest:                    migrate begin -------> vCPU pause --------> vmstate load ------->migrate finish
                                            ^                  ^                     ^
                                            |                  |                     |
openvswitch in source host:          begin to restart       restarting             started
                                            ^                  ^                     ^
                                            |                  |                     |
nc in frontend in source:               link down           link down             link down
                                            ^                  ^                     ^
                                            |                  |                     |
nc in frontend in destination:          link up             link up               link down
                                            ^                  ^                     ^
                                            |                  |                     |
guest network:                            broken             broken                broken
                                            ^                  ^                     ^
                                            |                  |                     |
nc in backend in source:                link down          link down              link up
                                            ^                  ^                     ^
                                            |                  |                     |
nc in backend in destination:           link up             link up               link up

The frontend's link_down is loaded from n->status; n->status was link down on the source, so the
frontend's link_down is true. The backend on the destination host is link up, but the frontend on the
destination host is link down, which causes the network in the guest to never come up again until a guest cold reboot.

Is there a way to auto-fix the link status, or should we just abort the migration in the virtio-net device load?



* Re: [question]vhost-user: auto fix network link broken during migration
  2020-03-23  8:17 [question]vhost-user: auto fix network link broken during migration yangke (J)
@ 2020-03-24  5:49 ` Jason Wang
  2020-03-24 11:08   ` Re: " yangke (J)
  0 siblings, 1 reply; 4+ messages in thread
From: Jason Wang @ 2020-03-24  5:49 UTC (permalink / raw)
  To: yangke (J), qemu-devel; +Cc: wangxin (U), quintela


On 2020/3/23 4:17 PM, yangke (J) wrote:
> We found an issue where a host MCE triggers an openvswitch (DPDK) restart on the source host during guest migration:


Did you mean the vhost-user netdev was deleted from the source host?


> the VM's frontend is still link down after the migration, which causes the network in the VM to never come up again.
>
> virtio_net_load_device:
>      /* nc.link_down can't be migrated, so infer link_down according
>       * to link status bit in n->status */
>      link_down = (n->status & VIRTIO_NET_S_LINK_UP) == 0;
>      for (i = 0; i < n->max_queues; i++) {
>          qemu_get_subqueue(n->nic, i)->link_down = link_down;
>      }
>
> guest:                    migrate begin -------> vCPU pause --------> vmstate load ------->migrate finish
>                                              ^                  ^                     ^
>                                              |                  |                     |
> openvswitch in source host:          begin to restart       restarting             started
>                                              ^                  ^                     ^
>                                              |                  |                     |
> nc in frontend in source:               link down           link down             link down
>                                              ^                  ^                     ^
>                                              |                  |                     |
> nc in frontend in destination:          link up             link up               link down
>                                              ^                  ^                     ^
>                                              |                  |                     |
> guest network:                            broken             broken                broken
>                                              ^                  ^                     ^
>                                              |                  |                     |
> nc in backend in source:                link down          link down              link up
>                                              ^                  ^                     ^
>                                              |                  |                     |
> nc in backend in destination:           link up             link up               link up
>
> The frontend's link_down is loaded from n->status; n->status was link down on the source, so the
> frontend's link_down is true. The backend on the destination host is link up, but the frontend on the
> destination host is link down, which causes the network in the guest to never come up again until a guest cold reboot.
>
> Is there a way to auto-fix the link status, or should we just abort the migration in the virtio-net device load?


Maybe we can try to sync the link status after migration?

Thanks





* Re: [question]vhost-user: auto fix network link broken during migration
  2020-03-24  5:49 ` Jason Wang
@ 2020-03-24 11:08   ` yangke (J)
  2020-03-26  9:45     ` Jason Wang
  0 siblings, 1 reply; 4+ messages in thread
From: yangke (J) @ 2020-03-24 11:08 UTC (permalink / raw)
  To: Jason Wang, qemu-devel; +Cc: wangxin (U), quintela

> > We found an issue where a host MCE triggers an openvswitch (DPDK) restart on the
> > source host during guest migration,
>
>
> Did you mean the vhost-user netdev was deleted from the source host?


The vhost-user netdev was not deleted from the source host. I mean that:
in the normal scenario, OVS (DPDK) begins to restart, then qemu_chr disconnects from OVS and the link status is set to link down; once OVS (DPDK) has started, qemu_chr reconnects to OVS and the link status is set to link up. But in our scenario, the VM migration finishes before qemu_chr reconnects to OVS. The frontend's link_down is loaded from n->status on the destination, which causes the network in the guest to never come up again.

qemu_chr disconnect:
#0  vhost_user_write (msg=msg@entry=0x7fff59ecb2b0, fds=fds@entry=0x0, fd_num=fd_num@entry=0, dev=0x295c730, dev=0x295c730)
    at /usr/src/debug/qemu-kvm-2.8.1/hw/virtio/vhost_user.c:239
#1  0x00000000004e6bad in vhost_user_get_vring_base (dev=0x295c730, ring=0x7fff59ecb510)
    at /usr/src/debug/qemu-kvm-2.8.1/hw/virtio/vhost_user.c:497
#2  0x00000000004e2e88 in vhost_virtqueue_stop (dev=dev@entry=0x295c730, vdev=vdev@entry=0x2ca36c0, vq=0x295c898, idx=0)
    at /usr/src/debug/qemu-kvm-2.8.1/hw/virtio/vhost.c:1036
#3  0x00000000004e45ab in vhost_dev_stop (hdev=hdev@entry=0x295c730, vdev=vdev@entry=0x2ca36c0)
    at /usr/src/debug/qemu-kvm-2.8.1/hw/virtio/vhost.c:1556
#4  0x00000000004bc56a in vhost_net_stop_one (net=0x295c730, dev=dev@entry=0x2ca36c0)
    at /usr/src/debug/qemu-kvm-2.8.1/hw/net/vhost_net.c:326
#5  0x00000000004bcc3b in vhost_net_stop (dev=dev@entry=0x2ca36c0, ncs=<optimized out>,	total_queues=4)
    at /usr/src/debug/qemu-kvm-2.8.1/hw/net/vhost_net.c:407
#6  0x00000000004b85f6 in virtio_net_vhost_status (n=n@entry=0x2ca36c0,	status=status@entry=7 '\a')
    at /usr/src/debug/qemu-kvm-2.8.1/hw/net/virtio_net.c:177
#7  0x00000000004b869f in virtio_net_set_status (vdev=<optimized out>, status=<optimized out>)
    at /usr/src/debug/qemu-kvm-2.8.1/hw/net/virtio_net.c:243
#8  0x000000000073d00d in qmp_set_link (name=name@entry=0x2956d40 "hostnet0", up=up@entry=false, errp=errp@entry=0x7fff59ecd718)
    at net/net.c:1437
#9  0x00000000007460c1 in net_vhost_user_event (opaque=0x2956d40, event=4) at net/vhost_user.c:217//qemu_chr_be_event
#10 0x0000000000574f0d in tcp_chr_disconnect (chr=0x2951a40) at qemu_char.c:3220
#11 0x000000000057511f in tcp_chr_hup (channel=<optimized out>,	cond=<optimized out>, opaque=<optimized out>) at qemu_char.c:3265


>
>
> > the VM's frontend is still link down after the migration, which causes the network in the VM to never come up again.
> >
> > virtio_net_load_device:
> >      /* nc.link_down can't be migrated, so infer link_down according
> >       * to link status bit in n->status */
> >      link_down = (n->status & VIRTIO_NET_S_LINK_UP) == 0;
> >      for (i = 0; i < n->max_queues; i++) {
> >          qemu_get_subqueue(n->nic, i)->link_down = link_down;
> >      }
> >
> > guest:               migrate begin -----> vCPU pause ---> vmstate load ---> migrate finish
> >                                      ^                ^                ^
> >                                      |                |                |
> > openvswitch in source host:   begin to restart   restarting        started
> >                                      ^                ^                ^
> >                                      |                |                |
> > nc in frontend in source:        link down        link down        link down
> >                                      ^                ^                ^
> >                                      |                |                |
> > nc in frontend in destination:   link up          link up          link down
> >                                      ^                ^                ^
> >                                      |                |                |
> > guest network:                    broken           broken           broken
> >                                      ^                ^                ^
> >                                      |                |                |
> > nc in backend in source:         link down        link down        link up
> >                                      ^                ^                ^
> >                                      |                |                |
> > nc in backend in destination:    link up          link up          link up
> >
> > The frontend's link_down is loaded from n->status; n->status was link
> > down on the source, so the frontend's link_down is true. The backend on the
> > destination host is link up, but the frontend on the destination host is link down, which causes the network in the guest to never come up again until a guest cold reboot.
> >
> > Is there a way to auto-fix the link status, or should we just abort the migration in the virtio-net device load?
>
>
> Maybe we can try to sync the link status after migration?
>
> Thanks


In an extreme scenario, the OVS (DPDK) on the source may still not have started by the time the migration finishes.


Our plan is to check the link state of the backend when loading the frontend's link_down.
     /* nc.link_down can't be migrated, so infer link_down according
      * to link status bit in n->status */
-    link_down = (n->status & VIRTIO_NET_S_LINK_UP) == 0;
+    if (qemu_get_queue(n->nic)->peer->info->type == NET_CLIENT_DRIVER_VHOST_USER) {
+        link_down = ((n->status & VIRTIO_NET_S_LINK_UP) | !qemu_get_queue(n->nic)->peer->link_down) == 0;
+    } else {
+        link_down = (n->status & VIRTIO_NET_S_LINK_UP) == 0;
+    }
     for (i = 0; i < n->max_queues; i++) {
         qemu_get_subqueue(n->nic, i)->link_down = link_down;
     }

Is this good enough to auto-fix the link status?

Thanks


* Re: Re: [question]vhost-user: auto fix network link broken during migration
  2020-03-24 11:08   ` Re: " yangke (J)
@ 2020-03-26  9:45     ` Jason Wang
  0 siblings, 0 replies; 4+ messages in thread
From: Jason Wang @ 2020-03-26  9:45 UTC (permalink / raw)
  To: yangke (J), qemu-devel
  Cc: Marc-André Lureau, wangxin (U), Maxime Coquelin, quintela


On 2020/3/24 7:08 PM, yangke (J) wrote:
>>> We found an issue where a host MCE triggers an openvswitch (DPDK) restart on the
>>> source host during guest migration,
>>
>> Did you mean the vhost-user netdev was deleted from the source host?
>
> The vhost-user netdev was not deleted from the source host. I mean that:
> in the normal scenario, OVS (DPDK) begins to restart, then qemu_chr disconnects from OVS and the link status is set to link down; once OVS (DPDK) has started, qemu_chr reconnects to OVS and the link status is set to link up. But in our scenario, the VM migration finishes before qemu_chr reconnects to OVS. The frontend's link_down is loaded from n->status on the destination, which causes the network in the guest to never come up again.


I'm not sure we should fix this in qemu.

Generally, it's the task of management to make sure the destination
device configuration is the same as the source's.

E.g. in this case, management should bring the link up once the
reconnection on the source has completed.

What's more, the qmp_set_link() done in vhost-user.c looks hacky, as it
changes the link status without management's knowledge.


>
> qemu_chr disconnect:
> #0  vhost_user_write (msg=msg@entry=0x7fff59ecb2b0, fds=fds@entry=0x0, fd_num=fd_num@entry=0, dev=0x295c730, dev=0x295c730)
>      at /usr/src/debug/qemu-kvm-2.8.1/hw/virtio/vhost_user.c:239
> #1  0x00000000004e6bad in vhost_user_get_vring_base (dev=0x295c730, ring=0x7fff59ecb510)
>      at /usr/src/debug/qemu-kvm-2.8.1/hw/virtio/vhost_user.c:497
> #2  0x00000000004e2e88 in vhost_virtqueue_stop (dev=dev@entry=0x295c730, vdev=vdev@entry=0x2ca36c0, vq=0x295c898, idx=0)
>      at /usr/src/debug/qemu-kvm-2.8.1/hw/virtio/vhost.c:1036
> #3  0x00000000004e45ab in vhost_dev_stop (hdev=hdev@entry=0x295c730, vdev=vdev@entry=0x2ca36c0)
>      at /usr/src/debug/qemu-kvm-2.8.1/hw/virtio/vhost.c:1556
> #4  0x00000000004bc56a in vhost_net_stop_one (net=0x295c730, dev=dev@entry=0x2ca36c0)
>      at /usr/src/debug/qemu-kvm-2.8.1/hw/net/vhost_net.c:326
> #5  0x00000000004bcc3b in vhost_net_stop (dev=dev@entry=0x2ca36c0, ncs=<optimized out>,	total_queues=4)
>      at /usr/src/debug/qemu-kvm-2.8.1/hw/net/vhost_net.c:407
> #6  0x00000000004b85f6 in virtio_net_vhost_status (n=n@entry=0x2ca36c0,	status=status@entry=7 '\a')
>      at /usr/src/debug/qemu-kvm-2.8.1/hw/net/virtio_net.c:177
> #7  0x00000000004b869f in virtio_net_set_status (vdev=<optimized out>, status=<optimized out>)
>      at /usr/src/debug/qemu-kvm-2.8.1/hw/net/virtio_net.c:243
> #8  0x000000000073d00d in qmp_set_link (name=name@entry=0x2956d40 "hostnet0", up=up@entry=false, errp=errp@entry=0x7fff59ecd718)
>      at net/net.c:1437
> #9  0x00000000007460c1 in net_vhost_user_event (opaque=0x2956d40, event=4) at net/vhost_user.c:217//qemu_chr_be_event
> #10 0x0000000000574f0d in tcp_chr_disconnect (chr=0x2951a40) at qemu_char.c:3220
> #11 0x000000000057511f in tcp_chr_hup (channel=<optimized out>,	cond=<optimized out>, opaque=<optimized out>) at qemu_char.c:3265
>
>
>>
>>> the VM's frontend is still link down after the migration, which causes the network in the VM to never come up again.
>>>
>>> virtio_net_load_device:
>>>       /* nc.link_down can't be migrated, so infer link_down according
>>>        * to link status bit in n->status */
>>>       link_down = (n->status & VIRTIO_NET_S_LINK_UP) == 0;
>>>       for (i = 0; i < n->max_queues; i++) {
>>>           qemu_get_subqueue(n->nic, i)->link_down = link_down;
>>>       }
>>>
>>> guest:               migrate begin -----> vCPU pause ---> vmstate load ---> migrate finish
>>>                                       ^                ^                ^
>>>                                       |                |                |
>>> openvswitch in source host:   begin to restart   restarting        started
>>>                                       ^                ^                ^
>>>                                       |                |                |
>>> nc in frontend in source:        link down        link down        link down
>>>                                       ^                ^                ^
>>>                                       |                |                |
>>> nc in frontend in destination:   link up          link up          link down
>>>                                       ^                ^                ^
>>>                                       |                |                |
>>> guest network:                    broken           broken           broken
>>>                                       ^                ^                ^
>>>                                       |                |                |
>>> nc in backend in source:         link down        link down        link up
>>>                                       ^                ^                ^
>>>                                       |                |                |
>>> nc in backend in destination:    link up          link up          link up
>>>
>>> The frontend's link_down is loaded from n->status; n->status was link
>>> down on the source, so the frontend's link_down is true. The backend on the
>>> destination host is link up, but the frontend on the destination host is link down, which causes the network in the guest to never come up again until a guest cold reboot.
>>>
>>> Is there a way to auto-fix the link status, or should we just abort the migration in the virtio-net device load?
>>
>> Maybe we can try to sync the link status after migration?
>>
>> Thanks
>
> In an extreme scenario, the OVS (DPDK) on the source may still not have started by the time the migration finishes.
>
>
> Our plan is to check the link state of the backend when loading the frontend's link_down.
>       /* nc.link_down can't be migrated, so infer link_down according
>        * to link status bit in n->status */
> -    link_down = (n->status & VIRTIO_NET_S_LINK_UP) == 0;
> +    if (qemu_get_queue(n->nic)->peer->info->type == NET_CLIENT_DRIVER_VHOST_USER) {
> +        link_down = ((n->status & VIRTIO_NET_S_LINK_UP) | !qemu_get_queue(n->nic)->peer->link_down) == 0;
> +    } else {
> +        link_down = (n->status & VIRTIO_NET_S_LINK_UP) == 0;
> +    }
>       for (i = 0; i < n->max_queues; i++) {
>           qemu_get_subqueue(n->nic, i)->link_down = link_down;
>       }
>
> Is this good enough to auto-fix the link status?


I still think it's the task of management. Trying to sync the status
internally, as vhost-user currently does, may lead to bugs.

Thanks


>
> Thanks




end of thread, back to index

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-23  8:17 [question]vhost-user: auto fix network link broken during migration yangke (J)
2020-03-24  5:49 ` Jason Wang
2020-03-24 11:08   ` Re: " yangke (J)
2020-03-26  9:45     ` Jason Wang
