From: "yangke (J)" <yangke27@huawei.com>
To: Jason Wang <jasowang@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>
Cc: "wangxin \(U\)" <wangxinxin.wang@huawei.com>,
	"quintela@redhat.com" <quintela@redhat.com>
Subject: Re: [question]vhost-user: auto fix network link broken during migration
Date: Tue, 24 Mar 2020 11:08:47 +0000	[thread overview]
Message-ID: <0CC1E03725E48D478F815032182740230A42C15B@DGGEMM532-MBS.china.huawei.com> (raw)
In-Reply-To: <47abadbd-c559-1900-f3b1-3697f9e7c0b5@redhat.com>

> > We hit an issue where a host MCE triggers an openvswitch(dpdk) restart on the
> > source host during guest migration,
>
>
> Did you mean the vhost-user netdev was deleted from the source host?


The vhost-user netdev was not deleted from the source host. What I mean is:
in the normal scenario, OVS(DPDK) begins to restart, qemu_chr disconnects from OVS and the link status is set to link down; once OVS(DPDK) has started again, qemu_chr reconnects to OVS and the link status is set back to link up. But in our scenario the migration finishes before qemu_chr can reconnect to OVS. The frontend's link_down is then loaded from n->status on the destination, which causes the network in the guest to never come up again. The disconnect path is in the backtrace below (a simplified sketch of the event handler follows it).

qemu_chr disconnect:
#0  vhost_user_write (msg=msg@entry=0x7fff59ecb2b0, fds=fds@entry=0x0, fd_num=fd_num@entry=0, dev=0x295c730, dev=0x295c730)
    at /usr/src/debug/qemu-kvm-2.8.1/hw/virtio/vhost_user.c:239
#1  0x00000000004e6bad in vhost_user_get_vring_base (dev=0x295c730, ring=0x7fff59ecb510)
    at /usr/src/debug/qemu-kvm-2.8.1/hw/virtio/vhost_user.c:497
#2  0x00000000004e2e88 in vhost_virtqueue_stop (dev=dev@entry=0x295c730, vdev=vdev@entry=0x2ca36c0, vq=0x295c898, idx=0)
    at /usr/src/debug/qemu-kvm-2.8.1/hw/virtio/vhost.c:1036
#3  0x00000000004e45ab in vhost_dev_stop (hdev=hdev@entry=0x295c730, vdev=vdev@entry=0x2ca36c0)
    at /usr/src/debug/qemu-kvm-2.8.1/hw/virtio/vhost.c:1556
#4  0x00000000004bc56a in vhost_net_stop_one (net=0x295c730, dev=dev@entry=0x2ca36c0)
    at /usr/src/debug/qemu-kvm-2.8.1/hw/net/vhost_net.c:326
#5  0x00000000004bcc3b in vhost_net_stop (dev=dev@entry=0x2ca36c0, ncs=<optimized out>,	total_queues=4)
    at /usr/src/debug/qemu-kvm-2.8.1/hw/net/vhost_net.c:407
#6  0x00000000004b85f6 in virtio_net_vhost_status (n=n@entry=0x2ca36c0,	status=status@entry=7 '\a')
    at /usr/src/debug/qemu-kvm-2.8.1/hw/net/virtio_net.c:177
#7  0x00000000004b869f in virtio_net_set_status (vdev=<optimized out>, status=<optimized out>)
    at /usr/src/debug/qemu-kvm-2.8.1/hw/net/virtio_net.c:243
#8  0x000000000073d00d in qmp_set_link (name=name@entry=0x2956d40 "hostnet0", up=up@entry=false, errp=errp@entry=0x7fff59ecd718)
    at net/net.c:1437
#9  0x00000000007460c1 in net_vhost_user_event (opaque=0x2956d40, event=4) at net/vhost_user.c:217//qemu_chr_be_event
#10 0x0000000000574f0d in tcp_chr_disconnect (chr=0x2951a40) at qemu_char.c:3220
#11 0x000000000057511f in tcp_chr_hup (channel=<optimized out>,	cond=<optimized out>, opaque=<optimized out>) at qemu_char.c:3265
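
For reference, the chardev event handler in frame #9 above behaves roughly like the
sketch below. This is a simplified reading of net/vhost_user.c in the 2.8-era tree we
run; the helper calls around qmp_set_link() are written from memory and may not match
the exact code.

/* Simplified sketch of net_vhost_user_event(); error handling and the
 * per-queue bookkeeping are trimmed. */
static void net_vhost_user_event(void *opaque, int event)
{
    const char *name = opaque;
    NetClientState *ncs[MAX_QUEUE_NUM];
    int queues = qemu_find_net_clients_except(name, ncs,
                                              NET_CLIENT_DRIVER_NIC,
                                              MAX_QUEUE_NUM);

    switch (event) {
    case CHR_EVENT_OPENED:                /* OVS(DPDK) came back */
        vhost_user_start(queues, ncs);
        qmp_set_link(name, true, NULL);   /* backend and frontend go link up */
        break;
    case CHR_EVENT_CLOSED:                /* OVS(DPDK) went away (frame #10) */
        qmp_set_link(name, false, NULL);  /* frame #8: frontend goes link down */
        vhost_user_stop(queues, ncs);
        break;
    }
}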


>
>
> > The frontend in the VM is still link down after migration, which causes the network in the VM to never come up again.
> >
> > virtio_net_load_device:
> >      /* nc.link_down can't be migrated, so infer link_down according
> >       * to link status bit in n->status */
> >      link_down = (n->status & VIRTIO_NET_S_LINK_UP) == 0;
> >      for (i = 0; i < n->max_queues; i++) {
> >          qemu_get_subqueue(n->nic, i)->link_down = link_down;
> >      }
> >
> > guest:               migrate begin -----> vCPU pause ---> vmstate load ---> migrate finish
> >                                      ^                ^                ^
> >                                      |                |                |
> > openvswitch in source host:   begin to restart   restarting        started
> >                                      ^                ^                ^
> >                                      |                |                |
> > nc in frontend in source:        link down        link down        link down
> >                                      ^                ^                ^
> >                                      |                |                |
> > nc in frontend in destination:   link up          link up          link down
> >                                      ^                ^                ^
> >                                      |                |                |
> > guest network:                    broken           broken           broken
> >                                      ^                ^                ^
> >                                      |                |                |
> > nc in backend in source:         link down        link down        link up
> >                                      ^                ^                ^
> >                                      |                |                |
> > nc in backend in destination:    link up          link up          link up
> >
> > The frontend's link_down is loaded from n->status; n->status is link
> > down on the source, so the frontend's link_down is true. The backend on the
> > destination host is link up, but the frontend on the destination host is link down, which causes the network in the guest to never come up again until a guest cold reboot.
> >
> > Is there a way to auto-fix the link status, or should we just abort the migration in the virtio-net device load?
>
>
> Maybe we can try to sync link status after migration?
>
> Thanks


In an extreme scenario, OVS(DPDK) on the source may still not have started by the time the migration finishes.


Our plan is to check the backend's link state when loading the frontend's link_down:
     /* nc.link_down can't be migrated, so infer link_down according
      * to link status bit in n->status */
-    link_down = (n->status & VIRTIO_NET_S_LINK_UP) == 0;
+    if (qemu_get_queue(n->nic)->peer->info->type == NET_CLIENT_DRIVER_VHOST_USER) {
+        /* stay link down only if the vhost-user backend is also link down */
+        link_down = (n->status & VIRTIO_NET_S_LINK_UP) == 0 &&
+                    qemu_get_queue(n->nic)->peer->link_down;
+    } else {
+        link_down = (n->status & VIRTIO_NET_S_LINK_UP) == 0;
+    }
     for (i = 0; i < n->max_queues; i++) {
         qemu_get_subqueue(n->nic, i)->link_down = link_down;
     }

Is this good enough to auto-fix the link status?
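
For comparison, the post-migration sync that Jason suggested could perhaps be done once
the destination already knows the backend state. A rough, untested sketch follows; the
helper name and the idea of calling it from a post-load or vm-change-state hook are
assumptions on our side, not existing QEMU code:

/* Hypothetical helper (not in QEMU): re-derive the NIC link state from the
 * vhost-user backend after incoming migration. */
static void virtio_net_sync_link_from_peer(VirtIONet *n)
{
    NetClientState *peer = qemu_get_queue(n->nic)->peer;
    int i;

    if (!peer || peer->info->type != NET_CLIENT_DRIVER_VHOST_USER) {
        return;
    }
    if (!peer->link_down) {
        /* backend is up on the destination: bring the frontend back up */
        n->status |= VIRTIO_NET_S_LINK_UP;
        for (i = 0; i < n->max_queues; i++) {
            qemu_get_subqueue(n->nic, i)->link_down = false;
        }
        virtio_notify_config(VIRTIO_DEVICE(n));
    }
}

Functionally this ends up very close to the load-time check in the diff above; the
difference is only where the peer state is consulted.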

Thanks
