From: Andrey Korolyov
Date: Wed, 3 Sep 2014 11:43:54 +0400
Subject: Re: [Qemu-devel] [Qemu-stable] Patch Round-up for stable 2.1.1, freeze on 2014-09-03
To: "Michael S. Tsirkin"
Cc: ehabkost@redhat.com, "qemu-devel@nongnu.org", Stefan Hajnoczi,
 knut.omang@oracle.com, qemu-stable@nongnu.org, Michael Roth,
 Michael Tokarev, Gerd Hoffmann, "J. Kiszka",
 chen.fan.fnst@cn.fujitsu.com, Paolo Bonzini,
 sebastian.tanase@openwide.fr, zhang.zhanghailiang@huawei.com
In-Reply-To: <20140903061015.GA5449@redhat.com>

On Wed, Sep 3, 2014 at 10:10 AM, Michael S. Tsirkin wrote:
> On Wed, Sep 03, 2014 at 02:17:02AM +0400, Andrey Korolyov wrote:
>> On Wed, Sep 3, 2014 at 2:09 AM, Andrey Korolyov wrote:
>> > On Wed, Sep 3, 2014 at 1:51 AM, Michael S. Tsirkin wrote:
>> >> On Wed, Sep 03, 2014 at 01:29:29AM +0400, Andrey Korolyov wrote:
>> >>> On Wed, Sep 3, 2014 at 1:03 AM, Michael S. Tsirkin wrote:
>> >>> >> bad one is the
>> >>> >>
>> >>> >> Author: Jason Wang
>> >>> >> Date: Tue Sep 2 18:07:46 2014 +0300
>> >>> >>
>> >>> >> vhost_net: start/stop guest notifiers properly
>> >>> >
>> >>> >
>> >>> >
>> >>> > upstream has this (pull request sent today):
>> >>> > vhost_net: cleanup start/stop condition
>> >>> >
>> >>> > Could you apply it and see if it helps please?
>> >>> >
>> >>> > Michael, if it helps it should be before start/stop guest notifiers
>> >>> > ideally to avoid bisect problems.
>> >>>
>> >>> It is already applied, as shown in the list in the previous message
>> >>> (there are also some aio fixes on top of 2.1 that I picked earlier, but
>> >>> they should not affect the vhost-net interaction in any way). The
>> >>> symptoms are a bit interesting: the VM crashes only at PCI device
>> >>> initialization (i.e. the GRUB stage after reset and initrd unpacking
>> >>> pass fine, but then things get ugly). I am running a 3.14 i686-pae
>> >>> kernel from Debian backports in the guest, so it may be
>> >>> version-specific after all. If it turns out to be hard to reproduce, I
>> >>> can try 64-bit, though I expect the same behavior. Please find the
>> >>> args in the attached file.
>> >>
>> >>
>> >>
>> >> ok just to make sure - which tree do I clone exactly?
>> >>
>> >
>> > https://github.com/mdroth/qemu.git stable-2.1-staging shows the same
>> > behavior for me with those patches
>>
>> Forgot to mention an important detail: I am playing with -mq now, so
>> virtio-net actually works a bit differently than might be expected (this
>> is also shown in the args list above, but someone may miss it):
>> ...
>> qemu-system-x86_64: unable to start vhost net: 95: falling back on userspace virtio
>> qemu-system-x86_64: unable to start vhost net: 95: falling back on userspace virtio
>> ...
>
>
> OK I see at least one obvious bug there: does the following fix the
> crash for you?
> Separately, we need to debug why mq vhost is broken for you.
> Is this a regression?
>
> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> index ba5d544..1fe18c7 100644
> --- a/hw/net/vhost_net.c
> +++ b/hw/net/vhost_net.c
> @@ -289,7 +289,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>      BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(dev)));
>      VirtioBusState *vbus = VIRTIO_BUS(qbus);
>      VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
> -    int r, i = 0;
> +    int r, i;
>
>      if (!vhost_net_device_endian_ok(dev)) {
>          error_report("vhost-net does not support cross-endian");
> @@ -317,16 +317,22 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>          r = vhost_net_start_one(get_vhost_net(ncs[i].peer), dev);
>
>          if (r < 0) {
> -            goto err;
> +            goto err_start;
>          }
>      }
>
>      return 0;
>
> -err:
> +err_start:
>      while (--i >= 0) {
>          vhost_net_stop_one(get_vhost_net(ncs[i].peer), dev);
>      }
> +err:
> +    r = k->set_guest_notifiers(qbus->parent, total_queues * 2, false);
> +    if (r < 0) {
> +        fprintf(stderr, "vhost guest notifier cleanup failed: %d\n", r);
> +        fflush(stderr);
> +    }
>      return r;
>  }
>

Some more information:
- the userspace fallback is not specific to mq (very unfortunate for me,
  because I did not check this exact regression a week ago when I first
  saw it with mq), and it is not specific to the patches queued for 2.1.1
  either;
- the bug itself is not specific to mq either: it reproduces every time,
  even with a more generic interface config without multiple queues;
- the patch above does not fix the issue.

Strace output for all threads (strace attached just before the reset) is
available at http://xdel.ru/downloads/qemu.out.gz.
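The error path that the quoted patch reworks follows the usual C "goto unwind" idiom. Guest notifiers are enabled first and then each queue is started. On failure, only the queues already started are stopped, newest first, and the notifiers are disabled again. Below is a minimal standalone sketch of that idiom under stated assumptions. set_notifiers(), start_queue(), stop_queue(), start_all(), fail_at and MAX_QUEUES are hypothetical stand-ins for k->set_guest_notifiers(), vhost_net_start_one() and vhost_net_stop_one(), not QEMU code. One deliberate difference from the quoted diff is that the notifier cleanup status goes into a separate variable e, so the original start error in r is what gets returned.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_QUEUES 8  /* hypothetical upper bound for this sketch */

/* Hypothetical stand-ins; none of these are QEMU functions. */
static int notifiers_on;          /* plays the role of the guest notifiers */
static int started[MAX_QUEUES];   /* 1 if queue i is running */
static int fail_at = -1;          /* queue whose start fails; -1 = none */

static int set_notifiers(int enable)
{
    notifiers_on = enable;
    return 0;
}

static int start_queue(int i)
{
    if (i == fail_at) {
        return -1;                /* simulated start failure */
    }
    started[i] = 1;
    return 0;
}

static void stop_queue(int i)
{
    started[i] = 0;
}

/* Step 1: enable notifiers. Step 2: start each queue.
 * On failure, undo only what was done, in reverse order. */
static int start_all(int nqueues)
{
    int r, e, i;

    assert(nqueues <= MAX_QUEUES);

    r = set_notifiers(1);
    if (r < 0) {
        goto err;                 /* nothing to undo yet */
    }

    for (i = 0; i < nqueues; i++) {
        r = start_queue(i);
        if (r < 0) {
            goto err_start;       /* queues 0..i-1 are running */
        }
    }
    return 0;

err_start:
    while (--i >= 0) {            /* stop started queues, newest first */
        stop_queue(i);
    }
    e = set_notifiers(0);         /* undo step 1 without clobbering r */
    if (e < 0) {
        fprintf(stderr, "notifier cleanup failed: %d\n", e);
    }
err:
    return r;
}
```

With fail_at = 2, start_all(4) starts queues 0 and 1, fails on queue 2, stops queues 1 and 0, disables the notifiers and returns -1.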