From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:34871)
 by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1W1cOP-0007jm-HC for qemu-devel@nongnu.org;
 Fri, 10 Jan 2014 08:45:04 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from ) id 1W1cOI-0001Ii-6H for qemu-devel@nongnu.org;
 Fri, 10 Jan 2014 08:44:57 -0500
Received: from mail-pd0-f182.google.com ([209.85.192.182]:55550)
 by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1W1cOH-0001Ib-Tc for qemu-devel@nongnu.org;
 Fri, 10 Jan 2014 08:44:50 -0500
Received: by mail-pd0-f182.google.com with SMTP id v10so4584597pde.13
 for ; Fri, 10 Jan 2014 05:44:49 -0800 (PST)
Message-ID: <52CFF94B.9000301@ozlabs.ru>
Date: Sat, 11 Jan 2014 00:44:43 +1100
From: Alexey Kardashevskiy
MIME-Version: 1.0
References: <20131222105640.GA31586@redhat.com> <52B6FB41.4050806@ozlabs.ru>
 <52B6FEB9.8010407@ozlabs.ru> <20131223162426.GA1491@redhat.com>
 <52B8FAD3.4010606@ozlabs.ru> <20131224094047.GB2848@redhat.com>
 <52B99701.2050709@ozlabs.ru> <20131224154331.GA14229@redhat.com>
 <52CBFE98.8080201@ozlabs.ru> <52CF817E.7070905@ozlabs.ru>
 <20140110124148.GE10700@redhat.com>
In-Reply-To: <20140110124148.GE10700@redhat.com>
Content-Type: text/plain; charset=KOI8-R
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] vhost-net issue: does not survive reboot on ppc64
To: "Michael S. Tsirkin"
Cc: "qemu-devel@nongnu.org"

On 01/10/2014 11:41 PM, Michael S. Tsirkin wrote:
> On Fri, Jan 10, 2014 at 04:13:34PM +1100, Alexey Kardashevskiy wrote:
>> On 01/08/2014 12:18 AM, Alexey Kardashevskiy wrote:
>>> On 12/25/2013 02:43 AM, Michael S. Tsirkin wrote:
>>>> On Wed, Dec 25, 2013 at 01:15:29AM +1100, Alexey Kardashevskiy wrote:
>>>>> On 12/24/2013 08:40 PM, Michael S.
Tsirkin wrote:
>>>>>> On Tue, Dec 24, 2013 at 02:09:07PM +1100, Alexey Kardashevskiy wrote:
>>>>>>> On 12/24/2013 03:24 AM, Michael S. Tsirkin wrote:
>>>>>>>> On Mon, Dec 23, 2013 at 02:01:13AM +1100, Alexey Kardashevskiy wrote:
>>>>>>>>> On 12/23/2013 01:46 AM, Alexey Kardashevskiy wrote:
>>>>>>>>>> On 12/22/2013 09:56 PM, Michael S. Tsirkin wrote:
>>>>>>>>>>> On Sun, Dec 22, 2013 at 02:01:23AM +1100, Alexey Kardashevskiy wrote:
>>>>>>>>>>>> Hi!
>>>>>>>>>>>>
>>>>>>>>>>>> I am having a problem with virtio-net + vhost on POWER7 machine - it does
>>>>>>>>>>>> not survive reboot of the guest.
>>>>>>>>>>>>
>>>>>>>>>>>> Steps to reproduce:
>>>>>>>>>>>> 1. boot the guest
>>>>>>>>>>>> 2. configure eth0 and do ping - everything works
>>>>>>>>>>>> 3. reboot the guest (i.e. type "reboot")
>>>>>>>>>>>> 4. when it is booted, eth0 can be configured but will not work at all.
>>>>>>>>>>>>
>>>>>>>>>>>> The test is:
>>>>>>>>>>>> ifconfig eth0 172.20.1.2 up
>>>>>>>>>>>> ping 172.20.1.23
>>>>>>>>>>>>
>>>>>>>>>>>> If to run tcpdump on the host's "tap-id3" interface, it shows no traffic
>>>>>>>>>>>> coming from the guest. If to compare how it works before and after reboot,
>>>>>>>>>>>> I can see the guest doing an ARP request for 172.20.1.23 and receives the
>>>>>>>>>>>> response and it does the same after reboot but the answer does not come.
>>>>>>>>>>>
>>>>>>>>>>> So you see the arp packet in guest but not in host?
>>>>>>>>>>
>>>>>>>>>> Yes.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> One thing to try is to boot debug kernel - where pr_debug is
>>>>>>>>>>> enabled - then you might see some errors in the kernel log.
>>>>>>>>>>
>>>>>>>>>> Tried and added lot more debug printk myself, not clear at all what is
>>>>>>>>>> happening there.
>>>>>>>>>>
>>>>>>>>>> One more hint - if I boot the guest and the guest does not bring eth0 up
>>>>>>>>>> AND wait more than 200 seconds (and less than 210 seconds), then eth0 will
>>>>>>>>>> not work at all. I.e.
this script produces not-working-eth0:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ifconfig eth0 172.20.1.2 down
>>>>>>>>>> sleep 210
>>>>>>>>>> ifconfig eth0 172.20.1.2 up
>>>>>>>>>> ping 172.20.1.23
>>>>>>>>>>
>>>>>>>>>> s/210/200/ - and it starts working. No reboot is required to reproduce.
>>>>>>>>>>
>>>>>>>>>> No "vhost" == always works. The only difference I can see here is vhost's
>>>>>>>>>> thread which may get suspended if not used for a while after the start and
>>>>>>>>>> does not wake up but this is almost a blind guess.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Yet another clue - this host kernel patch seems to help with the guest
>>>>>>>>> reboot but does not help with the initial 210 seconds delay:
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>>>>>>>>> index 69068e0..5e67650 100644
>>>>>>>>> --- a/drivers/vhost/vhost.c
>>>>>>>>> +++ b/drivers/vhost/vhost.c
>>>>>>>>> @@ -162,10 +162,10 @@ void vhost_work_queue(struct vhost_dev *dev, struct
>>>>>>>>> vhost_work *work)
>>>>>>>>>                 list_add_tail(&work->node, &dev->work_list);
>>>>>>>>>                 work->queue_seq++;
>>>>>>>>>                 spin_unlock_irqrestore(&dev->work_lock, flags);
>>>>>>>>> -               wake_up_process(dev->worker);
>>>>>>>>>         } else {
>>>>>>>>>                 spin_unlock_irqrestore(&dev->work_lock, flags);
>>>>>>>>>         }
>>>>>>>>> +       wake_up_process(dev->worker);
>>>>>>>>>  }
>>>>>>>>>  EXPORT_SYMBOL_GPL(vhost_work_queue);
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> Interesting. Some kind of race? A missing memory barrier somewhere?
>>>>>>>
>>>>>>> I do not see how. I boot the guest and just wait 210 seconds, nothing
>>>>>>> happens to cause races.
>>>>>>>
>>>>>>>
>>>>>>>> Since it's all around startup,
>>>>>>>> you can try kicking the host eventfd in
>>>>>>>> vhost_net_start.
>>>>>>>
>>>>>>>
>>>>>>> How exactly? This did not help. Thanks.
>>>>>>>
>>>>>>> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
>>>>>>> index 006576d..407ecf2 100644
>>>>>>> --- a/hw/net/vhost_net.c
>>>>>>> +++ b/hw/net/vhost_net.c
>>>>>>> @@ -229,6 +229,17 @@ int vhost_net_start(VirtIODevice *dev, NetClientState
>>>>>>> *ncs,
>>>>>>>          if (r < 0) {
>>>>>>>              goto err;
>>>>>>>          }
>>>>>>> +
>>>>>>> +        VHostNetState *vn = tap_get_vhost_net(ncs[i].peer);
>>>>>>> +        struct vhost_vring_file file = {
>>>>>>> +            .index = i
>>>>>>> +        };
>>>>>>> +        file.fd =
>>>>>>> event_notifier_get_fd(virtio_queue_get_host_notifier(dev->vq));
>>>>>>> +        r = ioctl(vn->dev.control, VHOST_SET_VRING_KICK, &file);
>>>>>>
>>>>>> No, this sets the notifier, it does not kick.
>>>>>> To kick you write 1 there:
>>>>>>     uint64_t v = 1;
>>>>>>     write(fd, &v, sizeof v);
>>>>>
>>>>>
>>>>> Please, be precise. How/where do I get that @fd? Is what I do correct?
>>>>
>>>> Yes.
>>>
>>> Turns out that no. The control device in the host kernel does not implement
>>> write() so it always fails.
>>>
>>> This works:
>>>
>>> uint64_t v = 1;
>>> int fd = event_notifier_get_fd(&vq->host_notifier);
>>> int r = write(fd, &v, sizeof v);
>>>
>>> By "works" I mean it helps to wake the whole thing up and the guest's eth0
>>> starts working after 3 minutes delay.
>>
>>
>>
>> Checked if virtnet_napi_enable() is called as expected and it is. As I can
>> see "Receiving skb proto" in the guest's receive_buf(), I believe
>> host->guest channel works just fine but the guest is unable to send
>> anything until QEMU writes to event notifier (the code above).
>>
>> I actually spotted the problem in the host kernel - KVM_IOEVENTFD is called
>> with a PCI bus address but kvm_io_bus_write() is called with a guest
>> physical address and these things are different on PPC64/spapr.
>>
>> I am trying to make a patch for this and post it to some list tonight, I'll
>> put you in copy.
>>
>
> Can we fix this in qemu?
>
> We do:
>     memory_region_add_eventfd(&proxy->bar, VIRTIO_PCI_QUEUE_NOTIFY, 2,
>                               true, n, notifier);
>
> I think as a result, KVM_IOEVENTFD should be called with guest physical address.

I fixed this in "[PATCH] KVM: fix addr type for KVM_IOEVENTFD", you are in cc.

Heh. I suspected something ppc64 specific as the problem does not appear on
x86, and I posted another patch for PPC64 HV KVM, but the QEMU's bug is still
nice :)


--
Alexey
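
For reference, the "kick" discussed in this thread is an 8-byte write that
adds 1 to an eventfd counter. Below is a minimal standalone sketch of that
mechanism, assuming Linux and a plain eventfd(2) descriptor instead of
QEMU's host notifier. kick_eventfd() and drain_eventfd() are hypothetical
helper names used for illustration only; they are not QEMU or vhost functions.

```c
#include <stdint.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/eventfd.h>

/*
 * Signal ("kick") an eventfd by adding 1 to its counter.
 * An eventfd only accepts writes of exactly 8 bytes, hence uint64_t.
 * Returns 0 on success, -1 on failure.
 */
static int kick_eventfd(int fd)
{
    uint64_t v = 1;

    return write(fd, &v, sizeof v) == (ssize_t)sizeof v ? 0 : -1;
}

/*
 * Read the eventfd counter, which also resets it to zero.
 * Returns the number of kicks accumulated since the last read, or -1 if
 * the read fails (for example EAGAIN on a non-blocking fd with no kicks).
 */
static int64_t drain_eventfd(int fd)
{
    uint64_t v;

    if (read(fd, &v, sizeof v) != (ssize_t)sizeof v) {
        return -1;
    }
    return (int64_t)v;
}
```

To try it, create a descriptor with eventfd(0, EFD_NONBLOCK), call
kick_eventfd() a few times, and then call drain_eventfd() to see how many
kicks were pending. In the vhost case the reader is the host kernel rather
than userspace, so this shows only the write side of what the thread
describes.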