* [Qemu-devel] vhost-net issue: does not survive reboot on ppc64
@ 2013-12-21 15:01 Alexey Kardashevskiy
  2013-12-22 10:56 ` Michael S. Tsirkin
  2013-12-22 11:41 ` Zhi Yong Wu
  0 siblings, 2 replies; 25+ messages in thread
From: Alexey Kardashevskiy @ 2013-12-21 15:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: Michael S. Tsirkin

Hi!

I am having a problem with virtio-net + vhost on a POWER7 machine -
networking does not survive a reboot of the guest.

Steps to reproduce:
1. boot the guest
2. configure eth0 and do ping - everything works
3. reboot the guest (i.e. type "reboot")
4. when it is booted, eth0 can be configured but will not work at all.

The test is:
ifconfig eth0 172.20.1.2 up
ping 172.20.1.23

Running tcpdump on the host's "tap-id3" interface shows no traffic
coming from the guest. Comparing the behaviour before and after the
reboot: before, the guest sends an ARP request for 172.20.1.23 and
receives the response; after, it sends the same request but the answer
never comes.

With vhost=on removed, everything works. With a Fedora 19 guest
(a v3.10-something kernel), it is also all good - it works both before
and after reboot.


So there are two questions:

1. Does anybody have any clue what might go wrong after reboot?

2. Is there any good material to read about what exactly vhost
accelerates, and how?

My understanding is that packets from the guest to the real network are
going as:
1. guest's virtio-pci-net does ioport(VIRTIO_PCI_QUEUE_NOTIFY)
2. QEMU's net/virtio-net.c calls qemu_net_queue_deliver()
3. QEMU's net/tap.c calls tap_write_packet() and this is how the host knows
that there is a new packet.


Thanks!


This is how I run QEMU:
./qemu-system-ppc64 \
-enable-kvm \
-m 2048 \
-machine pseries \
-initrd 1.cpio \
-kernel vml312_virtio_net_dbg \
-nographic \
-vga none \
-netdev tap,id=id3,ifname=tap-id3,script=ifup.sh,downscript=ifdown.sh,vhost=on \
-device virtio-net-pci,id=id4,netdev=id3,mac=C0:41:49:4b:00:00


This is the bridge config:
[aik@dyn232 ~]$ brctl show
bridge name	bridge id		STP enabled	interfaces
brtest		8000.00145e992e88	no	pin	eth4


The ifup.sh script:
ifconfig $1 hw ether ee:01:02:03:04:05
/sbin/ifconfig $1 up
/usr/sbin/brctl addif brtest $1




-- 
Alexey

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] vhost-net issue: does not survive reboot on ppc64
  2013-12-21 15:01 [Qemu-devel] vhost-net issue: does not survive reboot on ppc64 Alexey Kardashevskiy
@ 2013-12-22 10:56 ` Michael S. Tsirkin
  2013-12-22 14:46   ` Alexey Kardashevskiy
  2013-12-22 11:41 ` Zhi Yong Wu
  1 sibling, 1 reply; 25+ messages in thread
From: Michael S. Tsirkin @ 2013-12-22 10:56 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: qemu-devel

On Sun, Dec 22, 2013 at 02:01:23AM +1100, Alexey Kardashevskiy wrote:
> Hi!
> 
> I am having a problem with virtio-net + vhost on POWER7 machine - it does
> not survive reboot of the guest.
> 
> Steps to reproduce:
> 1. boot the guest
> 2. configure eth0 and do ping - everything works
> 3. reboot the guest (i.e. type "reboot")
> 4. when it is booted, eth0 can be configured but will not work at all.
> 
> The test is:
> ifconfig eth0 172.20.1.2 up
> ping 172.20.1.23
> 
> If to run tcpdump on the host's "tap-id3" interface, it shows no traffic
> coming from the guest. If to compare how it works before and after reboot,
> I can see the guest doing an ARP request for 172.20.1.23 and receives the
> response and it does the same after reboot but the answer does not come.

So you see the ARP packet in the guest but not on the host?

One thing to try is to boot a debug kernel - one where pr_debug is
enabled - then you might see some errors in the kernel log.

> If to remove vhost=on, it is all good. If to try Fedora19
> (v3.10-something), it all good again - works before and after reboot.
> 
> 
> And there 2 questions:
> 
> 1. does anybody have any clue what might go wrong after reboot?
> 
> 2. Is there any good material to read about what exactly and how vhost
> accelerates?
> 
> My understanding is that packets from the guest to the real network are
> going as:
> 1. guest's virtio-pci-net does ioport(VIRTIO_PCI_QUEUE_NOTIFY)
> 2. QEMU's net/virtio-net.c calls qemu_net_queue_deliver()
> 3. QEMU's net/tap.c calls tap_write_packet() and this is how the host knows
> that there is a new packet.
> 
> 
> Thanks!
> 
> 
> This how I run QEMU:
> ./qemu-system-ppc64 \
> -enable-kvm \
> -m 2048 \
> -machine pseries \
> -initrd 1.cpio \
> -kernel vml312_virtio_net_dbg \
> -nographic \
> -vga none \
> -netdev
> tap,id=id3,ifname=tap-id3,script=ifup.sh,downscript=ifdown.sh,vhost=on \
> -device virtio-net-pci,id=id4,netdev=id3,mac=C0:41:49:4b:00:00
> 
> 
> That is bridge config:
> [aik@dyn232 ~]$ brctl show
> bridge name	bridge id		STP enabled	interfaces
> brtest		8000.00145e992e88	no	pin	eth4
> 
> 
> The ifup.sh script:
> ifconfig $1 hw ether ee:01:02:03:04:05
> /sbin/ifconfig $1 up
> /usr/sbin/brctl addif brtest $1
> 
> 
> 
> 
> -- 
> Alexey


* Re: [Qemu-devel] vhost-net issue: does not survive reboot on ppc64
  2013-12-21 15:01 [Qemu-devel] vhost-net issue: does not survive reboot on ppc64 Alexey Kardashevskiy
  2013-12-22 10:56 ` Michael S. Tsirkin
@ 2013-12-22 11:41 ` Zhi Yong Wu
  2013-12-22 14:48   ` Alexey Kardashevskiy
  1 sibling, 1 reply; 25+ messages in thread
From: Zhi Yong Wu @ 2013-12-22 11:41 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: QEMU Developers

On Sat, Dec 21, 2013 at 11:01 PM, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> Hi!
Hi, Alexey

>
> I am having a problem with virtio-net + vhost on POWER7 machine - it does
> not survive reboot of the guest.
Could you let me log in to your environment to debug? I am interested in
trying to fix this issue.

>
> Steps to reproduce:
> 1. boot the guest
> 2. configure eth0 and do ping - everything works
> 3. reboot the guest (i.e. type "reboot")
> 4. when it is booted, eth0 can be configured but will not work at all.
>
> The test is:
> ifconfig eth0 172.20.1.2 up
> ping 172.20.1.23
>
> If to run tcpdump on the host's "tap-id3" interface, it shows no traffic
> coming from the guest. If to compare how it works before and after reboot,
> I can see the guest doing an ARP request for 172.20.1.23 and receives the
> response and it does the same after reboot but the answer does not come.
>
> If to remove vhost=on, it is all good. If to try Fedora19
> (v3.10-something), it all good again - works before and after reboot.
>
>
> And there 2 questions:
>
> 1. does anybody have any clue what might go wrong after reboot?
>
> 2. Is there any good material to read about what exactly and how vhost
> accelerates?
>
> My understanding is that packets from the guest to the real network are
> going as:
> 1. guest's virtio-pci-net does ioport(VIRTIO_PCI_QUEUE_NOTIFY)
> 2. QEMU's net/virtio-net.c calls qemu_net_queue_deliver()
> 3. QEMU's net/tap.c calls tap_write_packet() and this is how the host knows
> that there is a new packet.
>
>
> Thanks!
>
>
> This how I run QEMU:
> ./qemu-system-ppc64 \
> -enable-kvm \
> -m 2048 \
> -machine pseries \
> -initrd 1.cpio \
> -kernel vml312_virtio_net_dbg \
> -nographic \
> -vga none \
> -netdev
> tap,id=id3,ifname=tap-id3,script=ifup.sh,downscript=ifdown.sh,vhost=on \
> -device virtio-net-pci,id=id4,netdev=id3,mac=C0:41:49:4b:00:00
>
>
> That is bridge config:
> [aik@dyn232 ~]$ brctl show
> bridge name     bridge id               STP enabled     interfaces
> brtest          8000.00145e992e88       no      pin     eth4
>
>
> The ifup.sh script:
> ifconfig $1 hw ether ee:01:02:03:04:05
> /sbin/ifconfig $1 up
> /usr/sbin/brctl addif brtest $1
>
>
>
>
> --
> Alexey
>



-- 
Regards,

Zhi Yong Wu


* Re: [Qemu-devel] vhost-net issue: does not survive reboot on ppc64
  2013-12-22 10:56 ` Michael S. Tsirkin
@ 2013-12-22 14:46   ` Alexey Kardashevskiy
  2013-12-22 15:01     ` Alexey Kardashevskiy
  0 siblings, 1 reply; 25+ messages in thread
From: Alexey Kardashevskiy @ 2013-12-22 14:46 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

On 12/22/2013 09:56 PM, Michael S. Tsirkin wrote:
> On Sun, Dec 22, 2013 at 02:01:23AM +1100, Alexey Kardashevskiy wrote:
>> Hi!
>>
>> I am having a problem with virtio-net + vhost on POWER7 machine - it does
>> not survive reboot of the guest.
>>
>> Steps to reproduce:
>> 1. boot the guest
>> 2. configure eth0 and do ping - everything works
>> 3. reboot the guest (i.e. type "reboot")
>> 4. when it is booted, eth0 can be configured but will not work at all.
>>
>> The test is:
>> ifconfig eth0 172.20.1.2 up
>> ping 172.20.1.23
>>
>> If to run tcpdump on the host's "tap-id3" interface, it shows no traffic
>> coming from the guest. If to compare how it works before and after reboot,
>> I can see the guest doing an ARP request for 172.20.1.23 and receives the
>> response and it does the same after reboot but the answer does not come.
> 
> So you see the arp packet in guest but not in host?

Yes.


> One thing to try is to boot debug kernel - where pr_debug is
> enabled - then you might see some errors in the kernel log.

I tried that and added a lot more debug printk calls myself; it is not at
all clear what is happening there.

One more hint - if I boot the guest, do not bring eth0 up, AND wait more
than 200 seconds (but less than 210 seconds), then eth0 will not work at
all. I.e. this script produces a non-working eth0:


ifconfig eth0 172.20.1.2 down
sleep 210
ifconfig eth0 172.20.1.2 up
ping 172.20.1.23

s/210/200/ - and it starts working. No reboot is required to reproduce.

Without "vhost" it always works. The only difference I can see here is
vhost's worker thread, which may get suspended if it is not used for a
while after start and then never wakes up - but this is almost a blind
guess.



>> If to remove vhost=on, it is all good. If to try Fedora19
>> (v3.10-something), it all good again - works before and after reboot.
>>
>>
>> And there 2 questions:
>>
>> 1. does anybody have any clue what might go wrong after reboot?
>>
>> 2. Is there any good material to read about what exactly and how vhost
>> accelerates?
>>
>> My understanding is that packets from the guest to the real network are
>> going as:
>> 1. guest's virtio-pci-net does ioport(VIRTIO_PCI_QUEUE_NOTIFY)
>> 2. QEMU's net/virtio-net.c calls qemu_net_queue_deliver()
>> 3. QEMU's net/tap.c calls tap_write_packet() and this is how the host knows
>> that there is a new packet.


What about the documentation? :) Or at least the general idea?


>>
>>
>> Thanks!
>>
>>
>> This how I run QEMU:
>> ./qemu-system-ppc64 \
>> -enable-kvm \
>> -m 2048 \
>> -machine pseries \
>> -initrd 1.cpio \
>> -kernel vml312_virtio_net_dbg \
>> -nographic \
>> -vga none \
>> -netdev
>> tap,id=id3,ifname=tap-id3,script=ifup.sh,downscript=ifdown.sh,vhost=on \
>> -device virtio-net-pci,id=id4,netdev=id3,mac=C0:41:49:4b:00:00
>>
>>
>> That is bridge config:
>> [aik@dyn232 ~]$ brctl show
>> bridge name	bridge id		STP enabled	interfaces
>> brtest		8000.00145e992e88	no	pin	eth4
>>
>>
>> The ifup.sh script:
>> ifconfig $1 hw ether ee:01:02:03:04:05
>> /sbin/ifconfig $1 up
>> /usr/sbin/brctl addif brtest $1


-- 
Alexey


* Re: [Qemu-devel] vhost-net issue: does not survive reboot on ppc64
  2013-12-22 11:41 ` Zhi Yong Wu
@ 2013-12-22 14:48   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 25+ messages in thread
From: Alexey Kardashevskiy @ 2013-12-22 14:48 UTC (permalink / raw)
  To: Zhi Yong Wu; +Cc: QEMU Developers

On 12/22/2013 10:41 PM, Zhi Yong Wu wrote:
> On Sat, Dec 21, 2013 at 11:01 PM, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>> Hi!
> HI, Alexey
> 
>>
>> I am having a problem with virtio-net + vhost on POWER7 machine - it does
>> not survive reboot of the guest.
> Can you let me login to your environment for debug? I am interested in
> trying to fix this issue.


You do not need my environment; just make sure your guest does not bring
the virtio ethernet interface up, wait for 4 minutes, then try to bring it
up and use it (ping, for example). Any POWER7 or POWER8 machine should be
able to reproduce it.



> 
>>
>> Steps to reproduce:
>> 1. boot the guest
>> 2. configure eth0 and do ping - everything works
>> 3. reboot the guest (i.e. type "reboot")
>> 4. when it is booted, eth0 can be configured but will not work at all.
>>
>> The test is:
>> ifconfig eth0 172.20.1.2 up
>> ping 172.20.1.23
>>
>> If to run tcpdump on the host's "tap-id3" interface, it shows no traffic
>> coming from the guest. If to compare how it works before and after reboot,
>> I can see the guest doing an ARP request for 172.20.1.23 and receives the
>> response and it does the same after reboot but the answer does not come.
>>
>> If to remove vhost=on, it is all good. If to try Fedora19
>> (v3.10-something), it all good again - works before and after reboot.
>>
>>
>> And there 2 questions:
>>
>> 1. does anybody have any clue what might go wrong after reboot?
>>
>> 2. Is there any good material to read about what exactly and how vhost
>> accelerates?
>>
>> My understanding is that packets from the guest to the real network are
>> going as:
>> 1. guest's virtio-pci-net does ioport(VIRTIO_PCI_QUEUE_NOTIFY)
>> 2. QEMU's net/virtio-net.c calls qemu_net_queue_deliver()
>> 3. QEMU's net/tap.c calls tap_write_packet() and this is how the host knows
>> that there is a new packet.
>>
>>
>> Thanks!
>>
>>
>> This how I run QEMU:
>> ./qemu-system-ppc64 \
>> -enable-kvm \
>> -m 2048 \
>> -machine pseries \
>> -initrd 1.cpio \
>> -kernel vml312_virtio_net_dbg \
>> -nographic \
>> -vga none \
>> -netdev
>> tap,id=id3,ifname=tap-id3,script=ifup.sh,downscript=ifdown.sh,vhost=on \
>> -device virtio-net-pci,id=id4,netdev=id3,mac=C0:41:49:4b:00:00
>>
>>
>> That is bridge config:
>> [aik@dyn232 ~]$ brctl show
>> bridge name     bridge id               STP enabled     interfaces
>> brtest          8000.00145e992e88       no      pin     eth4
>>
>>
>> The ifup.sh script:
>> ifconfig $1 hw ether ee:01:02:03:04:05
>> /sbin/ifconfig $1 up
>> /usr/sbin/brctl addif brtest $1


-- 
Alexey


* Re: [Qemu-devel] vhost-net issue: does not survive reboot on ppc64
  2013-12-22 14:46   ` Alexey Kardashevskiy
@ 2013-12-22 15:01     ` Alexey Kardashevskiy
  2013-12-23 16:24       ` Michael S. Tsirkin
  0 siblings, 1 reply; 25+ messages in thread
From: Alexey Kardashevskiy @ 2013-12-22 15:01 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

On 12/23/2013 01:46 AM, Alexey Kardashevskiy wrote:
> On 12/22/2013 09:56 PM, Michael S. Tsirkin wrote:
>> On Sun, Dec 22, 2013 at 02:01:23AM +1100, Alexey Kardashevskiy wrote:
>>> Hi!
>>>
>>> I am having a problem with virtio-net + vhost on POWER7 machine - it does
>>> not survive reboot of the guest.
>>>
>>> Steps to reproduce:
>>> 1. boot the guest
>>> 2. configure eth0 and do ping - everything works
>>> 3. reboot the guest (i.e. type "reboot")
>>> 4. when it is booted, eth0 can be configured but will not work at all.
>>>
>>> The test is:
>>> ifconfig eth0 172.20.1.2 up
>>> ping 172.20.1.23
>>>
>>> If to run tcpdump on the host's "tap-id3" interface, it shows no traffic
>>> coming from the guest. If to compare how it works before and after reboot,
>>> I can see the guest doing an ARP request for 172.20.1.23 and receives the
>>> response and it does the same after reboot but the answer does not come.
>>
>> So you see the arp packet in guest but not in host?
> 
> Yes.
> 
> 
>> One thing to try is to boot debug kernel - where pr_debug is
>> enabled - then you might see some errors in the kernel log.
> 
> Tried and added lot more debug printk myself, not clear at all what is
> happening there.
> 
> One more hint - if I boot the guest and the guest does not bring eth0 up
> AND wait more than 200 seconds (and less than 210 seconds), then eth0 will
> not work at all. I.e. this script produces not-working-eth0:
> 
> 
> ifconfig eth0 172.20.1.2 down
> sleep 210
> ifconfig eth0 172.20.1.2 up
> ping 172.20.1.23
> 
> s/210/200/ - and it starts working. No reboot is required to reproduce.
> 
> No "vhost" == always works. The only difference I can see here is vhost's
> thread which may get suspended if not used for a while after the start and
> does not wake up but this is almost a blind guess.


Yet another clue - this host kernel patch seems to help with the guest
reboot but does not help with the initial 210-second delay:

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 69068e0..5e67650 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -162,10 +162,10 @@ void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
                list_add_tail(&work->node, &dev->work_list);
                work->queue_seq++;
                spin_unlock_irqrestore(&dev->work_lock, flags);
-               wake_up_process(dev->worker);
        } else {
                spin_unlock_irqrestore(&dev->work_lock, flags);
        }
+       wake_up_process(dev->worker);
 }
 EXPORT_SYMBOL_GPL(vhost_work_queue);





>>> If to remove vhost=on, it is all good. If to try Fedora19
>>> (v3.10-something), it all good again - works before and after reboot.
>>>
>>>
>>> And there 2 questions:
>>>
>>> 1. does anybody have any clue what might go wrong after reboot?
>>>
>>> 2. Is there any good material to read about what exactly and how vhost
>>> accelerates?
>>>
>>> My understanding is that packets from the guest to the real network are
>>> going as:
>>> 1. guest's virtio-pci-net does ioport(VIRTIO_PCI_QUEUE_NOTIFY)
>>> 2. QEMU's net/virtio-net.c calls qemu_net_queue_deliver()
>>> 3. QEMU's net/tap.c calls tap_write_packet() and this is how the host knows
>>> that there is a new packet.
> 
> 
> What about the documentation? :) or the idea?
> 
> 
>>>
>>>
>>> Thanks!
>>>
>>>
>>> This how I run QEMU:
>>> ./qemu-system-ppc64 \
>>> -enable-kvm \
>>> -m 2048 \
>>> -machine pseries \
>>> -initrd 1.cpio \
>>> -kernel vml312_virtio_net_dbg \
>>> -nographic \
>>> -vga none \
>>> -netdev
>>> tap,id=id3,ifname=tap-id3,script=ifup.sh,downscript=ifdown.sh,vhost=on \
>>> -device virtio-net-pci,id=id4,netdev=id3,mac=C0:41:49:4b:00:00
>>>
>>>
>>> That is bridge config:
>>> [aik@dyn232 ~]$ brctl show
>>> bridge name	bridge id		STP enabled	interfaces
>>> brtest		8000.00145e992e88	no	pin	eth4
>>>
>>>
>>> The ifup.sh script:
>>> ifconfig $1 hw ether ee:01:02:03:04:05
>>> /sbin/ifconfig $1 up
>>> /usr/sbin/brctl addif brtest $1
> 
> 


-- 
Alexey


* Re: [Qemu-devel] vhost-net issue: does not survive reboot on ppc64
  2013-12-22 15:01     ` Alexey Kardashevskiy
@ 2013-12-23 16:24       ` Michael S. Tsirkin
  2013-12-24  3:09         ` Alexey Kardashevskiy
  0 siblings, 1 reply; 25+ messages in thread
From: Michael S. Tsirkin @ 2013-12-23 16:24 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: qemu-devel

On Mon, Dec 23, 2013 at 02:01:13AM +1100, Alexey Kardashevskiy wrote:
> On 12/23/2013 01:46 AM, Alexey Kardashevskiy wrote:
> > On 12/22/2013 09:56 PM, Michael S. Tsirkin wrote:
> >> On Sun, Dec 22, 2013 at 02:01:23AM +1100, Alexey Kardashevskiy wrote:
> >>> Hi!
> >>>
> >>> I am having a problem with virtio-net + vhost on POWER7 machine - it does
> >>> not survive reboot of the guest.
> >>>
> >>> Steps to reproduce:
> >>> 1. boot the guest
> >>> 2. configure eth0 and do ping - everything works
> >>> 3. reboot the guest (i.e. type "reboot")
> >>> 4. when it is booted, eth0 can be configured but will not work at all.
> >>>
> >>> The test is:
> >>> ifconfig eth0 172.20.1.2 up
> >>> ping 172.20.1.23
> >>>
> >>> If to run tcpdump on the host's "tap-id3" interface, it shows no traffic
> >>> coming from the guest. If to compare how it works before and after reboot,
> >>> I can see the guest doing an ARP request for 172.20.1.23 and receives the
> >>> response and it does the same after reboot but the answer does not come.
> >>
> >> So you see the arp packet in guest but not in host?
> > 
> > Yes.
> > 
> > 
> >> One thing to try is to boot debug kernel - where pr_debug is
> >> enabled - then you might see some errors in the kernel log.
> > 
> > Tried and added lot more debug printk myself, not clear at all what is
> > happening there.
> > 
> > One more hint - if I boot the guest and the guest does not bring eth0 up
> > AND wait more than 200 seconds (and less than 210 seconds), then eth0 will
> > not work at all. I.e. this script produces not-working-eth0:
> > 
> > 
> > ifconfig eth0 172.20.1.2 down
> > sleep 210
> > ifconfig eth0 172.20.1.2 up
> > ping 172.20.1.23
> > 
> > s/210/200/ - and it starts working. No reboot is required to reproduce.
> > 
> > No "vhost" == always works. The only difference I can see here is vhost's
> > thread which may get suspended if not used for a while after the start and
> > does not wake up but this is almost a blind guess.
> 
> 
> Yet another clue - this host kernel patch seems to help with the guest
> reboot but does not help with the initial 210 seconds delay:
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 69068e0..5e67650 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -162,10 +162,10 @@ void vhost_work_queue(struct vhost_dev *dev, struct
> vhost_work *work)
>                 list_add_tail(&work->node, &dev->work_list);
>                 work->queue_seq++;
>                 spin_unlock_irqrestore(&dev->work_lock, flags);
> -               wake_up_process(dev->worker);
>         } else {
>                 spin_unlock_irqrestore(&dev->work_lock, flags);
>         }
> +       wake_up_process(dev->worker);
>  }
>  EXPORT_SYMBOL_GPL(vhost_work_queue);
> 
> 

Interesting. Some kind of race? A missing memory barrier somewhere?

Since it's all around startup,
you can try kicking the host eventfd in
vhost_net_start.

> 
> 
> >>> If to remove vhost=on, it is all good. If to try Fedora19
> >>> (v3.10-something), it all good again - works before and after reboot.
> >>>
> >>>
> >>> And there 2 questions:
> >>>
> >>> 1. does anybody have any clue what might go wrong after reboot?
> >>>
> >>> 2. Is there any good material to read about what exactly and how vhost
> >>> accelerates?
> >>>
> >>> My understanding is that packets from the guest to the real network are
> >>> going as:
> >>> 1. guest's virtio-pci-net does ioport(VIRTIO_PCI_QUEUE_NOTIFY)
> >>> 2. QEMU's net/virtio-net.c calls qemu_net_queue_deliver()
> >>> 3. QEMU's net/tap.c calls tap_write_packet() and this is how the host knows
> >>> that there is a new packet.
> > 
> > 
> > What about the documentation? :) or the idea?
> > 
> > 
> >>>
> >>>
> >>> Thanks!
> >>>
> >>>
> >>> This how I run QEMU:
> >>> ./qemu-system-ppc64 \
> >>> -enable-kvm \
> >>> -m 2048 \
> >>> -machine pseries \
> >>> -initrd 1.cpio \
> >>> -kernel vml312_virtio_net_dbg \
> >>> -nographic \
> >>> -vga none \
> >>> -netdev
> >>> tap,id=id3,ifname=tap-id3,script=ifup.sh,downscript=ifdown.sh,vhost=on \
> >>> -device virtio-net-pci,id=id4,netdev=id3,mac=C0:41:49:4b:00:00
> >>>
> >>>
> >>> That is bridge config:
> >>> [aik@dyn232 ~]$ brctl show
> >>> bridge name	bridge id		STP enabled	interfaces
> >>> brtest		8000.00145e992e88	no	pin	eth4
> >>>
> >>>
> >>> The ifup.sh script:
> >>> ifconfig $1 hw ether ee:01:02:03:04:05
> >>> /sbin/ifconfig $1 up
> >>> /usr/sbin/brctl addif brtest $1
> > 
> > 
> 
> 
> -- 
> Alexey


* Re: [Qemu-devel] vhost-net issue: does not survive reboot on ppc64
  2013-12-23 16:24       ` Michael S. Tsirkin
@ 2013-12-24  3:09         ` Alexey Kardashevskiy
  2013-12-24  9:40           ` Michael S. Tsirkin
  0 siblings, 1 reply; 25+ messages in thread
From: Alexey Kardashevskiy @ 2013-12-24  3:09 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

On 12/24/2013 03:24 AM, Michael S. Tsirkin wrote:
> On Mon, Dec 23, 2013 at 02:01:13AM +1100, Alexey Kardashevskiy wrote:
>> On 12/23/2013 01:46 AM, Alexey Kardashevskiy wrote:
>>> On 12/22/2013 09:56 PM, Michael S. Tsirkin wrote:
>>>> On Sun, Dec 22, 2013 at 02:01:23AM +1100, Alexey Kardashevskiy wrote:
>>>>> Hi!
>>>>>
>>>>> I am having a problem with virtio-net + vhost on POWER7 machine - it does
>>>>> not survive reboot of the guest.
>>>>>
>>>>> Steps to reproduce:
>>>>> 1. boot the guest
>>>>> 2. configure eth0 and do ping - everything works
>>>>> 3. reboot the guest (i.e. type "reboot")
>>>>> 4. when it is booted, eth0 can be configured but will not work at all.
>>>>>
>>>>> The test is:
>>>>> ifconfig eth0 172.20.1.2 up
>>>>> ping 172.20.1.23
>>>>>
>>>>> If to run tcpdump on the host's "tap-id3" interface, it shows no traffic
>>>>> coming from the guest. If to compare how it works before and after reboot,
>>>>> I can see the guest doing an ARP request for 172.20.1.23 and receives the
>>>>> response and it does the same after reboot but the answer does not come.
>>>>
>>>> So you see the arp packet in guest but not in host?
>>>
>>> Yes.
>>>
>>>
>>>> One thing to try is to boot debug kernel - where pr_debug is
>>>> enabled - then you might see some errors in the kernel log.
>>>
>>> Tried and added lot more debug printk myself, not clear at all what is
>>> happening there.
>>>
>>> One more hint - if I boot the guest and the guest does not bring eth0 up
>>> AND wait more than 200 seconds (and less than 210 seconds), then eth0 will
>>> not work at all. I.e. this script produces not-working-eth0:
>>>
>>>
>>> ifconfig eth0 172.20.1.2 down
>>> sleep 210
>>> ifconfig eth0 172.20.1.2 up
>>> ping 172.20.1.23
>>>
>>> s/210/200/ - and it starts working. No reboot is required to reproduce.
>>>
>>> No "vhost" == always works. The only difference I can see here is vhost's
>>> thread which may get suspended if not used for a while after the start and
>>> does not wake up but this is almost a blind guess.
>>
>>
>> Yet another clue - this host kernel patch seems to help with the guest
>> reboot but does not help with the initial 210 seconds delay:
>>
>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>> index 69068e0..5e67650 100644
>> --- a/drivers/vhost/vhost.c
>> +++ b/drivers/vhost/vhost.c
>> @@ -162,10 +162,10 @@ void vhost_work_queue(struct vhost_dev *dev, struct
>> vhost_work *work)
>>                 list_add_tail(&work->node, &dev->work_list);
>>                 work->queue_seq++;
>>                 spin_unlock_irqrestore(&dev->work_lock, flags);
>> -               wake_up_process(dev->worker);
>>         } else {
>>                 spin_unlock_irqrestore(&dev->work_lock, flags);
>>         }
>> +       wake_up_process(dev->worker);
>>  }
>>  EXPORT_SYMBOL_GPL(vhost_work_queue);
>>
>>
> 
> Interesting. Some kind of race? A missing memory barrier somewhere?

I do not see how. I boot the guest and just wait 210 seconds; nothing
happens that could cause a race.


> Since it's all around startup,
> you can try kicking the host eventfd in
> vhost_net_start.


How exactly? This did not help. Thanks.

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 006576d..407ecf2 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -229,6 +229,17 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
         if (r < 0) {
             goto err;
         }
+
+        VHostNetState *vn = tap_get_vhost_net(ncs[i].peer);
+        struct vhost_vring_file file = {
+            .index = i
+        };
+        file.fd = event_notifier_get_fd(virtio_queue_get_host_notifier(dev->vq));
+        r = ioctl(vn->dev.control, VHOST_SET_VRING_KICK, &file);
+        if (r) {
+            error_report("Error notifying host notifier: %d", -r);
+            goto err;
+        }
     }



> 
>>
>>
>>>>> If to remove vhost=on, it is all good. If to try Fedora19
>>>>> (v3.10-something), it all good again - works before and after reboot.
>>>>>
>>>>>
>>>>> And there 2 questions:
>>>>>
>>>>> 1. does anybody have any clue what might go wrong after reboot?
>>>>>
>>>>> 2. Is there any good material to read about what exactly and how vhost
>>>>> accelerates?
>>>>>
>>>>> My understanding is that packets from the guest to the real network are
>>>>> going as:
>>>>> 1. guest's virtio-pci-net does ioport(VIRTIO_PCI_QUEUE_NOTIFY)
>>>>> 2. QEMU's net/virtio-net.c calls qemu_net_queue_deliver()
>>>>> 3. QEMU's net/tap.c calls tap_write_packet() and this is how the host knows
>>>>> that there is a new packet.
>>>
>>>
>>> What about the documentation? :) or the idea?
>>>
>>>
>>>>>
>>>>>
>>>>> Thanks!
>>>>>
>>>>>
>>>>> This how I run QEMU:
>>>>> ./qemu-system-ppc64 \
>>>>> -enable-kvm \
>>>>> -m 2048 \
>>>>> -machine pseries \
>>>>> -initrd 1.cpio \
>>>>> -kernel vml312_virtio_net_dbg \
>>>>> -nographic \
>>>>> -vga none \
>>>>> -netdev
>>>>> tap,id=id3,ifname=tap-id3,script=ifup.sh,downscript=ifdown.sh,vhost=on \
>>>>> -device virtio-net-pci,id=id4,netdev=id3,mac=C0:41:49:4b:00:00
>>>>>
>>>>>
>>>>> That is bridge config:
>>>>> [aik@dyn232 ~]$ brctl show
>>>>> bridge name	bridge id		STP enabled	interfaces
>>>>> brtest		8000.00145e992e88	no	pin	eth4
>>>>>
>>>>>
>>>>> The ifup.sh script:
>>>>> ifconfig $1 hw ether ee:01:02:03:04:05
>>>>> /sbin/ifconfig $1 up
>>>>> /usr/sbin/brctl addif brtest $1
>>>
>>>
>>
>>
>> -- 
>> Alexey


-- 
Alexey


* Re: [Qemu-devel] vhost-net issue: does not survive reboot on ppc64
  2013-12-24  3:09         ` Alexey Kardashevskiy
@ 2013-12-24  9:40           ` Michael S. Tsirkin
  2013-12-24 14:15             ` Alexey Kardashevskiy
  0 siblings, 1 reply; 25+ messages in thread
From: Michael S. Tsirkin @ 2013-12-24  9:40 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: qemu-devel

On Tue, Dec 24, 2013 at 02:09:07PM +1100, Alexey Kardashevskiy wrote:
> On 12/24/2013 03:24 AM, Michael S. Tsirkin wrote:
> > On Mon, Dec 23, 2013 at 02:01:13AM +1100, Alexey Kardashevskiy wrote:
> >> On 12/23/2013 01:46 AM, Alexey Kardashevskiy wrote:
> >>> On 12/22/2013 09:56 PM, Michael S. Tsirkin wrote:
> >>>> On Sun, Dec 22, 2013 at 02:01:23AM +1100, Alexey Kardashevskiy wrote:
> >>>>> Hi!
> >>>>>
> >>>>> I am having a problem with virtio-net + vhost on POWER7 machine - it does
> >>>>> not survive reboot of the guest.
> >>>>>
> >>>>> Steps to reproduce:
> >>>>> 1. boot the guest
> >>>>> 2. configure eth0 and do ping - everything works
> >>>>> 3. reboot the guest (i.e. type "reboot")
> >>>>> 4. when it is booted, eth0 can be configured but will not work at all.
> >>>>>
> >>>>> The test is:
> >>>>> ifconfig eth0 172.20.1.2 up
> >>>>> ping 172.20.1.23
> >>>>>
> >>>>> If to run tcpdump on the host's "tap-id3" interface, it shows no traffic
> >>>>> coming from the guest. If to compare how it works before and after reboot,
> >>>>> I can see the guest doing an ARP request for 172.20.1.23 and receives the
> >>>>> response and it does the same after reboot but the answer does not come.
> >>>>
> >>>> So you see the arp packet in guest but not in host?
> >>>
> >>> Yes.
> >>>
> >>>
> >>>> One thing to try is to boot debug kernel - where pr_debug is
> >>>> enabled - then you might see some errors in the kernel log.
> >>>
> >>> Tried and added lot more debug printk myself, not clear at all what is
> >>> happening there.
> >>>
> >>> One more hint - if I boot the guest and the guest does not bring eth0 up
> >>> AND wait more than 200 seconds (and less than 210 seconds), then eth0 will
> >>> not work at all. I.e. this script produces not-working-eth0:
> >>>
> >>>
> >>> ifconfig eth0 172.20.1.2 down
> >>> sleep 210
> >>> ifconfig eth0 172.20.1.2 up
> >>> ping 172.20.1.23
> >>>
> >>> s/210/200/ - and it starts working. No reboot is required to reproduce.
> >>>
> >>> No "vhost" == always works. The only difference I can see here is vhost's
> >>> thread which may get suspended if not used for a while after the start and
> >>> does not wake up but this is almost a blind guess.
> >>
> >>
> >> Yet another clue - this host kernel patch seems to help with the guest
> >> reboot but does not help with the initial 210 seconds delay:
> >>
> >> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> >> index 69068e0..5e67650 100644
> >> --- a/drivers/vhost/vhost.c
> >> +++ b/drivers/vhost/vhost.c
> >> @@ -162,10 +162,10 @@ void vhost_work_queue(struct vhost_dev *dev, struct
> >> vhost_work *work)
> >>                 list_add_tail(&work->node, &dev->work_list);
> >>                 work->queue_seq++;
> >>                 spin_unlock_irqrestore(&dev->work_lock, flags);
> >> -               wake_up_process(dev->worker);
> >>         } else {
> >>                 spin_unlock_irqrestore(&dev->work_lock, flags);
> >>         }
> >> +       wake_up_process(dev->worker);
> >>  }
> >>  EXPORT_SYMBOL_GPL(vhost_work_queue);
> >>
> >>
> > 
> > Interesting. Some kind of race? A missing memory barrier somewhere?
> 
> I do not see how. I boot the guest and just wait 210 seconds, nothing
> happens to cause races.
> 
> 
> > Since it's all around startup,
> > you can try kicking the host eventfd in
> > vhost_net_start.
> 
> 
> How exactly? This did not help. Thanks.
> 
> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> index 006576d..407ecf2 100644
> --- a/hw/net/vhost_net.c
> +++ b/hw/net/vhost_net.c
> @@ -229,6 +229,17 @@ int vhost_net_start(VirtIODevice *dev, NetClientState
> *ncs,
>          if (r < 0) {
>              goto err;
>          }
> +
> +        VHostNetState *vn = tap_get_vhost_net(ncs[i].peer);
> +        struct vhost_vring_file file = {
> +            .index = i
> +        };
> +        file.fd =
> event_notifier_get_fd(virtio_queue_get_host_notifier(dev->vq));
> +        r = ioctl(vn->dev.control, VHOST_SET_VRING_KICK, &file);

No, this sets the notifier, it does not kick.
To kick you write 1 there:
	uint6_t  v = 1;
	write(fd, &v, sizeof v);

> +        if (r) {
> +            error_report("Error notifiyng host notifier: %d", -r);
> +            goto err;
> +        }
>      }
> 
> 
> 
> > 
> >>
> >>
> >>>>> If to remove vhost=on, it is all good. If to try Fedora19
> >>>>> (v3.10-something), it all good again - works before and after reboot.
> >>>>>
> >>>>>
> >>>>> And there 2 questions:
> >>>>>
> >>>>> 1. does anybody have any clue what might go wrong after reboot?
> >>>>>
> >>>>> 2. Is there any good material to read about what exactly and how vhost
> >>>>> accelerates?
> >>>>>
> >>>>> My understanding is that packets from the guest to the real network are
> >>>>> going as:
> >>>>> 1. guest's virtio-pci-net does ioport(VIRTIO_PCI_QUEUE_NOTIFY)
> >>>>> 2. QEMU's net/virtio-net.c calls qemu_net_queue_deliver()
> >>>>> 3. QEMU's net/tap.c calls tap_write_packet() and this is how the host knows
> >>>>> that there is a new packet.
> >>>
> >>>
> >>> What about the documentation? :) or the idea?
> >>>
> >>>
> >>>>>
> >>>>>
> >>>>> Thanks!
> >>>>>
> >>>>>
> >>>>> This how I run QEMU:
> >>>>> ./qemu-system-ppc64 \
> >>>>> -enable-kvm \
> >>>>> -m 2048 \
> >>>>> -machine pseries \
> >>>>> -initrd 1.cpio \
> >>>>> -kernel vml312_virtio_net_dbg \
> >>>>> -nographic \
> >>>>> -vga none \
> >>>>> -netdev
> >>>>> tap,id=id3,ifname=tap-id3,script=ifup.sh,downscript=ifdown.sh,vhost=on \
> >>>>> -device virtio-net-pci,id=id4,netdev=id3,mac=C0:41:49:4b:00:00
> >>>>>
> >>>>>
> >>>>> That is bridge config:
> >>>>> [aik@dyn232 ~]$ brctl show
> >>>>> bridge name	bridge id		STP enabled	interfaces
> >>>>> brtest		8000.00145e992e88	no	pin	eth4
> >>>>>
> >>>>>
> >>>>> The ifup.sh script:
> >>>>> ifconfig $1 hw ether ee:01:02:03:04:05
> >>>>> /sbin/ifconfig $1 up
> >>>>> /usr/sbin/brctl addif brtest $1
> >>>
> >>>
> >>
> >>
> >> -- 
> >> Alexey
> 
> 
> -- 
> Alexey

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] vhost-net issue: does not survive reboot on ppc64
  2013-12-24  9:40           ` Michael S. Tsirkin
@ 2013-12-24 14:15             ` Alexey Kardashevskiy
  2013-12-24 15:43               ` Michael S. Tsirkin
  0 siblings, 1 reply; 25+ messages in thread
From: Alexey Kardashevskiy @ 2013-12-24 14:15 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

On 12/24/2013 08:40 PM, Michael S. Tsirkin wrote:
> On Tue, Dec 24, 2013 at 02:09:07PM +1100, Alexey Kardashevskiy wrote:
>> On 12/24/2013 03:24 AM, Michael S. Tsirkin wrote:
>>> On Mon, Dec 23, 2013 at 02:01:13AM +1100, Alexey Kardashevskiy wrote:
>>>> On 12/23/2013 01:46 AM, Alexey Kardashevskiy wrote:
>>>>> On 12/22/2013 09:56 PM, Michael S. Tsirkin wrote:
>>>>>> On Sun, Dec 22, 2013 at 02:01:23AM +1100, Alexey Kardashevskiy wrote:
>>>>>>> Hi!
>>>>>>>
>>>>>>> I am having a problem with virtio-net + vhost on POWER7 machine - it does
>>>>>>> not survive reboot of the guest.
>>>>>>>
>>>>>>> Steps to reproduce:
>>>>>>> 1. boot the guest
>>>>>>> 2. configure eth0 and do ping - everything works
>>>>>>> 3. reboot the guest (i.e. type "reboot")
>>>>>>> 4. when it is booted, eth0 can be configured but will not work at all.
>>>>>>>
>>>>>>> The test is:
>>>>>>> ifconfig eth0 172.20.1.2 up
>>>>>>> ping 172.20.1.23
>>>>>>>
>>>>>>> If to run tcpdump on the host's "tap-id3" interface, it shows no trafic
>>>>>>> coming from the guest. If to compare how it works before and after reboot,
>>>>>>> I can see the guest doing an ARP request for 172.20.1.23 and receives the
>>>>>>> response and it does the same after reboot but the answer does not come.
>>>>>>
>>>>>> So you see the arp packet in guest but not in host?
>>>>>
>>>>> Yes.
>>>>>
>>>>>
>>>>>> One thing to try is to boot debug kernel - where pr_debug is
>>>>>> enabled - then you might see some errors in the kernel log.
>>>>>
>>>>> Tried and added lot more debug printk myself, not clear at all what is
>>>>> happening there.
>>>>>
>>>>> One more hint - if I boot the guest and the guest does not bring eth0 up
>>>>> AND wait more than 200 seconds (and less than 210 seconds), then eth0 will
>>>>> not work at all. I.e. this script produces not-working-eth0:
>>>>>
>>>>>
>>>>> ifconfig eth0 172.20.1.2 down
>>>>> sleep 210
>>>>> ifconfig eth0 172.20.1.2 up
>>>>> ping 172.20.1.23
>>>>>
>>>>> s/210/200/ - and it starts working. No reboot is required to reproduce.
>>>>>
>>>>> No "vhost" == always works. The only difference I can see here is vhost's
>>>>> thread which may get suspended if not used for a while after the start and
>>>>> does not wake up but this is almost a blind guess.
>>>>
>>>>
>>>> Yet another clue - this host kernel patch seems to help with the guest
>>>> reboot but does not help with the initial 210 seconds delay:
>>>>
>>>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>>>> index 69068e0..5e67650 100644
>>>> --- a/drivers/vhost/vhost.c
>>>> +++ b/drivers/vhost/vhost.c
>>>> @@ -162,10 +162,10 @@ void vhost_work_queue(struct vhost_dev *dev, struct
>>>> vhost_work *work)
>>>>                 list_add_tail(&work->node, &dev->work_list);
>>>>                 work->queue_seq++;
>>>>                 spin_unlock_irqrestore(&dev->work_lock, flags);
>>>> -               wake_up_process(dev->worker);
>>>>         } else {
>>>>                 spin_unlock_irqrestore(&dev->work_lock, flags);
>>>>         }
>>>> +       wake_up_process(dev->worker);
>>>>  }
>>>>  EXPORT_SYMBOL_GPL(vhost_work_queue);
>>>>
>>>>
>>>
>>> Interesting. Some kind of race? A missing memory barrier somewhere?
>>
>> I do not see how. I boot the guest and just wait 210 seconds, nothing
>> happens to cause races.
>>
>>
>>> Since it's all around startup,
>>> you can try kicking the host eventfd in
>>> vhost_net_start.
>>
>>
>> How exactly? This did not help. Thanks.
>>
>> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
>> index 006576d..407ecf2 100644
>> --- a/hw/net/vhost_net.c
>> +++ b/hw/net/vhost_net.c
>> @@ -229,6 +229,17 @@ int vhost_net_start(VirtIODevice *dev, NetClientState
>> *ncs,
>>          if (r < 0) {
>>              goto err;
>>          }
>> +
>> +        VHostNetState *vn = tap_get_vhost_net(ncs[i].peer);
>> +        struct vhost_vring_file file = {
>> +            .index = i
>> +        };
>> +        file.fd =
>> event_notifier_get_fd(virtio_queue_get_host_notifier(dev->vq));
>> +        r = ioctl(vn->dev.control, VHOST_SET_VRING_KICK, &file);
> 
> No, this sets the notifier, it does not kick.
> To kick you write 1 there:
> 	uint6_t  v = 1;
> 	write(fd, &v, sizeof v);


Please, be precise. How/where do I get that @fd? Is what I do correct? What
is uint6_t - uint8_t or uint16_t (neither works)?

Maybe it is a missing barrier - I rebooted the machine several times and now
it sometimes works even after 240 seconds (not 210 as before), but most of
the time it still does not...


>> +        if (r) {
>> +            error_report("Error notifiyng host notifier: %d", -r);
>> +            goto err;
>> +        }
>>      }
>>
>>
>>
>>>
>>>>
>>>>
>>>>>>> If to remove vhost=on, it is all good. If to try Fedora19
>>>>>>> (v3.10-something), it all good again - works before and after reboot.
>>>>>>>
>>>>>>>
>>>>>>> And there 2 questions:
>>>>>>>
>>>>>>> 1. does anybody have any clue what might go wrong after reboot?
>>>>>>>
>>>>>>> 2. Is there any good material to read about what exactly and how vhost
>>>>>>> accelerates?
>>>>>>>
>>>>>>> My understanding is that packets from the guest to the real network are
>>>>>>> going as:
>>>>>>> 1. guest's virtio-pci-net does ioport(VIRTIO_PCI_QUEUE_NOTIFY)
>>>>>>> 2. QEMU's net/virtio-net.c calls qemu_net_queue_deliver()
>>>>>>> 3. QEMU's net/tap.c calls tap_write_packet() and this is how the host knows
>>>>>>> that there is a new packet.
>>>>>
>>>>>
>>>>> What about the documentation? :) or the idea?
>>>>>
>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>>
>>>>>>> This how I run QEMU:
>>>>>>> ./qemu-system-ppc64 \
>>>>>>> -enable-kvm \
>>>>>>> -m 2048 \
>>>>>>> -machine pseries \
>>>>>>> -initrd 1.cpio \
>>>>>>> -kernel vml312_virtio_net_dbg \
>>>>>>> -nographic \
>>>>>>> -vga none \
>>>>>>> -netdev
>>>>>>> tap,id=id3,ifname=tap-id3,script=ifup.sh,downscript=ifdown.sh,vhost=on \
>>>>>>> -device virtio-net-pci,id=id4,netdev=id3,mac=C0:41:49:4b:00:00
>>>>>>>
>>>>>>>
>>>>>>> That is bridge config:
>>>>>>> [aik@dyn232 ~]$ brctl show
>>>>>>> bridge name	bridge id		STP enabled	interfaces
>>>>>>> brtest		8000.00145e992e88	no	pin	eth4
>>>>>>>
>>>>>>>
>>>>>>> The ifup.sh script:
>>>>>>> ifconfig $1 hw ether ee:01:02:03:04:05
>>>>>>> /sbin/ifconfig $1 up
>>>>>>> /usr/sbin/brctl addif brtest $1



-- 
Alexey


* Re: [Qemu-devel] vhost-net issue: does not survive reboot on ppc64
  2013-12-24 14:15             ` Alexey Kardashevskiy
@ 2013-12-24 15:43               ` Michael S. Tsirkin
  2013-12-25  1:36                 ` Alexey Kardashevskiy
  2014-01-07 13:18                 ` Alexey Kardashevskiy
  0 siblings, 2 replies; 25+ messages in thread
From: Michael S. Tsirkin @ 2013-12-24 15:43 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: qemu-devel

On Wed, Dec 25, 2013 at 01:15:29AM +1100, Alexey Kardashevskiy wrote:
> On 12/24/2013 08:40 PM, Michael S. Tsirkin wrote:
> > On Tue, Dec 24, 2013 at 02:09:07PM +1100, Alexey Kardashevskiy wrote:
> >> On 12/24/2013 03:24 AM, Michael S. Tsirkin wrote:
> >>> On Mon, Dec 23, 2013 at 02:01:13AM +1100, Alexey Kardashevskiy wrote:
> >>>> On 12/23/2013 01:46 AM, Alexey Kardashevskiy wrote:
> >>>>> On 12/22/2013 09:56 PM, Michael S. Tsirkin wrote:
> >>>>>> On Sun, Dec 22, 2013 at 02:01:23AM +1100, Alexey Kardashevskiy wrote:
> >>>>>>> Hi!
> >>>>>>>
> >>>>>>> I am having a problem with virtio-net + vhost on POWER7 machine - it does
> >>>>>>> not survive reboot of the guest.
> >>>>>>>
> >>>>>>> Steps to reproduce:
> >>>>>>> 1. boot the guest
> >>>>>>> 2. configure eth0 and do ping - everything works
> >>>>>>> 3. reboot the guest (i.e. type "reboot")
> >>>>>>> 4. when it is booted, eth0 can be configured but will not work at all.
> >>>>>>>
> >>>>>>> The test is:
> >>>>>>> ifconfig eth0 172.20.1.2 up
> >>>>>>> ping 172.20.1.23
> >>>>>>>
> >>>>>>> If to run tcpdump on the host's "tap-id3" interface, it shows no trafic
> >>>>>>> coming from the guest. If to compare how it works before and after reboot,
> >>>>>>> I can see the guest doing an ARP request for 172.20.1.23 and receives the
> >>>>>>> response and it does the same after reboot but the answer does not come.
> >>>>>>
> >>>>>> So you see the arp packet in guest but not in host?
> >>>>>
> >>>>> Yes.
> >>>>>
> >>>>>
> >>>>>> One thing to try is to boot debug kernel - where pr_debug is
> >>>>>> enabled - then you might see some errors in the kernel log.
> >>>>>
> >>>>> Tried and added lot more debug printk myself, not clear at all what is
> >>>>> happening there.
> >>>>>
> >>>>> One more hint - if I boot the guest and the guest does not bring eth0 up
> >>>>> AND wait more than 200 seconds (and less than 210 seconds), then eth0 will
> >>>>> not work at all. I.e. this script produces not-working-eth0:
> >>>>>
> >>>>>
> >>>>> ifconfig eth0 172.20.1.2 down
> >>>>> sleep 210
> >>>>> ifconfig eth0 172.20.1.2 up
> >>>>> ping 172.20.1.23
> >>>>>
> >>>>> s/210/200/ - and it starts working. No reboot is required to reproduce.
> >>>>>
> >>>>> No "vhost" == always works. The only difference I can see here is vhost's
> >>>>> thread which may get suspended if not used for a while after the start and
> >>>>> does not wake up but this is almost a blind guess.
> >>>>
> >>>>
> >>>> Yet another clue - this host kernel patch seems to help with the guest
> >>>> reboot but does not help with the initial 210 seconds delay:
> >>>>
> >>>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> >>>> index 69068e0..5e67650 100644
> >>>> --- a/drivers/vhost/vhost.c
> >>>> +++ b/drivers/vhost/vhost.c
> >>>> @@ -162,10 +162,10 @@ void vhost_work_queue(struct vhost_dev *dev, struct
> >>>> vhost_work *work)
> >>>>                 list_add_tail(&work->node, &dev->work_list);
> >>>>                 work->queue_seq++;
> >>>>                 spin_unlock_irqrestore(&dev->work_lock, flags);
> >>>> -               wake_up_process(dev->worker);
> >>>>         } else {
> >>>>                 spin_unlock_irqrestore(&dev->work_lock, flags);
> >>>>         }
> >>>> +       wake_up_process(dev->worker);
> >>>>  }
> >>>>  EXPORT_SYMBOL_GPL(vhost_work_queue);
> >>>>
> >>>>
> >>>
> >>> Interesting. Some kind of race? A missing memory barrier somewhere?
> >>
> >> I do not see how. I boot the guest and just wait 210 seconds, nothing
> >> happens to cause races.
> >>
> >>
> >>> Since it's all around startup,
> >>> you can try kicking the host eventfd in
> >>> vhost_net_start.
> >>
> >>
> >> How exactly? This did not help. Thanks.
> >>
> >> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> >> index 006576d..407ecf2 100644
> >> --- a/hw/net/vhost_net.c
> >> +++ b/hw/net/vhost_net.c
> >> @@ -229,6 +229,17 @@ int vhost_net_start(VirtIODevice *dev, NetClientState
> >> *ncs,
> >>          if (r < 0) {
> >>              goto err;
> >>          }
> >> +
> >> +        VHostNetState *vn = tap_get_vhost_net(ncs[i].peer);
> >> +        struct vhost_vring_file file = {
> >> +            .index = i
> >> +        };
> >> +        file.fd =
> >> event_notifier_get_fd(virtio_queue_get_host_notifier(dev->vq));
> >> +        r = ioctl(vn->dev.control, VHOST_SET_VRING_KICK, &file);
> > 
> > No, this sets the notifier, it does not kick.
> > To kick you write 1 there:
> > 	uint6_t  v = 1;
> > 	write(fd, &v, sizeof v);
> 
> 
> Please, be precise. How/where do I get that @fd? Is what I do correct?

Yes.

> What
> is uint6_t - uint8_t or uint16_t (neither works)?

Sorry, should have been uint64_t.

> May be it is a missing barrier - I rebooted machine several times and now
> sometime after even 240 seconds (not 210 as before) it works (but most of
> the time still does not)...
> 
> 
> >> +        if (r) {
> >> +            error_report("Error notifiyng host notifier: %d", -r);
> >> +            goto err;
> >> +        }
> >>      }
> >>
> >>
> >>
> >>>
> >>>>
> >>>>
> >>>>>>> If to remove vhost=on, it is all good. If to try Fedora19
> >>>>>>> (v3.10-something), it all good again - works before and after reboot.
> >>>>>>>
> >>>>>>>
> >>>>>>> And there 2 questions:
> >>>>>>>
> >>>>>>> 1. does anybody have any clue what might go wrong after reboot?
> >>>>>>>
> >>>>>>> 2. Is there any good material to read about what exactly and how vhost
> >>>>>>> accelerates?
> >>>>>>>
> >>>>>>> My understanding is that packets from the guest to the real network are
> >>>>>>> going as:
> >>>>>>> 1. guest's virtio-pci-net does ioport(VIRTIO_PCI_QUEUE_NOTIFY)
> >>>>>>> 2. QEMU's net/virtio-net.c calls qemu_net_queue_deliver()
> >>>>>>> 3. QEMU's net/tap.c calls tap_write_packet() and this is how the host knows
> >>>>>>> that there is a new packet.
> >>>>>
> >>>>>
> >>>>> What about the documentation? :) or the idea?
> >>>>>
> >>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> Thanks!
> >>>>>>>
> >>>>>>>
> >>>>>>> This how I run QEMU:
> >>>>>>> ./qemu-system-ppc64 \
> >>>>>>> -enable-kvm \
> >>>>>>> -m 2048 \
> >>>>>>> -machine pseries \
> >>>>>>> -initrd 1.cpio \
> >>>>>>> -kernel vml312_virtio_net_dbg \
> >>>>>>> -nographic \
> >>>>>>> -vga none \
> >>>>>>> -netdev
> >>>>>>> tap,id=id3,ifname=tap-id3,script=ifup.sh,downscript=ifdown.sh,vhost=on \
> >>>>>>> -device virtio-net-pci,id=id4,netdev=id3,mac=C0:41:49:4b:00:00
> >>>>>>>
> >>>>>>>
> >>>>>>> That is bridge config:
> >>>>>>> [aik@dyn232 ~]$ brctl show
> >>>>>>> bridge name	bridge id		STP enabled	interfaces
> >>>>>>> brtest		8000.00145e992e88	no	pin	eth4
> >>>>>>>
> >>>>>>>
> >>>>>>> The ifup.sh script:
> >>>>>>> ifconfig $1 hw ether ee:01:02:03:04:05
> >>>>>>> /sbin/ifconfig $1 up
> >>>>>>> /usr/sbin/brctl addif brtest $1
> 
> 
> 
> -- 
> Alexey

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] vhost-net issue: does not survive reboot on ppc64
  2013-12-24 15:43               ` Michael S. Tsirkin
@ 2013-12-25  1:36                 ` Alexey Kardashevskiy
  2013-12-25  9:52                   ` Michael S. Tsirkin
  2014-01-07 13:18                 ` Alexey Kardashevskiy
  1 sibling, 1 reply; 25+ messages in thread
From: Alexey Kardashevskiy @ 2013-12-25  1:36 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

On 12/25/2013 02:43 AM, Michael S. Tsirkin wrote:
> On Wed, Dec 25, 2013 at 01:15:29AM +1100, Alexey Kardashevskiy wrote:
>> On 12/24/2013 08:40 PM, Michael S. Tsirkin wrote:
>>> On Tue, Dec 24, 2013 at 02:09:07PM +1100, Alexey Kardashevskiy wrote:
>>>> On 12/24/2013 03:24 AM, Michael S. Tsirkin wrote:
>>>>> On Mon, Dec 23, 2013 at 02:01:13AM +1100, Alexey Kardashevskiy wrote:
>>>>>> On 12/23/2013 01:46 AM, Alexey Kardashevskiy wrote:
>>>>>>> On 12/22/2013 09:56 PM, Michael S. Tsirkin wrote:
>>>>>>>> On Sun, Dec 22, 2013 at 02:01:23AM +1100, Alexey Kardashevskiy wrote:
>>>>>>>>> Hi!
>>>>>>>>>
>>>>>>>>> I am having a problem with virtio-net + vhost on POWER7 machine - it does
>>>>>>>>> not survive reboot of the guest.
>>>>>>>>>
>>>>>>>>> Steps to reproduce:
>>>>>>>>> 1. boot the guest
>>>>>>>>> 2. configure eth0 and do ping - everything works
>>>>>>>>> 3. reboot the guest (i.e. type "reboot")
>>>>>>>>> 4. when it is booted, eth0 can be configured but will not work at all.
>>>>>>>>>
>>>>>>>>> The test is:
>>>>>>>>> ifconfig eth0 172.20.1.2 up
>>>>>>>>> ping 172.20.1.23
>>>>>>>>>
>>>>>>>>> If to run tcpdump on the host's "tap-id3" interface, it shows no trafic
>>>>>>>>> coming from the guest. If to compare how it works before and after reboot,
>>>>>>>>> I can see the guest doing an ARP request for 172.20.1.23 and receives the
>>>>>>>>> response and it does the same after reboot but the answer does not come.
>>>>>>>>
>>>>>>>> So you see the arp packet in guest but not in host?
>>>>>>>
>>>>>>> Yes.
>>>>>>>
>>>>>>>
>>>>>>>> One thing to try is to boot debug kernel - where pr_debug is
>>>>>>>> enabled - then you might see some errors in the kernel log.
>>>>>>>
>>>>>>> Tried and added lot more debug printk myself, not clear at all what is
>>>>>>> happening there.
>>>>>>>
>>>>>>> One more hint - if I boot the guest and the guest does not bring eth0 up
>>>>>>> AND wait more than 200 seconds (and less than 210 seconds), then eth0 will
>>>>>>> not work at all. I.e. this script produces not-working-eth0:
>>>>>>>
>>>>>>>
>>>>>>> ifconfig eth0 172.20.1.2 down
>>>>>>> sleep 210
>>>>>>> ifconfig eth0 172.20.1.2 up
>>>>>>> ping 172.20.1.23
>>>>>>>
>>>>>>> s/210/200/ - and it starts working. No reboot is required to reproduce.
>>>>>>>
>>>>>>> No "vhost" == always works. The only difference I can see here is vhost's
>>>>>>> thread which may get suspended if not used for a while after the start and
>>>>>>> does not wake up but this is almost a blind guess.
>>>>>>
>>>>>>
>>>>>> Yet another clue - this host kernel patch seems to help with the guest
>>>>>> reboot but does not help with the initial 210 seconds delay:
>>>>>>
>>>>>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>>>>>> index 69068e0..5e67650 100644
>>>>>> --- a/drivers/vhost/vhost.c
>>>>>> +++ b/drivers/vhost/vhost.c
>>>>>> @@ -162,10 +162,10 @@ void vhost_work_queue(struct vhost_dev *dev, struct
>>>>>> vhost_work *work)
>>>>>>                 list_add_tail(&work->node, &dev->work_list);
>>>>>>                 work->queue_seq++;
>>>>>>                 spin_unlock_irqrestore(&dev->work_lock, flags);
>>>>>> -               wake_up_process(dev->worker);
>>>>>>         } else {
>>>>>>                 spin_unlock_irqrestore(&dev->work_lock, flags);
>>>>>>         }
>>>>>> +       wake_up_process(dev->worker);
>>>>>>  }
>>>>>>  EXPORT_SYMBOL_GPL(vhost_work_queue);
>>>>>>
>>>>>>
>>>>>
>>>>> Interesting. Some kind of race? A missing memory barrier somewhere?
>>>>
>>>> I do not see how. I boot the guest and just wait 210 seconds, nothing
>>>> happens to cause races.
>>>>
>>>>
>>>>> Since it's all around startup,
>>>>> you can try kicking the host eventfd in
>>>>> vhost_net_start.
>>>>
>>>>
>>>> How exactly? This did not help. Thanks.
>>>>
>>>> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
>>>> index 006576d..407ecf2 100644
>>>> --- a/hw/net/vhost_net.c
>>>> +++ b/hw/net/vhost_net.c
>>>> @@ -229,6 +229,17 @@ int vhost_net_start(VirtIODevice *dev, NetClientState
>>>> *ncs,
>>>>          if (r < 0) {
>>>>              goto err;
>>>>          }
>>>> +
>>>> +        VHostNetState *vn = tap_get_vhost_net(ncs[i].peer);
>>>> +        struct vhost_vring_file file = {
>>>> +            .index = i
>>>> +        };
>>>> +        file.fd =
>>>> event_notifier_get_fd(virtio_queue_get_host_notifier(dev->vq));
>>>> +        r = ioctl(vn->dev.control, VHOST_SET_VRING_KICK, &file);
>>>
>>> No, this sets the notifier, it does not kick.
>>> To kick you write 1 there:
>>> 	uint6_t  v = 1;
>>> 	write(fd, &v, sizeof v);
>>
>>
>> Please, be precise. How/where do I get that @fd? Is what I do correct?
> 
> Yes.
> 
>> What
>> is uint6_t - uint8_t or uint16_t (neither works)?
> 
> Sorry, should have been uint64_t.


Oh, that I missed :-) Anyway, this does not make any difference. Is there
any cheap&dirty way to keep the vhost-net kernel thread always awake?
Sending it signals from user space does not work...



>> May be it is a missing barrier - I rebooted machine several times and now
>> sometime after even 240 seconds (not 210 as before) it works (but most of
>> the time still does not)...
>>
>>
>>>> +        if (r) {
>>>> +            error_report("Error notifiyng host notifier: %d", -r);
>>>> +            goto err;
>>>> +        }
>>>>      }
>>>>
>>>>
>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>>>> If to remove vhost=on, it is all good. If to try Fedora19
>>>>>>>>> (v3.10-something), it all good again - works before and after reboot.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> And there 2 questions:
>>>>>>>>>
>>>>>>>>> 1. does anybody have any clue what might go wrong after reboot?
>>>>>>>>>
>>>>>>>>> 2. Is there any good material to read about what exactly and how vhost
>>>>>>>>> accelerates?
>>>>>>>>>
>>>>>>>>> My understanding is that packets from the guest to the real network are
>>>>>>>>> going as:
>>>>>>>>> 1. guest's virtio-pci-net does ioport(VIRTIO_PCI_QUEUE_NOTIFY)
>>>>>>>>> 2. QEMU's net/virtio-net.c calls qemu_net_queue_deliver()
>>>>>>>>> 3. QEMU's net/tap.c calls tap_write_packet() and this is how the host knows
>>>>>>>>> that there is a new packet.
>>>>>>>
>>>>>>>
>>>>>>> What about the documentation? :) or the idea?
>>>>>>>
>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This how I run QEMU:
>>>>>>>>> ./qemu-system-ppc64 \
>>>>>>>>> -enable-kvm \
>>>>>>>>> -m 2048 \
>>>>>>>>> -machine pseries \
>>>>>>>>> -initrd 1.cpio \
>>>>>>>>> -kernel vml312_virtio_net_dbg \
>>>>>>>>> -nographic \
>>>>>>>>> -vga none \
>>>>>>>>> -netdev
>>>>>>>>> tap,id=id3,ifname=tap-id3,script=ifup.sh,downscript=ifdown.sh,vhost=on \
>>>>>>>>> -device virtio-net-pci,id=id4,netdev=id3,mac=C0:41:49:4b:00:00
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> That is bridge config:
>>>>>>>>> [aik@dyn232 ~]$ brctl show
>>>>>>>>> bridge name	bridge id		STP enabled	interfaces
>>>>>>>>> brtest		8000.00145e992e88	no	pin	eth4
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The ifup.sh script:
>>>>>>>>> ifconfig $1 hw ether ee:01:02:03:04:05
>>>>>>>>> /sbin/ifconfig $1 up
>>>>>>>>> /usr/sbin/brctl addif brtest $1



-- 
Alexey


* Re: [Qemu-devel] vhost-net issue: does not survive reboot on ppc64
  2013-12-25  1:36                 ` Alexey Kardashevskiy
@ 2013-12-25  9:52                   ` Michael S. Tsirkin
  2013-12-26 10:13                     ` Alexey Kardashevskiy
  0 siblings, 1 reply; 25+ messages in thread
From: Michael S. Tsirkin @ 2013-12-25  9:52 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: qemu-devel

On Wed, Dec 25, 2013 at 12:36:12PM +1100, Alexey Kardashevskiy wrote:
> On 12/25/2013 02:43 AM, Michael S. Tsirkin wrote:
> > On Wed, Dec 25, 2013 at 01:15:29AM +1100, Alexey Kardashevskiy wrote:
> >> On 12/24/2013 08:40 PM, Michael S. Tsirkin wrote:
> >>> On Tue, Dec 24, 2013 at 02:09:07PM +1100, Alexey Kardashevskiy wrote:
> >>>> On 12/24/2013 03:24 AM, Michael S. Tsirkin wrote:
> >>>>> On Mon, Dec 23, 2013 at 02:01:13AM +1100, Alexey Kardashevskiy wrote:
> >>>>>> On 12/23/2013 01:46 AM, Alexey Kardashevskiy wrote:
> >>>>>>> On 12/22/2013 09:56 PM, Michael S. Tsirkin wrote:
> >>>>>>>> On Sun, Dec 22, 2013 at 02:01:23AM +1100, Alexey Kardashevskiy wrote:
> >>>>>>>>> Hi!
> >>>>>>>>>
> >>>>>>>>> I am having a problem with virtio-net + vhost on POWER7 machine - it does
> >>>>>>>>> not survive reboot of the guest.
> >>>>>>>>>
> >>>>>>>>> Steps to reproduce:
> >>>>>>>>> 1. boot the guest
> >>>>>>>>> 2. configure eth0 and do ping - everything works
> >>>>>>>>> 3. reboot the guest (i.e. type "reboot")
> >>>>>>>>> 4. when it is booted, eth0 can be configured but will not work at all.
> >>>>>>>>>
> >>>>>>>>> The test is:
> >>>>>>>>> ifconfig eth0 172.20.1.2 up
> >>>>>>>>> ping 172.20.1.23
> >>>>>>>>>
> >>>>>>>>> If to run tcpdump on the host's "tap-id3" interface, it shows no trafic
> >>>>>>>>> coming from the guest. If to compare how it works before and after reboot,
> >>>>>>>>> I can see the guest doing an ARP request for 172.20.1.23 and receives the
> >>>>>>>>> response and it does the same after reboot but the answer does not come.
> >>>>>>>>
> >>>>>>>> So you see the arp packet in guest but not in host?
> >>>>>>>
> >>>>>>> Yes.
> >>>>>>>
> >>>>>>>
> >>>>>>>> One thing to try is to boot debug kernel - where pr_debug is
> >>>>>>>> enabled - then you might see some errors in the kernel log.
> >>>>>>>
> >>>>>>> Tried and added lot more debug printk myself, not clear at all what is
> >>>>>>> happening there.
> >>>>>>>
> >>>>>>> One more hint - if I boot the guest and the guest does not bring eth0 up
> >>>>>>> AND wait more than 200 seconds (and less than 210 seconds), then eth0 will
> >>>>>>> not work at all. I.e. this script produces not-working-eth0:
> >>>>>>>
> >>>>>>>
> >>>>>>> ifconfig eth0 172.20.1.2 down
> >>>>>>> sleep 210
> >>>>>>> ifconfig eth0 172.20.1.2 up
> >>>>>>> ping 172.20.1.23
> >>>>>>>
> >>>>>>> s/210/200/ - and it starts working. No reboot is required to reproduce.
> >>>>>>>
> >>>>>>> No "vhost" == always works. The only difference I can see here is vhost's
> >>>>>>> thread which may get suspended if not used for a while after the start and
> >>>>>>> does not wake up but this is almost a blind guess.
> >>>>>>
> >>>>>>
> >>>>>> Yet another clue - this host kernel patch seems to help with the guest
> >>>>>> reboot but does not help with the initial 210 seconds delay:
> >>>>>>
> >>>>>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> >>>>>> index 69068e0..5e67650 100644
> >>>>>> --- a/drivers/vhost/vhost.c
> >>>>>> +++ b/drivers/vhost/vhost.c
> >>>>>> @@ -162,10 +162,10 @@ void vhost_work_queue(struct vhost_dev *dev, struct
> >>>>>> vhost_work *work)
> >>>>>>                 list_add_tail(&work->node, &dev->work_list);
> >>>>>>                 work->queue_seq++;
> >>>>>>                 spin_unlock_irqrestore(&dev->work_lock, flags);
> >>>>>> -               wake_up_process(dev->worker);
> >>>>>>         } else {
> >>>>>>                 spin_unlock_irqrestore(&dev->work_lock, flags);
> >>>>>>         }
> >>>>>> +       wake_up_process(dev->worker);
> >>>>>>  }
> >>>>>>  EXPORT_SYMBOL_GPL(vhost_work_queue);
> >>>>>>
> >>>>>>
> >>>>>
> >>>>> Interesting. Some kind of race? A missing memory barrier somewhere?
> >>>>
> >>>> I do not see how. I boot the guest and just wait 210 seconds, nothing
> >>>> happens to cause races.
> >>>>
> >>>>
> >>>>> Since it's all around startup,
> >>>>> you can try kicking the host eventfd in
> >>>>> vhost_net_start.
> >>>>
> >>>>
> >>>> How exactly? This did not help. Thanks.
> >>>>
> >>>> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> >>>> index 006576d..407ecf2 100644
> >>>> --- a/hw/net/vhost_net.c
> >>>> +++ b/hw/net/vhost_net.c
> >>>> @@ -229,6 +229,17 @@ int vhost_net_start(VirtIODevice *dev, NetClientState
> >>>> *ncs,
> >>>>          if (r < 0) {
> >>>>              goto err;
> >>>>          }
> >>>> +
> >>>> +        VHostNetState *vn = tap_get_vhost_net(ncs[i].peer);
> >>>> +        struct vhost_vring_file file = {
> >>>> +            .index = i
> >>>> +        };
> >>>> +        file.fd =
> >>>> event_notifier_get_fd(virtio_queue_get_host_notifier(dev->vq));
> >>>> +        r = ioctl(vn->dev.control, VHOST_SET_VRING_KICK, &file);
> >>>
> >>> No, this sets the notifier, it does not kick.
> >>> To kick you write 1 there:
> >>> 	uint6_t  v = 1;
> >>> 	write(fd, &v, sizeof v);
> >>
> >>
> >> Please, be precise. How/where do I get that @fd? Is what I do correct?
> > 
> > Yes.
> > 
> >> What
> >> is uint6_t - uint8_t or uint16_t (neither works)?
> > 
> > Sorry, should have been uint64_t.
> 
> 
> Oh, that I missed :-) Anyway, this does not make any difference. Is there
> any cheap&dirty way to make vhost-net kernel thread always awake? Sending
> it signals from the user space does not work...

You can run a timer in qemu and signal the eventfd from there
periodically.

Just to restate: tcpdump in the guest shows that the guest sends the arp
packet, but tcpdump on the host tun device does not show any packets?

If yes, other things to try:
1. trace handle_tx [vhost_net]
2. trace tun_get_user [tun]
3. I suspect some guest bug in one of the features.
Let's try to disable some flags with device property:
you can get the list by doing:
./x86_64-softmmu/qemu-system-x86_64 -device virtio-net-pci,?|grep on/off
Things I would try turning off are the host offloads (the ones that start
with host_), plus event_idx, any_layout and mq.
Turn them all off; if that helps, try to find the one that made the difference.


> 
> 
> >> May be it is a missing barrier - I rebooted machine several times and now
> >> sometime after even 240 seconds (not 210 as before) it works (but most of
> >> the time still does not)...
> >>
> >>
> >>>> +        if (r) {
> >>>> +            error_report("Error notifiyng host notifier: %d", -r);
> >>>> +            goto err;
> >>>> +        }
> >>>>      }
> >>>>
> >>>>
> >>>>
> >>>>>
> >>>>>>
> >>>>>>
> >>>>>>>>> If I remove vhost=on, everything works. The same with Fedora19
> >>>>>>>>> (v3.10-something): it all works again, before and after reboot.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> And there are 2 questions:
> >>>>>>>>>
> >>>>>>>>> 1. does anybody have any clue what might go wrong after reboot?
> >>>>>>>>>
> >>>>>>>>> 2. Is there any good material to read about what exactly and how vhost
> >>>>>>>>> accelerates?
> >>>>>>>>>
> >>>>>>>>> My understanding is that packets from the guest to the real network are
> >>>>>>>>> going as:
> >>>>>>>>> 1. guest's virtio-pci-net does ioport(VIRTIO_PCI_QUEUE_NOTIFY)
> >>>>>>>>> 2. QEMU's net/virtio-net.c calls qemu_net_queue_deliver()
> >>>>>>>>> 3. QEMU's net/tap.c calls tap_write_packet() and this is how the host knows
> >>>>>>>>> that there is a new packet.
> >>>>>>>
> >>>>>>>
> >>>>>>> What about the documentation? :) or the idea?
> >>>>>>>
> >>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Thanks!
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> This is how I run QEMU:
> >>>>>>>>> ./qemu-system-ppc64 \
> >>>>>>>>> -enable-kvm \
> >>>>>>>>> -m 2048 \
> >>>>>>>>> -machine pseries \
> >>>>>>>>> -initrd 1.cpio \
> >>>>>>>>> -kernel vml312_virtio_net_dbg \
> >>>>>>>>> -nographic \
> >>>>>>>>> -vga none \
> >>>>>>>>> -netdev
> >>>>>>>>> tap,id=id3,ifname=tap-id3,script=ifup.sh,downscript=ifdown.sh,vhost=on \
> >>>>>>>>> -device virtio-net-pci,id=id4,netdev=id3,mac=C0:41:49:4b:00:00
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> This is the bridge config:
> >>>>>>>>> [aik@dyn232 ~]$ brctl show
> >>>>>>>>> bridge name	bridge id		STP enabled	interfaces
> >>>>>>>>> brtest		8000.00145e992e88	no	pin	eth4
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> The ifup.sh script:
> >>>>>>>>> ifconfig $1 hw ether ee:01:02:03:04:05
> >>>>>>>>> /sbin/ifconfig $1 up
> >>>>>>>>> /usr/sbin/brctl addif brtest $1
> 
> 
> 
> -- 
> Alexey

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] vhost-net issue: does not survive reboot on ppc64
  2013-12-25  9:52                   ` Michael S. Tsirkin
@ 2013-12-26 10:13                     ` Alexey Kardashevskiy
  2013-12-26 10:49                       ` Michael S. Tsirkin
  0 siblings, 1 reply; 25+ messages in thread
From: Alexey Kardashevskiy @ 2013-12-26 10:13 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

On 12/25/2013 08:52 PM, Michael S. Tsirkin wrote:
> On Wed, Dec 25, 2013 at 12:36:12PM +1100, Alexey Kardashevskiy wrote:
>> On 12/25/2013 02:43 AM, Michael S. Tsirkin wrote:
>>> On Wed, Dec 25, 2013 at 01:15:29AM +1100, Alexey Kardashevskiy wrote:
>>>> On 12/24/2013 08:40 PM, Michael S. Tsirkin wrote:
>>>>> On Tue, Dec 24, 2013 at 02:09:07PM +1100, Alexey Kardashevskiy wrote:
>>>>>> On 12/24/2013 03:24 AM, Michael S. Tsirkin wrote:
>>>>>>> On Mon, Dec 23, 2013 at 02:01:13AM +1100, Alexey Kardashevskiy wrote:
>>>>>>>> On 12/23/2013 01:46 AM, Alexey Kardashevskiy wrote:
>>>>>>>>> On 12/22/2013 09:56 PM, Michael S. Tsirkin wrote:
>>>>>>>>>> On Sun, Dec 22, 2013 at 02:01:23AM +1100, Alexey Kardashevskiy wrote:
>>>>>>>>>>> Hi!
>>>>>>>>>>>
>>>>>>>>>>> I am having a problem with virtio-net + vhost on POWER7 machine - it does
>>>>>>>>>>> not survive reboot of the guest.
>>>>>>>>>>>
>>>>>>>>>>> Steps to reproduce:
>>>>>>>>>>> 1. boot the guest
>>>>>>>>>>> 2. configure eth0 and do ping - everything works
>>>>>>>>>>> 3. reboot the guest (i.e. type "reboot")
>>>>>>>>>>> 4. when it is booted, eth0 can be configured but will not work at all.
>>>>>>>>>>>
>>>>>>>>>>> The test is:
>>>>>>>>>>> ifconfig eth0 172.20.1.2 up
>>>>>>>>>>> ping 172.20.1.23
>>>>>>>>>>>
> >>>>>>>>>>> If I run tcpdump on the host's "tap-id3" interface, it shows no traffic
> >>>>>>>>>>> coming from the guest. Comparing how it works before and after reboot:
> >>>>>>>>>>> before, the guest does an ARP request for 172.20.1.23 and receives the
> >>>>>>>>>>> response; after reboot it does the same but the answer does not come.
>>>>>>>>>>
>>>>>>>>>> So you see the arp packet in guest but not in host?
>>>>>>>>>
>>>>>>>>> Yes.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> One thing to try is to boot debug kernel - where pr_debug is
>>>>>>>>>> enabled - then you might see some errors in the kernel log.
>>>>>>>>>
> >>>>>>>>> I tried that and added a lot more debug printks myself; it is not
> >>>>>>>>> at all clear what is happening there.
>>>>>>>>>
>>>>>>>>> One more hint - if I boot the guest and the guest does not bring eth0 up
>>>>>>>>> AND wait more than 200 seconds (and less than 210 seconds), then eth0 will
>>>>>>>>> not work at all. I.e. this script produces not-working-eth0:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ifconfig eth0 172.20.1.2 down
>>>>>>>>> sleep 210
>>>>>>>>> ifconfig eth0 172.20.1.2 up
>>>>>>>>> ping 172.20.1.23
>>>>>>>>>
>>>>>>>>> s/210/200/ - and it starts working. No reboot is required to reproduce.
>>>>>>>>>
>>>>>>>>> No "vhost" == always works. The only difference I can see here is vhost's
>>>>>>>>> thread which may get suspended if not used for a while after the start and
>>>>>>>>> does not wake up but this is almost a blind guess.
>>>>>>>>
>>>>>>>>
>>>>>>>> Yet another clue - this host kernel patch seems to help with the guest
>>>>>>>> reboot but does not help with the initial 210 seconds delay:
>>>>>>>>
>>>>>>>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>>>>>>>> index 69068e0..5e67650 100644
>>>>>>>> --- a/drivers/vhost/vhost.c
>>>>>>>> +++ b/drivers/vhost/vhost.c
>>>>>>>> @@ -162,10 +162,10 @@ void vhost_work_queue(struct vhost_dev *dev, struct
>>>>>>>> vhost_work *work)
>>>>>>>>                 list_add_tail(&work->node, &dev->work_list);
>>>>>>>>                 work->queue_seq++;
>>>>>>>>                 spin_unlock_irqrestore(&dev->work_lock, flags);
>>>>>>>> -               wake_up_process(dev->worker);
>>>>>>>>         } else {
>>>>>>>>                 spin_unlock_irqrestore(&dev->work_lock, flags);
>>>>>>>>         }
>>>>>>>> +       wake_up_process(dev->worker);
>>>>>>>>  }
>>>>>>>>  EXPORT_SYMBOL_GPL(vhost_work_queue);
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> Interesting. Some kind of race? A missing memory barrier somewhere?
>>>>>>
>>>>>> I do not see how. I boot the guest and just wait 210 seconds, nothing
>>>>>> happens to cause races.
>>>>>>
>>>>>>
>>>>>>> Since it's all around startup,
>>>>>>> you can try kicking the host eventfd in
>>>>>>> vhost_net_start.
>>>>>>
>>>>>>
>>>>>> How exactly? This did not help. Thanks.
>>>>>>
>>>>>> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
>>>>>> index 006576d..407ecf2 100644
>>>>>> --- a/hw/net/vhost_net.c
>>>>>> +++ b/hw/net/vhost_net.c
>>>>>> @@ -229,6 +229,17 @@ int vhost_net_start(VirtIODevice *dev, NetClientState
>>>>>> *ncs,
>>>>>>          if (r < 0) {
>>>>>>              goto err;
>>>>>>          }
>>>>>> +
>>>>>> +        VHostNetState *vn = tap_get_vhost_net(ncs[i].peer);
>>>>>> +        struct vhost_vring_file file = {
>>>>>> +            .index = i
>>>>>> +        };
>>>>>> +        file.fd =
>>>>>> event_notifier_get_fd(virtio_queue_get_host_notifier(dev->vq));
>>>>>> +        r = ioctl(vn->dev.control, VHOST_SET_VRING_KICK, &file);
>>>>>
>>>>> No, this sets the notifier, it does not kick.
>>>>> To kick you write 1 there:
>>>>> 	uint6_t  v = 1;
>>>>> 	write(fd, &v, sizeof v);
>>>>
>>>>
>>>> Please, be precise. How/where do I get that @fd? Is what I do correct?
>>>
>>> Yes.
>>>
>>>> What
>>>> is uint6_t - uint8_t or uint16_t (neither works)?
>>>
>>> Sorry, should have been uint64_t.
>>
>>
>> Oh, that I missed :-) Anyway, this does not make any difference. Is there
>> any cheap&dirty way to make vhost-net kernel thread always awake? Sending
>> it signals from the user space does not work...
> 
> You can run a timer in qemu and signal the eventfd from there
> periodically.
> 
> Just to restate: tcpdump in the guest shows that the guest sends the ARP
> packet, but tcpdump on the host's tun device does not show any packets?


Ok, I figured out how to disable the interfaces in Fedora 19. I was wrong:
something is happening on the host's TAP side. The guest sends an ARP
request, the response is visible on the TAP interface, but it never reaches
the guest.

Summarizing everything:

Host and guest are Fedora 19; the host kernel is 3.13-rc4, the guest kernel
is 3.12. Steps to reproduce:
1. boot the guest, no network is up
2. do this:
ifconfig eth0 172.20.1.2 down ; sleep 240 ; ifconfig eth0 172.20.1.2 up ;
ping 172.20.1.23

Ping will fail.

After that, I can bring eth1 up, ssh to it and run tcpdump.


The command line is:
./qemu-system-ppc64 \
	-enable-kvm \
	-m 2048 \
	-L qemu-ppc64-bios/ \
	-machine pseries \
	-trace events=qemu_trace_events \
	-kernel vml312 \
	-append root=/dev/sda3 virtimg/fc19_16GB_vhostdbg.qcow2 \
	-nographic \
	-vga none \
	-nodefaults \
	-chardev stdio,id=id0,signal=off,mux=on \
	-device spapr-vty,id=id1,chardev=id0,reg=0x71000100 \
	-mon id=id2,chardev=id0,mode=readline \
	-netdev
tap,id=id3,ifname=tap-id3,script=ifup.sh,downscript=ifdown.sh,vhost=on \
	-device virtio-net-pci,id=id4,netdev=id3,mac=C0:41:49:4b:00:00,tx=timer \
	-netdev user,id=id5,hostfwd=tcp::5000-:22 \
	-device spapr-vlan,id=id6,netdev=id5,mac=C0:41:49:4b:00:01




The guest config:

[root@localhost ~]# ifconfig eth0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.20.1.2  netmask 255.255.0.0  broadcast 172.20.255.255
        ether c0:41:49:4b:00:00  txqueuelen 1000  (Ethernet)
        RX packets 427  bytes 36406 (35.5 KiB)
        RX errors 0  dropped 172  overruns 0  frame 0
        TX packets 41  bytes 3284 (3.2 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@localhost ~]# ping 172.20.1.23
From 172.20.1.2 icmp_seq=1 Destination Host Unreachable
From 172.20.1.2 icmp_seq=2 Destination Host Unreachable
From 172.20.1.2 icmp_seq=3 Destination Host Unreachable
...
--- 172.20.1.23 ping statistics ---
17 packets transmitted, 0 received, +15 errors, 100% packet loss, time 16026ms

Ping fails.


On the guest side (ssh via spapr-vlan): tcpdump -i eth0

[root@localhost ~]# tcpdump -i eth0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
20:41:49.206927 ARP, Request who-has 172.20.1.23 tell
localhost.localdomain, length 28
20:41:50.203149 ARP, Request who-has 172.20.1.23 tell
localhost.localdomain, length 28
20:41:51.203148 ARP, Request who-has 172.20.1.23 tell
localhost.localdomain, length 28
20:41:52.233150 ARP, Request who-has 172.20.1.23 tell
localhost.localdomain, length 28
20:41:53.233148 ARP, Request who-has 172.20.1.23 tell
localhost.localdomain, length 28
20:41:54.233149 ARP, Request who-has 172.20.1.23 tell
localhost.localdomain, length 28
20:41:55.233168 ARP, Request who-has 172.20.1.23 tell
localhost.localdomain, length 28
20:41:56.233148 ARP, Request who-has 172.20.1.23 tell
localhost.localdomain, length 28
20:41:57.233147 ARP, Request who-has 172.20.1.23 tell
localhost.localdomain, length 28
20:41:58.233163 ARP, Request who-has 172.20.1.23 tell
localhost.localdomain, length 28
20:41:59.233148 ARP, Request who-has 172.20.1.23 tell
localhost.localdomain, length 28
20:42:00.233148 ARP, Request who-has 172.20.1.23 tell
localhost.localdomain, length 28
20:42:01.233164 ARP, Request who-has 172.20.1.23 tell
localhost.localdomain, length 28
20:42:02.233148 ARP, Request who-has 172.20.1.23 tell
localhost.localdomain, length 28
20:42:03.233149 ARP, Request who-has 172.20.1.23 tell
localhost.localdomain, length 28
20:42:04.233167 ARP, Request who-has 172.20.1.23 tell
localhost.localdomain, length 28
20:42:05.233149 ARP, Request who-has 172.20.1.23 tell
localhost.localdomain, length 28
20:42:06.233149 ARP, Request who-has 172.20.1.23 tell
localhost.localdomain, length 28
^C
18 packets captured
18 packets received by filter
0 packets dropped by kernel


On the host: tcpdump -i tap-id3

[aik@dyn232 ~]$ sudo tcpdump -i tap-id3
tcpdump: WARNING: tap-id3: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on tap-id3, link-type EN10MB (Ethernet), capture size 65535 bytes
20:41:48.496935 STP 802.1w, Rapid STP, Flags [Learn, Forward], bridge-id
2001.d4:8c:b5:d0:9f:80.8007, length 43
20:41:49.012376 IP v7000-A.ozlabs.ibm.com.svrloc > 239.255.255.253.svrloc:
UDP, length 49
20:41:49.038231 ARP, Request who-has 172.20.1.23 tell 172.20.1.2, length 28
20:41:49.038283 ARP, Reply 172.20.1.23 is-at 00:90:fa:13:19:66 (oui
Unknown), length 46
20:41:49.080479 IP6 fe80::e61f:13ff:fe8e:215c > ff02::16: HBH ICMP6,
multicast listener report v2, 1 group record(s), length 28
20:41:49.857302 IP v7000-C.ozlabs.ibm.com.svrloc > 239.255.255.253.svrloc:
UDP, length 49
20:41:49.926920 ARP, Request who-has ADLC-node1.ozlabs.ibm.com tell
ADLC-CBISD.ozlabs.ibm.com, length 46
20:41:49.928808 IP6 fe80::e61f:13ff:fe59:1fd9.dhcpv6-client >
ff02::1:2.dhcpv6-server: dhcp6 solicit
20:41:50.034427 ARP, Request who-has 172.20.1.23 tell 172.20.1.2, length 28
20:41:50.034466 ARP, Reply 172.20.1.23 is-at 00:90:fa:13:19:66 (oui
Unknown), length 46
20:41:50.497330 STP 802.1w, Rapid STP, Flags [Learn, Forward], bridge-id
2001.d4:8c:b5:d0:9f:80.8007, length 43
20:41:50.759672 ARP, Request who-has ADLC-node1.ozlabs.ibm.com tell
ADLC-CBISD.ozlabs.ibm.com, length 46
20:41:50.853096 CDPv2, ttl: 180s, Device-ID 'BlueSwitch5.ozlabs.ibm.com',
length 438
20:41:50.937232 IP ka2-imm.ozlabs.ibm.com.svrloc > 239.255.255.253.svrloc:
UDP, length 49
20:41:51.034421 ARP, Request who-has 172.20.1.23 tell 172.20.1.2, length 28
20:41:51.034462 ARP, Reply 172.20.1.23 is-at 00:90:fa:13:19:66 (oui
Unknown), length 46
20:41:51.759734 ARP, Request who-has ADLC-node1.ozlabs.ibm.com tell
ADLC-CBISD.ozlabs.ibm.com, length 46
20:41:52.035032 IP6 fe80::5ef3:fcff:fe2f:6730.dhcpv6-client >
ff02::1:2.dhcpv6-server: dhcp6 solicit
20:41:52.064429 ARP, Request who-has 172.20.1.23 tell 172.20.1.2, length 28
20:41:52.064471 ARP, Reply 172.20.1.23 is-at 00:90:fa:13:19:66 (oui
Unknown), length 46
20:41:52.497384 STP 802.1w, Rapid STP, Flags [Learn, Forward], bridge-id
2001.d4:8c:b5:d0:9f:80.8007, length 43
20:41:52.926863 ARP, Request who-has ADLC-node1.ozlabs.ibm.com tell
ADLC-CBISD.ozlabs.ibm.com, length 46
20:41:53.064420 ARP, Request who-has 172.20.1.23 tell 172.20.1.2, length 28
20:41:53.064461 ARP, Reply 172.20.1.23 is-at 00:90:fa:13:19:66 (oui
Unknown), length 46
20:41:53.135592 IP6 fe80::e61f:13ff:fe8e:287a > ff02::16: HBH ICMP6,
multicast listener report v2, 1 group record(s), length 28
20:41:53.368024 ARP, Request who-has p5-77-E1.ozlabs.ibm.com tell
penny.ozlabs.ibm.com, length 46
20:41:53.731449 ARP, Request who-has ADLC-Jago.ozlabs.ibm.com tell
ADLC-HMC1.ozlabs.ibm.com, length 46
20:41:53.731451 ARP, Request who-has ADLC-node1.ozlabs.ibm.com tell
ADLC-HMC1.ozlabs.ibm.com, length 46
20:41:53.759850 ARP, Request who-has ADLC-node1.ozlabs.ibm.com tell
ADLC-CBISD.ozlabs.ibm.com, length 46
20:41:54.064424 ARP, Request who-has 172.20.1.23 tell 172.20.1.2, length 28
20:41:54.064465 ARP, Reply 172.20.1.23 is-at 00:90:fa:13:19:66 (oui
Unknown), length 46
20:41:54.368031 ARP, Request who-has p5-77-E1.ozlabs.ibm.com tell
penny.ozlabs.ibm.com, length 46
20:41:54.497623 STP 802.1w, Rapid STP, Flags [Learn, Forward], bridge-id
2001.d4:8c:b5:d0:9f:80.8007, length 43
20:41:54.731459 ARP, Request who-has ADLC-Jago.ozlabs.ibm.com tell
ADLC-HMC1.ozlabs.ibm.com, length 46
20:41:54.731462 ARP, Request who-has ADLC-node1.ozlabs.ibm.com tell
ADLC-HMC1.ozlabs.ibm.com, length 46
20:41:54.759915 ARP, Request who-has ADLC-node1.ozlabs.ibm.com tell
ADLC-CBISD.ozlabs.ibm.com, length 46
20:41:55.064439 ARP, Request who-has 172.20.1.23 tell 172.20.1.2, length 28
20:41:55.064479 ARP, Reply 172.20.1.23 is-at 00:90:fa:13:19:66 (oui
Unknown), length 46
20:41:58.064437 ARP, Request who-has 172.20.1.23 tell 172.20.1.2, length 28
20:41:58.064478 ARP, Reply 172.20.1.23 is-at 00:90:fa:13:19:66 (oui
Unknown), length 46
20:41:58.498053 STP 802.1w, Rapid STP, Flags [Learn, Forward], bridge-id
2001.d4:8c:b5:d0:9f:80.8007, length 43
20:41:58.927228 ARP, Request who-has ADLC-node1.ozlabs.ibm.com tell
ADLC-CBISD.ozlabs.ibm.com, length 46
20:41:59.064415 ARP, Request who-has 172.20.1.23 tell 172.20.1.2, length 28
20:41:59.064455 ARP, Reply 172.20.1.23 is-at 00:90:fa:13:19:66 (oui
Unknown), length 46
20:41:59.759221 ARP, Request who-has ADLC-node1.ozlabs.ibm.com tell
ADLC-CBISD.ozlabs.ibm.com, length 46
20:42:00.064421 ARP, Request who-has 172.20.1.23 tell 172.20.1.2, length 28
20:42:00.064462 ARP, Reply 172.20.1.23 is-at 00:90:fa:13:19:66 (oui
Unknown), length 46
20:42:00.108876 IP6 fe80::e61f:13ff:fe8e:215c > ff02::16: HBH ICMP6,
multicast listener report v2, 1 group record(s), length 28
20:42:00.498451 STP 802.1w, Rapid STP, Flags [Learn, Forward], bridge-id
2001.d4:8c:b5:d0:9f:80.8007, length 43
20:42:00.759288 ARP, Request who-has ADLC-node1.ozlabs.ibm.com tell
ADLC-CBISD.ozlabs.ibm.com, length 46
20:42:01.016884 ARP, Request who-has ds4300-b.ozlabs.ibm.com tell
ADLC-CBISD.ozlabs.ibm.com, length 46
20:42:01.064442 ARP, Request who-has 172.20.1.23 tell 172.20.1.2, length 28
20:42:01.064482 ARP, Reply 172.20.1.23 is-at 00:90:fa:13:19:66 (oui
Unknown), length 46
20:42:01.247007 IP6 fe80::e61f:13ff:fe8e:287a > ff02::16: HBH ICMP6,
multicast listener report v2, 1 group record(s), length 28
20:42:01.381838 IP dyn120.ozlabs.ibm.com.bootpc > 255.255.255.255.bootps:
BOOTP/DHCP, Request from 00:1a:64:44:af:4d (oui Unknown), length 300
20:42:01.445952 ARP, Request who-has p5-40-P2-E0.ozlabs.ibm.com tell
powermon.ozlabs.ibm.com, length 46
20:42:01.960208 ARP, Request who-has ds3200-b.ozlabs.ibm.com tell
ADLC-CBISD.ozlabs.ibm.com, length 46
20:42:02.064418 ARP, Request who-has 172.20.1.23 tell 172.20.1.2, length 28
20:42:02.064456 ARP, Reply 172.20.1.23 is-at 00:90:fa:13:19:66 (oui
Unknown), length 46
20:42:02.445982 ARP, Request who-has p5-40-P2-E0.ozlabs.ibm.com tell
powermon.ozlabs.ibm.com, length 46
20:42:02.472576 ARP, Request who-has ds3200-a.ozlabs.ibm.com tell
ADLC-CBISD.ozlabs.ibm.com, length 46
20:42:02.498516 STP 802.1w, Rapid STP, Flags [Learn, Forward], bridge-id
2001.d4:8c:b5:d0:9f:80.8007, length 43
^C20:42:03.019758 IP6 fe80::c988:ee1e:6e92:2383 > ff02::1:ff39:b4c: ICMP6,
neighbor solicitation, who has fe80::2a0:b8ff:fe39:b4c, length 32

63 packets captured
99 packets received by filter
11 packets dropped by kernel




> 
> If yes, other things to try:
> 1. trace handle_tx [vhost_net]
> 2. trace tun_get_user [tun]
> 3. I suspect some guest bug in one of the features.
> Let's try to disable some flags with device property:
> you can get the list by doing:
> ./x86_64-softmmu/qemu-system-x86_64 -device virtio-net-pci,?|grep on/off
> Things I would try turning off are the host offloads (the ones that start
> with host_), plus event_idx, any_layout and mq.
> Turn them all off; if that helps, try to find the one that made the difference.
> 
> 
>>
>>
> >>>> Maybe it is a missing barrier - I rebooted the machine several times and
> >>>> now it sometimes works even after 240 seconds (not 210 as before), but
> >>>> most of the time it still does not...
>>>>
>>>>
>>>>>> +        if (r) {
> >>>>>> +            error_report("Error notifying host notifier: %d", -r);
>>>>>> +            goto err;
>>>>>> +        }
>>>>>>      }
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
> >>>>>>>>>>> If I remove vhost=on, everything works. The same with Fedora19
> >>>>>>>>>>> (v3.10-something): it all works again, before and after reboot.
>>>>>>>>>>>
>>>>>>>>>>>
> >>>>>>>>>>> And there are 2 questions:
>>>>>>>>>>>
>>>>>>>>>>> 1. does anybody have any clue what might go wrong after reboot?
>>>>>>>>>>>
>>>>>>>>>>> 2. Is there any good material to read about what exactly and how vhost
>>>>>>>>>>> accelerates?
>>>>>>>>>>>
>>>>>>>>>>> My understanding is that packets from the guest to the real network are
>>>>>>>>>>> going as:
>>>>>>>>>>> 1. guest's virtio-pci-net does ioport(VIRTIO_PCI_QUEUE_NOTIFY)
>>>>>>>>>>> 2. QEMU's net/virtio-net.c calls qemu_net_queue_deliver()
>>>>>>>>>>> 3. QEMU's net/tap.c calls tap_write_packet() and this is how the host knows
>>>>>>>>>>> that there is a new packet.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> What about the documentation? :) or the idea?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks!
>>>>>>>>>>>
>>>>>>>>>>>
> >>>>>>>>>>> This is how I run QEMU:
>>>>>>>>>>> ./qemu-system-ppc64 \
>>>>>>>>>>> -enable-kvm \
>>>>>>>>>>> -m 2048 \
>>>>>>>>>>> -machine pseries \
>>>>>>>>>>> -initrd 1.cpio \
>>>>>>>>>>> -kernel vml312_virtio_net_dbg \
>>>>>>>>>>> -nographic \
>>>>>>>>>>> -vga none \
>>>>>>>>>>> -netdev
>>>>>>>>>>> tap,id=id3,ifname=tap-id3,script=ifup.sh,downscript=ifdown.sh,vhost=on \
>>>>>>>>>>> -device virtio-net-pci,id=id4,netdev=id3,mac=C0:41:49:4b:00:00
>>>>>>>>>>>
>>>>>>>>>>>
> >>>>>>>>>>> This is the bridge config:
>>>>>>>>>>> [aik@dyn232 ~]$ brctl show
>>>>>>>>>>> bridge name	bridge id		STP enabled	interfaces
>>>>>>>>>>> brtest		8000.00145e992e88	no	pin	eth4
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> The ifup.sh script:
>>>>>>>>>>> ifconfig $1 hw ether ee:01:02:03:04:05
>>>>>>>>>>> /sbin/ifconfig $1 up
>>>>>>>>>>> /usr/sbin/brctl addif brtest $1
>>
>>
>>
>> -- 
>> Alexey


-- 
Alexey


* Re: [Qemu-devel] vhost-net issue: does not survive reboot on ppc64
  2013-12-26 10:13                     ` Alexey Kardashevskiy
@ 2013-12-26 10:49                       ` Michael S. Tsirkin
  2013-12-26 12:51                         ` Alexey Kardashevskiy
  0 siblings, 1 reply; 25+ messages in thread
From: Michael S. Tsirkin @ 2013-12-26 10:49 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: qemu-devel

On Thu, Dec 26, 2013 at 09:13:31PM +1100, Alexey Kardashevskiy wrote:
> On 12/25/2013 08:52 PM, Michael S. Tsirkin wrote:
> > On Wed, Dec 25, 2013 at 12:36:12PM +1100, Alexey Kardashevskiy wrote:
> >> On 12/25/2013 02:43 AM, Michael S. Tsirkin wrote:
> >>> On Wed, Dec 25, 2013 at 01:15:29AM +1100, Alexey Kardashevskiy wrote:
> >>>> On 12/24/2013 08:40 PM, Michael S. Tsirkin wrote:
> >>>>> On Tue, Dec 24, 2013 at 02:09:07PM +1100, Alexey Kardashevskiy wrote:
> >>>>>> On 12/24/2013 03:24 AM, Michael S. Tsirkin wrote:
> >>>>>>> On Mon, Dec 23, 2013 at 02:01:13AM +1100, Alexey Kardashevskiy wrote:
> >>>>>>>> On 12/23/2013 01:46 AM, Alexey Kardashevskiy wrote:
> >>>>>>>>> On 12/22/2013 09:56 PM, Michael S. Tsirkin wrote:
> >>>>>>>>>> On Sun, Dec 22, 2013 at 02:01:23AM +1100, Alexey Kardashevskiy wrote:
> >>>>>>>>>>> Hi!
> >>>>>>>>>>>
> >>>>>>>>>>> I am having a problem with virtio-net + vhost on POWER7 machine - it does
> >>>>>>>>>>> not survive reboot of the guest.
> >>>>>>>>>>>
> >>>>>>>>>>> Steps to reproduce:
> >>>>>>>>>>> 1. boot the guest
> >>>>>>>>>>> 2. configure eth0 and do ping - everything works
> >>>>>>>>>>> 3. reboot the guest (i.e. type "reboot")
> >>>>>>>>>>> 4. when it is booted, eth0 can be configured but will not work at all.
> >>>>>>>>>>>
> >>>>>>>>>>> The test is:
> >>>>>>>>>>> ifconfig eth0 172.20.1.2 up
> >>>>>>>>>>> ping 172.20.1.23
> >>>>>>>>>>>
> >>>>>>>>>>> If I run tcpdump on the host's "tap-id3" interface, it shows no traffic
> >>>>>>>>>>> coming from the guest. Comparing how it works before and after reboot:
> >>>>>>>>>>> before, the guest does an ARP request for 172.20.1.23 and receives the
> >>>>>>>>>>> response; after reboot it does the same but the answer does not come.
> >>>>>>>>>>
> >>>>>>>>>> So you see the arp packet in guest but not in host?
> >>>>>>>>>
> >>>>>>>>> Yes.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> One thing to try is to boot debug kernel - where pr_debug is
> >>>>>>>>>> enabled - then you might see some errors in the kernel log.
> >>>>>>>>>
> >>>>>>>>>>> I tried that and added a lot more debug printks myself; it is not
> >>>>>>>>>>> at all clear what is happening there.
> >>>>>>>>>
> >>>>>>>>> One more hint - if I boot the guest and the guest does not bring eth0 up
> >>>>>>>>> AND wait more than 200 seconds (and less than 210 seconds), then eth0 will
> >>>>>>>>> not work at all. I.e. this script produces not-working-eth0:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> ifconfig eth0 172.20.1.2 down
> >>>>>>>>> sleep 210
> >>>>>>>>> ifconfig eth0 172.20.1.2 up
> >>>>>>>>> ping 172.20.1.23
> >>>>>>>>>
> >>>>>>>>> s/210/200/ - and it starts working. No reboot is required to reproduce.
> >>>>>>>>>
> >>>>>>>>> No "vhost" == always works. The only difference I can see here is vhost's
> >>>>>>>>> thread which may get suspended if not used for a while after the start and
> >>>>>>>>> does not wake up but this is almost a blind guess.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Yet another clue - this host kernel patch seems to help with the guest
> >>>>>>>> reboot but does not help with the initial 210 seconds delay:
> >>>>>>>>
> >>>>>>>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> >>>>>>>> index 69068e0..5e67650 100644
> >>>>>>>> --- a/drivers/vhost/vhost.c
> >>>>>>>> +++ b/drivers/vhost/vhost.c
> >>>>>>>> @@ -162,10 +162,10 @@ void vhost_work_queue(struct vhost_dev *dev, struct
> >>>>>>>> vhost_work *work)
> >>>>>>>>                 list_add_tail(&work->node, &dev->work_list);
> >>>>>>>>                 work->queue_seq++;
> >>>>>>>>                 spin_unlock_irqrestore(&dev->work_lock, flags);
> >>>>>>>> -               wake_up_process(dev->worker);
> >>>>>>>>         } else {
> >>>>>>>>                 spin_unlock_irqrestore(&dev->work_lock, flags);
> >>>>>>>>         }
> >>>>>>>> +       wake_up_process(dev->worker);
> >>>>>>>>  }
> >>>>>>>>  EXPORT_SYMBOL_GPL(vhost_work_queue);
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>> Interesting. Some kind of race? A missing memory barrier somewhere?
> >>>>>>
> >>>>>> I do not see how. I boot the guest and just wait 210 seconds, nothing
> >>>>>> happens to cause races.
> >>>>>>
> >>>>>>
> >>>>>>> Since it's all around startup,
> >>>>>>> you can try kicking the host eventfd in
> >>>>>>> vhost_net_start.
> >>>>>>
> >>>>>>
> >>>>>> How exactly? This did not help. Thanks.
> >>>>>>
> >>>>>> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> >>>>>> index 006576d..407ecf2 100644
> >>>>>> --- a/hw/net/vhost_net.c
> >>>>>> +++ b/hw/net/vhost_net.c
> >>>>>> @@ -229,6 +229,17 @@ int vhost_net_start(VirtIODevice *dev, NetClientState
> >>>>>> *ncs,
> >>>>>>          if (r < 0) {
> >>>>>>              goto err;
> >>>>>>          }
> >>>>>> +
> >>>>>> +        VHostNetState *vn = tap_get_vhost_net(ncs[i].peer);
> >>>>>> +        struct vhost_vring_file file = {
> >>>>>> +            .index = i
> >>>>>> +        };
> >>>>>> +        file.fd =
> >>>>>> event_notifier_get_fd(virtio_queue_get_host_notifier(dev->vq));
> >>>>>> +        r = ioctl(vn->dev.control, VHOST_SET_VRING_KICK, &file);
> >>>>>
> >>>>> No, this sets the notifier, it does not kick.
> >>>>> To kick you write 1 there:
> >>>>> 	uint6_t  v = 1;
> >>>>> 	write(fd, &v, sizeof v);
> >>>>
> >>>>
> >>>> Please, be precise. How/where do I get that @fd? Is what I do correct?
> >>>
> >>> Yes.
> >>>
> >>>> What
> >>>> is uint6_t - uint8_t or uint16_t (neither works)?
> >>>
> >>> Sorry, should have been uint64_t.
> >>
> >>
> >> Oh, that I missed :-) Anyway, this does not make any difference. Is there
> >> any cheap&dirty way to make vhost-net kernel thread always awake? Sending
> >> it signals from the user space does not work...
> > 
> > You can run a timer in qemu and signal the eventfd from there
> > periodically.
> > 
> > Just to restate: tcpdump in the guest shows that the guest sends the ARP
> > packet, but tcpdump on the host's tun device does not show any packets?
> 
> 
> Ok, I figured out how to disable the interfaces in Fedora 19. I was wrong:
> something is happening on the host's TAP side. The guest sends an ARP
> request, the response is visible on the TAP interface, but it never reaches
> the guest.

Okay. So the problem is on the host-to-guest path then.
Things to try:

1. trace handle_rx [vhost_net]
2. trace tun_put_user [tun]
3. I suspect some host bug in one of the features.
Let's try to disable some flags with device property:
you can get the list by doing:
./x86_64-softmmu/qemu-system-x86_64 -device virtio-net-pci,?|grep on/off
Things I would try turning off are the guest offloads (the ones that start
with guest_), plus event_idx, any_layout and mq.
Turn them all off; if that helps, try to find the one that made the difference.

-- 
MST


* Re: [Qemu-devel] vhost-net issue: does not survive reboot on ppc64
  2013-12-26 10:49                       ` Michael S. Tsirkin
@ 2013-12-26 12:51                         ` Alexey Kardashevskiy
  2013-12-26 13:48                           ` Michael S. Tsirkin
  0 siblings, 1 reply; 25+ messages in thread
From: Alexey Kardashevskiy @ 2013-12-26 12:51 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

On 12/26/2013 09:49 PM, Michael S. Tsirkin wrote:
> On Thu, Dec 26, 2013 at 09:13:31PM +1100, Alexey Kardashevskiy wrote:
>> On 12/25/2013 08:52 PM, Michael S. Tsirkin wrote:
>>> On Wed, Dec 25, 2013 at 12:36:12PM +1100, Alexey Kardashevskiy wrote:
>>>> On 12/25/2013 02:43 AM, Michael S. Tsirkin wrote:
>>>>> On Wed, Dec 25, 2013 at 01:15:29AM +1100, Alexey Kardashevskiy wrote:
>>>>>> On 12/24/2013 08:40 PM, Michael S. Tsirkin wrote:
>>>>>>> On Tue, Dec 24, 2013 at 02:09:07PM +1100, Alexey Kardashevskiy wrote:
>>>>>>>> On 12/24/2013 03:24 AM, Michael S. Tsirkin wrote:
>>>>>>>>> On Mon, Dec 23, 2013 at 02:01:13AM +1100, Alexey Kardashevskiy wrote:
>>>>>>>>>> On 12/23/2013 01:46 AM, Alexey Kardashevskiy wrote:
>>>>>>>>>>> On 12/22/2013 09:56 PM, Michael S. Tsirkin wrote:
>>>>>>>>>>>> On Sun, Dec 22, 2013 at 02:01:23AM +1100, Alexey Kardashevskiy wrote:
>>>>>>>>>>>>> Hi!
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am having a problem with virtio-net + vhost on POWER7 machine - it does
>>>>>>>>>>>>> not survive reboot of the guest.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Steps to reproduce:
>>>>>>>>>>>>> 1. boot the guest
>>>>>>>>>>>>> 2. configure eth0 and do ping - everything works
>>>>>>>>>>>>> 3. reboot the guest (i.e. type "reboot")
>>>>>>>>>>>>> 4. when it is booted, eth0 can be configured but will not work at all.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The test is:
>>>>>>>>>>>>> ifconfig eth0 172.20.1.2 up
>>>>>>>>>>>>> ping 172.20.1.23
>>>>>>>>>>>>>
>>>>>>>>>>>>> If I run tcpdump on the host's "tap-id3" interface, it shows no traffic
>>>>>>>>>>>>> coming from the guest. Comparing how it works before and after reboot:
>>>>>>>>>>>>> before, the guest does an ARP request for 172.20.1.23 and receives the
>>>>>>>>>>>>> response; after reboot it does the same but the answer does not come.
>>>>>>>>>>>>
>>>>>>>>>>>> So you see the arp packet in guest but not in host?
>>>>>>>>>>>
>>>>>>>>>>> Yes.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> One thing to try is to boot debug kernel - where pr_debug is
>>>>>>>>>>>> enabled - then you might see some errors in the kernel log.
>>>>>>>>>>>
>>>>>>>>>>> I tried that and added a lot more debug printks myself; it is not
>>>>>>>>>>> at all clear what is happening there.
>>>>>>>>>>>
>>>>>>>>>>> One more hint - if I boot the guest and the guest does not bring eth0 up
>>>>>>>>>>> AND wait more than 200 seconds (and less than 210 seconds), then eth0 will
>>>>>>>>>>> not work at all. I.e. this script produces not-working-eth0:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ifconfig eth0 172.20.1.2 down
>>>>>>>>>>> sleep 210
>>>>>>>>>>> ifconfig eth0 172.20.1.2 up
>>>>>>>>>>> ping 172.20.1.23
>>>>>>>>>>>
>>>>>>>>>>> s/210/200/ - and it starts working. No reboot is required to reproduce.
>>>>>>>>>>>
>>>>>>>>>>> No "vhost" == always works. The only difference I can see here is vhost's
>>>>>>>>>>> thread which may get suspended if not used for a while after the start and
>>>>>>>>>>> does not wake up but this is almost a blind guess.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Yet another clue - this host kernel patch seems to help with the guest
>>>>>>>>>> reboot but does not help with the initial 210 seconds delay:
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>>>>>>>>>> index 69068e0..5e67650 100644
>>>>>>>>>> --- a/drivers/vhost/vhost.c
>>>>>>>>>> +++ b/drivers/vhost/vhost.c
>>>>>>>>>> @@ -162,10 +162,10 @@ void vhost_work_queue(struct vhost_dev *dev, struct
>>>>>>>>>> vhost_work *work)
>>>>>>>>>>                 list_add_tail(&work->node, &dev->work_list);
>>>>>>>>>>                 work->queue_seq++;
>>>>>>>>>>                 spin_unlock_irqrestore(&dev->work_lock, flags);
>>>>>>>>>> -               wake_up_process(dev->worker);
>>>>>>>>>>         } else {
>>>>>>>>>>                 spin_unlock_irqrestore(&dev->work_lock, flags);
>>>>>>>>>>         }
>>>>>>>>>> +       wake_up_process(dev->worker);
>>>>>>>>>>  }
>>>>>>>>>>  EXPORT_SYMBOL_GPL(vhost_work_queue);
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Interesting. Some kind of race? A missing memory barrier somewhere?
>>>>>>>>
>>>>>>>> I do not see how. I boot the guest and just wait 210 seconds, nothing
>>>>>>>> happens to cause races.
>>>>>>>>
>>>>>>>>
>>>>>>>>> Since it's all around startup,
>>>>>>>>> you can try kicking the host eventfd in
>>>>>>>>> vhost_net_start.
>>>>>>>>
>>>>>>>>
>>>>>>>> How exactly? This did not help. Thanks.
>>>>>>>>
>>>>>>>> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
>>>>>>>> index 006576d..407ecf2 100644
>>>>>>>> --- a/hw/net/vhost_net.c
>>>>>>>> +++ b/hw/net/vhost_net.c
>>>>>>>> @@ -229,6 +229,17 @@ int vhost_net_start(VirtIODevice *dev, NetClientState
>>>>>>>> *ncs,
>>>>>>>>          if (r < 0) {
>>>>>>>>              goto err;
>>>>>>>>          }
>>>>>>>> +
>>>>>>>> +        VHostNetState *vn = tap_get_vhost_net(ncs[i].peer);
>>>>>>>> +        struct vhost_vring_file file = {
>>>>>>>> +            .index = i
>>>>>>>> +        };
>>>>>>>> +        file.fd =
>>>>>>>> event_notifier_get_fd(virtio_queue_get_host_notifier(dev->vq));
>>>>>>>> +        r = ioctl(vn->dev.control, VHOST_SET_VRING_KICK, &file);
>>>>>>>
>>>>>>> No, this sets the notifier, it does not kick.
>>>>>>> To kick you write 1 there:
>>>>>>> 	uint6_t  v = 1;
>>>>>>> 	write(fd, &v, sizeof v);
>>>>>>
>>>>>>
>>>>>> Please, be precise. How/where do I get that @fd? Is what I do correct?
>>>>>
>>>>> Yes.
>>>>>
>>>>>> What
>>>>>> is uint6_t - uint8_t or uint16_t (neither works)?
>>>>>
>>>>> Sorry, should have been uint64_t.
>>>>
>>>>
>>>> Oh, that I missed :-) Anyway, this does not make any difference. Is there
>>>> any cheap&dirty way to make vhost-net kernel thread always awake? Sending
>>>> it signals from the user space does not work...
>>>
>>> You can run a timer in qemu and signal the eventfd from there
>>> periodically.
>>>
>>> Just to restate, tcpdump in guest shows that guest sends arp packet,
>>> but tcpdump in host on tun device does not show any packets?
>>
>>
>> Ok. Figured it out about disabling interfaces in Fedora19. I was wrong,
>> something is happening on the host's TAP - the guest sends ARP request, the
>> response is visible on the TAP interface but not in the guest.
> 
> Okay. So problem is on host to guest path then.
> Things to try:
> 
> 1. trace handle_rx [vhost_net]
> 2. trace tun_put_user [tun]
> 3. I suspect some host bug in one of the features.
> Let's try to disable some flags with device property:
> you can get the list by doing:
> ./x86_64-softmmu/qemu-system-x86_64 -device virtio-net-pci,?|grep on/off
> Things I would try turning off is guest offloads (ones that start with guest_)
> event_idx,any_layout,mq.
> Turn them all off, if it helps try to find the one that helped.


Heh. It would still be awesome to read up on the basics of this vhost thing,
as I am debugging blindly :)

Regarding your suggestions.

1. I put "printk" in handle_rx and tun_put_user.
handle_rx stopped being called 2:40 after the guest start;
tun_put_user stopped 0:20 after the guest start. Accuracy is 5 seconds.
If I bring the guest's eth0 up while handle_rx is still printing, it works,
i.e. tun_put_user is called a lot. Once handle_rx has stopped, nothing can
bring eth0 back to life.

2. This is exactly how I run QEMU now. I basically set "off" for every
on/off parameter. This did not change anything.

./qemu-system-ppc64 \
	-enable-kvm \
	-m 2048 \
	-L qemu-ppc64-bios/ \
	-machine pseries \
	-trace events=qemu_trace_events \
	-kernel vml312 \
	-append root=/dev/sda3 virtimg/fc19_16GB_vhostdbg.qcow2 \
	-nographic \
	-vga none \
	-nodefaults \
	-chardev stdio,id=id0,signal=off,mux=on \
	-device spapr-vty,id=id1,chardev=id0,reg=0x71000100 \
	-mon id=id2,chardev=id0,mode=readline \
	-netdev
tap,id=id3,ifname=tap-id3,script=ifup.sh,downscript=ifdown.sh,vhost=on \
	-device
virtio-net-pci,id=id4,netdev=id3,mac=C0:41:49:4b:00:00,tx=timer,ioeventfd=off,\
indirect_desc=off,event_idx=off,any_layout=off,csum=off,guest_csum=off,\
gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off,guest_ufo=off,\
host_tso4=off,host_tso6=off,host_ecn=off,host_ufo=off,mrg_rxbuf=off,\
status=off,ctrl_vq=off,ctrl_rx=off,ctrl_vlan=off,ctrl_rx_extra=off,\
ctrl_mac_addr=off,ctrl_guest_offloads=off,mq=off,multifunction=off,\
command_serr_enable=off \
	-netdev user,id=id5,hostfwd=tcp::5000-:22 \
	-device spapr-vlan,id=id6,netdev=id5,mac=C0:41:49:4b:00:01



-- 
Alexey

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] vhost-net issue: does not survive reboot on ppc64
  2013-12-26 12:51                         ` Alexey Kardashevskiy
@ 2013-12-26 13:48                           ` Michael S. Tsirkin
  2013-12-26 14:59                             ` Alexey Kardashevskiy
  0 siblings, 1 reply; 25+ messages in thread
From: Michael S. Tsirkin @ 2013-12-26 13:48 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: qemu-devel

On Thu, Dec 26, 2013 at 11:51:04PM +1100, Alexey Kardashevskiy wrote:
> On 12/26/2013 09:49 PM, Michael S. Tsirkin wrote:
> > On Thu, Dec 26, 2013 at 09:13:31PM +1100, Alexey Kardashevskiy wrote:
> >> On 12/25/2013 08:52 PM, Michael S. Tsirkin wrote:
> >>> On Wed, Dec 25, 2013 at 12:36:12PM +1100, Alexey Kardashevskiy wrote:
> >>>> On 12/25/2013 02:43 AM, Michael S. Tsirkin wrote:
> >>>>> On Wed, Dec 25, 2013 at 01:15:29AM +1100, Alexey Kardashevskiy wrote:
> >>>>>> On 12/24/2013 08:40 PM, Michael S. Tsirkin wrote:
> >>>>>>> On Tue, Dec 24, 2013 at 02:09:07PM +1100, Alexey Kardashevskiy wrote:
> >>>>>>>> On 12/24/2013 03:24 AM, Michael S. Tsirkin wrote:
> >>>>>>>>> On Mon, Dec 23, 2013 at 02:01:13AM +1100, Alexey Kardashevskiy wrote:
> >>>>>>>>>> On 12/23/2013 01:46 AM, Alexey Kardashevskiy wrote:
> >>>>>>>>>>> On 12/22/2013 09:56 PM, Michael S. Tsirkin wrote:
> >>>>>>>>>>>> On Sun, Dec 22, 2013 at 02:01:23AM +1100, Alexey Kardashevskiy wrote:
> >>>>>>>>>>>>> Hi!
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I am having a problem with virtio-net + vhost on POWER7 machine - it does
> >>>>>>>>>>>>> not survive reboot of the guest.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Steps to reproduce:
> >>>>>>>>>>>>> 1. boot the guest
> >>>>>>>>>>>>> 2. configure eth0 and do ping - everything works
> >>>>>>>>>>>>> 3. reboot the guest (i.e. type "reboot")
> >>>>>>>>>>>>> 4. when it is booted, eth0 can be configured but will not work at all.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The test is:
> >>>>>>>>>>>>> ifconfig eth0 172.20.1.2 up
> >>>>>>>>>>>>> ping 172.20.1.23
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> If to run tcpdump on the host's "tap-id3" interface, it shows no trafic
> >>>>>>>>>>>>> coming from the guest. If to compare how it works before and after reboot,
> >>>>>>>>>>>>> I can see the guest doing an ARP request for 172.20.1.23 and receives the
> >>>>>>>>>>>>> response and it does the same after reboot but the answer does not come.
> >>>>>>>>>>>>
> >>>>>>>>>>>> So you see the arp packet in guest but not in host?
> >>>>>>>>>>>
> >>>>>>>>>>> Yes.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>> One thing to try is to boot debug kernel - where pr_debug is
> >>>>>>>>>>>> enabled - then you might see some errors in the kernel log.
> >>>>>>>>>>>
> >>>>>>>>>>> Tried and added lot more debug printk myself, not clear at all what is
> >>>>>>>>>>> happening there.
> >>>>>>>>>>>
> >>>>>>>>>>> One more hint - if I boot the guest and the guest does not bring eth0 up
> >>>>>>>>>>> AND wait more than 200 seconds (and less than 210 seconds), then eth0 will
> >>>>>>>>>>> not work at all. I.e. this script produces not-working-eth0:
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> ifconfig eth0 172.20.1.2 down
> >>>>>>>>>>> sleep 210
> >>>>>>>>>>> ifconfig eth0 172.20.1.2 up
> >>>>>>>>>>> ping 172.20.1.23
> >>>>>>>>>>>
> >>>>>>>>>>> s/210/200/ - and it starts working. No reboot is required to reproduce.
> >>>>>>>>>>>
> >>>>>>>>>>> No "vhost" == always works. The only difference I can see here is vhost's
> >>>>>>>>>>> thread which may get suspended if not used for a while after the start and
> >>>>>>>>>>> does not wake up but this is almost a blind guess.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Yet another clue - this host kernel patch seems to help with the guest
> >>>>>>>>>> reboot but does not help with the initial 210 seconds delay:
> >>>>>>>>>>
> >>>>>>>>>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> >>>>>>>>>> index 69068e0..5e67650 100644
> >>>>>>>>>> --- a/drivers/vhost/vhost.c
> >>>>>>>>>> +++ b/drivers/vhost/vhost.c
> >>>>>>>>>> @@ -162,10 +162,10 @@ void vhost_work_queue(struct vhost_dev *dev, struct
> >>>>>>>>>> vhost_work *work)
> >>>>>>>>>>                 list_add_tail(&work->node, &dev->work_list);
> >>>>>>>>>>                 work->queue_seq++;
> >>>>>>>>>>                 spin_unlock_irqrestore(&dev->work_lock, flags);
> >>>>>>>>>> -               wake_up_process(dev->worker);
> >>>>>>>>>>         } else {
> >>>>>>>>>>                 spin_unlock_irqrestore(&dev->work_lock, flags);
> >>>>>>>>>>         }
> >>>>>>>>>> +       wake_up_process(dev->worker);
> >>>>>>>>>>  }
> >>>>>>>>>>  EXPORT_SYMBOL_GPL(vhost_work_queue);
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Interesting. Some kind of race? A missing memory barrier somewhere?
> >>>>>>>>
> >>>>>>>> I do not see how. I boot the guest and just wait 210 seconds, nothing
> >>>>>>>> happens to cause races.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> Since it's all around startup,
> >>>>>>>>> you can try kicking the host eventfd in
> >>>>>>>>> vhost_net_start.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> How exactly? This did not help. Thanks.
> >>>>>>>>
> >>>>>>>> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> >>>>>>>> index 006576d..407ecf2 100644
> >>>>>>>> --- a/hw/net/vhost_net.c
> >>>>>>>> +++ b/hw/net/vhost_net.c
> >>>>>>>> @@ -229,6 +229,17 @@ int vhost_net_start(VirtIODevice *dev, NetClientState
> >>>>>>>> *ncs,
> >>>>>>>>          if (r < 0) {
> >>>>>>>>              goto err;
> >>>>>>>>          }
> >>>>>>>> +
> >>>>>>>> +        VHostNetState *vn = tap_get_vhost_net(ncs[i].peer);
> >>>>>>>> +        struct vhost_vring_file file = {
> >>>>>>>> +            .index = i
> >>>>>>>> +        };
> >>>>>>>> +        file.fd =
> >>>>>>>> event_notifier_get_fd(virtio_queue_get_host_notifier(dev->vq));
> >>>>>>>> +        r = ioctl(vn->dev.control, VHOST_SET_VRING_KICK, &file);
> >>>>>>>
> >>>>>>> No, this sets the notifier, it does not kick.
> >>>>>>> To kick you write 1 there:
> >>>>>>> 	uint6_t  v = 1;
> >>>>>>> 	write(fd, &v, sizeof v);
> >>>>>>
> >>>>>>
> >>>>>> Please, be precise. How/where do I get that @fd? Is what I do correct?
> >>>>>
> >>>>> Yes.
> >>>>>
> >>>>>> What
> >>>>>> is uint6_t - uint8_t or uint16_t (neither works)?
> >>>>>
> >>>>> Sorry, should have been uint64_t.
> >>>>
> >>>>
> >>>> Oh, that I missed :-) Anyway, this does not make any difference. Is there
> >>>> any cheap&dirty way to make vhost-net kernel thread always awake? Sending
> >>>> it signals from the user space does not work...
> >>>
> >>> You can run a timer in qemu and signal the eventfd from there
> >>> periodically.
> >>>
> >>> Just to restate, tcpdump in guest shows that guest sends arp packet,
> >>> but tcpdump in host on tun device does not show any packets?
> >>
> >>
> >> Ok. Figured it out about disabling interfaces in Fedora19. I was wrong,
> >> something is happening on the host's TAP - the guest sends ARP request, the
> >> response is visible on the TAP interface but not in the guest.
> > 
> > Okay. So problem is on host to guest path then.
> > Things to try:
> > 
> > 1. trace handle_rx [vhost_net]
> > 2. trace tun_put_user [tun]
> > 3. I suspect some host bug in one of the features.
> > Let's try to disable some flags with device property:
> > you can get the list by doing:
> > ./x86_64-softmmu/qemu-system-x86_64 -device virtio-net-pci,?|grep on/off
> > Things I would try turning off is guest offloads (ones that start with guest_)
> > event_idx,any_layout,mq.
> > Turn them all off, if it helps try to find the one that helped.
> 
> 
> Heh. It still would be awesome to read basics about this vhost thing as I
> am debugging blindly :)
> 
> Regarding your suggestions.
> 
> 1. I put "printk" in handle_rx and tun_put_user.

Fine, though it is easier with ftrace: see http://lwn.net/Articles/370423/
and look for function filtering.

> handle_rx stopped being called after 2:40 from the guest start,
> tun_put_user stopped after 0:20 from the guest start. Accuracy is 5 seconds.
> If I bring the guest's eth0 up while handle_rx is still printing, it works,
> i.e. tun_put_user is called a lot. Once handle_rx stopped, nothing can
> bring eth0 back to live.

OK, so what should happen is that handle_rx is called
when you bring eth0 up.
Do you see this?
The way it is supposed to work is this:

vhost_net_enable_vq calls vhost_poll_start, which
calls mask = file->f_op->poll(file, &poll->table)
on the tun file.
That invokes tun_chr_poll.
At this point there are packets queued on tun already,
so tun_chr_poll returns POLLIN | POLLRDNORM.
This calls vhost_poll_wakeup, which checks the mask against
the key.
The key is POLLIN, so vhost_poll_queue is called,
which in turn calls vhost_work_queue.
The work list is either empty, in which case we wake up the worker,
or it is not empty, in which case the worker is running our job anyway.
This will then invoke handle_rx_net.


> 2. This is exactly how I run QEMU now. I basically set "off" for every
> on/off parameters. This did not change anything.
> 
> ./qemu-system-ppc64 \
> 	-enable-kvm \
> 	-m 2048 \
> 	-L qemu-ppc64-bios/ \
> 	-machine pseries \
> 	-trace events=qemu_trace_events \
> 	-kernel vml312 \
> 	-append root=/dev/sda3 virtimg/fc19_16GB_vhostdbg.qcow2 \
> 	-nographic \
> 	-vga none \
> 	-nodefaults \
> 	-chardev stdio,id=id0,signal=off,mux=on \
> 	-device spapr-vty,id=id1,chardev=id0,reg=0x71000100 \
> 	-mon id=id2,chardev=id0,mode=readline \
> 	-netdev
> tap,id=id3,ifname=tap-id3,script=ifup.sh,downscript=ifdown.sh,vhost=on \
> 	-device
> virtio-net-pci,id=id4,netdev=id3,mac=C0:41:49:4b:00:00,tx=timer,ioeventfd=off,\
> indirect_desc=off,event_idx=off,any_layout=off,csum=off,guest_csum=off,\
> gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off,guest_ufo=off,\
> host_tso4=off,host_tso6=off,host_ecn=off,host_ufo=off,mrg_rxbuf=off,\
> status=off,ctrl_vq=off,ctrl_rx=off,ctrl_vlan=off,ctrl_rx_extra=off,\
> ctrl_mac_addr=off,ctrl_guest_offloads=off,mq=off,multifunction=off,\
> command_serr_enable=off \
> 	-netdev user,id=id5,hostfwd=tcp::5000-:22 \
> 	-device spapr-vlan,id=id6,netdev=id5,mac=C0:41:49:4b:00:01
> 

Yes, this looks like some kind of race.

> 
> -- 
> Alexey

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] vhost-net issue: does not survive reboot on ppc64
  2013-12-26 13:48                           ` Michael S. Tsirkin
@ 2013-12-26 14:59                             ` Alexey Kardashevskiy
  2013-12-26 15:12                               ` Michael S. Tsirkin
  0 siblings, 1 reply; 25+ messages in thread
From: Alexey Kardashevskiy @ 2013-12-26 14:59 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

On 12/27/2013 12:48 AM, Michael S. Tsirkin wrote:
> On Thu, Dec 26, 2013 at 11:51:04PM +1100, Alexey Kardashevskiy wrote:
>> On 12/26/2013 09:49 PM, Michael S. Tsirkin wrote:
>>> On Thu, Dec 26, 2013 at 09:13:31PM +1100, Alexey Kardashevskiy wrote:
>>>> On 12/25/2013 08:52 PM, Michael S. Tsirkin wrote:
>>>>> On Wed, Dec 25, 2013 at 12:36:12PM +1100, Alexey Kardashevskiy wrote:
>>>>>> On 12/25/2013 02:43 AM, Michael S. Tsirkin wrote:
>>>>>>> On Wed, Dec 25, 2013 at 01:15:29AM +1100, Alexey Kardashevskiy wrote:
>>>>>>>> On 12/24/2013 08:40 PM, Michael S. Tsirkin wrote:
>>>>>>>>> On Tue, Dec 24, 2013 at 02:09:07PM +1100, Alexey Kardashevskiy wrote:
>>>>>>>>>> On 12/24/2013 03:24 AM, Michael S. Tsirkin wrote:
>>>>>>>>>>> On Mon, Dec 23, 2013 at 02:01:13AM +1100, Alexey Kardashevskiy wrote:
>>>>>>>>>>>> On 12/23/2013 01:46 AM, Alexey Kardashevskiy wrote:
>>>>>>>>>>>>> On 12/22/2013 09:56 PM, Michael S. Tsirkin wrote:
>>>>>>>>>>>>>> On Sun, Dec 22, 2013 at 02:01:23AM +1100, Alexey Kardashevskiy wrote:
>>>>>>>>>>>>>>> Hi!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I am having a problem with virtio-net + vhost on POWER7 machine - it does
>>>>>>>>>>>>>>> not survive reboot of the guest.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Steps to reproduce:
>>>>>>>>>>>>>>> 1. boot the guest
>>>>>>>>>>>>>>> 2. configure eth0 and do ping - everything works
>>>>>>>>>>>>>>> 3. reboot the guest (i.e. type "reboot")
>>>>>>>>>>>>>>> 4. when it is booted, eth0 can be configured but will not work at all.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The test is:
>>>>>>>>>>>>>>> ifconfig eth0 172.20.1.2 up
>>>>>>>>>>>>>>> ping 172.20.1.23
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If to run tcpdump on the host's "tap-id3" interface, it shows no trafic
>>>>>>>>>>>>>>> coming from the guest. If to compare how it works before and after reboot,
>>>>>>>>>>>>>>> I can see the guest doing an ARP request for 172.20.1.23 and receives the
>>>>>>>>>>>>>>> response and it does the same after reboot but the answer does not come.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So you see the arp packet in guest but not in host?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> One thing to try is to boot debug kernel - where pr_debug is
>>>>>>>>>>>>>> enabled - then you might see some errors in the kernel log.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Tried and added lot more debug printk myself, not clear at all what is
>>>>>>>>>>>>> happening there.
>>>>>>>>>>>>>
>>>>>>>>>>>>> One more hint - if I boot the guest and the guest does not bring eth0 up
>>>>>>>>>>>>> AND wait more than 200 seconds (and less than 210 seconds), then eth0 will
>>>>>>>>>>>>> not work at all. I.e. this script produces not-working-eth0:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> ifconfig eth0 172.20.1.2 down
>>>>>>>>>>>>> sleep 210
>>>>>>>>>>>>> ifconfig eth0 172.20.1.2 up
>>>>>>>>>>>>> ping 172.20.1.23
>>>>>>>>>>>>>
>>>>>>>>>>>>> s/210/200/ - and it starts working. No reboot is required to reproduce.
>>>>>>>>>>>>>
>>>>>>>>>>>>> No "vhost" == always works. The only difference I can see here is vhost's
>>>>>>>>>>>>> thread which may get suspended if not used for a while after the start and
>>>>>>>>>>>>> does not wake up but this is almost a blind guess.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Yet another clue - this host kernel patch seems to help with the guest
>>>>>>>>>>>> reboot but does not help with the initial 210 seconds delay:
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>>>>>>>>>>>> index 69068e0..5e67650 100644
>>>>>>>>>>>> --- a/drivers/vhost/vhost.c
>>>>>>>>>>>> +++ b/drivers/vhost/vhost.c
>>>>>>>>>>>> @@ -162,10 +162,10 @@ void vhost_work_queue(struct vhost_dev *dev, struct
>>>>>>>>>>>> vhost_work *work)
>>>>>>>>>>>>                 list_add_tail(&work->node, &dev->work_list);
>>>>>>>>>>>>                 work->queue_seq++;
>>>>>>>>>>>>                 spin_unlock_irqrestore(&dev->work_lock, flags);
>>>>>>>>>>>> -               wake_up_process(dev->worker);
>>>>>>>>>>>>         } else {
>>>>>>>>>>>>                 spin_unlock_irqrestore(&dev->work_lock, flags);
>>>>>>>>>>>>         }
>>>>>>>>>>>> +       wake_up_process(dev->worker);
>>>>>>>>>>>>  }
>>>>>>>>>>>>  EXPORT_SYMBOL_GPL(vhost_work_queue);
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Interesting. Some kind of race? A missing memory barrier somewhere?
>>>>>>>>>>
>>>>>>>>>> I do not see how. I boot the guest and just wait 210 seconds, nothing
>>>>>>>>>> happens to cause races.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Since it's all around startup,
>>>>>>>>>>> you can try kicking the host eventfd in
>>>>>>>>>>> vhost_net_start.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> How exactly? This did not help. Thanks.
>>>>>>>>>>
>>>>>>>>>> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
>>>>>>>>>> index 006576d..407ecf2 100644
>>>>>>>>>> --- a/hw/net/vhost_net.c
>>>>>>>>>> +++ b/hw/net/vhost_net.c
>>>>>>>>>> @@ -229,6 +229,17 @@ int vhost_net_start(VirtIODevice *dev, NetClientState
>>>>>>>>>> *ncs,
>>>>>>>>>>          if (r < 0) {
>>>>>>>>>>              goto err;
>>>>>>>>>>          }
>>>>>>>>>> +
>>>>>>>>>> +        VHostNetState *vn = tap_get_vhost_net(ncs[i].peer);
>>>>>>>>>> +        struct vhost_vring_file file = {
>>>>>>>>>> +            .index = i
>>>>>>>>>> +        };
>>>>>>>>>> +        file.fd =
>>>>>>>>>> event_notifier_get_fd(virtio_queue_get_host_notifier(dev->vq));
>>>>>>>>>> +        r = ioctl(vn->dev.control, VHOST_SET_VRING_KICK, &file);
>>>>>>>>>
>>>>>>>>> No, this sets the notifier, it does not kick.
>>>>>>>>> To kick you write 1 there:
>>>>>>>>> 	uint6_t  v = 1;
>>>>>>>>> 	write(fd, &v, sizeof v);
>>>>>>>>
>>>>>>>>
>>>>>>>> Please, be precise. How/where do I get that @fd? Is what I do correct?
>>>>>>>
>>>>>>> Yes.
>>>>>>>
>>>>>>>> What
>>>>>>>> is uint6_t - uint8_t or uint16_t (neither works)?
>>>>>>>
>>>>>>> Sorry, should have been uint64_t.
>>>>>>
>>>>>>
>>>>>> Oh, that I missed :-) Anyway, this does not make any difference. Is there
>>>>>> any cheap&dirty way to make vhost-net kernel thread always awake? Sending
>>>>>> it signals from the user space does not work...
>>>>>
>>>>> You can run a timer in qemu and signal the eventfd from there
>>>>> periodically.
>>>>>
>>>>> Just to restate, tcpdump in guest shows that guest sends arp packet,
>>>>> but tcpdump in host on tun device does not show any packets?
>>>>
>>>>
>>>> Ok. Figured it out about disabling interfaces in Fedora19. I was wrong,
>>>> something is happening on the host's TAP - the guest sends ARP request, the
>>>> response is visible on the TAP interface but not in the guest.
>>>
>>> Okay. So problem is on host to guest path then.
>>> Things to try:
>>>
>>> 1. trace handle_rx [vhost_net]
>>> 2. trace tun_put_user [tun]
>>> 3. I suspect some host bug in one of the features.
>>> Let's try to disable some flags with device property:
>>> you can get the list by doing:
>>> ./x86_64-softmmu/qemu-system-x86_64 -device virtio-net-pci,?|grep on/off
>>> Things I would try turning off is guest offloads (ones that start with guest_)
>>> event_idx,any_layout,mq.
>>> Turn them all off, if it helps try to find the one that helped.
>>
>>
>> Heh. It still would be awesome to read basics about this vhost thing as I
>> am debugging blindly :)
>>
>> Regarding your suggestions.
>>
>> 1. I put "printk" in handle_rx and tun_put_user.
> 
> Fine, though it's easier with ftrace  http://lwn.net/Articles/370423/
> look for function filtering.
> 
>> handle_rx stopped being called after 2:40 from the guest start,
>> tun_put_user stopped after 0:20 from the guest start. Accuracy is 5 seconds.
>> If I bring the guest's eth0 up while handle_rx is still printing, it works,
>> i.e. tun_put_user is called a lot. Once handle_rx stopped, nothing can
>> bring eth0 back to live.
> 
> OK so what should happen is that handle rx is called
> when you bring eth0 up.
> Do you see this?
> The way it is supposed to work is this:
> 
> vhost_net_enable_vq calls vhost_poll_start then


This, and what follows it, is called only while QEMU is booting (in response
to PCI enable? somewhere in the middle of the PCI discovery process), and
VHOST_NET_SET_BACKEND is never called again after that.



> this calls mask = file->f_op->poll(file, &poll->table)
> on the tun file.
> this calls tun_chr_poll.
> at this point there are packets queued on tun already
> so that returns POLLIN | POLLRDNORM;
> this calls vhost_poll_wakeup and that checks mask against
> the key.
> key is POLLIN so vhost_poll_queue is called.
> this in turn calls vhost_work_queue
> work list is either empty then we wake up worker
> or it's not empty  then worker is running out job anyway.
> this will then invoke handle_rx_net.
> 
> 
>> 2. This is exactly how I run QEMU now. I basically set "off" for every
>> on/off parameters. This did not change anything.
>>
>> ./qemu-system-ppc64 \
>> 	-enable-kvm \
>> 	-m 2048 \
>> 	-L qemu-ppc64-bios/ \
>> 	-machine pseries \
>> 	-trace events=qemu_trace_events \
>> 	-kernel vml312 \
>> 	-append root=/dev/sda3 virtimg/fc19_16GB_vhostdbg.qcow2 \
>> 	-nographic \
>> 	-vga none \
>> 	-nodefaults \
>> 	-chardev stdio,id=id0,signal=off,mux=on \
>> 	-device spapr-vty,id=id1,chardev=id0,reg=0x71000100 \
>> 	-mon id=id2,chardev=id0,mode=readline \
>> 	-netdev
>> tap,id=id3,ifname=tap-id3,script=ifup.sh,downscript=ifdown.sh,vhost=on \
>> 	-device
>> virtio-net-pci,id=id4,netdev=id3,mac=C0:41:49:4b:00:00,tx=timer,ioeventfd=off,\
>> indirect_desc=off,event_idx=off,any_layout=off,csum=off,guest_csum=off,\
>> gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off,guest_ufo=off,\
>> host_tso4=off,host_tso6=off,host_ecn=off,host_ufo=off,mrg_rxbuf=off,\
>> status=off,ctrl_vq=off,ctrl_rx=off,ctrl_vlan=off,ctrl_rx_extra=off,\
>> ctrl_mac_addr=off,ctrl_guest_offloads=off,mq=off,multifunction=off,\
>> command_serr_enable=off \
>> 	-netdev user,id=id5,hostfwd=tcp::5000-:22 \
>> 	-device spapr-vlan,id=id6,netdev=id5,mac=C0:41:49:4b:00:01
>>
> 
> Yes this looks like some kind of race.


-- 
Alexey

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] vhost-net issue: does not survive reboot on ppc64
  2013-12-26 14:59                             ` Alexey Kardashevskiy
@ 2013-12-26 15:12                               ` Michael S. Tsirkin
  2013-12-27  1:44                                 ` Alexey Kardashevskiy
  0 siblings, 1 reply; 25+ messages in thread
From: Michael S. Tsirkin @ 2013-12-26 15:12 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: qemu-devel

On Fri, Dec 27, 2013 at 01:59:19AM +1100, Alexey Kardashevskiy wrote:
> On 12/27/2013 12:48 AM, Michael S. Tsirkin wrote:
> > On Thu, Dec 26, 2013 at 11:51:04PM +1100, Alexey Kardashevskiy wrote:
> >> On 12/26/2013 09:49 PM, Michael S. Tsirkin wrote:
> >>> On Thu, Dec 26, 2013 at 09:13:31PM +1100, Alexey Kardashevskiy wrote:
> >>>> On 12/25/2013 08:52 PM, Michael S. Tsirkin wrote:
> >>>>> On Wed, Dec 25, 2013 at 12:36:12PM +1100, Alexey Kardashevskiy wrote:
> >>>>>> On 12/25/2013 02:43 AM, Michael S. Tsirkin wrote:
> >>>>>>> On Wed, Dec 25, 2013 at 01:15:29AM +1100, Alexey Kardashevskiy wrote:
> >>>>>>>> On 12/24/2013 08:40 PM, Michael S. Tsirkin wrote:
> >>>>>>>>> On Tue, Dec 24, 2013 at 02:09:07PM +1100, Alexey Kardashevskiy wrote:
> >>>>>>>>>> On 12/24/2013 03:24 AM, Michael S. Tsirkin wrote:
> >>>>>>>>>>> On Mon, Dec 23, 2013 at 02:01:13AM +1100, Alexey Kardashevskiy wrote:
> >>>>>>>>>>>> On 12/23/2013 01:46 AM, Alexey Kardashevskiy wrote:
> >>>>>>>>>>>>> On 12/22/2013 09:56 PM, Michael S. Tsirkin wrote:
> >>>>>>>>>>>>>> On Sun, Dec 22, 2013 at 02:01:23AM +1100, Alexey Kardashevskiy wrote:
> >>>>>>>>>>>>>>> Hi!
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I am having a problem with virtio-net + vhost on POWER7 machine - it does
> >>>>>>>>>>>>>>> not survive reboot of the guest.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Steps to reproduce:
> >>>>>>>>>>>>>>> 1. boot the guest
> >>>>>>>>>>>>>>> 2. configure eth0 and do ping - everything works
> >>>>>>>>>>>>>>> 3. reboot the guest (i.e. type "reboot")
> >>>>>>>>>>>>>>> 4. when it is booted, eth0 can be configured but will not work at all.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> The test is:
> >>>>>>>>>>>>>>> ifconfig eth0 172.20.1.2 up
> >>>>>>>>>>>>>>> ping 172.20.1.23
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> If to run tcpdump on the host's "tap-id3" interface, it shows no trafic
> >>>>>>>>>>>>>>> coming from the guest. If to compare how it works before and after reboot,
> >>>>>>>>>>>>>>> I can see the guest doing an ARP request for 172.20.1.23 and receives the
> >>>>>>>>>>>>>>> response and it does the same after reboot but the answer does not come.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> So you see the arp packet in guest but not in host?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Yes.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> One thing to try is to boot debug kernel - where pr_debug is
> >>>>>>>>>>>>>> enabled - then you might see some errors in the kernel log.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Tried and added lot more debug printk myself, not clear at all what is
> >>>>>>>>>>>>> happening there.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> One more hint - if I boot the guest and the guest does not bring eth0 up
> >>>>>>>>>>>>> AND wait more than 200 seconds (and less than 210 seconds), then eth0 will
> >>>>>>>>>>>>> not work at all. I.e. this script produces not-working-eth0:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> ifconfig eth0 172.20.1.2 down
> >>>>>>>>>>>>> sleep 210
> >>>>>>>>>>>>> ifconfig eth0 172.20.1.2 up
> >>>>>>>>>>>>> ping 172.20.1.23
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> s/210/200/ - and it starts working. No reboot is required to reproduce.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> No "vhost" == always works. The only difference I can see here is vhost's
> >>>>>>>>>>>>> thread which may get suspended if not used for a while after the start and
> >>>>>>>>>>>>> does not wake up but this is almost a blind guess.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Yet another clue - this host kernel patch seems to help with the guest
> >>>>>>>>>>>> reboot but does not help with the initial 210 seconds delay:
> >>>>>>>>>>>>
> >>>>>>>>>>>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> >>>>>>>>>>>> index 69068e0..5e67650 100644
> >>>>>>>>>>>> --- a/drivers/vhost/vhost.c
> >>>>>>>>>>>> +++ b/drivers/vhost/vhost.c
> >>>>>>>>>>>> @@ -162,10 +162,10 @@ void vhost_work_queue(struct vhost_dev *dev, struct
> >>>>>>>>>>>> vhost_work *work)
> >>>>>>>>>>>>                 list_add_tail(&work->node, &dev->work_list);
> >>>>>>>>>>>>                 work->queue_seq++;
> >>>>>>>>>>>>                 spin_unlock_irqrestore(&dev->work_lock, flags);
> >>>>>>>>>>>> -               wake_up_process(dev->worker);
> >>>>>>>>>>>>         } else {
> >>>>>>>>>>>>                 spin_unlock_irqrestore(&dev->work_lock, flags);
> >>>>>>>>>>>>         }
> >>>>>>>>>>>> +       wake_up_process(dev->worker);
> >>>>>>>>>>>>  }
> >>>>>>>>>>>>  EXPORT_SYMBOL_GPL(vhost_work_queue);
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Interesting. Some kind of race? A missing memory barrier somewhere?
> >>>>>>>>>>
> >>>>>>>>>> I do not see how. I boot the guest and just wait 210 seconds, nothing
> >>>>>>>>>> happens to cause races.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> Since it's all around startup,
> >>>>>>>>>>> you can try kicking the host eventfd in
> >>>>>>>>>>> vhost_net_start.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> How exactly? This did not help. Thanks.
> >>>>>>>>>>
> >>>>>>>>>> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> >>>>>>>>>> index 006576d..407ecf2 100644
> >>>>>>>>>> --- a/hw/net/vhost_net.c
> >>>>>>>>>> +++ b/hw/net/vhost_net.c
> >>>>>>>>>> @@ -229,6 +229,17 @@ int vhost_net_start(VirtIODevice *dev, NetClientState
> >>>>>>>>>> *ncs,
> >>>>>>>>>>          if (r < 0) {
> >>>>>>>>>>              goto err;
> >>>>>>>>>>          }
> >>>>>>>>>> +
> >>>>>>>>>> +        VHostNetState *vn = tap_get_vhost_net(ncs[i].peer);
> >>>>>>>>>> +        struct vhost_vring_file file = {
> >>>>>>>>>> +            .index = i
> >>>>>>>>>> +        };
> >>>>>>>>>> +        file.fd =
> >>>>>>>>>> event_notifier_get_fd(virtio_queue_get_host_notifier(dev->vq));
> >>>>>>>>>> +        r = ioctl(vn->dev.control, VHOST_SET_VRING_KICK, &file);
> >>>>>>>>>
> >>>>>>>>> No, this sets the notifier, it does not kick.
> >>>>>>>>> To kick you write 1 there:
> >>>>>>>>> 	uint6_t  v = 1;
> >>>>>>>>> 	write(fd, &v, sizeof v);
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Please, be precise. How/where do I get that @fd? Is what I do correct?
> >>>>>>>
> >>>>>>> Yes.
> >>>>>>>
> >>>>>>>> What
> >>>>>>>> is uint6_t - uint8_t or uint16_t (neither works)?
> >>>>>>>
> >>>>>>> Sorry, should have been uint64_t.
> >>>>>>
> >>>>>>
> >>>>>> Oh, that I missed :-) Anyway, this does not make any difference. Is there
> >>>>>> any cheap&dirty way to make vhost-net kernel thread always awake? Sending
> >>>>>> it signals from the user space does not work...
> >>>>>
> >>>>> You can run a timer in qemu and signal the eventfd from there
> >>>>> periodically.
> >>>>>
> >>>>> Just to restate, tcpdump in guest shows that guest sends arp packet,
> >>>>> but tcpdump in host on tun device does not show any packets?
> >>>>
> >>>>
> >>>> Ok. Figured it out about disabling interfaces in Fedora19. I was wrong,
> >>>> something is happening on the host's TAP - the guest sends ARP request, the
> >>>> response is visible on the TAP interface but not in the guest.
> >>>
> >>> Okay. So problem is on host to guest path then.
> >>> Things to try:
> >>>
> >>> 1. trace handle_rx [vhost_net]
> >>> 2. trace tun_put_user [tun]
> >>> 3. I suspect some host bug in one of the features.
> >>> Let's try to disable some flags with device property:
> >>> you can get the list by doing:
> >>> ./x86_64-softmmu/qemu-system-x86_64 -device virtio-net-pci,?|grep on/off
> >>> Things I would try turning off is guest offloads (ones that start with guest_)
> >>> event_idx,any_layout,mq.
> >>> Turn them all off, if it helps try to find the one that helped.
> >>
> >>
> >> Heh. It still would be awesome to read basics about this vhost thing as I
> >> am debugging blindly :)
> >>
> >> Regarding your suggestions.
> >>
> >> 1. I put "printk" in handle_rx and tun_put_user.
> > 
> > Fine, though it's easier with ftrace  http://lwn.net/Articles/370423/
> > look for function filtering.
> > 
> >> handle_rx stopped being called after 2:40 from the guest start,
> >> tun_put_user stopped after 0:20 from the guest start. Accuracy is 5 seconds.
> >> If I bring the guest's eth0 up while handle_rx is still printing, it works,
> >> i.e. tun_put_user is called a lot. Once handle_rx stopped, nothing can
> >> bring eth0 back to live.
> > 
> > OK so what should happen is that handle rx is called
> > when you bring eth0 up.
> > Do you see this?
> > The way it is supposed to work is this:
> > 
> > vhost_net_enable_vq calls vhost_poll_start then
> 
> 
> This and what follows it is called when QEMU is just booting (in response
> to PCI enable? somewhere in the middle of PCI discovery process) and then
> VHOST_NET_SET_BACKEND is not called ever again.
> 

What should happen is that up/down in the guest
will call virtio_net_vhost_status() in qemu,
and then vhost_net_start()/vhost_net_stop() is called
accordingly. These call the VHOST_NET_SET_BACKEND ioctl.

You don't see this?

> 
> > this calls mask = file->f_op->poll(file, &poll->table)
> > on the tun file.
> > this calls tun_chr_poll.
> > at this point there are packets queued on tun already
> > so that returns POLLIN | POLLRDNORM;
> > this calls vhost_poll_wakeup and that checks mask against
> > the key.
> > key is POLLIN so vhost_poll_queue is called.
> > this in turn calls vhost_work_queue
> > work list is either empty then we wake up worker
> > or it's not empty  then worker is running out job anyway.
> > this will then invoke handle_rx_net.
> > 
> > 
> >> 2. This is exactly how I run QEMU now. I basically set "off" for every
> >> on/off parameters. This did not change anything.
> >>
> >> ./qemu-system-ppc64 \
> >> 	-enable-kvm \
> >> 	-m 2048 \
> >> 	-L qemu-ppc64-bios/ \
> >> 	-machine pseries \
> >> 	-trace events=qemu_trace_events \
> >> 	-kernel vml312 \
> >> 	-append root=/dev/sda3 virtimg/fc19_16GB_vhostdbg.qcow2 \
> >> 	-nographic \
> >> 	-vga none \
> >> 	-nodefaults \
> >> 	-chardev stdio,id=id0,signal=off,mux=on \
> >> 	-device spapr-vty,id=id1,chardev=id0,reg=0x71000100 \
> >> 	-mon id=id2,chardev=id0,mode=readline \
> >> 	-netdev
> >> tap,id=id3,ifname=tap-id3,script=ifup.sh,downscript=ifdown.sh,vhost=on \
> >> 	-device
> >> virtio-net-pci,id=id4,netdev=id3,mac=C0:41:49:4b:00:00,tx=timer,ioeventfd=off,\
> >> indirect_desc=off,event_idx=off,any_layout=off,csum=off,guest_csum=off,\
> >> gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off,guest_ufo=off,\
> >> host_tso4=off,host_tso6=off,host_ecn=off,host_ufo=off,mrg_rxbuf=off,\
> >> status=off,ctrl_vq=off,ctrl_rx=off,ctrl_vlan=off,ctrl_rx_extra=off,\
> >> ctrl_mac_addr=off,ctrl_guest_offloads=off,mq=off,multifunction=off,\
> >> command_serr_enable=off \
> >> 	-netdev user,id=id5,hostfwd=tcp::5000-:22 \
> >> 	-device spapr-vlan,id=id6,netdev=id5,mac=C0:41:49:4b:00:01
> >>
> > 
> > Yes this looks like some kind of race.
> 
> 
> -- 
> Alexey

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] vhost-net issue: does not survive reboot on ppc64
  2013-12-26 15:12                               ` Michael S. Tsirkin
@ 2013-12-27  1:44                                 ` Alexey Kardashevskiy
  2014-01-06  9:57                                   ` Alexey Kardashevskiy
  0 siblings, 1 reply; 25+ messages in thread
From: Alexey Kardashevskiy @ 2013-12-27  1:44 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

On 12/27/2013 02:12 AM, Michael S. Tsirkin wrote:
> On Fri, Dec 27, 2013 at 01:59:19AM +1100, Alexey Kardashevskiy wrote:
>> On 12/27/2013 12:48 AM, Michael S. Tsirkin wrote:
>>> On Thu, Dec 26, 2013 at 11:51:04PM +1100, Alexey Kardashevskiy wrote:
>>>> On 12/26/2013 09:49 PM, Michael S. Tsirkin wrote:
>>>>> On Thu, Dec 26, 2013 at 09:13:31PM +1100, Alexey Kardashevskiy wrote:
>>>>>> On 12/25/2013 08:52 PM, Michael S. Tsirkin wrote:
>>>>>>> On Wed, Dec 25, 2013 at 12:36:12PM +1100, Alexey Kardashevskiy wrote:
>>>>>>>> On 12/25/2013 02:43 AM, Michael S. Tsirkin wrote:
>>>>>>>>> On Wed, Dec 25, 2013 at 01:15:29AM +1100, Alexey Kardashevskiy wrote:
>>>>>>>>>> On 12/24/2013 08:40 PM, Michael S. Tsirkin wrote:
>>>>>>>>>>> On Tue, Dec 24, 2013 at 02:09:07PM +1100, Alexey Kardashevskiy wrote:
>>>>>>>>>>>> On 12/24/2013 03:24 AM, Michael S. Tsirkin wrote:
>>>>>>>>>>>>> On Mon, Dec 23, 2013 at 02:01:13AM +1100, Alexey Kardashevskiy wrote:
>>>>>>>>>>>>>> On 12/23/2013 01:46 AM, Alexey Kardashevskiy wrote:
>>>>>>>>>>>>>>> On 12/22/2013 09:56 PM, Michael S. Tsirkin wrote:
>>>>>>>>>>>>>>>> On Sun, Dec 22, 2013 at 02:01:23AM +1100, Alexey Kardashevskiy wrote:
>>>>>>>>>>>>>>>>> Hi!
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I am having a problem with virtio-net + vhost on POWER7 machine - it does
>>>>>>>>>>>>>>>>> not survive reboot of the guest.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Steps to reproduce:
>>>>>>>>>>>>>>>>> 1. boot the guest
>>>>>>>>>>>>>>>>> 2. configure eth0 and do ping - everything works
>>>>>>>>>>>>>>>>> 3. reboot the guest (i.e. type "reboot")
>>>>>>>>>>>>>>>>> 4. when it is booted, eth0 can be configured but will not work at all.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The test is:
>>>>>>>>>>>>>>>>> ifconfig eth0 172.20.1.2 up
>>>>>>>>>>>>>>>>> ping 172.20.1.23
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> If to run tcpdump on the host's "tap-id3" interface, it shows no trafic
>>>>>>>>>>>>>>>>> coming from the guest. If to compare how it works before and after reboot,
>>>>>>>>>>>>>>>>> I can see the guest doing an ARP request for 172.20.1.23 and receives the
>>>>>>>>>>>>>>>>> response and it does the same after reboot but the answer does not come.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> So you see the arp packet in guest but not in host?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yes.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> One thing to try is to boot debug kernel - where pr_debug is
>>>>>>>>>>>>>>>> enabled - then you might see some errors in the kernel log.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Tried and added lot more debug printk myself, not clear at all what is
>>>>>>>>>>>>>>> happening there.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> One more hint - if I boot the guest and the guest does not bring eth0 up
>>>>>>>>>>>>>>> AND wait more than 200 seconds (and less than 210 seconds), then eth0 will
>>>>>>>>>>>>>>> not work at all. I.e. this script produces not-working-eth0:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ifconfig eth0 172.20.1.2 down
>>>>>>>>>>>>>>> sleep 210
>>>>>>>>>>>>>>> ifconfig eth0 172.20.1.2 up
>>>>>>>>>>>>>>> ping 172.20.1.23
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> s/210/200/ - and it starts working. No reboot is required to reproduce.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> No "vhost" == always works. The only difference I can see here is vhost's
>>>>>>>>>>>>>>> thread which may get suspended if not used for a while after the start and
>>>>>>>>>>>>>>> does not wake up but this is almost a blind guess.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yet another clue - this host kernel patch seems to help with the guest
>>>>>>>>>>>>>> reboot but does not help with the initial 210 seconds delay:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>>>>>>>>>>>>>> index 69068e0..5e67650 100644
>>>>>>>>>>>>>> --- a/drivers/vhost/vhost.c
>>>>>>>>>>>>>> +++ b/drivers/vhost/vhost.c
>>>>>>>>>>>>>> @@ -162,10 +162,10 @@ void vhost_work_queue(struct vhost_dev *dev, struct
>>>>>>>>>>>>>> vhost_work *work)
>>>>>>>>>>>>>>                 list_add_tail(&work->node, &dev->work_list);
>>>>>>>>>>>>>>                 work->queue_seq++;
>>>>>>>>>>>>>>                 spin_unlock_irqrestore(&dev->work_lock, flags);
>>>>>>>>>>>>>> -               wake_up_process(dev->worker);
>>>>>>>>>>>>>>         } else {
>>>>>>>>>>>>>>                 spin_unlock_irqrestore(&dev->work_lock, flags);
>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>> +       wake_up_process(dev->worker);
>>>>>>>>>>>>>>  }
>>>>>>>>>>>>>>  EXPORT_SYMBOL_GPL(vhost_work_queue);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Interesting. Some kind of race? A missing memory barrier somewhere?
>>>>>>>>>>>>
>>>>>>>>>>>> I do not see how. I boot the guest and just wait 210 seconds, nothing
>>>>>>>>>>>> happens to cause races.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Since it's all around startup,
>>>>>>>>>>>>> you can try kicking the host eventfd in
>>>>>>>>>>>>> vhost_net_start.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> How exactly? This did not help. Thanks.
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
>>>>>>>>>>>> index 006576d..407ecf2 100644
>>>>>>>>>>>> --- a/hw/net/vhost_net.c
>>>>>>>>>>>> +++ b/hw/net/vhost_net.c
>>>>>>>>>>>> @@ -229,6 +229,17 @@ int vhost_net_start(VirtIODevice *dev, NetClientState
>>>>>>>>>>>> *ncs,
>>>>>>>>>>>>          if (r < 0) {
>>>>>>>>>>>>              goto err;
>>>>>>>>>>>>          }
>>>>>>>>>>>> +
>>>>>>>>>>>> +        VHostNetState *vn = tap_get_vhost_net(ncs[i].peer);
>>>>>>>>>>>> +        struct vhost_vring_file file = {
>>>>>>>>>>>> +            .index = i
>>>>>>>>>>>> +        };
>>>>>>>>>>>> +        file.fd =
>>>>>>>>>>>> event_notifier_get_fd(virtio_queue_get_host_notifier(dev->vq));
>>>>>>>>>>>> +        r = ioctl(vn->dev.control, VHOST_SET_VRING_KICK, &file);
>>>>>>>>>>>
>>>>>>>>>>> No, this sets the notifier, it does not kick.
>>>>>>>>>>> To kick you write 1 there:
>>>>>>>>>>> 	uint6_t  v = 1;
>>>>>>>>>>> 	write(fd, &v, sizeof v);
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Please, be precise. How/where do I get that @fd? Is what I do correct?
>>>>>>>>>
>>>>>>>>> Yes.
>>>>>>>>>
>>>>>>>>>> What
>>>>>>>>>> is uint6_t - uint8_t or uint16_t (neither works)?
>>>>>>>>>
>>>>>>>>> Sorry, should have been uint64_t.
>>>>>>>>
>>>>>>>>
>>>>>>>> Oh, that I missed :-) Anyway, this does not make any difference. Is there
>>>>>>>> any cheap&dirty way to make vhost-net kernel thread always awake? Sending
>>>>>>>> it signals from the user space does not work...
>>>>>>>
>>>>>>> You can run a timer in qemu and signal the eventfd from there
>>>>>>> periodically.
>>>>>>>
>>>>>>> Just to restate, tcpdump in guest shows that guest sends arp packet,
>>>>>>> but tcpdump in host on tun device does not show any packets?
>>>>>>
>>>>>>
>>>>>> Ok. Figured it out about disabling interfaces in Fedora19. I was wrong,
>>>>>> something is happening on the host's TAP - the guest sends ARP request, the
>>>>>> response is visible on the TAP interface but not in the guest.
>>>>>
>>>>> Okay. So problem is on host to guest path then.
>>>>> Things to try:
>>>>>
>>>>> 1. trace handle_rx [vhost_net]
>>>>> 2. trace tun_put_user [tun]
>>>>> 3. I suspect some host bug in one of the features.
>>>>> Let's try to disable some flags with device property:
>>>>> you can get the list by doing:
>>>>> ./x86_64-softmmu/qemu-system-x86_64 -device virtio-net-pci,?|grep on/off
>>>>> Things I would try turning off is guest offloads (ones that start with guest_)
>>>>> event_idx,any_layout,mq.
>>>>> Turn them all off, if it helps try to find the one that helped.
>>>>
>>>>
>>>> Heh. It still would be awesome to read basics about this vhost thing as I
>>>> am debugging blindly :)
>>>>
>>>> Regarding your suggestions.
>>>>
>>>> 1. I put "printk" in handle_rx and tun_put_user.
>>>
>>> Fine, though it's easier with ftrace  http://lwn.net/Articles/370423/
>>> look for function filtering.
>>>
>>>> handle_rx stopped being called after 2:40 from the guest start,
>>>> tun_put_user stopped after 0:20 from the guest start. Accuracy is 5 seconds.
>>>> If I bring the guest's eth0 up while handle_rx is still printing, it works,
>>>> i.e. tun_put_user is called a lot. Once handle_rx stopped, nothing can
>>>> bring eth0 back to live.
>>>
>>> OK so what should happen is that handle rx is called
>>> when you bring eth0 up.
>>> Do you see this?
>>> The way it is supposed to work is this:
>>>
>>> vhost_net_enable_vq calls vhost_poll_start then
>>
>>
>> This and what follows it is called when QEMU is just booting (in response
>> to PCI enable? somewhere in the middle of PCI discovery process) and then
>> VHOST_NET_SET_BACKEND is not called ever again.
>>
> 
> What should happen is up/down in guest
> will call virtio_net_vhost_status in qemu
> and then vhost_net_start/vhost_net_stop is called
> accordingly.
> These call VHOST_NET_SET_BACKEND ioctls
> 
> you don't see this?


Nope. What I see is that vhost_net_start() is only called on the
VIRTIO_PCI_STATUS write and never again after that, as the PCI status
does not change (does it?).

The log of QEMU + gdb with some breakpoints:
http://pastebin.com/CSN6iSn6

In this example I did not wait the ~240 seconds, so it works, but it still
does not print what you say it should print :)

Here is what I run:
http://ozlabs.ru/gitweb/?p=qemu/.git;a=shortlog;h=refs/heads/vhostdbg

Thanks!

[ time to go to the ocean :) ]


>>
>>> this calls mask = file->f_op->poll(file, &poll->table)
>>> on the tun file.
>>> this calls tun_chr_poll.
>>> at this point there are packets queued on tun already
>>> so that returns POLLIN | POLLRDNORM;
>>> this calls vhost_poll_wakeup and that checks mask against
>>> the key.
>>> key is POLLIN so vhost_poll_queue is called.
>>> this in turn calls vhost_work_queue
>>> work list is either empty then we wake up worker
>>> or it's not empty  then worker is running out job anyway.
>>> this will then invoke handle_rx_net.
>>>
>>>
>>>> 2. This is exactly how I run QEMU now. I basically set "off" for every
>>>> on/off parameters. This did not change anything.
>>>>
>>>> ./qemu-system-ppc64 \
>>>> 	-enable-kvm \
>>>> 	-m 2048 \
>>>> 	-L qemu-ppc64-bios/ \
>>>> 	-machine pseries \
>>>> 	-trace events=qemu_trace_events \
>>>> 	-kernel vml312 \
>>>> 	-append root=/dev/sda3 virtimg/fc19_16GB_vhostdbg.qcow2 \
>>>> 	-nographic \
>>>> 	-vga none \
>>>> 	-nodefaults \
>>>> 	-chardev stdio,id=id0,signal=off,mux=on \
>>>> 	-device spapr-vty,id=id1,chardev=id0,reg=0x71000100 \
>>>> 	-mon id=id2,chardev=id0,mode=readline \
>>>> 	-netdev
>>>> tap,id=id3,ifname=tap-id3,script=ifup.sh,downscript=ifdown.sh,vhost=on \
>>>> 	-device
>>>> virtio-net-pci,id=id4,netdev=id3,mac=C0:41:49:4b:00:00,tx=timer,ioeventfd=off,\
>>>> indirect_desc=off,event_idx=off,any_layout=off,csum=off,guest_csum=off,\
>>>> gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off,guest_ufo=off,\
>>>> host_tso4=off,host_tso6=off,host_ecn=off,host_ufo=off,mrg_rxbuf=off,\
>>>> status=off,ctrl_vq=off,ctrl_rx=off,ctrl_vlan=off,ctrl_rx_extra=off,\
>>>> ctrl_mac_addr=off,ctrl_guest_offloads=off,mq=off,multifunction=off,\
>>>> command_serr_enable=off \
>>>> 	-netdev user,id=id5,hostfwd=tcp::5000-:22 \
>>>> 	-device spapr-vlan,id=id6,netdev=id5,mac=C0:41:49:4b:00:01
>>>>
>>>
>>> Yes this looks like some kind of race.
>>
>>
>> -- 
>> Alexey


-- 
Alexey

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] vhost-net issue: does not survive reboot on ppc64
  2013-12-27  1:44                                 ` Alexey Kardashevskiy
@ 2014-01-06  9:57                                   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 25+ messages in thread
From: Alexey Kardashevskiy @ 2014-01-06  9:57 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

On 12/27/2013 12:44 PM, Alexey Kardashevskiy wrote:
> On 12/27/2013 02:12 AM, Michael S. Tsirkin wrote:
>> On Fri, Dec 27, 2013 at 01:59:19AM +1100, Alexey Kardashevskiy wrote:
>>> On 12/27/2013 12:48 AM, Michael S. Tsirkin wrote:
>>>> On Thu, Dec 26, 2013 at 11:51:04PM +1100, Alexey Kardashevskiy wrote:
>>>>> On 12/26/2013 09:49 PM, Michael S. Tsirkin wrote:
>>>>>> On Thu, Dec 26, 2013 at 09:13:31PM +1100, Alexey Kardashevskiy wrote:
>>>>>>> On 12/25/2013 08:52 PM, Michael S. Tsirkin wrote:
>>>>>>>> On Wed, Dec 25, 2013 at 12:36:12PM +1100, Alexey Kardashevskiy wrote:
>>>>>>>>> On 12/25/2013 02:43 AM, Michael S. Tsirkin wrote:
>>>>>>>>>> On Wed, Dec 25, 2013 at 01:15:29AM +1100, Alexey Kardashevskiy wrote:
>>>>>>>>>>> On 12/24/2013 08:40 PM, Michael S. Tsirkin wrote:
>>>>>>>>>>>> On Tue, Dec 24, 2013 at 02:09:07PM +1100, Alexey Kardashevskiy wrote:
>>>>>>>>>>>>> On 12/24/2013 03:24 AM, Michael S. Tsirkin wrote:
>>>>>>>>>>>>>> On Mon, Dec 23, 2013 at 02:01:13AM +1100, Alexey Kardashevskiy wrote:
>>>>>>>>>>>>>>> On 12/23/2013 01:46 AM, Alexey Kardashevskiy wrote:
>>>>>>>>>>>>>>>> On 12/22/2013 09:56 PM, Michael S. Tsirkin wrote:
>>>>>>>>>>>>>>>>> On Sun, Dec 22, 2013 at 02:01:23AM +1100, Alexey Kardashevskiy wrote:
>>>>>>>>>>>>>>>>>> Hi!
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I am having a problem with virtio-net + vhost on POWER7 machine - it does
>>>>>>>>>>>>>>>>>> not survive reboot of the guest.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Steps to reproduce:
>>>>>>>>>>>>>>>>>> 1. boot the guest
>>>>>>>>>>>>>>>>>> 2. configure eth0 and do ping - everything works
>>>>>>>>>>>>>>>>>> 3. reboot the guest (i.e. type "reboot")
>>>>>>>>>>>>>>>>>> 4. when it is booted, eth0 can be configured but will not work at all.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The test is:
>>>>>>>>>>>>>>>>>> ifconfig eth0 172.20.1.2 up
>>>>>>>>>>>>>>>>>> ping 172.20.1.23
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> If to run tcpdump on the host's "tap-id3" interface, it shows no trafic
>>>>>>>>>>>>>>>>>> coming from the guest. If to compare how it works before and after reboot,
>>>>>>>>>>>>>>>>>> I can see the guest doing an ARP request for 172.20.1.23 and receives the
>>>>>>>>>>>>>>>>>> response and it does the same after reboot but the answer does not come.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> So you see the arp packet in guest but not in host?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yes.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> One thing to try is to boot debug kernel - where pr_debug is
>>>>>>>>>>>>>>>>> enabled - then you might see some errors in the kernel log.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Tried and added lot more debug printk myself, not clear at all what is
>>>>>>>>>>>>>>>> happening there.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> One more hint - if I boot the guest and the guest does not bring eth0 up
>>>>>>>>>>>>>>>> AND wait more than 200 seconds (and less than 210 seconds), then eth0 will
>>>>>>>>>>>>>>>> not work at all. I.e. this script produces not-working-eth0:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ifconfig eth0 172.20.1.2 down
>>>>>>>>>>>>>>>> sleep 210
>>>>>>>>>>>>>>>> ifconfig eth0 172.20.1.2 up
>>>>>>>>>>>>>>>> ping 172.20.1.23
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> s/210/200/ - and it starts working. No reboot is required to reproduce.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> No "vhost" == always works. The only difference I can see here is vhost's
>>>>>>>>>>>>>>>> thread which may get suspended if not used for a while after the start and
>>>>>>>>>>>>>>>> does not wake up but this is almost a blind guess.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yet another clue - this host kernel patch seems to help with the guest
>>>>>>>>>>>>>>> reboot but does not help with the initial 210 seconds delay:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>>>>>>>>>>>>>>> index 69068e0..5e67650 100644
>>>>>>>>>>>>>>> --- a/drivers/vhost/vhost.c
>>>>>>>>>>>>>>> +++ b/drivers/vhost/vhost.c
>>>>>>>>>>>>>>> @@ -162,10 +162,10 @@ void vhost_work_queue(struct vhost_dev *dev, struct
>>>>>>>>>>>>>>> vhost_work *work)
>>>>>>>>>>>>>>>                 list_add_tail(&work->node, &dev->work_list);
>>>>>>>>>>>>>>>                 work->queue_seq++;
>>>>>>>>>>>>>>>                 spin_unlock_irqrestore(&dev->work_lock, flags);
>>>>>>>>>>>>>>> -               wake_up_process(dev->worker);
>>>>>>>>>>>>>>>         } else {
>>>>>>>>>>>>>>>                 spin_unlock_irqrestore(&dev->work_lock, flags);
>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>> +       wake_up_process(dev->worker);
>>>>>>>>>>>>>>>  }
>>>>>>>>>>>>>>>  EXPORT_SYMBOL_GPL(vhost_work_queue);
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Interesting. Some kind of race? A missing memory barrier somewhere?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I do not see how. I boot the guest and just wait 210 seconds, nothing
>>>>>>>>>>>>> happens to cause races.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Since it's all around startup,
>>>>>>>>>>>>>> you can try kicking the host eventfd in
>>>>>>>>>>>>>> vhost_net_start.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> How exactly? This did not help. Thanks.
>>>>>>>>>>>>>
>>>>>>>>>>>>> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
>>>>>>>>>>>>> index 006576d..407ecf2 100644
>>>>>>>>>>>>> --- a/hw/net/vhost_net.c
>>>>>>>>>>>>> +++ b/hw/net/vhost_net.c
>>>>>>>>>>>>> @@ -229,6 +229,17 @@ int vhost_net_start(VirtIODevice *dev, NetClientState
>>>>>>>>>>>>> *ncs,
>>>>>>>>>>>>>          if (r < 0) {
>>>>>>>>>>>>>              goto err;
>>>>>>>>>>>>>          }
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +        VHostNetState *vn = tap_get_vhost_net(ncs[i].peer);
>>>>>>>>>>>>> +        struct vhost_vring_file file = {
>>>>>>>>>>>>> +            .index = i
>>>>>>>>>>>>> +        };
>>>>>>>>>>>>> +        file.fd =
>>>>>>>>>>>>> event_notifier_get_fd(virtio_queue_get_host_notifier(dev->vq));
>>>>>>>>>>>>> +        r = ioctl(vn->dev.control, VHOST_SET_VRING_KICK, &file);
>>>>>>>>>>>>
>>>>>>>>>>>> No, this sets the notifier, it does not kick.
>>>>>>>>>>>> To kick you write 1 there:
>>>>>>>>>>>> 	uint6_t  v = 1;
>>>>>>>>>>>> 	write(fd, &v, sizeof v);
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Please, be precise. How/where do I get that @fd? Is what I do correct?
>>>>>>>>>>
>>>>>>>>>> Yes.
>>>>>>>>>>
>>>>>>>>>>> What
>>>>>>>>>>> is uint6_t - uint8_t or uint16_t (neither works)?
>>>>>>>>>>
>>>>>>>>>> Sorry, should have been uint64_t.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Oh, that I missed :-) Anyway, this does not make any difference. Is there
>>>>>>>>> any cheap&dirty way to make vhost-net kernel thread always awake? Sending
>>>>>>>>> it signals from the user space does not work...
>>>>>>>>
>>>>>>>> You can run a timer in qemu and signal the eventfd from there
>>>>>>>> periodically.
>>>>>>>>
>>>>>>>> Just to restate, tcpdump in guest shows that guest sends arp packet,
>>>>>>>> but tcpdump in host on tun device does not show any packets?
>>>>>>>
>>>>>>>
>>>>>>> Ok. Figured it out about disabling interfaces in Fedora19. I was wrong,
>>>>>>> something is happening on the host's TAP - the guest sends ARP request, the
>>>>>>> response is visible on the TAP interface but not in the guest.
>>>>>>
>>>>>> Okay. So problem is on host to guest path then.
>>>>>> Things to try:
>>>>>>
>>>>>> 1. trace handle_rx [vhost_net]
>>>>>> 2. trace tun_put_user [tun]
>>>>>> 3. I suspect some host bug in one of the features.
>>>>>> Let's try to disable some flags with device property:
>>>>>> you can get the list by doing:
>>>>>> ./x86_64-softmmu/qemu-system-x86_64 -device virtio-net-pci,?|grep on/off
>>>>>> Things I would try turning off is guest offloads (ones that start with guest_)
>>>>>> event_idx,any_layout,mq.
>>>>>> Turn them all off, if it helps try to find the one that helped.
>>>>>
>>>>>
>>>>> Heh. It still would be awesome to read basics about this vhost thing as I
>>>>> am debugging blindly :)
>>>>>
>>>>> Regarding your suggestions.
>>>>>
>>>>> 1. I put "printk" in handle_rx and tun_put_user.
>>>>
>>>> Fine, though it's easier with ftrace  http://lwn.net/Articles/370423/
>>>> look for function filtering.
>>>>
>>>>> handle_rx stopped being called after 2:40 from the guest start,
>>>>> tun_put_user stopped after 0:20 from the guest start. Accuracy is 5 seconds.
>>>>> If I bring the guest's eth0 up while handle_rx is still printing, it works,
>>>>> i.e. tun_put_user is called a lot. Once handle_rx stopped, nothing can
>>>>> bring eth0 back to live.
>>>>
>>>> OK so what should happen is that handle rx is called
>>>> when you bring eth0 up.
>>>> Do you see this?
>>>> The way it is supposed to work is this:
>>>>
>>>> vhost_net_enable_vq calls vhost_poll_start then
>>>
>>>
>>> This and what follows it is called when QEMU is just booting (in response
>>> to PCI enable? somewhere in the middle of PCI discovery process) and then
>>> VHOST_NET_SET_BACKEND is not called ever again.
>>>
>>
>> What should happen is up/down in guest
>> will call virtio_net_vhost_status in qemu
>> and then vhost_net_start/vhost_net_stop is called
>> accordingly.
>> These call VHOST_NET_SET_BACKEND ioctls
>>
>> you don't see this?
> 
> 
> Nope. What I see is that vhost_net_start is only called on
> VIRTIO_PCI_STATUS and never after that as PCI status does not change (does
> not it?).
> 
> The log of QEMU + gdb with some breakpoints:
> http://pastebin.com/CSN6iSn6
> 
> In this example, I did not wait ~240 seconds so it works but still does not
> print what you say it should print :)
> 
> Here is what I run:
> http://ozlabs.ru/gitweb/?p=qemu/.git;a=shortlog;h=refs/heads/vhostdbg
> 
> Thanks!
> 
> [ time to go to the ocean :) ]


I am back. Are you? :)

Looked a bit further. In the guest's virtnet_set_rx_mode()
(drivers/net/virtio_net.c) I added this:

===
struct scatterlist sg;
struct virtio_net_ctrl_mq s;

/* ask the device for a single queue pair via the control virtqueue */
s.virtqueue_pairs = 1;
sg_init_one(&sg, &s, sizeof(s));
virtnet_send_command(vi, VIRTIO_NET_CTRL_MQ,
          VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET, &sg, NULL);
===

.. in a desperate hope that it would signal QEMU to stop vhost in
virtio_net_vhost_status(). But it does not call vhost_net_stop() because
the link is up - it has been up since virtnet_probe() and it never goes
down, so I guess this is by design.


So... the vhost-net thread in the host goes to sleep and there is no way to
wake it up from the guest, as "ifconfig eth0 down ; ifconfig eth0 up"
changes neither the link status nor @VirtIODevice::status.


What would be the right thing to do now? Implement link state management?
Or invent a "virtio link" and leave QEMU's nc->peer->link_down alone?

Or there is some way to tell the kernel thread not to sleep?

Thanks!



> 
> 
>>>
>>>> this calls mask = file->f_op->poll(file, &poll->table)
>>>> on the tun file.
>>>> this calls tun_chr_poll.
>>>> at this point there are packets queued on tun already
>>>> so that returns POLLIN | POLLRDNORM;
>>>> this calls vhost_poll_wakeup and that checks mask against
>>>> the key.
>>>> key is POLLIN so vhost_poll_queue is called.
>>>> this in turn calls vhost_work_queue
>>>> work list is either empty then we wake up worker
>>>> or it's not empty  then worker is running out job anyway.
>>>> this will then invoke handle_rx_net.
>>>>
>>>>
>>>>> 2. This is exactly how I run QEMU now. I basically set "off" for every
>>>>> on/off parameters. This did not change anything.
>>>>>
>>>>> ./qemu-system-ppc64 \
>>>>> 	-enable-kvm \
>>>>> 	-m 2048 \
>>>>> 	-L qemu-ppc64-bios/ \
>>>>> 	-machine pseries \
>>>>> 	-trace events=qemu_trace_events \
>>>>> 	-kernel vml312 \
>>>>> 	-append root=/dev/sda3 virtimg/fc19_16GB_vhostdbg.qcow2 \
>>>>> 	-nographic \
>>>>> 	-vga none \
>>>>> 	-nodefaults \
>>>>> 	-chardev stdio,id=id0,signal=off,mux=on \
>>>>> 	-device spapr-vty,id=id1,chardev=id0,reg=0x71000100 \
>>>>> 	-mon id=id2,chardev=id0,mode=readline \
>>>>> 	-netdev
>>>>> tap,id=id3,ifname=tap-id3,script=ifup.sh,downscript=ifdown.sh,vhost=on \
>>>>> 	-device
>>>>> virtio-net-pci,id=id4,netdev=id3,mac=C0:41:49:4b:00:00,tx=timer,ioeventfd=off,\
>>>>> indirect_desc=off,event_idx=off,any_layout=off,csum=off,guest_csum=off,\
>>>>> gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off,guest_ufo=off,\
>>>>> host_tso4=off,host_tso6=off,host_ecn=off,host_ufo=off,mrg_rxbuf=off,\
>>>>> status=off,ctrl_vq=off,ctrl_rx=off,ctrl_vlan=off,ctrl_rx_extra=off,\
>>>>> ctrl_mac_addr=off,ctrl_guest_offloads=off,mq=off,multifunction=off,\
>>>>> command_serr_enable=off \
>>>>> 	-netdev user,id=id5,hostfwd=tcp::5000-:22 \
>>>>> 	-device spapr-vlan,id=id6,netdev=id5,mac=C0:41:49:4b:00:01
>>>>>
>>>>
>>>> Yes this looks like some kind of race.
>>>
>>>
>>> -- 
>>> Alexey
> 
> 


-- 
Alexey

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] vhost-net issue: does not survive reboot on ppc64
  2013-12-24 15:43               ` Michael S. Tsirkin
  2013-12-25  1:36                 ` Alexey Kardashevskiy
@ 2014-01-07 13:18                 ` Alexey Kardashevskiy
  2014-01-10  5:13                   ` Alexey Kardashevskiy
  1 sibling, 1 reply; 25+ messages in thread
From: Alexey Kardashevskiy @ 2014-01-07 13:18 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

On 12/25/2013 02:43 AM, Michael S. Tsirkin wrote:
> On Wed, Dec 25, 2013 at 01:15:29AM +1100, Alexey Kardashevskiy wrote:
>> On 12/24/2013 08:40 PM, Michael S. Tsirkin wrote:
>>> On Tue, Dec 24, 2013 at 02:09:07PM +1100, Alexey Kardashevskiy wrote:
>>>> On 12/24/2013 03:24 AM, Michael S. Tsirkin wrote:
>>>>> On Mon, Dec 23, 2013 at 02:01:13AM +1100, Alexey Kardashevskiy wrote:
>>>>>> On 12/23/2013 01:46 AM, Alexey Kardashevskiy wrote:
>>>>>>> On 12/22/2013 09:56 PM, Michael S. Tsirkin wrote:
>>>>>>>> On Sun, Dec 22, 2013 at 02:01:23AM +1100, Alexey Kardashevskiy wrote:
>>>>>>>>> Hi!
>>>>>>>>>
>>>>>>>>> I am having a problem with virtio-net + vhost on POWER7 machine - it does
>>>>>>>>> not survive reboot of the guest.
>>>>>>>>>
>>>>>>>>> Steps to reproduce:
>>>>>>>>> 1. boot the guest
>>>>>>>>> 2. configure eth0 and do ping - everything works
>>>>>>>>> 3. reboot the guest (i.e. type "reboot")
>>>>>>>>> 4. when it is booted, eth0 can be configured but will not work at all.
>>>>>>>>>
>>>>>>>>> The test is:
>>>>>>>>> ifconfig eth0 172.20.1.2 up
>>>>>>>>> ping 172.20.1.23
>>>>>>>>>
>>>>>>>>> If to run tcpdump on the host's "tap-id3" interface, it shows no trafic
>>>>>>>>> coming from the guest. If to compare how it works before and after reboot,
>>>>>>>>> I can see the guest doing an ARP request for 172.20.1.23 and receives the
>>>>>>>>> response and it does the same after reboot but the answer does not come.
>>>>>>>>
>>>>>>>> So you see the arp packet in guest but not in host?
>>>>>>>
>>>>>>> Yes.
>>>>>>>
>>>>>>>
>>>>>>>> One thing to try is to boot debug kernel - where pr_debug is
>>>>>>>> enabled - then you might see some errors in the kernel log.
>>>>>>>
>>>>>>> Tried and added lot more debug printk myself, not clear at all what is
>>>>>>> happening there.
>>>>>>>
>>>>>>> One more hint - if I boot the guest and the guest does not bring eth0 up
>>>>>>> AND wait more than 200 seconds (and less than 210 seconds), then eth0 will
>>>>>>> not work at all. I.e. this script produces not-working-eth0:
>>>>>>>
>>>>>>>
>>>>>>> ifconfig eth0 172.20.1.2 down
>>>>>>> sleep 210
>>>>>>> ifconfig eth0 172.20.1.2 up
>>>>>>> ping 172.20.1.23
>>>>>>>
>>>>>>> s/210/200/ - and it starts working. No reboot is required to reproduce.
>>>>>>>
>>>>>>> No "vhost" == always works. The only difference I can see here is vhost's
>>>>>>> thread which may get suspended if not used for a while after the start and
>>>>>>> does not wake up but this is almost a blind guess.
>>>>>>
>>>>>>
>>>>>> Yet another clue - this host kernel patch seems to help with the guest
>>>>>> reboot but does not help with the initial 210 seconds delay:
>>>>>>
>>>>>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>>>>>> index 69068e0..5e67650 100644
>>>>>> --- a/drivers/vhost/vhost.c
>>>>>> +++ b/drivers/vhost/vhost.c
>>>>>> @@ -162,10 +162,10 @@ void vhost_work_queue(struct vhost_dev *dev, struct
>>>>>> vhost_work *work)
>>>>>>                 list_add_tail(&work->node, &dev->work_list);
>>>>>>                 work->queue_seq++;
>>>>>>                 spin_unlock_irqrestore(&dev->work_lock, flags);
>>>>>> -               wake_up_process(dev->worker);
>>>>>>         } else {
>>>>>>                 spin_unlock_irqrestore(&dev->work_lock, flags);
>>>>>>         }
>>>>>> +       wake_up_process(dev->worker);
>>>>>>  }
>>>>>>  EXPORT_SYMBOL_GPL(vhost_work_queue);
>>>>>>
>>>>>>
>>>>>
>>>>> Interesting. Some kind of race? A missing memory barrier somewhere?
>>>>
>>>> I do not see how. I boot the guest and just wait 210 seconds, nothing
>>>> happens to cause races.
>>>>
>>>>
>>>>> Since it's all around startup,
>>>>> you can try kicking the host eventfd in
>>>>> vhost_net_start.
>>>>
>>>>
>>>> How exactly? This did not help. Thanks.
>>>>
>>>> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
>>>> index 006576d..407ecf2 100644
>>>> --- a/hw/net/vhost_net.c
>>>> +++ b/hw/net/vhost_net.c
>>>> @@ -229,6 +229,17 @@ int vhost_net_start(VirtIODevice *dev, NetClientState
>>>> *ncs,
>>>>          if (r < 0) {
>>>>              goto err;
>>>>          }
>>>> +
>>>> +        VHostNetState *vn = tap_get_vhost_net(ncs[i].peer);
>>>> +        struct vhost_vring_file file = {
>>>> +            .index = i
>>>> +        };
>>>> +        file.fd =
>>>> event_notifier_get_fd(virtio_queue_get_host_notifier(dev->vq));
>>>> +        r = ioctl(vn->dev.control, VHOST_SET_VRING_KICK, &file);
>>>
>>> No, this sets the notifier, it does not kick.
>>> To kick you write 1 there:
>>> 	uint6_t  v = 1;
>>> 	write(fd, &v, sizeof v);
>>
>>
>> Please, be precise. How/where do I get that @fd? Is what I do correct?
> 
> Yes.

Turns out that it does not. The control device in the host kernel does not
implement write(), so the write always fails.

This works:

uint64_t v = 1;
int fd = event_notifier_get_fd(&vq->host_notifier);
int r = write(fd, &v, sizeof v);

By "works" I mean it wakes the whole thing up and the guest's eth0 starts
working after the 3-minute delay.



>> What
>> is uint6_t - uint8_t or uint16_t (neither works)?
> 
> Sorry, should have been uint64_t.
> 
>> May be it is a missing barrier - I rebooted machine several times and now
>> sometime after even 240 seconds (not 210 as before) it works (but most of
>> the time still does not)...
>>
>>
>>>> +        if (r) {
>>>> +            error_report("Error notifying host notifier: %d", -r);
>>>> +            goto err;
>>>> +        }
>>>>      }
>>>>
>>>>
>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>>>> If to remove vhost=on, it is all good. If to try Fedora19
>>>>>>>>> (v3.10-something), it is all good again - works before and after reboot.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> And there are 2 questions:
>>>>>>>>>
>>>>>>>>> 1. does anybody have any clue what might go wrong after reboot?
>>>>>>>>>
>>>>>>>>> 2. Is there any good material to read about what exactly and how vhost
>>>>>>>>> accelerates?
>>>>>>>>>
>>>>>>>>> My understanding is that packets from the guest to the real network are
>>>>>>>>> going as:
>>>>>>>>> 1. guest's virtio-pci-net does ioport(VIRTIO_PCI_QUEUE_NOTIFY)
>>>>>>>>> 2. QEMU's net/virtio-net.c calls qemu_net_queue_deliver()
>>>>>>>>> 3. QEMU's net/tap.c calls tap_write_packet() and this is how the host knows
>>>>>>>>> that there is a new packet.
>>>>>>>
>>>>>>>
>>>>>>> What about the documentation? :) or the idea?
>>>>>>>
>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This is how I run QEMU:
>>>>>>>>> ./qemu-system-ppc64 \
>>>>>>>>> -enable-kvm \
>>>>>>>>> -m 2048 \
>>>>>>>>> -machine pseries \
>>>>>>>>> -initrd 1.cpio \
>>>>>>>>> -kernel vml312_virtio_net_dbg \
>>>>>>>>> -nographic \
>>>>>>>>> -vga none \
>>>>>>>>> -netdev
>>>>>>>>> tap,id=id3,ifname=tap-id3,script=ifup.sh,downscript=ifdown.sh,vhost=on \
>>>>>>>>> -device virtio-net-pci,id=id4,netdev=id3,mac=C0:41:49:4b:00:00
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> That is bridge config:
>>>>>>>>> [aik@dyn232 ~]$ brctl show
>>>>>>>>> bridge name	bridge id		STP enabled	interfaces
>>>>>>>>> brtest		8000.00145e992e88	no	pin	eth4
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The ifup.sh script:
>>>>>>>>> ifconfig $1 hw ether ee:01:02:03:04:05
>>>>>>>>> /sbin/ifconfig $1 up
>>>>>>>>> /usr/sbin/brctl addif brtest $1
>>
>>
>>
>> -- 
>> Alexey


-- 
Alexey

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] vhost-net issue: does not survive reboot on ppc64
  2014-01-07 13:18                 ` Alexey Kardashevskiy
@ 2014-01-10  5:13                   ` Alexey Kardashevskiy
  2014-01-10 12:41                     ` Michael S. Tsirkin
  0 siblings, 1 reply; 25+ messages in thread
From: Alexey Kardashevskiy @ 2014-01-10  5:13 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

On 01/08/2014 12:18 AM, Alexey Kardashevskiy wrote:
> On 12/25/2013 02:43 AM, Michael S. Tsirkin wrote:
>> On Wed, Dec 25, 2013 at 01:15:29AM +1100, Alexey Kardashevskiy wrote:
>>> On 12/24/2013 08:40 PM, Michael S. Tsirkin wrote:
>>>> On Tue, Dec 24, 2013 at 02:09:07PM +1100, Alexey Kardashevskiy wrote:
>>>>> On 12/24/2013 03:24 AM, Michael S. Tsirkin wrote:
>>>>>> On Mon, Dec 23, 2013 at 02:01:13AM +1100, Alexey Kardashevskiy wrote:
>>>>>>> On 12/23/2013 01:46 AM, Alexey Kardashevskiy wrote:
>>>>>>>> On 12/22/2013 09:56 PM, Michael S. Tsirkin wrote:
>>>>>>>>> On Sun, Dec 22, 2013 at 02:01:23AM +1100, Alexey Kardashevskiy wrote:
>>>>>>>>>> Hi!
>>>>>>>>>>
>>>>>>>>>> I am having a problem with virtio-net + vhost on POWER7 machine - it does
>>>>>>>>>> not survive reboot of the guest.
>>>>>>>>>>
>>>>>>>>>> Steps to reproduce:
>>>>>>>>>> 1. boot the guest
>>>>>>>>>> 2. configure eth0 and do ping - everything works
>>>>>>>>>> 3. reboot the guest (i.e. type "reboot")
>>>>>>>>>> 4. when it is booted, eth0 can be configured but will not work at all.
>>>>>>>>>>
>>>>>>>>>> The test is:
>>>>>>>>>> ifconfig eth0 172.20.1.2 up
>>>>>>>>>> ping 172.20.1.23
>>>>>>>>>>
>>>>>>>>>> If to run tcpdump on the host's "tap-id3" interface, it shows no traffic
>>>>>>>>>> coming from the guest. If to compare how it works before and after reboot,
>>>>>>>>>> I can see the guest doing an ARP request for 172.20.1.23 and receives the
>>>>>>>>>> response and it does the same after reboot but the answer does not come.
>>>>>>>>>
>>>>>>>>> So you see the arp packet in guest but not in host?
>>>>>>>>
>>>>>>>> Yes.
>>>>>>>>
>>>>>>>>
>>>>>>>>> One thing to try is to boot debug kernel - where pr_debug is
>>>>>>>>> enabled - then you might see some errors in the kernel log.
>>>>>>>>
>>>>>>>> Tried and added lot more debug printk myself, not clear at all what is
>>>>>>>> happening there.
>>>>>>>>
>>>>>>>> One more hint - if I boot the guest and the guest does not bring eth0 up
>>>>>>>> AND wait more than 200 seconds (and less than 210 seconds), then eth0 will
>>>>>>>> not work at all. I.e. this script produces not-working-eth0:
>>>>>>>>
>>>>>>>>
>>>>>>>> ifconfig eth0 172.20.1.2 down
>>>>>>>> sleep 210
>>>>>>>> ifconfig eth0 172.20.1.2 up
>>>>>>>> ping 172.20.1.23
>>>>>>>>
>>>>>>>> s/210/200/ - and it starts working. No reboot is required to reproduce.
>>>>>>>>
>>>>>>>> No "vhost" == always works. The only difference I can see here is vhost's
>>>>>>>> thread which may get suspended if not used for a while after the start and
>>>>>>>> does not wake up but this is almost a blind guess.
>>>>>>>
>>>>>>>
>>>>>>> Yet another clue - this host kernel patch seems to help with the guest
>>>>>>> reboot but does not help with the initial 210 seconds delay:
>>>>>>>
>>>>>>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>>>>>>> index 69068e0..5e67650 100644
>>>>>>> --- a/drivers/vhost/vhost.c
>>>>>>> +++ b/drivers/vhost/vhost.c
>>>>>>> @@ -162,10 +162,10 @@ void vhost_work_queue(struct vhost_dev *dev, struct
>>>>>>> vhost_work *work)
>>>>>>>                 list_add_tail(&work->node, &dev->work_list);
>>>>>>>                 work->queue_seq++;
>>>>>>>                 spin_unlock_irqrestore(&dev->work_lock, flags);
>>>>>>> -               wake_up_process(dev->worker);
>>>>>>>         } else {
>>>>>>>                 spin_unlock_irqrestore(&dev->work_lock, flags);
>>>>>>>         }
>>>>>>> +       wake_up_process(dev->worker);
>>>>>>>  }
>>>>>>>  EXPORT_SYMBOL_GPL(vhost_work_queue);
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> Interesting. Some kind of race? A missing memory barrier somewhere?
>>>>>
>>>>> I do not see how. I boot the guest and just wait 210 seconds, nothing
>>>>> happens to cause races.
>>>>>
>>>>>
>>>>>> Since it's all around startup,
>>>>>> you can try kicking the host eventfd in
>>>>>> vhost_net_start.
>>>>>
>>>>>
>>>>> How exactly? This did not help. Thanks.
>>>>>
>>>>> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
>>>>> index 006576d..407ecf2 100644
>>>>> --- a/hw/net/vhost_net.c
>>>>> +++ b/hw/net/vhost_net.c
>>>>> @@ -229,6 +229,17 @@ int vhost_net_start(VirtIODevice *dev, NetClientState
>>>>> *ncs,
>>>>>          if (r < 0) {
>>>>>              goto err;
>>>>>          }
>>>>> +
>>>>> +        VHostNetState *vn = tap_get_vhost_net(ncs[i].peer);
>>>>> +        struct vhost_vring_file file = {
>>>>> +            .index = i
>>>>> +        };
>>>>> +        file.fd =
>>>>> event_notifier_get_fd(virtio_queue_get_host_notifier(dev->vq));
>>>>> +        r = ioctl(vn->dev.control, VHOST_SET_VRING_KICK, &file);
>>>>
>>>> No, this sets the notifier, it does not kick.
>>>> To kick you write 1 there:
>>>> 	uint6_t  v = 1;
>>>> 	write(fd, &v, sizeof v);
>>>
>>>
>>> Please, be precise. How/where do I get that @fd? Is what I do correct?
>>
>> Yes.
> 
> Turns out that no. The control device in the host kernel does not implement
> write() so it always fails.
> 
> This works:
> 
> uint64_t v = 1;
> int fd = event_notifier_get_fd(&vq->host_notifier);
> int r = write(fd, &v, sizeof v);
> 
> By "works" I mean it helps to wake the whole thing up and the guest's eth0
> starts working after 3 minutes delay.



I checked whether virtnet_napi_enable() is called as expected, and it is. Since
I can see "Receiving skb proto" in the guest's receive_buf(), I believe the
host->guest channel works just fine, but the guest is unable to send
anything until QEMU writes to the event notifier (the code above).

I actually spotted the problem in the host kernel: KVM_IOEVENTFD is called
with a PCI bus address, but kvm_io_bus_write() is called with a guest
physical address, and these are different on PPC64/spapr.

I am trying to make a patch for this and will post it to the list tonight;
I'll put you on CC.



>>> What
>>> is uint6_t - uint8_t or uint16_t (neither works)?
>>
>> Sorry, should have been uint64_t.
>>
>>> May be it is a missing barrier - I rebooted machine several times and now
>>> sometime after even 240 seconds (not 210 as before) it works (but most of
>>> the time still does not)...
>>>
>>>
>>>>> +        if (r) {
>>>>> +            error_report("Error notifying host notifier: %d", -r);
>>>>> +            goto err;
>>>>> +        }
>>>>>      }
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>>> If to remove vhost=on, it is all good. If to try Fedora19
>>>>>>>>>> (v3.10-something), it is all good again - works before and after reboot.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> And there are 2 questions:
>>>>>>>>>>
>>>>>>>>>> 1. does anybody have any clue what might go wrong after reboot?
>>>>>>>>>>
>>>>>>>>>> 2. Is there any good material to read about what exactly and how vhost
>>>>>>>>>> accelerates?
>>>>>>>>>>
>>>>>>>>>> My understanding is that packets from the guest to the real network are
>>>>>>>>>> going as:
>>>>>>>>>> 1. guest's virtio-pci-net does ioport(VIRTIO_PCI_QUEUE_NOTIFY)
>>>>>>>>>> 2. QEMU's net/virtio-net.c calls qemu_net_queue_deliver()
>>>>>>>>>> 3. QEMU's net/tap.c calls tap_write_packet() and this is how the host knows
>>>>>>>>>> that there is a new packet.
>>>>>>>>
>>>>>>>>
>>>>>>>> What about the documentation? :) or the idea?
>>>>>>>>
>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> This is how I run QEMU:
>>>>>>>>>> ./qemu-system-ppc64 \
>>>>>>>>>> -enable-kvm \
>>>>>>>>>> -m 2048 \
>>>>>>>>>> -machine pseries \
>>>>>>>>>> -initrd 1.cpio \
>>>>>>>>>> -kernel vml312_virtio_net_dbg \
>>>>>>>>>> -nographic \
>>>>>>>>>> -vga none \
>>>>>>>>>> -netdev
>>>>>>>>>> tap,id=id3,ifname=tap-id3,script=ifup.sh,downscript=ifdown.sh,vhost=on \
>>>>>>>>>> -device virtio-net-pci,id=id4,netdev=id3,mac=C0:41:49:4b:00:00
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> That is bridge config:
>>>>>>>>>> [aik@dyn232 ~]$ brctl show
>>>>>>>>>> bridge name	bridge id		STP enabled	interfaces
>>>>>>>>>> brtest		8000.00145e992e88	no	pin	eth4
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The ifup.sh script:
>>>>>>>>>> ifconfig $1 hw ether ee:01:02:03:04:05
>>>>>>>>>> /sbin/ifconfig $1 up
>>>>>>>>>> /usr/sbin/brctl addif brtest $1
>>>
>>>
>>>
>>> -- 
>>> Alexey
> 
> 


-- 
Alexey


* Re: [Qemu-devel] vhost-net issue: does not survive reboot on ppc64
  2014-01-10  5:13                   ` Alexey Kardashevskiy
@ 2014-01-10 12:41                     ` Michael S. Tsirkin
  2014-01-10 13:44                       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 25+ messages in thread
From: Michael S. Tsirkin @ 2014-01-10 12:41 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: qemu-devel

On Fri, Jan 10, 2014 at 04:13:34PM +1100, Alexey Kardashevskiy wrote:
> On 01/08/2014 12:18 AM, Alexey Kardashevskiy wrote:
> > On 12/25/2013 02:43 AM, Michael S. Tsirkin wrote:
> >> On Wed, Dec 25, 2013 at 01:15:29AM +1100, Alexey Kardashevskiy wrote:
> >>> On 12/24/2013 08:40 PM, Michael S. Tsirkin wrote:
> >>>> On Tue, Dec 24, 2013 at 02:09:07PM +1100, Alexey Kardashevskiy wrote:
> >>>>> On 12/24/2013 03:24 AM, Michael S. Tsirkin wrote:
> >>>>>> On Mon, Dec 23, 2013 at 02:01:13AM +1100, Alexey Kardashevskiy wrote:
> >>>>>>> On 12/23/2013 01:46 AM, Alexey Kardashevskiy wrote:
> >>>>>>>> On 12/22/2013 09:56 PM, Michael S. Tsirkin wrote:
> >>>>>>>>> On Sun, Dec 22, 2013 at 02:01:23AM +1100, Alexey Kardashevskiy wrote:
> >>>>>>>>>> Hi!
> >>>>>>>>>>
> >>>>>>>>>> I am having a problem with virtio-net + vhost on POWER7 machine - it does
> >>>>>>>>>> not survive reboot of the guest.
> >>>>>>>>>>
> >>>>>>>>>> Steps to reproduce:
> >>>>>>>>>> 1. boot the guest
> >>>>>>>>>> 2. configure eth0 and do ping - everything works
> >>>>>>>>>> 3. reboot the guest (i.e. type "reboot")
> >>>>>>>>>> 4. when it is booted, eth0 can be configured but will not work at all.
> >>>>>>>>>>
> >>>>>>>>>> The test is:
> >>>>>>>>>> ifconfig eth0 172.20.1.2 up
> >>>>>>>>>> ping 172.20.1.23
> >>>>>>>>>>
> >>>>>>>>>> If to run tcpdump on the host's "tap-id3" interface, it shows no traffic
> >>>>>>>>>> coming from the guest. If to compare how it works before and after reboot,
> >>>>>>>>>> I can see the guest doing an ARP request for 172.20.1.23 and receives the
> >>>>>>>>>> response and it does the same after reboot but the answer does not come.
> >>>>>>>>>
> >>>>>>>>> So you see the arp packet in guest but not in host?
> >>>>>>>>
> >>>>>>>> Yes.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> One thing to try is to boot debug kernel - where pr_debug is
> >>>>>>>>> enabled - then you might see some errors in the kernel log.
> >>>>>>>>
> >>>>>>>> Tried and added lot more debug printk myself, not clear at all what is
> >>>>>>>> happening there.
> >>>>>>>>
> >>>>>>>> One more hint - if I boot the guest and the guest does not bring eth0 up
> >>>>>>>> AND wait more than 200 seconds (and less than 210 seconds), then eth0 will
> >>>>>>>> not work at all. I.e. this script produces not-working-eth0:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> ifconfig eth0 172.20.1.2 down
> >>>>>>>> sleep 210
> >>>>>>>> ifconfig eth0 172.20.1.2 up
> >>>>>>>> ping 172.20.1.23
> >>>>>>>>
> >>>>>>>> s/210/200/ - and it starts working. No reboot is required to reproduce.
> >>>>>>>>
> >>>>>>>> No "vhost" == always works. The only difference I can see here is vhost's
> >>>>>>>> thread which may get suspended if not used for a while after the start and
> >>>>>>>> does not wake up but this is almost a blind guess.
> >>>>>>>
> >>>>>>>
> >>>>>>> Yet another clue - this host kernel patch seems to help with the guest
> >>>>>>> reboot but does not help with the initial 210 seconds delay:
> >>>>>>>
> >>>>>>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> >>>>>>> index 69068e0..5e67650 100644
> >>>>>>> --- a/drivers/vhost/vhost.c
> >>>>>>> +++ b/drivers/vhost/vhost.c
> >>>>>>> @@ -162,10 +162,10 @@ void vhost_work_queue(struct vhost_dev *dev, struct
> >>>>>>> vhost_work *work)
> >>>>>>>                 list_add_tail(&work->node, &dev->work_list);
> >>>>>>>                 work->queue_seq++;
> >>>>>>>                 spin_unlock_irqrestore(&dev->work_lock, flags);
> >>>>>>> -               wake_up_process(dev->worker);
> >>>>>>>         } else {
> >>>>>>>                 spin_unlock_irqrestore(&dev->work_lock, flags);
> >>>>>>>         }
> >>>>>>> +       wake_up_process(dev->worker);
> >>>>>>>  }
> >>>>>>>  EXPORT_SYMBOL_GPL(vhost_work_queue);
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>> Interesting. Some kind of race? A missing memory barrier somewhere?
> >>>>>
> >>>>> I do not see how. I boot the guest and just wait 210 seconds, nothing
> >>>>> happens to cause races.
> >>>>>
> >>>>>
> >>>>>> Since it's all around startup,
> >>>>>> you can try kicking the host eventfd in
> >>>>>> vhost_net_start.
> >>>>>
> >>>>>
> >>>>> How exactly? This did not help. Thanks.
> >>>>>
> >>>>> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> >>>>> index 006576d..407ecf2 100644
> >>>>> --- a/hw/net/vhost_net.c
> >>>>> +++ b/hw/net/vhost_net.c
> >>>>> @@ -229,6 +229,17 @@ int vhost_net_start(VirtIODevice *dev, NetClientState
> >>>>> *ncs,
> >>>>>          if (r < 0) {
> >>>>>              goto err;
> >>>>>          }
> >>>>> +
> >>>>> +        VHostNetState *vn = tap_get_vhost_net(ncs[i].peer);
> >>>>> +        struct vhost_vring_file file = {
> >>>>> +            .index = i
> >>>>> +        };
> >>>>> +        file.fd =
> >>>>> event_notifier_get_fd(virtio_queue_get_host_notifier(dev->vq));
> >>>>> +        r = ioctl(vn->dev.control, VHOST_SET_VRING_KICK, &file);
> >>>>
> >>>> No, this sets the notifier, it does not kick.
> >>>> To kick you write 1 there:
> >>>> 	uint6_t  v = 1;
> >>>> 	write(fd, &v, sizeof v);
> >>>
> >>>
> >>> Please, be precise. How/where do I get that @fd? Is what I do correct?
> >>
> >> Yes.
> > 
> > Turns out that no. The control device in the host kernel does not implement
> > write() so it always fails.
> > 
> > This works:
> > 
> > uint64_t v = 1;
> > int fd = event_notifier_get_fd(&vq->host_notifier);
> > int r = write(fd, &v, sizeof v);
> > 
> > By "works" I mean it helps to wake the whole thing up and the guest's eth0
> > starts working after 3 minutes delay.
> 
> 
> 
> Checked if virtnet_napi_enable() is called as expected and it is. As I can
> see "Receiving skb proto" in the guest's receive_buf(), I believe
> host->guest channel works just fine but the guest is unable to send
> anything until QEMU writes to event notifier (the code above).
> 
> I actually spotted the problem in the host kernel - KVM_IOEVENTFD is called
> with a PCI bus address but kvm_io_bus_write() is called with a guest
> physical address and these things are different on PPC64/spapr.
> 
> I am trying to make a patch for this and post it to some list tonight, I'll
> put you in copy.
> 

Can we fix this in qemu?

We do:
        memory_region_add_eventfd(&proxy->bar, VIRTIO_PCI_QUEUE_NOTIFY, 2,
                                  true, n, notifier);

I think as a result, KVM_IOEVENTFD should be called with a guest physical address.


> 
> >>> What
> >>> is uint6_t - uint8_t or uint16_t (neither works)?
> >>
> >> Sorry, should have been uint64_t.
> >>
> >>> May be it is a missing barrier - I rebooted machine several times and now
> >>> sometime after even 240 seconds (not 210 as before) it works (but most of
> >>> the time still does not)...
> >>>
> >>>
> >>>>> +        if (r) {
> >>>>> +            error_report("Error notifying host notifier: %d", -r);
> >>>>> +            goto err;
> >>>>> +        }
> >>>>>      }
> >>>>>
> >>>>>
> >>>>>
> >>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>>>> If to remove vhost=on, it is all good. If to try Fedora19
> >>>>>>>>>> (v3.10-something), it is all good again - works before and after reboot.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> And there are 2 questions:
> >>>>>>>>>>
> >>>>>>>>>> 1. does anybody have any clue what might go wrong after reboot?
> >>>>>>>>>>
> >>>>>>>>>> 2. Is there any good material to read about what exactly and how vhost
> >>>>>>>>>> accelerates?
> >>>>>>>>>>
> >>>>>>>>>> My understanding is that packets from the guest to the real network are
> >>>>>>>>>> going as:
> >>>>>>>>>> 1. guest's virtio-pci-net does ioport(VIRTIO_PCI_QUEUE_NOTIFY)
> >>>>>>>>>> 2. QEMU's net/virtio-net.c calls qemu_net_queue_deliver()
> >>>>>>>>>> 3. QEMU's net/tap.c calls tap_write_packet() and this is how the host knows
> >>>>>>>>>> that there is a new packet.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> What about the documentation? :) or the idea?
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Thanks!
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> This is how I run QEMU:
> >>>>>>>>>> ./qemu-system-ppc64 \
> >>>>>>>>>> -enable-kvm \
> >>>>>>>>>> -m 2048 \
> >>>>>>>>>> -machine pseries \
> >>>>>>>>>> -initrd 1.cpio \
> >>>>>>>>>> -kernel vml312_virtio_net_dbg \
> >>>>>>>>>> -nographic \
> >>>>>>>>>> -vga none \
> >>>>>>>>>> -netdev
> >>>>>>>>>> tap,id=id3,ifname=tap-id3,script=ifup.sh,downscript=ifdown.sh,vhost=on \
> >>>>>>>>>> -device virtio-net-pci,id=id4,netdev=id3,mac=C0:41:49:4b:00:00
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> That is bridge config:
> >>>>>>>>>> [aik@dyn232 ~]$ brctl show
> >>>>>>>>>> bridge name	bridge id		STP enabled	interfaces
> >>>>>>>>>> brtest		8000.00145e992e88	no	pin	eth4
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> The ifup.sh script:
> >>>>>>>>>> ifconfig $1 hw ether ee:01:02:03:04:05
> >>>>>>>>>> /sbin/ifconfig $1 up
> >>>>>>>>>> /usr/sbin/brctl addif brtest $1
> >>>
> >>>
> >>>
> >>> -- 
> >>> Alexey
> > 
> > 
> 
> 
> -- 
> Alexey


* Re: [Qemu-devel] vhost-net issue: does not survive reboot on ppc64
  2014-01-10 12:41                     ` Michael S. Tsirkin
@ 2014-01-10 13:44                       ` Alexey Kardashevskiy
  0 siblings, 0 replies; 25+ messages in thread
From: Alexey Kardashevskiy @ 2014-01-10 13:44 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

On 01/10/2014 11:41 PM, Michael S. Tsirkin wrote:
> On Fri, Jan 10, 2014 at 04:13:34PM +1100, Alexey Kardashevskiy wrote:
>> On 01/08/2014 12:18 AM, Alexey Kardashevskiy wrote:
>>> On 12/25/2013 02:43 AM, Michael S. Tsirkin wrote:
>>>> On Wed, Dec 25, 2013 at 01:15:29AM +1100, Alexey Kardashevskiy wrote:
>>>>> On 12/24/2013 08:40 PM, Michael S. Tsirkin wrote:
>>>>>> On Tue, Dec 24, 2013 at 02:09:07PM +1100, Alexey Kardashevskiy wrote:
>>>>>>> On 12/24/2013 03:24 AM, Michael S. Tsirkin wrote:
>>>>>>>> On Mon, Dec 23, 2013 at 02:01:13AM +1100, Alexey Kardashevskiy wrote:
>>>>>>>>> On 12/23/2013 01:46 AM, Alexey Kardashevskiy wrote:
>>>>>>>>>> On 12/22/2013 09:56 PM, Michael S. Tsirkin wrote:
>>>>>>>>>>> On Sun, Dec 22, 2013 at 02:01:23AM +1100, Alexey Kardashevskiy wrote:
>>>>>>>>>>>> Hi!
>>>>>>>>>>>>
>>>>>>>>>>>> I am having a problem with virtio-net + vhost on POWER7 machine - it does
>>>>>>>>>>>> not survive reboot of the guest.
>>>>>>>>>>>>
>>>>>>>>>>>> Steps to reproduce:
>>>>>>>>>>>> 1. boot the guest
>>>>>>>>>>>> 2. configure eth0 and do ping - everything works
>>>>>>>>>>>> 3. reboot the guest (i.e. type "reboot")
>>>>>>>>>>>> 4. when it is booted, eth0 can be configured but will not work at all.
>>>>>>>>>>>>
>>>>>>>>>>>> The test is:
>>>>>>>>>>>> ifconfig eth0 172.20.1.2 up
>>>>>>>>>>>> ping 172.20.1.23
>>>>>>>>>>>>
>>>>>>>>>>>> If to run tcpdump on the host's "tap-id3" interface, it shows no traffic
>>>>>>>>>>>> coming from the guest. If to compare how it works before and after reboot,
>>>>>>>>>>>> I can see the guest doing an ARP request for 172.20.1.23 and receives the
>>>>>>>>>>>> response and it does the same after reboot but the answer does not come.
>>>>>>>>>>>
>>>>>>>>>>> So you see the arp packet in guest but not in host?
>>>>>>>>>>
>>>>>>>>>> Yes.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> One thing to try is to boot debug kernel - where pr_debug is
>>>>>>>>>>> enabled - then you might see some errors in the kernel log.
>>>>>>>>>>
>>>>>>>>>> Tried and added lot more debug printk myself, not clear at all what is
>>>>>>>>>> happening there.
>>>>>>>>>>
>>>>>>>>>> One more hint - if I boot the guest and the guest does not bring eth0 up
>>>>>>>>>> AND wait more than 200 seconds (and less than 210 seconds), then eth0 will
>>>>>>>>>> not work at all. I.e. this script produces not-working-eth0:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ifconfig eth0 172.20.1.2 down
>>>>>>>>>> sleep 210
>>>>>>>>>> ifconfig eth0 172.20.1.2 up
>>>>>>>>>> ping 172.20.1.23
>>>>>>>>>>
>>>>>>>>>> s/210/200/ - and it starts working. No reboot is required to reproduce.
>>>>>>>>>>
>>>>>>>>>> No "vhost" == always works. The only difference I can see here is vhost's
>>>>>>>>>> thread which may get suspended if not used for a while after the start and
>>>>>>>>>> does not wake up but this is almost a blind guess.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Yet another clue - this host kernel patch seems to help with the guest
>>>>>>>>> reboot but does not help with the initial 210 seconds delay:
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>>>>>>>>> index 69068e0..5e67650 100644
>>>>>>>>> --- a/drivers/vhost/vhost.c
>>>>>>>>> +++ b/drivers/vhost/vhost.c
>>>>>>>>> @@ -162,10 +162,10 @@ void vhost_work_queue(struct vhost_dev *dev, struct
>>>>>>>>> vhost_work *work)
>>>>>>>>>                 list_add_tail(&work->node, &dev->work_list);
>>>>>>>>>                 work->queue_seq++;
>>>>>>>>>                 spin_unlock_irqrestore(&dev->work_lock, flags);
>>>>>>>>> -               wake_up_process(dev->worker);
>>>>>>>>>         } else {
>>>>>>>>>                 spin_unlock_irqrestore(&dev->work_lock, flags);
>>>>>>>>>         }
>>>>>>>>> +       wake_up_process(dev->worker);
>>>>>>>>>  }
>>>>>>>>>  EXPORT_SYMBOL_GPL(vhost_work_queue);
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> Interesting. Some kind of race? A missing memory barrier somewhere?
>>>>>>>
>>>>>>> I do not see how. I boot the guest and just wait 210 seconds, nothing
>>>>>>> happens to cause races.
>>>>>>>
>>>>>>>
>>>>>>>> Since it's all around startup,
>>>>>>>> you can try kicking the host eventfd in
>>>>>>>> vhost_net_start.
>>>>>>>
>>>>>>>
>>>>>>> How exactly? This did not help. Thanks.
>>>>>>>
>>>>>>> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
>>>>>>> index 006576d..407ecf2 100644
>>>>>>> --- a/hw/net/vhost_net.c
>>>>>>> +++ b/hw/net/vhost_net.c
>>>>>>> @@ -229,6 +229,17 @@ int vhost_net_start(VirtIODevice *dev, NetClientState
>>>>>>> *ncs,
>>>>>>>          if (r < 0) {
>>>>>>>              goto err;
>>>>>>>          }
>>>>>>> +
>>>>>>> +        VHostNetState *vn = tap_get_vhost_net(ncs[i].peer);
>>>>>>> +        struct vhost_vring_file file = {
>>>>>>> +            .index = i
>>>>>>> +        };
>>>>>>> +        file.fd =
>>>>>>> event_notifier_get_fd(virtio_queue_get_host_notifier(dev->vq));
>>>>>>> +        r = ioctl(vn->dev.control, VHOST_SET_VRING_KICK, &file);
>>>>>>
>>>>>> No, this sets the notifier, it does not kick.
>>>>>> To kick you write 1 there:
>>>>>> 	uint6_t  v = 1;
>>>>>> 	write(fd, &v, sizeof v);
>>>>>
>>>>>
>>>>> Please, be precise. How/where do I get that @fd? Is what I do correct?
>>>>
>>>> Yes.
>>>
>>> Turns out that no. The control device in the host kernel does not implement
>>> write() so it always fails.
>>>
>>> This works:
>>>
>>> uint64_t v = 1;
>>> int fd = event_notifier_get_fd(&vq->host_notifier);
>>> int r = write(fd, &v, sizeof v);
>>>
>>> By "works" I mean it helps to wake the whole thing up and the guest's eth0
>>> starts working after 3 minutes delay.
>>
>>
>>
>> Checked if virtnet_napi_enable() is called as expected and it is. As I can
>> see "Receiving skb proto" in the guest's receive_buf(), I believe
>> host->guest channel works just fine but the guest is unable to send
>> anything until QEMU writes to event notifier (the code above).
>>
>> I actually spotted the problem in the host kernel - KVM_IOEVENTFD is called
>> with a PCI bus address but kvm_io_bus_write() is called with a guest
>> physical address and these things are different on PPC64/spapr.
>>
>> I am trying to make a patch for this and post it to some list tonight, I'll
>> put you in copy.
>>
> 
> Can we fix this in qemu?
> 
> We do:
>         memory_region_add_eventfd(&proxy->bar, VIRTIO_PCI_QUEUE_NOTIFY, 2,
>                                   true, n, notifier);
> 
> I think as a result, KVM_IOEVENTFD should be called with guest physical address.


I fixed this in "[PATCH] KVM: fix addr type for KVM_IOEVENTFD"; you are on
CC. Heh. I suspected something ppc64-specific, as the problem does not
appear on x86, and I posted another patch for PPC64 HV KVM, but the QEMU
bug is still a nice find :)




-- 
Alexey


end of thread, other threads:[~2014-01-10 13:45 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-12-21 15:01 [Qemu-devel] vhost-net issue: does not survive reboot on ppc64 Alexey Kardashevskiy
2013-12-22 10:56 ` Michael S. Tsirkin
2013-12-22 14:46   ` Alexey Kardashevskiy
2013-12-22 15:01     ` Alexey Kardashevskiy
2013-12-23 16:24       ` Michael S. Tsirkin
2013-12-24  3:09         ` Alexey Kardashevskiy
2013-12-24  9:40           ` Michael S. Tsirkin
2013-12-24 14:15             ` Alexey Kardashevskiy
2013-12-24 15:43               ` Michael S. Tsirkin
2013-12-25  1:36                 ` Alexey Kardashevskiy
2013-12-25  9:52                   ` Michael S. Tsirkin
2013-12-26 10:13                     ` Alexey Kardashevskiy
2013-12-26 10:49                       ` Michael S. Tsirkin
2013-12-26 12:51                         ` Alexey Kardashevskiy
2013-12-26 13:48                           ` Michael S. Tsirkin
2013-12-26 14:59                             ` Alexey Kardashevskiy
2013-12-26 15:12                               ` Michael S. Tsirkin
2013-12-27  1:44                                 ` Alexey Kardashevskiy
2014-01-06  9:57                                   ` Alexey Kardashevskiy
2014-01-07 13:18                 ` Alexey Kardashevskiy
2014-01-10  5:13                   ` Alexey Kardashevskiy
2014-01-10 12:41                     ` Michael S. Tsirkin
2014-01-10 13:44                       ` Alexey Kardashevskiy
2013-12-22 11:41 ` Zhi Yong Wu
2013-12-22 14:48   ` Alexey Kardashevskiy
