All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH V3] vhost_net: start/stop guest notifiers properly
@ 2014-08-19  4:56 Jason Wang
  2014-08-20  9:02 ` William Dauchy
  2014-08-20  9:23 ` Zhangjie (HZ)
  0 siblings, 2 replies; 16+ messages in thread
From: Jason Wang @ 2014-08-19  4:56 UTC (permalink / raw)
  To: qemu-devel, mst, zhangjie14; +Cc: Jason Wang, William Dauchy

commit a9f98bb5ebe6fb1869321dcc58e72041ae626ad8 vhost: multiqueue
support changed the order of stopping the device. Previously
vhost_dev_stop would disable backend and only afterwards, unset guest
notifiers. We now unset guest notifiers while vhost is still
active. This can lose interrupts causing guest networking to fail. In
particular, this has been observed during migration.

To adapt this, several other changes are needed:
- remove the hdev->started assertion in vhost.c since we may want to
start the guest notifiers before vhost starts and stop the guest
notifiers after vhost is stopped.
- introduce the vhost_net_set_vq_index() and call it before setting
guest notifiers. This is used to guarantee vhost_net has the correct
virtqueue index when setting guest notifiers.

Reported-by: "Zhangjie (HZ)" <zhangjie14@huawei.com>
Cc: William Dauchy <wdauchy@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

--
Changes from Michael's patch:
- Remove the assertion
Changes from V1:
- Rebase to latest
Changes from V2:
- Introduce vhost_net_set_vq_index() to unbreak multiqueue

Zhang Jie, please test this patch to see if it fixes the issue.
---
 hw/net/vhost_net.c | 31 +++++++++++++++++++------------
 hw/virtio/vhost.c  |  2 --
 2 files changed, 19 insertions(+), 14 deletions(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index f87c798..fe0203d 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -188,9 +188,13 @@ bool vhost_net_query(VHostNetState *net, VirtIODevice *dev)
     return vhost_dev_query(&net->dev, dev);
 }
 
+static void vhost_net_set_vq_index(struct vhost_net *net, int vq_index)
+{
+    net->dev.vq_index = vq_index;
+}
+
 static int vhost_net_start_one(struct vhost_net *net,
-                               VirtIODevice *dev,
-                               int vq_index)
+			VirtIODevice *dev)
 {
     struct vhost_vring_file file = { };
     int r;
@@ -201,7 +205,6 @@ static int vhost_net_start_one(struct vhost_net *net,
 
     net->dev.nvqs = 2;
     net->dev.vqs = net->vqs;
-    net->dev.vq_index = vq_index;
 
     r = vhost_dev_enable_notifiers(&net->dev, dev);
     if (r < 0) {
@@ -309,11 +312,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
     }
 
     for (i = 0; i < total_queues; i++) {
-        r = vhost_net_start_one(get_vhost_net(ncs[i].peer), dev, i * 2);
-
-        if (r < 0) {
-            goto err;
-        }
+        vhost_net_set_vq_index(get_vhost_net(ncs[i].peer), i * 2);
     }
 
     r = k->set_guest_notifiers(qbus->parent, total_queues * 2, true);
@@ -322,6 +321,14 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
         goto err;
     }
 
+    for (i = 0; i < total_queues; i++) {
+        r = vhost_net_start_one(get_vhost_net(ncs[i].peer), dev);
+
+        if (r < 0) {
+            goto err;
+        }
+    }
+
     return 0;
 
 err:
@@ -339,16 +346,16 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
     VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
     int i, r;
 
+    for (i = 0; i < total_queues; i++) {
+        vhost_net_stop_one(get_vhost_net(ncs[i].peer), dev);
+    }
+
     r = k->set_guest_notifiers(qbus->parent, total_queues * 2, false);
     if (r < 0) {
         fprintf(stderr, "vhost guest notifier cleanup failed: %d\n", r);
         fflush(stderr);
     }
     assert(r >= 0);
-
-    for (i = 0; i < total_queues; i++) {
-        vhost_net_stop_one(get_vhost_net(ncs[i].peer), dev);
-    }
 }
 
 void vhost_net_cleanup(struct vhost_net *net)
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index e55fe1c..5d7c40a 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -976,7 +976,6 @@ void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev)
 bool vhost_virtqueue_pending(struct vhost_dev *hdev, int n)
 {
     struct vhost_virtqueue *vq = hdev->vqs + n - hdev->vq_index;
-    assert(hdev->started);
     assert(n >= hdev->vq_index && n < hdev->vq_index + hdev->nvqs);
     return event_notifier_test_and_clear(&vq->masked_notifier);
 }
@@ -988,7 +987,6 @@ void vhost_virtqueue_mask(struct vhost_dev *hdev, VirtIODevice *vdev, int n,
     struct VirtQueue *vvq = virtio_get_queue(vdev, n);
     int r, index = n - hdev->vq_index;
 
-    assert(hdev->started);
     assert(n >= hdev->vq_index && n < hdev->vq_index + hdev->nvqs);
 
     struct vhost_vring_file file = {
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH V3] vhost_net: start/stop guest notifiers properly
  2014-08-19  4:56 [Qemu-devel] [PATCH V3] vhost_net: start/stop guest notifiers properly Jason Wang
@ 2014-08-20  9:02 ` William Dauchy
  2014-08-20  9:23 ` Zhangjie (HZ)
  1 sibling, 0 replies; 16+ messages in thread
From: William Dauchy @ 2014-08-20  9:02 UTC (permalink / raw)
  To: Jason Wang; +Cc: zhangjie14, qemu-devel, William Dauchy, mst

[-- Attachment #1: Type: text/plain, Size: 1460 bytes --]

On Aug19 12:56, Jason Wang wrote:
> commit a9f98bb5ebe6fb1869321dcc58e72041ae626ad8 vhost: multiqueue
> support changed the order of stopping the device. Previously
> vhost_dev_stop would disable backend and only afterwards, unset guest
> notifiers. We now unset guest notifiers while vhost is still
> active. This can lose interrupts causing guest networking to fail. In
> particular, this has been observed during migration.
> 
> To adapt this, several other changes are needed:
> - remove the hdev->started assertion in vhost.c since we may want to
> start the guest notifiers before vhost starts and stop the guest
> notifiers after vhost is stopped.
> - introduce the vhost_net_set_vq_index() and call it before setting
> guest notifiers. This is used to guarantee vhost_net has the correct
> virtqueue index when setting guest notifiers.
> 
> Reported-by: "Zhangjie (HZ)" <zhangjie14@huawei.com>
> Cc: William Dauchy <wdauchy@gmail.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> 
> --
> Changes from Michael's patch:
> - Remove the assertion
> Changes from V1:
> - Rebase to latest
> Changes from V2:
> - Introduce vhost_net_set_vq_index() to unbreak multiqueue

indeed I had with v2:
qemu-system-x86_64: hw/virtio/vhost.c:990: vhost_virtqueue_mask: Assertion `n >= hdev->vq_index && n < hdev->vq_index + hdev->nvqs' failed.

v3 seems ok to me
-- 
William

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH V3] vhost_net: start/stop guest notifiers properly
  2014-08-19  4:56 [Qemu-devel] [PATCH V3] vhost_net: start/stop guest notifiers properly Jason Wang
  2014-08-20  9:02 ` William Dauchy
@ 2014-08-20  9:23 ` Zhangjie (HZ)
  2014-08-20 10:18   ` Michael S. Tsirkin
                     ` (2 more replies)
  1 sibling, 3 replies; 16+ messages in thread
From: Zhangjie (HZ) @ 2014-08-20  9:23 UTC (permalink / raw)
  To: Jason Wang, qemu-devel, mst; +Cc: William Dauchy

On 2014/8/19 12:56, Jason Wang wrote:
> commit a9f98bb5ebe6fb1869321dcc58e72041ae626ad8 vhost: multiqueue
 call it before setting
> Zhang Jie, please test this patch to see if it fixes the issue.

> +static void vhost_net_set_vq_index(struct vhost_net *net, int vq_index)
> +{
> +    net->dev.vq_index = vq_index;
> +}
                          int vq_index)
> ...
Because of vhost_net_set_vq_index, VM can be start successfully.
But, after about 80 times of migration under my environment, virtual nic became unreachable again.
When I use jprobe to notify tap, the virtual nic becomes reachable again. This shows that interrupts missing causes
the problem.
-- 
Best Wishes!
Zhang Jie

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH V3] vhost_net: start/stop guest notifiers properly
  2014-08-20  9:23 ` Zhangjie (HZ)
@ 2014-08-20 10:18   ` Michael S. Tsirkin
  2014-08-20 21:21     ` Michael S. Tsirkin
  2014-08-21  4:29   ` Jason Wang
  2014-08-27 11:54   ` Michael S. Tsirkin
  2 siblings, 1 reply; 16+ messages in thread
From: Michael S. Tsirkin @ 2014-08-20 10:18 UTC (permalink / raw)
  To: Zhangjie (HZ); +Cc: Jason Wang, qemu-devel, William Dauchy

On Wed, Aug 20, 2014 at 05:23:21PM +0800, Zhangjie (HZ) wrote:
> On 2014/8/19 12:56, Jason Wang wrote:
> > commit a9f98bb5ebe6fb1869321dcc58e72041ae626ad8 vhost: multiqueue
>  call it before setting
> > Zhang Jie, please test this patch to see if it fixes the issue.
> 
> > +static void vhost_net_set_vq_index(struct vhost_net *net, int vq_index)
> > +{
> > +    net->dev.vq_index = vq_index;
> > +}
>                           int vq_index)
> > ...
> Because of vhost_net_set_vq_index, VM can be start successfully.
> But, after about 80 times of migration under my environment, virtual nic became unreachable again.
> When I use jprobe to notify tap, the virtual nic becomes reachable again. This shows that interrupts missing causes
> the problem.

Could you please clarify what do you mean by "notify tap" here?
Thanks!

> -- 
> Best Wishes!
> Zhang Jie

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH V3] vhost_net: start/stop guest notifiers properly
  2014-08-20 10:18   ` Michael S. Tsirkin
@ 2014-08-20 21:21     ` Michael S. Tsirkin
  2014-08-21  7:37       ` Zhangjie (HZ)
  0 siblings, 1 reply; 16+ messages in thread
From: Michael S. Tsirkin @ 2014-08-20 21:21 UTC (permalink / raw)
  To: Zhangjie (HZ); +Cc: Jason Wang, qemu-devel, William Dauchy

On Wed, Aug 20, 2014 at 12:18:33PM +0200, Michael S. Tsirkin wrote:
> On Wed, Aug 20, 2014 at 05:23:21PM +0800, Zhangjie (HZ) wrote:
> > On 2014/8/19 12:56, Jason Wang wrote:
> > > commit a9f98bb5ebe6fb1869321dcc58e72041ae626ad8 vhost: multiqueue
> >  call it before setting
> > > Zhang Jie, please test this patch to see if it fixes the issue.
> > 
> > > +static void vhost_net_set_vq_index(struct vhost_net *net, int vq_index)
> > > +{
> > > +    net->dev.vq_index = vq_index;
> > > +}
> >                           int vq_index)
> > > ...
> > Because of vhost_net_set_vq_index, VM can be start successfully.
> > But, after about 80 times of migration under my environment, virtual nic became unreachable again.
> > When I use jprobe to notify tap, the virtual nic becomes reachable again. This shows that interrupts missing causes
> > the problem.
> 
> Could you please clarify what do you mean by "notify tap" here?
> Thanks!


Or just post your jprobe script.

> > -- 
> > Best Wishes!
> > Zhang Jie

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH V3] vhost_net: start/stop guest notifiers properly
  2014-08-20  9:23 ` Zhangjie (HZ)
  2014-08-20 10:18   ` Michael S. Tsirkin
@ 2014-08-21  4:29   ` Jason Wang
  2014-08-21  6:28     ` Zhangjie (HZ)
  2014-08-27 11:54   ` Michael S. Tsirkin
  2 siblings, 1 reply; 16+ messages in thread
From: Jason Wang @ 2014-08-21  4:29 UTC (permalink / raw)
  To: Zhangjie (HZ), qemu-devel, mst; +Cc: William Dauchy

On 08/20/2014 05:23 PM, Zhangjie (HZ) wrote:
> On 2014/8/19 12:56, Jason Wang wrote:
>> commit a9f98bb5ebe6fb1869321dcc58e72041ae626ad8 vhost: multiqueue
>  call it before setting
>> Zhang Jie, please test this patch to see if it fixes the issue.
>> +static void vhost_net_set_vq_index(struct vhost_net *net, int vq_index)
>> +{
>> +    net->dev.vq_index = vq_index;
>> +}
>                           int vq_index)
>> ...
> Because of vhost_net_set_vq_index, VM can be start successfully.
> But, after about 80 times of migration under my environment, virtual nic became unreachable again.
> When I use jprobe to notify tap, the virtual nic becomes reachable again. This shows that interrupts missing causes
> the problem.

Thanks for the testing. A questions is can you reproduce this when vhost
is disabled?

Anyway, I will try to reproduce it by myself.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH V3] vhost_net: start/stop guest notifiers properly
  2014-08-21  4:29   ` Jason Wang
@ 2014-08-21  6:28     ` Zhangjie (HZ)
  2014-08-21  6:53       ` Jason Wang
  2014-08-22  2:56       ` Jason Wang
  0 siblings, 2 replies; 16+ messages in thread
From: Zhangjie (HZ) @ 2014-08-21  6:28 UTC (permalink / raw)
  To: Jason Wang, qemu-devel, mst; +Cc: William Dauchy


On 2014/8/21 12:29, Jason Wang wrote:
> On 08/20/2014 05:23 PM, Zhangjie (HZ) wrote:
>> On 2014/8/19 12:56, Jason Wang wrote:
>>> commit a9f98bb5ebe6fb1869321dcc58e72041ae626ad8 vhost: multiqueue
>>  call it before setting
>>> Zhang Jie, please test this patch to see if it fixes the issue.
>>> +static void vhost_net_set_vq_index(struct vhost_net *net, int vq_index)
>>> +{
>>> +    net->dev.vq_index = vq_index;
>>> +}
>>                           int vq_index)
>>> ...
>> Because of vhost_net_set_vq_index, VM can be start successfully.
>> But, after about 80 times of migration under my environment, virtual nic became unreachable again.
>> When I use jprobe to notify tap, the virtual nic becomes reachable again. This shows that interrupts missing causes
>> the problem.
> 
> Thanks for the testing. A questions is can you reproduce this when vhost
> is disabled?
After migration, vhost is not disabled, virtual nic became unreachable because vhost is not awakened.
By the logical of EVENT_IDX, virtio-net will not kick vhost again if the used idx is not updated.
So, if one interrupts is lost during migration, virtio_net will not kick vhost again.
Then, no skb from virtio-net can be sent to tap.

Jason's patch reduced the probability of occurrence, from about 1/20 to 1/80. It is really effective. I think the patch should be acked.
May be we can try to solve the problem from another perspective. Do you have some methods to sense the migration?
We can make up a signal from virtio-net after the migration.

> 
> Anyway, I will try to reproduce it by myself.
> 
The test environment is really terrible, I build a environment myself, but it problem did not occur.
The environment I use now is from a colleague Responsible for test work.
Two hosts, every host has about 20 vms, they send packages(ipv4 and ipv6) between each other.
The VM to be migrated also sens packages itself, and there is a ping(-i 0.001) from another host to it.
The physical nic is 1GE, connected through a internal nework.

-- 
Best Wishes!
Zhang Jie

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH V3] vhost_net: start/stop guest notifiers properly
  2014-08-21  6:28     ` Zhangjie (HZ)
@ 2014-08-21  6:53       ` Jason Wang
  2014-08-21  7:42         ` Zhangjie (HZ)
  2014-08-22  2:56       ` Jason Wang
  1 sibling, 1 reply; 16+ messages in thread
From: Jason Wang @ 2014-08-21  6:53 UTC (permalink / raw)
  To: Zhangjie (HZ), qemu-devel, mst; +Cc: William Dauchy

On 08/21/2014 02:28 PM, Zhangjie (HZ) wrote:
> On 2014/8/21 12:29, Jason Wang wrote:
>> On 08/20/2014 05:23 PM, Zhangjie (HZ) wrote:
>>> On 2014/8/19 12:56, Jason Wang wrote:
>>>> commit a9f98bb5ebe6fb1869321dcc58e72041ae626ad8 vhost: multiqueue
>>>  call it before setting
>>>> Zhang Jie, please test this patch to see if it fixes the issue.
>>>> +static void vhost_net_set_vq_index(struct vhost_net *net, int vq_index)
>>>> +{
>>>> +    net->dev.vq_index = vq_index;
>>>> +}
>>>                           int vq_index)
>>>> ...
>>> Because of vhost_net_set_vq_index, VM can be start successfully.
>>> But, after about 80 times of migration under my environment, virtual nic became unreachable again.
>>> When I use jprobe to notify tap, the virtual nic becomes reachable again. This shows that interrupts missing causes
>>> the problem.
>> Thanks for the testing. A questions is can you reproduce this when vhost
>> is disabled?
> After migration, vhost is not disabled, virtual nic became unreachable because vhost is not awakened.
> By the logical of EVENT_IDX, virtio-net will not kick vhost again if the used idx is not updated.
> So, if one interrupts is lost during migration, virtio_net will not kick vhost again.
> Then, no skb from virtio-net can be sent to tap.

Yes and I mean to test vhost=off to see if it was the issue of vhost.
>
> Jason's patch reduced the probability of occurrence, from about 1/20 to 1/80. It is really effective. I think the patch should be acked.
> May be we can try to solve the problem from another perspective. Do you have some methods to sense the migration?
> We can make up a signal from virtio-net after the migration.

You can make a patch like this to debug. If problem disappears, it means
interrupt was really lost anyway.
>
>> Anyway, I will try to reproduce it by myself.
>>
> The test environment is really terrible, I build a environment myself, but it problem did not occur.
> The environment I use now is from a colleague Responsible for test work.
> Two hosts, every host has about 20 vms, they send packages(ipv4 and ipv6) between each other.
> The VM to be migrated also sens packages itself, and there is a ping(-i 0.001) from another host to it.
> The physical nic is 1GE, connected through a internal nework.

Just want to confirm. For the problem did not occur, you mean with my
patch on top?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH V3] vhost_net: start/stop guest notifiers properly
  2014-08-20 21:21     ` Michael S. Tsirkin
@ 2014-08-21  7:37       ` Zhangjie (HZ)
  0 siblings, 0 replies; 16+ messages in thread
From: Zhangjie (HZ) @ 2014-08-21  7:37 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Jason Wang, qemu-devel, William Dauchy

On 2014/8/21 5:21, Michael S. Tsirkin wrote:
> On Wed, Aug 20, 2014 at 12:18:33PM +0200, Michael S. Tsirkin wrote:
>>> Because of vhost_net_set_vq_index, VM can be start successfully.
>>> But, after about 80 times of migration under my environment, virtual nic became unreachable again.
>>> When I use jprobe to notify tap, the virtual nic becomes reachable again. This shows that interrupts missing causes
>>> the problem.
>>
>> Could you please clarify what do you mean by "notify tap" here?
>> Thanks!
>  
> Or just post your jprobe script.
> 
Ok, thanks, this is the key function.
static int cont =1;
static int virtnet_poll_jprobe(struct napi_struct *napi, int budget)
{
        struct receive_queue *rq =
                container_of(napi, struct receive_queue, napi);
        struct virtnet_info *vi = rq->vq->vdev->priv;
        struct send_queue *sq;
        struct vring_virtqueue *vq;
        int i;

        if (cont == 1) {
                for (i = 0; i < curr_queue_pairs; i++) {
                        sq = &vi->sq[i];
                        vq = to_vvq(sq->vq);
                       vq->notify(sq->vq);
                }
                cont = 0;
        }
        jprobe_return();
        return 0;
}
When the problem occurs, one queue pair(each nic has two queue pairs) is active, so virtual nic can still receive skbs.
And, only one queue pair misses a interrupt.
To run the jprobe, some definition of function and structure should be copied to jprobe.c.

-- 
Best Wishes!
Zhang Jie

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH V3] vhost_net: start/stop guest notifiers properly
  2014-08-21  6:53       ` Jason Wang
@ 2014-08-21  7:42         ` Zhangjie (HZ)
  2014-08-27 12:59           ` Michael S. Tsirkin
  0 siblings, 1 reply; 16+ messages in thread
From: Zhangjie (HZ) @ 2014-08-21  7:42 UTC (permalink / raw)
  To: Jason Wang, qemu-devel, mst; +Cc: William Dauchy

On 2014/8/21 14:53, Jason Wang wrote:
> On 08/21/2014 02:28 PM, Zhangjie (HZ) wrote:
>> 
>> After migration, vhost is not disabled, virtual nic became unreachable because vhost is not awakened.
>> By the logical of EVENT_IDX, virtio-net will not kick vhost again if the used idx is not updated.
>> So, if one interrupts is lost during migration, virtio_net will not kick vhost again.
>> Then, no skb from virtio-net can be sent to tap.
> 
> Yes and I mean to test vhost=off to see if it was the issue of vhost.
That sounds reasonable, but the test case is to test vhost.
>>
>> Jason's patch reduced the probability of occurrence, from about 1/20 to 1/80. It is really effective. I think the patch should be acked.
>> May be we can try to solve the problem from another perspective. Do you have some methods to sense the migration?
>> We can make up a signal from virtio-net after the migration.
> 
> You can make a patch like this to debug. If problem disappears, it means
> interrupt was really lost anyway.
>>
>>> Anyway, I will try to reproduce it by myself.
>>>
>> The test environment is really terrible, I build a environment myself, but it problem did not occur.
>> The environment I use now is from a colleague Responsible for test work.
>> Two hosts, every host has about 20 vms, they send packages(ipv4 and ipv6) between each other.
>> The VM to be migrated also sens packages itself, and there is a ping(-i 0.001) from another host to it.
>> The physical nic is 1GE, connected through a internal nework.
> 
> Just want to confirm. For the problem did not occur, you mean with my
> patch on top?
> .
> 
I mean, with your patch, I have to test 80 times before it occurs, the probability is reduced.

-- 
Best Wishes!
Zhang Jie

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH V3] vhost_net: start/stop guest notifiers properly
  2014-08-21  6:28     ` Zhangjie (HZ)
  2014-08-21  6:53       ` Jason Wang
@ 2014-08-22  2:56       ` Jason Wang
  1 sibling, 0 replies; 16+ messages in thread
From: Jason Wang @ 2014-08-22  2:56 UTC (permalink / raw)
  To: Zhangjie (HZ), qemu-devel, mst; +Cc: William Dauchy

On 08/21/2014 02:28 PM, Zhangjie (HZ) wrote:
> On 2014/8/21 12:29, Jason Wang wrote:
>> > On 08/20/2014 05:23 PM, Zhangjie (HZ) wrote:
>>> >> On 2014/8/19 12:56, Jason Wang wrote:
>>>> >>> commit a9f98bb5ebe6fb1869321dcc58e72041ae626ad8 vhost: multiqueue
>>> >>  call it before setting
>>>> >>> Zhang Jie, please test this patch to see if it fixes the issue.
>>>> >>> +static void vhost_net_set_vq_index(struct vhost_net *net, int vq_index)
>>>> >>> +{
>>>> >>> +    net->dev.vq_index = vq_index;
>>>> >>> +}
>>> >>                           int vq_index)
>>>> >>> ...
>>> >> Because of vhost_net_set_vq_index, VM can be start successfully.
>>> >> But, after about 80 times of migration under my environment, virtual nic became unreachable again.
>>> >> When I use jprobe to notify tap, the virtual nic becomes reachable again. This shows that interrupts missing causes
>>> >> the problem.
>> > 
>> > Thanks for the testing. A questions is can you reproduce this when vhost
>> > is disabled?
> After migration, vhost is not disabled, virtual nic became unreachable because vhost is not awakened.
> By the logical of EVENT_IDX, virtio-net will not kick vhost again if the used idx is not updated.
> So, if one interrupts is lost during migration, virtio_net will not kick vhost again.
> Then, no skb from virtio-net can be sent to tap.
>
> Jason's patch reduced the probability of occurrence, from about 1/20 to 1/80. It is really effective. I think the patch should be acked.
> May be we can try to solve the problem from another perspective. Do you have some methods to sense the migration?
> We can make up a signal from virtio-net after the migration.
>
>> > 
>> > Anyway, I will try to reproduce it by myself.
>> > 
> The test environment is really terrible, I build a environment myself, but it problem did not occur.
> The environment I use now is from a colleague Responsible for test work.
> Two hosts, every host has about 20 vms, they send packages(ipv4 and ipv6) between each other.
> The VM to be migrated also sens packages itself, and there is a ping(-i 0.001) from another host to it.
> The physical nic is 1GE, connected through a internal nework.
Yes.

I'm trying to reproduce locally, but with my patch on top, after 5000+
times of migration, network is still available (I stress the guest
network in the same time). What's the qemu command line did you use, and
did you enable zerocopy?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH V3] vhost_net: start/stop guest notifiers properly
  2014-08-20  9:23 ` Zhangjie (HZ)
  2014-08-20 10:18   ` Michael S. Tsirkin
  2014-08-21  4:29   ` Jason Wang
@ 2014-08-27 11:54   ` Michael S. Tsirkin
  2 siblings, 0 replies; 16+ messages in thread
From: Michael S. Tsirkin @ 2014-08-27 11:54 UTC (permalink / raw)
  To: Zhangjie (HZ); +Cc: Jason Wang, qemu-devel, William Dauchy

On Wed, Aug 20, 2014 at 05:23:21PM +0800, Zhangjie (HZ) wrote:
> On 2014/8/19 12:56, Jason Wang wrote:
> > commit a9f98bb5ebe6fb1869321dcc58e72041ae626ad8 vhost: multiqueue
>  call it before setting
> > Zhang Jie, please test this patch to see if it fixes the issue.
> 
> > +static void vhost_net_set_vq_index(struct vhost_net *net, int vq_index)
> > +{
> > +    net->dev.vq_index = vq_index;
> > +}
>                           int vq_index)
> > ...
> Because of vhost_net_set_vq_index, VM can be start successfully.
> But, after about 80 times of migration under my environment, virtual nic became unreachable again.
> When I use jprobe to notify tap, the virtual nic becomes reachable again. This shows that interrupts missing causes
> the problem.

What about the patch
	net: Forbid dealing with packets when VM is not
does it help if you additionally apply that one?

> -- 
> Best Wishes!
> Zhang Jie

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH V3] vhost_net: start/stop guest notifiers properly
  2014-08-21  7:42         ` Zhangjie (HZ)
@ 2014-08-27 12:59           ` Michael S. Tsirkin
  2014-08-29 10:40             ` Zhangjie (HZ)
  0 siblings, 1 reply; 16+ messages in thread
From: Michael S. Tsirkin @ 2014-08-27 12:59 UTC (permalink / raw)
  To: Zhangjie (HZ); +Cc: Jason Wang, qemu-devel, William Dauchy

On Thu, Aug 21, 2014 at 03:42:53PM +0800, Zhangjie (HZ) wrote:
> On 2014/8/21 14:53, Jason Wang wrote:
> > On 08/21/2014 02:28 PM, Zhangjie (HZ) wrote:
> >> 
> >> After migration, vhost is not disabled, virtual nic became unreachable because vhost is not awakened.
> >> By the logical of EVENT_IDX, virtio-net will not kick vhost again if the used idx is not updated.
> >> So, if one interrupts is lost during migration, virtio_net will not kick vhost again.
> >> Then, no skb from virtio-net can be sent to tap.
> > 
> > Yes and I mean to test vhost=off to see if it was the issue of vhost.
> That sounds reasonable, but the test case is to test vhost.
> >>
> >> Jason's patch reduced the probability of occurrence, from about 1/20 to 1/80. It is really effective. I think the patch should be acked.
> >> May be we can try to solve the problem from another perspective. Do you have some methods to sense the migration?
> >> We can make up a signal from virtio-net after the migration.
> > 
> > You can make a patch like this to debug. If problem disappears, it means
> > interrupt was really lost anyway.
> >>
> >>> Anyway, I will try to reproduce it by myself.
> >>>
> >> The test environment is really terrible, I build a environment myself, but it problem did not occur.
> >> The environment I use now is from a colleague Responsible for test work.
> >> Two hosts, every host has about 20 vms, they send packages(ipv4 and ipv6) between each other.
> >> The VM to be migrated also sens packages itself, and there is a ping(-i 0.001) from another host to it.
> >> The physical nic is 1GE, connected through a internal nework.
> > 
> > Just want to confirm. For the problem did not occur, you mean with my
> > patch on top?
> > .
> > 
> I mean, with your patch, I have to test 80 times before it occurs, the probability is reduced.

Could you please try to apply the patch
	[PATCH V4] net: Forbid dealing with packets when VM is not running
on top and see if this helps?

Thanks!

> -- 
> Best Wishes!
> Zhang Jie

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH V3] vhost_net: start/stop guest notifiers properly
  2014-08-27 12:59           ` Michael S. Tsirkin
@ 2014-08-29 10:40             ` Zhangjie (HZ)
  2014-09-01  8:18               ` Michael S. Tsirkin
  0 siblings, 1 reply; 16+ messages in thread
From: Zhangjie (HZ) @ 2014-08-29 10:40 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Jason Wang, qemu-devel, William Dauchy



On 2014/8/27 20:59, Michael S. Tsirkin wrote:
> On Thu, Aug 21, 2014 at 03:42:53PM +0800, Zhangjie (HZ) wrote:
>> On 2014/8/21 14:53, Jason Wang wrote:
>>> On 08/21/2014 02:28 PM, Zhangjie (HZ) wrote:
>>>>
>>>> After migration, vhost is not disabled, virtual nic became unreachable because vhost is not awakened.
>>>> By the logical of EVENT_IDX, virtio-net will not kick vhost again if the used idx is not updated.
>>>> So, if one interrupts is lost during migration, virtio_net will not kick vhost again.
>>>> Then, no skb from virtio-net can be sent to tap.
>>>
>>> Yes and I mean to test vhost=off to see if it was the issue of vhost.
>> That sounds reasonable, but the test case is to test vhost.
>>>>
>>>> Jason's patch reduced the probability of occurrence, from about 1/20 to 1/80. It is really effective. I think the patch should be acked.
>>>> May be we can try to solve the problem from another perspective. Do you have some methods to sense the migration?
>>>> We can make up a signal from virtio-net after the migration.
>>>
>>> You can make a patch like this to debug. If problem disappears, it means
>>> interrupt was really lost anyway.
>>>>
>>>>> Anyway, I will try to reproduce it by myself.
>>>>>
>>>> The test environment is really terrible, I build a environment myself, but it problem did not occur.
>>>> The environment I use now is from a colleague Responsible for test work.
>>>> Two hosts, every host has about 20 vms, they send packages(ipv4 and ipv6) between each other.
>>>> The VM to be migrated also sens packages itself, and there is a ping(-i 0.001) from another host to it.
>>>> The physical nic is 1GE, connected through a internal nework.
>>>
>>> Just want to confirm. For the problem did not occur, you mean with my
>>> patch on top?
>>> .
>>>
>> I mean, with your patch, I have to test 80 times before it occurs, the probability is reduced.
> 
> Could you please try to apply the patch
> 	[PATCH V4] net: Forbid dealing with packets when VM is not running
> on top and see if this helps?
> 
> Thanks!
> 
>> -- 
>> Best Wishes!
>> Zhang Jie
> .
> 
Thanks! I will have a test.
-- 
Best Wishes!
Zhang Jie

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH V3] vhost_net: start/stop guest notifiers properly
  2014-08-29 10:40             ` Zhangjie (HZ)
@ 2014-09-01  8:18               ` Michael S. Tsirkin
  2014-09-05  8:06                 ` Zhangjie (HZ)
  0 siblings, 1 reply; 16+ messages in thread
From: Michael S. Tsirkin @ 2014-09-01  8:18 UTC (permalink / raw)
  To: Zhangjie (HZ); +Cc: Jason Wang, qemu-devel, William Dauchy

On Fri, Aug 29, 2014 at 06:40:24PM +0800, Zhangjie (HZ) wrote:
> 
> 
> On 2014/8/27 20:59, Michael S. Tsirkin wrote:
> > On Thu, Aug 21, 2014 at 03:42:53PM +0800, Zhangjie (HZ) wrote:
> >> On 2014/8/21 14:53, Jason Wang wrote:
> >>> On 08/21/2014 02:28 PM, Zhangjie (HZ) wrote:
> >>>>
> >>>> After migration, vhost is not disabled, virtual nic became unreachable because vhost is not awakened.
> >>>> By the logical of EVENT_IDX, virtio-net will not kick vhost again if the used idx is not updated.
> >>>> So, if one interrupts is lost during migration, virtio_net will not kick vhost again.
> >>>> Then, no skb from virtio-net can be sent to tap.
> >>>
> >>> Yes and I mean to test vhost=off to see if it was the issue of vhost.
> >> That sounds reasonable, but the test case is to test vhost.
> >>>>
> >>>> Jason's patch reduced the probability of occurrence, from about 1/20 to 1/80. It is really effective. I think the patch should be acked.
> >>>> May be we can try to solve the problem from another perspective. Do you have some methods to sense the migration?
> >>>> We can make up a signal from virtio-net after the migration.
> >>>
> >>> You can make a patch like this to debug. If problem disappears, it means
> >>> interrupt was really lost anyway.
> >>>>
> >>>>> Anyway, I will try to reproduce it by myself.
> >>>>>
> >>>> The test environment is really terrible, I build a environment myself, but it problem did not occur.
> >>>> The environment I use now is from a colleague Responsible for test work.
> >>>> Two hosts, every host has about 20 vms, they send packages(ipv4 and ipv6) between each other.
> >>>> The VM to be migrated also sens packages itself, and there is a ping(-i 0.001) from another host to it.
> >>>> The physical nic is 1GE, connected through a internal nework.
> >>>
> >>> Just want to confirm. For the problem did not occur, you mean with my
> >>> patch on top?
> >>> .
> >>>
> >> I mean, with your patch, I have to test 80 times before it occurs, the probability is reduced.
> > 
> > Could you please try to apply the patch
> > 	[PATCH V4] net: Forbid dealing with packets when VM is not running
> > on top and see if this helps?
> > 
> > Thanks!
> > 
> >> -- 
> >> Best Wishes!
> >> Zhang Jie
> > .
> > 
> Thanks! I will have a test.

Great, once you have the result of the two patches applied
together, please let us know on the list.


> -- 
> Best Wishes!
> Zhang Jie

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH V3] vhost_net: start/stop guest notifiers properly
  2014-09-01  8:18               ` Michael S. Tsirkin
@ 2014-09-05  8:06                 ` Zhangjie (HZ)
  0 siblings, 0 replies; 16+ messages in thread
From: Zhangjie (HZ) @ 2014-09-05  8:06 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Jason Wang, qemu-devel, William Dauchy



On 2014/9/1 16:18, Michael S. Tsirkin wrote:
> On Fri, Aug 29, 2014 at 06:40:24PM +0800, Zhangjie (HZ) wrote:
>>
>>
>> On 2014/8/27 20:59, Michael S. Tsirkin wrote:
>>> On Thu, Aug 21, 2014 at 03:42:53PM +0800, Zhangjie (HZ) wrote:
>>>> On 2014/8/21 14:53, Jason Wang wrote:
>>>>> On 08/21/2014 02:28 PM, Zhangjie (HZ) wrote:
>>>>>>
>>>>>> After migration, vhost is not disabled, virtual nic became unreachable because vhost is not awakened.
>>>>>> By the logical of EVENT_IDX, virtio-net will not kick vhost again if the used idx is not updated.
>>>>>> So, if one interrupts is lost during migration, virtio_net will not kick vhost again.
>>>>>> Then, no skb from virtio-net can be sent to tap.
>>>>>
>>>>> Yes and I mean to test vhost=off to see if it was the issue of vhost.
>>>> That sounds reasonable, but the test case is to test vhost.
>>>>>>
>>>>>> Jason's patch reduced the probability of occurrence, from about 1/20 to 1/80. It is really effective. I think the patch should be acked.
>>>>>> May be we can try to solve the problem from another perspective. Do you have some methods to sense the migration?
>>>>>> We can make up a signal from virtio-net after the migration.
>>>>>
>>>>> You can make a patch like this to debug. If problem disappears, it means
>>>>> interrupt was really lost anyway.
>>>>>>
>>>>>>> Anyway, I will try to reproduce it by myself.
>>>>>>>
>>>>>> The test environment is really terrible, I build a environment myself, but it problem did not occur.
>>>>>> The environment I use now is from a colleague Responsible for test work.
>>>>>> Two hosts, every host has about 20 vms, they send packages(ipv4 and ipv6) between each other.
>>>>>> The VM to be migrated also sens packages itself, and there is a ping(-i 0.001) from another host to it.
>>>>>> The physical nic is 1GE, connected through a internal nework.
>>>>>
>>>>> Just want to confirm. For the problem did not occur, you mean with my
>>>>> patch on top?
>>>>> .
>>>>>
>>>> I mean, with your patch, I have to test 80 times before it occurs, the probability is reduced.
>>>
>>> Could you please try to apply the patch
>>> 	[PATCH V4] net: Forbid dealing with packets when VM is not running
>>> on top and see if this helps?
>>>
>>> Thanks!
>>>
>>>> -- 
>>>> Best Wishes!
>>>> Zhang Jie
>>> .
>>>
>> Thanks! I will have a test.
> 
> Great, once you have the result of the two patches applied
> together, please let us know on the list.
> 
> 
>> -- 
>> Best Wishes!
>> Zhang Jie
> .
> 

I'm sorry for not giving test results in time, test resource is busy these days.
Perhaps I can give the result next week.
-- 
Best Wishes!
Zhang Jie

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2014-09-05  8:07 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-19  4:56 [Qemu-devel] [PATCH V3] vhost_net: start/stop guest notifiers properly Jason Wang
2014-08-20  9:02 ` William Dauchy
2014-08-20  9:23 ` Zhangjie (HZ)
2014-08-20 10:18   ` Michael S. Tsirkin
2014-08-20 21:21     ` Michael S. Tsirkin
2014-08-21  7:37       ` Zhangjie (HZ)
2014-08-21  4:29   ` Jason Wang
2014-08-21  6:28     ` Zhangjie (HZ)
2014-08-21  6:53       ` Jason Wang
2014-08-21  7:42         ` Zhangjie (HZ)
2014-08-27 12:59           ` Michael S. Tsirkin
2014-08-29 10:40             ` Zhangjie (HZ)
2014-09-01  8:18               ` Michael S. Tsirkin
2014-09-05  8:06                 ` Zhangjie (HZ)
2014-08-22  2:56       ` Jason Wang
2014-08-27 11:54   ` Michael S. Tsirkin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.