netdev.vger.kernel.org archive mirror
* Query: Is it possible  to lose interrupts between vhost and virtio_net during migration?
@ 2014-07-31 11:47 Zhangjie (HZ)
  2014-07-31 14:31 ` Michael S. Tsirkin
  0 siblings, 1 reply; 14+ messages in thread
From: Zhangjie (HZ) @ 2014-07-31 11:47 UTC (permalink / raw)
  To: netdev; +Cc: jasowang, mst, qinchuanyu, liuyongan, zhangjie14, davem

[The test scenario]:

Doing migration between two hosts repeatedly (A->B, B->A), after about 20 times the network of the VM becomes unreachable.
There are 20 other VMs on each host, and they send IPv4, IPv6, and multicast packets to each other.
Sometimes the CPU idle of the host may be 0.

[Problem description]:

I wonder whether missing interrupts cause the network to become unreachable.
In the migration process of KVM, the source end suspends the VM, which includes the following steps:
1.	do_vm_stop->pause_all_vcpus
2.	vm_state_notify-> vhost_net_stop->set_guest_notifiers->kvm_virtio_pci_vq_vector_release
3.	vm_state_notify-> vhost_net_stop-> vhost_net_stop_one->VHOST_NET_SET_BACKEND-> vhost_net_flush_vq-> vhost_work_flush
This may cause interrupts to be missed. Suppose the scenario that virtqueue_notify() is called in virtio_net,
then the VM is paused. If the eventfd of KVM is released just before the pio write is handled,
vhost cannot sense the notify, and the tx notify is lost.
On the other side, if the eventfd of KVM is released just after vhost_notify() and before eventfd_signal(), then the rx signal by vhost is lost.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Query: Is it possible  to lose interrupts between vhost and virtio_net during migration?
  2014-07-31 11:47 Query: Is it possible to lose interrupts between vhost and virtio_net during migration? Zhangjie (HZ)
@ 2014-07-31 14:31 ` Michael S. Tsirkin
  2014-07-31 14:37   ` Michael S. Tsirkin
  0 siblings, 1 reply; 14+ messages in thread
From: Michael S. Tsirkin @ 2014-07-31 14:31 UTC (permalink / raw)
  To: Zhangjie (HZ); +Cc: netdev, jasowang, qinchuanyu, liuyongan, davem

On Thu, Jul 31, 2014 at 07:47:24PM +0800, Zhangjie (HZ) wrote:
> [The test scenario]:
> 
> Doing migration between two Hosts roundly(A->B, B->A) ,after about 20 times, network of the VM is unreachable.
> There are other 20 VMs in each Host, and they send ipv4 or ipv6 and multicast packets to each other.
> Sometimes the CPU idle of the Host maybe 0;
> 
> [Problem description]:
> 
> I wonder if it was interrupts missing that cause the network unreachable.
> In the migration process of kvm, source end should suspend, which include steps as follows:
> 1.	do_vm_stop->pause_all_vcpus
> 2.	vm_state_notify-> vhost_net_stop->set_guest_notifiers->kvm_virtio_pci_vq_vector_release
> 3.	vm_state_notify-> vhost_net_stop-> vhost_net_stop_one->OST_NET_SET_BACKEND-> vhost_net_flush_vq-> vhost_work_flush
> This may cause interrupts missing. Supose the scene that, virtqueue_notify() is called in virtio_net,
> then the VM is paused. And, just before the portiowrite being handled, eventfd of kvm is released.
> Then, vhost could not sense the notify, and the tx notify is lost.
> On the other side, if eventfd of kvm is released just after vhost_notify(), and before eventfd_signal(), then rx signal by vhost is lost.

Could be a bug in userspace: it should clean up notifiers
after it stops vhost.

Could you please send this to appropriate mailing lists?
I have a policy against off-list discussions.

-- 
MST


* Re: Query: Is it possible  to lose interrupts between vhost and virtio_net during migration?
  2014-07-31 14:31 ` Michael S. Tsirkin
@ 2014-07-31 14:37   ` Michael S. Tsirkin
  2014-08-01 10:47     ` Jason Wang
  0 siblings, 1 reply; 14+ messages in thread
From: Michael S. Tsirkin @ 2014-07-31 14:37 UTC (permalink / raw)
  To: Zhangjie (HZ); +Cc: netdev, jasowang, qinchuanyu, liuyongan, davem

On Thu, Jul 31, 2014 at 04:31:00PM +0200, Michael S. Tsirkin wrote:
> On Thu, Jul 31, 2014 at 07:47:24PM +0800, Zhangjie (HZ) wrote:
> > [The test scenario]:
> > 
> > Doing migration between two Hosts roundly(A->B, B->A) ,after about 20 times, network of the VM is unreachable.
> > There are other 20 VMs in each Host, and they send ipv4 or ipv6 and multicast packets to each other.
> > Sometimes the CPU idle of the Host maybe 0;
> > 
> > [Problem description]:
> > 
> > I wonder if it was interrupts missing that cause the network unreachable.
> > In the migration process of kvm, source end should suspend, which include steps as follows:
> > 1.	do_vm_stop->pause_all_vcpus
> > 2.	vm_state_notify-> vhost_net_stop->set_guest_notifiers->kvm_virtio_pci_vq_vector_release
> > 3.	vm_state_notify-> vhost_net_stop-> vhost_net_stop_one->OST_NET_SET_BACKEND-> vhost_net_flush_vq-> vhost_work_flush
> > This may cause interrupts missing. Supose the scene that, virtqueue_notify() is called in virtio_net,
> > then the VM is paused. And, just before the portiowrite being handled, eventfd of kvm is released.
> > Then, vhost could not sense the notify, and the tx notify is lost.
> > On the other side, if eventfd of kvm is released just after vhost_notify(), and before eventfd_signal(), then rx signal by vhost is lost.
> 
> Could be a bug in userspace: should should cleanups notifiers
> after it stops vhost.
> 
> Could you please send this to appropriate mailing lists?
> I have a policy against off-list discussions.

Also, Jason, could you take a look please?
Looks like your patch a9f98bb5ebe6fb1869321dcc58e72041ae626ad8
changed the order of stopping the device.
Previously vhost_dev_stop would disable backend and only afterwards,
unset guest notifiers.  You now unset guest notifiers while vhost is still
active. Looks like this can lose events?



> -- 
> MST


* Re: Query: Is it possible  to lose interrupts between vhost and virtio_net during migration?
  2014-07-31 14:37   ` Michael S. Tsirkin
@ 2014-08-01 10:47     ` Jason Wang
  2014-08-01 11:14       ` Jason Wang
  0 siblings, 1 reply; 14+ messages in thread
From: Jason Wang @ 2014-08-01 10:47 UTC (permalink / raw)
  To: Michael S. Tsirkin, Zhangjie (HZ); +Cc: netdev, qinchuanyu, liuyongan, davem

On 07/31/2014 10:37 PM, Michael S. Tsirkin wrote:
> On Thu, Jul 31, 2014 at 04:31:00PM +0200, Michael S. Tsirkin wrote:
>> > On Thu, Jul 31, 2014 at 07:47:24PM +0800, Zhangjie (HZ) wrote:
>>> > > [The test scenario]:
>>> > > 
>>> > > Doing migration between two Hosts roundly(A->B, B->A) ,after about 20 times, network of the VM is unreachable.
>>> > > There are other 20 VMs in each Host, and they send ipv4 or ipv6 and multicast packets to each other.
>>> > > Sometimes the CPU idle of the Host maybe 0;
>>> > > 
>>> > > [Problem description]:
>>> > > 
>>> > > I wonder if it was interrupts missing that cause the network unreachable.
>>> > > In the migration process of kvm, source end should suspend, which include steps as follows:
>>> > > 1.	do_vm_stop->pause_all_vcpus
>>> > > 2.	vm_state_notify-> vhost_net_stop->set_guest_notifiers->kvm_virtio_pci_vq_vector_release
>>> > > 3.	vm_state_notify-> vhost_net_stop-> vhost_net_stop_one->OST_NET_SET_BACKEND-> vhost_net_flush_vq-> vhost_work_flush
>>> > > This may cause interrupts missing. Supose the scene that, virtqueue_notify() is called in virtio_net,
>>> > > then the VM is paused. And, just before the portiowrite being handled, eventfd of kvm is released.
>>> > > Then, vhost could not sense the notify, and the tx notify is lost.
>>> > > On the other side, if eventfd of kvm is released just after vhost_notify(), and before eventfd_signal(), then rx signal by vhost is lost.
>> > 
>> > Could be a bug in userspace: should should cleanups notifiers
>> > after it stops vhost.
>> > 
>> > Could you please send this to appropriate mailing lists?
>> > I have a policy against off-list discussions.
> Also, Jason, could you take a look please?
> Looks like your patch a9f98bb5ebe6fb1869321dcc58e72041ae626ad8
> changed the order of stopping the device.
> Previously vhost_dev_stop would disable backend and only afterwards,
> unset guest notifiers.  You now unset guest notifiers while vhost is still
> active. Looks like this can lose events?

Not sure it will really cause the issue: during guest notifier
deassign, virtio_queue_set_guest_notifier_fd_handler() tests
the notifier and triggers the callback if it is set. This looks like
it guarantees the interrupt is not lost.


* Re: Query: Is it possible  to lose interrupts between vhost and virtio_net during migration?
  2014-08-01 10:47     ` Jason Wang
@ 2014-08-01 11:14       ` Jason Wang
  2014-08-05  6:29         ` Zhangjie (HZ)
  0 siblings, 1 reply; 14+ messages in thread
From: Jason Wang @ 2014-08-01 11:14 UTC (permalink / raw)
  To: Michael S. Tsirkin, Zhangjie (HZ); +Cc: netdev, qinchuanyu, liuyongan, davem

On 08/01/2014 06:47 PM, Jason Wang wrote:
> On 07/31/2014 10:37 PM, Michael S. Tsirkin wrote:
>> > On Thu, Jul 31, 2014 at 04:31:00PM +0200, Michael S. Tsirkin wrote:
>>>> >> > On Thu, Jul 31, 2014 at 07:47:24PM +0800, Zhangjie (HZ) wrote:
>>>>>> >>> > > [The test scenario]:
>>>>>> >>> > > 
>>>>>> >>> > > Doing migration between two Hosts roundly(A->B, B->A) ,after about 20 times, network of the VM is unreachable.
>>>>>> >>> > > There are other 20 VMs in each Host, and they send ipv4 or ipv6 and multicast packets to each other.
>>>>>> >>> > > Sometimes the CPU idle of the Host maybe 0;
>>>>>> >>> > > 
>>>>>> >>> > > [Problem description]:
>>>>>> >>> > > 
>>>>>> >>> > > I wonder if it was interrupts missing that cause the network unreachable.
>>>>>> >>> > > In the migration process of kvm, source end should suspend, which include steps as follows:
>>>>>> >>> > > 1.	do_vm_stop->pause_all_vcpus
>>>>>> >>> > > 2.	vm_state_notify-> vhost_net_stop->set_guest_notifiers->kvm_virtio_pci_vq_vector_release
>>>>>> >>> > > 3.	vm_state_notify-> vhost_net_stop-> vhost_net_stop_one->OST_NET_SET_BACKEND-> vhost_net_flush_vq-> vhost_work_flush
>>>>>> >>> > > This may cause interrupts missing. Supose the scene that, virtqueue_notify() is called in virtio_net,
>>>>>> >>> > > then the VM is paused. And, just before the portiowrite being handled, eventfd of kvm is released.
>>>>>> >>> > > Then, vhost could not sense the notify, and the tx notify is lost.
>>>>>> >>> > > On the other side, if eventfd of kvm is released just after vhost_notify(), and before eventfd_signal(), then rx signal by vhost is lost.
>>>> >> > 
>>>> >> > Could be a bug in userspace: should should cleanups notifiers
>>>> >> > after it stops vhost.
>>>> >> > 
>>>> >> > Could you please send this to appropriate mailing lists?
>>>> >> > I have a policy against off-list discussions.
>> > Also, Jason, could you take a look please?
>> > Looks like your patch a9f98bb5ebe6fb1869321dcc58e72041ae626ad8
>> > changed the order of stopping the device.
>> > Previously vhost_dev_stop would disable backend and only afterwards,
>> > unset guest notifiers.  You now unset guest notifiers while vhost is still
>> > active. Looks like this can lose events?
> Not sure it will really cause the issue. Since during guest notifier
> deassign in virtio_queue_set_guest_notifier_fd_handler() it will test
> the notifier and trigger callback if set. Looks like this can guarantee
> the interrupt was not lost.

More thought on this: it looks like there is still a window between
disabling the guest notifiers and stopping vhost_net.

Zhang Jie, please test a patch that changes the order, and if it works,
send a formal patch to qemu-devel.

btw, vhost_scsi may need the fix as well, since it may hit the same issue.

Thanks


* Re: Query: Is it possible  to lose interrupts between vhost and virtio_net during migration?
  2014-08-01 11:14       ` Jason Wang
@ 2014-08-05  6:29         ` Zhangjie (HZ)
  2014-08-05  9:49           ` Michael S. Tsirkin
  0 siblings, 1 reply; 14+ messages in thread
From: Zhangjie (HZ) @ 2014-08-05  6:29 UTC (permalink / raw)
  To: Jason Wang, Michael S. Tsirkin; +Cc: netdev, qinchuanyu, liuyongan, davem

Jason is right, the new order is not the cause of the network being unreachable.
Changing the order does not seem to work: after about 40 iterations, the problem occurs again.
Maybe there are other hidden reasons for that.

On 2014/8/1 19:14, Jason Wang wrote:
> On 08/01/2014 06:47 PM, Jason Wang wrote:
>> On 07/31/2014 10:37 PM, Michael S. Tsirkin wrote:
>>>> On Thu, Jul 31, 2014 at 04:31:00PM +0200, Michael S. Tsirkin wrote:
>>>>>>>> On Thu, Jul 31, 2014 at 07:47:24PM +0800, Zhangjie (HZ) wrote:
>>>>>>>>>>>> [The test scenario]:
>>>>>>>>>>>>
>>>>>>>>>>>> Doing migration between two Hosts roundly(A->B, B->A) ,after about 20 times, network of the VM is unreachable.
>>>>>>>>>>>> There are other 20 VMs in each Host, and they send ipv4 or ipv6 and multicast packets to each other.
>>>>>>>>>>>> Sometimes the CPU idle of the Host maybe 0;
>>>>>>>>>>>>
>>>>>>>>>>>> [Problem description]:
>>>>>>>>>>>>
>>>>>>>>>>>> I wonder if it was interrupts missing that cause the network unreachable.
>>>>>>>>>>>> In the migration process of kvm, source end should suspend, which include steps as follows:
>>>>>>>>>>>> 1.	do_vm_stop->pause_all_vcpus
>>>>>>>>>>>> 2.	vm_state_notify-> vhost_net_stop->set_guest_notifiers->kvm_virtio_pci_vq_vector_release
>>>>>>>>>>>> 3.	vm_state_notify-> vhost_net_stop-> vhost_net_stop_one->OST_NET_SET_BACKEND-> vhost_net_flush_vq-> vhost_work_flush
>>>>>>>>>>>> This may cause interrupts missing. Supose the scene that, virtqueue_notify() is called in virtio_net,
>>>>>>>>>>>> then the VM is paused. And, just before the portiowrite being handled, eventfd of kvm is released.
>>>>>>>>>>>> Then, vhost could not sense the notify, and the tx notify is lost.
>>>>>>>>>>>> On the other side, if eventfd of kvm is released just after vhost_notify(), and before eventfd_signal(), then rx signal by vhost is lost.
>>>>>>>>
>>>>>>>> Could be a bug in userspace: should should cleanups notifiers
>>>>>>>> after it stops vhost.
>>>>>>>>
>>>>>>>> Could you please send this to appropriate mailing lists?
>>>>>>>> I have a policy against off-list discussions.
>>>> Also, Jason, could you take a look please?
>>>> Looks like your patch a9f98bb5ebe6fb1869321dcc58e72041ae626ad8
>>>> changed the order of stopping the device.
>>>> Previously vhost_dev_stop would disable backend and only afterwards,
>>>> unset guest notifiers.  You now unset guest notifiers while vhost is still
>>>> active. Looks like this can lose events?
>> Not sure it will really cause the issue. Since during guest notifier
>> deassign in virtio_queue_set_guest_notifier_fd_handler() it will test
>> the notifier and trigger callback if set. Looks like this can guarantee
>> the interrupt was not lost.
> 
> More thought on this, looks like it was still a window between guest
> notifiers disabling and vhost_net stopping.
> 
> Please Zhang Jie test the patch of changing its order and if it works,
> sends a formal patch to qemu-devel.
> 
> btw, vhost_scsi may need the fix as well since it may meet the same issue.
> 
> Thanks
> .
> 

-- 
Best Wishes!
Zhang Jie


* Re: Query: Is it possible  to lose interrupts between vhost and virtio_net during migration?
  2014-08-05  6:29         ` Zhangjie (HZ)
@ 2014-08-05  9:49           ` Michael S. Tsirkin
  2014-08-05 12:14             ` Zhangjie (HZ)
  0 siblings, 1 reply; 14+ messages in thread
From: Michael S. Tsirkin @ 2014-08-05  9:49 UTC (permalink / raw)
  To: Zhangjie (HZ); +Cc: Jason Wang, netdev, qinchuanyu, liuyongan, davem

On Tue, Aug 05, 2014 at 02:29:28PM +0800, Zhangjie (HZ) wrote:
> Jason is right, the new order is not the cause of network unreachable.
> Changing order seems not work. After about 40 times, the problem occurs again.
> Maybe there is other hidden reasons for that.

To make sure: did you test the patch that I posted to the list,
"vhost_net: stop guest notifiers after backend"?

Please confirm.

-- 
MST


* Re: Query: Is it possible  to lose interrupts between vhost and virtio_net during migration?
  2014-08-05  9:49           ` Michael S. Tsirkin
@ 2014-08-05 12:14             ` Zhangjie (HZ)
  2014-08-07 12:47               ` Zhangjie (HZ)
  0 siblings, 1 reply; 14+ messages in thread
From: Zhangjie (HZ) @ 2014-08-05 12:14 UTC (permalink / raw)
  To: Michael S. Tsirkin, kvm; +Cc: Jason Wang, netdev, qinchuanyu, liuyongan, davem

On 2014/8/5 17:49, Michael S. Tsirkin wrote:
> On Tue, Aug 05, 2014 at 02:29:28PM +0800, Zhangjie (HZ) wrote:
>> Jason is right, the new order is not the cause of network unreachable.
>> Changing order seems not work. After about 40 times, the problem occurs again.
>> Maybe there is other hidden reasons for that.
I modified the code to change the order myself yesterday;
the result above is from my own patch.
> 
> To make sure, you tested the patch that I posted to list:
> "vhost_net: stop guest notifiers after backend"?
> 
> Please confirm.
> 
OK, I will test with your patch "vhost_net: stop guest notifiers after backend".

-- 
Best Wishes!
Zhang Jie


* Re: Query: Is it possible  to lose interrupts between vhost and virtio_net during migration?
  2014-08-05 12:14             ` Zhangjie (HZ)
@ 2014-08-07 12:47               ` Zhangjie (HZ)
  2014-08-14  8:52                 ` Jason Wang
  0 siblings, 1 reply; 14+ messages in thread
From: Zhangjie (HZ) @ 2014-08-07 12:47 UTC (permalink / raw)
  To: Michael S. Tsirkin, kvm; +Cc: Jason Wang, netdev, qinchuanyu, liuyongan, davem


On 2014/8/5 20:14, Zhangjie (HZ) wrote:
> On 2014/8/5 17:49, Michael S. Tsirkin wrote:
>> On Tue, Aug 05, 2014 at 02:29:28PM +0800, Zhangjie (HZ) wrote:
>>> Jason is right, the new order is not the cause of network unreachable.
>>> Changing order seems not work. After about 40 times, the problem occurs again.
>>> Maybe there is other hidden reasons for that.
> I modified the code to change the order myself yesterday.
> This result is about my code.
>>
>> To make sure, you tested the patch that I posted to list:
>> "vhost_net: stop guest notifiers after backend"?
>>
>> Please confirm.
>>
> OK, I will test with your patch "vhost_net: stop guest notifiers after backend".
> 
Unfortunately, after applying the patch "vhost_net: stop guest notifiers after backend",
Linux VMs stopped by themselves a few minutes after they were started.
>@@ -308,6 +308,12 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>         goto err;
>     }
>
>+    r = k->set_guest_notifiers(qbus->parent, total_queues * 2, true);
>+    if (r < 0) {
>+        error_report("Error binding guest notifier: %d", -r);
>+        goto err;
>+    }
>+
>     for (i = 0; i < total_queues; i++) {
>         r = vhost_net_start_one(get_vhost_net(ncs[i].peer), dev, i * 2);
>
>@@ -316,12 +322,6 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>         }
>     }
>
>-    r = k->set_guest_notifiers(qbus->parent, total_queues * 2, true);
>-    if (r < 0) {
>-        error_report("Error binding guest notifier: %d", -r);
>-        goto err;
>-    }
>-
>     return 0;
I wonder if k->set_guest_notifiers should be called after "hdev->started = true;" in vhost_dev_start.
-- 
Best Wishes!
Zhang Jie



* Re: Query: Is it possible  to lose interrupts between vhost and virtio_net during migration?
  2014-08-07 12:47               ` Zhangjie (HZ)
@ 2014-08-14  8:52                 ` Jason Wang
  2014-08-14 10:02                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 14+ messages in thread
From: Jason Wang @ 2014-08-14  8:52 UTC (permalink / raw)
  To: Zhangjie (HZ), Michael S. Tsirkin, kvm
  Cc: netdev, qinchuanyu, liuyongan, davem

On 08/07/2014 08:47 PM, Zhangjie (HZ) wrote:
> On 2014/8/5 20:14, Zhangjie (HZ) wrote:
>> On 2014/8/5 17:49, Michael S. Tsirkin wrote:
>>> On Tue, Aug 05, 2014 at 02:29:28PM +0800, Zhangjie (HZ) wrote:
>>>> Jason is right, the new order is not the cause of network unreachable.
>>>> Changing order seems not work. After about 40 times, the problem occurs again.
>>>> Maybe there is other hidden reasons for that.
>> I modified the code to change the order myself yesterday.
>> This result is about my code.
>>> To make sure, you tested the patch that I posted to list:
>>> "vhost_net: stop guest notifiers after backend"?
>>>
>>> Please confirm.
>>>
>> OK, I will test with your patch "vhost_net: stop guest notifiers after backend".
>>
> Unfortunately, after using the patch "vhost_net: stop guest notifiers after backend",
> Linux VMs stopt themselves a few minutes after they were started.
>> @@ -308,6 +308,12 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>>         goto err;
>>     }
>>
>> +    r = k->set_guest_notifiers(qbus->parent, total_queues * 2, true);
>> +    if (r < 0) {
>> +        error_report("Error binding guest notifier: %d", -r);
>> +        goto err;
>> +    }
>> +
>>     for (i = 0; i < total_queues; i++) {
>>         r = vhost_net_start_one(get_vhost_net(ncs[i].peer), dev, i * 2);
>>
>> @@ -316,12 +322,6 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>>         }
>>     }
>>
>> -    r = k->set_guest_notifiers(qbus->parent, total_queues * 2, true);
>> -    if (r < 0) {
>> -        error_report("Error binding guest notifier: %d", -r);
>> -        goto err;
>> -    }
>> -
>>     return 0;
> I wonder if k->set_guest_notifiers should be called after "hdev->started = true;" in vhost_dev_start.

Michael, can we just remove those assertions? You may want to set
guest notifiers before starting the backend.

Another question for virtio_pci_vector_poll(): why not use
msix_notify() instead of msix_set_pending()? If so, there's no need to
change vhost_net_start().

Zhang Jie, is this a regression? If yes, could you please do a bisection
to find the first bad commit?

Thanks


* Re: Query: Is it possible  to lose interrupts between vhost and virtio_net during migration?
  2014-08-14  8:52                 ` Jason Wang
@ 2014-08-14 10:02                   ` Michael S. Tsirkin
  2014-08-15  2:55                     ` Jason Wang
  0 siblings, 1 reply; 14+ messages in thread
From: Michael S. Tsirkin @ 2014-08-14 10:02 UTC (permalink / raw)
  To: Jason Wang; +Cc: Zhangjie (HZ), kvm, netdev, qinchuanyu, liuyongan, davem

On Thu, Aug 14, 2014 at 04:52:40PM +0800, Jason Wang wrote:
> On 08/07/2014 08:47 PM, Zhangjie (HZ) wrote:
> > On 2014/8/5 20:14, Zhangjie (HZ) wrote:
> >> On 2014/8/5 17:49, Michael S. Tsirkin wrote:
> >>> On Tue, Aug 05, 2014 at 02:29:28PM +0800, Zhangjie (HZ) wrote:
> >>>> Jason is right, the new order is not the cause of network unreachable.
> >>>> Changing order seems not work. After about 40 times, the problem occurs again.
> >>>> Maybe there is other hidden reasons for that.
> >> I modified the code to change the order myself yesterday.
> >> This result is about my code.
> >>> To make sure, you tested the patch that I posted to list:
> >>> "vhost_net: stop guest notifiers after backend"?
> >>>
> >>> Please confirm.
> >>>
> >> OK, I will test with your patch "vhost_net: stop guest notifiers after backend".
> >>
> > Unfortunately, after using the patch "vhost_net: stop guest notifiers after backend",
> > Linux VMs stopt themselves a few minutes after they were started.
> >> @@ -308,6 +308,12 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
> >>         goto err;
> >>     }
> >>
> >> +    r = k->set_guest_notifiers(qbus->parent, total_queues * 2, true);
> >> +    if (r < 0) {
> >> +        error_report("Error binding guest notifier: %d", -r);
> >> +        goto err;
> >> +    }
> >> +
> >>     for (i = 0; i < total_queues; i++) {
> >>         r = vhost_net_start_one(get_vhost_net(ncs[i].peer), dev, i * 2);
> >>
> >> @@ -316,12 +322,6 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
> >>         }
> >>     }
> >>
> >> -    r = k->set_guest_notifiers(qbus->parent, total_queues * 2, true);
> >> -    if (r < 0) {
> >> -        error_report("Error binding guest notifier: %d", -r);
> >> -        goto err;
> >> -    }
> >> -
> >>     return 0;
> > I wonder if k->set_guest_notifiers should be called after "hdev->started = true;" in vhost_dev_start.
> 
> Michael, can we just remove those assertions? Since you may want to set
> guest notifiers before starting the backend.

Which assertions?

> Another question for virtio_pci_vector_poll(): why not using
> msix_notify() instead of msix_set_pending().

We can do that, but the effect will be the same since we know
the vector is masked.

> If so, there's no need to
> change the vhost_net_start() ?

Confused, don't see the connection.

> Zhang Jie, is this a regression? If yes, could you please do a bisection
> to find the first bad commit.
> 
> Thanks

Pretty sure it's the mq patch: a9f98bb5ebe6fb1869321dcc58e72041ae626ad8

    Since we may have many vhost/net devices for a virtio-net device.  The setting of
    guest notifiers were moved out of the starting/stopping of a specific vhost
    thread. The vhost_net_{start|stop}() were renamed to
    vhost_net_{start|stop}_one(), and a new vhost_net_{start|stop}() were introduced
    to configure the guest notifiers and start/stop all vhost/vhost_net devices.

-- 
MST


* Re: Query: Is it possible  to lose interrupts between vhost and virtio_net during migration?
  2014-08-14 10:02                   ` Michael S. Tsirkin
@ 2014-08-15  2:55                     ` Jason Wang
  2014-08-17 10:22                       ` Michael S. Tsirkin
  0 siblings, 1 reply; 14+ messages in thread
From: Jason Wang @ 2014-08-15  2:55 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Zhangjie (HZ), kvm, netdev, qinchuanyu, liuyongan, davem

On 08/14/2014 06:02 PM, Michael S. Tsirkin wrote:
> On Thu, Aug 14, 2014 at 04:52:40PM +0800, Jason Wang wrote:
>> On 08/07/2014 08:47 PM, Zhangjie (HZ) wrote:
>>> On 2014/8/5 20:14, Zhangjie (HZ) wrote:
>>>> On 2014/8/5 17:49, Michael S. Tsirkin wrote:
>>>>> On Tue, Aug 05, 2014 at 02:29:28PM +0800, Zhangjie (HZ) wrote:
>>>>>> Jason is right, the new order is not the cause of network unreachable.
>>>>>> Changing order seems not work. After about 40 times, the problem occurs again.
>>>>>> Maybe there is other hidden reasons for that.
>>>> I modified the code to change the order myself yesterday.
>>>> This result is about my code.
>>>>> To make sure, you tested the patch that I posted to list:
>>>>> "vhost_net: stop guest notifiers after backend"?
>>>>>
>>>>> Please confirm.
>>>>>
>>>> OK, I will test with your patch "vhost_net: stop guest notifiers after backend".
>>>>
>>> Unfortunately, after using the patch "vhost_net: stop guest notifiers after backend",
>>> Linux VMs stopt themselves a few minutes after they were started.
>>>> @@ -308,6 +308,12 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>>>>         goto err;
>>>>     }
>>>>
>>>> +    r = k->set_guest_notifiers(qbus->parent, total_queues * 2, true);
>>>> +    if (r < 0) {
>>>> +        error_report("Error binding guest notifier: %d", -r);
>>>> +        goto err;
>>>> +    }
>>>> +
>>>>     for (i = 0; i < total_queues; i++) {
>>>>         r = vhost_net_start_one(get_vhost_net(ncs[i].peer), dev, i * 2);
>>>>
>>>> @@ -316,12 +322,6 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>>>>         }
>>>>     }
>>>>
>>>> -    r = k->set_guest_notifiers(qbus->parent, total_queues * 2, true);
>>>> -    if (r < 0) {
>>>> -        error_report("Error binding guest notifier: %d", -r);
>>>> -        goto err;
>>>> -    }
>>>> -
>>>>     return 0;
>>> I wonder if k->set_guest_notifiers should be called after "hdev->started = true;" in vhost_dev_start.
>> Michael, can we just remove those assertions? Since you may want to set
>> guest notifiers before starting the backend.
> Which assertions?

I mean assert(hdev->started) in vhost.c. Your patch may hit them.

>> Another question for virtio_pci_vector_poll(): why not using
>> msix_notify() instead of msix_set_pending().
> We can do that but the effect will be same since we know
> vector is masked.

Perhaps not during the current vhost start. We now start the backend
before setting guest notifiers, so the backend is using the masked
notifier at that point, but the vector is not masked.

>
>> If so, there's no need to
>> change the vhost_net_start() ?
> Confused, don't see the connection.

If we use msix_notify(), it will raise the irq if the backend wants it
before the guest notifiers are set. So there is no need to check the order
of setting guest notifiers versus starting the backend in vhost_net_start().
>
>> Zhang Jie, is this a regression? If yes, could you please do a bisection
>> to find the first bad commit.
>>
>> Thanks
> Pretty sure it's the mq patch: a9f98bb5ebe6fb1869321dcc58e72041ae626ad8
>
>     Since we may have many vhost/net devices for a virtio-net device.  The setting of
>     guest notifiers were moved out of the starting/stopping of a specific vhost
>     thread. The vhost_net_{start|stop}() were renamed to
>     vhost_net_{start|stop}_one(), and a new vhost_net_{start|stop}() were introduced
>     to configure the guest notifiers and start/stop all vhost/vhost_net devices.
>

Ok.


* Re: Query: Is it possible  to lose interrupts between vhost and virtio_net during migration?
  2014-08-15  2:55                     ` Jason Wang
@ 2014-08-17 10:22                       ` Michael S. Tsirkin
  2014-08-18  5:23                         ` Jason Wang
  0 siblings, 1 reply; 14+ messages in thread
From: Michael S. Tsirkin @ 2014-08-17 10:22 UTC (permalink / raw)
  To: Jason Wang; +Cc: Zhangjie (HZ), kvm, netdev, qinchuanyu, liuyongan, davem

On Fri, Aug 15, 2014 at 10:55:32AM +0800, Jason Wang wrote:
> >>> I wonder if k->set_guest_notifiers should be called after "hdev->started = true;" in vhost_dev_start.
> >> Michael, can we just remove those assertions? Since you may want to set
> >> guest notifiers before starting the backend.
> > Which assertions?
> 
> I mean assert(hdev->started) in vhost.c. Your patch may hit them.
I don't follow, but since my patch doesn't help anyway, please go ahead
and post your idea in the form of a patch; it will be clearer and can
be tested.


* Re: Query: Is it possible  to lose interrupts between vhost and virtio_net during migration?
  2014-08-17 10:22                       ` Michael S. Tsirkin
@ 2014-08-18  5:23                         ` Jason Wang
  0 siblings, 0 replies; 14+ messages in thread
From: Jason Wang @ 2014-08-18  5:23 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Zhangjie (HZ), kvm, netdev, qinchuanyu, liuyongan, davem

On 08/17/2014 06:22 PM, Michael S. Tsirkin wrote:
> On Fri, Aug 15, 2014 at 10:55:32AM +0800, Jason Wang wrote:
>>>>> I wonder if k->set_guest_notifiers should be called after "hdev->started = true;" in vhost_dev_start.
>>>> Michael, can we just remove those assertions? Since you may want to set
>>>> guest notifiers before starting the backend.
>>> Which assertions?
>> I mean assert(hdev->started) in vhost.c. Your patch may hit them.
> I don't follow, but since my patch doesn't help anyway, pls go ahead
> and post your idea in form of a patch, will be clearer and can
> be tested.

Ok, will post the patch.

