* [Qemu-devel] [PATCH 1/1] virtio: fallback from irqfd to non-irqfd notify
@ 2017-03-01 11:50 Halil Pasic
  2017-03-01 12:54 ` Michael S. Tsirkin
  2017-03-01 12:57 ` Paolo Bonzini
  0 siblings, 2 replies; 10+ messages in thread
From: Halil Pasic @ 2017-03-01 11:50 UTC (permalink / raw)
  To: qemu-devel, Michael S. Tsirkin
  Cc: Paolo Bonzini, Stefan Hajnoczi, Cornelia Huck, Halil Pasic

The commits 03de2f527 "virtio-blk: do not use vring in dataplane"  and
9ffe337c08 "virtio-blk: always use dataplane path if ioeventfd is active"
changed how notifications are done for virtio-blk substantially. Due to a
race condition interrupts are lost when irqfd is torn down after
notify_guest_bh was scheduled but before it actually runs.  Furthermore
virtio_notify_irqfd ignores the value returned by event_notifier_set
which correctly indicates that notification has failed due to bad file
descriptor.

Let's fix this by making virtio_notify_irqfd fall back to the non-irqfd
notification mechanism if event_notifier_set fails.

Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
---

This is probably not the only way to fix this: suggestions welcome. I
did not use a fixes tag because I'm not sure yet where exactly things got
broken. Maybe guys more familiar with dataplane and coroutines can help
(Paolo, Stefan).
---
 hw/virtio/virtio.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 23483c7..8e1c1e9 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -1581,7 +1581,9 @@ void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue *vq)
      * to an atomic operation.
      */
     virtio_set_isr(vq->vdev, 0x1);
-    event_notifier_set(&vq->guest_notifier);
+    if (event_notifier_set(&vq->guest_notifier)) {
+        virtio_notify_vector(vdev, vq->vector);
+    }
 }
 
 static void virtio_irq(VirtQueue *vq)
-- 
2.8.4
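
For illustration only, here is a minimal sketch of the mechanism the patch
relies on, written outside QEMU: signal through a file descriptor, check the
result, and take a slower path when the write fails. The names toy_notifier,
toy_notifier_set, toy_notify and the fallback_hits counter are hypothetical
stand-ins, not QEMU APIs, and a pipe is assumed in place of an eventfd to
keep the example portable. Whether such a fallback is safe in practice
depends on the slow path being callable from the signalling thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical stand-in for an fd-backed notifier (not a QEMU type). */
struct toy_notifier {
    int rfd;
    int wfd;
};

/* Counts how often the slow path was taken (illustration only). */
static int fallback_hits;

/*
 * Returns 0 on success, -errno on failure. EAGAIN is treated as success
 * because it means a notification is already pending.
 */
static int toy_notifier_set(struct toy_notifier *n)
{
    uint64_t value = 1;
    ssize_t ret;

    do {
        ret = write(n->wfd, &value, sizeof(value));
    } while (ret < 0 && errno == EINTR);

    if (ret < 0 && errno != EAGAIN) {
        return -errno;
    }
    return 0;
}

/* Try the fast path first; record a fallback if the fd is unusable. */
static void toy_notify(struct toy_notifier *n)
{
    if (toy_notifier_set(n) != 0) {
        fallback_hits++; /* stand-in for a non-fd notification path */
    }
}
```

With an open pipe toy_notify() takes the fast path; once the write fd has
been closed and set to -1, toy_notifier_set() returns -EBADF and the counter
is incremented instead.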


* Re: [Qemu-devel] [PATCH 1/1] virtio: fallback from irqfd to non-irqfd notify
  2017-03-01 11:50 [Qemu-devel] [PATCH 1/1] virtio: fallback from irqfd to non-irqfd notify Halil Pasic
@ 2017-03-01 12:54 ` Michael S. Tsirkin
  2017-03-01 13:31   ` Halil Pasic
  2017-03-01 12:57 ` Paolo Bonzini
  1 sibling, 1 reply; 10+ messages in thread
From: Michael S. Tsirkin @ 2017-03-01 12:54 UTC (permalink / raw)
  To: Halil Pasic; +Cc: qemu-devel, Paolo Bonzini, Stefan Hajnoczi, Cornelia Huck

On Wed, Mar 01, 2017 at 12:50:04PM +0100, Halil Pasic wrote:
> The commits 03de2f527 "virtio-blk: do not use vring in dataplane"  and
> 9ffe337c08 "virtio-blk: always use dataplane path if ioeventfd is active"
> changed how notifications are done for virtio-blk substantially. Due to a
> race condition interrupts are lost when irqfd is torn down after
> notify_guest_bh was scheduled but before it actually runs.  Furthermore
> virtio_notify_irqfd ignores the value returned by event_notifier_set
> which correctly indicates that notification has failed due to bad file
> descriptor.
> 
> Let's fix this by making virtio_notify_irqfd fall back to the non-irqfd
> notification mechanism if event_notifier_set fails.
> 
> Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
> ---
> 
> This is probably not the only way to fix this: suggestions welcome. I
> did not use a fixes tag because I'm not sure yet where exactly things got
> broken. Maybe guys more familiar with dataplane and coroutines can help
> (Paolo, Stefan).
> ---
>  hw/virtio/virtio.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 23483c7..8e1c1e9 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -1581,7 +1581,9 @@ void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue *vq)
>       * to an atomic operation.
>       */
>      virtio_set_isr(vq->vdev, 0x1);
> -    event_notifier_set(&vq->guest_notifier);
> +    if (event_notifier_set(&vq->guest_notifier)) {
> +        virtio_notify_vector(vdev, vq->vector);
> +    }

Does this fail because the underlying fd got closed?
Then there's a problem: trying to write to a closed
fd might corrupt an unrelated fd.
If you want to use this way we need to set fds to -1 when we close.

>  }
>  
>  static void virtio_irq(VirtQueue *vq)
> -- 
> 2.8.4
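
The fd-reuse concern raised above can be sketched in isolation. This is a
standalone illustration, not QEMU code: toy_notifier, toy_notifier_cleanup,
toy_notifier_set and stale_fd_hits_unrelated_pipe are hypothetical names,
and pipes are assumed instead of eventfds. It relies only on the POSIX rule
that dup() returns the lowest free descriptor number, and it assumes a
single-threaded process.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical notifier holding a write fd; -1 means "closed". */
struct toy_notifier {
    int wfd;
};

/* Close the fd and poison the field so that later writes fail cleanly. */
static void toy_notifier_cleanup(struct toy_notifier *n)
{
    if (n->wfd >= 0) {
        close(n->wfd);
    }
    n->wfd = -1;
}

/* Returns 0 on success, -errno on failure. */
static int toy_notifier_set(struct toy_notifier *n)
{
    char byte = 1;

    return write(n->wfd, &byte, 1) == 1 ? 0 : -errno;
}

/*
 * Demonstrates the hazard. Returns 1 if a write through a stale fd
 * number ends up in an unrelated pipe, 0 if not, -1 on setup failure.
 */
static int stale_fd_hits_unrelated_pipe(void)
{
    int a[2], b[2];
    char out = 42, in = 0;
    int stale, other, hit;

    if (pipe(a) != 0) {
        return -1;
    }
    if (pipe(b) != 0) {
        close(a[0]);
        close(a[1]);
        return -1;
    }

    stale = a[1];
    close(a[1]);        /* the number kept in 'stale' is now free */
    other = dup(b[1]);  /* dup() returns the lowest free number */

    /* If the number was reused, the stale write lands in pipe b. */
    hit = other == stale &&
          write(stale, &out, 1) == 1 &&
          read(b[0], &in, 1) == 1 &&
          in == out;

    close(a[0]);
    close(b[0]);
    close(b[1]);
    if (other >= 0) {
        close(other);
    }
    return hit;
}
```

Resetting the field to -1 in the cleanup helper turns a later write into a
clean -EBADF instead of a write to whatever descriptor reused the number.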


* Re: [Qemu-devel] [PATCH 1/1] virtio: fallback from irqfd to non-irqfd notify
  2017-03-01 11:50 [Qemu-devel] [PATCH 1/1] virtio: fallback from irqfd to non-irqfd notify Halil Pasic
  2017-03-01 12:54 ` Michael S. Tsirkin
@ 2017-03-01 12:57 ` Paolo Bonzini
  2017-03-01 13:22   ` Halil Pasic
  1 sibling, 1 reply; 10+ messages in thread
From: Paolo Bonzini @ 2017-03-01 12:57 UTC (permalink / raw)
  To: Halil Pasic, qemu-devel, Michael S. Tsirkin
  Cc: Stefan Hajnoczi, Cornelia Huck



On 01/03/2017 12:50, Halil Pasic wrote:
> The commits 03de2f527 "virtio-blk: do not use vring in dataplane"  and
> 9ffe337c08 "virtio-blk: always use dataplane path if ioeventfd is active"
> changed how notifications are done for virtio-blk substantially. Due to a
> race condition interrupts are lost when irqfd is torn down after
> notify_guest_bh was scheduled but before it actually runs.

I don't think the non-irqfd notification mechanism is thread safe, and
that would be a problem for this patch.

What is the path that causes the irqfd to be torn down?  Only something
like a reset should cause it (the only call in virtio-blk is from
virtio_blk_data_plane_stop), and then the guest doesn't care anymore
about interrupts.

That path also does a qemu_bh_delete, so the notify_guest_bh should not
be invoked at all.

Paolo

>  Furthermore
> virtio_notify_irqfd ignores the value returned by event_notifier_set
> which correctly indicates that notification has failed due to bad file
> descriptor.
> 
> Let's fix this by making virtio_notify_irqfd fall back to the non-irqfd
> notification mechanism if event_notifier_set fails.
> 
> Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
> ---
> 
> This is probably not the only way to fix this: suggestions welcome. I
> did not use a fixes tag because I'm not sure yet where exactly things got
> broken. Maybe guys more familiar with dataplane and coroutines can help
> (Paolo, Stefan).
> ---
>  hw/virtio/virtio.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 23483c7..8e1c1e9 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -1581,7 +1581,9 @@ void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue *vq)
>       * to an atomic operation.
>       */
>      virtio_set_isr(vq->vdev, 0x1);
> -    event_notifier_set(&vq->guest_notifier);
> +    if (event_notifier_set(&vq->guest_notifier)) {
> +        virtio_notify_vector(vdev, vq->vector);
> +    }
>  }
>  
>  static void virtio_irq(VirtQueue *vq)
> 


* Re: [Qemu-devel] [PATCH 1/1] virtio: fallback from irqfd to non-irqfd notify
  2017-03-01 12:57 ` Paolo Bonzini
@ 2017-03-01 13:22   ` Halil Pasic
  2017-03-01 14:29     ` Paolo Bonzini
  0 siblings, 1 reply; 10+ messages in thread
From: Halil Pasic @ 2017-03-01 13:22 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel, Michael S. Tsirkin
  Cc: Stefan Hajnoczi, Cornelia Huck



On 03/01/2017 01:57 PM, Paolo Bonzini wrote:
> 
> 
> On 01/03/2017 12:50, Halil Pasic wrote:
>> The commits 03de2f527 "virtio-blk: do not use vring in dataplane"  and
>> 9ffe337c08 "virtio-blk: always use dataplane path if ioeventfd is active"
>> changed how notifications are done for virtio-blk substantially. Due to a
>> race condition interrupts are lost when irqfd is torn down after
>> notify_guest_bh was scheduled but before it actually runs.
> 
> I don't think the non-irqfd notification mechanism is thread safe, and
> that would be a problem for this patch.

Sorry, haven't looked into that thoroughly (and speculated people with
more understanding will jump in).

> 
> What is the path that causes the irqfd to be torn down?  Only something

Here is a trace:

135871@1488304024.512533:virtio_blk_req_complete req 0x2aa6b117e10 status 0
135871@1488304024.512541:virtio_notify_irqfd vdev 0x2aa6b0e19d8 vq 0x2aa6b4c0870
135871@1488304024.522607:virtio_blk_req_complete req 0x2aa6b118980 status 0
135871@1488304024.522616:virtio_blk_req_complete req 0x2aa6b119260 status 0
135871@1488304024.522627:virtio_notify_irqfd vdev 0x2aa6b0e19d8 vq 0x2aa6b4c0870
135871@1488304024.527386:virtio_blk_req_complete req 0x2aa6b118980 status 0
135871@1488304024.527431:virtio_notify_irqfd vdev 0x2aa6b0e19d8 vq 0x2aa6b4c0870
135871@1488304024.528611:virtio_guest_notifier_read vdev 0x2aa6b0e61c8 vq 0x2aa6b4de880
135871@1488304024.528628:virtio_guest_notifier_read vdev 0x2aa6b0e61c8 vq 0x2aa6b4de8f8
135871@1488304024.528753:virtio_blk_data_plane_stop dataplane 0x2aa6b0e5540
                         ^== DATAPLANE STOP  
135871@1488304024.530709:virtio_blk_req_complete req 0x2aa6b117e10 status 0
135871@1488304024.530752:virtio_guest_notifier_read vdev 0x2aa6b0e19d8 vq 0x2aa6b4c0870
                         ^== comes from k->set_guest_notifiers(qbus->parent, nvqs, false);
                             in virtio_blk_data_plane_stop and done immediately after
                             irqfd is cleaned up by the transport
135871@1488304024.530836:virtio_notify_irqfd vdev 0x2aa6b0e19d8 vq 0x2aa6b4c0870
halil: error in event_notifier_set: Bad file descriptor
                         ^== here we have the problem

If you want a stack trace, that can be arranged too.

> like a reset should cause it (the only call in virtio-blk is from
> virtio_blk_data_plane_stop), and then the guest doesn't care anymore
> about interrupts.

I do not understand this with 'doesn't care anymore about interrupts'.
I was debugging a virtio-blk device being stuck waiting for a host
notification (interrupt) after migration.

> 
> That path also does a qemu_bh_delete, so the notify_guest_bh should not
> be invoked at all.
> 

That's only for destroy. I'm migrating.

Seems I tried to fix this in the wrong way. Was not too confident about it
in the first place. Suggestions welcome!

Cheers,
Halil


* Re: [Qemu-devel] [PATCH 1/1] virtio: fallback from irqfd to non-irqfd notify
  2017-03-01 12:54 ` Michael S. Tsirkin
@ 2017-03-01 13:31   ` Halil Pasic
  0 siblings, 0 replies; 10+ messages in thread
From: Halil Pasic @ 2017-03-01 13:31 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, Paolo Bonzini, Stefan Hajnoczi, Cornelia Huck



On 03/01/2017 01:54 PM, Michael S. Tsirkin wrote:
> On Wed, Mar 01, 2017 at 12:50:04PM +0100, Halil Pasic wrote:
>> The commits 03de2f527 "virtio-blk: do not use vring in dataplane"  and
>> 9ffe337c08 "virtio-blk: always use dataplane path if ioeventfd is active"
>> changed how notifications are done for virtio-blk substantially. Due to a
>> race condition interrupts are lost when irqfd is torn down after
>> notify_guest_bh was scheduled but before it actually runs.  Furthermore
>> virtio_notify_irqfd ignores the value returned by event_notifier_set
>> which correctly indicates that notification has failed due to bad file
>> descriptor.
>>
>> Let's fix this by making virtio_notify_irqfd fall back to the non-irqfd
>> notification mechanism if event_notifier_set fails.
>>
>> Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
>> ---
>>
>> This is probably not the only way to fix this: suggestions welcome. I
>> did not use a fixes tag because I'm not sure yet where exactly things got
>> broken. Maybe guys more familiar with dataplane and coroutines can help
>> (Paolo, Stefan).
>> ---
>>  hw/virtio/virtio.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
>> index 23483c7..8e1c1e9 100644
>> --- a/hw/virtio/virtio.c
>> +++ b/hw/virtio/virtio.c
>> @@ -1581,7 +1581,9 @@ void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue *vq)
>>       * to an atomic operation.
>>       */
>>      virtio_set_isr(vq->vdev, 0x1);
>> -    event_notifier_set(&vq->guest_notifier);
>> +    if (event_notifier_set(&vq->guest_notifier)) {
>> +        virtio_notify_vector(vdev, vq->vector);
>> +    }
> 
> Does this fail because the underlying fd got closed?

Yes. It's data_plane_stop()->virtio_ccw_set_guest_notifier->event_notifier_cleanup().
The function event_notifier_cleanup() does not set fds to -1 :(.
Seems to me, it would be safer to do so. Should I make a patch?

> Then there's a problem: trying to write to a closed
> fd might corrupt an unrelated fd.
> If you want to use this way we need to set fds to -1 when we close.

Sorry, did not check for that. OTOH Paolo says my approach is
fundamentally flawed because virtio_notify_vector is not thread
safe. I suggest we continue the discussion there.

Regards,
Halil

> 
>>  }
>>  
>>  static void virtio_irq(VirtQueue *vq)
>> -- 
>> 2.8.4
> 


* Re: [Qemu-devel] [PATCH 1/1] virtio: fallback from irqfd to non-irqfd notify
  2017-03-01 13:22   ` Halil Pasic
@ 2017-03-01 14:29     ` Paolo Bonzini
  2017-03-01 16:08       ` Halil Pasic
  0 siblings, 1 reply; 10+ messages in thread
From: Paolo Bonzini @ 2017-03-01 14:29 UTC (permalink / raw)
  To: Halil Pasic, qemu-devel, Michael S. Tsirkin
  Cc: Stefan Hajnoczi, Cornelia Huck



On 01/03/2017 14:22, Halil Pasic wrote:
> Here is a trace:
> 
> 135871@1488304024.512533:virtio_blk_req_complete req 0x2aa6b117e10 status 0
> 135871@1488304024.512541:virtio_notify_irqfd vdev 0x2aa6b0e19d8 vq 0x2aa6b4c0870
> 135871@1488304024.522607:virtio_blk_req_complete req 0x2aa6b118980 status 0
> 135871@1488304024.522616:virtio_blk_req_complete req 0x2aa6b119260 status 0
> 135871@1488304024.522627:virtio_notify_irqfd vdev 0x2aa6b0e19d8 vq 0x2aa6b4c0870
> 135871@1488304024.527386:virtio_blk_req_complete req 0x2aa6b118980 status 0
> 135871@1488304024.527431:virtio_notify_irqfd vdev 0x2aa6b0e19d8 vq 0x2aa6b4c0870
> 135871@1488304024.528611:virtio_guest_notifier_read vdev 0x2aa6b0e61c8 vq 0x2aa6b4de880
> 135871@1488304024.528628:virtio_guest_notifier_read vdev 0x2aa6b0e61c8 vq 0x2aa6b4de8f8
> 135871@1488304024.528753:virtio_blk_data_plane_stop dataplane 0x2aa6b0e5540
>                          ^== DATAPLANE STOP  
> 135871@1488304024.530709:virtio_blk_req_complete req 0x2aa6b117e10 status 0
> 135871@1488304024.530752:virtio_guest_notifier_read vdev 0x2aa6b0e19d8 vq 0x2aa6b4c0870
>                          ^== comes from k->set_guest_notifiers(qbus->parent, nvqs, false);
>                              in virtio_blk_data_plane_stop and done immediately after
>                              irqfd is cleaned up by the transport
> 135871@1488304024.530836:virtio_notify_irqfd vdev 0x2aa6b0e19d8 vq 0x2aa6b4c0870
> halil: error in event_notifier_set: Bad file descriptor
>                          ^== here we have the problem
> 
> If you want a stack trace, that can be arranged too.
> 
>> like a reset should cause it (the only call in virtio-blk is from
>> virtio_blk_data_plane_stop), and then the guest doesn't care anymore
>> about interrupts.
> I do not understand this with 'doesn't care anymore about interrupts'.
> I was debugging a virtio-blk device being stuck waiting for a host
> notification (interrupt) after migration.

Ok, this explains it better then.  The issue is that
virtio_blk_data_plane_stop doesn't flush the bottom half, which you want
to do when the caller is, for example, virtio_ccw_vmstate_change.

Does it work if you call qemu_bh_cancel(s->bh) and notify_guest_bh(s)
after

    blk_set_aio_context(s->conf->conf.blk, qemu_get_aio_context());

?

Paolo

>> That path also does a qemu_bh_delete, so the notify_guest_bh should not
>> be invoked at all.
>>
> That's only for destroy. I'm migrating.
> 
> Seems I tried to fix this in the wrong way. Was not too confident about it
> in the first place. Suggestions welcome!
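
The suggestion above (cancel the pending bottom half and run its handler
directly before the notifier goes away) can be modelled with a
single-threaded toy. Everything here is hypothetical and for illustration
only: toy_dataplane, toy_req_complete, toy_notify_guest, toy_run_pending_bh
and toy_stop are not QEMU functions, and the real code runs across threads
and AioContexts.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model of a deferred guest notification. */
struct toy_dataplane {
    bool bh_scheduled;   /* a bottom half is waiting to run */
    bool notifier_open;  /* the guest notifier fd is still valid */
    unsigned pending;    /* completions not yet signalled */
    unsigned delivered;  /* completions signalled to the guest */
    unsigned lost;       /* completions whose signal failed */
};

/* Body of the bottom half: signal everything that is pending. */
static void toy_notify_guest(struct toy_dataplane *s)
{
    if (s->pending == 0) {
        return;
    }
    if (s->notifier_open) {
        s->delivered += s->pending;
    } else {
        s->lost += s->pending;   /* models the "Bad file descriptor" case */
    }
    s->pending = 0;
}

/* A request completes: remember it and defer the notification. */
static void toy_req_complete(struct toy_dataplane *s)
{
    s->pending++;
    s->bh_scheduled = true;
}

/* What the event loop would do later on. */
static void toy_run_pending_bh(struct toy_dataplane *s)
{
    if (s->bh_scheduled) {
        s->bh_scheduled = false;
        toy_notify_guest(s);
    }
}

/* Stop path; with 'flush' set, pending work is signalled before teardown. */
static void toy_stop(struct toy_dataplane *s, bool flush)
{
    if (flush) {
        s->bh_scheduled = false;  /* analogous to cancelling the BH */
        toy_notify_guest(s);      /* run it now, notifier still valid */
    }
    s->notifier_open = false;     /* notifier teardown */
}
```

In this model a completion followed by toy_stop(&s, false) is counted as
lost when the deferred handler finally runs, while toy_stop(&s, true)
delivers it before the notifier is torn down.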


* Re: [Qemu-devel] [PATCH 1/1] virtio: fallback from irqfd to non-irqfd notify
  2017-03-01 14:29     ` Paolo Bonzini
@ 2017-03-01 16:08       ` Halil Pasic
  2017-03-01 16:53         ` Michael S. Tsirkin
  2017-03-01 19:53         ` Paolo Bonzini
  0 siblings, 2 replies; 10+ messages in thread
From: Halil Pasic @ 2017-03-01 16:08 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel, Michael S. Tsirkin
  Cc: Cornelia Huck, Stefan Hajnoczi



On 03/01/2017 03:29 PM, Paolo Bonzini wrote:
> 
> 
> On 01/03/2017 14:22, Halil Pasic wrote:
>> Here is a trace:
>>
>> 135871@1488304024.512533:virtio_blk_req_complete req 0x2aa6b117e10 status 0
>> 135871@1488304024.512541:virtio_notify_irqfd vdev 0x2aa6b0e19d8 vq 0x2aa6b4c0870
>> 135871@1488304024.522607:virtio_blk_req_complete req 0x2aa6b118980 status 0
>> 135871@1488304024.522616:virtio_blk_req_complete req 0x2aa6b119260 status 0
>> 135871@1488304024.522627:virtio_notify_irqfd vdev 0x2aa6b0e19d8 vq 0x2aa6b4c0870
>> 135871@1488304024.527386:virtio_blk_req_complete req 0x2aa6b118980 status 0
>> 135871@1488304024.527431:virtio_notify_irqfd vdev 0x2aa6b0e19d8 vq 0x2aa6b4c0870
>> 135871@1488304024.528611:virtio_guest_notifier_read vdev 0x2aa6b0e61c8 vq 0x2aa6b4de880
>> 135871@1488304024.528628:virtio_guest_notifier_read vdev 0x2aa6b0e61c8 vq 0x2aa6b4de8f8
>> 135871@1488304024.528753:virtio_blk_data_plane_stop dataplane 0x2aa6b0e5540
>>                          ^== DATAPLANE STOP  
>> 135871@1488304024.530709:virtio_blk_req_complete req 0x2aa6b117e10 status 0
>> 135871@1488304024.530752:virtio_guest_notifier_read vdev 0x2aa6b0e19d8 vq 0x2aa6b4c0870
>>                          ^== comes from k->set_guest_notifiers(qbus->parent, nvqs, false);
>>                              in virtio_blk_data_plane_stop and done immediately after
>>                              irqfd is cleaned up by the transport
>> 135871@1488304024.530836:virtio_notify_irqfd vdev 0x2aa6b0e19d8 vq 0x2aa6b4c0870
>> halil: error in event_notifier_set: Bad file descriptor
>>                          ^== here we have the problem
>>
>> If you want a stack trace, that can be arranged too.
>>
>>> like a reset should cause it (the only call in virtio-blk is from
>>> virtio_blk_data_plane_stop), and then the guest doesn't care anymore
>>> about interrupts.
>> I do not understand this with 'doesn't care anymore about interrupts'.
>> I was debugging a virtio-blk device being stuck waiting for a host
>> notification (interrupt) after migration.
> 
> Ok, this explains it better then.  The issue is that
> virtio_blk_data_plane_stop doesn't flush the bottom half, which you want
> to do when the caller is, for example, virtio_ccw_vmstate_change.
> 
> Does it work if you call qemu_bh_cancel(s->bh) and notify_guest_bh(s)
> after
> 
>     blk_set_aio_context(s->conf->conf.blk, qemu_get_aio_context());
> 
> ?
> 

With

--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -260,6 +260,8 @@ void virtio_blk_data_plane_stop(VirtIODevice *vdev)
 
     /* Drain and switch bs back to the QEMU main loop */
     blk_set_aio_context(s->conf->conf.blk, qemu_get_aio_context());
+    qemu_bh_cancel(s->bh);
+    notify_guest_bh(s);
 
applied I do not see the problem any more. I will most likely
turn this into a patch tomorrow. I would like to give it some more testing and
thinking (see questions below) until tomorrow.

I should probably cc stable, or?

I would also like to do some diagnostic stuff if virtio_notify_irqfd fails.
Maybe assert success for event_notifier_set. Would that be OK with you?

I have a couple of questions about the ways of the dataplane code. If
you are too busy, feel free to not answer -- I will keep thinking myself.

Q1. For this to work correctly, it seems to me, we need to be sure that
virtio_blk_req_complete cannot happen between the newly added
notify_guest_bh(s);
and 
vblk->dataplane_started = false; 
becomes visible. How is this ensured?
Q2. The virtio_blk_data_plane_stop should be from the thread/context
associated with the main event loop, and with that
vblk->dataplane_started = false too. But I think dataplane_started
may end up being used from a different thread (e.g. req_complete).
How does the sequencing work there and/or is it even important?

Regards,
Halil


* Re: [Qemu-devel] [PATCH 1/1] virtio: fallback from irqfd to non-irqfd notify
  2017-03-01 16:08       ` Halil Pasic
@ 2017-03-01 16:53         ` Michael S. Tsirkin
  2017-03-01 19:53         ` Paolo Bonzini
  1 sibling, 0 replies; 10+ messages in thread
From: Michael S. Tsirkin @ 2017-03-01 16:53 UTC (permalink / raw)
  To: Halil Pasic; +Cc: Paolo Bonzini, qemu-devel, Cornelia Huck, Stefan Hajnoczi

On Wed, Mar 01, 2017 at 05:08:39PM +0100, Halil Pasic wrote:
> 
> 
> On 03/01/2017 03:29 PM, Paolo Bonzini wrote:
> > 
> > 
> > On 01/03/2017 14:22, Halil Pasic wrote:
> >> Here is a trace:
> >>
> >> 135871@1488304024.512533:virtio_blk_req_complete req 0x2aa6b117e10 status 0
> >> 135871@1488304024.512541:virtio_notify_irqfd vdev 0x2aa6b0e19d8 vq 0x2aa6b4c0870
> >> 135871@1488304024.522607:virtio_blk_req_complete req 0x2aa6b118980 status 0
> >> 135871@1488304024.522616:virtio_blk_req_complete req 0x2aa6b119260 status 0
> >> 135871@1488304024.522627:virtio_notify_irqfd vdev 0x2aa6b0e19d8 vq 0x2aa6b4c0870
> >> 135871@1488304024.527386:virtio_blk_req_complete req 0x2aa6b118980 status 0
> >> 135871@1488304024.527431:virtio_notify_irqfd vdev 0x2aa6b0e19d8 vq 0x2aa6b4c0870
> >> 135871@1488304024.528611:virtio_guest_notifier_read vdev 0x2aa6b0e61c8 vq 0x2aa6b4de880
> >> 135871@1488304024.528628:virtio_guest_notifier_read vdev 0x2aa6b0e61c8 vq 0x2aa6b4de8f8
> >> 135871@1488304024.528753:virtio_blk_data_plane_stop dataplane 0x2aa6b0e5540
> >>                          ^== DATAPLANE STOP  
> >> 135871@1488304024.530709:virtio_blk_req_complete req 0x2aa6b117e10 status 0
> >> 135871@1488304024.530752:virtio_guest_notifier_read vdev 0x2aa6b0e19d8 vq 0x2aa6b4c0870
> >>                          ^== comes from k->set_guest_notifiers(qbus->parent, nvqs, false);
> >>                              in virtio_blk_data_plane_stop and done immediately after
> >>                              irqfd is cleaned up by the transport
> >> 135871@1488304024.530836:virtio_notify_irqfd vdev 0x2aa6b0e19d8 vq 0x2aa6b4c0870
> >> halil: error in event_notifier_set: Bad file descriptor
> >>                          ^== here we have the problem
> >>
> >> If you want a stack trace, that can be arranged too.
> >>
> >>> like a reset should cause it (the only call in virtio-blk is from
> >>> virtio_blk_data_plane_stop), and then the guest doesn't care anymore
> >>> about interrupts.
> >> I do not understand this with 'doesn't care anymore about interrupts'.
> >> I was debugging a virtio-blk device being stuck waiting for a host
> >> notification (interrupt) after migration.
> > 
> > Ok, this explains it better then.  The issue is that
> > virtio_blk_data_plane_stop doesn't flush the bottom half, which you want
> > to do when the caller is, for example, virtio_ccw_vmstate_change.
> > 
> > Does it work if you call qemu_bh_cancel(s->bh) and notify_guest_bh(s)
> > after
> > 
> >     blk_set_aio_context(s->conf->conf.blk, qemu_get_aio_context());
> > 
> > ?
> > 
> 
> With
> 
> --- a/hw/block/dataplane/virtio-blk.c
> +++ b/hw/block/dataplane/virtio-blk.c
> @@ -260,6 +260,8 @@ void virtio_blk_data_plane_stop(VirtIODevice *vdev)
>  
>      /* Drain and switch bs back to the QEMU main loop */
>      blk_set_aio_context(s->conf->conf.blk, qemu_get_aio_context());
> +    qemu_bh_cancel(s->bh);
> +    notify_guest_bh(s);
>  
> applied I do not see the problem any more. I will most likely
> turn this into a patch tomorrow. I would like to give it some more testing and
> thinking (see questions below) until tomorrow.
> 
> I should probably cc stable, or?
> 
> I would also like to do some diagnostic stuff if virtio_notify_irqfd fails.
> Maybe assert success for event_notifier_set. Would that be OK with you?

Main reason it can't fail is because we don't close the fd.
Given no callers check the return status, I'd be inclined to go further
and stick that assert into event_notifier_set, convert that
function to int.

> I have a couple of questions about the ways of the dataplane code. If
> you are too busy, feel free to not answer -- I will keep thinking myself.
> 
> Q1. For this to work correctly, it seems to me, we need to be sure that
> virtio_blk_req_complete cannot happen between the newly added
> notify_guest_bh(s);
> and 
> vblk->dataplane_started = false; 
> becomes visible. How is this ensured?
> Q2. The virtio_blk_data_plane_stop should be from the thread/context
> associated with the main event loop, and with that
> vblk->dataplane_started = false too. But I think dataplane_started
> may end up being used from a different thread (e.g. req_complete).
> How does the sequencing work there and/or is it even important?
> 
> Regards,
> Halil
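
As a rough sketch of the "assert inside the set function" idea discussed
above: the helper below returns nothing and aborts on an unexpected write
failure. It is not QEMU's event_notifier_set; toy_notifier and
toy_notifier_set_checked are hypothetical names, a pipe is assumed in place
of an eventfd, and the check disappears if the code is built with NDEBUG.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical notifier; 'wfd' is expected to stay open while in use. */
struct toy_notifier {
    int wfd;
};

/* Signal the notifier; abort on any failure other than EAGAIN. */
static void toy_notifier_set_checked(struct toy_notifier *n)
{
    uint64_t value = 1;
    ssize_t ret;

    do {
        ret = write(n->wfd, &value, sizeof(value));
    } while (ret < 0 && errno == EINTR);

    /* EAGAIN means a wakeup is already pending, which is acceptable. */
    assert(ret == (ssize_t)sizeof(value) || (ret < 0 && errno == EAGAIN));
    (void)ret; /* keeps NDEBUG builds free of unused-variable warnings */
}
```

The trade-off is that a write to a closed descriptor becomes an immediate
abort at the call site rather than a silently dropped notification.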


* Re: [Qemu-devel] [PATCH 1/1] virtio: fallback from irqfd to non-irqfd notify
  2017-03-01 16:08       ` Halil Pasic
  2017-03-01 16:53         ` Michael S. Tsirkin
@ 2017-03-01 19:53         ` Paolo Bonzini
  2017-03-02 13:14           ` Halil Pasic
  1 sibling, 1 reply; 10+ messages in thread
From: Paolo Bonzini @ 2017-03-01 19:53 UTC (permalink / raw)
  To: Halil Pasic, qemu-devel, Michael S. Tsirkin
  Cc: Cornelia Huck, Stefan Hajnoczi



On 01/03/2017 17:08, Halil Pasic wrote:
> applied I do not see the problem any more. I will most likely
> turn this into a patch tomorrow. I would like to give it some more testing and
> thinking (see questions below) until tomorrow.
> 
> I should probably cc stable, or?

Yes, please do!

> 
> Q1. For this to work correctly, it seems to me, we need to be sure that
> virtio_blk_req_complete cannot happen between the newly added
> notify_guest_bh(s);
> and 
> vblk->dataplane_started = false; 
> becomes visible. How is this ensured?

blk_set_aio_context drains the block device, and the event notifiers are
not active anymore so draining the block device coincides with the last
call to virtio_blk_req_complete.

Please add a comment - it's a good observation.

> Q2. The virtio_blk_data_plane_stop should be from the thread/context
> associated with the main event loop, and with that
> vblk->dataplane_started = false too. But I think dataplane_started
> may end up being used from a different thread (e.g. req_complete).

1) virtio_queue_aio_set_host_notifier_handler stops the event notifiers

2) virtio_bus_set_host_notifier invokes them one last time before exiting

Note that this could call virtio_queue_notify_vq again and hence
virtio_device_start_ioeventfd, but dataplane won't be reactivated
because vblk->dataplane_started is still true.

> How does the sequencing work there and/or is it even important?

It is important and not really easy to get right---as shown by the bug
you found, in fact.

Thanks,

Paolo


* Re: [Qemu-devel] [PATCH 1/1] virtio: fallback from irqfd to non-irqfd notify
  2017-03-01 19:53         ` Paolo Bonzini
@ 2017-03-02 13:14           ` Halil Pasic
  0 siblings, 0 replies; 10+ messages in thread
From: Halil Pasic @ 2017-03-02 13:14 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel, Michael S. Tsirkin
  Cc: Cornelia Huck, Stefan Hajnoczi



On 03/01/2017 08:53 PM, Paolo Bonzini wrote:
> 
> 
> On 01/03/2017 17:08, Halil Pasic wrote:
>> applied I do not see the problem any more. I will most likely
>> turn this into a patch tomorrow. I would like to give it some more testing and
>> thinking (see questions below) until tomorrow.
>>
>> I should probably cc stable, or?
> 
> Yes, please do!
> 
>>
>> Q1. For this to work correctly, it seems to me, we need to be sure that
>> virtio_blk_req_complete cannot happen between the newly added
>> notify_guest_bh(s);
>> and 
>> vblk->dataplane_started = false; 
>> becomes visible. How is this ensured?
> 
> blk_set_aio_context drains the block device, and the event notifiers are
> not active anymore so draining the block device coincides with the last
> call to virtio_blk_req_complete.
> 
> Please add a comment - it's a good observation.
> 
>> Q2. The virtio_blk_data_plane_stop should be from the thread/context
>> associated with the main event loop, and with that
>> vblk->dataplane_started = false too. But I think dataplane_started
>> may end up being used from a different thread (e.g. req_complete).
> 
> 1) virtio_queue_aio_set_host_notifier_handler stops the event notifiers
> 
> 2) virtio_bus_set_host_notifier invokes them one last time before exiting
> 
> Note that this could call virtio_queue_notify_vq again and hence
> virtio_device_start_ioeventfd, but dataplane won't be reactivated
> because vblk->dataplane_started is still true.
> 
>> How does the sequencing work there and/or is it even important?
> 
> It is important and not really easy to get right---as shown by the bug
> you found, in fact.
> 

Thank you very much for the explanations. I have just sent a patch
based on what we discussed here. I think I roughly understand now, how
this is supposed to work regarding concurrency, but I guess I will
have to just trust you to some extent.


