* [Qemu-devel] [PATCH 1/1] virtio: fallback from irqfd to non-irqfd notify
@ 2017-03-01 11:50 Halil Pasic
  2017-03-01 12:54 ` Michael S. Tsirkin
  2017-03-01 12:57 ` Paolo Bonzini
  0 siblings, 2 replies; 10+ messages in thread
From: Halil Pasic @ 2017-03-01 11:50 UTC
To: qemu-devel, Michael S. Tsirkin
Cc: Paolo Bonzini, Stefan Hajnoczi, Cornelia Huck, Halil Pasic

The commits 03de2f527 "virtio-blk: do not use vring in dataplane" and
9ffe337c08 "virtio-blk: always use dataplane path if ioeventfd is active"
changed how notifications are done for virtio-blk substantially. Due to a
race condition interrupts are lost when irqfd is torn down after
notify_guest_bh was scheduled but before it actually runs. Furthermore
virtio_notify_irqfd ignores the value returned by event_notifier_set
which correctly indicates that notification has failed due to bad file
descriptor.

Let's fix this by making virtio_notify_irqfd fall back to the non-irqfd
notification mechanism if event_notifier_set fails.

Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
---

This is probably not the only way to fix this: suggestions welcome. I
did not use a fixes tag because I'm not sure yet where exactly things got
broken. Maybe guys more familiar with dataplane and coroutines can help
(Paolo, Stefan).
---
 hw/virtio/virtio.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 23483c7..8e1c1e9 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -1581,7 +1581,9 @@ void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue *vq)
      * to an atomic operation.
      */
     virtio_set_isr(vq->vdev, 0x1);
-    event_notifier_set(&vq->guest_notifier);
+    if (event_notifier_set(&vq->guest_notifier)) {
+        virtio_notify_vector(vdev, vq->vector);
+    }
 }
 
 static void virtio_irq(VirtQueue *vq)
-- 
2.8.4
* Re: [Qemu-devel] [PATCH 1/1] virtio: fallback from irqfd to non-irqfd notify
  2017-03-01 11:50 [Qemu-devel] [PATCH 1/1] virtio: fallback from irqfd to non-irqfd notify Halil Pasic
@ 2017-03-01 12:54 ` Michael S. Tsirkin
  2017-03-01 13:31   ` Halil Pasic
  2017-03-01 12:57 ` Paolo Bonzini
  1 sibling, 1 reply; 10+ messages in thread
From: Michael S. Tsirkin @ 2017-03-01 12:54 UTC
To: Halil Pasic; +Cc: qemu-devel, Paolo Bonzini, Stefan Hajnoczi, Cornelia Huck

On Wed, Mar 01, 2017 at 12:50:04PM +0100, Halil Pasic wrote:
> The commits 03de2f527 "virtio-blk: do not use vring in dataplane" and
> 9ffe337c08 "virtio-blk: always use dataplane path if ioeventfd is active"
> changed how notifications are done for virtio-blk substantially. Due to a
> race condition interrupts are lost when irqfd is torn down after
> notify_guest_bh was scheduled but before it actually runs. Furthermore
> virtio_notify_irqfd ignores the value returned by event_notifier_set
> which correctly indicates that notification has failed due to bad file
> descriptor.
>
> Let's fix this by making virtio_notify_irqfd fall back to the non-irqfd
> notification mechanism if event_notifier_set fails.
>
> Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
> ---
>
> This is probably not the only way to fix this: suggestions welcome. I
> did not use a fixes tag because I'm not sure yet where exactly things got
> broken. Maybe guys more familiar with dataplane and coroutines can help
> (Paolo, Stefan).
> ---
>  hw/virtio/virtio.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 23483c7..8e1c1e9 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -1581,7 +1581,9 @@ void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue *vq)
>       * to an atomic operation.
>       */
>      virtio_set_isr(vq->vdev, 0x1);
> -    event_notifier_set(&vq->guest_notifier);
> +    if (event_notifier_set(&vq->guest_notifier)) {
> +        virtio_notify_vector(vdev, vq->vector);
> +    }

Does this fail because the underlying fd got closed?
Then there's a problem: trying to write to a closed
fd might corrupt an unrelated fd.
If you want to use this way we need to set fds to -1 when we close.

>  }
>
>  static void virtio_irq(VirtQueue *vq)
> --
> 2.8.4
* Re: [Qemu-devel] [PATCH 1/1] virtio: fallback from irqfd to non-irqfd notify
  2017-03-01 12:54 ` Michael S. Tsirkin
@ 2017-03-01 13:31   ` Halil Pasic
  0 siblings, 0 replies; 10+ messages in thread
From: Halil Pasic @ 2017-03-01 13:31 UTC
To: Michael S. Tsirkin
Cc: qemu-devel, Paolo Bonzini, Stefan Hajnoczi, Cornelia Huck

On 03/01/2017 01:54 PM, Michael S. Tsirkin wrote:
> On Wed, Mar 01, 2017 at 12:50:04PM +0100, Halil Pasic wrote:
>> The commits 03de2f527 "virtio-blk: do not use vring in dataplane" and
>> 9ffe337c08 "virtio-blk: always use dataplane path if ioeventfd is active"
>> changed how notifications are done for virtio-blk substantially. Due to a
>> race condition interrupts are lost when irqfd is torn down after
>> notify_guest_bh was scheduled but before it actually runs. Furthermore
>> virtio_notify_irqfd ignores the value returned by event_notifier_set
>> which correctly indicates that notification has failed due to bad file
>> descriptor.
>>
>> Let's fix this by making virtio_notify_irqfd fall back to the non-irqfd
>> notification mechanism if event_notifier_set fails.
>>
>> Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
>> ---
>>
>> This is probably not the only way to fix this: suggestions welcome. I
>> did not use a fixes tag because I'm not sure yet where exactly things got
>> broken. Maybe guys more familiar with dataplane and coroutines can help
>> (Paolo, Stefan).
>> ---
>>  hw/virtio/virtio.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
>> index 23483c7..8e1c1e9 100644
>> --- a/hw/virtio/virtio.c
>> +++ b/hw/virtio/virtio.c
>> @@ -1581,7 +1581,9 @@ void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue *vq)
>>       * to an atomic operation.
>>       */
>>      virtio_set_isr(vq->vdev, 0x1);
>> -    event_notifier_set(&vq->guest_notifier);
>> +    if (event_notifier_set(&vq->guest_notifier)) {
>> +        virtio_notify_vector(vdev, vq->vector);
>> +    }
>
> Does this fail because the underlying fd got closed?

Yes. It's data_plane_stop()->virtio_ccw_set_guest_notifier->event_notifier_cleanup().
The function event_notifier_cleanup() does not set fds to -1 :(. Seems to
me, it would be safer to do so. Should I make a patch?

> Then there's a problem: trying to write to a closed
> fd might corrupt an unrelated fd.
> If you want to use this way we need to set fds to -1 when we close.

Sorry, did not check for that. OTOH Paolo says my approach is
fundamentally flawed because virtio_notify_vector is not thread safe.
I suggest we continue the discussion there.

Regards,
Halil

>
>>  }
>>
>>  static void virtio_irq(VirtQueue *vq)
>> --
>> 2.8.4
>
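The hazard raised in this sub-thread (a write to an fd number that was closed, and may since have been reused by something unrelated) and the proposed mitigation (reset the stored fds to -1 on cleanup) can be illustrated with a small stand-alone sketch. This is not QEMU code: DemoNotifier and the demo_notifier_* helpers are hypothetical names, and a plain pipe stands in for the eventfd. The assumption is only that, once the stored descriptor is -1, a late write fails deterministically with EBADF instead of possibly landing on a recycled descriptor.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical stand-in for an event notifier; not QEMU's EventNotifier. */
typedef struct DemoNotifier {
    int rfd;
    int wfd;
} DemoNotifier;

/* Create the notifier on top of a pipe; returns 0 or -errno. */
static int demo_notifier_init(DemoNotifier *n)
{
    int fds[2];

    if (pipe(fds) < 0) {
        return -errno;
    }
    n->rfd = fds[0];
    n->wfd = fds[1];
    return 0;
}

/* Signal the notifier; returns 0 on success or -errno on failure. */
static int demo_notifier_set(DemoNotifier *n)
{
    static const char byte = 1;
    ssize_t ret;

    do {
        ret = write(n->wfd, &byte, sizeof(byte));
    } while (ret < 0 && errno == EINTR);

    return ret < 0 ? -errno : 0;
}

/*
 * Close both ends and invalidate the stored numbers. Without the two
 * assignments, a later demo_notifier_set() would write to whatever
 * file happens to own the recycled descriptor number.
 */
static void demo_notifier_cleanup(DemoNotifier *n)
{
    close(n->rfd);
    close(n->wfd);
    n->rfd = -1;
    n->wfd = -1;
}
```

With this shape, a caller that races with cleanup sees -EBADF from demo_notifier_set() every time, which makes the failure detectable rather than silently corrupting another file.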
* Re: [Qemu-devel] [PATCH 1/1] virtio: fallback from irqfd to non-irqfd notify
  2017-03-01 11:50 [Qemu-devel] [PATCH 1/1] virtio: fallback from irqfd to non-irqfd notify Halil Pasic
  2017-03-01 12:54 ` Michael S. Tsirkin
@ 2017-03-01 12:57 ` Paolo Bonzini
  2017-03-01 13:22   ` Halil Pasic
  1 sibling, 1 reply; 10+ messages in thread
From: Paolo Bonzini @ 2017-03-01 12:57 UTC
To: Halil Pasic, qemu-devel, Michael S. Tsirkin
Cc: Stefan Hajnoczi, Cornelia Huck

On 01/03/2017 12:50, Halil Pasic wrote:
> The commits 03de2f527 "virtio-blk: do not use vring in dataplane" and
> 9ffe337c08 "virtio-blk: always use dataplane path if ioeventfd is active"
> changed how notifications are done for virtio-blk substantially. Due to a
> race condition interrupts are lost when irqfd is torn down after
> notify_guest_bh was scheduled but before it actually runs.

I don't think the non-irqfd notification mechanism is thread safe, and
that would be a problem for this patch.

What is the path that causes the irqfd to be torn down?  Only something
like a reset should cause it (the only call in virtio-blk is from
virtio_blk_data_plane_stop), and then the guest doesn't care anymore
about interrupts.

That path also does a qemu_bh_delete, so the notify_guest_bh should not
be invoked at all.

Paolo

> Furthermore
> virtio_notify_irqfd ignores the value returned by event_notifier_set
> which correctly indicates that notification has failed due to bad file
> descriptor.
>
> Let's fix this by making virtio_notify_irqfd fall back to the non-irqfd
> notification mechanism if event_notifier_set fails.
>
> Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
> ---
>
> This is probably not the only way to fix this: suggestions welcome. I
> did not use a fixes tag because I'm not sure yet where exactly things got
> broken. Maybe guys more familiar with dataplane and coroutines can help
> (Paolo, Stefan).
> ---
>  hw/virtio/virtio.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 23483c7..8e1c1e9 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -1581,7 +1581,9 @@ void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue *vq)
>       * to an atomic operation.
>       */
>      virtio_set_isr(vq->vdev, 0x1);
> -    event_notifier_set(&vq->guest_notifier);
> +    if (event_notifier_set(&vq->guest_notifier)) {
> +        virtio_notify_vector(vdev, vq->vector);
> +    }
>  }
>
>  static void virtio_irq(VirtQueue *vq)
>
* Re: [Qemu-devel] [PATCH 1/1] virtio: fallback from irqfd to non-irqfd notify
  2017-03-01 12:57 ` Paolo Bonzini
@ 2017-03-01 13:22   ` Halil Pasic
  2017-03-01 14:29     ` Paolo Bonzini
  0 siblings, 1 reply; 10+ messages in thread
From: Halil Pasic @ 2017-03-01 13:22 UTC
To: Paolo Bonzini, qemu-devel, Michael S. Tsirkin
Cc: Stefan Hajnoczi, Cornelia Huck

On 03/01/2017 01:57 PM, Paolo Bonzini wrote:
>
>
> On 01/03/2017 12:50, Halil Pasic wrote:
>> The commits 03de2f527 "virtio-blk: do not use vring in dataplane" and
>> 9ffe337c08 "virtio-blk: always use dataplane path if ioeventfd is active"
>> changed how notifications are done for virtio-blk substantially. Due to a
>> race condition interrupts are lost when irqfd is torn down after
>> notify_guest_bh was scheduled but before it actually runs.
>
> I don't think the non-irqfd notification mechanism is thread safe, and
> that would be a problem for this patch.

Sorry, haven't looked into that thoroughly (and speculated people with
more understanding will jump in).

>
> What is the path that causes the irqfd to be torn down?  Only something

Here is a trace:

135871@1488304024.512533:virtio_blk_req_complete req 0x2aa6b117e10 status 0
135871@1488304024.512541:virtio_notify_irqfd vdev 0x2aa6b0e19d8 vq 0x2aa6b4c0870
135871@1488304024.522607:virtio_blk_req_complete req 0x2aa6b118980 status 0
135871@1488304024.522616:virtio_blk_req_complete req 0x2aa6b119260 status 0
135871@1488304024.522627:virtio_notify_irqfd vdev 0x2aa6b0e19d8 vq 0x2aa6b4c0870
135871@1488304024.527386:virtio_blk_req_complete req 0x2aa6b118980 status 0
135871@1488304024.527431:virtio_notify_irqfd vdev 0x2aa6b0e19d8 vq 0x2aa6b4c0870
135871@1488304024.528611:virtio_guest_notifier_read vdev 0x2aa6b0e61c8 vq 0x2aa6b4de880
135871@1488304024.528628:virtio_guest_notifier_read vdev 0x2aa6b0e61c8 vq 0x2aa6b4de8f8
135871@1488304024.528753:virtio_blk_data_plane_stop dataplane 0x2aa6b0e5540
^== DATAPLANE STOP
135871@1488304024.530709:virtio_blk_req_complete req 0x2aa6b117e10 status 0
135871@1488304024.530752:virtio_guest_notifier_read vdev 0x2aa6b0e19d8 vq 0x2aa6b4c0870
^== comes from k->set_guest_notifiers(qbus->parent, nvqs, false);
    in virtio_blk_data_plane_stop and done immediately after
    irqfd is cleaned up by the transport
135871@1488304024.530836:virtio_notify_irqfd vdev 0x2aa6b0e19d8 vq 0x2aa6b4c0870
halil: error in event_notifier_set: Bad file descriptor
^== here we have the problem

If you want a stack trace, that can be arranged too.

> like a reset should cause it (the only call in virtio-blk is from
> virtio_blk_data_plane_stop), and then the guest doesn't care anymore
> about interrupts.

I do not understand this with 'doesn't care anymore about interrupts'.
I was debugging a virtio-blk device being stuck waiting for a host
notification (interrupt) after migration.

>
> That path also does a qemu_bh_delete, so the notify_guest_bh should not
> be invoked at all.
>

That's only for destroy. I'm migrating.

Seems I tried to fix this in the wrong way. Was not too confident about it
in the first place. Suggestions welcome!

Cheers,
Halil
* Re: [Qemu-devel] [PATCH 1/1] virtio: fallback from irqfd to non-irqfd notify
  2017-03-01 13:22   ` Halil Pasic
@ 2017-03-01 14:29     ` Paolo Bonzini
  2017-03-01 16:08       ` Halil Pasic
  0 siblings, 1 reply; 10+ messages in thread
From: Paolo Bonzini @ 2017-03-01 14:29 UTC
To: Halil Pasic, qemu-devel, Michael S. Tsirkin
Cc: Stefan Hajnoczi, Cornelia Huck

On 01/03/2017 14:22, Halil Pasic wrote:
> Here is a trace:
>
> 135871@1488304024.512533:virtio_blk_req_complete req 0x2aa6b117e10 status 0
> 135871@1488304024.512541:virtio_notify_irqfd vdev 0x2aa6b0e19d8 vq 0x2aa6b4c0870
> 135871@1488304024.522607:virtio_blk_req_complete req 0x2aa6b118980 status 0
> 135871@1488304024.522616:virtio_blk_req_complete req 0x2aa6b119260 status 0
> 135871@1488304024.522627:virtio_notify_irqfd vdev 0x2aa6b0e19d8 vq 0x2aa6b4c0870
> 135871@1488304024.527386:virtio_blk_req_complete req 0x2aa6b118980 status 0
> 135871@1488304024.527431:virtio_notify_irqfd vdev 0x2aa6b0e19d8 vq 0x2aa6b4c0870
> 135871@1488304024.528611:virtio_guest_notifier_read vdev 0x2aa6b0e61c8 vq 0x2aa6b4de880
> 135871@1488304024.528628:virtio_guest_notifier_read vdev 0x2aa6b0e61c8 vq 0x2aa6b4de8f8
> 135871@1488304024.528753:virtio_blk_data_plane_stop dataplane 0x2aa6b0e5540
> ^== DATAPLANE STOP
> 135871@1488304024.530709:virtio_blk_req_complete req 0x2aa6b117e10 status 0
> 135871@1488304024.530752:virtio_guest_notifier_read vdev 0x2aa6b0e19d8 vq 0x2aa6b4c0870
> ^== comes from k->set_guest_notifiers(qbus->parent, nvqs, false);
>     in virtio_blk_data_plane_stop and done immediately after
>     irqfd is cleaned up by the transport
> 135871@1488304024.530836:virtio_notify_irqfd vdev 0x2aa6b0e19d8 vq 0x2aa6b4c0870
> halil: error in event_notifier_set: Bad file descriptor
> ^== here we have the problem
>
> If you want a stack trace, that can be arranged too.
>
>> like a reset should cause it (the only call in virtio-blk is from
>> virtio_blk_data_plane_stop), and then the guest doesn't care anymore
>> about interrupts.
> I do not understand this with 'doesn't care anymore about interrupts'.
> I was debugging a virtio-blk device being stuck waiting for a host
> notification (interrupt) after migration.

Ok, this explains it better then.  The issue is that
virtio_blk_data_plane_stop doesn't flush the bottom half, which you want
to do when the caller is, for example, virtio_ccw_vmstate_change.

Does it work if you call qemu_bh_cancel(s->bh) and notify_guest_bh(s)
after

    blk_set_aio_context(s->conf->conf.blk, qemu_get_aio_context());

?

Paolo

>> That path also does a qemu_bh_delete, so the notify_guest_bh should not
>> be invoked at all.
>>
> That's only for destroy. I'm migrating.
>
> Seems I tried to fix this in the wrong way. Was not too confident about it
> in the first place. Suggestions welcome!
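The ordering problem described above (a notification is deferred, the channel is torn down, and only then does the deferred callback run) and the proposed remedy (cancel the pending callback and run it synchronously before teardown) can be modelled in a few lines. The following is a single-threaded toy model under stated assumptions, not QEMU's bottom-half API: DemoDataPlane and every demo_* function are hypothetical, and unlike the suggestion in the thread, which calls notify_guest_bh(s) unconditionally, the sketch only delivers when something was actually pending.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a dataplane with one deferred guest notification. */
typedef struct DemoDataPlane {
    bool bh_scheduled;   /* a deferred notification is pending */
    bool notifier_open;  /* the irqfd-like channel is still usable */
    int delivered;       /* notifications that reached the guest */
    int lost;            /* notifications attempted after teardown */
} DemoDataPlane;

/* Deliver one notification, or record it as lost if the channel is gone. */
static void demo_notify_guest(DemoDataPlane *s)
{
    if (s->notifier_open) {
        s->delivered++;
    } else {
        s->lost++;
    }
}

/* Request completion path: defer the notification to the event loop. */
static void demo_schedule_notify(DemoDataPlane *s)
{
    s->bh_scheduled = true;
}

/* What the event loop would do some time later. */
static void demo_run_pending(DemoDataPlane *s)
{
    if (s->bh_scheduled) {
        s->bh_scheduled = false;
        demo_notify_guest(s);
    }
}

/*
 * Stop path. With flush == true the pending callback is cancelled and
 * executed right away, while the channel is still open; with
 * flush == false it is left for the event loop and fires too late.
 */
static void demo_stop(DemoDataPlane *s, bool flush)
{
    if (flush && s->bh_scheduled) {
        s->bh_scheduled = false;  /* plays the role of cancelling the BH */
        demo_notify_guest(s);     /* plays the role of running it by hand */
    }
    s->notifier_open = false;     /* channel torn down */
}
```

Running the model with flush disabled reproduces the lost interrupt seen in the trace; with flush enabled the notification is delivered before the channel goes away.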
* Re: [Qemu-devel] [PATCH 1/1] virtio: fallback from irqfd to non-irqfd notify
  2017-03-01 14:29     ` Paolo Bonzini
@ 2017-03-01 16:08       ` Halil Pasic
  2017-03-01 16:53         ` Michael S. Tsirkin
  2017-03-01 19:53         ` Paolo Bonzini
  0 siblings, 2 replies; 10+ messages in thread
From: Halil Pasic @ 2017-03-01 16:08 UTC
To: Paolo Bonzini, qemu-devel, Michael S. Tsirkin
Cc: Cornelia Huck, Stefan Hajnoczi

On 03/01/2017 03:29 PM, Paolo Bonzini wrote:
>
>
> On 01/03/2017 14:22, Halil Pasic wrote:
>> Here is a trace:
>>
>> 135871@1488304024.512533:virtio_blk_req_complete req 0x2aa6b117e10 status 0
>> 135871@1488304024.512541:virtio_notify_irqfd vdev 0x2aa6b0e19d8 vq 0x2aa6b4c0870
>> 135871@1488304024.522607:virtio_blk_req_complete req 0x2aa6b118980 status 0
>> 135871@1488304024.522616:virtio_blk_req_complete req 0x2aa6b119260 status 0
>> 135871@1488304024.522627:virtio_notify_irqfd vdev 0x2aa6b0e19d8 vq 0x2aa6b4c0870
>> 135871@1488304024.527386:virtio_blk_req_complete req 0x2aa6b118980 status 0
>> 135871@1488304024.527431:virtio_notify_irqfd vdev 0x2aa6b0e19d8 vq 0x2aa6b4c0870
>> 135871@1488304024.528611:virtio_guest_notifier_read vdev 0x2aa6b0e61c8 vq 0x2aa6b4de880
>> 135871@1488304024.528628:virtio_guest_notifier_read vdev 0x2aa6b0e61c8 vq 0x2aa6b4de8f8
>> 135871@1488304024.528753:virtio_blk_data_plane_stop dataplane 0x2aa6b0e5540
>> ^== DATAPLANE STOP
>> 135871@1488304024.530709:virtio_blk_req_complete req 0x2aa6b117e10 status 0
>> 135871@1488304024.530752:virtio_guest_notifier_read vdev 0x2aa6b0e19d8 vq 0x2aa6b4c0870
>> ^== comes from k->set_guest_notifiers(qbus->parent, nvqs, false);
>>     in virtio_blk_data_plane_stop and done immediately after
>>     irqfd is cleaned up by the transport
>> 135871@1488304024.530836:virtio_notify_irqfd vdev 0x2aa6b0e19d8 vq 0x2aa6b4c0870
>> halil: error in event_notifier_set: Bad file descriptor
>> ^== here we have the problem
>>
>> If you want a stack trace, that can be arranged too.
>>
>>> like a reset should cause it (the only call in virtio-blk is from
>>> virtio_blk_data_plane_stop), and then the guest doesn't care anymore
>>> about interrupts.
>> I do not understand this with 'doesn't care anymore about interrupts'.
>> I was debugging a virtio-blk device being stuck waiting for a host
>> notification (interrupt) after migration.
>
> Ok, this explains it better then.  The issue is that
> virtio_blk_data_plane_stop doesn't flush the bottom half, which you want
> to do when the caller is, for example, virtio_ccw_vmstate_change.
>
> Does it work if you call qemu_bh_cancel(s->bh) and notify_guest_bh(s)
> after
>
>     blk_set_aio_context(s->conf->conf.blk, qemu_get_aio_context());
>
> ?
>

With

--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -260,6 +260,8 @@ void virtio_blk_data_plane_stop(VirtIODevice *vdev)
 
     /* Drain and switch bs back to the QEMU main loop */
     blk_set_aio_context(s->conf->conf.blk, qemu_get_aio_context());
+    qemu_bh_cancel(s->bh);
+    notify_guest_bh(s);

applied I do not see the problem any more. I will most likely
turn this into a patch tomorrow. I would like to give it some more testing and
thinking (see questions below) until tomorrow.

I should probably cc stable, or?

I would also like to do some diagnostic stuff if virtio_notify_irqfd fails.
Maybe assert success for event_notifier_set. Would that be OK with you?

I have a couple of questions about the ways of the dataplane code. If
you are too busy, feel free to not answer -- I will keep thinking myself.

Q1. For this to work correctly, it seems to me, we need to be sure that
virtio_blk_req_complete cannot happen between the newly added
notify_guest_bh(s);
and
vblk->dataplane_started = false;
becomes visible. How is this ensured?
Q2. The virtio_blk_data_plane_stop should be called from the thread/context
associated with the main event loop, and with that
vblk->dataplane_started = false too. But I think dataplane_started
may end up being used from a different thread (e.g. req_complete).
How does the sequencing work there and/or is it even important?

Regards,
Halil
* Re: [Qemu-devel] [PATCH 1/1] virtio: fallback from irqfd to non-irqfd notify
  2017-03-01 16:08       ` Halil Pasic
@ 2017-03-01 16:53         ` Michael S. Tsirkin
  2017-03-01 19:53         ` Paolo Bonzini
  1 sibling, 0 replies; 10+ messages in thread
From: Michael S. Tsirkin @ 2017-03-01 16:53 UTC
To: Halil Pasic; +Cc: Paolo Bonzini, qemu-devel, Cornelia Huck, Stefan Hajnoczi

On Wed, Mar 01, 2017 at 05:08:39PM +0100, Halil Pasic wrote:
>
>
> On 03/01/2017 03:29 PM, Paolo Bonzini wrote:
> >
> >
> > On 01/03/2017 14:22, Halil Pasic wrote:
> >> Here is a trace:
> >>
> >> 135871@1488304024.512533:virtio_blk_req_complete req 0x2aa6b117e10 status 0
> >> 135871@1488304024.512541:virtio_notify_irqfd vdev 0x2aa6b0e19d8 vq 0x2aa6b4c0870
> >> 135871@1488304024.522607:virtio_blk_req_complete req 0x2aa6b118980 status 0
> >> 135871@1488304024.522616:virtio_blk_req_complete req 0x2aa6b119260 status 0
> >> 135871@1488304024.522627:virtio_notify_irqfd vdev 0x2aa6b0e19d8 vq 0x2aa6b4c0870
> >> 135871@1488304024.527386:virtio_blk_req_complete req 0x2aa6b118980 status 0
> >> 135871@1488304024.527431:virtio_notify_irqfd vdev 0x2aa6b0e19d8 vq 0x2aa6b4c0870
> >> 135871@1488304024.528611:virtio_guest_notifier_read vdev 0x2aa6b0e61c8 vq 0x2aa6b4de880
> >> 135871@1488304024.528628:virtio_guest_notifier_read vdev 0x2aa6b0e61c8 vq 0x2aa6b4de8f8
> >> 135871@1488304024.528753:virtio_blk_data_plane_stop dataplane 0x2aa6b0e5540
> >> ^== DATAPLANE STOP
> >> 135871@1488304024.530709:virtio_blk_req_complete req 0x2aa6b117e10 status 0
> >> 135871@1488304024.530752:virtio_guest_notifier_read vdev 0x2aa6b0e19d8 vq 0x2aa6b4c0870
> >> ^== comes from k->set_guest_notifiers(qbus->parent, nvqs, false);
> >>     in virtio_blk_data_plane_stop and done immediately after
> >>     irqfd is cleaned up by the transport
> >> 135871@1488304024.530836:virtio_notify_irqfd vdev 0x2aa6b0e19d8 vq 0x2aa6b4c0870
> >> halil: error in event_notifier_set: Bad file descriptor
> >> ^== here we have the problem
> >>
> >> If you want a stack trace, that can be arranged too.
> >>
> >>> like a reset should cause it (the only call in virtio-blk is from
> >>> virtio_blk_data_plane_stop), and then the guest doesn't care anymore
> >>> about interrupts.
> >> I do not understand this with 'doesn't care anymore about interrupts'.
> >> I was debugging a virtio-blk device being stuck waiting for a host
> >> notification (interrupt) after migration.
> >
> > Ok, this explains it better then.  The issue is that
> > virtio_blk_data_plane_stop doesn't flush the bottom half, which you want
> > to do when the caller is, for example, virtio_ccw_vmstate_change.
> >
> > Does it work if you call qemu_bh_cancel(s->bh) and notify_guest_bh(s)
> > after
> >
> >     blk_set_aio_context(s->conf->conf.blk, qemu_get_aio_context());
> >
> > ?
> >
>
> With
>
> --- a/hw/block/dataplane/virtio-blk.c
> +++ b/hw/block/dataplane/virtio-blk.c
> @@ -260,6 +260,8 @@ void virtio_blk_data_plane_stop(VirtIODevice *vdev)
>
>      /* Drain and switch bs back to the QEMU main loop */
>      blk_set_aio_context(s->conf->conf.blk, qemu_get_aio_context());
> +    qemu_bh_cancel(s->bh);
> +    notify_guest_bh(s);
>
> applied I do not see the problem any more. I will most likely
> turn this into a patch tomorrow. I would like to give it some more testing and
> thinking (see questions below) until tomorrow.
>
> I should probably cc stable, or?
>
> I would also like to do some diagnostic stuff if virtio_notify_irqfd fails.
> Maybe assert success for event_notifier_set. Would that be OK with you?

Main reason it can't fail is because we don't close the fd.
Given no callers check the return status, I'd be inclined to
go further and stick that assert into event_notifier_set,
convert that function to int.

> I have a couple of questions about the ways of the dataplane code. If
> you are too busy, feel free to not answer -- I will keep thinking myself.
>
> Q1. For this to work correctly, it seems to me, we need to be sure that
> virtio_blk_req_complete cannot happen between the newly added
> notify_guest_bh(s);
> and
> vblk->dataplane_started = false;
> becomes visible. How is this ensured?
> Q2. The virtio_blk_data_plane_stop should be called from the thread/context
> associated with the main event loop, and with that
> vblk->dataplane_started = false too. But I think dataplane_started
> may end up being used from a different thread (e.g. req_complete).
> How does the sequencing work there and/or is it even important?
>
> Regards,
> Halil
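The idea of moving the success check into the signalling helper itself, so that callers which never look at a return status still fail loudly, might look roughly like the sketch below. It is only one possible reading of the suggestion: demo_signal_checked is a hypothetical name, a pipe write end stands in for the eventfd, and treating EAGAIN as acceptable is an assumption that applies to a non-blocking channel that is already full and therefore already carries a pending wakeup.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper, not QEMU's event_notifier_set(): signal the
 * channel and assert that the write cannot have failed for a reason
 * such as EBADF.
 */
static void demo_signal_checked(int wfd)
{
    static const char byte = 1;
    ssize_t ret;

    do {
        ret = write(wfd, &byte, sizeof(byte));
    } while (ret < 0 && errno == EINTR);

    /* Any failure other than "channel already full" is a programming error. */
    assert(ret == (ssize_t)sizeof(byte) || (ret < 0 && errno == EAGAIN));
}
```

A call such as demo_signal_checked(fds[1]) on a freshly created pipe succeeds silently, while a call on a closed descriptor aborts at the assertion (unless NDEBUG is defined), which is the diagnostic behaviour asked about in the thread.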
* Re: [Qemu-devel] [PATCH 1/1] virtio: fallback from irqfd to non-irqfd notify
  2017-03-01 16:08       ` Halil Pasic
  2017-03-01 16:53         ` Michael S. Tsirkin
@ 2017-03-01 19:53         ` Paolo Bonzini
  2017-03-02 13:14           ` Halil Pasic
  1 sibling, 1 reply; 10+ messages in thread
From: Paolo Bonzini @ 2017-03-01 19:53 UTC
To: Halil Pasic, qemu-devel, Michael S. Tsirkin
Cc: Cornelia Huck, Stefan Hajnoczi

On 01/03/2017 17:08, Halil Pasic wrote:
> applied I do not see the problem any more. I will most likely
> turn this into a patch tomorrow. I would like to give it some more testing and
> thinking (see questions below) until tomorrow.
>
> I should probably cc stable, or?

Yes, please do!

>
> Q1. For this to work correctly, it seems to me, we need to be sure that
> virtio_blk_req_complete cannot happen between the newly added
> notify_guest_bh(s);
> and
> vblk->dataplane_started = false;
> becomes visible. How is this ensured?

blk_set_aio_context drains the block device, and the event notifiers are
not active anymore so draining the block device coincides with the last
call to virtio_blk_req_complete.

Please add a comment - it's a good observation.

> Q2. The virtio_blk_data_plane_stop should be called from the thread/context
> associated with the main event loop, and with that
> vblk->dataplane_started = false too. But I think dataplane_started
> may end up being used from a different thread (e.g. req_complete).

1) virtio_queue_aio_set_host_notifier_handler stops the event notifiers

2) virtio_bus_set_host_notifier invokes them one last time before exiting

Note that this could call again virtio_queue_notify_vq and hence
virtio_device_start_ioeventfd, but dataplane won't be reactivated
because vblk->dataplane_started is still true.

> How does the sequencing work there and/or is it even important?

It is important and not really easy to get right---as shown by the bug
you found, in fact.

Thanks,

Paolo
* Re: [Qemu-devel] [PATCH 1/1] virtio: fallback from irqfd to non-irqfd notify
  2017-03-01 19:53         ` Paolo Bonzini
@ 2017-03-02 13:14           ` Halil Pasic
  0 siblings, 0 replies; 10+ messages in thread
From: Halil Pasic @ 2017-03-02 13:14 UTC
To: Paolo Bonzini, qemu-devel, Michael S. Tsirkin
Cc: Cornelia Huck, Stefan Hajnoczi

On 03/01/2017 08:53 PM, Paolo Bonzini wrote:
>
>
> On 01/03/2017 17:08, Halil Pasic wrote:
>> applied I do not see the problem any more. I will most likely
>> turn this into a patch tomorrow. I would like to give it some more testing and
>> thinking (see questions below) until tomorrow.
>>
>> I should probably cc stable, or?
>
> Yes, please do!
>
>>
>> Q1. For this to work correctly, it seems to me, we need to be sure that
>> virtio_blk_req_complete cannot happen between the newly added
>> notify_guest_bh(s);
>> and
>> vblk->dataplane_started = false;
>> becomes visible. How is this ensured?
>
> blk_set_aio_context drains the block device, and the event notifiers are
> not active anymore so draining the block device coincides with the last
> call to virtio_blk_req_complete.
>
> Please add a comment - it's a good observation.
>
>> Q2. The virtio_blk_data_plane_stop should be called from the thread/context
>> associated with the main event loop, and with that
>> vblk->dataplane_started = false too. But I think dataplane_started
>> may end up being used from a different thread (e.g. req_complete).
>
> 1) virtio_queue_aio_set_host_notifier_handler stops the event notifiers
>
> 2) virtio_bus_set_host_notifier invokes them one last time before exiting
>
> Note that this could call again virtio_queue_notify_vq and hence
> virtio_device_start_ioeventfd, but dataplane won't be reactivated
> because vblk->dataplane_started is still true.
>
>> How does the sequencing work there and/or is it even important?
>
> It is important and not really easy to get right---as shown by the bug
> you found, in fact.
>

Thank you very much for the explanations. I have just sent a patch based
on what we discussed here.

I think I roughly understand now how this is supposed to work regarding
concurrency, but I guess I will have to just trust you to some extent.