* [Qemu-devel] [PATCH 0/2] ide/atapi: partially avoid deadlock if the storage backend is dead
@ 2015-08-20  8:14 Peter Lieven
  2015-08-20  8:14 ` [Qemu-devel] [PATCH 1/2] block/io: allow AIOCB without callback Peter Lieven
  2015-08-20  8:14 ` [Qemu-devel] [PATCH 2/2] ide/atapi: partially avoid deadlock if the storage backend is dead Peter Lieven
  0 siblings, 2 replies; 10+ messages in thread
From: Peter Lieven @ 2015-08-20  8:14 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: kwolf, stefanha, Peter Lieven, jsnow, pbonzini

The blk_drain_all() that is executed when the guest issues a DMA cancel
leads to a stuck main loop if the storage backend (e.g. an NFS share)
is unresponsive.

This is a common scenario for CDROM images mounted from an NFS share:
a broken NFS server can take down the whole VM even if the mounted
CDROM is not in use and simply was never unmounted.

This approach avoids the blk_drain_all() for read-only media, cancels
the AIO locally, and makes the callback a NOP so that nothing happens
if the original request completes once the NFS share becomes
responsive again.
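As a rough illustration of the idea (plain C with invented names; this is
not the QEMU code), a local cancel reports -ECANCELED once and then
detaches the completion callback, so a late completion from the dead
backend becomes a no-op:

```c
#include <stddef.h>

typedef void (*CompletionFunc)(void *opaque, int ret);

typedef struct {
    CompletionFunc cb;      /* set to NULL by a local cancel */
    void *opaque;
    int completed;
} AioRequest;

/* Guest cancels: report -ECANCELED once, then detach the callback. */
static void aio_cancel_local(AioRequest *req, int ecanceled)
{
    if (req->cb) {
        req->cb(req->opaque, ecanceled);
        req->cb = NULL;
    }
}

/* Backend completes much later: a NULL callback makes this a no-op. */
static void aio_complete(AioRequest *req, int ret)
{
    req->completed = 1;
    if (req->cb) {
        req->cb(req->opaque, ret);
    }
}
```

In the real series the AIOCB's cb field in block/io.c plays the role of
cb here.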

Peter Lieven (2):
  block/io: allow AIOCB without callback
  ide/atapi: partially avoid deadlock if the storage backend is dead

 block/io.c   |  8 ++++++--
 hw/ide/pci.c | 32 ++++++++++++++++++--------------
 2 files changed, 24 insertions(+), 16 deletions(-)

-- 
1.9.1

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Qemu-devel] [PATCH 1/2] block/io: allow AIOCB without callback
  2015-08-20  8:14 [Qemu-devel] [PATCH 0/2] ide/atapi: partially avoid deadlock if the storage backend is dead Peter Lieven
@ 2015-08-20  8:14 ` Peter Lieven
  2015-08-21  6:12   ` Eric Blake
  2015-08-20  8:14 ` [Qemu-devel] [PATCH 2/2] ide/atapi: partially avoid deadlock if the storage backend is dead Peter Lieven
  1 sibling, 1 reply; 10+ messages in thread
From: Peter Lieven @ 2015-08-20  8:14 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: kwolf, stefanha, Peter Lieven, jsnow, pbonzini

If the backend storage is unresponsive and we cancel a request due to
a timeout we cannot immediately destroy the AIOCB because the storage
might complete the original request laster if it is responsive again.
For this purpose allow to set the callback to NULL and ignore it in
this case.

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 block/io.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/block/io.c b/block/io.c
index d4bc83b..e628581 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2007,7 +2007,9 @@ static void bdrv_aio_bh_cb(void *opaque)
         qemu_iovec_from_buf(acb->qiov, 0, acb->bounce, acb->qiov->size);
     }
     qemu_vfree(acb->bounce);
-    acb->common.cb(acb->common.opaque, acb->ret);
+    if (acb->common.cb) {
+        acb->common.cb(acb->common.opaque, acb->ret);
+    }
     qemu_bh_delete(acb->bh);
     acb->bh = NULL;
     qemu_aio_unref(acb);
@@ -2075,7 +2077,9 @@ static const AIOCBInfo bdrv_em_co_aiocb_info = {
 static void bdrv_co_complete(BlockAIOCBCoroutine *acb)
 {
     if (!acb->need_bh) {
-        acb->common.cb(acb->common.opaque, acb->req.error);
+        if (acb->common.cb) {
+            acb->common.cb(acb->common.opaque, acb->req.error);
+        }
         qemu_aio_unref(acb);
     }
 }
-- 
1.9.1


* [Qemu-devel] [PATCH 2/2] ide/atapi: partially avoid deadlock if the storage backend is dead
  2015-08-20  8:14 [Qemu-devel] [PATCH 0/2] ide/atapi: partially avoid deadlock if the storage backend is dead Peter Lieven
  2015-08-20  8:14 ` [Qemu-devel] [PATCH 1/2] block/io: allow AIOCB without callback Peter Lieven
@ 2015-08-20  8:14 ` Peter Lieven
  2015-08-21  6:13   ` Eric Blake
  2015-09-03 16:59   ` Stefan Hajnoczi
  1 sibling, 2 replies; 10+ messages in thread
From: Peter Lieven @ 2015-08-20  8:14 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: kwolf, stefanha, Peter Lieven, jsnow, pbonzini

the blk_drain_all() that is executed if the guest issues a DMA cancel
leads to a stuck main loop if the storage backend (e.g. a NFS share)
is unresponsive.

This scenario is a common case for CDROM images mounted from an
NFS share. In this case a broken NFS server can take down the
whole VM even if the mounted CDROM is not used and was just not
unmounted after usage.

This approach avoids the blk_drain_all for read-only media and
cancelles the AIO locally and makes the callback a NOP if the
original request is completed after the NFS share is responsive
again.

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 hw/ide/pci.c | 32 ++++++++++++++++++--------------
 1 file changed, 18 insertions(+), 14 deletions(-)

diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index d31ff88..a8b4175 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -240,21 +240,25 @@ void bmdma_cmd_writeb(BMDMAState *bm, uint32_t val)
     /* Ignore writes to SSBM if it keeps the old value */
     if ((val & BM_CMD_START) != (bm->cmd & BM_CMD_START)) {
         if (!(val & BM_CMD_START)) {
-            /*
-             * We can't cancel Scatter Gather DMA in the middle of the
-             * operation or a partial (not full) DMA transfer would reach
-             * the storage so we wait for completion instead (we beahve
-             * like if the DMA was completed by the time the guest trying
-             * to cancel dma with bmdma_cmd_writeb with BM_CMD_START not
-             * set).
-             *
-             * In the future we'll be able to safely cancel the I/O if the
-             * whole DMA operation will be submitted to disk with a single
-             * aio operation with preadv/pwritev.
-             */
             if (bm->bus->dma->aiocb) {
-                blk_drain_all();
-                assert(bm->bus->dma->aiocb == NULL);
+                if (!bdrv_is_read_only(bm->bus->dma->aiocb->bs)) {
+                    /* We can't cancel Scatter Gather DMA in the middle of the
+                     * operation or a partial (not full) DMA transfer would
+                     * reach the storage so we wait for completion instead
+                     * (we beahve like if the DMA was completed by the time the
+                     * guest trying to cancel dma with bmdma_cmd_writeb with
+                     * BM_CMD_START not set). */
+                    blk_drain_all();
+                    assert(bm->bus->dma->aiocb == NULL);
+                } else {
+                    /* On a read-only device (e.g. CDROM) we can't cause incon-
+                     * sistencies and thus cancel the AIOCB locally and avoid
+                     * to be called back later if the original request is
+                     * completed. */
+                    BlockAIOCB *aiocb = bm->bus->dma->aiocb;
+                    aiocb->cb(aiocb->opaque, -ECANCELED);
+                    aiocb->cb = NULL;
+                }
             }
             bm->status &= ~BM_STATUS_DMAING;
         } else {
-- 
1.9.1


* Re: [Qemu-devel] [PATCH 1/2] block/io: allow AIOCB without callback
  2015-08-20  8:14 ` [Qemu-devel] [PATCH 1/2] block/io: allow AIOCB without callback Peter Lieven
@ 2015-08-21  6:12   ` Eric Blake
  2015-08-31  8:38     ` Peter Lieven
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Blake @ 2015-08-21  6:12 UTC (permalink / raw)
  To: Peter Lieven, qemu-devel, qemu-block; +Cc: kwolf, stefanha, jsnow, pbonzini


On 08/20/2015 01:14 AM, Peter Lieven wrote:
> If the backend storage is unresponsive and we cancel a request due to
> a timeout we cannot immediately destroy the AIOCB because the storage
> might complete the original request laster if it is responsive again.

s/laster/later/

> For this purpose allow to set the callback to NULL and ignore it in
> this case.
> 
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---

I'll leave the technical review to others, I'm just pointing out grammar.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org




* Re: [Qemu-devel] [PATCH 2/2] ide/atapi: partially avoid deadlock if the storage backend is dead
  2015-08-20  8:14 ` [Qemu-devel] [PATCH 2/2] ide/atapi: partially avoid deadlock if the storage backend is dead Peter Lieven
@ 2015-08-21  6:13   ` Eric Blake
  2015-09-03 16:59   ` Stefan Hajnoczi
  1 sibling, 0 replies; 10+ messages in thread
From: Eric Blake @ 2015-08-21  6:13 UTC (permalink / raw)
  To: Peter Lieven, qemu-devel, qemu-block; +Cc: kwolf, stefanha, jsnow, pbonzini


On 08/20/2015 01:14 AM, Peter Lieven wrote:
> the blk_drain_all() that is executed if the guest issues a DMA cancel
> leads to a stuck main loop if the storage backend (e.g. a NFS share)
> is unresponsive.
> 
> This scenario is a common case for CDROM images mounted from an
> NFS share. In this case a broken NFS server can take down the
> whole VM even if the mounted CDROM is not used and was just not
> unmounted after usage.
> 
> This approach avoids the blk_drain_all for read-only media and
> cancelles the AIO locally and makes the callback a NOP if the

s/cancelles/cancels/

> original request is completed after the NFS share is responsive
> again.
> 
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
>  hw/ide/pci.c | 32 ++++++++++++++++++--------------
>  1 file changed, 18 insertions(+), 14 deletions(-)

> +                if (!bdrv_is_read_only(bm->bus->dma->aiocb->bs)) {
> +                    /* We can't cancel Scatter Gather DMA in the middle of the
> +                     * operation or a partial (not full) DMA transfer would
> +                     * reach the storage so we wait for completion instead
> +                     * (we beahve like if the DMA was completed by the time the

s/beahve like/behave as/

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org




* Re: [Qemu-devel] [PATCH 1/2] block/io: allow AIOCB without callback
  2015-08-21  6:12   ` Eric Blake
@ 2015-08-31  8:38     ` Peter Lieven
  0 siblings, 0 replies; 10+ messages in thread
From: Peter Lieven @ 2015-08-31  8:38 UTC (permalink / raw)
  To: Eric Blake, qemu-devel, qemu-block
  Cc: kwolf, stefanha, Jeff Cody, Max Reitz, pbonzini, jsnow

Am 21.08.2015 um 08:12 schrieb Eric Blake:
> On 08/20/2015 01:14 AM, Peter Lieven wrote:
>> If the backend storage is unresponsive and we cancel a request due to
>> a timeout we cannot immediately destroy the AIOCB because the storage
>> might complete the original request laster if it is responsive again.
> s/laster/later/
>
>> For this purpose allow to set the callback to NULL and ignore it in
>> this case.
>>
>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> ---
> I'll leave the technical review to others, I'm just pointing out grammar.
>


I have been using this one for quite some time now. It seems a good step toward solving
the deadlock problem. The issue is that we still need to make the ATAPI calls asynchronous. The OS
spends only 2-3 minutes on DMA cancelling and then issues reads again, so we still deadlock in the end.

Peter


* Re: [Qemu-devel] [PATCH 2/2] ide/atapi: partially avoid deadlock if the storage backend is dead
  2015-08-20  8:14 ` [Qemu-devel] [PATCH 2/2] ide/atapi: partially avoid deadlock if the storage backend is dead Peter Lieven
  2015-08-21  6:13   ` Eric Blake
@ 2015-09-03 16:59   ` Stefan Hajnoczi
  2015-09-06  9:24     ` Peter Lieven
  1 sibling, 1 reply; 10+ messages in thread
From: Stefan Hajnoczi @ 2015-09-03 16:59 UTC (permalink / raw)
  To: Peter Lieven; +Cc: kwolf, pbonzini, jsnow, qemu-devel, qemu-block

On Thu, Aug 20, 2015 at 10:14:08AM +0200, Peter Lieven wrote:
> the blk_drain_all() that is executed if the guest issues a DMA cancel
> leads to a stuck main loop if the storage backend (e.g. a NFS share)
> is unresponsive.
> 
> This scenario is a common case for CDROM images mounted from an
> NFS share. In this case a broken NFS server can take down the
> whole VM even if the mounted CDROM is not used and was just not
> unmounted after usage.
> 
> This approach avoids the blk_drain_all for read-only media and
> cancelles the AIO locally and makes the callback a NOP if the
> original request is completed after the NFS share is responsive
> again.
> 
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
>  hw/ide/pci.c | 32 ++++++++++++++++++--------------
>  1 file changed, 18 insertions(+), 14 deletions(-)
> 
> diff --git a/hw/ide/pci.c b/hw/ide/pci.c
> index d31ff88..a8b4175 100644
> --- a/hw/ide/pci.c
> +++ b/hw/ide/pci.c
> @@ -240,21 +240,25 @@ void bmdma_cmd_writeb(BMDMAState *bm, uint32_t val)
>      /* Ignore writes to SSBM if it keeps the old value */
>      if ((val & BM_CMD_START) != (bm->cmd & BM_CMD_START)) {
>          if (!(val & BM_CMD_START)) {
> -            /*
> -             * We can't cancel Scatter Gather DMA in the middle of the
> -             * operation or a partial (not full) DMA transfer would reach
> -             * the storage so we wait for completion instead (we beahve
> -             * like if the DMA was completed by the time the guest trying
> -             * to cancel dma with bmdma_cmd_writeb with BM_CMD_START not
> -             * set).
> -             *
> -             * In the future we'll be able to safely cancel the I/O if the
> -             * whole DMA operation will be submitted to disk with a single
> -             * aio operation with preadv/pwritev.
> -             */
>              if (bm->bus->dma->aiocb) {
> -                blk_drain_all();
> -                assert(bm->bus->dma->aiocb == NULL);
> +                if (!bdrv_is_read_only(bm->bus->dma->aiocb->bs)) {
> +                    /* We can't cancel Scatter Gather DMA in the middle of the
> +                     * operation or a partial (not full) DMA transfer would
> +                     * reach the storage so we wait for completion instead
> +                     * (we beahve like if the DMA was completed by the time the
> +                     * guest trying to cancel dma with bmdma_cmd_writeb with
> +                     * BM_CMD_START not set). */
> +                    blk_drain_all();
> +                    assert(bm->bus->dma->aiocb == NULL);
> +                } else {
> +                    /* On a read-only device (e.g. CDROM) we can't cause incon-
> +                     * sistencies and thus cancel the AIOCB locally and avoid
> +                     * to be called back later if the original request is
> +                     * completed. */
> +                    BlockAIOCB *aiocb = bm->bus->dma->aiocb;
> +                    aiocb->cb(aiocb->opaque, -ECANCELED);
> +                    aiocb->cb = NULL;

I'm concerned that this isn't safe.

What happens if the request does complete (e.g. will guest RAM be
modified by the read operation)?

What happens if a new request is started and then old NOPed request
completes?

Taking a step back, what are the semantics of writing !(val &
BM_CMD_START)?  Is the device guaranteed to cancel/complete requests
during the register write?


* Re: [Qemu-devel] [PATCH 2/2] ide/atapi: partially avoid deadlock if the storage backend is dead
  2015-09-03 16:59   ` Stefan Hajnoczi
@ 2015-09-06  9:24     ` Peter Lieven
  2015-09-07 13:43       ` Stefan Hajnoczi
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Lieven @ 2015-09-06  9:24 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: kwolf, pbonzini, jsnow, qemu-devel, qemu-block

Am 03.09.2015 um 18:59 schrieb Stefan Hajnoczi:
> On Thu, Aug 20, 2015 at 10:14:08AM +0200, Peter Lieven wrote:
>> the blk_drain_all() that is executed if the guest issues a DMA cancel
>> leads to a stuck main loop if the storage backend (e.g. a NFS share)
>> is unresponsive.
>>
>> This scenario is a common case for CDROM images mounted from an
>> NFS share. In this case a broken NFS server can take down the
>> whole VM even if the mounted CDROM is not used and was just not
>> unmounted after usage.
>>
>> This approach avoids the blk_drain_all for read-only media and
>> cancelles the AIO locally and makes the callback a NOP if the
>> original request is completed after the NFS share is responsive
>> again.
>>
>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> ---
>>  hw/ide/pci.c | 32 ++++++++++++++++++--------------
>>  1 file changed, 18 insertions(+), 14 deletions(-)
>>
>> diff --git a/hw/ide/pci.c b/hw/ide/pci.c
>> index d31ff88..a8b4175 100644
>> --- a/hw/ide/pci.c
>> +++ b/hw/ide/pci.c
>> @@ -240,21 +240,25 @@ void bmdma_cmd_writeb(BMDMAState *bm, uint32_t val)
>>      /* Ignore writes to SSBM if it keeps the old value */
>>      if ((val & BM_CMD_START) != (bm->cmd & BM_CMD_START)) {
>>          if (!(val & BM_CMD_START)) {
>> -            /*
>> -             * We can't cancel Scatter Gather DMA in the middle of the
>> -             * operation or a partial (not full) DMA transfer would reach
>> -             * the storage so we wait for completion instead (we beahve
>> -             * like if the DMA was completed by the time the guest trying
>> -             * to cancel dma with bmdma_cmd_writeb with BM_CMD_START not
>> -             * set).
>> -             *
>> -             * In the future we'll be able to safely cancel the I/O if the
>> -             * whole DMA operation will be submitted to disk with a single
>> -             * aio operation with preadv/pwritev.
>> -             */
>>              if (bm->bus->dma->aiocb) {
>> -                blk_drain_all();
>> -                assert(bm->bus->dma->aiocb == NULL);
>> +                if (!bdrv_is_read_only(bm->bus->dma->aiocb->bs)) {
>> +                    /* We can't cancel Scatter Gather DMA in the middle of the
>> +                     * operation or a partial (not full) DMA transfer would
>> +                     * reach the storage so we wait for completion instead
>> +                     * (we beahve like if the DMA was completed by the time the
>> +                     * guest trying to cancel dma with bmdma_cmd_writeb with
>> +                     * BM_CMD_START not set). */
>> +                    blk_drain_all();
>> +                    assert(bm->bus->dma->aiocb == NULL);
>> +                } else {
>> +                    /* On a read-only device (e.g. CDROM) we can't cause incon-
>> +                     * sistencies and thus cancel the AIOCB locally and avoid
>> +                     * to be called back later if the original request is
>> +                     * completed. */
>> +                    BlockAIOCB *aiocb = bm->bus->dma->aiocb;
>> +                    aiocb->cb(aiocb->opaque, -ECANCELED);
>> +                    aiocb->cb = NULL;
> I'm concerned that this isn't safe.
>
> What happens if the request does complete (e.g. will guest RAM be
> modified by the read operation)?

I am afraid you are right. The callback of the storage driver will
likely overwrite the memory. This could be solved for storage drivers
that don't do zero copy, if it were possible to notify them about
the cancellation.
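To sketch the non-zero-copy idea mentioned above (invented names, not
QEMU's actual API): the backend writes into a private bounce buffer, and
the data is committed to guest memory only from the completion path,
which a local cancel has already disabled, so a cancelled request never
touches guest RAM:

```c
#include <stdlib.h>
#include <string.h>

typedef struct BounceRequest BounceRequest;
typedef void (*BounceCb)(BounceRequest *req, int ret);

struct BounceRequest {
    unsigned char *guest_mem;   /* destination in guest RAM */
    unsigned char *bounce;      /* private buffer the backend writes */
    size_t len;
    BounceCb cb;                /* NULL after a local cancel */
};

/* Backend finished writing req->bounce; commit to guest RAM only if the
 * request was not cancelled in the meantime. */
static void bounce_complete(BounceRequest *req, int ret)
{
    if (req->cb) {
        if (ret == 0) {
            memcpy(req->guest_mem, req->bounce, req->len);
        }
        req->cb(req, ret);
    }
    free(req->bounce);
    req->bounce = NULL;
}
```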

>
> What happens if a new request is started and then old NOPed request
> completes?

That should work because the callback of the NOPed request is never
fired.

>
> Taking a step back, what are the semantics of writing !(val &
> BM_CMD_START)?  Is the device guaranteed to cancel/complete requests
> during the register write?

I have to check that. John, do you have an idea?

Stefan, or do you have a better approach for getting rid of the bdrv_drain_all
in this piece of code?

Peter


* Re: [Qemu-devel] [PATCH 2/2] ide/atapi: partially avoid deadlock if the storage backend is dead
  2015-09-06  9:24     ` Peter Lieven
@ 2015-09-07 13:43       ` Stefan Hajnoczi
  2015-09-07 14:05         ` Peter Lieven
  0 siblings, 1 reply; 10+ messages in thread
From: Stefan Hajnoczi @ 2015-09-07 13:43 UTC (permalink / raw)
  To: Peter Lieven; +Cc: kwolf, pbonzini, jsnow, qemu-devel, qemu-block

On Sun, Sep 06, 2015 at 11:24:10AM +0200, Peter Lieven wrote:
> > Taking a step back, what are the semantics of writing !(val &
> > BM_CMD_START)?  Is the device guaranteed to cancel/complete requests
> > during the register write?
> 
> I have to check that. John, do you have an idea?
> 
> Stefan, or do you have a better approach for getting rid of the bdrv_drain_all
> in this piece of code?

What's needed is an asynchronous approach that honors the semantics of
the Start/Stop bit.

From "Programming Interface for Bus Master IDE Controller", Revision 1.0:

"Start/Stop Bus Master: Writing a '1' to this bit enables bus master
operation of the controller.  Bus master operation begins when this bit
is detected changing from a zero to a one. The controller will transfer
data between the IDE device and memory only when this bit is set.
Master operation can be halted by writing a '0' to this bit. All state
information is lost when a '0' is written; Master mode operation cannot
be stopped and then resumed. If this bit is reset while bus master
operation is still active (i.e., the Bus Master IDE Active bit of the
Bus Master IDE Status register for that IDE channel is set) and the
drive has not yet finished its data transfer (The Interupt bit in the
Bus Master IDE Status register for that IDE channel is not set), the bus
master command is said to be aborted and data transfered from the drive
may be discarded before being written to system memory. This bit is
intended to be reset after the data transfer is completed, as indicated
by either the Bus Master IDE Active bit or the Interrupt bit of the Bus
Master IDE Status register for that IDE channel being set, or both."

bdrv_aio_cancel() can be used to nudge the request to cancel.  The
completion function is still called at some later point - and it could
take an arbitrary amount of time before the request finishes.

From the IDE emulation perspective, there are two requirements:

1. After the Start bit has been cleared, no IDE state changes should
   occur due to the pending request.  No interrupts should be raised and
   no registers should change when the request completes
   (success/error/cancelled).  Also, the guest should be able to set the
   bit again and submit a new request.

2. No guest memory should be touched by a running request after the bit
   is cleared.  QEMU needs to go through bounce buffers in all cases
   instead of doing zero copy (mapping the PRDT).

   Using bounce buffers should be doable but you need to check the IDE
   code paths (PIO, DMA, and ATAPI) reachable from hw/ide/pci.c.
   There's also a denial-of-service scenario where the guest repeatedly
   submits an I/O request and then clears the Start/Stop bit.  That
   would cause QEMU to build up many pending I/O requests and bounce
   buffers.  That needs to be handled, for example by punishing the
   guest by failing new I/O if there are more than 16 cancelled
   requests pending.
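The proposed limit could look roughly like this (a hedged sketch with
invented names; the actual bookkeeping in QEMU would differ):

```c
/* Maximum number of locally cancelled requests that may still be
 * pending in the backend before new I/O is refused. */
#define MAX_CANCELLED_PENDING 16

typedef struct {
    int cancelled_pending;      /* cancelled but not yet completed */
} DmaState;

/* Returns 0 if a new request may be submitted, -1 if the guest has too
 * many cancelled requests still in flight. */
static int dma_may_submit(const DmaState *s)
{
    return s->cancelled_pending > MAX_CANCELLED_PENDING ? -1 : 0;
}

static void dma_request_cancelled(DmaState *s)
{
    s->cancelled_pending++;
}

/* Called when a previously cancelled request finally completes. */
static void dma_cancelled_request_done(DmaState *s)
{
    s->cancelled_pending--;
}
```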

I'm interested in other ideas, but I think the solution space for
synchronous approaches that implement the semantics correctly is small.
I can think of a few other approaches that are easier to implement,
perform better, etc but they aren't correct.

Stefan


* Re: [Qemu-devel] [PATCH 2/2] ide/atapi: partially avoid deadlock if the storage backend is dead
  2015-09-07 13:43       ` Stefan Hajnoczi
@ 2015-09-07 14:05         ` Peter Lieven
  0 siblings, 0 replies; 10+ messages in thread
From: Peter Lieven @ 2015-09-07 14:05 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: kwolf, pbonzini, jsnow, qemu-devel, qemu-block

Am 07.09.2015 um 15:43 schrieb Stefan Hajnoczi:
> On Sun, Sep 06, 2015 at 11:24:10AM +0200, Peter Lieven wrote:
>>> Taking a step back, what are the semantics of writing !(val &
>>> BM_CMD_START)?  Is the device guaranteed to cancel/complete requests
>>> during the register write?
>> I have to check that. John, do you have an idea?
>>
>> Stefan, or do you have a better approach for getting rid of the bdrv_drain_all
>> in this piece of code?
> What's needed is an asynchronous approach that honors the semantics of
> the Start/Stop bit.
>
> From "Programming Interface for Bus Master IDE Controller", Revision 1.0:
>
> "Start/Stop Bus Master: Writing a '1' to this bit enables bus master
> operation of the controller.  Bus master operation begins when this bit
> is detected changing from a zero to a one. The controller will transfer
> data between the IDE device and memory only when this bit is set.
> Master operation can be halted by writing a '0' to this bit. All state
> information is lost when a '0' is written; Master mode operation cannot
> be stopped and then resumed. If this bit is reset while bus master
> operation is still active (i.e., the Bus Master IDE Active bit of the
> Bus Master IDE Status register for that IDE channel is set) and the
> drive has not yet finished its data transfer (The Interupt bit in the
> Bus Master IDE Status register for that IDE channel is not set), the bus
> master command is said to be aborted and data transfered from the drive
> may be discarded before being written to system memory. This bit is
> intended to be reset after the data transfer is completed, as indicated
> by either the Bus Master IDE Active bit or the Interrupt bit of the Bus
> Master IDE Status register for that IDE channel being set, or both."
>
> bdrv_aio_cancel() can be used to nudge the request to cancel.  The
> completion function is still called at some later point - and it could
> take an arbitrary amount of time before the request finishes.
>
> From the IDE emulation perspective, there are two requirements:
>
> 1. After the Start bit has been cleared, no IDE state changes should
>    occur due to the pending request.  No interrupts should be raised and
>    no registers should change when the request completes
>    (success/error/cancelled).  Also, the guest should be able to set the
>    bit again and submit a new request.

I think my approach already fulfills this requirement. The callback
into the IDE layer is set to NULL.

>
> 2. No guest memory should be touched by a running request after the bit
>    is cleared.  QEMU needs to go through bounce buffers in all cases
>    instead of doing zero copy (mapping the PRDT).
>
>    Using bounce buffers should be doable but you need to check the IDE
>    code paths (PIO, DMA, and ATAPI) reachable from hw/ide/pci.c.
>    There's also a denial-of-service scenario where the guest repeatedly
>    submits an I/O request and then clears the Start/Stop bit.  That
>    would cause QEMU to build up many pending I/O requests and bounce
>    buffers.  That needs to be handled, for example by punishing the
>    guest by failing new I/O if there are more than 16 cancelled
>    requests pending.

This sounds doable. I will look into this.

Peter

