qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH] Fix Guest VM crash due to iSCSI Sense Key error
@ 2019-07-08  2:55 shaju.abraham
  2019-07-11 12:24 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: shaju.abraham @ 2019-07-08  2:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: Shaju Abraham, jsnow, qemu-block

From: Shaju Abraham <shaju.abraham@nutanix.com>

During the  IDE DMA transfer for a ISCSI target,when libiscsi encounters
a SENSE KEY error, it sets the task->sense to  the value "COMMAND ABORTED".
The function iscsi_translate_sense() later translaters this error to -ECANCELED
and this value is passed to the callback function. In the case of  IDE DMA read
or write, the callback function returns immediately if the value of the ret
argument is -ECANCELED.
Later when ide_cancel_dma_sync() function is invoked  the assertion
"s->bus->dma->aiocb == ((void *)0)" fails and the qemu process gets terminated.
Fix the issue by making the value of s->bus->dma->aiocb = NULL when
-ECANCELED is passed to the callback.

Signed-off-by: Shaju Abraham <shaju.abraham@nutanix.com>
---
 hw/ide/core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 6afadf8..78ea357 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -841,6 +841,7 @@ static void ide_dma_cb(void *opaque, int ret)
     bool stay_active = false;
 
     if (ret == -ECANCELED) {
+        s->bus->dma->aiocb = NULL;
         return;
     }
 
-- 
1.9.4



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH] Fix Guest VM crash due to iSCSI Sense Key error
  2019-07-08  2:55 [Qemu-devel] [PATCH] Fix Guest VM crash due to iSCSI Sense Key error shaju.abraham
@ 2019-07-11 12:24 ` Stefan Hajnoczi
  2019-07-12 16:44   ` John Snow
  2019-07-26  0:58 ` [Qemu-devel] " John Snow
  2019-08-13 22:51 ` John Snow
  2 siblings, 1 reply; 12+ messages in thread
From: Stefan Hajnoczi @ 2019-07-11 12:24 UTC (permalink / raw)
  To: shaju.abraham; +Cc: jsnow, qemu-devel, qemu-block

[-- Attachment #1: Type: text/plain, Size: 1519 bytes --]

On Sun, Jul 07, 2019 at 07:55:03PM -0700, shaju.abraham@nutanix.com wrote:

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

CCing John Snow, IDE maintainer.

You can use scripts/get_maintainer.pl -f hw/ide/core.c to find out who
to send patches to.

Stefan

> From: Shaju Abraham <shaju.abraham@nutanix.com>
> 
> During the  IDE DMA transfer for a ISCSI target,when libiscsi encounters
> a SENSE KEY error, it sets the task->sense to  the value "COMMAND ABORTED".
> The function iscsi_translate_sense() later translaters this error to -ECANCELED
> and this value is passed to the callback function. In the case of  IDE DMA read
> or write, the callback function returns immediately if the value of the ret
> argument is -ECANCELED.
> Later when ide_cancel_dma_sync() function is invoked  the assertion
> "s->bus->dma->aiocb == ((void *)0)" fails and the qemu process gets terminated.
> Fix the issue by making the value of s->bus->dma->aiocb = NULL when
> -ECANCELED is passed to the callback.
> 
> Signed-off-by: Shaju Abraham <shaju.abraham@nutanix.com>
> ---
>  hw/ide/core.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/ide/core.c b/hw/ide/core.c
> index 6afadf8..78ea357 100644
> --- a/hw/ide/core.c
> +++ b/hw/ide/core.c
> @@ -841,6 +841,7 @@ static void ide_dma_cb(void *opaque, int ret)
>      bool stay_active = false;
>  
>      if (ret == -ECANCELED) {
> +        s->bus->dma->aiocb = NULL;
>          return;
>      }
>  
> -- 
> 1.9.4
> 
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH] Fix Guest VM crash due to iSCSI Sense Key error
  2019-07-11 12:24 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
@ 2019-07-12 16:44   ` John Snow
  0 siblings, 0 replies; 12+ messages in thread
From: John Snow @ 2019-07-12 16:44 UTC (permalink / raw)
  To: Stefan Hajnoczi, shaju.abraham; +Cc: qemu-devel, qemu-block



On 7/11/19 8:24 AM, Stefan Hajnoczi wrote:
> On Sun, Jul 07, 2019 at 07:55:03PM -0700, shaju.abraham@nutanix.com wrote:
> 
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> 
> CCing John Snow, IDE maintainer.
> 
> You can use scripts/get_maintainer.pl -f hw/ide/core.c to find out who
> to send patches to.
> 
> Stefan
> 

ACK, I'll investigate this for the next rc.

--js

>> From: Shaju Abraham <shaju.abraham@nutanix.com>
>>
>> During the  IDE DMA transfer for a ISCSI target,when libiscsi encounters
>> a SENSE KEY error, it sets the task->sense to  the value "COMMAND ABORTED".
>> The function iscsi_translate_sense() later translaters this error to -ECANCELED
>> and this value is passed to the callback function. In the case of  IDE DMA read
>> or write, the callback function returns immediately if the value of the ret
>> argument is -ECANCELED.
>> Later when ide_cancel_dma_sync() function is invoked  the assertion
>> "s->bus->dma->aiocb == ((void *)0)" fails and the qemu process gets terminated.
>> Fix the issue by making the value of s->bus->dma->aiocb = NULL when
>> -ECANCELED is passed to the callback.
>>
>> Signed-off-by: Shaju Abraham <shaju.abraham@nutanix.com>
>> ---
>>  hw/ide/core.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/hw/ide/core.c b/hw/ide/core.c
>> index 6afadf8..78ea357 100644
>> --- a/hw/ide/core.c
>> +++ b/hw/ide/core.c
>> @@ -841,6 +841,7 @@ static void ide_dma_cb(void *opaque, int ret)
>>      bool stay_active = false;
>>  
>>      if (ret == -ECANCELED) {
>> +        s->bus->dma->aiocb = NULL;
>>          return;
>>      }
>>  
>> -- 
>> 1.9.4
>>
>>

-- 
—js


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH] Fix Guest VM crash due to iSCSI Sense Key error
  2019-07-08  2:55 [Qemu-devel] [PATCH] Fix Guest VM crash due to iSCSI Sense Key error shaju.abraham
  2019-07-11 12:24 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
@ 2019-07-26  0:58 ` John Snow
  2019-07-26 20:18   ` John Snow
  2019-08-13 22:51 ` John Snow
  2 siblings, 1 reply; 12+ messages in thread
From: John Snow @ 2019-07-26  0:58 UTC (permalink / raw)
  To: shaju.abraham, qemu-devel; +Cc: qemu-block



On 7/7/19 10:55 PM, shaju.abraham@nutanix.com wrote:
> From: Shaju Abraham <shaju.abraham@nutanix.com>
> 
> During the  IDE DMA transfer for a ISCSI target,when libiscsi encounters
> a SENSE KEY error, it sets the task->sense to  the value "COMMAND ABORTED".
> The function iscsi_translate_sense() later translaters this error to -ECANCELED
> and this value is passed to the callback function. In the case of  IDE DMA read
> or write, the callback function returns immediately if the value of the ret
> argument is -ECANCELED.
> Later when ide_cancel_dma_sync() function is invoked  the assertion
> "s->bus->dma->aiocb == ((void *)0)" fails and the qemu process gets terminated.
> Fix the issue by making the value of s->bus->dma->aiocb = NULL when
> -ECANCELED is passed to the callback.
> 
> Signed-off-by: Shaju Abraham <shaju.abraham@nutanix.com>
> ---
>  hw/ide/core.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/ide/core.c b/hw/ide/core.c
> index 6afadf8..78ea357 100644
> --- a/hw/ide/core.c
> +++ b/hw/ide/core.c
> @@ -841,6 +841,7 @@ static void ide_dma_cb(void *opaque, int ret)
>      bool stay_active = false;
>  
>      if (ret == -ECANCELED) {
> +        s->bus->dma->aiocb = NULL;
>          return;
>      }
>  
> 

The part that makes me nervous here is that I can't remember why we do
NO cleanup whatsoever for the ECANCELED case.

commit 0d910cfeaf2076b116b4517166d5deb0fea76394
Author: Fam Zheng <famz@redhat.com>
Date:   Thu Sep 11 13:41:07 2014 +0800

    ide/ahci: Check for -ECANCELED in aio callbacks


... This looks like we never expected the aio callbacks to ever get
called with ECANCELED, so we treat this as a QEMU-internal signal.

It looks like we expect these callbacks to do NOTHING in this case; but
I'm not sure where the IDE state machine does its cleanup otherwise.
(The DMA might have been canceled, but the DMA and IDE state machines
still need to exit their loop.)

If you take a look at this patch from 2014 though, there are many other
spots where we have littered ECANCELED checks that might also cause
problems if we're receiving error codes we thought we couldn't get normally.

I am worried this patch papers over something worse.

--js


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH] Fix Guest VM crash due to iSCSI Sense Key error
  2019-07-26  0:58 ` [Qemu-devel] " John Snow
@ 2019-07-26 20:18   ` John Snow
  2019-07-29 10:09     ` Stefan Hajnoczi
  0 siblings, 1 reply; 12+ messages in thread
From: John Snow @ 2019-07-26 20:18 UTC (permalink / raw)
  To: shaju.abraham, qemu-devel
  Cc: Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi, qemu-block

Paolo, Stefan and Kevin: can I loop you in here? I'm quite uncertain
about this and I'd like to clear this up quickly if it's possible:

On 7/25/19 8:58 PM, John Snow wrote:
> 
> 
> On 7/7/19 10:55 PM, shaju.abraham@nutanix.com wrote:
>> From: Shaju Abraham <shaju.abraham@nutanix.com>
>>
>> During the  IDE DMA transfer for a ISCSI target,when libiscsi encounters
>> a SENSE KEY error, it sets the task->sense to  the value "COMMAND ABORTED".
>> The function iscsi_translate_sense() later translaters this error to -ECANCELED
>> and this value is passed to the callback function. In the case of  IDE DMA read
>> or write, the callback function returns immediately if the value of the ret
>> argument is -ECANCELED.
>> Later when ide_cancel_dma_sync() function is invoked  the assertion
>> "s->bus->dma->aiocb == ((void *)0)" fails and the qemu process gets terminated.
>> Fix the issue by making the value of s->bus->dma->aiocb = NULL when
>> -ECANCELED is passed to the callback.
>>
>> Signed-off-by: Shaju Abraham <shaju.abraham@nutanix.com>
>> ---
>>  hw/ide/core.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/hw/ide/core.c b/hw/ide/core.c
>> index 6afadf8..78ea357 100644
>> --- a/hw/ide/core.c
>> +++ b/hw/ide/core.c
>> @@ -841,6 +841,7 @@ static void ide_dma_cb(void *opaque, int ret)
>>      bool stay_active = false;
>>  
>>      if (ret == -ECANCELED) {
>> +        s->bus->dma->aiocb = NULL;
>>          return;
>>      }
>>  
>>
> 
> The part that makes me nervous here is that I can't remember why we do
> NO cleanup whatsoever for the ECANCELED case.
> 
> commit 0d910cfeaf2076b116b4517166d5deb0fea76394
> Author: Fam Zheng <famz@redhat.com>
> Date:   Thu Sep 11 13:41:07 2014 +0800
> 
>     ide/ahci: Check for -ECANCELED in aio callbacks
> 
> 
> ... This looks like we never expected the aio callbacks to ever get
> called with ECANCELED, so we treat this as a QEMU-internal signal.
> 
> It looks like we expect these callbacks to do NOTHING in this case; but
> I'm not sure where the IDE state machine does its cleanup otherwise.
> (The DMA might have been canceled, but the DMA and IDE state machines
> still need to exit their loop.)
> 
> If you take a look at this patch from 2014 though, there are many other
> spots where we have littered ECANCELED checks that might also cause
> problems if we're receiving error codes we thought we couldn't get normally.
> 
> I am worried this patch papers over something worse.
> 
I'm not clear why Fam's patch adds a do-nothing return to the ide_dma_cb
if it's invoked with ECANCELED: shouldn't it be the case that the IDE
state machine needs to know that a transfer it was relying on to service
an ATA command was canceled and treat it like an error?

Why was it ever correct to ignore these? Is it because we only ever
canceled DMA during reset/shutdown/etc?

It appears as if iscsi requests can actually genuinely return an
ECANCELED errno, so there are likely several places in the IDE code that
need to accommodate this from happening.

The easiest fix LOOKS like just deleting the special-casing of ECANCELED
altogether and letting the error pathways handle things as normal.

Am I mistaken?

--js


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH] Fix Guest VM crash due to iSCSI Sense Key error
  2019-07-26 20:18   ` John Snow
@ 2019-07-29 10:09     ` Stefan Hajnoczi
  2019-07-29 19:45       ` John Snow
  0 siblings, 1 reply; 12+ messages in thread
From: Stefan Hajnoczi @ 2019-07-29 10:09 UTC (permalink / raw)
  To: John Snow
  Cc: Kevin Wolf, Paolo Bonzini, shaju.abraham, qemu-devel, qemu-block

[-- Attachment #1: Type: text/plain, Size: 3909 bytes --]

On Fri, Jul 26, 2019 at 04:18:46PM -0400, John Snow wrote:
> Paolo, Stefan and Kevin: can I loop you in here? I'm quite uncertain
> about this and I'd like to clear this up quickly if it's possible:
> 
> On 7/25/19 8:58 PM, John Snow wrote:
> > 
> > 
> > On 7/7/19 10:55 PM, shaju.abraham@nutanix.com wrote:
> >> From: Shaju Abraham <shaju.abraham@nutanix.com>
> >>
> >> During the  IDE DMA transfer for a ISCSI target,when libiscsi encounters
> >> a SENSE KEY error, it sets the task->sense to  the value "COMMAND ABORTED".
> >> The function iscsi_translate_sense() later translaters this error to -ECANCELED
> >> and this value is passed to the callback function. In the case of  IDE DMA read
> >> or write, the callback function returns immediately if the value of the ret
> >> argument is -ECANCELED.
> >> Later when ide_cancel_dma_sync() function is invoked  the assertion
> >> "s->bus->dma->aiocb == ((void *)0)" fails and the qemu process gets terminated.
> >> Fix the issue by making the value of s->bus->dma->aiocb = NULL when
> >> -ECANCELED is passed to the callback.
> >>
> >> Signed-off-by: Shaju Abraham <shaju.abraham@nutanix.com>
> >> ---
> >>  hw/ide/core.c | 1 +
> >>  1 file changed, 1 insertion(+)
> >>
> >> diff --git a/hw/ide/core.c b/hw/ide/core.c
> >> index 6afadf8..78ea357 100644
> >> --- a/hw/ide/core.c
> >> +++ b/hw/ide/core.c
> >> @@ -841,6 +841,7 @@ static void ide_dma_cb(void *opaque, int ret)
> >>      bool stay_active = false;
> >>  
> >>      if (ret == -ECANCELED) {
> >> +        s->bus->dma->aiocb = NULL;
> >>          return;
> >>      }
> >>  
> >>
> > 
> > The part that makes me nervous here is that I can't remember why we do
> > NO cleanup whatsoever for the ECANCELED case.
> > 
> > commit 0d910cfeaf2076b116b4517166d5deb0fea76394
> > Author: Fam Zheng <famz@redhat.com>
> > Date:   Thu Sep 11 13:41:07 2014 +0800
> > 
> >     ide/ahci: Check for -ECANCELED in aio callbacks
> > 
> > 
> > ... This looks like we never expected the aio callbacks to ever get
> > called with ECANCELED, so we treat this as a QEMU-internal signal.
> > 
> > It looks like we expect these callbacks to do NOTHING in this case; but
> > I'm not sure where the IDE state machine does its cleanup otherwise.
> > (The DMA might have been canceled, but the DMA and IDE state machines
> > still need to exit their loop.)
> > 
> > If you take a look at this patch from 2014 though, there are many other
> > spots where we have littered ECANCELED checks that might also cause
> > problems if we're receiving error codes we thought we couldn't get normally.
> > 
> > I am worried this patch papers over something worse.
> > 
> I'm not clear why Fam's patch adds a do-nothing return to the ide_dma_cb
> if it's invoked with ECANCELED: shouldn't it be the case that the IDE
> state machine needs to know that a transfer it was relying on to service
> an ATA command was canceled and treat it like an error?
> 
> Why was it ever correct to ignore these? Is it because we only ever
> canceled DMA during reset/shutdown/etc?
> 
> It appears as if iscsi requests can actually genuinely return an
> ECANCELED errno, so there are likely several places in the IDE code that
> need to accommodate this from happening.
> 
> The easiest fix LOOKS like just deleting the special-casing of ECANCELED
> altogether and letting the error pathways handle things as normal.
> 
> Am I mistaken?

I think your instincts are right that there are deeper issues.  The
first step would be test cases, then you can be sure various scenarios
have been handled correctly.

I noticed that ide_sector_read_cb(), ide_sector_write_cb(), and
ide_flush_cb() all differ in whether they reset s->pio_aiocb and
s->status before returning early due to -ECANCELED.  That must be a bug.

I didn't look at the ide_dma_cb() code path.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH] Fix Guest VM crash due to iSCSI Sense Key error
  2019-07-29 10:09     ` Stefan Hajnoczi
@ 2019-07-29 19:45       ` John Snow
  2019-07-29 21:32         ` Paolo Bonzini
  0 siblings, 1 reply; 12+ messages in thread
From: John Snow @ 2019-07-29 19:45 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Paolo Bonzini, shaju.abraham, qemu-devel, qemu-block



On 7/29/19 6:09 AM, Stefan Hajnoczi wrote:
> On Fri, Jul 26, 2019 at 04:18:46PM -0400, John Snow wrote:
>> Paolo, Stefan and Kevin: can I loop you in here? I'm quite uncertain
>> about this and I'd like to clear this up quickly if it's possible:
>>
>> On 7/25/19 8:58 PM, John Snow wrote:
>>>
>>>
>>> On 7/7/19 10:55 PM, shaju.abraham@nutanix.com wrote:
>>>> From: Shaju Abraham <shaju.abraham@nutanix.com>
>>>>
>>>> During the  IDE DMA transfer for a ISCSI target,when libiscsi encounters
>>>> a SENSE KEY error, it sets the task->sense to  the value "COMMAND ABORTED".
>>>> The function iscsi_translate_sense() later translaters this error to -ECANCELED
>>>> and this value is passed to the callback function. In the case of  IDE DMA read
>>>> or write, the callback function returns immediately if the value of the ret
>>>> argument is -ECANCELED.
>>>> Later when ide_cancel_dma_sync() function is invoked  the assertion
>>>> "s->bus->dma->aiocb == ((void *)0)" fails and the qemu process gets terminated.
>>>> Fix the issue by making the value of s->bus->dma->aiocb = NULL when
>>>> -ECANCELED is passed to the callback.
>>>>
>>>> Signed-off-by: Shaju Abraham <shaju.abraham@nutanix.com>
>>>> ---
>>>>  hw/ide/core.c | 1 +
>>>>  1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/hw/ide/core.c b/hw/ide/core.c
>>>> index 6afadf8..78ea357 100644
>>>> --- a/hw/ide/core.c
>>>> +++ b/hw/ide/core.c
>>>> @@ -841,6 +841,7 @@ static void ide_dma_cb(void *opaque, int ret)
>>>>      bool stay_active = false;
>>>>  
>>>>      if (ret == -ECANCELED) {
>>>> +        s->bus->dma->aiocb = NULL;
>>>>          return;
>>>>      }
>>>>  
>>>>
>>>
>>> The part that makes me nervous here is that I can't remember why we do
>>> NO cleanup whatsoever for the ECANCELED case.
>>>
>>> commit 0d910cfeaf2076b116b4517166d5deb0fea76394
>>> Author: Fam Zheng <famz@redhat.com>
>>> Date:   Thu Sep 11 13:41:07 2014 +0800
>>>
>>>     ide/ahci: Check for -ECANCELED in aio callbacks
>>>
>>>
>>> ... This looks like we never expected the aio callbacks to ever get
>>> called with ECANCELED, so we treat this as a QEMU-internal signal.
>>>
>>> It looks like we expect these callbacks to do NOTHING in this case; but
>>> I'm not sure where the IDE state machine does its cleanup otherwise.
>>> (The DMA might have been canceled, but the DMA and IDE state machines
>>> still need to exit their loop.)
>>>
>>> If you take a look at this patch from 2014 though, there are many other
>>> spots where we have littered ECANCELED checks that might also cause
>>> problems if we're receiving error codes we thought we couldn't get normally.
>>>
>>> I am worried this patch papers over something worse.
>>>
>> I'm not clear why Fam's patch adds a do-nothing return to the ide_dma_cb
>> if it's invoked with ECANCELED: shouldn't it be the case that the IDE
>> state machine needs to know that a transfer it was relying on to service
>> an ATA command was canceled and treat it like an error?
>>
>> Why was it ever correct to ignore these? Is it because we only ever
>> canceled DMA during reset/shutdown/etc?
>>
>> It appears as if iscsi requests can actually genuinely return an
>> ECANCELED errno, so there are likely several places in the IDE code that
>> need to accommodate this from happening.
>>
>> The easiest fix LOOKS like just deleting the special-casing of ECANCELED
>> altogether and letting the error pathways handle things as normal.
>>
>> Am I mistaken?
> 
> I think your instincts are right that there are deeper issues.  The
> first step would be test cases, then you can be sure various scenarios
> have been handled correctly.
> 

Suggestions? I'm not sure what's supposed to work and in what way here.
I guess this stuff was introduced for bdrv_aio_cancel_async, but it's
not immediately clear what's supposed to happen when you call that.

> I noticed that ide_sector_read_cb(), ide_sector_write_cb(), and
> ide_flush_cb() all differ in whether they reset s->pio_aiocb and
> s->status before returning early due to -ECANCELED.  That must be a bug.
> 
> I didn't look at the ide_dma_cb() code path.
> 
> Stefan
> 

Hm ...

It looks like canceling the ide_dma_cb AIOCB objects doesn't do anything
too useful?

dma_blk_io and friends will establish dma_aio_cancel as the async cancel
callback. So if we do cancel these objects, we're going to call this
function:

static void dma_aio_cancel(BlockAIOCB *acb)
{
    DMAAIOCB *dbs = container_of(acb, DMAAIOCB, common);

    trace_dma_aio_cancel(dbs);

    if (dbs->acb) {
        blk_aio_cancel_async(dbs->acb);
    }
    if (dbs->bh) {
        cpu_unregister_map_client(dbs->bh);
        qemu_bh_delete(dbs->bh);
        dbs->bh = NULL;
    }
}

but there's no cancel callback for the lower layer in dbs->acb, so
that's just a nop, I think -- blk_aio_prwv doesn't offer an asynchronous
cancel mechanism.

Next, we'll unschedule the BH if there is one. I think the only case
where there is one is the reschedule_dma case of dma_blk_cb. (I'm not
too familiar with these DMA helpers: in what cases do we expect the iov
to be empty?)

So it looks like this cancellation will produce one of two effects,
depending on when it's invoked:

1) We'll stall the DMA permanently by deleting that BH, because
dma_complete will never get invoked and therefore nobody will ever call
ide_dma_cb with any return value of any kind. The IDE state machine
likely just hangs waiting for the DMA to finish until the guest OS
decides to reset the errant controller.

2) The DMA will continue blissfully unaware it was canceled, because the
lower AIOCB has no cancel method, and so will finish, call back to
dma_blk_cb, and continue the transfer loop unaware.


... Does your reading align with mine?


If it does -- if there are indeed no places in the code today that
artificially inject -ECANCELED -- I need to remove these special stanzas
from the IDE code and allow the IDE state machine to handle these errors
as true errors.

I'm just not confident enough in my unwinding of the DMA callback
spaghetti, though.

--js


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH] Fix Guest VM crash due to iSCSI Sense Key error
  2019-07-29 19:45       ` John Snow
@ 2019-07-29 21:32         ` Paolo Bonzini
  2019-07-29 21:37           ` John Snow
  0 siblings, 1 reply; 12+ messages in thread
From: Paolo Bonzini @ 2019-07-29 21:32 UTC (permalink / raw)
  To: John Snow, Stefan Hajnoczi
  Cc: Kevin Wolf, shaju.abraham, qemu-devel, qemu-block

On 29/07/19 21:45, John Snow wrote:
> Next, we'll unschedule the BH if there is one. I think the only case
> where there is one is the reschedule_dma case of dma_blk_cb. (I'm not
> too familiar with these DMA helpers: in what cases do we expect the iov
> to be empty?)

When there is another I/O that is using the DMA bounce buffer (the one
case that comes to mind in which you do DMA from MMIO areas is
loading/saving VGA RAM).

> So it looks like this cancellation will produce one of two effects,
> depending on when it's invoked:
> 
> 1) We'll stall the DMA permanently by deleting that BH, because
> dma_complete will never get invoked and therefore nobody will ever call
> ide_dma_cb with any return value of any kind. The IDE state machine
> likely just hangs waiting for the DMA to finish until the guest OS
> decides to reset the errant controller.
> 
> 2) The DMA will continue blissfully unaware it was canceled, because the
> lower AIOCB has no cancel method, and so will finish, call back to
> dma_blk_cb, and continue the transfer loop unaware.
> 
> 
> ... Does your reading align with mine?
> 
> 
> If it does -- if there are indeed no places in the code today that
> artificially inject -ECANCELED -- I need to remove these special stanzas
> from the IDE code and allow the IDE state machine to handle these errors
> as true errors.

The bug is that there is no place to inject -ECANCELED in the dbs->bh
case.  I've sent an obviously^W untested patch.

Paolo

> I'm just not confident enough in my unwinding of the DMA callback
> spaghetti, though.
> 
> --js
> 



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH] Fix Guest VM crash due to iSCSI Sense Key error
  2019-07-29 21:32         ` Paolo Bonzini
@ 2019-07-29 21:37           ` John Snow
  2019-07-29 21:49             ` Paolo Bonzini
  0 siblings, 1 reply; 12+ messages in thread
From: John Snow @ 2019-07-29 21:37 UTC (permalink / raw)
  To: Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, shaju.abraham, qemu-devel, qemu-block



On 7/29/19 5:32 PM, Paolo Bonzini wrote:
> On 29/07/19 21:45, John Snow wrote:
>> Next, we'll unschedule the BH if there is one. I think the only case
>> where there is one is the reschedule_dma case of dma_blk_cb. (I'm not
>> too familiar with these DMA helpers: in what cases do we expect the iov
>> to be empty?)
> 
> When there is another I/O that is using the DMA bounce buffer (the one
> case that comes to mind in which you do DMA from MMIO areas is
> loading/saving VGA RAM).
> 
>> So it looks like this cancellation will produce one of two effects,
>> depending on when it's invoked:
>>
>> 1) We'll stall the DMA permanently by deleting that BH, because
>> dma_complete will never get invoked and therefore nobody will ever call
>> ide_dma_cb with any return value of any kind. The IDE state machine
>> likely just hangs waiting for the DMA to finish until the guest OS
>> decides to reset the errant controller.
>>
>> 2) The DMA will continue blissfully unaware it was canceled, because the
>> lower AIOCB has no cancel method, and so will finish, call back to
>> dma_blk_cb, and continue the transfer loop unaware.
>>
>>
>> ... Does your reading align with mine?
>>
>>
>> If it does -- if there are indeed no places in the code today that
>> artificially inject -ECANCELED -- I need to remove these special stanzas
>> from the IDE code and allow the IDE state machine to handle these errors
>> as true errors.
> 
> The bug is that there is no place to inject -ECANCELED in the dbs->bh
> case.  I've sent an obviously^W untested patch.
> 

Where does it inject -ECANCELED in the non-dbs->bh case?

--js


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH] Fix Guest VM crash due to iSCSI Sense Key error
  2019-07-29 21:37           ` John Snow
@ 2019-07-29 21:49             ` Paolo Bonzini
  0 siblings, 0 replies; 12+ messages in thread
From: Paolo Bonzini @ 2019-07-29 21:49 UTC (permalink / raw)
  To: John Snow, Stefan Hajnoczi
  Cc: Kevin Wolf, shaju.abraham, qemu-devel, qemu-block

On 29/07/19 23:37, John Snow wrote:
>>>
>>> If it does -- if there are indeed no places in the code today that
>>> artificially inject -ECANCELED -- I need to remove these special stanzas
>>> from the IDE code and allow the IDE state machine to handle these errors
>>> as true errors.
>> The bug is that there is no place to inject -ECANCELED in the dbs->bh
>> case.  I've sent an obviously^W untested patch.
>
> Where does it inject -ECANCELED in the non-dbs->bh case?

It's simply part of the contract of bdrv_aio_cancel_async that the
completion callback will be invoked (most of the time the I/O will just
go on without any cancellation, as you noted).

Paolo


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH] Fix Guest VM crash due to iSCSI Sense Key error
  2019-07-08  2:55 [Qemu-devel] [PATCH] Fix Guest VM crash due to iSCSI Sense Key error shaju.abraham
  2019-07-11 12:24 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  2019-07-26  0:58 ` [Qemu-devel] " John Snow
@ 2019-08-13 22:51 ` John Snow
  2019-08-14  2:30   ` Shaju Abraham
  2 siblings, 1 reply; 12+ messages in thread
From: John Snow @ 2019-08-13 22:51 UTC (permalink / raw)
  To: shaju.abraham, qemu-devel; +Cc: qemu-block



On 7/7/19 10:55 PM, shaju.abraham@nutanix.com wrote:
> From: Shaju Abraham <shaju.abraham@nutanix.com>
> 
> During the  IDE DMA transfer for a ISCSI target,when libiscsi encounters
> a SENSE KEY error, it sets the task->sense to  the value "COMMAND ABORTED".
> The function iscsi_translate_sense() later translaters this error to -ECANCELED
> and this value is passed to the callback function. In the case of  IDE DMA read
> or write, the callback function returns immediately if the value of the ret
> argument is -ECANCELED.
> Later when ide_cancel_dma_sync() function is invoked  the assertion
> "s->bus->dma->aiocb == ((void *)0)" fails and the qemu process gets terminated.
> Fix the issue by making the value of s->bus->dma->aiocb = NULL when
> -ECANCELED is passed to the callback.
> 
> Signed-off-by: Shaju Abraham <shaju.abraham@nutanix.com>
> ---
>  hw/ide/core.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/ide/core.c b/hw/ide/core.c
> index 6afadf8..78ea357 100644
> --- a/hw/ide/core.c
> +++ b/hw/ide/core.c
> @@ -841,6 +841,7 @@ static void ide_dma_cb(void *opaque, int ret)
>      bool stay_active = false;
>  
>      if (ret == -ECANCELED) {
> +        s->bus->dma->aiocb = NULL;
>          return;
>      }
>  
> 

Hopefully just as adequately addressed by the patches in

https://github.com/jnsnow/qemu/commits/ide

but if you wanted to give it a test and confirm for me, I wouldn't be
upset by that.

--js


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH] Fix Guest VM crash due to iSCSI Sense Key error
  2019-08-13 22:51 ` John Snow
@ 2019-08-14  2:30   ` Shaju Abraham
  0 siblings, 0 replies; 12+ messages in thread
From: Shaju Abraham @ 2019-08-14  2:30 UTC (permalink / raw)
  To: John Snow, qemu-devel; +Cc: qemu-block

I do not have a test case to reproduce this issue. It is seen rarely. The fix looks good to me, will confirm if I am able to reproduce the error scenario.

Regards
Shaju

On 8/14/19, 4:21 AM, "John Snow" <jsnow@redhat.com> wrote:

    
    
    On 7/7/19 10:55 PM, shaju.abraham@nutanix.com wrote:
    > From: Shaju Abraham <shaju.abraham@nutanix.com>
    > 
    > During the  IDE DMA transfer for a ISCSI target,when libiscsi encounters
    > a SENSE KEY error, it sets the task->sense to  the value "COMMAND ABORTED".
    > The function iscsi_translate_sense() later translaters this error to -ECANCELED
    > and this value is passed to the callback function. In the case of  IDE DMA read
    > or write, the callback function returns immediately if the value of the ret
    > argument is -ECANCELED.
    > Later when ide_cancel_dma_sync() function is invoked  the assertion
    > "s->bus->dma->aiocb == ((void *)0)" fails and the qemu process gets terminated.
    > Fix the issue by making the value of s->bus->dma->aiocb = NULL when
    > -ECANCELED is passed to the callback.
    > 
    > Signed-off-by: Shaju Abraham <shaju.abraham@nutanix.com>
    > ---
    >  hw/ide/core.c | 1 +
    >  1 file changed, 1 insertion(+)
    > 
    > diff --git a/hw/ide/core.c b/hw/ide/core.c
    > index 6afadf8..78ea357 100644
    > --- a/hw/ide/core.c
    > +++ b/hw/ide/core.c
    > @@ -841,6 +841,7 @@ static void ide_dma_cb(void *opaque, int ret)
    >      bool stay_active = false;
    >  
    >      if (ret == -ECANCELED) {
    > +        s->bus->dma->aiocb = NULL;
    >          return;
    >      }
    >  
    > 
    
    Hopefully just as adequately addressed by the patches in
    
    https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_jnsnow_qemu_commits_ide&d=DwICaQ&c=s883GpUCOChKOHiocYtGcg&r=sY-XeNqcuy_ruBQ9T7A2LmG6ktyYXXSxRB1ljkxMepI&m=lmNnHLnsZKEaZkunWBMldNPiL87un4Q2Brtsa0zCKiQ&s=KGmAtez5AckTpNugzMzxMObkZKQ3A5vIIiukShVYUXM&e= 
    
    but if you wanted to give it a test and confirm for me, I wouldn't be
    upset by that.
    
    --js
    


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2019-08-14  2:31 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-08  2:55 [Qemu-devel] [PATCH] Fix Guest VM crash due to iSCSI Sense Key error shaju.abraham
2019-07-11 12:24 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2019-07-12 16:44   ` John Snow
2019-07-26  0:58 ` [Qemu-devel] " John Snow
2019-07-26 20:18   ` John Snow
2019-07-29 10:09     ` Stefan Hajnoczi
2019-07-29 19:45       ` John Snow
2019-07-29 21:32         ` Paolo Bonzini
2019-07-29 21:37           ` John Snow
2019-07-29 21:49             ` Paolo Bonzini
2019-08-13 22:51 ` John Snow
2019-08-14  2:30   ` Shaju Abraham

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).