qemu-devel.nongnu.org archive mirror
From: Stefan Hajnoczi <stefanha@redhat.com>
To: John Snow <jsnow@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	shaju.abraham@nutanix.com, qemu-devel@nongnu.org,
	qemu-block@nongnu.org
Subject: Re: [Qemu-devel] [PATCH] Fix Guest VM crash due to iSCSI Sense Key error
Date: Mon, 29 Jul 2019 11:09:46 +0100
Message-ID: <20190729100946.GC3369@stefanha-x1.localdomain>
In-Reply-To: <34a8030e-a173-162d-6786-3dafa5a1d4ed@redhat.com>

On Fri, Jul 26, 2019 at 04:18:46PM -0400, John Snow wrote:
> Paolo, Stefan and Kevin: can I loop you in here? I'm quite uncertain
> about this and I'd like to clear this up quickly if it's possible:
> 
> On 7/25/19 8:58 PM, John Snow wrote:
> > 
> > 
> > On 7/7/19 10:55 PM, shaju.abraham@nutanix.com wrote:
> >> From: Shaju Abraham <shaju.abraham@nutanix.com>
> >>
> >> During an IDE DMA transfer to an iSCSI target, when libiscsi encounters
> >> a SENSE KEY error, it sets task->sense to the value "COMMAND ABORTED".
> >> The function iscsi_translate_sense() later translates this error to -ECANCELED,
> >> and this value is passed to the callback function. In the case of an IDE DMA read
> >> or write, the callback function returns immediately if the value of the ret
> >> argument is -ECANCELED.
> >> Later, when ide_cancel_dma_sync() is invoked, the assertion
> >> "s->bus->dma->aiocb == ((void *)0)" fails and the QEMU process is terminated.
> >> Fix the issue by setting s->bus->dma->aiocb to NULL when
> >> -ECANCELED is passed to the callback.
> >>
> >> Signed-off-by: Shaju Abraham <shaju.abraham@nutanix.com>
> >> ---
> >>  hw/ide/core.c | 1 +
> >>  1 file changed, 1 insertion(+)
> >>
> >> diff --git a/hw/ide/core.c b/hw/ide/core.c
> >> index 6afadf8..78ea357 100644
> >> --- a/hw/ide/core.c
> >> +++ b/hw/ide/core.c
> >> @@ -841,6 +841,7 @@ static void ide_dma_cb(void *opaque, int ret)
> >>      bool stay_active = false;
> >>  
> >>      if (ret == -ECANCELED) {
> >> +        s->bus->dma->aiocb = NULL;
> >>          return;
> >>      }
> >>  
> >>
> > 
> > The part that makes me nervous here is that I can't remember why we do
> > NO cleanup whatsoever for the ECANCELED case.
> > 
> > commit 0d910cfeaf2076b116b4517166d5deb0fea76394
> > Author: Fam Zheng <famz@redhat.com>
> > Date:   Thu Sep 11 13:41:07 2014 +0800
> > 
> >     ide/ahci: Check for -ECANCELED in aio callbacks
> > 
> > 
> > ... This looks like we never expected the aio callbacks to ever get
> > called with ECANCELED, so we treat this as a QEMU-internal signal.
> > 
> > It looks like we expect these callbacks to do NOTHING in this case; but
> > I'm not sure where the IDE state machine does its cleanup otherwise.
> > (The DMA might have been canceled, but the DMA and IDE state machines
> > still need to exit their loop.)
> > 
> > If you take a look at this patch from 2014 though, there are many other
> > spots where we have littered ECANCELED checks that might also cause
> > problems if we're receiving error codes we thought we couldn't get normally.
> > 
> > I am worried this patch papers over something worse.
> > 
> I'm not clear why Fam's patch adds a do-nothing return to the ide_dma_cb
> if it's invoked with ECANCELED: shouldn't it be the case that the IDE
> state machine needs to know that a transfer it was relying on to service
> an ATA command was canceled and treat it like an error?
> 
> Why was it ever correct to ignore these? Is it because we only ever
> canceled DMA during reset/shutdown/etc?
> 
> It appears that iSCSI requests can genuinely return an ECANCELED
> errno, so there are likely several places in the IDE code that
> need to accommodate this.
> 
> The easiest fix LOOKS like just deleting the special-casing of ECANCELED
> altogether and letting the error pathways handle things as normal.
> 
> Am I mistaken?

I think your instincts are right that there are deeper issues.  The
first step would be to write test cases; then you can be sure the various
scenarios are handled correctly.
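
As a rough illustration of the kind of scenario such a test would pin down,
here is a minimal sketch in plain C. It is a toy model, not QEMU code:
ToyDMA, toy_dma_cb() and the clear_on_cancel switch are hypothetical names
invented for this example. It only models the situation described in the
patch: a completion callback that returns early on -ECANCELED, followed by
a cancel path that expects aiocb to be NULL.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the DMA state; NOT QEMU's real IDEDMA type. */
typedef struct ToyDMA {
    void *aiocb;    /* non-NULL while a request is considered in flight */
} ToyDMA;

/*
 * Toy completion callback. clear_on_cancel selects between the two
 * behaviours under discussion: return early without any cleanup, or
 * clear aiocb before returning (as the proposed patch does).
 */
static void toy_dma_cb(ToyDMA *dma, int ret, bool clear_on_cancel)
{
    if (ret == -ECANCELED) {
        if (clear_on_cancel) {
            dma->aiocb = NULL;
        }
        return;
    }
    /* Any other completion finishes the request in this toy model. */
    dma->aiocb = NULL;
}

/* The condition a later cancel path would assert on. */
static bool toy_cancel_invariant_holds(const ToyDMA *dma)
{
    return dma->aiocb == NULL;
}

/* Scenario: the backend itself reports -ECANCELED for an in-flight request. */
static bool toy_scenario_backend_cancel(bool clear_on_cancel)
{
    int request;    /* dummy object, only used to get a non-NULL pointer */
    ToyDMA dma = { .aiocb = &request };

    toy_dma_cb(&dma, -ECANCELED, clear_on_cancel);
    return toy_cancel_invariant_holds(&dma);
}

/* Both outcomes, written down as checks. */
static void toy_run_checks(void)
{
    assert(!toy_scenario_backend_cancel(false)); /* stale aiocb remains */
    assert(toy_scenario_backend_cancel(true));   /* aiocb was cleared */
}
```

Calling toy_run_checks() exercises both variants. A real test would have to
drive the actual device model and a backend that can return -ECANCELED; this
sketch only shows the shape of the check.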

I noticed that ide_sector_read_cb(), ide_sector_write_cb(), and
ide_flush_cb() all differ in whether they reset s->pio_aiocb and
s->status before returning early due to -ECANCELED.  That must be a bug.
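
One way to keep such callbacks from drifting apart is to route the
-ECANCELED case through a single helper. The following is a sketch under
assumptions, not a proposed patch: ToyIDE, TOY_BUSY and the toy_* functions
are hypothetical, and whether the fields should be reset at all on
cancellation is exactly the question this thread leaves open. The sketch
only shows that one shared helper makes the three callbacks behave
identically by construction.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_BUSY 0x80   /* hypothetical "busy" status bit for this sketch */

/* Toy stand-in for the device state; NOT QEMU's real IDEState. */
typedef struct ToyIDE {
    void *pio_aiocb;    /* non-NULL while a request is in flight */
    uint8_t status;
    int completed;      /* number of normally completed requests */
} ToyIDE;

/* Single place that decides what cancellation means. Returns 1 if handled. */
static int toy_handle_cancel(ToyIDE *s, int ret)
{
    if (ret != -ECANCELED) {
        return 0;
    }
    /* Assumed policy for this sketch: drop the request, clear busy. */
    s->pio_aiocb = NULL;
    s->status &= (uint8_t)~TOY_BUSY;
    return 1;
}

/* Normal completion path shared by the toy callbacks. */
static void toy_finish(ToyIDE *s)
{
    s->pio_aiocb = NULL;
    s->status &= (uint8_t)~TOY_BUSY;
    s->completed++;
}

static void toy_read_cb(ToyIDE *s, int ret)
{
    if (toy_handle_cancel(s, ret)) {
        return;
    }
    toy_finish(s);
}

static void toy_write_cb(ToyIDE *s, int ret)
{
    if (toy_handle_cancel(s, ret)) {
        return;
    }
    toy_finish(s);
}

static void toy_flush_cb(ToyIDE *s, int ret)
{
    if (toy_handle_cancel(s, ret)) {
        return;
    }
    toy_finish(s);
}

typedef void (*ToyCb)(ToyIDE *s, int ret);

/* Returns 1 if every callback leaves the same state after -ECANCELED. */
static int toy_callbacks_agree_on_cancel(void)
{
    ToyCb cbs[] = { toy_read_cb, toy_write_cb, toy_flush_cb };
    int dummy;  /* only used to get a non-NULL pointer */

    for (size_t i = 0; i < sizeof(cbs) / sizeof(cbs[0]); i++) {
        ToyIDE s = { .pio_aiocb = &dummy, .status = TOY_BUSY, .completed = 0 };

        cbs[i](&s, -ECANCELED);
        if (s.pio_aiocb != NULL || (s.status & TOY_BUSY) || s.completed != 0) {
            return 0;
        }
    }
    return 1;
}

static void toy_self_check(void)
{
    assert(toy_callbacks_agree_on_cancel());
}
```

If the right policy turns out to be "treat it like any other error", only
toy_handle_cancel() would need to change in this model, which is the point
of centralising it.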

I didn't look at the ide_dma_cb() code path.

Stefan
