From: Stefan Hajnoczi <stefanha@redhat.com>
To: John Snow <jsnow@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>,
shaju.abraham@nutanix.com, qemu-devel@nongnu.org,
qemu-block@nongnu.org
Subject: Re: [Qemu-devel] [PATCH] Fix Guest VM crash due to iSCSI Sense Key error
Date: Mon, 29 Jul 2019 11:09:46 +0100 [thread overview]
Message-ID: <20190729100946.GC3369@stefanha-x1.localdomain> (raw)
In-Reply-To: <34a8030e-a173-162d-6786-3dafa5a1d4ed@redhat.com>
[-- Attachment #1: Type: text/plain, Size: 3909 bytes --]
On Fri, Jul 26, 2019 at 04:18:46PM -0400, John Snow wrote:
> Paolo, Stefan and Kevin: can I loop you in here? I'm quite uncertain
> about this and I'd like to clear this up quickly if it's possible:
>
> On 7/25/19 8:58 PM, John Snow wrote:
> >
> >
> > On 7/7/19 10:55 PM, shaju.abraham@nutanix.com wrote:
> >> From: Shaju Abraham <shaju.abraham@nutanix.com>
> >>
> >> During the IDE DMA transfer for a ISCSI target,when libiscsi encounters
> >> a SENSE KEY error, it sets the task->sense to the value "COMMAND ABORTED".
> >> The function iscsi_translate_sense() later translaters this error to -ECANCELED
> >> and this value is passed to the callback function. In the case of IDE DMA read
> >> or write, the callback function returns immediately if the value of the ret
> >> argument is -ECANCELED.
> >> Later when ide_cancel_dma_sync() function is invoked the assertion
> >> "s->bus->dma->aiocb == ((void *)0)" fails and the qemu process gets terminated.
> >> Fix the issue by making the value of s->bus->dma->aiocb = NULL when
> >> -ECANCELED is passed to the callback.
> >>
> >> Signed-off-by: Shaju Abraham <shaju.abraham@nutanix.com>
> >> ---
> >> hw/ide/core.c | 1 +
> >> 1 file changed, 1 insertion(+)
> >>
> >> diff --git a/hw/ide/core.c b/hw/ide/core.c
> >> index 6afadf8..78ea357 100644
> >> --- a/hw/ide/core.c
> >> +++ b/hw/ide/core.c
> >> @@ -841,6 +841,7 @@ static void ide_dma_cb(void *opaque, int ret)
> >> bool stay_active = false;
> >>
> >> if (ret == -ECANCELED) {
> >> + s->bus->dma->aiocb = NULL;
> >> return;
> >> }
> >>
> >>
> >
> > The part that makes me nervous here is that I can't remember why we do
> > NO cleanup whatsoever for the ECANCELED case.
> >
> > commit 0d910cfeaf2076b116b4517166d5deb0fea76394
> > Author: Fam Zheng <famz@redhat.com>
> > Date: Thu Sep 11 13:41:07 2014 +0800
> >
> > ide/ahci: Check for -ECANCELED in aio callbacks
> >
> >
> > ... This looks like we never expected the aio callbacks to ever get
> > called with ECANCELED, so we treat this as a QEMU-internal signal.
> >
> > It looks like we expect these callbacks to do NOTHING in this case; but
> > I'm not sure where the IDE state machine does its cleanup otherwise.
> > (The DMA might have been canceled, but the DMA and IDE state machines
> > still need to exit their loop.)
> >
> > If you take a look at this patch from 2014 though, there are many other
> > spots where we have littered ECANCELED checks that might also cause
> > problems if we're receiving error codes we thought we couldn't get normally.
> >
> > I am worried this patch papers over something worse.
> >
> I'm not clear why Fam's patch adds a do-nothing return to the ide_dma_cb
> if it's invoked with ECANCELED: shouldn't it be the case that the IDE
> state machine needs to know that a transfer it was relying on to service
> an ATA command was canceled and treat it like an error?
>
> Why was it ever correct to ignore these? Is it because we only ever
> canceled DMA during reset/shutdown/etc?
>
> It appears as if iscsi requests can actually genuinely return an
> ECANCELED errno, so there are likely several places in the IDE code that
> need to accommodate this from happening.
>
> The easiest fix LOOKS like just deleting the special-casing of ECANCELED
> altogether and letting the error pathways handle things as normal.
>
> Am I mistaken?
I think your instincts are right that there are deeper issues. The
first step would be test cases, then you can be sure various scenarios
have been handled correctly.
I noticed that ide_sector_read_cb(), ide_sector_write_cb(), and
ide_flush_cb() all differ in whether they reset s->pio_aiocb and
s->status before returning early due to -ECANCELED. That must be a bug.
I didn't look at the ide_dma_cb() code path.
Stefan
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
next prev parent reply other threads:[~2019-07-29 10:10 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-07-08 2:55 [Qemu-devel] [PATCH] Fix Guest VM crash due to iSCSI Sense Key error shaju.abraham
2019-07-11 12:24 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2019-07-12 16:44 ` John Snow
2019-07-26 0:58 ` [Qemu-devel] " John Snow
2019-07-26 20:18 ` John Snow
2019-07-29 10:09 ` Stefan Hajnoczi [this message]
2019-07-29 19:45 ` John Snow
2019-07-29 21:32 ` Paolo Bonzini
2019-07-29 21:37 ` John Snow
2019-07-29 21:49 ` Paolo Bonzini
2019-08-13 22:51 ` John Snow
2019-08-14 2:30 ` Shaju Abraham
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190729100946.GC3369@stefanha-x1.localdomain \
--to=stefanha@redhat.com \
--cc=jsnow@redhat.com \
--cc=kwolf@redhat.com \
--cc=pbonzini@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=shaju.abraham@nutanix.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).