Date: Mon, 27 May 2013 13:39:20 +0200
From: Stefan Hajnoczi
Message-ID: <20130527113920.GA23204@stefanha-thinkpad.redhat.com>
References: <33183CC9F5247A488A2544077AF19020697A3B72@szxeml538-mbx.china.huawei.com> <51A24172.4020208@suse.de>
In-Reply-To: <51A24172.4020208@suse.de>
Subject: Re: [Qemu-devel] IDE disk FLUSH take more than 30 secs, the SUSE guest reports "lost interrupt and the file system becomes read-only"
To: Andreas Färber
Cc: "kwolf@redhat.com", Stefano Stabellini, Luonengjun, "qemu-devel@nongnu.org", Wangzhenguo, "Gonglei (Arei)", "Huangweidong (Hardware)"

On Sun, May 26, 2013 at 07:08:02PM +0200, Andreas Färber wrote:
> On 21.05.2013 09:12, Gonglei (Arei) wrote:
> > Through analysis, I found that because the system call the fdatasync command in the Qemu over 30s,
> > after the Guest's kernel thread detects the io transferation is timeout, went to check IDE disk state.
> > But the IDE disk status is 0x50, rather than the BSY status, and then departed error process...
> >
> > the path of kernel's action is :
> > scsi_softirq_done
> > scsi_eh_scmd_add
> > scsi_error_handler
> > shost->transportt->eh_strategy_handler
> > ata_scsi_error
> > ap->ops->lost_interrupt
> > ata_sff_lost_interrupt
> > Finally, the file system becomes read-only.
> >
> > Why not set the IDE disk for the BSY status When 0xe7 command is executed in the Qemu?
>
> Have you actually tried that out with a patch such as the following?
>
> diff --git a/hw/ide/core.c b/hw/ide/core.c
> index c7a8041..bf1ff18 100644
> --- a/hw/ide/core.c
> +++ b/hw/ide/core.c
> @@ -795,6 +795,8 @@ static void ide_flush_cb(void *opaque, int ret)
>  {
>      IDEState *s = opaque;
>
> +    s->status &= ~BUSY_STAT;
> +
>      if (ret < 0) {
>          /* XXX: What sector number to set here? */
>          if (ide_handle_rw_error(s, -ret, BM_STATUS_RETRY_FLUSH)) {
> @@ -814,6 +816,7 @@ void ide_flush_cache(IDEState *s)
>          return;
>      }
>
> +    s->status |= BUSY_STAT;
>      bdrv_acct_start(s->bs, &s->acct, 0, BDRV_ACCT_FLUSH);
>      bdrv_aio_flush(s->bs, ide_flush_cb, s);
>  }
>
> No clue if this is spec-compliant. ;)
>
> Note however that qemu_fdatasync() is done in the flush callback of
> block/raw-posix.c, so IIUC everything calling bdrv_aio_flush() or
> bdrv_flush_all() may potentially run into issues beyond just ATA:

This is an IDE emulation bug.  virtio-blk, for example, doesn't have
this kind of busy status bit.  It's probably not an issue with SCSI
either.

Stefan
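
A minimal sketch of the status-bit behaviour discussed in this thread,
assuming a toy model rather than the real QEMU or libata code.
ToyIDEState and every toy_* function are hypothetical names made up for
illustration; only the bit values (BSY = 0x80, DRDY = 0x40, DSC = 0x10,
so 0x50 means "ready, not busy") and the idea of setting BUSY_STAT for
the duration of the flush come from the report and the quoted patch.
The guest-side check is a simplification of what a timeout handler
might conclude from the status register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ATA status register bits. */
#define BUSY_STAT  0x80  /* BSY: device is still processing a command */
#define READY_STAT 0x40  /* DRDY: device ready */
#define SEEK_STAT  0x10  /* DSC: seek complete */

/* Hypothetical, stripped-down stand-in for the emulated drive state. */
typedef struct ToyIDEState {
    uint8_t status;
    bool flush_pending;
} ToyIDEState;

/* Idle drive: DRDY | DSC = 0x50, the value seen in the report. */
static void toy_ide_init(ToyIDEState *s)
{
    s->status = READY_STAT | SEEK_STAT;
    s->flush_pending = false;
}

/*
 * FLUSH CACHE (0xe7) is issued. With set_busy == false this models the
 * unpatched behaviour (status stays 0x50 while the flush runs); with
 * set_busy == true it models the quoted patch.
 */
static void toy_flush_start(ToyIDEState *s, bool set_busy)
{
    assert(!s->flush_pending);
    s->flush_pending = true;
    if (set_busy) {
        s->status |= BUSY_STAT;
    }
}

/* Flush completion callback: clear BSY again, as in the quoted patch. */
static void toy_flush_done(ToyIDEState *s)
{
    s->flush_pending = false;
    s->status &= (uint8_t)~BUSY_STAT;
}

typedef enum {
    TOY_STILL_BUSY,      /* device says it is working: keep waiting */
    TOY_LOST_INTERRUPT   /* device looks idle: assume the IRQ was lost */
} ToyTimeoutVerdict;

/* Simplified guest view: what a timeout handler infers from the status. */
static ToyTimeoutVerdict toy_guest_timeout_check(uint8_t status)
{
    return (status & BUSY_STAT) ? TOY_STILL_BUSY : TOY_LOST_INTERRUPT;
}
```

In this model, a timeout that fires while an unpatched flush is pending
reads 0x50 and is classified as a lost interrupt, whereas with BSY set
the same check reads 0xd0 and reports the device as still busy. Whether
the real guest then waits long enough is a separate question that this
sketch does not answer.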