From: Andreas Färber
Date: Sun, 26 May 2013 19:08:02 +0200
Message-ID: <51A24172.4020208@suse.de>
In-Reply-To: <33183CC9F5247A488A2544077AF19020697A3B72@szxeml538-mbx.china.huawei.com>
Subject: Re: [Qemu-devel] IDE disk FLUSH take more than 30 secs, the SUSE guest reports "lost interrupt and the file system becomes read-only"
To: "Gonglei (Arei)"
Cc: "kwolf@redhat.com", Stefano Stabellini, Stefan Hajnoczi, Luonengjun, "qemu-devel@nongnu.org", Wangzhenguo, "Huangweidong (Hardware)"

On 21.05.2013 09:12, Gonglei (Arei) wrote:
> Through analysis, I found that because the system call the fdatasync command in the Qemu over 30s,
> after the Guest's kernel thread detects the io transferation is timeout, went to check IDE disk state.
> But the IDE disk status is 0x50, rather than the BSY status, and then departed error process...
>
> the path of kernel's action is :
>     scsi_softirq_done
>     scsi_eh_scmd_add
>     scsi_error_handler
>     shost->transportt->eh_strategy_handler
>     ata_scsi_error
>     ap->ops->lost_interrupt
>     ata_sff_lost_interrupt
> Finally, the file system becomes read-only.
>
> Why not set the IDE disk for the BSY status When 0xe7 command is executed in the Qemu?

Have you actually tried that out with a patch such as the following?

diff --git a/hw/ide/core.c b/hw/ide/core.c
index c7a8041..bf1ff18 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -795,6 +795,8 @@ static void ide_flush_cb(void *opaque, int ret)
 {
     IDEState *s = opaque;
 
+    s->status &= ~BUSY_STAT;
+
     if (ret < 0) {
         /* XXX: What sector number to set here? */
         if (ide_handle_rw_error(s, -ret, BM_STATUS_RETRY_FLUSH)) {
@@ -814,6 +816,7 @@ void ide_flush_cache(IDEState *s)
         return;
     }
 
+    s->status |= BUSY_STAT;
     bdrv_acct_start(s->bs, &s->acct, 0, BDRV_ACCT_FLUSH);
     bdrv_aio_flush(s->bs, ide_flush_cb, s);
 }

No clue if this is spec-compliant. ;)

Note, however, that qemu_fdatasync() is called from the flush callback in
block/raw-posix.c, so IIUC everything that calls bdrv_aio_flush() or
bdrv_flush_all() may run into issues, not just ATA:
  hw/block/virtio-blk.c
  hw/block/xen_disk.c
  hw/ide/core.c
  hw/scsi/scsi-disk.c
  cpus.c:do_vm_stop()
  hw/xen/xen_platform.c:platform_fixed_ioport_writew()

qemu_fdatasync() is also called directly in:
  hw/block/dataplane/virtio-blk.c:process_request()
  hw/9pfs/virtio-9p-*.c

Quite possibly not all of them are problematic, but flush times of more
than 30 seconds are very likely not well tested by developers...

Regards,
Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
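To make the status-register reasoning in this thread concrete, here is a minimal, self-contained C sketch. It is not QEMU or kernel code: ToyIDEState and the toy_* functions are hypothetical. Only the bit values are real; they are the ATA status register bits as named in QEMU's hw/ide/internal.h (BUSY_STAT 0x80, READY_STAT 0x40, SEEK_STAT 0x10). toy_guest_sees_lost_interrupt() reduces the guest's ata_sff_lost_interrupt() to its BSY check and ignores polled commands and the rest of libata's error handling. The sketch shows why the reported status 0x50 (DRDY | DSC, BSY clear) makes that check declare a lost interrupt. It also shows why 0xD0, which the proposed patch would report while a flush is pending, makes the same check treat the command as still in progress.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ATA status register bits, using the names from QEMU's hw/ide/internal.h. */
#define BUSY_STAT  0x80 /* BSY: device is executing a command */
#define READY_STAT 0x40 /* DRDY: device ready */
#define SEEK_STAT  0x10 /* DSC: seek complete */

/* Hypothetical, heavily simplified stand-in for QEMU's IDEState. */
typedef struct {
    uint8_t status;
} ToyIDEState;

/* Mirrors the proposed ide_flush_cache() change: raise BSY before the
 * asynchronous flush (and thus the host fdatasync()) is started. */
static void toy_flush_start(ToyIDEState *s)
{
    s->status |= BUSY_STAT;
}

/* Mirrors the proposed ide_flush_cb() change: drop BSY once the flush
 * has completed. Only valid while a flush is in flight. */
static void toy_flush_done(ToyIDEState *s)
{
    assert(s->status & BUSY_STAT);
    s->status &= (uint8_t)~BUSY_STAT;
}

/* Simplified model of the BSY check in the guest's ata_sff_lost_interrupt():
 * a timed-out command counts as a lost interrupt only if the device no
 * longer reports BSY. Returns true if the guest would log "lost interrupt". */
static bool toy_guest_sees_lost_interrupt(uint8_t status)
{
    return (status & BUSY_STAT) == 0;
}
```

Calling toy_flush_start() and then toy_flush_done() on a state initialised to READY_STAT | SEEK_STAT moves the status from 0x50 to 0xD0 and back. Those are the two points where the patch sets and clears BUSY_STAT. Whether the guest's remaining error handling then behaves better on a 30+ second flush is the open question raised in the mail.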