From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:37750)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <stefanha@gmail.com>) id 1Ugvlu-00071P-1p
	for qemu-devel@nongnu.org; Mon, 27 May 2013 07:39:35 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <stefanha@gmail.com>) id 1Ugvls-0006WY-9P
	for qemu-devel@nongnu.org; Mon, 27 May 2013 07:39:25 -0400
Received: from mail-ee0-f52.google.com ([74.125.83.52]:46933)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <stefanha@gmail.com>) id 1Ugvls-0006WM-34
	for qemu-devel@nongnu.org; Mon, 27 May 2013 07:39:24 -0400
Received: by mail-ee0-f52.google.com with SMTP id c13so3948200eek.39
	for <qemu-devel@nongnu.org>; Mon, 27 May 2013 04:39:23 -0700 (PDT)
Date: Mon, 27 May 2013 13:39:20 +0200
From: Stefan Hajnoczi <stefanha@gmail.com>
Message-ID: <20130527113920.GA23204@stefanha-thinkpad.redhat.com>
References: <33183CC9F5247A488A2544077AF19020697A3B72@szxeml538-mbx.china.huawei.com>
	<51A24172.4020208@suse.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <51A24172.4020208@suse.de>
Subject: Re: [Qemu-devel] IDE disk FLUSH take more than 30 secs,
 the SUSE guest reports "lost interrupt and the file system becomes
 read-only"
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Andreas =?iso-8859-1?Q?F=E4rber?= <afaerber@suse.de>
Cc: "kwolf@redhat.com" <kwolf@redhat.com>, Stefano Stabellini <stefano.stabellini@eu.citrix.com>, Luonengjun <luonengjun@huawei.com>, "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>, Wangzhenguo <wangzhenguo@huawei.com>, "Gonglei (Arei)" <arei.gonglei@huawei.com>, "Huangweidong (Hardware)" <huangweidong@huawei.com>

On Sun, May 26, 2013 at 07:08:02PM +0200, Andreas Färber wrote:
> On 21.05.2013 09:12, Gonglei (Arei) wrote:
> > Through analysis, I found that the fdatasync system call in QEMU took more than 30 seconds.
> > The guest's kernel thread detected that the I/O transfer had timed out and went to check the IDE disk state.
> > But the IDE disk status was 0x50 rather than BSY, so the guest entered its error-handling path...
> > 
> > The kernel's call path is:
> > scsi_softirq_done
> >  scsi_eh_scmd_add
> >    scsi_error_handler
> >      shost->transportt->eh_strategy_handler 
> > 		ata_scsi_error 
> > 			ap->ops->lost_interrupt
> > 				ata_sff_lost_interrupt
> > Finally, the file system becomes read-only.
> > 
> > Why not set the IDE disk's BSY status while the 0xe7 command is being executed in QEMU?
> 
> Have you actually tried that out with a patch such as the following?
> 
> diff --git a/hw/ide/core.c b/hw/ide/core.c
> index c7a8041..bf1ff18 100644
> --- a/hw/ide/core.c
> +++ b/hw/ide/core.c
> @@ -795,6 +795,8 @@ static void ide_flush_cb(void *opaque, int ret)
>  {
>      IDEState *s = opaque;
> 
> +    s->status &= ~BUSY_STAT;
> +
>      if (ret < 0) {
>          /* XXX: What sector number to set here? */
>          if (ide_handle_rw_error(s, -ret, BM_STATUS_RETRY_FLUSH)) {
> @@ -814,6 +816,7 @@ void ide_flush_cache(IDEState *s)
>          return;
>      }
> 
> +    s->status |= BUSY_STAT;
>      bdrv_acct_start(s->bs, &s->acct, 0, BDRV_ACCT_FLUSH);
>      bdrv_aio_flush(s->bs, ide_flush_cb, s);
>  }
> 
> No clue if this is spec-compliant. ;)
> 
> Note however that qemu_fdatasync() is done in the flush callback of
> block/raw-posix.c, so IIUC everything calling bdrv_aio_flush() or
> bdrv_flush_all() may potentially run into issues beyond just ATA:

This is an IDE emulation bug.  virtio-blk, for example, doesn't have
this kind of busy status bit.  It's probably not an issue with SCSI
either.
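
To make the status values in this thread concrete, here is a minimal
standalone sketch. It is not QEMU code: FakeIDEState and the fake_flush_*
helpers are hypothetical stand-ins, and only the bit values (the standard
ATA status register layout, using the names QEMU's IDE code uses) are taken
as given. It shows that 0x50 is just "ready + seek complete", and that
setting BSY for the duration of the flush is what lets a polling guest tell
"still working" apart from "lost interrupt".

```c
#include <assert.h>
#include <stdint.h>

/* Standard ATA status register bits. */
#define BUSY_STAT  0x80  /* BSY: device is busy executing a command */
#define READY_STAT 0x40  /* DRDY: device ready */
#define SEEK_STAT  0x10  /* DSC: seek complete */

/* Hypothetical stand-in for the emulated drive: only the status byte. */
typedef struct {
    uint8_t status;
} FakeIDEState;

/* Models the start of FLUSH CACHE (0xe7): mark the device busy. */
static void fake_flush_start(FakeIDEState *s)
{
    assert(!(s->status & BUSY_STAT)); /* one command at a time */
    s->status |= BUSY_STAT;
}

/* Models the flush completion callback: clear BSY again. */
static void fake_flush_done(FakeIDEState *s)
{
    s->status &= (uint8_t)~BUSY_STAT;
}

/* What a guest polling the status register after a timeout would see. */
static int guest_sees_busy(const FakeIDEState *s)
{
    return (s->status & BUSY_STAT) != 0;
}
```

With status initialised to READY_STAT | SEEK_STAT (0x50), guest_sees_busy()
returns 0; after fake_flush_start() the status reads 0xd0 and it returns 1;
after fake_flush_done() the status is back to 0x50.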

Stefan