* [Qemu-devel] IDE disk FLUSH take more than 30 secs, the SUSE guest reports "lost interrupt and the file system becomes read-only"
@ 2013-05-21  7:12 Gonglei (Arei)
  2013-05-21 11:50 ` Andreas Färber
  2013-05-26 17:08 ` Andreas Färber
  0 siblings, 2 replies; 6+ messages in thread
From: Gonglei (Arei) @ 2013-05-21  7:12 UTC (permalink / raw)
  To: kwolf, qemu-devel; +Cc: Wangzhenguo, Luonengjun, Huangweidong (Hardware)

I ran into a problem when the physical hard disk on the host processes I/O very slowly (when grouping RAID).
I dd a big file in a SUSE virtual machine; the command is:

linux:/ # dd if=/dev/zero of=./info bs=1M count=5000;sync

but in the end I get these messages:

linux:~ # dmesg
[ 174.804114] ata1: lost interrupt (Status 0x50)
[ 174.812305] end_request: I/O error, dev sda, sector 12085270
[ 174.812309] Buffer I/O error on device sda2, logical block 984530
[ 174.812310] lost page write due to I/O error on sda2
[ 174.813268] Aborting journal on device sda2.
[ 174.828330] journal commit I/O error
[ 174.828373] ext3_abort called.
[ 174.828375] EXT3-fs error (device sda2): ext3_journal_start_sb: Detected aborted journal
[ 174.828377] Remounting filesystem read-only
[ 182.286424] __journal_remove_journal_head: freeing b_committed_data
[ 182.286434] __journal_remove_journal_head: freeing b_committed_data
[ 182.286442] __journal_remove_journal_head: freeing b_committed_data
[ 182.286452] __journal_remove_journal_head: freeing b_committed_data
[ 182.286472] __journal_remove_journal_head: freeing b_committed_data

Through analysis, I found that the fdatasync system call in QEMU took over 30 s.
When the guest's kernel thread detected that the I/O transfer had timed out, it went to check the IDE disk state.
But the IDE disk status was 0x50 rather than BSY, so the guest entered its error-handling path...

The kernel's call path is:
  scsi_softirq_done
  scsi_eh_scmd_add
  scsi_error_handler
  shost->transportt->eh_strategy_handler
  ata_scsi_error
  ap->ops->lost_interrupt
  ata_sff_lost_interrupt
Finally, the file system becomes read-only.

Why is the IDE disk not put into BSY status while the 0xe7 command is being executed in QEMU?
Does anyone know? Thanks!

Best Regards!
-Arei

^ permalink raw reply  [flat|nested] 6+ messages in thread
* Re: [Qemu-devel] IDE disk FLUSH take more than 30 secs, the SUSE guest reports "lost interrupt and the file system becomes read-only"
  2013-05-21  7:12 [Qemu-devel] IDE disk FLUSH take more than 30 secs, the SUSE guest reports "lost interrupt and the file system becomes read-only" Gonglei (Arei)
@ 2013-05-21 11:50 ` Andreas Färber
  2013-05-21 12:04   ` Gonglei (Arei)
  2013-05-26 17:08 ` Andreas Färber
  1 sibling, 1 reply; 6+ messages in thread
From: Andreas Färber @ 2013-05-21 11:50 UTC (permalink / raw)
  To: Gonglei (Arei)
  Cc: kwolf, Luonengjun, qemu-devel, Wangzhenguo, Bo Yang, Huangweidong (Hardware)

Hi,

On 21.05.2013 09:12, Gonglei (Arei) wrote:
> In the case of physical hard disk's speed which processing IO (when grouping RAID) is very slow, I encountered a problem.
> I dd big file in SUSE virtual machine, the command is
> linux:/ # dd if=/dev/zero of=./info bs=1M count=5000;sync
>
> but finally I get those message:
> linux:~ # dmesg
> [ 174.804114] ata1: lost interrupt (Status 0x50)
> [ 174.812305] end_request: I/O error, dev sda, sector 12085270
> [ 174.812309] Buffer I/O error on device sda2, logical block 984530
> [ 174.812310] lost page write due to I/O error on sda2
> [ 174.813268] Aborting journal on device sda2.
> [ 174.828330] journal commit I/O error
> [ 174.828373] ext3_abort called.
> [ 174.828375] EXT3-fs error (device sda2): ext3_journal_start_sb: Detected aborted journal
> [ 174.828377] Remounting filesystem read-only
> [ 182.286424] __journal_remove_journal_head: freeing b_committed_data
> [ 182.286434] __journal_remove_journal_head: freeing b_committed_data
> [ 182.286442] __journal_remove_journal_head: freeing b_committed_data
> [ 182.286452] __journal_remove_journal_head: freeing b_committed_data
> [ 182.286472] __journal_remove_journal_head: freeing b_committed_data
>
>
> Through analysis, I found that because the system call the fdatasync command in the Qemu over 30s,

Could you share the QEMU command line being used on the host? In
particular I'm wondering about the -drive cache option used - I've only
seen issues with cache=unsafe so far.

Is it an upstream qemu-system-x86_64 or a SLES qemu-kvm? What version?

Regards,
Andreas

> after the Guest's kernel thread detects the io transferation is timeout, went to check IDE disk state.
> But the IDE disk status is 0x50, rather than the BSY status, and then departed error process...
>
> the path of kernel's action is :
> scsi_softirq_done
> scsi_eh_scmd_add
> scsi_error_handler
> shost->transportt->eh_strategy_handler
> ata_scsi_error
> ap->ops->lost_interrupt
> ata_sff_lost_interrupt
> Finally, the file system becomes read-only.
>
> Why not set the IDE disk for the BSY status When 0xe7 command is executed in the Qemu?
> Anyone know it? thanks!
>
> Best Regards!
> -Arei

--
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

^ permalink raw reply  [flat|nested] 6+ messages in thread
* Re: [Qemu-devel] IDE disk FLUSH take more than 30 secs, the SUSE guest reports "lost interrupt and the file system becomes read-only"
  2013-05-21 11:50 ` Andreas Färber
@ 2013-05-21 12:04   ` Gonglei (Arei)
  0 siblings, 0 replies; 6+ messages in thread
From: Gonglei (Arei) @ 2013-05-21 12:04 UTC (permalink / raw)
  To: Andreas Färber
  Cc: kwolf, Luonengjun, qemu-devel, Wangzhenguo, Bo Yang, Huangweidong (Hardware)

Hi, Andreas

> -----Original Message-----
> From: Andreas Färber [mailto:afaerber@suse.de]
> Sent: Tuesday, May 21, 2013 7:50 PM
> To: Gonglei (Arei)
> Cc: kwolf@redhat.com; qemu-devel@nongnu.org; Wangzhenguo; Luonengjun; Huangweidong (Hardware); Bo Yang
> Subject: Re: [Qemu-devel] IDE disk FLUSH take more than 30 secs, the SUSE guest reports "lost interrupt and the file system becomes read-only"
>
> Hi,
>
> On 21.05.2013 09:12, Gonglei (Arei) wrote:
> > In the case of physical hard disk's speed which processing IO (when grouping RAID) is very slow, I encountered a problem.
> > I dd big file in SUSE virtual machine, the command is
> > linux:/ # dd if=/dev/zero of=./info bs=1M count=5000;sync
> >
> > but finally I get those message:
> > linux:~ # dmesg
> > [ 174.804114] ata1: lost interrupt (Status 0x50)
> > [ 174.812305] end_request: I/O error, dev sda, sector 12085270
> > [ 174.812309] Buffer I/O error on device sda2, logical block 984530
> > [ 174.812310] lost page write due to I/O error on sda2
> > [ 174.813268] Aborting journal on device sda2.
> > [ 174.828330] journal commit I/O error
> > [ 174.828373] ext3_abort called.
> > [ 174.828375] EXT3-fs error (device sda2): ext3_journal_start_sb: Detected aborted journal
> > [ 174.828377] Remounting filesystem read-only
> > [ 182.286424] __journal_remove_journal_head: freeing b_committed_data
> > [ 182.286434] __journal_remove_journal_head: freeing b_committed_data
> > [ 182.286442] __journal_remove_journal_head: freeing b_committed_data
> > [ 182.286452] __journal_remove_journal_head: freeing b_committed_data
> > [ 182.286472] __journal_remove_journal_head: freeing b_committed_data
> >
> >
> > Through analysis, I found that because the system call the fdatasync command in the Qemu over 30s,
>
> Could you share your QEMU command line being used on the host? In
> particular I'm wondering about -drive's cache option used - I've only
> seen issues with cache=unsafe so far.
>
That's OK.

linux-XQiARZ:~ # ps -ef | grep qemu
root 6303 1 2 14:04 ? 00:08:55 qemu-system-i386 -xen-domid 779 -chardev socket,id=libxl-cmd,path=/var/run/xen/qmp-libxl-779,server,nowait -mon chardev=libxl-cmd,mode=control -name suse -vnc 0.0.0.0:0 -serial pty -boot order=c -usb -usbdevice tablet -device usb-ehci,id=ehci -smp 2,maxcpus=2 -device rtl8139,id=nic0,netdev=net0,mac=00:16:3e:34:40:46 -netdev type=tap,id=net0,ifname=tap779.0,bridge=br0,script=/etc/xen/scripts/qemu-ifup,downscript=no -M xenfv -m 2040 -drive file=/dev/xen/blktap-2/tapdev0,if=ide,index=0,media=disk,format=raw

> Is it an upstream qemu-system-x86_64 or a SLES qemu-kvm? What version?

My environment is xen-4.1.2 + qemu-1.2.2-release + a SLES11SP1 guest.

>
> Regards,
> Andreas
>
> > after the Guest's kernel thread detects the io transferation is timeout, went to check IDE disk state.
> > But the IDE disk status is 0x50, rather than the BSY status, and then departed error process...
> >
> > the path of kernel's action is :
> > scsi_softirq_done
> > scsi_eh_scmd_add
> > scsi_error_handler
> > shost->transportt->eh_strategy_handler
> > ata_scsi_error
> > ap->ops->lost_interrupt
> > ata_sff_lost_interrupt
> > Finally, the file system becomes read-only.
> >
> > Why not set the IDE disk for the BSY status When 0xe7 command is executed in the Qemu?
> > Anyone know it? thanks!
> >
> > Best Regards!
> > -Arei
>
> --
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
> GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

^ permalink raw reply  [flat|nested] 6+ messages in thread
* Re: [Qemu-devel] IDE disk FLUSH take more than 30 secs, the SUSE guest reports "lost interrupt and the file system becomes read-only"
  2013-05-21  7:12 [Qemu-devel] IDE disk FLUSH take more than 30 secs, the SUSE guest reports "lost interrupt and the file system becomes read-only" Gonglei (Arei)
  2013-05-21 11:50 ` Andreas Färber
@ 2013-05-26 17:08 ` Andreas Färber
  2013-05-27 11:39   ` Stefan Hajnoczi
  2013-05-27 21:05   ` Paolo Bonzini
  1 sibling, 2 replies; 6+ messages in thread
From: Andreas Färber @ 2013-05-26 17:08 UTC (permalink / raw)
  To: Gonglei (Arei)
  Cc: kwolf, Stefano Stabellini, Stefan Hajnoczi, Luonengjun, qemu-devel, Wangzhenguo, Huangweidong (Hardware)

On 21.05.2013 09:12, Gonglei (Arei) wrote:
> Through analysis, I found that because the system call the fdatasync command in the Qemu over 30s,
> after the Guest's kernel thread detects the io transferation is timeout, went to check IDE disk state.
> But the IDE disk status is 0x50, rather than the BSY status, and then departed error process...
>
> the path of kernel's action is :
> scsi_softirq_done
> scsi_eh_scmd_add
> scsi_error_handler
> shost->transportt->eh_strategy_handler
> ata_scsi_error
> ap->ops->lost_interrupt
> ata_sff_lost_interrupt
> Finally, the file system becomes read-only.
>
> Why not set the IDE disk for the BSY status When 0xe7 command is executed in the Qemu?

Have you actually tried that out with a patch such as the following?

diff --git a/hw/ide/core.c b/hw/ide/core.c
index c7a8041..bf1ff18 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -795,6 +795,8 @@ static void ide_flush_cb(void *opaque, int ret)
 {
     IDEState *s = opaque;
 
+    s->status &= ~BUSY_STAT;
+
     if (ret < 0) {
         /* XXX: What sector number to set here?  */
         if (ide_handle_rw_error(s, -ret, BM_STATUS_RETRY_FLUSH)) {
@@ -814,6 +816,7 @@ void ide_flush_cache(IDEState *s)
         return;
     }
 
+    s->status |= BUSY_STAT;
     bdrv_acct_start(s->bs, &s->acct, 0, BDRV_ACCT_FLUSH);
     bdrv_aio_flush(s->bs, ide_flush_cb, s);
 }

No clue if this is spec-compliant. ;)

Note however that qemu_fdatasync() is done in the flush callback of
block/raw-posix.c, so IIUC everything calling bdrv_aio_flush() or
bdrv_flush_all() may potentially run into issues beyond just ATA:

  hw/block/virtio-blk.c
  hw/block/xen_disk.c
  hw/ide/core.c
  hw/scsi/scsi-disk.c
  cpus.c:do_vm_stop()
  hw/xen/xen_platform.c:platform_fixed_ioport_writew()

qemu_fdatasync() further occurs in:

  hw/block/dataplane/virtio-blk.c:process_request()
  hw/9pfs/virtio-9p-*.c

Quite possibly not all of them are problematic, but flush times >30 sec
are very likely not well tested by developers...

Regards,
Andreas

--
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

^ permalink raw reply related  [flat|nested] 6+ messages in thread
* Re: [Qemu-devel] IDE disk FLUSH take more than 30 secs, the SUSE guest reports "lost interrupt and the file system becomes read-only"
  2013-05-26 17:08 ` Andreas Färber
@ 2013-05-27 11:39   ` Stefan Hajnoczi
  2013-05-27 21:05   ` Paolo Bonzini
  1 sibling, 0 replies; 6+ messages in thread
From: Stefan Hajnoczi @ 2013-05-27 11:39 UTC (permalink / raw)
  To: Andreas Färber
  Cc: kwolf, Stefano Stabellini, Luonengjun, qemu-devel, Wangzhenguo, Gonglei (Arei), Huangweidong (Hardware)

On Sun, May 26, 2013 at 07:08:02PM +0200, Andreas Färber wrote:
> On 21.05.2013 09:12, Gonglei (Arei) wrote:
> > Through analysis, I found that because the system call the fdatasync command in the Qemu over 30s,
> > after the Guest's kernel thread detects the io transferation is timeout, went to check IDE disk state.
> > But the IDE disk status is 0x50, rather than the BSY status, and then departed error process...
> >
> > the path of kernel's action is :
> > scsi_softirq_done
> > scsi_eh_scmd_add
> > scsi_error_handler
> > shost->transportt->eh_strategy_handler
> > ata_scsi_error
> > ap->ops->lost_interrupt
> > ata_sff_lost_interrupt
> > Finally, the file system becomes read-only.
> >
> > Why not set the IDE disk for the BSY status When 0xe7 command is executed in the Qemu?
>
> Have you actually tried that out with a patch such as the following?
>
> diff --git a/hw/ide/core.c b/hw/ide/core.c
> index c7a8041..bf1ff18 100644
> --- a/hw/ide/core.c
> +++ b/hw/ide/core.c
> @@ -795,6 +795,8 @@ static void ide_flush_cb(void *opaque, int ret)
>  {
>      IDEState *s = opaque;
>
> +    s->status &= ~BUSY_STAT;
> +
>      if (ret < 0) {
>          /* XXX: What sector number to set here?  */
>          if (ide_handle_rw_error(s, -ret, BM_STATUS_RETRY_FLUSH)) {
> @@ -814,6 +816,7 @@ void ide_flush_cache(IDEState *s)
>          return;
>      }
>
> +    s->status |= BUSY_STAT;
>      bdrv_acct_start(s->bs, &s->acct, 0, BDRV_ACCT_FLUSH);
>      bdrv_aio_flush(s->bs, ide_flush_cb, s);
>  }
>
> No clue if this is spec-compliant. ;)
>
> Note however that qemu_fdatasync() is done in the flush callback of
> block/raw-posix.c, so IIUC everything calling bdrv_aio_flush() or
> bdrv_flush_all() may potentially run into issues beyond just ATA:

This is an IDE emulation bug. virtio-blk, for example, doesn't have
this kind of busy status bit. It's probably not an issue with SCSI
either.

Stefan

^ permalink raw reply  [flat|nested] 6+ messages in thread
* Re: [Qemu-devel] IDE disk FLUSH take more than 30 secs, the SUSE guest reports "lost interrupt and the file system becomes read-only"
  2013-05-26 17:08 ` Andreas Färber
  2013-05-27 11:39   ` Stefan Hajnoczi
@ 2013-05-27 21:05   ` Paolo Bonzini
  1 sibling, 0 replies; 6+ messages in thread
From: Paolo Bonzini @ 2013-05-27 21:05 UTC (permalink / raw)
  To: Andreas Färber
  Cc: kwolf, Stefano Stabellini, Stefan Hajnoczi, Luonengjun, qemu-devel, Wangzhenguo, Gonglei (Arei), Huangweidong (Hardware)

On 26/05/2013 19:08, Andreas Färber wrote:
> Have you actually tried that out with a patch such as the following?
>
> diff --git a/hw/ide/core.c b/hw/ide/core.c
> index c7a8041..bf1ff18 100644
> --- a/hw/ide/core.c
> +++ b/hw/ide/core.c
> @@ -795,6 +795,8 @@ static void ide_flush_cb(void *opaque, int ret)
>  {
>      IDEState *s = opaque;
>
> +    s->status &= ~BUSY_STAT;
> +
>      if (ret < 0) {
>          /* XXX: What sector number to set here?  */
>          if (ide_handle_rw_error(s, -ret, BM_STATUS_RETRY_FLUSH)) {
> @@ -814,6 +816,7 @@ void ide_flush_cache(IDEState *s)
>          return;
>      }
>
> +    s->status |= BUSY_STAT;
>      bdrv_acct_start(s->bs, &s->acct, 0, BDRV_ACCT_FLUSH);
>      bdrv_aio_flush(s->bs, ide_flush_cb, s);
>  }

Yes, this patch is correct. Can you resend with S-o-b?

Paolo

^ permalink raw reply  [flat|nested] 6+ messages in thread