All of lore.kernel.org
 help / color / mirror / Atom feed
* Windows Bug Check 0x101 issue
@ 2008-03-24  9:53 Kouya Shimura
  2008-03-24 11:01 ` Samuel Thibault
  2008-03-24 11:05 ` Keir Fraser
  0 siblings, 2 replies; 22+ messages in thread
From: Kouya Shimura @ 2008-03-24  9:53 UTC (permalink / raw)
  To: xen-devel

[-- Attachment #1: message body text --]
[-- Type: text/plain, Size: 978 bytes --]

Hi,

Qemu-dm might be blocked with fsync system call over 3 seconds
when the dom0 is overloaded.

It causes SMP Windows 2008 crashes with Bug Check 0x101.
0x101 indicates that an expected clock interrupt on a secondary
processor, in a multi-processor system, was not received within
the allocated interval.

It can be easily reproduced with the following modification:

diff -r 76c9cf11ce23 tools/ioemu/block-raw.c
--- a/tools/ioemu/block-raw.c	Fri Mar 21 09:45:34 2008 +0000
+++ b/tools/ioemu/block-raw.c	Mon Mar 24 16:28:16 2008 +0900
@@ -603,6 +603,11 @@ static void raw_flush(BlockDriverState *
 static void raw_flush(BlockDriverState *bs)
 {
     BDRVRawState *s = bs->opaque;
+#if 1 /* reproduce Windows Bug Check 0x101 */
+    extern int send_vcpu;
+    if (send_vcpu != 0)
+	sleep(4);
+#endif
     fsync(s->fd);
 }
 
An attached patch fixes it. However I think the root cause is
that a timer event can't interrupt an i/o emulation.
How should we fix it?

Thanks,
Kouya


[-- Attachment #2: sample_fix.patch --]
[-- Type: text/plain, Size: 825 bytes --]

diff -r 76c9cf11ce23 tools/ioemu/block-raw.c
--- a/tools/ioemu/block-raw.c	Fri Mar 21 09:45:34 2008 +0000
+++ b/tools/ioemu/block-raw.c	Mon Mar 24 17:56:19 2008 +0900
@@ -496,6 +496,10 @@ static void raw_aio_cancel(BlockDriverAI
         pacb = &acb->next;
     }
 }
+
+static void aio_fsync_cb(void *opague, int ret)
+{
+}
 #endif
 
 static void raw_close(BlockDriverState *bs)
@@ -602,8 +606,20 @@ static int raw_create(const char *filena
 
 static void raw_flush(BlockDriverState *bs)
 {
+#ifdef NO_AIO
     BDRVRawState *s = bs->opaque;
     fsync(s->fd);
+#else
+    RawAIOCB *acb;
+
+    acb = raw_aio_setup(bs, 0, NULL, 0, aio_fsync_cb, NULL);
+    if (!acb)
+        return;
+    if (aio_fsync(O_SYNC, &acb->aiocb) < 0) {
+        qemu_aio_release(acb);
+        return;
+    }
+#endif
 }
 
 BlockDriver bdrv_raw = {

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Windows Bug Check 0x101 issue
  2008-03-24  9:53 Windows Bug Check 0x101 issue Kouya Shimura
@ 2008-03-24 11:01 ` Samuel Thibault
  2008-03-24 11:05 ` Keir Fraser
  1 sibling, 0 replies; 22+ messages in thread
From: Samuel Thibault @ 2008-03-24 11:01 UTC (permalink / raw)
  To: Kouya Shimura; +Cc: xen-devel

Kouya Shimura, le Mon 24 Mar 2008 18:53:18 +0900, a écrit :
Content-Description: message body text
> An attached patch fixes it. However I think the root cause is
> that a timer event can't interrupt an i/o emulation.

The way qemu is designed wouldn't permit that anyway.

> How should we fix it?

We need to change the behavior of the flush operation, to make it
asynchronous, so that things can continue while the host OS is syncing,
and eventually the SCSI or IDE layer will report the completion of the
flush.

> +static void aio_fsync_cb(void *opague, int ret)
> +{
> +}
>  #endif
>  
>  static void raw_close(BlockDriverState *bs)
> @@ -602,8 +606,20 @@ static int raw_create(const char *filena
>  
>  static void raw_flush(BlockDriverState *bs)
>  {
> +#ifdef NO_AIO
>      BDRVRawState *s = bs->opaque;
>      fsync(s->fd);
> +#else
> +    RawAIOCB *acb;
> +
> +    acb = raw_aio_setup(bs, 0, NULL, 0, aio_fsync_cb, NULL);
> +    if (!acb)
> +        return;
> +    if (aio_fsync(O_SYNC, &acb->aiocb) < 0) {
> +        qemu_aio_release(acb);
> +        return;
> +    }
> +#endif
>  }

That's not correct: callers of bdrv_flush() assume that when it returns,
data _is_ on the disk.  We need to change that assumption, so that
your code becomes correct (and reports asynchronous completion from
aio_fsync_cb).

Samuel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Windows Bug Check 0x101 issue
  2008-03-24  9:53 Windows Bug Check 0x101 issue Kouya Shimura
  2008-03-24 11:01 ` Samuel Thibault
@ 2008-03-24 11:05 ` Keir Fraser
  2008-03-25 10:54   ` Kouya Shimura
  1 sibling, 1 reply; 22+ messages in thread
From: Keir Fraser @ 2008-03-24 11:05 UTC (permalink / raw)
  To: Kouya Shimura, xen-devel

On 24/3/08 09:53, "Kouya Shimura" <kouya@jp.fujitsu.com> wrote:

> An attached patch fixes it. However I think the root cause is
> that a timer event can't interrupt an i/o emulation.
> How should we fix it?

The attached patch also makes the fsync asynchronous, which seems a bit
dangerous if we acknowledge the flush to the guest meanwhile. If whatever
happens on return from raw_flush() could be deferred to aio_fsync_cb() then
this might be a good fix.

In general timer events *cannot* interrupt i/o emulation, as we cannot
really take an interrupt in the middle of executing an instruction. That
would be a mess!

However in this case I guess the IDE/SCSI device model could allow
asynchronous reporting of flush completion to the guest, and this new aspect
of the device model would obviously tie into your patch, being done on
aio_fsync_cb().

 -- Keir

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Windows Bug Check 0x101 issue
  2008-03-24 11:05 ` Keir Fraser
@ 2008-03-25 10:54   ` Kouya Shimura
  2008-03-25 10:59     ` Keir Fraser
  2008-03-25 11:28     ` Ian Jackson
  0 siblings, 2 replies; 22+ messages in thread
From: Kouya Shimura @ 2008-03-25 10:54 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel

[-- Attachment #1: message body text --]
[-- Type: text/plain, Size: 496 bytes --]

Keir Fraser writes:
> However in this case I guess the IDE/SCSI device model could allow
> asynchronous reporting of flush completion to the guest, and this new aspect
> of the device model would obviously tie into your patch, being done on
> aio_fsync_cb().

Here is a revised patch. Is this a good fix?

I'm not an expert on the SCSI. So please excuse
the SCSI side is not implemented yet.
(I tried but it doesn't work well)

Thanks,
Kouya

Signed-off-by: Kouya Shimura <kouya@jp.fujitsu.com>


[-- Attachment #2: aio_fsync.patch --]
[-- Type: text/plain, Size: 2292 bytes --]

diff -r 76c9cf11ce23 tools/ioemu/block-raw.c
--- a/tools/ioemu/block-raw.c	Fri Mar 21 09:45:34 2008 +0000
+++ b/tools/ioemu/block-raw.c	Tue Mar 25 19:40:42 2008 +0900
@@ -606,6 +606,18 @@ static void raw_flush(BlockDriverState *
     fsync(s->fd);
 }
 
+void bdrv_aio_flush(BlockDriverState *bs, 
+                    BlockDriverCompletionFunc *cb, void *opaque)
+{
+    RawAIOCB *acb;
+
+    acb = raw_aio_setup(bs, 0, NULL, 0, cb, opaque);
+    if (!acb)
+        return;
+    if (aio_fsync(O_SYNC, &acb->aiocb) < 0)
+        qemu_aio_release(acb);
+}
+
 BlockDriver bdrv_raw = {
     "raw",
     sizeof(BDRVRawState),
diff -r 76c9cf11ce23 tools/ioemu/hw/ide.c
--- a/tools/ioemu/hw/ide.c	Fri Mar 21 09:45:34 2008 +0000
+++ b/tools/ioemu/hw/ide.c	Tue Mar 25 19:40:42 2008 +0900
@@ -1738,6 +1738,13 @@ static void ide_clear_hob(IDEState *ide_
     ide_if[1].select &= ~(1 << 7);
 }
 
+static void ide_flush_cb(void *opaque, int ret)
+{
+    IDEState *s = (IDEState *)opaque;
+    s->status = READY_STAT;
+    ide_set_irq(s);
+}
+
 static void ide_ioport_write(void *opaque, uint32_t addr, uint32_t val)
 {
     IDEState *ide_if = opaque;
@@ -1976,10 +1983,13 @@ static void ide_ioport_write(void *opaqu
             break;
         case WIN_FLUSH_CACHE:
         case WIN_FLUSH_CACHE_EXT:
-            if (s->bs)
-                bdrv_flush(s->bs);
-	    s->status = READY_STAT;
-            ide_set_irq(s);
+            if (s->bs) {
+                bdrv_aio_flush(s->bs, ide_flush_cb, s);
+                s->status = BUSY_STAT;
+            } else {
+                s->status = READY_STAT;
+                ide_set_irq(s);
+            }
             break;
         case WIN_IDLEIMMEDIATE:
         case WIN_STANDBY:
diff -r 76c9cf11ce23 tools/ioemu/vl.h
--- a/tools/ioemu/vl.h	Fri Mar 21 09:45:34 2008 +0000
+++ b/tools/ioemu/vl.h	Tue Mar 25 19:40:42 2008 +0900
@@ -653,6 +653,8 @@ BlockDriverAIOCB *bdrv_aio_write(BlockDr
                                  const uint8_t *buf, int nb_sectors,
                                  BlockDriverCompletionFunc *cb, void *opaque);
 void bdrv_aio_cancel(BlockDriverAIOCB *acb);
+void bdrv_aio_flush(BlockDriverState *bs, 
+                    BlockDriverCompletionFunc *cb, void *opaque);
 
 void qemu_aio_init(void);
 void qemu_aio_poll(void);

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Windows Bug Check 0x101 issue
  2008-03-25 10:54   ` Kouya Shimura
@ 2008-03-25 10:59     ` Keir Fraser
  2008-03-25 11:28     ` Ian Jackson
  1 sibling, 0 replies; 22+ messages in thread
From: Keir Fraser @ 2008-03-25 10:59 UTC (permalink / raw)
  To: Kouya Shimura, Samuel Thibault; +Cc: xen-devel

Certainly it looks better. What do you think, Samuel?

 -- Keir

On 25/3/08 10:54, "Kouya Shimura" <kouya@jp.fujitsu.com> wrote:

> Keir Fraser writes:
>> However in this case I guess the IDE/SCSI device model could allow
>> asynchronous reporting of flush completion to the guest, and this new aspect
>> of the device model would obviously tie into your patch, being done on
>> aio_fsync_cb().
> 
> Here is a revised patch. Is this a good fix?
> 
> I'm not an expert on the SCSI. So please excuse
> the SCSI side is not implemented yet.
> (I tried but it doesn't work well)
> 
> Thanks,
> Kouya
> 
> Signed-off-by: Kouya Shimura <kouya@jp.fujitsu.com>
> 
> diff -r 76c9cf11ce23 tools/ioemu/block-raw.c
> --- a/tools/ioemu/block-raw.c Fri Mar 21 09:45:34 2008 +0000
> +++ b/tools/ioemu/block-raw.c Tue Mar 25 19:40:42 2008 +0900
> @@ -606,6 +606,18 @@ static void raw_flush(BlockDriverState *
>      fsync(s->fd);
>  }
>  
> +void bdrv_aio_flush(BlockDriverState *bs,
> +                    BlockDriverCompletionFunc *cb, void *opaque)
> +{
> +    RawAIOCB *acb;
> +
> +    acb = raw_aio_setup(bs, 0, NULL, 0, cb, opaque);
> +    if (!acb)
> +        return;
> +    if (aio_fsync(O_SYNC, &acb->aiocb) < 0)
> +        qemu_aio_release(acb);
> +}
> +
>  BlockDriver bdrv_raw = {
>      "raw",
>      sizeof(BDRVRawState),
> diff -r 76c9cf11ce23 tools/ioemu/hw/ide.c
> --- a/tools/ioemu/hw/ide.c Fri Mar 21 09:45:34 2008 +0000
> +++ b/tools/ioemu/hw/ide.c Tue Mar 25 19:40:42 2008 +0900
> @@ -1738,6 +1738,13 @@ static void ide_clear_hob(IDEState *ide_
>      ide_if[1].select &= ~(1 << 7);
>  }
>  
> +static void ide_flush_cb(void *opaque, int ret)
> +{
> +    IDEState *s = (IDEState *)opaque;
> +    s->status = READY_STAT;
> +    ide_set_irq(s);
> +}
> +
>  static void ide_ioport_write(void *opaque, uint32_t addr, uint32_t val)
>  {
>      IDEState *ide_if = opaque;
> @@ -1976,10 +1983,13 @@ static void ide_ioport_write(void *opaqu
>              break;
>          case WIN_FLUSH_CACHE:
>          case WIN_FLUSH_CACHE_EXT:
> -            if (s->bs)
> -                bdrv_flush(s->bs);
> -     s->status = READY_STAT;
> -            ide_set_irq(s);
> +            if (s->bs) {
> +                bdrv_aio_flush(s->bs, ide_flush_cb, s);
> +                s->status = BUSY_STAT;
> +            } else {
> +                s->status = READY_STAT;
> +                ide_set_irq(s);
> +            }
>              break;
>          case WIN_IDLEIMMEDIATE:
>          case WIN_STANDBY:
> diff -r 76c9cf11ce23 tools/ioemu/vl.h
> --- a/tools/ioemu/vl.h Fri Mar 21 09:45:34 2008 +0000
> +++ b/tools/ioemu/vl.h Tue Mar 25 19:40:42 2008 +0900
> @@ -653,6 +653,8 @@ BlockDriverAIOCB *bdrv_aio_write(BlockDr
>                                   const uint8_t *buf, int nb_sectors,
>                                   BlockDriverCompletionFunc *cb, void
> *opaque);
>  void bdrv_aio_cancel(BlockDriverAIOCB *acb);
> +void bdrv_aio_flush(BlockDriverState *bs,
> +                    BlockDriverCompletionFunc *cb, void *opaque);
>  
>  void qemu_aio_init(void);
>  void qemu_aio_poll(void);

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Windows Bug Check 0x101 issue
  2008-03-25 10:54   ` Kouya Shimura
  2008-03-25 10:59     ` Keir Fraser
@ 2008-03-25 11:28     ` Ian Jackson
  2008-03-25 17:57       ` Samuel Thibault
  1 sibling, 1 reply; 22+ messages in thread
From: Ian Jackson @ 2008-03-25 11:28 UTC (permalink / raw)
  To: Kouya Shimura; +Cc: xen-devel

Kouya Shimura writes ("Re: [Xen-devel] Windows Bug Check 0x101 issue"):
> Here is a revised patch. Is this a good fix?

Sadly not.

There are two problems:

One is that the qemu block driver abstraction does not support
asynchronous flush.  To fix this problem, it is necessary to add a new
entrypoint for block drivers (and a new fixup wrapper in block.c to
deal with drivers which only support the synch entrypoint).

The other is that the IDE flush necessarily blocks.  The IDE command
being used here is an IO write which instructs the disk to flush and
which the guest operating system expects to complete only when the
flush is done.  Windows is evidently expecting to successfully receive
a timer interrupt on another CPU while the first one is blocked on the
IO operation - but as I understand it our emulation and threading
model doesn't permit this.

As a workaround, perhaps we should provide a way for the Xen
administrator to turn this operation into a nonblocking lazy flush
request ?

> +void bdrv_aio_flush(BlockDriverState *bs, 
> +                    BlockDriverCompletionFunc *cb, void *opaque)
> +{
> +    RawAIOCB *acb;
> +
> +    acb = raw_aio_setup(bs, 0, NULL, 0, cb, opaque);
> +    if (!acb)
> +        return;
> +    if (aio_fsync(O_SYNC, &acb->aiocb) < 0)
> +        qemu_aio_release(acb);
> +}

This is quite wrong, I'm afraid.  You absolutely must not call through
to raw_aio_* functions from block.c in this way.  This will break for
all block backends other than raw.

Ian.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Windows Bug Check 0x101 issue
  2008-03-25 11:28     ` Ian Jackson
@ 2008-03-25 17:57       ` Samuel Thibault
  2008-03-26  7:07         ` Kouya Shimura
  0 siblings, 1 reply; 22+ messages in thread
From: Samuel Thibault @ 2008-03-25 17:57 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Kouya Shimura

Ian Jackson, le Tue 25 Mar 2008 11:28:32 +0000, a écrit :
> The other is that the IDE flush necessarily blocks.

What do you mean by that?  In a real machine, the processor doesn't
block while the flush is being done, and the OS just expects to see an
irq some time after.  In that regard his patch should work fine.

That said it can't be applied as is because of the other points you
raised, of course.

Samuel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Windows Bug Check 0x101 issue
  2008-03-25 17:57       ` Samuel Thibault
@ 2008-03-26  7:07         ` Kouya Shimura
  2008-03-26 10:13           ` Samuel Thibault
  2008-03-26 10:29           ` Ian Jackson
  0 siblings, 2 replies; 22+ messages in thread
From: Kouya Shimura @ 2008-03-26  7:07 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, Keir Fraser, Samuel Thibault

[-- Attachment #1: message body text --]
[-- Type: text/plain, Size: 689 bytes --]

Samuel Thibault writes:
> Ian Jackson, le Tue 25 Mar 2008 11:28:32 +0000, a écrit :
> > The other is that the IDE flush necessarily blocks.
> 
> What do you mean by that?  In a real machine, the processor doesn't
> block while the flush is being done, and the OS just expects to see an
> irq some time after.  In that regard his patch should work fine.
> 
> That said it can't be applied as is because of the other points you
> raised, of course.

Anyway, I remade the patch as you point out. Is it enough?

To be honest, I'm skeptical about the necessity of the flush
for a *emulated* IDE disk but shared SCSI disk.

Signed-off-by: Kouya Shimura <kouya@jp.fujitsu.com>


[-- Attachment #2: qemu_aio_fsync.patch --]
[-- Type: text/plain, Size: 5914 bytes --]

diff -r e768be7bf561 tools/ioemu/block-qcow.c
--- a/tools/ioemu/block-qcow.c	Thu Mar 20 14:29:09 2008 -0600
+++ b/tools/ioemu/block-qcow.c	Wed Mar 26 14:53:46 2008 +0900
@@ -725,6 +725,13 @@ static void qcow_aio_cancel(BlockDriverA
     qemu_aio_release(acb);
 }
 
+static BlockDriverAIOCB *qcow_aio_flush(BlockDriverState *bs,
+        BlockDriverCompletionFunc *cb, void *opaque)
+{
+    BDRVQcowState *s = bs->opaque;
+    return bdrv_aio_flush(s->hd, cb, opaque);
+}
+
 static void qcow_close(BlockDriverState *bs)
 {
     BDRVQcowState *s = bs->opaque;
@@ -899,6 +906,7 @@ BlockDriver bdrv_qcow = {
     .bdrv_aio_read = qcow_aio_read,
     .bdrv_aio_write = qcow_aio_write,
     .bdrv_aio_cancel = qcow_aio_cancel,
+    .bdrv_aio_flush = qcow_aio_flush,
     .aiocb_size = sizeof(QCowAIOCB),
     .bdrv_write_compressed = qcow_write_compressed,
     .bdrv_get_info = qcow_get_info,
diff -r e768be7bf561 tools/ioemu/block-qcow2.c
--- a/tools/ioemu/block-qcow2.c	Thu Mar 20 14:29:09 2008 -0600
+++ b/tools/ioemu/block-qcow2.c	Wed Mar 26 14:52:15 2008 +0900
@@ -1005,6 +1005,13 @@ static void qcow_aio_cancel(BlockDriverA
     if (acb->hd_aiocb)
         bdrv_aio_cancel(acb->hd_aiocb);
     qemu_aio_release(acb);
+}
+
+static BlockDriverAIOCB *qcow_aio_flush(BlockDriverState *bs,
+        BlockDriverCompletionFunc *cb, void *opaque)
+{
+    BDRVQcowState *s = bs->opaque;
+    return bdrv_aio_flush(s->hd, cb, opaque);
 }
 
 static void qcow_close(BlockDriverState *bs)
@@ -2235,6 +2242,7 @@ BlockDriver bdrv_qcow2 = {
     .bdrv_aio_read = qcow_aio_read,
     .bdrv_aio_write = qcow_aio_write,
     .bdrv_aio_cancel = qcow_aio_cancel,
+    .bdrv_aio_flush = qcow_aio_flush,
     .aiocb_size = sizeof(QCowAIOCB),
     .bdrv_write_compressed = qcow_write_compressed,
 
diff -r e768be7bf561 tools/ioemu/block-raw.c
--- a/tools/ioemu/block-raw.c	Thu Mar 20 14:29:09 2008 -0600
+++ b/tools/ioemu/block-raw.c	Wed Mar 26 13:53:01 2008 +0900
@@ -496,6 +496,21 @@ static void raw_aio_cancel(BlockDriverAI
         pacb = &acb->next;
     }
 }
+
+static BlockDriverAIOCB *raw_aio_flush(BlockDriverState *bs,
+        BlockDriverCompletionFunc *cb, void *opaque)
+{
+    RawAIOCB *acb;
+
+    acb = raw_aio_setup(bs, 0, NULL, 0, cb, opaque);
+    if (!acb)
+        return NULL;
+    if (aio_fsync(O_SYNC, &acb->aiocb) < 0) {
+        qemu_aio_release(acb);
+        return NULL;
+    }
+    return &acb->common;
+}
 #endif
 
 static void raw_close(BlockDriverState *bs)
@@ -621,6 +636,7 @@ BlockDriver bdrv_raw = {
     .bdrv_aio_read = raw_aio_read,
     .bdrv_aio_write = raw_aio_write,
     .bdrv_aio_cancel = raw_aio_cancel,
+    .bdrv_aio_flush = raw_aio_flush,
     .aiocb_size = sizeof(RawAIOCB),
 #endif
     .protocol_name = "file",
@@ -959,6 +975,7 @@ BlockDriver bdrv_host_device = {
     .bdrv_aio_read = raw_aio_read,
     .bdrv_aio_write = raw_aio_write,
     .bdrv_aio_cancel = raw_aio_cancel,
+    .bdrv_aio_flush = raw_aio_flush,
     .aiocb_size = sizeof(RawAIOCB),
 #endif
     .bdrv_pread = raw_pread,
diff -r e768be7bf561 tools/ioemu/block.c
--- a/tools/ioemu/block.c	Thu Mar 20 14:29:09 2008 -0600
+++ b/tools/ioemu/block.c	Wed Mar 26 14:39:15 2008 +0900
@@ -1138,6 +1138,19 @@ void bdrv_aio_cancel(BlockDriverAIOCB *a
     drv->bdrv_aio_cancel(acb);
 }
 
+BlockDriverAIOCB *bdrv_aio_flush(BlockDriverState *bs, 
+                                 BlockDriverCompletionFunc *cb, void *opaque)
+{
+    BlockDriver *drv = bs->drv;
+
+    if (!drv)
+        return NULL;
+    if (!drv->bdrv_aio_flush)
+        return NULL;
+
+    return drv->bdrv_aio_flush(bs, cb, opaque);
+}
+
 
 /**************************************************************/
 /* async block device emulation */
diff -r e768be7bf561 tools/ioemu/block_int.h
--- a/tools/ioemu/block_int.h	Thu Mar 20 14:29:09 2008 -0600
+++ b/tools/ioemu/block_int.h	Wed Mar 26 13:53:01 2008 +0900
@@ -49,6 +49,8 @@ struct BlockDriver {
         int64_t sector_num, const uint8_t *buf, int nb_sectors,
         BlockDriverCompletionFunc *cb, void *opaque);
     void (*bdrv_aio_cancel)(BlockDriverAIOCB *acb);
+    BlockDriverAIOCB *(*bdrv_aio_flush)(BlockDriverState *bs,
+        BlockDriverCompletionFunc *cb, void *opaque);
     int aiocb_size;
 
     const char *protocol_name;
diff -r e768be7bf561 tools/ioemu/hw/ide.c
--- a/tools/ioemu/hw/ide.c	Thu Mar 20 14:29:09 2008 -0600
+++ b/tools/ioemu/hw/ide.c	Wed Mar 26 14:36:24 2008 +0900
@@ -1070,6 +1070,14 @@ static void ide_sector_write_dma(IDEStat
     s->io_buffer_index = 0;
     s->io_buffer_size = 0;
     ide_dma_start(s, ide_write_dma_cb);
+}
+
+static void ide_flush_cb(void *opaque, int ret)
+{
+    IDEState *s = opaque;
+
+    s->status = READY_STAT;
+    ide_set_irq(s);
 }
 
 static void ide_atapi_cmd_ok(IDEState *s)
@@ -1976,9 +1984,14 @@ static void ide_ioport_write(void *opaqu
             break;
         case WIN_FLUSH_CACHE:
         case WIN_FLUSH_CACHE_EXT:
-            if (s->bs)
+            if (s->bs) {
+                if (bdrv_aio_flush(s->bs, ide_flush_cb, s) != NULL) {
+                    s->status = BUSY_STAT;
+                    break;
+                }
                 bdrv_flush(s->bs);
-	    s->status = READY_STAT;
+            }
+            s->status = READY_STAT;
             ide_set_irq(s);
             break;
         case WIN_IDLEIMMEDIATE:
diff -r e768be7bf561 tools/ioemu/vl.h
--- a/tools/ioemu/vl.h	Thu Mar 20 14:29:09 2008 -0600
+++ b/tools/ioemu/vl.h	Wed Mar 26 13:53:01 2008 +0900
@@ -653,6 +653,8 @@ BlockDriverAIOCB *bdrv_aio_write(BlockDr
                                  const uint8_t *buf, int nb_sectors,
                                  BlockDriverCompletionFunc *cb, void *opaque);
 void bdrv_aio_cancel(BlockDriverAIOCB *acb);
+BlockDriverAIOCB *bdrv_aio_flush(BlockDriverState *bs, 
+                                 BlockDriverCompletionFunc *cb, void *opaque);
 
 void qemu_aio_init(void);
 void qemu_aio_poll(void);

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Windows Bug Check 0x101 issue
  2008-03-26  7:07         ` Kouya Shimura
@ 2008-03-26 10:13           ` Samuel Thibault
  2008-03-26 10:29           ` Ian Jackson
  1 sibling, 0 replies; 22+ messages in thread
From: Samuel Thibault @ 2008-03-26 10:13 UTC (permalink / raw)
  To: Kouya Shimura; +Cc: xen-devel, Ian Jackson, Keir Fraser

Kouya Shimura, le Wed 26 Mar 2008 16:07:26 +0900, a écrit :
> To be honest, I'm skeptical about the necessity of the flush
> for a *emulated* IDE disk but shared SCSI disk.

Emulation doesn't mean that we can afford losing the data. If the guest
wants to have data flushed, it is because it needs the data to get to
the platters, so we need to enforce that.

Samuel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Windows Bug Check 0x101 issue
  2008-03-26 10:29           ` Ian Jackson
@ 2008-03-26 10:27             ` Alan Cox
  2008-03-26 10:51               ` Ian Jackson
  2008-03-26 10:39             ` Ian Jackson
  2008-03-27  5:20             ` Kouya Shimura
  2 siblings, 1 reply; 22+ messages in thread
From: Alan Cox @ 2008-03-26 10:27 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Keir Fraser, Kouya Shimura, Samuel Thibault

> > +static void ide_flush_cb(void *opaque, int ret)
> > +{
> > +    IDEState *s = opaque;
> > +
> > +    s->status = READY_STAT;
> > +    ide_set_irq(s);
> 
> You need to check the return value (ret) and set an appropriate IDE
> error status if the operation failed.  ide_abort_command may be of
> some use.

Flush errors are a bit hard to emulate. On a flush failing you have to
provide back an error state indicating the LBA block number (LBA28 or
LBA48 according to which flush command) of the specific failing block.
Calling the cache flush again restarts the flushing process having
dropped that block.

Alan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Windows Bug Check 0x101 issue
  2008-03-26  7:07         ` Kouya Shimura
  2008-03-26 10:13           ` Samuel Thibault
@ 2008-03-26 10:29           ` Ian Jackson
  2008-03-26 10:27             ` Alan Cox
                               ` (2 more replies)
  1 sibling, 3 replies; 22+ messages in thread
From: Ian Jackson @ 2008-03-26 10:29 UTC (permalink / raw)
  To: Kouya Shimura; +Cc: xen-devel, Keir Fraser, Samuel Thibault

Kouya Shimura writes ("Re: [Xen-devel] Windows Bug Check 0x101 issue"):
> Samuel Thibault writes:
> > Ian Jackson, le Tue 25 Mar 2008 11:28:32 +0000, a écrit :
> > > The other is that the IDE flush necessarily blocks.
> > 
> > What do you mean by that?  In a real machine, the processor doesn't
> > block while the flush is being done, and the OS just expects to see an
> > irq some time after.  In that regard his patch should work fine.

I must have misread the IDE spec I was reading.

> > That said it can't be applied as is because of the other points you
> > raised, of course.
> 
> Anyway, I remade the patch as you point out. Is it enough?

This one is much better but I still have a comment ...

> To be honest, I'm skeptical about the necessity of the flush
> for a *emulated* IDE disk but shared SCSI disk.

I'm not sure I follow.  (What do you mean by `shared SCSI disk'?
We're discussing the IDE driver here ...)

> +BlockDriverAIOCB *bdrv_aio_flush(BlockDriverState *bs, 
> +                                 BlockDriverCompletionFunc *cb, void *opaque)
> +{
> +    BlockDriver *drv = bs->drv;
> +
> +    if (!drv)
> +        return NULL;
> +    if (!drv->bdrv_aio_flush)
> +        return NULL;
> +
> +    return drv->bdrv_aio_flush(bs, cb, opaque);
> +}

This function should emulate the asynchronous operation with the
synchronous one: so it should call the synchronous ->bdrv_flush and
then call the callback.

That will make it work like bdrv_aio_{read,write}.  (Although there is
I think no need to replicate the baroque bdrv_aio_{read,write}_em
emulation methods; that's an overly-complex way to do things.)

If you do that then callers who get NULL from bdrv_aio_flush will know
that it is actually an error.  Your current code, as well as needing
the non-aio emulation at each call site, will try the fallback after
errors as well as lack of aio support.

> +static void ide_flush_cb(void *opaque, int ret)
> +{
> +    IDEState *s = opaque;
> +
> +    s->status = READY_STAT;
> +    ide_set_irq(s);

You need to check the return value (ret) and set an appropriate IDE
error status if the operation failed.  ide_abort_command may be of
some use.

Ian.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Windows Bug Check 0x101 issue
  2008-03-26 10:29           ` Ian Jackson
  2008-03-26 10:27             ` Alan Cox
@ 2008-03-26 10:39             ` Ian Jackson
  2008-03-27  5:20             ` Kouya Shimura
  2 siblings, 0 replies; 22+ messages in thread
From: Ian Jackson @ 2008-03-26 10:39 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Keir Fraser, Kouya Shimura, Samuel Thibault

Ian Jackson writes ("Re: [Xen-devel] Windows Bug Check 0x101 issue"):
> If you do that then callers who get NULL from bdrv_aio_flush will know
> that it is actually an error.

Sorry, rereading the code I see that that's not quite right.
bdrv_aio_* routines return NULL to indicate that the callback has
already been done; any error is given to the callback to handle.
That's what this one should do; that part code should look quite like
the innards of bdrv_aio_read_em.

Ian.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Windows Bug Check 0x101 issue
  2008-03-26 10:27             ` Alan Cox
@ 2008-03-26 10:51               ` Ian Jackson
  2008-03-26 12:34                 ` Alan Cox
  0 siblings, 1 reply; 22+ messages in thread
From: Ian Jackson @ 2008-03-26 10:51 UTC (permalink / raw)
  To: Alan Cox; +Cc: xen-devel, Keir Fraser, Kouya Shimura, Samuel Thibault

Alan Cox writes ("Re: [Xen-devel] Windows Bug Check 0x101 issue"):
> Flush errors are a bit hard to emulate. On a flush failing you have to
> provide back an error state indicating the LBA block number (LBA28 or
> LBA48 according to which flush command) of the specific failing block.
> Calling the cache flush again restarts the flushing process having
> dropped that block.

Hrm.  I think it would be better to return an endless sequence of
errors than to return no error at all.  Since we don't know which
block failed, we could report an error flushing block 0, and then an
error flushing block 1, and so on ?

Ian.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Windows Bug Check 0x101 issue
  2008-03-26 10:51               ` Ian Jackson
@ 2008-03-26 12:34                 ` Alan Cox
  2008-03-26 14:18                   ` Ian Jackson
  0 siblings, 1 reply; 22+ messages in thread
From: Alan Cox @ 2008-03-26 12:34 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Keir Fraser, Kouya Shimura, Samuel Thibault

> Hrm.  I think it would be better to return an endless sequence of
> errors than to return no error at all.  Since we don't know which
> block failed, we could report an error flushing block 0, and then an
> error flushing block 1, and so on ?

On a raid volume that will give wrong results and incorrect recovery
behaviour. You need to report correct information. In addition a lot of
software treats repeated errors on flush as a device failure after a
certain count.

If a write to the underlying emulated media fails you know at least which
write fails and usually the page that failed (as you get a short write)

Alan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Windows Bug Check 0x101 issue
  2008-03-26 12:34                 ` Alan Cox
@ 2008-03-26 14:18                   ` Ian Jackson
  0 siblings, 0 replies; 22+ messages in thread
From: Ian Jackson @ 2008-03-26 14:18 UTC (permalink / raw)
  To: Alan Cox; +Cc: xen-devel, Keir Fraser, Kouya Shimura, Samuel Thibault

Alan Cox writes ("Re: [Xen-devel] Windows Bug Check 0x101 issue"):
> On a raid volume that will give wrong results and incorrect recovery
> behaviour. You need to report correct information. In addition a lot of
> software treats repeated errors on flush as a device failure after a
> certain count.
> 
> If a write to the underlying emulated media fails you know at least which
> write fails and usually the page that failed (as you get a short write)

Well, no, because in this case we're calling f(data)sync.  So we get
no information from the dom0 kernel about what went wrong.

You're right about _writes_ rather than flush requests, if the drive
write cache is turned off.  I think that reporting errors in those
cases is already done correctly now, including details of the block
offsets, but I haven't double-checked.

Ian.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Windows Bug Check 0x101 issue
  2008-03-26 10:29           ` Ian Jackson
  2008-03-26 10:27             ` Alan Cox
  2008-03-26 10:39             ` Ian Jackson
@ 2008-03-27  5:20             ` Kouya Shimura
  2008-03-27  9:08               ` Alan Cox
  2 siblings, 1 reply; 22+ messages in thread
From: Kouya Shimura @ 2008-03-27  5:20 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Keir Fraser, Samuel Thibault

[-- Attachment #1: message body text --]
[-- Type: text/plain, Size: 687 bytes --]

Ian Jackson writes:
> This one is much better but I still have a comment ...

Thank you for many comments. Actually I'm not good at qemu.
How about the attached patch?

> > +static void ide_flush_cb(void *opaque, int ret)
> > +{
> > +    IDEState *s = opaque;
> > +
> > +    s->status = READY_STAT;
> > +    ide_set_irq(s);
> 
> You need to check the return value (ret) and set an appropriate IDE
> error status if the operation failed.  ide_abort_command may be of
> some use.

This patch uses ide_abort_command(). 
As for the failure case, it looks too much for me.
I hope someone who is expert on the IDE fixes it.

Thanks,
Kouya

Signed-off-by: Kouya Shimura <kouya@jp.fujitsu.com>


[-- Attachment #2: qemu_aio_fsync2.patch --]
[-- Type: text/plain, Size: 7063 bytes --]

diff -r 966c04d42e94 tools/ioemu/block-qcow.c
--- a/tools/ioemu/block-qcow.c	Wed Mar 26 09:12:57 2008 +0000
+++ b/tools/ioemu/block-qcow.c	Thu Mar 27 13:59:13 2008 +0900
@@ -725,6 +725,13 @@ static void qcow_aio_cancel(BlockDriverA
     qemu_aio_release(acb);
 }
 
+static BlockDriverAIOCB *qcow_aio_flush(BlockDriverState *bs,
+        BlockDriverCompletionFunc *cb, void *opaque)
+{
+    BDRVQcowState *s = bs->opaque;
+    return bdrv_aio_flush(s->hd, cb, opaque);
+}
+
 static void qcow_close(BlockDriverState *bs)
 {
     BDRVQcowState *s = bs->opaque;
@@ -899,6 +906,7 @@ BlockDriver bdrv_qcow = {
     .bdrv_aio_read = qcow_aio_read,
     .bdrv_aio_write = qcow_aio_write,
     .bdrv_aio_cancel = qcow_aio_cancel,
+    .bdrv_aio_flush = qcow_aio_flush,
     .aiocb_size = sizeof(QCowAIOCB),
     .bdrv_write_compressed = qcow_write_compressed,
     .bdrv_get_info = qcow_get_info,
diff -r 966c04d42e94 tools/ioemu/block-qcow2.c
--- a/tools/ioemu/block-qcow2.c	Wed Mar 26 09:12:57 2008 +0000
+++ b/tools/ioemu/block-qcow2.c	Thu Mar 27 13:59:13 2008 +0900
@@ -1007,6 +1007,13 @@ static void qcow_aio_cancel(BlockDriverA
     qemu_aio_release(acb);
 }
 
+static BlockDriverAIOCB *qcow_aio_flush(BlockDriverState *bs,
+        BlockDriverCompletionFunc *cb, void *opaque)
+{
+    BDRVQcowState *s = bs->opaque;
+    return bdrv_aio_flush(s->hd, cb, opaque);
+}
+
 static void qcow_close(BlockDriverState *bs)
 {
     BDRVQcowState *s = bs->opaque;
@@ -2235,6 +2242,7 @@ BlockDriver bdrv_qcow2 = {
     .bdrv_aio_read = qcow_aio_read,
     .bdrv_aio_write = qcow_aio_write,
     .bdrv_aio_cancel = qcow_aio_cancel,
+    .bdrv_aio_flush = qcow_aio_flush,
     .aiocb_size = sizeof(QCowAIOCB),
     .bdrv_write_compressed = qcow_write_compressed,
 
diff -r 966c04d42e94 tools/ioemu/block-raw.c
--- a/tools/ioemu/block-raw.c	Wed Mar 26 09:12:57 2008 +0000
+++ b/tools/ioemu/block-raw.c	Thu Mar 27 13:59:13 2008 +0900
@@ -496,6 +496,21 @@ static void raw_aio_cancel(BlockDriverAI
         pacb = &acb->next;
     }
 }
+
+static BlockDriverAIOCB *raw_aio_flush(BlockDriverState *bs,
+        BlockDriverCompletionFunc *cb, void *opaque)
+{
+    RawAIOCB *acb;
+
+    acb = raw_aio_setup(bs, 0, NULL, 0, cb, opaque);
+    if (!acb)
+        return NULL;
+    if (aio_fsync(O_SYNC, &acb->aiocb) < 0) {
+        qemu_aio_release(acb);
+        return NULL;
+    }
+    return &acb->common;
+}
 #endif
 
 static void raw_close(BlockDriverState *bs)
@@ -621,6 +636,7 @@ BlockDriver bdrv_raw = {
     .bdrv_aio_read = raw_aio_read,
     .bdrv_aio_write = raw_aio_write,
     .bdrv_aio_cancel = raw_aio_cancel,
+    .bdrv_aio_flush = raw_aio_flush,
     .aiocb_size = sizeof(RawAIOCB),
 #endif
     .protocol_name = "file",
@@ -959,6 +975,7 @@ BlockDriver bdrv_host_device = {
     .bdrv_aio_read = raw_aio_read,
     .bdrv_aio_write = raw_aio_write,
     .bdrv_aio_cancel = raw_aio_cancel,
+    .bdrv_aio_flush = raw_aio_flush,
     .aiocb_size = sizeof(RawAIOCB),
 #endif
     .bdrv_pread = raw_pread,
diff -r 966c04d42e94 tools/ioemu/block.c
--- a/tools/ioemu/block.c	Wed Mar 26 09:12:57 2008 +0000
+++ b/tools/ioemu/block.c	Thu Mar 27 13:59:13 2008 +0900
@@ -48,6 +48,8 @@ static BlockDriverAIOCB *bdrv_aio_write_
         int64_t sector_num, const uint8_t *buf, int nb_sectors,
         BlockDriverCompletionFunc *cb, void *opaque);
 static void bdrv_aio_cancel_em(BlockDriverAIOCB *acb);
+static BlockDriverAIOCB *bdrv_aio_flush_em(BlockDriverState *bs,
+        BlockDriverCompletionFunc *cb, void *opaque);
 static int bdrv_read_em(BlockDriverState *bs, int64_t sector_num, 
                         uint8_t *buf, int nb_sectors);
 static int bdrv_write_em(BlockDriverState *bs, int64_t sector_num,
@@ -155,6 +157,8 @@ void bdrv_register(BlockDriver *bdrv)
         bdrv->bdrv_read = bdrv_read_em;
         bdrv->bdrv_write = bdrv_write_em;
     }
+    if (!bdrv->bdrv_aio_flush)
+        bdrv->bdrv_aio_flush = bdrv_aio_flush_em;
     bdrv->next = first_drv;
     first_drv = bdrv;
 }
@@ -1138,6 +1142,17 @@ void bdrv_aio_cancel(BlockDriverAIOCB *a
     drv->bdrv_aio_cancel(acb);
 }
 
+BlockDriverAIOCB *bdrv_aio_flush(BlockDriverState *bs, 
+                                 BlockDriverCompletionFunc *cb, void *opaque)
+{
+    BlockDriver *drv = bs->drv;
+
+    if (!drv)
+        return NULL;
+
+    return drv->bdrv_aio_flush(bs, cb, opaque);
+}
+
 
 /**************************************************************/
 /* async block device emulation */
@@ -1213,6 +1228,14 @@ static void bdrv_aio_cancel_em(BlockDriv
     qemu_aio_release(acb);
 }
 #endif /* !QEMU_TOOL */
+
+static BlockDriverAIOCB *bdrv_aio_flush_em(BlockDriverState *bs,
+        BlockDriverCompletionFunc *cb, void *opaque)
+{
+    bdrv_flush(bs);
+    cb(opaque, 0);
+    return NULL;
+}
 
 /**************************************************************/
 /* sync block device emulation */
diff -r 966c04d42e94 tools/ioemu/block_int.h
--- a/tools/ioemu/block_int.h	Wed Mar 26 09:12:57 2008 +0000
+++ b/tools/ioemu/block_int.h	Thu Mar 27 13:59:13 2008 +0900
@@ -49,6 +49,8 @@ struct BlockDriver {
         int64_t sector_num, const uint8_t *buf, int nb_sectors,
         BlockDriverCompletionFunc *cb, void *opaque);
     void (*bdrv_aio_cancel)(BlockDriverAIOCB *acb);
+    BlockDriverAIOCB *(*bdrv_aio_flush)(BlockDriverState *bs,
+        BlockDriverCompletionFunc *cb, void *opaque);
     int aiocb_size;
 
     const char *protocol_name;
diff -r 966c04d42e94 tools/ioemu/hw/ide.c
--- a/tools/ioemu/hw/ide.c	Wed Mar 26 09:12:57 2008 +0000
+++ b/tools/ioemu/hw/ide.c	Thu Mar 27 13:59:13 2008 +0900
@@ -1072,6 +1072,17 @@ static void ide_sector_write_dma(IDEStat
     ide_dma_start(s, ide_write_dma_cb);
 }
 
+static void ide_flush_cb(void *opaque, int ret)
+{
+    IDEState *s = opaque;
+
+    if (ret)
+        ide_abort_command(s);
+    else
+        s->status = READY_STAT;
+    ide_set_irq(s);
+}
+
 static void ide_atapi_cmd_ok(IDEState *s)
 {
     s->error = 0;
@@ -1976,9 +1987,12 @@ static void ide_ioport_write(void *opaqu
             break;
         case WIN_FLUSH_CACHE:
         case WIN_FLUSH_CACHE_EXT:
-            if (s->bs)
-                bdrv_flush(s->bs);
-	    s->status = READY_STAT;
+            if (s->bs) {
+                s->status = BUSY_STAT;
+                bdrv_aio_flush(s->bs, ide_flush_cb, s);
+                break;
+            }
+            s->status = READY_STAT;
             ide_set_irq(s);
             break;
         case WIN_IDLEIMMEDIATE:
diff -r 966c04d42e94 tools/ioemu/vl.h
--- a/tools/ioemu/vl.h	Wed Mar 26 09:12:57 2008 +0000
+++ b/tools/ioemu/vl.h	Thu Mar 27 13:59:13 2008 +0900
@@ -653,6 +653,8 @@ BlockDriverAIOCB *bdrv_aio_write(BlockDr
                                  const uint8_t *buf, int nb_sectors,
                                  BlockDriverCompletionFunc *cb, void *opaque);
 void bdrv_aio_cancel(BlockDriverAIOCB *acb);
+BlockDriverAIOCB *bdrv_aio_flush(BlockDriverState *bs, 
+                                 BlockDriverCompletionFunc *cb, void *opaque);
 
 void qemu_aio_init(void);
 void qemu_aio_poll(void);

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Windows Bug Check 0x101 issue
  2008-03-27  5:20             ` Kouya Shimura
@ 2008-03-27  9:08               ` Alan Cox
  2008-03-27 16:34                 ` Ian Jackson
  0 siblings, 1 reply; 22+ messages in thread
From: Alan Cox @ 2008-03-27  9:08 UTC (permalink / raw)
  To: Kouya Shimura; +Cc: xen-devel, Ian Jackson, Keir Fraser, Samuel Thibault

On Thu, 27 Mar 2008 14:20:32 +0900
Kouya Shimura <kouya@jp.fujitsu.com> wrote:

> Ian Jackson writes:
> > This one is much better but I still have a comment ...
> 
> Thank you for many comments. Actually I'm not good at qemu.
> How about the attached patch?
> 
> > > +static void ide_flush_cb(void *opaque, int ret)
> > > +{
> > > +    IDEState *s = opaque;
> > > +
> > > +    s->status = READY_STAT;
> > > +    ide_set_irq(s);
> > 
> > You need to check the return value (ret) and set an appropriate IDE
> > error status if the operation failed.  ide_abort_command may be of
> > some use.
> 
> This patch uses ide_abort_command(). 
> As for the failure case, it looks too much for me.
> I hope someone who is expert on the IDE fixes it.

If a flush fails you can't just just reply with an abrt as I said earlier,
many OS will then loop forever trying to finish the flush. flush cache
has unusual error reporting behaviour (see ATA-7). An abort means a cache
flush fail and to retry for further sectors, if you can't properly
emulate reporting block numbers back then you need to offline the virtual
device.

Alan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Windows Bug Check 0x101 issue
  2008-03-27 16:34                 ` Ian Jackson
@ 2008-03-27 16:30                   ` Alan Cox
  2008-03-27 17:39                     ` Ian Jackson
  0 siblings, 1 reply; 22+ messages in thread
From: Alan Cox @ 2008-03-27 16:30 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Keir Fraser, Kouya Shimura, Samuel Thibault

> having decided to violate it, perhaps the right answer is to make the
> disk simply vanish: ignore all future command writes, zero all of the
> registers that the host is attempting to read, not issue an interrupt,
> and somehow discard the responses from any overlapped IO which is in
> flight.

I think it is safest - and if the underlying storage fails something bad
has happened and anything else we do would make it worse
> 
> Is that what you meant by `offline the virtual device' ?

Just leave the busy bit set forever, the host will get fed up of waiting,
reset, rinse repeat a few times and (except for older Linux) then offline
the device. Older Linux (ie drivers/ide) has problems coping with failed
drives so will carry on spewing but limp along ok.

Alan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Windows Bug Check 0x101 issue
  2008-03-27  9:08               ` Alan Cox
@ 2008-03-27 16:34                 ` Ian Jackson
  2008-03-27 16:30                   ` Alan Cox
  0 siblings, 1 reply; 22+ messages in thread
From: Ian Jackson @ 2008-03-27 16:34 UTC (permalink / raw)
  To: Alan Cox; +Cc: xen-devel, Keir Fraser, Kouya Shimura, Samuel Thibault

Alan Cox writes ("Re: [Xen-devel] Windows Bug Check 0x101 issue"):
> [...]  if you can't properly
> emulate reporting block numbers back then you need to offline the virtual
> device.

I've been reading the ATA-7 spec and I'm afraid it's not clear to me
what options are open to the emulation in this situation.  Setting DF
(device fault) is still no good as the spec clearly indicates that we
should still have correct LBA offsets and retry behaviour.

I can't see an approach that isn't a violation of the ATA spec.  So
having decided to violate it, perhaps the right answer is to make the
disk simply vanish: ignore all future command writes, zero all of the
registers that the host is attempting to read, not issue an interrupt,
and somehow discard the responses from any overlapped IO which is in
flight.

Is that what you meant by `offline the virtual device' ?

Ian.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Windows Bug Check 0x101 issue
  2008-03-27 16:30                   ` Alan Cox
@ 2008-03-27 17:39                     ` Ian Jackson
  2008-03-28  3:09                       ` Kouya Shimura
       [not found]                       ` <87k5jnl7qt.fsf@basil.nowhere.org>
  0 siblings, 2 replies; 22+ messages in thread
From: Ian Jackson @ 2008-03-27 17:39 UTC (permalink / raw)
  To: Kouya Shimura, Alan Cox; +Cc: xen-devel, Keir Fraser, Samuel Thibault

[-- Attachment #1: message body text --]
[-- Type: text/plain, Size: 884 bytes --]

Alan Cox writes ("Re: [Xen-devel] Windows Bug Check 0x101 issue"):
> Just leave the busy bit set forever, the host will get fed up of waiting,
> reset, rinse repeat a few times and (except for older Linux) then offline
> the device. Older Linux (ie drivers/ide) has problems coping with failed
> drives so will carry on spewing but limp along ok.

Thanks for the advice.  I think the patch below does this.  I have
compiled it but I don't have an easy setup for testing it and there
may be some side-effects on bizarre setups with emulated ide channels
with slaves but no masters.  Comments welcome.

Before we apply this: Kouya Shimra, could you test it and let us know
if it solves your original problem ?

Thanks,
Ian.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Modified-by: Ian Jackson <ian.jackson@eu.citrix.com>
Signed-off-by: Kouya Shimura <kouya@jp.fujitsu.com>


[-- Attachment #2: make flush errors cause device to vanish nastily --]
[-- Type: text/plain, Size: 9530 bytes --]

diff -r b667e220e556 tools/ioemu/block-qcow.c
--- a/tools/ioemu/block-qcow.c	Thu Mar 27 11:39:57 2008 +0000
+++ b/tools/ioemu/block-qcow.c	Thu Mar 27 15:22:33 2008 +0000
@@ -783,6 +783,13 @@ static void qcow_aio_cancel(BlockDriverA
     qemu_aio_release(acb);
 }
 
+static BlockDriverAIOCB *qcow_aio_flush(BlockDriverState *bs,
+        BlockDriverCompletionFunc *cb, void *opaque)
+{
+    BDRVQcowState *s = bs->opaque;
+    return bdrv_aio_flush(s->hd, cb, opaque);
+}
+
 static void qcow_close(BlockDriverState *bs)
 {
     BDRVQcowState *s = bs->opaque;
@@ -957,6 +964,7 @@ BlockDriver bdrv_qcow = {
     .bdrv_aio_read = qcow_aio_read,
     .bdrv_aio_write = qcow_aio_write,
     .bdrv_aio_cancel = qcow_aio_cancel,
+    .bdrv_aio_flush = qcow_aio_flush,
     .aiocb_size = sizeof(QCowAIOCB),
     .bdrv_write_compressed = qcow_write_compressed,
     .bdrv_get_info = qcow_get_info,
diff -r b667e220e556 tools/ioemu/block-qcow2.c
--- a/tools/ioemu/block-qcow2.c	Thu Mar 27 11:39:57 2008 +0000
+++ b/tools/ioemu/block-qcow2.c	Thu Mar 27 15:22:33 2008 +0000
@@ -1005,6 +1005,13 @@ static void qcow_aio_cancel(BlockDriverA
     if (acb->hd_aiocb)
         bdrv_aio_cancel(acb->hd_aiocb);
     qemu_aio_release(acb);
+}
+
+static BlockDriverAIOCB *qcow_aio_flush(BlockDriverState *bs,
+        BlockDriverCompletionFunc *cb, void *opaque)
+{
+    BDRVQcowState *s = bs->opaque;
+    return bdrv_aio_flush(s->hd, cb, opaque);
 }
 
 static void qcow_close(BlockDriverState *bs)
@@ -2235,6 +2242,7 @@ BlockDriver bdrv_qcow2 = {
     .bdrv_aio_read = qcow_aio_read,
     .bdrv_aio_write = qcow_aio_write,
     .bdrv_aio_cancel = qcow_aio_cancel,
+    .bdrv_aio_flush = qcow_aio_flush,
     .aiocb_size = sizeof(QCowAIOCB),
     .bdrv_write_compressed = qcow_write_compressed,
 
diff -r b667e220e556 tools/ioemu/block-raw.c
--- a/tools/ioemu/block-raw.c	Thu Mar 27 11:39:57 2008 +0000
+++ b/tools/ioemu/block-raw.c	Thu Mar 27 15:22:33 2008 +0000
@@ -496,6 +496,21 @@ static void raw_aio_cancel(BlockDriverAI
         pacb = &acb->next;
     }
 }
+
+static BlockDriverAIOCB *raw_aio_flush(BlockDriverState *bs,
+        BlockDriverCompletionFunc *cb, void *opaque)
+{
+    RawAIOCB *acb;
+
+    acb = raw_aio_setup(bs, 0, NULL, 0, cb, opaque);
+    if (!acb)
+        return NULL;
+    if (aio_fsync(O_SYNC, &acb->aiocb) < 0) {
+        qemu_aio_release(acb);
+        return NULL;
+    }
+    return &acb->common;
+}
 #endif
 
 static void raw_close(BlockDriverState *bs)
@@ -621,6 +636,7 @@ BlockDriver bdrv_raw = {
     .bdrv_aio_read = raw_aio_read,
     .bdrv_aio_write = raw_aio_write,
     .bdrv_aio_cancel = raw_aio_cancel,
+    .bdrv_aio_flush = raw_aio_flush,
     .aiocb_size = sizeof(RawAIOCB),
 #endif
     .protocol_name = "file",
@@ -959,6 +975,7 @@ BlockDriver bdrv_host_device = {
     .bdrv_aio_read = raw_aio_read,
     .bdrv_aio_write = raw_aio_write,
     .bdrv_aio_cancel = raw_aio_cancel,
+    .bdrv_aio_flush = raw_aio_flush,
     .aiocb_size = sizeof(RawAIOCB),
 #endif
     .bdrv_pread = raw_pread,
diff -r b667e220e556 tools/ioemu/block.c
--- a/tools/ioemu/block.c	Thu Mar 27 11:39:57 2008 +0000
+++ b/tools/ioemu/block.c	Thu Mar 27 15:22:33 2008 +0000
@@ -48,6 +48,8 @@ static BlockDriverAIOCB *bdrv_aio_write_
         int64_t sector_num, const uint8_t *buf, int nb_sectors,
         BlockDriverCompletionFunc *cb, void *opaque);
 static void bdrv_aio_cancel_em(BlockDriverAIOCB *acb);
+static BlockDriverAIOCB *bdrv_aio_flush_em(BlockDriverState *bs,
+        BlockDriverCompletionFunc *cb, void *opaque);
 static int bdrv_read_em(BlockDriverState *bs, int64_t sector_num, 
                         uint8_t *buf, int nb_sectors);
 static int bdrv_write_em(BlockDriverState *bs, int64_t sector_num,
@@ -155,6 +157,8 @@ void bdrv_register(BlockDriver *bdrv)
         bdrv->bdrv_read = bdrv_read_em;
         bdrv->bdrv_write = bdrv_write_em;
     }
+    if (!bdrv->bdrv_aio_flush)
+        bdrv->bdrv_aio_flush = bdrv_aio_flush_em;
     bdrv->next = first_drv;
     first_drv = bdrv;
 }
@@ -1138,6 +1142,17 @@ void bdrv_aio_cancel(BlockDriverAIOCB *a
     drv->bdrv_aio_cancel(acb);
 }
 
+BlockDriverAIOCB *bdrv_aio_flush(BlockDriverState *bs, 
+                                 BlockDriverCompletionFunc *cb, void *opaque)
+{
+    BlockDriver *drv = bs->drv;
+
+    if (!drv)
+        return NULL;
+
+    return drv->bdrv_aio_flush(bs, cb, opaque);
+}
+
 
 /**************************************************************/
 /* async block device emulation */
@@ -1213,6 +1228,14 @@ static void bdrv_aio_cancel_em(BlockDriv
     qemu_aio_release(acb);
 }
 #endif /* !QEMU_TOOL */
+
+static BlockDriverAIOCB *bdrv_aio_flush_em(BlockDriverState *bs,
+        BlockDriverCompletionFunc *cb, void *opaque)
+{
+    bdrv_flush(bs);
+    cb(opaque, 0);
+    return NULL;
+}
 
 /**************************************************************/
 /* sync block device emulation */
diff -r b667e220e556 tools/ioemu/block_int.h
--- a/tools/ioemu/block_int.h	Thu Mar 27 11:39:57 2008 +0000
+++ b/tools/ioemu/block_int.h	Thu Mar 27 15:22:33 2008 +0000
@@ -49,6 +49,8 @@ struct BlockDriver {
         int64_t sector_num, const uint8_t *buf, int nb_sectors,
         BlockDriverCompletionFunc *cb, void *opaque);
     void (*bdrv_aio_cancel)(BlockDriverAIOCB *acb);
+    BlockDriverAIOCB *(*bdrv_aio_flush)(BlockDriverState *bs,
+        BlockDriverCompletionFunc *cb, void *opaque);
     int aiocb_size;
 
     const char *protocol_name;
diff -r b667e220e556 tools/ioemu/hw/ide.c
--- a/tools/ioemu/hw/ide.c	Thu Mar 27 11:39:57 2008 +0000
+++ b/tools/ioemu/hw/ide.c	Thu Mar 27 17:27:59 2008 +0000
@@ -751,6 +751,7 @@ static inline void ide_set_irq(IDEState 
 static inline void ide_set_irq(IDEState *s)
 {
     BMDMAState *bm = s->bmdma;
+    if (!s->bs) return; /* yikes */
     if (!(s->cmd & IDE_CMD_DISABLE_IRQ)) {
         if (bm) {
             bm->status |= BM_STATUS_INT;
@@ -916,6 +917,8 @@ static void ide_read_dma_cb(void *opaque
     int n;
     int64_t sector_num;
 
+    if (!s->bs) return; /* yikes */
+
     n = s->io_buffer_size >> 9;
     sector_num = ide_get_sector(s);
     if (n > 0) {
@@ -1024,6 +1027,8 @@ static void ide_write_dma_cb(void *opaqu
     int n;
     int64_t sector_num;
 
+    if (!s->bs) return; /* yikes */
+
     n = s->io_buffer_size >> 9;
     sector_num = ide_get_sector(s);
     if (n > 0) {
@@ -1070,6 +1075,39 @@ static void ide_sector_write_dma(IDEStat
     s->io_buffer_index = 0;
     s->io_buffer_size = 0;
     ide_dma_start(s, ide_write_dma_cb);
+}
+
+static void ide_device_utterly_broken(IDEState *s) {
+    s->status |= BUSY_STAT;
+    s->bs = NULL;
+    /* This prevents all future commands from working.  All of the
+     * asynchronous callbacks (and ide_set_irq, as a safety measure)
+     * check to see whether this has happened and bail if so.
+     */
+}
+
+static void ide_flush_cb(void *opaque, int ret)
+{
+    IDEState *s = opaque;
+
+    if (!s->bs) return; /* yikes */
+
+    if (ret) {
+        /* We are completely doomed.  The IDE spec does not permit us
+	 * to return an error from a flush except via a protocol which
+	 * requires us to say where the error is and which
+	 * contemplates the guest repeating the flush attempt to
+	 * attempt flush the remaining data.  We can't support that
+	 * because f(data)sync (which is what the block drivers use
+	 * eventually) doesn't report the necessary information or
+	 * give us the necessary control.  So we make the disk vanish.
+	 */
+	ide_device_utterly_broken(s);
+	return;
+    }
+    else
+        s->status = READY_STAT;
+    ide_set_irq(s);
 }
 
 static void ide_atapi_cmd_ok(IDEState *s)
@@ -1297,6 +1335,8 @@ static void ide_atapi_cmd_read_dma_cb(vo
     BMDMAState *bm = opaque;
     IDEState *s = bm->ide_if;
     int data_offset, n;
+
+    if (!s->bs) return; /* yikes */
 
     if (ret < 0) {
         ide_atapi_io_error(s, ret);
@@ -1703,6 +1743,8 @@ static void cdrom_change_cb(void *opaque
     IDEState *s = opaque;
     int64_t nb_sectors;
 
+    if (!s->bs) return; /* yikes */
+
     /* XXX: send interrupt too */
     bdrv_get_geometry(s->bs, &nb_sectors);
     s->nb_sectors = nb_sectors;
@@ -1806,8 +1848,8 @@ static void ide_ioport_write(void *opaqu
         printf("ide: CMD=%02x\n", val);
 #endif
         s = ide_if->cur_drive;
-        /* ignore commands to non existant slave */
-        if (s != ide_if && !s->bs) 
+        /* ignore commands to non existant device */
+        if (!s->bs) 
             break;
 
         switch(val) {
@@ -1976,10 +2018,8 @@ static void ide_ioport_write(void *opaqu
             break;
         case WIN_FLUSH_CACHE:
         case WIN_FLUSH_CACHE_EXT:
-            if (s->bs)
-                bdrv_flush(s->bs);
-	    s->status = READY_STAT;
-            ide_set_irq(s);
+            s->status = BUSY_STAT;
+            bdrv_aio_flush(s->bs, ide_flush_cb, s);
             break;
         case WIN_IDLEIMMEDIATE:
         case WIN_STANDBY:
diff -r b667e220e556 tools/ioemu/vl.h
--- a/tools/ioemu/vl.h	Thu Mar 27 11:39:57 2008 +0000
+++ b/tools/ioemu/vl.h	Thu Mar 27 15:22:33 2008 +0000
@@ -653,6 +653,8 @@ BlockDriverAIOCB *bdrv_aio_write(BlockDr
                                  const uint8_t *buf, int nb_sectors,
                                  BlockDriverCompletionFunc *cb, void *opaque);
 void bdrv_aio_cancel(BlockDriverAIOCB *acb);
+BlockDriverAIOCB *bdrv_aio_flush(BlockDriverState *bs, 
+                                 BlockDriverCompletionFunc *cb, void *opaque);
 
 void qemu_aio_init(void);
 void qemu_aio_poll(void);

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Windows Bug Check 0x101 issue
  2008-03-27 17:39                     ` Ian Jackson
@ 2008-03-28  3:09                       ` Kouya Shimura
       [not found]                       ` <87k5jnl7qt.fsf@basil.nowhere.org>
  1 sibling, 0 replies; 22+ messages in thread
From: Kouya Shimura @ 2008-03-28  3:09 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Keir Fraser, Alan Cox, Samuel Thibault

Ian Jackson writes:
> Before we apply this: Kouya Shimra, could you test it and let us know
> if it solves your original problem ?

Many thanks. It works fine for me. 

BSOD of Bug Check 0x101 can be reproduced by running iozone benchmark
on dom0. I've tried 10 times and never seen the BSOD with this patch.
The BSOD was sure to happen everytime before.

Moreover, I tried the case of aio_fsync failure by using a debugger.
After the manually injection of failure (force to set "ret=1"), 
I got a BSOD of STOP 0x000000F4 what we probably expect.

In addition to prevent Windows from the BSOD, this patch improves the
interactivity of Windows remarkably.  Without this patch, I/O load of
dom0 affects the response of a guest Windows. 
(The mouse pointer stucks sometimes)

Thanks,
Kouya

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Windows Bug Check 0x101 issue
       [not found]                       ` <87k5jnl7qt.fsf@basil.nowhere.org>
@ 2008-03-28  9:32                         ` Ian Jackson
  0 siblings, 0 replies; 22+ messages in thread
From: Ian Jackson @ 2008-03-28  9:32 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Samuel Thibault, xen-devel, Alan Cox, Kouya Shimura, Keir Fraser

Andi Kleen writes ("Re: Windows Bug Check 0x101 issue"):
> It would be certainly possible to implement kernel interfaces
> that would allow discovering the error easily. I guess aio could
> pass it somewhere

In principle, yes.  But like your other proposals, I don't think
that's really a practical answer.  If such a kernel interface appears
then we should use it.

In the meantime people who want this kind of error reporting and
recovery should turn the emulated drive's cache off.

Ian.

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2008-03-28  9:32 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-03-24  9:53 Windows Bug Check 0x101 issue Kouya Shimura
2008-03-24 11:01 ` Samuel Thibault
2008-03-24 11:05 ` Keir Fraser
2008-03-25 10:54   ` Kouya Shimura
2008-03-25 10:59     ` Keir Fraser
2008-03-25 11:28     ` Ian Jackson
2008-03-25 17:57       ` Samuel Thibault
2008-03-26  7:07         ` Kouya Shimura
2008-03-26 10:13           ` Samuel Thibault
2008-03-26 10:29           ` Ian Jackson
2008-03-26 10:27             ` Alan Cox
2008-03-26 10:51               ` Ian Jackson
2008-03-26 12:34                 ` Alan Cox
2008-03-26 14:18                   ` Ian Jackson
2008-03-26 10:39             ` Ian Jackson
2008-03-27  5:20             ` Kouya Shimura
2008-03-27  9:08               ` Alan Cox
2008-03-27 16:34                 ` Ian Jackson
2008-03-27 16:30                   ` Alan Cox
2008-03-27 17:39                     ` Ian Jackson
2008-03-28  3:09                       ` Kouya Shimura
     [not found]                       ` <87k5jnl7qt.fsf@basil.nowhere.org>
2008-03-28  9:32                         ` Ian Jackson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.