* [Qemu-devel] fix bdrv_read/write_em and qemu_aio_flush
From: Andrea Arcangeli @ 2009-05-28 16:33 UTC
  To: qemu-devel

Hello,

the debug code in my ide_dma_cancel patch (not yet included upstream)
made us notice that when qemu_aio_flush returns, there are still
pending aio commands waiting to complete. Auditing the code I found
strange stuff, like the fact that qemu_aio_wait does nothing if an
unrelated (non-aio) bh is executed. And I think I found the reason
why there was still pending aio when qemu_aio_flush returned:
qemu_aio_wait does a lot more than wait, it can start aio, and if the
previous ->io_flush returned zero, the loop ends and ->io_flush isn't
repeated. The fact that an unrelated bh can make qemu_aio_wait a noop
seems troublesome for all callers that aren't calling qemu_aio_wait
in a loop like qemu_aio_flush does, so I preferred to change those
callers to a safer qemu_aio_flush in case the bh executed generates
more pending I/O. What do you think about this patch against qemu
git?
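
To make the problem concrete, here is a rough sketch of the pre-patch
control flow (simplified from aio.c and block.c, not the literal
code):

	/* pre-patch qemu_aio_wait(): a pending bh makes it a noop */
	void qemu_aio_wait(void)
	{
	    if (qemu_bh_poll())     /* an unrelated bh ran: return at once, */
	        return;             /* without waiting for any aio to move  */
	    /* ... select() on the registered aio fds and run completions ... */
	}

	/* bdrv_read_em-style caller: each iteration is supposed to make
	 * progress on aio, but may instead only run an unrelated bh */
	while (async_ret == NOT_DONE)
	    qemu_aio_wait();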

------------
Subject: fix bdrv_read/write_em and qemu_aio_flush
From: Andrea Arcangeli <aarcange@redhat.com>

qemu_aio_wait() was used to start aio through qemu_bh_poll(), as in the
case of qcow2 reads from holes. The bh is global. I can't see how it
can possibly be safe to make qemu_aio_wait a noop whenever any
unrelated bh was pending and had to be executed.

In addition qemu_aio_wait, by invoking the bh, could execute new aio
callbacks that issue more aio commands from their completion handlers,
breaking the invariant that qemu_aio_poll returns only when all
->io_flush methods return 0 (which led to a failure in my fix to
ide_dma_cancel, which has debug code to catch exactly that; that code
isn't yet in qemu upstream).

	->io_flush returns 0
	qemu_aio_wait()
	qemu_aio_bh() -> qcow_aio_read_bh -> ide_read_dma_cb -> ide_dma_submit_check
	the loop breaks even though ->io_flush would no longer return 0

I also changed most aio_wait callers to call aio_flush instead; this
ensures that even a read from a hole submitted with bdrv_aio_readv
will be complete by the time aio_flush returns.
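
For reference, the bh-based completion path in question (qcow2 reads
from unallocated clusters complete through a scheduled bh) looks
roughly like this; a minimal sketch, not the exact qcow2 code:

	/* sketch of bh-deferred completion; the acb layout is assumed */
	static void qcow_aio_read_bh(void *opaque)
	{
	    QCowAIOCB *acb = opaque;
	    /* zero-fill the hole and fire the guest completion callback,
	     * which may in turn submit more aio (e.g. ide_read_dma_cb) */
	    acb->common.cb(acb->common.opaque, 0);
	}

	acb->bh = qemu_bh_new(qcow_aio_read_bh, acb);
	qemu_bh_schedule(acb->bh);   /* runs from qemu_bh_poll() */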

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---

diff --git a/aio.c b/aio.c
index 11fbb6c..b304852 100644
--- a/aio.c
+++ b/aio.c
@@ -103,6 +103,12 @@ void qemu_aio_flush(void)
     do {
         ret = 0;
 
+	/*
+	 * If there are pending emulated aio start them so
+	 * flush will be able to return 1.
+	 */
+	qemu_bh_poll();
+
         LIST_FOREACH(node, &aio_handlers, node) {
             ret |= node->io_flush(node->opaque);
         }
@@ -115,9 +121,6 @@ void qemu_aio_wait(void)
 {
     int ret;
 
-    if (qemu_bh_poll())
-        return;
-
     do {
         AioHandler *node;
         fd_set rdfds, wrfds;
diff --git a/block.c b/block.c
index c80d4b9..78088a9 100644
--- a/block.c
+++ b/block.c
@@ -1487,7 +1487,7 @@ static int bdrv_read_em(BlockDriverState *bs, int64_t sector_num,
         return -1;
 
     while (async_ret == NOT_DONE) {
-        qemu_aio_wait();
+        qemu_aio_flush();
     }
 
     return async_ret;
@@ -1510,7 +1510,7 @@ static int bdrv_write_em(BlockDriverState *bs, int64_t sector_num,
     if (acb == NULL)
         return -1;
     while (async_ret == NOT_DONE) {
-        qemu_aio_wait();
+        qemu_aio_flush();
     }
     return async_ret;
 }
diff --git a/qemu-io.c b/qemu-io.c
index f0a17b9..0d52074 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -153,7 +153,7 @@ static int do_aio_readv(QEMUIOVector *qiov, int64_t offset, int *total)
 		return -EIO;
 
 	while (async_ret == NOT_DONE)
-		qemu_aio_wait();
+		qemu_aio_flush();
 
 	*total = qiov->size;
 	return async_ret < 0 ? async_ret : 1;
@@ -170,7 +170,7 @@ static int do_aio_writev(QEMUIOVector *qiov, int64_t offset, int *total)
 		return -EIO;
 
 	while (async_ret == NOT_DONE)
-		qemu_aio_wait();
+		qemu_aio_flush();
 
 	*total = qiov->size;
 	return async_ret < 0 ? async_ret : 1;

* Re: [Qemu-devel] fix bdrv_read/write_em and qemu_aio_flush
From: Christoph Hellwig @ 2009-05-30 10:08 UTC
  To: Andrea Arcangeli; +Cc: qemu-devel

On Thu, May 28, 2009 at 06:33:10PM +0200, Andrea Arcangeli wrote:
> [...] The fact that an unrelated bh can make qemu_aio_wait a noop
> seems troublesome for all callers that aren't calling qemu_aio_wait
> in a loop like qemu_aio_flush does, so I preferred to change those
> callers to a safer qemu_aio_flush in case the bh executed generates
> more pending I/O. What do you think about this patch against qemu
> git?

Looks good to me.  In my unsubmitted aio support patches for qemu-io
I had to call qemu_aio_wait at least twice to get aio requests reliably
completed, but with this patch and calling qemu_aio_flush it always
completes all requests.

* Re: [Qemu-devel] fix bdrv_read/write_em and qemu_aio_flush
From: Andrea Arcangeli @ 2009-05-30 12:17 UTC
  To: Christoph Hellwig; +Cc: qemu-devel

Hi Christoph,

On Sat, May 30, 2009 at 12:08:42PM +0200, Christoph Hellwig wrote:
> On Thu, May 28, 2009 at 06:33:10PM +0200, Andrea Arcangeli wrote:
> > [...]
> 
> Looks good to me.  In my unsubmitted aio support patches for qemu-io
> I had to call qemu_aio_wait at least twice to get aio requests reliably
> completed, but with this patch and calling qemu_aio_flush it always
> completes all requests.

Exactly! In effect this could be slightly optimized in the future if
we could track a single aio request instead of waiting for them all.
This is a bit the equivalent of 'sync' instead of wait_on_page in the
kernel: because we don't have a wait_on_page here, we have to flush
them all to be safe from bh execution. Given the potential, without
my patch, for I/O corruption or misbehavior of ide.c because of
qemu_aio_flush returning too early, I think it's a good idea to
apply.
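
For reference, the optimized primitive (track one request instead of
flushing everything) could look like this; purely a hypothetical
sketch, assuming a per-request done flag set by the completion
callback, nothing like it exists in qemu today:

	/* hypothetical: wait for one request; looping keeps it safe
	 * against iterations that only run an unrelated bh */
	static void wait_for_one_aio(int *done)
	{
	    while (!*done)
	        qemu_aio_wait();
	}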

* [Qemu-devel] [PATCH] fix qemu_aio_flush
From: Andrea Arcangeli @ 2009-06-04 11:26 UTC
  To: Christoph Hellwig; +Cc: Kevin Wolf, qemu-devel

Hello,

Kevin has a good point that when a high-level callback handler
completes, we should be guaranteed that all underlying layers of
callback events have completed for that specific aio operation. So it
seems the main bug was only in qemu_aio_flush() (only made visible by
the debug code included in the ide_dma_cancel patch). I guess that's
a problem for savevm/reset, which assume there is no outstanding aio
waiting to be run while in fact there can be because of the bug. The
patch, shown after the sketch below, is much simpler:
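
To illustrate what such callers assume (hypothetical caller, not the
actual savevm code; save_device_state is a stand-in name):

	qemu_aio_flush();       /* must really drain all in-flight aio...  */
	save_device_state();    /* ...or device state is snapshotted while
	                         * requests are still outstanding */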

----------

From: Andrea Arcangeli <aarcange@redhat.com>

qemu_aio_wait, by invoking the bh or one of the aio completion
callbacks, could end up submitting new pending aio, breaking the
invariant that qemu_aio_poll returns only when no pending aio is
outstanding (possibly a problem for migration as such).
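
The fix is just an ordering change inside the flush loop:

	before: ret = poll ->io_flush (== 0); qemu_aio_wait() starts new aio; exit on stale ret
	after:  qemu_aio_wait() starts new aio; ret = poll ->io_flush (== 1); loop repeats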

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---

diff --git a/aio.c b/aio.c
index 11fbb6c..dc9b85d 100644
--- a/aio.c
+++ b/aio.c
@@ -103,11 +103,15 @@ void qemu_aio_flush(void)
     do {
         ret = 0;
 
+	/*
+	 * If there are pending emulated aio start them now so flush
+	 * will be able to return 1.
+	 */
+        qemu_aio_wait();
+
         LIST_FOREACH(node, &aio_handlers, node) {
             ret |= node->io_flush(node->opaque);
         }
-
-        qemu_aio_wait();
     } while (ret > 0);
 }
 
diff --git a/qemu-aio.h b/qemu-aio.h
index 7967829..f262344 100644
--- a/qemu-aio.h
+++ b/qemu-aio.h
@@ -24,9 +24,10 @@ typedef int (AioFlushHandler)(void *opaque);
  * outstanding AIO operations have been completed or cancelled. */
 void qemu_aio_flush(void);
 
-/* Wait for a single AIO completion to occur.  This function will until a
- * single AIO opeartion has completed.  It is intended to be used as a looping
- * primative when simulating synchronous IO based on asynchronous IO. */
+/* Wait for a single AIO completion to occur.  This function will wait
+ * until a single AIO event has completed and it will ensure something
+ * has moved before returning. This can issue new pending aio as a
+ * result of executing I/O completion or bh callbacks. */
 void qemu_aio_wait(void);
 
 /* Register a file descriptor and associated callbacks.  Behaves very similarly

* [Qemu-devel] Re: [PATCH] fix qemu_aio_flush
From: Kevin Wolf @ 2009-06-04 11:51 UTC
  To: Andrea Arcangeli; +Cc: Christoph Hellwig, qemu-devel

Andrea Arcangeli wrote:
> From: Andrea Arcangeli <aarcange@redhat.com>
> 
> qemu_aio_wait, by invoking the bh or one of the aio completion
> callbacks, could end up submitting new pending aio, breaking the
> invariant that qemu_aio_poll returns only when no pending aio is
> outstanding (possibly a problem for migration as such).
> 
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>

You mean qemu_aio_flush, not poll. Anyway, the code looks fine.

Acked-by: Kevin Wolf <kwolf@redhat.com>

* Re: [Qemu-devel] [PATCH] fix qemu_aio_flush
From: Christoph Hellwig @ 2009-06-05 15:57 UTC
  To: Andrea Arcangeli; +Cc: Kevin Wolf, Christoph Hellwig, qemu-devel

On Thu, Jun 04, 2009 at 01:26:45PM +0200, Andrea Arcangeli wrote:
> [...] So it seems the main bug was only in qemu_aio_flush() (only
> made visible by the debug code included in the ide_dma_cancel
> patch). [...]

Yes, this looks better.  
