All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH] block: fix streaming/closing race
@ 2012-03-29 16:08 Paolo Bonzini
  2012-03-30  7:19 ` Stefan Hajnoczi
  0 siblings, 1 reply; 6+ messages in thread
From: Paolo Bonzini @ 2012-03-29 16:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, stefanha

Streaming can issue I/O while qcow2_close is running.  This causes the
L2 caches to become very confused or, alternatively, could cause a
segfault when the streaming coroutine is reentered after closing its
block device.  The fix is to cancel streaming jobs when closing their
underlying device.  The cancellation must be synchronous, so add a flag
saying whether streaming has in-flight I/O.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c        |   14 ++++++++++++++
 block/stream.c |    4 +++-
 block_int.h    |    2 ++
 3 files changed, 19 insertions(+), 1 deletions(-)

diff --git a/block.c b/block.c
index b88ee90..1aab4bf 100644
--- a/block.c
+++ b/block.c
@@ -813,6 +813,9 @@ unlink_and_fail:
 void bdrv_close(BlockDriverState *bs)
 {
     if (bs->drv) {
+        if (bs->job) {
+            block_job_cancel_sync(bs->job);
+        }
         if (bs == bs_snapshots) {
             bs_snapshots = NULL;
         }
@@ -4069,3 +4072,14 @@ bool block_job_is_cancelled(BlockJob *job)
 {
     return job->cancelled;
 }
+
+void block_job_cancel_sync(BlockJob *job)
+{
+    BlockDriverState *bs = job->bs;
+
+    assert(bs->job == job);
+    block_job_cancel(job);
+    while (bs->job != NULL && bs->job->busy) {
+        qemu_aio_wait();
+    }
+}
diff --git a/block/stream.c b/block/stream.c
index d1b3986..b8a1628 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -175,7 +175,7 @@ retry:
             break;
         }
 
-
+        s->common.busy = true;
         if (base) {
             ret = is_allocated_base(bs, base, sector_num,
                                     STREAM_BUFFER_SIZE / BDRV_SECTOR_SIZE, &n);
@@ -189,6 +189,7 @@ retry:
             if (s->common.speed) {
                 uint64_t delay_ns = ratelimit_calculate_delay(&s->limit, n);
                 if (delay_ns > 0) {
+                    s->common.busy = false;
                     co_sleep_ns(rt_clock, delay_ns);
 
                     /* Recheck cancellation and that sectors are unallocated */
@@ -208,6 +209,7 @@ retry:
         /* Note that even when no rate limit is applied we need to yield
          * with no pending I/O here so that qemu_aio_flush() returns.
          */
+        s->common.busy = false;
         co_sleep_ns(rt_clock, 0);
     }
 
diff --git a/block_int.h b/block_int.h
index b460c36..f5c3dff 100644
--- a/block_int.h
+++ b/block_int.h
@@ -89,6 +89,7 @@ struct BlockJob {
     const BlockJobType *job_type;
     BlockDriverState *bs;
     bool cancelled;
+    bool busy;
 
     /* These fields are published by the query-block-jobs QMP API */
     int64_t offset;
@@ -329,6 +330,7 @@ void block_job_complete(BlockJob *job, int ret);
 int block_job_set_speed(BlockJob *job, int64_t value);
 void block_job_cancel(BlockJob *job);
 bool block_job_is_cancelled(BlockJob *job);
+void block_job_cancel_sync(BlockJob *job);
 
 int stream_start(BlockDriverState *bs, BlockDriverState *base,
                  const char *base_id, BlockDriverCompletionFunc *cb,
-- 
1.7.9.1

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [Qemu-devel] [PATCH] block: fix streaming/closing race
  2012-03-29 16:08 [Qemu-devel] [PATCH] block: fix streaming/closing race Paolo Bonzini
@ 2012-03-30  7:19 ` Stefan Hajnoczi
  2012-03-30  8:31   ` Paolo Bonzini
  0 siblings, 1 reply; 6+ messages in thread
From: Stefan Hajnoczi @ 2012-03-30  7:19 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, qemu-devel, stefanha

On Thu, Mar 29, 2012 at 06:08:40PM +0200, Paolo Bonzini wrote:
> Streaming can issue I/O while qcow2_close is running.  This causes the
> L2 caches to become very confused or, alternatively, could cause a
> segfault when the streaming coroutine is reentered after closing its
> block device.  The fix is to cancel streaming jobs when closing their
> underlying device.  The cancellation must be synchronous, so add a flag
> saying whether streaming has in-flight I/O.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  block.c        |   14 ++++++++++++++
>  block/stream.c |    4 +++-
>  block_int.h    |    2 ++
>  3 files changed, 19 insertions(+), 1 deletions(-)
> 
> diff --git a/block.c b/block.c
> index b88ee90..1aab4bf 100644
> --- a/block.c
> +++ b/block.c
> @@ -813,6 +813,9 @@ unlink_and_fail:
>  void bdrv_close(BlockDriverState *bs)
>  {
>      if (bs->drv) {
> +        if (bs->job) {
> +            block_job_cancel_sync(bs->job);
> +        }
>          if (bs == bs_snapshots) {
>              bs_snapshots = NULL;
>          }
> @@ -4069,3 +4072,14 @@ bool block_job_is_cancelled(BlockJob *job)
>  {
>      return job->cancelled;
>  }
> +
> +void block_job_cancel_sync(BlockJob *job)
> +{
> +    BlockDriverState *bs = job->bs;
> +
> +    assert(bs->job == job);
> +    block_job_cancel(job);
> +    while (bs->job != NULL && bs->job->busy) {

It's not clear to me why we have a busy flag.  Is canceling and waiting
for bs->job == NULL not enough?  Perhaps you're trying to take a
shortcut and not wait until the job is fully stopped - it's not worth it
since the streaming block job does no significant work after detecting
cancelation.

Stefan

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Qemu-devel] [PATCH] block: fix streaming/closing race
  2012-03-30  7:19 ` Stefan Hajnoczi
@ 2012-03-30  8:31   ` Paolo Bonzini
  2012-03-30 10:00     ` Paolo Bonzini
  2012-03-30 10:25     ` Stefan Hajnoczi
  0 siblings, 2 replies; 6+ messages in thread
From: Paolo Bonzini @ 2012-03-30  8:31 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: kwolf, qemu-devel, stefanha

Il 30/03/2012 09:19, Stefan Hajnoczi ha scritto:
>> > +void block_job_cancel_sync(BlockJob *job)
>> > +{
>> > +    BlockDriverState *bs = job->bs;
>> > +
>> > +    assert(bs->job == job);
>> > +    block_job_cancel(job);
>> > +    while (bs->job != NULL && bs->job->busy) {
> It's not clear to me why we have a busy flag.

Because the coroutine does not restart if you do qemu_aio_wait and the
coroutine is waiting in co_sleep.  So the busy flag communicates that
the coroutine is quiescent and, when cancelled, will not issue any new I/O.

However, even this is not enough.  It fixes a race with closing, but not
with deleting bs.  So the bug does not show anymore when you quit QEMU
(because the coroutine is not restarted), but it is still there if you
hot-unplug a device while streaming is in effect.

> Is canceling and waiting for bs->job == NULL not enough?  Perhaps
> you're trying to take a shortcut and not wait until the job is fully
> stopped

No, simply it's impossible to wait for full completion with qemu_aio_wait.

Paolo

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Qemu-devel] [PATCH] block: fix streaming/closing race
  2012-03-30  8:31   ` Paolo Bonzini
@ 2012-03-30 10:00     ` Paolo Bonzini
  2012-03-30 10:25     ` Stefan Hajnoczi
  1 sibling, 0 replies; 6+ messages in thread
From: Paolo Bonzini @ 2012-03-30 10:00 UTC (permalink / raw)
  Cc: kwolf, Stefan Hajnoczi, qemu-devel, stefanha

Il 30/03/2012 10:31, Paolo Bonzini ha scritto:
> However, even this is not enough.  It fixes a race with closing, but not
> with deleting bs.  So the bug does not show anymore when you quit QEMU
> (because the coroutine is not restarted), but it is still there if you
> hot-unplug a device while streaming is in effect.

Nevermind, hot-unplug is handled by the reference counting mechanism; so
this patch is enough.  However, we may still want to cancel the job when
a device is hot-unplugged.  I'll resubmit this patch as a series and
with a better commit message.

Paolo

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Qemu-devel] [PATCH] block: fix streaming/closing race
  2012-03-30  8:31   ` Paolo Bonzini
  2012-03-30 10:00     ` Paolo Bonzini
@ 2012-03-30 10:25     ` Stefan Hajnoczi
  2012-03-30 10:29       ` Paolo Bonzini
  1 sibling, 1 reply; 6+ messages in thread
From: Stefan Hajnoczi @ 2012-03-30 10:25 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, qemu-devel, stefanha

On Fri, Mar 30, 2012 at 9:31 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 30/03/2012 09:19, Stefan Hajnoczi ha scritto:
>>> > +void block_job_cancel_sync(BlockJob *job)
>>> > +{
>>> > +    BlockDriverState *bs = job->bs;
>>> > +
>>> > +    assert(bs->job == job);
>>> > +    block_job_cancel(job);
>>> > +    while (bs->job != NULL && bs->job->busy) {
>> It's not clear to me why we have a busy flag.
>
> Because the coroutine does not restart if you do qemu_aio_wait and the
> coroutine is waiting in co_sleep.  So the busy flag communicates that
> the coroutine is quiescent and, when cancelled, will not issue any new I/O.

Ah, yes.  This is tricky, a comment would be nice.

Stefan

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Qemu-devel] [PATCH] block: fix streaming/closing race
  2012-03-30 10:25     ` Stefan Hajnoczi
@ 2012-03-30 10:29       ` Paolo Bonzini
  0 siblings, 0 replies; 6+ messages in thread
From: Paolo Bonzini @ 2012-03-30 10:29 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: kwolf, qemu-devel, stefanha

Il 30/03/2012 12:25, Stefan Hajnoczi ha scritto:
> On Fri, Mar 30, 2012 at 9:31 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> Il 30/03/2012 09:19, Stefan Hajnoczi ha scritto:
>>>>> +void block_job_cancel_sync(BlockJob *job)
>>>>> +{
>>>>> +    BlockDriverState *bs = job->bs;
>>>>> +
>>>>> +    assert(bs->job == job);
>>>>> +    block_job_cancel(job);
>>>>> +    while (bs->job != NULL && bs->job->busy) {
>>> It's not clear to me why we have a busy flag.
>>
>> Because the coroutine does not restart if you do qemu_aio_wait and the
>> coroutine is waiting in co_sleep.  So the busy flag communicates that
>> the coroutine is quiescent and, when cancelled, will not issue any new I/O.
> 
> Ah, yes.  This is tricky, a comment would be nice.

I'll just add gtkdoc comments for the blockjob interface.  We need to
start somewhere...

Paolo

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2012-03-30 10:30 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-03-29 16:08 [Qemu-devel] [PATCH] block: fix streaming/closing race Paolo Bonzini
2012-03-30  7:19 ` Stefan Hajnoczi
2012-03-30  8:31   ` Paolo Bonzini
2012-03-30 10:00     ` Paolo Bonzini
2012-03-30 10:25     ` Stefan Hajnoczi
2012-03-30 10:29       ` Paolo Bonzini

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.