* [Qemu-devel] [RFC patch 0/3] block: pause block jobs for bdrv_drain_begin/end
@ 2017-03-16  0:46 John Snow
  2017-03-16  0:46 ` [Qemu-devel] [RFC patch 1/3] blockjob: add block_job_start_shim John Snow
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: John Snow @ 2017-03-16  0:46 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, pbonzini, qemu-devel, John Snow

Reference: https://bugzilla.redhat.com/show_bug.cgi?id=1367369#c8

It's possible to wedge QEMU if the guest tries to reset a virtio-pci
device while QEMU is also using the drive for a blockjob. This patchset
aims to allow us to safely pause/resume jobs attached to individual
nodes, in a manner similar to how bdrv_drain_all_begin/end already do.

Kevin suggested an approach based on BlockDevOps callbacks, which I've
approximated here, but I have a number of questions before I push the
point any further.
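
As a rough sketch of the intended end state (illustrative only, not code
from these patches; the caller below is invented):

    /* Illustrative sketch only.  example_critical_section() is a made-up
     * caller standing in for e.g. the virtio reset path: draining a node
     * pauses any blockjob attached to it, ending the drain resumes it. */
    static void example_critical_section(BlockDriverState *bs)
    {
        bdrv_drained_begin(bs);  /* dev_ops->drained_begin -> block_job_pause()  */

        /* ... safely reset or reconfigure the device ... */

        bdrv_drained_end(bs);    /* dev_ops->drained_end -> block_job_resume()   */
    }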

John Snow (3):
  blockjob: add block_job_start_shim
  block-backend: add drained_begin / drained_end ops
  blockjob: add devops to blockjob backends

 block/block-backend.c          | 25 ++++++++++++++++++
 blockjob.c                     | 60 +++++++++++++++++++++++++++++++++---------
 include/sysemu/block-backend.h |  8 ++++++
 3 files changed, 81 insertions(+), 12 deletions(-)

-- 
2.9.3


* [Qemu-devel] [RFC patch 1/3] blockjob: add block_job_start_shim
  2017-03-16  0:46 [Qemu-devel] [RFC patch 0/3] block: pause block jobs for bdrv_drain_begin/end John Snow
@ 2017-03-16  0:46 ` John Snow
  2017-03-16  8:20   ` Paolo Bonzini
  2017-03-16  0:46 ` [Qemu-devel] [RFC patch 2/3] block-backend: add drained_begin / drained_end ops John Snow
  2017-03-16  0:46 ` [Qemu-devel] [RFC patch 3/3] blockjob: add devops to blockjob backends John Snow
  2 siblings, 1 reply; 10+ messages in thread
From: John Snow @ 2017-03-16  0:46 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, pbonzini, qemu-devel, John Snow

The purpose of this shim is to allow us to pause jobs that have been
created but not yet started. That in turn lets us buffer a pause request
so that it takes effect before the job ever does any work: we can create
jobs during a quiescent state (under which they are automatically
paused) and then resume them after the critical section, in either
order:

(1) -block_job_start
    -block_job_resume (via e.g. drained_end)

(2) -block_job_resume (via e.g. drained_end)
    -block_job_start

The problem that requires a shim is that a job must start in the
busy=true state only on its first entry; all subsequent entries require
busy to be false, and the toggling of this state is otherwise handled at
existing pause and yield points.

The shim simply lets a job "start" by setting busy to true and then
immediately pausing, but only if necessary.
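
For illustration only (this snippet is not part of the patch, and the
block_job_create() arguments are elided), the sequence this is meant to
make safe looks roughly like:

    /* Illustrative sketch, not patch code.  With the rest of this series
     * applied, draining the node makes the new dev_ops drained_begin
     * callback bump the job's pause count, so the shim's pause point
     * parks the coroutine until the drain ends. */
    bdrv_drained_begin(bs);
    job = block_job_create(/* ... */);   /* created paused                   */
    block_job_start(job);                /* enters block_job_start_shim() and
                                          * stops at block_job_pause_point() */
    bdrv_drained_end(bs);                /* block_job_resume(); the job runs */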

Signed-off-by: John Snow <jsnow@redhat.com>
---
 blockjob.c | 26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index 69126af..6d08ce5 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -250,16 +250,28 @@ static bool block_job_started(BlockJob *job)
     return job->co;
 }
 
+/**
+ * This code must run after the job's coroutine has been entered,
+ * but not before.
+ */
+static void coroutine_fn block_job_start_shim(void *opaque)
+{
+    BlockJob *job = opaque;
+
+    assert(job && job->driver && job->driver->start);
+    block_job_pause_point(job);
+    job->driver->start(job);
+}
+
 void block_job_start(BlockJob *job)
 {
     assert(job && !block_job_started(job) && job->paused &&
-           !job->busy && job->driver->start);
-    job->co = qemu_coroutine_create(job->driver->start, job);
-    if (--job->pause_count == 0) {
-        job->paused = false;
-        job->busy = true;
-        qemu_coroutine_enter(job->co);
-    }
+           job->driver && job->driver->start);
+    job->co = qemu_coroutine_create(block_job_start_shim, job);
+    job->pause_count--;
+    job->busy = true;
+    job->paused = false;
+    qemu_coroutine_enter(job->co);
 }
 
 void block_job_ref(BlockJob *job)
-- 
2.9.3


* [Qemu-devel] [RFC patch 2/3] block-backend: add drained_begin / drained_end ops
  2017-03-16  0:46 [Qemu-devel] [RFC patch 0/3] block: pause block jobs for bdrv_drain_begin/end John Snow
  2017-03-16  0:46 ` [Qemu-devel] [RFC patch 1/3] blockjob: add block_job_start_shim John Snow
@ 2017-03-16  0:46 ` John Snow
  2017-03-16  8:25   ` Paolo Bonzini
  2017-03-16 17:25   ` Kevin Wolf
  2017-03-16  0:46 ` [Qemu-devel] [RFC patch 3/3] blockjob: add devops to blockjob backends John Snow
  2 siblings, 2 replies; 10+ messages in thread
From: John Snow @ 2017-03-16  0:46 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, pbonzini, qemu-devel, John Snow

Signed-off-by: John Snow <jsnow@redhat.com>

---

RFC questions:

- Does the presence of blk->quiesce_counter relieve the burden of needing
  blk->public.io_limits_disabled? I could probably eliminate this counter
  entirely and just spy on the root node's quiescent state at key moments
  instead. I am confident I'm traipsing on delicate drain semantics.

- Should I treat the separation of a quiesced BDS/BB as a drained_end request
  as an analogue to how blk_insert_bs (in this patch) handles this?

Signed-off-by: John Snow <jsnow@redhat.com>
---
 block/block-backend.c          | 25 +++++++++++++++++++++++++
 include/sysemu/block-backend.h |  8 ++++++++
 2 files changed, 33 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index 5742c09..eb85e8b 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -65,6 +65,8 @@ struct BlockBackend {
     bool allow_write_beyond_eof;
 
     NotifierList remove_bs_notifiers, insert_bs_notifiers;
+
+    int quiesce_counter;
 };
 
 typedef struct BlockBackendAIOCB {
@@ -559,6 +561,11 @@ int blk_insert_bs(BlockBackend *blk, BlockDriverState *bs, Error **errp)
     }
     bdrv_ref(bs);
 
+    /* The new BDS may be quiescent, we should attempt to match */
+    if (bs->quiesce_counter) {
+        blk_root_drained_begin(blk->root);
+    }
+
     notifier_list_notify(&blk->insert_bs_notifiers, blk);
     if (blk->public.throttle_state) {
         throttle_timers_attach_aio_context(
@@ -705,6 +712,11 @@ void blk_set_dev_ops(BlockBackend *blk, const BlockDevOps *ops,
 
     blk->dev_ops = ops;
     blk->dev_opaque = opaque;
+
+    /* Are we currently quiesced? Should we enforce this right now? */
+    if (blk->quiesce_counter && ops->drained_begin) {
+        ops->drained_begin(opaque);
+    }
 }
 
 /*
@@ -1870,9 +1882,15 @@ static void blk_root_drained_begin(BdrvChild *child)
 {
     BlockBackend *blk = child->opaque;
 
+    blk->quiesce_counter++;
+
     /* Note that blk->root may not be accessible here yet if we are just
      * attaching to a BlockDriverState that is drained. Use child instead. */
 
+    if (blk->dev_ops && blk->dev_ops->drained_begin) {
+        blk->dev_ops->drained_begin(blk->dev_opaque);
+    }
+
     if (blk->public.io_limits_disabled++ == 0) {
         throttle_group_restart_blk(blk);
     }
@@ -1882,6 +1900,13 @@ static void blk_root_drained_end(BdrvChild *child)
 {
     BlockBackend *blk = child->opaque;
 
+    assert(blk->quiesce_counter);
+    blk->quiesce_counter--;
+
     assert(blk->public.io_limits_disabled);
     --blk->public.io_limits_disabled;
+
+    if (blk->dev_ops && blk->dev_ops->drained_end) {
+        blk->dev_ops->drained_end(blk->dev_opaque);
+    }
 }
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 096c17f..c6f4408 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -58,6 +58,14 @@ typedef struct BlockDevOps {
      * Runs when the size changed (e.g. monitor command block_resize)
      */
     void (*resize_cb)(void *opaque);
+    /*
+     * Runs when the backend receives a drain request.
+     */
+    void (*drained_begin)(void *opaque);
+    /*
+     * Runs when the backend's drain request ends.
+     */
+    void (*drained_end)(void *opaque);
 } BlockDevOps;
 
 /* This struct is embedded in (the private) BlockBackend struct and contains
-- 
2.9.3


* [Qemu-devel] [RFC patch 3/3] blockjob: add devops to blockjob backends
  2017-03-16  0:46 [Qemu-devel] [RFC patch 0/3] block: pause block jobs for bdrv_drain_begin/end John Snow
  2017-03-16  0:46 ` [Qemu-devel] [RFC patch 1/3] blockjob: add block_job_start_shim John Snow
  2017-03-16  0:46 ` [Qemu-devel] [RFC patch 2/3] block-backend: add drained_begin / drained_end ops John Snow
@ 2017-03-16  0:46 ` John Snow
  2017-03-16  8:28   ` Paolo Bonzini
  2017-03-16 17:36   ` Kevin Wolf
  2 siblings, 2 replies; 10+ messages in thread
From: John Snow @ 2017-03-16  0:46 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, pbonzini, qemu-devel, John Snow

This lets us hook into drained_begin and drained_end requests from the
backend level, which is particularly useful for making sure that all
jobs associated with a particular node (whether the source or the target)
receive a drain request.

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>

--

RFC topics:

- BlkDevOps is traditionally only for Qdev devices, and a BlockJob is not
  currently a 'device'... Do we want to loosen this restriction, find another
  way to deliver callbacks to BlockJobs attached to BlkBackends, or do something
  crazy like make a BlockJob device object?

  struct JobDevice {
    DeviceState parent_obj;
    BlockJob *job;
  } ...??

- Not so sure about leaving the initial job refcount at 1, since we are giving
  away a handle to this job as dev_opaque. By incrementing it to 2, however,
  it's not clear whose responsibility it is to decrement it again.

Signed-off-by: John Snow <jsnow@redhat.com>
---
 blockjob.c | 34 +++++++++++++++++++++++++++++-----
 1 file changed, 29 insertions(+), 5 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index 6d08ce5..0542677 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -68,6 +68,23 @@ static const BdrvChildRole child_job = {
     .stay_at_node       = true,
 };
 
+static void block_job_drained_begin(void *opaque)
+{
+    BlockJob *job = opaque;
+    block_job_pause(job);
+}
+
+static void block_job_drained_end(void *opaque)
+{
+    BlockJob *job = opaque;
+    block_job_resume(job);
+}
+
+static const BlockDevOps block_job_dev_ops = {
+    .drained_begin = block_job_drained_begin,
+    .drained_end = block_job_drained_end,
+};
+
 BlockJob *block_job_next(BlockJob *job)
 {
     if (!job) {
@@ -205,11 +222,6 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
     }
 
     job = g_malloc0(driver->instance_size);
-    error_setg(&job->blocker, "block device is in use by block job: %s",
-               BlockJobType_lookup[driver->job_type]);
-    block_job_add_bdrv(job, "main node", bs, 0, BLK_PERM_ALL, &error_abort);
-    bdrv_op_unblock(bs, BLOCK_OP_TYPE_DATAPLANE, job->blocker);
-
     job->driver        = driver;
     job->id            = g_strdup(job_id);
     job->blk           = blk;
@@ -219,8 +231,20 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
     job->paused        = true;
     job->pause_count   = 1;
     job->refcnt        = 1;
+    /* RFC: job is now stored both in bs->job and blk->dev_opaque,
+     * but we cannot expect the backend to know how to put down the ref
+     * so it'd be up to this code here to increment it anyway, making it
+     * a bit of a useless exercise. It would be up to us to ensure that
+     * we destruct the blkbackend before putting down our last reference. */
+
+    error_setg(&job->blocker, "block device is in use by block job: %s",
+               BlockJobType_lookup[driver->job_type]);
+    block_job_add_bdrv(job, "main node", bs, 0, BLK_PERM_ALL, &error_abort);
     bs->job = job;
 
+    blk_set_dev_ops(blk, &block_job_dev_ops, job);
+    bdrv_op_unblock(bs, BLOCK_OP_TYPE_DATAPLANE, job->blocker);
+
     QLIST_INSERT_HEAD(&block_jobs, job, job_list);
 
     blk_add_aio_context_notifier(blk, block_job_attached_aio_context,
-- 
2.9.3


* Re: [Qemu-devel] [RFC patch 1/3] blockjob: add block_job_start_shim
  2017-03-16  0:46 ` [Qemu-devel] [RFC patch 1/3] blockjob: add block_job_start_shim John Snow
@ 2017-03-16  8:20   ` Paolo Bonzini
  0 siblings, 0 replies; 10+ messages in thread
From: Paolo Bonzini @ 2017-03-16  8:20 UTC (permalink / raw)
  To: John Snow, qemu-block; +Cc: kwolf, qemu-devel



On 16/03/2017 01:46, John Snow wrote:
> +/**
> + * This code must run after the job's coroutine has been entered,
> + * but not before.
> + */
> +static void coroutine_fn block_job_start_shim(void *opaque)
> +{
> +    BlockJob *job = opaque;
> +
> +    assert(job && job->driver && job->driver->start);
> +    block_job_pause_point(job);
> +    job->driver->start(job);
> +}

Poor function, don't call it a shim :)  Let's just call it
block_job_co_entry.

Paolo

>  void block_job_start(BlockJob *job)
>  {
>      assert(job && !block_job_started(job) && job->paused &&
> -           !job->busy && job->driver->start);
> -    job->co = qemu_coroutine_create(job->driver->start, job);
> -    if (--job->pause_count == 0) {
> -        job->paused = false;
> -        job->busy = true;
> -        qemu_coroutine_enter(job->co);
> -    }
> +           job->driver && job->driver->start);
> +    job->co = qemu_coroutine_create(block_job_start_shim, job);
> +    job->pause_count--;
> +    job->busy = true;
> +    job->paused = false;
> +    qemu_coroutine_enter(job->co);
>  }
>  
>  void block_job_ref(BlockJob *job)


* Re: [Qemu-devel] [RFC patch 2/3] block-backend: add drained_begin / drained_end ops
  2017-03-16  0:46 ` [Qemu-devel] [RFC patch 2/3] block-backend: add drained_begin / drained_end ops John Snow
@ 2017-03-16  8:25   ` Paolo Bonzini
  2017-03-16 17:25   ` Kevin Wolf
  1 sibling, 0 replies; 10+ messages in thread
From: Paolo Bonzini @ 2017-03-16  8:25 UTC (permalink / raw)
  To: John Snow, qemu-block; +Cc: kwolf, qemu-devel



On 16/03/2017 01:46, John Snow wrote:
> 
> - Does the presence of blk->quiesce_counter relieve the burden of needing
>   blk->public.io_limits_disabled? I could probably eliminate this counter
>   entirely and just spy on the root node's quiescent state at key moments
>   instead. I am confident I'm traipsing on delicate drain semantics.

Tricky question.  I believe io_limits_disabled is a bit of a hack.

Certainly you could get rid of disable_external now that we have
drained_begin/drained_end, but it would be a separate patch.

> - Should I treat the separation of a quiesced BDS/BB as a drained_end request
>   as an analogue to how blk_insert_bs (in this patch) handles this?

I think so.

Paolo


* Re: [Qemu-devel] [RFC patch 3/3] blockjob: add devops to blockjob backends
  2017-03-16  0:46 ` [Qemu-devel] [RFC patch 3/3] blockjob: add devops to blockjob backends John Snow
@ 2017-03-16  8:28   ` Paolo Bonzini
  2017-03-16 17:36   ` Kevin Wolf
  1 sibling, 0 replies; 10+ messages in thread
From: Paolo Bonzini @ 2017-03-16  8:28 UTC (permalink / raw)
  To: John Snow, qemu-block; +Cc: kwolf, qemu-devel



On 16/03/2017 01:46, John Snow wrote:
> - BlkDevOps is traditionally only for Qdev devices, and a BlockJob is not
>   currently a 'device'... Do we want to loosen this restriction, find another
>   way to deliver callbacks to BlockJobs attached to BlkBackends, or do something
>   crazy like make a BlockJob device object?
> 
>   struct JobDevice {
>     DeviceState parent_obj;
>     BlockJob *job;
>   } ...??

I think we want to loosen it.

> - Not so sure about leaving the initial job refcount at 1, since we are giving
>   away a handle to this job as dev_opaque. By incrementing it to 2, however,
>   it's not clear whose responsibility it is to decrement it again.

It would be another callback, sent at the time the dev_ops are removed.
But devices have been treating dev_opaque as a weak reference; it makes
sense to do the same for jobs.
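
Purely as a hypothetical shape (nothing like this exists in the tree
today, and the callback name is invented):

    /* Hypothetical teardown hook, not existing API: whoever owns the
     * opaque pointer could drop its extra reference here. */
    typedef struct BlockDevOps {
        /* ... existing callbacks ... */
        void (*detach)(void *opaque);   /* called when blk_set_dev_ops()
                                         * replaces or clears these ops */
    } BlockDevOps;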

Paolo


* Re: [Qemu-devel] [RFC patch 2/3] block-backend: add drained_begin / drained_end ops
  2017-03-16  0:46 ` [Qemu-devel] [RFC patch 2/3] block-backend: add drained_begin / drained_end ops John Snow
  2017-03-16  8:25   ` Paolo Bonzini
@ 2017-03-16 17:25   ` Kevin Wolf
  2017-03-16 19:58     ` John Snow
  1 sibling, 1 reply; 10+ messages in thread
From: Kevin Wolf @ 2017-03-16 17:25 UTC (permalink / raw)
  To: John Snow; +Cc: qemu-block, pbonzini, qemu-devel

On 16.03.2017 at 01:46, John Snow wrote:
> Signed-off-by: John Snow <jsnow@redhat.com>
> 
> ---
> 
> RFC questions:
> 
> - Does the presence of blk->quiesce_counter relieve the burden of needing
>   blk->public.io_limits_disabled? I could probably eliminate this counter
>   entirely and just spy on the root node's quiescent state at key moments
>   instead. I am confident I'm traipsing on delicate drain semantics.

Not for 2.9 anyway. :-)

> - Should I treat the separation of a quiesced BDS/BB as a drained_end request
>   as an analogue to how blk_insert_bs (in this patch) handles this?

It's already taken care of, you don't need to add anything for that
(see below).

> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  block/block-backend.c          | 25 +++++++++++++++++++++++++
>  include/sysemu/block-backend.h |  8 ++++++++
>  2 files changed, 33 insertions(+)
> 
> diff --git a/block/block-backend.c b/block/block-backend.c
> index 5742c09..eb85e8b 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -65,6 +65,8 @@ struct BlockBackend {
>      bool allow_write_beyond_eof;
>  
>      NotifierList remove_bs_notifiers, insert_bs_notifiers;
> +
> +    int quiesce_counter;
>  };
>  
>  typedef struct BlockBackendAIOCB {
> @@ -559,6 +561,11 @@ int blk_insert_bs(BlockBackend *blk, BlockDriverState *bs, Error **errp)
>      }
>      bdrv_ref(bs);
>  
> +    /* The new BDS may be quiescent, we should attempt to match */
> +    if (bs->quiesce_counter) {
> +        blk_root_drained_begin(blk->root);
> +    }

Do you really need this hunk? BB and BDS should already be kept in sync,
it's just the BB and its user that aren't. This is the call chain that
already leads to blk_root_drained_begin():

blk_insert_bs
    bdrv_root_attach_child
        bdrv_replace_child
            bdrv_replace_child_noperm
                child->role->drained_begin(child)

Did you check whether blk->public.io_limits_disabled ever goes back to 0
with your patch? I think it's increased twice and decreased only once.

To answer your RFC question, bdrv_replace_child_noperm() also takes care
of calling drained_end() when the BDS is detached from the BB.

>      notifier_list_notify(&blk->insert_bs_notifiers, blk);
>      if (blk->public.throttle_state) {
>          throttle_timers_attach_aio_context(
> @@ -705,6 +712,11 @@ void blk_set_dev_ops(BlockBackend *blk, const BlockDevOps *ops,
>  
>      blk->dev_ops = ops;
>      blk->dev_opaque = opaque;
> +
> +    /* Are we currently quiesced? Should we enforce this right now? */
> +    if (blk->quiesce_counter && ops->drained_begin) {
> +        ops->drained_begin(opaque);
> +    }
>  }
>  
>  /*
> @@ -1870,9 +1882,15 @@ static void blk_root_drained_begin(BdrvChild *child)
>  {
>      BlockBackend *blk = child->opaque;
>  
> +    blk->quiesce_counter++;
> +
>      /* Note that blk->root may not be accessible here yet if we are just
>       * attaching to a BlockDriverState that is drained. Use child instead. */
>  
> +    if (blk->dev_ops && blk->dev_ops->drained_begin) {
> +        blk->dev_ops->drained_begin(blk->dev_opaque);
> +    }

Should we do this only if blk->quiesce_counter was 0? (And
dev_ops->drained_end() when it reaches 0 again.)

The difference is that the implementation of the callback would have to
have its own counter as it is in this patch. Not really that bad, but at
the moment I don't see a reason why we would need to support nested
drained sections for dev_ops.
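
Roughly (an untested sketch of that edge-triggered variant; only the
dev_ops calls are shown, the rest of both functions stays as in the
patch):

    if (blk->quiesce_counter++ == 0 &&
        blk->dev_ops && blk->dev_ops->drained_begin) {
        blk->dev_ops->drained_begin(blk->dev_opaque);
    }

    /* ... and in blk_root_drained_end(): */
    if (--blk->quiesce_counter == 0 &&
        blk->dev_ops && blk->dev_ops->drained_end) {
        blk->dev_ops->drained_end(blk->dev_opaque);
    }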

> +
>      if (blk->public.io_limits_disabled++ == 0) {
>          throttle_group_restart_blk(blk);
>      }
> @@ -1882,6 +1900,13 @@ static void blk_root_drained_end(BdrvChild *child)
>  {
>      BlockBackend *blk = child->opaque;
>  
> +    assert(blk->quiesce_counter);
> +    blk->quiesce_counter--;
> +
>      assert(blk->public.io_limits_disabled);
>      --blk->public.io_limits_disabled;
> +
> +    if (blk->dev_ops && blk->dev_ops->drained_end) {
> +        blk->dev_ops->drained_end(blk->dev_opaque);
> +    }

The basic idea looks good to me.

Kevin


* Re: [Qemu-devel] [RFC patch 3/3] blockjob: add devops to blockjob backends
  2017-03-16  0:46 ` [Qemu-devel] [RFC patch 3/3] blockjob: add devops to blockjob backends John Snow
  2017-03-16  8:28   ` Paolo Bonzini
@ 2017-03-16 17:36   ` Kevin Wolf
  1 sibling, 0 replies; 10+ messages in thread
From: Kevin Wolf @ 2017-03-16 17:36 UTC (permalink / raw)
  To: John Snow; +Cc: qemu-block, pbonzini, qemu-devel

On 16.03.2017 at 01:46, John Snow wrote:
> This lets us hook into drained_begin and drained_end requests from the
> backend level, which is particularly useful for making sure that all
> jobs associated with a particular node (whether the source or the target)
> receive a drain request.
> 
> Suggested-by: Kevin Wolf <kwolf@redhat.com>
> Signed-off-by: John Snow <jsnow@redhat.com>
> 
> --
> 
> RFC topics:
> 
> - BlkDevOps is traditionally only for Qdev devices, and a BlockJob is not
>   currently a 'device'... Do we want to loosen this restriction, find another
>   way to deliver callbacks to BlockJobs attached to BlkBackends, or do something
>   crazy like make a BlockJob device object?
> 
>   struct JobDevice {
>     DeviceState parent_obj;
>     BlockJob *job;
>   } ...??

We should probably rename BlkDevOps to something that works not only for
devices, but for any user of a BlockBackend. I don't think the
implementation has to be changed; it's just a struct of callbacks and an
opaque pointer that is actually properly treated as opaque.

BlockBackend also has a dev field, which is indeed supposed to be a
DeviceState and is sometimes cast to one (if we can assert
!blk->legacy_dev), but it's a concept completely separate from
BlkDevOps. So we just need to be sure not to call blk_attach_dev() or
blk_attach_dev_legacy().

Kevin


* Re: [Qemu-devel] [RFC patch 2/3] block-backend: add drained_begin / drained_end ops
  2017-03-16 17:25   ` Kevin Wolf
@ 2017-03-16 19:58     ` John Snow
  0 siblings, 0 replies; 10+ messages in thread
From: John Snow @ 2017-03-16 19:58 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: pbonzini, qemu-devel, qemu-block



On 03/16/2017 01:25 PM, Kevin Wolf wrote:
On 16.03.2017 at 01:46, John Snow wrote:
>> Signed-off-by: John Snow <jsnow@redhat.com>
>>
>> ---
>>
>> RFC questions:
>>
>> - Does the presence of blk->quiesce_counter relieve the burden of needing
>>   blk->public.io_limits_disabled? I could probably eliminate this counter
>>   entirely and just spy on the root node's quiescent state at key moments
>>   instead. I am confident I'm traipsing on delicate drain semantics.
> 
> Not for 2.9 anyway. :-)
> 
>> - Should I treat the separation of a quiesced BDS/BB as a drained_end request
>>   as an analogue to how blk_insert_bs (in this patch) handles this?
> 
> It's already taken care of, you don't need to add anything for that
> (see below).
> 
>> Signed-off-by: John Snow <jsnow@redhat.com>
>> ---
>>  block/block-backend.c          | 25 +++++++++++++++++++++++++
>>  include/sysemu/block-backend.h |  8 ++++++++
>>  2 files changed, 33 insertions(+)
>>
>> diff --git a/block/block-backend.c b/block/block-backend.c
>> index 5742c09..eb85e8b 100644
>> --- a/block/block-backend.c
>> +++ b/block/block-backend.c
>> @@ -65,6 +65,8 @@ struct BlockBackend {
>>      bool allow_write_beyond_eof;
>>  
>>      NotifierList remove_bs_notifiers, insert_bs_notifiers;
>> +
>> +    int quiesce_counter;
>>  };
>>  
>>  typedef struct BlockBackendAIOCB {
>> @@ -559,6 +561,11 @@ int blk_insert_bs(BlockBackend *blk, BlockDriverState *bs, Error **errp)
>>      }
>>      bdrv_ref(bs);
>>  
>> +    /* The new BDS may be quiescent, we should attempt to match */
>> +    if (bs->quiesce_counter) {
>> +        blk_root_drained_begin(blk->root);
>> +    }
> 
> Do you really need this hunk? BB and BDS should already be kept in sync,
> it's just the BB and its user that aren't. This is the call chain that
> already leads to blk_root_drained_begin():
> 

Oh, good call..!

> blk_insert_bs
>     bdrv_root_attach_child
>         bdrv_replace_child
>             bdrv_replace_child_noperm
>                 child->role->drained_begin(child)
> 
> Did you check whether blk->public.io_limits_disabled ever goes back to 0
> with your patch? I think it's increased twice and decreased only once.
> 

I didn't, but with your change above, it works out just fine.

> To answer your RFC question, bdrv_replace_child_noperm() also takes care
> of calling drained_end() when the BDS is detached from the BB.
> 

Excellent.

>>      notifier_list_notify(&blk->insert_bs_notifiers, blk);
>>      if (blk->public.throttle_state) {
>>          throttle_timers_attach_aio_context(
>> @@ -705,6 +712,11 @@ void blk_set_dev_ops(BlockBackend *blk, const BlockDevOps *ops,
>>  
>>      blk->dev_ops = ops;
>>      blk->dev_opaque = opaque;
>> +
>> +    /* Are we currently quiesced? Should we enforce this right now? */
>> +    if (blk->quiesce_counter && ops->drained_begin) {
>> +        ops->drained_begin(opaque);
>> +    }
>>  }
>>  
>>  /*
>> @@ -1870,9 +1882,15 @@ static void blk_root_drained_begin(BdrvChild *child)
>>  {
>>      BlockBackend *blk = child->opaque;
>>  
>> +    blk->quiesce_counter++;
>> +
>>      /* Note that blk->root may not be accessible here yet if we are just
>>       * attaching to a BlockDriverState that is drained. Use child instead. */
>>  
>> +    if (blk->dev_ops && blk->dev_ops->drained_begin) {
>> +        blk->dev_ops->drained_begin(blk->dev_opaque);
>> +    }
> 
> Should we do this only if blk->quiesce_counter was 0? (And
> dev_ops->drained_end() when it reaches 0 again.)
> 

Ah yes, actually, that led to the question about .io_limits_disabled,
because I could have both begin/end only operate on the edge between 0/1.

> The difference is that the implementation of the callback would have to
> have its own counter as it is in this patch. Not really that bad, but at
> the moment I don't see a reason why we would need to support nested
> drained sections for dev_ops.
> 

Coincidentally, block_job_pause already supports counters! Entirely by
accident this worked.

>> +
>>      if (blk->public.io_limits_disabled++ == 0) {
>>          throttle_group_restart_blk(blk);
>>      }
>> @@ -1882,6 +1900,13 @@ static void blk_root_drained_end(BdrvChild *child)
>>  {
>>      BlockBackend *blk = child->opaque;
>>  
>> +    assert(blk->quiesce_counter);
>> +    blk->quiesce_counter--;
>> +
>>      assert(blk->public.io_limits_disabled);
>>      --blk->public.io_limits_disabled;
>> +
>> +    if (blk->dev_ops && blk->dev_ops->drained_end) {
>> +        blk->dev_ops->drained_end(blk->dev_opaque);
>> +    }
> 
> The basic idea looks good to me.
> 
> Kevin
> 

I'll send a v2 shortly, ever-so-slightly touched up.

--js


