* Re: [Qemu-devel] [PATCH v4 13/13] block/mirror: Block "device IO" during mirror exit
  2015-05-19 11:49 ` [Qemu-devel] [PATCH v4 13/13] block/mirror: Block "device IO" during mirror exit Fam Zheng
@ 2015-05-19  8:04   ` Paolo Bonzini
  2015-05-19 16:48     ` Fam Zheng
  0 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2015-05-19  8:04 UTC (permalink / raw)
  To: Fam Zheng, qemu-devel
  Cc: qemu-block, Michael S. Tsirkin, Juan Quintela, mreitz,
	Stefan Hajnoczi, Amit Shah

On 19/05/2015 13:49, Fam Zheng wrote:
> When the mirror block job finishes, the source and target are in sync. But
> bdrv_swap() is called later, in a main loop BH. If the guest writes before
> that, the target will not get the new data.

This is too late.  As a rule, the blocker must be established before
calling bdrv_drain, and removed on the next yield (in this case, before
the assignment to last_pause_ns).

Paolo

> Reported-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>  block/mirror.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index 58f391a..d96403b 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -320,6 +320,8 @@ static void mirror_drain(MirrorBlockJob *s)
>  
>  typedef struct {
>      int ret;
> +    /* Used to pause device IO during mirror exit */
> +    Error *blocker;
>  } MirrorExitData;
>  
>  static void mirror_exit(BlockJob *job, void *opaque)
> @@ -328,6 +330,8 @@ static void mirror_exit(BlockJob *job, void *opaque)
>      MirrorExitData *data = opaque;
>      AioContext *replace_aio_context = NULL;
>  
> +    bdrv_op_unblock(s->common.bs, BLOCK_OP_TYPE_DEVICE_IO, data->blocker);
> +    error_free(data->blocker);
>      if (s->to_replace) {
>          replace_aio_context = bdrv_get_aio_context(s->to_replace);
>          aio_context_acquire(replace_aio_context);
> @@ -563,8 +567,10 @@ immediate_exit:
>      bdrv_release_dirty_bitmap(bs, s->dirty_bitmap);
>      bdrv_iostatus_disable(s->target);
>  
> -    data = g_malloc(sizeof(*data));
> +    data = g_malloc0(sizeof(*data));
>      data->ret = ret;
> +    error_setg(&data->blocker, "mirror job exiting");
> +    bdrv_op_block(bs, BLOCK_OP_TYPE_DEVICE_IO, data->blocker);
>      block_job_defer_to_main_loop(&s->common, mirror_exit, data);
>  }
>  
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH v4 13/13] block/mirror: Block "device IO" during mirror exit
  2015-05-19 16:48     ` Fam Zheng
@ 2015-05-19  8:49       ` Paolo Bonzini
  2015-05-19 18:37         ` Fam Zheng
  0 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2015-05-19  8:49 UTC (permalink / raw)
  To: Fam Zheng
  Cc: qemu-block, Michael S. Tsirkin, Juan Quintela, qemu-devel,
	mreitz, Stefan Hajnoczi, Amit Shah



On 19/05/2015 18:48, Fam Zheng wrote:
>> > 
>> > This is too late.  As a rule, the blocker must be established before
>> > calling bdrv_drain, and removed on the next yield (in this case, before
>> > the assignment to last_pause_ns).
> I don't understand. If the blocker is removed before mirror_run returns,
> wouldn't more device IO already hit the source image by the time mirror_exit runs?

If you go to mirror_exit, you won't reach the assignment (so you have to
remove the blocker in mirror_exit too).

But if you don't go to mirror_exit because cnt != 0, you must remove the
blocker before the next I/O.

Paolo


* Re: [Qemu-devel] [PATCH v4 13/13] block/mirror: Block "device IO" during mirror exit
  2015-05-19 18:37         ` Fam Zheng
@ 2015-05-19 10:57           ` Paolo Bonzini
  2015-05-20  2:23             ` Fam Zheng
  0 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2015-05-19 10:57 UTC (permalink / raw)
  To: Fam Zheng
  Cc: qemu-block, Michael S. Tsirkin, Juan Quintela, qemu-devel,
	mreitz, Stefan Hajnoczi, Amit Shah



On 19/05/2015 20:37, Fam Zheng wrote:
> On Tue, 05/19 10:49, Paolo Bonzini wrote:
>>
>>
>> On 19/05/2015 18:48, Fam Zheng wrote:
>>>>>
>>>>> This is too late.  As a rule, the blocker must be established before
>>>>> calling bdrv_drain, and removed on the next yield (in this case, before
>>>>> the assignment to last_pause_ns).
>>> I don't understand. If the blocker is removed before mirror_run returns,
>>> wouldn't more device IO already hit the source image by the time mirror_exit runs?
>>
>> If you go to mirror_exit, you won't reach the assignment (so you have to
>> remove the blocker in mirror_exit too).
>>
>> But if you don't go to mirror_exit because cnt != 0, you must remove the
>> blocker before the next I/O.
>>
> 
> OK, but I'm still not clear on how it is too late in this patch. Although the
> blocker is set after bdrv_drain, we know there is no dirty data because cnt is
> 0, and we'll be holding the blocker when releasing the AioContext, so no new
> IO is allowed.

So you rely on the caller of mirror_run not calling aio_context_release
between bdrv_drain and block_job_defer_to_main_loop?  That indeed should
work, but why not stick to a common pattern of blocking I/O before
bdrv_drain?  That's how bdrv_drain is almost always used in the code, so
it's a safe thing to do and the preemption points are then documented
more clearly.

I think it would be nice to have all bdrv_drain calls:

- either preceded by a device I/O blocker

- or preceded by a comment explaining why the call is there and why it
doesn't need the blocker

This is not a NACK, but I would like to understand the disadvantages of
what I am suggesting here.

Paolo


* [Qemu-devel] [PATCH v4 00/13] Fix transactional snapshot with dataplane and NBD export
@ 2015-05-19 11:49 Fam Zheng
  2015-05-19 11:49 ` [Qemu-devel] [PATCH v4 01/13] block: Add op blocker type "device IO" Fam Zheng
                   ` (12 more replies)
  0 siblings, 13 replies; 20+ messages in thread
From: Fam Zheng @ 2015-05-19 11:49 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, qemu-block, Juan Quintela, Jeff Cody,
	Michael S. Tsirkin, mreitz, Stefan Hajnoczi, Amit Shah,
	Paolo Bonzini

v4: virtio-scsi-dataplane: Use assert in ctrl/event queue handler. [Paolo]
    Protect mirror complete in new patch 13. [Wen]
    Add Max's rev-by in 02, 03, 04.
    Fix 05, 06 per Max's comments.

Reported by Paolo.

Unlike the iohandler in the main loop, iothreads currently process the event
notifier used by the virtio-blk ioeventfd in a nested aio_poll. This is
dangerous without proper protection, because guest requests could sneak into
the block layer where they mustn't.

For example, a QMP transaction may involve multiple bdrv_drain_all() calls
while handling the list of AioContexts it works on. If an aio_poll in one of
those bdrv_drain_all() calls happens to process a guest VQ kick and dispatches
the ioeventfd event to virtio-blk, a new guest write is submitted, and voilà,
the transaction semantics are violated.

This series avoids this problem by disabling virtio-blk handlers during
bdrv_drain_all() and transactions.

- Patches 1~3 add the block layer op blocker change notifier code.
- Patches 4,5 secure virtio-blk dataplane.
- Patch 6 protects virtio-scsi dataplane.
- Patch 7 secures nbd export.
- Patches 8~11 protect each transaction type from being violated by new IO
  generated in nested aio_poll.
- Patch 12 protects bdrv_drain and bdrv_drain_all.
- Patch 13 protects mirror complete.


Fam Zheng (13):
  block: Add op blocker type "device IO"
  block: Add op blocker notifier list
  block-backend: Add blk_op_blocker_add_notifier
  virtio-blk: Move complete_request to 'ops' structure
  virtio-blk: Don't handle output when there is "device IO" op blocker
  virtio-scsi-dataplane: Add "device IO" op blocker listener
  nbd-server: Clear "can_read" when "device io" blocker is set
  blockdev: Block device IO during internal snapshot transaction
  blockdev: Block device IO during external snapshot transaction
  blockdev: Block device IO during drive-backup transaction
  blockdev: Block device IO during blockdev-backup transaction
  block: Block "device IO" during bdrv_drain and bdrv_drain_all
  block/mirror: Block "device IO" during mirror exit

 block.c                         | 35 ++++++++++++++++++
 block/block-backend.c           |  6 +++
 block/io.c                      | 22 ++++++++++-
 block/mirror.c                  |  8 +++-
 blockdev.c                      | 49 ++++++++++++++++++++----
 blockjob.c                      |  1 +
 hw/block/dataplane/virtio-blk.c | 37 ++++++++++++++++---
 hw/block/virtio-blk.c           | 65 ++++++++++++++++++++++++++++++--
 hw/scsi/virtio-scsi-dataplane.c | 82 +++++++++++++++++++++++++++++++----------
 hw/scsi/virtio-scsi.c           |  4 ++
 include/block/block.h           |  9 +++++
 include/block/block_int.h       |  3 ++
 include/hw/virtio/virtio-blk.h  | 17 +++++++--
 include/hw/virtio/virtio-scsi.h |  3 ++
 include/sysemu/block-backend.h  |  2 +
 migration/block.c               |  1 +
 nbd.c                           | 24 ++++++++++++
 17 files changed, 327 insertions(+), 41 deletions(-)

-- 
2.4.1


* [Qemu-devel] [PATCH v4 01/13] block: Add op blocker type "device IO"
  2015-05-19 11:49 [Qemu-devel] [PATCH v4 00/13] Fix transactional snapshot with dataplane and NBD export Fam Zheng
@ 2015-05-19 11:49 ` Fam Zheng
  2015-05-19 11:49 ` [Qemu-devel] [PATCH v4 02/13] block: Add op blocker notifier list Fam Zheng
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Fam Zheng @ 2015-05-19 11:49 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, qemu-block, Juan Quintela, Jeff Cody,
	Michael S. Tsirkin, mreitz, Stefan Hajnoczi, Amit Shah,
	Paolo Bonzini

It blocks device IO.

All bdrv_op_block_all/blk_op_block_all callers are taken care of:

- virtio_blk_data_plane_create
- virtio_scsi_hotplug

  Device creation, unblock it.

- bdrv_set_backing_hd

  Backing hd is not used by device, so blocking is OK.

- backup_start

  The target is blocked while the backup is running; unblock it.

- mirror_complete

  Blocking s->to_replace until mirror_exit, OK.

- block_job_complete

  The block job may be long running. Unblock it.

- init_blk_migration

  The block migration may be long running. Unblock it.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 blockjob.c                      | 1 +
 hw/block/dataplane/virtio-blk.c | 1 +
 hw/scsi/virtio-scsi.c           | 1 +
 include/block/block.h           | 1 +
 migration/block.c               | 1 +
 5 files changed, 5 insertions(+)

diff --git a/blockjob.c b/blockjob.c
index 2755465..e39bdde 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -51,6 +51,7 @@ void *block_job_create(const BlockJobDriver *driver, BlockDriverState *bs,
                BlockJobType_lookup[driver->job_type]);
     bdrv_op_block_all(bs, job->blocker);
     bdrv_op_unblock(bs, BLOCK_OP_TYPE_DATAPLANE, job->blocker);
+    bdrv_op_unblock(bs, BLOCK_OP_TYPE_DEVICE_IO, job->blocker);
 
     job->driver        = driver;
     job->bs            = bs;
diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 3db139b..3ecc8bd 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -209,6 +209,7 @@ void virtio_blk_data_plane_create(VirtIODevice *vdev, VirtIOBlkConf *conf,
     blk_op_unblock(conf->conf.blk, BLOCK_OP_TYPE_MIRROR, s->blocker);
     blk_op_unblock(conf->conf.blk, BLOCK_OP_TYPE_STREAM, s->blocker);
     blk_op_unblock(conf->conf.blk, BLOCK_OP_TYPE_REPLACE, s->blocker);
+    blk_op_unblock(conf->conf.blk, BLOCK_OP_TYPE_DEVICE_IO, s->blocker);
 
     *dataplane = s;
 }
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index e242fef..5e15fa6 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -772,6 +772,7 @@ static void virtio_scsi_hotplug(HotplugHandler *hotplug_dev, DeviceState *dev,
             return;
         }
         blk_op_block_all(sd->conf.blk, s->blocker);
+        blk_op_unblock(sd->conf.blk, BLOCK_OP_TYPE_DEVICE_IO, s->blocker);
         aio_context_acquire(s->ctx);
         blk_set_aio_context(sd->conf.blk, s->ctx);
         aio_context_release(s->ctx);
diff --git a/include/block/block.h b/include/block/block.h
index 7d1a717..906fb31 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -159,6 +159,7 @@ typedef enum BlockOpType {
     BLOCK_OP_TYPE_RESIZE,
     BLOCK_OP_TYPE_STREAM,
     BLOCK_OP_TYPE_REPLACE,
+    BLOCK_OP_TYPE_DEVICE_IO,
     BLOCK_OP_TYPE_MAX,
 } BlockOpType;
 
diff --git a/migration/block.c b/migration/block.c
index ddb59cc..b833bac 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -379,6 +379,7 @@ static void init_blk_migration(QEMUFile *f)
         alloc_aio_bitmap(bmds);
         error_setg(&bmds->blocker, "block device is in use by migration");
         bdrv_op_block_all(bs, bmds->blocker);
+        bdrv_op_unblock(bs, BLOCK_OP_TYPE_DEVICE_IO, bmds->blocker);
         bdrv_ref(bs);
 
         block_mig_state.total_sector_sum += sectors;
-- 
2.4.1


* [Qemu-devel] [PATCH v4 02/13] block: Add op blocker notifier list
  2015-05-19 11:49 [Qemu-devel] [PATCH v4 00/13] Fix transactional snapshot with dataplane and NBD export Fam Zheng
  2015-05-19 11:49 ` [Qemu-devel] [PATCH v4 01/13] block: Add op blocker type "device IO" Fam Zheng
@ 2015-05-19 11:49 ` Fam Zheng
  2015-05-19 11:49 ` [Qemu-devel] [PATCH v4 03/13] block-backend: Add blk_op_blocker_add_notifier Fam Zheng
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Fam Zheng @ 2015-05-19 11:49 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, qemu-block, Juan Quintela, Jeff Cody,
	Michael S. Tsirkin, mreitz, Stefan Hajnoczi, Amit Shah,
	Paolo Bonzini

BDS users can register a notifier and get notified about op blocker
changes.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 block.c                   | 35 +++++++++++++++++++++++++++++++++++
 include/block/block.h     |  8 ++++++++
 include/block/block_int.h |  3 +++
 3 files changed, 46 insertions(+)

diff --git a/block.c b/block.c
index 7904098..9714603 100644
--- a/block.c
+++ b/block.c
@@ -3375,6 +3375,19 @@ struct BdrvOpBlocker {
     QLIST_ENTRY(BdrvOpBlocker) list;
 };
 
+/* Add a notifier which will be notified when the first blocker of some type is
+ * added to bs, or when the last blocker is removed. The notifier handler
+ * should receive a BlockOpEvent data.
+ *
+ * Caller must hold bs->aio_context; the notifier handler is also called with
+ * it held.
+ */
+void bdrv_op_blocker_add_notifier(BlockDriverState *bs,
+                                  Notifier *notifier)
+{
+    notifier_list_add(&bs->op_blocker_notifiers, notifier);
+}
+
 bool bdrv_op_is_blocked(BlockDriverState *bs, BlockOpType op, Error **errp)
 {
     BdrvOpBlocker *blocker;
@@ -3391,26 +3404,48 @@ bool bdrv_op_is_blocked(BlockDriverState *bs, BlockOpType op, Error **errp)
     return false;
 }
 
+static void bdrv_op_blocker_notify(BlockDriverState *bs, BlockOpType op,
+                                   Error *reason, bool blocking)
+{
> +    BlockOpEvent event = {
> +        .type = op,
> +        .reason = reason,
> +        .blocking = blocking,
> +    };
+
+    notifier_list_notify(&bs->op_blocker_notifiers, &event);
+}
+
 void bdrv_op_block(BlockDriverState *bs, BlockOpType op, Error *reason)
 {
+    bool blocked;
     BdrvOpBlocker *blocker;
     assert((int) op >= 0 && op < BLOCK_OP_TYPE_MAX);
 
     blocker = g_new0(BdrvOpBlocker, 1);
     blocker->reason = reason;
+    blocked = !QLIST_EMPTY(&bs->op_blockers[op]);
     QLIST_INSERT_HEAD(&bs->op_blockers[op], blocker, list);
+    if (!blocked) {
+        bdrv_op_blocker_notify(bs, op, reason, true);
+    }
 }
 
 void bdrv_op_unblock(BlockDriverState *bs, BlockOpType op, Error *reason)
 {
+    bool blocked;
     BdrvOpBlocker *blocker, *next;
     assert((int) op >= 0 && op < BLOCK_OP_TYPE_MAX);
+    blocked = !QLIST_EMPTY(&bs->op_blockers[op]);
     QLIST_FOREACH_SAFE(blocker, &bs->op_blockers[op], list, next) {
         if (blocker->reason == reason) {
             QLIST_REMOVE(blocker, list);
             g_free(blocker);
         }
     }
+    if (blocked && QLIST_EMPTY(&bs->op_blockers[op])) {
+        bdrv_op_blocker_notify(bs, op, reason, false);
+    }
 }
 
 void bdrv_op_block_all(BlockDriverState *bs, Error *reason)
diff --git a/include/block/block.h b/include/block/block.h
index 906fb31..3420b2c 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -163,6 +163,12 @@ typedef enum BlockOpType {
     BLOCK_OP_TYPE_MAX,
 } BlockOpType;
 
+typedef struct {
+    BlockOpType type;
+    Error *reason;
+    bool blocking;
+} BlockOpEvent;
+
 void bdrv_iostatus_enable(BlockDriverState *bs);
 void bdrv_iostatus_reset(BlockDriverState *bs);
 void bdrv_iostatus_disable(BlockDriverState *bs);
@@ -491,6 +497,8 @@ void bdrv_disable_copy_on_read(BlockDriverState *bs);
 void bdrv_ref(BlockDriverState *bs);
 void bdrv_unref(BlockDriverState *bs);
 
+void bdrv_op_blocker_add_notifier(BlockDriverState *bs,
+                                  Notifier *notifier);
 bool bdrv_op_is_blocked(BlockDriverState *bs, BlockOpType op, Error **errp);
 void bdrv_op_block(BlockDriverState *bs, BlockOpType op, Error *reason);
 void bdrv_op_unblock(BlockDriverState *bs, BlockOpType op, Error *reason);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index db29b74..29d1c84 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -418,6 +418,9 @@ struct BlockDriverState {
     /* operation blockers */
     QLIST_HEAD(, BdrvOpBlocker) op_blockers[BLOCK_OP_TYPE_MAX];
 
+    /* Callback when one list in op_blockers has "empty" status change. */
+    NotifierList op_blocker_notifiers;
+
     /* long-running background operation */
     BlockJob *job;
 
-- 
2.4.1


* [Qemu-devel] [PATCH v4 03/13] block-backend: Add blk_op_blocker_add_notifier
  2015-05-19 11:49 [Qemu-devel] [PATCH v4 00/13] Fix transactional snapshot with dataplane and NBD export Fam Zheng
  2015-05-19 11:49 ` [Qemu-devel] [PATCH v4 01/13] block: Add op blocker type "device IO" Fam Zheng
  2015-05-19 11:49 ` [Qemu-devel] [PATCH v4 02/13] block: Add op blocker notifier list Fam Zheng
@ 2015-05-19 11:49 ` Fam Zheng
  2015-05-19 11:49 ` [Qemu-devel] [PATCH v4 04/13] virtio-blk: Move complete_request to 'ops' structure Fam Zheng
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Fam Zheng @ 2015-05-19 11:49 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, qemu-block, Juan Quintela, Jeff Cody,
	Michael S. Tsirkin, mreitz, Stefan Hajnoczi, Amit Shah,
	Paolo Bonzini

Forward the call to bdrv_op_blocker_add_notifier.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 block/block-backend.c          | 6 ++++++
 include/sysemu/block-backend.h | 2 ++
 2 files changed, 8 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index 93e46f3..16efe60 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -802,6 +802,12 @@ void blk_op_unblock_all(BlockBackend *blk, Error *reason)
     bdrv_op_unblock_all(blk->bs, reason);
 }
 
+void blk_op_blocker_add_notifier(BlockBackend *blk,
+                                 Notifier *notifier)
+{
+    bdrv_op_blocker_add_notifier(blk->bs, notifier);
+}
+
 AioContext *blk_get_aio_context(BlockBackend *blk)
 {
     return bdrv_get_aio_context(blk->bs);
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index b4a4d5e..cde9651 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -136,6 +136,8 @@ int blk_get_flags(BlockBackend *blk);
 int blk_get_max_transfer_length(BlockBackend *blk);
 void blk_set_guest_block_size(BlockBackend *blk, int align);
 void *blk_blockalign(BlockBackend *blk, size_t size);
+void blk_op_blocker_add_notifier(BlockBackend *blk,
+                                 Notifier *notifier);
 bool blk_op_is_blocked(BlockBackend *blk, BlockOpType op, Error **errp);
 void blk_op_unblock(BlockBackend *blk, BlockOpType op, Error *reason);
 void blk_op_block_all(BlockBackend *blk, Error *reason);
-- 
2.4.1


* [Qemu-devel] [PATCH v4 04/13] virtio-blk: Move complete_request to 'ops' structure
  2015-05-19 11:49 [Qemu-devel] [PATCH v4 00/13] Fix transactional snapshot with dataplane and NBD export Fam Zheng
                   ` (2 preceding siblings ...)
  2015-05-19 11:49 ` [Qemu-devel] [PATCH v4 03/13] block-backend: Add blk_op_blocker_add_notifier Fam Zheng
@ 2015-05-19 11:49 ` Fam Zheng
  2015-05-19 11:49 ` [Qemu-devel] [PATCH v4 05/13] virtio-blk: Don't handle output when there is "device IO" op blocker Fam Zheng
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Fam Zheng @ 2015-05-19 11:49 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, qemu-block, Juan Quintela, Jeff Cody,
	Michael S. Tsirkin, mreitz, Stefan Hajnoczi, Amit Shah,
	Paolo Bonzini

Should more ops be added later to differentiate dataplane from non-dataplane
code, the new saved_ops approach will be cleaner than juggling N separate
function pointers.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 hw/block/dataplane/virtio-blk.c | 13 ++++++++-----
 hw/block/virtio-blk.c           |  8 ++++++--
 include/hw/virtio/virtio-blk.h  |  9 +++++++--
 3 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 3ecc8bd..e287e09 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -51,8 +51,7 @@ struct VirtIOBlockDataPlane {
 
     /* Operation blocker on BDS */
     Error *blocker;
-    void (*saved_complete_request)(struct VirtIOBlockReq *req,
-                                   unsigned char status);
+    const VirtIOBlockOps *saved_ops;
 };
 
 /* Raise an interrupt to signal guest, if necessary */
@@ -88,6 +87,10 @@ static void complete_request_vring(VirtIOBlockReq *req, unsigned char status)
     qemu_bh_schedule(s->bh);
 }
 
+static const VirtIOBlockOps virtio_blk_data_plane_ops = {
+    .complete_request = complete_request_vring,
+};
+
 static void handle_notify(EventNotifier *e)
 {
     VirtIOBlockDataPlane *s = container_of(e, VirtIOBlockDataPlane,
@@ -270,8 +273,8 @@ void virtio_blk_data_plane_start(VirtIOBlockDataPlane *s)
     }
     s->host_notifier = *virtio_queue_get_host_notifier(vq);
 
-    s->saved_complete_request = vblk->complete_request;
-    vblk->complete_request = complete_request_vring;
+    s->saved_ops = vblk->ops;
+    vblk->ops = &virtio_blk_data_plane_ops;
 
     s->starting = false;
     s->started = true;
@@ -314,7 +317,7 @@ void virtio_blk_data_plane_stop(VirtIOBlockDataPlane *s)
         return;
     }
     s->stopping = true;
-    vblk->complete_request = s->saved_complete_request;
+    vblk->ops = s->saved_ops;
     trace_virtio_blk_data_plane_stop(s);
 
     aio_context_acquire(s->ctx);
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index e6afe97..f4a9d19 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -59,9 +59,13 @@ static void virtio_blk_complete_request(VirtIOBlockReq *req,
     virtio_notify(vdev, s->vq);
 }
 
+static const VirtIOBlockOps virtio_blk_ops = (VirtIOBlockOps) {
+    .complete_request = virtio_blk_complete_request,
+};
+
 static void virtio_blk_req_complete(VirtIOBlockReq *req, unsigned char status)
 {
-    req->dev->complete_request(req, status);
+    req->dev->ops->complete_request(req, status);
 }
 
 static int virtio_blk_handle_rw_error(VirtIOBlockReq *req, int error,
@@ -905,7 +909,7 @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp)
     s->sector_mask = (s->conf.conf.logical_block_size / BDRV_SECTOR_SIZE) - 1;
 
     s->vq = virtio_add_queue(vdev, 128, virtio_blk_handle_output);
-    s->complete_request = virtio_blk_complete_request;
+    s->ops = &virtio_blk_ops;
     virtio_blk_data_plane_create(vdev, conf, &s->dataplane, &err);
     if (err != NULL) {
         error_propagate(errp, err);
diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index 6bf5905..28b3436 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -44,6 +44,12 @@ struct VirtIOBlkConf
 struct VirtIOBlockDataPlane;
 
 struct VirtIOBlockReq;
+
+typedef struct {
+    /* Function to push to vq and notify guest */
+    void (*complete_request)(struct VirtIOBlockReq *req, unsigned char status);
+} VirtIOBlockOps;
+
 typedef struct VirtIOBlock {
     VirtIODevice parent_obj;
     BlockBackend *blk;
@@ -54,8 +60,7 @@ typedef struct VirtIOBlock {
     unsigned short sector_mask;
     bool original_wce;
     VMChangeStateEntry *change;
-    /* Function to push to vq and notify guest */
-    void (*complete_request)(struct VirtIOBlockReq *req, unsigned char status);
+    const VirtIOBlockOps *ops;
     Notifier migration_state_notifier;
     struct VirtIOBlockDataPlane *dataplane;
 } VirtIOBlock;
-- 
2.4.1


* [Qemu-devel] [PATCH v4 05/13] virtio-blk: Don't handle output when there is "device IO" op blocker
  2015-05-19 11:49 [Qemu-devel] [PATCH v4 00/13] Fix transactional snapshot with dataplane and NBD export Fam Zheng
                   ` (3 preceding siblings ...)
  2015-05-19 11:49 ` [Qemu-devel] [PATCH v4 04/13] virtio-blk: Move complete_request to 'ops' structure Fam Zheng
@ 2015-05-19 11:49 ` Fam Zheng
  2015-05-19 11:49 ` [Qemu-devel] [PATCH v4 06/13] virtio-scsi-dataplane: Add "device IO" op blocker listener Fam Zheng
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Fam Zheng @ 2015-05-19 11:49 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, qemu-block, Juan Quintela, Jeff Cody,
	Michael S. Tsirkin, mreitz, Stefan Hajnoczi, Amit Shah,
	Paolo Bonzini

virtio-blk now listens for op blocker changes on the associated block
backend.

Upon setting an op blocker on BLOCK_OP_TYPE_DEVICE_IO:

  non-dataplane:
   1) Set VirtIOBlock.paused
   2) In virtio_blk_handle_output, do nothing if VirtIOBlock.paused

  dataplane:
   1) Clear the host event notifier
   2) In handle_notify, do nothing if VirtIOBlock.paused

Upon removing the op blocker:

  non-dataplane:
   1) Clear VirtIOBlock.paused
   2) Schedule a BH on the AioContext of the backend, which calls
   virtio_blk_handle_output, so that the previous unhandled kicks can
   make progress

  dataplane:
   1) Set the host event notifier
   2) Notify the host event notifier so that unhandled events are
   processed

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 hw/block/dataplane/virtio-blk.c | 25 ++++++++++++++++-
 hw/block/virtio-blk.c           | 59 +++++++++++++++++++++++++++++++++++++++--
 include/hw/virtio/virtio-blk.h  |  8 +++++-
 3 files changed, 88 insertions(+), 4 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index e287e09..a5e8e35 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -87,8 +87,28 @@ static void complete_request_vring(VirtIOBlockReq *req, unsigned char status)
     qemu_bh_schedule(s->bh);
 }
 
+static void virtio_blk_data_plane_pause(VirtIOBlock *vblk)
+{
+    VirtIOBlockDataPlane *s = vblk->dataplane;
+
+    event_notifier_test_and_clear(&s->host_notifier);
+    aio_set_event_notifier(s->ctx, &s->host_notifier, NULL);
+}
+
+static void handle_notify(EventNotifier *e);
+static void virtio_blk_data_plane_resume(VirtIOBlock *vblk)
+{
+    VirtIOBlockDataPlane *s = vblk->dataplane;
+
+    aio_set_event_notifier(s->ctx, &s->host_notifier, handle_notify);
+
+    event_notifier_set(&s->host_notifier);
+}
+
 static const VirtIOBlockOps virtio_blk_data_plane_ops = {
-    .complete_request = complete_request_vring,
+    .complete_request           = complete_request_vring,
+    .pause                      = virtio_blk_data_plane_pause,
+    .resume                     = virtio_blk_data_plane_resume,
 };
 
 static void handle_notify(EventNotifier *e)
@@ -98,6 +118,9 @@ static void handle_notify(EventNotifier *e)
     VirtIOBlock *vblk = VIRTIO_BLK(s->vdev);
 
     event_notifier_test_and_clear(&s->host_notifier);
+    if (vblk->paused) {
+        return;
+    }
     blk_io_plug(s->conf->conf.blk);
     for (;;) {
         MultiReqBuffer mrb = {};
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index f4a9d19..b4b19b5 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -59,8 +59,38 @@ static void virtio_blk_complete_request(VirtIOBlockReq *req,
     virtio_notify(vdev, s->vq);
 }
 
+typedef struct {
+    QEMUBH *bh;
+    VirtIOBlock *s;
+} VirtIOBlockResumeData;
+
+static void virtio_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq);
+static void virtio_blk_resume_bh_cb(void *opaque)
+{
+    VirtIOBlockResumeData *data = opaque;
+    qemu_bh_delete(data->bh);
+    virtio_blk_handle_output(VIRTIO_DEVICE(data->s), data->s->vq);
+}
+
+static void virtio_blk_pause(VirtIOBlock *vblk)
+{
+    /* TODO: stop ioeventfd */
+}
+
+static void virtio_blk_resume(VirtIOBlock *vblk)
+{
+    VirtIOBlockResumeData *data = g_new(VirtIOBlockResumeData, 1);
+    data->bh = aio_bh_new(blk_get_aio_context(vblk->blk),
+            virtio_blk_resume_bh_cb, data);
+    data->s = vblk;
+    data->s->paused = false;
+    qemu_bh_schedule(data->bh);
+}
+
 static const VirtIOBlockOps virtio_blk_ops = (VirtIOBlockOps) {
-    .complete_request = virtio_blk_complete_request,
+    .complete_request           = virtio_blk_complete_request,
+    .pause                      = virtio_blk_pause,
+    .resume                     = virtio_blk_resume,
 };
 
 static void virtio_blk_req_complete(VirtIOBlockReq *req, unsigned char status)
@@ -597,6 +627,9 @@ static void virtio_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
     VirtIOBlockReq *req;
     MultiReqBuffer mrb = {};
 
+    if (s->paused) {
+        return;
+    }
     /* Some guests kick before setting VIRTIO_CONFIG_S_DRIVER_OK so start
      * dataplane here instead of waiting for .set_status().
      */
@@ -787,7 +820,7 @@ static void virtio_blk_save(QEMUFile *f, void *opaque)
 
     virtio_save(vdev, f);
 }
-    
+
 static void virtio_blk_save_device(VirtIODevice *vdev, QEMUFile *f)
 {
     VirtIOBlock *s = VIRTIO_BLK(vdev);
@@ -875,6 +908,25 @@ static void virtio_blk_migration_state_changed(Notifier *notifier, void *data)
     }
 }
 
+static void virtio_blk_op_blocker_changed(Notifier *notifier, void *opaque)
+{
+    BlockOpEvent *event = opaque;
+    VirtIOBlock *s = container_of(notifier, VirtIOBlock,
+                                  op_blocker_notifier);
+
+    if (event->type != BLOCK_OP_TYPE_DEVICE_IO) {
+        return;
+    }
+    if (event->blocking == s->paused) {
+        return;
+    }
+    if (event->blocking) {
+        s->ops->pause(s);
+    } else {
+        s->ops->resume(s);
+    }
+}
+
 static void virtio_blk_device_realize(DeviceState *dev, Error **errp)
 {
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
@@ -926,6 +978,9 @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp)
     blk_set_guest_block_size(s->blk, s->conf.conf.logical_block_size);
 
     blk_iostatus_enable(s->blk);
+
+    s->op_blocker_notifier.notify = virtio_blk_op_blocker_changed;
+    blk_op_blocker_add_notifier(s->blk, &s->op_blocker_notifier);
 }
 
 static void virtio_blk_device_unrealize(DeviceState *dev, Error **errp)
diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index 28b3436..aa15fea 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -42,12 +42,16 @@ struct VirtIOBlkConf
 };
 
 struct VirtIOBlockDataPlane;
-
+struct VirtIOBlock;
 struct VirtIOBlockReq;
 
 typedef struct {
     /* Function to push to vq and notify guest */
     void (*complete_request)(struct VirtIOBlockReq *req, unsigned char status);
+
+    /* Functions to pause/resume request handling */
+    void (*pause)(struct VirtIOBlock *vblk);
+    void (*resume)(struct VirtIOBlock *vblk);
 } VirtIOBlockOps;
 
 typedef struct VirtIOBlock {
@@ -62,6 +66,8 @@ typedef struct VirtIOBlock {
     VMChangeStateEntry *change;
     const VirtIOBlockOps *ops;
     Notifier migration_state_notifier;
+    Notifier op_blocker_notifier;
+    bool paused;
     struct VirtIOBlockDataPlane *dataplane;
 } VirtIOBlock;
 
-- 
2.4.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH v4 06/13] virtio-scsi-dataplane: Add "device IO" op blocker listener
  2015-05-19 11:49 [Qemu-devel] [PATCH v4 00/13] Fix transactional snapshot with dataplane and NBD export Fam Zheng
                   ` (4 preceding siblings ...)
  2015-05-19 11:49 ` [Qemu-devel] [PATCH v4 05/13] virtio-blk: Don't handle output when there is "device IO" op blocker Fam Zheng
@ 2015-05-19 11:49 ` Fam Zheng
  2015-05-19 11:49 ` [Qemu-devel] [PATCH v4 07/13] nbd-server: Clear "can_read" when "device io" blocker is set Fam Zheng
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Fam Zheng @ 2015-05-19 11:49 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, qemu-block, Juan Quintela, Jeff Cody,
	Michael S. Tsirkin, mreitz, Stefan Hajnoczi, Amit Shah,
	Paolo Bonzini

When a disk is attached to the SCSI bus, virtio_scsi_hotplug takes care
of protecting the block device with op blockers. We haven't enabled
block jobs yet (like what's done in virtio_blk_data_plane_create), but
the "device IO" op blocker must be honored before we do. This makes sure
that guest IO requests are paused during QMP transactions (such as a
multi-disk snapshot or backup).

A counter is added to the virtio-scsi device, which keeps track of
currently blocked disks. If it goes from 0 to 1, the ioeventfds are
disabled; when it goes back to 0, they are re-enabled.

Also, in device initialization, move the enabling of ioeventfds to just
before returning, so that virtio_scsi_clear_aio is no longer needed
there. Rename it, pair it with an enabling variant, fix one coding style
issue, and use the pair at the device pause points.
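
The counter transitions described above can be sketched as a minimal
standalone model (this is illustration only, not QEMU code; the names
`PauseCounterDev` and `dev_blocker_notify` are made up, standing in for
the VirtIOSCSI state and virtio_scsi_dataplane_blocker_notify):

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal model of the per-device pause counter: ioeventfd handlers are
 * disabled on the 0 -> 1 transition and re-enabled on the 1 -> 0
 * transition, so nested blockers from several disks compose correctly. */
typedef struct {
    int pause_counter;
    bool ioeventfd_enabled;
} PauseCounterDev;

static void dev_blocker_notify(PauseCounterDev *d, bool blocking)
{
    if (blocking) {
        if (++d->pause_counter == 1) {
            d->ioeventfd_enabled = false;   /* stop ioeventfd handlers */
        }
    } else {
        if (--d->pause_counter == 0) {
            d->ioeventfd_enabled = true;    /* restart ioeventfd handlers */
        }
    }
    assert(d->pause_counter >= 0);
}
```

Note that with two attached disks blocked in turn, the handlers only come
back when the last blocker is removed.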

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 hw/scsi/virtio-scsi-dataplane.c | 82 +++++++++++++++++++++++++++++++----------
 hw/scsi/virtio-scsi.c           |  3 ++
 include/hw/virtio/virtio-scsi.h |  3 ++
 3 files changed, 68 insertions(+), 20 deletions(-)

diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c
index 5575648..e220c12 100644
--- a/hw/scsi/virtio-scsi-dataplane.c
+++ b/hw/scsi/virtio-scsi-dataplane.c
@@ -40,7 +40,6 @@ void virtio_scsi_set_iothread(VirtIOSCSI *s, IOThread *iothread)
 
 static VirtIOSCSIVring *virtio_scsi_vring_init(VirtIOSCSI *s,
                                                VirtQueue *vq,
-                                               EventNotifierHandler *handler,
                                                int n)
 {
     BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(s)));
@@ -60,7 +59,6 @@ static VirtIOSCSIVring *virtio_scsi_vring_init(VirtIOSCSI *s,
     r = g_slice_new(VirtIOSCSIVring);
     r->host_notifier = *virtio_queue_get_host_notifier(vq);
     r->guest_notifier = *virtio_queue_get_guest_notifier(vq);
-    aio_set_event_notifier(s->ctx, &r->host_notifier, handler);
 
     r->parent = s;
 
@@ -71,7 +69,6 @@ static VirtIOSCSIVring *virtio_scsi_vring_init(VirtIOSCSI *s,
     return r;
 
 fail_vring:
-    aio_set_event_notifier(s->ctx, &r->host_notifier, NULL);
     k->set_host_notifier(qbus->parent, n, false);
     g_slice_free(VirtIOSCSIVring, r);
     return NULL;
@@ -104,6 +101,9 @@ void virtio_scsi_vring_push_notify(VirtIOSCSIReq *req)
     }
 }
 
+static void virtio_scsi_start_ioeventfd(VirtIOSCSI *s);
+static void virtio_scsi_stop_ioeventfd(VirtIOSCSI *s);
+
 static void virtio_scsi_iothread_handle_ctrl(EventNotifier *notifier)
 {
     VirtIOSCSIVring *vring = container_of(notifier,
@@ -111,6 +111,7 @@ static void virtio_scsi_iothread_handle_ctrl(EventNotifier *notifier)
     VirtIOSCSI *s = VIRTIO_SCSI(vring->parent);
     VirtIOSCSIReq *req;
 
+    assert(!s->pause_counter);
     event_notifier_test_and_clear(notifier);
     while ((req = virtio_scsi_pop_req_vring(s, vring))) {
         virtio_scsi_handle_ctrl_req(s, req);
@@ -124,6 +125,7 @@ static void virtio_scsi_iothread_handle_event(EventNotifier *notifier)
     VirtIOSCSI *s = vring->parent;
     VirtIODevice *vdev = VIRTIO_DEVICE(s);
 
+    assert(!s->pause_counter);
     event_notifier_test_and_clear(notifier);
 
     if (!(vdev->status & VIRTIO_CONFIG_S_DRIVER_OK)) {
@@ -143,6 +145,7 @@ static void virtio_scsi_iothread_handle_cmd(EventNotifier *notifier)
     VirtIOSCSIReq *req, *next;
     QTAILQ_HEAD(, VirtIOSCSIReq) reqs = QTAILQ_HEAD_INITIALIZER(reqs);
 
+    assert(!s->pause_counter);
     event_notifier_test_and_clear(notifier);
     while ((req = virtio_scsi_pop_req_vring(s, vring))) {
         if (virtio_scsi_handle_cmd_req_prepare(s, req)) {
@@ -155,8 +158,56 @@ static void virtio_scsi_iothread_handle_cmd(EventNotifier *notifier)
     }
 }
 
+void virtio_scsi_dataplane_blocker_notify(Notifier *notifier,
+                                          void *data)
+{
+    VirtIOSCSI *s = container_of(notifier, VirtIOSCSI, blocker_notifier);
+    BlockOpEvent *event = data;
+
+    if (event->type != BLOCK_OP_TYPE_DEVICE_IO) {
+        return;
+    }
+    if (event->blocking) {
+        s->pause_counter++;
+        if (s->pause_counter == 1) {
+            virtio_scsi_stop_ioeventfd(s);
+        }
+    } else {
+        s->pause_counter--;
+        if (s->pause_counter == 0) {
+            virtio_scsi_start_ioeventfd(s);
+        }
+    }
+    assert(s->pause_counter >= 0);
+}
+
 /* assumes s->ctx held */
-static void virtio_scsi_clear_aio(VirtIOSCSI *s)
+static void virtio_scsi_start_ioeventfd(VirtIOSCSI *s)
+{
+    VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(s);
+    int i;
+
+    if (!s->dataplane_started || s->dataplane_stopping) {
+        return;
+    }
+    if (s->ctrl_vring) {
+        aio_set_event_notifier(s->ctx, &s->ctrl_vring->host_notifier,
+                               virtio_scsi_iothread_handle_ctrl);
+    }
+    if (s->event_vring) {
+        aio_set_event_notifier(s->ctx, &s->event_vring->host_notifier,
+                               virtio_scsi_iothread_handle_event);
+    }
+    if (s->cmd_vrings) {
+        for (i = 0; i < vs->conf.num_queues && s->cmd_vrings[i]; i++) {
+            aio_set_event_notifier(s->ctx, &s->cmd_vrings[i]->host_notifier,
+                                   virtio_scsi_iothread_handle_cmd);
+        }
+    }
+}
+
+/* assumes s->ctx held */
+static void virtio_scsi_stop_ioeventfd(VirtIOSCSI *s)
 {
     VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(s);
     int i;
@@ -169,7 +220,8 @@ static void virtio_scsi_clear_aio(VirtIOSCSI *s)
     }
     if (s->cmd_vrings) {
         for (i = 0; i < vs->conf.num_queues && s->cmd_vrings[i]; i++) {
-            aio_set_event_notifier(s->ctx, &s->cmd_vrings[i]->host_notifier, NULL);
+            aio_set_event_notifier(s->ctx, &s->cmd_vrings[i]->host_notifier,
+                                   NULL);
         }
     }
 }
@@ -229,24 +281,18 @@ void virtio_scsi_dataplane_start(VirtIOSCSI *s)
     }
 
     aio_context_acquire(s->ctx);
-    s->ctrl_vring = virtio_scsi_vring_init(s, vs->ctrl_vq,
-                                           virtio_scsi_iothread_handle_ctrl,
-                                           0);
+    s->ctrl_vring = virtio_scsi_vring_init(s, vs->ctrl_vq, 0);
     if (!s->ctrl_vring) {
         goto fail_vrings;
     }
-    s->event_vring = virtio_scsi_vring_init(s, vs->event_vq,
-                                            virtio_scsi_iothread_handle_event,
-                                            1);
+    s->event_vring = virtio_scsi_vring_init(s, vs->event_vq, 1);
     if (!s->event_vring) {
         goto fail_vrings;
     }
     s->cmd_vrings = g_new(VirtIOSCSIVring *, vs->conf.num_queues);
     for (i = 0; i < vs->conf.num_queues; i++) {
         s->cmd_vrings[i] =
-            virtio_scsi_vring_init(s, vs->cmd_vqs[i],
-                                   virtio_scsi_iothread_handle_cmd,
-                                   i + 2);
+            virtio_scsi_vring_init(s, vs->cmd_vqs[i], i + 2);
         if (!s->cmd_vrings[i]) {
             goto fail_vrings;
         }
@@ -254,11 +300,11 @@ void virtio_scsi_dataplane_start(VirtIOSCSI *s)
 
     s->dataplane_starting = false;
     s->dataplane_started = true;
+    virtio_scsi_start_ioeventfd(s);
     aio_context_release(s->ctx);
     return;
 
 fail_vrings:
-    virtio_scsi_clear_aio(s);
     aio_context_release(s->ctx);
     virtio_scsi_vring_teardown(s);
     for (i = 0; i < vs->conf.num_queues + 2; i++) {
@@ -290,11 +336,7 @@ void virtio_scsi_dataplane_stop(VirtIOSCSI *s)
 
     aio_context_acquire(s->ctx);
 
-    aio_set_event_notifier(s->ctx, &s->ctrl_vring->host_notifier, NULL);
-    aio_set_event_notifier(s->ctx, &s->event_vring->host_notifier, NULL);
-    for (i = 0; i < vs->conf.num_queues; i++) {
-        aio_set_event_notifier(s->ctx, &s->cmd_vrings[i]->host_notifier, NULL);
-    }
+    virtio_scsi_stop_ioeventfd(s);
 
     blk_drain_all(); /* ensure there are no in-flight requests */
 
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 5e15fa6..16c253f 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -775,6 +775,8 @@ static void virtio_scsi_hotplug(HotplugHandler *hotplug_dev, DeviceState *dev,
         blk_op_unblock(sd->conf.blk, BLOCK_OP_TYPE_DEVICE_IO, s->blocker);
         aio_context_acquire(s->ctx);
         blk_set_aio_context(sd->conf.blk, s->ctx);
+        s->blocker_notifier.notify = virtio_scsi_dataplane_blocker_notify;
+        blk_op_blocker_add_notifier(sd->conf.blk, &s->blocker_notifier);
         aio_context_release(s->ctx);
     }
 
@@ -799,6 +801,7 @@ static void virtio_scsi_hotunplug(HotplugHandler *hotplug_dev, DeviceState *dev,
     }
 
     if (s->ctx) {
+        notifier_remove(&s->blocker_notifier);
         blk_op_unblock_all(sd->conf.blk, s->blocker);
     }
     qdev_simple_device_unplug_cb(hotplug_dev, dev, errp);
diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h
index b42e7f1..f8591fd 100644
--- a/include/hw/virtio/virtio-scsi.h
+++ b/include/hw/virtio/virtio-scsi.h
@@ -97,6 +97,8 @@ typedef struct VirtIOSCSI {
     bool dataplane_disabled;
     bool dataplane_fenced;
     Error *blocker;
+    Notifier blocker_notifier;
+    int pause_counter;
     Notifier migration_state_notifier;
     uint32_t host_features;
 } VirtIOSCSI;
@@ -170,6 +172,7 @@ void virtio_scsi_push_event(VirtIOSCSI *s, SCSIDevice *dev,
                             uint32_t event, uint32_t reason);
 
 void virtio_scsi_set_iothread(VirtIOSCSI *s, IOThread *iothread);
+void virtio_scsi_dataplane_blocker_notify(Notifier *notifier, void *data);
 void virtio_scsi_dataplane_start(VirtIOSCSI *s);
 void virtio_scsi_dataplane_stop(VirtIOSCSI *s);
 void virtio_scsi_vring_push_notify(VirtIOSCSIReq *req);
-- 
2.4.1

* [Qemu-devel] [PATCH v4 07/13] nbd-server: Clear "can_read" when "device io" blocker is set
  2015-05-19 11:49 [Qemu-devel] [PATCH v4 00/13] Fix transactional snapshot with dataplane and NBD export Fam Zheng
                   ` (5 preceding siblings ...)
  2015-05-19 11:49 ` [Qemu-devel] [PATCH v4 06/13] virtio-scsi-dataplane: Add "device IO" op blocker listener Fam Zheng
@ 2015-05-19 11:49 ` Fam Zheng
  2015-05-19 11:49 ` [Qemu-devel] [PATCH v4 08/13] blockdev: Block device IO during internal snapshot transaction Fam Zheng
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Fam Zheng @ 2015-05-19 11:49 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, qemu-block, Juan Quintela, Jeff Cody,
	Michael S. Tsirkin, mreitz, Stefan Hajnoczi, Amit Shah,
	Paolo Bonzini

So that the NBD export cannot submit IO during bdrv_drain_all().
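
The resulting predicate can be modeled in isolation (an illustrative
sketch only; `nbd_client_can_read` here is a hypothetical free function
mirroring the patched nbd_update_can_read logic):

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NBD_REQUESTS 16

/* Minimal model of the patched can_read computation: the export's
 * "device IO" blocker overrides the usual queue-depth check, so a
 * blocked export never fetches new requests from the socket. */
static bool nbd_client_can_read(bool has_recv_coroutine, int nb_requests,
                                bool export_io_blocked)
{
    bool can_read = has_recv_coroutine || nb_requests < MAX_NBD_REQUESTS;

    if (export_io_blocked) {
        can_read = false;   /* blocker wins unconditionally */
    }
    return can_read;
}
```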

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 nbd.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/nbd.c b/nbd.c
index 06b501b..b5026af 100644
--- a/nbd.c
+++ b/nbd.c
@@ -160,6 +160,8 @@ struct NBDExport {
     uint32_t nbdflags;
     QTAILQ_HEAD(, NBDClient) clients;
     QTAILQ_ENTRY(NBDExport) next;
+    Notifier blocker_notifier;
+    bool io_blocked;
 
     AioContext *ctx;
 };
@@ -1053,6 +1055,22 @@ static void blk_aio_detach(void *opaque)
     exp->ctx = NULL;
 }
 
+static void nbd_op_blocker_changed(Notifier *notifier, void *data)
+{
+    BlockOpEvent *event = data;
+    NBDClient *client;
+    NBDExport *exp = container_of(notifier, NBDExport, blocker_notifier);
+
+    if (event->type != BLOCK_OP_TYPE_DEVICE_IO) {
+        return;
+    }
+    exp->io_blocked = event->blocking;
+
+    QTAILQ_FOREACH(client, &exp->clients, next) {
+        nbd_update_can_read(client);
+    }
+}
+
 NBDExport *nbd_export_new(BlockBackend *blk, off_t dev_offset, off_t size,
                           uint32_t nbdflags, void (*close)(NBDExport *),
                           Error **errp)
@@ -1081,6 +1099,8 @@ NBDExport *nbd_export_new(BlockBackend *blk, off_t dev_offset, off_t size,
      * access since the export could be available before migration handover.
      */
     blk_invalidate_cache(blk, NULL);
+    exp->blocker_notifier.notify = nbd_op_blocker_changed;
+    blk_op_blocker_add_notifier(blk, &exp->blocker_notifier);
     return exp;
 
 fail:
@@ -1132,6 +1152,7 @@ void nbd_export_close(NBDExport *exp)
     nbd_export_set_name(exp, NULL);
     nbd_export_put(exp);
     if (exp->blk) {
+        notifier_remove(&exp->blocker_notifier);
         blk_remove_aio_context_notifier(exp->blk, blk_aio_attached,
                                         blk_aio_detach, exp);
         blk_unref(exp->blk);
@@ -1455,6 +1476,9 @@ static void nbd_update_can_read(NBDClient *client)
     bool can_read = client->recv_coroutine ||
                     client->nb_requests < MAX_NBD_REQUESTS;
 
+    if (client->exp && client->exp->io_blocked) {
+        can_read = false;
+    }
     if (can_read != client->can_read) {
         client->can_read = can_read;
         nbd_set_handlers(client);
-- 
2.4.1

* [Qemu-devel] [PATCH v4 08/13] blockdev: Block device IO during internal snapshot transaction
  2015-05-19 11:49 [Qemu-devel] [PATCH v4 00/13] Fix transactional snapshot with dataplane and NBD export Fam Zheng
                   ` (6 preceding siblings ...)
  2015-05-19 11:49 ` [Qemu-devel] [PATCH v4 07/13] nbd-server: Clear "can_read" when "device io" blocker is set Fam Zheng
@ 2015-05-19 11:49 ` Fam Zheng
  2015-05-19 11:49 ` [Qemu-devel] [PATCH v4 09/13] blockdev: Block device IO during external " Fam Zheng
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Fam Zheng @ 2015-05-19 11:49 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, qemu-block, Juan Quintela, Jeff Cody,
	Michael S. Tsirkin, mreitz, Stefan Hajnoczi, Amit Shah,
	Paolo Bonzini

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 blockdev.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 5eaf77e..7f763d9 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1262,6 +1262,7 @@ typedef struct InternalSnapshotState {
     BlockDriverState *bs;
     AioContext *aio_context;
     QEMUSnapshotInfo sn;
+    Error *blocker;
 } InternalSnapshotState;
 
 static void internal_snapshot_prepare(BlkTransactionState *common,
@@ -1300,6 +1301,10 @@ static void internal_snapshot_prepare(BlkTransactionState *common,
     state->aio_context = bdrv_get_aio_context(bs);
     aio_context_acquire(state->aio_context);
 
+    state->bs = bs;
+    error_setg(&state->blocker, "internal snapshot in progress");
+    bdrv_op_block(bs, BLOCK_OP_TYPE_DEVICE_IO, state->blocker);
+
     if (!bdrv_is_inserted(bs)) {
         error_set(errp, QERR_DEVICE_HAS_NO_MEDIUM, device);
         return;
@@ -1354,9 +1359,6 @@ static void internal_snapshot_prepare(BlkTransactionState *common,
                          name, device);
         return;
     }
-
-    /* 4. succeed, mark a snapshot is created */
-    state->bs = bs;
 }
 
 static void internal_snapshot_abort(BlkTransactionState *common)
@@ -1387,6 +1389,10 @@ static void internal_snapshot_clean(BlkTransactionState *common)
     InternalSnapshotState *state = DO_UPCAST(InternalSnapshotState,
                                              common, common);
 
+    if (state->bs) {
+        bdrv_op_unblock(state->bs, BLOCK_OP_TYPE_DEVICE_IO, state->blocker);
+        error_free(state->blocker);
+    }
     if (state->aio_context) {
         aio_context_release(state->aio_context);
     }
-- 
2.4.1

* [Qemu-devel] [PATCH v4 09/13] blockdev: Block device IO during external snapshot transaction
  2015-05-19 11:49 [Qemu-devel] [PATCH v4 00/13] Fix transactional snapshot with dataplane and NBD export Fam Zheng
                   ` (7 preceding siblings ...)
  2015-05-19 11:49 ` [Qemu-devel] [PATCH v4 08/13] blockdev: Block device IO during internal snapshot transaction Fam Zheng
@ 2015-05-19 11:49 ` Fam Zheng
  2015-05-19 11:49 ` [Qemu-devel] [PATCH v4 10/13] blockdev: Block device IO during drive-backup transaction Fam Zheng
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Fam Zheng @ 2015-05-19 11:49 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, qemu-block, Juan Quintela, Jeff Cody,
	Michael S. Tsirkin, mreitz, Stefan Hajnoczi, Amit Shah,
	Paolo Bonzini

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 blockdev.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 7f763d9..923fc90 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1404,6 +1404,7 @@ typedef struct ExternalSnapshotState {
     BlockDriverState *old_bs;
     BlockDriverState *new_bs;
     AioContext *aio_context;
+    Error *blocker;
 } ExternalSnapshotState;
 
 static void external_snapshot_prepare(BlkTransactionState *common,
@@ -1525,6 +1526,9 @@ static void external_snapshot_prepare(BlkTransactionState *common,
     if (ret != 0) {
         error_propagate(errp, local_err);
     }
+
+    error_setg(&state->blocker, "snapshot in progress");
+    bdrv_op_block(state->old_bs, BLOCK_OP_TYPE_DEVICE_IO, state->blocker);
 }
 
 static void external_snapshot_commit(BlkTransactionState *common)
@@ -1541,8 +1545,6 @@ static void external_snapshot_commit(BlkTransactionState *common)
      * don't want to abort all of them if one of them fails the reopen */
     bdrv_reopen(state->new_bs, state->new_bs->open_flags & ~BDRV_O_RDWR,
                 NULL);
-
-    aio_context_release(state->aio_context);
 }
 
 static void external_snapshot_abort(BlkTransactionState *common)
@@ -1552,6 +1554,17 @@ static void external_snapshot_abort(BlkTransactionState *common)
     if (state->new_bs) {
         bdrv_unref(state->new_bs);
     }
+}
+
+static void external_snapshot_clean(BlkTransactionState *common)
+{
+    ExternalSnapshotState *state =
+                             DO_UPCAST(ExternalSnapshotState, common, common);
+
+    if (state->old_bs) {
+        bdrv_op_unblock(state->old_bs, BLOCK_OP_TYPE_DEVICE_IO, state->blocker);
+        error_free(state->blocker);
+    }
     if (state->aio_context) {
         aio_context_release(state->aio_context);
     }
@@ -1716,6 +1729,7 @@ static const BdrvActionOps actions[] = {
         .prepare  = external_snapshot_prepare,
         .commit   = external_snapshot_commit,
         .abort = external_snapshot_abort,
+        .clean = external_snapshot_clean,
     },
     [TRANSACTION_ACTION_KIND_DRIVE_BACKUP] = {
         .instance_size = sizeof(DriveBackupState),
-- 
2.4.1

* [Qemu-devel] [PATCH v4 10/13] blockdev: Block device IO during drive-backup transaction
  2015-05-19 11:49 [Qemu-devel] [PATCH v4 00/13] Fix transactional snapshot with dataplane and NBD export Fam Zheng
                   ` (8 preceding siblings ...)
  2015-05-19 11:49 ` [Qemu-devel] [PATCH v4 09/13] blockdev: Block device IO during external " Fam Zheng
@ 2015-05-19 11:49 ` Fam Zheng
  2015-05-19 11:49 ` [Qemu-devel] [PATCH v4 11/13] blockdev: Block device IO during blockdev-backup transaction Fam Zheng
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Fam Zheng @ 2015-05-19 11:49 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, qemu-block, Juan Quintela, Jeff Cody,
	Michael S. Tsirkin, mreitz, Stefan Hajnoczi, Amit Shah,
	Paolo Bonzini

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 blockdev.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/blockdev.c b/blockdev.c
index 923fc90..ae52d27 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1575,6 +1575,7 @@ typedef struct DriveBackupState {
     BlockDriverState *bs;
     AioContext *aio_context;
     BlockJob *job;
+    Error *blocker;
 } DriveBackupState;
 
 static void drive_backup_prepare(BlkTransactionState *common, Error **errp)
@@ -1599,6 +1600,9 @@ static void drive_backup_prepare(BlkTransactionState *common, Error **errp)
     state->aio_context = bdrv_get_aio_context(bs);
     aio_context_acquire(state->aio_context);
 
+    state->bs = bs;
+    error_setg(&state->blocker, "drive-backup in progress");
+    bdrv_op_block(bs, BLOCK_OP_TYPE_DEVICE_IO, state->blocker);
     qmp_drive_backup(backup->device, backup->target,
                      backup->has_format, backup->format,
                      backup->sync,
@@ -1613,7 +1617,6 @@ static void drive_backup_prepare(BlkTransactionState *common, Error **errp)
         return;
     }
 
-    state->bs = bs;
     state->job = state->bs->job;
 }
 
@@ -1632,6 +1635,10 @@ static void drive_backup_clean(BlkTransactionState *common)
 {
     DriveBackupState *state = DO_UPCAST(DriveBackupState, common, common);
 
+    if (state->bs) {
+        bdrv_op_unblock(state->bs, BLOCK_OP_TYPE_DEVICE_IO, state->blocker);
+        error_free(state->blocker);
+    }
     if (state->aio_context) {
         aio_context_release(state->aio_context);
     }
-- 
2.4.1

* [Qemu-devel] [PATCH v4 11/13] blockdev: Block device IO during blockdev-backup transaction
  2015-05-19 11:49 [Qemu-devel] [PATCH v4 00/13] Fix transactional snapshot with dataplane and NBD export Fam Zheng
                   ` (9 preceding siblings ...)
  2015-05-19 11:49 ` [Qemu-devel] [PATCH v4 10/13] blockdev: Block device IO during drive-backup transaction Fam Zheng
@ 2015-05-19 11:49 ` Fam Zheng
  2015-05-19 11:49 ` [Qemu-devel] [PATCH v4 12/13] block: Block "device IO" during bdrv_drain and bdrv_drain_all Fam Zheng
  2015-05-19 11:49 ` [Qemu-devel] [PATCH v4 13/13] block/mirror: Block "device IO" during mirror exit Fam Zheng
  12 siblings, 0 replies; 20+ messages in thread
From: Fam Zheng @ 2015-05-19 11:49 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, qemu-block, Juan Quintela, Jeff Cody,
	Michael S. Tsirkin, mreitz, Stefan Hajnoczi, Amit Shah,
	Paolo Bonzini

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 blockdev.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/blockdev.c b/blockdev.c
index ae52d27..bd28183 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1649,6 +1649,7 @@ typedef struct BlockdevBackupState {
     BlockDriverState *bs;
     BlockJob *job;
     AioContext *aio_context;
+    Error *blocker;
 } BlockdevBackupState;
 
 static void blockdev_backup_prepare(BlkTransactionState *common, Error **errp)
@@ -1685,6 +1686,10 @@ static void blockdev_backup_prepare(BlkTransactionState *common, Error **errp)
     }
     aio_context_acquire(state->aio_context);
 
+    state->bs = bs;
+    error_setg(&state->blocker, "blockdev-backup in progress");
+    bdrv_op_block(bs, BLOCK_OP_TYPE_DEVICE_IO, state->blocker);
+
     qmp_blockdev_backup(backup->device, backup->target,
                         backup->sync,
                         backup->has_speed, backup->speed,
@@ -1696,7 +1701,6 @@ static void blockdev_backup_prepare(BlkTransactionState *common, Error **errp)
         return;
     }
 
-    state->bs = bs;
     state->job = state->bs->job;
 }
 
@@ -1715,6 +1719,10 @@ static void blockdev_backup_clean(BlkTransactionState *common)
 {
     BlockdevBackupState *state = DO_UPCAST(BlockdevBackupState, common, common);
 
+    if (state->bs) {
+        bdrv_op_unblock(state->bs, BLOCK_OP_TYPE_DEVICE_IO, state->blocker);
+        error_free(state->blocker);
+    }
     if (state->aio_context) {
         aio_context_release(state->aio_context);
     }
-- 
2.4.1

* [Qemu-devel] [PATCH v4 12/13] block: Block "device IO" during bdrv_drain and bdrv_drain_all
  2015-05-19 11:49 [Qemu-devel] [PATCH v4 00/13] Fix transactional snapshot with dataplane and NBD export Fam Zheng
                   ` (10 preceding siblings ...)
  2015-05-19 11:49 ` [Qemu-devel] [PATCH v4 11/13] blockdev: Block device IO during blockdev-backup transaction Fam Zheng
@ 2015-05-19 11:49 ` Fam Zheng
  2015-05-19 11:49 ` [Qemu-devel] [PATCH v4 13/13] block/mirror: Block "device IO" during mirror exit Fam Zheng
  12 siblings, 0 replies; 20+ messages in thread
From: Fam Zheng @ 2015-05-19 11:49 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, qemu-block, Juan Quintela, Jeff Cody,
	Michael S. Tsirkin, mreitz, Stefan Hajnoczi, Amit Shah,
	Paolo Bonzini

We don't want new requests from the guest, so block the operation around
the nested poll.

This also avoids looping forever when an iothread is submitting a lot of
requests.
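
The termination argument can be sketched with a toy model (illustration
only, not QEMU code; `ModelBDS`, `model_drain_one` and `model_drain` are
made-up names standing in for the real drain loop):

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal model of the patched bdrv_drain(): the "device IO" blocker is
 * set before the nested poll, so the (simulated) device cannot queue new
 * requests while we drain, which guarantees the loop terminates. */
typedef struct {
    int in_flight;
    bool io_blocked;
} ModelBDS;

/* One poll iteration: complete one request; an unblocked device would
 * immediately submit a replacement, keeping the queue non-empty. */
static bool model_drain_one(ModelBDS *bs)
{
    if (bs->in_flight == 0) {
        return false;
    }
    bs->in_flight--;
    if (!bs->io_blocked) {
        bs->in_flight++;    /* guest refills the queue: never drains */
    }
    return bs->in_flight > 0;
}

static void model_drain(ModelBDS *bs)
{
    bs->io_blocked = true;           /* bdrv_op_block(..., DEVICE_IO, ...) */
    while (model_drain_one(bs)) {
        /* keep iterating */
    }
    bs->io_blocked = false;          /* bdrv_op_unblock(...) */
}
```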

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block/io.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/block/io.c b/block/io.c
index 1ce62c4..b23a83f 100644
--- a/block/io.c
+++ b/block/io.c
@@ -286,12 +286,21 @@ static bool bdrv_drain_one(BlockDriverState *bs)
  *
  * Note that unlike bdrv_drain_all(), the caller must hold the BlockDriverState
  * AioContext.
+ *
+ * Devices are paused to avoid looping forever because otherwise they could
+ * keep submitting more requests.
  */
 void bdrv_drain(BlockDriverState *bs)
 {
+    Error *blocker = NULL;
+
+    error_setg(&blocker, "bdrv_drain in progress");
+    bdrv_op_block(bs, BLOCK_OP_TYPE_DEVICE_IO, blocker);
     while (bdrv_drain_one(bs)) {
         /* Keep iterating */
     }
+    bdrv_op_unblock(bs, BLOCK_OP_TYPE_DEVICE_IO, blocker);
+    error_free(blocker);
 }
 
 /*
@@ -303,14 +312,20 @@ void bdrv_drain(BlockDriverState *bs)
  * Note that completion of an asynchronous I/O operation can trigger any
  * number of other I/O operations on other devices---for example a coroutine
  * can be arbitrarily complex and a constant flow of I/O can come until the
- * coroutine is complete.  Because of this, it is not possible to have a
- * function to drain a single device's I/O queue.
+ * coroutine is complete.  Because of this, we must call bdrv_drain_one in a
+ * loop.
+ *
+ * We explicitly pause block jobs and devices to prevent them from submitting
+ * more requests.
  */
 void bdrv_drain_all(void)
 {
     /* Always run first iteration so any pending completion BHs run */
     bool busy = true;
     BlockDriverState *bs = NULL;
+    Error *blocker = NULL;
+
+    error_setg(&blocker, "bdrv_drain_all in progress");
 
     while ((bs = bdrv_next(bs))) {
         AioContext *aio_context = bdrv_get_aio_context(bs);
@@ -319,6 +334,7 @@ void bdrv_drain_all(void)
         if (bs->job) {
             block_job_pause(bs->job);
         }
+        bdrv_op_block(bs, BLOCK_OP_TYPE_DEVICE_IO, blocker);
         aio_context_release(aio_context);
     }
 
@@ -343,8 +359,10 @@ void bdrv_drain_all(void)
         if (bs->job) {
             block_job_resume(bs->job);
         }
+        bdrv_op_unblock(bs, BLOCK_OP_TYPE_DEVICE_IO, blocker);
         aio_context_release(aio_context);
     }
+    error_free(blocker);
 }
 
 /**
-- 
2.4.1

* [Qemu-devel] [PATCH v4 13/13] block/mirror: Block "device IO" during mirror exit
  2015-05-19 11:49 [Qemu-devel] [PATCH v4 00/13] Fix transactional snapshot with dataplane and NBD export Fam Zheng
                   ` (11 preceding siblings ...)
  2015-05-19 11:49 ` [Qemu-devel] [PATCH v4 12/13] block: Block "device IO" during bdrv_drain and bdrv_drain_all Fam Zheng
@ 2015-05-19 11:49 ` Fam Zheng
  2015-05-19  8:04   ` Paolo Bonzini
  12 siblings, 1 reply; 20+ messages in thread
From: Fam Zheng @ 2015-05-19 11:49 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, qemu-block, Juan Quintela, Jeff Cody,
	Michael S. Tsirkin, mreitz, Stefan Hajnoczi, Amit Shah,
	Paolo Bonzini

When the mirror block job finishes, the source and target are in sync.
But bdrv_swap() is called later, in a main loop BH, so if the guest
writes before that point, the target will not get the new data.
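
The ordering this patch establishes can be sketched with a toy model
(illustration only; `MirrorModel` and the three functions below are
invented names standing in for the coroutine end, the deferred BH, and a
guest write racing between them):

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal model of the fix: the coroutine sets the "device IO" blocker
 * before deferring completion to a main-loop BH, and the BH removes it
 * after the swap, so no guest write lands on the stale source in the
 * window between the final sync and bdrv_swap(). */
typedef struct {
    bool io_blocked;
    bool swapped;
    bool lost_write;    /* a write that reached the source but not the target */
} MirrorModel;

static void guest_write(MirrorModel *m)
{
    if (!m->io_blocked && !m->swapped) {
        m->lost_write = true;   /* target is left stale */
    }
}

static void mirror_coroutine_end(MirrorModel *m)
{
    m->io_blocked = true;       /* bdrv_op_block(..., DEVICE_IO, ...) */
}

static void mirror_exit_bh(MirrorModel *m)
{
    m->swapped = true;          /* bdrv_swap() happens here */
    m->io_blocked = false;      /* bdrv_op_unblock() + error_free() */
}
```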

Reported-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block/mirror.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/block/mirror.c b/block/mirror.c
index 58f391a..d96403b 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -320,6 +320,8 @@ static void mirror_drain(MirrorBlockJob *s)
 
 typedef struct {
     int ret;
+    /* Use to pause device IO during mirror exit */
+    Error *blocker;
 } MirrorExitData;
 
 static void mirror_exit(BlockJob *job, void *opaque)
@@ -328,6 +330,8 @@ static void mirror_exit(BlockJob *job, void *opaque)
     MirrorExitData *data = opaque;
     AioContext *replace_aio_context = NULL;
 
+    bdrv_op_unblock(s->common.bs, BLOCK_OP_TYPE_DEVICE_IO, data->blocker);
+    error_free(data->blocker);
     if (s->to_replace) {
         replace_aio_context = bdrv_get_aio_context(s->to_replace);
         aio_context_acquire(replace_aio_context);
@@ -563,8 +567,10 @@ immediate_exit:
     bdrv_release_dirty_bitmap(bs, s->dirty_bitmap);
     bdrv_iostatus_disable(s->target);
 
-    data = g_malloc(sizeof(*data));
+    data = g_malloc0(sizeof(*data));
     data->ret = ret;
+    error_setg(&data->blocker, "mirror job exiting");
+    bdrv_op_block(bs, BLOCK_OP_TYPE_DEVICE_IO, data->blocker);
     block_job_defer_to_main_loop(&s->common, mirror_exit, data);
 }
 
-- 
2.4.1

* Re: [Qemu-devel] [PATCH v4 13/13] block/mirror: Block "device IO" during mirror exit
  2015-05-19  8:04   ` Paolo Bonzini
@ 2015-05-19 16:48     ` Fam Zheng
  2015-05-19  8:49       ` Paolo Bonzini
  0 siblings, 1 reply; 20+ messages in thread
From: Fam Zheng @ 2015-05-19 16:48 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: qemu-block, Michael S. Tsirkin, Juan Quintela, qemu-devel,
	mreitz, Stefan Hajnoczi, Amit Shah

On Tue, 05/19 10:04, Paolo Bonzini wrote:
> On 19/05/2015 13:49, Fam Zheng wrote:
> > When block job mirror is finished, the source and target is synced. But
> > we call bdrv_swap() later in main loop bh. If the guest write before
> > that, target will not get the new data.
> 
> This is too late.  As a rule, the blocker must be established before
> calling bdrv_drain, and removed on the next yield (in this case, before
> the assignment to last_pause_ns).

I don't understand. If the blocker is removed before mirror_run returns,
wouldn't more device IO already hit the source image by the time mirror_exit
runs?

Fam

> 
> Paolo
> 
> > Reported-by: Wen Congyang <wency@cn.fujitsu.com>
> > Signed-off-by: Fam Zheng <famz@redhat.com>
> > ---
> >  block/mirror.c | 8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> > 
> > diff --git a/block/mirror.c b/block/mirror.c
> > index 58f391a..d96403b 100644
> > --- a/block/mirror.c
> > +++ b/block/mirror.c
> > @@ -320,6 +320,8 @@ static void mirror_drain(MirrorBlockJob *s)
> >  
> >  typedef struct {
> >      int ret;
> > +    /* Use to pause device IO during mirror exit */
> > +    Error *blocker;
> >  } MirrorExitData;
> >  
> >  static void mirror_exit(BlockJob *job, void *opaque)
> > @@ -328,6 +330,8 @@ static void mirror_exit(BlockJob *job, void *opaque)
> >      MirrorExitData *data = opaque;
> >      AioContext *replace_aio_context = NULL;
> >  
> > +    bdrv_op_unblock(s->common.bs, BLOCK_OP_TYPE_DEVICE_IO, data->blocker);
> > +    error_free(data->blocker);
> >      if (s->to_replace) {
> >          replace_aio_context = bdrv_get_aio_context(s->to_replace);
> >          aio_context_acquire(replace_aio_context);
> > @@ -563,8 +567,10 @@ immediate_exit:
> >      bdrv_release_dirty_bitmap(bs, s->dirty_bitmap);
> >      bdrv_iostatus_disable(s->target);
> >  
> > -    data = g_malloc(sizeof(*data));
> > +    data = g_malloc0(sizeof(*data));
> >      data->ret = ret;
> > +    error_setg(&data->blocker, "mirror job exiting");
> > +    bdrv_op_block(bs, BLOCK_OP_TYPE_DEVICE_IO, data->blocker);
> >      block_job_defer_to_main_loop(&s->common, mirror_exit, data);
> >  }
> >  
> > 


* Re: [Qemu-devel] [PATCH v4 13/13] block/mirror: Block "device IO" during mirror exit
  2015-05-19  8:49       ` Paolo Bonzini
@ 2015-05-19 18:37         ` Fam Zheng
  2015-05-19 10:57           ` Paolo Bonzini
  0 siblings, 1 reply; 20+ messages in thread
From: Fam Zheng @ 2015-05-19 18:37 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: qemu-block, Juan Quintela, Michael S. Tsirkin, qemu-devel,
	mreitz, Stefan Hajnoczi, Amit Shah

On Tue, 05/19 10:49, Paolo Bonzini wrote:
> 
> 
> On 19/05/2015 18:48, Fam Zheng wrote:
> >> > 
> >> > This is too late.  As a rule, the blocker must be established before
> >> > calling bdrv_drain, and removed on the next yield (in this case, before
> >> > the assignment to last_pause_ns).
> > I don't understand. If the blocker is removed before mirror_run returns,
> > wouldn't more device IO already hit source image by the time mirror_exit runs?
> 
> If you go to mirror_exit, you won't reach the assignment (so you have to
> remove the blocker in mirror_exit too).
> 
> But if you don't go to mirror_exit because cnt != 0, you must remove the
> blocker before the next I/O.
> 

OK, but I'm still not clear on how it is too late in this patch. Although the
blocker is set after bdrv_drain, we know there is no dirty data because cnt is
0, and we will be holding the blocker when releasing the AioContext, so no new
I/O is allowed.

Fam


* Re: [Qemu-devel] [PATCH v4 13/13] block/mirror: Block "device IO" during mirror exit
  2015-05-19 10:57           ` Paolo Bonzini
@ 2015-05-20  2:23             ` Fam Zheng
  0 siblings, 0 replies; 20+ messages in thread
From: Fam Zheng @ 2015-05-20  2:23 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: qemu-block, Juan Quintela, Michael S. Tsirkin, qemu-devel,
	mreitz, Stefan Hajnoczi, Amit Shah

On Tue, 05/19 12:57, Paolo Bonzini wrote:
> 
> 
> On 19/05/2015 20:37, Fam Zheng wrote:
> > On Tue, 05/19 10:49, Paolo Bonzini wrote:
> >>
> >>
> >> On 19/05/2015 18:48, Fam Zheng wrote:
> >>>>>
> >>>>> This is too late.  As a rule, the blocker must be established before
> >>>>> calling bdrv_drain, and removed on the next yield (in this case, before
> >>>>> the assignment to last_pause_ns).
> >>> I don't understand. If the blocker is removed before mirror_run returns,
> >>> wouldn't more device IO already hit source image by the time mirror_exit runs?
> >>
> >> If you go to mirror_exit, you won't reach the assignment (so you have to
> >> remove the blocker in mirror_exit too).
> >>
> >> But if you don't go to mirror_exit because cnt != 0, you must remove the
> >> blocker before the next I/O.
> >>
> > 
> > OK, but I'm still not clear how is it too late in this patch? Although the
> > blocker is set after bdrv_drain, we know there is no dirty data because cnt is
> > 0, and we'll be holding a blocker when releasing the AioContext, no new IO is
> > allowed.
> 
> So you rely on the caller of mirror_run not calling aio_context_release
> between bdrv_drain and block_job_defer_to_main_loop?  That indeed should
> work, but why not stick to a common pattern of blocking I/O before
> bdrv_drain?  That's how bdrv_drain is almost always used in the code, so
> it's a safe thing to do and the preemption points are then documented
> more clearly.
> 
> I think it would be nice to have all bdrv_drain calls:
> 
> - either preceded by a device I/O blocker
> 
> - or preceded by a comment explaining why the call is there and why it
> doesn't need the blocker
> 
> This is not a NACK, but I would like to understand the disadvantages of
> what I am suggesting here.

That makes sense now; I was just trying to understand your idea. Of course the
pattern you suggested is more robust!

Fam



Thread overview: 20+ messages
2015-05-19 11:49 [Qemu-devel] [PATCH v4 00/13] Fix transactional snapshot with dataplane and NBD export Fam Zheng
2015-05-19 11:49 ` [Qemu-devel] [PATCH v4 01/13] block: Add op blocker type "device IO" Fam Zheng
2015-05-19 11:49 ` [Qemu-devel] [PATCH v4 02/13] block: Add op blocker notifier list Fam Zheng
2015-05-19 11:49 ` [Qemu-devel] [PATCH v4 03/13] block-backend: Add blk_op_blocker_add_notifier Fam Zheng
2015-05-19 11:49 ` [Qemu-devel] [PATCH v4 04/13] virtio-blk: Move complete_request to 'ops' structure Fam Zheng
2015-05-19 11:49 ` [Qemu-devel] [PATCH v4 05/13] virtio-blk: Don't handle output when there is "device IO" op blocker Fam Zheng
2015-05-19 11:49 ` [Qemu-devel] [PATCH v4 06/13] virtio-scsi-dataplane: Add "device IO" op blocker listener Fam Zheng
2015-05-19 11:49 ` [Qemu-devel] [PATCH v4 07/13] nbd-server: Clear "can_read" when "device io" blocker is set Fam Zheng
2015-05-19 11:49 ` [Qemu-devel] [PATCH v4 08/13] blockdev: Block device IO during internal snapshot transaction Fam Zheng
2015-05-19 11:49 ` [Qemu-devel] [PATCH v4 09/13] blockdev: Block device IO during external " Fam Zheng
2015-05-19 11:49 ` [Qemu-devel] [PATCH v4 10/13] blockdev: Block device IO during drive-backup transaction Fam Zheng
2015-05-19 11:49 ` [Qemu-devel] [PATCH v4 11/13] blockdev: Block device IO during blockdev-backup transaction Fam Zheng
2015-05-19 11:49 ` [Qemu-devel] [PATCH v4 12/13] block: Block "device IO" during bdrv_drain and bdrv_drain_all Fam Zheng
2015-05-19 11:49 ` [Qemu-devel] [PATCH v4 13/13] block/mirror: Block "device IO" during mirror exit Fam Zheng
2015-05-19  8:04   ` Paolo Bonzini
2015-05-19 16:48     ` Fam Zheng
2015-05-19  8:49       ` Paolo Bonzini
2015-05-19 18:37         ` Fam Zheng
2015-05-19 10:57           ` Paolo Bonzini
2015-05-20  2:23             ` Fam Zheng
