* [Qemu-devel] [PATCH v6 00/13] Fix transactional snapshot with dataplane and NBD export
@ 2015-05-21  6:42 Fam Zheng
  2015-05-21  6:42 ` [Qemu-devel] [PATCH v6 01/13] block: Add op blocker type "device IO" Fam Zheng
                   ` (12 more replies)
  0 siblings, 13 replies; 56+ messages in thread
From: Fam Zheng @ 2015-05-21  6:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, qemu-block, jcody, armbru, mreitz, Stefan Hajnoczi,
	amit.shah, pbonzini

v6: Patch 13: unset the blocker after bdrv_swap().

Reported by Paolo.

Unlike the iohandler in the main loop, iothreads currently process the event
notifier used by the virtio-blk ioeventfd in nested aio_poll calls. This is
dangerous without proper protection, because guest requests could sneak into
the block layer at points where they mustn't.

For example, a QMP transaction may involve multiple bdrv_drain_all() calls
while handling the list of AioContexts it works on. If an aio_poll in one of
those bdrv_drain_all() calls happens to process a guest VQ kick and dispatches
the ioeventfd event to virtio-blk, a new guest write is submitted, and voilà,
the transaction semantics are violated.

This series avoids this problem by disabling virtio-blk handlers during
bdrv_drain_all() and transactions.
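
The idea can be pictured with a small standalone sketch (hypothetical
`Device`, `handle_kick`, and `unblock` names, not the actual QEMU API): kicks
that arrive while the "device IO" blocker is set must not reach the block
layer, and are replayed once the blocker is removed.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal sketch (hypothetical types and names) of the approach taken by
 * this series: while a "device IO" blocker is active the device is paused,
 * so a guest kick dispatched from a nested aio_poll is deferred instead of
 * reaching the block layer; deferred kicks are replayed on unblock. */
typedef struct {
    bool paused;    /* set while a "device IO" op blocker is active */
    int submitted;  /* requests that reached the block layer */
    int pending;    /* kicks seen while paused */
} Device;

static void handle_kick(Device *d)
{
    if (d->paused) {
        d->pending++;       /* blocker active: defer, do not submit */
        return;
    }
    d->submitted++;
}

static void unblock(Device *d)
{
    d->paused = false;
    while (d->pending > 0) {    /* replay deferred kicks */
        d->pending--;
        d->submitted++;
    }
}
```

During a transaction the flag is set before draining, so no new request can
slip in between the drain and the snapshot; requests are only delayed, never
lost.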

- Patches 1~3 add the block layer op blocker change notifier code.
- Patches 4, 5 secure virtio-blk dataplane.
- Patch 6 protects virtio-scsi dataplane.
- Patch 7 secures the NBD export.
- Patches 8~11 protect each transaction type from being violated by new IO
  generated in nested aio_poll.
- Patch 12 protects bdrv_drain and bdrv_drain_all.
- Patch 13 protects mirror completion.


Fam Zheng (13):
  block: Add op blocker type "device IO"
  block: Add op blocker notifier list
  block-backend: Add blk_op_blocker_add_notifier
  virtio-blk: Move complete_request to 'ops' structure
  virtio-blk: Don't handle output when there is "device IO" op blocker
  virtio-scsi-dataplane: Add "device IO" op blocker listener
  nbd-server: Clear "can_read" when "device io" blocker is set
  blockdev: Block device IO during internal snapshot transaction
  blockdev: Block device IO during external snapshot transaction
  blockdev: Block device IO during drive-backup transaction
  blockdev: Block device IO during blockdev-backup transaction
  block: Block "device IO" during bdrv_drain and bdrv_drain_all
  block/mirror: Block "device IO" during mirror exit

 block.c                         | 35 ++++++++++++++++++
 block/block-backend.c           |  6 +++
 block/io.c                      | 22 ++++++++++-
 block/mirror.c                  |  9 ++++-
 blockdev.c                      | 49 ++++++++++++++++++++----
 blockjob.c                      |  1 +
 hw/block/dataplane/virtio-blk.c | 37 ++++++++++++++++---
 hw/block/virtio-blk.c           | 65 ++++++++++++++++++++++++++++++--
 hw/scsi/virtio-scsi-dataplane.c | 82 +++++++++++++++++++++++++++++++----------
 hw/scsi/virtio-scsi.c           |  4 ++
 include/block/block.h           |  9 +++++
 include/block/block_int.h       |  3 ++
 include/hw/virtio/virtio-blk.h  | 17 +++++++--
 include/hw/virtio/virtio-scsi.h |  3 ++
 include/sysemu/block-backend.h  |  2 +
 migration/block.c               |  1 +
 nbd.c                           | 24 ++++++++++++
 17 files changed, 328 insertions(+), 41 deletions(-)

-- 
2.4.1


* [Qemu-devel] [PATCH v6 01/13] block: Add op blocker type "device IO"
  2015-05-21  6:42 [Qemu-devel] [PATCH v6 00/13] Fix transactional snapshot with dataplane and NBD export Fam Zheng
@ 2015-05-21  6:42 ` Fam Zheng
  2015-05-21  7:06   ` Wen Congyang
                     ` (2 more replies)
  2015-05-21  6:42 ` [Qemu-devel] [PATCH v6 02/13] block: Add op blocker notifier list Fam Zheng
                   ` (11 subsequent siblings)
  12 siblings, 3 replies; 56+ messages in thread
From: Fam Zheng @ 2015-05-21  6:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, qemu-block, jcody, armbru, mreitz, Stefan Hajnoczi,
	amit.shah, pbonzini

It blocks device IO.

All bdrv_op_block_all/blk_op_block_all callers are taken care of:

- virtio_blk_data_plane_create
- virtio_scsi_hotplug

  Device creation, unblock it.

- bdrv_set_backing_hd

  The backing hd is not used by the device, so blocking is OK.

- backup_start

  Blocking target when backup is running, unblock it.

- mirror_complete

  Blocking s->to_replace until mirror_exit, OK.

- block_job_complete

  The block job may be long-running. Unblock it.

- init_blk_migration

  The block migration may be long-running. Unblock it.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 blockjob.c                      | 1 +
 hw/block/dataplane/virtio-blk.c | 1 +
 hw/scsi/virtio-scsi.c           | 1 +
 include/block/block.h           | 1 +
 migration/block.c               | 1 +
 5 files changed, 5 insertions(+)

diff --git a/blockjob.c b/blockjob.c
index 2755465..e39bdde 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -51,6 +51,7 @@ void *block_job_create(const BlockJobDriver *driver, BlockDriverState *bs,
                BlockJobType_lookup[driver->job_type]);
     bdrv_op_block_all(bs, job->blocker);
     bdrv_op_unblock(bs, BLOCK_OP_TYPE_DATAPLANE, job->blocker);
+    bdrv_op_unblock(bs, BLOCK_OP_TYPE_DEVICE_IO, job->blocker);
 
     job->driver        = driver;
     job->bs            = bs;
diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 3db139b..3ecc8bd 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -209,6 +209,7 @@ void virtio_blk_data_plane_create(VirtIODevice *vdev, VirtIOBlkConf *conf,
     blk_op_unblock(conf->conf.blk, BLOCK_OP_TYPE_MIRROR, s->blocker);
     blk_op_unblock(conf->conf.blk, BLOCK_OP_TYPE_STREAM, s->blocker);
     blk_op_unblock(conf->conf.blk, BLOCK_OP_TYPE_REPLACE, s->blocker);
+    blk_op_unblock(conf->conf.blk, BLOCK_OP_TYPE_DEVICE_IO, s->blocker);
 
     *dataplane = s;
 }
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index e242fef..5e15fa6 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -772,6 +772,7 @@ static void virtio_scsi_hotplug(HotplugHandler *hotplug_dev, DeviceState *dev,
             return;
         }
         blk_op_block_all(sd->conf.blk, s->blocker);
+        blk_op_unblock(sd->conf.blk, BLOCK_OP_TYPE_DEVICE_IO, s->blocker);
         aio_context_acquire(s->ctx);
         blk_set_aio_context(sd->conf.blk, s->ctx);
         aio_context_release(s->ctx);
diff --git a/include/block/block.h b/include/block/block.h
index 7d1a717..906fb31 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -159,6 +159,7 @@ typedef enum BlockOpType {
     BLOCK_OP_TYPE_RESIZE,
     BLOCK_OP_TYPE_STREAM,
     BLOCK_OP_TYPE_REPLACE,
+    BLOCK_OP_TYPE_DEVICE_IO,
     BLOCK_OP_TYPE_MAX,
 } BlockOpType;
 
diff --git a/migration/block.c b/migration/block.c
index ddb59cc..b833bac 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -379,6 +379,7 @@ static void init_blk_migration(QEMUFile *f)
         alloc_aio_bitmap(bmds);
         error_setg(&bmds->blocker, "block device is in use by migration");
         bdrv_op_block_all(bs, bmds->blocker);
+        bdrv_op_unblock(bs, BLOCK_OP_TYPE_DEVICE_IO, bmds->blocker);
         bdrv_ref(bs);
 
         block_mig_state.total_sector_sum += sectors;
-- 
2.4.1


* [Qemu-devel] [PATCH v6 02/13] block: Add op blocker notifier list
  2015-05-21  6:42 [Qemu-devel] [PATCH v6 00/13] Fix transactional snapshot with dataplane and NBD export Fam Zheng
  2015-05-21  6:42 ` [Qemu-devel] [PATCH v6 01/13] block: Add op blocker type "device IO" Fam Zheng
@ 2015-05-21  6:42 ` Fam Zheng
  2015-05-21  6:42 ` [Qemu-devel] [PATCH v6 03/13] block-backend: Add blk_op_blocker_add_notifier Fam Zheng
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 56+ messages in thread
From: Fam Zheng @ 2015-05-21  6:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, qemu-block, jcody, armbru, mreitz, Stefan Hajnoczi,
	amit.shah, pbonzini

BDS users can register a notifier and get notified about op blocker
changes.
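
The edge-triggered semantics can be sketched standalone (hypothetical names,
counting notifications instead of calling a real NotifierList): listeners are
only told when a type's blocker list transitions between empty and non-empty.

```c
#include <assert.h>
#include <stdbool.h>

/* Standalone sketch (hypothetical names) of the transition logic in
 * bdrv_op_block/bdrv_op_unblock: notify only when the first blocker of a
 * type is added or the last one is removed.  The real unblock removes all
 * blockers matching a reason; this sketch drops one per call. */
static int blockers;       /* number of blockers for one BlockOpType */
static int notifications;  /* times listeners would have been called */

static void op_block(void)
{
    bool was_blocked = blockers > 0;
    blockers++;
    if (!was_blocked) {
        notifications++;    /* 0 -> 1 transition: blocking = true */
    }
}

static void op_unblock(void)
{
    assert(blockers > 0);
    blockers--;
    if (blockers == 0) {
        notifications++;    /* 1 -> 0 transition: blocking = false */
    }
}
```

Stacked blockers therefore cost at most two notifications, no matter how many
are added and removed in between.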

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 block.c                   | 35 +++++++++++++++++++++++++++++++++++
 include/block/block.h     |  8 ++++++++
 include/block/block_int.h |  3 +++
 3 files changed, 46 insertions(+)

diff --git a/block.c b/block.c
index 7904098..9714603 100644
--- a/block.c
+++ b/block.c
@@ -3375,6 +3375,19 @@ struct BdrvOpBlocker {
     QLIST_ENTRY(BdrvOpBlocker) list;
 };
 
+/* Add a notifier which is called when the first blocker of some type is
+ * added to bs, or when the last blocker is removed. The notifier handler
+ * receives a pointer to a BlockOpEvent as its data.
+ *
+ * Caller must hold bs->aio_context; the notifier handler is also called with
+ * it held.
+ */
+void bdrv_op_blocker_add_notifier(BlockDriverState *bs,
+                                  Notifier *notifier)
+{
+    notifier_list_add(&bs->op_blocker_notifiers, notifier);
+}
+
 bool bdrv_op_is_blocked(BlockDriverState *bs, BlockOpType op, Error **errp)
 {
     BdrvOpBlocker *blocker;
@@ -3391,26 +3404,48 @@ bool bdrv_op_is_blocked(BlockDriverState *bs, BlockOpType op, Error **errp)
     return false;
 }
 
+static void bdrv_op_blocker_notify(BlockDriverState *bs, BlockOpType op,
+                                   Error *reason, bool blocking)
+{
+    BlockOpEvent event = {
+        .type = op,
+        .reason = reason,
+        .blocking = blocking,
+    };
+
+    notifier_list_notify(&bs->op_blocker_notifiers, &event);
+}
+
 void bdrv_op_block(BlockDriverState *bs, BlockOpType op, Error *reason)
 {
+    bool blocked;
     BdrvOpBlocker *blocker;
     assert((int) op >= 0 && op < BLOCK_OP_TYPE_MAX);
 
     blocker = g_new0(BdrvOpBlocker, 1);
     blocker->reason = reason;
+    blocked = !QLIST_EMPTY(&bs->op_blockers[op]);
     QLIST_INSERT_HEAD(&bs->op_blockers[op], blocker, list);
+    if (!blocked) {
+        bdrv_op_blocker_notify(bs, op, reason, true);
+    }
 }
 
 void bdrv_op_unblock(BlockDriverState *bs, BlockOpType op, Error *reason)
 {
+    bool blocked;
     BdrvOpBlocker *blocker, *next;
     assert((int) op >= 0 && op < BLOCK_OP_TYPE_MAX);
+    blocked = !QLIST_EMPTY(&bs->op_blockers[op]);
     QLIST_FOREACH_SAFE(blocker, &bs->op_blockers[op], list, next) {
         if (blocker->reason == reason) {
             QLIST_REMOVE(blocker, list);
             g_free(blocker);
         }
     }
+    if (blocked && QLIST_EMPTY(&bs->op_blockers[op])) {
+        bdrv_op_blocker_notify(bs, op, reason, false);
+    }
 }
 
 void bdrv_op_block_all(BlockDriverState *bs, Error *reason)
diff --git a/include/block/block.h b/include/block/block.h
index 906fb31..3420b2c 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -163,6 +163,12 @@ typedef enum BlockOpType {
     BLOCK_OP_TYPE_MAX,
 } BlockOpType;
 
+typedef struct {
+    BlockOpType type;
+    Error *reason;
+    bool blocking;
+} BlockOpEvent;
+
 void bdrv_iostatus_enable(BlockDriverState *bs);
 void bdrv_iostatus_reset(BlockDriverState *bs);
 void bdrv_iostatus_disable(BlockDriverState *bs);
@@ -491,6 +497,8 @@ void bdrv_disable_copy_on_read(BlockDriverState *bs);
 void bdrv_ref(BlockDriverState *bs);
 void bdrv_unref(BlockDriverState *bs);
 
+void bdrv_op_blocker_add_notifier(BlockDriverState *bs,
+                                  Notifier *notifier);
 bool bdrv_op_is_blocked(BlockDriverState *bs, BlockOpType op, Error **errp);
 void bdrv_op_block(BlockDriverState *bs, BlockOpType op, Error *reason);
 void bdrv_op_unblock(BlockDriverState *bs, BlockOpType op, Error *reason);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index db29b74..29d1c84 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -418,6 +418,9 @@ struct BlockDriverState {
     /* operation blockers */
     QLIST_HEAD(, BdrvOpBlocker) op_blockers[BLOCK_OP_TYPE_MAX];
 
+    /* Callback for when a list in op_blockers becomes empty or non-empty. */
+    NotifierList op_blocker_notifiers;
+
     /* long-running background operation */
     BlockJob *job;
 
-- 
2.4.1


* [Qemu-devel] [PATCH v6 03/13] block-backend: Add blk_op_blocker_add_notifier
  2015-05-21  6:42 [Qemu-devel] [PATCH v6 00/13] Fix transactional snapshot with dataplane and NBD export Fam Zheng
  2015-05-21  6:42 ` [Qemu-devel] [PATCH v6 01/13] block: Add op blocker type "device IO" Fam Zheng
  2015-05-21  6:42 ` [Qemu-devel] [PATCH v6 02/13] block: Add op blocker notifier list Fam Zheng
@ 2015-05-21  6:42 ` Fam Zheng
  2015-05-21  6:42 ` [Qemu-devel] [PATCH v6 04/13] virtio-blk: Move complete_request to 'ops' structure Fam Zheng
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 56+ messages in thread
From: Fam Zheng @ 2015-05-21  6:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, qemu-block, jcody, armbru, mreitz, Stefan Hajnoczi,
	amit.shah, pbonzini

Forward the call to bdrv_op_blocker_add_notifier.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 block/block-backend.c          | 6 ++++++
 include/sysemu/block-backend.h | 2 ++
 2 files changed, 8 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index 93e46f3..16efe60 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -802,6 +802,12 @@ void blk_op_unblock_all(BlockBackend *blk, Error *reason)
     bdrv_op_unblock_all(blk->bs, reason);
 }
 
+void blk_op_blocker_add_notifier(BlockBackend *blk,
+                                 Notifier *notifier)
+{
+    bdrv_op_blocker_add_notifier(blk->bs, notifier);
+}
+
 AioContext *blk_get_aio_context(BlockBackend *blk)
 {
     return bdrv_get_aio_context(blk->bs);
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index b4a4d5e..cde9651 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -136,6 +136,8 @@ int blk_get_flags(BlockBackend *blk);
 int blk_get_max_transfer_length(BlockBackend *blk);
 void blk_set_guest_block_size(BlockBackend *blk, int align);
 void *blk_blockalign(BlockBackend *blk, size_t size);
+void blk_op_blocker_add_notifier(BlockBackend *blk,
+                                 Notifier *notifier);
 bool blk_op_is_blocked(BlockBackend *blk, BlockOpType op, Error **errp);
 void blk_op_unblock(BlockBackend *blk, BlockOpType op, Error *reason);
 void blk_op_block_all(BlockBackend *blk, Error *reason);
-- 
2.4.1


* [Qemu-devel] [PATCH v6 04/13] virtio-blk: Move complete_request to 'ops' structure
  2015-05-21  6:42 [Qemu-devel] [PATCH v6 00/13] Fix transactional snapshot with dataplane and NBD export Fam Zheng
                   ` (2 preceding siblings ...)
  2015-05-21  6:42 ` [Qemu-devel] [PATCH v6 03/13] block-backend: Add blk_op_blocker_add_notifier Fam Zheng
@ 2015-05-21  6:42 ` Fam Zheng
  2015-05-21  6:42 ` [Qemu-devel] [PATCH v6 05/13] virtio-blk: Don't handle output when there is "device IO" op blocker Fam Zheng
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 56+ messages in thread
From: Fam Zheng @ 2015-05-21  6:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, qemu-block, jcody, armbru, mreitz, Stefan Hajnoczi,
	amit.shah, pbonzini

Should more ops be added to differentiate code between dataplane and
non-dataplane, the new saved_ops approach will be cleaner than messing
with N pointers.
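
The pattern can be sketched standalone (hypothetical, simplified types; the
real callbacks take a VirtIOBlockReq and a status byte): swapping one `ops`
pointer switches every callback at once, and restoring the saved pointer
undoes it.

```c
#include <assert.h>

/* Sketch of the ops-table swap (hypothetical simplified types). */
typedef struct Dev Dev;

typedef struct {
    int (*complete_request)(Dev *dev, int status);
} Ops;

struct Dev {
    const Ops *ops;     /* one pointer instead of N function pointers */
};

static int complete_plain(Dev *dev, int status)
{
    (void)dev;
    return status;          /* stand-in for the non-dataplane path */
}

static int complete_vring(Dev *dev, int status)
{
    (void)dev;
    return status + 100;    /* stand-in for the vring-based path */
}

static const Ops plain_ops = { .complete_request = complete_plain };
static const Ops vring_ops = { .complete_request = complete_vring };
```

Dataplane start saves the current pointer and installs its own table;
dataplane stop restores the saved one, exactly one assignment each way.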

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 hw/block/dataplane/virtio-blk.c | 13 ++++++++-----
 hw/block/virtio-blk.c           |  8 ++++++--
 include/hw/virtio/virtio-blk.h  |  9 +++++++--
 3 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 3ecc8bd..e287e09 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -51,8 +51,7 @@ struct VirtIOBlockDataPlane {
 
     /* Operation blocker on BDS */
     Error *blocker;
-    void (*saved_complete_request)(struct VirtIOBlockReq *req,
-                                   unsigned char status);
+    const VirtIOBlockOps *saved_ops;
 };
 
 /* Raise an interrupt to signal guest, if necessary */
@@ -88,6 +87,10 @@ static void complete_request_vring(VirtIOBlockReq *req, unsigned char status)
     qemu_bh_schedule(s->bh);
 }
 
+static const VirtIOBlockOps virtio_blk_data_plane_ops = {
+    .complete_request = complete_request_vring,
+};
+
 static void handle_notify(EventNotifier *e)
 {
     VirtIOBlockDataPlane *s = container_of(e, VirtIOBlockDataPlane,
@@ -270,8 +273,8 @@ void virtio_blk_data_plane_start(VirtIOBlockDataPlane *s)
     }
     s->host_notifier = *virtio_queue_get_host_notifier(vq);
 
-    s->saved_complete_request = vblk->complete_request;
-    vblk->complete_request = complete_request_vring;
+    s->saved_ops = vblk->ops;
+    vblk->ops = &virtio_blk_data_plane_ops;
 
     s->starting = false;
     s->started = true;
@@ -314,7 +317,7 @@ void virtio_blk_data_plane_stop(VirtIOBlockDataPlane *s)
         return;
     }
     s->stopping = true;
-    vblk->complete_request = s->saved_complete_request;
+    vblk->ops = s->saved_ops;
     trace_virtio_blk_data_plane_stop(s);
 
     aio_context_acquire(s->ctx);
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index e6afe97..f4a9d19 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -59,9 +59,13 @@ static void virtio_blk_complete_request(VirtIOBlockReq *req,
     virtio_notify(vdev, s->vq);
 }
 
+static const VirtIOBlockOps virtio_blk_ops = (VirtIOBlockOps) {
+    .complete_request = virtio_blk_complete_request,
+};
+
 static void virtio_blk_req_complete(VirtIOBlockReq *req, unsigned char status)
 {
-    req->dev->complete_request(req, status);
+    req->dev->ops->complete_request(req, status);
 }
 
 static int virtio_blk_handle_rw_error(VirtIOBlockReq *req, int error,
@@ -905,7 +909,7 @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp)
     s->sector_mask = (s->conf.conf.logical_block_size / BDRV_SECTOR_SIZE) - 1;
 
     s->vq = virtio_add_queue(vdev, 128, virtio_blk_handle_output);
-    s->complete_request = virtio_blk_complete_request;
+    s->ops = &virtio_blk_ops;
     virtio_blk_data_plane_create(vdev, conf, &s->dataplane, &err);
     if (err != NULL) {
         error_propagate(errp, err);
diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index 6bf5905..28b3436 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -44,6 +44,12 @@ struct VirtIOBlkConf
 struct VirtIOBlockDataPlane;
 
 struct VirtIOBlockReq;
+
+typedef struct {
+    /* Function to push to vq and notify guest */
+    void (*complete_request)(struct VirtIOBlockReq *req, unsigned char status);
+} VirtIOBlockOps;
+
 typedef struct VirtIOBlock {
     VirtIODevice parent_obj;
     BlockBackend *blk;
@@ -54,8 +60,7 @@ typedef struct VirtIOBlock {
     unsigned short sector_mask;
     bool original_wce;
     VMChangeStateEntry *change;
-    /* Function to push to vq and notify guest */
-    void (*complete_request)(struct VirtIOBlockReq *req, unsigned char status);
+    const VirtIOBlockOps *ops;
     Notifier migration_state_notifier;
     struct VirtIOBlockDataPlane *dataplane;
 } VirtIOBlock;
-- 
2.4.1


* [Qemu-devel] [PATCH v6 05/13] virtio-blk: Don't handle output when there is "device IO" op blocker
  2015-05-21  6:42 [Qemu-devel] [PATCH v6 00/13] Fix transactional snapshot with dataplane and NBD export Fam Zheng
                   ` (3 preceding siblings ...)
  2015-05-21  6:42 ` [Qemu-devel] [PATCH v6 04/13] virtio-blk: Move complete_request to 'ops' structure Fam Zheng
@ 2015-05-21  6:42 ` Fam Zheng
  2015-05-23 16:53   ` Max Reitz
  2015-05-21  6:42 ` [Qemu-devel] [PATCH v6 06/13] virtio-scsi-dataplane: Add "device IO" op blocker listener Fam Zheng
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 56+ messages in thread
From: Fam Zheng @ 2015-05-21  6:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, qemu-block, jcody, armbru, mreitz, Stefan Hajnoczi,
	amit.shah, pbonzini

virtio-blk now listens for op blocker changes on the associated block
backend.

Upon setting an op blocker of type BLOCK_OP_TYPE_DEVICE_IO:

  non-dataplane:
   1) Set VirtIOBlock.paused
   2) In virtio_blk_handle_output, do nothing if VirtIOBlock.paused

  dataplane:
   1) Clear the host event notifier
   2) In handle_notify, do nothing if VirtIOBlock.paused

Upon removing the op blocker:

  non-dataplane:
   1) Clear VirtIOBlock.paused
   2) Schedule a BH on the AioContext of the backend, which calls
   virtio_blk_handle_output, so that previously unhandled kicks can
   make progress

  dataplane:
   1) Set the host event notifier
   2) Notify the host event notifier so that unhandled events are
   processed

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 hw/block/dataplane/virtio-blk.c | 25 ++++++++++++++++-
 hw/block/virtio-blk.c           | 59 +++++++++++++++++++++++++++++++++++++++--
 include/hw/virtio/virtio-blk.h  |  8 +++++-
 3 files changed, 88 insertions(+), 4 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index e287e09..a5e8e35 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -87,8 +87,28 @@ static void complete_request_vring(VirtIOBlockReq *req, unsigned char status)
     qemu_bh_schedule(s->bh);
 }
 
+static void virtio_blk_data_plane_pause(VirtIOBlock *vblk)
+{
+    VirtIOBlockDataPlane *s = vblk->dataplane;
+
+    event_notifier_test_and_clear(&s->host_notifier);
+    aio_set_event_notifier(s->ctx, &s->host_notifier, NULL);
+}
+
+static void handle_notify(EventNotifier *e);
+static void virtio_blk_data_plane_resume(VirtIOBlock *vblk)
+{
+    VirtIOBlockDataPlane *s = vblk->dataplane;
+
+    aio_set_event_notifier(s->ctx, &s->host_notifier, handle_notify);
+
+    event_notifier_set(&s->host_notifier);
+}
+
 static const VirtIOBlockOps virtio_blk_data_plane_ops = {
-    .complete_request = complete_request_vring,
+    .complete_request           = complete_request_vring,
+    .pause                      = virtio_blk_data_plane_pause,
+    .resume                     = virtio_blk_data_plane_resume,
 };
 
 static void handle_notify(EventNotifier *e)
@@ -98,6 +118,9 @@ static void handle_notify(EventNotifier *e)
     VirtIOBlock *vblk = VIRTIO_BLK(s->vdev);
 
     event_notifier_test_and_clear(&s->host_notifier);
+    if (vblk->paused) {
+        return;
+    }
     blk_io_plug(s->conf->conf.blk);
     for (;;) {
         MultiReqBuffer mrb = {};
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index f4a9d19..b4b19b5 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -59,8 +59,38 @@ static void virtio_blk_complete_request(VirtIOBlockReq *req,
     virtio_notify(vdev, s->vq);
 }
 
+typedef struct {
+    QEMUBH *bh;
+    VirtIOBlock *s;
+} VirtIOBlockResumeData;
+
+static void virtio_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq);
+static void virtio_blk_resume_bh_cb(void *opaque)
+{
+    VirtIOBlockResumeData *data = opaque;
+    qemu_bh_delete(data->bh);
+    virtio_blk_handle_output(VIRTIO_DEVICE(data->s), data->s->vq);
+}
+
+static void virtio_blk_pause(VirtIOBlock *vblk)
+{
+    /* TODO: stop ioeventfd */
+}
+
+static void virtio_blk_resume(VirtIOBlock *vblk)
+{
+    VirtIOBlockResumeData *data = g_new(VirtIOBlockResumeData, 1);
+    data->bh = aio_bh_new(blk_get_aio_context(vblk->blk),
+            virtio_blk_resume_bh_cb, data);
+    data->s = vblk;
+    data->s->paused = false;
+    qemu_bh_schedule(data->bh);
+}
+
 static const VirtIOBlockOps virtio_blk_ops = (VirtIOBlockOps) {
-    .complete_request = virtio_blk_complete_request,
+    .complete_request           = virtio_blk_complete_request,
+    .pause                      = virtio_blk_pause,
+    .resume                     = virtio_blk_resume,
 };
 
 static void virtio_blk_req_complete(VirtIOBlockReq *req, unsigned char status)
@@ -597,6 +627,9 @@ static void virtio_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
     VirtIOBlockReq *req;
     MultiReqBuffer mrb = {};
 
+    if (s->paused) {
+        return;
+    }
     /* Some guests kick before setting VIRTIO_CONFIG_S_DRIVER_OK so start
      * dataplane here instead of waiting for .set_status().
      */
@@ -787,7 +820,7 @@ static void virtio_blk_save(QEMUFile *f, void *opaque)
 
     virtio_save(vdev, f);
 }
-    
+
 static void virtio_blk_save_device(VirtIODevice *vdev, QEMUFile *f)
 {
     VirtIOBlock *s = VIRTIO_BLK(vdev);
@@ -875,6 +908,25 @@ static void virtio_blk_migration_state_changed(Notifier *notifier, void *data)
     }
 }
 
+static void virtio_blk_op_blocker_changed(Notifier *notifier, void *opaque)
+{
+    BlockOpEvent *event = opaque;
+    VirtIOBlock *s = container_of(notifier, VirtIOBlock,
+                                  op_blocker_notifier);
+
+    if (event->type != BLOCK_OP_TYPE_DEVICE_IO) {
+        return;
+    }
+    if (event->blocking == s->paused) {
+        return;
+    }
+    if (event->blocking) {
+        s->ops->pause(s);
+    } else {
+        s->ops->resume(s);
+    }
+}
+
 static void virtio_blk_device_realize(DeviceState *dev, Error **errp)
 {
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
@@ -926,6 +978,9 @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp)
     blk_set_guest_block_size(s->blk, s->conf.conf.logical_block_size);
 
     blk_iostatus_enable(s->blk);
+
+    s->op_blocker_notifier.notify = virtio_blk_op_blocker_changed;
+    blk_op_blocker_add_notifier(s->blk, &s->op_blocker_notifier);
 }
 
 static void virtio_blk_device_unrealize(DeviceState *dev, Error **errp)
diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index 28b3436..aa15fea 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -42,12 +42,16 @@ struct VirtIOBlkConf
 };
 
 struct VirtIOBlockDataPlane;
-
+struct VirtIOBlock;
 struct VirtIOBlockReq;
 
 typedef struct {
     /* Function to push to vq and notify guest */
     void (*complete_request)(struct VirtIOBlockReq *req, unsigned char status);
+
+    /* Functions to pause/resume request handling */
+    void (*pause)(struct VirtIOBlock *vblk);
+    void (*resume)(struct VirtIOBlock *vblk);
 } VirtIOBlockOps;
 
 typedef struct VirtIOBlock {
@@ -62,6 +66,8 @@ typedef struct VirtIOBlock {
     VMChangeStateEntry *change;
     const VirtIOBlockOps *ops;
     Notifier migration_state_notifier;
+    Notifier op_blocker_notifier;
+    bool paused;
     struct VirtIOBlockDataPlane *dataplane;
 } VirtIOBlock;
 
-- 
2.4.1


* [Qemu-devel] [PATCH v6 06/13] virtio-scsi-dataplane: Add "device IO" op blocker listener
  2015-05-21  6:42 [Qemu-devel] [PATCH v6 00/13] Fix transactional snapshot with dataplane and NBD export Fam Zheng
                   ` (4 preceding siblings ...)
  2015-05-21  6:42 ` [Qemu-devel] [PATCH v6 05/13] virtio-blk: Don't handle output when there is "device IO" op blocker Fam Zheng
@ 2015-05-21  6:42 ` Fam Zheng
  2015-05-23 16:53   ` Max Reitz
  2015-05-21  6:42 ` [Qemu-devel] [PATCH v6 07/13] nbd-server: Clear "can_read" when "device io" blocker is set Fam Zheng
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 56+ messages in thread
From: Fam Zheng @ 2015-05-21  6:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, qemu-block, jcody, armbru, mreitz, Stefan Hajnoczi,
	amit.shah, pbonzini

When a disk is attached to the scsi-bus, virtio_scsi_hotplug takes care
of protecting the block device with op blockers. We currently haven't
enabled block jobs (like what's done in virtio_blk_data_plane_create),
but it is necessary to honor the "device IO" op blocker before we do.
This makes sure that guest IO requests are paused during QMP
transactions (such as multi-disk snapshot or backup).

A counter is added to the virtio-scsi device to keep track of currently
blocked disks. When it goes from 0 to 1, the ioeventfds are disabled;
when it goes back to 0, they are re-enabled.

Also, in device initialization, move the enabling of ioeventfds to just
before the return, so virtio_scsi_clear_aio is no longer needed there.
Rename it, pair it with an enabling variant, fix one coding style issue,
and use the pair at the device pause points.
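
The counter's 0-to-1 and 1-to-0 edge behavior can be sketched standalone
(hypothetical names; the real code calls virtio_scsi_stop_ioeventfd and
virtio_scsi_start_ioeventfd on those edges):

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch (hypothetical names) of the per-device pause counter: the
 * ioeventfds are stopped when the first disk gains a "device IO" blocker
 * and restarted only when the last blocker goes away. */
static int pause_counter;
static bool ioeventfd_active = true;

static void blocker_notify(bool blocking)
{
    if (blocking) {
        if (++pause_counter == 1) {
            ioeventfd_active = false;   /* stop ioeventfds */
        }
    } else {
        assert(pause_counter > 0);
        if (--pause_counter == 0) {
            ioeventfd_active = true;    /* restart ioeventfds */
        }
    }
}
```

With several disks blocked during a multi-disk transaction, the ioeventfds
stay disabled until every disk's blocker has been removed.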

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 hw/scsi/virtio-scsi-dataplane.c | 82 +++++++++++++++++++++++++++++++----------
 hw/scsi/virtio-scsi.c           |  3 ++
 include/hw/virtio/virtio-scsi.h |  3 ++
 3 files changed, 68 insertions(+), 20 deletions(-)

diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c
index 5575648..e220c12 100644
--- a/hw/scsi/virtio-scsi-dataplane.c
+++ b/hw/scsi/virtio-scsi-dataplane.c
@@ -40,7 +40,6 @@ void virtio_scsi_set_iothread(VirtIOSCSI *s, IOThread *iothread)
 
 static VirtIOSCSIVring *virtio_scsi_vring_init(VirtIOSCSI *s,
                                                VirtQueue *vq,
-                                               EventNotifierHandler *handler,
                                                int n)
 {
     BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(s)));
@@ -60,7 +59,6 @@ static VirtIOSCSIVring *virtio_scsi_vring_init(VirtIOSCSI *s,
     r = g_slice_new(VirtIOSCSIVring);
     r->host_notifier = *virtio_queue_get_host_notifier(vq);
     r->guest_notifier = *virtio_queue_get_guest_notifier(vq);
-    aio_set_event_notifier(s->ctx, &r->host_notifier, handler);
 
     r->parent = s;
 
@@ -71,7 +69,6 @@ static VirtIOSCSIVring *virtio_scsi_vring_init(VirtIOSCSI *s,
     return r;
 
 fail_vring:
-    aio_set_event_notifier(s->ctx, &r->host_notifier, NULL);
     k->set_host_notifier(qbus->parent, n, false);
     g_slice_free(VirtIOSCSIVring, r);
     return NULL;
@@ -104,6 +101,9 @@ void virtio_scsi_vring_push_notify(VirtIOSCSIReq *req)
     }
 }
 
+static void virtio_scsi_start_ioeventfd(VirtIOSCSI *s);
+static void virtio_scsi_stop_ioeventfd(VirtIOSCSI *s);
+
 static void virtio_scsi_iothread_handle_ctrl(EventNotifier *notifier)
 {
     VirtIOSCSIVring *vring = container_of(notifier,
@@ -111,6 +111,7 @@ static void virtio_scsi_iothread_handle_ctrl(EventNotifier *notifier)
     VirtIOSCSI *s = VIRTIO_SCSI(vring->parent);
     VirtIOSCSIReq *req;
 
+    assert(!s->pause_counter);
     event_notifier_test_and_clear(notifier);
     while ((req = virtio_scsi_pop_req_vring(s, vring))) {
         virtio_scsi_handle_ctrl_req(s, req);
@@ -124,6 +125,7 @@ static void virtio_scsi_iothread_handle_event(EventNotifier *notifier)
     VirtIOSCSI *s = vring->parent;
     VirtIODevice *vdev = VIRTIO_DEVICE(s);
 
+    assert(!s->pause_counter);
     event_notifier_test_and_clear(notifier);
 
     if (!(vdev->status & VIRTIO_CONFIG_S_DRIVER_OK)) {
@@ -143,6 +145,7 @@ static void virtio_scsi_iothread_handle_cmd(EventNotifier *notifier)
     VirtIOSCSIReq *req, *next;
     QTAILQ_HEAD(, VirtIOSCSIReq) reqs = QTAILQ_HEAD_INITIALIZER(reqs);
 
+    assert(!s->pause_counter);
     event_notifier_test_and_clear(notifier);
     while ((req = virtio_scsi_pop_req_vring(s, vring))) {
         if (virtio_scsi_handle_cmd_req_prepare(s, req)) {
@@ -155,8 +158,56 @@ static void virtio_scsi_iothread_handle_cmd(EventNotifier *notifier)
     }
 }
 
+void virtio_scsi_dataplane_blocker_notify(Notifier *notifier,
+                                          void *data)
+{
+    VirtIOSCSI *s = container_of(notifier, VirtIOSCSI, blocker_notifier);
+    BlockOpEvent *event = data;
+
+    if (event->type != BLOCK_OP_TYPE_DEVICE_IO) {
+        return;
+    }
+    if (event->blocking) {
+        s->pause_counter++;
+        if (s->pause_counter == 1) {
+            virtio_scsi_stop_ioeventfd(s);
+        }
+    } else {
+        s->pause_counter--;
+        if (s->pause_counter == 0) {
+            virtio_scsi_start_ioeventfd(s);
+        }
+    }
+    assert(s->pause_counter >= 0);
+}
+
 /* assumes s->ctx held */
-static void virtio_scsi_clear_aio(VirtIOSCSI *s)
+static void virtio_scsi_start_ioeventfd(VirtIOSCSI *s)
+{
+    VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(s);
+    int i;
+
+    if (!s->dataplane_started || s->dataplane_stopping) {
+        return;
+    }
+    if (s->ctrl_vring) {
+        aio_set_event_notifier(s->ctx, &s->ctrl_vring->host_notifier,
+                               virtio_scsi_iothread_handle_ctrl);
+    }
+    if (s->event_vring) {
+        aio_set_event_notifier(s->ctx, &s->event_vring->host_notifier,
+                               virtio_scsi_iothread_handle_event);
+    }
+    if (s->cmd_vrings) {
+        for (i = 0; i < vs->conf.num_queues && s->cmd_vrings[i]; i++) {
+            aio_set_event_notifier(s->ctx, &s->cmd_vrings[i]->host_notifier,
+                                   virtio_scsi_iothread_handle_cmd);
+        }
+    }
+}
+
+/* assumes s->ctx held */
+static void virtio_scsi_stop_ioeventfd(VirtIOSCSI *s)
 {
     VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(s);
     int i;
@@ -169,7 +220,8 @@ static void virtio_scsi_clear_aio(VirtIOSCSI *s)
     }
     if (s->cmd_vrings) {
         for (i = 0; i < vs->conf.num_queues && s->cmd_vrings[i]; i++) {
-            aio_set_event_notifier(s->ctx, &s->cmd_vrings[i]->host_notifier, NULL);
+            aio_set_event_notifier(s->ctx, &s->cmd_vrings[i]->host_notifier,
+                                   NULL);
         }
     }
 }
@@ -229,24 +281,18 @@ void virtio_scsi_dataplane_start(VirtIOSCSI *s)
     }
 
     aio_context_acquire(s->ctx);
-    s->ctrl_vring = virtio_scsi_vring_init(s, vs->ctrl_vq,
-                                           virtio_scsi_iothread_handle_ctrl,
-                                           0);
+    s->ctrl_vring = virtio_scsi_vring_init(s, vs->ctrl_vq, 0);
     if (!s->ctrl_vring) {
         goto fail_vrings;
     }
-    s->event_vring = virtio_scsi_vring_init(s, vs->event_vq,
-                                            virtio_scsi_iothread_handle_event,
-                                            1);
+    s->event_vring = virtio_scsi_vring_init(s, vs->event_vq, 1);
     if (!s->event_vring) {
         goto fail_vrings;
     }
     s->cmd_vrings = g_new(VirtIOSCSIVring *, vs->conf.num_queues);
     for (i = 0; i < vs->conf.num_queues; i++) {
         s->cmd_vrings[i] =
-            virtio_scsi_vring_init(s, vs->cmd_vqs[i],
-                                   virtio_scsi_iothread_handle_cmd,
-                                   i + 2);
+            virtio_scsi_vring_init(s, vs->cmd_vqs[i], i + 2);
         if (!s->cmd_vrings[i]) {
             goto fail_vrings;
         }
@@ -254,11 +300,11 @@ void virtio_scsi_dataplane_start(VirtIOSCSI *s)
 
     s->dataplane_starting = false;
     s->dataplane_started = true;
+    virtio_scsi_start_ioeventfd(s);
     aio_context_release(s->ctx);
     return;
 
 fail_vrings:
-    virtio_scsi_clear_aio(s);
     aio_context_release(s->ctx);
     virtio_scsi_vring_teardown(s);
     for (i = 0; i < vs->conf.num_queues + 2; i++) {
@@ -290,11 +336,7 @@ void virtio_scsi_dataplane_stop(VirtIOSCSI *s)
 
     aio_context_acquire(s->ctx);
 
-    aio_set_event_notifier(s->ctx, &s->ctrl_vring->host_notifier, NULL);
-    aio_set_event_notifier(s->ctx, &s->event_vring->host_notifier, NULL);
-    for (i = 0; i < vs->conf.num_queues; i++) {
-        aio_set_event_notifier(s->ctx, &s->cmd_vrings[i]->host_notifier, NULL);
-    }
+    virtio_scsi_stop_ioeventfd(s);
 
     blk_drain_all(); /* ensure there are no in-flight requests */
 
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 5e15fa6..16c253f 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -775,6 +775,8 @@ static void virtio_scsi_hotplug(HotplugHandler *hotplug_dev, DeviceState *dev,
         blk_op_unblock(sd->conf.blk, BLOCK_OP_TYPE_DEVICE_IO, s->blocker);
         aio_context_acquire(s->ctx);
         blk_set_aio_context(sd->conf.blk, s->ctx);
+        s->blocker_notifier.notify = virtio_scsi_dataplane_blocker_notify;
+        blk_op_blocker_add_notifier(sd->conf.blk, &s->blocker_notifier);
         aio_context_release(s->ctx);
     }
 
@@ -799,6 +801,7 @@ static void virtio_scsi_hotunplug(HotplugHandler *hotplug_dev, DeviceState *dev,
     }
 
     if (s->ctx) {
+        notifier_remove(&s->blocker_notifier);
         blk_op_unblock_all(sd->conf.blk, s->blocker);
     }
     qdev_simple_device_unplug_cb(hotplug_dev, dev, errp);
diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h
index b42e7f1..f8591fd 100644
--- a/include/hw/virtio/virtio-scsi.h
+++ b/include/hw/virtio/virtio-scsi.h
@@ -97,6 +97,8 @@ typedef struct VirtIOSCSI {
     bool dataplane_disabled;
     bool dataplane_fenced;
     Error *blocker;
+    Notifier blocker_notifier;
+    int pause_counter;
     Notifier migration_state_notifier;
     uint32_t host_features;
 } VirtIOSCSI;
@@ -170,6 +172,7 @@ void virtio_scsi_push_event(VirtIOSCSI *s, SCSIDevice *dev,
                             uint32_t event, uint32_t reason);
 
 void virtio_scsi_set_iothread(VirtIOSCSI *s, IOThread *iothread);
+void virtio_scsi_dataplane_blocker_notify(Notifier *notifier, void *data);
 void virtio_scsi_dataplane_start(VirtIOSCSI *s);
 void virtio_scsi_dataplane_stop(VirtIOSCSI *s);
 void virtio_scsi_vring_push_notify(VirtIOSCSIReq *req);
-- 
2.4.1

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [Qemu-devel] [PATCH v6 07/13] nbd-server: Clear "can_read" when "device io" blocker is set
  2015-05-21  6:42 [Qemu-devel] [PATCH v6 00/13] Fix transactional snapshot with dataplane and NBD export Fam Zheng
                   ` (5 preceding siblings ...)
  2015-05-21  6:42 ` [Qemu-devel] [PATCH v6 06/13] virtio-scsi-dataplane: Add "device IO" op blocker listener Fam Zheng
@ 2015-05-21  6:42 ` Fam Zheng
  2015-05-23 16:54   ` Max Reitz
  2015-05-21  6:42 ` [Qemu-devel] [PATCH v6 08/13] blockdev: Block device IO during internal snapshot transaction Fam Zheng
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 56+ messages in thread
From: Fam Zheng @ 2015-05-21  6:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, qemu-block, jcody, armbru, mreitz, Stefan Hajnoczi,
	amit.shah, pbonzini

Clear "can_read" while the "device IO" blocker is set, so that the NBD export
cannot submit new IO during bdrv_drain_all().
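The gating logic can be modelled outside of QEMU as a small sketch. The names
below (`Client`, `MAX_REQUESTS`, `client_can_read`) are illustrative stand-ins
for the NBD server's internals, not the actual API:

```c
#include <stdbool.h>

/* Simplified model of the can_read computation: a client may read new
 * requests only if it has spare request slots (or a receive coroutine
 * is already parked) AND the export's "device IO" blocker is not set. */
#define MAX_REQUESTS 16

typedef struct {
    bool recv_coroutine;   /* a request is partially received */
    int  nb_requests;      /* in-flight requests */
    bool io_blocked;       /* export-level "device IO" blocker */
} Client;

static bool client_can_read(const Client *c)
{
    bool can_read = c->recv_coroutine || c->nb_requests < MAX_REQUESTS;

    if (c->io_blocked) {
        can_read = false;   /* the blocker always wins */
    }
    return can_read;
}
```

The key point is the unconditional override: no matter how many free slots the
client has, a set blocker forces `can_read` to false, so no new request is
dispatched into a drained block layer.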

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 nbd.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/nbd.c b/nbd.c
index 06b501b..b5026af 100644
--- a/nbd.c
+++ b/nbd.c
@@ -160,6 +160,8 @@ struct NBDExport {
     uint32_t nbdflags;
     QTAILQ_HEAD(, NBDClient) clients;
     QTAILQ_ENTRY(NBDExport) next;
+    Notifier blocker_notifier;
+    bool io_blocked;
 
     AioContext *ctx;
 };
@@ -1053,6 +1055,22 @@ static void blk_aio_detach(void *opaque)
     exp->ctx = NULL;
 }
 
+static void nbd_op_blocker_changed(Notifier *notifier, void *data)
+{
+    BlockOpEvent *event = data;
+    NBDClient *client;
+    NBDExport *exp = container_of(notifier, NBDExport, blocker_notifier);
+
+    if (event->type != BLOCK_OP_TYPE_DEVICE_IO) {
+        return;
+    }
+    exp->io_blocked = event->blocking;
+
+    QTAILQ_FOREACH(client, &exp->clients, next) {
+        nbd_update_can_read(client);
+    }
+}
+
 NBDExport *nbd_export_new(BlockBackend *blk, off_t dev_offset, off_t size,
                           uint32_t nbdflags, void (*close)(NBDExport *),
                           Error **errp)
@@ -1081,6 +1099,8 @@ NBDExport *nbd_export_new(BlockBackend *blk, off_t dev_offset, off_t size,
      * access since the export could be available before migration handover.
      */
     blk_invalidate_cache(blk, NULL);
+    exp->blocker_notifier.notify = nbd_op_blocker_changed;
+    blk_op_blocker_add_notifier(blk, &exp->blocker_notifier);
     return exp;
 
 fail:
@@ -1132,6 +1152,7 @@ void nbd_export_close(NBDExport *exp)
     nbd_export_set_name(exp, NULL);
     nbd_export_put(exp);
     if (exp->blk) {
+        notifier_remove(&exp->blocker_notifier);
         blk_remove_aio_context_notifier(exp->blk, blk_aio_attached,
                                         blk_aio_detach, exp);
         blk_unref(exp->blk);
@@ -1455,6 +1476,9 @@ static void nbd_update_can_read(NBDClient *client)
     bool can_read = client->recv_coroutine ||
                     client->nb_requests < MAX_NBD_REQUESTS;
 
+    if (client->exp && client->exp->io_blocked) {
+        can_read = false;
+    }
     if (can_read != client->can_read) {
         client->can_read = can_read;
         nbd_set_handlers(client);
-- 
2.4.1


* [Qemu-devel] [PATCH v6 08/13] blockdev: Block device IO during internal snapshot transaction
  2015-05-21  6:42 [Qemu-devel] [PATCH v6 00/13] Fix transactional snapshot with dataplane and NBD export Fam Zheng
                   ` (6 preceding siblings ...)
  2015-05-21  6:42 ` [Qemu-devel] [PATCH v6 07/13] nbd-server: Clear "can_read" when "device io" blocker is set Fam Zheng
@ 2015-05-21  6:42 ` Fam Zheng
  2015-05-23 16:56   ` Max Reitz
  2015-05-21  6:42 ` [Qemu-devel] [PATCH v6 09/13] blockdev: Block device IO during external " Fam Zheng
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 56+ messages in thread
From: Fam Zheng @ 2015-05-21  6:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, qemu-block, jcody, armbru, mreitz, Stefan Hajnoczi,
	amit.shah, pbonzini

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 blockdev.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 5eaf77e..7f763d9 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1262,6 +1262,7 @@ typedef struct InternalSnapshotState {
     BlockDriverState *bs;
     AioContext *aio_context;
     QEMUSnapshotInfo sn;
+    Error *blocker;
 } InternalSnapshotState;
 
 static void internal_snapshot_prepare(BlkTransactionState *common,
@@ -1300,6 +1301,10 @@ static void internal_snapshot_prepare(BlkTransactionState *common,
     state->aio_context = bdrv_get_aio_context(bs);
     aio_context_acquire(state->aio_context);
 
+    state->bs = bs;
+    error_setg(&state->blocker, "internal snapshot in progress");
+    bdrv_op_block(bs, BLOCK_OP_TYPE_DEVICE_IO, state->blocker);
+
     if (!bdrv_is_inserted(bs)) {
         error_set(errp, QERR_DEVICE_HAS_NO_MEDIUM, device);
         return;
@@ -1354,9 +1359,6 @@ static void internal_snapshot_prepare(BlkTransactionState *common,
                          name, device);
         return;
     }
-
-    /* 4. succeed, mark a snapshot is created */
-    state->bs = bs;
 }
 
 static void internal_snapshot_abort(BlkTransactionState *common)
@@ -1387,6 +1389,10 @@ static void internal_snapshot_clean(BlkTransactionState *common)
     InternalSnapshotState *state = DO_UPCAST(InternalSnapshotState,
                                              common, common);
 
+    if (state->bs) {
+        bdrv_op_unblock(state->bs, BLOCK_OP_TYPE_DEVICE_IO, state->blocker);
+        error_free(state->blocker);
+    }
     if (state->aio_context) {
         aio_context_release(state->aio_context);
     }
-- 
2.4.1


* [Qemu-devel] [PATCH v6 09/13] blockdev: Block device IO during external snapshot transaction
  2015-05-21  6:42 [Qemu-devel] [PATCH v6 00/13] Fix transactional snapshot with dataplane and NBD export Fam Zheng
                   ` (7 preceding siblings ...)
  2015-05-21  6:42 ` [Qemu-devel] [PATCH v6 08/13] blockdev: Block device IO during internal snapshot transaction Fam Zheng
@ 2015-05-21  6:42 ` Fam Zheng
  2015-05-23 16:58   ` Max Reitz
  2015-05-21  6:43 ` [Qemu-devel] [PATCH v6 10/13] blockdev: Block device IO during drive-backup transaction Fam Zheng
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 56+ messages in thread
From: Fam Zheng @ 2015-05-21  6:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, qemu-block, jcody, armbru, mreitz, Stefan Hajnoczi,
	amit.shah, pbonzini

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 blockdev.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 7f763d9..923fc90 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1404,6 +1404,7 @@ typedef struct ExternalSnapshotState {
     BlockDriverState *old_bs;
     BlockDriverState *new_bs;
     AioContext *aio_context;
+    Error *blocker;
 } ExternalSnapshotState;
 
 static void external_snapshot_prepare(BlkTransactionState *common,
@@ -1525,6 +1526,9 @@ static void external_snapshot_prepare(BlkTransactionState *common,
     if (ret != 0) {
         error_propagate(errp, local_err);
     }
+
+    error_setg(&state->blocker, "snapshot in progress");
+    bdrv_op_block(state->old_bs, BLOCK_OP_TYPE_DEVICE_IO, state->blocker);
 }
 
 static void external_snapshot_commit(BlkTransactionState *common)
@@ -1541,8 +1545,6 @@ static void external_snapshot_commit(BlkTransactionState *common)
      * don't want to abort all of them if one of them fails the reopen */
     bdrv_reopen(state->new_bs, state->new_bs->open_flags & ~BDRV_O_RDWR,
                 NULL);
-
-    aio_context_release(state->aio_context);
 }
 
 static void external_snapshot_abort(BlkTransactionState *common)
@@ -1552,6 +1554,17 @@ static void external_snapshot_abort(BlkTransactionState *common)
     if (state->new_bs) {
         bdrv_unref(state->new_bs);
     }
+}
+
+static void external_snapshot_clean(BlkTransactionState *common)
+{
+    ExternalSnapshotState *state =
+                             DO_UPCAST(ExternalSnapshotState, common, common);
+
+    if (state->old_bs) {
+        bdrv_op_unblock(state->old_bs, BLOCK_OP_TYPE_DEVICE_IO, state->blocker);
+        error_free(state->blocker);
+    }
     if (state->aio_context) {
         aio_context_release(state->aio_context);
     }
@@ -1716,6 +1729,7 @@ static const BdrvActionOps actions[] = {
         .prepare  = external_snapshot_prepare,
         .commit   = external_snapshot_commit,
         .abort = external_snapshot_abort,
+        .clean = external_snapshot_clean,
     },
     [TRANSACTION_ACTION_KIND_DRIVE_BACKUP] = {
         .instance_size = sizeof(DriveBackupState),
-- 
2.4.1


* [Qemu-devel] [PATCH v6 10/13] blockdev: Block device IO during drive-backup transaction
  2015-05-21  6:42 [Qemu-devel] [PATCH v6 00/13] Fix transactional snapshot with dataplane and NBD export Fam Zheng
                   ` (8 preceding siblings ...)
  2015-05-21  6:42 ` [Qemu-devel] [PATCH v6 09/13] blockdev: Block device IO during external " Fam Zheng
@ 2015-05-21  6:43 ` Fam Zheng
  2015-05-23 16:59   ` Max Reitz
  2015-05-21  6:43 ` [Qemu-devel] [PATCH v6 11/13] blockdev: Block device IO during blockdev-backup transaction Fam Zheng
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 56+ messages in thread
From: Fam Zheng @ 2015-05-21  6:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, qemu-block, jcody, armbru, mreitz, Stefan Hajnoczi,
	amit.shah, pbonzini

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 blockdev.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/blockdev.c b/blockdev.c
index 923fc90..ae52d27 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1575,6 +1575,7 @@ typedef struct DriveBackupState {
     BlockDriverState *bs;
     AioContext *aio_context;
     BlockJob *job;
+    Error *blocker;
 } DriveBackupState;
 
 static void drive_backup_prepare(BlkTransactionState *common, Error **errp)
@@ -1599,6 +1600,9 @@ static void drive_backup_prepare(BlkTransactionState *common, Error **errp)
     state->aio_context = bdrv_get_aio_context(bs);
     aio_context_acquire(state->aio_context);
 
+    state->bs = bs;
+    error_setg(&state->blocker, "drive-backup in progress");
+    bdrv_op_block(bs, BLOCK_OP_TYPE_DEVICE_IO, state->blocker);
     qmp_drive_backup(backup->device, backup->target,
                      backup->has_format, backup->format,
                      backup->sync,
@@ -1613,7 +1617,6 @@ static void drive_backup_prepare(BlkTransactionState *common, Error **errp)
         return;
     }
 
-    state->bs = bs;
     state->job = state->bs->job;
 }
 
@@ -1632,6 +1635,10 @@ static void drive_backup_clean(BlkTransactionState *common)
 {
     DriveBackupState *state = DO_UPCAST(DriveBackupState, common, common);
 
+    if (state->bs) {
+        bdrv_op_unblock(state->bs, BLOCK_OP_TYPE_DEVICE_IO, state->blocker);
+        error_free(state->blocker);
+    }
     if (state->aio_context) {
         aio_context_release(state->aio_context);
     }
-- 
2.4.1


* [Qemu-devel] [PATCH v6 11/13] blockdev: Block device IO during blockdev-backup transaction
  2015-05-21  6:42 [Qemu-devel] [PATCH v6 00/13] Fix transactional snapshot with dataplane and NBD export Fam Zheng
                   ` (9 preceding siblings ...)
  2015-05-21  6:43 ` [Qemu-devel] [PATCH v6 10/13] blockdev: Block device IO during drive-backup transaction Fam Zheng
@ 2015-05-21  6:43 ` Fam Zheng
  2015-05-23 17:05   ` Max Reitz
  2015-05-21  6:43 ` [Qemu-devel] [PATCH v6 12/13] block: Block "device IO" during bdrv_drain and bdrv_drain_all Fam Zheng
  2015-05-21  6:43 ` [Qemu-devel] [PATCH v6 13/13] block/mirror: Block "device IO" during mirror exit Fam Zheng
  12 siblings, 1 reply; 56+ messages in thread
From: Fam Zheng @ 2015-05-21  6:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, qemu-block, jcody, armbru, mreitz, Stefan Hajnoczi,
	amit.shah, pbonzini

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 blockdev.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/blockdev.c b/blockdev.c
index ae52d27..bd28183 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1649,6 +1649,7 @@ typedef struct BlockdevBackupState {
     BlockDriverState *bs;
     BlockJob *job;
     AioContext *aio_context;
+    Error *blocker;
 } BlockdevBackupState;
 
 static void blockdev_backup_prepare(BlkTransactionState *common, Error **errp)
@@ -1685,6 +1686,10 @@ static void blockdev_backup_prepare(BlkTransactionState *common, Error **errp)
     }
     aio_context_acquire(state->aio_context);
 
+    state->bs = bs;
+    error_setg(&state->blocker, "blockdev-backup in progress");
+    bdrv_op_block(bs, BLOCK_OP_TYPE_DEVICE_IO, state->blocker);
+
     qmp_blockdev_backup(backup->device, backup->target,
                         backup->sync,
                         backup->has_speed, backup->speed,
@@ -1696,7 +1701,6 @@ static void blockdev_backup_prepare(BlkTransactionState *common, Error **errp)
         return;
     }
 
-    state->bs = bs;
     state->job = state->bs->job;
 }
 
@@ -1715,6 +1719,10 @@ static void blockdev_backup_clean(BlkTransactionState *common)
 {
     BlockdevBackupState *state = DO_UPCAST(BlockdevBackupState, common, common);
 
+    if (state->bs) {
+        bdrv_op_unblock(state->bs, BLOCK_OP_TYPE_DEVICE_IO, state->blocker);
+        error_free(state->blocker);
+    }
     if (state->aio_context) {
         aio_context_release(state->aio_context);
     }
-- 
2.4.1


* [Qemu-devel] [PATCH v6 12/13] block: Block "device IO" during bdrv_drain and bdrv_drain_all
  2015-05-21  6:42 [Qemu-devel] [PATCH v6 00/13] Fix transactional snapshot with dataplane and NBD export Fam Zheng
                   ` (10 preceding siblings ...)
  2015-05-21  6:43 ` [Qemu-devel] [PATCH v6 11/13] blockdev: Block device IO during blockdev-backup transaction Fam Zheng
@ 2015-05-21  6:43 ` Fam Zheng
  2015-05-23 17:11   ` Max Reitz
  2015-05-21  6:43 ` [Qemu-devel] [PATCH v6 13/13] block/mirror: Block "device IO" during mirror exit Fam Zheng
  12 siblings, 1 reply; 56+ messages in thread
From: Fam Zheng @ 2015-05-21  6:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, qemu-block, jcody, armbru, mreitz, Stefan Hajnoczi,
	amit.shah, pbonzini

We don't want new requests from the guest, so block device IO around the
nested poll.

This also avoids looping forever when an iothread keeps submitting new
requests.
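Why pausing the device makes the drain loop terminate can be shown with a toy
model. This is a hypothetical sketch (the `Device` type and its fields are
invented for illustration, not QEMU code): an unpaused device refills the
queue as fast as the drain empties it.

```c
#include <stdbool.h>

typedef struct {
    int  pending;   /* in-flight requests */
    bool paused;    /* "device IO" blocker set */
} Device;

/* One drain iteration: completes one request.  A running (unpaused)
 * device immediately submits a replacement, so pending never reaches
 * zero and the drain loop would spin forever. */
static bool drain_one(Device *d)
{
    if (d->pending == 0) {
        return false;
    }
    d->pending--;
    if (!d->paused) {
        d->pending++;   /* guest keeps the queue full */
    }
    return d->pending > 0;
}

static void drain(Device *d)
{
    d->paused = true;           /* bdrv_op_block(..., DEVICE_IO, ...) */
    while (drain_one(d)) {
        /* keep iterating */
    }
    d->paused = false;          /* bdrv_op_unblock(...) */
}
```

With `paused` set, each iteration strictly decreases `pending`, giving the
loop a termination guarantee the unpaused case lacks.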

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block/io.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/block/io.c b/block/io.c
index 1ce62c4..b23a83f 100644
--- a/block/io.c
+++ b/block/io.c
@@ -286,12 +286,21 @@ static bool bdrv_drain_one(BlockDriverState *bs)
  *
  * Note that unlike bdrv_drain_all(), the caller must hold the BlockDriverState
  * AioContext.
+ *
+ * Devices are paused to avoid looping forever because otherwise they could
+ * keep submitting more requests.
  */
 void bdrv_drain(BlockDriverState *bs)
 {
+    Error *blocker = NULL;
+
+    error_setg(&blocker, "bdrv_drain in progress");
+    bdrv_op_block(bs, BLOCK_OP_TYPE_DEVICE_IO, blocker);
     while (bdrv_drain_one(bs)) {
         /* Keep iterating */
     }
+    bdrv_op_unblock(bs, BLOCK_OP_TYPE_DEVICE_IO, blocker);
+    error_free(blocker);
 }
 
 /*
@@ -303,14 +312,20 @@ void bdrv_drain(BlockDriverState *bs)
  * Note that completion of an asynchronous I/O operation can trigger any
  * number of other I/O operations on other devices---for example a coroutine
  * can be arbitrarily complex and a constant flow of I/O can come until the
- * coroutine is complete.  Because of this, it is not possible to have a
- * function to drain a single device's I/O queue.
+ * coroutine is complete.  Because of this, we must call bdrv_drain_one in a
+ * loop.
+ *
+ * We explicitly pause block jobs and devices to prevent them from submitting
+ * more requests.
  */
 void bdrv_drain_all(void)
 {
     /* Always run first iteration so any pending completion BHs run */
     bool busy = true;
     BlockDriverState *bs = NULL;
+    Error *blocker = NULL;
+
+    error_setg(&blocker, "bdrv_drain_all in progress");
 
     while ((bs = bdrv_next(bs))) {
         AioContext *aio_context = bdrv_get_aio_context(bs);
@@ -319,6 +334,7 @@ void bdrv_drain_all(void)
         if (bs->job) {
             block_job_pause(bs->job);
         }
+        bdrv_op_block(bs, BLOCK_OP_TYPE_DEVICE_IO, blocker);
         aio_context_release(aio_context);
     }
 
@@ -343,8 +359,10 @@ void bdrv_drain_all(void)
         if (bs->job) {
             block_job_resume(bs->job);
         }
+        bdrv_op_unblock(bs, BLOCK_OP_TYPE_DEVICE_IO, blocker);
         aio_context_release(aio_context);
     }
+    error_free(blocker);
 }
 
 /**
-- 
2.4.1


* [Qemu-devel] [PATCH v6 13/13] block/mirror: Block "device IO" during mirror exit
  2015-05-21  6:42 [Qemu-devel] [PATCH v6 00/13] Fix transactional snapshot with dataplane and NBD export Fam Zheng
                   ` (11 preceding siblings ...)
  2015-05-21  6:43 ` [Qemu-devel] [PATCH v6 12/13] block: Block "device IO" during bdrv_drain and bdrv_drain_all Fam Zheng
@ 2015-05-21  6:43 ` Fam Zheng
  2015-05-23 17:21   ` Max Reitz
  12 siblings, 1 reply; 56+ messages in thread
From: Fam Zheng @ 2015-05-21  6:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, qemu-block, jcody, armbru, mreitz, Stefan Hajnoczi,
	amit.shah, pbonzini

When mirror should complete, the source and target are in sync.  But we
call bdrv_swap() only a while later, in the main loop BH.  If the guest
writes something in between, the target will not get the new data.

Block "device IO" before bdrv_drain() and unblock it after bdrv_swap().
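The window being closed here can be sketched as a toy state machine. This is
purely illustrative (the `Disk` type and `mirror_complete` below are invented
for the sketch, not the mirror code): a guest write landing between "in sync"
and the swap would be lost unless device IO is blocked across that window.

```c
#include <stdbool.h>

typedef struct {
    int  source;
    int  target;
    bool io_blocked;
} Disk;

static bool guest_write(Disk *d, int v)
{
    if (d->io_blocked) {
        return false;       /* rejected while the blocker is set */
    }
    d->source = v;          /* target is now stale */
    return true;
}

static void mirror_complete(Disk *d)
{
    d->io_blocked = true;   /* bdrv_op_block(..., DEVICE_IO, ...) */
    d->target = d->source;  /* final drain/copy: source == target */
    /* ... the main loop BH runs bdrv_swap() here; with the blocker
     * held, no guest write can sneak into this window ... */
    d->io_blocked = false;  /* bdrv_op_unblock(...) after the swap */
}
```

Without the blocker, `guest_write()` could succeed between the copy and the
swap, leaving `target` behind `source` at the moment of the swap.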

Reported-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block/mirror.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/block/mirror.c b/block/mirror.c
index 58f391a..ed770c2 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -320,6 +320,8 @@ static void mirror_drain(MirrorBlockJob *s)
 
 typedef struct {
     int ret;
+    /* Use to pause device IO during mirror exit */
+    Error *blocker;
 } MirrorExitData;
 
 static void mirror_exit(BlockJob *job, void *opaque)
@@ -358,6 +360,8 @@ static void mirror_exit(BlockJob *job, void *opaque)
     if (replace_aio_context) {
         aio_context_release(replace_aio_context);
     }
+    bdrv_op_unblock(s->common.bs, BLOCK_OP_TYPE_DEVICE_IO, data->blocker);
+    error_free(data->blocker);
     g_free(s->replaces);
     bdrv_unref(s->target);
     block_job_completed(&s->common, data->ret);
@@ -376,6 +380,8 @@ static void coroutine_fn mirror_run(void *opaque)
                                  checking for a NULL string */
     int ret = 0;
     int n;
+    data = g_malloc0(sizeof(*data));
+    error_setg(&data->blocker, "mirror job exiting");
 
     if (block_job_is_cancelled(&s->common)) {
         goto immediate_exit;
@@ -521,6 +527,7 @@ static void coroutine_fn mirror_run(void *opaque)
              * mirror_populate runs.
              */
             trace_mirror_before_drain(s, cnt);
+            bdrv_op_block(bs, BLOCK_OP_TYPE_DEVICE_IO, data->blocker);
             bdrv_drain(bs);
             cnt = bdrv_get_dirty_count(s->dirty_bitmap);
         }
@@ -543,6 +550,7 @@ static void coroutine_fn mirror_run(void *opaque)
             s->common.cancelled = false;
             break;
         }
+        bdrv_op_unblock(bs, BLOCK_OP_TYPE_DEVICE_IO, data->blocker);
         last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
     }
 
@@ -563,7 +571,6 @@ immediate_exit:
     bdrv_release_dirty_bitmap(bs, s->dirty_bitmap);
     bdrv_iostatus_disable(s->target);
 
-    data = g_malloc(sizeof(*data));
     data->ret = ret;
     block_job_defer_to_main_loop(&s->common, mirror_exit, data);
 }
-- 
2.4.1


* Re: [Qemu-devel] [PATCH v6 01/13] block: Add op blocker type "device IO"
  2015-05-21  6:42 ` [Qemu-devel] [PATCH v6 01/13] block: Add op blocker type "device IO" Fam Zheng
@ 2015-05-21  7:06   ` Wen Congyang
  2015-05-21  7:32     ` Fam Zheng
  2015-05-21  8:00   ` Wen Congyang
  2015-05-26 14:22   ` Kevin Wolf
  2 siblings, 1 reply; 56+ messages in thread
From: Wen Congyang @ 2015-05-21  7:06 UTC (permalink / raw)
  To: Fam Zheng, qemu-devel
  Cc: Kevin Wolf, qemu-block, jcody, armbru, mreitz, Stefan Hajnoczi,
	amit.shah, pbonzini

On 05/21/2015 02:42 PM, Fam Zheng wrote:
> It blocks device IO.
> 
> All bdrv_op_block_all/blk_op_block_all callers are taken care of:
> 
> - virtio_blk_data_plane_create
> - virtio_scsi_hotplug
> 
>   Device creation, unblock it.
> 
> - bdrv_set_backing_hd
> 
>   Backing hd is not used by device, so blocking is OK.
> 
> - backup_start
> 
>   Blocking target when backup is running, unblock it.

Did you forget it?

> 
> - mirror_complete
> 
>   Blocking s->to_replace until mirror_exit, OK.
> 
> - block_job_complete
> 
>   The block job may be long running. Unblock it.

It should be block_job_create()

Thanks
Wen Congyang

> 
> - init_blk_migration
> 
>   The block migration may be long running, Unblock it.
> 
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>  blockjob.c                      | 1 +
>  hw/block/dataplane/virtio-blk.c | 1 +
>  hw/scsi/virtio-scsi.c           | 1 +
>  include/block/block.h           | 1 +
>  migration/block.c               | 1 +
>  5 files changed, 5 insertions(+)
> 
> diff --git a/blockjob.c b/blockjob.c
> index 2755465..e39bdde 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -51,6 +51,7 @@ void *block_job_create(const BlockJobDriver *driver, BlockDriverState *bs,
>                 BlockJobType_lookup[driver->job_type]);
>      bdrv_op_block_all(bs, job->blocker);
>      bdrv_op_unblock(bs, BLOCK_OP_TYPE_DATAPLANE, job->blocker);
> +    bdrv_op_unblock(bs, BLOCK_OP_TYPE_DEVICE_IO, job->blocker);
>  
>      job->driver        = driver;
>      job->bs            = bs;
> diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
> index 3db139b..3ecc8bd 100644
> --- a/hw/block/dataplane/virtio-blk.c
> +++ b/hw/block/dataplane/virtio-blk.c
> @@ -209,6 +209,7 @@ void virtio_blk_data_plane_create(VirtIODevice *vdev, VirtIOBlkConf *conf,
>      blk_op_unblock(conf->conf.blk, BLOCK_OP_TYPE_MIRROR, s->blocker);
>      blk_op_unblock(conf->conf.blk, BLOCK_OP_TYPE_STREAM, s->blocker);
>      blk_op_unblock(conf->conf.blk, BLOCK_OP_TYPE_REPLACE, s->blocker);
> +    blk_op_unblock(conf->conf.blk, BLOCK_OP_TYPE_DEVICE_IO, s->blocker);
>  
>      *dataplane = s;
>  }
> diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
> index e242fef..5e15fa6 100644
> --- a/hw/scsi/virtio-scsi.c
> +++ b/hw/scsi/virtio-scsi.c
> @@ -772,6 +772,7 @@ static void virtio_scsi_hotplug(HotplugHandler *hotplug_dev, DeviceState *dev,
>              return;
>          }
>          blk_op_block_all(sd->conf.blk, s->blocker);
> +        blk_op_unblock(sd->conf.blk, BLOCK_OP_TYPE_DEVICE_IO, s->blocker);
>          aio_context_acquire(s->ctx);
>          blk_set_aio_context(sd->conf.blk, s->ctx);
>          aio_context_release(s->ctx);
> diff --git a/include/block/block.h b/include/block/block.h
> index 7d1a717..906fb31 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -159,6 +159,7 @@ typedef enum BlockOpType {
>      BLOCK_OP_TYPE_RESIZE,
>      BLOCK_OP_TYPE_STREAM,
>      BLOCK_OP_TYPE_REPLACE,
> +    BLOCK_OP_TYPE_DEVICE_IO,
>      BLOCK_OP_TYPE_MAX,
>  } BlockOpType;
>  
> diff --git a/migration/block.c b/migration/block.c
> index ddb59cc..b833bac 100644
> --- a/migration/block.c
> +++ b/migration/block.c
> @@ -379,6 +379,7 @@ static void init_blk_migration(QEMUFile *f)
>          alloc_aio_bitmap(bmds);
>          error_setg(&bmds->blocker, "block device is in use by migration");
>          bdrv_op_block_all(bs, bmds->blocker);
> +        bdrv_op_unblock(bs, BLOCK_OP_TYPE_DEVICE_IO, bmds->blocker);
>          bdrv_ref(bs);
>  
>          block_mig_state.total_sector_sum += sectors;
> 


* Re: [Qemu-devel] [PATCH v6 01/13] block: Add op blocker type "device IO"
  2015-05-21  7:06   ` Wen Congyang
@ 2015-05-21  7:32     ` Fam Zheng
  2015-05-22  4:54       ` Fam Zheng
  0 siblings, 1 reply; 56+ messages in thread
From: Fam Zheng @ 2015-05-21  7:32 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, qemu-block, armbru, jcody, qemu-devel, mreitz,
	Stefan Hajnoczi, amit.shah, pbonzini

On Thu, 05/21 15:06, Wen Congyang wrote:
> On 05/21/2015 02:42 PM, Fam Zheng wrote:
> > It blocks device IO.
> > 
> > All bdrv_op_block_all/blk_op_block_all callers are taken care of:
> > 
> > - virtio_blk_data_plane_create
> > - virtio_scsi_hotplug
> > 
> >   Device creation, unblock it.
> > 
> > - bdrv_set_backing_hd
> > 
> >   Backing hd is not used by device, so blocking is OK.
> > 
> > - backup_start
> > 
> >   Blocking target when backup is running, unblock it.
> 
> Do you forget it?

Oh, I think the commit log is wrong: the target image is only written to by
the block job, so there cannot be a device on it; it's similar to
bdrv_set_backing_hd.

> 
> > 
> > - mirror_complete
> > 
> >   Blocking s->to_replace until mirror_exit, OK.
> > 
> > - block_job_complete
> > 
> >   The block job may be long running. Unblock it.
> 
> It should be block_job_create()

Yes.

Fam

> 
> Thanks
> Wen Congyang
> 
> > 
> > - init_blk_migration
> > 
> >   The block migration may be long running, Unblock it.
> > 
> > Signed-off-by: Fam Zheng <famz@redhat.com>
> > ---
> >  blockjob.c                      | 1 +
> >  hw/block/dataplane/virtio-blk.c | 1 +
> >  hw/scsi/virtio-scsi.c           | 1 +
> >  include/block/block.h           | 1 +
> >  migration/block.c               | 1 +
> >  5 files changed, 5 insertions(+)
> > 
> > diff --git a/blockjob.c b/blockjob.c
> > index 2755465..e39bdde 100644
> > --- a/blockjob.c
> > +++ b/blockjob.c
> > @@ -51,6 +51,7 @@ void *block_job_create(const BlockJobDriver *driver, BlockDriverState *bs,
> >                 BlockJobType_lookup[driver->job_type]);
> >      bdrv_op_block_all(bs, job->blocker);
> >      bdrv_op_unblock(bs, BLOCK_OP_TYPE_DATAPLANE, job->blocker);
> > +    bdrv_op_unblock(bs, BLOCK_OP_TYPE_DEVICE_IO, job->blocker);
> >  
> >      job->driver        = driver;
> >      job->bs            = bs;
> > diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
> > index 3db139b..3ecc8bd 100644
> > --- a/hw/block/dataplane/virtio-blk.c
> > +++ b/hw/block/dataplane/virtio-blk.c
> > @@ -209,6 +209,7 @@ void virtio_blk_data_plane_create(VirtIODevice *vdev, VirtIOBlkConf *conf,
> >      blk_op_unblock(conf->conf.blk, BLOCK_OP_TYPE_MIRROR, s->blocker);
> >      blk_op_unblock(conf->conf.blk, BLOCK_OP_TYPE_STREAM, s->blocker);
> >      blk_op_unblock(conf->conf.blk, BLOCK_OP_TYPE_REPLACE, s->blocker);
> > +    blk_op_unblock(conf->conf.blk, BLOCK_OP_TYPE_DEVICE_IO, s->blocker);
> >  
> >      *dataplane = s;
> >  }
> > diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
> > index e242fef..5e15fa6 100644
> > --- a/hw/scsi/virtio-scsi.c
> > +++ b/hw/scsi/virtio-scsi.c
> > @@ -772,6 +772,7 @@ static void virtio_scsi_hotplug(HotplugHandler *hotplug_dev, DeviceState *dev,
> >              return;
> >          }
> >          blk_op_block_all(sd->conf.blk, s->blocker);
> > +        blk_op_unblock(sd->conf.blk, BLOCK_OP_TYPE_DEVICE_IO, s->blocker);
> >          aio_context_acquire(s->ctx);
> >          blk_set_aio_context(sd->conf.blk, s->ctx);
> >          aio_context_release(s->ctx);
> > diff --git a/include/block/block.h b/include/block/block.h
> > index 7d1a717..906fb31 100644
> > --- a/include/block/block.h
> > +++ b/include/block/block.h
> > @@ -159,6 +159,7 @@ typedef enum BlockOpType {
> >      BLOCK_OP_TYPE_RESIZE,
> >      BLOCK_OP_TYPE_STREAM,
> >      BLOCK_OP_TYPE_REPLACE,
> > +    BLOCK_OP_TYPE_DEVICE_IO,
> >      BLOCK_OP_TYPE_MAX,
> >  } BlockOpType;
> >  
> > diff --git a/migration/block.c b/migration/block.c
> > index ddb59cc..b833bac 100644
> > --- a/migration/block.c
> > +++ b/migration/block.c
> > @@ -379,6 +379,7 @@ static void init_blk_migration(QEMUFile *f)
> >          alloc_aio_bitmap(bmds);
> >          error_setg(&bmds->blocker, "block device is in use by migration");
> >          bdrv_op_block_all(bs, bmds->blocker);
> > +        bdrv_op_unblock(bs, BLOCK_OP_TYPE_DEVICE_IO, bmds->blocker);
> >          bdrv_ref(bs);
> >  
> >          block_mig_state.total_sector_sum += sectors;
> > 
> 

* Re: [Qemu-devel] [PATCH v6 01/13] block: Add op blocker type "device IO"
  2015-05-21  6:42 ` [Qemu-devel] [PATCH v6 01/13] block: Add op blocker type "device IO" Fam Zheng
  2015-05-21  7:06   ` Wen Congyang
@ 2015-05-21  8:00   ` Wen Congyang
  2015-05-21 12:44     ` Fam Zheng
  2015-05-26 14:22   ` Kevin Wolf
  2 siblings, 1 reply; 56+ messages in thread
From: Wen Congyang @ 2015-05-21  8:00 UTC (permalink / raw)
  To: Fam Zheng, qemu-devel
  Cc: Kevin Wolf, qemu-block, jcody, armbru, mreitz, Stefan Hajnoczi,
	amit.shah, pbonzini

On 05/21/2015 02:42 PM, Fam Zheng wrote:
> It blocks device IO.

Does it only block virtio-blk/scsi? Not all block types?

Thanks
Wen Congyang

> 
> All bdrv_op_block_all/blk_op_block_all callers are taken care of:
> 
> - virtio_blk_data_plane_create
> - virtio_scsi_hotplug
> 
>   Device creation, unblock it.
> 
> - bdrv_set_backing_hd
> 
>   Backing hd is not used by device, so blocking is OK.
> 
> - backup_start
> 
>   Blocking target when backup is running, unblock it.
> 
> - mirror_complete
> 
>   Blocking s->to_replace until mirror_exit, OK.
> 
> - block_job_complete
> 
>   The block job may be long running. Unblock it.
> 
> - init_blk_migration
> 
>   The block migration may be long-running. Unblock it.
> 
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>  blockjob.c                      | 1 +
>  hw/block/dataplane/virtio-blk.c | 1 +
>  hw/scsi/virtio-scsi.c           | 1 +
>  include/block/block.h           | 1 +
>  migration/block.c               | 1 +
>  5 files changed, 5 insertions(+)
> 
> diff --git a/blockjob.c b/blockjob.c
> index 2755465..e39bdde 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -51,6 +51,7 @@ void *block_job_create(const BlockJobDriver *driver, BlockDriverState *bs,
>                 BlockJobType_lookup[driver->job_type]);
>      bdrv_op_block_all(bs, job->blocker);
>      bdrv_op_unblock(bs, BLOCK_OP_TYPE_DATAPLANE, job->blocker);
> +    bdrv_op_unblock(bs, BLOCK_OP_TYPE_DEVICE_IO, job->blocker);
>  
>      job->driver        = driver;
>      job->bs            = bs;
> diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
> index 3db139b..3ecc8bd 100644
> --- a/hw/block/dataplane/virtio-blk.c
> +++ b/hw/block/dataplane/virtio-blk.c
> @@ -209,6 +209,7 @@ void virtio_blk_data_plane_create(VirtIODevice *vdev, VirtIOBlkConf *conf,
>      blk_op_unblock(conf->conf.blk, BLOCK_OP_TYPE_MIRROR, s->blocker);
>      blk_op_unblock(conf->conf.blk, BLOCK_OP_TYPE_STREAM, s->blocker);
>      blk_op_unblock(conf->conf.blk, BLOCK_OP_TYPE_REPLACE, s->blocker);
> +    blk_op_unblock(conf->conf.blk, BLOCK_OP_TYPE_DEVICE_IO, s->blocker);
>  
>      *dataplane = s;
>  }
> diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
> index e242fef..5e15fa6 100644
> --- a/hw/scsi/virtio-scsi.c
> +++ b/hw/scsi/virtio-scsi.c
> @@ -772,6 +772,7 @@ static void virtio_scsi_hotplug(HotplugHandler *hotplug_dev, DeviceState *dev,
>              return;
>          }
>          blk_op_block_all(sd->conf.blk, s->blocker);
> +        blk_op_unblock(sd->conf.blk, BLOCK_OP_TYPE_DEVICE_IO, s->blocker);
>          aio_context_acquire(s->ctx);
>          blk_set_aio_context(sd->conf.blk, s->ctx);
>          aio_context_release(s->ctx);
> diff --git a/include/block/block.h b/include/block/block.h
> index 7d1a717..906fb31 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -159,6 +159,7 @@ typedef enum BlockOpType {
>      BLOCK_OP_TYPE_RESIZE,
>      BLOCK_OP_TYPE_STREAM,
>      BLOCK_OP_TYPE_REPLACE,
> +    BLOCK_OP_TYPE_DEVICE_IO,
>      BLOCK_OP_TYPE_MAX,
>  } BlockOpType;
>  
> diff --git a/migration/block.c b/migration/block.c
> index ddb59cc..b833bac 100644
> --- a/migration/block.c
> +++ b/migration/block.c
> @@ -379,6 +379,7 @@ static void init_blk_migration(QEMUFile *f)
>          alloc_aio_bitmap(bmds);
>          error_setg(&bmds->blocker, "block device is in use by migration");
>          bdrv_op_block_all(bs, bmds->blocker);
> +        bdrv_op_unblock(bs, BLOCK_OP_TYPE_DEVICE_IO, bmds->blocker);
>          bdrv_ref(bs);
>  
>          block_mig_state.total_sector_sum += sectors;
> 

* Re: [Qemu-devel] [PATCH v6 01/13] block: Add op blocker type "device IO"
  2015-05-21  8:00   ` Wen Congyang
@ 2015-05-21 12:44     ` Fam Zheng
  2015-05-22  6:18       ` Wen Congyang
  0 siblings, 1 reply; 56+ messages in thread
From: Fam Zheng @ 2015-05-21 12:44 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, qemu-block, armbru, jcody, qemu-devel, mreitz,
	Stefan Hajnoczi, amit.shah, pbonzini

On Thu, 05/21 16:00, Wen Congyang wrote:
> On 05/21/2015 02:42 PM, Fam Zheng wrote:
> > It blocks device IO.
> 
> Does it only block virtio-blk/scsi? Not all block types?

It's only necessary for dataplane-enabled devices, which are virtio-blk and
virtio-scsi at the moment. Other types are OK.


Fam

* Re: [Qemu-devel] [PATCH v6 01/13] block: Add op blocker type "device IO"
  2015-05-21  7:32     ` Fam Zheng
@ 2015-05-22  4:54       ` Fam Zheng
  2015-05-23 16:51         ` Max Reitz
  0 siblings, 1 reply; 56+ messages in thread
From: Fam Zheng @ 2015-05-22  4:54 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, qemu-block, qemu-devel, jcody, armbru, mreitz,
	Stefan Hajnoczi, amit.shah, pbonzini

On Thu, 05/21 15:32, Fam Zheng wrote:
> On Thu, 05/21 15:06, Wen Congyang wrote:
> > On 05/21/2015 02:42 PM, Fam Zheng wrote:
> > > It blocks device IO.
> > > 
> > > All bdrv_op_block_all/blk_op_block_all callers are taken care of:
> > > 
> > > - virtio_blk_data_plane_create
> > > - virtio_scsi_hotplug
> > > 
> > >   Device creation, unblock it.
> > > 
> > > - bdrv_set_backing_hd
> > > 
> > >   Backing hd is not used by device, so blocking is OK.
> > > 
> > > - backup_start
> > > 
> > >   Blocking target when backup is running, unblock it.
> > 
> > Do you forget it?
> 
> Oh I think the commit log is wrong: the target image is only written to by
> block job, there cannot be a device on it, so it's similar to
> bdrv_set_backing_hd.

Correction: if it's blockdev-backup, the target could have a device, in that
sense it should be unblocked like block_job_create(). I'll fix it.

Fam

* Re: [Qemu-devel] [PATCH v6 01/13] block: Add op blocker type "device IO"
  2015-05-21 12:44     ` Fam Zheng
@ 2015-05-22  6:18       ` Wen Congyang
  0 siblings, 0 replies; 56+ messages in thread
From: Wen Congyang @ 2015-05-22  6:18 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Kevin Wolf, qemu-block, armbru, jcody, qemu-devel, mreitz,
	Stefan Hajnoczi, amit.shah, pbonzini

On 05/21/2015 08:44 PM, Fam Zheng wrote:
> On Thu, 05/21 16:00, Wen Congyang wrote:
>> On 05/21/2015 02:42 PM, Fam Zheng wrote:
>>> It blocks device IO.
>>
>> Does it only block virtio-blk/scsi? Not all block types?
> 
> It's only necessary for dataplane-enabled devices, which are virtio-blk and
> virtio-scsi at the moment. Other types are OK.

Hmm, it is OK for QMP transactions, but I think it is not OK for the mirror job.

Thanks
Wen Congyang

> 
> 
> Fam
> .
> 

* Re: [Qemu-devel] [PATCH v6 01/13] block: Add op blocker type "device IO"
  2015-05-22  4:54       ` Fam Zheng
@ 2015-05-23 16:51         ` Max Reitz
  2015-05-25  2:15           ` Fam Zheng
  0 siblings, 1 reply; 56+ messages in thread
From: Max Reitz @ 2015-05-23 16:51 UTC (permalink / raw)
  To: Fam Zheng, Wen Congyang
  Cc: Kevin Wolf, qemu-block, jcody, armbru, qemu-devel,
	Stefan Hajnoczi, amit.shah, pbonzini

On 22.05.2015 06:54, Fam Zheng wrote:
> On Thu, 05/21 15:32, Fam Zheng wrote:
>> On Thu, 05/21 15:06, Wen Congyang wrote:
>>> On 05/21/2015 02:42 PM, Fam Zheng wrote:
>>>> It blocks device IO.
>>>>
>>>> All bdrv_op_block_all/blk_op_block_all callers are taken care of:
>>>>
>>>> - virtio_blk_data_plane_create
>>>> - virtio_scsi_hotplug
>>>>
>>>>    Device creation, unblock it.
>>>>
>>>> - bdrv_set_backing_hd
>>>>
>>>>    Backing hd is not used by device, so blocking is OK.
>>>>
>>>> - backup_start
>>>>
>>>>    Blocking target when backup is running, unblock it.
>>> Do you forget it?
>> Oh I think the commit log is wrong: the target image is only written to by
>> block job, there cannot be a device on it, so it's similar to
>> bdrv_set_backing_hd.
> Correction: if it's blockdev-backup, the target could have a device, in that
> sense it should be unblocked like block_job_create(). I'll fix it.

Really? I think it makes sense not to allow I/O on a backup target. At 
least I can't imagine a use case where you'd want to do that... But that 
doesn't necessarily mean anything, of course.

Max

* Re: [Qemu-devel] [PATCH v6 05/13] virtio-blk: Don't handle output when there is "device IO" op blocker
  2015-05-21  6:42 ` [Qemu-devel] [PATCH v6 05/13] virtio-blk: Don't handle output when there is "device IO" op blocker Fam Zheng
@ 2015-05-23 16:53   ` Max Reitz
  0 siblings, 0 replies; 56+ messages in thread
From: Max Reitz @ 2015-05-23 16:53 UTC (permalink / raw)
  To: Fam Zheng, qemu-devel
  Cc: Kevin Wolf, qemu-block, jcody, armbru, Stefan Hajnoczi,
	amit.shah, pbonzini

On 21.05.2015 08:42, Fam Zheng wrote:
> virtio-blk now listens to op blocker change of the associated block
> backend.
>
> Upon setting an op blocker on BLOCK_OP_TYPE_DEVICE_IO:
>
>    non-dataplane:
>     1) Set VirtIOBlock.paused
>     2) In virtio_blk_handle_output, do nothing if VirtIOBlock.paused
>
>    dataplane:
>     1) Clear the host event notifier
>     2) In handle_notify, do nothing if VirtIOBlock.paused
>
> Upon removing the op blocker:
>
>    non-dataplane:
>     1) Clear VirtIOBlock.paused
>     2) Schedule a BH on the AioContext of the backend, which calls
>     virtio_blk_handle_output, so that the previous unhandled kicks can
>     make progress
>
>    dataplane:
>     1) Set the host event notifier
>     2) Notify the host event notifier so that unhandled events are
>     processed
>
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>   hw/block/dataplane/virtio-blk.c | 25 ++++++++++++++++-
>   hw/block/virtio-blk.c           | 59 +++++++++++++++++++++++++++++++++++++++--
>   include/hw/virtio/virtio-blk.h  |  8 +++++-
>   3 files changed, 88 insertions(+), 4 deletions(-)
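The pause/resume scheme described in the commit message can be sketched as a
tiny C model (all names here are illustrative, not QEMU's actual structures):
requests that arrive while the "device IO" blocker is set are deferred, then
replayed once the blocker is removed.

```c
#include <assert.h>
#include <stdbool.h>

enum { MAX_PENDING = 16 };

typedef struct {
    bool paused;            /* set while BLOCK_OP_TYPE_DEVICE_IO is blocked */
    int pending_kicks;      /* guest notifications seen while paused */
    int completed;          /* requests actually handled */
} VirtQueueModel;

/* Mirrors virtio_blk_handle_output(): do nothing while paused, but
 * remember that the guest kicked us. */
static void handle_output(VirtQueueModel *q)
{
    if (q->paused) {
        q->pending_kicks++;
        return;
    }
    q->completed++;
}

/* Mirrors removing the op blocker: clear paused, then replay the
 * deferred kicks so previously unhandled requests make progress. */
static void resume(VirtQueueModel *q)
{
    q->paused = false;
    while (q->pending_kicks > 0) {
        q->pending_kicks--;
        handle_output(q);
    }
}
```

In the real patch the replay happens via a BH (non-dataplane) or by notifying
the host event notifier (dataplane); the model only captures the ordering.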

(this is what happens if you sort your inbox from oldest to newest and 
go from top to bottom)

Reviewed-by: Max Reitz <mreitz@redhat.com>

* Re: [Qemu-devel] [PATCH v6 06/13] virtio-scsi-dataplane: Add "device IO" op blocker listener
  2015-05-21  6:42 ` [Qemu-devel] [PATCH v6 06/13] virtio-scsi-dataplane: Add "device IO" op blocker listener Fam Zheng
@ 2015-05-23 16:53   ` Max Reitz
  0 siblings, 0 replies; 56+ messages in thread
From: Max Reitz @ 2015-05-23 16:53 UTC (permalink / raw)
  To: Fam Zheng, qemu-devel
  Cc: Kevin Wolf, qemu-block, jcody, armbru, Stefan Hajnoczi,
	amit.shah, pbonzini

On 21.05.2015 08:42, Fam Zheng wrote:
> When a disk is attached to scsi-bus, virtio_scsi_hotplug will take care
> of protecting the block device with op blockers. Currently we haven't
> enabled block jobs (like what's done in virtio_blk_data_plane_create),
> but it is necessary to honor the "device IO" op blocker before we do.
> This is useful to make sure that guest IO requests are paused during qmp
> transactions (such as multi-disk snapshot or backup).
>
> A counter is added to the virtio-scsi device, which keeps track of
> currently blocked disks. If it goes from 0 to 1, the ioeventfds are
> disabled; when it goes back to 0, they are re-enabled.
>
> Also, in device initialization, push the enabling of ioeventfds to just
> before returning, so virtio_scsi_clear_aio is not needed there. Rename it,
> pair with an enabling variant, fix one coding style issue, then use it
> in the device pause points.
>
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>   hw/scsi/virtio-scsi-dataplane.c | 82 +++++++++++++++++++++++++++++++----------
>   hw/scsi/virtio-scsi.c           |  3 ++
>   include/hw/virtio/virtio-scsi.h |  3 ++
>   3 files changed, 68 insertions(+), 20 deletions(-)
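The counter described in the commit message can be modeled in a few lines of C
(names are illustrative, not the actual QEMU fields): each disk that gains a
"device IO" blocker bumps the counter, ioeventfds are disabled on the 0 -> 1
transition, and re-enabled on the 1 -> 0 transition.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    int blocked_disks;       /* disks currently under a "device IO" blocker */
    bool ioeventfds_enabled; /* whether guest kicks are being processed */
} VirtIOSCSIModel;

/* Called from the op blocker change notifier for each attached disk. */
static void device_io_blocker_changed(VirtIOSCSIModel *s, bool blocking)
{
    if (blocking) {
        if (s->blocked_disks++ == 0) {
            s->ioeventfds_enabled = false;  /* first blocked disk */
        }
    } else {
        if (--s->blocked_disks == 0) {
            s->ioeventfds_enabled = true;   /* last blocker removed */
        }
    }
}
```

Because the state only toggles on the edge transitions, blocking two disks and
unblocking one leaves the ioeventfds disabled, as the patch intends.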

Reviewed-by: Max Reitz <mreitz@redhat.com>

* Re: [Qemu-devel] [PATCH v6 07/13] nbd-server: Clear "can_read" when "device io" blocker is set
  2015-05-21  6:42 ` [Qemu-devel] [PATCH v6 07/13] nbd-server: Clear "can_read" when "device io" blocker is set Fam Zheng
@ 2015-05-23 16:54   ` Max Reitz
  0 siblings, 0 replies; 56+ messages in thread
From: Max Reitz @ 2015-05-23 16:54 UTC (permalink / raw)
  To: Fam Zheng, qemu-devel
  Cc: Kevin Wolf, qemu-block, jcody, armbru, Stefan Hajnoczi,
	amit.shah, pbonzini

On 21.05.2015 08:42, Fam Zheng wrote:
> So that NBD export cannot submit IO during bdrv_drain_all().
>
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>   nbd.c | 24 ++++++++++++++++++++++++
>   1 file changed, 24 insertions(+)

Reviewed-by: Max Reitz <mreitz@redhat.com>

* Re: [Qemu-devel] [PATCH v6 08/13] blockdev: Block device IO during internal snapshot transaction
  2015-05-21  6:42 ` [Qemu-devel] [PATCH v6 08/13] blockdev: Block device IO during internal snapshot transaction Fam Zheng
@ 2015-05-23 16:56   ` Max Reitz
  0 siblings, 0 replies; 56+ messages in thread
From: Max Reitz @ 2015-05-23 16:56 UTC (permalink / raw)
  To: Fam Zheng, qemu-devel
  Cc: Kevin Wolf, qemu-block, jcody, armbru, Stefan Hajnoczi,
	amit.shah, pbonzini

On 21.05.2015 08:42, Fam Zheng wrote:
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>   blockdev.c | 12 +++++++++---
>   1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/blockdev.c b/blockdev.c
> index 5eaf77e..7f763d9 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -1262,6 +1262,7 @@ typedef struct InternalSnapshotState {
>       BlockDriverState *bs;
>       AioContext *aio_context;
>       QEMUSnapshotInfo sn;
> +    Error *blocker;
>   } InternalSnapshotState;
>   
>   static void internal_snapshot_prepare(BlkTransactionState *common,
> @@ -1300,6 +1301,10 @@ static void internal_snapshot_prepare(BlkTransactionState *common,
>       state->aio_context = bdrv_get_aio_context(bs);
>       aio_context_acquire(state->aio_context);
>   
> +    state->bs = bs;
> +    error_setg(&state->blocker, "internal snapshot in progress");
> +    bdrv_op_block(bs, BLOCK_OP_TYPE_DEVICE_IO, state->blocker);
> +
>       if (!bdrv_is_inserted(bs)) {
>           error_set(errp, QERR_DEVICE_HAS_NO_MEDIUM, device);
>           return;
> @@ -1354,9 +1359,6 @@ static void internal_snapshot_prepare(BlkTransactionState *common,
>                            name, device);
>           return;
>       }
> -
> -    /* 4. succeed, mark a snapshot is created */
> -    state->bs = bs;
>   }

Unchanged from v5, so the same issue: Pulling this up means that 
internal_snapshot_abort() may attempt to delete the snapshot even if it 
hasn't been (successfully) created. This will always fail because 
sn->id_str is empty, though.

Max

>   static void internal_snapshot_abort(BlkTransactionState *common)
> @@ -1387,6 +1389,10 @@ static void internal_snapshot_clean(BlkTransactionState *common)
>       InternalSnapshotState *state = DO_UPCAST(InternalSnapshotState,
>                                                common, common);
>   
> +    if (state->bs) {
> +        bdrv_op_unblock(state->bs, BLOCK_OP_TYPE_DEVICE_IO, state->blocker);
> +        error_free(state->blocker);
> +    }
>       if (state->aio_context) {
>           aio_context_release(state->aio_context);
>       }

* Re: [Qemu-devel] [PATCH v6 09/13] blockdev: Block device IO during external snapshot transaction
  2015-05-21  6:42 ` [Qemu-devel] [PATCH v6 09/13] blockdev: Block device IO during external " Fam Zheng
@ 2015-05-23 16:58   ` Max Reitz
  0 siblings, 0 replies; 56+ messages in thread
From: Max Reitz @ 2015-05-23 16:58 UTC (permalink / raw)
  To: Fam Zheng, qemu-devel
  Cc: Kevin Wolf, qemu-block, jcody, armbru, Stefan Hajnoczi,
	amit.shah, pbonzini

On 21.05.2015 08:42, Fam Zheng wrote:
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>   blockdev.c | 18 ++++++++++++++++--
>   1 file changed, 16 insertions(+), 2 deletions(-)

Unchanged from v5, so the same question: Would it suffice to block I/O 
only inside external_snapshot_commit()?

> diff --git a/blockdev.c b/blockdev.c
> index 7f763d9..923fc90 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -1404,6 +1404,7 @@ typedef struct ExternalSnapshotState {
>       BlockDriverState *old_bs;
>       BlockDriverState *new_bs;
>       AioContext *aio_context;
> +    Error *blocker;
>   } ExternalSnapshotState;
>   
>   static void external_snapshot_prepare(BlkTransactionState *common,
> @@ -1525,6 +1526,9 @@ static void external_snapshot_prepare(BlkTransactionState *common,
>       if (ret != 0) {
>           error_propagate(errp, local_err);
>       }
> +
> +    error_setg(&state->blocker, "snapshot in progress");
> +    bdrv_op_block(state->old_bs, BLOCK_OP_TYPE_DEVICE_IO, state->blocker);
>   }
>   
>   static void external_snapshot_commit(BlkTransactionState *common)
> @@ -1541,8 +1545,6 @@ static void external_snapshot_commit(BlkTransactionState *common)
>        * don't want to abort all of them if one of them fails the reopen */
>       bdrv_reopen(state->new_bs, state->new_bs->open_flags & ~BDRV_O_RDWR,
>                   NULL);
> -
> -    aio_context_release(state->aio_context);
>   }
>   
>   static void external_snapshot_abort(BlkTransactionState *common)
> @@ -1552,6 +1554,17 @@ static void external_snapshot_abort(BlkTransactionState *common)
>       if (state->new_bs) {
>           bdrv_unref(state->new_bs);
>       }
> +}
> +
> +static void external_snapshot_clean(BlkTransactionState *common)
> +{
> +    ExternalSnapshotState *state =
> +                             DO_UPCAST(ExternalSnapshotState, common, common);
> +
> +    if (state->old_bs) {

I'd prefer state->blocker, because this block only makes sense if it's 
non-NULL, and if it is non-NULL, state->old_bs will always be non-NULL, too.

Max

> +        bdrv_op_unblock(state->old_bs, BLOCK_OP_TYPE_DEVICE_IO, state->blocker);
> +        error_free(state->blocker);
> +    }
>       if (state->aio_context) {
>           aio_context_release(state->aio_context);
>       }
> @@ -1716,6 +1729,7 @@ static const BdrvActionOps actions[] = {
>           .prepare  = external_snapshot_prepare,
>           .commit   = external_snapshot_commit,
>           .abort = external_snapshot_abort,
> +        .clean = external_snapshot_clean,
>       },
>       [TRANSACTION_ACTION_KIND_DRIVE_BACKUP] = {
>           .instance_size = sizeof(DriveBackupState),

* Re: [Qemu-devel] [PATCH v6 10/13] blockdev: Block device IO during drive-backup transaction
  2015-05-21  6:43 ` [Qemu-devel] [PATCH v6 10/13] blockdev: Block device IO during drive-backup transaction Fam Zheng
@ 2015-05-23 16:59   ` Max Reitz
  0 siblings, 0 replies; 56+ messages in thread
From: Max Reitz @ 2015-05-23 16:59 UTC (permalink / raw)
  To: Fam Zheng, qemu-devel
  Cc: Kevin Wolf, qemu-block, jcody, armbru, Stefan Hajnoczi,
	amit.shah, pbonzini

On 21.05.2015 08:43, Fam Zheng wrote:
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>   blockdev.c | 9 ++++++++-
>   1 file changed, 8 insertions(+), 1 deletion(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>

(for a richer review-reading experience, see reply to v5)

* Re: [Qemu-devel] [PATCH v6 11/13] blockdev: Block device IO during blockdev-backup transaction
  2015-05-21  6:43 ` [Qemu-devel] [PATCH v6 11/13] blockdev: Block device IO during blockdev-backup transaction Fam Zheng
@ 2015-05-23 17:05   ` Max Reitz
  0 siblings, 0 replies; 56+ messages in thread
From: Max Reitz @ 2015-05-23 17:05 UTC (permalink / raw)
  To: Fam Zheng, qemu-devel
  Cc: Kevin Wolf, qemu-block, jcody, armbru, Stefan Hajnoczi,
	amit.shah, pbonzini

On 21.05.2015 08:43, Fam Zheng wrote:
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>   blockdev.c | 10 +++++++++-
>   1 file changed, 9 insertions(+), 1 deletion(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>

* Re: [Qemu-devel] [PATCH v6 12/13] block: Block "device IO" during bdrv_drain and bdrv_drain_all
  2015-05-21  6:43 ` [Qemu-devel] [PATCH v6 12/13] block: Block "device IO" during bdrv_drain and bdrv_drain_all Fam Zheng
@ 2015-05-23 17:11   ` Max Reitz
  2015-05-25  2:48     ` Fam Zheng
  0 siblings, 1 reply; 56+ messages in thread
From: Max Reitz @ 2015-05-23 17:11 UTC (permalink / raw)
  To: Fam Zheng, qemu-devel
  Cc: Kevin Wolf, qemu-block, jcody, armbru, Stefan Hajnoczi,
	amit.shah, pbonzini

On 21.05.2015 08:43, Fam Zheng wrote:
> We don't want new requests from guest, so block the operation around the
> nested poll.
>
> It also avoids looping forever when iothread is submitting a lot of requests.
>
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>   block/io.c | 22 ++++++++++++++++++++--
>   1 file changed, 20 insertions(+), 2 deletions(-)

Hm, I don't know about this. When I see someone calling 
bdrv_drain()/bdrv_drain_all(), I'm expecting that every request has been 
drained afterwards. This patch implies that this is not necessarily the 
case, because apparently in some configurations the guest can still 
submit I/O even while bdrv_drain() is running, but this means that even 
after this patch, the same can happen if I/O is submitted after 
bdrv_op_unblock() and before anything the caller of bdrv_drain() wants 
to do while the BDS is still drained. So this looks to me more like the 
caller must ensure that the BDS won't receive new requests, and do so 
before bdrv_drain() is called.

Maybe it works anyway because I'm once again just confused by qemu's 
threading model, and the problem here is only that bdrv_drain_one() may
yield, which may result in new I/O requests being submitted. If, apart
from yielding, no new I/O can be submitted, I guess this patch is good.

Max

> diff --git a/block/io.c b/block/io.c
> index 1ce62c4..b23a83f 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -286,12 +286,21 @@ static bool bdrv_drain_one(BlockDriverState *bs)
>    *
>    * Note that unlike bdrv_drain_all(), the caller must hold the BlockDriverState
>    * AioContext.
> + *
> + * Devices are paused to avoid looping forever because otherwise they could
> + * keep submitting more requests.
>    */
>   void bdrv_drain(BlockDriverState *bs)
>   {
> +    Error *blocker = NULL;
> +
> +    error_setg(&blocker, "bdrv_drain in progress");
> +    bdrv_op_block(bs, BLOCK_OP_TYPE_DEVICE_IO, blocker);
>       while (bdrv_drain_one(bs)) {
>           /* Keep iterating */
>       }
> +    bdrv_op_unblock(bs, BLOCK_OP_TYPE_DEVICE_IO, blocker);
> +    error_free(blocker);
>   }
>   
>   /*
> @@ -303,14 +312,20 @@ void bdrv_drain(BlockDriverState *bs)
>    * Note that completion of an asynchronous I/O operation can trigger any
>    * number of other I/O operations on other devices---for example a coroutine
>    * can be arbitrarily complex and a constant flow of I/O can come until the
> - * coroutine is complete.  Because of this, it is not possible to have a
> - * function to drain a single device's I/O queue.
> + * coroutine is complete.  Because of this, we must call bdrv_drain_one in a
> + * loop.
> + *
> + * We explicitly pause block jobs and devices to prevent them from submitting
> + * more requests.
>    */
>   void bdrv_drain_all(void)
>   {
>       /* Always run first iteration so any pending completion BHs run */
>       bool busy = true;
>       BlockDriverState *bs = NULL;
> +    Error *blocker = NULL;
> +
> +    error_setg(&blocker, "bdrv_drain_all in progress");
>   
>       while ((bs = bdrv_next(bs))) {
>           AioContext *aio_context = bdrv_get_aio_context(bs);
> @@ -319,6 +334,7 @@ void bdrv_drain_all(void)
>           if (bs->job) {
>               block_job_pause(bs->job);
>           }
> +        bdrv_op_block(bs, BLOCK_OP_TYPE_DEVICE_IO, blocker);
>           aio_context_release(aio_context);
>       }
>   
> @@ -343,8 +359,10 @@ void bdrv_drain_all(void)
>           if (bs->job) {
>               block_job_resume(bs->job);
>           }
> +        bdrv_op_unblock(bs, BLOCK_OP_TYPE_DEVICE_IO, blocker);
>           aio_context_release(aio_context);
>       }
> +    error_free(blocker);
>   }
>   
>   /**

* Re: [Qemu-devel] [PATCH v6 13/13] block/mirror: Block "device IO" during mirror exit
  2015-05-21  6:43 ` [Qemu-devel] [PATCH v6 13/13] block/mirror: Block "device IO" during mirror exit Fam Zheng
@ 2015-05-23 17:21   ` Max Reitz
  0 siblings, 0 replies; 56+ messages in thread
From: Max Reitz @ 2015-05-23 17:21 UTC (permalink / raw)
  To: Fam Zheng, qemu-devel
  Cc: Kevin Wolf, qemu-block, jcody, armbru, Stefan Hajnoczi,
	amit.shah, pbonzini

On 21.05.2015 08:43, Fam Zheng wrote:
> When mirror should complete, the source and target are in sync.  But we
> call bdrv_swap() only a while later in the main loop bh. If the guest
> writes something before that, target will not get the new data.
>
> Block "device IO" before bdrv_drain and unblock it after bdrv_swap().
>
> Reported-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>   block/mirror.c | 9 ++++++++-
>   1 file changed, 8 insertions(+), 1 deletion(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>

* Re: [Qemu-devel] [PATCH v6 01/13] block: Add op blocker type "device IO"
  2015-05-23 16:51         ` Max Reitz
@ 2015-05-25  2:15           ` Fam Zheng
  0 siblings, 0 replies; 56+ messages in thread
From: Fam Zheng @ 2015-05-25  2:15 UTC (permalink / raw)
  To: Max Reitz
  Cc: Kevin Wolf, qemu-block, jcody, qemu-devel, armbru,
	Stefan Hajnoczi, amit.shah, pbonzini

On Sat, 05/23 18:51, Max Reitz wrote:
> On 22.05.2015 06:54, Fam Zheng wrote:
> >On Thu, 05/21 15:32, Fam Zheng wrote:
> >>On Thu, 05/21 15:06, Wen Congyang wrote:
> >>>On 05/21/2015 02:42 PM, Fam Zheng wrote:
> >>>>It blocks device IO.
> >>>>
> >>>>All bdrv_op_block_all/blk_op_block_all callers are taken care of:
> >>>>
> >>>>- virtio_blk_data_plane_create
> >>>>- virtio_scsi_hotplug
> >>>>
> >>>>   Device creation, unblock it.
> >>>>
> >>>>- bdrv_set_backing_hd
> >>>>
> >>>>   Backing hd is not used by device, so blocking is OK.
> >>>>
> >>>>- backup_start
> >>>>
> >>>>   Blocking target when backup is running, unblock it.
> >>>Do you forget it?
> >>Oh I think the commit log is wrong: the target image is only written to by
> >>block job, there cannot be a device on it, so it's similar to
> >>bdrv_set_backing_hd.
> >Correction: if it's blockdev-backup, the target could have a device, in that
> >sense it should be unblocked like block_job_create(). I'll fix it.
> 
> Really? I think it makes sense not to allow I/O on a backup target. At least
> I can't imagine a use case where you'd want to do that... But that doesn't
> necessarily mean anything, of course.

Sure, nobody other than the backup job itself should write to the backup target,
but it's valid to read it - image fleecing aims to export it through an NBD
server. If you attach it back to a guest, that is just as valid a scenario, isn't it?

So at least for image fleecing, we need to either 1) split the device IO blocker
into read and write and only block writes on the target, or 2) not add a device
IO blocker at all and expect the disk's end user to take the responsibility. And
I'm afraid 1) will be too complicated.

Fam

* Re: [Qemu-devel] [PATCH v6 12/13] block: Block "device IO" during bdrv_drain and bdrv_drain_all
  2015-05-23 17:11   ` Max Reitz
@ 2015-05-25  2:48     ` Fam Zheng
  2015-05-26 14:21       ` Max Reitz
  0 siblings, 1 reply; 56+ messages in thread
From: Fam Zheng @ 2015-05-25  2:48 UTC (permalink / raw)
  To: Max Reitz
  Cc: Kevin Wolf, qemu-block, jcody, qemu-devel, armbru,
	Stefan Hajnoczi, amit.shah, pbonzini

On Sat, 05/23 19:11, Max Reitz wrote:
> On 21.05.2015 08:43, Fam Zheng wrote:
> >We don't want new requests from guest, so block the operation around the
> >nested poll.
> >
> >It also avoids looping forever when iothread is submitting a lot of requests.
> >
> >Signed-off-by: Fam Zheng <famz@redhat.com>
> >---
> >  block/io.c | 22 ++++++++++++++++++++--
> >  1 file changed, 20 insertions(+), 2 deletions(-)
> 
> Hm, I don't know about this. When I see someone calling
> bdrv_drain()/bdrv_drain_all(), I'm expecting that every request has been
> drained afterwards. This patch implies that this is not necessarily the
> case, because apparently in some configurations the guest can still submit
> I/O even while bdrv_drain() is running,

In dataplane, aio_poll in bdrv_drain_all will poll the ioeventfd, which could
call the handlers of virtio queues. That's how guest I/O sneaks in.


> but this means that even after this
> patch, the same can happen if I/O is submitted after bdrv_op_unblock() and
> before anything the caller of bdrv_drain() wants to do while the BDS is
> still drained. So this looks to me more like the caller must ensure that the
> BDS won't receive new requests, and do so before bdrv_drain() is called.

Yes, callers of bdrv_drain*() should use a blocker, as you reasoned. Other
patches in this series looked at qmp_transaction, but there are more callers,
which may still be wrong until they're fixed.

This patch, however, fixes one of the potential issues of those callers:

> >It also avoids looping forever when iothread is submitting a lot of
> >requests.

Fam

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v6 12/13] block: Block "device IO" during bdrv_drain and bdrv_drain_all
  2015-05-25  2:48     ` Fam Zheng
@ 2015-05-26 14:21       ` Max Reitz
  0 siblings, 0 replies; 56+ messages in thread
From: Max Reitz @ 2015-05-26 14:21 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Kevin Wolf, qemu-block, jcody, qemu-devel, armbru,
	Stefan Hajnoczi, amit.shah, pbonzini

On 25.05.2015 04:48, Fam Zheng wrote:
> On Sat, 05/23 19:11, Max Reitz wrote:
>> On 21.05.2015 08:43, Fam Zheng wrote:
>>> We don't want new requests from guest, so block the operation around the
>>> nested poll.
>>>
>>> It also avoids looping forever when iothread is submitting a lot of requests.
>>>
>>> Signed-off-by: Fam Zheng <famz@redhat.com>
>>> ---
>>>   block/io.c | 22 ++++++++++++++++++++--
>>>   1 file changed, 20 insertions(+), 2 deletions(-)
>> Hm, I don't know about this. When I see someone calling
>> bdrv_drain()/bdrv_drain_all(), I'm expecting that every request has been
>> drained afterwards. This patch implies that this is not necessarily the
>> case, because apparently in some configurations the guest can still submit
>> I/O even while bdrv_drain() is running,
> In dataplane, aio_poll in bdrv_drain_all will poll the ioeventfd, which could
> call the handlers of virtio queues. That's how guest I/O sneaks in.
>
>
>> but this means that even after this
>> patch, the same can happen if I/O is submitted after bdrv_op_unblock() and
>> before anything the caller of bdrv_drain() wants to do while the BDS is
>> still drained. So this looks to me more like the caller must ensure that the
>> BDS won't receive new requests, and do so before bdrv_drain() is called.
> Yes, callers of bdrv_drain*() should use a blocker like you reasoned. Other
> patches in this series looked at qmp_transaction, but there are more, which may
> still be wrong until they're fixed.

Well, I can't say that I much like fixes which are known (or at least 
suspected) not to be complete, but well... It's always better 
to fix something than nothing at all. The issue is always not forgetting 
about the potential problem.

Reviewed-by: Max Reitz <mreitz@redhat.com>

> This patch, however, fixes one of the potential issues of those callers:
>
>>> It also avoids looping forever when iothread is submitting a lot of
>>> requests.
> Fam

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v6 01/13] block: Add op blocker type "device IO"
  2015-05-21  6:42 ` [Qemu-devel] [PATCH v6 01/13] block: Add op blocker type "device IO" Fam Zheng
  2015-05-21  7:06   ` Wen Congyang
  2015-05-21  8:00   ` Wen Congyang
@ 2015-05-26 14:22   ` Kevin Wolf
  2015-05-26 14:24     ` Max Reitz
  2 siblings, 1 reply; 56+ messages in thread
From: Kevin Wolf @ 2015-05-26 14:22 UTC (permalink / raw)
  To: Fam Zheng
  Cc: qemu-block, armbru, jcody, qemu-devel, mreitz, Stefan Hajnoczi,
	amit.shah, pbonzini

Am 21.05.2015 um 08:42 hat Fam Zheng geschrieben:
> It blocks device IO.

What I'm missing here is a description of the case that it protects
against. Blocking device I/O doesn't sound like a valid thing to want
per se, but it might be true that we need it as long as we don't have
the "real" op blockers that Jeff is working on. However, as long as you
don't say why you want that, I can't say whether it's true or not.

And of course, it doesn't block anything yet after this patch, it's just
set or cleared without any effect. In fact, it seems that even at the
end of the series, there is no bdrv_op_is_blocked() call that checks for
BLOCK_OP_TYPE_DEVICE_IO. Is this intentional?
> 
> All bdrv_op_block_all/blk_op_block_all callers are taken care of:
> 
> - virtio_blk_data_plane_create
> - virtio_scsi_hotplug
> 
>   Device creation, unblock it.
> 
> - bdrv_set_backing_hd
> 
>   Backing hd is not used by device, so blocking is OK.

If the backing file becomes the active layer after committing,
bdrv_swap() will keep the blockers of the overlay, so that the image
doesn't have the device I/O blocker any more. Correct?

> - backup_start
> 
>   Blocking target when backup is running, unblock it.
> 
> - mirror_complete
> 
>   Blocking s->to_replace until mirror_exit, OK.
> 
> - block_job_complete
> 
>   The block job may be long running. Unblock it.
> 
> - init_blk_migration
> 
>   The block migration may be long running, Unblock it.
> 
> Signed-off-by: Fam Zheng <famz@redhat.com>

Kevin

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v6 01/13] block: Add op blocker type "device IO"
  2015-05-26 14:22   ` Kevin Wolf
@ 2015-05-26 14:24     ` Max Reitz
  2015-05-27  9:07       ` Kevin Wolf
  0 siblings, 1 reply; 56+ messages in thread
From: Max Reitz @ 2015-05-26 14:24 UTC (permalink / raw)
  To: Kevin Wolf, Fam Zheng
  Cc: qemu-block, jcody, qemu-devel, armbru, Stefan Hajnoczi,
	amit.shah, pbonzini

On 26.05.2015 16:22, Kevin Wolf wrote:
> Am 21.05.2015 um 08:42 hat Fam Zheng geschrieben:
>> It blocks device IO.
> What I'm missing here is a description of the case that it protects
> against. Blocking device I/O doesn't sound like a valid thing to want
> per se, but it might be true that we need it as long as we don't have
> the "real" op blockers that Jeff is working on. However, as long as you
> don't say why you want that, I can't say whether it's true or not.
>
> And of course, it doesn't block anything yet after this patch, it's just
> set or cleared without any effect. In fact, it seems that even at the
> end of the series, there is no bdrv_op_is_blocked() call that checks for
> BLOCK_OP_TYPE_DEVICE_IO. Is this intentional?

Probably yes, because patches 2 and 3 introduce a notifier for when op 
blockers are set up and removed. This is used by virtio-blk and 
virtio-scsi to pause and unpause device I/O, respectively.

Max

>> All bdrv_op_block_all/blk_op_block_all callers are taken care of:
>>
>> - virtio_blk_data_plane_create
>> - virtio_scsi_hotplug
>>
>>    Device creation, unblock it.
>>
>> - bdrv_set_backing_hd
>>
>>    Backing hd is not used by device, so blocking is OK.
> If the backing file becomes the active layer after committing,
> bdrv_swap() will keep the blockers of the overlay, so that the image
> doesn't have the device I/O blocker any more. Correct?
>
>> - backup_start
>>
>>    Blocking target when backup is running, unblock it.
>>
>> - mirror_complete
>>
>>    Blocking s->to_replace until mirror_exit, OK.
>>
>> - block_job_complete
>>
>>    The block job may be long running. Unblock it.
>>
>> - init_blk_migration
>>
>>    The block migration may be long running, Unblock it.
>>
>> Signed-off-by: Fam Zheng <famz@redhat.com>
> Kevin

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v6 01/13] block: Add op blocker type "device IO"
  2015-05-26 14:24     ` Max Reitz
@ 2015-05-27  9:07       ` Kevin Wolf
  2015-05-27  9:50         ` Paolo Bonzini
  0 siblings, 1 reply; 56+ messages in thread
From: Kevin Wolf @ 2015-05-27  9:07 UTC (permalink / raw)
  To: Max Reitz
  Cc: Fam Zheng, qemu-block, jcody, qemu-devel, armbru,
	Stefan Hajnoczi, amit.shah, pbonzini

Am 26.05.2015 um 16:24 hat Max Reitz geschrieben:
> On 26.05.2015 16:22, Kevin Wolf wrote:
> >Am 21.05.2015 um 08:42 hat Fam Zheng geschrieben:
> >>It blocks device IO.
> >What I'm missing here is a description of the case that it protects
> >against. Blocking device I/O doesn't sound like a valid thing to want
> >per se, but it might be true that we need it as long as we don't have
> >the "real" op blockers that Jeff is working on. However, as long as you
> >don't say why you want that, I can't say whether it's true or not.
> >
> >And of course, it doesn't block anything yet after this patch, it's just
> >set or cleared without any effect. In fact, it seems that even at the
> >end of the series, there is no bdrv_op_is_blocked() call that checks for
> >BLOCK_OP_TYPE_DEVICE_IO. Is this intentional?
> 
> Probably yes, because patches 2 and 3 introduce a notifier for when
> op blockers are set up and removed. This is used by virtio-blk and
> virtio-scsi to pause and unpause device I/O, respectively.

Indeed, I missed that.

Having thought a bit more about this, I think we need to carefully check
whether op blockers are really the right tool here; and if they are, we
need to document the reason in the commit message at least.

The op blocker model as used until now has a few characteristics:

* Old model: You have a period of time between blocking and unblocking
  during which starting the corresponding operation is prohibited. The
  operation checks whether the blocker is set when it starts, and it
  fails if the operation is blocked.

  New model that Jeff is working on: You still have block/unblock (for a
  category rather than a specific operation, though), but you also have
  a start/stop pair for using the functionality. If you start an
  operation that is blocked, it still errors out. However, blocking an
  operation that is already started fails as well now.

  This series: An already running operation is interrupted when the
  blocker is set, and it continues when it is removed. You had to
  introduce new infrastructure (notifiers) because this doesn't match
  the op blockers model.

* Op blockers used to be set to protect high-level, long-running
  background operations, that are usually initiated by the user.
  bdrv_drain() is low-level and a blocking foreground operation.

* Op blockers came into effect only after monitor interactions and used
  to be on a slow path. We're now in the I/O path. It's not a problem
  with the current code per se (as you introduced notifiers, so we don't
  have to iterate the list all the time), but it broadens the scope of
  op blockers significantly and we shouldn't be doing that carelessly.

* Op blockers have an error message. It's unused for the new case.

This is the first part of what's troubling me with this series, as it
makes me doubtful if op blockers are the right tool to implement what
the commit message says (block device I/O). This is "are we doing the
thing right?"

The second part should actually come first, though: "Are we doing the
right thing?" I'm also doubtful whether blocking device I/O is even what
we should do.

Why is device I/O special compared to block job I/O or NBD server I/O?
If the answer is "because block jobs are already paused while draining"
(and probably nobody thought much about the NBD server), then chances
are that it's not the right thing.  In fact, using two different
mechanisms for pausing block jobs and pausing device I/O seems
inconsistent and wrong.

I suspect that the real solution needs to be in the block layer, though
I'm not quite sure yet what it would be like. Perhaps a function pair
like blk_stop/resume_io() that is used by bdrv_drain() callers and works
on the BlockBackend level. (The BDS level is problematic because we
don't want to stop all I/O; many callers just want to drain I/O of
_other_ callers so they can do their own I/O without having to care
about concurrency).

Kevin

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v6 01/13] block: Add op blocker type "device IO"
  2015-05-27  9:07       ` Kevin Wolf
@ 2015-05-27  9:50         ` Paolo Bonzini
  2015-05-27 10:10           ` Kevin Wolf
  0 siblings, 1 reply; 56+ messages in thread
From: Paolo Bonzini @ 2015-05-27  9:50 UTC (permalink / raw)
  To: Kevin Wolf, Max Reitz
  Cc: Fam Zheng, qemu-block, jcody, qemu-devel, armbru,
	Stefan Hajnoczi, amit.shah



On 27/05/2015 11:07, Kevin Wolf wrote:
> This is the first part of what's troubling me with this series, as it
> makes me doubtful if op blockers are the right tool to implement what
> the commit message says (block device I/O). This is "are we doing the
> thing right?"
> 
> The second part should actually come first, though: "Are we doing the
> right thing?" I'm also doubtful whether blocking device I/O is even what
> we should do.
> 
> Why is device I/O special compared to block job I/O or NBD server I/O?

Because block job I/O doesn't modify the source disk.  For the target
disk, block jobs should definitely treat themselves as device I/O and
register notifiers that pause themselves on bdrv_drain.

> If the answer is "because block jobs are already paused while draining"
> (and probably nobody thought much about the NBD server), then chances
> are that it's not the right thing.  In fact, using two different
> mechanisms for pausing block jobs and pausing device I/O seems
> inconsistent and wrong.
> 
> I suspect that the real solution needs to be in the block layer, though
> I'm not quite sure yet what it would be like. Perhaps a function pair
> like blk_stop/resume_io() that is used by bdrv_drain() callers and works
> on the BlockBackend level.

This is suspiciously similar to the first idea that I and Stefan had,
which was a blk_pause/blk_resume API, implemented through a notifier list.

Paolo

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v6 01/13] block: Add op blocker type "device IO"
  2015-05-27  9:50         ` Paolo Bonzini
@ 2015-05-27 10:10           ` Kevin Wolf
  2015-05-27 10:43             ` Paolo Bonzini
  0 siblings, 1 reply; 56+ messages in thread
From: Kevin Wolf @ 2015-05-27 10:10 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Fam Zheng, qemu-block, armbru, jcody, qemu-devel, Max Reitz,
	Stefan Hajnoczi, amit.shah

Am 27.05.2015 um 11:50 hat Paolo Bonzini geschrieben:
> 
> 
> On 27/05/2015 11:07, Kevin Wolf wrote:
> > This is the first part of what's troubling me with this series, as it
> > makes me doubtful if op blockers are the right tool to implement what
> > the commit message says (block device I/O). This is "are we doing the
> > thing right?"
> > 
> > The second part should actually come first, though: "Are we doing the
> > right thing?" I'm also doubtful whether blocking device I/O is even what
> > we should do.
> > 
> > Why is device I/O special compared to block job I/O or NBD server I/O?
> 
> Because block job I/O doesn't modify the source disk.  For the target
> disk, block jobs should definitely treat themselves as device I/O and
> register notifiers that pause themselves on bdrv_drain.

Block jobs don't generally have a source and a target; in fact, the
design didn't even consider that such jobs exist (otherwise, we wouldn't
have bs->job).

There are some jobs that use a BDS read-only (source for backup, mirror,
commit) just like there are guest devices that use the BDS read-only
(CD-ROMs). And others write to it (streaming, hard disks).

This doesn't seem to be the crucial difference between both.

> > If the answer is "because block jobs are already paused while draining"
> > (and probably nobody thought much about the NBD server), then chances
> > are that it's not the right thing.  In fact, using two different
> > mechanisms for pausing block jobs and pausing device I/O seems
> > inconsistent and wrong.
> > 
> > I suspect that the real solution needs to be in the block layer, though
> > I'm not quite sure yet what it would be like. Perhaps a function pair
> > like blk_stop/resume_io() that is used by bdrv_drain() callers and works
> > on the BlockBackend level.
> 
> This is suspiciously similar to the first idea that I and Stefan had,
> which was a blk_pause/blk_resume API, implemented through a notifier list.

Any problems with it because of which Fam decided against it?

Kevin

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v6 01/13] block: Add op blocker type "device IO"
  2015-05-27 10:10           ` Kevin Wolf
@ 2015-05-27 10:43             ` Paolo Bonzini
  2015-05-28  2:49               ` Fam Zheng
  0 siblings, 1 reply; 56+ messages in thread
From: Paolo Bonzini @ 2015-05-27 10:43 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Fam Zheng, qemu-block, armbru, jcody, qemu-devel, Max Reitz,
	Stefan Hajnoczi, amit.shah



On 27/05/2015 12:10, Kevin Wolf wrote:
> Am 27.05.2015 um 11:50 hat Paolo Bonzini geschrieben:
>>
>>
>> On 27/05/2015 11:07, Kevin Wolf wrote:
>>> This is the first part of what's troubling me with this series, as it
>>> makes me doubtful if op blockers are the right tool to implement what
>>> the commit message says (block device I/O). This is "are we doing the
>>> thing right?"
>>>
>>> The second part should actually come first, though: "Are we doing the
>>> right thing?" I'm also doubtful whether blocking device I/O is even what
>>> we should do.
>>>
>>> Why is device I/O special compared to block job I/O or NBD server I/O?
>>
>> Because block job I/O doesn't modify the source disk.  For the target
>> disk, block jobs should definitely treat themselves as device I/O and
>> register notifiers that pause themselves on bdrv_drain.
> 
> Block jobs don't generally have a source and a target; in fact, the
> design didn't even consider that such jobs exist (otherwise, we wouldn't
> have bs->job).

Mirror and backup have targets.

> There are some jobs that use a BDS read-only (source for backup, mirror,
> commit) just like there are guest devices that use the BDS read-only
> (CD-ROMs). And others write to it (streaming, hard disks).

Streaming doesn't write to the BDS.  It reads it while copy-on-read is
active.  It's subtle, but it means that streaming can proceed without
changing the contents of the disk.  As far as safety of atomic
transactions is concerned, streaming need not be paused.

However, you're right that there is no particular reason why a more
generic pause/resume wouldn't pause jobs as well.  It wouldn't be a
problem either way as far as the job is concerned, and a better-defined
API is a better API.

>> This is suspiciously similar to the first idea that I and Stefan had,
>> which was a blk_pause/blk_resume API, implemented through a notifier list.
> 
> Any problems with it because of which Fam decided against it?

I don't know, but I don't think so.

Paolo

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v6 01/13] block: Add op blocker type "device IO"
  2015-05-27 10:43             ` Paolo Bonzini
@ 2015-05-28  2:49               ` Fam Zheng
  2015-05-28  8:23                 ` Paolo Bonzini
  2015-05-28  9:40                 ` Kevin Wolf
  0 siblings, 2 replies; 56+ messages in thread
From: Fam Zheng @ 2015-05-28  2:49 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kevin Wolf, qemu-block, qemu-devel, jcody, armbru, Max Reitz,
	Stefan Hajnoczi, amit.shah

On Wed, 05/27 12:43, Paolo Bonzini wrote:
> 
> 
> On 27/05/2015 12:10, Kevin Wolf wrote:
> > Am 27.05.2015 um 11:50 hat Paolo Bonzini geschrieben:
> >>
> >>
> >> On 27/05/2015 11:07, Kevin Wolf wrote:
> >>> This is the first part of what's troubling me with this series, as it
> >>> makes me doubtful if op blockers are the right tool to implement what
> >>> the commit message says (block device I/O). This is "are we doing the
> >>> thing right?"
> >>>
> >>> The second part should actually come first, though: "Are we doing the
> >>> right thing?" I'm also doubtful whether blocking device I/O is even what
> >>> we should do.
> >>>
> >>> Why is device I/O special compared to block job I/O or NBD server I/O?
> >>
> >> Because block job I/O doesn't modify the source disk.  For the target
> >> disk, block jobs should definitely treat themselves as device I/O and
> >> register notifiers that pause themselves on bdrv_drain.
> > 
> > Block jobs don't generally have a source and a target; in fact, the
> > design didn't even consider that such jobs exist (otherwise, we wouldn't
> > have bs->job).
> 
> Mirror and backup have targets.
> 
> > There are some jobs that use a BDS read-only (source for backup, mirror,
> > commit) just like there are guest devices that use the BDS read-only
> > (CD-ROMs). And others write to it (streaming, hard disks).
> 
> Streaming doesn't write to the BDS.  It reads it while copy-on-read is
> active.  It's subtle, but it means that streaming can proceed without
> changing the contents of the disk.  As far as safety of atomic
> transactions is concerned, streaming need not be paused.
> 
> However, you're right that there is no particular reason why a more
> generic pause/resume wouldn't pause jobs as well.  It wouldn't be a
> problem either way as far as the job is concerned, and a better-defined
> API is a better API.
> 
> >> This is suspiciously similar to the first idea that I and Stefan had,
> >> which was a blk_pause/blk_resume API, implemented through a notifier list.
> > 
> > Any problems with it because of which Fam decided against it?

I am not against it.

Initially I started with writing blk_pause/resume, and soon I thought it would
be useful if blk_aio_read and friends could check the pause state before
issuing IO:

https://lists.gnu.org/archive/html/qemu-devel/2015-05/msg02608.html

So I reused op blockers. But the idea was later abandoned, so it's now very
similar to the blk_pause/resume idea.

BTW, I remember we once proposed the new op blocker model should have a
category for I/O, no?

On second thought, although in this series we only need to modify
virtio-{scsi,blk}, Wen pointed out that the mirror job also wants this during
completion (because bdrv_swap is in a BH after the job coroutine returns).  So,
in order for the solution to be general and complete, this notifier list
approach relies on the listening devices to do the right thing, which requires
modifying all of them, and is harder to maintain.

Fam

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v6 01/13] block: Add op blocker type "device IO"
  2015-05-28  2:49               ` Fam Zheng
@ 2015-05-28  8:23                 ` Paolo Bonzini
  2015-05-28 10:46                   ` Fam Zheng
  2015-05-28  9:40                 ` Kevin Wolf
  1 sibling, 1 reply; 56+ messages in thread
From: Paolo Bonzini @ 2015-05-28  8:23 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Kevin Wolf, qemu-block, qemu-devel, jcody, armbru, Max Reitz,
	Stefan Hajnoczi, amit.shah



On 28/05/2015 04:49, Fam Zheng wrote:
> 
> On second thought, although in this series we only need to modify
> virtio-{scsi,blk}, Wen pointed out that the mirror job also wants this during
> completion (because bdrv_swap is in a BH after the job coroutine returns).

Mirror needs to pause/resume the source.  It doesn't need to handle
pause/resume of the target, does it?

> So, in order for the solution to be general and complete, this notifier list
> approach relies on the listening devices to do the right thing, which requires
> modifying all of them, and is harder to maintain.

Isn't it only devices that use aio_set_event_notifier for their ioeventfd?

Paolo

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v6 01/13] block: Add op blocker type "device IO"
  2015-05-28  2:49               ` Fam Zheng
  2015-05-28  8:23                 ` Paolo Bonzini
@ 2015-05-28  9:40                 ` Kevin Wolf
  2015-05-28 10:55                   ` Fam Zheng
  1 sibling, 1 reply; 56+ messages in thread
From: Kevin Wolf @ 2015-05-28  9:40 UTC (permalink / raw)
  To: Fam Zheng
  Cc: qemu-block, qemu-devel, jcody, armbru, Max Reitz,
	Stefan Hajnoczi, amit.shah, Paolo Bonzini

Am 28.05.2015 um 04:49 hat Fam Zheng geschrieben:
> On Wed, 05/27 12:43, Paolo Bonzini wrote:
> > 
> > 
> > On 27/05/2015 12:10, Kevin Wolf wrote:
> > > Am 27.05.2015 um 11:50 hat Paolo Bonzini geschrieben:
> > >>
> > >>
> > >> On 27/05/2015 11:07, Kevin Wolf wrote:
> > >>> This is the first part of what's troubling me with this series, as it
> > >>> makes me doubtful if op blockers are the right tool to implement what
> > >>> the commit message says (block device I/O). This is "are we doing the
> > >>> thing right?"
> > >>>
> > >>> The second part should actually come first, though: "Are we doing the
> > >>> right thing?" I'm also doubtful whether blocking device I/O is even what
> > >>> we should do.
> > >>>
> > >>> Why is device I/O special compared to block job I/O or NBD server I/O?
> > >>
> > >> Because block job I/O doesn't modify the source disk.  For the target
> > >> disk, block jobs should definitely treat themselves as device I/O and
> > >> register notifiers that pause themselves on bdrv_drain.
> > > 
> > > Block jobs don't generally have a source and a target; in fact, the
> > > design didn't even consider that such jobs exist (otherwise, we wouldn't
> > > have bs->job).
> > 
> > Mirror and backup have targets.
> > 
> > > There are some jobs that use a BDS read-only (source for backup, mirror,
> > > commit) just like there are guest devices that use the BDS read-only
> > > (CD-ROMs). And others write to it (streaming, hard disks).
> > 
> > Streaming doesn't write to the BDS.  It reads it while copy-on-read is
> > active.  It's subtle, but it means that streaming can proceed without
> > changing the contents of the disk.  As far as safety of atomic
> > transactions is concerned, streaming need not be paused.
> > 
> > However, you're right that there is no particular reason why a more
> > generic pause/resume wouldn't pause jobs as well.  It wouldn't be a
> > problem either way as far as the job is concerned, and a better-defined
> > API is a better API.
> > 
> > >> This is suspiciously similar to the first idea that I and Stefan had,
> > >> which was a blk_pause/blk_resume API, implemented through a notifier list.
> > > 
> > > Any problems with it because of which Fam decided against it?
> 
> I am not against it.
> 
> Initially I started with writing blk_pause/resume, and soon I thought it would
> be useful if blk_aio_read and friends could check the pause state before issuing
> IO:
> 
> https://lists.gnu.org/archive/html/qemu-devel/2015-05/msg02608.html
> 
> So I reused op blockers. But the idea was later abandoned, so it's now very
> similar to the blk_pause/resume idea.
> 
> BTW, I remember we once proposed the new op blocker model should have a
> category for I/O, no?

Yes, but I was always thinking of this on a much higher level: As soon
as a device is attached to a backend, it announces that it uses the
category, and it only drops it again when it's detached.

The use case for this is starting background jobs that don't work e.g.
while the image is written to. In such cases you want to block if the
attached device _can_ submit write request, not just if a write request
is already in flight.

Mutual exclusion on a per-request basis is something different.

> On second thought, although in this series we only need to modify
> virtio-{scsi,blk}, Wen pointed out that the mirror job also wants this during
> completion (because bdrv_swap is in a BH after the job coroutine returns).  So,
> in order for the solution to be general and complete, this notifier list
> approach relies on the listening devices to do the right thing, which requires
> modifying all of them, and is harder to maintain.

Indeed. blk_pause/resume would handle everything in one central place
in the block layer instead of spreading the logic across all the block
layer users.

Kevin

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v6 01/13] block: Add op blocker type "device IO"
  2015-05-28  8:23                 ` Paolo Bonzini
@ 2015-05-28 10:46                   ` Fam Zheng
  2015-05-28 10:52                     ` Paolo Bonzini
  0 siblings, 1 reply; 56+ messages in thread
From: Fam Zheng @ 2015-05-28 10:46 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kevin Wolf, qemu-block, armbru, jcody, qemu-devel, Max Reitz,
	Stefan Hajnoczi, amit.shah

On Thu, 05/28 10:23, Paolo Bonzini wrote:
> 
> 
> On 28/05/2015 04:49, Fam Zheng wrote:
> > 
> > On second thought, although in this series we only need to modify
> > virtio-{scsi,blk}, Wen pointed out that the mirror job also wants this during
> > completion (because bdrv_swap is in a BH after the job coroutine returns).
> 
> Mirror needs to pause/resume the source.  It doesn't need to handle
> pause/resume of the target, does it?
> 
> > So, in order for the solution to be general and complete, this notifier list
> > approach relies on the listening devices to do the right thing, which requires
> > modifying all of them, and is harder to maintain.
> 
> Isn't it only devices that use aio_set_event_notifier for their ioeventfd?

I think that's only the case for qmp_transaction; the mirror job needs all
block devices to implement pause: mirror_run guarantees source and target are
in sync when it returns, but bdrv_swap is deferred to a main loop bottom
half - what if there is a guest write to the source in between?

Fam

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v6 01/13] block: Add op blocker type "device IO"
  2015-05-28 10:46                   ` Fam Zheng
@ 2015-05-28 10:52                     ` Paolo Bonzini
  2015-05-28 11:11                       ` Fam Zheng
  0 siblings, 1 reply; 56+ messages in thread
From: Paolo Bonzini @ 2015-05-28 10:52 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Kevin Wolf, qemu-block, armbru, jcody, qemu-devel, Max Reitz,
	Stefan Hajnoczi, amit.shah



On 28/05/2015 12:46, Fam Zheng wrote:
>> > 
>> > Mirror needs to pause/resume the source.  It doesn't need to handle
>> > pause/resume of the target, does it?
>> > 
>>> > > So, in order for the solution to be general and complete, this notifier list
>>> > > approach relies on the listening devices to do the right thing, which requires
>>> > > modifying all of them, and is harder to maintain.
>> > 
>> > Isn't it only devices that use aio_set_event_notifier for their ioeventfd?
> I think that's only the case for qmp_transaction; the mirror job needs all
> block devices to implement pause: mirror_run guarantees source and target are
> in sync when it returns, but bdrv_swap is deferred to a main loop bottom
> half - what if there is a guest write to the source in between?

Whoever uses ioeventfd needs to implement pause/resume, yes---not just
dataplane, also "regular" virtio-blk/virtio-scsi.

However, everyone else should be okay, because the bottom half runs
immediately and the big QEMU lock is not released in the meanwhile.  So
the CPUs have no occasion to run.  This needs a comment!

Paolo

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v6 01/13] block: Add op blocker type "device IO"
  2015-05-28  9:40                 ` Kevin Wolf
@ 2015-05-28 10:55                   ` Fam Zheng
  2015-05-28 11:00                     ` Paolo Bonzini
  0 siblings, 1 reply; 56+ messages in thread
From: Fam Zheng @ 2015-05-28 10:55 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-block, qemu-devel, jcody, armbru, Max Reitz,
	Stefan Hajnoczi, amit.shah, Paolo Bonzini

On Thu, 05/28 11:40, Kevin Wolf wrote:
> Indeed. blk_pause/resume would handle everything in one central place
> in the block layer instead of spreading the logic across all the block
> layer users.

Sorry, I'm confused. Do you mean there is a way to implement blk_pause
completely in the block layer, without needing various notifier handlers in
the device models?

Fam

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v6 01/13] block: Add op blocker type "device IO"
  2015-05-28 10:55                   ` Fam Zheng
@ 2015-05-28 11:00                     ` Paolo Bonzini
  2015-05-28 11:24                       ` Kevin Wolf
  0 siblings, 1 reply; 56+ messages in thread
From: Paolo Bonzini @ 2015-05-28 11:00 UTC (permalink / raw)
  To: Fam Zheng, Kevin Wolf
  Cc: qemu-block, qemu-devel, jcody, armbru, Max Reitz,
	Stefan Hajnoczi, amit.shah



On 28/05/2015 12:55, Fam Zheng wrote:
> > Indeed. blk_pause/resume would handle everything in one central place
> > in the block layer instead of spreading the logic across all the block
> > layer users.
>
> Sorry, I'm confused. Do you mean there is a way to implement blk_pause
> completely in block layer, without the necessity of various notifier handlers
> in device models?

How would you do that?  Do you have to keep a queue of pending requests
in the BlockBackend?  Since bdrv_drain_all may never return (e.g. stuck
NFS connection with nfs=hard), the guest can force unbounded allocation
in the host, which is bad.

In addition, the BDS doesn't have a list of BlockBackends attached to
it.  So you need the BlockBackends to register themselves for
pause/resume in some way---for example with a notifier list.

Then it's irrelevant whether it's the device model or the BB that
attaches itself to the notifier list.  You can start with doing it in
the device models (those that use ioeventfd), and later it can be moved
to the BB.  The low-level implementation remains the same.

Paolo

* Re: [Qemu-devel] [PATCH v6 01/13] block: Add op blocker type "device IO"
  2015-05-28 10:52                     ` Paolo Bonzini
@ 2015-05-28 11:11                       ` Fam Zheng
  2015-05-28 11:19                         ` Paolo Bonzini
  0 siblings, 1 reply; 56+ messages in thread
From: Fam Zheng @ 2015-05-28 11:11 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kevin Wolf, qemu-block, armbru, jcody, qemu-devel, Max Reitz,
	Stefan Hajnoczi, amit.shah

On Thu, 05/28 12:52, Paolo Bonzini wrote:
> 
> 
> On 28/05/2015 12:46, Fam Zheng wrote:
> >> > 
> >> > Mirror needs to pause/resume the source.  It doesn't need to handle
> >> > pause/resume of the target, does it?
> >> > 
> >>> > > So, in order for the solution to be general and complete, this notifier list
> >>> > > approach relies on the listening devices to do the right thing, which requires
> >>> > > modifying all of them and is harder to maintain.
> >> > 
> >> > Isn't it only devices that use aio_set_event_notifier for their ioeventfd?
> > I think it's only the case for qmp_transaction; the mirror job needs all block
> > devices to implement pause: mirror_run guarantees source and target are in sync
> > when it returns, but bdrv_swap is deferred to a main loop bottom half - what
> > if there is a guest write to the source in between?
> 
> Whoever uses ioeventfd needs to implement pause/resume, yes---not just
> dataplane, also "regular" virtio-blk/virtio-scsi.
> 
> However, everyone else should be okay, because the bottom half runs
> immediately and the big QEMU lock is not released in the meanwhile.  So
> the CPUs have no occasion to run.  This needs a comment!
> 

I'm not sure. It seems timer callbacks also do I/O, for example
nvme_process_sq().

Fam

* Re: [Qemu-devel] [PATCH v6 01/13] block: Add op blocker type "device IO"
  2015-05-28 11:11                       ` Fam Zheng
@ 2015-05-28 11:19                         ` Paolo Bonzini
  2015-05-28 12:05                           ` Fam Zheng
  0 siblings, 1 reply; 56+ messages in thread
From: Paolo Bonzini @ 2015-05-28 11:19 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Kevin Wolf, qemu-block, armbru, jcody, qemu-devel, Max Reitz,
	Stefan Hajnoczi, amit.shah



On 28/05/2015 13:11, Fam Zheng wrote:
> > Whoever uses ioeventfd needs to implement pause/resume, yes---not just
> > dataplane, also "regular" virtio-blk/virtio-scsi.
> > 
> > However, everyone else should be okay, because the bottom half runs
> > immediately and the big QEMU lock is not released in the meanwhile.  So
> > the CPUs have no occasion to run.  This needs a comment!
> 
> I'm not sure. It seems timer callbacks also do I/O, for example
> nvme_process_sq().

Right, that's also true for USB devices. :(

Perhaps we can skip block_job_defer_to_main_loop if not necessary
(bs->aio_context == qemu_get_aio_context()).

Paolo
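
The suggestion above - complete the job inline when the BDS already runs in
the main loop's AioContext - could be sketched as below. The types here are
simplified stand-ins (AioContext is a toy struct, block_job_defer_to_main_loop
only records that a deferral happened), so this shows the shape of the check,
not QEMU's real implementation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { int id; } AioContext;   /* toy stand-in */

static AioContext main_ctx;

static AioContext *qemu_get_aio_context(void)
{
    return &main_ctx;
}

static bool deferred;    /* did we schedule a main-loop bottom half? */
static bool completed;   /* did the completion callback run? */

static void job_complete(void *opaque)
{
    (void)opaque;
    completed = true;
}

/* Toy stand-in: real code would schedule a BH in the main loop; here
 * we just record the deferral and run the callback immediately. */
static void block_job_defer_to_main_loop(void (*fn)(void *), void *opaque)
{
    deferred = true;
    fn(opaque);
}

static void job_exit(AioContext *bs_ctx)
{
    if (bs_ctx == qemu_get_aio_context()) {
        /* Same context as the main loop: the BQL is held and no BH
         * window exists for a guest write to sneak in, so complete
         * inline. */
        job_complete(NULL);
    } else {
        /* Dataplane BDS in its own iothread: bounce to the main loop. */
        block_job_defer_to_main_loop(job_complete, NULL);
    }
}
```

The deferral then only happens for dataplane devices, which are exactly the
ones that implement pause/resume anyway.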

* Re: [Qemu-devel] [PATCH v6 01/13] block: Add op blocker type "device IO"
  2015-05-28 11:00                     ` Paolo Bonzini
@ 2015-05-28 11:24                       ` Kevin Wolf
  2015-05-28 11:41                         ` Paolo Bonzini
  2015-05-28 11:44                         ` Fam Zheng
  0 siblings, 2 replies; 56+ messages in thread
From: Kevin Wolf @ 2015-05-28 11:24 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Fam Zheng, qemu-block, qemu-devel, jcody, armbru, Max Reitz,
	Stefan Hajnoczi, amit.shah

Am 28.05.2015 um 13:00 hat Paolo Bonzini geschrieben:
> On 28/05/2015 12:55, Fam Zheng wrote:
> > > Indeed. blk_pause/resume would handle everything in one central place
> > > in the block layer instead of spreading the logic across all the block
> > > layer users.
> >
> > Sorry, I'm confused. Do you mean there is a way to implement blk_pause
> > completely in block layer, without the necessity of various notifier handlers
> > in device models?
> 
> How would you do that?  Do you have to keep a queue of pending requests
> in the BlockBackend?  Since bdrv_drain_all may never return (e.g. stuck
> NFS connection with nfs=hard), the guest can force unbounded allocation
> in the host, which is bad.

We already queue requests for things like I/O throttling or
serialisation. Why would this be any worse?

> In addition, the BDS doesn't have a list of BlockBackends attached to
> it.  So you need the BlockBackends to register themselves for
> pause/resume in some way---for example with a notifier list.
> 
> Then it's irrelevant whether it's the device model or the BB that
> attaches itself to the notifier list.  You can start with doing it in
> the device models (those that use ioeventfd), and later it can be moved
> to the BB.  The low-level implementation remains the same.

The reason for doing it in the block layer is that it's in one place and
we can be sure that it's applied. We can still in addition modify
specific users to avoid even trying to send requests, but I think it's
good to have the single place that always ensures correct functionality
of the drain instead of making it dependent on the user.

Kevin
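
The queueing approach Kevin describes - park requests inside the block layer
while paused, in the spirit of the existing I/O throttling queue - could look
roughly like this. All names and types are simplified stand-ins for
illustration (requests are plain ints, the queue is a fixed array), not
QEMU's API:

```c
#include <assert.h>
#include <stdbool.h>

#define MAXQ 16

typedef struct {
    bool paused;
    int queued[MAXQ];     /* requests parked while paused */
    int nqueued;
    int submitted[MAXQ];  /* requests that reached the BDS */
    int nsubmitted;
} BlockBackend;

static void blk_pause(BlockBackend *blk)
{
    blk->paused = true;
}

static void blk_submit(BlockBackend *blk, int req)
{
    blk->submitted[blk->nsubmitted++] = req;
}

/* Entry point used by device models: while paused, park the request
 * instead of letting it reach the drained BDS. The caller does not
 * need to know a pause is in effect. */
static void blk_co_write(BlockBackend *blk, int req)
{
    if (blk->paused) {
        blk->queued[blk->nqueued++] = req;
    } else {
        blk_submit(blk, req);
    }
}

static void blk_resume(BlockBackend *blk)
{
    blk->paused = false;
    /* Flush the parked requests in submission order. */
    for (int i = 0; i < blk->nqueued; i++) {
        blk_submit(blk, blk->queued[i]);
    }
    blk->nqueued = 0;
}
```

This is the "single place in the block layer" property: correctness no longer
depends on each user remembering not to submit.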

* Re: [Qemu-devel] [PATCH v6 01/13] block: Add op blocker type "device IO"
  2015-05-28 11:24                       ` Kevin Wolf
@ 2015-05-28 11:41                         ` Paolo Bonzini
  2015-05-28 11:44                         ` Fam Zheng
  1 sibling, 0 replies; 56+ messages in thread
From: Paolo Bonzini @ 2015-05-28 11:41 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Fam Zheng, qemu-block, qemu-devel, jcody, armbru, Max Reitz,
	Stefan Hajnoczi, amit.shah



On 28/05/2015 13:24, Kevin Wolf wrote:
> Am 28.05.2015 um 13:00 hat Paolo Bonzini geschrieben:
>> On 28/05/2015 12:55, Fam Zheng wrote:
>>>> Indeed. blk_pause/resume would handle everything in one central place
>>>> in the block layer instead of spreading the logic across all the block
>>>> layer users.
>>>
>>> Sorry, I'm confused. Do you mean there is a way to implement blk_pause
>>> completely in block layer, without the necessity of various notifier handlers
>>> in device models?
>>
>> How would you do that?  Do you have to keep a queue of pending requests
>> in the BlockBackend?  Since bdrv_drain_all may never return (e.g. stuck
>> NFS connection with nfs=hard), the guest can force unbounded allocation
>> in the host, which is bad.
> 
> We already queue requests for things like I/O throttling or
> serialisation. Why would this be any worse?

The fact that it's potentially unbounded makes me nervous.  But you're
right that we sort of expect the guest to not have too high a queue
depth.  And serialization is also potentially unbounded.

So it may indeed be the best way.  Just to double check, is it correct
that the API would still be BDS-based, i.e.
bdrv_pause_backends/bdrv_resume_backends, and the BlockBackends would
attach themselves to pause/resume notifier lists?

Paolo

>> In addition, the BDS doesn't have a list of BlockBackends attached to
>> it.  So you need the BlockBackends to register themselves for
>> pause/resume in some way---for example with a notifier list.
>>
>> Then it's irrelevant whether it's the device model or the BB that
>> attaches itself to the notifier list.  You can start with doing it in
>> the device models (those that use ioeventfd), and later it can be moved
>> to the BB.  The low-level implementation remains the same.
> 
> The reason for doing it in the block layer is that it's in one place and
> we can be sure that it's applied. We can still in addition modify
> specific users to avoid even trying to send requests, but I think it's
> good to have the single place that always ensures correct functionality
> of the drain instead of making it dependent on the user.
> 
> Kevin
> 

* Re: [Qemu-devel] [PATCH v6 01/13] block: Add op blocker type "device IO"
  2015-05-28 11:24                       ` Kevin Wolf
  2015-05-28 11:41                         ` Paolo Bonzini
@ 2015-05-28 11:44                         ` Fam Zheng
  2015-05-28 11:47                           ` Paolo Bonzini
  1 sibling, 1 reply; 56+ messages in thread
From: Fam Zheng @ 2015-05-28 11:44 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-block, qemu-devel, jcody, armbru, Max Reitz,
	Stefan Hajnoczi, amit.shah, Paolo Bonzini

On Thu, 05/28 13:24, Kevin Wolf wrote:
> Am 28.05.2015 um 13:00 hat Paolo Bonzini geschrieben:
> > On 28/05/2015 12:55, Fam Zheng wrote:
> > > > Indeed. blk_pause/resume would handle everything in one central place
> > > > in the block layer instead of spreading the logic across all the block
> > > > layer users.
> > >
> > > Sorry, I'm confused. Do you mean there is a way to implement blk_pause
> > > completely in block layer, without the necessity of various notifier handlers
> > > in device models?
> > 
> > How would you do that?  Do you have to keep a queue of pending requests
> > in the BlockBackend?  Since bdrv_drain_all may never return (e.g. stuck
> > NFS connection with nfs=hard), the guest can force unbounded allocation
> > in the host, which is bad.
> 
> We already queue requests for things like I/O throttling or
> serialisation. Why would this be any worse?
> 
> > In addition, the BDS doesn't have a list of BlockBackends attached to
> > it.  So you need the BlockBackends to register themselves for
> > pause/resume in some way---for example with a notifier list.
> > 
> > Then it's irrelevant whether it's the device model or the BB that
> > attaches itself to the notifier list.  You can start with doing it in
> > the device models (those that use ioeventfd), and later it can be moved
> > to the BB.  The low-level implementation remains the same.
> 
> The reason for doing it in the block layer is that it's in one place and
> we can be sure that it's applied. We can still in addition modify
> specific users to avoid even trying to send requests, but I think it's
> good to have the single place that always ensures correct functionality
> of the drain instead of making it dependent on the user.
> 

How to do that for the synchronous blk_write callers that don't run in
a coroutine?

Fam

* Re: [Qemu-devel] [PATCH v6 01/13] block: Add op blocker type "device IO"
  2015-05-28 11:44                         ` Fam Zheng
@ 2015-05-28 11:47                           ` Paolo Bonzini
  2015-05-28 12:04                             ` Fam Zheng
  0 siblings, 1 reply; 56+ messages in thread
From: Paolo Bonzini @ 2015-05-28 11:47 UTC (permalink / raw)
  To: Fam Zheng, Kevin Wolf
  Cc: qemu-block, qemu-devel, jcody, armbru, Max Reitz,
	Stefan Hajnoczi, amit.shah



On 28/05/2015 13:44, Fam Zheng wrote:
> > The reason for doing it in the block layer is that it's in one place and
> > we can be sure that it's applied. We can still in addition modify
> > specific users to avoid even trying to send requests, but I think it's
> > good to have the single place that always ensures correct functionality
> > of the drain instead of making it dependent on the user.
> 
> How to do that for the synchronous blk_write callers that don't run in
> a coroutine?

They would be completely oblivious to it.

Their call to blk_co_write would queue the request.  Then blk_write
calls aio_poll, which ultimately would result in blk_resume and unqueue
the request.

Paolo
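
Why the synchronous caller stays oblivious can be sketched as below: the
blocking blk_write() just spins in aio_poll() until its request completes, so
a pause/resume that happens in the meantime is invisible to it. Everything
here is a toy stand-in (a single in-flight request, an aio_poll that fakes
the drain ending after a few iterations), not QEMU's real event loop:

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool paused;
    bool pending;            /* one parked in-flight request */
    bool done;
    int polls_until_resume;  /* simulate the drain ending after N polls */
} BlockBackend;

static void blk_resume(BlockBackend *blk)
{
    blk->paused = false;
    if (blk->pending) {      /* unqueue and complete the parked request */
        blk->pending = false;
        blk->done = true;
    }
}

/* Toy event-loop iteration: after a few iterations the drain section
 * in the main loop ends and calls blk_resume(). */
static void aio_poll(BlockBackend *blk)
{
    if (blk->paused && --blk->polls_until_resume == 0) {
        blk_resume(blk);
    }
}

static void blk_co_write(BlockBackend *blk)
{
    if (blk->paused) {
        blk->pending = true; /* queued; completes on resume */
    } else {
        blk->done = true;
    }
}

/* The synchronous wrapper: submit, then poll until completion. It
 * neither knows nor cares that the request was queued and resumed. */
static void blk_write(BlockBackend *blk)
{
    blk->done = false;
    blk_co_write(blk);
    while (!blk->done) {
        aio_poll(blk);
    }
}
```

Fam's follow-up concern still applies to the real thing, of course: this only
works if the aio_poll() loop the caller sits in is the one that eventually
observes the resume.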

* Re: [Qemu-devel] [PATCH v6 01/13] block: Add op blocker type "device IO"
  2015-05-28 11:47                           ` Paolo Bonzini
@ 2015-05-28 12:04                             ` Fam Zheng
  0 siblings, 0 replies; 56+ messages in thread
From: Fam Zheng @ 2015-05-28 12:04 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kevin Wolf, qemu-block, armbru, jcody, qemu-devel, Max Reitz,
	Stefan Hajnoczi, amit.shah

On Thu, 05/28 13:47, Paolo Bonzini wrote:
> 
> 
> On 28/05/2015 13:44, Fam Zheng wrote:
> > > The reason for doing it in the block layer is that it's in one place and
> > > we can be sure that it's applied. We can still in addition modify
> > > specific users to avoid even trying to send requests, but I think it's
> > > good to have the single place that always ensures correct functionality
> > > of the drain instead of making it dependent on the user.
> > 
> > How to do that for the synchronous blk_write callers that don't run in
> > a coroutine?
> 
> They would be completely oblivious to it.
> 
> Their call to blk_co_write would queue the request.  Then blk_write
> calls aio_poll, which ultimately would result in blk_resume and unqueue
> the request.

OK, but then we have to make a rule that a blk_pause must be resolved by an
aio_poll() loop.  This is somewhat tricky; for example, I assume blk_write()
would only call aio_poll() of its own AioContext, but the pairing blk_resume()
will be in the main context.  To make it worse, what if there is no BDS in the
main context - should that be special-cased in the block layer?

Fam

* Re: [Qemu-devel] [PATCH v6 01/13] block: Add op blocker type "device IO"
  2015-05-28 11:19                         ` Paolo Bonzini
@ 2015-05-28 12:05                           ` Fam Zheng
  2015-05-29 11:11                             ` Andrey Korolyov
  0 siblings, 1 reply; 56+ messages in thread
From: Fam Zheng @ 2015-05-28 12:05 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kevin Wolf, qemu-block, armbru, jcody, qemu-devel, Max Reitz,
	Stefan Hajnoczi, amit.shah

On Thu, 05/28 13:19, Paolo Bonzini wrote:
> 
> 
> On 28/05/2015 13:11, Fam Zheng wrote:
> > > Whoever uses ioeventfd needs to implement pause/resume, yes---not just
> > > dataplane, also "regular" virtio-blk/virtio-scsi.
> > > 
> > > However, everyone else should be okay, because the bottom half runs
> > > immediately and the big QEMU lock is not released in the meanwhile.  So
> > > the CPUs have no occasion to run.  This needs a comment!
> > 
> > I'm not sure. It seems timer callbacks also do I/O, for example
> > nvme_process_sq().
> 
> Right, that's also true for USB devices. :(
> 
> Perhaps we can skip block_job_defer_to_main_loop if not necessary
> (bs->aio_context == qemu_get_aio_context()).

I think so. It will make dataplane even more specialized but that seems the
only way to fix the problem at the moment.

Fam

* Re: [Qemu-devel] [PATCH v6 01/13] block: Add op blocker type "device IO"
  2015-05-28 12:05                           ` Fam Zheng
@ 2015-05-29 11:11                             ` Andrey Korolyov
  2015-05-30 13:21                               ` Paolo Bonzini
  0 siblings, 1 reply; 56+ messages in thread
From: Andrey Korolyov @ 2015-05-29 11:11 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Kevin Wolf, qemu block, jcody, Markus Armbruster, qemu-devel,
	Stefan Hajnoczi, Amit Shah, Paolo Bonzini, Max Reitz

On Thu, May 28, 2015 at 3:05 PM, Fam Zheng <famz@redhat.com> wrote:
> On Thu, 05/28 13:19, Paolo Bonzini wrote:
>>
>>
>> On 28/05/2015 13:11, Fam Zheng wrote:
>> > > Whoever uses ioeventfd needs to implement pause/resume, yes---not just
>> > > dataplane, also "regular" virtio-blk/virtio-scsi.
>> > >
>> > > However, everyone else should be okay, because the bottom half runs
>> > > immediately and the big QEMU lock is not released in the meanwhile.  So
>> > > the CPUs have no occasion to run.  This needs a comment!
>> >
>> > I'm not sure. It seems timer callbacks also do I/O, for example
>> > nvme_process_sq().
>>
>> Right, that's also true for USB devices. :(
>>
>> Perhaps we can skip block_job_defer_to_main_loop if not necessary
>> (bs->aio_context == qemu_get_aio_context()).
>
> I think so. It will make dataplane even more specialized but that seems the
> only way to fix the problem at the moment.
>
> Fam
>

Sorry for a potential thread hijack, but I'm curious about the reasons for
not making the advertised queue depth for non-passthrough backends an
independent tunable; are there any concerns behind that?

* Re: [Qemu-devel] [PATCH v6 01/13] block: Add op blocker type "device IO"
  2015-05-29 11:11                             ` Andrey Korolyov
@ 2015-05-30 13:21                               ` Paolo Bonzini
  0 siblings, 0 replies; 56+ messages in thread
From: Paolo Bonzini @ 2015-05-30 13:21 UTC (permalink / raw)
  To: Andrey Korolyov, Fam Zheng
  Cc: Kevin Wolf, qemu block, qemu-devel, jcody, Markus Armbruster,
	Max Reitz, Stefan Hajnoczi, Amit Shah



On 29/05/2015 13:11, Andrey Korolyov wrote:
> Sorry for a potential thread hijack, but I`m curious about the reasons
> to not making advertised queue depth for non-passthrough backends an
> independent tunable, is there any concerns behind that?

It certainly can be made tunable.  Usually it is bounded actually, for
example PATA has a queue depth of 1 (!) while AHCI has 32.  For virtio
(blk and scsi) it actually is already somewhat tunable, since the queue
depth is limited by the size of the virtqueue.

Paolo

end of thread, other threads:[~2015-05-30 13:22 UTC | newest]

Thread overview: 56+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-21  6:42 [Qemu-devel] [PATCH v6 00/13] Fix transactional snapshot with dataplane and NBD export Fam Zheng
2015-05-21  6:42 ` [Qemu-devel] [PATCH v6 01/13] block: Add op blocker type "device IO" Fam Zheng
2015-05-21  7:06   ` Wen Congyang
2015-05-21  7:32     ` Fam Zheng
2015-05-22  4:54       ` Fam Zheng
2015-05-23 16:51         ` Max Reitz
2015-05-25  2:15           ` Fam Zheng
2015-05-21  8:00   ` Wen Congyang
2015-05-21 12:44     ` Fam Zheng
2015-05-22  6:18       ` Wen Congyang
2015-05-26 14:22   ` Kevin Wolf
2015-05-26 14:24     ` Max Reitz
2015-05-27  9:07       ` Kevin Wolf
2015-05-27  9:50         ` Paolo Bonzini
2015-05-27 10:10           ` Kevin Wolf
2015-05-27 10:43             ` Paolo Bonzini
2015-05-28  2:49               ` Fam Zheng
2015-05-28  8:23                 ` Paolo Bonzini
2015-05-28 10:46                   ` Fam Zheng
2015-05-28 10:52                     ` Paolo Bonzini
2015-05-28 11:11                       ` Fam Zheng
2015-05-28 11:19                         ` Paolo Bonzini
2015-05-28 12:05                           ` Fam Zheng
2015-05-29 11:11                             ` Andrey Korolyov
2015-05-30 13:21                               ` Paolo Bonzini
2015-05-28  9:40                 ` Kevin Wolf
2015-05-28 10:55                   ` Fam Zheng
2015-05-28 11:00                     ` Paolo Bonzini
2015-05-28 11:24                       ` Kevin Wolf
2015-05-28 11:41                         ` Paolo Bonzini
2015-05-28 11:44                         ` Fam Zheng
2015-05-28 11:47                           ` Paolo Bonzini
2015-05-28 12:04                             ` Fam Zheng
2015-05-21  6:42 ` [Qemu-devel] [PATCH v6 02/13] block: Add op blocker notifier list Fam Zheng
2015-05-21  6:42 ` [Qemu-devel] [PATCH v6 03/13] block-backend: Add blk_op_blocker_add_notifier Fam Zheng
2015-05-21  6:42 ` [Qemu-devel] [PATCH v6 04/13] virtio-blk: Move complete_request to 'ops' structure Fam Zheng
2015-05-21  6:42 ` [Qemu-devel] [PATCH v6 05/13] virtio-blk: Don't handle output when there is "device IO" op blocker Fam Zheng
2015-05-23 16:53   ` Max Reitz
2015-05-21  6:42 ` [Qemu-devel] [PATCH v6 06/13] virtio-scsi-dataplane: Add "device IO" op blocker listener Fam Zheng
2015-05-23 16:53   ` Max Reitz
2015-05-21  6:42 ` [Qemu-devel] [PATCH v6 07/13] nbd-server: Clear "can_read" when "device io" blocker is set Fam Zheng
2015-05-23 16:54   ` Max Reitz
2015-05-21  6:42 ` [Qemu-devel] [PATCH v6 08/13] blockdev: Block device IO during internal snapshot transaction Fam Zheng
2015-05-23 16:56   ` Max Reitz
2015-05-21  6:42 ` [Qemu-devel] [PATCH v6 09/13] blockdev: Block device IO during external " Fam Zheng
2015-05-23 16:58   ` Max Reitz
2015-05-21  6:43 ` [Qemu-devel] [PATCH v6 10/13] blockdev: Block device IO during drive-backup transaction Fam Zheng
2015-05-23 16:59   ` Max Reitz
2015-05-21  6:43 ` [Qemu-devel] [PATCH v6 11/13] blockdev: Block device IO during blockdev-backup transaction Fam Zheng
2015-05-23 17:05   ` Max Reitz
2015-05-21  6:43 ` [Qemu-devel] [PATCH v6 12/13] block: Block "device IO" during bdrv_drain and bdrv_drain_all Fam Zheng
2015-05-23 17:11   ` Max Reitz
2015-05-25  2:48     ` Fam Zheng
2015-05-26 14:21       ` Max Reitz
2015-05-21  6:43 ` [Qemu-devel] [PATCH v6 13/13] block/mirror: Block "device IO" during mirror exit Fam Zheng
2015-05-23 17:21   ` Max Reitz
