* [Qemu-devel] [PATCH for-2.7 v9 00/11] Support streaming to an intermediate layer
From: Alberto Garcia @ 2016-04-04 13:43 UTC
  To: qemu-devel
  Cc: Kevin Wolf, Alberto Garcia, qemu-block, Max Reitz, Stefan Hajnoczi

Hi!

Half a year later :) here's the new version of the intermediate block
streaming patches.

Quick summary: the feature has been working since June, but there
were two problems that made it impossible to merge the patches. Here's
the full description:

   https://lists.gnu.org/archive/html/qemu-block/2015-12/msg00180.html

And here's the TL;DR version:

   1) Opening a BDS in read-write mode with bdrv_reopen() has the side
      effect of reopening its backing chain in read-only mode.

   2) bdrv_reopen() can trigger the completion of an existing block
      job. If that job modifies the backing chain in which that reopen
      call is operating, it can crash QEMU.
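
For reference, the reopen pattern that runs into problem 1 looks
roughly like this (a sketch of the logic that patch 6 adds to
stream_start(); 'bs' is the streaming destination and 'errp' the
usual error pointer):

    int orig_flags = bdrv_get_flags(bs);
    if (!(orig_flags & BDRV_O_RDWR)) {
        /* Reopening 'bs' read-write also reopens its backing chain,
         * which ends up read-only as a side effect (problem 1) */
        if (bdrv_reopen(bs, orig_flags | BDRV_O_RDWR, errp) != 0) {
            return;
        }
    }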

Now, these problems are only relevant if there are two block jobs
running at the same time in the same backing chain. Running parallel
jobs is a nice feature, but it's not essential for the intermediate
block streaming operation, so I decided to disable it in this version
of the series.

The way it works is that if you run block-stream on an intermediate
node, it will block the active layer as well, preventing other block
jobs from running in the same chain.
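
In code terms this is essentially what patch 6 adds to stream_start(),
where 'active' is the topmost node of the chain (a sketch):

    if (active) {
        /* Install this job's blocker on the active layer so that no
         * other block job can be started anywhere in the chain */
        bdrv_op_block_all(active, s->common.blocker);
        /* Keep a reference so bdrv_close_all() doesn't destroy it */
        bdrv_ref(active);
    }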

Other than that there are no significant functional changes compared
to v8, although a few things are different:

 - The code has been rebased
 - The patch that introduced block_job_next() has been split into
   several patches: one that adds the function and others that use it
   in different parts of the QEMU code.
 - Some patches have already been merged into QEMU so they have been
   removed from the series.
 - The test that ran several block-stream operations in parallel now
   verifies that they are forbidden.

I think that's all. Comments and questions are welcome!

Berto

v9:
- Rebase the code
- Block the active layer in order to forbid other block jobs in the
  same chain.
- Split the patch that adds block_job_next() into 4 (new patches 1-4).
- Replace the test that performs several block-stream operations in
  parallel with one that checks that they're forbidden.
- Remove patches that have already been merged.

v8: https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg05754.html
- Rebased on top of Stefan's block branch (0a35bce416)
- The loop that pauses the block jobs in bdrv_drain_all() is now split
  in two: one that iterates over the list of block jobs to pause them,
  and one that iterates over the root BDSs in order to get the
  AioContexts.

v7: https://lists.gnu.org/archive/html/qemu-devel/2015-05/msg02580.html
- Rebased against the current master
- Updated bdrv_drain_all() to use the new block_job_next() API.

v6: https://lists.gnu.org/archive/html/qemu-devel/2015-04/msg03046.html
- fix the no-op test following Max's suggestions

v5: https://lists.gnu.org/archive/html/qemu-devel/2015-04/msg03006.html
- Fix a few typos
- Minor documentation updates
- Update test_stream_partial() to test no-ops
- New test case: test_stream_parallel()
- New test case: test_stream_overlapping()

v4: https://lists.gnu.org/archive/html/qemu-devel/2015-04/msg01878.html
- Refactor find_block_job to use the error from bdrv_lookup_bs()
- Don't use QERR_DEVICE_IN_USE in block_job_create() since we can be
  dealing with nodes now.
- Fix @device comment in the BlockJobInfo documentation
- stream_start(): simplify the bdrv_reopen() call and use
  bdrv_get_device_or_node_name() for error messages.
- Use a different variable name for BlockDriverState *i
- Documentation fixes in docs/live-block-ops.txt
- Update iotest 30 since now test_device_not_found() returns
  GenericError
- Fix test case test_stream_partial()
- Add new test case test_stream_intermediate()
- Fix typos

v3: https://lists.gnu.org/archive/html/qemu-devel/2015-04/msg00806.html
- Keep a list of block jobs and make qmp_query_block_jobs() iterate
  over it.

v2: https://lists.gnu.org/archive/html/qemu-devel/2015-03/msg04798.html
- The 'block-stream' command does not have a 'node-name' parameter
  anymore and reuses 'device' for that purpose.
- Block jobs can now be owned by any intermediate node, and not just
  by the ones at the root. query-block-jobs is updated to reflect that
  change.
- The 'device' parameter of all 'block-job-*' commands can now take a
  node name.
- The BlockJobInfo type and all BLOCK_JOB_* events report the node
  name in the 'device' field if the node does not have a device name.
- All intermediate nodes are blocked (and checked for blockers) during
  the streaming operation.

v1: https://lists.gnu.org/archive/html/qemu-devel/2015-02/msg04116.html

Alberto Garcia (11):
  block: keep a list of block jobs
  block: use the block job list in bdrv_drain_all()
  block: use the block job list in qmp_query_block_jobs()
  block: use the block job list in bdrv_close()
  block: allow block jobs in any arbitrary node
  block: Support streaming to an intermediate layer
  block: Add QMP support for streaming to an intermediate layer
  docs: Document how to stream to an intermediate layer
  qemu-iotests: test streaming to an intermediate layer
  qemu-iotests: test overlapping block-stream operations
  qemu-iotests: test non-overlapping block-stream operations

 block.c                    |  29 ++++---------
 block/io.c                 |  21 ++++++----
 block/stream.c             |  39 ++++++++++++++++-
 blockdev.c                 |  68 ++++++++++++++++++------------
 blockjob.c                 |  18 +++++++-
 docs/live-block-ops.txt    |  31 +++++++++-----
 docs/qmp-events.txt        |   8 ++--
 include/block/block_int.h  |   5 ++-
 include/block/blockjob.h   |  14 +++++++
 qapi/block-core.json       |  30 ++++++++------
 tests/qemu-iotests/030     | 101 ++++++++++++++++++++++++++++++++++++++++++++-
 tests/qemu-iotests/030.out |   4 +-
 12 files changed, 276 insertions(+), 92 deletions(-)

-- 
2.8.0.rc3


* [Qemu-devel] [PATCH v9 01/11] block: keep a list of block jobs
From: Alberto Garcia @ 2016-04-04 13:43 UTC
  To: qemu-devel
  Cc: Kevin Wolf, Alberto Garcia, qemu-block, Max Reitz, Stefan Hajnoczi

The current way to obtain the list of existing block jobs is to
iterate over all root nodes and check which ones own a job.

Since we want to be able to support block jobs in other nodes as well,
this patch keeps a list of jobs that is updated every time one is
created or destroyed.
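
The list is meant to be traversed with the new block_job_next()
iterator; the following patches in this series all use the same
pattern:

    BlockJob *job = NULL;

    while ((job = block_job_next(job))) {
        /* ... process each existing block job ... */
    }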

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 blockjob.c               | 13 +++++++++++++
 include/block/blockjob.h | 14 ++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/blockjob.c b/blockjob.c
index 9fc37ca..3557048 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -50,6 +50,16 @@ struct BlockJobTxn {
     int refcnt;
 };
 
+static QLIST_HEAD(, BlockJob) block_jobs = QLIST_HEAD_INITIALIZER(block_jobs);
+
+BlockJob *block_job_next(BlockJob *job)
+{
+    if (!job) {
+        return QLIST_FIRST(&block_jobs);
+    }
+    return QLIST_NEXT(job, job_list);
+}
+
 void *block_job_create(const BlockJobDriver *driver, BlockDriverState *bs,
                        int64_t speed, BlockCompletionFunc *cb,
                        void *opaque, Error **errp)
@@ -76,6 +86,8 @@ void *block_job_create(const BlockJobDriver *driver, BlockDriverState *bs,
     job->refcnt        = 1;
     bs->job = job;
 
+    QLIST_INSERT_HEAD(&block_jobs, job, job_list);
+
     /* Only set speed when necessary to avoid NotSupported error */
     if (speed != 0) {
         Error *local_err = NULL;
@@ -103,6 +115,7 @@ void block_job_unref(BlockJob *job)
         bdrv_unref(job->bs);
         error_free(job->blocker);
         g_free(job->id);
+        QLIST_REMOVE(job, job_list);
         g_free(job);
     }
 }
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index 8bedc49..5d665c1 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -135,6 +135,9 @@ struct BlockJob {
      */
     bool deferred_to_main_loop;
 
+    /** Element of the list of block jobs */
+    QLIST_ENTRY(BlockJob) job_list;
+
     /** Status that is published by the query-block-jobs QMP API */
     BlockDeviceIoStatus iostatus;
 
@@ -173,6 +176,17 @@ struct BlockJob {
 };
 
 /**
+ * block_job_next:
+ * @job: A block job, or %NULL.
+ *
+ * Get the next element from the list of block jobs after @job, or the
+ * first one if @job is %NULL.
+ *
+ * Returns the requested job, or %NULL if there are no more jobs left.
+ */
+BlockJob *block_job_next(BlockJob *job);
+
+/**
  * block_job_create:
  * @job_type: The class object for the newly-created job.
  * @bs: The block device on which to operate.
-- 
2.8.0.rc3


* [Qemu-devel] [PATCH v9 02/11] block: use the block job list in bdrv_drain_all()
From: Alberto Garcia @ 2016-04-04 13:43 UTC
  To: qemu-devel
  Cc: Kevin Wolf, Alberto Garcia, qemu-block, Max Reitz, Stefan Hajnoczi

bdrv_drain_all() pauses all block jobs by using bdrv_next() to iterate
over all top-level BlockDriverStates. Therefore the code is unable to
find block jobs in other nodes.

This patch uses block_job_next() to iterate over all block jobs.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/io.c | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/block/io.c b/block/io.c
index c4869b9..884ed1e 100644
--- a/block/io.c
+++ b/block/io.c
@@ -288,15 +288,21 @@ void bdrv_drain_all(void)
     /* Always run first iteration so any pending completion BHs run */
     bool busy = true;
     BlockDriverState *bs = NULL;
+    BlockJob *job = NULL;
     GSList *aio_ctxs = NULL, *ctx;
 
+    while ((job = block_job_next(job))) {
+        AioContext *aio_context = bdrv_get_aio_context(job->bs);
+
+        aio_context_acquire(aio_context);
+        block_job_pause(job);
+        aio_context_release(aio_context);
+    }
+
     while ((bs = bdrv_next(bs))) {
         AioContext *aio_context = bdrv_get_aio_context(bs);
 
         aio_context_acquire(aio_context);
-        if (bs->job) {
-            block_job_pause(bs->job);
-        }
         bdrv_drain_recurse(bs);
         aio_context_release(aio_context);
 
@@ -333,14 +339,11 @@ void bdrv_drain_all(void)
         }
     }
 
-    bs = NULL;
-    while ((bs = bdrv_next(bs))) {
-        AioContext *aio_context = bdrv_get_aio_context(bs);
+    while ((job = block_job_next(job))) {
+        AioContext *aio_context = bdrv_get_aio_context(job->bs);
 
         aio_context_acquire(aio_context);
-        if (bs->job) {
-            block_job_resume(bs->job);
-        }
+        block_job_resume(job);
         aio_context_release(aio_context);
     }
     g_slist_free(aio_ctxs);
-- 
2.8.0.rc3


* [Qemu-devel] [PATCH v9 03/11] block: use the block job list in qmp_query_block_jobs()
From: Alberto Garcia @ 2016-04-04 13:43 UTC
  To: qemu-devel
  Cc: Kevin Wolf, Alberto Garcia, qemu-block, Max Reitz, Stefan Hajnoczi

qmp_query_block_jobs() uses bdrv_next() to look for block jobs, but
this function can only find those in top-level BlockDriverStates.

This patch uses block_job_next() instead.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 blockdev.c | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index e50e8ea..edbcc19 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -4083,21 +4083,18 @@ out:
 BlockJobInfoList *qmp_query_block_jobs(Error **errp)
 {
     BlockJobInfoList *head = NULL, **p_next = &head;
-    BlockDriverState *bs;
+    BlockJob *job;
 
-    for (bs = bdrv_next(NULL); bs; bs = bdrv_next(bs)) {
-        AioContext *aio_context = bdrv_get_aio_context(bs);
+    for (job = block_job_next(NULL); job; job = block_job_next(job)) {
+        BlockJobInfoList *elem = g_new0(BlockJobInfoList, 1);
+        AioContext *aio_context = bdrv_get_aio_context(job->bs);
 
         aio_context_acquire(aio_context);
-
-        if (bs->job) {
-            BlockJobInfoList *elem = g_new0(BlockJobInfoList, 1);
-            elem->value = block_job_query(bs->job);
-            *p_next = elem;
-            p_next = &elem->next;
-        }
-
+        elem->value = block_job_query(job);
         aio_context_release(aio_context);
+
+        *p_next = elem;
+        p_next = &elem->next;
     }
 
     return head;
-- 
2.8.0.rc3


* [Qemu-devel] [PATCH v9 04/11] block: use the block job list in bdrv_close()
From: Alberto Garcia @ 2016-04-04 13:43 UTC
  To: qemu-devel
  Cc: Kevin Wolf, Alberto Garcia, qemu-block, Max Reitz, Stefan Hajnoczi

bdrv_close_all() cancels all block jobs by iterating over all
BlockDriverStates. This patch simplifies the code by iterating
directly over the block jobs using block_job_next().

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block.c | 25 ++++++-------------------
 1 file changed, 6 insertions(+), 19 deletions(-)

diff --git a/block.c b/block.c
index d36eb75..48638c9 100644
--- a/block.c
+++ b/block.c
@@ -2182,8 +2182,7 @@ static void bdrv_close(BlockDriverState *bs)
 
 void bdrv_close_all(void)
 {
-    BlockDriverState *bs;
-    AioContext *aio_context;
+    BlockJob *job;
 
     /* Drop references from requests still in flight, such as canceled block
      * jobs whose AIO context has not been polled yet */
@@ -2193,23 +2192,11 @@ void bdrv_close_all(void)
     blockdev_close_all_bdrv_states();
 
     /* Cancel all block jobs */
-    while (!QTAILQ_EMPTY(&all_bdrv_states)) {
-        QTAILQ_FOREACH(bs, &all_bdrv_states, bs_list) {
-            aio_context = bdrv_get_aio_context(bs);
-
-            aio_context_acquire(aio_context);
-            if (bs->job) {
-                block_job_cancel_sync(bs->job);
-                aio_context_release(aio_context);
-                break;
-            }
-            aio_context_release(aio_context);
-        }
-
-        /* All the remaining BlockDriverStates are referenced directly or
-         * indirectly from block jobs, so there needs to be at least one BDS
-         * directly used by a block job */
-        assert(bs);
+    while ((job = block_job_next(NULL))) {
+        AioContext *aio_context = bdrv_get_aio_context(job->bs);
+        aio_context_acquire(aio_context);
+        block_job_cancel_sync(job);
+        aio_context_release(aio_context);
     }
 }
 
-- 
2.8.0.rc3


* [Qemu-devel] [PATCH v9 05/11] block: allow block jobs in any arbitrary node
From: Alberto Garcia @ 2016-04-04 13:43 UTC
  To: qemu-devel
  Cc: Kevin Wolf, Alberto Garcia, qemu-block, Max Reitz, Stefan Hajnoczi

Currently, block jobs can only be owned by root nodes. This patch
allows block jobs to be in any arbitrary node, by making the following
changes:

- Block jobs can now be identified by the node name of their
  BlockDriverState in addition to the device name. Since both device
  and node names live in the same namespace there's no ambiguity.

- The "device" parameter used by all commands that operate on block
  jobs can also be a node name now.

- The node name is used as a fallback to fill in the BlockJobInfo
  structure and all BLOCK_JOB_* events if there is no device name for
  that job.
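
For example, a job owned by an intermediate node can now be cancelled
using that node's name ('node3' here is a hypothetical node name
previously set with node-name):

    { "execute": "block-job-cancel", "arguments": { "device": "node3" } }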

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 blockdev.c           | 18 ++++++++++--------
 blockjob.c           |  5 +++--
 docs/qmp-events.txt  |  8 ++++----
 qapi/block-core.json | 20 ++++++++++----------
 4 files changed, 27 insertions(+), 24 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index edbcc19..d1f5dfb 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3685,8 +3685,10 @@ void qmp_blockdev_mirror(const char *device, const char *target,
     aio_context_release(aio_context);
 }
 
-/* Get the block job for a given device name and acquire its AioContext */
-static BlockJob *find_block_job(const char *device, AioContext **aio_context,
+/* Get the block job for a given device or node name
+ * and acquire its AioContext */
+static BlockJob *find_block_job(const char *device_or_node,
+                                AioContext **aio_context,
                                 Error **errp)
 {
     BlockBackend *blk;
@@ -3694,18 +3696,18 @@ static BlockJob *find_block_job(const char *device, AioContext **aio_context,
 
     *aio_context = NULL;
 
-    blk = blk_by_name(device);
-    if (!blk) {
+    bs = bdrv_lookup_bs(device_or_node, device_or_node, errp);
+    if (!bs) {
         goto notfound;
     }
 
-    *aio_context = blk_get_aio_context(blk);
+    *aio_context = bdrv_get_aio_context(bs);
     aio_context_acquire(*aio_context);
 
-    if (!blk_is_available(blk)) {
+    blk = blk_by_name(device_or_node);
+    if (blk && !blk_is_available(blk)) {
         goto notfound;
     }
-    bs = blk_bs(blk);
 
     if (!bs->job) {
         goto notfound;
@@ -3715,7 +3717,7 @@ static BlockJob *find_block_job(const char *device, AioContext **aio_context,
 
 notfound:
     error_set(errp, ERROR_CLASS_DEVICE_NOT_ACTIVE,
-              "No active block job on device '%s'", device);
+              "No active block job on node '%s'", device_or_node);
     if (*aio_context) {
         aio_context_release(*aio_context);
         *aio_context = NULL;
diff --git a/blockjob.c b/blockjob.c
index 3557048..2ab4794 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -67,7 +67,8 @@ void *block_job_create(const BlockJobDriver *driver, BlockDriverState *bs,
     BlockJob *job;
 
     if (bs->job) {
-        error_setg(errp, QERR_DEVICE_IN_USE, bdrv_get_device_name(bs));
+        error_setg(errp, "Node '%s' is in use",
+                   bdrv_get_device_or_node_name(bs));
         return NULL;
     }
     bdrv_ref(bs);
@@ -78,7 +79,7 @@ void *block_job_create(const BlockJobDriver *driver, BlockDriverState *bs,
     bdrv_op_unblock(bs, BLOCK_OP_TYPE_DATAPLANE, job->blocker);
 
     job->driver        = driver;
-    job->id            = g_strdup(bdrv_get_device_name(bs));
+    job->id            = g_strdup(bdrv_get_device_or_node_name(bs));
     job->bs            = bs;
     job->cb            = cb;
     job->opaque        = opaque;
diff --git a/docs/qmp-events.txt b/docs/qmp-events.txt
index fa7574d..2d238c5 100644
--- a/docs/qmp-events.txt
+++ b/docs/qmp-events.txt
@@ -92,7 +92,7 @@ Data:
 
 - "type":     Job type (json-string; "stream" for image streaming
                                      "commit" for block commit)
-- "device":   Device name (json-string)
+- "device":   Device name, or node name if not present (json-string)
 - "len":      Maximum progress value (json-int)
 - "offset":   Current progress value (json-int)
               On success this is equal to len.
@@ -116,7 +116,7 @@ Data:
 
 - "type":     Job type (json-string; "stream" for image streaming
                                      "commit" for block commit)
-- "device":   Device name (json-string)
+- "device":   Device name, or node name if not present (json-string)
 - "len":      Maximum progress value (json-int)
 - "offset":   Current progress value (json-int)
               On success this is equal to len.
@@ -143,7 +143,7 @@ Emitted when a block job encounters an error.
 
 Data:
 
-- "device": device name (json-string)
+- "device": device name, or node name if not present (json-string)
 - "operation": I/O operation (json-string, "read" or "write")
 - "action": action that has been taken, it's one of the following (json-string):
     "ignore": error has been ignored, the job may fail later
@@ -167,7 +167,7 @@ Data:
 
 - "type":     Job type (json-string; "stream" for image streaming
                                      "commit" for block commit)
-- "device":   Device name (json-string)
+- "device":   Device name, or node name if not present (json-string)
 - "len":      Maximum progress value (json-int)
 - "offset":   Current progress value (json-int)
               On success this is equal to len.
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 1d09079..59107ed 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -713,7 +713,7 @@
 #
 # @type: the job type ('stream' for image streaming)
 #
-# @device: the block device name
+# @device: the block device name, or node name if not present
 #
 # @len: the maximum progress value
 #
@@ -1456,7 +1456,7 @@
 #
 # Throttling can be disabled by setting the speed to 0.
 #
-# @device: the device name
+# @device: the device or node name of the owner of the block job.
 #
 # @speed:  the maximum speed, in bytes per second, or 0 for unlimited.
 #          Defaults to 0.
@@ -1487,7 +1487,7 @@
 # operation can be started at a later time to finish copying all data from the
 # backing file.
 #
-# @device: the device name
+# @device: the device or node name of the owner of the block job.
 #
 # @force: #optional whether to allow cancellation of a paused job (default
 #         false).  Since 1.3.
@@ -1513,7 +1513,7 @@
 # the operation is actually paused.  Cancelling a paused job automatically
 # resumes it.
 #
-# @device: the device name
+# @device: the device or node name of the owner of the block job.
 #
 # Returns: Nothing on success
 #          If no background operation is active on this device, DeviceNotActive
@@ -1533,7 +1533,7 @@
 #
 # This command also clears the error status of the job.
 #
-# @device: the device name
+# @device: the device or node name of the owner of the block job.
 #
 # Returns: Nothing on success
 #          If no background operation is active on this device, DeviceNotActive
@@ -1559,7 +1559,7 @@
 #
 # A cancelled or paused job cannot be completed.
 #
-# @device: the device name
+# @device: the device or node name of the owner of the block job.
 #
 # Returns: Nothing on success
 #          If no background operation is active on this device, DeviceNotActive
@@ -2404,7 +2404,7 @@
 #
 # @type: job type
 #
-# @device: device name
+# @device: device name, or node name if not present
 #
 # @len: maximum progress value
 #
@@ -2435,7 +2435,7 @@
 #
 # @type: job type
 #
-# @device: device name
+# @device: device name, or node name if not present
 #
 # @len: maximum progress value
 #
@@ -2458,7 +2458,7 @@
 #
 # Emitted when a block job encounters an error
 #
-# @device: device name
+# @device: device name, or node name if not present
 #
 # @operation: I/O operation
 #
@@ -2478,7 +2478,7 @@
 #
 # @type: job type
 #
-# @device: device name
+# @device: device name, or node name if not present
 #
 # @len: maximum progress value
 #
-- 
2.8.0.rc3


* [Qemu-devel] [PATCH v9 06/11] block: Support streaming to an intermediate layer
From: Alberto Garcia @ 2016-04-04 13:43 UTC
  To: qemu-devel
  Cc: Kevin Wolf, Alberto Garcia, qemu-block, Max Reitz, Stefan Hajnoczi

This makes sure that the image we are streaming into is opened in
read-write mode during the operation.

The block job is created on the destination image, but operation
blockers are also set on the active layer. We do this in order to
prevent other block jobs from running in parallel in the same chain.
See here for details on why that is currently not supported:

[Qemu-block] RFC: Status of the intermediate block streaming work
https://lists.gnu.org/archive/html/qemu-block/2015-12/msg00180.html

Finally, this also unblocks the stream operation in backing files.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block.c                   |  4 +++-
 block/stream.c            | 39 ++++++++++++++++++++++++++++++++++++++-
 blockdev.c                |  2 +-
 include/block/block_int.h |  5 ++++-
 4 files changed, 46 insertions(+), 4 deletions(-)

diff --git a/block.c b/block.c
index 48638c9..f7698a1 100644
--- a/block.c
+++ b/block.c
@@ -1252,9 +1252,11 @@ void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd)
             backing_hd->drv ? backing_hd->drv->format_name : "");
 
     bdrv_op_block_all(backing_hd, bs->backing_blocker);
-    /* Otherwise we won't be able to commit due to check in bdrv_commit */
+    /* Otherwise we won't be able to commit or stream */
     bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_COMMIT_TARGET,
                     bs->backing_blocker);
+    bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_STREAM,
+                    bs->backing_blocker);
 out:
     bdrv_refresh_limits(bs, NULL);
 }
diff --git a/block/stream.c b/block/stream.c
index 332b9a1..ac02ddf 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -35,8 +35,10 @@ typedef struct StreamBlockJob {
     BlockJob common;
     RateLimit limit;
     BlockDriverState *base;
+    BlockDriverState *active;
     BlockdevOnError on_error;
     char *backing_file_str;
+    int bs_flags;
 } StreamBlockJob;
 
 static int coroutine_fn stream_populate(BlockDriverState *bs,
@@ -66,6 +68,11 @@ static void stream_complete(BlockJob *job, void *opaque)
     StreamCompleteData *data = opaque;
     BlockDriverState *base = s->base;
 
+    if (s->active) {
+        bdrv_op_unblock_all(s->active, s->common.blocker);
+        bdrv_unref(s->active);
+    }
+
     if (!block_job_is_cancelled(&s->common) && data->reached_end &&
         data->ret == 0) {
         const char *base_id = NULL, *base_fmt = NULL;
@@ -79,6 +86,11 @@ static void stream_complete(BlockJob *job, void *opaque)
         bdrv_set_backing_hd(job->bs, base);
     }
 
+    /* Reopen the image back in read-only mode if necessary */
+    if (s->bs_flags != bdrv_get_flags(job->bs)) {
+        bdrv_reopen(job->bs, s->bs_flags, NULL);
+    }
+
     g_free(s->backing_file_str);
     block_job_completed(&s->common, data->ret);
     g_free(data);
@@ -216,13 +228,15 @@ static const BlockJobDriver stream_job_driver = {
     .set_speed     = stream_set_speed,
 };
 
-void stream_start(BlockDriverState *bs, BlockDriverState *base,
+void stream_start(BlockDriverState *bs, BlockDriverState *active,
+                  BlockDriverState *base,
                   const char *backing_file_str, int64_t speed,
                   BlockdevOnError on_error,
                   BlockCompletionFunc *cb,
                   void *opaque, Error **errp)
 {
     StreamBlockJob *s;
+    int orig_bs_flags;
 
     if ((on_error == BLOCKDEV_ON_ERROR_STOP ||
          on_error == BLOCKDEV_ON_ERROR_ENOSPC) &&
@@ -231,13 +245,36 @@ void stream_start(BlockDriverState *bs, BlockDriverState *base,
         return;
     }
 
+    /* Make sure that the image is opened in read-write mode */
+    orig_bs_flags = bdrv_get_flags(bs);
+    if (!(orig_bs_flags & BDRV_O_RDWR)) {
+        if (bdrv_reopen(bs, orig_bs_flags | BDRV_O_RDWR, errp) != 0) {
+            return;
+        }
+    }
+
     s = block_job_create(&stream_job_driver, bs, speed, cb, opaque, errp);
     if (!s) {
         return;
     }
 
+    /* If we are streaming to an intermediate image, we need to block
+     * the active layer. Due to a race condition, having several block
+     * jobs running in the same chain is broken and we currently don't
+     * support it. See here for details:
+     * https://lists.gnu.org/archive/html/qemu-block/2015-12/msg00180.html
+     */
+    if (active) {
+        bdrv_op_block_all(active, s->common.blocker);
+        /* Hold a reference to the active layer, otherwise
+         * bdrv_close_all() will destroy it if we shut down the VM */
+        bdrv_ref(active);
+    }
+
     s->base = base;
+    s->active = active;
     s->backing_file_str = g_strdup(backing_file_str);
+    s->bs_flags = orig_bs_flags;
 
     s->on_error = on_error;
     s->common.co = qemu_coroutine_create(stream_run);
diff --git a/blockdev.c b/blockdev.c
index d1f5dfb..2e7712e 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3038,7 +3038,7 @@ void qmp_block_stream(const char *device,
     /* backing_file string overrides base bs filename */
     base_name = has_backing_file ? backing_file : base_name;
 
-    stream_start(bs, base_bs, base_name, has_speed ? speed : 0,
+    stream_start(bs, NULL, base_bs, base_name, has_speed ? speed : 0,
                  on_error, block_job_cb, bs, &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 10d8759..0a53d5f 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -597,6 +597,8 @@ int is_windows_drive(const char *filename);
 /**
  * stream_start:
  * @bs: Block device to operate on.
+ * @active: The active layer of the chain where @bs belongs, or %NULL
+ *          if @bs is already the topmost device.
  * @base: Block device that will become the new base, or %NULL to
  * flatten the whole backing file chain onto @bs.
  * @base_id: The file name that will be written to @bs as the new
@@ -613,7 +615,8 @@ int is_windows_drive(const char *filename);
  * streaming job, the backing file of @bs will be changed to
  * @base_id in the written image and to @base in the live BlockDriverState.
  */
-void stream_start(BlockDriverState *bs, BlockDriverState *base,
+void stream_start(BlockDriverState *bs, BlockDriverState *active,
+                  BlockDriverState *base,
                   const char *base_id, int64_t speed, BlockdevOnError on_error,
                   BlockCompletionFunc *cb,
                   void *opaque, Error **errp);
-- 
2.8.0.rc3


* [Qemu-devel] [PATCH v9 07/11] block: Add QMP support for streaming to an intermediate layer
From: Alberto Garcia @ 2016-04-04 13:43 UTC
  To: qemu-devel
  Cc: Kevin Wolf, Alberto Garcia, qemu-block, Max Reitz, Stefan Hajnoczi

This patch makes the 'device' parameter of the 'block-stream' command
accept a node name as well as a device name.

In addition to that, operation blockers will be checked in all
intermediate nodes between the top and the base node.

Since qmp_block_stream() now uses the error from bdrv_lookup_bs() and
no longer returns DeviceNotFound, iotest 030 is updated to expect
GenericError instead.
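
In QMP terms the new usage looks like this (assuming the intermediate
node was given the node name 'mid' via node-name, as in the iotest
added later in this series):

    { "execute": "block-stream", "arguments": { "device": "mid" } }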

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 blockdev.c             | 31 +++++++++++++++++++++++--------
 qapi/block-core.json   | 10 +++++++---
 tests/qemu-iotests/030 |  2 +-
 3 files changed, 31 insertions(+), 12 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 2e7712e..bfdc0e3 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2989,6 +2989,7 @@ void qmp_block_stream(const char *device,
     BlockBackend *blk;
     BlockDriverState *bs;
     BlockDriverState *base_bs = NULL;
+    BlockDriverState *active;
     AioContext *aio_context;
     Error *local_err = NULL;
     const char *base_name = NULL;
@@ -2997,21 +2998,19 @@ void qmp_block_stream(const char *device,
         on_error = BLOCKDEV_ON_ERROR_REPORT;
     }
 
-    blk = blk_by_name(device);
-    if (!blk) {
-        error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
-                  "Device '%s' not found", device);
+    bs = bdrv_lookup_bs(device, device, errp);
+    if (!bs) {
         return;
     }
 
-    aio_context = blk_get_aio_context(blk);
+    aio_context = bdrv_get_aio_context(bs);
     aio_context_acquire(aio_context);
 
-    if (!blk_is_available(blk)) {
+    blk = blk_by_name(device);
+    if (blk && !blk_is_available(blk)) {
         error_setg(errp, "Device '%s' has no medium", device);
         goto out;
     }
-    bs = blk_bs(blk);
 
     if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_STREAM, errp)) {
         goto out;
@@ -3027,6 +3026,22 @@ void qmp_block_stream(const char *device,
         base_name = base;
     }
 
+    /* Look for the top-level node that contains 'bs' in its chain */
+    active = NULL;
+    do {
+        active = bdrv_next(active);
+    } while (active && !bdrv_chain_contains(active, bs));
+
+    if (active == NULL) {
+        error_setg(errp, "Cannot find top level node for '%s'", device);
+        goto out;
+    }
+
+    /* Check for op blockers in the top-level node too */
+    if (bdrv_op_is_blocked(active, BLOCK_OP_TYPE_STREAM, errp)) {
+        goto out;
+    }
+
     /* if we are streaming the entire chain, the result will have no backing
      * file, and specifying one is therefore an error */
     if (base_bs == NULL && has_backing_file) {
@@ -3038,7 +3053,7 @@ void qmp_block_stream(const char *device,
     /* backing_file string overrides base bs filename */
     base_name = has_backing_file ? backing_file : base_name;
 
-    stream_start(bs, NULL, base_bs, base_name, has_speed ? speed : 0,
+    stream_start(bs, active, base_bs, base_name, has_speed ? speed : 0,
                  on_error, block_job_cb, bs, &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 59107ed..e20193a 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1405,6 +1405,10 @@
 # with query-block-jobs.  The operation can be stopped before it has completed
 # using the block-job-cancel command.
 #
+# The node that receives the data is called the top image. It can be
+# located anywhere in the chain and can be specified using its device
+# or node name.
+#
 # If a base file is specified then sectors are not copied from that base file and
 # its backing chain.  When streaming completes the image file will have the base
 # file as its backing file.  This can be used to stream a subset of the backing
@@ -1413,12 +1417,12 @@
 # On successful completion the image file is updated to drop the backing file
 # and the BLOCK_JOB_COMPLETED event is emitted.
 #
-# @device: the device name
+# @device: the device or node name of the top image
 #
 # @base:   #optional the common backing file name
 #
-# @backing-file: #optional The backing file string to write into the active
-#                          layer. This filename is not validated.
+# @backing-file: #optional The backing file string to write into the top
+#                          image. This filename is not validated.
 #
 #                          If a pathname string is such that it cannot be
 #                          resolved by QEMU, that means that subsequent QMP or
diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030
index 3ac2443..107049b 100755
--- a/tests/qemu-iotests/030
+++ b/tests/qemu-iotests/030
@@ -126,7 +126,7 @@ class TestSingleDrive(iotests.QMPTestCase):
 
     def test_device_not_found(self):
         result = self.vm.qmp('block-stream', device='nonexistent')
-        self.assert_qmp(result, 'error/class', 'DeviceNotFound')
+        self.assert_qmp(result, 'error/class', 'GenericError')
 
 
 class TestSmallerBackingFile(iotests.QMPTestCase):
-- 
2.8.0.rc3


* [Qemu-devel] [PATCH v9 08/11] docs: Document how to stream to an intermediate layer
From: Alberto Garcia @ 2016-04-04 13:43 UTC
  To: qemu-devel
  Cc: Kevin Wolf, Alberto Garcia, qemu-block, Max Reitz, Stefan Hajnoczi

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 docs/live-block-ops.txt | 31 ++++++++++++++++++++-----------
 1 file changed, 20 insertions(+), 11 deletions(-)

diff --git a/docs/live-block-ops.txt b/docs/live-block-ops.txt
index a257087..a05d869 100644
--- a/docs/live-block-ops.txt
+++ b/docs/live-block-ops.txt
@@ -10,9 +10,9 @@ Snapshot live merge
 Given a snapshot chain, described in this document in the following
 format:
 
-[A] -> [B] -> [C] -> [D]
+[A] <- [B] <- [C] <- [D] <- [E]
 
-Where the rightmost object ([D] in the example) described is the current
+Where the rightmost object ([E] in the example) described is the current
 image which the guest OS has write access to. To the left of it is its base
 image, and so on accordingly until the leftmost image, which has no
 base.
@@ -21,11 +21,14 @@ The snapshot live merge operation transforms such a chain into a
 smaller one with fewer elements, such as this transformation relative
 to the first example:
 
-[A] -> [D]
+[A] <- [E]
 
-Currently only forward merge with target being the active image is
-supported, that is, data copy is performed in the right direction with
-destination being the rightmost image.
+Data is copied towards the right, with the destination being the
+rightmost image by default, but any other intermediate image can be
+specified instead. In this example data is copied from [C] into [D],
+so [D] can be backed by [B]:
+
+[A] <- [B] <- [D] <- [E]
 
 The operation is implemented in QEMU through image streaming facilities.
 
@@ -35,14 +38,20 @@ streaming operation completes it raises a QMP event. 'block_stream'
 copies data from the backing file(s) into the active image. When finished,
 it adjusts the backing file pointer.
 
-The 'base' parameter specifies an image which data need not be streamed from.
-This image will be used as the backing file for the active image when the
-operation is finished.
+The 'base' parameter specifies an image which data need not be
+streamed from. This image will be used as the backing file for the
+destination image when the operation is finished.
 
-In the example above, the command would be:
+In the first example above, the command would be:
 
-(qemu) block_stream virtio0 A
+(qemu) block_stream virtio0 file-A.img
 
+In order to specify a destination image different from the active
+(rightmost) one, we can use its (previously set) node name instead.
+
+In the second example above, the command would be:
+
+(qemu) block_stream node-D file-B.img
 
 Live block copy
 ===============
-- 
2.8.0.rc3


* [Qemu-devel] [PATCH v9 09/11] qemu-iotests: test streaming to an intermediate layer
From: Alberto Garcia @ 2016-04-04 13:43 UTC
  To: qemu-devel
  Cc: Kevin Wolf, Alberto Garcia, qemu-block, Max Reitz, Stefan Hajnoczi

This adds test_stream_intermediate(), similar to test_stream() but
streaming to the intermediate image instead.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/030     | 18 +++++++++++++++++-
 tests/qemu-iotests/030.out |  4 ++--
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030
index 107049b..1c53f80 100755
--- a/tests/qemu-iotests/030
+++ b/tests/qemu-iotests/030
@@ -36,7 +36,8 @@ class TestSingleDrive(iotests.QMPTestCase):
         qemu_img('create', '-f', iotests.imgfmt, '-o', 'backing_file=%s' % mid_img, test_img)
         qemu_io('-f', 'raw', '-c', 'write -P 0x1 0 512', backing_img)
         qemu_io('-f', iotests.imgfmt, '-c', 'write -P 0x1 524288 512', mid_img)
-        self.vm = iotests.VM().add_drive("blkdebug::" + test_img)
+        opts = "backing.node-name=mid"
+        self.vm = iotests.VM().add_drive("blkdebug::" + test_img, opts)
         self.vm.launch()
 
     def tearDown(self):
@@ -60,6 +61,21 @@ class TestSingleDrive(iotests.QMPTestCase):
                          qemu_io('-f', iotests.imgfmt, '-c', 'map', test_img),
                          'image file map does not match backing file after streaming')
 
+    def test_stream_intermediate(self):
+        self.assert_no_active_block_jobs()
+
+        result = self.vm.qmp('block-stream', device='mid')
+        self.assert_qmp(result, 'return', {})
+
+        self.wait_until_completed(drive='mid')
+
+        self.assert_no_active_block_jobs()
+        self.vm.shutdown()
+
+        self.assertEqual(qemu_io('-f', 'raw', '-c', 'map', backing_img),
+                         qemu_io('-f', iotests.imgfmt, '-c', 'map', mid_img),
+                         'image file map does not match backing file after streaming')
+
     def test_stream_pause(self):
         self.assert_no_active_block_jobs()
 
diff --git a/tests/qemu-iotests/030.out b/tests/qemu-iotests/030.out
index 6323079..96961ed 100644
--- a/tests/qemu-iotests/030.out
+++ b/tests/qemu-iotests/030.out
@@ -1,5 +1,5 @@
-..............
+...............
 ----------------------------------------------------------------------
-Ran 14 tests
+Ran 15 tests
 
 OK
-- 
2.8.0.rc3


* [Qemu-devel] [PATCH v9 10/11] qemu-iotests: test overlapping block-stream operations
From: Alberto Garcia @ 2016-04-04 13:44 UTC
  To: qemu-devel
  Cc: Kevin Wolf, Alberto Garcia, qemu-block, Max Reitz, Stefan Hajnoczi

This test case checks that it's not possible to perform two
block-stream operations if there are nodes involved in both.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 tests/qemu-iotests/030     | 60 ++++++++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/030.out |  4 ++--
 2 files changed, 62 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030
index 1c53f80..6348872 100755
--- a/tests/qemu-iotests/030
+++ b/tests/qemu-iotests/030
@@ -145,6 +145,66 @@ class TestSingleDrive(iotests.QMPTestCase):
         self.assert_qmp(result, 'error/class', 'GenericError')
 
 
+class TestMultipleOps(iotests.QMPTestCase):
+    num_imgs = 9 # It must be > 5 or some tests will break
+    image_len = num_imgs * 1024 * 1024
+    imgs = []
+
+    def setUp(self):
+        opts = []
+        self.imgs = []
+
+        # Initialize file names and command-line options
+        for i in range(self.num_imgs):
+            img_depth = self.num_imgs - i - 1
+            opts.append("backing." * img_depth + "node-name=node%d" % i)
+            self.imgs.append(os.path.join(iotests.test_dir, 'img-%d.img' % i))
+
+        # Create all images
+        iotests.create_image(self.imgs[0], self.image_len)
+        for i in range(1, self.num_imgs):
+            qemu_img('create', '-f', iotests.imgfmt,
+                     '-o', 'backing_file=%s' % self.imgs[i-1], self.imgs[i])
+
+        # Put data into some images so there's something to copy
+        for i in range(1, self.num_imgs, 2):
+            qemu_io('-f', iotests.imgfmt,
+                    '-c', 'write -P %d %d %d' % (i, i*1024*1024, 1024*1024),
+                    self.imgs[i])
+
+        # Attach the drive to the VM
+        self.vm = iotests.VM()
+        self.vm.add_drive("blkdebug::" + self.imgs[-1], ','.join(opts))
+        self.vm.launch()
+
+    def tearDown(self):
+        self.vm.shutdown()
+        for img in self.imgs:
+            os.remove(img)
+
+    # Test that it's not possible to perform two block-stream
+    # operations if there are nodes involved in both.
+    def test_overlapping(self):
+        self.assert_no_active_block_jobs()
+
+        # Set a speed limit to make sure that this job blocks the rest
+        result = self.vm.qmp('block-stream', device='node4', base=self.imgs[0], speed=32768)
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('block-stream', device='node5', base=self.imgs[1])
+        self.assert_qmp(result, 'error/class', 'GenericError')
+
+        result = self.vm.qmp('block-stream', device='node3', base=self.imgs[2])
+        self.assert_qmp(result, 'error/class', 'GenericError')
+
+        result = self.vm.qmp('block-stream', device='node4')
+        self.assert_qmp(result, 'error/class', 'GenericError')
+
+        self.wait_until_completed(drive='node4')
+        self.assert_no_active_block_jobs()
+
+        self.vm.shutdown()
+
 class TestSmallerBackingFile(iotests.QMPTestCase):
     backing_len = 1 * 1024 * 1024 # MB
     image_len = 2 * backing_len
diff --git a/tests/qemu-iotests/030.out b/tests/qemu-iotests/030.out
index 96961ed..b6f2576 100644
--- a/tests/qemu-iotests/030.out
+++ b/tests/qemu-iotests/030.out
@@ -1,5 +1,5 @@
-...............
+................
 ----------------------------------------------------------------------
-Ran 15 tests
+Ran 16 tests
 
 OK
-- 
2.8.0.rc3


* [Qemu-devel] [PATCH v9 11/11] qemu-iotests: test non-overlapping block-stream operations
From: Alberto Garcia @ 2016-04-04 13:44 UTC
  To: qemu-devel
  Cc: Kevin Wolf, Alberto Garcia, qemu-block, Max Reitz, Stefan Hajnoczi

Even if there are no common nodes involved, we currently don't support
several operations at the same time in the same backing chain.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 tests/qemu-iotests/030     | 21 +++++++++++++++++++++
 tests/qemu-iotests/030.out |  4 ++--
 2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030
index 6348872..144f174 100755
--- a/tests/qemu-iotests/030
+++ b/tests/qemu-iotests/030
@@ -205,6 +205,27 @@ class TestMultipleOps(iotests.QMPTestCase):
 
         self.vm.shutdown()
 
+    # We currently don't support non-overlapping jobs if they are in
+    # the same chain. If we add support in the future, this test will
+    # need to be updated.
+    def test_non_overlapping(self):
+        self.assert_no_active_block_jobs()
+
+        # Set a speed limit to make sure that this job blocks the rest
+        result = self.vm.qmp('block-stream', device='node4', base=self.imgs[0], speed=32768)
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('block-stream', device='drive0', base=self.imgs[5])
+        self.assert_qmp(result, 'error/class', 'GenericError')
+
+        result = self.vm.qmp('block-commit', device='drive0', base=self.imgs[5])
+        self.assert_qmp(result, 'error/class', 'GenericError')
+
+        self.wait_until_completed(drive='node4')
+        self.assert_no_active_block_jobs()
+
+        self.vm.shutdown()
+
 class TestSmallerBackingFile(iotests.QMPTestCase):
     backing_len = 1 * 1024 * 1024 # MB
     image_len = 2 * backing_len
diff --git a/tests/qemu-iotests/030.out b/tests/qemu-iotests/030.out
index b6f2576..52d796e 100644
--- a/tests/qemu-iotests/030.out
+++ b/tests/qemu-iotests/030.out
@@ -1,5 +1,5 @@
-................
+.................
 ----------------------------------------------------------------------
-Ran 16 tests
+Ran 17 tests
 
 OK
-- 
2.8.0.rc3


* Re: [Qemu-devel] [PATCH for-2.7 v9 00/11] Support streaming to an intermediate layer
From: Stefan Hajnoczi @ 2016-04-08 10:00 UTC
  To: Alberto Garcia
  Cc: qemu-devel, Kevin Wolf, qemu-block, Max Reitz, Stefan Hajnoczi,
	Jeff Cody


On Mon, Apr 04, 2016 at 04:43:50PM +0300, Alberto Garcia wrote:
> Hi!
> 
> Half a year later :) here's the new version of the intermediate block
> streaming patches.

CCing Jeff Cody, block jobs maintainer.

[...]


* Re: [Qemu-devel] [PATCH v9 01/11] block: keep a list of block jobs
  2016-04-04 13:43 ` [Qemu-devel] [PATCH v9 01/11] block: keep a list of block jobs Alberto Garcia
@ 2016-04-27 11:59   ` Max Reitz
  2016-04-29 14:22   ` Kevin Wolf
  1 sibling, 0 replies; 56+ messages in thread
From: Max Reitz @ 2016-04-27 11:59 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: qemu-block, Kevin Wolf, Eric Blake, Stefan Hajnoczi


On 04.04.2016 15:43, Alberto Garcia wrote:
> The current way to obtain the list of existing block jobs is to
> iterate over all root nodes and check which ones own a job.
> 
> Since we want to be able to support block jobs in other nodes as well,
> this patch keeps a list of jobs that is updated every time one is
> created or destroyed.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  blockjob.c               | 13 +++++++++++++
>  include/block/blockjob.h | 14 ++++++++++++++
>  2 files changed, 27 insertions(+)

Reviewed-by: Max Reitz <mreitz@redhat.com>



* Re: [Qemu-devel] [PATCH v9 02/11] block: use the block job list in bdrv_drain_all()
  2016-04-04 13:43 ` [Qemu-devel] [PATCH v9 02/11] block: use the block job list in bdrv_drain_all() Alberto Garcia
@ 2016-04-27 12:04   ` Max Reitz
  2016-04-27 12:08     ` Alberto Garcia
  2016-04-29 14:25   ` Kevin Wolf
  1 sibling, 1 reply; 56+ messages in thread
From: Max Reitz @ 2016-04-27 12:04 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: qemu-block, Kevin Wolf, Eric Blake, Stefan Hajnoczi


On 04.04.2016 15:43, Alberto Garcia wrote:
> bdrv_drain_all() pauses all block jobs by using bdrv_next() to iterate
> over all top-level BlockDriverStates. Therefore the code is unable to
> find block jobs in other nodes.
> 
> This patch uses block_job_next() to iterate over all block jobs.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/io.c | 21 ++++++++++++---------
>  1 file changed, 12 insertions(+), 9 deletions(-)
> 
> diff --git a/block/io.c b/block/io.c
> index c4869b9..884ed1e 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -288,15 +288,21 @@ void bdrv_drain_all(void)
>      /* Always run first iteration so any pending completion BHs run */
>      bool busy = true;
>      BlockDriverState *bs = NULL;
> +    BlockJob *job = NULL;
>      GSList *aio_ctxs = NULL, *ctx;
>  
> +    while ((job = block_job_next(job))) {
> +        AioContext *aio_context = bdrv_get_aio_context(job->bs);
> +
> +        aio_context_acquire(aio_context);
> +        block_job_pause(job);
> +        aio_context_release(aio_context);
> +    }
> +
>      while ((bs = bdrv_next(bs))) {
>          AioContext *aio_context = bdrv_get_aio_context(bs);
>  
>          aio_context_acquire(aio_context);
> -        if (bs->job) {
> -            block_job_pause(bs->job);
> -        }
>          bdrv_drain_recurse(bs);
>          aio_context_release(aio_context);
>  
> @@ -333,14 +339,11 @@ void bdrv_drain_all(void)
>          }
>      }
>  
> -    bs = NULL;
> -    while ((bs = bdrv_next(bs))) {
> -        AioContext *aio_context = bdrv_get_aio_context(bs);
> +    while ((job = block_job_next(job))) {
> +        AioContext *aio_context = bdrv_get_aio_context(job->bs);

Technically, the "bs = NULL;" before didn't do anything either. But in
my opinion, it made the code more readable, therefore I'd really like a
"job = NULL;" before this loop, too.

But it's correct as it is, so:

Reviewed-by: Max Reitz <mreitz@redhat.com>

>  
>          aio_context_acquire(aio_context);
> -        if (bs->job) {
> -            block_job_resume(bs->job);
> -        }
> +        block_job_resume(job);
>          aio_context_release(aio_context);
>      }
>      g_slist_free(aio_ctxs);
> 




* Re: [Qemu-devel] [PATCH v9 02/11] block: use the block job list in bdrv_drain_all()
  2016-04-27 12:04   ` Max Reitz
@ 2016-04-27 12:08     ` Alberto Garcia
  0 siblings, 0 replies; 56+ messages in thread
From: Alberto Garcia @ 2016-04-27 12:08 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: qemu-block, Kevin Wolf, Eric Blake, Stefan Hajnoczi

On Wed 27 Apr 2016 02:04:33 PM CEST, Max Reitz wrote:
>> -    bs = NULL;
>> -    while ((bs = bdrv_next(bs))) {
>> -        AioContext *aio_context = bdrv_get_aio_context(bs);
>> +    while ((job = block_job_next(job))) {
>> +        AioContext *aio_context = bdrv_get_aio_context(job->bs);
>
> Technically, the "bs = NULL;" before didn't do anything either. But in
> my opinion, it made the code more readable, therefore I'd really like
> a "job = NULL;" before this loop, too.

Looks reasonable, I'll update it in the next revision.
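For reference, a minimal sketch of how the resume loop would then look
(this is just the hunk above with the explicit reset added):

    job = NULL;
    while ((job = block_job_next(job))) {
        AioContext *aio_context = bdrv_get_aio_context(job->bs);

        aio_context_acquire(aio_context);
        block_job_resume(job);
        aio_context_release(aio_context);
    }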

Berto


* Re: [Qemu-devel] [PATCH v9 03/11] block: use the block job list in qmp_query_block_jobs()
  2016-04-04 13:43 ` [Qemu-devel] [PATCH v9 03/11] block: use the block job list in qmp_query_block_jobs() Alberto Garcia
@ 2016-04-27 12:09   ` Max Reitz
  2016-04-29 14:32   ` Kevin Wolf
  1 sibling, 0 replies; 56+ messages in thread
From: Max Reitz @ 2016-04-27 12:09 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: qemu-block, Kevin Wolf, Eric Blake, Stefan Hajnoczi


On 04.04.2016 15:43, Alberto Garcia wrote:
> qmp_query_block_jobs() uses bdrv_next() to look for block jobs, but
> this function can only find those in top-level BlockDriverStates.
> 
> This patch uses block_job_next() instead.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  blockdev.c | 19 ++++++++-----------
>  1 file changed, 8 insertions(+), 11 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>



* Re: [Qemu-devel] [PATCH v9 04/11] block: use the block job list in bdrv_close()
  2016-04-04 13:43 ` [Qemu-devel] [PATCH v9 04/11] block: use the block job list in bdrv_close() Alberto Garcia
@ 2016-04-27 12:14   ` Max Reitz
  2016-04-29 14:38   ` Kevin Wolf
  1 sibling, 0 replies; 56+ messages in thread
From: Max Reitz @ 2016-04-27 12:14 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: qemu-block, Kevin Wolf, Eric Blake, Stefan Hajnoczi


On 04.04.2016 15:43, Alberto Garcia wrote:
> bdrv_close_all() cancels all block jobs by iterating over all
> BlockDriverStates. This patch simplifies the code by iterating
> directly over the block jobs using block_job_next().
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block.c | 25 ++++++-------------------
>  1 file changed, 6 insertions(+), 19 deletions(-)
> 
> diff --git a/block.c b/block.c
> index d36eb75..48638c9 100644
> --- a/block.c
> +++ b/block.c
> @@ -2182,8 +2182,7 @@ static void bdrv_close(BlockDriverState *bs)
>  
>  void bdrv_close_all(void)
>  {
> -    BlockDriverState *bs;
> -    AioContext *aio_context;
> +    BlockJob *job;
>  
>      /* Drop references from requests still in flight, such as canceled block
>       * jobs whose AIO context has not been polled yet */
> @@ -2193,23 +2192,11 @@ void bdrv_close_all(void)
>      blockdev_close_all_bdrv_states();
>  
>      /* Cancel all block jobs */
> -    while (!QTAILQ_EMPTY(&all_bdrv_states)) {
> -        QTAILQ_FOREACH(bs, &all_bdrv_states, bs_list) {
> -            aio_context = bdrv_get_aio_context(bs);
> -
> -            aio_context_acquire(aio_context);
> -            if (bs->job) {
> -                block_job_cancel_sync(bs->job);
> -                aio_context_release(aio_context);
> -                break;
> -            }
> -            aio_context_release(aio_context);
> -        }
> -
> -        /* All the remaining BlockDriverStates are referenced directly or
> -         * indirectly from block jobs, so there needs to be at least one BDS
> -         * directly used by a block job */
> -        assert(bs);
> +    while ((job = block_job_next(NULL))) {
> +        AioContext *aio_context = bdrv_get_aio_context(job->bs);
> +        aio_context_acquire(aio_context);
> +        block_job_cancel_sync(job);
> +        aio_context_release(aio_context);
>      }

I'd like an assert(QTAILQ_EMPTY(&all_bdrv_states)) after this loop. This
is basically what the "assert(bs);" was for in the old version.
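A minimal sketch of the loop with that assertion added (illustration
only, otherwise the same code as in the patch):

    /* Cancel all block jobs */
    while ((job = block_job_next(NULL))) {
        AioContext *aio_context = bdrv_get_aio_context(job->bs);

        aio_context_acquire(aio_context);
        block_job_cancel_sync(job);
        aio_context_release(aio_context);
    }

    /* Cancelling the jobs must have dropped the remaining references */
    assert(QTAILQ_EMPTY(&all_bdrv_states));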

Apart from that, good patch.

Max

>  }
>  
> 




* Re: [Qemu-devel] [PATCH v9 05/11] block: allow block jobs in any arbitrary node
  2016-04-04 13:43 ` [Qemu-devel] [PATCH v9 05/11] block: allow block jobs in any arbitrary node Alberto Garcia
@ 2016-04-27 12:30   ` Max Reitz
  2016-04-27 14:59     ` Alberto Garcia
  2016-04-29 15:00   ` Kevin Wolf
  2016-04-29 15:25   ` Eric Blake
  2 siblings, 1 reply; 56+ messages in thread
From: Max Reitz @ 2016-04-27 12:30 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: qemu-block, Kevin Wolf, Eric Blake, Stefan Hajnoczi


On 04.04.2016 15:43, Alberto Garcia wrote:
> Currently, block jobs can only be owned by root nodes. This patch
> allows block jobs to be in any arbitrary node, by making the following
> changes:
> 
> - Block jobs can now be identified by the node name of their
>   BlockDriverState in addition to the device name. Since both device
>   and node names live in the same namespace there's no ambiguity.
> 
> - The "device" parameter used by all commands that operate on block
>   jobs can also be a node name now.
> 
> - The node name is used as a fallback to fill in the BlockJobInfo
>   structure and all BLOCK_JOB_* events if there is no device name for
>   that job.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  blockdev.c           | 18 ++++++++++--------
>  blockjob.c           |  5 +++--
>  docs/qmp-events.txt  |  8 ++++----
>  qapi/block-core.json | 20 ++++++++++----------
>  4 files changed, 27 insertions(+), 24 deletions(-)
> 
> diff --git a/blockdev.c b/blockdev.c
> index edbcc19..d1f5dfb 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -3685,8 +3685,10 @@ void qmp_blockdev_mirror(const char *device, const char *target,
>      aio_context_release(aio_context);
>  }
>  
> -/* Get the block job for a given device name and acquire its AioContext */
> -static BlockJob *find_block_job(const char *device, AioContext **aio_context,
> +/* Get the block job for a given device or node name
> + * and acquire its AioContext */
> +static BlockJob *find_block_job(const char *device_or_node,
> +                                AioContext **aio_context,
>                                  Error **errp)
>  {
>      BlockBackend *blk;
> @@ -3694,18 +3696,18 @@ static BlockJob *find_block_job(const char *device, AioContext **aio_context,
>  
>      *aio_context = NULL;
>  
> -    blk = blk_by_name(device);
> -    if (!blk) {
> +    bs = bdrv_lookup_bs(device_or_node, device_or_node, errp);
> +    if (!bs) {

If we get here, *errp is already set... [1]

>          goto notfound;
>      }
>  
> -    *aio_context = blk_get_aio_context(blk);
> +    *aio_context = bdrv_get_aio_context(bs);
>      aio_context_acquire(*aio_context);
>  
> -    if (!blk_is_available(blk)) {
> +    blk = blk_by_name(device_or_node);
> +    if (blk && !blk_is_available(blk)) {

I'd just drop this. The reason blk_is_available() was added here is
that it became possible for BBs not to have a BDS.

Now that you get the BDS directly through bdrv_lookup_bs(), it's no
longer necessary.

>          goto notfound;
>      }
> -    bs = blk_bs(blk);
>  
>      if (!bs->job) {
>          goto notfound;
> @@ -3715,7 +3717,7 @@ static BlockJob *find_block_job(const char *device, AioContext **aio_context,
>  
>  notfound:
>      error_set(errp, ERROR_CLASS_DEVICE_NOT_ACTIVE,
> -              "No active block job on device '%s'", device);
> +              "No active block job on node '%s'", device_or_node);

[1] ...and then we'll get a failed assertion here (through error_setv()).
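Putting both points together, this is roughly what I'd expect the fixed
function to look like (an untested sketch that only rearranges the code
quoted above):

    static BlockJob *find_block_job(const char *device_or_node,
                                    AioContext **aio_context,
                                    Error **errp)
    {
        BlockDriverState *bs;

        *aio_context = NULL;

        /* bdrv_lookup_bs() sets errp itself on failure */
        bs = bdrv_lookup_bs(device_or_node, device_or_node, errp);
        if (!bs) {
            return NULL;
        }

        *aio_context = bdrv_get_aio_context(bs);
        aio_context_acquire(*aio_context);

        if (!bs->job) {
            error_set(errp, ERROR_CLASS_DEVICE_NOT_ACTIVE,
                      "No active block job on node '%s'", device_or_node);
            aio_context_release(*aio_context);
            *aio_context = NULL;
            return NULL;
        }

        return bs->job;
    }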

>      if (*aio_context) {
>          aio_context_release(*aio_context);
>          *aio_context = NULL;
> diff --git a/blockjob.c b/blockjob.c
> index 3557048..2ab4794 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -67,7 +67,8 @@ void *block_job_create(const BlockJobDriver *driver, BlockDriverState *bs,
>      BlockJob *job;
>  
>      if (bs->job) {
> -        error_setg(errp, QERR_DEVICE_IN_USE, bdrv_get_device_name(bs));
> +        error_setg(errp, "Node '%s' is in use",
> +                   bdrv_get_device_or_node_name(bs));
>          return NULL;
>      }
>      bdrv_ref(bs);
> @@ -78,7 +79,7 @@ void *block_job_create(const BlockJobDriver *driver, BlockDriverState *bs,
>      bdrv_op_unblock(bs, BLOCK_OP_TYPE_DATAPLANE, job->blocker);
>  
>      job->driver        = driver;
> -    job->id            = g_strdup(bdrv_get_device_name(bs));
> +    job->id            = g_strdup(bdrv_get_device_or_node_name(bs));

Hm, since this is only used for events, it's not too bad that a block
job will have a different name if it was created on a root BDS by
specifying its node name. But it still is kind of strange.

But as I said, it should be fine as long as one can still use the node
name for controlling it (which is the case).

Max



* Re: [Qemu-devel] [PATCH v9 06/11] block: Support streaming to an intermediate layer
  2016-04-04 13:43 ` [Qemu-devel] [PATCH v9 06/11] block: Support streaming to an intermediate layer Alberto Garcia
@ 2016-04-27 13:04   ` Max Reitz
  2016-04-28  9:23     ` Alberto Garcia
  2016-04-29 15:07   ` Kevin Wolf
  1 sibling, 1 reply; 56+ messages in thread
From: Max Reitz @ 2016-04-27 13:04 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: qemu-block, Kevin Wolf, Eric Blake, Stefan Hajnoczi


On 04.04.2016 15:43, Alberto Garcia wrote:
> This makes sure that the image we are steaming into is open in
> read-write mode during the operation.
> 
> The block job is created on the destination image, but operation
> blockers are also set on the active layer. We do this in order to
> prevent other block jobs from running in parallel in the same chain.
> See here for details on why that is currently not supported:
> 
> [Qemu-block] RFC: Status of the intermediate block streaming work
> https://lists.gnu.org/archive/html/qemu-block/2015-12/msg00180.html
> 
> Finally, this also unblocks the stream operation in backing files.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block.c                   |  4 +++-
>  block/stream.c            | 39 ++++++++++++++++++++++++++++++++++++++-
>  blockdev.c                |  2 +-
>  include/block/block_int.h |  5 ++++-
>  4 files changed, 46 insertions(+), 4 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 48638c9..f7698a1 100644
> --- a/block.c
> +++ b/block.c
> @@ -1252,9 +1252,11 @@ void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd)
>              backing_hd->drv ? backing_hd->drv->format_name : "");
>  
>      bdrv_op_block_all(backing_hd, bs->backing_blocker);
> -    /* Otherwise we won't be able to commit due to check in bdrv_commit */
> +    /* Otherwise we won't be able to commit or stream */
>      bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_COMMIT_TARGET,
>                      bs->backing_blocker);
> +    bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_STREAM,
> +                    bs->backing_blocker);
>  out:
>      bdrv_refresh_limits(bs, NULL);
>  }
> diff --git a/block/stream.c b/block/stream.c
> index 332b9a1..ac02ddf 100644
> --- a/block/stream.c
> +++ b/block/stream.c
> @@ -35,8 +35,10 @@ typedef struct StreamBlockJob {
>      BlockJob common;
>      RateLimit limit;
>      BlockDriverState *base;
> +    BlockDriverState *active;
>      BlockdevOnError on_error;
>      char *backing_file_str;
> +    int bs_flags;
>  } StreamBlockJob;
>  
>  static int coroutine_fn stream_populate(BlockDriverState *bs,
> @@ -66,6 +68,11 @@ static void stream_complete(BlockJob *job, void *opaque)
>      StreamCompleteData *data = opaque;
>      BlockDriverState *base = s->base;
>  
> +    if (s->active) {
> +        bdrv_op_unblock_all(s->active, s->common.blocker);
> +        bdrv_unref(s->active);
> +    }
> +
>      if (!block_job_is_cancelled(&s->common) && data->reached_end &&
>          data->ret == 0) {
>          const char *base_id = NULL, *base_fmt = NULL;
> @@ -79,6 +86,11 @@ static void stream_complete(BlockJob *job, void *opaque)
>          bdrv_set_backing_hd(job->bs, base);
>      }
>  
> +    /* Reopen the image back in read-only mode if necessary */
> +    if (s->bs_flags != bdrv_get_flags(job->bs)) {
> +        bdrv_reopen(job->bs, s->bs_flags, NULL);
> +    }
> +
>      g_free(s->backing_file_str);
>      block_job_completed(&s->common, data->ret);
>      g_free(data);
> @@ -216,13 +228,15 @@ static const BlockJobDriver stream_job_driver = {
>      .set_speed     = stream_set_speed,
>  };
>  
> -void stream_start(BlockDriverState *bs, BlockDriverState *base,
> +void stream_start(BlockDriverState *bs, BlockDriverState *active,
> +                  BlockDriverState *base,
>                    const char *backing_file_str, int64_t speed,
>                    BlockdevOnError on_error,
>                    BlockCompletionFunc *cb,
>                    void *opaque, Error **errp)
>  {
>      StreamBlockJob *s;
> +    int orig_bs_flags;
>  
>      if ((on_error == BLOCKDEV_ON_ERROR_STOP ||
>           on_error == BLOCKDEV_ON_ERROR_ENOSPC) &&
> @@ -231,13 +245,36 @@ void stream_start(BlockDriverState *bs, BlockDriverState *base,
>          return;
>      }
>  
> +    /* Make sure that the image is opened in read-write mode */
> +    orig_bs_flags = bdrv_get_flags(bs);
> +    if (!(orig_bs_flags & BDRV_O_RDWR)) {
> +        if (bdrv_reopen(bs, orig_bs_flags | BDRV_O_RDWR, errp) != 0) {
> +            return;
> +        }
> +    }
> +
>      s = block_job_create(&stream_job_driver, bs, speed, cb, opaque, errp);
>      if (!s) {
>          return;
>      }
>  
> +    /* If we are streaming to an intermediate image, we need to block
> +     * the active layer. Due to a race condition, having several block
> +     * jobs running in the same chain is broken and we currently don't
> +     * support it. See here for details:
> +     * https://lists.gnu.org/archive/html/qemu-block/2015-12/msg00180.html
> +     */
> +    if (active) {
> +        bdrv_op_block_all(active, s->common.blocker);

block_job_create() unblocks BLOCK_OP_TYPE_DATAPLANE. Maybe this should
do the same?
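I.e. something along these lines (a sketch only):

    if (active) {
        bdrv_op_block_all(active, s->common.blocker);
        /* As in block_job_create(): dataplane doesn't call bdrv_reopen(),
         * so it can stay unblocked on the active layer, too */
        bdrv_op_unblock(active, BLOCK_OP_TYPE_DATAPLANE, s->common.blocker);

        /* Hold a reference to the active layer, otherwise
         * bdrv_close_all() will destroy it if we shut down the VM */
        bdrv_ref(active);
    }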

No other objections from me.

Max

> +        /* Hold a reference to the active layer, otherwise
> +         * bdrv_close_all() will destroy it if we shut down the VM */
> +        bdrv_ref(active);
> +    }
> +
>      s->base = base;
> +    s->active = active;
>      s->backing_file_str = g_strdup(backing_file_str);
> +    s->bs_flags = orig_bs_flags;
>  
>      s->on_error = on_error;
>      s->common.co = qemu_coroutine_create(stream_run);
> diff --git a/blockdev.c b/blockdev.c
> index d1f5dfb..2e7712e 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -3038,7 +3038,7 @@ void qmp_block_stream(const char *device,
>      /* backing_file string overrides base bs filename */
>      base_name = has_backing_file ? backing_file : base_name;
>  
> -    stream_start(bs, base_bs, base_name, has_speed ? speed : 0,
> +    stream_start(bs, NULL, base_bs, base_name, has_speed ? speed : 0,
>                   on_error, block_job_cb, bs, &local_err);
>      if (local_err) {
>          error_propagate(errp, local_err);
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 10d8759..0a53d5f 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -597,6 +597,8 @@ int is_windows_drive(const char *filename);
>  /**
>   * stream_start:
>   * @bs: Block device to operate on.
> + * @active: The active layer of the chain where @bs belongs, or %NULL
> + *          if @bs is already the topmost device.
>   * @base: Block device that will become the new base, or %NULL to
>   * flatten the whole backing file chain onto @bs.
>   * @base_id: The file name that will be written to @bs as the new
> @@ -613,7 +615,8 @@ int is_windows_drive(const char *filename);
>   * streaming job, the backing file of @bs will be changed to
>   * @base_id in the written image and to @base in the live BlockDriverState.
>   */
> -void stream_start(BlockDriverState *bs, BlockDriverState *base,
> +void stream_start(BlockDriverState *bs, BlockDriverState *active,
> +                  BlockDriverState *base,
>                    const char *base_id, int64_t speed, BlockdevOnError on_error,
>                    BlockCompletionFunc *cb,
>                    void *opaque, Error **errp);
> 




* Re: [Qemu-devel] [PATCH v9 07/11] block: Add QMP support for streaming to an intermediate layer
  2016-04-04 13:43 ` [Qemu-devel] [PATCH v9 07/11] block: Add QMP support for " Alberto Garcia
@ 2016-04-27 13:34   ` Max Reitz
  2016-04-28 12:20     ` Alberto Garcia
  2016-04-29 15:11   ` Kevin Wolf
  2016-04-29 15:29   ` Eric Blake
  2 siblings, 1 reply; 56+ messages in thread
From: Max Reitz @ 2016-04-27 13:34 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: qemu-block, Kevin Wolf, Eric Blake, Stefan Hajnoczi


On 04.04.2016 15:43, Alberto Garcia wrote:
> This patch makes the 'device' parameter of the 'block-stream' command
> accept a node name as well as a device name.
> 
> In addition to that, operation blockers will be checked in all
> intermediate nodes between the top and the base node.
> 
> Since qmp_block_stream() now uses the error from bdrv_lookup_bs() and
> no longer returns DeviceNotFound, iotest 030 is updated to expect
> GenericError instead.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  blockdev.c             | 31 +++++++++++++++++++++++--------
>  qapi/block-core.json   | 10 +++++++---
>  tests/qemu-iotests/030 |  2 +-
>  3 files changed, 31 insertions(+), 12 deletions(-)
> 
> diff --git a/blockdev.c b/blockdev.c
> index 2e7712e..bfdc0e3 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -2989,6 +2989,7 @@ void qmp_block_stream(const char *device,
>      BlockBackend *blk;
>      BlockDriverState *bs;
>      BlockDriverState *base_bs = NULL;
> +    BlockDriverState *active;
>      AioContext *aio_context;
>      Error *local_err = NULL;
>      const char *base_name = NULL;
> @@ -2997,21 +2998,19 @@ void qmp_block_stream(const char *device,
>          on_error = BLOCKDEV_ON_ERROR_REPORT;
>      }
>  
> -    blk = blk_by_name(device);
> -    if (!blk) {
> -        error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
> -                  "Device '%s' not found", device);
> +    bs = bdrv_lookup_bs(device, device, errp);
> +    if (!bs) {
>          return;
>      }
>  
> -    aio_context = blk_get_aio_context(blk);
> +    aio_context = bdrv_get_aio_context(bs);
>      aio_context_acquire(aio_context);
>  
> -    if (!blk_is_available(blk)) {
> +    blk = blk_by_name(device);
> +    if (blk && !blk_is_available(blk)) {
>          error_setg(errp, "Device '%s' has no medium", device);
>          goto out;
>      }
> -    bs = blk_bs(blk);
>  
>      if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_STREAM, errp)) {
>          goto out;
> @@ -3027,6 +3026,22 @@ void qmp_block_stream(const char *device,
>          base_name = base;
>      }
>  
> +    /* Look for the top-level node that contains 'bs' in its chain */
> +    active = NULL;
> +    do {
> +        active = bdrv_next(active);
> +    } while (active && !bdrv_chain_contains(active, bs));

Alternatively, you could iterate up directly from @bs. Just look for the
BdrvChild in bs->parents with .role == &child_backing.

More interesting question: What happens if you have e.g. a qcow2 file as
a quorum child, and then want to stream inside of the qcow2 backing chain?

So maybe you should first walk up along &child_backing and then along
&child_file/&child_format. I think a generic bdrv_get_root_bs() wouldn't
hurt.
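A crude sketch of such a helper built from the existing iterators (note
that bdrv_chain_contains() only follows backing links, so this as such
would still not cover the quorum case above):

    /* Sketch: find a root node whose backing chain contains @bs */
    static BlockDriverState *bdrv_get_root_bs(BlockDriverState *bs)
    {
        BlockDriverState *root = NULL;

        while ((root = bdrv_next(root))) {
            if (bdrv_chain_contains(root, bs)) {
                return root;
            }
        }
        return NULL;
    }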

> +
> +    if (active == NULL) {
> +        error_setg(errp, "Cannot find top level node for '%s'", device);
> +        goto out;
> +    }
> +
> +    /* Check for op blockers in the top-level node too */
> +    if (bdrv_op_is_blocked(active, BLOCK_OP_TYPE_STREAM, errp)) {
> +        goto out;
> +    }
> +
>      /* if we are streaming the entire chain, the result will have no backing
>       * file, and specifying one is therefore an error */
>      if (base_bs == NULL && has_backing_file) {
> @@ -3038,7 +3053,7 @@ void qmp_block_stream(const char *device,
>      /* backing_file string overrides base bs filename */
>      base_name = has_backing_file ? backing_file : base_name;
>  
> -    stream_start(bs, NULL, base_bs, base_name, has_speed ? speed : 0,
> +    stream_start(bs, active, base_bs, base_name, has_speed ? speed : 0,

stream_start() behaves differently for active == NULL and active == bs.
I think both kinds of behavior work, but it looks weird for active ==
bs. Should we pass NULL in that case instead?

Max

>                   on_error, block_job_cb, bs, &local_err);
>      if (local_err) {
>          error_propagate(errp, local_err);
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 59107ed..e20193a 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -1405,6 +1405,10 @@
>  # with query-block-jobs.  The operation can be stopped before it has completed
>  # using the block-job-cancel command.
>  #
> +# The node that receives the data is called the top image, can be located
> +# in any part of the whole chain and can be specified using its device
> +# or node name.
> +#
>  # If a base file is specified then sectors are not copied from that base file and
>  # its backing chain.  When streaming completes the image file will have the base
>  # file as its backing file.  This can be used to stream a subset of the backing
> @@ -1413,12 +1417,12 @@
>  # On successful completion the image file is updated to drop the backing file
>  # and the BLOCK_JOB_COMPLETED event is emitted.
>  #
> -# @device: the device name
> +# @device: the device or node name of the top image
>  #
>  # @base:   #optional the common backing file name
>  #
> -# @backing-file: #optional The backing file string to write into the active
> -#                          layer. This filename is not validated.
> +# @backing-file: #optional The backing file string to write into the top
> +#                          image. This filename is not validated.
>  #
>  #                          If a pathname string is such that it cannot be
>  #                          resolved by QEMU, that means that subsequent QMP or
> diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030
> index 3ac2443..107049b 100755
> --- a/tests/qemu-iotests/030
> +++ b/tests/qemu-iotests/030
> @@ -126,7 +126,7 @@ class TestSingleDrive(iotests.QMPTestCase):
>  
>      def test_device_not_found(self):
>          result = self.vm.qmp('block-stream', device='nonexistent')
> -        self.assert_qmp(result, 'error/class', 'DeviceNotFound')
> +        self.assert_qmp(result, 'error/class', 'GenericError')
>  
>  
>  class TestSmallerBackingFile(iotests.QMPTestCase):
> 




* Re: [Qemu-devel] [PATCH v9 10/11] qemu-iotests: test overlapping block-stream operations
  2016-04-04 13:44 ` [Qemu-devel] [PATCH v9 10/11] qemu-iotests: test overlapping block-stream operations Alberto Garcia
@ 2016-04-27 13:48   ` Max Reitz
  2016-04-27 15:02     ` Alberto Garcia
  0 siblings, 1 reply; 56+ messages in thread
From: Max Reitz @ 2016-04-27 13:48 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: qemu-block, Kevin Wolf, Eric Blake, Stefan Hajnoczi


On 04.04.2016 15:44, Alberto Garcia wrote:
> This test case checks that it's not possible to perform two
> block-stream operations if there are nodes involved in both.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  tests/qemu-iotests/030     | 60 ++++++++++++++++++++++++++++++++++++++++++++++
>  tests/qemu-iotests/030.out |  4 ++--
>  2 files changed, 62 insertions(+), 2 deletions(-)
> 
> diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030
> index 1c53f80..6348872 100755
> --- a/tests/qemu-iotests/030
> +++ b/tests/qemu-iotests/030
> @@ -145,6 +145,66 @@ class TestSingleDrive(iotests.QMPTestCase):
>          self.assert_qmp(result, 'error/class', 'GenericError')
>  
>  
> +class TestMultipleOps(iotests.QMPTestCase):
> +    num_imgs = 9 # It must be > 5 or some tests will break
> +    image_len = num_imgs * 1024 * 1024
> +    imgs = []
> +
> +    def setUp(self):
> +        opts = []
> +        self.imgs = []
> +
> +        # Initialize file names and command-line options
> +        for i in range(self.num_imgs):
> +            img_depth = self.num_imgs - i - 1
> +            opts.append("backing." * img_depth + "node-name=node%d" % i)
> +            self.imgs.append(os.path.join(iotests.test_dir, 'img-%d.img' % i))
> +
> +        # Create all images
> +        iotests.create_image(self.imgs[0], self.image_len)
> +        for i in range(1, self.num_imgs):
> +            qemu_img('create', '-f', iotests.imgfmt,
> +                     '-o', 'backing_file=%s' % self.imgs[i-1], self.imgs[i])
> +
> +        # Put data into some images so there's something to copy
> +        for i in range(1, self.num_imgs, 2):
> +            qemu_io('-f', iotests.imgfmt,
> +                    '-c', 'write -P %d %d %d' % (i, i*1024*1024, 1024*1024),
> +                    self.imgs[i])
> +
> +        # Attach the drive to the VM
> +        self.vm = iotests.VM()
> +        self.vm.add_drive("blkdebug::" + self.imgs[-1], ','.join(opts))

Any special reason for blkdebug? For me it works just fine without.

With or without:

Reviewed-by: Max Reitz <mreitz@redhat.com>

> +        self.vm.launch()
> +
> +    def tearDown(self):
> +        self.vm.shutdown()
> +        for img in self.imgs:
> +            os.remove(img)
> +
> +    # Test that it's not possible to perform two block-stream
> +    # operations if there are nodes involved in both.
> +    def test_overlapping(self):
> +        self.assert_no_active_block_jobs()
> +
> +        # Set a speed limit to make sure that this job blocks the rest
> +        result = self.vm.qmp('block-stream', device='node4', base=self.imgs[0], speed=32768)
> +        self.assert_qmp(result, 'return', {})
> +
> +        result = self.vm.qmp('block-stream', device='node5', base=self.imgs[1])
> +        self.assert_qmp(result, 'error/class', 'GenericError')
> +
> +        result = self.vm.qmp('block-stream', device='node3', base=self.imgs[2])
> +        self.assert_qmp(result, 'error/class', 'GenericError')
> +
> +        result = self.vm.qmp('block-stream', device='node4')
> +        self.assert_qmp(result, 'error/class', 'GenericError')
> +
> +        self.wait_until_completed(drive='node4')
> +        self.assert_no_active_block_jobs()
> +
> +        self.vm.shutdown()
> +
>  class TestSmallerBackingFile(iotests.QMPTestCase):
>      backing_len = 1 * 1024 * 1024 # MB
>      image_len = 2 * backing_len
> diff --git a/tests/qemu-iotests/030.out b/tests/qemu-iotests/030.out
> index 96961ed..b6f2576 100644
> --- a/tests/qemu-iotests/030.out
> +++ b/tests/qemu-iotests/030.out
> @@ -1,5 +1,5 @@
> -...............
> +................
>  ----------------------------------------------------------------------
> -Ran 15 tests
> +Ran 16 tests
>  
>  OK
> 




* Re: [Qemu-devel] [PATCH v9 11/11] qemu-iotests: test non-overlapping block-stream operations
  2016-04-04 13:44 ` [Qemu-devel] [PATCH v9 11/11] qemu-iotests: test non-overlapping " Alberto Garcia
@ 2016-04-27 13:50   ` Max Reitz
  0 siblings, 0 replies; 56+ messages in thread
From: Max Reitz @ 2016-04-27 13:50 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: qemu-block, Kevin Wolf, Eric Blake, Stefan Hajnoczi


On 04.04.2016 15:44, Alberto Garcia wrote:
> Even if there are no common nodes involved, we currently don't support
> several operations at the same time in the same backing chain.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  tests/qemu-iotests/030     | 21 +++++++++++++++++++++
>  tests/qemu-iotests/030.out |  4 ++--
>  2 files changed, 23 insertions(+), 2 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>



* Re: [Qemu-devel] [PATCH v9 05/11] block: allow block jobs in any arbitrary node
  2016-04-27 12:30   ` Max Reitz
@ 2016-04-27 14:59     ` Alberto Garcia
  0 siblings, 0 replies; 56+ messages in thread
From: Alberto Garcia @ 2016-04-27 14:59 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: qemu-block, Kevin Wolf, Eric Blake, Stefan Hajnoczi

On Wed 27 Apr 2016 02:30:55 PM CEST, Max Reitz wrote:
>> @@ -3694,18 +3696,18 @@ static BlockJob *find_block_job(const char *device, AioContext **aio_context,
>>  
>>      *aio_context = NULL;
>>  
>> -    blk = blk_by_name(device);
>> -    if (!blk) {
>> +    bs = bdrv_lookup_bs(device_or_node, device_or_node, errp);
>> +    if (!bs) {
>
> If we get here, *errp is already set... [1]

Good catch, thanks.

>> -    if (!blk_is_available(blk)) {
>> +    blk = blk_by_name(device_or_node);
>> +    if (blk && !blk_is_available(blk)) {
>
> I'd just drop this. The reason blk_is_available() was added here is
> that it became possible for BBs not to have a BDS.
>
> Now that you get the BDS directly through bdrv_lookup_bs(), it's no
> longer necessary.

Ok.

>> @@ -78,7 +79,7 @@ void *block_job_create(const BlockJobDriver *driver, BlockDriverState *bs,
>>      bdrv_op_unblock(bs, BLOCK_OP_TYPE_DATAPLANE, job->blocker);
>>  
>>      job->driver        = driver;
>> -    job->id            = g_strdup(bdrv_get_device_name(bs));
>> +    job->id            = g_strdup(bdrv_get_device_or_node_name(bs));
>
> Hm, since this is only used for events, it's not too bad that a block
> job will have a different name if it was created on a root BDS by
> specifying its node name. But it still is kind of strange.

The idea is that if you create a block job on a root BDS (which is still
the most common case) you get the same id that you used to get before
this series.

Berto


* Re: [Qemu-devel] [PATCH v9 10/11] qemu-iotests: test overlapping block-stream operations
  2016-04-27 13:48   ` Max Reitz
@ 2016-04-27 15:02     ` Alberto Garcia
  0 siblings, 0 replies; 56+ messages in thread
From: Alberto Garcia @ 2016-04-27 15:02 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: qemu-block, Kevin Wolf, Eric Blake, Stefan Hajnoczi

On Wed 27 Apr 2016 03:48:26 PM CEST, Max Reitz wrote:
>> +        # Attach the drive to the VM
>> +        self.vm = iotests.VM()
>> +        self.vm.add_drive("blkdebug::" + self.imgs[-1], ','.join(opts))
>
> Any special reason for blkdebug? For me it works just fine without.

Oh, I think it's just a leftover from previous tests that did use
blkdebug. I'll remove it.

Berto


* Re: [Qemu-devel] [PATCH v9 06/11] block: Support streaming to an intermediate layer
  2016-04-27 13:04   ` Max Reitz
@ 2016-04-28  9:23     ` Alberto Garcia
  0 siblings, 0 replies; 56+ messages in thread
From: Alberto Garcia @ 2016-04-28  9:23 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: qemu-block, Kevin Wolf, Eric Blake, Stefan Hajnoczi

On Wed 27 Apr 2016 03:04:57 PM CEST, Max Reitz <mreitz@redhat.com> wrote:
>> +    /* If we are streaming to an intermediate image, we need to block
>> +     * the active layer. Due to a race condition, having several block
>> +     * jobs running in the same chain is broken and we currently don't
>> +     * support it. See here for details:
>> +     * https://lists.gnu.org/archive/html/qemu-block/2015-12/msg00180.html
>> +     */
>> +    if (active) {
>> +        bdrv_op_block_all(active, s->common.blocker);
>
> block_job_create() unblocks BLOCK_OP_TYPE_DATAPLANE. Maybe this should
> do the same?

I don't know the internals of how dataplane works, but I don't see that
it tries to call bdrv_reopen(), so it should be safe. I'll unblock it.

Berto


* Re: [Qemu-devel] [PATCH v9 07/11] block: Add QMP support for streaming to an intermediate layer
  2016-04-27 13:34   ` Max Reitz
@ 2016-04-28 12:20     ` Alberto Garcia
  2016-04-29 15:18       ` Kevin Wolf
  0 siblings, 1 reply; 56+ messages in thread
From: Alberto Garcia @ 2016-04-28 12:20 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: qemu-block, Kevin Wolf, Eric Blake, Stefan Hajnoczi

On Wed 27 Apr 2016 03:34:19 PM CEST, Max Reitz <mreitz@redhat.com> wrote:
>> +    /* Look for the top-level node that contains 'bs' in its chain */
>> +    active = NULL;
>> +    do {
>> +        active = bdrv_next(active);
>> +    } while (active && !bdrv_chain_contains(active, bs));
>
> Alternatively, you could iterate up directly from @bs. Just look for
> the BdrvChild in bs->parents with .role == &child_backing.

Yes, but the BdrvChild in bs->parents does not contain any pointer to
the actual parent node, so unless we want to change that structure I
wouldn't go for this solution.

> More interesting question: What happens if you have e.g. a qcow2 file
> as a quorum child, and then want to stream inside of the qcow2 backing
> chain?
>
> So maybe you should first walk up along &child_backing and then along
> &child_file/&child_format. I think a generic bdrv_get_root_bs()
> wouldn't hurt.

You're right. I'm not sure if that would have other consequences that we
should consider, so maybe we can add that later, with its own set of
tests.

Also, this is all necessary because of the problem with bdrv_reopen().
If that is ever fixed there's no need to block the active layer at all.

>> -    stream_start(bs, NULL, base_bs, base_name, has_speed ? speed : 0,
>> +    stream_start(bs, active, base_bs, base_name, has_speed ? speed : 0,
>
> stream_start() behaves differently for active == NULL and active == bs.
> I think both kinds of behavior work, but it looks weird for active ==
> bs. Should we pass NULL in that case instead?

You're right, I'll change it to pass NULL and assert(active != bs).
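So the call site would become something like this (a sketch; the
assertion would then live at the top of stream_start()):

    /* Pass NULL when 'bs' is already the active layer */
    stream_start(bs, active == bs ? NULL : active, base_bs, base_name,
                 has_speed ? speed : 0, on_error, block_job_cb, bs,
                 &local_err);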

Berto


* Re: [Qemu-devel] [PATCH v9 01/11] block: keep a list of block jobs
  2016-04-04 13:43 ` [Qemu-devel] [PATCH v9 01/11] block: keep a list of block jobs Alberto Garcia
  2016-04-27 11:59   ` Max Reitz
@ 2016-04-29 14:22   ` Kevin Wolf
  1 sibling, 0 replies; 56+ messages in thread
From: Kevin Wolf @ 2016-04-29 14:22 UTC (permalink / raw)
  To: Alberto Garcia
  Cc: qemu-devel, qemu-block, Max Reitz, Eric Blake, Stefan Hajnoczi

On 04.04.2016 at 15:43, Alberto Garcia wrote:
> The current way to obtain the list of existing block jobs is to
> iterate over all root nodes and check which ones own a job.
> 
> Since we want to be able to support block jobs in other nodes as well,
> this patch keeps a list of jobs that is updated every time one is
> created or destroyed.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>

Reviewed-by: Kevin Wolf <kwolf@redhat.com>

Actually, I have almost literally the same change (except that I didn't
need a block_job_next()) in my development branch. I guess I should
rebase on top of this one. :-)

Kevin


* Re: [Qemu-devel] [PATCH v9 02/11] block: use the block job list in bdrv_drain_all()
  2016-04-04 13:43 ` [Qemu-devel] [PATCH v9 02/11] block: use the block job list in bdrv_drain_all() Alberto Garcia
  2016-04-27 12:04   ` Max Reitz
@ 2016-04-29 14:25   ` Kevin Wolf
  1 sibling, 0 replies; 56+ messages in thread
From: Kevin Wolf @ 2016-04-29 14:25 UTC (permalink / raw)
  To: Alberto Garcia
  Cc: qemu-devel, qemu-block, Max Reitz, Eric Blake, Stefan Hajnoczi

On 04.04.2016 at 15:43, Alberto Garcia wrote:
> bdrv_drain_all() pauses all block jobs by using bdrv_next() to iterate
> over all top-level BlockDriverStates. Therefore the code is unable to
> find block jobs in other nodes.
> 
> This patch uses block_job_next() to iterate over all block jobs.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>

This conflicts with Paolo's patches in block-next. Please rebase. (Apart
from that, the change looks sane.)

Kevin


* Re: [Qemu-devel] [PATCH v9 03/11] block: use the block job list in qmp_query_block_jobs()
  2016-04-04 13:43 ` [Qemu-devel] [PATCH v9 03/11] block: use the block job list in qmp_query_block_jobs() Alberto Garcia
  2016-04-27 12:09   ` Max Reitz
@ 2016-04-29 14:32   ` Kevin Wolf
  2016-05-02 13:06     ` Alberto Garcia
  1 sibling, 1 reply; 56+ messages in thread
From: Kevin Wolf @ 2016-04-29 14:32 UTC (permalink / raw)
  To: Alberto Garcia
  Cc: qemu-devel, qemu-block, Max Reitz, Eric Blake, Stefan Hajnoczi

On 04.04.2016 at 15:43, Alberto Garcia wrote:
> qmp_query_block_jobs() uses bdrv_next() to look for block jobs, but
> this function can only find those in top-level BlockDriverStates.
> 
> This patch uses block_job_next() instead.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>

Reviewed-by: Kevin Wolf <kwolf@redhat.com>

However, I'd like to give you a heads-up that this will technically
conflict with my series that removes BlockDriverState.blk because that
changes the bdrv_next() signature.

Nothing dramatic, but I guess it would make sense to decide where in the
queue of patches this series should go. My suggestion would be on top of
"blockdev: (Nearly) free clean-up work".

Kevin


* Re: [Qemu-devel] [PATCH v9 04/11] block: use the block job list in bdrv_close()
  2016-04-04 13:43 ` [Qemu-devel] [PATCH v9 04/11] block: use the block job list in bdrv_close() Alberto Garcia
  2016-04-27 12:14   ` Max Reitz
@ 2016-04-29 14:38   ` Kevin Wolf
  2016-05-02 13:42     ` Alberto Garcia
  1 sibling, 1 reply; 56+ messages in thread
From: Kevin Wolf @ 2016-04-29 14:38 UTC (permalink / raw)
  To: Alberto Garcia
  Cc: qemu-devel, qemu-block, Max Reitz, Eric Blake, Stefan Hajnoczi

On 04.04.2016 at 15:43, Alberto Garcia wrote:
> bdrv_close_all() cancels all block jobs by iterating over all
> BlockDriverStates. This patch simplifies the code by iterating
> directly over the block jobs using block_job_next().
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>

This is essentially the same as I'm doing here:
http://repo.or.cz/qemu/kevin.git/commitdiff/6b545b21e3dfe2e3927cfb6bbdcc1b233c67630c

I think I like having a separate block_job_cancel_sync_all() function
like I did instead of inlining it in bdrv_close_all(), though that's a
matter of taste.
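For comparison, the factored-out helper is essentially the same loop
(just a sketch of what I mean; details may differ from my branch):

    void block_job_cancel_sync_all(void)
    {
        BlockJob *job;

        while ((job = block_job_next(NULL))) {
            AioContext *aio_context = bdrv_get_aio_context(job->bs);

            aio_context_acquire(aio_context);
            block_job_cancel_sync(job);
            aio_context_release(aio_context);
        }
    }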

My patch also moves the call, so having to move only a single line in my
patch (assuming it gets rebased on top of this one) seems nicer.

Kevin

>  block.c | 25 ++++++-------------------
>  1 file changed, 6 insertions(+), 19 deletions(-)
> 
> diff --git a/block.c b/block.c
> index d36eb75..48638c9 100644
> --- a/block.c
> +++ b/block.c
> @@ -2182,8 +2182,7 @@ static void bdrv_close(BlockDriverState *bs)
>  
>  void bdrv_close_all(void)
>  {
> -    BlockDriverState *bs;
> -    AioContext *aio_context;
> +    BlockJob *job;
>  
>      /* Drop references from requests still in flight, such as canceled block
>       * jobs whose AIO context has not been polled yet */
> @@ -2193,23 +2192,11 @@ void bdrv_close_all(void)
>      blockdev_close_all_bdrv_states();
>  
>      /* Cancel all block jobs */
> -    while (!QTAILQ_EMPTY(&all_bdrv_states)) {
> -        QTAILQ_FOREACH(bs, &all_bdrv_states, bs_list) {
> -            aio_context = bdrv_get_aio_context(bs);
> -
> -            aio_context_acquire(aio_context);
> -            if (bs->job) {
> -                block_job_cancel_sync(bs->job);
> -                aio_context_release(aio_context);
> -                break;
> -            }
> -            aio_context_release(aio_context);
> -        }
> -
> -        /* All the remaining BlockDriverStates are referenced directly or
> -         * indirectly from block jobs, so there needs to be at least one BDS
> -         * directly used by a block job */
> -        assert(bs);
> +    while ((job = block_job_next(NULL))) {
> +        AioContext *aio_context = bdrv_get_aio_context(job->bs);
> +        aio_context_acquire(aio_context);
> +        block_job_cancel_sync(job);
> +        aio_context_release(aio_context);
>      }
>  }
>  
> -- 
> 2.8.0.rc3
> 


* Re: [Qemu-devel] [PATCH v9 05/11] block: allow block jobs in any arbitrary node
  2016-04-04 13:43 ` [Qemu-devel] [PATCH v9 05/11] block: allow block jobs in any arbitrary node Alberto Garcia
  2016-04-27 12:30   ` Max Reitz
@ 2016-04-29 15:00   ` Kevin Wolf
  2016-05-06 10:00     ` Alberto Garcia
  2016-04-29 15:25   ` Eric Blake
  2 siblings, 1 reply; 56+ messages in thread
From: Kevin Wolf @ 2016-04-29 15:00 UTC (permalink / raw)
  To: Alberto Garcia
  Cc: qemu-devel, qemu-block, Max Reitz, Eric Blake, Stefan Hajnoczi,
	jcody, jsnow

On 04.04.2016 at 15:43, Alberto Garcia wrote:
> Currently, block jobs can only be owned by root nodes. This patch
> allows block jobs to be in any arbitrary node, by making the following
> changes:
> 
> - Block jobs can now be identified by the node name of their
>   BlockDriverState in addition to the device name. Since both device
>   and node names live in the same namespace there's no ambiguity.

Careful, we know this is a part of our API that we have already messed
up and we don't want to make things worse while adding new things before
we've cleaned it up.

Let's keep in mind where we are planning to go with block jobs: They
should become background jobs, not tied to any block device. The close
connection to a single BDS already doesn't make a lot of sense today
because most block jobs involve at least two BDSes.

In the final state, we will have a job ID that uniquely identifies the
job, and each command that starts a job will take an ID from the client.
For compatibility, we'll use the block device name as the job ID when
using old commands that don't get an explicit ID yet.

In the existing qemu version, you can't start two block jobs on the same
device, and in future versions, you're supposed to specify an ID each
time. This is why the default can always be supposed to work without
conflicts. If in newer versions, the client mixes both ways (which it
shouldn't do), starting a new block job may fail because the device name
is already in use as an ID for another job.

Now we can probably make the same argument for node names, so we can
extend the interface and still keep it compatible.

Where we need to be careful is that with device names and node names, we
have potentially two names describing the same BDS and therefore the
same job. This must not happen, because we won't be able to represent
that in the generic background job API. Any job has just a single ID
there.

> - The "device" parameter used by all commands that operate on block
>   jobs can also be a node name now.

So I think the least we need to do is change this one to resolve the
block job by its ID (job->id) rather than device or node names.
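That is, the lookup would go through the job list instead of through
node names. A minimal sketch (the helper name is made up):

    static BlockJob *find_block_job_by_id(const char *id)
    {
        BlockJob *job = NULL;

        while ((job = block_job_next(job))) {
            if (!strcmp(job->id, id)) {
                return job;
            }
        }
        return NULL;
    }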

> - The node name is used as a fallback to fill in the BlockJobInfo
>   structure and all BLOCK_JOB_* events if there is no device name for
>   that job.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  blockdev.c           | 18 ++++++++++--------
>  blockjob.c           |  5 +++--
>  docs/qmp-events.txt  |  8 ++++----
>  qapi/block-core.json | 20 ++++++++++----------
>  4 files changed, 27 insertions(+), 24 deletions(-)
> 
> diff --git a/blockdev.c b/blockdev.c
> index edbcc19..d1f5dfb 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -3685,8 +3685,10 @@ void qmp_blockdev_mirror(const char *device, const char *target,
>      aio_context_release(aio_context);
>  }
>  
> -/* Get the block job for a given device name and acquire its AioContext */
> -static BlockJob *find_block_job(const char *device, AioContext **aio_context,
> +/* Get the block job for a given device or node name
> + * and acquire its AioContext */
> +static BlockJob *find_block_job(const char *device_or_node,
> +                                AioContext **aio_context,
>                                  Error **errp)
>  {
>      BlockBackend *blk;
> @@ -3694,18 +3696,18 @@ static BlockJob *find_block_job(const char *device, AioContext **aio_context,
>  
>      *aio_context = NULL;
>  
> -    blk = blk_by_name(device);
> -    if (!blk) {
> +    bs = bdrv_lookup_bs(device_or_node, device_or_node, errp);

Specifically, this one is bad. It allows two different ways to specify
the same job.

The other thing about this patch is that I'm not sure how badly it
conflicts with my series to convert block jobs to BlockBackend. It seems
that you switch everything from blk to bs here, and I'll have to switch
back to blk (once job->blk exists).

Kevin


* Re: [Qemu-devel] [PATCH v9 06/11] block: Support streaming to an intermediate layer
  2016-04-04 13:43 ` [Qemu-devel] [PATCH v9 06/11] block: Support streaming to an intermediate layer Alberto Garcia
  2016-04-27 13:04   ` Max Reitz
@ 2016-04-29 15:07   ` Kevin Wolf
  1 sibling, 0 replies; 56+ messages in thread
From: Kevin Wolf @ 2016-04-29 15:07 UTC (permalink / raw)
  To: Alberto Garcia
  Cc: qemu-devel, qemu-block, Max Reitz, Eric Blake, Stefan Hajnoczi, jcody

On 04.04.2016 at 15:43, Alberto Garcia wrote:
> This makes sure that the image we are steaming into is open in

s/steaming/streaming/

> read-write mode during the operation.
> 
> The block job is created on the destination image, but operation
> blockers are also set on the active layer. We do this in order to
> prevent other block jobs from running in parallel in the same chain.
> See here for details on why that is currently not supported:
> 
> [Qemu-block] RFC: Status of the intermediate block streaming work
> https://lists.gnu.org/archive/html/qemu-block/2015-12/msg00180.html
> 
> Finally, this also unblocks the stream operation in backing files.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>

Would be great to have the new op blockers already, because having to
know the active layer before streaming can be started is certainly not
nice.

Kevin


* Re: [Qemu-devel] [PATCH v9 07/11] block: Add QMP support for streaming to an intermediate layer
  2016-04-04 13:43 ` [Qemu-devel] [PATCH v9 07/11] block: Add QMP support for " Alberto Garcia
  2016-04-27 13:34   ` Max Reitz
@ 2016-04-29 15:11   ` Kevin Wolf
  2016-05-03 12:53     ` Alberto Garcia
  2016-04-29 15:29   ` Eric Blake
  2 siblings, 1 reply; 56+ messages in thread
From: Kevin Wolf @ 2016-04-29 15:11 UTC (permalink / raw)
  To: Alberto Garcia
  Cc: qemu-devel, qemu-block, Max Reitz, Eric Blake, Stefan Hajnoczi

On 04.04.2016 at 15:43, Alberto Garcia wrote:
> This patch makes the 'device' parameter of the 'block-stream' command
> accept a node name as well as a device name.
> 
> In addition to that, operation blockers will be checked in all
> intermediate nodes between the top and the base node.
> 
> Since qmp_block_stream() now uses the error from bdrv_lookup_bs() and
> no longer returns DeviceNotFound, iotest 030 is updated to expect
> GenericError instead.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  blockdev.c             | 31 +++++++++++++++++++++++--------
>  qapi/block-core.json   | 10 +++++++---
>  tests/qemu-iotests/030 |  2 +-
>  3 files changed, 31 insertions(+), 12 deletions(-)
> 
> diff --git a/blockdev.c b/blockdev.c
> index 2e7712e..bfdc0e3 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -2989,6 +2989,7 @@ void qmp_block_stream(const char *device,
>      BlockBackend *blk;
>      BlockDriverState *bs;
>      BlockDriverState *base_bs = NULL;
> +    BlockDriverState *active;
>      AioContext *aio_context;
>      Error *local_err = NULL;
>      const char *base_name = NULL;
> @@ -2997,21 +2998,19 @@ void qmp_block_stream(const char *device,
>          on_error = BLOCKDEV_ON_ERROR_REPORT;
>      }
>  
> -    blk = blk_by_name(device);
> -    if (!blk) {
> -        error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
> -                  "Device '%s' not found", device);
> +    bs = bdrv_lookup_bs(device, device, errp);
> +    if (!bs) {
>          return;
>      }
>  
> -    aio_context = blk_get_aio_context(blk);
> +    aio_context = bdrv_get_aio_context(bs);
>      aio_context_acquire(aio_context);
>  
> -    if (!blk_is_available(blk)) {
> +    blk = blk_by_name(device);
> +    if (blk && !blk_is_available(blk)) {
>          error_setg(errp, "Device '%s' has no medium", device);
>          goto out;
>      }
> -    bs = blk_bs(blk);
>  
>      if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_STREAM, errp)) {
>          goto out;
> @@ -3027,6 +3026,22 @@ void qmp_block_stream(const char *device,
>          base_name = base;
>      }
>  
> +    /* Look for the top-level node that contains 'bs' in its chain */
> +    active = NULL;
> +    do {
> +        active = bdrv_next(active);
> +    } while (active && !bdrv_chain_contains(active, bs));
> +
> +    if (active == NULL) {
> +        error_setg(errp, "Cannot find top level node for '%s'", device);
> +        goto out;
> +    }

Hm... On the one hand, I really like that you don't expect the user to
provide the active layer in QMP. This allows us to remove this wart once
we have the new op blockers.

On the other hand, this code assumes that there is only a single
top-level node. This isn't necessarily true any more these days. Do we
need to set blockers on _all_ root nodes that have the node in their
backing chain?
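
A rough sketch of that variant, reusing the same loop as the patch
(whether bdrv_op_block_all() is the right blocker call, and the
variable names, are my assumptions, not something the series does):

    BlockDriverState *root = NULL;
    bool found = false;

    /* Block every root node whose backing chain contains 'bs' */
    while ((root = bdrv_next(root)) != NULL) {
        if (bdrv_chain_contains(root, bs)) {
            bdrv_op_block_all(root, blocker);
            found = true;
        }
    }
    if (!found) {
        error_setg(errp, "Cannot find top level node for '%s'", device);
    }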

Kevin

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v9 07/11] block: Add QMP support for streaming to an intermediate layer
  2016-04-28 12:20     ` Alberto Garcia
@ 2016-04-29 15:18       ` Kevin Wolf
  2016-05-03 12:50         ` Alberto Garcia
  0 siblings, 1 reply; 56+ messages in thread
From: Kevin Wolf @ 2016-04-29 15:18 UTC (permalink / raw)
  To: Alberto Garcia
  Cc: Max Reitz, qemu-devel, qemu-block, Eric Blake, Stefan Hajnoczi

On 28.04.2016 at 14:20, Alberto Garcia wrote:
> On Wed 27 Apr 2016 03:34:19 PM CEST, Max Reitz <mreitz@redhat.com> wrote:
> >> +    /* Look for the top-level node that contains 'bs' in its chain */
> >> +    active = NULL;
> >> +    do {
> >> +        active = bdrv_next(active);
> >> +    } while (active && !bdrv_chain_contains(active, bs));
> >
> > Alternatively, you could iterate up directly from @bs. Just look for
> > the BdrvChild in bs->parents with .role == &child_backing.
> 
> Yes, but the BdrvChild in bs->parents does not contain any pointer to
> the actual parent node, so unless we want to change that structure I
> wouldn't go for this solution.

Yes, and that's intentional. If we do things right, there shouldn't be
any need to go upwards in the tree. Our current op blockers are not done
right as they are only effective when set on the top layer. Some
temporary ugliness to deal with this is okay; breaking the assumption
that children don't know their parents would require much better
justification.

> > More interesting question: What happens if you have e.g. a qcow2 file
> > as a quorum child, and then want to stream inside of the qcow2 backing
> > chain?
> >
> > So maybe you should first walk up along &child_backing and then along
> > &child_file/&child_format. I think a generic bdrv_get_root_bs()
> > wouldn't hurt.
> 
> You're right. I'm not sure if that would have other consequences that we
> should consider, so maybe we can add that later, with its own set of
> tests.
> 
> Also, this is all necessary because of the problem with bdrv_reopen().
> If that is ever fixed there's no need to block the active layer at all.

This patch errors out if we can't find the active layer. Sounds safe and
appropriate for an initial version. The real solution isn't to improve
the magic to find the root node, but to remove the need to find it (by
getting the new op blockers).

Kevin

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v9 05/11] block: allow block jobs in any arbitrary node
  2016-04-04 13:43 ` [Qemu-devel] [PATCH v9 05/11] block: allow block jobs in any arbitrary node Alberto Garcia
  2016-04-27 12:30   ` Max Reitz
  2016-04-29 15:00   ` Kevin Wolf
@ 2016-04-29 15:25   ` Eric Blake
  2 siblings, 0 replies; 56+ messages in thread
From: Eric Blake @ 2016-04-29 15:25 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: qemu-block, Max Reitz, Kevin Wolf, Stefan Hajnoczi

On 04/04/2016 07:43 AM, Alberto Garcia wrote:
> Currently, block jobs can only be owned by root nodes. This patch
> allows block jobs to be in any arbitrary node, by making the following
> changes:
> 
> - Block jobs can now be identified by the node name of their
>   BlockDriverState in addition to the device name. Since both device
>   and node names live in the same namespace there's no ambiguity.
> 
> - The "device" parameter used by all commands that operate on block
>   jobs can also be a node name now.
> 
> - The node name is used as a fallback to fill in the BlockJobInfo
>   structure and all BLOCK_JOB_* events if there is no device name for
>   that job.

Having more than one way to name a job might be okay for convenience,
but only if the canonical name is unambiguous.  I agree with Kevin's
concern that using the device and/or node name as the canonical name of
the job is worrisome, because it locks us into having only one job with
that name at a time.


> +++ b/docs/qmp-events.txt
> @@ -92,7 +92,7 @@ Data:
>  
>  - "type":     Job type (json-string; "stream" for image streaming
>                                       "commit" for block commit)
> -- "device":   Device name (json-string)
> +- "device":   Device name, or node name if not present (json-string)

On the surface this sounds okay (you are promising to return a canonical
name, insofar as job canonical names are currently the node/device name
until the time that we allow job ids with multiple jobs per device) -
even if it means that you return a device name when the user started a
job based on a node name.

But I'm also worried about jobs where the node name tied to a device
changes over time (creating a snapshot, pivoting to a mirror, doing an
active commit with a pivot to the backing file - all of these are cases
where the device name stays the same, but the top node name associated
with the device differs over time).  If the device name is the canonical
one, then a job started on node "A" but reported as device "D", needs to
STILL be known as job "D" even if by the end of the job node "A" is no
longer associated with device "D" (because "D" is now serving the
top-level node "B").


> @@ -1513,7 +1513,7 @@
>  # the operation is actually paused.  Cancelling a paused job automatically
>  # resumes it.
>  #
> -# @device: the device name
> +# @device: the device or node name of the owner of the block job.

We aren't consistent on whether to use a trailing '.'.  I don't care
enough to standardize on a style, so no need to respin for just that.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v9 07/11] block: Add QMP support for streaming to an intermediate layer
  2016-04-04 13:43 ` [Qemu-devel] [PATCH v9 07/11] block: Add QMP support for " Alberto Garcia
  2016-04-27 13:34   ` Max Reitz
  2016-04-29 15:11   ` Kevin Wolf
@ 2016-04-29 15:29   ` Eric Blake
  2 siblings, 0 replies; 56+ messages in thread
From: Eric Blake @ 2016-04-29 15:29 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: qemu-block, Max Reitz, Kevin Wolf, Stefan Hajnoczi

On 04/04/2016 07:43 AM, Alberto Garcia wrote:
> This patch makes the 'device' parameter of the 'block-stream' command
> accept a node name as well as a device name.
> 
> In addition to that, operation blockers will be checked in all
> intermediate nodes between the top and the base node.
> 
> Since qmp_block_stream() now uses the error from bdrv_lookup_bs() and
> no longer returns DeviceNotFound, iotest 030 is updated to expect
> GenericError instead.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
> +++ b/qapi/block-core.json
> @@ -1405,6 +1405,10 @@

Context: block-stream

>  # with query-block-jobs.  The operation can be stopped before it has completed
>  # using the block-job-cancel command.
>  #
> +# The node that receives the data is called the top image, can be located
> +# in any part of the whole chain and can be specified using its device
> +# or node name.
> +#

"any part of the whole chain" feels fishy - it has to be ABOVE the base
file.  That is, in a chain A <- B <- C <- D <- E, if I want to set B as
the base, then the top has to be C, D, or E (but not A).

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v9 03/11] block: use the block job list in qmp_query_block_jobs()
  2016-04-29 14:32   ` Kevin Wolf
@ 2016-05-02 13:06     ` Alberto Garcia
  0 siblings, 0 replies; 56+ messages in thread
From: Alberto Garcia @ 2016-05-02 13:06 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, qemu-block, Max Reitz, Eric Blake, Stefan Hajnoczi

On Fri 29 Apr 2016 04:32:41 PM CEST, Kevin Wolf wrote:

> However, I'd like to give you a heads-up that this will technically
> conflict with my series that removes BlockDriverState.blk because that
> changes the bdrv_next() signature.
>
> Nothing dramatic, but I guess it would make sense to decide where in
> the queue of patches this series should go. My suggestion would be on
> top of "blockdev: (Nearly) free clean-up work".

Okay, thanks.

Berto

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v9 04/11] block: use the block job list in bdrv_close()
  2016-04-29 14:38   ` Kevin Wolf
@ 2016-05-02 13:42     ` Alberto Garcia
  0 siblings, 0 replies; 56+ messages in thread
From: Alberto Garcia @ 2016-05-02 13:42 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, qemu-block, Max Reitz, Eric Blake, Stefan Hajnoczi

On Fri 29 Apr 2016 04:38:58 PM CEST, Kevin Wolf wrote:
> This is essentially the same as I'm doing here:
> http://repo.or.cz/qemu/kevin.git/commitdiff/6b545b21e3dfe2e3927cfb6bbdcc1b233c67630c

Oh, I see.

> I think I like having a separate block_job_cancel_sync_all() function
> like I did instead of inlining it in bdrv_close_all(), though that's a
> matter of taste.

Having a separate function looks good, but I don't really have a strong
opinion on this.

Berto

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v9 07/11] block: Add QMP support for streaming to an intermediate layer
  2016-04-29 15:18       ` Kevin Wolf
@ 2016-05-03 12:50         ` Alberto Garcia
  2016-05-03 13:23           ` Kevin Wolf
  0 siblings, 1 reply; 56+ messages in thread
From: Alberto Garcia @ 2016-05-03 12:50 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Max Reitz, qemu-devel, qemu-block, Eric Blake, Stefan Hajnoczi

On Fri 29 Apr 2016 05:18:26 PM CEST, Kevin Wolf wrote:
> This patch errors out if we can't find the active layer. Sounds safe
> and appropriate for an initial version. The real solution isn't to
> improve the magic to find the root node, but to remove the need to
> find it (by getting the new op blockers).

Well, I agree with the fact that what we really want is not to block the
active layer at all, but I don't see how any new op blockers are going
to solve that.

The situation is that we can't allow two block jobs in the same chain at
the moment, and I only see three solutions:

a) each job blocks the whole chain (what this series does).

b) each job checks that no other job is running on the same chain.
   Maybe cleaner? But it would require modifying all other jobs (see
   the sketch below).

c) we fix bdrv_reopen() so we can actually run both jobs at the same
   time. I'm wondering if pausing all block jobs between
   bdrv_reopen_prepare() and bdrv_reopen_commit() would do the
   trick. Opinions?
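
A rough sketch of option b), using the block_job_next() API that this
series introduces (the helper name and the exact check are made up for
illustration):

    static bool chain_has_other_job(BlockDriverState *bs)
    {
        BlockJob *job;

        /* Check all running jobs for one that touches this chain */
        for (job = block_job_next(NULL); job; job = block_job_next(job)) {
            if (bdrv_chain_contains(job->bs, bs) ||
                bdrv_chain_contains(bs, job->bs)) {
                return true;
            }
        }
        return false;
    }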

Berto

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v9 07/11] block: Add QMP support for streaming to an intermediate layer
  2016-04-29 15:11   ` Kevin Wolf
@ 2016-05-03 12:53     ` Alberto Garcia
  2016-05-03 13:18       ` Kevin Wolf
  0 siblings, 1 reply; 56+ messages in thread
From: Alberto Garcia @ 2016-05-03 12:53 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, qemu-block, Max Reitz, Eric Blake, Stefan Hajnoczi

On Fri 29 Apr 2016 05:11:07 PM CEST, Kevin Wolf wrote:
>> +    if (active == NULL) {
>> +        error_setg(errp, "Cannot find top level node for '%s'", device);
>> +        goto out;
>> +    }
>
> Hm... On the one hand, I really like that you don't expect the user to
> provide the active layer in QMP. This allows us to remove this wart
> once we have the new op blockers.

Exactly, I still plan to stick to the API we discussed last year.

> On the other hand, this code assumes that there is only a single
> top-level node. This isn't necessarily true any more these days.

Hmm... if you give me an example I can test that scenario.

Berto

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v9 07/11] block: Add QMP support for streaming to an intermediate layer
  2016-05-03 12:53     ` Alberto Garcia
@ 2016-05-03 13:18       ` Kevin Wolf
  2016-05-03 13:29         ` Alberto Garcia
  0 siblings, 1 reply; 56+ messages in thread
From: Kevin Wolf @ 2016-05-03 13:18 UTC (permalink / raw)
  To: Alberto Garcia
  Cc: qemu-devel, qemu-block, Max Reitz, Eric Blake, Stefan Hajnoczi

On 03.05.2016 at 14:53, Alberto Garcia wrote:
> On Fri 29 Apr 2016 05:11:07 PM CEST, Kevin Wolf wrote:
> >> +    if (active == NULL) {
> >> +        error_setg(errp, "Cannot find top level node for '%s'", device);
> >> +        goto out;
> >> +    }
> >
> > Hm... On the one hand, I really like that you don't expect the user to
> > provide the active layer in QMP. This allows us to remove this wart
> > once we have the new op blockers.
> 
> Exactly, I still plan to stick to the API we discussed last year.
> 
> > On the other hand, this code assumes that there is only a single
> > top-level node. This isn't necessarily true any more these days.
> 
> Hmm... if you give me an example I can test that scenario.

Simply reference the same node twice:

$ x86_64-softmmu/qemu-system-x86_64 \
-drive if=none,file=/tmp/backing.qcow2,id=backing \
-drive file=/tmp/test.qcow2,backing=backing,id=hda \
-drive file=/tmp/test2.qcow2,backing=backing,id=hdb

If backing.qcow2 has another backing file, you can do the intermediate
streaming to it and both hda and hdb are active layers on top of it.

Kevin

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v9 07/11] block: Add QMP support for streaming to an intermediate layer
  2016-05-03 12:50         ` Alberto Garcia
@ 2016-05-03 13:23           ` Kevin Wolf
  2016-05-03 13:33             ` Alberto Garcia
  0 siblings, 1 reply; 56+ messages in thread
From: Kevin Wolf @ 2016-05-03 13:23 UTC (permalink / raw)
  To: Alberto Garcia
  Cc: Max Reitz, qemu-devel, qemu-block, Eric Blake, Stefan Hajnoczi

On 03.05.2016 at 14:50, Alberto Garcia wrote:
> On Fri 29 Apr 2016 05:18:26 PM CEST, Kevin Wolf wrote:
> > This patch errors out if we can't find the active layer. Sounds safe
> > and appropriate for an initial version. The real solution isn't to
> > improve the magic to find the root node, but to remove the need to
> > find it (by getting the new op blockers).
> 
> Well, I agree with the fact that what we really want is not to block the
> active layer at all, but I don't see how any new op blockers are going
> to solve that.
> 
> The situation is that we can't allow two block jobs in the same chain at
> the moment, and I only see three solutions:
> 
> a) each job blocks the whole chain (what this series does).
> 
> b) each job checks that no other job is running on the same chain.
>    Maybe cleaner? But it would require modifying all other jobs.
> 
> c) we fix bdrv_reopen() so we can actually run both jobs at the same
>    time. I'm wondering if pausing all block jobs between
>    bdrv_reopen_prepare() and bdrv_reopen_commit() would do the
>    trick. Opinions?

I would have to read up the details of the problem again, but I think
with bdrv_drained_begin/end() we actually have the right tool now to fix
it properly. We may need to pull up the drain (bdrv_drain_all() today)
from bdrv_reopen_multiple() to its caller and just assert it in the
function itself, but there shouldn't be much more to it than that.

Kevin

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v9 07/11] block: Add QMP support for streaming to an intermediate layer
  2016-05-03 13:18       ` Kevin Wolf
@ 2016-05-03 13:29         ` Alberto Garcia
  0 siblings, 0 replies; 56+ messages in thread
From: Alberto Garcia @ 2016-05-03 13:29 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, qemu-block, Max Reitz, Eric Blake, Stefan Hajnoczi

On Tue 03 May 2016 03:18:39 PM CEST, Kevin Wolf wrote:
>> > On the other hand, this code assumes that there is only a single
>> > top-level node. This isn't necessarily true any more these days.
>> 
>> Hmm... if you give me an example I can test that scenario.
>
> Simply reference the same node twice:
>
> $ x86_64-softmmu/qemu-system-x86_64 \
> -drive if=none,file=/tmp/backing.qcow2,id=backing \
> -drive file=/tmp/test.qcow2,backing=backing,id=hda \
> -drive file=/tmp/test2.qcow2,backing=backing,id=hdb

You're right, but maybe we need to add additional checks to some of the
existing commands then, because if you run e.g. block-commit on hda
you're gonna corrupt hdb.

Berto

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v9 07/11] block: Add QMP support for streaming to an intermediate layer
  2016-05-03 13:23           ` Kevin Wolf
@ 2016-05-03 13:33             ` Alberto Garcia
  2016-05-03 13:48               ` Kevin Wolf
  0 siblings, 1 reply; 56+ messages in thread
From: Alberto Garcia @ 2016-05-03 13:33 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Max Reitz, qemu-devel, qemu-block, Eric Blake, Stefan Hajnoczi

On Tue 03 May 2016 03:23:24 PM CEST, Kevin Wolf wrote:
>> c) we fix bdrv_reopen() so we can actually run both jobs at the same
>>    time. I'm wondering if pausing all block jobs between
>>    bdrv_reopen_prepare() and bdrv_reopen_commit() would do the
>>    trick. Opinions?
>
> I would have to read up the details of the problem again, but I think
> with bdrv_drained_begin/end() we actually have the right tool now to fix
> it properly. We may need to pull up the drain (bdrv_drain_all() today)
> from bdrv_reopen_multiple() to its caller and just assert it in the
> function itself, but there shouldn't be much more to it than that.

I think that's not enough, see point 2) here:

https://lists.gnu.org/archive/html/qemu-block/2015-12/msg00180.html

  "I've been taking a look at the bdrv_drained_begin/end() API, but as I
   understand it it prevents requests from a different AioContext.
   Since all BDS in the same chain share the same context it does not
   really help here."

Berto

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v9 07/11] block: Add QMP support for streaming to an intermediate layer
  2016-05-03 13:33             ` Alberto Garcia
@ 2016-05-03 13:48               ` Kevin Wolf
  2016-05-03 15:09                 ` Alberto Garcia
       [not found]                 ` <w517fezo0al.fsf@maestria.local.igalia.com>
  0 siblings, 2 replies; 56+ messages in thread
From: Kevin Wolf @ 2016-05-03 13:48 UTC (permalink / raw)
  To: Alberto Garcia
  Cc: Max Reitz, qemu-devel, qemu-block, Eric Blake, Stefan Hajnoczi

On 03.05.2016 at 15:33, Alberto Garcia wrote:
> On Tue 03 May 2016 03:23:24 PM CEST, Kevin Wolf wrote:
> >> c) we fix bdrv_reopen() so we can actually run both jobs at the same
> >>    time. I'm wondering if pausing all block jobs between
> >>    bdrv_reopen_prepare() and bdrv_reopen_commit() would do the
> >>    trick. Opinions?
> >
> > I would have to read up the details of the problem again, but I think
> > with bdrv_drained_begin/end() we actually have the right tool now to fix
> > it properly. We may need to pull up the drain (bdrv_drain_all() today)
> > from bdrv_reopen_multiple() to its caller and just assert it in the
> > function itself, but there shouldn't be much more to it than that.
> 
> I think that's not enough, see point 2) here:
> 
> https://lists.gnu.org/archive/html/qemu-block/2015-12/msg00180.html
> 
>   "I've been taking a look at the bdrv_drained_begin/end() API, but as I
>    understand it it prevents requests from a different AioContext.
>    Since all BDS in the same chain share the same context it does not
>    really help here."

Yes, that's the part I meant with pulling up the calls.

If I understand correctly, the problem is that first bdrv_reopen_queue()
queues a few BDSes for reopen, then bdrv_drain_all() completes all
running requests and can indirectly trigger a graph modification, and
then bdrv_reopen_multiple() uses the queue which doesn't match reality
any more.

The solution to that should be simply changing the order of things:

1. bdrv_drained_begin()
2. bdrv_reopen_queue()
3. bdrv_reopen_multiple()
    * Instead of bdrv_drain_all(), assert that no requests are pending
    * We don't run requests, so we can't complete a block job and
      manipulate the graph any more
4. then bdrv_drained_end()

But you're right that this changed order is really the key and not
bdrv_drained_begin/end(). The latter are just cleaner than the existing
bdrv_drain_all(), but don't actually contribute much to the fix without
dataplane. I must have misremembered its importance.
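
In code, the proposed order would be roughly this (a sketch only,
signatures abbreviated; the assertion is the part that would replace
today's bdrv_drain_all() call inside bdrv_reopen_multiple()):

    bdrv_drained_begin(bs);
    queue = bdrv_reopen_queue(NULL, bs, options, flags);
    ret = bdrv_reopen_multiple(queue, &local_err);
    /* no requests can run in here, so no job completion can
     * modify the graph while the queue is being processed */
    bdrv_drained_end(bs);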

Kevin

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v9 07/11] block: Add QMP support for streaming to an intermediate layer
  2016-05-03 13:48               ` Kevin Wolf
@ 2016-05-03 15:09                 ` Alberto Garcia
       [not found]                 ` <w517fezo0al.fsf@maestria.local.igalia.com>
  1 sibling, 0 replies; 56+ messages in thread
From: Alberto Garcia @ 2016-05-03 15:09 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Max Reitz, qemu-devel, qemu-block, Eric Blake, Stefan Hajnoczi

On Tue 03 May 2016 03:48:47 PM CEST, Kevin Wolf wrote:
> The solution to that should be simply changing the order of things:
>
> 1. bdrv_drained_begin()
> 2. bdrv_reopen_queue()
> 3. bdrv_reopen_multiple()
>     * Instead of bdrv_drain_all(), assert that no requests are pending

No requests are pending in any member of the bs_queue? I don't have the
time to debug it today, but this part doesn't seem to be working.

>     * We don't run requests, so we can't complete a block job and
>       manipulate the graph any more
> 4. then bdrv_drained_end()

Berto

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v9 05/11] block: allow block jobs in any arbitrary node
  2016-04-29 15:00   ` Kevin Wolf
@ 2016-05-06 10:00     ` Alberto Garcia
  2016-05-06 17:54       ` John Snow
  0 siblings, 1 reply; 56+ messages in thread
From: Alberto Garcia @ 2016-05-06 10:00 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-devel, qemu-block, Max Reitz, Eric Blake, Stefan Hajnoczi,
	jcody, jsnow

On Fri 29 Apr 2016 05:00:57 PM CEST, Kevin Wolf wrote:

>> - Block jobs can now be identified by the node name of their
>> BlockDriverState in addition to the device name. Since both device
>> and node names live in the same namespace there's no ambiguity.
>
> Careful, we know this is a part of our API that we have already messed
> up and we don't want to make things worse while adding new things
> before we've cleaned it up.
>
> Let's keep in mind where we are planning to go with block jobs: They
> should become background jobs, not tied to any block device. The close
> connection to a single BDS already doesn't make a lot of sense today
> because most block jobs involve at least two BDSes.
>
> In the final state, we will have a job ID that uniquely identifies the
> job, and each command that starts a job will take an ID from the
> client.  For compatibility, we'll use the block device name as the job
> ID when using old commands that don't get an explicit ID yet.
>
> In the existing qemu version, you can't start two block jobs on the
> same device, and in future versions, you're supposed to specify an ID
> each time. This is why the default can always be assumed to work
> without conflicts. If in newer versions, the client mixes both ways
> (which it shouldn't do), starting a new block job may fail because the
> device name is already in use as an ID for another job.
>
> Now we can probably make the same argument for node names, so we can
> extend the interface and still keep it compatible.
>
> Where we need to be careful is that with device names and node names,
> we have potentially two names describing the same BDS and therefore
> the same job. This must not happen, because we won't be able to
> represent that in the generic background job API. Any job has just a
> single ID there.

Ok, what can be done in this case is to keep the name that the client
used when the job was created.

block-stream virtio0  <-- job id is 'virtio0'
block-stream node5    <-- job id is 'node5'

In this case it wouldn't matter if 'node5' is the one attached to
'virtio0'.

>> @@ -3694,18 +3696,18 @@ static BlockJob *find_block_job(const char *device, AioContext **aio_context,
>>  
>>      *aio_context = NULL;
>>  
>> -    blk = blk_by_name(device);
>> -    if (!blk) {
>> +    bs = bdrv_lookup_bs(device_or_node, device_or_node, errp);
>
> Specifically, this one is bad. It allows two different ways to specify
> the same job.

I think we can simply iterate over all block jobs (now that we have a
function to do that) and return the one with the matching ID.
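
Something along these lines (a sketch only; it assumes the job keeps
its ID in job->id, which today is just the device name used for the
events):

    static BlockJob *find_block_job_by_id(const char *id)
    {
        BlockJob *job;

        for (job = block_job_next(NULL); job; job = block_job_next(job)) {
            if (job->id && !strcmp(job->id, id)) {
                return job;
            }
        }
        return NULL;
    }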

Berto

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v9 05/11] block: allow block jobs in any arbitrary node
  2016-05-06 10:00     ` Alberto Garcia
@ 2016-05-06 17:54       ` John Snow
  2016-05-09  7:06         ` Kevin Wolf
  2016-05-09 11:59         ` Alberto Garcia
  0 siblings, 2 replies; 56+ messages in thread
From: John Snow @ 2016-05-06 17:54 UTC (permalink / raw)
  To: Alberto Garcia, Kevin Wolf
  Cc: qemu-block, jcody, qemu-devel, Max Reitz, Stefan Hajnoczi



On 05/06/2016 06:00 AM, Alberto Garcia wrote:
> On Fri 29 Apr 2016 05:00:57 PM CEST, Kevin Wolf wrote:
> 
>>> - Block jobs can now be identified by the node name of their
>>> BlockDriverState in addition to the device name. Since both device
>>> and node names live in the same namespace there's no ambiguity.
>>
>> Careful, we know this is a part of our API that we have already messed
>> up and we don't want to make things worse while adding new things
>> before we've cleaned it up.
>>
>> Let's keep in mind where we are planning to go with block jobs: They
>> should become background jobs, not tied to any block device. The close
>> connection to a single BDS already doesn't make a lot of sense today
>> because most block jobs involve at least two BDSes.
>>
>> In the final state, we will have a job ID that uniquely identifies the
>> job, and each command that starts a job will take an ID from the
>> client.  For compatibility, we'll use the block device name as the job
>> ID when using old commands that don't get an explicit ID yet.
>>
>> In the existing qemu version, you can't start two block jobs on the
>> same device, and in future versions, you're supposed to specify an ID
>> each time. This is why the default can always be assumed to work
>> without conflicts. If in newer versions, the client mixes both ways
>> (which it shouldn't do), starting a new block job may fail because the
>> device name is already in use as an ID for another job.
>>
>> Now we can probably make the same argument for node names, so we can
>> extend the interface and still keep it compatible.
>>
>> Where we need to be careful is that with device names and node names,
>> we have potentially two names describing the same BDS and therefore
>> the same job. This must not happen, because we won't be able to
>> represent that in the generic background job API. Any job has just a
>> single ID there.
> 
> Ok, what can be done in this case is to keep the name that the client
> used when the job was created.
> 
> block-stream virtio0  <-- job id is 'virtio0'
> block-stream node5    <-- job id is 'node5'
> 
> In this case it wouldn't matter if 'node5' is the one attached to
> 'virtio0'.
> 
>>> @@ -3694,18 +3696,18 @@ static BlockJob *find_block_job(const char *device, AioContext **aio_context,
>>>  
>>>      *aio_context = NULL;
>>>  
>>> -    blk = blk_by_name(device);
>>> -    if (!blk) {
>>> +    bs = bdrv_lookup_bs(device_or_node, device_or_node, errp);
>>
>> Specifically, this one is bad. It allows two different ways to specify
>> the same job.
> 
> I think we can simply iterate over all block jobs (now that we have a
> function to do that) and return the one with the matching ID.
> 
> Berto
> 

I think it's time to add a proper ID field to Block Jobs. Currently, we
just use the node-name as the ID, but as you are aware this is a poor
mechanism for fetching the job.

I'd really like to avoid continuing to use any kind of node-name or
block/device-name for job IDs, and instead start using either a
user-provided value or a QEMU auto-generated one.

Then, for legacy support, we'd have an "id" field (that's the new proper
globally unique ID) and an old legacy "node" field or similar.

Then, for events/etc we can report two things back:

(1) Legacy: the name of the node we were created under. This is how it
works currently, and it should keep libvirt happy.
(2) New: the proper, real ID that all management utilities should begin
using in the future.

I've got some patches that work towards this angle, but they're
currently intermingled with a bunch of experimental job re-factoring
stuff. If you give me a few days I can submit a proposal series to re-do
the job ID situation.

--js

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v9 05/11] block: allow block jobs in any arbitrary node
  2016-05-06 17:54       ` John Snow
@ 2016-05-09  7:06         ` Kevin Wolf
  2016-05-09 11:59         ` Alberto Garcia
  1 sibling, 0 replies; 56+ messages in thread
From: Kevin Wolf @ 2016-05-09  7:06 UTC (permalink / raw)
  To: John Snow
  Cc: Alberto Garcia, qemu-block, jcody, qemu-devel, Max Reitz,
	Stefan Hajnoczi

On 06.05.2016 at 19:54, John Snow wrote:
> 
> 
> On 05/06/2016 06:00 AM, Alberto Garcia wrote:
> > On Fri 29 Apr 2016 05:00:57 PM CEST, Kevin Wolf wrote:
> > 
> >>> - Block jobs can now be identified by the node name of their
> >>> BlockDriverState in addition to the device name. Since both device
> >>> and node names live in the same namespace there's no ambiguity.
> >>
> >> Careful, we know this is a part of our API that we have already messed
> >> up and we don't want to make things worse while adding new things
> >> before we've cleaned it up.
> >>
> >> Let's keep in mind where we are planning to go with block jobs: They
> >> should become background jobs, not tied to any block device. The close
> >> connection to a single BDS already doesn't make a lot of sense today
> >> because most block jobs involve at least two BDSes.
> >>
> >> In the final state, we will have a job ID that uniquely identifies the
> >> job, and each command that starts a job will take an ID from the
> >> client.  For compatibility, we'll use the block device name as the job
> >> ID when using old commands that don't get an explicit ID yet.
> >>
> >> In the existing qemu version, you can't start two block jobs on the
> >> same device, and in future versions, you're supposed to specify an ID
> >> each time. This is why the default can always be assumed to work
> >> without conflicts. If in newer versions, the client mixes both ways
> >> (which it shouldn't do), starting a new block job may fail because the
> >> device name is already in use as an ID for another job.
> >>
> >> Now we can probably make the same argument for node names, so we can
> >> extend the interface and still keep it compatible.
> >>
> >> Where we need to be careful is that with device names and node names,
> >> we have potentially two names describing the same BDS and therefore
> >> the same job. This must not happen, because we won't be able to
> >> represent that in the generic background job API. Any job has just a
> >> single ID there.
> > 
> > Ok, what can be done in this case is to keep the name that the client
> > used when the job was created.
> > 
> > block-stream virtio0  <-- job id is 'virtio0'
> > block-stream node5    <-- job id is 'node5'
> > 
> > In this case it wouldn't matter if 'node5' is the one attached to
> > 'virtio0'.
> > 
> >>> @@ -3694,18 +3696,18 @@ static BlockJob *find_block_job(const char *device, AioContext **aio_context,
> >>>  
> >>>      *aio_context = NULL;
> >>>  
> >>> -    blk = blk_by_name(device);
> >>> -    if (!blk) {
> >>> +    bs = bdrv_lookup_bs(device_or_node, device_or_node, errp);
> >>
> >> Specifically, this one is bad. It allows two different ways to specify
> >> the same job.
> > 
> > I think we can simply iterate over all block jobs (now that we have a
> > function to do that) and return the one with the matching ID.
> > 
> > Berto
> > 
> 
> I think it's time to add a proper ID field to Block Jobs. Currently, we
> just use the node-name as the ID, but as you are aware this is a poor
> mechanism for fetching the job.
> 
> I'd really like to avoid continuing to use any kind of node-name or
> block/device-name for job IDs, and instead start using either a
> user-provided value or a QEMU auto-generated one.

We already have a job->id, we just need to start using it not only for
events, but also for finding block jobs. The existing auto-generation
mechanism is a very simple one (just copy the device name), but it's
effective and produces unique IDs in our current environment where only
a single block job per device is allowed.

> Then, for legacy support, we'd have an "id" field (that's the new proper
> globally unique ID) and an old legacy "node" field or similar.
> 
> Then, for events/etc we can report two things back:
> 
> (1) Legacy: the name of the node we were created under. This is how it
> works currently, and it should keep libvirt happy.
> (2) New: the proper, real ID that all management utilities should begin
> using in the future.

I don't think that's actually necessary, legacy and new can be the same
thing. What would remain from the legacy model is the field name
"device" both in commands and returned objects, which would then be a
misnomer. But I'm not sure if a bad name is a good reason for
duplicating a field.

> I've got some patches that work towards this angle, but they're
> currently intermingled with a bunch of experimental job re-factoring
> stuff. If you give me a few days I can submit a proposal series to re-do
> the job ID situation.

Sounds good.

Kevin

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v9 05/11] block: allow block jobs in any arbitrary node
  2016-05-06 17:54       ` John Snow
  2016-05-09  7:06         ` Kevin Wolf
@ 2016-05-09 11:59         ` Alberto Garcia
  1 sibling, 0 replies; 56+ messages in thread
From: Alberto Garcia @ 2016-05-09 11:59 UTC (permalink / raw)
  To: John Snow, Kevin Wolf
  Cc: qemu-block, jcody, qemu-devel, Max Reitz, Stefan Hajnoczi

On Fri 06 May 2016 07:54:36 PM CEST, John Snow wrote:
>>>> @@ -3694,18 +3696,18 @@ static BlockJob *find_block_job(const char *device, AioContext **aio_context,
>>>>  
>>>>      *aio_context = NULL;
>>>>  
>>>> -    blk = blk_by_name(device);
>>>> -    if (!blk) {
>>>> +    bs = bdrv_lookup_bs(device_or_node, device_or_node, errp);
>>>
>>> Specifically, this one is bad. It allows two different ways to specify
>>> the same job.
>> 
>> I think we can simply iterate over all block jobs (now that we have a
>> function to do that) and return the one with the matching ID.
>
> I think it's time to add a proper ID field to Block Jobs. Currently,
> we just use the node-name as the ID, but as you are aware this is a
> poor mechanism for fetching the job.
>
> I'd really like to avoid continuing to use any kind of node-name or
> block/device-name for job IDs, and instead start using either a
> user-provided value or a QEMU auto-generated one.
>
> Then, for legacy support, we'd have an "id" field (that's the new
> proper globally unique ID) and an old legacy "node" field or similar.
>
> Then, for events/etc we can report two things back:
>
> (1) Legacy: the name of the node we were created under. This is how
> it works currently, and it should keep libvirt happy.
> (2) New: the proper, real ID that all management utilities should
> begin using in the future.

Do we really need to have those two? Isn't it possible to do it with a
single ID?

- New commands that create block jobs have a parameter to specify the
  job ID (if we don't choose to generate them automatically).

- Existing commands (block-stream, block-commit, etc) can also have a
  new (optional) parameter to set the job ID. If omitted, they use the
  device name (exactly as they do now).

Berto

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v9 07/11] block: Add QMP support for streaming to an intermediate layer
       [not found]                 ` <w517fezo0al.fsf@maestria.local.igalia.com>
@ 2016-05-12 15:04                   ` Kevin Wolf
       [not found]                     ` <w514ma3nwbl.fsf@maestria.local.igalia.com>
  0 siblings, 1 reply; 56+ messages in thread
From: Kevin Wolf @ 2016-05-12 15:04 UTC (permalink / raw)
  To: Alberto Garcia
  Cc: Max Reitz, qemu-devel, qemu-block, Eric Blake, Stefan Hajnoczi

On 12.05.2016 at 15:47, Alberto Garcia wrote:
> On Tue 03 May 2016 03:48:47 PM CEST, Kevin Wolf wrote:
> > On 03.05.2016 at 15:33, Alberto Garcia wrote:
> >> On Tue 03 May 2016 03:23:24 PM CEST, Kevin Wolf wrote:
> >> >> c) we fix bdrv_reopen() so we can actually run both jobs at the same
> >> >>    time. I'm wondering if pausing all block jobs between
> >> >>    bdrv_reopen_prepare() and bdrv_reopen_commit() would do the
> >> >>    trick. Opinions?
> >> >
> >> > I would have to read up the details of the problem again, but I think
> >> > with bdrv_drained_begin/end() we actually have the right tool now to fix
> >> > it properly. We may need to pull up the drain (bdrv_drain_all() today)
> >> > from bdrv_reopen_multiple() to its caller and just assert it in the
> >> > function itself, but there shouldn't be much more to it than that.
> >> 
> >> I think that's not enough, see point 2) here:
> >> 
> >> https://lists.gnu.org/archive/html/qemu-block/2015-12/msg00180.html
> >> 
> >>   "I've been taking a look at the bdrv_drained_begin/end() API, but as I
> >>    understand it it prevents requests from a different AioContext.
> >>    Since all BDS in the same chain share the same context it does not
> >>    really help here."
> >
> > Yes, that's the part I meant with pulling up the calls.
> >
> > If I understand correctly, the problem is that first bdrv_reopen_queue()
> > queues a few BDSes for reopen, then bdrv_drain_all() completes all
> > running requests and can indirectly trigger a graph modification, and
> > then bdrv_reopen_multiple() uses the queue which doesn't match reality
> > any more.
> >
> > The solution to that should be simply changing the order of things:
> >
> > 1. bdrv_drained_begin()
> > 2. bdrv_reopen_queue()
> > 3. bdrv_reopen_multiple()
> >     * Instead of bdrv_drain_all(), assert that no requests are pending
> >     * We don't run requests, so we can't complete a block job and
> >       manipulate the graph any more
> > 4. then bdrv_drained_end()
> 
> This doesn't work. Here's what happens:
> 
> 1) Block job (a) starts (block-stream).
> 
> 2) Block job (b) starts (block-stream, or block-commit).
> 
> 3) job (b) calls bdrv_reopen() and does the drain call.
> 
> 4) job (b) creates reopen_queue and calls bdrv_reopen_multiple().
>    There are no pending requests at this point, but job (a) is sleeping.
> 
> 5) bdrv_reopen_multiple() iterates over reopen_queue and calls
>    bdrv_reopen_prepare() -> bdrv_flush() -> bdrv_co_flush() ->
>    qemu_coroutine_yield().

I think between here and the next step is what I don't understand.

bdrv_reopen_multiple() is not called in coroutine context, right? All
block jobs use block_job_defer_to_main_loop() before they call
bdrv_reopen(), as far as I can see. So bdrv_flush() shouldn't take the
shortcut, but use a nested event loop.

What is it that calls into job (a) from that event loop? It can't be a
request completion because we already drained all requests. Is it a
timer?

> 6) job (a) resumes, finishes the job and removes nodes from the graph.
> 
> 7) job (b) continues with bdrv_reopen_multiple() but now reopen_queue
>    contains invalid pointers.

I don't fully understand the problem yet, but as a shot in the dark,
would pausing block jobs in bdrv_drained_begin() help?

Kevin

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v9 07/11] block: Add QMP support for streaming to an intermediate layer
       [not found]                     ` <w514ma3nwbl.fsf@maestria.local.igalia.com>
@ 2016-05-12 15:28                       ` Kevin Wolf
  2016-05-17 14:26                         ` Alberto Garcia
  0 siblings, 1 reply; 56+ messages in thread
From: Kevin Wolf @ 2016-05-12 15:28 UTC (permalink / raw)
  To: Alberto Garcia
  Cc: Max Reitz, qemu-devel, qemu-block, Eric Blake, Stefan Hajnoczi

On 12.05.2016 at 17:13, Alberto Garcia wrote:
> On Thu 12 May 2016 05:04:51 PM CEST, Kevin Wolf wrote:
> > On 12.05.2016 at 15:47, Alberto Garcia wrote:
> >> On Tue 03 May 2016 03:48:47 PM CEST, Kevin Wolf wrote:
> >> > On 03.05.2016 at 15:33, Alberto Garcia wrote:
> >> >> On Tue 03 May 2016 03:23:24 PM CEST, Kevin Wolf wrote:
> >> >> >> c) we fix bdrv_reopen() so we can actually run both jobs at the same
> >> >> >>    time. I'm wondering if pausing all block jobs between
> >> >> >>    bdrv_reopen_prepare() and bdrv_reopen_commit() would do the
> >> >> >>    trick. Opinions?
> >> >> >
> >> >> > I would have to read up the details of the problem again, but I think
> >> >> > with bdrv_drained_begin/end() we actually have the right tool now to fix
> >> >> > it properly. We may need to pull up the drain (bdrv_drain_all() today)
> >> >> > from bdrv_reopen_multiple() to its caller and just assert it in the
> >> >> > function itself, but there shouldn't be much more to it than that.
> >> >> 
> >> >> I think that's not enough, see point 2) here:
> >> >> 
> >> >> https://lists.gnu.org/archive/html/qemu-block/2015-12/msg00180.html
> >> >> 
> >> >>   "I've been taking a look at the bdrv_drained_begin/end() API, but as I
> >> >>    understand it it prevents requests from a different AioContext.
> >> >>    Since all BDS in the same chain share the same context it does not
> >> >>    really help here."
> >> >
> >> > Yes, that's the part I meant with pulling up the calls.
> >> >
> >> > If I understand correctly, the problem is that first bdrv_reopen_queue()
> >> > queues a few BDSes for reopen, then bdrv_drain_all() completes all
> >> > running requests and can indirectly trigger a graph modification, and
> >> > then bdrv_reopen_multiple() uses the queue which doesn't match reality
> >> > any more.
> >> >
> >> > The solution to that should be simply changing the order of things:
> >> >
> >> > 1. bdrv_drained_begin()
> >> > 2. bdrv_reopen_queue()
> >> > 3. bdrv_reopen_multiple()
> >> >     * Instead of bdrv_drain_all(), assert that no requests are pending
> >> >     * We don't run requests, so we can't complete a block job and
> >> >       manipulate the graph any more
> >> > 4. then bdrv_drained_end()
> >> 
> >> This doesn't work. Here's what happens:
> >> 
> >> 1) Block job (a) starts (block-stream).
> >> 
> >> 2) Block job (b) starts (block-stream, or block-commit).
> >> 
> >> 3) job (b) calls bdrv_reopen() and does the drain call.
> >> 
> >> 4) job (b) creates reopen_queue and calls bdrv_reopen_multiple().
> >>    There are no pending requests at this point, but job (a) is sleeping.
> >> 
> >> 5) bdrv_reopen_multiple() iterates over reopen_queue and calls
> >>    bdrv_reopen_prepare() -> bdrv_flush() -> bdrv_co_flush() ->
> >>    qemu_coroutine_yield().
> >
> > I think between here and the next step is what I don't understand.
> >
> > bdrv_reopen_multiple() is not called in coroutine context, right? All
> > block jobs use block_job_defer_to_main_loop() before they call
> > bdrv_reopen(), as far as I can see. So bdrv_flush() shouldn't take the
> > shortcut, but use a nested event loop.
> 
> When bdrv_flush() is not called in coroutine context it does
> qemu_coroutine_create() + qemu_coroutine_enter().

Right, but if the coroutine yields, we jump back to the caller, which
looks like this:

    co = qemu_coroutine_create(bdrv_flush_co_entry);
    qemu_coroutine_enter(co, &rwco);
    while (rwco.ret == NOT_DONE) {
        aio_poll(aio_context, true);
    }

So this loops until the flush has completed. The only way I can see how
something else (job (a)) can run is if aio_poll() calls it.

> > What is it that calls into job (a) from that event loop? It can't be a
> > request completion because we already drained all requests. Is it a
> > timer?
> 
> If I didn't get it wrong it's this bit in bdrv_co_flush()
> [...]

That's the place that yields from (b), but it's not the place that calls
into (a).

> >> 6) job (a) resumes, finishes the job and removes nodes from the graph.
> >> 
> >> 7) job (b) continues with bdrv_reopen_multiple() but now reopen_queue
> >>    contains invalid pointers.
> >
> > I don't fully understand the problem yet, but as a shot in the dark,
> > would pausing block jobs in bdrv_drained_begin() help?
> 
> Yeah, my impression is that pausing all jobs during bdrv_reopen() should
> be enough.

If you base your patches on top of my queue (which I think you already
do for the most part), the nicest way to implement this would
probably be that BlockBackends give their users a callback for
.drained_begin/end and the jobs implement that as pausing themselves.

We could, of course, directly pause block jobs in bdrv_drained_begin(),
but that would feel a bit hackish. So maybe do that for a quick attempt
whether it helps, and if it does, we can write the real thing then.
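
The shape could be something like this (hypothetical; neither the
callbacks nor their wiring exist yet in this form):

    typedef struct BlockDevOps {
        /* ... existing callbacks ... */
        void (*drained_begin)(void *opaque);
        void (*drained_end)(void *opaque);
    } BlockDevOps;

    /* A block job would implement them by pausing itself */
    static void block_job_drained_begin(void *opaque)
    {
        BlockJob *job = opaque;
        block_job_pause(job);
    }

    static void block_job_drained_end(void *opaque)
    {
        BlockJob *job = opaque;
        block_job_resume(job);
    }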

Kevin

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v9 07/11] block: Add QMP support for streaming to an intermediate layer
  2016-05-12 15:28                       ` Kevin Wolf
@ 2016-05-17 14:26                         ` Alberto Garcia
  2016-05-17 14:47                           ` Kevin Wolf
  0 siblings, 1 reply; 56+ messages in thread
From: Alberto Garcia @ 2016-05-17 14:26 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Max Reitz, qemu-devel, qemu-block, Eric Blake, Stefan Hajnoczi

On Thu 12 May 2016 05:28:34 PM CEST, Kevin Wolf wrote:
> On 12.05.2016 at 17:13, Alberto Garcia wrote:
>> On Thu 12 May 2016 05:04:51 PM CEST, Kevin Wolf wrote:
> >> > On 12.05.2016 at 15:47, Alberto Garcia wrote:
>> >> On Tue 03 May 2016 03:48:47 PM CEST, Kevin Wolf wrote:
> >> >> > On 03.05.2016 at 15:33, Alberto Garcia wrote:
>> >> >> On Tue 03 May 2016 03:23:24 PM CEST, Kevin Wolf wrote:
>> >> >> >> c) we fix bdrv_reopen() so we can actually run both jobs at the same
>> >> >> >>    time. I'm wondering if pausing all block jobs between
>> >> >> >>    bdrv_reopen_prepare() and bdrv_reopen_commit() would do the
>> >> >> >>    trick. Opinions?
>> >> >> >
>> >> >> > I would have to read up the details of the problem again, but I think
>> >> >> > with bdrv_drained_begin/end() we actually have the right tool now to fix
>> >> >> > it properly. We may need to pull up the drain (bdrv_drain_all() today)
>> >> >> > from bdrv_reopen_multiple() to its caller and just assert it in the
>> >> >> > function itself, but there shouldn't be much more to it than that.
>> >> >> 
>> >> >> I think that's not enough, see point 2) here:
>> >> >> 
>> >> >> https://lists.gnu.org/archive/html/qemu-block/2015-12/msg00180.html
>> >> >> 
>> >> >>   "I've been taking a look at the bdrv_drained_begin/end() API, but as I
>> >> >>    understand it it prevents requests from a different AioContext.
>> >> >>    Since all BDS in the same chain share the same context it does not
>> >> >>    really help here."
>> >> >
>> >> > Yes, that's the part I meant with pulling up the calls.
>> >> >
>> >> > If I understand correctly, the problem is that first bdrv_reopen_queue()
>> >> > queues a few BDSes for reopen, then bdrv_drain_all() completes all
>> >> > running requests and can indirectly trigger a graph modification, and
>> >> > then bdrv_reopen_multiple() uses the queue which doesn't match reality
>> >> > any more.
>> >> >
>> >> > The solution to that should be simply changing the order of things:
>> >> >
>> >> > 1. bdrv_drained_begin()
>> >> > 2. bdrv_reopen_queue()
>> >> > 3. bdrv_reopen_multiple()
>> >> >     * Instead of bdrv_drain_all(), assert that no requests are pending
>> >> >     * We don't run requests, so we can't complete a block job and
>> >> >       manipulate the graph any more
>> >> > 4. then bdrv_drained_end()
>> >> 
>> >> This doesn't work. Here's what happens:
>> >> 
>> >> 1) Block job (a) starts (block-stream).
>> >> 
>> >> 2) Block job (b) starts (block-stream, or block-commit).
>> >> 
>> >> 3) job (b) calls bdrv_reopen() and does the drain call.
>> >> 
>> >> 4) job (b) creates reopen_queue and calls bdrv_reopen_multiple().
>> >>    There are no pending requests at this point, but job (a) is sleeping.
>> >> 
>> >> 5) bdrv_reopen_multiple() iterates over reopen_queue and calls
>> >>    bdrv_reopen_prepare() -> bdrv_flush() -> bdrv_co_flush() ->
>> >>    qemu_coroutine_yield().
>> >
>> > I think between here and the next step is what I don't understand.
>> >
>> > bdrv_reopen_multiple() is not called in coroutine context, right? All
>> > block jobs use block_job_defer_to_main_loop() before they call
>> > bdrv_reopen(), as far as I can see. So bdrv_flush() shouldn't take the
>> > shortcut, but use a nested event loop.
>> 
>> When bdrv_flush() is not called in coroutine context it does
>> qemu_coroutine_create() + qemu_coroutine_enter().
>
> Right, but if the coroutine yields, we jump back to the caller, which
> looks like this:
>
>     co = qemu_coroutine_create(bdrv_flush_co_entry);
>     qemu_coroutine_enter(co, &rwco);
>     while (rwco.ret == NOT_DONE) {
>         aio_poll(aio_context, true);
>     }
>
> So this loops until the flush has completed. The only way I can see how
> something else (job (a)) can run is if aio_poll() calls it.

If I'm not wrong, that can happen when job (a) is sleeping.

If job (a) is launched with a rate limit, then the bdrv_drain() call
will not drain it completely but return when the block job hits the I/O
limit -> block_job_sleep_ns() -> co_aio_sleep_ns().

>> > I don't fully understand the problem yet, but as a shot in the
>> > dark, would pausing block jobs in bdrv_drained_begin() help?
>> 
>> Yeah, my impression is that pausing all jobs during bdrv_reopen()
>> should be enough.
>
> If you base your patches on top of my queue (which I think you already
> do for the most part), the nicest way to implement this would
> probably be that BlockBackends give their users a callback for
> .drained_begin/end and the jobs implement that as pausing themselves.
>
> We could, of course, directly pause block jobs in
> bdrv_drained_begin(), but that would feel a bit hackish. So maybe do
> that for a quick attempt whether it helps, and if it does, we can
> write the real thing then.

I'll try that.

On a related note, bdrv_drain_all() already pauses all block jobs. The
problem is that when it ends it resumes them, so it can happen that
there are pending I/O requests again before bdrv_drain_all() returns.

That sounds like a problem to me, or am I overlooking anything?

Berto

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v9 07/11] block: Add QMP support for streaming to an intermediate layer
  2016-05-17 14:26                         ` Alberto Garcia
@ 2016-05-17 14:47                           ` Kevin Wolf
  2016-05-17 14:54                             ` Alberto Garcia
  0 siblings, 1 reply; 56+ messages in thread
From: Kevin Wolf @ 2016-05-17 14:47 UTC (permalink / raw)
  To: Alberto Garcia
  Cc: Max Reitz, qemu-devel, qemu-block, Eric Blake, Stefan Hajnoczi

On 17.05.2016 at 16:26, Alberto Garcia wrote:
> On Thu 12 May 2016 05:28:34 PM CEST, Kevin Wolf wrote:
> > On 12.05.2016 at 17:13, Alberto Garcia wrote:
> >> On Thu 12 May 2016 05:04:51 PM CEST, Kevin Wolf wrote:
> >> > On 12.05.2016 at 15:47, Alberto Garcia wrote:
> >> >> On Tue 03 May 2016 03:48:47 PM CEST, Kevin Wolf wrote:
> >> >> > On 03.05.2016 at 15:33, Alberto Garcia wrote:
> >> >> >> On Tue 03 May 2016 03:23:24 PM CEST, Kevin Wolf wrote:
> >> >> >> >> c) we fix bdrv_reopen() so we can actually run both jobs at the same
> >> >> >> >>    time. I'm wondering if pausing all block jobs between
> >> >> >> >>    bdrv_reopen_prepare() and bdrv_reopen_commit() would do the
> >> >> >> >>    trick. Opinions?
> >> >> >> >
> >> >> >> > I would have to read up the details of the problem again, but I think
> >> >> >> > with bdrv_drained_begin/end() we actually have the right tool now to fix
> >> >> >> > it properly. We may need to pull up the drain (bdrv_drain_all() today)
> >> >> >> > from bdrv_reopen_multiple() to its caller and just assert it in the
> >> >> >> > function itself, but there shouldn't be much more to it than that.
> >> >> >> 
> >> >> >> I think that's not enough, see point 2) here:
> >> >> >> 
> >> >> >> https://lists.gnu.org/archive/html/qemu-block/2015-12/msg00180.html
> >> >> >> 
> >> >> >>   "I've been taking a look at the bdrv_drained_begin/end() API, but as I
> >> >> >>    understand it it prevents requests from a different AioContext.
> >> >> >>    Since all BDS in the same chain share the same context it does not
> >> >> >>    really help here."
> >> >> >
> >> >> > Yes, that's the part I meant with pulling up the calls.
> >> >> >
> >> >> > If I understand correctly, the problem is that first bdrv_reopen_queue()
> >> >> > queues a few BDSes for reopen, then bdrv_drain_all() completes all
> >> >> > running requests and can indirectly trigger a graph modification, and
> >> >> > then bdrv_reopen_multiple() uses the queue which doesn't match reality
> >> >> > any more.
> >> >> >
> >> >> > The solution to that should be simply changing the order of things:
> >> >> >
> >> >> > 1. bdrv_drained_begin()
> >> >> > 2. bdrv_reopen_queue()
> >> >> > 3. bdrv_reopen_multiple()
> >> >> >     * Instead of bdrv_drain_all(), assert that no requests are pending
> >> >> >     * We don't run requests, so we can't complete a block job and
> >> >> >       manipulate the graph any more
> >> >> > 4. then bdrv_drained_end()
> >> >> 
> >> >> This doesn't work. Here's what happens:
> >> >> 
> >> >> 1) Block job (a) starts (block-stream).
> >> >> 
> >> >> 2) Block job (b) starts (block-stream, or block-commit).
> >> >> 
> >> >> 3) job (b) calls bdrv_reopen() and does the drain call.
> >> >> 
> >> >> 4) job (b) creates reopen_queue and calls bdrv_reopen_multiple().
> >> >>    There are no pending requests at this point, but job (a) is sleeping.
> >> >> 
> >> >> 5) bdrv_reopen_multiple() iterates over reopen_queue and calls
> >> >>    bdrv_reopen_prepare() -> bdrv_flush() -> bdrv_co_flush() ->
> >> >>    qemu_coroutine_yield().
> >> >
> >> > I think between here and the next step is what I don't understand.
> >> >
> >> > bdrv_reopen_multiple() is not called in coroutine context, right? All
> >> > block jobs use block_job_defer_to_main_loop() before they call
> >> > bdrv_reopen(), as far as I can see. So bdrv_flush() shouldn't take the
> >> > shortcut, but use a nested event loop.
> >> 
> >> When bdrv_flush() is not called in coroutine context it does
> >> qemu_coroutine_create() + qemu_coroutine_enter().
> >
> > Right, but if the coroutine yields, we jump back to the caller, which
> > looks like this:
> >
> >     co = qemu_coroutine_create(bdrv_flush_co_entry);
> >     qemu_coroutine_enter(co, &rwco);
> >     while (rwco.ret == NOT_DONE) {
> >         aio_poll(aio_context, true);   /* also dispatches timers and BHs */
> >     }
> >
> > So this loops until the flush has completed. The only way I can see for
> > something else (job (a)) to run is if aio_poll() calls it.
> 
> Unless I'm mistaken, that can happen if job (a) is sleeping.
> 
> If job (a) is launched with a rate limit, then the bdrv_drain() call
> will not drain it completely but will return when the block job hits the I/O
> limit -> block_job_sleep_ns() -> co_aio_sleep_ns().

So timers. Currently, bdrv_drained_begin/end disables the FD handlers,
but not timers. Not sure if that's easily fixable in a generic way,
though, so maybe we need to disable timers one by one.

Pausing a block job does not actually cause the timer to be cancelled.
Maybe we need to change block_job_sleep_ns() so that the function
returns only when the job isn't paused any more. Maybe something like:

    job->busy = false;
    if (!block_job_is_paused(job)) {
        co_aio_sleep_ns(bdrv_get_aio_context(job->bs), type, ns);
        /* The job could be paused now */
    }
    if (block_job_is_paused(job)) {
        qemu_coroutine_yield();
    }
    job->busy = true;

Then it should be safe to assume that if block jobs are paused (even if
they were sleeping), they won't issue new requests any more until they
are resumed.

> >> > I don't fully understand the problem yet, but as a shot in the
> >> > dark, would pausing block jobs in bdrv_drained_begin() help?
> >> 
> >> Yeah, my impression is that pausing all jobs during bdrv_reopen()
> >> should be enough.
> >
> > If you base your patches on top of my queue (which I think you already
> > do for the most part), the nicest way to implement this would
> > probably be that BlockBackends give their users a callback for
> > .drained_begin/end and the jobs implement that as pausing themselves.
> >
> > We could, of course, directly pause block jobs in
> > bdrv_drained_begin(), but that would feel a bit hackish. So maybe do
> > that as a quick attempt to see whether it helps, and if it does, we
> > can write the real thing then.
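
A sketch of what such a callback interface might look like (hypothetical
names; no such API exists at this point in the discussion):

    typedef struct BlockBackendClientOps {
        /* Called when the BlockBackend enters a drained section; a block
         * job would pause itself here (e.g. via block_job_pause()). */
        void (*drained_begin)(void *opaque);
        /* Called when the drained section ends; the job would resume
         * (e.g. via block_job_resume()). */
        void (*drained_end)(void *opaque);
    } BlockBackendClientOps;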
> 
> I'll try that.
> 
> On a related note, bdrv_drain_all() already pauses all block jobs. The
> problem is that when it finishes it resumes them, so there can be
> pending I/O requests again before bdrv_drain_all() returns.
> 
> That sounds like a problem to me, or am I overlooking anything?

Essentially every bdrv_drain(_all) call is a problem; it should always
be a bdrv_drained_begin/end section. The only exception is where you
want to drain only requests from a single source that you know doesn't
send new requests (e.g. draining guest requests in vm_stop).
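
As an illustration of that pattern (a sketch using the
bdrv_drained_begin/end API discussed above; the work in the middle is a
placeholder):

    bdrv_drained_begin(bs);   /* no new requests can be submitted from here */
    /* ... reopen images or otherwise modify the backing chain safely ... */
    bdrv_drained_end(bs);     /* request sources may resume */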

Kevin


* Re: [Qemu-devel] [PATCH v9 07/11] block: Add QMP support for streaming to an intermediate layer
  2016-05-17 14:47                           ` Kevin Wolf
@ 2016-05-17 14:54                             ` Alberto Garcia
  0 siblings, 0 replies; 56+ messages in thread
From: Alberto Garcia @ 2016-05-17 14:54 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Max Reitz, qemu-devel, qemu-block, Eric Blake, Stefan Hajnoczi

On Tue 17 May 2016 04:47:57 PM CEST, Kevin Wolf wrote:
>> > Right, but if the coroutine yields, we jump back to the caller, which
>> > looks like this:
>> >
>> >     co = qemu_coroutine_create(bdrv_flush_co_entry);
>> >     qemu_coroutine_enter(co, &rwco);
>> >     while (rwco.ret == NOT_DONE) {
>> >         aio_poll(aio_context, true);   /* also dispatches timers and BHs */
>> >     }
>> >
>> > So this loops until the flush has completed. The only way I can see
>> > for something else (job (a)) to run is if aio_poll() calls it.
>> 
>> Unless I'm mistaken, that can happen if job (a) is sleeping.
>> 
>> If job (a) is launched with a rate limit, then the bdrv_drain() call
>> will not drain it completely but will return when the block job hits the
>> I/O limit -> block_job_sleep_ns() -> co_aio_sleep_ns().
>
> So timers. Currently, bdrv_drained_begin/end disables the FD handlers,
> but not timers. Not sure if that's easily fixable in a generic way,
> though, so maybe we need to disable timers one by one.
>
> Pausing a block job does not actually cause the timer to be cancelled.
> Maybe we need to change block_job_sleep_ns() so that the function
> returns only when the job isn't paused any more. Maybe something like:
>
>     job->busy = false;
>     if (!block_job_is_paused(job)) {
>         co_aio_sleep_ns(bdrv_get_aio_context(job->bs), type, ns);
>         /* The job could be paused now */
>     }
>     if (block_job_is_paused(job)) {
>         qemu_coroutine_yield();
>     }
>     job->busy = true;
>
> Then it should be safe to assume that if block jobs are paused (even
> if they were sleeping), they won't issue new requests any more until
> they are resumed.

That looks good, I'll give it a try.

Berto



Thread overview: 56+ messages
2016-04-04 13:43 [Qemu-devel] [PATCH for-2.7 v9 00/11] Support streaming to an intermediate layer Alberto Garcia
2016-04-04 13:43 ` [Qemu-devel] [PATCH v9 01/11] block: keep a list of block jobs Alberto Garcia
2016-04-27 11:59   ` Max Reitz
2016-04-29 14:22   ` Kevin Wolf
2016-04-04 13:43 ` [Qemu-devel] [PATCH v9 02/11] block: use the block job list in bdrv_drain_all() Alberto Garcia
2016-04-27 12:04   ` Max Reitz
2016-04-27 12:08     ` Alberto Garcia
2016-04-29 14:25   ` Kevin Wolf
2016-04-04 13:43 ` [Qemu-devel] [PATCH v9 03/11] block: use the block job list in qmp_query_block_jobs() Alberto Garcia
2016-04-27 12:09   ` Max Reitz
2016-04-29 14:32   ` Kevin Wolf
2016-05-02 13:06     ` Alberto Garcia
2016-04-04 13:43 ` [Qemu-devel] [PATCH v9 04/11] block: use the block job list in bdrv_close() Alberto Garcia
2016-04-27 12:14   ` Max Reitz
2016-04-29 14:38   ` Kevin Wolf
2016-05-02 13:42     ` Alberto Garcia
2016-04-04 13:43 ` [Qemu-devel] [PATCH v9 05/11] block: allow block jobs in any arbitrary node Alberto Garcia
2016-04-27 12:30   ` Max Reitz
2016-04-27 14:59     ` Alberto Garcia
2016-04-29 15:00   ` Kevin Wolf
2016-05-06 10:00     ` Alberto Garcia
2016-05-06 17:54       ` John Snow
2016-05-09  7:06         ` Kevin Wolf
2016-05-09 11:59         ` Alberto Garcia
2016-04-29 15:25   ` Eric Blake
2016-04-04 13:43 ` [Qemu-devel] [PATCH v9 06/11] block: Support streaming to an intermediate layer Alberto Garcia
2016-04-27 13:04   ` Max Reitz
2016-04-28  9:23     ` Alberto Garcia
2016-04-29 15:07   ` Kevin Wolf
2016-04-04 13:43 ` [Qemu-devel] [PATCH v9 07/11] block: Add QMP support for " Alberto Garcia
2016-04-27 13:34   ` Max Reitz
2016-04-28 12:20     ` Alberto Garcia
2016-04-29 15:18       ` Kevin Wolf
2016-05-03 12:50         ` Alberto Garcia
2016-05-03 13:23           ` Kevin Wolf
2016-05-03 13:33             ` Alberto Garcia
2016-05-03 13:48               ` Kevin Wolf
2016-05-03 15:09                 ` Alberto Garcia
     [not found]                 ` <w517fezo0al.fsf@maestria.local.igalia.com>
2016-05-12 15:04                   ` Kevin Wolf
     [not found]                     ` <w514ma3nwbl.fsf@maestria.local.igalia.com>
2016-05-12 15:28                       ` Kevin Wolf
2016-05-17 14:26                         ` Alberto Garcia
2016-05-17 14:47                           ` Kevin Wolf
2016-05-17 14:54                             ` Alberto Garcia
2016-04-29 15:11   ` Kevin Wolf
2016-05-03 12:53     ` Alberto Garcia
2016-05-03 13:18       ` Kevin Wolf
2016-05-03 13:29         ` Alberto Garcia
2016-04-29 15:29   ` Eric Blake
2016-04-04 13:43 ` [Qemu-devel] [PATCH v9 08/11] docs: Document how to stream " Alberto Garcia
2016-04-04 13:43 ` [Qemu-devel] [PATCH v9 09/11] qemu-iotests: test streaming " Alberto Garcia
2016-04-04 13:44 ` [Qemu-devel] [PATCH v9 10/11] qemu-iotests: test overlapping block-stream operations Alberto Garcia
2016-04-27 13:48   ` Max Reitz
2016-04-27 15:02     ` Alberto Garcia
2016-04-04 13:44 ` [Qemu-devel] [PATCH v9 11/11] qemu-iotests: test non-overlapping " Alberto Garcia
2016-04-27 13:50   ` Max Reitz
2016-04-08 10:00 ` [Qemu-devel] [PATCH for-2.7 v9 00/11] Support streaming to an intermediate layer Stefan Hajnoczi
