* [PATCH 1/4] mirror: Move open_backing_file to exit_common
2021-04-09 12:04 [PATCH 0/4] job: Allow complete for jobs on standby Max Reitz
@ 2021-04-09 12:04 ` Max Reitz
2021-04-09 12:04 ` [PATCH 2/4] mirror: Do not enter a paused job on completion Max Reitz
` (4 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Max Reitz @ 2021-04-09 12:04 UTC
To: qemu-block
Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, John Snow, qemu-devel,
Max Reitz
This is a graph change and therefore should be done in job-finalize
(which is what invokes mirror_exit_common()).
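The phase distinction above can be illustrated with a small model. This is a hedged sketch, not QEMU code: the class and mode names (MirrorJob, 'open-backing-chain') are simplified stand-ins, and only the timing of the graph change is modeled.

```python
# Hypothetical model of the job lifecycle relevant to this patch:
# block-job-complete triggers .complete(), while job-finalize invokes
# mirror_exit_common().  A graph change (opening the backing chain)
# belongs in the latter phase.

class MirrorJob:
    """Simplified stand-in for MirrorBlockJob; not the real QEMU API."""

    def __init__(self, backing_mode):
        self.backing_mode = backing_mode
        self.graph_changed_in = None

    def complete(self):
        # Before the patch, MIRROR_OPEN_BACKING_CHAIN was handled here,
        # at block-job-complete time -- too early for a graph change.
        pass

    def exit_common(self, abort=False):
        # After the patch, the graph change happens here, which runs
        # as part of job-finalize.
        if not abort and self.backing_mode == 'open-backing-chain':
            self.graph_changed_in = 'finalize'

job = MirrorJob('open-backing-chain')
job.complete()
assert job.graph_changed_in is None   # nothing changes at complete time
job.exit_common()
assert job.graph_changed_in == 'finalize'
```

On abort, exit_common() skips the change, matching the !abort condition in the hunk.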
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
block/mirror.c | 22 ++++++++--------------
1 file changed, 8 insertions(+), 14 deletions(-)
diff --git a/block/mirror.c b/block/mirror.c
index d7e54c0ff7..f1f936bf1a 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -689,6 +689,14 @@ static int mirror_exit_common(Job *job)
ret = -EPERM;
}
}
+ } else if (!abort && s->backing_mode == MIRROR_OPEN_BACKING_CHAIN) {
+ assert(!bdrv_backing_chain_next(target_bs));
+ ret = bdrv_open_backing_file(bdrv_skip_filters(target_bs), NULL,
+ "backing", &local_err);
+ if (ret < 0) {
+ error_report_err(local_err);
+ local_err = NULL;
+ }
}
if (s->to_replace) {
@@ -1107,9 +1115,6 @@ immediate_exit:
static void mirror_complete(Job *job, Error **errp)
{
MirrorBlockJob *s = container_of(job, MirrorBlockJob, common.job);
- BlockDriverState *target;
-
- target = blk_bs(s->target);
if (!s->synced) {
error_setg(errp, "The active block job '%s' cannot be completed",
@@ -1117,17 +1122,6 @@ static void mirror_complete(Job *job, Error **errp)
return;
}
- if (s->backing_mode == MIRROR_OPEN_BACKING_CHAIN) {
- int ret;
-
- assert(!bdrv_backing_chain_next(target));
- ret = bdrv_open_backing_file(bdrv_skip_filters(target), NULL,
- "backing", errp);
- if (ret < 0) {
- return;
- }
- }
-
/* block all operations on to_replace bs */
if (s->replaces) {
AioContext *replace_aio_context;
--
2.29.2
* [PATCH 2/4] mirror: Do not enter a paused job on completion
From: Max Reitz @ 2021-04-09 12:04 UTC
To: qemu-block
Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, John Snow, qemu-devel,
Max Reitz
Currently, it is impossible to complete jobs on standby (i.e. paused
ready jobs). However, the only thing in mirror_complete() that does
not work with a paused job is the job_enter() at the end. If we make
that call conditional, the function works just fine even while the
mirror job is paused.
Technically this patch is a no-op on its own, but the intention is to
accept block-job-complete even for jobs on standby, and that requires
this change first.
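The conditional re-entry can be sketched with a toy model. This is an illustrative sketch under stated assumptions: Job, enter(), and resume() here are simplified stand-ins for QEMU's job infrastructure, not its real API.

```python
# Hypothetical model of the fix: job_enter() must be skipped while the
# job is paused, because a paused job is re-entered automatically on
# resume, at which point a pending completion request takes effect.

class Job:
    """Simplified stand-in for QEMU's Job; not the real API."""

    def __init__(self):
        self.paused = False
        self.should_complete = False
        self.entered = False

    def enter(self):
        # In QEMU, entering a paused job is invalid; model that as an
        # assertion failure.
        assert not self.paused, "must not enter a paused job"
        self.entered = True

    def complete(self):
        self.should_complete = True
        # The fix: only enter the job if it is not paused.
        if not self.paused:
            self.enter()

    def resume(self):
        self.paused = False
        # A resumed job is re-entered, so the completion request
        # scheduled above is picked up here.
        if self.should_complete:
            self.enter()

job = Job()
job.paused = True
job.complete()      # no job_enter() while paused
assert job.should_complete and not job.entered
job.resume()        # re-entered on resume
assert job.entered
```

Without the conditional, complete() on a paused job would hit the assertion in enter(), which models why the unconditional job_enter() had to go.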
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
block/mirror.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/block/mirror.c b/block/mirror.c
index f1f936bf1a..5a71bd8bbc 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1148,7 +1148,11 @@ static void mirror_complete(Job *job, Error **errp)
}
s->should_complete = true;
- job_enter(job);
+
+ /* If the job is paused, it will be re-entered when it is resumed */
+ if (!job->paused) {
+ job_enter(job);
+ }
}
static void coroutine_fn mirror_pause(Job *job)
--
2.29.2
* [PATCH 3/4] job: Allow complete for jobs on standby
From: Max Reitz @ 2021-04-09 12:04 UTC
To: qemu-block
Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, John Snow, qemu-devel,
Max Reitz
The only job that implements .complete is the mirror job, and it can
handle completion requests just fine while the job is paused.
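The permission change can be sketched as follows. This is a hypothetical, simplified model of the verb table in job.c: only the 'complete' row is represented, and job_apply_verb here is a stand-in for the real function, not its actual signature.

```python
# Hypothetical sketch of the JobVerbTable change: the real table maps
# (verb, status) to a 0/1 permission.  The patch flips 'complete' from
# forbidden to allowed in the STANDBY column.  Status names follow
# QEMU's JobStatus; everything else is a simplified stand-in.

COMPLETE_ALLOWED_OLD = {'ready'}             # pre-patch: READY only
COMPLETE_ALLOWED_NEW = {'ready', 'standby'}  # post-patch: also STANDBY

def job_apply_verb(allowed, status):
    """Return None on success, or an error string (models errp)."""
    if status in allowed:
        return None
    return f"Job is in state '{status}', cannot be completed"

# A drained job sits on STANDBY; completing it used to be rejected:
assert job_apply_verb(COMPLETE_ALLOWED_OLD, 'standby') is not None
# With the patch, the same request is accepted:
assert job_apply_verb(COMPLETE_ALLOWED_NEW, 'standby') is None
assert job_apply_verb(COMPLETE_ALLOWED_NEW, 'ready') is None
```

The second hunk of the patch matches this: job_complete() drops its explicit pause_count check, leaving the table (plus the cancellation and .complete checks) as the gatekeeper.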
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1945635
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
job.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/job.c b/job.c
index 289edee143..4aff13d95a 100644
--- a/job.c
+++ b/job.c
@@ -56,7 +56,7 @@ bool JobVerbTable[JOB_VERB__MAX][JOB_STATUS__MAX] = {
[JOB_VERB_PAUSE] = {0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0},
[JOB_VERB_RESUME] = {0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0},
[JOB_VERB_SET_SPEED] = {0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0},
- [JOB_VERB_COMPLETE] = {0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0},
+ [JOB_VERB_COMPLETE] = {0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0},
[JOB_VERB_FINALIZE] = {0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0},
[JOB_VERB_DISMISS] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0},
};
@@ -991,7 +991,7 @@ void job_complete(Job *job, Error **errp)
if (job_apply_verb(job, JOB_VERB_COMPLETE, errp)) {
return;
}
- if (job->pause_count || job_is_cancelled(job) || !job->driver->complete) {
+ if (job_is_cancelled(job) || !job->driver->complete) {
error_setg(errp, "The active block job '%s' cannot be completed",
job->id);
return;
--
2.29.2
* [PATCH 4/4] test-blockjob: Test job_wait_unpaused()
From: Max Reitz @ 2021-04-09 12:04 UTC
To: qemu-block
Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, John Snow, qemu-devel,
Max Reitz
Create a job that remains on STANDBY after a drained section, and see
that job_complete() still works on it.
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
tests/unit/test-blockjob.c | 121 +++++++++++++++++++++++++++++++++++++
1 file changed, 121 insertions(+)
diff --git a/tests/unit/test-blockjob.c b/tests/unit/test-blockjob.c
index 7519847912..dcacfa6c7c 100644
--- a/tests/unit/test-blockjob.c
+++ b/tests/unit/test-blockjob.c
@@ -16,6 +16,7 @@
#include "block/blockjob_int.h"
#include "sysemu/block-backend.h"
#include "qapi/qmp/qdict.h"
+#include "iothread.h"
static const BlockJobDriver test_block_job_driver = {
.job_driver = {
@@ -375,6 +376,125 @@ static void test_cancel_concluded(void)
cancel_common(s);
}
+/* (See test_yielding_driver for the job description) */
+typedef struct YieldingJob {
+ BlockJob common;
+ bool should_complete;
+} YieldingJob;
+
+static void yielding_job_complete(Job *job, Error **errp)
+{
+ YieldingJob *s = container_of(job, YieldingJob, common.job);
+ s->should_complete = true;
+ job_enter(job);
+}
+
+static int coroutine_fn yielding_job_run(Job *job, Error **errp)
+{
+ YieldingJob *s = container_of(job, YieldingJob, common.job);
+
+ job_transition_to_ready(job);
+
+ while (!s->should_complete) {
+ job_yield(job);
+ }
+
+ return 0;
+}
+
+/*
+ * This job transitions immediately to the READY state, and then
+ * yields until it is to complete.
+ */
+static const BlockJobDriver test_yielding_driver = {
+ .job_driver = {
+ .instance_size = sizeof(YieldingJob),
+ .free = block_job_free,
+ .user_resume = block_job_user_resume,
+ .run = yielding_job_run,
+ .complete = yielding_job_complete,
+ },
+};
+
+/*
+ * Test that job_complete() works even on jobs that are in a paused
+ * state (i.e., STANDBY).
+ *
+ * To do this, run YieldingJob in an IO thread, get it into the READY
+ * state, then have a drained section. Before ending the section,
+ * acquire the context so the job will not be entered and will thus
+ * remain on STANDBY.
+ *
+ * job_complete() should still work without error.
+ *
+ * Note that on the QMP interface, it is impossible to lock an IO
+ * thread before a drained section ends. In practice, the
+ * bdrv_drain_all_end() and the aio_context_acquire() will be
+ * reversed. However, that makes for worse reproducibility here:
+ * Sometimes, the job would no longer be in STANDBY then but already
+ * be started. We cannot prevent that, because the IO thread runs
+ * concurrently. We can only prevent it by taking the lock before
+ * ending the drained section, so we do that.
+ *
+ * (You can reverse the order of operations and most of the time the
+ * test will pass, but sometimes the assert(status == STANDBY) will
+ * fail.)
+ */
+static void test_complete_in_standby(void)
+{
+ BlockBackend *blk;
+ IOThread *iothread;
+ AioContext *ctx;
+ Job *job;
+ BlockJob *bjob;
+
+ /* Create a test drive, move it to an IO thread */
+ blk = create_blk(NULL);
+ iothread = iothread_new();
+
+ ctx = iothread_get_aio_context(iothread);
+ blk_set_aio_context(blk, ctx, &error_abort);
+
+ /* Create our test job */
+ bjob = mk_job(blk, "job", &test_yielding_driver, true,
+ JOB_MANUAL_FINALIZE | JOB_MANUAL_DISMISS);
+ job = &bjob->job;
+ assert(job->status == JOB_STATUS_CREATED);
+
+ /* Wait for the job to become READY */
+ job_start(job);
+ aio_context_acquire(ctx);
+ AIO_WAIT_WHILE(ctx, job->status != JOB_STATUS_READY);
+ aio_context_release(ctx);
+
+ /* Begin the drained section, pausing the job */
+ bdrv_drain_all_begin();
+ assert(job->status == JOB_STATUS_STANDBY);
+ /* Lock the IO thread to prevent the job from being run */
+ aio_context_acquire(ctx);
+ /* This will schedule the job to resume it */
+ bdrv_drain_all_end();
+
+ /* But the job cannot run, so it will remain on standby */
+ assert(job->status == JOB_STATUS_STANDBY);
+
+ /* Even though the job is on standby, this should work */
+ job_complete(job, &error_abort);
+
+ /* The test is done now, clean up. */
+ job_finish_sync(job, NULL, &error_abort);
+ assert(job->status == JOB_STATUS_PENDING);
+
+ job_finalize(job, &error_abort);
+ assert(job->status == JOB_STATUS_CONCLUDED);
+
+ job_dismiss(&job, &error_abort);
+
+ destroy_blk(blk);
+ aio_context_release(ctx);
+ iothread_join(iothread);
+}
+
int main(int argc, char **argv)
{
qemu_init_main_loop(&error_abort);
@@ -389,5 +509,6 @@ int main(int argc, char **argv)
g_test_add_func("/blockjob/cancel/standby", test_cancel_standby);
g_test_add_func("/blockjob/cancel/pending", test_cancel_pending);
g_test_add_func("/blockjob/cancel/concluded", test_cancel_concluded);
+ g_test_add_func("/blockjob/complete_in_standby", test_complete_in_standby);
return g_test_run();
}
--
2.29.2
* [PATCH 5/4] iotests: Test completion immediately after drain
From: Max Reitz @ 2021-04-09 12:04 UTC
To: qemu-block
Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, John Snow, qemu-devel,
Max Reitz
Test what happens when you have multiple busy block jobs, drain all (via
an empty transaction), and immediately issue a block-job-complete on one
of the jobs.
Sometimes it will still be in STANDBY, in which case block-job-complete
used to fail. It should not.
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
.../tests/mirror-complete-after-drain | 89 +++++++++++++++++++
.../tests/mirror-complete-after-drain.out | 14 +++
2 files changed, 103 insertions(+)
create mode 100755 tests/qemu-iotests/tests/mirror-complete-after-drain
create mode 100644 tests/qemu-iotests/tests/mirror-complete-after-drain.out
diff --git a/tests/qemu-iotests/tests/mirror-complete-after-drain b/tests/qemu-iotests/tests/mirror-complete-after-drain
new file mode 100755
index 0000000000..b096ffbcb4
--- /dev/null
+++ b/tests/qemu-iotests/tests/mirror-complete-after-drain
@@ -0,0 +1,89 @@
+#!/usr/bin/env python3
+# group: rw
+#
+# Tests for block-job-complete immediately after a drain
+#
+# Copyright (c) 2021 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+#
+
+import iotests
+
+iotests.script_initialize(supported_fmts=['raw'])
+
+DISK_JOBS = 4
+NULL_JOBS = 1
+
+
+# We cannot auto-generate these in a loop because the files are
+# deleted when their scope ends
+src_imgs = iotests.file_path('src0', 'src1', 'src2', 'src3')
+dst_imgs = iotests.file_path('dst0', 'dst1', 'dst2', 'dst3')
+
+assert len(src_imgs) == DISK_JOBS
+assert len(dst_imgs) == DISK_JOBS
+
+
+for i in range(DISK_JOBS):
+ ret = iotests.qemu_img('create', '-f', iotests.imgfmt, src_imgs[i], '128M')
+ assert ret == 0
+
+ ret = iotests.qemu_img('create', '-f', iotests.imgfmt, dst_imgs[i], '128M')
+ assert ret == 0
+
+with iotests.VM() as vm:
+ vm.add_object('iothread,id=iothr0')
+ vm.add_device('virtio-scsi,iothread=iothr0')
+
+ for i in range(DISK_JOBS):
+ vm.add_blockdev(f'file,node-name=source-disk-{i},'
+ f'filename={src_imgs[i]}')
+
+ vm.add_blockdev(f'file,node-name=target-disk-{i},'
+ f'filename={dst_imgs[i]}')
+
+ vm.add_device(f'scsi-hd,id=device-disk-{i},drive=source-disk-{i}')
+
+ for i in range(NULL_JOBS):
+ vm.add_blockdev(f'null-co,node-name=source-null-{i},read-zeroes=on')
+ vm.add_blockdev(f'null-co,node-name=target-null-{i},read-zeroes=on')
+ vm.add_device(f'scsi-hd,id=device-null-{i},drive=source-null-{i}')
+
+ vm.launch()
+
+ for i in range(DISK_JOBS):
+ vm.qmp_log('blockdev-mirror',
+ job_id=f'mirror-disk-{i}',
+ device=f'source-disk-{i}',
+ target=f'target-disk-{i}',
+ sync='full',
+ granularity=1048576,
+ buf_size=(16 * 1048576))
+
+ for i in range(NULL_JOBS):
+ vm.qmp_log('blockdev-mirror',
+ job_id=f'mirror-null-{i}',
+ device=f'source-null-{i}',
+ target=f'target-null-{i}',
+ sync='full')
+
+ for i in range(DISK_JOBS + NULL_JOBS):
+ vm.event_wait('BLOCK_JOB_READY')
+
+ for i in range(DISK_JOBS):
+ vm.hmp(f'qemu-io -d device-disk-{i} "write 0 128M"')
+
+ vm.qmp_log('transaction', actions=[])
+ vm.qmp_log('block-job-complete', device='mirror-null-0')
diff --git a/tests/qemu-iotests/tests/mirror-complete-after-drain.out b/tests/qemu-iotests/tests/mirror-complete-after-drain.out
new file mode 100644
index 0000000000..4d9d0529fe
--- /dev/null
+++ b/tests/qemu-iotests/tests/mirror-complete-after-drain.out
@@ -0,0 +1,14 @@
+{"execute": "blockdev-mirror", "arguments": {"buf-size": 16777216, "device": "source-disk-0", "granularity": 1048576, "job-id": "mirror-disk-0", "sync": "full", "target": "target-disk-0"}}
+{"return": {}}
+{"execute": "blockdev-mirror", "arguments": {"buf-size": 16777216, "device": "source-disk-1", "granularity": 1048576, "job-id": "mirror-disk-1", "sync": "full", "target": "target-disk-1"}}
+{"return": {}}
+{"execute": "blockdev-mirror", "arguments": {"buf-size": 16777216, "device": "source-disk-2", "granularity": 1048576, "job-id": "mirror-disk-2", "sync": "full", "target": "target-disk-2"}}
+{"return": {}}
+{"execute": "blockdev-mirror", "arguments": {"buf-size": 16777216, "device": "source-disk-3", "granularity": 1048576, "job-id": "mirror-disk-3", "sync": "full", "target": "target-disk-3"}}
+{"return": {}}
+{"execute": "blockdev-mirror", "arguments": {"device": "source-null-0", "job-id": "mirror-null-0", "sync": "full", "target": "target-null-0"}}
+{"return": {}}
+{"execute": "transaction", "arguments": {"actions": []}}
+{"return": {}}
+{"execute": "block-job-complete", "arguments": {"device": "mirror-null-0"}}
+{"return": {}}
--
2.29.2
* Re: [PATCH 0/4] job: Allow complete for jobs on standby
From: Kevin Wolf @ 2021-04-09 16:15 UTC
To: Max Reitz; +Cc: Vladimir Sementsov-Ogievskiy, John Snow, qemu-devel, qemu-block
On 09.04.2021 at 14:04, Max Reitz wrote:
> Hi,
>
> We sometimes have a problem with jobs remaining on STANDBY after a drain
> (for a short duration), so if the user immediately issues a
> block-job-complete, that will fail.
>
> (See also
> https://lists.nongnu.org/archive/html/qemu-block/2021-04/msg00215.html,
> which this series is an alternative for.)
>
> Looking at the only implementation of .complete(), which is
> mirror_complete(), it looks like there is basically nothing that would
> prevent it from being run while mirror is paused. Really only the
> job_enter() at the end, which we should not and need not do when the job
> is paused.
>
> So make that conditional (patch 2), clean up the function on the way
> (patch 1, which moves one of its blocks to mirror_exit_common()), and
> then we can allow job_complete() on jobs that are on standby (patch 3).
>
> Patch 4 is basically the same test as in
> https://lists.nongnu.org/archive/html/qemu-block/2021-04/msg00214.html,
> except some comments are different and, well, job_complete() just works
> on STANDBY jobs.
>
> Patch 5 is an iotest that may or may not show the problem for you. I’ve
> tuned the numbers so that on my machine, it fails about 50/50 without
> this series (i.e., the job is still on STANDBY and job_complete()
> refuses to do anything).
>
> I’m not sure we want that iotest, because it does quite a bit of I/O and
> it’s unreliable, and I don’t think there’s anything I can do to make it
> reliable.
Thanks, applied patches 1-4 to the block branch.
Kevin