* [PATCH for-6.0? 0/3] job: Add job_wait_unpaused() for block-job-complete
@ 2021-04-08 16:20 Max Reitz
From: Max Reitz @ 2021-04-08 16:20 UTC
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, John Snow, qemu-devel,
	Max Reitz

Hi,

See patch 1 for a detailed explanation of the problem.

The gist is: Draining a READY job makes it transition to STANDBY, and
jobs on STANDBY cannot be completed.  Ending the drained section will
schedule the job (so it is then resumed), but not wait until it is
actually running again.

Therefore, issuing block-job-complete right after some draining
operation can sometimes fail.
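
To illustrate, here is a minimal sketch of the problematic sequence (not
taken from any actual caller; bs, job, and local_err stand in for a real
block node, its mirror job, and an error pointer):

    bdrv_drained_begin(bs);   /* READY job gets paused, enters STANDBY */
    bdrv_drained_end(bs);     /* schedules the job coroutine, but returns
                               * before it has actually run again */

    /* The job may still be on STANDBY here, so this can fail: */
    job_complete(&job->job, &local_err);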

I tried to come up with an iotest reproducer, but in the end I only got
something that reproduced the issue roughly 2 out of 10 times, and it
required heavy I/O, so it is not something I would like to have as part
of the iotests.
Instead, I opted for a unit test, which allows me to cheat a bit
(specifically, locking the job IO thread before ending the drained
section).


Max Reitz (3):
  job: Add job_wait_unpaused() for block-job-complete
  test-blockjob: Test job_wait_unpaused()
  iotests/041: block-job-complete on user-paused job

 include/qemu/job.h         |  15 ++++
 blockdev.c                 |   3 +
 job.c                      |  42 +++++++++++
 tests/unit/test-blockjob.c | 140 +++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/041     |  13 +++-
 5 files changed, 212 insertions(+), 1 deletion(-)

-- 
2.29.2




* [PATCH for-6.0? 1/3] job: Add job_wait_unpaused() for block-job-complete
From: Max Reitz @ 2021-04-08 16:20 UTC
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, John Snow, qemu-devel,
	Max Reitz

block-job-complete can only be applied when the job is READY, not when
it is on STANDBY (ready, but paused).  Draining a job technically pauses
it (which makes a READY job enter STANDBY), and ending the drained
section does not synchronously resume it, but only schedules the job,
which will then be resumed.  So attempting to complete a job immediately
after a drained section may sometimes fail.

That is bad at least because users cannot really work nicely around
this: A job may be paused and resumed at any time, so waiting for the
job to be in the READY state and then issuing a block-job-complete poses
a TOCTTOU problem.  The only way around it would be to issue
block-job-complete until it no longer fails due to the job being in the
STANDBY state, but that would not be nice.

We can solve the problem by allowing block-job-complete to be invoked on
jobs that are on STANDBY, if that status is the result of a drained
section (not because the user has paused the job), and that section has
ended.  That is, if the job is on STANDBY, but scheduled to be resumed.

Perhaps we could actually just directly allow this, seeing that mirror
is the only user of ready/complete, and that mirror_complete() could
probably work under the given circumstances, but there may be many side
effects to consider.

It is simpler to add a function job_wait_unpaused() that waits for the
job to be resumed (under said circumstances), and to make
qmp_block_job_complete() use it to delay job_complete() until then.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1945635
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 include/qemu/job.h | 15 +++++++++++++++
 blockdev.c         |  3 +++
 job.c              | 42 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 60 insertions(+)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index efc6fa7544..cf3082b6d7 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -563,4 +563,19 @@ void job_dismiss(Job **job, Error **errp);
  */
 int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp);
 
+/**
+ * If the job has been paused because of a drained section, and that
+ * section has ended, wait until the job is resumed.
+ *
+ * Return 0 if the job is not paused, or if it has been successfully
+ * resumed.
+ * Return an error if the job has been paused in such a way that
+ * waiting will not resume it, i.e. if it has been paused by the user,
+ * or if it is still drained.
+ *
+ * Callers must be in the home AioContext and hold the AioContext lock
+ * of job->aio_context.
+ */
+int job_wait_unpaused(Job *job, Error **errp);
+
 #endif
diff --git a/blockdev.c b/blockdev.c
index a57590aae4..c0cc2fa364 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3414,6 +3414,9 @@ void qmp_block_job_complete(const char *device, Error **errp)
         return;
     }
 
+    if (job_wait_unpaused(&job->job, errp) < 0) {
+        return;
+    }
     trace_qmp_block_job_complete(job);
     job_complete(&job->job, errp);
     aio_context_release(aio_context);
diff --git a/job.c b/job.c
index 289edee143..1ea30fd294 100644
--- a/job.c
+++ b/job.c
@@ -1023,3 +1023,45 @@ int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp)
     job_unref(job);
     return ret;
 }
+
+int job_wait_unpaused(Job *job, Error **errp)
+{
+    /*
+     * Only run this function from the main context, because this is
+     * what we need, and this way we do not have to think about what
+     * happens if the user concurrently pauses the job from the main
+     * monitor.
+     */
+    assert(qemu_get_current_aio_context() == qemu_get_aio_context());
+
+    /*
+     * Quick path (e.g. so we do not get an error if pause_count > 0
+     * but the job is not even paused)
+     */
+    if (!job->paused) {
+        return 0;
+    }
+
+    /* If the user has paused the job, waiting will not help */
+    if (job->user_paused) {
+        error_setg(errp, "Job '%s' has been paused by the user", job->id);
+        return -EBUSY;
+    }
+
+    /* Similarly, if the job is still drained, waiting will not help either */
+    if (job->pause_count > 0) {
+        error_setg(errp, "Job '%s' is blocked and cannot be unpaused", job->id);
+        return -EBUSY;
+    }
+
+    /*
+     * This function is specifically for waiting for a job to be
+     * resumed after a drained section.  Ending the drained section
+     * includes a job_enter(), which schedules the job loop to be run,
+     * and once it does, job->paused will be cleared.  Therefore, we
+     * do not need to invoke job_enter() here.
+     */
+    AIO_WAIT_WHILE(job->aio_context, job->paused);
+
+    return 0;
+}
-- 
2.29.2




* [PATCH for-6.0? 2/3] test-blockjob: Test job_wait_unpaused()
From: Max Reitz @ 2021-04-08 16:20 UTC
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, John Snow, qemu-devel,
	Max Reitz

Create a job that remains on STANDBY after a drained section, and see
that invoking job_wait_unpaused() will get it unstuck.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/unit/test-blockjob.c | 140 +++++++++++++++++++++++++++++++++++++
 1 file changed, 140 insertions(+)

diff --git a/tests/unit/test-blockjob.c b/tests/unit/test-blockjob.c
index 7519847912..b7736e298d 100644
--- a/tests/unit/test-blockjob.c
+++ b/tests/unit/test-blockjob.c
@@ -16,6 +16,7 @@
 #include "block/blockjob_int.h"
 #include "sysemu/block-backend.h"
 #include "qapi/qmp/qdict.h"
+#include "iothread.h"
 
 static const BlockJobDriver test_block_job_driver = {
     .job_driver = {
@@ -375,6 +376,144 @@ static void test_cancel_concluded(void)
     cancel_common(s);
 }
 
+/* (See test_yielding_driver for the job description) */
+typedef struct YieldingJob {
+    BlockJob common;
+    bool should_complete;
+} YieldingJob;
+
+static void yielding_job_complete(Job *job, Error **errp)
+{
+    YieldingJob *s = container_of(job, YieldingJob, common.job);
+    s->should_complete = true;
+    job_enter(job);
+}
+
+static int coroutine_fn yielding_job_run(Job *job, Error **errp)
+{
+    YieldingJob *s = container_of(job, YieldingJob, common.job);
+
+    job_transition_to_ready(job);
+
+    while (!s->should_complete) {
+        job_yield(job);
+    }
+
+    return 0;
+}
+
+/*
+ * This job transitions immediately to the READY state, and then
+ * yields until it is told to complete.
+ */
+static const BlockJobDriver test_yielding_driver = {
+    .job_driver = {
+        .instance_size  = sizeof(YieldingJob),
+        .free           = block_job_free,
+        .user_resume    = block_job_user_resume,
+        .run            = yielding_job_run,
+        .complete       = yielding_job_complete,
+    },
+};
+
+/*
+ * Test that job_wait_unpaused() can get jobs from a paused state to
+ * a running state so that job_complete() can be applied (assuming the
+ * pause occurred due to a drain that has already been lifted).
+ * (This is what QMP's block-job-complete does so that it can be
+ * executed even immediately after some other operation has begun and
+ * ended a drained section.)
+ *
+ * To do this, run YieldingJob in an IO thread, get it into the READY
+ * state, then have a drained section.  Before ending the section,
+ * acquire the context so the job will not be entered and will thus
+ * remain on STANDBY.
+ *
+ * Invoking job_complete() then will fail.
+ *
+ * However, job_wait_unpaused() should see the job is to be resumed,
+ * wait for it to be resumed, and then we can invoke job_complete()
+ * without error.
+ *
+ * Note that on the QMP interface, it is impossible to lock an IO
+ * thread before a drained section ends.  In practice, the
+ * bdrv_drain_all_end() and the aio_context_acquire() will be
+ * reversed.  However, that makes for worse reproducibility here:
+ * Sometimes, the job would no longer be in STANDBY then but already
+ * be started.  We cannot prevent that, because the IO thread runs
+ * concurrently.  We can only prevent it by taking the lock before
+ * ending the drained section, so we do that.
+ *
+ * (You can reverse the order of operations and most of the time the
+ * test will pass, but sometimes the assert(status == STANDBY) will
+ * fail.)
+ */
+static void test_complete_in_standby(void)
+{
+    BlockBackend *blk;
+    IOThread *iothread;
+    AioContext *ctx;
+    Job *job;
+    BlockJob *bjob;
+    Error *local_err = NULL;
+
+    /* Create a test drive, move it to an IO thread */
+    blk = create_blk(NULL);
+    iothread = iothread_new();
+
+    ctx = iothread_get_aio_context(iothread);
+    blk_set_aio_context(blk, ctx, &error_abort);
+
+    /* Create our test job */
+    bjob = mk_job(blk, "job", &test_yielding_driver, true,
+                  JOB_MANUAL_FINALIZE | JOB_MANUAL_DISMISS);
+    job = &bjob->job;
+    assert(job->status == JOB_STATUS_CREATED);
+
+    /* Wait for the job to become READY */
+    job_start(job);
+    aio_context_acquire(ctx);
+    AIO_WAIT_WHILE(ctx, job->status != JOB_STATUS_READY);
+    aio_context_release(ctx);
+
+    /* Begin the drained section, pausing the job */
+    bdrv_drain_all_begin();
+    assert(job->status == JOB_STATUS_STANDBY);
+    /* Lock the IO thread to prevent the job from being run */
+    aio_context_acquire(ctx);
+    /* This will schedule the job to resume it */
+    bdrv_drain_all_end();
+
+    /* But the job cannot run, so it will remain on standby */
+    assert(job->status == JOB_STATUS_STANDBY);
+
+    /* A job on standby cannot be completed */
+    job_complete(job, &local_err);
+    assert(local_err != NULL);
+    error_free(local_err);
+    local_err = NULL;
+
+    /*
+     * But waiting for it and then completing it should work.
+     * (This is what qmp_block_job_complete() does.)
+     */
+    job_wait_unpaused(job, &error_abort);
+    job_complete(job, &error_abort);
+
+    /* The test is done now, clean up. */
+    job_finish_sync(job, NULL, &error_abort);
+    assert(job->status == JOB_STATUS_PENDING);
+
+    job_finalize(job, &error_abort);
+    assert(job->status == JOB_STATUS_CONCLUDED);
+
+    job_dismiss(&job, &error_abort);
+
+    destroy_blk(blk);
+    aio_context_release(ctx);
+    iothread_join(iothread);
+}
+
 int main(int argc, char **argv)
 {
     qemu_init_main_loop(&error_abort);
@@ -389,5 +528,6 @@ int main(int argc, char **argv)
     g_test_add_func("/blockjob/cancel/standby", test_cancel_standby);
     g_test_add_func("/blockjob/cancel/pending", test_cancel_pending);
     g_test_add_func("/blockjob/cancel/concluded", test_cancel_concluded);
+    g_test_add_func("/blockjob/complete_in_standby", test_complete_in_standby);
     return g_test_run();
 }
-- 
2.29.2




* [PATCH for-6.0? 3/3] iotests/041: block-job-complete on user-paused job
From: Max Reitz @ 2021-04-08 16:20 UTC
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, John Snow, qemu-devel,
	Max Reitz

Expand test_pause() to check what happens when issuing
block-job-complete on a job that is on STANDBY because it has been
paused by the user.  (This should be an error, and in particular not
hang job_wait_unpaused().)

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/041 | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
index 5cc02b24fc..d2c9669741 100755
--- a/tests/qemu-iotests/041
+++ b/tests/qemu-iotests/041
@@ -120,7 +120,18 @@ class TestSingleDrive(iotests.QMPTestCase):
         result = self.vm.qmp('block-job-resume', device='drive0')
         self.assert_qmp(result, 'return', {})
 
-        self.complete_and_wait()
+        self.wait_ready()
+
+        # Check that a job on STANDBY cannot be completed
+        self.pause_job('drive0')
+        result = self.vm.qmp('block-job-complete', device='drive0')
+        self.assert_qmp(result, 'error/desc',
+                        "Job 'drive0' has been paused by the user")
+
+        result = self.vm.qmp('block-job-resume', device='drive0')
+        self.assert_qmp(result, 'return', {})
+
+        self.complete_and_wait(wait_ready=False)
         self.vm.shutdown()
         self.assertTrue(iotests.compare_images(test_img, target_img),
                         'target image does not match source after mirroring')
-- 
2.29.2




* Re: [PATCH for-6.0? 1/3] job: Add job_wait_unpaused() for block-job-complete
From: John Snow @ 2021-04-08 16:55 UTC
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 4/8/21 12:20 PM, Max Reitz wrote:
> [...]
> diff --git a/blockdev.c b/blockdev.c
> index a57590aae4..c0cc2fa364 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -3414,6 +3414,9 @@ void qmp_block_job_complete(const char *device, Error **errp)
>           return;
>       }
>   
> +    if (job_wait_unpaused(&job->job, errp) < 0) {
> +        return;
> +    }

After which point, we assume we've transitioned back to either RUNNING 
or READY, and

>       trace_qmp_block_job_complete(job);
>       job_complete(&job->job, errp);

This function checks the usual state table for permission to 
deliver/perform the verb.

>       aio_context_release(aio_context);
> diff --git a/job.c b/job.c
> index 289edee143..1ea30fd294 100644
> --- a/job.c
> +++ b/job.c
> @@ -1023,3 +1023,45 @@ int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp)
>       job_unref(job);
>       return ret;
>   }
> +
> +int job_wait_unpaused(Job *job, Error **errp)
> +{
> +    /*
> +     * Only run this function from the main context, because this is
> +     * what we need, and this way we do not have to think about what
> +     * happens if the user concurrently pauses the job from the main
> +     * monitor.
> +     */
> +    assert(qemu_get_current_aio_context() == qemu_get_aio_context());
> +
> +    /*
> +     * Quick path (e.g. so we do not get an error if pause_count > 0
> +     * but the job is not even paused)
> +     */
> +    if (!job->paused) {
> +        return 0;
> +    }
> +
> +    /* If the user has paused the job, waiting will not help */
> +    if (job->user_paused) {
> +        error_setg(errp, "Job '%s' has been paused by the user", job->id);
> +        return -EBUSY;
> +    }
> +

Or the job has encountered an error if that error policy is set. It is 
maybe more accurate to say that the job is currently paused/halted (for 
some reason) and is awaiting the explicit unpause instruction.

"Job '%s' has been paused and needs to be explicitly resumed with 
job-resume", maybe?

Job '%s' has been paused and needs to be [explicitly] resumed
[by the user] [with job-resume]

Some combo of those runes.

> +    /* Similarly, if the job is still drained, waiting will not help either */
> +    if (job->pause_count > 0) {
> +        error_setg(errp, "Job '%s' is blocked and cannot be unpaused", job->id);
> +        return -EBUSY;
> +    }
> +

This leaks an internal state detail out to the caller. In which 
circumstances does this happen? Do we expect it to?

As the user: Why is it blocked? Can I unblock it? Do I wait?

> +    /*
> +     * This function is specifically for waiting for a job to be
> +     * resumed after a drained section.  Ending the drained section
> +     * includes a job_enter(), which schedules the job loop to be run,
> +     * and once it does, job->paused will be cleared.  Therefore, we
> +     * do not need to invoke job_enter() here.
> +     */
> +    AIO_WAIT_WHILE(job->aio_context, job->paused);
> +
> +    return 0;
> +}
> 

Looks about right to me, but you'll want Kevin's look-see for the finer 
details, of course.

My concern is that this adds a wait of an indefinite period to the 
job_complete command. We mitigate this by checking for some other 
internal state criteria first, and then by process of elimination deduce 
that it's safe to wait, as it will (likely) be very quick.

Do we open the door for ourselves to get into trouble here, either by a 
state we are forgetting to rule out (You'd have added it if you know the 
answer to this) or a hypothetical future change where we forget to 
update this function?

Not necessarily a blocker, I think, and this does solve a real problem 
fairly inexpensively.


On good faith that you understand the synchronicity issues here better 
than I do:

Reviewed-by: John Snow <jsnow@redhat.com>




* Re: [PATCH for-6.0? 1/3] job: Add job_wait_unpaused() for block-job-complete
From: Vladimir Sementsov-Ogievskiy @ 2021-04-08 16:58 UTC
  To: Max Reitz, qemu-block; +Cc: qemu-devel, Kevin Wolf, John Snow

08.04.2021 19:20, Max Reitz wrote:
> [...]
> +int job_wait_unpaused(Job *job, Error **errp)
> +{
> +    /*
> +     * Only run this function from the main context, because this is
> +     * what we need, and this way we do not have to think about what
> +     * happens if the user concurrently pauses the job from the main
> +     * monitor.
> +     */
> +    assert(qemu_get_current_aio_context() == qemu_get_aio_context());
> +
> +    /*
> +     * Quick path (e.g. so we do not get an error if pause_count > 0
> +     * but the job is not even paused)
> +     */
> +    if (!job->paused) {
> +        return 0;
> +    }
> +
> +    /* If the user has paused the job, waiting will not help */
> +    if (job->user_paused) {
> +        error_setg(errp, "Job '%s' has been paused by the user", job->id);
> +        return -EBUSY;
> +    }
> +
> +    /* Similarly, if the job is still drained, waiting will not help either */
> +    if (job->pause_count > 0) {
> +        error_setg(errp, "Job '%s' is blocked and cannot be unpaused", job->id);
> +        return -EBUSY;
> +    }
> +
> +    /*
> +     * This function is specifically for waiting for a job to be
> +     * resumed after a drained section.  Ending the drained section
> +     * includes a job_enter(), which schedules the job loop to be run,
> +     * and once it does, job->paused will be cleared.  Therefore, we
> +     * do not need to invoke job_enter() here.
> +     */
> +    AIO_WAIT_WHILE(job->aio_context, job->paused);
> +
> +    return 0;
> +}
> 

Hmm.. It seems that when job->pause_count becomes 0, job_enter is called, and the period when pause_count is 0 but paused is still true should be relatively short. And the patch doesn't help if the user calls job-complete during a drained section. So it looks like the patch will help relatively seldom.. Or am I missing something?

The job-complete command is async. Can we instead just add a boolean like job->completion_requested, set it if job-complete is called in the STANDBY state, and have job_resume call job_complete automatically if this boolean is true?
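
A rough sketch of the idea (completion_requested is a hypothetical new
field; where errors from the actual .complete() would go is exactly the
open question):

    /* In qmp_block_job_complete(), instead of failing on STANDBY: */
    if (job->job.status == JOB_STATUS_STANDBY) {
        job->job.completion_requested = true;  /* hypothetical flag */
        return;  /* completion would be finished on resume */
    }

    /* And somewhere in job_resume(), once the job may run again: */
    if (job->completion_requested) {
        job->completion_requested = false;
        job_complete(job, NULL /* ...but what to do with errors? */);
    }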

-- 
Best regards,
Vladimir



* Re: [PATCH for-6.0? 1/3] job: Add job_wait_unpaused() for block-job-complete
From: John Snow @ 2021-04-08 17:04 UTC
  To: Vladimir Sementsov-Ogievskiy, Max Reitz, qemu-block
  Cc: Kevin Wolf, qemu-devel

On 4/8/21 12:58 PM, Vladimir Sementsov-Ogievskiy wrote:
> The job-complete command is async. Can we instead just add a boolean like
> job->completion_requested, set it if job-complete is called in the STANDBY
> state, and have job_resume call job_complete automatically if this boolean
> is true?

job_complete has a synchronous setup, though -- we lose out on a lot of 
synchronous error checking in that circumstance.

I was not able to audit it to determine that it'd be safe to attempt 
that setup during a drained section -- I imagine it won't work and will 
fail, though.

So I thought we'd have to signal completion and run the setup *later*, 
but what do we do if we get an error then? Does the entire job fail? Do 
we emit some new event? ("BLOCK_JOB_COMPLETION_FAILED" ?) Is it recoverable?

So on and so forth. Seems like a lot of things to consider, unless I am 
making a giant fuss about nothing again, not like that's ever happened. O:-)

--js




* Re: [PATCH for-6.0? 0/3] job: Add job_wait_unpaused() for block-job-complete
From: John Snow @ 2021-04-08 17:09 UTC
  To: Max Reitz, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 4/8/21 12:20 PM, Max Reitz wrote:
> [...]

Left comments and review on #1, skimmed 2/3. Not sure if it's 
appropriate for 6.0 yet; that might depend on the responses to my 
comments and other reviewers and so on.

Acked-by: John Snow <jsnow@redhat.com>




* Re: [PATCH for-6.0? 1/3] job: Add job_wait_unpaused() for block-job-complete
From: Vladimir Sementsov-Ogievskiy @ 2021-04-08 17:26 UTC
  To: John Snow, Max Reitz, qemu-block; +Cc: qemu-devel, Kevin Wolf

08.04.2021 20:04, John Snow wrote:
> On 4/8/21 12:58 PM, Vladimir Sementsov-Ogievskiy wrote:
>> The job-complete command is async. Can we instead just add a boolean like job->completion_requested, set it if job-complete is called in the STANDBY state, and have job_resume call job_complete automatically if this boolean is true?
> 
> job_complete has a synchronous setup, though -- we lose out on a lot of synchronous error checking in that circumstance.

yes, that's a problem..

> 
> I was not able to audit it to determine that it'd be safe to attempt that setup during a drained section -- I imagine it won't work and will fail, though.
> 
> So I thought we'd have to signal completion and run the setup *later*, but what do we do if we get an error then? Does the entire job fail? Do we emit some new event? ("BLOCK_JOB_COMPLETION_FAILED" ?) Is it recoverable?
> 

Isn't it possible even now that, after a successful job-complete, the job still fails and we report BLOCK_JOB_COMPLETED with an error?

And actually, how much benefit does the user get from the fact that job-complete may fail?

We can make job-complete a simple always-success boolean flag setter like job-pause.

The actual completion will then be done in the background, when possible. And if it fails, the job just fails, like it does for any background I/O error. The user has to check the error/success status of the final BLOCK_JOB_COMPLETED anyway.

-- 
Best regards,
Vladimir



* Re: [PATCH for-6.0? 1/3] job: Add job_wait_unpaused() for block-job-complete
From: Max Reitz @ 2021-04-09  9:31 UTC
  To: John Snow, qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel

On 08.04.21 18:55, John Snow wrote:
> On 4/8/21 12:20 PM, Max Reitz wrote:
>> [...]
>> diff --git a/blockdev.c b/blockdev.c
>> index a57590aae4..c0cc2fa364 100644
>> --- a/blockdev.c
>> +++ b/blockdev.c
>> @@ -3414,6 +3414,9 @@ void qmp_block_job_complete(const char *device, 
>> Error **errp)
>>           return;
>>       }
>> +    if (job_wait_unpaused(&job->job, errp) < 0) {
>> +        return;
>> +    }
> 
> After which point, we assume we've transitioned back to either RUNNING 
> or READY, and
> 
>>       trace_qmp_block_job_complete(job);
>>       job_complete(&job->job, errp);
> 
> This function checks the usual state table for permission to 
> deliver/perform the verb.
> 
>>       aio_context_release(aio_context);
>> diff --git a/job.c b/job.c
>> index 289edee143..1ea30fd294 100644
>> --- a/job.c
>> +++ b/job.c
>> @@ -1023,3 +1023,45 @@ int job_finish_sync(Job *job, void 
>> (*finish)(Job *, Error **errp), Error **errp)
>>       job_unref(job);
>>       return ret;
>>   }
>> +
>> +int job_wait_unpaused(Job *job, Error **errp)
>> +{
>> +    /*
>> +     * Only run this function from the main context, because this is
>> +     * what we need, and this way we do not have to think about what
>> +     * happens if the user concurrently pauses the job from the main
>> +     * monitor.
>> +     */
>> +    assert(qemu_get_current_aio_context() == qemu_get_aio_context());
>> +
>> +    /*
>> +     * Quick path (e.g. so we do not get an error if pause_count > 0
>> +     * but the job is not even paused)
>> +     */
>> +    if (!job->paused) {
>> +        return 0;
>> +    }
>> +
>> +    /* If the user has paused the job, waiting will not help */
>> +    if (job->user_paused) {
>> +        error_setg(errp, "Job '%s' has been paused by the user", 
>> job->id);
>> +        return -EBUSY;
>> +    }
>> +
> 
> Or the job has encountered an error if that error policy is set. It is 
> maybe more accurate to say that the job is currently paused/halted (for 
> some reason) and is awaiting the explicit unpause instruction.
> 
> "Job '%s' has been paused and needs to be explicitly resumed with 
> job-resume", maybe?
> 
> Job '%s' has been paused and needs to be [explicitly] resumed
> [by the user] [with job-resume]
> 
> Some combo of those runes.

Sounds good.  I think I’ll go for “Job '%s' has been paused and needs to 
be explicitly resumed”.

>> +    /* Similarly, if the job is still drained, waiting will not help 
>> either */
>> +    if (job->pause_count > 0) {
>> +        error_setg(errp, "Job '%s' is blocked and cannot be 
>> unpaused", job->id);
>> +        return -EBUSY;
>> +    }
>> +
> 
> This leaks an internal state detail out to the caller. In which 
> circumstances does this happen?

Hm.  Now that you ask it.

The circumstance would be a concurrent drain in some other IO thread. 
Probably the IO thread the job runs in?  I don’t know any other thread 
that could concurrently drain, because this function runs in the main 
thread, and there shouldn’t be any drain in the background.

If it is another IO thread, waiting would indeed help, so there would 
not be a need to error out.

Perhaps it’s possible to have a background drain in the main thread?  I 
don’t think so, though...

> Do we expect it to?

I can’t say I do.

> As the user: Why is it blocked? Can I unblock it? Do I wait?

Waiting would be the strategy.

Perhaps we should bite the bullet, drop the condition and indeed just 
wait regardless of pause_count.
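
That is, something like the following sketch (using the error message
suggested above):

    int job_wait_unpaused(Job *job, Error **errp)
    {
        assert(qemu_get_current_aio_context() == qemu_get_aio_context());

        if (!job->paused) {
            return 0;
        }

        if (job->user_paused) {
            error_setg(errp, "Job '%s' has been paused and needs to be "
                       "explicitly resumed", job->id);
            return -EBUSY;
        }

        /*
         * No pause_count check: if the job is still drained (e.g. by
         * some IO thread), that drained section must end eventually,
         * so waiting does help in that case, too.
         */
        AIO_WAIT_WHILE(job->aio_context, job->paused);
        return 0;
    }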

>> +    /*
>> +     * This function is specifically for waiting for a job to be
>> +     * resumed after a drained section.  Ending the drained section
>> +     * includes a job_enter(), which schedules the job loop to be run,
>> +     * and once it does, job->paused will be cleared.  Therefore, we
>> +     * do not need to invoke job_enter() here.
>> +     */
>> +    AIO_WAIT_WHILE(job->aio_context, job->paused);
>> +
>> +    return 0;
>> +}
>>
> 
> Looks about right to me, but you'll want Kevin's look-see for the finer 
> details, of course.
> 
> My concern is that this adds a wait of an indefinite period to the 
> job_complete command. We mitigate this by checking for some other 
> internal state criteria first, and then by process of elimination deduce 
> that it's safe to wait, as it will (likely) be very quick.
> 
> Do we open the door for ourselves to get into trouble here, either by a 
> state we are forgetting to rule out (You'd have added it if you know the 
> answer to this) or a hypothetical future change where we forget to 
> update this function?

Well.  Who knows.

The alternatives I see are:

(A) Let drained_end wait for the block jobs to be resumed.  There are 
some details to consider there, I had some discussion around this with 
Kevin on Tuesday.  For example, should every drained_end wait for all 
jobs involved to be resumed?  (That would mean waiting for concurrent 
drained_ends, too.)  Would the drained_end_counter be the right tool for 
the job?  (None of this is unsolvable, I guess, but it would mean having 
another discussion.)
It would also mean that you basically just move the wait to wherever the 
drained_end occurs, for example to qmp_transaction().  Now, every 
drained_end is preceded by a drained_begin that always has to wait, so 
it probably isn’t bad.  OTOH, if qmp_transaction() would be allowed to 
wait for a job to be resumed, I think we can allow the same for 
qmp_block_job_complete().
(And there’s the fact that most of the time not having the block job 
running after drained_end poses no problem.  This is the first time I’m 
aware of a problem, so I think it would be preferable to wait only on 
the rare occasion where we have to.)

(B) Have block-job-complete be kind of asynchronous.  We talked about 
that on IRC yesterday, and the main problem seems to be that we don’t 
know what we’d do with errors.  We could only emit them via an event, or 
let the whole job fail, both of which seem like bad solutions.

(C) Perhaps mirror’s completion function works just fine when the job is 
paused (we just would have to skip the job_enter()).  I don’t know. 
Someone™ would need to find out.


So I can’t see (B) working, (A) is a bit of an ant’s nest that I don’t 
want to poke too much (and for the problem at hand, I don’t think it 
would be a better solution, because it would make us wait on every 
drained_end instead of only when there is a real conflict).  (C) might 
be nice, but it has the potential of giving headaches.


I can’t see future changes posing a problem, but that’s kind of the 
problem with future changes.

> Not necessarily a blocker, I think, and this does solve a real problem 
> fairly inexpensively.
> 
> 
> On good faith that you understand the synchronicity issues here better 
> than I do:

I should let you know that faith is probably misplaced.

> Reviewed-by: John Snow <jsnow@redhat.com>

Well, er, thanks?  I don’t know if I can take this now. O:)

Max




* Re: [PATCH for-6.0? 1/3] job: Add job_wait_unpaused() for block-job-complete
From: Max Reitz @ 2021-04-09  9:38 UTC
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: Kevin Wolf, John Snow, qemu-devel

On 08.04.21 18:58, Vladimir Sementsov-Ogievskiy wrote:
> 08.04.2021 19:20, Max Reitz wrote:
>> [...]
> 
> Hmm.. It seems that when job->pause_count becomes 0, job_enter is 
> called, and the period when pause_count is 0 but paused is still true 
> should be relatively short.

Yes.

> And the patch doesn't help if the user calls 
> job-complete during a drained section.

Is that possible?

(If the drained section is in some IO thread, I suppose we could drop 
the pause_count condition as I wrote in my reply to John.  Because if 
some IO thread has drained the job, waiting will help.)

> So it looks like the patch will 
> help relatively seldom.. Or am I missing something?

No, you’re right.  The condition is rather rare.  Which I think is good 
because it means that block-job-complete having to wait for a job will 
be a rare occurrence.

I have a Ruby test script that can reproduce it sometimes, but I’m not 
sure that it will reproduce the problem on others’ machines at all. 
I’ll try to rewrite it as a Python iotest so you can try for yourself.

> The job-complete command is async. Can we instead just add a boolean like 
> job->completion_requested, set it if job-complete is called in the STANDBY 
> state, and have job_resume call job_complete automatically if this boolean 
> is true?

job-complete is not async, it can return errors, and mirror_complete() 
has several paths that do so.  What would you do with those errors?  I 
can only imagine sending an event (which would be ignored by existing 
management software) or failing the block job (which seems wrong, but 
perhaps would be the right thing to do).

The problem is that when block-job-complete returns success, a 
BLOCK_JOB_COMPLETE is bound to follow at some point (unless the job runs 
into an error, or it cannot converge).  This contract would be broken 
(unless we let mirror_complete() fail the whole block job).

Max




* Re: [PATCH for-6.0? 1/3] job: Add job_wait_unpaused() for block-job-complete
From: Kevin Wolf @ 2021-04-09  9:44 UTC
  To: John Snow; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block, Max Reitz

On 08.04.2021 at 18:55, John Snow wrote:
> On 4/8/21 12:20 PM, Max Reitz wrote:
> > [...]
> > diff --git a/blockdev.c b/blockdev.c
> > index a57590aae4..c0cc2fa364 100644
> > --- a/blockdev.c
> > +++ b/blockdev.c
> > @@ -3414,6 +3414,9 @@ void qmp_block_job_complete(const char *device, Error **errp)
> >           return;
> >       }
> > +    if (job_wait_unpaused(&job->job, errp) < 0) {
> > +        return;
> > +    }
> 
> After which point, we assume we've transitioned back to either RUNNING or
> READY, and
> 
> >       trace_qmp_block_job_complete(job);
> >       job_complete(&job->job, errp);
> 
> This function checks the usual state table for permission to deliver/perform
> the verb.
> 
> >       aio_context_release(aio_context);
> > diff --git a/job.c b/job.c
> > index 289edee143..1ea30fd294 100644
> > --- a/job.c
> > +++ b/job.c
> > @@ -1023,3 +1023,45 @@ int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp)
> >       job_unref(job);
> >       return ret;
> >   }
> > +
> > +int job_wait_unpaused(Job *job, Error **errp)
> > +{
> > +    /*
> > +     * Only run this function from the main context, because this is
> > +     * what we need, and this way we do not have to think about what
> > +     * happens if the user concurrently pauses the job from the main
> > +     * monitor.
> > +     */
> > +    assert(qemu_get_current_aio_context() == qemu_get_aio_context());
> > +
> > +    /*
> > +     * Quick path (e.g. so we do not get an error if pause_count > 0
> > +     * but the job is not even paused)
> > +     */
> > +    if (!job->paused) {
> > +        return 0;
> > +    }
> > +
> > +    /* If the user has paused the job, waiting will not help */
> > +    if (job->user_paused) {
> > +        error_setg(errp, "Job '%s' has been paused by the user", job->id);
> > +        return -EBUSY;
> > +    }
> > +
> 
> Or the job has encountered an error if that error policy is set. It is maybe
> more accurate to say that the job is currently paused/halted (for some
> reason) and is awaiting the explicit unpause instruction.
> 
> "Job '%s' has been paused and needs to be explicitly resumed with
> job-resume", maybe?

Sounds good to me.
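
I.e., something like:

    error_setg(errp, "Job '%s' has been paused and needs to be explicitly "
               "resumed with job-resume", job->id);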

> Job '%s' has been paused and needs to be [explicitly] resumed
> [by the user] [with job-resume]
> 
> Some combo of those runes.
> 
> > +    /* Similarly, if the job is still drained, waiting will not help either */
> > +    if (job->pause_count > 0) {
> > +        error_setg(errp, "Job '%s' is blocked and cannot be unpaused", job->id);
> > +        return -EBUSY;
> > +    }
> > +
> 
> This leaks an internal state detail out to the caller. In which
> circumstances does this happen? Do we expect it to?
> 
> As the user: Why is it blocked? Can I unblock it? Do I wait?

I don't even understand why we need to fail here (which is still an
unexpected case for management tools, so we would not fix the bug fully,
but just restrict it to a more unlikely condition).

We know that we're running in a QMP command handler here, so it's not
the caller that drained the job, but some iothread. The iothread can
still make progress, so why doesn't waiting help?

> > +    /*
> > +     * This function is specifically for waiting for a job to be
> > +     * resumed after a drained section.  Ending the drained section
> > +     * includes a job_enter(), which schedules the job loop to be run,
> > +     * and once it does, job->paused will be cleared.  Therefore, we
> > +     * do not need to invoke job_enter() here.
> > +     */
> > +    AIO_WAIT_WHILE(job->aio_context, job->paused);

Do we need a aio_wait_kick() in job_pause_point()?

> > +    return 0;
> > +}
> > 
> 
> Looks about right to me, but you'll want Kevin's look-see for the finer
> details, of course.
> 
> My concern is that this adds a wait of an indefinite period to the
> job_complete command. We mitigate this by checking for some other internal
> state criteria first, and then by process of elimination deduce that it's
> safe to wait, as it will (likely) be very quick.
> 
> Do we open the door for ourselves to get into trouble here, either by a
> state we are forgetting to rule out (you'd have added it if you knew the
> answer to this) or a hypothetical future change where we forget to update
> this function?
> 
> Not necessarily a blocker, I think, and this does solve a real problem
> fairly inexpensively.

I think the only assumption this function really makes (in particular
after dropping the job->pause_count check) is that an automatically
paused job (not counting I/O errors which are treated as user paused)
will be resumed automatically eventually. Which is an entirely
reasonable assumption because otherwise the job would be stuck with no
way to resume it.

How long it will block is less clear, but if it isn't very
short, wouldn't that be a sign that a backend is hanging and we're in
trouble anyway?

Kevin



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH for-6.0? 1/3] job: Add job_wait_unpaused() for block-job-complete
  2021-04-08 17:26       ` Vladimir Sementsov-Ogievskiy
@ 2021-04-09  9:51         ` Max Reitz
  2021-04-09 10:07           ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 18+ messages in thread
From: Max Reitz @ 2021-04-09  9:51 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, John Snow, qemu-block
  Cc: Kevin Wolf, qemu-devel

On 08.04.21 19:26, Vladimir Sementsov-Ogievskiy wrote:
> 08.04.2021 20:04, John Snow wrote:
>> On 4/8/21 12:58 PM, Vladimir Sementsov-Ogievskiy wrote:
>>> job-complete command is async. Can we instead just add a boolean like 
>>> job->completion_requested, and set it if job-complete is called in 
>>> STANDBY state, so that on job_resume, job_complete will be called 
>>> automatically if this boolean is true?
>>
>> job_complete has a synchronous setup, though -- we lose out on a lot 
>> of synchronous error checking in that circumstance.
> 
> yes, that's a problem..
> 
>>
>> I was not able to audit it to determine that it'd be safe to attempt 
>> that setup during a drained section -- I imagine it won't work and 
>> will fail, though.
>>
>> So I thought we'd have to signal completion and run the setup *later*, 
>> but what do we do if we get an error then? Does the entire job fail? 
>> Do we emit some new event? ("BLOCK_JOB_COMPLETION_FAILED" ?) Is it 
>> recoverable?
>>
> 
> Isn't it possible even now that, after a successful job-complete, the job 
> still fails and we report BLOCK_JOB_COMPLETED with an error?
> 
> And actually, how much benefit does the user get from the fact that 
> job-complete may fail?
> 
> We can make job-complete a simple always-success boolean flag setter 
> like job-pause.

I wanted to say the following:

   But job-pause does always succeed, in contrast to block-job-complete.

   block-job-complete is more akin to job-finalize, which too is a
   synchronous operation.

But when I wrote that last sentence, I asked myself whether what 
mirror_complete() does isn’t actually a remnant of what we had to do 
when we didn’t have job-finalize yet.  Shouldn’t that all be in 
mirror_exit_common()?  What’s the advantage of opening the backing chain 
or putting blockers on the to-replace node in block-job-complete? 
Aren’t those all graph-changing operations, basically, i.e. stuff that 
should be done in job-finalize?

If we move everything to mirror_exit_common(), all that remains to do is 
basically set some should_complete flag (could even be part of the Job 
struct), and then the whole problem disappears.
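
Very rough sketch (should_complete as a field in the common Job struct is 
hypothetical, and the verb table would have to permit COMPLETE on 
STANDBY):

    void job_complete(Job *job, Error **errp)
    {
        if (job_apply_verb(job, JOB_VERB_COMPLETE, errp)) {
            return;
        }
        job->should_complete = true; /* hypothetical new Job field */
        /* the job loop picks the flag up once it runs again */
        job_enter(job);
    }

Setting a flag has no side effects that can fail, so it should be safe 
even while the job is drained.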

Thoughts?

Max

> And the actual completion will be done in the background, when possible. 
> And if it fails, the job just fails, like it does for any background I/O 
> error. And the user has to check the error/success status of the final 
> BLOCK_JOB_COMPLETED anyway.
> 



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH for-6.0? 1/3] job: Add job_wait_unpaused() for block-job-complete
  2021-04-09  9:44     ` Kevin Wolf
@ 2021-04-09  9:57       ` Max Reitz
  2021-04-09 16:54         ` John Snow
  0 siblings, 1 reply; 18+ messages in thread
From: Max Reitz @ 2021-04-09  9:57 UTC (permalink / raw)
  To: Kevin Wolf, John Snow
  Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block

On 09.04.21 11:44, Kevin Wolf wrote:
> Am 08.04.2021 um 18:55 hat John Snow geschrieben:
>> On 4/8/21 12:20 PM, Max Reitz wrote:
>>> block-job-complete can only be applied when the job is READY, not when
>>> it is on STANDBY (ready, but paused).  Draining a job technically pauses
>>> it (which makes a READY job enter STANDBY), and ending the drained
>>> section does not synchronously resume it, but only schedules the job,
>>> which will then be resumed.  So attempting to complete a job immediately
>>> after a drained section may sometimes fail.
>>>
>>> That is bad at least because users cannot really work nicely around
>>> this: A job may be paused and resumed at any time, so waiting for the
>>> job to be in the READY state and then issuing a block-job-complete poses
>>> a TOCTTOU problem.  The only way around it would be to issue
>>> block-job-complete until it no longer fails due to the job being in the
>>> STANDBY state, but that would not be nice.
>>>
>>> We can solve the problem by allowing block-job-complete to be invoked on
>>> jobs that are on STANDBY, if that status is the result of a drained
>>> section (not because the user has paused the job), and that section has
>>> ended.  That is, if the job is on STANDBY, but scheduled to be resumed.
>>>
>>> Perhaps we could actually just directly allow this, seeing that mirror
>>> is the only user of ready/complete, and that mirror_complete() could
>>> probably work under the given circumstances, but there may be many side
>>> effects to consider.
>>>
>>> It is simpler to add a function job_wait_unpaused() that waits for the
>>> job to be resumed (under said circumstances), and to make
>>> qmp_block_job_complete() use it to delay job_complete() until then.
>>>
>>> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1945635
>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>> ---
>>>    include/qemu/job.h | 15 +++++++++++++++
>>>    blockdev.c         |  3 +++
>>>    job.c              | 42 ++++++++++++++++++++++++++++++++++++++++++
>>>    3 files changed, 60 insertions(+)
>>>
>>> diff --git a/include/qemu/job.h b/include/qemu/job.h
>>> index efc6fa7544..cf3082b6d7 100644
>>> --- a/include/qemu/job.h
>>> +++ b/include/qemu/job.h
>>> @@ -563,4 +563,19 @@ void job_dismiss(Job **job, Error **errp);
>>>     */
>>>    int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp);
>>> +/**
>>> + * If the job has been paused because of a drained section, and that
>>> + * section has ended, wait until the job is resumed.
>>> + *
>>> + * Return 0 if the job is not paused, or if it has been successfully
>>> + * resumed.
>>> + * Return an error if the job has been paused in such a way that
>>> + * waiting will not resume it, i.e. if it has been paused by the user,
>>> + * or if it is still drained.
>>> + *
>>> + * Callers must be in the home AioContext and hold the AioContext lock
>>> + * of job->aio_context.
>>> + */
>>> +int job_wait_unpaused(Job *job, Error **errp);
>>> +
>>>    #endif
>>> diff --git a/blockdev.c b/blockdev.c
>>> index a57590aae4..c0cc2fa364 100644
>>> --- a/blockdev.c
>>> +++ b/blockdev.c
>>> @@ -3414,6 +3414,9 @@ void qmp_block_job_complete(const char *device, Error **errp)
>>>            return;
>>>        }
>>> +    if (job_wait_unpaused(&job->job, errp) < 0) {
>>> +        return;
>>> +    }
>>
>> After which point, we assume we've transitioned back to either RUNNING or
>> READY, and
>>
>>>        trace_qmp_block_job_complete(job);
>>>        job_complete(&job->job, errp);
>>
>> This function checks the usual state table for permission to deliver/perform
>> the verb.
>>
>>>        aio_context_release(aio_context);
>>> diff --git a/job.c b/job.c
>>> index 289edee143..1ea30fd294 100644
>>> --- a/job.c
>>> +++ b/job.c
>>> @@ -1023,3 +1023,45 @@ int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp)
>>>        job_unref(job);
>>>        return ret;
>>>    }
>>> +
>>> +int job_wait_unpaused(Job *job, Error **errp)
>>> +{
>>> +    /*
>>> +     * Only run this function from the main context, because this is
>>> +     * what we need, and this way we do not have to think about what
>>> +     * happens if the user concurrently pauses the job from the main
>>> +     * monitor.
>>> +     */
>>> +    assert(qemu_get_current_aio_context() == qemu_get_aio_context());
>>> +
>>> +    /*
>>> +     * Quick path (e.g. so we do not get an error if pause_count > 0
>>> +     * but the job is not even paused)
>>> +     */
>>> +    if (!job->paused) {
>>> +        return 0;
>>> +    }
>>> +
>>> +    /* If the user has paused the job, waiting will not help */
>>> +    if (job->user_paused) {
>>> +        error_setg(errp, "Job '%s' has been paused by the user", job->id);
>>> +        return -EBUSY;
>>> +    }
>>> +
>>
>> Or the job has encountered an error if that error policy is set. It is maybe
>> more accurate to say that the job is currently paused/halted (for some
>> reason) and is awaiting the explicit unpause instruction.
>>
>> "Job '%s' has been paused and needs to be explicitly resumed with
>> job-resume", maybe?
> 
> Sounds good to me.
> 
>> Job '%s' has been paused and needs to be [explicitly] resumed
>> [by the user] [with job-resume]
>>
>> Some combo of those runes.
>>
>>> +    /* Similarly, if the job is still drained, waiting will not help either */
>>> +    if (job->pause_count > 0) {
>>> +        error_setg(errp, "Job '%s' is blocked and cannot be unpaused", job->id);
>>> +        return -EBUSY;
>>> +    }
>>> +
>>
>> This leaks an internal state detail out to the caller. In which
>> circumstances does this happen? Do we expect it to?
>>
>> As the user: Why is it blocked? Can I unblock it? Do I wait?
> 
> I don't even understand why we need to fail here (which is still an
> unexpected case for management tools, so we would not fix the bug fully,
> but just restrict it to a more unlikely condition).
> 
> We know that we're running in a QMP command handler here, so it's not
> the caller that drained the job, but some iothread. The iothread can
> still make progress, so why doesn't waiting help?

Yes, probably the best would be to drop this block.

>>> +    /*
>>> +     * This function is specifically for waiting for a job to be
>>> +     * resumed after a drained section.  Ending the drained section
>>> +     * includes a job_enter(), which schedules the job loop to be run,
>>> +     * and once it does, job->paused will be cleared.  Therefore, we
>>> +     * do not need to invoke job_enter() here.
>>> +     */
>>> +    AIO_WAIT_WHILE(job->aio_context, job->paused);
> 
> Do we need a aio_wait_kick() in job_pause_point()?

Oh, yes.
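
Something like this, I suppose (untested):

    /* in job_pause_point(), where the job wakes up from the pause: */
    job->paused = true;
    job_do_yield(job, -1);
    job->paused = false;
    /* wake AIO_WAIT_WHILE() callers that poll job->paused */
    aio_wait_kick();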

>>> +    return 0;
>>> +}
>>>
>>
>> Looks about right to me, but you'll want Kevin's look-see for the finer
>> details, of course.
>>
>> My concern is that this adds a wait of an indefinite period to the
>> job_complete command. We mitigate this by checking for some other internal
>> state criteria first, and then by process of elimination deduce that it's
>> safe to wait, as it will (likely) be very quick.
>>
>> Do we open the door for ourselves to get into trouble here, either by a
>> state we are forgetting to rule out (you'd have added it if you knew the
>> answer to this) or a hypothetical future change where we forget to update
>> this function?
>>
>> Not necessarily a blocker, I think, and this does solve a real problem
>> fairly inexpensively.
> 
> I think the only assumption this function really makes (in particular
> after dropping the job->pause_count check) is that an automatically
> paused job (not counting I/O errors which are treated as user paused)
> will be resumed automatically eventually. Which is an entirely
> reasonable assumption because otherwise the job would be stuck with no
> way to resume it.
> 
> How long it will block is less clear, but if it isn't very
> short, wouldn't that be a sign that a backend is hanging and we're in
> trouble anyway?

Just as a PS, in a reply to one of Vladimir’s mails 
(da048f58-43a6-6811-6ad2-0d7899737a23@redhat.com) I was wondering 
whether it even makes sense for mirror to do all the stuff it does in 
mirror_complete() there.  Aren’t those all things that should really be 
done in job-finalize (i.e. mirror_exit_common())?

Max



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH for-6.0? 1/3] job: Add job_wait_unpaused() for block-job-complete
  2021-04-09  9:51         ` Max Reitz
@ 2021-04-09 10:07           ` Vladimir Sementsov-Ogievskiy
  2021-04-09 10:18             ` Max Reitz
  0 siblings, 1 reply; 18+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-04-09 10:07 UTC (permalink / raw)
  To: Max Reitz, John Snow, qemu-block; +Cc: qemu-devel, Kevin Wolf

09.04.2021 12:51, Max Reitz wrote:
> On 08.04.21 19:26, Vladimir Sementsov-Ogievskiy wrote:
>> 08.04.2021 20:04, John Snow wrote:
>>> On 4/8/21 12:58 PM, Vladimir Sementsov-Ogievskiy wrote:
>>>> job-complete command is async. Can we instead just add a boolean like job->completion_requested, and set it if job-complete is called in STANDBY state, so that on job_resume, job_complete will be called automatically if this boolean is true?
>>>
>>> job_complete has a synchronous setup, though -- we lose out on a lot of synchronous error checking in that circumstance.
>>
>> yes, that's a problem..
>>
>>>
>>> I was not able to audit it to determine that it'd be safe to attempt that setup during a drained section -- I imagine it won't work and will fail, though.
>>>
>>> So I thought we'd have to signal completion and run the setup *later*, but what do we do if we get an error then? Does the entire job fail? Do we emit some new event? ("BLOCK_JOB_COMPLETION_FAILED" ?) Is it recoverable?
>>>
>>
>> Isn't it possible even now that, after a successful job-complete, the job still fails and we report BLOCK_JOB_COMPLETED with an error?
>>
>> And actually, how much benefit does the user get from the fact that 
>> job-complete may fail?
>> We can make job-complete a simple always-success boolean flag setter like job-pause.
> 
> I wanted to say the following:
> 
>   But job-pause does always succeed, in contrast to block-job-complete.
> 
>   block-job-complete is more akin to job-finalize, which too is a
>   synchronous operation.
> 
> But when I wrote that last sentence, I asked myself whether what mirror_complete() does isn’t actually a remnant of what we had to do when we didn’t have job-finalize yet.  Shouldn’t that all be in mirror_exit_common()?  What’s the advantage of opening the backing chain or putting blockers on the to-replace node in block-job-complete? Aren’t those all graph-changing operations, basically, i.e. stuff that should be done in job-finalize?
> 
> If we move everything to mirror_exit_common(), all that remains to do is basically set some should_complete flag (could even be part of the Job struct), and then the whole problem disappears.
> 
> Thoughts?
> 

Sounds good. But I want to understand one simple thing first: can the job fail even after block-job-complete succeeded?

As I understand it, current users think that it can't. And block-job-complete is documented as "This command completes an active background block operation synchronously". So it's assumed that if block-job-complete succeeded, we are totally done.

But maybe that's wrong? Can mirror_prepare fail after mirror_complete succeeds? And does the user have to check the job status after the job is finalized? Or check the error in the BLOCK_JOB_COMPLETED event?



-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH for-6.0? 1/3] job: Add job_wait_unpaused() for block-job-complete
  2021-04-09  9:31     ` Max Reitz
@ 2021-04-09 10:17       ` Kevin Wolf
  0 siblings, 0 replies; 18+ messages in thread
From: Kevin Wolf @ 2021-04-09 10:17 UTC (permalink / raw)
  To: Max Reitz; +Cc: Vladimir Sementsov-Ogievskiy, John Snow, qemu-devel, qemu-block

Am 09.04.2021 um 11:31 hat Max Reitz geschrieben:
> On 08.04.21 18:55, John Snow wrote:
> > On 4/8/21 12:20 PM, Max Reitz wrote:
> > > +    /* Similarly, if the job is still drained, waiting will not
> > > help either */
> > > +    if (job->pause_count > 0) {
> > > +        error_setg(errp, "Job '%s' is blocked and cannot be
> > > unpaused", job->id);
> > > +        return -EBUSY;
> > > +    }
> > > +
> > 
> > This leaks an internal state detail out to the caller. In which
> > circumstances does this happen?
> 
> Hm.  Now that you ask it.
> 
> The circumstance would be a concurrent drain in some other IO thread.
> Probably the IO thread the job runs in?  I don’t know any other thread that
> could concurrently drain, because this function runs in the main thread, and
> there shouldn’t be any drain in the background.
> 
> If it is another IO thread, waiting would indeed help, so there would not be
> a need to error out.
> 
> Perhaps it’s possible to have a background drain in the main thread?  I
> don’t think so, though...

Hm... Maybe like this:

1. bdrv_do_drained_begin() in the main thread
2. BDRV_POLL_WHILE() reenters the QMP dispatcher coroutine
3. qmp_job_complete()
4. Deadlock because we poll here until the job is undrained

This is why nested event loops are evil...

I guess a way to fix this would be making qmp_job_complete() a coroutine
handler and yielding instead of actively polling. Then job_pause_point()
would have to wake the waiting coroutine rather than calling
aio_wait_kick().
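
Roughly like this (hypothetical shape only, the unpause_co field is made 
up):

    /* in a 'coroutine': true qmp_block_job_complete() */
    while (job->paused && !job->user_paused) {
        job->unpause_co = qemu_coroutine_self();
        qemu_coroutine_yield();
    }

with job_pause_point() then doing aio_co_wake(job->unpause_co) after 
clearing job->paused, instead of aio_wait_kick().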

On the other hand, this fix would be a lot more complex for -rc3 than
what you posted here.

So maybe take this one for -rc3 and do the coroutine thing for 6.1?

> > Looks about right to me, but you'll want Kevin's look-see for the finer
> > details, of course.
> > 
> > My concern is that this adds a wait of an indefinite period to the
> > job_complete command. We mitigate this by checking for some other
> > internal state criteria first, and then by process of elimination deduce
> > that it's safe to wait, as it will (likely) be very quick.
> > 
> > Do we open the door for ourselves to get into trouble here, either by a
> > state we are forgetting to rule out (you'd have added it if you knew the
> > answer to this) or a hypothetical future change where we forget to
> > update this function?
> 
> Well.  Who knows.
> 
> The alternatives I see are:
> 
> (A) Let drained_end wait for the block jobs to be resumed.  There are some
> details to consider there; I had some discussion around this with Kevin on
> Tuesday.  For example, should every drained_end wait for all jobs involved
> to be resumed?  (That would mean waiting for concurrent drained_ends, too.)
> Would the drained_end_counter be the right tool for the job?  (None of this
> is unsolvable, I guess, but it would mean having another discussion.)

The advantage there would be that you don't have to deal with new nested
event loops.

I think the problem you brought up was that we can't wait for the job
resuming if it is at the same time paused externally. But we could just
check that and not increase the drained_end_counter in this case.

This would probably result in a fix that is simple enough for -rc3.
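
Rough sketch (untested; job_resume_with_counter() is a made-up helper 
that would decrement the counter from the job once it has actually 
resumed):

    static void child_job_drained_end(BdrvChild *c, int *drained_end_counter)
    {
        BlockJob *job = c->opaque;

        if (job->job.user_paused) {
            /* won't actually resume anyway, don't delay drained_end */
            job_resume(&job->job);
        } else {
            job_resume_with_counter(&job->job, drained_end_counter);
        }
    }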

> It would also mean that you basically just move the wait to wherever the
> drained_end occurs, for example to qmp_transaction().  Now, every
> drained_end is preceded by a drained_begin that always has to wait, so it
> probably isn’t bad.  OTOH, if qmp_transaction() would be allowed to wait for
> a job to be resumed, I think we can allow the same for
> qmp_block_job_complete().
> (And there’s the fact that most of the time not having the block job running
> after drained_end poses no problem.  This is the first time I’m aware of a
> problem, so I think it would be preferable to wait only on the rare occasion
> where we have to.)

Yeah, I can see your point, but on the other hand, waiting in
drained_end until we can be sure that the effect of drained_begin has
been undone is a single place that covers all occasions.

I'm not overly confident that we would catch all occasions where we have
to if we try attacking them one by one. As you noticed, this is not some
simple case where things just fail all the time when you get it wrong,
but results in race conditions that are hard to reproduce.

> (B) Have block-job-complete be kind of asynchronous.  We talked about that
> on IRC yesterday, and the main problem seems to be that we don’t know what
> we’d do with errors.  We could only emit them via an event, or let the whole
> job fail, both of which seem like bad solutions.

I guess my suggestion to make it a coroutine QMP handlers is similar,
except that it blocks QMP, so you could still deliver an error.

On the other hand, this wouldn't be able to address the concern that we
might be waiting for too long (but I'm not convinced that this is a
problem anyway).

> (C) Perhaps mirror’s completion function works just fine when the job is
> paused (we just would have to skip the job_enter()).  I don’t know. Someone™
> would need to find out.

Who knows. Apart from Someone™, of course.

Kevin



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH for-6.0? 1/3] job: Add job_wait_unpaused() for block-job-complete
  2021-04-09 10:07           ` Vladimir Sementsov-Ogievskiy
@ 2021-04-09 10:18             ` Max Reitz
  0 siblings, 0 replies; 18+ messages in thread
From: Max Reitz @ 2021-04-09 10:18 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, John Snow, qemu-block
  Cc: Kevin Wolf, qemu-devel

On 09.04.21 12:07, Vladimir Sementsov-Ogievskiy wrote:
> 09.04.2021 12:51, Max Reitz wrote:
>> On 08.04.21 19:26, Vladimir Sementsov-Ogievskiy wrote:
>>> 08.04.2021 20:04, John Snow wrote:
>>>> On 4/8/21 12:58 PM, Vladimir Sementsov-Ogievskiy wrote:
>>>>> job-complete command is async. Can we instead just add a boolean 
>>>>> like job->completion_requested, and set it if job-complete is called 
>>>>> in STANDBY state, so that on job_resume, job_complete will be called 
>>>>> automatically if this boolean is true?
>>>>
>>>> job_complete has a synchronous setup, though -- we lose out on a lot 
>>>> of synchronous error checking in that circumstance.
>>>
>>> yes, that's a problem..
>>>
>>>>
>>>> I was not able to audit it to determine that it'd be safe to attempt 
>>>> that setup during a drained section -- I imagine it won't work and 
>>>> will fail, though.
>>>>
>>>> So I thought we'd have to signal completion and run the setup 
>>>> *later*, but what do we do if we get an error then? Does the entire 
>>>> job fail? Do we emit some new event? ("BLOCK_JOB_COMPLETION_FAILED" 
>>>> ?) Is it recoverable?
>>>>
>>>
>>> Isn't it possible even now that, after a successful job-complete, the 
>>> job still fails and we report BLOCK_JOB_COMPLETED with an error?
>>>
>>> And actually, how much benefit does the user get from the fact that 
>>> job-complete may fail?
>>>
>>> We can make job-complete a simple always-success boolean flag setter 
>>> like job-pause.
>>
>> I wanted to say the following:
>>
>>   But job-pause does always succeed, in contrast to block-job-complete.
>>
>>   block-job-complete is more akin to job-finalize, which too is a
>>   synchronous operation.
>>
>> But when I wrote that last sentence, I asked myself whether what 
>> mirror_complete() does isn’t actually a remnant of what we had to do 
>> when we didn’t have job-finalize yet.  Shouldn’t that all be in 
>> mirror_exit_common()?  What’s the advantage of opening the backing 
>> chain or putting blockers on the to-replace node in 
>> block-job-complete? Aren’t those all graph-changing operations, 
>> basically, i.e. stuff that should be done in job-finalize?
>>
>> If we move everything to mirror_exit_common(), all that remains to do 
>> is basically set some should_complete flag (could even be part of the 
>> Job struct), and then the whole problem disappears.
>>
>> Thoughts?
>>
> 
> Sounds good. But I want to understand one simple thing first: can the 
> job fail even after block-job-complete succeeded?

Sure, if you get an I/O error afterwards.

> As I understand it, current users think that it can't. And 
> block-job-complete is documented as "This command completes an active 
> background block operation synchronously". So it's assumed that if 
> block-job-complete succeeded, we are totally done.

I think the only thing that block-job-complete does is signal to the job 
it should exit once source and target have converged again.  (The READY 
event just says that source and target have converged once already.)

(Only in write-blocking copy mode is there a guarantee of source and 
target remaining converged after READY.)

Well, and of course mirror_complete() also does a couple of things that 
prepare for replacing the source by the target.

> But maybe that's wrong? Can mirror_prepare fail after mirror_complete 
> succeeds?

Oh definitely.  For example, mirror_prepare replaces the source by the 
target, which can definitely fail.  (See mirror_exit_common().)

> And does the user have to check the job status after the job is 
> finalized? Or check the error in the BLOCK_JOB_COMPLETED event?

If the BLOCK_JOB_COMPLETED event shows an error, then the job doesn’t 
even try to complete.  If there is an error on job-finalize, source and 
target have converged (so the target is consistent), but the source most 
likely couldn’t be replaced by the target.

I suppose in practice if anything goes wrong libvirt just shows an error 
and that’s it.  No matter where the error occurs exactly.

Max



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH for-6.0? 1/3] job: Add job_wait_unpaused() for block-job-complete
  2021-04-09  9:57       ` Max Reitz
@ 2021-04-09 16:54         ` John Snow
  0 siblings, 0 replies; 18+ messages in thread
From: John Snow @ 2021-04-09 16:54 UTC (permalink / raw)
  To: Max Reitz, Kevin Wolf
  Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block

On 4/9/21 5:57 AM, Max Reitz wrote:
> 
> Just as a PS, in a reply to one of Vladimir’s mails 
> (da048f58-43a6-6811-6ad2-0d7899737a23@redhat.com) I was wondering 
> whether it even makes sense for mirror to do all the stuff it does in 
> mirror_complete() there.  Aren’t those all things that should really 
> be done in job-finalize (i.e. mirror_exit_common())?
> 
> Max

Yes, I think so -- admittedly, when I added that finalize logic, I was 
just very confused about what was safe to move where in the mirror code 
and never got my patches off the ground to do a more vigorous refactoring.

We've got, I think, three different user-initiated "This job should 
finish now" mechanisms:

- Cancelling the mirror job after it reaches READY
- Issuing "complete" to the mirror job after it reaches READY
- Issuing "finalize" to a job

Maybe these could all be integrated into a single mechanism somehow. I 
think I just lack the knowledge of the draining/threading/aio models to 
do it safely myself, and we'd need some compatibility shims for a while, 
etc.

Would have to look at this stuff again to know for certain what we'd be 
able to change compatibly.

--js



^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2021-04-09 16:55 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-08 16:20 [PATCH for-6.0? 0/3] job: Add job_wait_unpaused() for block-job-complete Max Reitz
2021-04-08 16:20 ` [PATCH for-6.0? 1/3] " Max Reitz
2021-04-08 16:55   ` John Snow
2021-04-09  9:31     ` Max Reitz
2021-04-09 10:17       ` Kevin Wolf
2021-04-09  9:44     ` Kevin Wolf
2021-04-09  9:57       ` Max Reitz
2021-04-09 16:54         ` John Snow
2021-04-08 16:58   ` Vladimir Sementsov-Ogievskiy
2021-04-08 17:04     ` John Snow
2021-04-08 17:26       ` Vladimir Sementsov-Ogievskiy
2021-04-09  9:51         ` Max Reitz
2021-04-09 10:07           ` Vladimir Sementsov-Ogievskiy
2021-04-09 10:18             ` Max Reitz
2021-04-09  9:38     ` Max Reitz
2021-04-08 16:20 ` [PATCH for-6.0? 2/3] test-blockjob: Test job_wait_unpaused() Max Reitz
2021-04-08 16:20 ` [PATCH for-6.0? 3/3] iotests/041: block-job-complete on user-paused job Max Reitz
2021-04-08 17:09 ` [PATCH for-6.0? 0/3] job: Add job_wait_unpaused() for block-job-complete John Snow
