qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH for-6.1? v2 0/7] mirror: Handle errors after READY cancel
@ 2021-07-26 14:46 Max Reitz
  2021-07-26 14:46 ` [PATCH for-6.1? v2 1/7] mirror: Keep s->synced on error Max Reitz
                   ` (7 more replies)
  0 siblings, 8 replies; 31+ messages in thread
From: Max Reitz @ 2021-07-26 14:46 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, John Snow, qemu-devel,
	Max Reitz

Hi,

v1 cover letter:
https://lists.nongnu.org/archive/html/qemu-block/2021-07/msg00705.html

Changes in v2:
- Added patch 2 (as suggested by Vladimir)
- Patch 4 (ex. 3): Rebase conflicts because of patch 2
- Patch 5 (ex. 4):
  - Rebase conflicts because of patch 2
  - Do not use job_cancel_requested() to determine how a soft-cancelled
    job should be completed: A soft-cancelled job should end with
    COMPLETED, not CANCELLED, and so job_is_cancelled() is the
    appropriate condition there.


git-backport-diff against v1:

Key:
[----] : patches are identical
[####] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/7:[----] [--] 'mirror: Keep s->synced on error'
002/7:[down] 'mirror: Drop s->synced'
003/7:[----] [--] 'job: @force parameter for job_cancel_sync{,_all}()'
004/7:[0006] [FC] 'jobs: Give Job.force_cancel more meaning'
005/7:[0011] [FC] 'job: Add job_cancel_requested()'
006/7:[----] [-C] 'mirror: Check job_is_cancelled() earlier'
007/7:[----] [--] 'iotests: Add mirror-ready-cancel-error test'


Max Reitz (7):
  mirror: Keep s->synced on error
  mirror: Drop s->synced
  job: @force parameter for job_cancel_sync{,_all}()
  jobs: Give Job.force_cancel more meaning
  job: Add job_cancel_requested()
  mirror: Check job_is_cancelled() earlier
  iotests: Add mirror-ready-cancel-error test

 include/qemu/job.h                            |  29 +++-
 block/backup.c                                |   3 +-
 block/mirror.c                                |  47 +++---
 block/replication.c                           |   4 +-
 blockdev.c                                    |   4 +-
 job.c                                         |  40 ++++-
 qemu-nbd.c                                    |   2 +-
 softmmu/runstate.c                            |   2 +-
 storage-daemon/qemu-storage-daemon.c          |   2 +-
 tests/unit/test-block-iothread.c              |   2 +-
 tests/unit/test-blockjob.c                    |   2 +-
 tests/qemu-iotests/109.out                    |  60 +++-----
 .../tests/mirror-ready-cancel-error           | 143 ++++++++++++++++++
 .../tests/mirror-ready-cancel-error.out       |   5 +
 tests/qemu-iotests/tests/qsd-jobs.out         |   2 +-
 15 files changed, 264 insertions(+), 83 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/mirror-ready-cancel-error
 create mode 100644 tests/qemu-iotests/tests/mirror-ready-cancel-error.out

-- 
2.31.1



^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH for-6.1? v2 1/7] mirror: Keep s->synced on error
  2021-07-26 14:46 [PATCH for-6.1? v2 0/7] mirror: Handle errors after READY cancel Max Reitz
@ 2021-07-26 14:46 ` Max Reitz
  2021-07-26 14:46 ` [PATCH for-6.1? v2 2/7] mirror: Drop s->synced Max Reitz
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 31+ messages in thread
From: Max Reitz @ 2021-07-26 14:46 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, John Snow, qemu-devel,
	Max Reitz

An error does not take us out of the READY phase, which is what
s->synced signifies.  It does of course mean that source and target are
no longer in sync, but that is what s->actively_sync is for -- s->synced
never meant that source and target are in sync, only that they were at
some point (and at that point we transitioned into the READY phase).

The tangible problem is that we transition to READY once we are in sync
and s->synced is false.  By resetting s->synced here, we will transition
from READY to READY once the error is resolved (if the job keeps
running), and that transition is not allowed.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/mirror.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/block/mirror.c b/block/mirror.c
index 98fc66eabf..d73b704473 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -121,7 +121,6 @@ typedef enum MirrorMethod {
 static BlockErrorAction mirror_error_action(MirrorBlockJob *s, bool read,
                                             int error)
 {
-    s->synced = false;
     s->actively_synced = false;
     if (read) {
         return block_job_error_action(&s->common, s->on_source_error,
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH for-6.1? v2 2/7] mirror: Drop s->synced
  2021-07-26 14:46 [PATCH for-6.1? v2 0/7] mirror: Handle errors after READY cancel Max Reitz
  2021-07-26 14:46 ` [PATCH for-6.1? v2 1/7] mirror: Keep s->synced on error Max Reitz
@ 2021-07-26 14:46 ` Max Reitz
  2021-07-26 15:26   ` Eric Blake
  2021-07-27 11:41   ` Vladimir Sementsov-Ogievskiy
  2021-07-26 14:46 ` [PATCH for-6.1? v2 3/7] job: @force parameter for job_cancel_sync{, _all}() Max Reitz
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 31+ messages in thread
From: Max Reitz @ 2021-07-26 14:46 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, John Snow, qemu-devel,
	Max Reitz

As of HEAD^, there is no meaning to s->synced other than whether the job
is READY or not.  job_is_ready() gives us that information, too.

Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/mirror.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index d73b704473..fcb7b65f93 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -56,7 +56,6 @@ typedef struct MirrorBlockJob {
     bool zero_target;
     MirrorCopyMode copy_mode;
     BlockdevOnError on_source_error, on_target_error;
-    bool synced;
     /* Set when the target is synced (dirty bitmap is clean, nothing
      * in flight) and the job is running in active mode */
     bool actively_synced;
@@ -936,7 +935,6 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
     if (s->bdev_length == 0) {
         /* Transition to the READY state and wait for complete. */
         job_transition_to_ready(&s->common.job);
-        s->synced = true;
         s->actively_synced = true;
         while (!job_is_cancelled(&s->common.job) && !s->should_complete) {
             job_yield(&s->common.job);
@@ -1028,7 +1026,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
         should_complete = false;
         if (s->in_flight == 0 && cnt == 0) {
             trace_mirror_before_flush(s);
-            if (!s->synced) {
+            if (!job_is_ready(&s->common.job)) {
                 if (mirror_flush(s) < 0) {
                     /* Go check s->ret.  */
                     continue;
@@ -1039,7 +1037,6 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
                  * the target in a consistent state.
                  */
                 job_transition_to_ready(&s->common.job);
-                s->synced = true;
                 if (s->copy_mode != MIRROR_COPY_MODE_BACKGROUND) {
                     s->actively_synced = true;
                 }
@@ -1083,14 +1080,15 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
 
         ret = 0;
 
-        if (s->synced && !should_complete) {
+        if (job_is_ready(&s->common.job) && !should_complete) {
             delay_ns = (s->in_flight == 0 &&
                         cnt == 0 ? BLOCK_JOB_SLICE_TIME : 0);
         }
-        trace_mirror_before_sleep(s, cnt, s->synced, delay_ns);
+        trace_mirror_before_sleep(s, cnt, job_is_ready(&s->common.job),
+                                  delay_ns);
         job_sleep_ns(&s->common.job, delay_ns);
         if (job_is_cancelled(&s->common.job) &&
-            (!s->synced || s->common.job.force_cancel))
+            (!job_is_ready(&s->common.job) || s->common.job.force_cancel))
         {
             break;
         }
@@ -1103,8 +1101,9 @@ immediate_exit:
          * or it was cancelled prematurely so that we do not guarantee that
          * the target is a copy of the source.
          */
-        assert(ret < 0 || ((s->common.job.force_cancel || !s->synced) &&
-               job_is_cancelled(&s->common.job)));
+        assert(ret < 0 ||
+               ((s->common.job.force_cancel || !job_is_ready(&s->common.job)) &&
+                job_is_cancelled(&s->common.job)));
         assert(need_drain);
         mirror_wait_for_all_io(s);
     }
@@ -1127,7 +1126,7 @@ static void mirror_complete(Job *job, Error **errp)
 {
     MirrorBlockJob *s = container_of(job, MirrorBlockJob, common.job);
 
-    if (!s->synced) {
+    if (!job_is_ready(job)) {
         error_setg(errp, "The active block job '%s' cannot be completed",
                    job->id);
         return;
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH for-6.1? v2 3/7] job: @force parameter for job_cancel_sync{, _all}()
  2021-07-26 14:46 [PATCH for-6.1? v2 0/7] mirror: Handle errors after READY cancel Max Reitz
  2021-07-26 14:46 ` [PATCH for-6.1? v2 1/7] mirror: Keep s->synced on error Max Reitz
  2021-07-26 14:46 ` [PATCH for-6.1? v2 2/7] mirror: Drop s->synced Max Reitz
@ 2021-07-26 14:46 ` Max Reitz
  2021-08-03 14:00   ` [PATCH for-6.1? v2 3/7] job: @force parameter for job_cancel_sync{,_all}() Kevin Wolf
  2021-07-26 14:46 ` [PATCH for-6.1? v2 4/7] jobs: Give Job.force_cancel more meaning Max Reitz
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 31+ messages in thread
From: Max Reitz @ 2021-07-26 14:46 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, John Snow, qemu-devel,
	Max Reitz

Callers should be able to specify whether they want job_cancel_sync() to
force-cancel the job or not.

In fact, almost all invocations do not care about consistency of the
result and just want the job to terminate as soon as possible, so they
should pass force=true.  The replication block driver is the exception.

This changes some iotest outputs, because quitting qemu while a mirror
job is active will now lead to it being cancelled instead of completed,
which is what we want.  (Cancelling a READY mirror job with force=false
may take an indefinite amount of time, which we do not want when
quitting.  If users want consistent results, they must have all jobs be
done before they quit qemu.)

Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 include/qemu/job.h                    | 10 ++---
 block/replication.c                   |  4 +-
 blockdev.c                            |  4 +-
 job.c                                 | 27 +++++++++---
 qemu-nbd.c                            |  2 +-
 softmmu/runstate.c                    |  2 +-
 storage-daemon/qemu-storage-daemon.c  |  2 +-
 tests/unit/test-block-iothread.c      |  2 +-
 tests/unit/test-blockjob.c            |  2 +-
 tests/qemu-iotests/109.out            | 60 +++++++++++----------------
 tests/qemu-iotests/tests/qsd-jobs.out |  2 +-
 11 files changed, 61 insertions(+), 56 deletions(-)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index 41162ed494..5e8edbc2c8 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -506,19 +506,19 @@ void job_user_cancel(Job *job, bool force, Error **errp);
 
 /**
  * Synchronously cancel the @job.  The completion callback is called
- * before the function returns.  The job may actually complete
- * instead of canceling itself; the circumstances under which this
- * happens depend on the kind of job that is active.
+ * before the function returns.  If @force is false, the job may
+ * actually complete instead of canceling itself; the circumstances
+ * under which this happens depend on the kind of job that is active.
  *
  * Returns the return value from the job if the job actually completed
  * during the call, or -ECANCELED if it was canceled.
  *
  * Callers must hold the AioContext lock of job->aio_context.
  */
-int job_cancel_sync(Job *job);
+int job_cancel_sync(Job *job, bool force);
 
 /** Synchronously cancels all jobs using job_cancel_sync(). */
-void job_cancel_sync_all(void);
+void job_cancel_sync_all(bool force);
 
 /**
  * @job: The job to be completed.
diff --git a/block/replication.c b/block/replication.c
index 32444b9a8f..e7a9327b12 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -149,7 +149,7 @@ static void replication_close(BlockDriverState *bs)
     if (s->stage == BLOCK_REPLICATION_FAILOVER) {
         commit_job = &s->commit_job->job;
         assert(commit_job->aio_context == qemu_get_current_aio_context());
-        job_cancel_sync(commit_job);
+        job_cancel_sync(commit_job, false);
     }
 
     if (s->mode == REPLICATION_MODE_SECONDARY) {
@@ -726,7 +726,7 @@ static void replication_stop(ReplicationState *rs, bool failover, Error **errp)
          * disk, secondary disk in backup_job_completed().
          */
         if (s->backup_job) {
-            job_cancel_sync(&s->backup_job->job);
+            job_cancel_sync(&s->backup_job->job, false);
         }
 
         if (!failover) {
diff --git a/blockdev.c b/blockdev.c
index 3d8ac368a1..aa95918c02 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1848,7 +1848,7 @@ static void drive_backup_abort(BlkActionState *common)
         aio_context = bdrv_get_aio_context(state->bs);
         aio_context_acquire(aio_context);
 
-        job_cancel_sync(&state->job->job);
+        job_cancel_sync(&state->job->job, true);
 
         aio_context_release(aio_context);
     }
@@ -1949,7 +1949,7 @@ static void blockdev_backup_abort(BlkActionState *common)
         aio_context = bdrv_get_aio_context(state->bs);
         aio_context_acquire(aio_context);
 
-        job_cancel_sync(&state->job->job);
+        job_cancel_sync(&state->job->job, true);
 
         aio_context_release(aio_context);
     }
diff --git a/job.c b/job.c
index e7a5d28854..9e971d64cf 100644
--- a/job.c
+++ b/job.c
@@ -763,7 +763,12 @@ static void job_completed_txn_abort(Job *job)
         if (other_job != job) {
             ctx = other_job->aio_context;
             aio_context_acquire(ctx);
-            job_cancel_async(other_job, false);
+            /*
+             * This is a transaction: If one job failed, no result will matter.
+             * Therefore, pass force=true to terminate all other jobs as quickly
+             * as possible.
+             */
+            job_cancel_async(other_job, true);
             aio_context_release(ctx);
         }
     }
@@ -964,12 +969,24 @@ static void job_cancel_err(Job *job, Error **errp)
     job_cancel(job, false);
 }
 
-int job_cancel_sync(Job *job)
+/**
+ * Same as job_cancel_err(), but force-cancel.
+ */
+static void job_force_cancel_err(Job *job, Error **errp)
 {
-    return job_finish_sync(job, &job_cancel_err, NULL);
+    job_cancel(job, true);
+}
+
+int job_cancel_sync(Job *job, bool force)
+{
+    if (force) {
+        return job_finish_sync(job, &job_force_cancel_err, NULL);
+    } else {
+        return job_finish_sync(job, &job_cancel_err, NULL);
+    }
 }
 
-void job_cancel_sync_all(void)
+void job_cancel_sync_all(bool force)
 {
     Job *job;
     AioContext *aio_context;
@@ -977,7 +994,7 @@ void job_cancel_sync_all(void)
     while ((job = job_next(NULL))) {
         aio_context = job->aio_context;
         aio_context_acquire(aio_context);
-        job_cancel_sync(job);
+        job_cancel_sync(job, force);
         aio_context_release(aio_context);
     }
 }
diff --git a/qemu-nbd.c b/qemu-nbd.c
index 26ffbf15af..7fadfcfd23 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -479,7 +479,7 @@ static const char *socket_activation_validate_opts(const char *device,
 
 static void qemu_nbd_shutdown(void)
 {
-    job_cancel_sync_all();
+    job_cancel_sync_all(true);
     blk_exp_close_all();
     bdrv_close_all();
 }
diff --git a/softmmu/runstate.c b/softmmu/runstate.c
index 10d9b7365a..cf239e3b4c 100644
--- a/softmmu/runstate.c
+++ b/softmmu/runstate.c
@@ -812,7 +812,7 @@ void qemu_cleanup(void)
     vm_shutdown();
     replay_finish();
 
-    job_cancel_sync_all();
+    job_cancel_sync_all(true);
     bdrv_close_all();
 
     /* vhost-user must be cleaned up before chardevs.  */
diff --git a/storage-daemon/qemu-storage-daemon.c b/storage-daemon/qemu-storage-daemon.c
index fc8b150629..6c7142574c 100644
--- a/storage-daemon/qemu-storage-daemon.c
+++ b/storage-daemon/qemu-storage-daemon.c
@@ -347,7 +347,7 @@ int main(int argc, char *argv[])
 
     blk_exp_close_all();
     bdrv_drain_all_begin();
-    job_cancel_sync_all();
+    job_cancel_sync_all(true);
     bdrv_close_all();
 
     monitor_cleanup();
diff --git a/tests/unit/test-block-iothread.c b/tests/unit/test-block-iothread.c
index c39e70b2f5..09807fd2ca 100644
--- a/tests/unit/test-block-iothread.c
+++ b/tests/unit/test-block-iothread.c
@@ -662,7 +662,7 @@ static void test_propagate_mirror(void)
     g_assert(bdrv_get_aio_context(target) == ctx);
     g_assert(bdrv_get_aio_context(filter) == ctx);
 
-    job_cancel_sync_all();
+    job_cancel_sync_all(true);
 
     aio_context_acquire(ctx);
     blk_set_aio_context(blk, main_ctx, &error_abort);
diff --git a/tests/unit/test-blockjob.c b/tests/unit/test-blockjob.c
index dcacfa6c7c..4c9e1bf1e5 100644
--- a/tests/unit/test-blockjob.c
+++ b/tests/unit/test-blockjob.c
@@ -230,7 +230,7 @@ static void cancel_common(CancelJob *s)
     ctx = job->job.aio_context;
     aio_context_acquire(ctx);
 
-    job_cancel_sync(&job->job);
+    job_cancel_sync(&job->job, true);
     if (sts != JOB_STATUS_CREATED && sts != JOB_STATUS_CONCLUDED) {
         Job *dummy = &job->job;
         job_dismiss(&dummy, &error_abort);
diff --git a/tests/qemu-iotests/109.out b/tests/qemu-iotests/109.out
index 8f839b4b7f..e29280015e 100644
--- a/tests/qemu-iotests/109.out
+++ b/tests/qemu-iotests/109.out
@@ -44,9 +44,8 @@ read 512/512 bytes at offset 0
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "standby", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "src"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "src"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "src", "len": 1024, "offset": 1024, "speed": 0, "type": "mirror"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "aborting", "id": "src"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "src", "len": 1024, "offset": 1024, "speed": 0, "type": "mirror"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "src"}}
 Images are identical.
@@ -95,9 +94,8 @@ read 512/512 bytes at offset 0
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "standby", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "src"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "src"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "src", "len": 197120, "offset": 197120, "speed": 0, "type": "mirror"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "aborting", "id": "src"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "src", "len": 197120, "offset": 197120, "speed": 0, "type": "mirror"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "src"}}
 Images are identical.
@@ -146,9 +144,8 @@ read 512/512 bytes at offset 0
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "standby", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "src"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "src"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "src", "len": 327680, "offset": 327680, "speed": 0, "type": "mirror"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "aborting", "id": "src"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "src", "len": 327680, "offset": 327680, "speed": 0, "type": "mirror"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "src"}}
 Images are identical.
@@ -197,9 +194,8 @@ read 512/512 bytes at offset 0
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "standby", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "src"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "src"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "src", "len": 1024, "offset": 1024, "speed": 0, "type": "mirror"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "aborting", "id": "src"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "src", "len": 1024, "offset": 1024, "speed": 0, "type": "mirror"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "src"}}
 Images are identical.
@@ -248,9 +244,8 @@ read 512/512 bytes at offset 0
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "standby", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "src"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "src"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "src", "len": 65536, "offset": 65536, "speed": 0, "type": "mirror"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "aborting", "id": "src"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "src", "len": 65536, "offset": 65536, "speed": 0, "type": "mirror"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "src"}}
 Images are identical.
@@ -299,9 +294,8 @@ read 512/512 bytes at offset 0
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "standby", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "src"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "src"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "src", "len": 2560, "offset": 2560, "speed": 0, "type": "mirror"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "aborting", "id": "src"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "src", "len": 2560, "offset": 2560, "speed": 0, "type": "mirror"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "src"}}
 Images are identical.
@@ -349,9 +343,8 @@ read 512/512 bytes at offset 0
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "standby", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "src"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "src"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "src", "len": 2560, "offset": 2560, "speed": 0, "type": "mirror"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "aborting", "id": "src"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "src", "len": 2560, "offset": 2560, "speed": 0, "type": "mirror"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "src"}}
 Images are identical.
@@ -399,9 +392,8 @@ read 512/512 bytes at offset 0
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "standby", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "src"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "src"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "src", "len": 31457280, "offset": 31457280, "speed": 0, "type": "mirror"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "aborting", "id": "src"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "src", "len": 31457280, "offset": 31457280, "speed": 0, "type": "mirror"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "src"}}
 Images are identical.
@@ -449,9 +441,8 @@ read 512/512 bytes at offset 0
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "standby", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "src"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "src"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "src", "len": 327680, "offset": 327680, "speed": 0, "type": "mirror"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "aborting", "id": "src"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "src", "len": 327680, "offset": 327680, "speed": 0, "type": "mirror"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "src"}}
 Images are identical.
@@ -499,9 +490,8 @@ read 512/512 bytes at offset 0
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "standby", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "src"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "src"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "src", "len": 2048, "offset": 2048, "speed": 0, "type": "mirror"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "aborting", "id": "src"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "src", "len": 2048, "offset": 2048, "speed": 0, "type": "mirror"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "src"}}
 Images are identical.
@@ -529,9 +519,8 @@ WARNING: Image format was not specified for 'TEST_DIR/t.raw' and probing guessed
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "standby", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "src"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "src"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "src", "len": 512, "offset": 512, "speed": 0, "type": "mirror"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "aborting", "id": "src"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "src", "len": 512, "offset": 512, "speed": 0, "type": "mirror"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "src"}}
 Images are identical.
@@ -552,9 +541,8 @@ Images are identical.
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "standby", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "src"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "src"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "src", "len": 512, "offset": 512, "speed": 0, "type": "mirror"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "aborting", "id": "src"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "src", "len": 512, "offset": 512, "speed": 0, "type": "mirror"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "src"}}
 Images are identical.
diff --git a/tests/qemu-iotests/tests/qsd-jobs.out b/tests/qemu-iotests/tests/qsd-jobs.out
index 189423354b..c1bc9b8356 100644
--- a/tests/qemu-iotests/tests/qsd-jobs.out
+++ b/tests/qemu-iotests/tests/qsd-jobs.out
@@ -8,7 +8,7 @@ QMP_VERSION
 {"return": {}}
 {"return": {}}
 {"return": {}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "job0", "len": 0, "offset": 0, "speed": 0, "type": "commit"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "job0", "len": 0, "offset": 0, "speed": 0, "type": "commit"}}
 
 === Streaming can't get permission on base node ===
 
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH for-6.1? v2 4/7] jobs: Give Job.force_cancel more meaning
  2021-07-26 14:46 [PATCH for-6.1? v2 0/7] mirror: Handle errors after READY cancel Max Reitz
                   ` (2 preceding siblings ...)
  2021-07-26 14:46 ` [PATCH for-6.1? v2 3/7] job: @force parameter for job_cancel_sync{, _all}() Max Reitz
@ 2021-07-26 14:46 ` Max Reitz
  2021-07-26 21:38   ` Eric Blake
  2021-07-27 12:40   ` Vladimir Sementsov-Ogievskiy
  2021-07-26 14:46 ` [PATCH for-6.1? v2 5/7] job: Add job_cancel_requested() Max Reitz
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 31+ messages in thread
From: Max Reitz @ 2021-07-26 14:46 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, John Snow, qemu-devel,
	Max Reitz

We largely have two cancel modes for jobs:

First, there is actual cancelling.  The job is terminated as soon as
possible, without trying to reach a consistent result.

Second, we have mirror in the READY state.  Technically, the job is not
really cancelled, but it just is a different completion mode.  The job
can still run for an indefinite amount of time while it tries to reach a
consistent result.

We want to be able to clearly distinguish which cancel mode a job is in
(when it has been cancelled).  We can use Job.force_cancel for this, but
right now it only reflects cancel requests from the user with
force=true, but clearly, jobs that do not even distinguish between
force=false and force=true are effectively always force-cancelled.

So this patch has Job.force_cancel signify whether the job will
terminate as soon as possible (force_cancel=true) or whether it will
effectively remain running despite being "cancelled"
(force_cancel=false).

To this end, we let jobs that provide JobDriver.cancel() tell the
generic job code whether they will terminate as soon as possible or not,
and for jobs that do not provide that method we assume they will.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 include/qemu/job.h | 11 ++++++++++-
 block/backup.c     |  3 ++-
 block/mirror.c     | 24 ++++++++++++++++++------
 job.c              |  6 +++++-
 4 files changed, 35 insertions(+), 9 deletions(-)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index 5e8edbc2c8..8aa90f7395 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -253,8 +253,17 @@ struct JobDriver {
 
     /**
      * If the callback is not NULL, it will be invoked in job_cancel_async
+     *
+     * This function must return true if the job will be cancelled
+     * immediately without any further I/O (mandatory if @force is
+     * true), and false otherwise.  This lets the generic job layer
+     * know whether a job has been truly (force-)cancelled, or whether
+     * it is just in a special completion mode (like mirror after
+     * READY).
+     * (If the callback is NULL, the job is assumed to terminate
+     * without I/O.)
      */
-    void (*cancel)(Job *job, bool force);
+    bool (*cancel)(Job *job, bool force);
 
 
     /** Called when the job is freed */
diff --git a/block/backup.c b/block/backup.c
index bd3614ce70..513e1c8a0b 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -331,11 +331,12 @@ static void coroutine_fn backup_set_speed(BlockJob *job, int64_t speed)
     }
 }
 
-static void backup_cancel(Job *job, bool force)
+static bool backup_cancel(Job *job, bool force)
 {
     BackupBlockJob *s = container_of(job, BackupBlockJob, common.job);
 
     bdrv_cancel_in_flight(s->target_bs);
+    return true;
 }
 
 static const BlockJobDriver backup_job_driver = {
diff --git a/block/mirror.c b/block/mirror.c
index fcb7b65f93..e93631a9f6 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1087,9 +1087,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
         trace_mirror_before_sleep(s, cnt, job_is_ready(&s->common.job),
                                   delay_ns);
         job_sleep_ns(&s->common.job, delay_ns);
-        if (job_is_cancelled(&s->common.job) &&
-            (!job_is_ready(&s->common.job) || s->common.job.force_cancel))
-        {
+        if (job_is_cancelled(&s->common.job) && s->common.job.force_cancel) {
             break;
         }
         s->last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
@@ -1102,7 +1100,7 @@ immediate_exit:
          * the target is a copy of the source.
          */
         assert(ret < 0 ||
-               ((s->common.job.force_cancel || !job_is_ready(&s->common.job)) &&
+               (s->common.job.force_cancel &&
                 job_is_cancelled(&s->common.job)));
         assert(need_drain);
         mirror_wait_for_all_io(s);
@@ -1188,14 +1186,27 @@ static bool mirror_drained_poll(BlockJob *job)
     return !!s->in_flight;
 }
 
-static void mirror_cancel(Job *job, bool force)
+static bool mirror_cancel(Job *job, bool force)
 {
     MirrorBlockJob *s = container_of(job, MirrorBlockJob, common.job);
     BlockDriverState *target = blk_bs(s->target);
 
-    if (force || !job_is_ready(job)) {
+    /*
+     * Before the job is READY, we treat any cancellation like a
+     * force-cancellation.
+     */
+    force = force || !job_is_ready(job);
+
+    if (force) {
         bdrv_cancel_in_flight(target);
     }
+    return force;
+}
+
+static bool commit_active_cancel(Job *job, bool force)
+{
+    /* Same as above in mirror_cancel() */
+    return force || !job_is_ready(job);
 }
 
 static const BlockJobDriver mirror_job_driver = {
@@ -1225,6 +1236,7 @@ static const BlockJobDriver commit_active_job_driver = {
         .abort                  = mirror_abort,
         .pause                  = mirror_pause,
         .complete               = mirror_complete,
+        .cancel                 = commit_active_cancel,
     },
     .drained_poll           = mirror_drained_poll,
 };
diff --git a/job.c b/job.c
index 9e971d64cf..e78d893a9c 100644
--- a/job.c
+++ b/job.c
@@ -719,8 +719,12 @@ static int job_finalize_single(Job *job)
 static void job_cancel_async(Job *job, bool force)
 {
     if (job->driver->cancel) {
-        job->driver->cancel(job, force);
+        force = job->driver->cancel(job, force);
+    } else {
+        /* No .cancel() means the job will behave as if force-cancelled */
+        force = true;
     }
+
     if (job->user_paused) {
         /* Do not call job_enter here, the caller will handle it.  */
         if (job->driver->user_resume) {
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH for-6.1? v2 5/7] job: Add job_cancel_requested()
  2021-07-26 14:46 [PATCH for-6.1? v2 0/7] mirror: Handle errors after READY cancel Max Reitz
                   ` (3 preceding siblings ...)
  2021-07-26 14:46 ` [PATCH for-6.1? v2 4/7] jobs: Give Job.force_cancel more meaning Max Reitz
@ 2021-07-26 14:46 ` Max Reitz
  2021-07-27 13:04   ` Vladimir Sementsov-Ogievskiy
  2021-08-03 14:25   ` Kevin Wolf
  2021-07-26 14:46 ` [PATCH for-6.1? v2 6/7] mirror: Check job_is_cancelled() earlier Max Reitz
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 31+ messages in thread
From: Max Reitz @ 2021-07-26 14:46 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, John Snow, qemu-devel,
	Max Reitz

Most callers of job_is_cancelled() actually want to know whether the job
is on its way to immediate termination.  For example, we refuse to pause
jobs that are cancelled; but this only makes sense for jobs that are
really actually cancelled.

A mirror job that is cancelled during READY with force=false should
absolutely be allowed to pause.  This "cancellation" (which is actually
a kind of completion) may take an indefinite amount of time, and so
should behave like any job during normal operation.  For example, with
on-target-error=stop, the job should stop on write errors.  (In
contrast, force-cancelled jobs should not get write errors, as they
should just terminate and not do further I/O.)

Therefore, redefine job_is_cancelled() to only return true for jobs that
are force-cancelled (which as of HEAD^ means any job that interprets the
cancellation request as a request for immediate termination), and add
job_cancel_request() as the general variant, which returns true for any
jobs which have been requested to be cancelled, whether it be
immediately or after an arbitrarily long completion phase.

Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 include/qemu/job.h |  8 +++++++-
 block/mirror.c     | 10 ++++------
 job.c              |  7 ++++++-
 3 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index 8aa90f7395..032edf3c5f 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -436,9 +436,15 @@ const char *job_type_str(const Job *job);
 /** Returns true if the job should not be visible to the management layer. */
 bool job_is_internal(Job *job);
 
-/** Returns whether the job is scheduled for cancellation. */
+/** Returns whether the job is being cancelled. */
 bool job_is_cancelled(Job *job);
 
+/**
+ * Returns whether the job is scheduled for cancellation (at an
+ * indefinite point).
+ */
+bool job_cancel_requested(Job *job);
+
 /** Returns whether the job is in a completed state. */
 bool job_is_completed(Job *job);
 
diff --git a/block/mirror.c b/block/mirror.c
index e93631a9f6..72e02fa34e 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -936,7 +936,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
         /* Transition to the READY state and wait for complete. */
         job_transition_to_ready(&s->common.job);
         s->actively_synced = true;
-        while (!job_is_cancelled(&s->common.job) && !s->should_complete) {
+        while (!job_cancel_requested(&s->common.job) && !s->should_complete) {
             job_yield(&s->common.job);
         }
         s->common.job.cancelled = false;
@@ -1043,7 +1043,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
             }
 
             should_complete = s->should_complete ||
-                job_is_cancelled(&s->common.job);
+                job_cancel_requested(&s->common.job);
             cnt = bdrv_get_dirty_count(s->dirty_bitmap);
         }
 
@@ -1087,7 +1087,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
         trace_mirror_before_sleep(s, cnt, job_is_ready(&s->common.job),
                                   delay_ns);
         job_sleep_ns(&s->common.job, delay_ns);
-        if (job_is_cancelled(&s->common.job) && s->common.job.force_cancel) {
+        if (job_is_cancelled(&s->common.job)) {
             break;
         }
         s->last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
@@ -1099,9 +1099,7 @@ immediate_exit:
          * or it was cancelled prematurely so that we do not guarantee that
          * the target is a copy of the source.
          */
-        assert(ret < 0 ||
-               (s->common.job.force_cancel &&
-                job_is_cancelled(&s->common.job)));
+        assert(ret < 0 || job_is_cancelled(&s->common.job));
         assert(need_drain);
         mirror_wait_for_all_io(s);
     }
diff --git a/job.c b/job.c
index e78d893a9c..dba17a680f 100644
--- a/job.c
+++ b/job.c
@@ -216,6 +216,11 @@ const char *job_type_str(const Job *job)
 }
 
 bool job_is_cancelled(Job *job)
+{
+    return job->cancelled && job->force_cancel;
+}
+
+bool job_cancel_requested(Job *job)
 {
     return job->cancelled;
 }
@@ -1015,7 +1020,7 @@ void job_complete(Job *job, Error **errp)
     if (job_apply_verb(job, JOB_VERB_COMPLETE, errp)) {
         return;
     }
-    if (job_is_cancelled(job) || !job->driver->complete) {
+    if (job_cancel_requested(job) || !job->driver->complete) {
         error_setg(errp, "The active block job '%s' cannot be completed",
                    job->id);
         return;
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH for-6.1? v2 6/7] mirror: Check job_is_cancelled() earlier
  2021-07-26 14:46 [PATCH for-6.1? v2 0/7] mirror: Handle errors after READY cancel Max Reitz
                   ` (4 preceding siblings ...)
  2021-07-26 14:46 ` [PATCH for-6.1? v2 5/7] job: Add job_cancel_requested() Max Reitz
@ 2021-07-26 14:46 ` Max Reitz
  2021-07-27 13:13   ` Vladimir Sementsov-Ogievskiy
  2021-08-03 14:34   ` Kevin Wolf
  2021-07-26 14:46 ` [PATCH for-6.1? v2 7/7] iotests: Add mirror-ready-cancel-error test Max Reitz
  2021-08-03 14:36 ` [PATCH for-6.1? v2 0/7] mirror: Handle errors after READY cancel Kevin Wolf
  7 siblings, 2 replies; 31+ messages in thread
From: Max Reitz @ 2021-07-26 14:46 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, John Snow, qemu-devel,
	Max Reitz

We must check whether the job is force-cancelled early in our main loop,
most importantly before any `continue` statement.  For example, we used
to have `continue`s before our current checking location that are
triggered by `mirror_flush()` failing.  So, if `mirror_flush()` kept
failing, force-cancelling the job would not terminate it.

A job being force-cancelled should be treated the same as the job having
failed, so put the check in the same place where we check `s->ret < 0`.

Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/mirror.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 72e02fa34e..46d1a1e5a2 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -993,7 +993,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
             mirror_wait_for_any_operation(s, true);
         }
 
-        if (s->ret < 0) {
+        if (s->ret < 0 || job_is_cancelled(&s->common.job)) {
             ret = s->ret;
             goto immediate_exit;
         }
@@ -1078,8 +1078,6 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
             break;
         }
 
-        ret = 0;
-
         if (job_is_ready(&s->common.job) && !should_complete) {
             delay_ns = (s->in_flight == 0 &&
                         cnt == 0 ? BLOCK_JOB_SLICE_TIME : 0);
@@ -1087,9 +1085,6 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
         trace_mirror_before_sleep(s, cnt, job_is_ready(&s->common.job),
                                   delay_ns);
         job_sleep_ns(&s->common.job, delay_ns);
-        if (job_is_cancelled(&s->common.job)) {
-            break;
-        }
         s->last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
     }
 
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH for-6.1? v2 7/7] iotests: Add mirror-ready-cancel-error test
  2021-07-26 14:46 [PATCH for-6.1? v2 0/7] mirror: Handle errors after READY cancel Max Reitz
                   ` (5 preceding siblings ...)
  2021-07-26 14:46 ` [PATCH for-6.1? v2 6/7] mirror: Check job_is_cancelled() earlier Max Reitz
@ 2021-07-26 14:46 ` Max Reitz
  2021-07-27 13:24   ` Vladimir Sementsov-Ogievskiy
  2021-08-03 14:36 ` [PATCH for-6.1? v2 0/7] mirror: Handle errors after READY cancel Kevin Wolf
  7 siblings, 1 reply; 31+ messages in thread
From: Max Reitz @ 2021-07-26 14:46 UTC (permalink / raw)
  To: qemu-block
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, John Snow, qemu-devel,
	Max Reitz

Test what happens when there is an I/O error after a mirror job in the
READY phase has been cancelled.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 .../tests/mirror-ready-cancel-error           | 143 ++++++++++++++++++
 .../tests/mirror-ready-cancel-error.out       |   5 +
 2 files changed, 148 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/mirror-ready-cancel-error
 create mode 100644 tests/qemu-iotests/tests/mirror-ready-cancel-error.out

diff --git a/tests/qemu-iotests/tests/mirror-ready-cancel-error b/tests/qemu-iotests/tests/mirror-ready-cancel-error
new file mode 100755
index 0000000000..f2dc88881f
--- /dev/null
+++ b/tests/qemu-iotests/tests/mirror-ready-cancel-error
@@ -0,0 +1,143 @@
+#!/usr/bin/env python3
+# group: rw quick
+#
+# Test what happens when errors occur to a mirror job after it has
+# been cancelled in the READY phase
+#
+# Copyright (C) 2021 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+import os
+import iotests
+
+
+image_size = 1 * 1024 * 1024
+source = os.path.join(iotests.test_dir, 'source.img')
+target = os.path.join(iotests.test_dir, 'target.img')
+
+
+class TestMirrorReadyCancelError(iotests.QMPTestCase):
+    def setUp(self) -> None:
+        assert iotests.qemu_img_create('-f', iotests.imgfmt, source,
+                                       str(image_size)) == 0
+        assert iotests.qemu_img_create('-f', iotests.imgfmt, target,
+                                       str(image_size)) == 0
+
+        self.vm = iotests.VM()
+        self.vm.launch()
+
+    def tearDown(self) -> None:
+        self.vm.shutdown()
+        os.remove(source)
+        os.remove(target)
+
+    def add_blockdevs(self, once: bool) -> None:
+        res = self.vm.qmp('blockdev-add',
+                          **{'node-name': 'source',
+                             'driver': iotests.imgfmt,
+                             'file': {
+                                 'driver': 'file',
+                                 'filename': source
+                             }})
+        self.assert_qmp(res, 'return', {})
+
+        # blkdebug notes:
+        # Enter state 2 on the first flush, which happens before the
+        # job enters the READY state.  The second flush will happen
+        # when the job is about to complete, and we want that one to
+        # fail.
+        res = self.vm.qmp('blockdev-add',
+                          **{'node-name': 'target',
+                             'driver': iotests.imgfmt,
+                             'file': {
+                                 'driver': 'blkdebug',
+                                 'image': {
+                                     'driver': 'file',
+                                     'filename': target
+                                 },
+                                 'set-state': [{
+                                     'event': 'flush_to_disk',
+                                     'state': 1,
+                                     'new_state': 2
+                                 }],
+                                 'inject-error': [{
+                                     'event': 'flush_to_disk',
+                                     'once': once,
+                                     'immediately': True,
+                                     'state': 2
+                                 }]}})
+        self.assert_qmp(res, 'return', {})
+
+    def start_mirror(self) -> None:
+        res = self.vm.qmp('blockdev-mirror',
+                          job_id='mirror',
+                          device='source',
+                          target='target',
+                          filter_node_name='mirror-top',
+                          sync='full',
+                          on_target_error='stop')
+        self.assert_qmp(res, 'return', {})
+
+    def cancel_mirror_with_error(self) -> None:
+        self.vm.event_wait('BLOCK_JOB_READY')
+
+        # Write something so will not leave the job immediately, but
+        # flush first (which will fail, thanks to blkdebug)
+        res = self.vm.qmp('human-monitor-command',
+                          command_line='qemu-io mirror-top "write 0 64k"')
+        self.assert_qmp(res, 'return', '')
+
+        # Drain status change events
+        while self.vm.event_wait('JOB_STATUS_CHANGE', timeout=0.0) is not None:
+            pass
+
+        res = self.vm.qmp('block-job-cancel', device='mirror')
+        self.assert_qmp(res, 'return', {})
+
+        self.vm.event_wait('BLOCK_JOB_ERROR')
+
+    def test_transient_error(self) -> None:
+        self.add_blockdevs(True)
+        self.start_mirror()
+        self.cancel_mirror_with_error()
+
+        while True:
+            e = self.vm.event_wait('JOB_STATUS_CHANGE')
+            if e['data']['status'] == 'standby':
+                # Transient error, try again
+                self.vm.qmp('block-job-resume', device='mirror')
+            elif e['data']['status'] == 'null':
+                break
+
+    def test_persistent_error(self) -> None:
+        self.add_blockdevs(False)
+        self.start_mirror()
+        self.cancel_mirror_with_error()
+
+        while True:
+            e = self.vm.event_wait('JOB_STATUS_CHANGE')
+            if e['data']['status'] == 'standby':
+                # Persistent error, no point in continuing
+                self.vm.qmp('block-job-cancel', device='mirror', force=True)
+            elif e['data']['status'] == 'null':
+                break
+
+
+if __name__ == '__main__':
+    # LUKS would require special key-secret handling in add_blockdevs()
+    iotests.main(supported_fmts=['generic'],
+                 unsupported_fmts=['luks'],
+                 supported_protocols=['file'])
diff --git a/tests/qemu-iotests/tests/mirror-ready-cancel-error.out b/tests/qemu-iotests/tests/mirror-ready-cancel-error.out
new file mode 100644
index 0000000000..fbc63e62f8
--- /dev/null
+++ b/tests/qemu-iotests/tests/mirror-ready-cancel-error.out
@@ -0,0 +1,5 @@
+..
+----------------------------------------------------------------------
+Ran 2 tests
+
+OK
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH for-6.1? v2 2/7] mirror: Drop s->synced
  2021-07-26 14:46 ` [PATCH for-6.1? v2 2/7] mirror: Drop s->synced Max Reitz
@ 2021-07-26 15:26   ` Eric Blake
  2021-07-27 11:41   ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 31+ messages in thread
From: Eric Blake @ 2021-07-26 15:26 UTC (permalink / raw)
  To: Max Reitz
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block

On Mon, Jul 26, 2021 at 04:46:08PM +0200, Max Reitz wrote:
> As of HEAD^, there is no meaning to s->synced other than whether the job
> is READY or not.  job_is_ready() gives us that information, too.
> 
> Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/mirror.c | 19 +++++++++----------
>  1 file changed, 9 insertions(+), 10 deletions(-)

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH for-6.1? v2 4/7] jobs: Give Job.force_cancel more meaning
  2021-07-26 14:46 ` [PATCH for-6.1? v2 4/7] jobs: Give Job.force_cancel more meaning Max Reitz
@ 2021-07-26 21:38   ` Eric Blake
  2021-07-27 12:40   ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 31+ messages in thread
From: Eric Blake @ 2021-07-26 21:38 UTC (permalink / raw)
  To: Max Reitz
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block

On Mon, Jul 26, 2021 at 04:46:10PM +0200, Max Reitz wrote:
> We largely have two cancel modes for jobs:
> 
> First, there is actual cancelling.  The job is terminated as soon as
> possible, without trying to reach a consistent result.
> 
> Second, we have mirror in the READY state.  Technically, the job is not
> really cancelled, but it just is a different completion mode.  The job
> can still run for an indefinite amount of time while it tries to reach a
> consistent result.
> 
> We want to be able to clearly distinguish which cancel mode a job is in
> (when it has been cancelled).  We can use Job.force_cancel for this, but
> right now it only reflects cancel requests from the user with
> force=true, but clearly, jobs that do not even distinguish between
> force=false and force=true are effectively always force-cancelled.
> 
> So this patch has Job.force_cancel signify whether the job will
> terminate as soon as possible (force_cancel=true) or whether it will
> effectively remain running despite being "cancelled"
> (force_cancel=false).
> 
> To this end, we let jobs that provide JobDriver.cancel() tell the
> generic job code whether they will terminate as soon as possible or not,
> and for jobs that do not provide that method we assume they will.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  include/qemu/job.h | 11 ++++++++++-
>  block/backup.c     |  3 ++-
>  block/mirror.c     | 24 ++++++++++++++++++------
>  job.c              |  6 +++++-
>  4 files changed, 35 insertions(+), 9 deletions(-)
>

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH for-6.1? v2 2/7] mirror: Drop s->synced
  2021-07-26 14:46 ` [PATCH for-6.1? v2 2/7] mirror: Drop s->synced Max Reitz
  2021-07-26 15:26   ` Eric Blake
@ 2021-07-27 11:41   ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 31+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-07-27 11:41 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: qemu-devel, Kevin Wolf, John Snow

26.07.2021 17:46, Max Reitz wrote:
> As of HEAD^, there is no meaning to s->synced other than whether the job
> is READY or not.  job_is_ready() gives us that information, too.
> 
> Suggested-by: Vladimir Sementsov-Ogievskiy<vsementsov@virtuozzo.com>
> Signed-off-by: Max Reitz<mreitz@redhat.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH for-6.1? v2 4/7] jobs: Give Job.force_cancel more meaning
  2021-07-26 14:46 ` [PATCH for-6.1? v2 4/7] jobs: Give Job.force_cancel more meaning Max Reitz
  2021-07-26 21:38   ` Eric Blake
@ 2021-07-27 12:40   ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 31+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-07-27 12:40 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: qemu-devel, Kevin Wolf, John Snow

26.07.2021 17:46, Max Reitz wrote:
> We largely have two cancel modes for jobs:
> 
> First, there is actual cancelling.  The job is terminated as soon as
> possible, without trying to reach a consistent result.
> 
> Second, we have mirror in the READY state.  Technically, the job is not
> really cancelled, but it just is a different completion mode.  The job
> can still run for an indefinite amount of time while it tries to reach a
> consistent result.
> 
> We want to be able to clearly distinguish which cancel mode a job is in
> (when it has been cancelled).  We can use Job.force_cancel for this, but
> right now it only reflects cancel requests from the user with
> force=true, but clearly, jobs that do not even distinguish between
> force=false and force=true are effectively always force-cancelled.
> 
> So this patch has Job.force_cancel signify whether the job will
> terminate as soon as possible (force_cancel=true) or whether it will
> effectively remain running despite being "cancelled"
> (force_cancel=false).
> 
> To this end, we let jobs that provide JobDriver.cancel() tell the
> generic job code whether they will terminate as soon as possible or not,
> and for jobs that do not provide that method we assume they will.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   include/qemu/job.h | 11 ++++++++++-
>   block/backup.c     |  3 ++-
>   block/mirror.c     | 24 ++++++++++++++++++------
>   job.c              |  6 +++++-
>   4 files changed, 35 insertions(+), 9 deletions(-)
> 
> diff --git a/include/qemu/job.h b/include/qemu/job.h
> index 5e8edbc2c8..8aa90f7395 100644
> --- a/include/qemu/job.h
> +++ b/include/qemu/job.h
> @@ -253,8 +253,17 @@ struct JobDriver {
>   
>       /**
>        * If the callback is not NULL, it will be invoked in job_cancel_async
> +     *
> +     * This function must return true if the job will be cancelled
> +     * immediately without any further I/O (mandatory if @force is
> +     * true), and false otherwise.  This lets the generic job layer
> +     * know whether a job has been truly (force-)cancelled, or whether
> +     * it is just in a special completion mode (like mirror after
> +     * READY).
> +     * (If the callback is NULL, the job is assumed to terminate
> +     * without I/O.)
>        */
> -    void (*cancel)(Job *job, bool force);
> +    bool (*cancel)(Job *job, bool force);
>   
>   
>       /** Called when the job is freed */
> diff --git a/block/backup.c b/block/backup.c
> index bd3614ce70..513e1c8a0b 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -331,11 +331,12 @@ static void coroutine_fn backup_set_speed(BlockJob *job, int64_t speed)
>       }
>   }
>   
> -static void backup_cancel(Job *job, bool force)
> +static bool backup_cancel(Job *job, bool force)
>   {
>       BackupBlockJob *s = container_of(job, BackupBlockJob, common.job);
>   
>       bdrv_cancel_in_flight(s->target_bs);
> +    return true;
>   }
>   
>   static const BlockJobDriver backup_job_driver = {
> diff --git a/block/mirror.c b/block/mirror.c
> index fcb7b65f93..e93631a9f6 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -1087,9 +1087,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>           trace_mirror_before_sleep(s, cnt, job_is_ready(&s->common.job),
>                                     delay_ns);
>           job_sleep_ns(&s->common.job, delay_ns);
> -        if (job_is_cancelled(&s->common.job) &&
> -            (!job_is_ready(&s->common.job) || s->common.job.force_cancel))
> -        {
> +        if (job_is_cancelled(&s->common.job) && s->common.job.force_cancel) {

Seems, it could it be reduced to

if (s->common.job.force_cancel) {


>               break;
>           }
>           s->last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
> @@ -1102,7 +1100,7 @@ immediate_exit:
>            * the target is a copy of the source.
>            */
>           assert(ret < 0 ||
> -               ((s->common.job.force_cancel || !job_is_ready(&s->common.job)) &&
> +               (s->common.job.force_cancel &&

and here

>                   job_is_cancelled(&s->common.job)));
>           assert(need_drain);
>           mirror_wait_for_all_io(s);
> @@ -1188,14 +1186,27 @@ static bool mirror_drained_poll(BlockJob *job)
>       return !!s->in_flight;
>   }
>   

anyway:

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH for-6.1? v2 5/7] job: Add job_cancel_requested()
  2021-07-26 14:46 ` [PATCH for-6.1? v2 5/7] job: Add job_cancel_requested() Max Reitz
@ 2021-07-27 13:04   ` Vladimir Sementsov-Ogievskiy
  2021-07-27 15:39     ` Max Reitz
  2021-08-03 14:25   ` Kevin Wolf
  1 sibling, 1 reply; 31+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-07-27 13:04 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: qemu-devel, Kevin Wolf, John Snow

26.07.2021 17:46, Max Reitz wrote:
> Most callers of job_is_cancelled() actually want to know whether the job
> is on its way to immediate termination.  For example, we refuse to pause
> jobs that are cancelled; but this only makes sense for jobs that are
> really actually cancelled.
> 
> A mirror job that is cancelled during READY with force=false should
> absolutely be allowed to pause.  This "cancellation" (which is actually
> a kind of completion) may take an indefinite amount of time, and so
> should behave like any job during normal operation.  For example, with
> on-target-error=stop, the job should stop on write errors.  (In
> contrast, force-cancelled jobs should not get write errors, as they
> should just terminate and not do further I/O.)
> 
> Therefore, redefine job_is_cancelled() to only return true for jobs that
> are force-cancelled (which as of HEAD^ means any job that interprets the
> cancellation request as a request for immediate termination), and add
> job_cancel_request() as the general variant, which returns true for any

job_cancel_requested()

> jobs which have been requested to be cancelled, whether it be
> immediately or after an arbitrarily long completion phase.
> 
> Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   include/qemu/job.h |  8 +++++++-
>   block/mirror.c     | 10 ++++------
>   job.c              |  7 ++++++-
>   3 files changed, 17 insertions(+), 8 deletions(-)
> 
> diff --git a/include/qemu/job.h b/include/qemu/job.h
> index 8aa90f7395..032edf3c5f 100644
> --- a/include/qemu/job.h
> +++ b/include/qemu/job.h
> @@ -436,9 +436,15 @@ const char *job_type_str(const Job *job);
>   /** Returns true if the job should not be visible to the management layer. */
>   bool job_is_internal(Job *job);
>   
> -/** Returns whether the job is scheduled for cancellation. */
> +/** Returns whether the job is being cancelled. */
>   bool job_is_cancelled(Job *job);
>   
> +/**
> + * Returns whether the job is scheduled for cancellation (at an
> + * indefinite point).
> + */
> +bool job_cancel_requested(Job *job);
> +
>   /** Returns whether the job is in a completed state. */
>   bool job_is_completed(Job *job);
>   
> diff --git a/block/mirror.c b/block/mirror.c
> index e93631a9f6..72e02fa34e 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -936,7 +936,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>           /* Transition to the READY state and wait for complete. */
>           job_transition_to_ready(&s->common.job);
>           s->actively_synced = true;
> -        while (!job_is_cancelled(&s->common.job) && !s->should_complete) {
> +        while (!job_cancel_requested(&s->common.job) && !s->should_complete) {
>               job_yield(&s->common.job);
>           }
>           s->common.job.cancelled = false;
> @@ -1043,7 +1043,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>               }
>   
>               should_complete = s->should_complete ||
> -                job_is_cancelled(&s->common.job);
> +                job_cancel_requested(&s->common.job);
>               cnt = bdrv_get_dirty_count(s->dirty_bitmap);
>           }
>   
> @@ -1087,7 +1087,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>           trace_mirror_before_sleep(s, cnt, job_is_ready(&s->common.job),
>                                     delay_ns);
>           job_sleep_ns(&s->common.job, delay_ns);
> -        if (job_is_cancelled(&s->common.job) && s->common.job.force_cancel) {
> +        if (job_is_cancelled(&s->common.job)) {
>               break;
>           }
>           s->last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
> @@ -1099,9 +1099,7 @@ immediate_exit:
>            * or it was cancelled prematurely so that we do not guarantee that
>            * the target is a copy of the source.
>            */
> -        assert(ret < 0 ||
> -               (s->common.job.force_cancel &&
> -                job_is_cancelled(&s->common.job)));
> +        assert(ret < 0 || job_is_cancelled(&s->common.job));
>           assert(need_drain);
>           mirror_wait_for_all_io(s);
>       }
> diff --git a/job.c b/job.c
> index e78d893a9c..dba17a680f 100644
> --- a/job.c
> +++ b/job.c
> @@ -216,6 +216,11 @@ const char *job_type_str(const Job *job)
>   }
>   
>   bool job_is_cancelled(Job *job)
> +{
> +    return job->cancelled && job->force_cancel;

can job->cancelled be false when job->force_cancel is true ? I think not and worth an assertion here. Something like

if (job->force_cancel) {
    assert(job->cancelled);
    return true;
}

return false;

> +}
> +
> +bool job_cancel_requested(Job *job)
>   {
>       return job->cancelled;
>   }
> @@ -1015,7 +1020,7 @@ void job_complete(Job *job, Error **errp)
>       if (job_apply_verb(job, JOB_VERB_COMPLETE, errp)) {
>           return;
>       }
> -    if (job_is_cancelled(job) || !job->driver->complete) {
> +    if (job_cancel_requested(job) || !job->driver->complete) {
>           error_setg(errp, "The active block job '%s' cannot be completed",
>                      job->id);
>           return;
> 

I think it's a correct change, although there may be unexpected side-effects, it's hard to imagine all consequences of changing job_is_cancelled() semantics called in several places in job.c.

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH for-6.1? v2 6/7] mirror: Check job_is_cancelled() earlier
  2021-07-26 14:46 ` [PATCH for-6.1? v2 6/7] mirror: Check job_is_cancelled() earlier Max Reitz
@ 2021-07-27 13:13   ` Vladimir Sementsov-Ogievskiy
  2021-07-27 15:40     ` Max Reitz
  2021-08-03 14:34   ` Kevin Wolf
  1 sibling, 1 reply; 31+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-07-27 13:13 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: qemu-devel, Kevin Wolf, John Snow

26.07.2021 17:46, Max Reitz wrote:
> We must check whether the job is force-cancelled early in our main loop,
> most importantly before any `continue` statement.  For example, we used
> to have `continue`s before our current checking location that are
> triggered by `mirror_flush()` failing.  So, if `mirror_flush()` kept
> failing, force-cancelling the job would not terminate it.
> 
> A job being force-cancelled should be treated the same as the job having
> failed, so put the check in the same place where we check `s->ret < 0`.
> 
> Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   block/mirror.c | 7 +------
>   1 file changed, 1 insertion(+), 6 deletions(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index 72e02fa34e..46d1a1e5a2 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -993,7 +993,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>               mirror_wait_for_any_operation(s, true);
>           }
>   
> -        if (s->ret < 0) {
> +        if (s->ret < 0 || job_is_cancelled(&s->common.job)) {
>               ret = s->ret;
>               goto immediate_exit;
>           }
> @@ -1078,8 +1078,6 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>               break;
>           }
>   
> -        ret = 0;
> -

That's just a cleanup, that statement is useless pre-patch, yes?

>           if (job_is_ready(&s->common.job) && !should_complete) {
>               delay_ns = (s->in_flight == 0 &&
>                           cnt == 0 ? BLOCK_JOB_SLICE_TIME : 0);
> @@ -1087,9 +1085,6 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>           trace_mirror_before_sleep(s, cnt, job_is_ready(&s->common.job),
>                                     delay_ns);
>           job_sleep_ns(&s->common.job, delay_ns);
> -        if (job_is_cancelled(&s->common.job)) {
> -            break;
> -        }
>           s->last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
>       }
>   
> 

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH for-6.1? v2 7/7] iotests: Add mirror-ready-cancel-error test
  2021-07-26 14:46 ` [PATCH for-6.1? v2 7/7] iotests: Add mirror-ready-cancel-error test Max Reitz
@ 2021-07-27 13:24   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 31+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-07-27 13:24 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: qemu-devel, Kevin Wolf, John Snow

26.07.2021 17:46, Max Reitz wrote:
> Test what happens when there is an I/O error after a mirror job in the
> READY phase has been cancelled.
> 
> Signed-off-by: Max Reitz<mreitz@redhat.com>


Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Tested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH for-6.1? v2 5/7] job: Add job_cancel_requested()
  2021-07-27 13:04   ` Vladimir Sementsov-Ogievskiy
@ 2021-07-27 15:39     ` Max Reitz
  2021-07-27 15:47       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 31+ messages in thread
From: Max Reitz @ 2021-07-27 15:39 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: Kevin Wolf, John Snow, qemu-devel

On 27.07.21 15:04, Vladimir Sementsov-Ogievskiy wrote:
> 26.07.2021 17:46, Max Reitz wrote:
>> Most callers of job_is_cancelled() actually want to know whether the job
>> is on its way to immediate termination.  For example, we refuse to pause
>> jobs that are cancelled; but this only makes sense for jobs that are
>> really actually cancelled.
>>
>> A mirror job that is cancelled during READY with force=false should
>> absolutely be allowed to pause.  This "cancellation" (which is actually
>> a kind of completion) may take an indefinite amount of time, and so
>> should behave like any job during normal operation.  For example, with
>> on-target-error=stop, the job should stop on write errors.  (In
>> contrast, force-cancelled jobs should not get write errors, as they
>> should just terminate and not do further I/O.)
>>
>> Therefore, redefine job_is_cancelled() to only return true for jobs that
>> are force-cancelled (which as of HEAD^ means any job that interprets the
>> cancellation request as a request for immediate termination), and add
>> job_cancel_request() as the general variant, which returns true for any
>
> job_cancel_requested()
>
>> jobs which have been requested to be cancelled, whether it be
>> immediately or after an arbitrarily long completion phase.
>>
>> Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   include/qemu/job.h |  8 +++++++-
>>   block/mirror.c     | 10 ++++------
>>   job.c              |  7 ++++++-
>>   3 files changed, 17 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/qemu/job.h b/include/qemu/job.h
>> index 8aa90f7395..032edf3c5f 100644
>> --- a/include/qemu/job.h
>> +++ b/include/qemu/job.h
>> @@ -436,9 +436,15 @@ const char *job_type_str(const Job *job);
>>   /** Returns true if the job should not be visible to the management 
>> layer. */
>>   bool job_is_internal(Job *job);
>>   -/** Returns whether the job is scheduled for cancellation. */
>> +/** Returns whether the job is being cancelled. */
>>   bool job_is_cancelled(Job *job);
>>   +/**
>> + * Returns whether the job is scheduled for cancellation (at an
>> + * indefinite point).
>> + */
>> +bool job_cancel_requested(Job *job);
>> +
>>   /** Returns whether the job is in a completed state. */
>>   bool job_is_completed(Job *job);
>>   diff --git a/block/mirror.c b/block/mirror.c
>> index e93631a9f6..72e02fa34e 100644
>> --- a/block/mirror.c
>> +++ b/block/mirror.c
>> @@ -936,7 +936,7 @@ static int coroutine_fn mirror_run(Job *job, 
>> Error **errp)
>>           /* Transition to the READY state and wait for complete. */
>>           job_transition_to_ready(&s->common.job);
>>           s->actively_synced = true;
>> -        while (!job_is_cancelled(&s->common.job) && 
>> !s->should_complete) {
>> +        while (!job_cancel_requested(&s->common.job) && 
>> !s->should_complete) {
>>               job_yield(&s->common.job);
>>           }
>>           s->common.job.cancelled = false;
>> @@ -1043,7 +1043,7 @@ static int coroutine_fn mirror_run(Job *job, 
>> Error **errp)
>>               }
>>                 should_complete = s->should_complete ||
>> -                job_is_cancelled(&s->common.job);
>> +                job_cancel_requested(&s->common.job);
>>               cnt = bdrv_get_dirty_count(s->dirty_bitmap);
>>           }
>>   @@ -1087,7 +1087,7 @@ static int coroutine_fn mirror_run(Job *job, 
>> Error **errp)
>>           trace_mirror_before_sleep(s, cnt, 
>> job_is_ready(&s->common.job),
>>                                     delay_ns);
>>           job_sleep_ns(&s->common.job, delay_ns);
>> -        if (job_is_cancelled(&s->common.job) && 
>> s->common.job.force_cancel) {
>> +        if (job_is_cancelled(&s->common.job)) {
>>               break;
>>           }
>>           s->last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
>> @@ -1099,9 +1099,7 @@ immediate_exit:
>>            * or it was cancelled prematurely so that we do not 
>> guarantee that
>>            * the target is a copy of the source.
>>            */
>> -        assert(ret < 0 ||
>> -               (s->common.job.force_cancel &&
>> -                job_is_cancelled(&s->common.job)));
>> +        assert(ret < 0 || job_is_cancelled(&s->common.job));

(As a note, I hope this does the job regarding your suggestions for 
patch 4. :))

>>           assert(need_drain);
>>           mirror_wait_for_all_io(s);
>>       }
>> diff --git a/job.c b/job.c
>> index e78d893a9c..dba17a680f 100644
>> --- a/job.c
>> +++ b/job.c
>> @@ -216,6 +216,11 @@ const char *job_type_str(const Job *job)
>>   }
>>     bool job_is_cancelled(Job *job)
>> +{
>> +    return job->cancelled && job->force_cancel;
>
> can job->cancelled be false when job->force_cancel is true ? I think 
> not and worth an assertion here. Something like
>
> if (job->force_cancel) {
>    assert(job->cancelled);
>    return true;
> }
>
> return false;

Sounds good, why not.

>
>> +}
>> +
>> +bool job_cancel_requested(Job *job)
>>   {
>>       return job->cancelled;
>>   }
>> @@ -1015,7 +1020,7 @@ void job_complete(Job *job, Error **errp)
>>       if (job_apply_verb(job, JOB_VERB_COMPLETE, errp)) {
>>           return;
>>       }
>> -    if (job_is_cancelled(job) || !job->driver->complete) {
>> +    if (job_cancel_requested(job) || !job->driver->complete) {
>>           error_setg(errp, "The active block job '%s' cannot be 
>> completed",
>>                      job->id);
>>           return;
>>
>
> I think it's a correct change, although there may be unexpected 
> side-effects, it's hard to imagine all consequences of changing 
> job_is_cancelled() semantics called in several places in job.c.

Yeah.  Targeting 6.2, I don’t have a bad feeling about it, though.

> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

Thanks for the review, by the way!

Max



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH for-6.1? v2 6/7] mirror: Check job_is_cancelled() earlier
  2021-07-27 13:13   ` Vladimir Sementsov-Ogievskiy
@ 2021-07-27 15:40     ` Max Reitz
  0 siblings, 0 replies; 31+ messages in thread
From: Max Reitz @ 2021-07-27 15:40 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: Kevin Wolf, John Snow, qemu-devel

On 27.07.21 15:13, Vladimir Sementsov-Ogievskiy wrote:
> 26.07.2021 17:46, Max Reitz wrote:
>> We must check whether the job is force-cancelled early in our main loop,
>> most importantly before any `continue` statement.  For example, we used
>> to have `continue`s before our current checking location that are
>> triggered by `mirror_flush()` failing.  So, if `mirror_flush()` kept
>> failing, force-cancelling the job would not terminate it.
>>
>> A job being force-cancelled should be treated the same as the job having
>> failed, so put the check in the same place where we check `s->ret < 0`.
>>
>> Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/mirror.c | 7 +------
>>   1 file changed, 1 insertion(+), 6 deletions(-)
>>
>> diff --git a/block/mirror.c b/block/mirror.c
>> index 72e02fa34e..46d1a1e5a2 100644
>> --- a/block/mirror.c
>> +++ b/block/mirror.c
>> @@ -993,7 +993,7 @@ static int coroutine_fn mirror_run(Job *job, 
>> Error **errp)
>>               mirror_wait_for_any_operation(s, true);
>>           }
>>   -        if (s->ret < 0) {
>> +        if (s->ret < 0 || job_is_cancelled(&s->common.job)) {
>>               ret = s->ret;
>>               goto immediate_exit;
>>           }
>> @@ -1078,8 +1078,6 @@ static int coroutine_fn mirror_run(Job *job, 
>> Error **errp)
>>               break;
>>           }
>>   -        ret = 0;
>> -
>
> That's just a cleanup, that statement is useless pre-patch, yes?

I think it was intended for if we left this loop via the 
job_is_cancelled() condition below.  Since it’s removed, this statement 
seems meaningless, so I removed it along with the `break`.

Max

>
>>           if (job_is_ready(&s->common.job) && !should_complete) {
>>               delay_ns = (s->in_flight == 0 &&
>>                           cnt == 0 ? BLOCK_JOB_SLICE_TIME : 0);
>> @@ -1087,9 +1085,6 @@ static int coroutine_fn mirror_run(Job *job, 
>> Error **errp)
>>           trace_mirror_before_sleep(s, cnt, 
>> job_is_ready(&s->common.job),
>>                                     delay_ns);
>>           job_sleep_ns(&s->common.job, delay_ns);
>> -        if (job_is_cancelled(&s->common.job)) {
>> -            break;
>> -        }
>>           s->last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
>>       }
>>
>
> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH for-6.1? v2 5/7] job: Add job_cancel_requested()
  2021-07-27 15:39     ` Max Reitz
@ 2021-07-27 15:47       ` Vladimir Sementsov-Ogievskiy
  2021-08-02 10:23         ` Max Reitz
  0 siblings, 1 reply; 31+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-07-27 15:47 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: qemu-devel, Kevin Wolf, John Snow

27.07.2021 18:39, Max Reitz wrote:
> On 27.07.21 15:04, Vladimir Sementsov-Ogievskiy wrote:
>> 26.07.2021 17:46, Max Reitz wrote:
>>> Most callers of job_is_cancelled() actually want to know whether the job
>>> is on its way to immediate termination.  For example, we refuse to pause
>>> jobs that are cancelled; but this only makes sense for jobs that are
>>> really actually cancelled.
>>>
>>> A mirror job that is cancelled during READY with force=false should
>>> absolutely be allowed to pause.  This "cancellation" (which is actually
>>> a kind of completion) may take an indefinite amount of time, and so
>>> should behave like any job during normal operation.  For example, with
>>> on-target-error=stop, the job should stop on write errors.  (In
>>> contrast, force-cancelled jobs should not get write errors, as they
>>> should just terminate and not do further I/O.)
>>>
>>> Therefore, redefine job_is_cancelled() to only return true for jobs that
>>> are force-cancelled (which as of HEAD^ means any job that interprets the
>>> cancellation request as a request for immediate termination), and add
>>> job_cancel_request() as the general variant, which returns true for any
>>
>> job_cancel_requested()
>>
>>> jobs which have been requested to be cancelled, whether it be
>>> immediately or after an arbitrarily long completion phase.
>>>
>>> Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462
>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>> ---
>>>   include/qemu/job.h |  8 +++++++-
>>>   block/mirror.c     | 10 ++++------
>>>   job.c              |  7 ++++++-
>>>   3 files changed, 17 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/include/qemu/job.h b/include/qemu/job.h
>>> index 8aa90f7395..032edf3c5f 100644
>>> --- a/include/qemu/job.h
>>> +++ b/include/qemu/job.h
>>> @@ -436,9 +436,15 @@ const char *job_type_str(const Job *job);
>>>   /** Returns true if the job should not be visible to the management layer. */
>>>   bool job_is_internal(Job *job);
>>>   -/** Returns whether the job is scheduled for cancellation. */
>>> +/** Returns whether the job is being cancelled. */
>>>   bool job_is_cancelled(Job *job);
>>>   +/**
>>> + * Returns whether the job is scheduled for cancellation (at an
>>> + * indefinite point).
>>> + */
>>> +bool job_cancel_requested(Job *job);
>>> +
>>>   /** Returns whether the job is in a completed state. */
>>>   bool job_is_completed(Job *job);
>>>   diff --git a/block/mirror.c b/block/mirror.c
>>> index e93631a9f6..72e02fa34e 100644
>>> --- a/block/mirror.c
>>> +++ b/block/mirror.c
>>> @@ -936,7 +936,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>>>           /* Transition to the READY state and wait for complete. */
>>>           job_transition_to_ready(&s->common.job);
>>>           s->actively_synced = true;
>>> -        while (!job_is_cancelled(&s->common.job) && !s->should_complete) {
>>> +        while (!job_cancel_requested(&s->common.job) && !s->should_complete) {
>>>               job_yield(&s->common.job);
>>>           }
>>>           s->common.job.cancelled = false;
>>> @@ -1043,7 +1043,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>>>               }
>>>                 should_complete = s->should_complete ||
>>> -                job_is_cancelled(&s->common.job);
>>> +                job_cancel_requested(&s->common.job);
>>>               cnt = bdrv_get_dirty_count(s->dirty_bitmap);
>>>           }
>>>   @@ -1087,7 +1087,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>>>           trace_mirror_before_sleep(s, cnt, job_is_ready(&s->common.job),
>>>                                     delay_ns);
>>>           job_sleep_ns(&s->common.job, delay_ns);
>>> -        if (job_is_cancelled(&s->common.job) && s->common.job.force_cancel) {
>>> +        if (job_is_cancelled(&s->common.job)) {
>>>               break;
>>>           }
>>>           s->last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
>>> @@ -1099,9 +1099,7 @@ immediate_exit:
>>>            * or it was cancelled prematurely so that we do not guarantee that
>>>            * the target is a copy of the source.
>>>            */
>>> -        assert(ret < 0 ||
>>> -               (s->common.job.force_cancel &&
>>> -                job_is_cancelled(&s->common.job)));
>>> +        assert(ret < 0 || job_is_cancelled(&s->common.job));
> 
> (As a note, I hope this does the job regarding your suggestions for patch 4. :))
> 
>>>           assert(need_drain);
>>>           mirror_wait_for_all_io(s);
>>>       }
>>> diff --git a/job.c b/job.c
>>> index e78d893a9c..dba17a680f 100644
>>> --- a/job.c
>>> +++ b/job.c
>>> @@ -216,6 +216,11 @@ const char *job_type_str(const Job *job)
>>>   }
>>>     bool job_is_cancelled(Job *job)
>>> +{
>>> +    return job->cancelled && job->force_cancel;
>>
>> can job->cancelled be false when job->force_cancel is true ? I think not and worth an assertion here. Something like
>>
>> if (job->force_cancel) {
>>    assert(job->cancelled);
>>    return true;
>> }
>>
>> return false;
> 
> Sounds good, why not.
> 
>>
>>> +}
>>> +
>>> +bool job_cancel_requested(Job *job)
>>>   {
>>>       return job->cancelled;
>>>   }
>>> @@ -1015,7 +1020,7 @@ void job_complete(Job *job, Error **errp)
>>>       if (job_apply_verb(job, JOB_VERB_COMPLETE, errp)) {
>>>           return;
>>>       }
>>> -    if (job_is_cancelled(job) || !job->driver->complete) {
>>> +    if (job_cancel_requested(job) || !job->driver->complete) {
>>>           error_setg(errp, "The active block job '%s' cannot be completed",
>>>                      job->id);
>>>           return;
>>>
>>
>> I think it's a correct change, although there may be unexpected side-effects, it's hard to imagine all consequences of changing job_is_cancelled() semantics called in several places in job.c.

For example: so we now don't set -ECANCELED in job_update_rc for soft-cancel..

This mean that job_finalize_single() will call job_commit instead of job_abort, and job_commit may do some graph changes, which shouldn't happen for soft-cancel

> 
> Yeah.  Targeting 6.2, I don’t have a bad feeling about it, though.
> 
>> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> 
> Thanks for the review, by the way!
> 
> Max
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH for-6.1? v2 5/7] job: Add job_cancel_requested()
  2021-07-27 15:47       ` Vladimir Sementsov-Ogievskiy
@ 2021-08-02 10:23         ` Max Reitz
  2021-08-03 12:35           ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 31+ messages in thread
From: Max Reitz @ 2021-08-02 10:23 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: Kevin Wolf, John Snow, qemu-devel

On 27.07.21 17:47, Vladimir Sementsov-Ogievskiy wrote:
> 27.07.2021 18:39, Max Reitz wrote:
>> On 27.07.21 15:04, Vladimir Sementsov-Ogievskiy wrote:
>>> 26.07.2021 17:46, Max Reitz wrote:
>>>> Most callers of job_is_cancelled() actually want to know whether 
>>>> the job
>>>> is on its way to immediate termination.  For example, we refuse to 
>>>> pause
>>>> jobs that are cancelled; but this only makes sense for jobs that are
>>>> really actually cancelled.
>>>>
>>>> A mirror job that is cancelled during READY with force=false should
>>>> absolutely be allowed to pause.  This "cancellation" (which is 
>>>> actually
>>>> a kind of completion) may take an indefinite amount of time, and so
>>>> should behave like any job during normal operation.  For example, with
>>>> on-target-error=stop, the job should stop on write errors. (In
>>>> contrast, force-cancelled jobs should not get write errors, as they
>>>> should just terminate and not do further I/O.)
>>>>
>>>> Therefore, redefine job_is_cancelled() to only return true for jobs 
>>>> that
>>>> are force-cancelled (which as of HEAD^ means any job that 
>>>> interprets the
>>>> cancellation request as a request for immediate termination), and add
>>>> job_cancel_request() as the general variant, which returns true for 
>>>> any
>>>
>>> job_cancel_requested()
>>>
>>>> jobs which have been requested to be cancelled, whether it be
>>>> immediately or after an arbitrarily long completion phase.
>>>>
>>>> Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462
>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>> ---
>>>>   include/qemu/job.h |  8 +++++++-
>>>>   block/mirror.c     | 10 ++++------
>>>>   job.c              |  7 ++++++-
>>>>   3 files changed, 17 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/include/qemu/job.h b/include/qemu/job.h
>>>> index 8aa90f7395..032edf3c5f 100644
>>>> --- a/include/qemu/job.h
>>>> +++ b/include/qemu/job.h
>>>> @@ -436,9 +436,15 @@ const char *job_type_str(const Job *job);
>>>>   /** Returns true if the job should not be visible to the 
>>>> management layer. */
>>>>   bool job_is_internal(Job *job);
>>>>   -/** Returns whether the job is scheduled for cancellation. */
>>>> +/** Returns whether the job is being cancelled. */
>>>>   bool job_is_cancelled(Job *job);
>>>>   +/**
>>>> + * Returns whether the job is scheduled for cancellation (at an
>>>> + * indefinite point).
>>>> + */
>>>> +bool job_cancel_requested(Job *job);
>>>> +
>>>>   /** Returns whether the job is in a completed state. */
>>>>   bool job_is_completed(Job *job);
>>>>   diff --git a/block/mirror.c b/block/mirror.c
>>>> index e93631a9f6..72e02fa34e 100644
>>>> --- a/block/mirror.c
>>>> +++ b/block/mirror.c
>>>> @@ -936,7 +936,7 @@ static int coroutine_fn mirror_run(Job *job, 
>>>> Error **errp)
>>>>           /* Transition to the READY state and wait for complete. */
>>>>           job_transition_to_ready(&s->common.job);
>>>>           s->actively_synced = true;
>>>> -        while (!job_is_cancelled(&s->common.job) && 
>>>> !s->should_complete) {
>>>> +        while (!job_cancel_requested(&s->common.job) && 
>>>> !s->should_complete) {
>>>>               job_yield(&s->common.job);
>>>>           }
>>>>           s->common.job.cancelled = false;
>>>> @@ -1043,7 +1043,7 @@ static int coroutine_fn mirror_run(Job *job, 
>>>> Error **errp)
>>>>               }
>>>>                 should_complete = s->should_complete ||
>>>> -                job_is_cancelled(&s->common.job);
>>>> + job_cancel_requested(&s->common.job);
>>>>               cnt = bdrv_get_dirty_count(s->dirty_bitmap);
>>>>           }
>>>>   @@ -1087,7 +1087,7 @@ static int coroutine_fn mirror_run(Job 
>>>> *job, Error **errp)
>>>>           trace_mirror_before_sleep(s, cnt, 
>>>> job_is_ready(&s->common.job),
>>>>                                     delay_ns);
>>>>           job_sleep_ns(&s->common.job, delay_ns);
>>>> -        if (job_is_cancelled(&s->common.job) && 
>>>> s->common.job.force_cancel) {
>>>> +        if (job_is_cancelled(&s->common.job)) {
>>>>               break;
>>>>           }
>>>>           s->last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
>>>> @@ -1099,9 +1099,7 @@ immediate_exit:
>>>>            * or it was cancelled prematurely so that we do not 
>>>> guarantee that
>>>>            * the target is a copy of the source.
>>>>            */
>>>> -        assert(ret < 0 ||
>>>> -               (s->common.job.force_cancel &&
>>>> -                job_is_cancelled(&s->common.job)));
>>>> +        assert(ret < 0 || job_is_cancelled(&s->common.job));
>>
>> (As a note, I hope this does the job regarding your suggestions for 
>> patch 4. :))
>>
>>>>           assert(need_drain);
>>>>           mirror_wait_for_all_io(s);
>>>>       }
>>>> diff --git a/job.c b/job.c
>>>> index e78d893a9c..dba17a680f 100644
>>>> --- a/job.c
>>>> +++ b/job.c
>>>> @@ -216,6 +216,11 @@ const char *job_type_str(const Job *job)
>>>>   }
>>>>     bool job_is_cancelled(Job *job)
>>>> +{
>>>> +    return job->cancelled && job->force_cancel;
>>>
>>> can job->cancelled be false when job->force_cancel is true ? I think 
>>> not and worth an assertion here. Something like
>>>
>>> if (job->force_cancel) {
>>>    assert(job->cancelled);
>>>    return true;
>>> }
>>>
>>> return false;
>>
>> Sounds good, why not.
>>
>>>
>>>> +}
>>>> +
>>>> +bool job_cancel_requested(Job *job)
>>>>   {
>>>>       return job->cancelled;
>>>>   }
>>>> @@ -1015,7 +1020,7 @@ void job_complete(Job *job, Error **errp)
>>>>       if (job_apply_verb(job, JOB_VERB_COMPLETE, errp)) {
>>>>           return;
>>>>       }
>>>> -    if (job_is_cancelled(job) || !job->driver->complete) {
>>>> +    if (job_cancel_requested(job) || !job->driver->complete) {
>>>>           error_setg(errp, "The active block job '%s' cannot be 
>>>> completed",
>>>>                      job->id);
>>>>           return;
>>>>
>>>
>>> I think it's a correct change, although there may be unexpected 
>>> side-effects, it's hard to imagine all consequences of changing 
>>> job_is_cancelled() semantics called in several places in job.c.
>
> For example: so we now don't set -ECANCELED in job_update_rc for 
> soft-cancel..
>
> This mean that job_finalize_single() will call job_commit instead of 
> job_abort, and job_commit may do some graph changes, which shouldn't 
> happen for soft-cancel

So the question is when these two conditions come into play.

There are two places that set job->ret to ECANCELED if the job is 
cancelled, namely job_update_rc(), and job_finish_sync().

job_finish_sync() will do so only after the job has been completed, 
which requires the job to either have been aborted (i.e. ret is non-zero 
anyway) or job_completed() to have been called. job_completed() is 
called by job_exit(), which is run after the job’s main loop has 
exited.  If mirror is soft-cancelled, mirror_run() will clear 
s->common.job.cancelled before returning, so job_finish_sync() will not 
see the job as cancelled.

job_update_rc() is called from three places:

job_finalize_single(): Asserts that job_is_completed(), so the same 
reasoning as for job_finish_sync() applies.

job_prepare(): Called by job_do_finalize(), which can only happen when 
the job is completed.  (JobVerbTable only allows finalization when the 
job is PENDING, which is a state where job_is_completed() is true, i.e. 
after mirror_run().)

job_completed(): Same reasoning as for job_finish_sync().


So it looks to me like these places that set job->ret to ECANCELED if 
the job has been cancelled do not consider a soft-cancelled mirror job 
to have been cancelled, which makes using job_is_cancelled() instead of 
job_cancel_requested() correct there. (And most likely, we can drop the 
`.cancelled = false` statements from the mirror job in turn.)

Max



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH for-6.1? v2 5/7] job: Add job_cancel_requested()
  2021-08-02 10:23         ` Max Reitz
@ 2021-08-03 12:35           ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 31+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-08-03 12:35 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: qemu-devel, Kevin Wolf, John Snow

02.08.2021 13:23, Max Reitz wrote:
> On 27.07.21 17:47, Vladimir Sementsov-Ogievskiy wrote:
>> 27.07.2021 18:39, Max Reitz wrote:
>>> On 27.07.21 15:04, Vladimir Sementsov-Ogievskiy wrote:
>>>> 26.07.2021 17:46, Max Reitz wrote:
>>>>> Most callers of job_is_cancelled() actually want to know whether the job
>>>>> is on its way to immediate termination.  For example, we refuse to pause
>>>>> jobs that are cancelled; but this only makes sense for jobs that are
>>>>> really actually cancelled.
>>>>>
>>>>> A mirror job that is cancelled during READY with force=false should
>>>>> absolutely be allowed to pause.  This "cancellation" (which is actually
>>>>> a kind of completion) may take an indefinite amount of time, and so
>>>>> should behave like any job during normal operation.  For example, with
>>>>> on-target-error=stop, the job should stop on write errors. (In
>>>>> contrast, force-cancelled jobs should not get write errors, as they
>>>>> should just terminate and not do further I/O.)
>>>>>
>>>>> Therefore, redefine job_is_cancelled() to only return true for jobs that
>>>>> are force-cancelled (which as of HEAD^ means any job that interprets the
>>>>> cancellation request as a request for immediate termination), and add
>>>>> job_cancel_request() as the general variant, which returns true for any
>>>>
>>>> job_cancel_requested()
>>>>
>>>>> jobs which have been requested to be cancelled, whether it be
>>>>> immediately or after an arbitrarily long completion phase.
>>>>>
>>>>> Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462
>>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>>> ---
>>>>>   include/qemu/job.h |  8 +++++++-
>>>>>   block/mirror.c     | 10 ++++------
>>>>>   job.c              |  7 ++++++-
>>>>>   3 files changed, 17 insertions(+), 8 deletions(-)
>>>>>
>>>>> diff --git a/include/qemu/job.h b/include/qemu/job.h
>>>>> index 8aa90f7395..032edf3c5f 100644
>>>>> --- a/include/qemu/job.h
>>>>> +++ b/include/qemu/job.h
>>>>> @@ -436,9 +436,15 @@ const char *job_type_str(const Job *job);
>>>>>   /** Returns true if the job should not be visible to the management layer. */
>>>>>   bool job_is_internal(Job *job);
>>>>>   -/** Returns whether the job is scheduled for cancellation. */
>>>>> +/** Returns whether the job is being cancelled. */
>>>>>   bool job_is_cancelled(Job *job);
>>>>>   +/**
>>>>> + * Returns whether the job is scheduled for cancellation (at an
>>>>> + * indefinite point).
>>>>> + */
>>>>> +bool job_cancel_requested(Job *job);
>>>>> +
>>>>>   /** Returns whether the job is in a completed state. */
>>>>>   bool job_is_completed(Job *job);
>>>>>   diff --git a/block/mirror.c b/block/mirror.c
>>>>> index e93631a9f6..72e02fa34e 100644
>>>>> --- a/block/mirror.c
>>>>> +++ b/block/mirror.c
>>>>> @@ -936,7 +936,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>>>>>           /* Transition to the READY state and wait for complete. */
>>>>>           job_transition_to_ready(&s->common.job);
>>>>>           s->actively_synced = true;
>>>>> -        while (!job_is_cancelled(&s->common.job) && !s->should_complete) {
>>>>> +        while (!job_cancel_requested(&s->common.job) && !s->should_complete) {
>>>>>               job_yield(&s->common.job);
>>>>>           }
>>>>>           s->common.job.cancelled = false;
>>>>> @@ -1043,7 +1043,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>>>>>               }
>>>>>                 should_complete = s->should_complete ||
>>>>> -                job_is_cancelled(&s->common.job);
>>>>> + job_cancel_requested(&s->common.job);
>>>>>               cnt = bdrv_get_dirty_count(s->dirty_bitmap);
>>>>>           }
>>>>>   @@ -1087,7 +1087,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>>>>>           trace_mirror_before_sleep(s, cnt, job_is_ready(&s->common.job),
>>>>>                                     delay_ns);
>>>>>           job_sleep_ns(&s->common.job, delay_ns);
>>>>> -        if (job_is_cancelled(&s->common.job) && s->common.job.force_cancel) {
>>>>> +        if (job_is_cancelled(&s->common.job)) {
>>>>>               break;
>>>>>           }
>>>>>           s->last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
>>>>> @@ -1099,9 +1099,7 @@ immediate_exit:
>>>>>            * or it was cancelled prematurely so that we do not guarantee that
>>>>>            * the target is a copy of the source.
>>>>>            */
>>>>> -        assert(ret < 0 ||
>>>>> -               (s->common.job.force_cancel &&
>>>>> -                job_is_cancelled(&s->common.job)));
>>>>> +        assert(ret < 0 || job_is_cancelled(&s->common.job));
>>>
>>> (As a note, I hope this does the job regarding your suggestions for patch 4. :))
>>>
>>>>>           assert(need_drain);
>>>>>           mirror_wait_for_all_io(s);
>>>>>       }
>>>>> diff --git a/job.c b/job.c
>>>>> index e78d893a9c..dba17a680f 100644
>>>>> --- a/job.c
>>>>> +++ b/job.c
>>>>> @@ -216,6 +216,11 @@ const char *job_type_str(const Job *job)
>>>>>   }
>>>>>     bool job_is_cancelled(Job *job)
>>>>> +{
>>>>> +    return job->cancelled && job->force_cancel;
>>>>
>>>> can job->cancelled be false when job->force_cancel is true ? I think not and worth an assertion here. Something like
>>>>
>>>> if (job->force_cancel) {
>>>>    assert(job->cancelled);
>>>>    return true;
>>>> }
>>>>
>>>> return false;
>>>
>>> Sounds good, why not.
>>>
>>>>
>>>>> +}
>>>>> +
>>>>> +bool job_cancel_requested(Job *job)
>>>>>   {
>>>>>       return job->cancelled;
>>>>>   }
>>>>> @@ -1015,7 +1020,7 @@ void job_complete(Job *job, Error **errp)
>>>>>       if (job_apply_verb(job, JOB_VERB_COMPLETE, errp)) {
>>>>>           return;
>>>>>       }
>>>>> -    if (job_is_cancelled(job) || !job->driver->complete) {
>>>>> +    if (job_cancel_requested(job) || !job->driver->complete) {
>>>>>           error_setg(errp, "The active block job '%s' cannot be completed",
>>>>>                      job->id);
>>>>>           return;
>>>>>
>>>>
>>>> I think it's a correct change, although there may be unexpected side-effects, it's hard to imagine all consequences of changing job_is_cancelled() semantics called in several places in job.c.
>>
>> For example: so we now don't set -ECANCELED in job_update_rc for soft-cancel..
>>
>> This mean that job_finalize_single() will call job_commit instead of job_abort, and job_commit may do some graph changes, which shouldn't happen for soft-cancel
> 
> So the question is when these two conditions come into play.
> 
> There are two places that set job->ret to ECANCELED if the job is cancelled, namely job_update_rc(), and job_finish_sync().
> 
> job_finish_sync() will do so only after the job has been completed, which requires the job to either have been aborted (i.e. ret is non-zero anyway) or job_completed() to have been called. job_completed() is called by job_exit(), which is run after the job’s main loop has exited.  If mirror is soft-cancelled, mirror_run() will clear s->common.job.cancelled before returning, so job_finish_sync() will not see the job as cancelled.
> 
> job_update_rc() is called from three places:
> 
> job_finalize_single(): Asserts that job_is_completed(), so the same reasoning as for job_finish_sync() applies.
> 
> job_prepare(): Called by job_do_finalize(), which can only happen when the job is completed.  (JobVerbTable only allows finalization when the job is PENDING, which is a state where job_is_completed() is true, i.e. after mirror_run().)
> 
> job_completed(): Same reasoning as for job_finish_sync().
> 
> 
> So it looks to me like these places that set job->ret to ECANCELED if the job has been cancelled do not consider a soft-cancelled mirror job to have been cancelled, which makes using job_is_cancelled() instead of job_cancel_requested() correct there. (And most likely, we can drop the `.cancelled = false` statements from the mirror job in turn.)
> 


Hm, reasonable. OK than, thanks for explanation.


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH for-6.1? v2 3/7] job: @force parameter for job_cancel_sync{,_all}()
  2021-07-26 14:46 ` [PATCH for-6.1? v2 3/7] job: @force parameter for job_cancel_sync{, _all}() Max Reitz
@ 2021-08-03 14:00   ` Kevin Wolf
  0 siblings, 0 replies; 31+ messages in thread
From: Kevin Wolf @ 2021-08-03 14:00 UTC (permalink / raw)
  To: Max Reitz; +Cc: Vladimir Sementsov-Ogievskiy, John Snow, qemu-devel, qemu-block

Am 26.07.2021 um 16:46 hat Max Reitz geschrieben:
> Callers should be able to specify whether they want job_cancel_sync() to
> force-cancel the job or not.
> 
> In fact, almost all invocations do not care about consistency of the
> result and just want the job to terminate as soon as possible, so they
> should pass force=true.  The replication block driver is the exception.
> 
> This changes some iotest outputs, because quitting qemu while a mirror
> job is active will now lead to it being cancelled instead of completed,
> which is what we want.  (Cancelling a READY mirror job with force=false
> may take an indefinite amount of time, which we do not want when
> quitting.  If users want consistent results, they must have all jobs be
> done before they quit qemu.)
> 
> Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

> diff --git a/job.c b/job.c
> index e7a5d28854..9e971d64cf 100644
> --- a/job.c
> +++ b/job.c
> @@ -763,7 +763,12 @@ static void job_completed_txn_abort(Job *job)
>          if (other_job != job) {
>              ctx = other_job->aio_context;
>              aio_context_acquire(ctx);
> -            job_cancel_async(other_job, false);
> +            /*
> +             * This is a transaction: If one job failed, no result will matter.
> +             * Therefore, pass force=true to terminate all other jobs as quickly
> +             * as possible.
> +             */
> +            job_cancel_async(other_job, true);
>              aio_context_release(ctx);
>          }
>      }

Sneaking in a hunk that is unrelated to what the commit message
promises? How naughty! :-)

(But I guess the change makes sense.)

Kevin



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH for-6.1? v2 5/7] job: Add job_cancel_requested()
  2021-07-26 14:46 ` [PATCH for-6.1? v2 5/7] job: Add job_cancel_requested() Max Reitz
  2021-07-27 13:04   ` Vladimir Sementsov-Ogievskiy
@ 2021-08-03 14:25   ` Kevin Wolf
  2021-08-04  8:07     ` Max Reitz
  1 sibling, 1 reply; 31+ messages in thread
From: Kevin Wolf @ 2021-08-03 14:25 UTC (permalink / raw)
  To: Max Reitz; +Cc: Vladimir Sementsov-Ogievskiy, John Snow, qemu-devel, qemu-block

Am 26.07.2021 um 16:46 hat Max Reitz geschrieben:
> Most callers of job_is_cancelled() actually want to know whether the job
> is on its way to immediate termination.  For example, we refuse to pause
> jobs that are cancelled; but this only makes sense for jobs that are
> really actually cancelled.
> 
> A mirror job that is cancelled during READY with force=false should
> absolutely be allowed to pause.  This "cancellation" (which is actually
> a kind of completion) may take an indefinite amount of time, and so
> should behave like any job during normal operation.  For example, with
> on-target-error=stop, the job should stop on write errors.  (In
> contrast, force-cancelled jobs should not get write errors, as they
> should just terminate and not do further I/O.)
> 
> Therefore, redefine job_is_cancelled() to only return true for jobs that
> are force-cancelled (which as of HEAD^ means any job that interprets the
> cancellation request as a request for immediate termination), and add
> job_cancel_request() as the general variant, which returns true for any
> jobs which have been requested to be cancelled, whether it be
> immediately or after an arbitrarily long completion phase.
> 
> Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  include/qemu/job.h |  8 +++++++-
>  block/mirror.c     | 10 ++++------
>  job.c              |  7 ++++++-
>  3 files changed, 17 insertions(+), 8 deletions(-)
> 
> diff --git a/include/qemu/job.h b/include/qemu/job.h
> index 8aa90f7395..032edf3c5f 100644
> --- a/include/qemu/job.h
> +++ b/include/qemu/job.h
> @@ -436,9 +436,15 @@ const char *job_type_str(const Job *job);
>  /** Returns true if the job should not be visible to the management layer. */
>  bool job_is_internal(Job *job);
>  
> -/** Returns whether the job is scheduled for cancellation. */
> +/** Returns whether the job is being cancelled. */
>  bool job_is_cancelled(Job *job);
>  
> +/**
> + * Returns whether the job is scheduled for cancellation (at an
> + * indefinite point).
> + */
> +bool job_cancel_requested(Job *job);

I don't think non-force blockdev-cancel for mirror should actually be
considered cancellation, so what is the question that this function
answers?

"Is this a cancelled job, or a mirror block job that is supposed to
complete soon, but only if it doesn't switch over the users to the
target on completion"?

Is this ever a reasonable question to ask, except maybe inside the
mirror implementation itself?

job_complete() is the only function outside of mirror that seems to use
it. But even there, it feels wrong to make a difference. Either we
accept redundant completion requests, or we don't. It doesn't really
matter how the job reconfigures the graph on completion. (Also, I feel
this should really have been part of the state machine, but I'm not sure
if we want to touch it now...)

Kevin



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH for-6.1? v2 6/7] mirror: Check job_is_cancelled() earlier
  2021-07-26 14:46 ` [PATCH for-6.1? v2 6/7] mirror: Check job_is_cancelled() earlier Max Reitz
  2021-07-27 13:13   ` Vladimir Sementsov-Ogievskiy
@ 2021-08-03 14:34   ` Kevin Wolf
  2021-08-04  8:25     ` Max Reitz
  1 sibling, 1 reply; 31+ messages in thread
From: Kevin Wolf @ 2021-08-03 14:34 UTC (permalink / raw)
  To: Max Reitz; +Cc: Vladimir Sementsov-Ogievskiy, John Snow, qemu-devel, qemu-block

Am 26.07.2021 um 16:46 hat Max Reitz geschrieben:
> We must check whether the job is force-cancelled early in our main loop,
> most importantly before any `continue` statement.  For example, we used
> to have `continue`s before our current checking location that are
> triggered by `mirror_flush()` failing.  So, if `mirror_flush()` kept
> failing, force-cancelling the job would not terminate it.
> 
> A job being force-cancelled should be treated the same as the job having
> failed, so put the check in the same place where we check `s->ret < 0`.
> 
> Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/mirror.c | 7 +------
>  1 file changed, 1 insertion(+), 6 deletions(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index 72e02fa34e..46d1a1e5a2 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -993,7 +993,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>              mirror_wait_for_any_operation(s, true);
>          }
>  
> -        if (s->ret < 0) {
> +        if (s->ret < 0 || job_is_cancelled(&s->common.job)) {
>              ret = s->ret;
>              goto immediate_exit;
>          }
> @@ -1078,8 +1078,6 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>              break;
>          }
>  
> -        ret = 0;
> -
>          if (job_is_ready(&s->common.job) && !should_complete) {
>              delay_ns = (s->in_flight == 0 &&
>                          cnt == 0 ? BLOCK_JOB_SLICE_TIME : 0);
> @@ -1087,9 +1085,6 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>          trace_mirror_before_sleep(s, cnt, job_is_ready(&s->common.job),
>                                    delay_ns);
>          job_sleep_ns(&s->common.job, delay_ns);
> -        if (job_is_cancelled(&s->common.job)) {
> -            break;
> -        }

I think it was intentional that the check is here because it means
skipping the job_sleep_ns() and instead cancelling immediately, and we
probably still want that. Between your check above and here, the
coroutine can yield, so cancellation could have been newly requested.

So have the check in both places, I guess? And a comment to explain why
neither is redundant.

>          s->last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
>      }

Kevin



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH for-6.1? v2 0/7] mirror: Handle errors after READY cancel
  2021-07-26 14:46 [PATCH for-6.1? v2 0/7] mirror: Handle errors after READY cancel Max Reitz
                   ` (6 preceding siblings ...)
  2021-07-26 14:46 ` [PATCH for-6.1? v2 7/7] iotests: Add mirror-ready-cancel-error test Max Reitz
@ 2021-08-03 14:36 ` Kevin Wolf
  7 siblings, 0 replies; 31+ messages in thread
From: Kevin Wolf @ 2021-08-03 14:36 UTC (permalink / raw)
  To: Max Reitz; +Cc: Vladimir Sementsov-Ogievskiy, John Snow, qemu-devel, qemu-block

Am 26.07.2021 um 16:46 hat Max Reitz geschrieben:
> Hi,
> 
> v1 cover letter:
> https://lists.nongnu.org/archive/html/qemu-block/2021-07/msg00705.html
> 
> Changes in v2:
> - Added patch 2 (as suggested by Vladimir)
> - Patch 4 (ex. 3): Rebase conflicts because of patch 2
> - Patch 5 (ex. 4):
>   - Rebase conflicts because of patch 2
>   - Do not use job_cancel_requested() to determine how a soft-cancelled
>     job should be completed: A soft-cancelled job should end with
>     COMPLETED, not CANCELLED, and so job_is_cancelled() is the
>     appropriate condition there.

Patches 1-4, preferably with the unrelated hunk split out from 3:

Reviewed-by: Kevin Wolf <kwolf@redhat.com>



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH for-6.1? v2 5/7] job: Add job_cancel_requested()
  2021-08-03 14:25   ` Kevin Wolf
@ 2021-08-04  8:07     ` Max Reitz
  2021-08-04 10:34       ` Kevin Wolf
  0 siblings, 1 reply; 31+ messages in thread
From: Max Reitz @ 2021-08-04  8:07 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Vladimir Sementsov-Ogievskiy, John Snow, qemu-devel, qemu-block

On 03.08.21 16:25, Kevin Wolf wrote:
> Am 26.07.2021 um 16:46 hat Max Reitz geschrieben:
>> Most callers of job_is_cancelled() actually want to know whether the job
>> is on its way to immediate termination.  For example, we refuse to pause
>> jobs that are cancelled; but this only makes sense for jobs that are
>> really actually cancelled.
>>
>> A mirror job that is cancelled during READY with force=false should
>> absolutely be allowed to pause.  This "cancellation" (which is actually
>> a kind of completion) may take an indefinite amount of time, and so
>> should behave like any job during normal operation.  For example, with
>> on-target-error=stop, the job should stop on write errors.  (In
>> contrast, force-cancelled jobs should not get write errors, as they
>> should just terminate and not do further I/O.)
>>
>> Therefore, redefine job_is_cancelled() to only return true for jobs that
>> are force-cancelled (which as of HEAD^ means any job that interprets the
>> cancellation request as a request for immediate termination), and add
>> job_cancel_request() as the general variant, which returns true for any
>> jobs which have been requested to be cancelled, whether it be
>> immediately or after an arbitrarily long completion phase.
>>
>> Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   include/qemu/job.h |  8 +++++++-
>>   block/mirror.c     | 10 ++++------
>>   job.c              |  7 ++++++-
>>   3 files changed, 17 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/qemu/job.h b/include/qemu/job.h
>> index 8aa90f7395..032edf3c5f 100644
>> --- a/include/qemu/job.h
>> +++ b/include/qemu/job.h
>> @@ -436,9 +436,15 @@ const char *job_type_str(const Job *job);
>>   /** Returns true if the job should not be visible to the management layer. */
>>   bool job_is_internal(Job *job);
>>   
>> -/** Returns whether the job is scheduled for cancellation. */
>> +/** Returns whether the job is being cancelled. */
>>   bool job_is_cancelled(Job *job);
>>   
>> +/**
>> + * Returns whether the job is scheduled for cancellation (at an
>> + * indefinite point).
>> + */
>> +bool job_cancel_requested(Job *job);
> I don't think non-force blockdev-cancel for mirror should actually be
> considered cancellation, so what is the question that this function
> answers?
>
> "Is this a cancelled job, or a mirror block job that is supposed to
> complete soon, but only if it doesn't switch over the users to the
> target on completion"?

Well, technically yes, but it was more intended as “Has the user ever 
invoked (block-)job-cancel on this job?”.

> Is this ever a reasonable question to ask, except maybe inside the
> mirror implementation itself?

I asked myself the same for v3, but found two places in job.c where I 
would like to keep it:

First, there’s an assertion in job_completed_txn_abort().  All jobs 
other than @job have been force-cancelled, and so job_is_cancelled() 
would be true for them.  As for @job itself, the function is mostly 
called when the job’s return value is not 0, but a soft-cancelled mirror 
does have a return value of 0 and so would not end up in that function.
But job_cancel() invokes job_completed_txn_abort() if the job has been 
deferred to the main loop, which mostly correlates with the job having 
been completed (in which case the assertion is skipped), but not 100 % 
(there’s a small window between setting deferred_to_main_loop and the 
job changing to a completed state).
So I’d prefer to keep the assertion as-is functionally, i.e. to only 
check job->cancelled.

Second, job_complete() refuses to let a job complete that has been 
cancelled.  This function is basically only invoked by the user (through 
qmp_block_job_complete()/qmp_job_complete(), or job_complete_sync(), 
which comes from qemu-img), so I believe that it should correspond to 
the external interface we have right now; i.e., if the user has invoked 
(block-)job-cancel at one point, job_complete() should generally return 
an error.

> job_complete() is the only function outside of mirror that seems to use
> it. But even there, it feels wrong to make a difference. Either we
> accept redundant completion requests, or we don't. It doesn't really
> matter how the job reconfigures the graph on completion. (Also, I feel
> this should really have been part of the state machine, but I'm not sure
> if we want to touch it now...)

Well, yes, I don’t think it makes a difference because I don’t think 
anyone will first tell the job via block-job-cancel to complete without 
pivoting, and then change their mind and call block-job-complete after 
all.  (Not least because that’s an error pre-series.)

Also, I’m not even sure whether completing after a soft cancel request 
works.  I don’t think any of our code accounts for such a case, so I’d 
rather avoid allowing it if there’s no need to allow it anyway.

Max



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH for-6.1? v2 6/7] mirror: Check job_is_cancelled() earlier
  2021-08-03 14:34   ` Kevin Wolf
@ 2021-08-04  8:25     ` Max Reitz
  2021-08-04  9:48       ` Kevin Wolf
  0 siblings, 1 reply; 31+ messages in thread
From: Max Reitz @ 2021-08-04  8:25 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Vladimir Sementsov-Ogievskiy, John Snow, qemu-devel, qemu-block

On 03.08.21 16:34, Kevin Wolf wrote:
> Am 26.07.2021 um 16:46 hat Max Reitz geschrieben:
>> We must check whether the job is force-cancelled early in our main loop,
>> most importantly before any `continue` statement.  For example, we used
>> to have `continue`s before our current checking location that are
>> triggered by `mirror_flush()` failing.  So, if `mirror_flush()` kept
>> failing, force-cancelling the job would not terminate it.
>>
>> A job being force-cancelled should be treated the same as the job having
>> failed, so put the check in the same place where we check `s->ret < 0`.
>>
>> Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/mirror.c | 7 +------
>>   1 file changed, 1 insertion(+), 6 deletions(-)
>>
>> diff --git a/block/mirror.c b/block/mirror.c
>> index 72e02fa34e..46d1a1e5a2 100644
>> --- a/block/mirror.c
>> +++ b/block/mirror.c
>> @@ -993,7 +993,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>>               mirror_wait_for_any_operation(s, true);
>>           }
>>   
>> -        if (s->ret < 0) {
>> +        if (s->ret < 0 || job_is_cancelled(&s->common.job)) {
>>               ret = s->ret;
>>               goto immediate_exit;
>>           }
>> @@ -1078,8 +1078,6 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>>               break;
>>           }
>>   
>> -        ret = 0;
>> -
>>           if (job_is_ready(&s->common.job) && !should_complete) {
>>               delay_ns = (s->in_flight == 0 &&
>>                           cnt == 0 ? BLOCK_JOB_SLICE_TIME : 0);
>> @@ -1087,9 +1085,6 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>>           trace_mirror_before_sleep(s, cnt, job_is_ready(&s->common.job),
>>                                     delay_ns);
>>           job_sleep_ns(&s->common.job, delay_ns);
>> -        if (job_is_cancelled(&s->common.job)) {
>> -            break;
>> -        }
> I think it was intentional that the check is here because it means
> skipping the job_sleep_ns() and instead cancelling immediately, and we
> probably still want that. Between your check above and here, the
> coroutine can yield, so cancellation could have been newly requested.

I’m afraid I don’t quite understand.  If cancel is requested in 
job_sleep_ns(), then we will go back to the top of the loop, wait for 
in-flight active requests and then break.  Waiting for the in-flight 
requests seems unnecessary, but does it really make a difference in 
practice?  We don’t start new requests, so it should be legal to wait 
for existing ones to settle, and also I believe someone will have to 
wait for those in-flight requests anyway (when the mirror top node is 
removed).  (The only thing we could do is to cancel the in-flight 
requests, but that is what mirror_cancel() does.)

Looking more at the whole loop, there are a couple of places that can 
yield.  Of course we can check whether the job has been cancelled after 
every single one of them, but that would be a bit strange.  We only 
really need to check before we initiate new requests or want to change 
the state.  I believe the right place to do the check would be after the 
job_pause_point().

And perhaps the active write functions (bdrv_mirror_top_do_write() and 
bdrv_mirror_top_pwritev()) should stop copying to the target if the job 
has been cancelled.

Max

> So have the check in both places, I guess? And a comment to explain why
> neither is redundant.
>
>>           s->last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
>>       }
> Kevin
>



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH for-6.1? v2 6/7] mirror: Check job_is_cancelled() earlier
  2021-08-04  8:25     ` Max Reitz
@ 2021-08-04  9:48       ` Kevin Wolf
  2021-08-04 10:12         ` Max Reitz
  0 siblings, 1 reply; 31+ messages in thread
From: Kevin Wolf @ 2021-08-04  9:48 UTC (permalink / raw)
  To: Max Reitz; +Cc: Vladimir Sementsov-Ogievskiy, John Snow, qemu-devel, qemu-block

Am 04.08.2021 um 10:25 hat Max Reitz geschrieben:
> On 03.08.21 16:34, Kevin Wolf wrote:
> > Am 26.07.2021 um 16:46 hat Max Reitz geschrieben:
> > > We must check whether the job is force-cancelled early in our main loop,
> > > most importantly before any `continue` statement.  For example, we used
> > > to have `continue`s before our current checking location that are
> > > triggered by `mirror_flush()` failing.  So, if `mirror_flush()` kept
> > > failing, force-cancelling the job would not terminate it.
> > > 
> > > A job being force-cancelled should be treated the same as the job having
> > > failed, so put the check in the same place where we check `s->ret < 0`.
> > > 
> > > Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462
> > > Signed-off-by: Max Reitz <mreitz@redhat.com>
> > > ---
> > >   block/mirror.c | 7 +------
> > >   1 file changed, 1 insertion(+), 6 deletions(-)
> > > 
> > > diff --git a/block/mirror.c b/block/mirror.c
> > > index 72e02fa34e..46d1a1e5a2 100644
> > > --- a/block/mirror.c
> > > +++ b/block/mirror.c
> > > @@ -993,7 +993,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
> > >               mirror_wait_for_any_operation(s, true);
> > >           }
> > > -        if (s->ret < 0) {
> > > +        if (s->ret < 0 || job_is_cancelled(&s->common.job)) {
> > >               ret = s->ret;
> > >               goto immediate_exit;
> > >           }
> > > @@ -1078,8 +1078,6 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
> > >               break;
> > >           }
> > > -        ret = 0;
> > > -
> > >           if (job_is_ready(&s->common.job) && !should_complete) {
> > >               delay_ns = (s->in_flight == 0 &&
> > >                           cnt == 0 ? BLOCK_JOB_SLICE_TIME : 0);
> > > @@ -1087,9 +1085,6 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
> > >           trace_mirror_before_sleep(s, cnt, job_is_ready(&s->common.job),
> > >                                     delay_ns);
> > >           job_sleep_ns(&s->common.job, delay_ns);
> > > -        if (job_is_cancelled(&s->common.job)) {
> > > -            break;
> > > -        }
> > I think it was intentional that the check is here because it means
> > skipping the job_sleep_ns() and instead cancelling immediately, and we
> > probably still want that. Between your check above and here, the
> > coroutine can yield, so cancellation could have been newly requested.
> 
> I’m afraid I don’t quite understand.

Hm, I don't either. Somehow I thought job_sleep_ns() was after the
check, while quoting the exact hunk that shows that it comes before
it...

I'm still not sure if sleeping before exiting is really useful, but it
seems we never cared about that.

Kevin



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH for-6.1? v2 6/7] mirror: Check job_is_cancelled() earlier
  2021-08-04  9:48       ` Kevin Wolf
@ 2021-08-04 10:12         ` Max Reitz
  0 siblings, 0 replies; 31+ messages in thread
From: Max Reitz @ 2021-08-04 10:12 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Vladimir Sementsov-Ogievskiy, John Snow, qemu-devel, qemu-block

On 04.08.21 11:48, Kevin Wolf wrote:
> Am 04.08.2021 um 10:25 hat Max Reitz geschrieben:
>> On 03.08.21 16:34, Kevin Wolf wrote:
>>> Am 26.07.2021 um 16:46 hat Max Reitz geschrieben:
>>>> We must check whether the job is force-cancelled early in our main loop,
>>>> most importantly before any `continue` statement.  For example, we used
>>>> to have `continue`s before our current checking location that are
>>>> triggered by `mirror_flush()` failing.  So, if `mirror_flush()` kept
>>>> failing, force-cancelling the job would not terminate it.
>>>>
>>>> A job being force-cancelled should be treated the same as the job having
>>>> failed, so put the check in the same place where we check `s->ret < 0`.
>>>>
>>>> Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462
>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>> ---
>>>>    block/mirror.c | 7 +------
>>>>    1 file changed, 1 insertion(+), 6 deletions(-)
>>>>
>>>> diff --git a/block/mirror.c b/block/mirror.c
>>>> index 72e02fa34e..46d1a1e5a2 100644
>>>> --- a/block/mirror.c
>>>> +++ b/block/mirror.c
>>>> @@ -993,7 +993,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>>>>                mirror_wait_for_any_operation(s, true);
>>>>            }
>>>> -        if (s->ret < 0) {
>>>> +        if (s->ret < 0 || job_is_cancelled(&s->common.job)) {
>>>>                ret = s->ret;
>>>>                goto immediate_exit;
>>>>            }
>>>> @@ -1078,8 +1078,6 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>>>>                break;
>>>>            }
>>>> -        ret = 0;
>>>> -
>>>>            if (job_is_ready(&s->common.job) && !should_complete) {
>>>>                delay_ns = (s->in_flight == 0 &&
>>>>                            cnt == 0 ? BLOCK_JOB_SLICE_TIME : 0);
>>>> @@ -1087,9 +1085,6 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>>>>            trace_mirror_before_sleep(s, cnt, job_is_ready(&s->common.job),
>>>>                                      delay_ns);
>>>>            job_sleep_ns(&s->common.job, delay_ns);
>>>> -        if (job_is_cancelled(&s->common.job)) {
>>>> -            break;
>>>> -        }
>>> I think it was intentional that the check is here because it means
>>> skipping the job_sleep_ns() and instead cancelling immediately, and we
>>> probably still want that. Between your check above and here, the
>>> coroutine can yield, so cancellation could have been newly requested.
>> I’m afraid I don’t quite understand.
> Hm, I don't either. Somehow I thought job_sleep_ns() was after the
> check, while quoting the exact hunk that shows that it comes before
> it...
>
> I'm still not sure if sleeping before exiting is really useful, but it
> seems we never cared about that.

Jobs that are (force-)cancelled cannot yield or sleep anyway 
(job_sleep_ns(), job_yield(), and job_pause_point() will all return 
immediately when called on a cancelled job).

So I thought you meant that a job can only be cancelled while it is 
yielding, so we should prefer to put the is_cancelled check after a 
yield point (like job_pause_point()) than before it.

But I mean, if you’re happy, I’ll be happy, too. :)

Max



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH for-6.1? v2 5/7] job: Add job_cancel_requested()
  2021-08-04  8:07     ` Max Reitz
@ 2021-08-04 10:34       ` Kevin Wolf
  2021-08-04 11:00         ` Peter Krempa
  2021-08-04 14:15         ` Max Reitz
  0 siblings, 2 replies; 31+ messages in thread
From: Kevin Wolf @ 2021-08-04 10:34 UTC (permalink / raw)
  To: Max Reitz
  Cc: Vladimir Sementsov-Ogievskiy, qemu-block, libvir-list,
	qemu-devel, pkrempa, John Snow

[ Peter, the question for you is at the end. ]

Am 04.08.2021 um 10:07 hat Max Reitz geschrieben:
> On 03.08.21 16:25, Kevin Wolf wrote:
> > Am 26.07.2021 um 16:46 hat Max Reitz geschrieben:
> > > Most callers of job_is_cancelled() actually want to know whether the job
> > > is on its way to immediate termination.  For example, we refuse to pause
> > > jobs that are cancelled; but this only makes sense for jobs that are
> > > really actually cancelled.
> > > 
> > > A mirror job that is cancelled during READY with force=false should
> > > absolutely be allowed to pause.  This "cancellation" (which is actually
> > > a kind of completion) may take an indefinite amount of time, and so
> > > should behave like any job during normal operation.  For example, with
> > > on-target-error=stop, the job should stop on write errors.  (In
> > > contrast, force-cancelled jobs should not get write errors, as they
> > > should just terminate and not do further I/O.)
> > > 
> > > Therefore, redefine job_is_cancelled() to only return true for jobs that
> > > are force-cancelled (which as of HEAD^ means any job that interprets the
> > > cancellation request as a request for immediate termination), and add
> > > job_cancel_request() as the general variant, which returns true for any
> > > jobs which have been requested to be cancelled, whether it be
> > > immediately or after an arbitrarily long completion phase.
> > > 
> > > Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462
> > > Signed-off-by: Max Reitz <mreitz@redhat.com>
> > > ---
> > >   include/qemu/job.h |  8 +++++++-
> > >   block/mirror.c     | 10 ++++------
> > >   job.c              |  7 ++++++-
> > >   3 files changed, 17 insertions(+), 8 deletions(-)
> > > 
> > > diff --git a/include/qemu/job.h b/include/qemu/job.h
> > > index 8aa90f7395..032edf3c5f 100644
> > > --- a/include/qemu/job.h
> > > +++ b/include/qemu/job.h
> > > @@ -436,9 +436,15 @@ const char *job_type_str(const Job *job);
> > >   /** Returns true if the job should not be visible to the management layer. */
> > >   bool job_is_internal(Job *job);
> > > -/** Returns whether the job is scheduled for cancellation. */
> > > +/** Returns whether the job is being cancelled. */
> > >   bool job_is_cancelled(Job *job);
> > > +/**
> > > + * Returns whether the job is scheduled for cancellation (at an
> > > + * indefinite point).
> > > + */
> > > +bool job_cancel_requested(Job *job);
> > I don't think non-force blockdev-cancel for mirror should actually be
> > considered cancellation, so what is the question that this function
> > answers?
> > 
> > "Is this a cancelled job, or a mirror block job that is supposed to
> > complete soon, but only if it doesn't switch over the users to the
> > target on completion"?
> 
> Well, technically yes, but it was more intended as “Has the user ever
> invoked (block-)job-cancel on this job?”.

I understand this, but is this much more useful to know than "Has the
user ever called HMP 'change'?", if you know what I mean?

> > Is this ever a reasonable question to ask, except maybe inside the
> > mirror implementation itself?
> 
> I asked myself the same for v3, but found two places in job.c where I
> would like to keep it:
> 
> First, there’s an assertion in job_completed_txn_abort().  All jobs
> other than @job have been force-cancelled, and so job_is_cancelled()
> would be true for them.  As for @job itself, the function is mostly
> called when the job’s return value is not 0, but a soft-cancelled
> mirror does have a return value of 0 and so would not end up in that
> function.
> But job_cancel() invokes job_completed_txn_abort() if the job has been
> deferred to the main loop, which mostly correlates with the job having
> been completed (in which case the assertion is skipped), but not 100 %
> (there’s a small window between setting deferred_to_main_loop and the
> job changing to a completed state).
> So I’d prefer to keep the assertion as-is functionally, i.e. to only
> check job->cancelled.

Well, you don't. It's still job_is_cancelled() after this patch.

So the scenario you're concerned about is a job that has just finished
successfully (job->ret = 0) and then gets a cancel request?

With force=false, I'm pretty sure the code is wrong anyway because
calling job_completed_txn_abort() is not the right response. It should
return an error because you're trying to complete twice, possibly with
conflicting completion modes. Second best is just ignoring the cancel
request because we obviously already fulfilled the request of completing
the job (the completion mode might be different, though).

With force=true, arguably still letting the job fail is correct.
However, letting it fail involves more than just letting the transaction
fail. We would have to call job_update_rc() as well so that instead of
reporting success for the job, ECANCELED is returned and the job
transitions to JOB_STATUS_ABORTING (after which job_is_completed()
returns true).

So, just bugs to be fixed.

After this, I think we could even assert(job->ret != 0 ||
job->status == JOB_STATUS_PENDING) in job_completed_txn_abort().
ret == 0 can only happen when called from job_do_finalize(), when the
job is only failing because other jobs in the same transaction have
failed in their .prepare callback.

> Second, job_complete() refuses to let a job complete that has been
> cancelled.  This function is basically only invoked by the user
> (through qmp_block_job_complete()/qmp_job_complete(), or
> job_complete_sync(), which comes from qemu-img), so I believe that it
> should correspond to the external interface we have right now; i.e.,
> if the user has invoked (block-)job-cancel at one point,
> job_complete() should generally return an error.

True. But it should also return an error if the user has invoked
job-complete at some point. The distinction between complete and
non-force cancel doesn't make sense there.

And cancelling with force=false should fail, too, when either job-cancel
or job-complete was called for the job before.

> > job_complete() is the only function outside of mirror that seems to use
> > it. But even there, it feels wrong to make a difference. Either we
> > accept redundant completion requests, or we don't. It doesn't really
> > matter how the job reconfigures the graph on completion. (Also, I feel
> > this should really have been part of the state machine, but I'm not sure
> > if we want to touch it now...)
> 
> Well, yes, I don’t think it makes a difference because I don’t think
> anyone will first tell the job via block-job-cancel to complete
> without pivoting, and then change their mind and call
> block-job-complete after all.  (Not least because that’s an error
> pre-series.)

Right, I'm just arguing that we shouldn't allow the opposite order
either. Currently I think we do, and it's buggy, as explained above.

> Also, I’m not even sure whether completing after a soft cancel request
> works.  I don’t think any of our code accounts for such a case, so I’d
> rather avoid allowing it if there’s no need to allow it anyway.

Yes, definitely avoid it. We should allow only one completion request
(be it with job-complete or block-job-cancel) and return an error for
all future completion requests for the same job.

We could in theory keep allowing redundant completion requests when the
completion mode doesn't conflict, but I don't see the point of that.

Unless libvirt can actually issue multiple completion requests (again,
this includes both (block-)job-complete and non-force block-job-cancel
for mirror) for the same block job - Peter, I hope it doesn't?

Kevin



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH for-6.1? v2 5/7] job: Add job_cancel_requested()
  2021-08-04 10:34       ` Kevin Wolf
@ 2021-08-04 11:00         ` Peter Krempa
  2021-08-04 14:15         ` Max Reitz
  1 sibling, 0 replies; 31+ messages in thread
From: Peter Krempa @ 2021-08-04 11:00 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Vladimir Sementsov-Ogievskiy, qemu-block, libvir-list,
	qemu-devel, Max Reitz, John Snow

On Wed, Aug 04, 2021 at 12:34:31 +0200, Kevin Wolf wrote:
> We could in theory keep allowing redundant completion requests when the
> completion mode doesn't conflict, but I don't see the point of that.

I don't see either. Especially since ...



> Unless libvirt can actually issue multiple completion requests (again,
> this includes both (block-)job-complete and non-force block-job-cancel
> for mirror) for the same block job - Peter, I hope it doesn't?

... the regular job completion code in libvirt which is meant for user
interaction (qemuDomainBlockJobAbort) has the following interlock:

    if (job->state == QEMU_BLOCKJOB_STATE_ABORTING ||
        job->state == QEMU_BLOCKJOB_STATE_PIVOTING) {
        virReportError(VIR_ERR_OPERATION_INVALID,
                       _("block job on disk '%s' is still being ended"),
                       disk->dst);
        goto endjob;
    }

.. the other two uses of blockjobs are internal for handling migration
with non shared storage and there we also issue exactly one cancel
request and backup jobs were we too make sure to cancel it just once.

As of such it's okay to forbid the case you are mentioning.



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH for-6.1? v2 5/7] job: Add job_cancel_requested()
  2021-08-04 10:34       ` Kevin Wolf
  2021-08-04 11:00         ` Peter Krempa
@ 2021-08-04 14:15         ` Max Reitz
  1 sibling, 0 replies; 31+ messages in thread
From: Max Reitz @ 2021-08-04 14:15 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Vladimir Sementsov-Ogievskiy, qemu-block, libvir-list,
	qemu-devel, pkrempa, John Snow

On 04.08.21 12:34, Kevin Wolf wrote:
> [ Peter, the question for you is at the end. ]
>
> Am 04.08.2021 um 10:07 hat Max Reitz geschrieben:
>> On 03.08.21 16:25, Kevin Wolf wrote:
>>> Am 26.07.2021 um 16:46 hat Max Reitz geschrieben:
>>>> Most callers of job_is_cancelled() actually want to know whether the job
>>>> is on its way to immediate termination.  For example, we refuse to pause
>>>> jobs that are cancelled; but this only makes sense for jobs that are
>>>> really actually cancelled.
>>>>
>>>> A mirror job that is cancelled during READY with force=false should
>>>> absolutely be allowed to pause.  This "cancellation" (which is actually
>>>> a kind of completion) may take an indefinite amount of time, and so
>>>> should behave like any job during normal operation.  For example, with
>>>> on-target-error=stop, the job should stop on write errors.  (In
>>>> contrast, force-cancelled jobs should not get write errors, as they
>>>> should just terminate and not do further I/O.)
>>>>
>>>> Therefore, redefine job_is_cancelled() to only return true for jobs that
>>>> are force-cancelled (which as of HEAD^ means any job that interprets the
>>>> cancellation request as a request for immediate termination), and add
>>>> job_cancel_request() as the general variant, which returns true for any
>>>> jobs which have been requested to be cancelled, whether it be
>>>> immediately or after an arbitrarily long completion phase.
>>>>
>>>> Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462
>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>> ---
>>>>    include/qemu/job.h |  8 +++++++-
>>>>    block/mirror.c     | 10 ++++------
>>>>    job.c              |  7 ++++++-
>>>>    3 files changed, 17 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/include/qemu/job.h b/include/qemu/job.h
>>>> index 8aa90f7395..032edf3c5f 100644
>>>> --- a/include/qemu/job.h
>>>> +++ b/include/qemu/job.h
>>>> @@ -436,9 +436,15 @@ const char *job_type_str(const Job *job);
>>>>    /** Returns true if the job should not be visible to the management layer. */
>>>>    bool job_is_internal(Job *job);
>>>> -/** Returns whether the job is scheduled for cancellation. */
>>>> +/** Returns whether the job is being cancelled. */
>>>>    bool job_is_cancelled(Job *job);
>>>> +/**
>>>> + * Returns whether the job is scheduled for cancellation (at an
>>>> + * indefinite point).
>>>> + */
>>>> +bool job_cancel_requested(Job *job);
>>> I don't think non-force blockdev-cancel for mirror should actually be
>>> considered cancellation, so what is the question that this function
>>> answers?
>>>
>>> "Is this a cancelled job, or a mirror block job that is supposed to
>>> complete soon, but only if it doesn't switch over the users to the
>>> target on completion"?
>> Well, technically yes, but it was more intended as “Has the user ever
>> invoked (block-)job-cancel on this job?”.
> I understand this, but is this much more useful to know than "Has the
> user ever called HMP 'change'?", if you know what I mean?

Hm.  Not really.  It’s still a crutch that shouldn’t be there ideally.

But I like this crutch for this series so I can get this batch done, and 
then worry about all the other bugs that keep popping up (and where 
job_cancel_requested() is a nice sign that something’s off).

>>> Is this ever a reasonable question to ask, except maybe inside the
>>> mirror implementation itself?
>> I asked myself the same for v3, but found two places in job.c where I
>> would like to keep it:
>>
>> First, there’s an assertion in job_completed_txn_abort().  All jobs
>> other than @job have been force-cancelled, and so job_is_cancelled()
>> would be true for them.  As for @job itself, the function is mostly
>> called when the job’s return value is not 0, but a soft-cancelled
>> mirror does have a return value of 0 and so would not end up in that
>> function.
>> But job_cancel() invokes job_completed_txn_abort() if the job has been
>> deferred to the main loop, which mostly correlates with the job having
>> been completed (in which case the assertion is skipped), but not 100 %
>> (there’s a small window between setting deferred_to_main_loop and the
>> job changing to a completed state).
>> So I’d prefer to keep the assertion as-is functionally, i.e. to only
>> check job->cancelled.
> Well, you don't. It's still job_is_cancelled() after this patch.

No: I didn’t. O:)

For v3, I had absolutely planned to use job_cancel_requested(), and I 
wanted to put the above explanation into the commit message.

> So the scenario you're concerned about is a job that has just finished
> successfully (job->ret = 0) and then gets a cancel request?

Yes.

> With force=false, I'm pretty sure the code is wrong anyway because
> calling job_completed_txn_abort() is not the right response.

Absolutely possible, I just didn’t want to deal with this, too… :/

> It should
> return an error because you're trying to complete twice, possibly with
> conflicting completion modes. Second best is just ignoring the cancel
> request because we obviously already fulfilled the request of completing
> the job (the completion mode might be different, though).
>
> With force=true, arguably still letting the job fail is correct.
> However, letting it fail involves more than just letting the transaction
> fail. We would have to call job_update_rc() as well so that instead of
> reporting success for the job, ECANCELED is returned and the job
> transitions to JOB_STATUS_ABORTING (after which job_is_completed()
> returns true).
>
> So, just bugs to be fixed.

Yep.  The question is, when/where; in this series or later?

> After this, I think we could even assert(job->ret != 0 ||
> job->status == JOB_STATUS_PENDING) in job_completed_txn_abort().
> ret == 0 can only happen when called from job_do_finalize(), when the
> job is only failing because other jobs in the same transaction have
> failed in their .prepare callback.

Sounds right.

>> Second, job_complete() refuses to let a job complete that has been
>> cancelled.  This function is basically only invoked by the user
>> (through qmp_block_job_complete()/qmp_job_complete(), or
>> job_complete_sync(), which comes from qemu-img), so I believe that it
>> should correspond to the external interface we have right now; i.e.,
>> if the user has invoked (block-)job-cancel at one point,
>> job_complete() should generally return an error.
> True. But it should also return an error if the user has invoked
> job-complete at some point. The distinction between complete and
> non-force cancel doesn't make sense there.

Yes, that’s true, it’s just that having double complete be a failure 
would be a change in behavior, and it would require another patch. Which 
is why I didn’t do it.

> And cancelling with force=false should fail, too, when either job-cancel
> or job-complete was called for the job before.

Yes.  At least for force=true, force=false is just a no-op, I believe.  
(.force_cancel is never reset to false.)

I’d like to defer this for the design series that Vladimir is planning 
to write, though.

>>> job_complete() is the only function outside of mirror that seems to use
>>> it. But even there, it feels wrong to make a difference. Either we
>>> accept redundant completion requests, or we don't. It doesn't really
>>> matter how the job reconfigures the graph on completion. (Also, I feel
>>> this should really have been part of the state machine, but I'm not sure
>>> if we want to touch it now...)
>> Well, yes, I don’t think it makes a difference because I don’t think
>> anyone will first tell the job via block-job-cancel to complete
>> without pivoting, and then change their mind and call
>> block-job-complete after all.  (Not least because that’s an error
>> pre-series.)
> Right, I'm just arguing that we shouldn't allow the opposite order
> either. Currently I think we do, and it's buggy, as explained above.

I agree.

>> Also, I’m not even sure whether completing after a soft cancel request
>> works.  I don’t think any of our code accounts for such a case, so I’d
>> rather avoid allowing it if there’s no need to allow it anyway.
> Yes, definitely avoid it. We should allow only one completion request
> (be it with job-complete or block-job-cancel) and return an error for
> all future completion requests for the same job.
>
> We could in theory keep allowing redundant completion requests when the
> completion mode doesn't conflict, but I don't see the point of that.

OK.  I personally think this need not be part of this series, though, so 
I’d like to defer it for now. O:)

(And since Vladimir is planning on turning soft cancel into a completion 
mode, I think we’re unlikely to forget about the problem.)

> Unless libvirt can actually issue multiple completion requests (again,
> this includes both (block-)job-complete and non-force block-job-cancel
> for mirror) for the same block job - Peter, I hope it doesn't?
>
> Kevin
>



^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2021-08-04 14:16 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-26 14:46 [PATCH for-6.1? v2 0/7] mirror: Handle errors after READY cancel Max Reitz
2021-07-26 14:46 ` [PATCH for-6.1? v2 1/7] mirror: Keep s->synced on error Max Reitz
2021-07-26 14:46 ` [PATCH for-6.1? v2 2/7] mirror: Drop s->synced Max Reitz
2021-07-26 15:26   ` Eric Blake
2021-07-27 11:41   ` Vladimir Sementsov-Ogievskiy
2021-07-26 14:46 ` [PATCH for-6.1? v2 3/7] job: @force parameter for job_cancel_sync{, _all}() Max Reitz
2021-08-03 14:00   ` [PATCH for-6.1? v2 3/7] job: @force parameter for job_cancel_sync{,_all}() Kevin Wolf
2021-07-26 14:46 ` [PATCH for-6.1? v2 4/7] jobs: Give Job.force_cancel more meaning Max Reitz
2021-07-26 21:38   ` Eric Blake
2021-07-27 12:40   ` Vladimir Sementsov-Ogievskiy
2021-07-26 14:46 ` [PATCH for-6.1? v2 5/7] job: Add job_cancel_requested() Max Reitz
2021-07-27 13:04   ` Vladimir Sementsov-Ogievskiy
2021-07-27 15:39     ` Max Reitz
2021-07-27 15:47       ` Vladimir Sementsov-Ogievskiy
2021-08-02 10:23         ` Max Reitz
2021-08-03 12:35           ` Vladimir Sementsov-Ogievskiy
2021-08-03 14:25   ` Kevin Wolf
2021-08-04  8:07     ` Max Reitz
2021-08-04 10:34       ` Kevin Wolf
2021-08-04 11:00         ` Peter Krempa
2021-08-04 14:15         ` Max Reitz
2021-07-26 14:46 ` [PATCH for-6.1? v2 6/7] mirror: Check job_is_cancelled() earlier Max Reitz
2021-07-27 13:13   ` Vladimir Sementsov-Ogievskiy
2021-07-27 15:40     ` Max Reitz
2021-08-03 14:34   ` Kevin Wolf
2021-08-04  8:25     ` Max Reitz
2021-08-04  9:48       ` Kevin Wolf
2021-08-04 10:12         ` Max Reitz
2021-07-26 14:46 ` [PATCH for-6.1? v2 7/7] iotests: Add mirror-ready-cancel-error test Max Reitz
2021-07-27 13:24   ` Vladimir Sementsov-Ogievskiy
2021-08-03 14:36 ` [PATCH for-6.1? v2 0/7] mirror: Handle errors after READY cancel Kevin Wolf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).