All of lore.kernel.org
 help / color / mirror / Atom feed
From: Max Reitz <mreitz@redhat.com>
To: qemu-block@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>,
	Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
	John Snow <jsnow@redhat.com>,
	qemu-devel@nongnu.org, Max Reitz <mreitz@redhat.com>
Subject: [PATCH for-6.1? 0/6] mirror: Handle errors after READY cancel
Date: Thu, 22 Jul 2021 14:26:21 +0200	[thread overview]
Message-ID: <20210722122627.29605-1-mreitz@redhat.com> (raw)

Hi,

This is a rather complex series with changes that aren’t exactly local
to the mirror job, so maybe it’s too complex for 6.1.

However, it is a bug fix, and not an insignificant one, though probably
not a regression of any kind.

Bug report:
https://gitlab.com/qemu-project/qemu/-/issues/462

(I didn’t put any “Fixes:” or “Resolves:” into the commit messages,
because there is no single patch here that fixes the bug.)

The root of the problem is that if you cancel a mirror job during its
READY phase, any kind of I/O error (with the error action 'stop') is
likely to not be handled gracefully, which means that perhaps the job
will just loop forever without pausing, doing nothing but emitting
errors.  There is no way to stop the job, or cancel it with force=true,
and so you also cannot quit qemu normally, because, well, cancelling the
job doesn’t do anything.  So you have to kill qemu to stop the mess.

If you’re lucky, the error is transient.  Then qemu will just kill
itself with a failed assertion, because it’ll try a READY -> READY
transition, which isn’t allowed.

There are a couple of problems contributing to it all:

(1) The READY -> READY transition comes from the fact that we will enter
    the READY state whenever source and target are synced, and whenever
    s->synced is false.  I/O errors reset s->synced.  I believe they
    shouldn’t.
    (Patch 1)

(2) Quitting qemu doesn’t force-cancel jobs.  I don’t understand why.
    If for all jobs but mirror we want them to be cancelled and not
    properly completed, why do we want mirror to get a consistent
    result?  (Which is what cancel with force=false gives you.)
    I believe we actually don’t care, and so on many occasions where we
    invoke job_cancel_sync() and job_cancel_sync_all(), we want to
    force-cancel the job(s) in question.
    (Patch 2)

(3) Cancelling mirror post-READY with force=false is actually not really
    cancelling the job.  It’s a different completion mode.  The job
    should run like any normal job, it shouldn’t be special-cased.
    However, we have a couple of places that special-case cancelled job
    because they believe that such jobs are on their way to definite
    termination.  For example, we don’t allow pausing cancelled jobs.
    We definitely do want to allow pausing a mirror post-READY job that
    is being non-force-cancelled.  The job may still take an arbitrary
    amount of time, so it absolutely should be pausable.
    (Patches 3, 4)

(4) Mirror only checks whether it’s been force-cancelled at the bottom
    of its main loop, after several `continue`s.  Therefore, if flushing
    fails (and it then `continue`s), that check will be skipped.  If
    flushing fails continuously, the job cannot be force-cancelled.
    (Patch 5)


Max Reitz (6):
  mirror: Keep s->synced on error
  job: @force parameter for job_cancel_sync{,_all}()
  jobs: Give Job.force_cancel more meaning
  job: Add job_cancel_requested()
  mirror: Check job_is_cancelled() earlier
  iotests: Add mirror-ready-cancel-error test

 include/qemu/job.h                            |  29 +++-
 block/backup.c                                |   3 +-
 block/mirror.c                                |  35 +++--
 block/replication.c                           |   4 +-
 blockdev.c                                    |   4 +-
 job.c                                         |  46 ++++--
 qemu-nbd.c                                    |   2 +-
 softmmu/runstate.c                            |   2 +-
 storage-daemon/qemu-storage-daemon.c          |   2 +-
 tests/unit/test-block-iothread.c              |   2 +-
 tests/unit/test-blockjob.c                    |   2 +-
 tests/qemu-iotests/109.out                    |  60 +++-----
 .../tests/mirror-ready-cancel-error           | 143 ++++++++++++++++++
 .../tests/mirror-ready-cancel-error.out       |   5 +
 tests/qemu-iotests/tests/qsd-jobs.out         |   2 +-
 15 files changed, 262 insertions(+), 79 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/mirror-ready-cancel-error
 create mode 100644 tests/qemu-iotests/tests/mirror-ready-cancel-error.out

-- 
2.31.1



             reply	other threads:[~2021-07-22 12:29 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-22 12:26 Max Reitz [this message]
2021-07-22 12:26 ` [PATCH for-6.1? 1/6] mirror: Keep s->synced on error Max Reitz
2021-07-22 16:25   ` Vladimir Sementsov-Ogievskiy
2021-07-26  7:11     ` Max Reitz
2021-07-26 10:24   ` Vladimir Sementsov-Ogievskiy
2021-07-22 12:26 ` [PATCH for-6.1? 2/6] job: @force parameter for job_cancel_sync{, _all}() Max Reitz
2021-07-22 17:00   ` [PATCH for-6.1? 2/6] job: @force parameter for job_cancel_sync{,_all}() Vladimir Sementsov-Ogievskiy
2021-07-22 12:26 ` [PATCH for-6.1? 3/6] jobs: Give Job.force_cancel more meaning Max Reitz
2021-07-22 18:16   ` Vladimir Sementsov-Ogievskiy
2021-07-22 12:26 ` [PATCH for-6.1? 4/6] job: Add job_cancel_requested() Max Reitz
2021-07-22 17:58   ` Vladimir Sementsov-Ogievskiy
2021-07-26  7:09     ` Max Reitz
2021-07-27 14:47       ` Vladimir Sementsov-Ogievskiy
2021-07-27 15:30         ` Max Reitz
2021-07-22 12:26 ` [PATCH for-6.1? 5/6] mirror: Check job_is_cancelled() earlier Max Reitz
2021-07-22 12:26 ` [PATCH for-6.1? 6/6] iotests: Add mirror-ready-cancel-error test Max Reitz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210722122627.29605-1-mreitz@redhat.com \
    --to=mreitz@redhat.com \
    --cc=jsnow@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=vsementsov@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.