qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Max Reitz <mreitz@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
	qemu-block@nongnu.org, libvir-list@redhat.com,
	qemu-devel@nongnu.org, pkrempa@redhat.com,
	John Snow <jsnow@redhat.com>
Subject: Re: [PATCH for-6.1? v2 5/7] job: Add job_cancel_requested()
Date: Wed, 4 Aug 2021 16:15:03 +0200	[thread overview]
Message-ID: <ea243456-53c7-9642-529a-20628de4661a@redhat.com> (raw)
In-Reply-To: <YQptN0CLpV5MN5kb@redhat.com>

On 04.08.21 12:34, Kevin Wolf wrote:
> [ Peter, the question for you is at the end. ]
>
> Am 04.08.2021 um 10:07 hat Max Reitz geschrieben:
>> On 03.08.21 16:25, Kevin Wolf wrote:
>>> Am 26.07.2021 um 16:46 hat Max Reitz geschrieben:
>>>> Most callers of job_is_cancelled() actually want to know whether the job
>>>> is on its way to immediate termination.  For example, we refuse to pause
>>>> jobs that are cancelled; but this only makes sense for jobs that are
>>>> really actually cancelled.
>>>>
>>>> A mirror job that is cancelled during READY with force=false should
>>>> absolutely be allowed to pause.  This "cancellation" (which is actually
>>>> a kind of completion) may take an indefinite amount of time, and so
>>>> should behave like any job during normal operation.  For example, with
>>>> on-target-error=stop, the job should stop on write errors.  (In
>>>> contrast, force-cancelled jobs should not get write errors, as they
>>>> should just terminate and not do further I/O.)
>>>>
>>>> Therefore, redefine job_is_cancelled() to only return true for jobs that
>>>> are force-cancelled (which as of HEAD^ means any job that interprets the
>>>> cancellation request as a request for immediate termination), and add
>>>> job_cancel_request() as the general variant, which returns true for any
>>>> jobs which have been requested to be cancelled, whether it be
>>>> immediately or after an arbitrarily long completion phase.
>>>>
>>>> Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462
>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>> ---
>>>>    include/qemu/job.h |  8 +++++++-
>>>>    block/mirror.c     | 10 ++++------
>>>>    job.c              |  7 ++++++-
>>>>    3 files changed, 17 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/include/qemu/job.h b/include/qemu/job.h
>>>> index 8aa90f7395..032edf3c5f 100644
>>>> --- a/include/qemu/job.h
>>>> +++ b/include/qemu/job.h
>>>> @@ -436,9 +436,15 @@ const char *job_type_str(const Job *job);
>>>>    /** Returns true if the job should not be visible to the management layer. */
>>>>    bool job_is_internal(Job *job);
>>>> -/** Returns whether the job is scheduled for cancellation. */
>>>> +/** Returns whether the job is being cancelled. */
>>>>    bool job_is_cancelled(Job *job);
>>>> +/**
>>>> + * Returns whether the job is scheduled for cancellation (at an
>>>> + * indefinite point).
>>>> + */
>>>> +bool job_cancel_requested(Job *job);
>>> I don't think non-force blockdev-cancel for mirror should actually be
>>> considered cancellation, so what is the question that this function
>>> answers?
>>>
>>> "Is this a cancelled job, or a mirror block job that is supposed to
>>> complete soon, but only if it doesn't switch over the users to the
>>> target on completion"?
>> Well, technically yes, but it was more intended as “Has the user ever
>> invoked (block-)job-cancel on this job?”.
> I understand this, but is this much more useful to know than "Has the
> user ever called HMP 'change'?", if you know what I mean?

Hm.  Not really.  It’s still a crutch that shouldn’t be there ideally.

But I like this crutch for this series so I can get this batch done, and 
then worry about all the other bugs that keep popping up (and where 
job_cancel_requested() is a nice sign that something’s off).

>>> Is this ever a reasonable question to ask, except maybe inside the
>>> mirror implementation itself?
>> I asked myself the same for v3, but found two places in job.c where I
>> would like to keep it:
>>
>> First, there’s an assertion in job_completed_txn_abort().  All jobs
>> other than @job have been force-cancelled, and so job_is_cancelled()
>> would be true for them.  As for @job itself, the function is mostly
>> called when the job’s return value is not 0, but a soft-cancelled
>> mirror does have a return value of 0 and so would not end up in that
>> function.
>> But job_cancel() invokes job_completed_txn_abort() if the job has been
>> deferred to the main loop, which mostly correlates with the job having
>> been completed (in which case the assertion is skipped), but not 100 %
>> (there’s a small window between setting deferred_to_main_loop and the
>> job changing to a completed state).
>> So I’d prefer to keep the assertion as-is functionally, i.e. to only
>> check job->cancelled.
> Well, you don't. It's still job_is_cancelled() after this patch.

No: I didn’t. O:)

For v3, I had absolutely planned to use job_cancel_requested(), and I 
wanted to put the above explanation into the commit message.

> So the scenario you're concerned about is a job that has just finished
> successfully (job->ret = 0) and then gets a cancel request?

Yes.

> With force=false, I'm pretty sure the code is wrong anyway because
> calling job_completed_txn_abort() is not the right response.

Absolutely possible, I just didn’t want to deal with this, too… :/

> It should
> return an error because you're trying to complete twice, possibly with
> conflicting completion modes. Second best is just ignoring the cancel
> request because we obviously already fulfilled the request of completing
> the job (the completion mode might be different, though).
>
> With force=true, arguably still letting the job fail is correct.
> However, letting it fail involves more than just letting the transaction
> fail. We would have to call job_update_rc() as well so that instead of
> reporting success for the job, ECANCELED is returned and the job
> transitions to JOB_STATUS_ABORTING (after which job_is_completed()
> returns true).
>
> So, just bugs to be fixed.

Yep.  The question is, when/where; in this series or later?

> After this, I think we could even assert(job->ret != 0 ||
> job->status == JOB_STATUS_PENDING) in job_completed_txn_abort().
> ret == 0 can only happen when called from job_do_finalize(), when the
> job is only failing because other jobs in the same transaction have
> failed in their .prepare callback.

Sounds right.

>> Second, job_complete() refuses to let a job complete that has been
>> cancelled.  This function is basically only invoked by the user
>> (through qmp_block_job_complete()/qmp_job_complete(), or
>> job_complete_sync(), which comes from qemu-img), so I believe that it
>> should correspond to the external interface we have right now; i.e.,
>> if the user has invoked (block-)job-cancel at one point,
>> job_complete() should generally return an error.
> True. But it should also return an error if the user has invoked
> job-complete at some point. The distinction between complete and
> non-force cancel doesn't make sense there.

Yes, that’s true, it’s just that having double complete be a failure 
would be a change in behavior, and it would require another patch. Which 
is why I didn’t do it.

> And cancelling with force=false should fail, too, when either job-cancel
> or job-complete was called for the job before.

Yes.  At least for force=true, force=false is just a no-op, I believe.  
(.force_cancel is never reset to false.)

I’d like to defer this for the design series that Vladimir is planning 
to write, though.

>>> job_complete() is the only function outside of mirror that seems to use
>>> it. But even there, it feels wrong to make a difference. Either we
>>> accept redundant completion requests, or we don't. It doesn't really
>>> matter how the job reconfigures the graph on completion. (Also, I feel
>>> this should really have been part of the state machine, but I'm not sure
>>> if we want to touch it now...)
>> Well, yes, I don’t think it makes a difference because I don’t think
>> anyone will first tell the job via block-job-cancel to complete
>> without pivoting, and then change their mind and call
>> block-job-complete after all.  (Not least because that’s an error
>> pre-series.)
> Right, I'm just arguing that we shouldn't allow the opposite order
> either. Currently I think we do, and it's buggy, as explained above.

I agree.

>> Also, I’m not even sure whether completing after a soft cancel request
>> works.  I don’t think any of our code accounts for such a case, so I’d
>> rather avoid allowing it if there’s no need to allow it anyway.
> Yes, definitely avoid it. We should allow only one completion request
> (be it with job-complete or block-job-cancel) and return an error for
> all future completion requests for the same job.
>
> We could in theory keep allowing redundant completion requests when the
> completion mode doesn't conflict, but I don't see the point of that.

OK.  I personally think this need not be part of this series, though, so 
I’d like to defer it for now. O:)

(And since Vladimir is planning on turning soft cancel into a completion 
mode, I think we’re unlikely to forget about the problem.)

> Unless libvirt can actually issue multiple completion requests (again,
> this includes both (block-)job-complete and non-force block-job-cancel
> for mirror) for the same block job - Peter, I hope it doesn't?
>
> Kevin
>



  parent reply	other threads:[~2021-08-04 14:16 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-26 14:46 [PATCH for-6.1? v2 0/7] mirror: Handle errors after READY cancel Max Reitz
2021-07-26 14:46 ` [PATCH for-6.1? v2 1/7] mirror: Keep s->synced on error Max Reitz
2021-07-26 14:46 ` [PATCH for-6.1? v2 2/7] mirror: Drop s->synced Max Reitz
2021-07-26 15:26   ` Eric Blake
2021-07-27 11:41   ` Vladimir Sementsov-Ogievskiy
2021-07-26 14:46 ` [PATCH for-6.1? v2 3/7] job: @force parameter for job_cancel_sync{, _all}() Max Reitz
2021-08-03 14:00   ` [PATCH for-6.1? v2 3/7] job: @force parameter for job_cancel_sync{,_all}() Kevin Wolf
2021-07-26 14:46 ` [PATCH for-6.1? v2 4/7] jobs: Give Job.force_cancel more meaning Max Reitz
2021-07-26 21:38   ` Eric Blake
2021-07-27 12:40   ` Vladimir Sementsov-Ogievskiy
2021-07-26 14:46 ` [PATCH for-6.1? v2 5/7] job: Add job_cancel_requested() Max Reitz
2021-07-27 13:04   ` Vladimir Sementsov-Ogievskiy
2021-07-27 15:39     ` Max Reitz
2021-07-27 15:47       ` Vladimir Sementsov-Ogievskiy
2021-08-02 10:23         ` Max Reitz
2021-08-03 12:35           ` Vladimir Sementsov-Ogievskiy
2021-08-03 14:25   ` Kevin Wolf
2021-08-04  8:07     ` Max Reitz
2021-08-04 10:34       ` Kevin Wolf
2021-08-04 11:00         ` Peter Krempa
2021-08-04 14:15         ` Max Reitz [this message]
2021-07-26 14:46 ` [PATCH for-6.1? v2 6/7] mirror: Check job_is_cancelled() earlier Max Reitz
2021-07-27 13:13   ` Vladimir Sementsov-Ogievskiy
2021-07-27 15:40     ` Max Reitz
2021-08-03 14:34   ` Kevin Wolf
2021-08-04  8:25     ` Max Reitz
2021-08-04  9:48       ` Kevin Wolf
2021-08-04 10:12         ` Max Reitz
2021-07-26 14:46 ` [PATCH for-6.1? v2 7/7] iotests: Add mirror-ready-cancel-error test Max Reitz
2021-07-27 13:24   ` Vladimir Sementsov-Ogievskiy
2021-08-03 14:36 ` [PATCH for-6.1? v2 0/7] mirror: Handle errors after READY cancel Kevin Wolf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ea243456-53c7-9642-529a-20628de4661a@redhat.com \
    --to=mreitz@redhat.com \
    --cc=jsnow@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=libvir-list@redhat.com \
    --cc=pkrempa@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=vsementsov@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).