[re-adding the original CCs, why not]

On 13.06.19 16:30, Vladimir Sementsov-Ogievskiy wrote:
> 13.06.2019 17:21, Max Reitz wrote:
>> On 13.06.19 16:19, Vladimir Sementsov-Ogievskiy wrote:
>>> 13.06.2019 1:08, Max Reitz wrote:
>>>> If the main loop cancels all block jobs while the block layer is not
>>>> drained, this cancelling may not happen instantaneously.  We can start a
>>>> drained section before vm_shutdown(), which entails another
>>>> bdrv_drain_all(); this nested bdrv_drain_all() will thus be a no-op,
>>>> basically.
>>>>
>>>> We do not have to end the drained section, because we actually do not
>>>> want any requests to happen from this point on.
>>>>
>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>> ---
>>>> I don't know whether it actually makes sense to never end this drained
>>>> section.  It makes sense to me.  Please correct me if I'm wrong.
>>>> ---
>>>>    vl.c | 11 +++++++++++
>>>>    1 file changed, 11 insertions(+)
>>>>
>>>> diff --git a/vl.c b/vl.c
>>>> index cd1fbc4cdc..3f8b3f74f5 100644
>>>> --- a/vl.c
>>>> +++ b/vl.c
>>>> @@ -4538,6 +4538,17 @@ int main(int argc, char **argv, char **envp)
>>>>         */
>>>>        migration_shutdown();
>>>>    
>>>> +    /*
>>>> +     * We must cancel all block jobs while the block layer is drained,
>>>> +     * or cancelling will be affected by throttling and thus may block
>>>> +     * for an extended period of time.
>>>> +     * vm_shutdown() will bdrv_drain_all(), so we may as well include
>>>> +     * it in the drained section.
>>>> +     * We do not need to end this section, because we do not want any
>>>> +     * requests happening from here on anyway.
>>>> +     */
>>>> +    bdrv_drain_all_begin();
>>>> +
>>>>        /* No more vcpu or device emulation activity beyond this point */
>>>>        vm_shutdown();
>>>>    
>>>>
>>>
>>> So, actually, the problem is that we may wait for job requests twice:
>>> on drain and then on cancel.
>>
>> We don’t wait on drain.  When the throttle node is drained, it will
>> ignore throttling (as noted in the cover letter).
>>
>> We do wait when cancelling a job while the throttle node isn’t drained,
>> though.  That’s the problem.
> 
> Ah, understand now.
> 
> Is it safe to drain_begin before stopping cpus? We may finish up then with some queued
> somewhere IO requests..

Hm...  Aren’t guest devices prohibited from issuing requests to the
block layer while their respective block device is drained?

Otherwise, I suppose I’ll have to move the bdrv_drain_all_begin() below
the vm_shutdown().  That wouldn’t be too big of a problem.

Max