[re-adding the original CCs, why not] On 13.06.19 16:30, Vladimir Sementsov-Ogievskiy wrote: > 13.06.2019 17:21, Max Reitz wrote: >> On 13.06.19 16:19, Vladimir Sementsov-Ogievskiy wrote: >>> 13.06.2019 1:08, Max Reitz wrote: >>>> If the main loop cancels all block jobs while the block layer is not >>>> drained, this cancelling may not happen instantaneously. We can start a >>>> drained section before vm_shutdown(), which entails another >>>> bdrv_drain_all(); this nested bdrv_drain_all() will thus be a no-op, >>>> basically. >>>> >>>> We do not have to end the drained section, because we actually do not >>>> want any requests to happen from this point on. >>>> >>>> Signed-off-by: Max Reitz >>>> --- >>>> I don't know whether it actually makes sense to never end this drained >>>> section. It makes sense to me. Please correct me if I'm wrong. >>>> --- >>>> vl.c | 11 +++++++++++ >>>> 1 file changed, 11 insertions(+) >>>> >>>> diff --git a/vl.c b/vl.c >>>> index cd1fbc4cdc..3f8b3f74f5 100644 >>>> --- a/vl.c >>>> +++ b/vl.c >>>> @@ -4538,6 +4538,17 @@ int main(int argc, char **argv, char **envp) >>>> */ >>>> migration_shutdown(); >>>> >>>> + /* >>>> + * We must cancel all block jobs while the block layer is drained, >>>> + * or cancelling will be affected by throttling and thus may block >>>> + * for an extended period of time. >>>> + * vm_shutdown() will bdrv_drain_all(), so we may as well include >>>> + * it in the drained section. >>>> + * We do not need to end this section, because we do not want any >>>> + * requests happening from here on anyway. >>>> + */ >>>> + bdrv_drain_all_begin(); >>>> + >>>> /* No more vcpu or device emulation activity beyond this point */ >>>> vm_shutdown(); >>>> >>>> >>> >>> So, actually, the problem is that we may wait for job requests twice: >>> on drain and then on cancel. >> >> We don’t wait on drain. When the throttle node is drained, it will >> ignore throttling (as noted in the cover letter). >> >> We do wait when cancelling a job while the throttle node isn’t drained, >> though. That’s the problem. > > Ah, understand now. > > Is it safe to drain_begin before stopping cpus? We may finish up then with some queued > somewhere IO requests.. Hm... Aren’t guest devices prohibited from issuing requests to the block layer while their respective block device is drained? Otherwise, I suppose I’ll have to move the bdrv_drain_all_begin() below the vm_shutdown(). That wouldn’t be too big of a problem. Max