Re: [PATCH V10 07/11] blk-mq: stop to handle IO and drain IO before hctx becomes inactive

From: Bart Van Assche <bvanassche@acm.org>
To: Ming Lei <ming.lei@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org, John Garry <john.garry@huawei.com>,
	Hannes Reinecke <hare@suse.com>, Christoph Hellwig <hch@lst.de>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH V10 07/11] blk-mq: stop to handle IO and drain IO before hctx becomes inactive
Date: Sun, 10 May 2020 20:20:24 -0700	[thread overview]
Message-ID: <8ef5352b-a1bb-a3c1-3ad2-696df6e86f1f@acm.org> (raw)
In-Reply-To: <20200511014538.GB1418834@T590>

On 2020-05-10 18:45, Ming Lei wrote:
> On Sat, May 09, 2020 at 07:18:46AM -0700, Bart Van Assche wrote:
>> On 2020-05-08 21:10, Ming Lei wrote:
>>> queue freezing can only be applied on the request queue level, and not
>>> hctx level. When requests can't be completed, wait freezing just hangs
>>> for-ever.
>>
>> That's indeed what I meant: freeze the entire queue instead of
>> introducing a new mechanism that freezes only one hardware queue at a time.
> 
> No, the issue is exactly that one single hctx becomes inactive, and
> other hctx are still active and workable.
> 
> If one entire queue is frozen because of some of CPUs are offline, how
> can userspace submit IO to this disk? You suggestion justs makes the
> disk not usable, that won't be accepted.

What I meant is to freeze a request queue temporarily (until hot
unplugging of a CPU has finished). I would never suggest to freeze a
request queue forever and I think that you already knew that.

>> Please clarify what "when requests can't be completed" means. Are you
>> referring to requests that take longer than expected due to e.g. a
>> controller lockup or to requests that take a long time intentionally?
> 
> If all CPUs in one hctx->cpumask are offline, the managed irq of this hw
> queue will be shutdown by genirq code, so any in-flight IO won't be
> completed or timedout after the managed irq is shutdown because of cpu
> offline.
> 
> Some drivers may implement timeout handler, so these in-flight requests
> will be timed out, but still not friendly behaviour given the default
> timeout is too long.
> 
> Some drivers don't implement timeout handler at all, so these IO won't
> be completed.

I think that the block layer needs to be notified after the decision has
been taken to offline a CPU and before the interrupts associated with
that CPU are disabled. That would allow the block layer to freeze a
request queue without triggering any timeouts (ignoring block driver and
hardware bugs). I'm not familiar with CPU hotplugging so I don't know
whether or not such a mechanism already exists.

>> The former case is handled by the block layer timeout handler. I propose
>> to handle the latter case by introducing a new callback function pointer
>> in struct blk_mq_ops that aborts all outstanding requests.
> 
> As I mentioned, timeout isn't a friendly behavior. Or not every driver
> implements timeout handler or well enough.

What I propose is to fix those block drivers instead of complicating the
block layer core further and instead of introducing potential deadlocks
in the block layer core.

>> Request queue
>> freezing is such an important block layer mechanism that I think we
>> should require that all block drivers support freezing a request queue
>> in a short time.
> 
> Firstly, we just need to drain in-flight requests and re-submit queued
> requests from one single hctx, and queue wide freezing causes whole
> userspace IOs blocked unnecessarily.

Freezing a request queue for a short time is acceptable. As you know we
already do that when the queue depth is modified, when the write-back
throttling latency is modified and also when the I/O scheduler is changed.

> Secondly, some requests may not be completed at all, so freezing can't
> work because freeze_wait may hang forever.

If a request neither can be aborted nor completes then that's a severe
bug in the block driver that submitted the request to the block device.

Bart.