From: Ming Lei <ming.lei@redhat.com>
To: Bart Van Assche <bvanassche@acm.org>
Cc: Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org, John Garry <john.garry@huawei.com>,
	Hannes Reinecke <hare@suse.com>, Christoph Hellwig <hch@lst.de>,
	Thomas Gleixner <tglx@linutronix.de>,
	Hannes Reinecke <hare@suse.de>
Subject: Re: [PATCH V10 11/11] block: deactivate hctx when the hctx is actually inactive
Date: Tue, 12 May 2020 09:43:58 +0800	[thread overview]
Message-ID: <20200512014358.GB1531898@T590> (raw)
In-Reply-To: <c4d78e75-91e1-1521-1ba0-d30bf3716f83@acm.org>

On Mon, May 11, 2020 at 01:52:14PM -0700, Bart Van Assche wrote:
> On 2020-05-10 21:08, Ming Lei wrote:
> > OK, I had just forgotten the whole story, but the issue can be fixed
> > quite easily by adding a new request allocation flag in the slow path;
> > see the following patch:
> > 
> > diff --git a/block/blk-core.c b/block/blk-core.c
> > index ec50d7e6be21..d743be1b45a2 100644
> > --- a/block/blk-core.c
> > +++ b/block/blk-core.c
> > @@ -418,6 +418,11 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
> >  		if (success)
> >  			return 0;
> >  
> > +		if (flags & BLK_MQ_REQ_FORCE) {
> > +			percpu_ref_get(ref);
> > +			return 0;
> > +		}
> > +
> >  		if (flags & BLK_MQ_REQ_NOWAIT)
> >  			return -EBUSY;
> >  
> > diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> > index c2ea0a6e5b56..2816886d0bea 100644
> > --- a/include/linux/blk-mq.h
> > +++ b/include/linux/blk-mq.h
> > @@ -448,6 +448,13 @@ enum {
> >  	BLK_MQ_REQ_INTERNAL	= (__force blk_mq_req_flags_t)(1 << 2),
> >  	/* set RQF_PREEMPT */
> >  	BLK_MQ_REQ_PREEMPT	= (__force blk_mq_req_flags_t)(1 << 3),
> > +
> > +	/*
> > +	 * force to allocate request and caller has to make sure queue
> > +	 * won't be forzen completely during allocation, and this flag
> > +	 * is only applied after queue freeze is started
> > +	 */
> > +	BLK_MQ_REQ_FORCE	= (__force blk_mq_req_flags_t)(1 << 4),
> >  };
> >  
> >  struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,
> 
> I'm not sure that introducing such a flag is a good idea. After
> blk_mq_freeze_queue() has made it clear that a request queue must be

BLK_MQ_REQ_FORCE is only allowed to be used when the caller guarantees that
the queue's freeze hasn't completed, which is the case here because there is
a pending request that can't be completed. So blk_mq_freeze_queue() won't
complete while a request allocated with BLK_MQ_REQ_FORCE is outstanding, and
once blk_mq_freeze_queue() has completed, no such allocation can happen any
more.
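
To make the intended calling convention concrete, here is a minimal sketch
(hypothetical helper, not part of the patch) of a caller that satisfies the
precondition because a pending request still pins q_usage_counter:

	/*
	 * Hypothetical resubmit path: 'pending' can't complete until a new
	 * request is allocated, so it still holds a q_usage_counter
	 * reference and any freeze in progress can't have completed yet,
	 * which is exactly the precondition BLK_MQ_REQ_FORCE requires.
	 */
	static struct request *resubmit_pending_rq(struct request_queue *q,
						   struct request *pending)
	{
		return blk_mq_alloc_request(q, req_op(pending),
					    BLK_MQ_REQ_FORCE);
	}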

> frozen and before the request queue is really frozen, an RCU grace
> period must expire. Otherwise it cannot be guaranteed that the intention
> to freeze a request queue (by calling percpu_ref_kill()) has been
> observed by all potential blk_queue_enter() callers (blk_queue_enter()
> calls percpu_ref_tryget_live()). Not introducing any new race conditions
> would require either introducing an smp_mb() call in blk_queue_enter()

percpu_ref_tryget_live() returns false when the percpu refcount is dead, or,
in atomic mode, when the atomic count has already dropped to zero.

BLK_MQ_REQ_FORCE is only allowed to be used when the caller guarantees that
the queue's freeze hasn't completed. So if the refcount is in percpu mode,
true is always returned; otherwise false is always returned. So there isn't
any race, and no smp_mb() is required either, because this patch doesn't
introduce any new global memory reads or writes.
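
For reference, percpu_ref_tryget_live() looks roughly like this (simplified
from include/linux/percpu-refcount.h; details may vary between kernel
versions):

	static inline bool percpu_ref_tryget_live(struct percpu_ref *ref)
	{
		unsigned long __percpu *percpu_count;
		bool ret = false;

		rcu_read_lock();
		if (__ref_is_percpu(ref, &percpu_count)) {
			/* percpu mode: always succeeds */
			this_cpu_inc(*percpu_count);
			ret = true;
		} else if (!(ref->percpu_count_ptr & __PERCPU_REF_DEAD)) {
			/* atomic but not killed: succeeds iff count > 0 */
			ret = atomic_long_inc_not_zero(&ref->count);
		}
		rcu_read_unlock();

		return ret;
	}

Once percpu_ref_kill() has marked q_usage_counter dead, neither branch is
taken and false is returned unconditionally, which is the "otherwise false"
case above.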

> or letting another RCU grace period expire after the last allocation of a
> request with BLK_MQ_REQ_FORCE and before the request queue is really frozen.

Not at all. As I explained, allocation with BLK_MQ_REQ_FORCE only happens
while the actual percpu refcount is > 1, because we know for sure that there
is a pending request which can't be completed, and that pending request
relies on the new allocation with BLK_MQ_REQ_FORCE. So after the new request
is allocated, blk_mq_freeze_queue_wait() will keep waiting until the actual
percpu refcount drops to 0.
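
The ordering being relied on, as a sketch (not code from the patch):

	/*
	 * CPU A (freezer)                     CPU B (resubmit path)
	 * ---------------                     ---------------------
	 * blk_freeze_queue_start()
	 *   percpu_ref_kill(q_usage_counter)
	 *                                     pending rq still holds a ref,
	 *                                     so the counter stays > 0
	 * blk_mq_freeze_queue_wait()
	 *   blocks while counter != 0
	 *                                     blk_queue_enter(BLK_MQ_REQ_FORCE)
	 *                                       percpu_ref_get(): counter++
	 *                                     new rq lets the pending rq
	 *                                     complete; both refs are dropped
	 * counter reaches 0, wait returns
	 */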


Thanks,
Ming


Thread overview: 41+ messages
2020-05-05  2:09 [PATCH V10 00/11] blk-mq: improvement CPU hotplug Ming Lei
2020-05-05  2:09 ` [PATCH V10 01/11] block: clone nr_integrity_segments and write_hint in blk_rq_prep_clone Ming Lei
2020-05-05  2:09 ` [PATCH V10 02/11] block: add helper for copying request Ming Lei
2020-05-05  2:09 ` [PATCH V10 03/11] blk-mq: mark blk_mq_get_driver_tag as static Ming Lei
2020-05-05  2:09 ` [PATCH V10 04/11] blk-mq: assign rq->tag in blk_mq_get_driver_tag Ming Lei
2020-05-05  2:09 ` [PATCH V10 05/11] blk-mq: support rq filter callback when iterating rqs Ming Lei
2020-05-08 23:32   ` Bart Van Assche
2020-05-09  0:18     ` Bart Van Assche
2020-05-09  2:05       ` Ming Lei
2020-05-09  3:08         ` Bart Van Assche
2020-05-09  3:52           ` Ming Lei
2020-05-05  2:09 ` [PATCH V10 06/11] blk-mq: prepare for draining IO when hctx's all CPUs are offline Ming Lei
2020-05-05  6:14   ` Hannes Reinecke
2020-05-08 23:26   ` Bart Van Assche
2020-05-09  2:09     ` Ming Lei
2020-05-09  3:11       ` Bart Van Assche
2020-05-09  3:56         ` Ming Lei
2020-05-05  2:09 ` [PATCH V10 07/11] blk-mq: stop to handle IO and drain IO before hctx becomes inactive Ming Lei
2020-05-08 23:39   ` Bart Van Assche
2020-05-09  2:20     ` Ming Lei
2020-05-09  3:24       ` Bart Van Assche
2020-05-09  4:10         ` Ming Lei
2020-05-09 14:18           ` Bart Van Assche
2020-05-11  1:45             ` Ming Lei
2020-05-11  3:20               ` Bart Van Assche
2020-05-11  3:48                 ` Ming Lei
2020-05-11 20:56                   ` Bart Van Assche
2020-05-12  1:25                     ` Ming Lei
2020-05-05  2:09 ` [PATCH V10 08/11] block: add blk_end_flush_machinery Ming Lei
2020-05-05  2:09 ` [PATCH V10 09/11] blk-mq: add blk_mq_hctx_handle_dead_cpu for handling cpu dead Ming Lei
2020-05-05  2:09 ` [PATCH V10 10/11] blk-mq: re-submit IO in case that hctx is inactive Ming Lei
2020-05-05  2:09 ` [PATCH V10 11/11] block: deactivate hctx when the hctx is actually inactive Ming Lei
2020-05-09 14:07   ` Bart Van Assche
2020-05-11  2:11     ` Ming Lei
2020-05-11  3:30       ` Bart Van Assche
2020-05-11  4:08         ` Ming Lei
2020-05-11 20:52           ` Bart Van Assche
2020-05-12  1:43             ` Ming Lei [this message]
2020-05-12  2:08             ` Ming Lei
2020-05-08 21:49 ` [PATCH V10 00/11] blk-mq: improvement CPU hotplug Ming Lei
2020-05-09  3:17   ` Jens Axboe
