From: Hannes Reinecke <hare@suse.de>
To: Ming Lei <ming.lei@redhat.com>, Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org,
	Dongli Zhang <dongli.zhang@oracle.com>,
	James Smart <james.smart@broadcom.com>,
	Bart Van Assche <bart.vanassche@wdc.com>,
	linux-scsi@vger.kernel.org,
	"Martin K . Petersen" <martin.petersen@oracle.com>,
	Christoph Hellwig <hch@lst.de>,
	"James E . J . Bottomley" <jejb@linux.vnet.ibm.com>,
	jianchao wang <jianchao.w.wang@oracle.com>
Subject: Re: [PATCH V5 1/9] blk-mq: grab .q_usage_counter when queuing request from plug code path
Date: Fri, 12 Apr 2019 12:55:11 +0200
Message-ID: <e6050974-08a9-eb60-88ac-a52c9c05dc66@suse.de>
In-Reply-To: <20190412033032.10418-2-ming.lei@redhat.com>

On 4/12/19 5:30 AM, Ming Lei wrote:
> Just like aio/io_uring, we need to grab two refcounts for queuing one
> request: one for submission, another for completion.
> 
> If the request isn't queued from the plug code path, the refcount
> grabbed in generic_make_request() serves for submission. In theory,
> this refcount should have been released after the submission (async
> run queue) is done. blk_freeze_queue() works together with
> blk_sync_queue() to avoid the race between queue cleanup and IO
> submission: async run queue activities are canceled, because
> hctx->run_work is scheduled with the refcount held, so it is fine not
> to hold the refcount when running the run queue work function to
> dispatch IO.
> 
> However, if the request is staged in the plug list and finally queued
> from the plug code path, the refcount on the submission side is
> actually missing. We may then start to run the queue after the queue
> has been removed, because the queue's kobject refcount isn't guaranteed
> to be grabbed in the plug list flushing context, and a kernel oops is
> triggered. See the following race:
> 
> blk_mq_flush_plug_list():
>          blk_mq_sched_insert_requests()
>                  insert requests to sw queue or scheduler queue
>                  blk_mq_run_hw_queue
> 
> Because the queue may be run concurrently, all requests inserted above
> may be completed before the above blk_mq_run_hw_queue() is called. The
> queue can then be freed during the above blk_mq_run_hw_queue().
> 
> Fix the issue by grabbing .q_usage_counter before calling
> blk_mq_sched_insert_requests() in blk_mq_flush_plug_list(). This is
> safe because the queue is guaranteed to be alive before the requests
> are inserted.
> 
> Cc: Dongli Zhang <dongli.zhang@oracle.com>
> Cc: James Smart <james.smart@broadcom.com>
> Cc: Bart Van Assche <bart.vanassche@wdc.com>
> Cc: linux-scsi@vger.kernel.org,
> Cc: Martin K . Petersen <martin.petersen@oracle.com>,
> Cc: Christoph Hellwig <hch@lst.de>,
> Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>,
> Cc: jianchao wang <jianchao.w.wang@oracle.com>
> Reviewed-by: Bart Van Assche <bvanassche@acm.org>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>   block/blk-mq.c | 6 ++++++
>   1 file changed, 6 insertions(+)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 3ff3d7b49969..5b586affee09 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1728,9 +1728,12 @@ void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule)
>   		if (rq->mq_hctx != this_hctx || rq->mq_ctx != this_ctx) {
>   			if (this_hctx) {
>   				trace_block_unplug(this_q, depth, !from_schedule);
> +
> +				percpu_ref_get(&this_q->q_usage_counter);
>   				blk_mq_sched_insert_requests(this_hctx, this_ctx,
>   								&rq_list,
>   								from_schedule);
> +				percpu_ref_put(&this_q->q_usage_counter);
>   			}
>   
>   			this_q = rq->q;
> @@ -1749,8 +1752,11 @@ void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule)
>   	 */
>   	if (this_hctx) {
>   		trace_block_unplug(this_q, depth, !from_schedule);
> +
> +		percpu_ref_get(&this_q->q_usage_counter);
>   		blk_mq_sched_insert_requests(this_hctx, this_ctx, &rq_list,
>   						from_schedule);
> +		percpu_ref_put(&this_q->q_usage_counter);
>   	}
>   }
>   
> 
Reviewed-by: Hannes Reinecke <hare@suse.com>
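
For readers following the thread, the get/put bracket in the diff can be
illustrated with a small single-threaded userspace model. This is only a
sketch under stated assumptions: struct toy_queue and every toy_* helper
are hypothetical names invented for illustration, a plain int stands in
for the percpu_ref, and the "concurrent" completion is simulated by
calling it inline. It is not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct request_queue (not a kernel type). */
struct toy_queue {
	int usage;	/* models q_usage_counter as a plain counter */
	int pending;	/* requests that still hold a completion reference */
};

/* Models percpu_ref_get() on the usage counter. */
static void toy_get(struct toy_queue *q)
{
	q->usage++;
}

/* Models percpu_ref_put() on the usage counter. */
static void toy_put(struct toy_queue *q)
{
	assert(q->usage > 0);
	q->usage--;
}

/* A freeze (and therefore queue cleanup) can only finish at zero users. */
static bool toy_can_freeze(const struct toy_queue *q)
{
	return q->usage == 0;
}

/* Simulates a concurrent queue run completing every inserted request. */
static void toy_complete_all(struct toy_queue *q)
{
	while (q->pending > 0) {
		q->pending--;
		toy_put(q);	/* each completion drops its reference */
	}
}

/*
 * Simulates flushing a plug list of nr_rqs requests. Returns true if a
 * freeze could have completed before the final queue run, i.e. the
 * window in which the queue may be torn down under the caller.
 */
static bool toy_flush_plug(struct toy_queue *q, int nr_rqs,
			   bool hold_submit_ref)
{
	bool freezable_before_run;

	/* Plugged requests each already hold one completion reference. */
	q->usage = nr_rqs;
	q->pending = nr_rqs;

	if (hold_submit_ref)
		toy_get(q);	/* the submission-side reference */

	/* Requests are inserted, then completed by another context. */
	toy_complete_all(q);

	/* Point at which the flushing context would run the hw queue. */
	freezable_before_run = toy_can_freeze(q);

	if (hold_submit_ref)
		toy_put(q);

	return freezable_before_run;
}
```

In this model, toy_flush_plug(&q, 3, false) reports that a freeze could
complete before the queue run, while toy_flush_plug(&q, 3, true) reports
that it could not, and the counter is back at zero afterwards in both
cases.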

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)

Thread overview: 26+ messages
2019-04-12  3:30 [PATCH V5 0/9] blk-mq: fix races related with freeing queue Ming Lei
2019-04-12  3:30 ` [PATCH V5 1/9] blk-mq: grab .q_usage_counter when queuing request from plug code path Ming Lei
2019-04-12  8:20   ` Johannes Thumshirn
2019-04-12 10:55   ` Hannes Reinecke [this message]
2019-04-12  3:30 ` [PATCH V5 2/9] blk-mq: move cancel of requeue_work into blk_mq_release Ming Lei
2019-04-12  8:23   ` Johannes Thumshirn
2019-04-12  3:30 ` [PATCH V5 3/9] blk-mq: free hw queue's resource in hctx's release handler Ming Lei
2019-04-12 11:03   ` Hannes Reinecke
2019-04-13  7:18     ` Ming Lei
2019-04-12  3:30 ` [PATCH V5 4/9] blk-mq: move all hctx alloction & initialization into __blk_mq_alloc_and_init_hctx Ming Lei
2019-04-12 11:04   ` Hannes Reinecke
2019-04-12  3:30 ` [PATCH V5 5/9] blk-mq: split blk_mq_alloc_and_init_hctx into two parts Ming Lei
2019-04-12 11:04   ` Hannes Reinecke
2019-04-12  3:30 ` [PATCH V5 6/9] blk-mq: always free hctx after request queue is freed Ming Lei
2019-04-12 11:06   ` Hannes Reinecke
2019-04-13  7:27     ` Ming Lei
2019-04-12  3:30 ` [PATCH V5 7/9] blk-mq: move cancel of hctx->run_work into blk_mq_hw_sysfs_release Ming Lei
2019-04-12 11:08   ` Hannes Reinecke
2019-04-12  3:30 ` [PATCH V5 8/9] block: don't drain in-progress dispatch in blk_cleanup_queue() Ming Lei
2019-04-12 11:09   ` Hannes Reinecke
2019-04-12  3:30 ` [PATCH V5 9/9] SCSI: don't hold device refcount in IO path Ming Lei
2019-04-12 11:09   ` Hannes Reinecke
2019-04-13  0:04   ` Martin K. Petersen
2019-04-13  6:56     ` Ming Lei
2019-04-13  9:23       ` Ming Lei
2019-04-16  2:12       ` Martin K. Petersen
