linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V2 0/3] blk-mq: allow to run queue if queue refcount is held
@ 2019-04-01  4:42 Ming Lei
  2019-04-01  4:42 ` [PATCH V2 1/3] blk-mq: free hw queue's resource in hctx's release handler Ming Lei
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Ming Lei @ 2019-04-01  4:42 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Ming Lei, Dongli Zhang, James Smart,
	Bart Van Assche, linux-scsi, Martin K . Petersen,
	Christoph Hellwig, James E . J . Bottomley, jianchao wang

Hi,

Since 45a9c9d909b2 ("blk-mq: Fix a use-after-free"), run queue isn't
allowed during cleanup queue even though queue refcount is held.

This change has caused lots of kernel oops triggered in run queue path,
turns out it isn't easy to fix them all.

So move freeing of hw queue resources into hctx's release handler, then
the above issue is fixed. Meantime, this way is safe given freeing hw
queue resource doesn't require tags.


V2:
	- moving freeing hw queue resources into hctx's release handler


Ming Lei (3):
  blk-mq: free hw queue's resource in hctx's release handler
  block: don't drain in-progress dispatch in blk_cleanup_queue()
  SCSI: don't grab queue usage counter before run queue

 block/blk-core.c        | 14 +-------------
 block/blk-mq-sysfs.c    |  6 ++++++
 block/blk-mq.c          |  8 ++------
 block/blk-mq.h          |  2 +-
 drivers/scsi/scsi_lib.c |  7 -------
 5 files changed, 10 insertions(+), 27 deletions(-)

Cc: Dongli Zhang <dongli.zhang@oracle.com>
Cc: James Smart <james.smart@broadcom.com>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: linux-scsi@vger.kernel.org,
Cc: Martin K . Petersen <martin.petersen@oracle.com>,
Cc: Christoph Hellwig <hch@lst.de>,
Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>,
Cc: jianchao wang <jianchao.w.wang@oracle.com>

-- 
2.9.5


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH V2 1/3] blk-mq: free hw queue's resource in hctx's release handler
  2019-04-01  4:42 [PATCH V2 0/3] blk-mq: allow to run queue if queue refcount is held Ming Lei
@ 2019-04-01  4:42 ` Ming Lei
  2019-04-01  4:42 ` [PATCH V2 2/3] block: don't drain in-progress dispatch in blk_cleanup_queue() Ming Lei
  2019-04-01  4:42 ` [PATCH V2 3/3] SCSI: don't grab queue usage counter before run queue Ming Lei
  2 siblings, 0 replies; 6+ messages in thread
From: Ming Lei @ 2019-04-01  4:42 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Ming Lei, Dongli Zhang, James Smart,
	Bart Van Assche, linux-scsi, Martin K . Petersen,
	Christoph Hellwig, James E . J . Bottomley, jianchao wang,
	stable

Once blk_cleanup_queue() returns, tags shouldn't be used any more,
because blk_mq_free_tag_set() may be called. Commit 45a9c9d909b2
("blk-mq: Fix a use-after-free") fixes this issue exactly.

However, that commit introduces another issue. Before 45a9c9d909b2,
we are allowed to run queue during cleaning up queue if the queue's
kobj refcount is held. After that commit, queue can't be run during
queue cleaning up, otherwise oops can be triggered easily because
some fields of hctx are freed by blk_mq_free_queue() in blk_cleanup_queue().

We have invented ways for addressing this kind of issue before, such as:

	8dc765d438f1 ("SCSI: fix queue cleanup race before queue initialization is done")
	c2856ae2f315 ("blk-mq: quiesce queue before freeing queue")

But still can't cover all cases, recently James reports another such
kind of issue:

	https://marc.info/?l=linux-scsi&m=155389088124782&w=2

This issue can be quite hard to address by previous way, given
scsi_run_queue() may run requeues for other LUNs.

Fixes the above issue by freeing hctx's resources in its release handler, and this
way is safe becasue tags isn't needed for freeing such hctx resource.

This approach follows typical design pattern wrt. kobject's release handler.

Cc: Dongli Zhang <dongli.zhang@oracle.com>
Cc: James Smart <james.smart@broadcom.com>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: linux-scsi@vger.kernel.org,
Cc: Martin K . Petersen <martin.petersen@oracle.com>,
Cc: Christoph Hellwig <hch@lst.de>,
Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>,
Cc: jianchao wang <jianchao.w.wang@oracle.com>
Reported-by: James Smart <james.smart@broadcom.com>
Fixes: 45a9c9d909b2 ("blk-mq: Fix a use-after-free")
Cc: stable@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-core.c     | 2 +-
 block/blk-mq-sysfs.c | 6 ++++++
 block/blk-mq.c       | 8 ++------
 block/blk-mq.h       | 2 +-
 4 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 4673ebe42255..b3bbf8a5110d 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -375,7 +375,7 @@ void blk_cleanup_queue(struct request_queue *q)
 	blk_exit_queue(q);
 
 	if (queue_is_mq(q))
-		blk_mq_free_queue(q);
+		blk_mq_exit_queue(q);
 
 	percpu_ref_exit(&q->q_usage_counter);
 
diff --git a/block/blk-mq-sysfs.c b/block/blk-mq-sysfs.c
index 3f9c3f4ac44c..4040e62c3737 100644
--- a/block/blk-mq-sysfs.c
+++ b/block/blk-mq-sysfs.c
@@ -10,6 +10,7 @@
 #include <linux/smp.h>
 
 #include <linux/blk-mq.h>
+#include "blk.h"
 #include "blk-mq.h"
 #include "blk-mq-tag.h"
 
@@ -33,6 +34,11 @@ static void blk_mq_hw_sysfs_release(struct kobject *kobj)
 {
 	struct blk_mq_hw_ctx *hctx = container_of(kobj, struct blk_mq_hw_ctx,
 						  kobj);
+
+	if (hctx->flags & BLK_MQ_F_BLOCKING)
+		cleanup_srcu_struct(hctx->srcu);
+	blk_free_flush_queue(hctx->fq);
+	sbitmap_free(&hctx->ctx_map);
 	free_cpumask_var(hctx->cpumask);
 	kfree(hctx->ctxs);
 	kfree(hctx);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 70b210a308c4..05d11149e75a 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2243,12 +2243,7 @@ static void blk_mq_exit_hctx(struct request_queue *q,
 	if (set->ops->exit_hctx)
 		set->ops->exit_hctx(hctx, hctx_idx);
 
-	if (hctx->flags & BLK_MQ_F_BLOCKING)
-		cleanup_srcu_struct(hctx->srcu);
-
 	blk_mq_remove_cpuhp(hctx);
-	blk_free_flush_queue(hctx->fq);
-	sbitmap_free(&hctx->ctx_map);
 }
 
 static void blk_mq_exit_hw_queues(struct request_queue *q,
@@ -2881,7 +2876,8 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 }
 EXPORT_SYMBOL(blk_mq_init_allocated_queue);
 
-void blk_mq_free_queue(struct request_queue *q)
+/* tags can _not_ be used after returning from blk_mq_exit_queue */
+void blk_mq_exit_queue(struct request_queue *q)
 {
 	struct blk_mq_tag_set	*set = q->tag_set;
 
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 0ed8e5a8729f..0a5d0893dae4 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -37,7 +37,7 @@ struct blk_mq_ctx {
 	struct kobject		kobj;
 } ____cacheline_aligned_in_smp;
 
-void blk_mq_free_queue(struct request_queue *q);
+void blk_mq_exit_queue(struct request_queue *q);
 int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr);
 void blk_mq_wake_waiters(struct request_queue *q);
 bool blk_mq_dispatch_rq_list(struct request_queue *, struct list_head *, bool);
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH V2 2/3] block: don't drain in-progress dispatch in blk_cleanup_queue()
  2019-04-01  4:42 [PATCH V2 0/3] blk-mq: allow to run queue if queue refcount is held Ming Lei
  2019-04-01  4:42 ` [PATCH V2 1/3] blk-mq: free hw queue's resource in hctx's release handler Ming Lei
@ 2019-04-01  4:42 ` Ming Lei
  2019-04-01 15:14   ` Bart Van Assche
  2019-04-01  4:42 ` [PATCH V2 3/3] SCSI: don't grab queue usage counter before run queue Ming Lei
  2 siblings, 1 reply; 6+ messages in thread
From: Ming Lei @ 2019-04-01  4:42 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Ming Lei, Dongli Zhang, James Smart,
	Bart Van Assche, linux-scsi, Martin K . Petersen,
	Christoph Hellwig, James E . J . Bottomley, jianchao wang

Now freeing hw queue resource is moved to hctx's release handler,
we don't need to worry about the race between blk_cleanup_queue and
run queue any more.

So don't drain in-progress dispatch in blk_cleanup_queue().

This is basically revert of c2856ae2f315 ("blk-mq: quiesce queue before
freeing queue").

Cc: Dongli Zhang <dongli.zhang@oracle.com>
Cc: James Smart <james.smart@broadcom.com>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: linux-scsi@vger.kernel.org,
Cc: Martin K . Petersen <martin.petersen@oracle.com>,
Cc: Christoph Hellwig <hch@lst.de>,
Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>,
Cc: jianchao wang <jianchao.w.wang@oracle.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-core.c | 12 ------------
 1 file changed, 12 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index b3bbf8a5110d..491dc0295778 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -347,18 +347,6 @@ void blk_cleanup_queue(struct request_queue *q)
 
 	blk_queue_flag_set(QUEUE_FLAG_DEAD, q);
 
-	/*
-	 * make sure all in-progress dispatch are completed because
-	 * blk_freeze_queue() can only complete all requests, and
-	 * dispatch may still be in-progress since we dispatch requests
-	 * from more than one contexts.
-	 *
-	 * We rely on driver to deal with the race in case that queue
-	 * initialization isn't done.
-	 */
-	if (queue_is_mq(q) && blk_queue_init_done(q))
-		blk_mq_quiesce_queue(q);
-
 	/* for synchronous bio-based driver finish in-flight integrity i/o */
 	blk_flush_integrity();
 
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH V2 3/3] SCSI: don't grab queue usage counter before run queue
  2019-04-01  4:42 [PATCH V2 0/3] blk-mq: allow to run queue if queue refcount is held Ming Lei
  2019-04-01  4:42 ` [PATCH V2 1/3] blk-mq: free hw queue's resource in hctx's release handler Ming Lei
  2019-04-01  4:42 ` [PATCH V2 2/3] block: don't drain in-progress dispatch in blk_cleanup_queue() Ming Lei
@ 2019-04-01  4:42 ` Ming Lei
  2 siblings, 0 replies; 6+ messages in thread
From: Ming Lei @ 2019-04-01  4:42 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Ming Lei, Dongli Zhang, James Smart,
	Bart Van Assche, linux-scsi, Martin K . Petersen,
	Christoph Hellwig, James E . J . Bottomley, jianchao wang

Now freeing hw queue resource is moved to hctx's release handler,
we don't need to worry about the possible race between blk_cleanup_queue
and run queue.

So don't grab the queue usage counter before run queue in
scsi_end_request().

This is basically revert of 8dc765d438f1 ("SCSI: fix queue cleanup race before queue
initialization is done").

Cc: Dongli Zhang <dongli.zhang@oracle.com>
Cc: James Smart <james.smart@broadcom.com>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: linux-scsi@vger.kernel.org,
Cc: Martin K . Petersen <martin.petersen@oracle.com>,
Cc: Christoph Hellwig <hch@lst.de>,
Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>,
Cc: jianchao wang <jianchao.w.wang@oracle.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/scsi_lib.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 601b9f1de267..18bf341d1236 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -604,12 +604,6 @@ static bool scsi_end_request(struct request *req, blk_status_t error,
 	 */
 	scsi_mq_uninit_cmd(cmd);
 
-	/*
-	 * queue is still alive, so grab the ref for preventing it
-	 * from being cleaned up during running queue.
-	 */
-	percpu_ref_get(&q->q_usage_counter);
-
 	__blk_mq_end_request(req, error);
 
 	if (scsi_target(sdev)->single_lun ||
@@ -618,7 +612,6 @@ static bool scsi_end_request(struct request *req, blk_status_t error,
 	else
 		blk_mq_run_hw_queues(q, true);
 
-	percpu_ref_put(&q->q_usage_counter);
 	put_device(&sdev->sdev_gendev);
 	return false;
 }
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH V2 2/3] block: don't drain in-progress dispatch in blk_cleanup_queue()
  2019-04-01  4:42 ` [PATCH V2 2/3] block: don't drain in-progress dispatch in blk_cleanup_queue() Ming Lei
@ 2019-04-01 15:14   ` Bart Van Assche
  2019-04-02  0:38     ` Ming Lei
  0 siblings, 1 reply; 6+ messages in thread
From: Bart Van Assche @ 2019-04-01 15:14 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe
  Cc: linux-block, Dongli Zhang, James Smart, Bart Van Assche,
	linux-scsi, Martin K . Petersen, Christoph Hellwig,
	James E . J . Bottomley, jianchao wang

On Mon, 2019-04-01 at 12:42 +0800, Ming Lei wrote:
> Now freeing hw queue resource is moved to hctx's release handler,
> we don't need to worry about the race between blk_cleanup_queue and
> run queue any more.
> 
> So don't drain in-progress dispatch in blk_cleanup_queue().
> 
> This is basically revert of c2856ae2f315 ("blk-mq: quiesce queue before
> freeing queue").
> 
> Cc: Dongli Zhang <dongli.zhang@oracle.com>
> Cc: James Smart <james.smart@broadcom.com>
> Cc: Bart Van Assche <bart.vanassche@wdc.com>
> Cc: linux-scsi@vger.kernel.org,
> Cc: Martin K . Petersen <martin.petersen@oracle.com>,
> Cc: Christoph Hellwig <hch@lst.de>,
> Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>,
> Cc: jianchao wang <jianchao.w.wang@oracle.com>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  block/blk-core.c | 12 ------------
>  1 file changed, 12 deletions(-)
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index b3bbf8a5110d..491dc0295778 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -347,18 +347,6 @@ void blk_cleanup_queue(struct request_queue *q)
>  
>         blk_queue_flag_set(QUEUE_FLAG_DEAD, q);
>  
> -       /*
> -        * make sure all in-progress dispatch are completed because
> -        * blk_freeze_queue() can only complete all requests, and
> -        * dispatch may still be in-progress since we dispatch requests
> -        * from more than one contexts.
> -        *
> -        * We rely on driver to deal with the race in case that queue
> -        * initialization isn't done.
> -        */
> -       if (queue_is_mq(q) && blk_queue_init_done(q))
> -               blk_mq_quiesce_queue(q);
> -
>         /* for synchronous bio-based driver finish in-flight integrity i/o */
>         blk_flush_integrity();

Many block drivers clean up resources immediately after blk_cleanup_queue()
returns. Not waiting for ongoing .queue_rq() calls to finish is wrong because
it can cause block drivers to destroy resources that are in use by a concurrent
.queue_rq() call.

Bart.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH V2 2/3] block: don't drain in-progress dispatch in blk_cleanup_queue()
  2019-04-01 15:14   ` Bart Van Assche
@ 2019-04-02  0:38     ` Ming Lei
  0 siblings, 0 replies; 6+ messages in thread
From: Ming Lei @ 2019-04-02  0:38 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Dongli Zhang, James Smart,
	Bart Van Assche, linux-scsi, Martin K . Petersen,
	Christoph Hellwig, James E . J . Bottomley, jianchao wang

On Mon, Apr 01, 2019 at 08:14:59AM -0700, Bart Van Assche wrote:
> On Mon, 2019-04-01 at 12:42 +0800, Ming Lei wrote:
> > Now freeing hw queue resource is moved to hctx's release handler,
> > we don't need to worry about the race between blk_cleanup_queue and
> > run queue any more.
> > 
> > So don't drain in-progress dispatch in blk_cleanup_queue().
> > 
> > This is basically revert of c2856ae2f315 ("blk-mq: quiesce queue before
> > freeing queue").
> > 
> > Cc: Dongli Zhang <dongli.zhang@oracle.com>
> > Cc: James Smart <james.smart@broadcom.com>
> > Cc: Bart Van Assche <bart.vanassche@wdc.com>
> > Cc: linux-scsi@vger.kernel.org,
> > Cc: Martin K . Petersen <martin.petersen@oracle.com>,
> > Cc: Christoph Hellwig <hch@lst.de>,
> > Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>,
> > Cc: jianchao wang <jianchao.w.wang@oracle.com>
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >  block/blk-core.c | 12 ------------
> >  1 file changed, 12 deletions(-)
> >
> > diff --git a/block/blk-core.c b/block/blk-core.c
> > index b3bbf8a5110d..491dc0295778 100644
> > --- a/block/blk-core.c
> > +++ b/block/blk-core.c
> > @@ -347,18 +347,6 @@ void blk_cleanup_queue(struct request_queue *q)
> >  
> >         blk_queue_flag_set(QUEUE_FLAG_DEAD, q);
> >  
> > -       /*
> > -        * make sure all in-progress dispatch are completed because
> > -        * blk_freeze_queue() can only complete all requests, and
> > -        * dispatch may still be in-progress since we dispatch requests
> > -        * from more than one contexts.
> > -        *
> > -        * We rely on driver to deal with the race in case that queue
> > -        * initialization isn't done.
> > -        */
> > -       if (queue_is_mq(q) && blk_queue_init_done(q))
> > -               blk_mq_quiesce_queue(q);
> > -
> >         /* for synchronous bio-based driver finish in-flight integrity i/o */
> >         blk_flush_integrity();
> 
> Many block drivers clean up resources immediately after blk_cleanup_queue()
> returns. Not waiting for ongoing .queue_rq() calls to finish is wrong because
> it can cause block drivers to destroy resources that are in use by a concurrent
> .queue_rq() call.

blk_freeze_queue() has returned, so there can't be any in-flight IOs and .queue_rq(),
but there might be run queue activities, which do not call into driver.


Thanks,
Ming

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2019-04-02  0:38 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-01  4:42 [PATCH V2 0/3] blk-mq: allow to run queue if queue refcount is held Ming Lei
2019-04-01  4:42 ` [PATCH V2 1/3] blk-mq: free hw queue's resource in hctx's release handler Ming Lei
2019-04-01  4:42 ` [PATCH V2 2/3] block: don't drain in-progress dispatch in blk_cleanup_queue() Ming Lei
2019-04-01 15:14   ` Bart Van Assche
2019-04-02  0:38     ` Ming Lei
2019-04-01  4:42 ` [PATCH V2 3/3] SCSI: don't grab queue usage counter before run queue Ming Lei

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).