linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] blk-mq/nvme-loop: use nvme-loop's lock class for addressing lockdep false positive warning
@ 2020-11-12  7:55 Ming Lei
  2020-11-12  7:55 ` [PATCH 1/3] blk-mq: add new API of blk_mq_hctx_set_fq_lock_class Ming Lei
                   ` (4 more replies)
  0 siblings, 5 replies; 19+ messages in thread
From: Ming Lei @ 2020-11-12  7:55 UTC (permalink / raw)
  To: Jens Axboe, linux-block, linux-nvme, Christoph Hellwig
  Cc: Ming Lei, Qian Cai, Sumit Saxena, John Garry, Kashyap Desai,
	Bart Van Assche, Hannes Reinecke

Hi,

Qian reported there is hang during booting when shared host tagset is
introduced on megaraid sas. Sumit reported the whole SCSI probe takes
about ~45min in his test.

Turns out it is caused by nr_hw_queues increased, especially commit
b3c6a5997541("block: Fix a lockdep complaint triggered by request queue flushing")
adds synchronize_rcu() for each hctx's release handler.

Address the original lockdep false positive warning by simpler way, then
long scsi probe can be avoided with lockdep enabled.

Ming Lei (3):
  blk-mq: add new API of blk_mq_hctx_set_fq_lock_class
  nvme-loop: use blk_mq_hctx_set_fq_lock_class to set loop's lock class
  Revert "block: Fix a lockdep complaint triggered by request queue
    flushing"

 block/blk-flush.c          | 30 +++++++++++++++++++++++++-----
 block/blk.h                |  1 -
 drivers/nvme/target/loop.c | 10 ++++++++++
 include/linux/blk-mq.h     |  3 +++
 4 files changed, 38 insertions(+), 6 deletions(-)

Cc: Qian Cai <cai@redhat.com>
Cc: Sumit Saxena <sumit.saxena@broadcom.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Hannes Reinecke <hare@suse.de>
-- 
2.25.4


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/3] blk-mq: add new API of blk_mq_hctx_set_fq_lock_class
  2020-11-12  7:55 [PATCH 0/3] blk-mq/nvme-loop: use nvme-loop's lock class for addressing lockdep false positive warning Ming Lei
@ 2020-11-12  7:55 ` Ming Lei
  2020-11-16 17:26   ` Christoph Hellwig
  2020-11-12  7:55 ` [PATCH 2/3] nvme-loop: use blk_mq_hctx_set_fq_lock_class to set loop's lock class Ming Lei
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: Ming Lei @ 2020-11-12  7:55 UTC (permalink / raw)
  To: Jens Axboe, linux-block, linux-nvme, Christoph Hellwig
  Cc: Ming Lei, Qian Cai, Sumit Saxena, John Garry, Kashyap Desai,
	Bart Van Assche, Hannes Reinecke

flush_end_io() may be called recursively from some driver, such as
nvme-loop, so lockdep may complain 'possible recursive locking'.
Commit b3c6a5997541("block: Fix a lockdep complaint triggered by
request queue flushing") tried to address this issue by assigning
dynamically allocated per-flush-queue lock class. This solution
adds synchronize_rcu() for each hctx's release handler, and causes
horrible SCSI MQ probe delay(more than half an hour on megaraid sas).

Add new API of blk_mq_hctx_set_fq_lock_class() for these drivers, so
we just need to use driver specific lock class for avoiding the
lockdep warning of 'possible recursive locking'.

Reported-by: Qian Cai <cai@redhat.com>
Cc: Sumit Saxena <sumit.saxena@broadcom.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-flush.c      | 25 +++++++++++++++++++++++++
 include/linux/blk-mq.h |  3 +++
 2 files changed, 28 insertions(+)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index e32958f0b687..657743524e15 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -490,3 +490,28 @@ void blk_free_flush_queue(struct blk_flush_queue *fq)
 	kfree(fq->flush_rq);
 	kfree(fq);
 }
+
+/*
+ * Allow driver to set its own lock class to fq->mq_flush_lock for
+ * avoiding lockdep complaint.
+ *
+ * flush_end_io() may be called recursively from some driver, such as
+ * nvme-loop, so lockdep may complain 'possible recursive locking' because
+ * all 'struct blk_flush_queue' instance share same mq_flush_lock lock class
+ * key. We need to assign different lock class for these driver's
+ * fq->mq_flush_lock for avoiding the lockdep warning.
+ *
+ * Use dynamically allocated lock class key for each 'blk_flush_queue'
+ * instance is over-kill, and more worse it introduces horrible boot delay
+ * issue because synchronize_rcu() is implied in lockdep_unregister_key which
+ * is called for each hctx release. SCSI probing may synchronously create and
+ * destroy lots of MQ request_queues for non-existent devices, and some robot
+ * test kernel always enable lockdep option. It is observed that more than half
+ * an hour is taken during SCSI MQ probe with per-fq lock class.
+ */
+void blk_mq_hctx_set_fq_lock_class(struct blk_mq_hw_ctx *hctx,
+		struct lock_class_key *key)
+{
+	lockdep_set_class(&hctx->fq->mq_flush_lock, key);
+}
+EXPORT_SYMBOL_GPL(blk_mq_hctx_set_fq_lock_class);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 794b2a33a2c3..5f639240760e 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -5,6 +5,7 @@
 #include <linux/blkdev.h>
 #include <linux/sbitmap.h>
 #include <linux/srcu.h>
+#include <linux/lockdep.h>
 
 struct blk_mq_tags;
 struct blk_flush_queue;
@@ -594,5 +595,7 @@ static inline void blk_mq_cleanup_rq(struct request *rq)
 }
 
 blk_qc_t blk_mq_submit_bio(struct bio *bio);
+void blk_mq_hctx_set_fq_lock_class(struct blk_mq_hw_ctx *hctx,
+		struct lock_class_key *key);
 
 #endif
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/3] nvme-loop: use blk_mq_hctx_set_fq_lock_class to set loop's lock class
  2020-11-12  7:55 [PATCH 0/3] blk-mq/nvme-loop: use nvme-loop's lock class for addressing lockdep false positive warning Ming Lei
  2020-11-12  7:55 ` [PATCH 1/3] blk-mq: add new API of blk_mq_hctx_set_fq_lock_class Ming Lei
@ 2020-11-12  7:55 ` Ming Lei
  2020-11-16 17:27   ` Christoph Hellwig
  2020-11-12  7:55 ` [PATCH 3/3] Revert "block: Fix a lockdep complaint triggered by request queue flushing" Ming Lei
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: Ming Lei @ 2020-11-12  7:55 UTC (permalink / raw)
  To: Jens Axboe, linux-block, linux-nvme, Christoph Hellwig
  Cc: Ming Lei, Qian Cai, Sumit Saxena, John Garry, Kashyap Desai,
	Bart Van Assche, Hannes Reinecke

Set nvme-loop's lock class via blk_mq_hctx_set_fq_lock_class for avoiding
lockdep possible recursive locking, then we can remove the dynamically
allocated lock class for each flush queue, finally we can avoid horrible
SCSI probe delay.

This way may not address situation in which one nvme-loop is backed on
another nvme-loop. However, in reality, people seldom uses this way
for test. Even though someone played in this way, it is just one
recursive locking false positive, no real deadlock issue.

Reported-by: Qian Cai <cai@redhat.com>
Cc: Sumit Saxena <sumit.saxena@broadcom.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/nvme/target/loop.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
index f6d81239be21..07806016c09d 100644
--- a/drivers/nvme/target/loop.c
+++ b/drivers/nvme/target/loop.c
@@ -211,6 +211,8 @@ static int nvme_loop_init_request(struct blk_mq_tag_set *set,
 			(set == &ctrl->tag_set) ? hctx_idx + 1 : 0);
 }
 
+static struct lock_class_key loop_hctx_fq_lock_key;
+
 static int nvme_loop_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
 		unsigned int hctx_idx)
 {
@@ -219,6 +221,14 @@ static int nvme_loop_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
 
 	BUG_ON(hctx_idx >= ctrl->ctrl.queue_count);
 
+	/*
+	 * flush_end_io() can be called recursively for us, so use our own
+	 * lock class key for avoiding lockdep possible recursive locking,
+	 * then we can remove the dynamically allocated lock class for each
+	 * flush queue, that way may cause horrible boot delay.
+	 */
+	blk_mq_hctx_set_fq_lock_class(hctx, &loop_hctx_fq_lock_key);
+
 	hctx->driver_data = queue;
 	return 0;
 }
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/3] Revert "block: Fix a lockdep complaint triggered by request queue flushing"
  2020-11-12  7:55 [PATCH 0/3] blk-mq/nvme-loop: use nvme-loop's lock class for addressing lockdep false positive warning Ming Lei
  2020-11-12  7:55 ` [PATCH 1/3] blk-mq: add new API of blk_mq_hctx_set_fq_lock_class Ming Lei
  2020-11-12  7:55 ` [PATCH 2/3] nvme-loop: use blk_mq_hctx_set_fq_lock_class to set loop's lock class Ming Lei
@ 2020-11-12  7:55 ` Ming Lei
  2020-11-12 15:12   ` Bart Van Assche
  2020-11-13  1:27   ` [PATCH V2 " Ming Lei
  2020-11-25  1:31 ` [PATCH 0/3] blk-mq/nvme-loop: use nvme-loop's lock class for addressing lockdep false positive warning Ming Lei
  2020-11-30  2:36 ` Ming Lei
  4 siblings, 2 replies; 19+ messages in thread
From: Ming Lei @ 2020-11-12  7:55 UTC (permalink / raw)
  To: Jens Axboe, linux-block, linux-nvme, Christoph Hellwig
  Cc: Ming Lei, Qian Cai, Sumit Saxena, John Garry, Kashyap Desai,
	Bart Van Assche, Hannes Reinecke

This reverts commit b3c6a59975415bde29cfd76ff1ab008edbf614a9.

Now we can avoid nvme-loop lockdep warning of 'lockdep possible recursive
locking' by nvme-loop's lock class, no need to apply dynamically
allocated lock class key, so revert commit b3c6a5997541("block: Fix a
lockdep complaint triggered by request queue flushing").

This way fixes horrible SCSI probe delay issue on megaraid_sas, and it
is reported the whole probe may take more than half an hour.

Reported-by: Qian Cai <cai@redhat.com>
Cc: Sumit Saxena <sumit.saxena@broadcom.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-flush.c | 5 -----
 block/blk.h       | 1 -
 2 files changed, 6 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 657743524e15..c64f049226f6 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -69,7 +69,6 @@
 #include <linux/blkdev.h>
 #include <linux/gfp.h>
 #include <linux/blk-mq.h>
-#include <linux/lockdep.h>
 
 #include "blk.h"
 #include "blk-mq.h"
@@ -469,9 +468,6 @@ struct blk_flush_queue *blk_alloc_flush_queue(int node, int cmd_size,
 	INIT_LIST_HEAD(&fq->flush_queue[1]);
 	INIT_LIST_HEAD(&fq->flush_data_in_flight);
 
-	lockdep_register_key(&fq->key);
-	lockdep_set_class(&fq->mq_flush_lock, &fq->key);
-
 	return fq;
 
  fail_rq:
@@ -486,7 +482,6 @@ void blk_free_flush_queue(struct blk_flush_queue *fq)
 	if (!fq)
 		return;
 
-	lockdep_unregister_key(&fq->key);
 	kfree(fq->flush_rq);
 	kfree(fq);
 }
diff --git a/block/blk.h b/block/blk.h
index dfab98465db9..806fd6537295 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -25,7 +25,6 @@ struct blk_flush_queue {
 	struct list_head	flush_data_in_flight;
 	struct request		*flush_rq;
 
-	struct lock_class_key	key;
 	spinlock_t		mq_flush_lock;
 };
 
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/3] Revert "block: Fix a lockdep complaint triggered by request queue flushing"
  2020-11-12  7:55 ` [PATCH 3/3] Revert "block: Fix a lockdep complaint triggered by request queue flushing" Ming Lei
@ 2020-11-12 15:12   ` Bart Van Assche
  2020-11-12 15:29     ` Bart Van Assche
  2020-11-13  1:27   ` [PATCH V2 " Ming Lei
  1 sibling, 1 reply; 19+ messages in thread
From: Bart Van Assche @ 2020-11-12 15:12 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, linux-block, linux-nvme, Christoph Hellwig
  Cc: Qian Cai, Sumit Saxena, John Garry, Kashyap Desai, Hannes Reinecke

On 11/11/20 11:55 PM, Ming Lei wrote:
> This reverts commit b3c6a59975415bde29cfd76ff1ab008edbf614a9.
> 
> Now we can avoid nvme-loop lockdep warning of 'lockdep possible recursive
> locking' by nvme-loop's lock class, no need to apply dynamically
> allocated lock class key, so revert commit b3c6a5997541("block: Fix a
> lockdep complaint triggered by request queue flushing").
> 
> This way fixes horrible SCSI probe delay issue on megaraid_sas, and it
> is reported the whole probe may take more than half an hour.

The code touched by this patch is compiled out with locked disabled so
it is not clear to me how this patch could affect a probe delay for
megaraid_sas? Has the megaraid_sas probe issue perhaps not been root
caused correctly?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/3] Revert "block: Fix a lockdep complaint triggered by request queue flushing"
  2020-11-12 15:12   ` Bart Van Assche
@ 2020-11-12 15:29     ` Bart Van Assche
  2020-11-13  1:11       ` Ming Lei
  0 siblings, 1 reply; 19+ messages in thread
From: Bart Van Assche @ 2020-11-12 15:29 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, linux-block, linux-nvme, Christoph Hellwig
  Cc: Qian Cai, Sumit Saxena, John Garry, Kashyap Desai, Hannes Reinecke

On 11/12/20 7:12 AM, Bart Van Assche wrote:
> On 11/11/20 11:55 PM, Ming Lei wrote:
>> This reverts commit b3c6a59975415bde29cfd76ff1ab008edbf614a9.
>>
>> Now we can avoid nvme-loop lockdep warning of 'lockdep possible recursive
>> locking' by nvme-loop's lock class, no need to apply dynamically
>> allocated lock class key, so revert commit b3c6a5997541("block: Fix a
>> lockdep complaint triggered by request queue flushing").
>>
>> This way fixes horrible SCSI probe delay issue on megaraid_sas, and it
>> is reported the whole probe may take more than half an hour.
> 
> The code touched by this patch is compiled out with locked disabled so
> it is not clear to me how this patch could affect a probe delay for
> megaraid_sas? Has the megaraid_sas probe issue perhaps not been root
> caused correctly?

(replying to my own email)

I found the answer in the descriptions of the previous patches of this
series. I think this patch would benefit from a more complete patch
description.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/3] Revert "block: Fix a lockdep complaint triggered by request queue flushing"
  2020-11-12 15:29     ` Bart Van Assche
@ 2020-11-13  1:11       ` Ming Lei
  0 siblings, 0 replies; 19+ messages in thread
From: Ming Lei @ 2020-11-13  1:11 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, linux-nvme, Christoph Hellwig, Qian Cai,
	Sumit Saxena, John Garry, Kashyap Desai, Hannes Reinecke

On Thu, Nov 12, 2020 at 07:29:13AM -0800, Bart Van Assche wrote:
> On 11/12/20 7:12 AM, Bart Van Assche wrote:
> > On 11/11/20 11:55 PM, Ming Lei wrote:
> >> This reverts commit b3c6a59975415bde29cfd76ff1ab008edbf614a9.
> >>
> >> Now we can avoid nvme-loop lockdep warning of 'lockdep possible recursive
> >> locking' by nvme-loop's lock class, no need to apply dynamically
> >> allocated lock class key, so revert commit b3c6a5997541("block: Fix a
> >> lockdep complaint triggered by request queue flushing").
> >>
> >> This way fixes horrible SCSI probe delay issue on megaraid_sas, and it
> >> is reported the whole probe may take more than half an hour.
> > 
> > The code touched by this patch is compiled out with locked disabled so
> > it is not clear to me how this patch could affect a probe delay for
> > megaraid_sas? Has the megaraid_sas probe issue perhaps not been root
> > caused correctly?
> 
> (replying to my own email)
> 
> I found the answer in the descriptions of the previous patches of this
> series. I think this patch would benefit from a more complete patch
> description.

Hello Bart,

OK, I can add more words to this commit log.

Thanks,
Ming


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH V2 3/3] Revert "block: Fix a lockdep complaint triggered by request queue flushing"
  2020-11-12  7:55 ` [PATCH 3/3] Revert "block: Fix a lockdep complaint triggered by request queue flushing" Ming Lei
  2020-11-12 15:12   ` Bart Van Assche
@ 2020-11-13  1:27   ` Ming Lei
  2020-11-16 17:27     ` Christoph Hellwig
  1 sibling, 1 reply; 19+ messages in thread
From: Ming Lei @ 2020-11-13  1:27 UTC (permalink / raw)
  To: Jens Axboe, linux-block, linux-nvme, Christoph Hellwig
  Cc: Qian Cai, Sumit Saxena, John Garry, Kashyap Desai,
	Bart Van Assche, Hannes Reinecke

From a519f421957a1205918e9bcc15087d15234e4e9f Mon Sep 17 00:00:00 2001
From: Ming Lei <ming.lei@redhat.com>
Date: Thu, 12 Nov 2020 09:56:02 +0800
Subject: [PATCH V2 3/3] Revert "block: Fix a lockdep complaint triggered by
 request queue flushing"

This reverts commit b3c6a59975415bde29cfd76ff1ab008edbf614a9.

Now we can avoid nvme-loop lockdep warning of 'lockdep possible recursive
locking' by nvme-loop's lock class, no need to apply dynamically
allocated lock class key, so revert commit b3c6a5997541("block: Fix a
lockdep complaint triggered by request queue flushing").

This way fixes horrible SCSI probe delay issue on megaraid_sas(host_tagset is 1),
and it is reported the whole probe may take more than half an hour. The reason
is that synchronize_rcu() is implied in lockdep_unregister_key() which is
called from each hctx's release handler, and some SCSI hosts can support
too many hw queues, meantime generic SCSI probe may synchronously create
and destroy lots of MQ request_queues for non-existent devices.

Another benefit is that lockdep doesn't maintain so many runtime lock class for
fq->mq_flush_lock which is per-hctx, then lock validation can be improved much.

Reported-by: Qian Cai <cai@redhat.com>
Cc: Sumit Saxena <sumit.saxena@broadcom.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
V2:
	- add more commit log

 block/blk-flush.c | 5 -----
 block/blk.h       | 1 -
 2 files changed, 6 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 657743524e15..c64f049226f6 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -69,7 +69,6 @@
 #include <linux/blkdev.h>
 #include <linux/gfp.h>
 #include <linux/blk-mq.h>
-#include <linux/lockdep.h>
 
 #include "blk.h"
 #include "blk-mq.h"
@@ -469,9 +468,6 @@ struct blk_flush_queue *blk_alloc_flush_queue(int node, int cmd_size,
 	INIT_LIST_HEAD(&fq->flush_queue[1]);
 	INIT_LIST_HEAD(&fq->flush_data_in_flight);
 
-	lockdep_register_key(&fq->key);
-	lockdep_set_class(&fq->mq_flush_lock, &fq->key);
-
 	return fq;
 
  fail_rq:
@@ -486,7 +482,6 @@ void blk_free_flush_queue(struct blk_flush_queue *fq)
 	if (!fq)
 		return;
 
-	lockdep_unregister_key(&fq->key);
 	kfree(fq->flush_rq);
 	kfree(fq);
 }
diff --git a/block/blk.h b/block/blk.h
index dfab98465db9..806fd6537295 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -25,7 +25,6 @@ struct blk_flush_queue {
 	struct list_head	flush_data_in_flight;
 	struct request		*flush_rq;
 
-	struct lock_class_key	key;
 	spinlock_t		mq_flush_lock;
 };
 
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] blk-mq: add new API of blk_mq_hctx_set_fq_lock_class
  2020-11-12  7:55 ` [PATCH 1/3] blk-mq: add new API of blk_mq_hctx_set_fq_lock_class Ming Lei
@ 2020-11-16 17:26   ` Christoph Hellwig
  2020-11-17  1:04     ` Ming Lei
  0 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2020-11-16 17:26 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, linux-nvme, Christoph Hellwig, Qian Cai,
	Sumit Saxena, John Garry, Kashyap Desai, Bart Van Assche,
	Hannes Reinecke

On Thu, Nov 12, 2020 at 03:55:24PM +0800, Ming Lei wrote:
> flush_end_io() may be called recursively from some driver, such as
> nvme-loop, so lockdep may complain 'possible recursive locking'.
> Commit b3c6a5997541("block: Fix a lockdep complaint triggered by
> request queue flushing") tried to address this issue by assigning
> dynamically allocated per-flush-queue lock class. This solution
> adds synchronize_rcu() for each hctx's release handler, and causes
> horrible SCSI MQ probe delay(more than half an hour on megaraid sas).
> 
> Add new API of blk_mq_hctx_set_fq_lock_class() for these drivers, so
> we just need to use driver specific lock class for avoiding the
> lockdep warning of 'possible recursive locking'.

I'd turn this into an inline function to avoid the (although very
minimal) cost when LOCKDEP is not enabled.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/3] nvme-loop: use blk_mq_hctx_set_fq_lock_class to set loop's lock class
  2020-11-12  7:55 ` [PATCH 2/3] nvme-loop: use blk_mq_hctx_set_fq_lock_class to set loop's lock class Ming Lei
@ 2020-11-16 17:27   ` Christoph Hellwig
  0 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2020-11-16 17:27 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, linux-nvme, Christoph Hellwig, Qian Cai,
	Sumit Saxena, John Garry, Kashyap Desai, Bart Van Assche,
	Hannes Reinecke

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V2 3/3] Revert "block: Fix a lockdep complaint triggered by request queue flushing"
  2020-11-13  1:27   ` [PATCH V2 " Ming Lei
@ 2020-11-16 17:27     ` Christoph Hellwig
  0 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2020-11-16 17:27 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, linux-nvme, Christoph Hellwig, Qian Cai,
	Sumit Saxena, John Garry, Kashyap Desai, Bart Van Assche,
	Hannes Reinecke

On Fri, Nov 13, 2020 at 09:27:22AM +0800, Ming Lei wrote:
> >From a519f421957a1205918e9bcc15087d15234e4e9f Mon Sep 17 00:00:00 2001
> From: Ming Lei <ming.lei@redhat.com>
> Date: Thu, 12 Nov 2020 09:56:02 +0800
> Subject: [PATCH V2 3/3] Revert "block: Fix a lockdep complaint triggered by
>  request queue flushing"
> 
> This reverts commit b3c6a59975415bde29cfd76ff1ab008edbf614a9.
> 
> Now we can avoid nvme-loop lockdep warning of 'lockdep possible recursive
> locking' by nvme-loop's lock class, no need to apply dynamically
> allocated lock class key, so revert commit b3c6a5997541("block: Fix a
> lockdep complaint triggered by request queue flushing").
> 
> This way fixes horrible SCSI probe delay issue on megaraid_sas(host_tagset is 1),
> and it is reported the whole probe may take more than half an hour. The reason

There are a bunch of overly long lines in the commit log.

Otherwise this looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] blk-mq: add new API of blk_mq_hctx_set_fq_lock_class
  2020-11-16 17:26   ` Christoph Hellwig
@ 2020-11-17  1:04     ` Ming Lei
  2020-11-17  1:16       ` Ming Lei
  0 siblings, 1 reply; 19+ messages in thread
From: Ming Lei @ 2020-11-17  1:04 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Bart Van Assche, Qian Cai, John Garry, linux-nvme,
	Kashyap Desai, linux-block, Sumit Saxena, Hannes Reinecke

On Mon, Nov 16, 2020 at 06:26:58PM +0100, Christoph Hellwig wrote:
> On Thu, Nov 12, 2020 at 03:55:24PM +0800, Ming Lei wrote:
> > flush_end_io() may be called recursively from some driver, such as
> > nvme-loop, so lockdep may complain 'possible recursive locking'.
> > Commit b3c6a5997541("block: Fix a lockdep complaint triggered by
> > request queue flushing") tried to address this issue by assigning
> > dynamically allocated per-flush-queue lock class. This solution
> > adds synchronize_rcu() for each hctx's release handler, and causes
> > horrible SCSI MQ probe delay(more than half an hour on megaraid sas).
> > 
> > Add new API of blk_mq_hctx_set_fq_lock_class() for these drivers, so
> > we just need to use driver specific lock class for avoiding the
> > lockdep warning of 'possible recursive locking'.
> 
> I'd turn this into an inline function to avoid the (although very
> minimal) cost when LOCKDEP is not enabled.

blk_mq_hctx_set_fq_lock_class() is just one-shot thing, do you really
care the cost?


thanks,
Ming


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] blk-mq: add new API of blk_mq_hctx_set_fq_lock_class
  2020-11-17  1:04     ` Ming Lei
@ 2020-11-17  1:16       ` Ming Lei
  2020-11-17  6:46         ` Kashyap Desai
  0 siblings, 1 reply; 19+ messages in thread
From: Ming Lei @ 2020-11-17  1:16 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Bart Van Assche, Qian Cai, John Garry, linux-nvme,
	Kashyap Desai, linux-block, Sumit Saxena, Hannes Reinecke

On Tue, Nov 17, 2020 at 09:05:04AM +0800, Ming Lei wrote:
> On Mon, Nov 16, 2020 at 06:26:58PM +0100, Christoph Hellwig wrote:
> > On Thu, Nov 12, 2020 at 03:55:24PM +0800, Ming Lei wrote:
> > > flush_end_io() may be called recursively from some driver, such as
> > > nvme-loop, so lockdep may complain 'possible recursive locking'.
> > > Commit b3c6a5997541("block: Fix a lockdep complaint triggered by
> > > request queue flushing") tried to address this issue by assigning
> > > dynamically allocated per-flush-queue lock class. This solution
> > > adds synchronize_rcu() for each hctx's release handler, and causes
> > > horrible SCSI MQ probe delay(more than half an hour on megaraid sas).
> > > 
> > > Add new API of blk_mq_hctx_set_fq_lock_class() for these drivers, so
> > > we just need to use driver specific lock class for avoiding the
> > > lockdep warning of 'possible recursive locking'.
> > 
> > I'd turn this into an inline function to avoid the (although very
> > minimal) cost when LOCKDEP is not enabled.
> 
> blk_mq_hctx_set_fq_lock_class() is just one-shot thing, do you really
> care the cost?

Forget to mention, 'blk_flush_queue' is one private structure inside block
layer, so we can't define as inline.

thanks,
Ming


^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH 1/3] blk-mq: add new API of blk_mq_hctx_set_fq_lock_class
  2020-11-17  1:16       ` Ming Lei
@ 2020-11-17  6:46         ` Kashyap Desai
  0 siblings, 0 replies; 19+ messages in thread
From: Kashyap Desai @ 2020-11-17  6:46 UTC (permalink / raw)
  To: Ming Lei, Christoph Hellwig
  Cc: Jens Axboe, Bart Van Assche, Qian Cai, John Garry, linux-nvme,
	linux-block, Sumit Saxena, Hannes Reinecke

[-- Attachment #1: Type: text/plain, Size: 1434 bytes --]

> On Tue, Nov 17, 2020 at 09:05:04AM +0800, Ming Lei wrote:
> > On Mon, Nov 16, 2020 at 06:26:58PM +0100, Christoph Hellwig wrote:
> > > On Thu, Nov 12, 2020 at 03:55:24PM +0800, Ming Lei wrote:
> > > > flush_end_io() may be called recursively from some driver, such as
> > > > nvme-loop, so lockdep may complain 'possible recursive locking'.
> > > > Commit b3c6a5997541("block: Fix a lockdep complaint triggered by
> > > > request queue flushing") tried to address this issue by assigning
> > > > dynamically allocated per-flush-queue lock class. This solution
> > > > adds synchronize_rcu() for each hctx's release handler, and causes
> > > > horrible SCSI MQ probe delay(more than half an hour on megaraid
sas).
> > > >
> > > > Add new API of blk_mq_hctx_set_fq_lock_class() for these drivers,
> > > > so we just need to use driver specific lock class for avoiding the
> > > > lockdep warning of 'possible recursive locking'.
> > >
> > > I'd turn this into an inline function to avoid the (although very
> > > minimal) cost when LOCKDEP is not enabled.
> >
> > blk_mq_hctx_set_fq_lock_class() is just one-shot thing, do you really
> > care the cost?
>
> Forget to mention, 'blk_flush_queue' is one private structure inside
block
> layer, so we can't define as inline.

Original issue reported by Qian is fixed by this patch series. Tested on
MegaRaid h/w.

Tested-by: Kashyap Desai <kashyap.desai@broadcom.com>

>
> thanks,
> Ming

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4169 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/3] blk-mq/nvme-loop: use nvme-loop's lock class for addressing lockdep false positive warning
  2020-11-12  7:55 [PATCH 0/3] blk-mq/nvme-loop: use nvme-loop's lock class for addressing lockdep false positive warning Ming Lei
                   ` (2 preceding siblings ...)
  2020-11-12  7:55 ` [PATCH 3/3] Revert "block: Fix a lockdep complaint triggered by request queue flushing" Ming Lei
@ 2020-11-25  1:31 ` Ming Lei
  2020-11-30  2:36 ` Ming Lei
  4 siblings, 0 replies; 19+ messages in thread
From: Ming Lei @ 2020-11-25  1:31 UTC (permalink / raw)
  To: Jens Axboe, linux-block, linux-nvme, Christoph Hellwig
  Cc: Qian Cai, Sumit Saxena, John Garry, Kashyap Desai,
	Bart Van Assche, Hannes Reinecke

On Thu, Nov 12, 2020 at 03:55:23PM +0800, Ming Lei wrote:
> Hi,
> 
> Qian reported there is hang during booting when shared host tagset is
> introduced on megaraid sas. Sumit reported the whole SCSI probe takes
> about ~45min in his test.
> 
> Turns out it is caused by nr_hw_queues increased, especially commit
> b3c6a5997541("block: Fix a lockdep complaint triggered by request queue flushing")
> adds synchronize_rcu() for each hctx's release handler.
> 
> Address the original lockdep false positive warning by simpler way, then
> long scsi probe can be avoided with lockdep enabled.
> 
> Ming Lei (3):
>   blk-mq: add new API of blk_mq_hctx_set_fq_lock_class
>   nvme-loop: use blk_mq_hctx_set_fq_lock_class to set loop's lock class
>   Revert "block: Fix a lockdep complaint triggered by request queue
>     flushing"
> 
>  block/blk-flush.c          | 30 +++++++++++++++++++++++++-----
>  block/blk.h                |  1 -
>  drivers/nvme/target/loop.c | 10 ++++++++++
>  include/linux/blk-mq.h     |  3 +++
>  4 files changed, 38 insertions(+), 6 deletions(-)
> 
> Cc: Qian Cai <cai@redhat.com>
> Cc: Sumit Saxena <sumit.saxena@broadcom.com>
> Cc: John Garry <john.garry@huawei.com>
> Cc: Kashyap Desai <kashyap.desai@broadcom.com>
> Cc: Bart Van Assche <bvanassche@acm.org>
> Cc: Hannes Reinecke <hare@suse.de>

Hello Jens,

Ping...


thanks,
Ming


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/3] blk-mq/nvme-loop: use nvme-loop's lock class for addressing lockdep false positive warning
  2020-11-12  7:55 [PATCH 0/3] blk-mq/nvme-loop: use nvme-loop's lock class for addressing lockdep false positive warning Ming Lei
                   ` (3 preceding siblings ...)
  2020-11-25  1:31 ` [PATCH 0/3] blk-mq/nvme-loop: use nvme-loop's lock class for addressing lockdep false positive warning Ming Lei
@ 2020-11-30  2:36 ` Ming Lei
  2020-12-02 19:07   ` Qian Cai
  4 siblings, 1 reply; 19+ messages in thread
From: Ming Lei @ 2020-11-30  2:36 UTC (permalink / raw)
  To: Jens Axboe, linux-block, linux-nvme, Christoph Hellwig
  Cc: Qian Cai, Sumit Saxena, John Garry, Kashyap Desai,
	Bart Van Assche, Hannes Reinecke

On Thu, Nov 12, 2020 at 03:55:23PM +0800, Ming Lei wrote:
> Hi,
> 
> Qian reported there is hang during booting when shared host tagset is
> introduced on megaraid sas. Sumit reported the whole SCSI probe takes
> about ~45min in his test.
> 
> Turns out it is caused by nr_hw_queues increased, especially commit
> b3c6a5997541("block: Fix a lockdep complaint triggered by request queue flushing")
> adds synchronize_rcu() for each hctx's release handler.
> 
> Address the original lockdep false positive warning by simpler way, then
> long scsi probe can be avoided with lockdep enabled.
> 
> Ming Lei (3):
>   blk-mq: add new API of blk_mq_hctx_set_fq_lock_class
>   nvme-loop: use blk_mq_hctx_set_fq_lock_class to set loop's lock class
>   Revert "block: Fix a lockdep complaint triggered by request queue
>     flushing"
> 
>  block/blk-flush.c          | 30 +++++++++++++++++++++++++-----
>  block/blk.h                |  1 -
>  drivers/nvme/target/loop.c | 10 ++++++++++
>  include/linux/blk-mq.h     |  3 +++
>  4 files changed, 38 insertions(+), 6 deletions(-)
> 
> Cc: Qian Cai <cai@redhat.com>
> Cc: Sumit Saxena <sumit.saxena@broadcom.com>
> Cc: John Garry <john.garry@huawei.com>
> Cc: Kashyap Desai <kashyap.desai@broadcom.com>
> Cc: Bart Van Assche <bvanassche@acm.org>
> Cc: Hannes Reinecke <hare@suse.de>

Hello Jens,

Any chance to take a look? And this issue has been reported several
times in RH internal test.

Thanks,
Ming


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/3] blk-mq/nvme-loop: use nvme-loop's lock class for addressing lockdep false positive warning
  2020-11-30  2:36 ` Ming Lei
@ 2020-12-02 19:07   ` Qian Cai
  2020-12-03  0:50     ` Ming Lei
  0 siblings, 1 reply; 19+ messages in thread
From: Qian Cai @ 2020-12-02 19:07 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, linux-block, linux-nvme, Christoph Hellwig
  Cc: Sumit Saxena, John Garry, Kashyap Desai, Bart Van Assche,
	Hannes Reinecke

On Mon, 2020-11-30 at 10:36 +0800, Ming Lei wrote:
> Any chance to take a look? And this issue has been reported several
> times in RH internal test.

I suppose that you will need to rebase as it does not apply cleanly on today's
linux-next.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/3] blk-mq/nvme-loop: use nvme-loop's lock class for addressing lockdep false positive warning
  2020-12-02 19:07   ` Qian Cai
@ 2020-12-03  0:50     ` Ming Lei
  2020-12-03  1:23       ` Qian Cai
  0 siblings, 1 reply; 19+ messages in thread
From: Ming Lei @ 2020-12-03  0:50 UTC (permalink / raw)
  To: Qian Cai
  Cc: Jens Axboe, linux-block, linux-nvme, Christoph Hellwig,
	Sumit Saxena, John Garry, Kashyap Desai, Bart Van Assche,
	Hannes Reinecke

On Wed, Dec 02, 2020 at 02:07:08PM -0500, Qian Cai wrote:
> On Mon, 2020-11-30 at 10:36 +0800, Ming Lei wrote:
> > Any chance to take a look? And this issue has been reported several
> > times in RH internal test.
> 
> I suppose that you will need to rebase as it does not apply cleanly on today's
> linux-next.
> 

I just apply the three patches against for-5.11/block, and they can be
done cleanly.


Thanks,
Ming


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/3] blk-mq/nvme-loop: use nvme-loop's lock class for addressing lockdep false positive warning
  2020-12-03  0:50     ` Ming Lei
@ 2020-12-03  1:23       ` Qian Cai
  0 siblings, 0 replies; 19+ messages in thread
From: Qian Cai @ 2020-12-03  1:23 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, linux-nvme, Christoph Hellwig,
	Sumit Saxena, John Garry, Kashyap Desai, Bart Van Assche,
	Hannes Reinecke

On Thu, 2020-12-03 at 08:50 +0800, Ming Lei wrote:
> On Wed, Dec 02, 2020 at 02:07:08PM -0500, Qian Cai wrote:
> > On Mon, 2020-11-30 at 10:36 +0800, Ming Lei wrote:
> > > Any chance to take a look? And this issue has been reported several
> > > times in RH internal test.
> > 
> > I suppose that you will need to rebase as it does not apply cleanly on
> > today's
> > linux-next.
> > 
> 
> I just apply the three patches against for-5.11/block, and they can be
> done cleanly.

Ah, sorry. It was a bit whitespace damages on my end. This did really fix the
megasas_raid boot hang here. Since 103fbf8e4020 ("scsi: megaraid_sas: Added
support for shared host tagset for cpuhotplug") is now in the mainline, it
probably makes sense to merge this patchset for v5.10 as well to avoid breakage.
Anyway,

Tested-by: Qian Cai <qcai@redhat.com>


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2020-12-03  1:25 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-12  7:55 [PATCH 0/3] blk-mq/nvme-loop: use nvme-loop's lock class for addressing lockdep false positive warning Ming Lei
2020-11-12  7:55 ` [PATCH 1/3] blk-mq: add new API of blk_mq_hctx_set_fq_lock_class Ming Lei
2020-11-16 17:26   ` Christoph Hellwig
2020-11-17  1:04     ` Ming Lei
2020-11-17  1:16       ` Ming Lei
2020-11-17  6:46         ` Kashyap Desai
2020-11-12  7:55 ` [PATCH 2/3] nvme-loop: use blk_mq_hctx_set_fq_lock_class to set loop's lock class Ming Lei
2020-11-16 17:27   ` Christoph Hellwig
2020-11-12  7:55 ` [PATCH 3/3] Revert "block: Fix a lockdep complaint triggered by request queue flushing" Ming Lei
2020-11-12 15:12   ` Bart Van Assche
2020-11-12 15:29     ` Bart Van Assche
2020-11-13  1:11       ` Ming Lei
2020-11-13  1:27   ` [PATCH V2 " Ming Lei
2020-11-16 17:27     ` Christoph Hellwig
2020-11-25  1:31 ` [PATCH 0/3] blk-mq/nvme-loop: use nvme-loop's lock class for addressing lockdep false positive warning Ming Lei
2020-11-30  2:36 ` Ming Lei
2020-12-02 19:07   ` Qian Cai
2020-12-03  0:50     ` Ming Lei
2020-12-03  1:23       ` Qian Cai

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).