All of lore.kernel.org
 help / color / mirror / Atom feed
* [for-4.16 PATCH v5 0/4] block/dm: allow DM to defer blk_register_queue() until ready
@ 2018-01-12 15:06 Mike Snitzer
  2018-01-12 15:06 ` [for-4.16 PATCH v5 1/4] block: only bdi_unregister() in del_gendisk() if !GENHD_FL_HIDDEN Mike Snitzer
                   ` (5 more replies)
  0 siblings, 6 replies; 27+ messages in thread
From: Mike Snitzer @ 2018-01-12 15:06 UTC (permalink / raw)
  To: axboe
  Cc: Ming Lei, hare, Bart.VanAssche, David Jeffery, dm-devel, linux-block

Hi Jens,

I'm submitting this v5 with more feeling now ;)

I've distilled the changes down to be quite minimal.  Hopefully this
will help you and others review.

I've also done a dry-run of applying 4.16 block changes and then
merging dm-4.16; no merge conflicts:
https://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git/log/?h=block-4.16
https://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git/log/?h=dm-4.16
https://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git/log/?h=block-4.16_dm-4.16

And all tests very well for me.

Please consider for 4.16, thanks!

Mike

Mike Snitzer (4):
  block: only bdi_unregister() in del_gendisk() if !GENHD_FL_HIDDEN
  block: properly protect the 'queue' kobj in blk_unregister_queue
  block: allow gendisk's request_queue registration to be deferred
  dm: fix incomplete request_queue initialization

 block/blk-sysfs.c     | 18 +++++++++++++++---
 block/genhd.c         | 23 +++++++++++++++++++----
 drivers/md/dm-rq.c    |  9 ---------
 drivers/md/dm.c       | 11 ++++++++++-
 include/linux/genhd.h |  5 +++++
 5 files changed, 49 insertions(+), 17 deletions(-)

-- 
2.15.0

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [for-4.16 PATCH v5 1/4] block: only bdi_unregister() in del_gendisk() if !GENHD_FL_HIDDEN
  2018-01-12 15:06 [for-4.16 PATCH v5 0/4] block/dm: allow DM to defer blk_register_queue() until ready Mike Snitzer
@ 2018-01-12 15:06 ` Mike Snitzer
  2018-01-12 15:06 ` [for-4.16 PATCH v5 2/4] block: properly protect the 'queue' kobj in blk_unregister_queue Mike Snitzer
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 27+ messages in thread
From: Mike Snitzer @ 2018-01-12 15:06 UTC (permalink / raw)
  To: axboe
  Cc: Ming Lei, hare, Bart.VanAssche, David Jeffery, dm-devel, linux-block

device_add_disk() will only call bdi_register_owner() if
!GENHD_FL_HIDDEN, so it follows that del_gendisk() should only call
bdi_unregister() if !GENHD_FL_HIDDEN.

Found with code inspection.  bdi_unregister() won't do any harm if
bdi_register_owner() wasn't used but best to avoid the unnecessary
call to bdi_unregister().

Fixes: 8ddcd65325 ("block: introduce GENHD_FL_HIDDEN")
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
---
 block/genhd.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/genhd.c b/block/genhd.c
index 96a66f671720..00620e01e043 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -725,7 +725,8 @@ void del_gendisk(struct gendisk *disk)
 		 * Unregister bdi before releasing device numbers (as they can
 		 * get reused and we'd get clashes in sysfs).
 		 */
-		bdi_unregister(disk->queue->backing_dev_info);
+		if (!(disk->flags & GENHD_FL_HIDDEN))
+			bdi_unregister(disk->queue->backing_dev_info);
 		blk_unregister_queue(disk);
 	} else {
 		WARN_ON(1);
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [for-4.16 PATCH v5 2/4] block: properly protect the 'queue' kobj in blk_unregister_queue
  2018-01-12 15:06 [for-4.16 PATCH v5 0/4] block/dm: allow DM to defer blk_register_queue() until ready Mike Snitzer
  2018-01-12 15:06 ` [for-4.16 PATCH v5 1/4] block: only bdi_unregister() in del_gendisk() if !GENHD_FL_HIDDEN Mike Snitzer
@ 2018-01-12 15:06 ` Mike Snitzer
  2018-01-12 15:17   ` Ming Lei
  2018-01-12 16:03   ` [for-4.16 PATCH v6 " Mike Snitzer
  2018-01-12 15:06 ` [for-4.16 PATCH v5 3/4] block: allow gendisk's request_queue registration to be deferred Mike Snitzer
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 27+ messages in thread
From: Mike Snitzer @ 2018-01-12 15:06 UTC (permalink / raw)
  To: axboe
  Cc: Ming Lei, hare, Bart.VanAssche, David Jeffery, dm-devel, linux-block

The original commit e9a823fb34a8b (block: fix warning when I/O elevator
is changed as request_queue is being removed) is pretty conflated.
"conflated" because the resource being protected by q->sysfs_lock isn't
the queue_flags (it is the 'queue' kobj).

q->sysfs_lock serializes __elevator_change() (via elv_iosched_store)
from racing with blk_unregister_queue():
1) By holding q->sysfs_lock first, __elevator_change() can complete
before a racing blk_unregister_queue().
2) Conversely, __elevator_change() is testing for QUEUE_FLAG_REGISTERED
in case elv_iosched_store() loses the race with blk_unregister_queue(),
it needs a way to know the 'queue' kobj isn't there.

Expand the scope of blk_unregister_queue()'s q->sysfs_lock use so it is
held until after the 'queue' kobj is removed.

Also, blk_unregister_queue() should use q->queue_lock to protect against
any concurrent writes to q->queue_flags -- even though chances are the
queue is being cleaned up so no concurrent writes are likely.

Fixes: e9a823fb34a8b ("block: fix warning when I/O elevator is changed as request_queue is being removed")
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 block/blk-sysfs.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 870484eaed1f..9272452ff456 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -929,12 +929,17 @@ void blk_unregister_queue(struct gendisk *disk)
 	if (WARN_ON(!q))
 		return;
 
+	/*
+	 * Protect against the 'queue' kobj being accessed
+	 * while/after it is removed.
+	 */
 	mutex_lock(&q->sysfs_lock);
-	queue_flag_clear_unlocked(QUEUE_FLAG_REGISTERED, q);
-	mutex_unlock(&q->sysfs_lock);
 
-	wbt_exit(q);
+	spin_lock_irq(q->queue_lock);
+	queue_flag_clear(QUEUE_FLAG_REGISTERED, q);
+	spin_unlock_irq(q->queue_lock);
 
+	wbt_exit(q);
 
 	if (q->mq_ops)
 		blk_mq_unregister_dev(disk_to_dev(disk), q);
@@ -946,4 +951,6 @@ void blk_unregister_queue(struct gendisk *disk)
 	kobject_del(&q->kobj);
 	blk_trace_remove_sysfs(disk_to_dev(disk));
 	kobject_put(&disk_to_dev(disk)->kobj);
+
+	mutex_unlock(&q->sysfs_lock);
 }
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [for-4.16 PATCH v5 3/4] block: allow gendisk's request_queue registration to be deferred
  2018-01-12 15:06 [for-4.16 PATCH v5 0/4] block/dm: allow DM to defer blk_register_queue() until ready Mike Snitzer
  2018-01-12 15:06 ` [for-4.16 PATCH v5 1/4] block: only bdi_unregister() in del_gendisk() if !GENHD_FL_HIDDEN Mike Snitzer
  2018-01-12 15:06 ` [for-4.16 PATCH v5 2/4] block: properly protect the 'queue' kobj in blk_unregister_queue Mike Snitzer
@ 2018-01-12 15:06 ` Mike Snitzer
  2018-01-12 15:06 ` [for-4.16 PATCH v5 4/4] dm: fix incomplete request_queue initialization Mike Snitzer
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 27+ messages in thread
From: Mike Snitzer @ 2018-01-12 15:06 UTC (permalink / raw)
  To: axboe
  Cc: Ming Lei, hare, Bart.VanAssche, David Jeffery, dm-devel, linux-block

Since I can remember DM has forced the block layer to allow the
allocation and initialization of the request_queue to be distinct
operations.  Reason for this is block/genhd.c:add_disk() has requires
that the request_queue (and associated bdi) be tied to the gendisk
before add_disk() is called -- because add_disk() also deals with
exposing the request_queue via blk_register_queue().

DM's dynamic creation of arbitrary device types (and associated
request_queue types) requires the DM device's gendisk be available so
that DM table loads can establish a master/slave relationship with
subordinate devices that are referenced by loaded DM tables -- using
bd_link_disk_holder().  But until these DM tables, and their associated
subordinate devices, are known DM cannot know what type of request_queue
it needs -- nor what its queue_limits should be.

This chicken and egg scenario has created all manner of problems for DM
and, at times, the block layer.

Summary of changes:

- Add device_add_disk_no_queue_reg() and add_disk_no_queue_reg() variant
  that drivers may use to add a disk without also calling
  blk_register_queue().  Driver must call blk_register_queue() once its
  request_queue is fully initialized.

- Return early from blk_unregister_queue() if QUEUE_FLAG_REGISTERED
  is not set.  It won't be set if driver used add_disk_no_queue_reg()
  but driver encounters an error and must del_gendisk() before calling
  blk_register_queue().

- Export blk_register_queue().

These changes allow DM to use add_disk_no_queue_reg() to anchor its
gendisk as the "master" for master/slave relationships DM must establish
with subordinate devices referenced in DM tables that get loaded.  Once
all "slave" devices for a DM device are known its request_queue can be
properly initialized and then advertised via sysfs -- important
improvement being that no request_queue resource initialization
performed by blk_register_queue() is missed for DM devices anymore.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-sysfs.c     |  5 +++++
 block/genhd.c         | 20 +++++++++++++++++---
 include/linux/genhd.h |  5 +++++
 3 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 9272452ff456..4a6a40ffd78e 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -921,6 +921,7 @@ int blk_register_queue(struct gendisk *disk)
 	mutex_unlock(&q->sysfs_lock);
 	return ret;
 }
+EXPORT_SYMBOL_GPL(blk_register_queue);
 
 void blk_unregister_queue(struct gendisk *disk)
 {
@@ -929,6 +930,10 @@ void blk_unregister_queue(struct gendisk *disk)
 	if (WARN_ON(!q))
 		return;
 
+	/* Return early if disk->queue was never registered. */
+	if (!test_bit(QUEUE_FLAG_REGISTERED, &q->queue_flags))
+		return;
+
 	/*
 	 * Protect against the 'queue' kobj being accessed
 	 * while/after it is removed.
diff --git a/block/genhd.c b/block/genhd.c
index 00620e01e043..d4aaf0cae9ad 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -629,7 +629,7 @@ static void register_disk(struct device *parent, struct gendisk *disk)
 }
 
 /**
- * device_add_disk - add partitioning information to kernel list
+ * device_add_disk_no_queue_reg - add partitioning information to kernel list
  * @parent: parent device for the disk
  * @disk: per-device partitioning information
  *
@@ -638,7 +638,7 @@ static void register_disk(struct device *parent, struct gendisk *disk)
  *
  * FIXME: error handling
  */
-void device_add_disk(struct device *parent, struct gendisk *disk)
+void device_add_disk_no_queue_reg(struct device *parent, struct gendisk *disk)
 {
 	dev_t devt;
 	int retval;
@@ -682,7 +682,6 @@ void device_add_disk(struct device *parent, struct gendisk *disk)
 				    exact_match, exact_lock, disk);
 	}
 	register_disk(parent, disk);
-	blk_register_queue(disk);
 
 	/*
 	 * Take an extra ref on queue which will be put on disk_release()
@@ -693,6 +692,21 @@ void device_add_disk(struct device *parent, struct gendisk *disk)
 	disk_add_events(disk);
 	blk_integrity_add(disk);
 }
+EXPORT_SYMBOL(device_add_disk_no_queue_reg);
+
+/**
+ * device_add_disk - add disk information to kernel list
+ * @parent: parent device for the disk
+ * @disk: per-device disk information
+ *
+ * Adds partitioning information and also registers the
+ * associated request_queue to @disk.
+ */
+void device_add_disk(struct device *parent, struct gendisk *disk)
+{
+	device_add_disk_no_queue_reg(parent, disk);
+	blk_register_queue(disk);
+}
 EXPORT_SYMBOL(device_add_disk);
 
 void del_gendisk(struct gendisk *disk)
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index 5144ebe046c9..5e3531027b51 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -395,6 +395,11 @@ static inline void add_disk(struct gendisk *disk)
 {
 	device_add_disk(NULL, disk);
 }
+extern void device_add_disk_no_queue_reg(struct device *parent, struct gendisk *disk);
+static inline void add_disk_no_queue_reg(struct gendisk *disk)
+{
+	device_add_disk_no_queue_reg(NULL, disk);
+}
 
 extern void del_gendisk(struct gendisk *gp);
 extern struct gendisk *get_gendisk(dev_t dev, int *partno);
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [for-4.16 PATCH v5 4/4] dm: fix incomplete request_queue initialization
  2018-01-12 15:06 [for-4.16 PATCH v5 0/4] block/dm: allow DM to defer blk_register_queue() until ready Mike Snitzer
                   ` (2 preceding siblings ...)
  2018-01-12 15:06 ` [for-4.16 PATCH v5 3/4] block: allow gendisk's request_queue registration to be deferred Mike Snitzer
@ 2018-01-12 15:06 ` Mike Snitzer
  2018-01-12 16:13 ` [for-4.16 PATCH v5 0/4] block/dm: allow DM to defer blk_register_queue() until ready Mike Snitzer
  2018-01-15 17:16   ` Bart Van Assche
  5 siblings, 0 replies; 27+ messages in thread
From: Mike Snitzer @ 2018-01-12 15:06 UTC (permalink / raw)
  To: axboe
  Cc: Ming Lei, hare, Bart.VanAssche, David Jeffery, dm-devel, linux-block

DM is no longer prone to having its request_queue be improperly
initialized.

Summary of changes:

- defer DM's blk_register_queue() from add_disk()-time until
  dm_setup_md_queue() by using add_disk_no_queue_reg() in alloc_dev().

- dm_setup_md_queue() is updated to fully initialize DM's request_queue
  (_after_ all table loads have occurred and the request_queue's type,
  features and limits are known).

A very welcome side-effect of these changes is DM no longer needs to:
1) backfill the "mq" sysfs entry (because historically DM didn't
initialize the request_queue to use blk-mq until _after_
blk_register_queue() was called via add_disk()).
2) call elv_register_queue() to get .request_fn request-based DM
device's "iosched" exposed in syfs.

In addition, blk-mq debugfs support is now made available because
request-based DM's blk-mq request_queue is now properly initialized
before dm_setup_md_queue() calls blk_register_queue().

These changes also stave off the need to introduce new DM-specific
workarounds in block core, e.g. this proposal:
https://patchwork.kernel.org/patch/10067961/

In the end DM devices should be less unicorn in nature (relative to
initialization and availability of block core infrastructure provided by
the request_queue).

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/md/dm-rq.c |  9 ---------
 drivers/md/dm.c    | 11 ++++++++++-
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 9d32f25489c2..c28357f5cb0e 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -713,8 +713,6 @@ int dm_old_init_request_queue(struct mapped_device *md, struct dm_table *t)
 		return error;
 	}
 
-	elv_register_queue(md->queue);
-
 	return 0;
 }
 
@@ -812,15 +810,8 @@ int dm_mq_init_request_queue(struct mapped_device *md, struct dm_table *t)
 	}
 	dm_init_md_queue(md);
 
-	/* backfill 'mq' sysfs registration normally done in blk_register_queue */
-	err = blk_mq_register_dev(disk_to_dev(md->disk), q);
-	if (err)
-		goto out_cleanup_queue;
-
 	return 0;
 
-out_cleanup_queue:
-	blk_cleanup_queue(q);
 out_tag_set:
 	blk_mq_free_tag_set(md->tag_set);
 out_kfree_tag_set:
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 7475739fee49..8c26bfc35335 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1761,7 +1761,7 @@ static struct mapped_device *alloc_dev(int minor)
 		goto bad;
 	md->dax_dev = dax_dev;
 
-	add_disk(md->disk);
+	add_disk_no_queue_reg(md->disk);
 	format_dev_t(md->name, MKDEV(_major, minor));
 
 	md->wq = alloc_workqueue("kdmflush", WQ_MEM_RECLAIM, 0);
@@ -2021,6 +2021,7 @@ EXPORT_SYMBOL_GPL(dm_get_queue_limits);
 int dm_setup_md_queue(struct mapped_device *md, struct dm_table *t)
 {
 	int r;
+	struct queue_limits limits;
 	enum dm_queue_mode type = dm_get_md_type(md);
 
 	switch (type) {
@@ -2057,6 +2058,14 @@ int dm_setup_md_queue(struct mapped_device *md, struct dm_table *t)
 		break;
 	}
 
+	r = dm_calculate_queue_limits(t, &limits);
+	if (r) {
+		DMERR("Cannot calculate initial queue limits");
+		return r;
+	}
+	dm_table_set_restrictions(t, md->queue, &limits);
+	blk_register_queue(md->disk);
+
 	return 0;
 }
 
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [for-4.16 PATCH v5 2/4] block: properly protect the 'queue' kobj in blk_unregister_queue
  2018-01-12 15:06 ` [for-4.16 PATCH v5 2/4] block: properly protect the 'queue' kobj in blk_unregister_queue Mike Snitzer
@ 2018-01-12 15:17   ` Ming Lei
  2018-01-12 15:29     ` Mike Snitzer
  2018-01-12 16:03   ` [for-4.16 PATCH v6 " Mike Snitzer
  1 sibling, 1 reply; 27+ messages in thread
From: Ming Lei @ 2018-01-12 15:17 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: axboe, Ming Lei, hare, Bart.VanAssche, David Jeffery, dm-devel,
	linux-block

On Fri, Jan 12, 2018 at 10:06:04AM -0500, Mike Snitzer wrote:
> The original commit e9a823fb34a8b (block: fix warning when I/O elevator
> is changed as request_queue is being removed) is pretty conflated.
> "conflated" because the resource being protected by q->sysfs_lock isn't
> the queue_flags (it is the 'queue' kobj).
> 
> q->sysfs_lock serializes __elevator_change() (via elv_iosched_store)
> from racing with blk_unregister_queue():
> 1) By holding q->sysfs_lock first, __elevator_change() can complete
> before a racing blk_unregister_queue().
> 2) Conversely, __elevator_change() is testing for QUEUE_FLAG_REGISTERED
> in case elv_iosched_store() loses the race with blk_unregister_queue(),
> it needs a way to know the 'queue' kobj isn't there.
> 
> Expand the scope of blk_unregister_queue()'s q->sysfs_lock use so it is
> held until after the 'queue' kobj is removed.

This way will cause deadlock, see blow.

> 
> Also, blk_unregister_queue() should use q->queue_lock to protect against
> any concurrent writes to q->queue_flags -- even though chances are the
> queue is being cleaned up so no concurrent writes are likely.
> 
> Fixes: e9a823fb34a8b ("block: fix warning when I/O elevator is changed as request_queue is being removed")
> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> ---
>  block/blk-sysfs.c | 13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 870484eaed1f..9272452ff456 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -929,12 +929,17 @@ void blk_unregister_queue(struct gendisk *disk)
>  	if (WARN_ON(!q))
>  		return;
>  
> +	/*
> +	 * Protect against the 'queue' kobj being accessed
> +	 * while/after it is removed.
> +	 */
>  	mutex_lock(&q->sysfs_lock);
> -	queue_flag_clear_unlocked(QUEUE_FLAG_REGISTERED, q);
> -	mutex_unlock(&q->sysfs_lock);
>  
> -	wbt_exit(q);
> +	spin_lock_irq(q->queue_lock);
> +	queue_flag_clear(QUEUE_FLAG_REGISTERED, q);
> +	spin_unlock_irq(q->queue_lock);
>  
> +	wbt_exit(q);
>  
>  	if (q->mq_ops)
>  		blk_mq_unregister_dev(disk_to_dev(disk), q);

void blk_mq_unregister_dev(struct device *dev, struct request_queue *q)
{
        mutex_lock(&q->sysfs_lock);
        __blk_mq_unregister_dev(dev, q);
        mutex_unlock(&q->sysfs_lock);
}

> @@ -946,4 +951,6 @@ void blk_unregister_queue(struct gendisk *disk)
>  	kobject_del(&q->kobj);
>  	blk_trace_remove_sysfs(disk_to_dev(disk));
>  	kobject_put(&disk_to_dev(disk)->kobj);
> +
> +	mutex_unlock(&q->sysfs_lock);
>  }

Except for above, I remember there is also lockdep warning between
sysfs_lock and driver core's lock if the sysfs_lock is extended in this
way(I tried it before, but forget the details now), so please just hold
queue_lock inside the sysfs lock.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [for-4.16 PATCH v5 2/4] block: properly protect the 'queue' kobj in blk_unregister_queue
  2018-01-12 15:17   ` Ming Lei
@ 2018-01-12 15:29     ` Mike Snitzer
  0 siblings, 0 replies; 27+ messages in thread
From: Mike Snitzer @ 2018-01-12 15:29 UTC (permalink / raw)
  To: Ming Lei
  Cc: axboe, Ming Lei, hare, Bart.VanAssche, David Jeffery, dm-devel,
	linux-block

On Fri, Jan 12 2018 at 10:17am -0500,
Ming Lei <ming.lei@redhat.com> wrote:

> On Fri, Jan 12, 2018 at 10:06:04AM -0500, Mike Snitzer wrote:
> > The original commit e9a823fb34a8b (block: fix warning when I/O elevator
> > is changed as request_queue is being removed) is pretty conflated.
> > "conflated" because the resource being protected by q->sysfs_lock isn't
> > the queue_flags (it is the 'queue' kobj).
> > 
> > q->sysfs_lock serializes __elevator_change() (via elv_iosched_store)
> > from racing with blk_unregister_queue():
> > 1) By holding q->sysfs_lock first, __elevator_change() can complete
> > before a racing blk_unregister_queue().
> > 2) Conversely, __elevator_change() is testing for QUEUE_FLAG_REGISTERED
> > in case elv_iosched_store() loses the race with blk_unregister_queue(),
> > it needs a way to know the 'queue' kobj isn't there.
> > 
> > Expand the scope of blk_unregister_queue()'s q->sysfs_lock use so it is
> > held until after the 'queue' kobj is removed.
> 
> This way will cause deadlock, see blow.

Ngh... I thought I tested blk-mq with this patch applied, apparently not.

> > 
> > Also, blk_unregister_queue() should use q->queue_lock to protect against
> > any concurrent writes to q->queue_flags -- even though chances are the
> > queue is being cleaned up so no concurrent writes are likely.
> > 
> > Fixes: e9a823fb34a8b ("block: fix warning when I/O elevator is changed as request_queue is being removed")
> > Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> > ---
> >  block/blk-sysfs.c | 13 ++++++++++---
> >  1 file changed, 10 insertions(+), 3 deletions(-)
> > 
> > diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> > index 870484eaed1f..9272452ff456 100644
> > --- a/block/blk-sysfs.c
> > +++ b/block/blk-sysfs.c
> > @@ -929,12 +929,17 @@ void blk_unregister_queue(struct gendisk *disk)
> >  	if (WARN_ON(!q))
> >  		return;
> >  
> > +	/*
> > +	 * Protect against the 'queue' kobj being accessed
> > +	 * while/after it is removed.
> > +	 */
> >  	mutex_lock(&q->sysfs_lock);
> > -	queue_flag_clear_unlocked(QUEUE_FLAG_REGISTERED, q);
> > -	mutex_unlock(&q->sysfs_lock);
> >  
> > -	wbt_exit(q);
> > +	spin_lock_irq(q->queue_lock);
> > +	queue_flag_clear(QUEUE_FLAG_REGISTERED, q);
> > +	spin_unlock_irq(q->queue_lock);
> >  
> > +	wbt_exit(q);
> >  
> >  	if (q->mq_ops)
> >  		blk_mq_unregister_dev(disk_to_dev(disk), q);
> 
> void blk_mq_unregister_dev(struct device *dev, struct request_queue *q)
> {
>         mutex_lock(&q->sysfs_lock);
>         __blk_mq_unregister_dev(dev, q);
>         mutex_unlock(&q->sysfs_lock);
> }
> 
> > @@ -946,4 +951,6 @@ void blk_unregister_queue(struct gendisk *disk)
> >  	kobject_del(&q->kobj);
> >  	blk_trace_remove_sysfs(disk_to_dev(disk));
> >  	kobject_put(&disk_to_dev(disk)->kobj);
> > +
> > +	mutex_unlock(&q->sysfs_lock);
> >  }
> 
> Except for above, I remember there is also lockdep warning between
> sysfs_lock and driver core's lock if the sysfs_lock is extended in this
> way(I tried it before, but forget the details now), so please just hold
> queue_lock inside the sysfs lock.

No good deed goes unpunished.

I'll fix this up.

Mike

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [for-4.16 PATCH v6 2/4] block: properly protect the 'queue' kobj in blk_unregister_queue
  2018-01-12 15:06 ` [for-4.16 PATCH v5 2/4] block: properly protect the 'queue' kobj in blk_unregister_queue Mike Snitzer
  2018-01-12 15:17   ` Ming Lei
@ 2018-01-12 16:03   ` Mike Snitzer
  2018-01-13 12:57     ` Ming Lei
  1 sibling, 1 reply; 27+ messages in thread
From: Mike Snitzer @ 2018-01-12 16:03 UTC (permalink / raw)
  To: axboe; +Cc: David Jeffery, Ming Lei, linux-block, dm-devel, Bart.VanAssche

The original commit e9a823fb34a8b (block: fix warning when I/O elevator
is changed as request_queue is being removed) is pretty conflated.
"conflated" because the resource being protected by q->sysfs_lock isn't
the queue_flags (it is the 'queue' kobj).

q->sysfs_lock serializes __elevator_change() (via elv_iosched_store)
from racing with blk_unregister_queue():
1) By holding q->sysfs_lock first, __elevator_change() can complete
before a racing blk_unregister_queue().
2) Conversely, __elevator_change() is testing for QUEUE_FLAG_REGISTERED
in case elv_iosched_store() loses the race with blk_unregister_queue(),
it needs a way to know the 'queue' kobj isn't there.

Expand the scope of blk_unregister_queue()'s q->sysfs_lock use so it is
held until after the 'queue' kobj is removed.

To do so blk_mq_unregister_dev() must not also take q->sysfs_lock.  So
rename __blk_mq_unregister_dev() to blk_mq_unregister_dev().

Also, blk_unregister_queue() should use q->queue_lock to protect against
any concurrent writes to q->queue_flags -- even though chances are the
queue is being cleaned up so no concurrent writes are likely.

Fixes: e9a823fb34a8b ("block: fix warning when I/O elevator is changed as request_queue is being removed")
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 block/blk-mq-sysfs.c |  9 +--------
 block/blk-sysfs.c    | 13 ++++++++++---
 2 files changed, 11 insertions(+), 11 deletions(-)

v6: blk_mq_unregister_dev now requires q->sysfs_lock be held, Ming: I am
not seeing any lockdep complaints with this.  I've tested bio-based,
blk-mq and old .request_fn request-based.

diff --git a/block/blk-mq-sysfs.c b/block/blk-mq-sysfs.c
index 79969c3c234f..a54b4b070f1c 100644
--- a/block/blk-mq-sysfs.c
+++ b/block/blk-mq-sysfs.c
@@ -248,7 +248,7 @@ static int blk_mq_register_hctx(struct blk_mq_hw_ctx *hctx)
 	return ret;
 }
 
-static void __blk_mq_unregister_dev(struct device *dev, struct request_queue *q)
+void blk_mq_unregister_dev(struct device *dev, struct request_queue *q)
 {
 	struct blk_mq_hw_ctx *hctx;
 	int i;
@@ -265,13 +265,6 @@ static void __blk_mq_unregister_dev(struct device *dev, struct request_queue *q)
 	q->mq_sysfs_init_done = false;
 }
 
-void blk_mq_unregister_dev(struct device *dev, struct request_queue *q)
-{
-	mutex_lock(&q->sysfs_lock);
-	__blk_mq_unregister_dev(dev, q);
-	mutex_unlock(&q->sysfs_lock);
-}
-
 void blk_mq_hctx_kobj_init(struct blk_mq_hw_ctx *hctx)
 {
 	kobject_init(&hctx->kobj, &blk_mq_hw_ktype);
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 870484eaed1f..9272452ff456 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -929,12 +929,17 @@ void blk_unregister_queue(struct gendisk *disk)
 	if (WARN_ON(!q))
 		return;
 
+	/*
+	 * Protect against the 'queue' kobj being accessed
+	 * while/after it is removed.
+	 */
 	mutex_lock(&q->sysfs_lock);
-	queue_flag_clear_unlocked(QUEUE_FLAG_REGISTERED, q);
-	mutex_unlock(&q->sysfs_lock);
 
-	wbt_exit(q);
+	spin_lock_irq(q->queue_lock);
+	queue_flag_clear(QUEUE_FLAG_REGISTERED, q);
+	spin_unlock_irq(q->queue_lock);
 
+	wbt_exit(q);
 
 	if (q->mq_ops)
 		blk_mq_unregister_dev(disk_to_dev(disk), q);
@@ -946,4 +951,6 @@ void blk_unregister_queue(struct gendisk *disk)
 	kobject_del(&q->kobj);
 	blk_trace_remove_sysfs(disk_to_dev(disk));
 	kobject_put(&disk_to_dev(disk)->kobj);
+
+	mutex_unlock(&q->sysfs_lock);
 }
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [for-4.16 PATCH v5 0/4] block/dm: allow DM to defer blk_register_queue() until ready
  2018-01-12 15:06 [for-4.16 PATCH v5 0/4] block/dm: allow DM to defer blk_register_queue() until ready Mike Snitzer
                   ` (3 preceding siblings ...)
  2018-01-12 15:06 ` [for-4.16 PATCH v5 4/4] dm: fix incomplete request_queue initialization Mike Snitzer
@ 2018-01-12 16:13 ` Mike Snitzer
  2018-01-15 17:16   ` Bart Van Assche
  5 siblings, 0 replies; 27+ messages in thread
From: Mike Snitzer @ 2018-01-12 16:13 UTC (permalink / raw)
  To: axboe; +Cc: David Jeffery, Ming Lei, linux-block, dm-devel, Bart.VanAssche

On Fri, Jan 12 2018 at 10:06am -0500,
Mike Snitzer <snitzer@redhat.com> wrote:

> Hi Jens,
> 
> I'm submitting this v5 with more feeling now ;)

Of course PATCH 02/04 needed a v6.

> I've distilled the changes down to be quite minimal.  Hopefully this
> will help you and others review.

I could make it even simpler by dropping 02/04 entirely.  But I do think
it worthwhile to try to get 02/04 in at this time too.

> I've also done a dry-run of applying 4.16 block changes and then
> merging dm-4.16; no merge conflicts:
> https://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git/log/?h=block-4.16
> https://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git/log/?h=dm-4.16
> https://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git/log/?h=block-4.16_dm-4.16
> 
> And all tests very well for me.
> 
> Please consider for 4.16, thanks!

Anyway, I've genuinely tested with blk-mq now.  Sorry for all the
iterations on this patchset in such rapid succession.  Obviously I know
it doesn't instill confidence...

But I really would appreciate you giving this patchset serious
consideration.

DM is long overdue for this fix.

Thanks,
Mike

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [for-4.16 PATCH v6 2/4] block: properly protect the 'queue' kobj in blk_unregister_queue
  2018-01-12 16:03   ` [for-4.16 PATCH v6 " Mike Snitzer
@ 2018-01-13 12:57     ` Ming Lei
  0 siblings, 0 replies; 27+ messages in thread
From: Ming Lei @ 2018-01-13 12:57 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: axboe, David Jeffery, Ming Lei, linux-block, dm-devel, Bart.VanAssche

On Fri, Jan 12, 2018 at 11:03:52AM -0500, Mike Snitzer wrote:
> The original commit e9a823fb34a8b (block: fix warning when I/O elevator
> is changed as request_queue is being removed) is pretty conflated.
> "conflated" because the resource being protected by q->sysfs_lock isn't
> the queue_flags (it is the 'queue' kobj).
> 
> q->sysfs_lock serializes __elevator_change() (via elv_iosched_store)
> from racing with blk_unregister_queue():
> 1) By holding q->sysfs_lock first, __elevator_change() can complete
> before a racing blk_unregister_queue().
> 2) Conversely, __elevator_change() is testing for QUEUE_FLAG_REGISTERED
> in case elv_iosched_store() loses the race with blk_unregister_queue(),
> it needs a way to know the 'queue' kobj isn't there.
> 
> Expand the scope of blk_unregister_queue()'s q->sysfs_lock use so it is
> held until after the 'queue' kobj is removed.
> 
> To do so blk_mq_unregister_dev() must not also take q->sysfs_lock.  So
> rename __blk_mq_unregister_dev() to blk_mq_unregister_dev().
> 
> Also, blk_unregister_queue() should use q->queue_lock to protect against
> any concurrent writes to q->queue_flags -- even though chances are the
> queue is being cleaned up so no concurrent writes are likely.
> 
> Fixes: e9a823fb34a8b ("block: fix warning when I/O elevator is changed as request_queue is being removed")
> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> ---
>  block/blk-mq-sysfs.c |  9 +--------
>  block/blk-sysfs.c    | 13 ++++++++++---
>  2 files changed, 11 insertions(+), 11 deletions(-)
> 
> v6: blk_mq_unregister_dev now requires q->sysfs_lock be held, Ming: I am
> not seeing any lockdep complaints with this.  I've tested bio-based,
> blk-mq and old .request_fn request-based.
> 
> diff --git a/block/blk-mq-sysfs.c b/block/blk-mq-sysfs.c
> index 79969c3c234f..a54b4b070f1c 100644
> --- a/block/blk-mq-sysfs.c
> +++ b/block/blk-mq-sysfs.c
> @@ -248,7 +248,7 @@ static int blk_mq_register_hctx(struct blk_mq_hw_ctx *hctx)
>  	return ret;
>  }
>  
> -static void __blk_mq_unregister_dev(struct device *dev, struct request_queue *q)
> +void blk_mq_unregister_dev(struct device *dev, struct request_queue *q)
>  {
>  	struct blk_mq_hw_ctx *hctx;
>  	int i;
> @@ -265,13 +265,6 @@ static void __blk_mq_unregister_dev(struct device *dev, struct request_queue *q)
>  	q->mq_sysfs_init_done = false;
>  }
>  
> -void blk_mq_unregister_dev(struct device *dev, struct request_queue *q)
> -{
> -	mutex_lock(&q->sysfs_lock);
> -	__blk_mq_unregister_dev(dev, q);
> -	mutex_unlock(&q->sysfs_lock);
> -}
> -
>  void blk_mq_hctx_kobj_init(struct blk_mq_hw_ctx *hctx)
>  {
>  	kobject_init(&hctx->kobj, &blk_mq_hw_ktype);
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 870484eaed1f..9272452ff456 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -929,12 +929,17 @@ void blk_unregister_queue(struct gendisk *disk)
>  	if (WARN_ON(!q))
>  		return;
>  
> +	/*
> +	 * Protect against the 'queue' kobj being accessed
> +	 * while/after it is removed.
> +	 */
>  	mutex_lock(&q->sysfs_lock);
> -	queue_flag_clear_unlocked(QUEUE_FLAG_REGISTERED, q);
> -	mutex_unlock(&q->sysfs_lock);
>  
> -	wbt_exit(q);
> +	spin_lock_irq(q->queue_lock);
> +	queue_flag_clear(QUEUE_FLAG_REGISTERED, q);
> +	spin_unlock_irq(q->queue_lock);
>  
> +	wbt_exit(q);
>  
>  	if (q->mq_ops)
>  		blk_mq_unregister_dev(disk_to_dev(disk), q);
> @@ -946,4 +951,6 @@ void blk_unregister_queue(struct gendisk *disk)
>  	kobject_del(&q->kobj);
>  	blk_trace_remove_sysfs(disk_to_dev(disk));
>  	kobject_put(&disk_to_dev(disk)->kobj);
> +
> +	mutex_unlock(&q->sysfs_lock);
>  }

Reviewed-by: Ming Lei <ming.lei@redhat.com>

-- 
Ming

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [for-4.16 PATCH v5 0/4] block/dm: allow DM to defer blk_register_queue() until ready
  2018-01-12 15:06 [for-4.16 PATCH v5 0/4] block/dm: allow DM to defer blk_register_queue() until ready Mike Snitzer
@ 2018-01-15 17:16   ` Bart Van Assche
  2018-01-12 15:06 ` [for-4.16 PATCH v5 2/4] block: properly protect the 'queue' kobj in blk_unregister_queue Mike Snitzer
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 27+ messages in thread
From: Bart Van Assche @ 2018-01-15 17:16 UTC (permalink / raw)
  To: snitzer, axboe; +Cc: dm-devel, hare, tom.leiming, djeffery, linux-block

T24gRnJpLCAyMDE4LTAxLTEyIGF0IDEwOjA2IC0wNTAwLCBNaWtlIFNuaXR6ZXIgd3JvdGU6DQo+
IEknbSBzdWJtaXR0aW5nIHRoaXMgdjUgd2l0aCBtb3JlIGZlZWxpbmcgbm93IDspDQoNCkhlbGxv
IE1pa2UsDQoNCkhhdmUgdGhlc2UgcGF0Y2hlcyBiZWVuIHRlc3RlZCB3aXRoIGxvY2tkZXAgZW5h
YmxlZD8gVGhlIGZvbGxvd2luZyBhcHBlYXJlZCBpbg0KdGhlIGtlcm5lbCBsb2cgd2hlbiBhZnRl
ciBJIHN0YXJ0ZWQgdGVzdGluZyBKZW5zJyBmb3ItbmV4dCB0cmVlIG9mIHRoaXMgbW9ybmluZzoN
Cg0KPT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09
DQpXQVJOSU5HOiBwb3NzaWJsZSBjaXJjdWxhciBsb2NraW5nIGRlcGVuZGVuY3kgZGV0ZWN0ZWQN
CjQuMTUuMC1yYzctZGJnKyAjMSBOb3QgdGFpbnRlZA0KLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t
LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tDQowMi1tcS8xMjExIGlzIHRyeWluZyB0byBh
Y3F1aXJlIGxvY2s6DQogKCZxLT5zeXNmc19sb2NrKXsrLisufSwgYXQ6IFs8MDAwMDAwMDA4YjY1
YmRhZD5dIHF1ZXVlX2F0dHJfc3RvcmUrMHgzNS8weDgwDQoNCmJ1dCB0YXNrIGlzIGFscmVhZHkg
aG9sZGluZyBsb2NrOg0KIChrbi0+Y291bnQjMjEzKXsrKysrfSwgYXQ6IFs8MDAwMDAwMDA3YTE4
YWQxOD5dIGtlcm5mc19mb3Bfd3JpdGUrMHhlNS8weDE5MA0KDQp3aGljaCBsb2NrIGFscmVhZHkg
ZGVwZW5kcyBvbiB0aGUgbmV3IGxvY2suDQoNCg0KdGhlIGV4aXN0aW5nIGRlcGVuZGVuY3kgY2hh
aW4gKGluIHJldmVyc2Ugb3JkZXIpIGlzOg0KDQotPiAjMSAoa24tPmNvdW50IzIxMyl7KysrK306
DQogICAgICAga2VybmZzX3JlbW92ZSsweDFhLzB4MzANCiAgICAgICBrb2JqZWN0X2RlbC5wYXJ0
LjMrMHhlLzB4NDANCiAgICAgICBibGtfdW5yZWdpc3Rlcl9xdWV1ZSsweGE3LzB4ZTANCiAgICAg
ICBkZWxfZ2VuZGlzaysweDEyZi8weDI2MA0KICAgICAgIHNkX3JlbW92ZSsweDU4LzB4YzAgW3Nk
X21vZF0NCiAgICAgICBkZXZpY2VfcmVsZWFzZV9kcml2ZXJfaW50ZXJuYWwrMHgxNWEvMHgyMjAN
CiAgICAgICBidXNfcmVtb3ZlX2RldmljZSsweGY0LzB4MTcwDQogICAgICAgZGV2aWNlX2RlbCsw
eDEyZi8weDMzMA0KICAgICAgIF9fc2NzaV9yZW1vdmVfZGV2aWNlKzB4ZWYvMHgxMjAgW3Njc2lf
bW9kXQ0KICAgICAgIHNjc2lfZm9yZ2V0X2hvc3QrMHgxYi8weDYwIFtzY3NpX21vZF0NCiAgICAg
ICBzY3NpX3JlbW92ZV9ob3N0KzB4NmYvMHgxMTAgW3Njc2lfbW9kXQ0KICAgICAgIDB4ZmZmZmZm
ZmZjMDllZDZlNA0KICAgICAgIHByb2Nlc3Nfb25lX3dvcmsrMHgyMWMvMHg2ZDANCiAgICAgICB3
b3JrZXJfdGhyZWFkKzB4MzUvMHgzODANCiAgICAgICBrdGhyZWFkKzB4MTE3LzB4MTMwDQogICAg
ICAgcmV0X2Zyb21fZm9yaysweDI0LzB4MzANCg0KLT4gIzAgKCZxLT5zeXNmc19sb2NrKXsrLisu
fToNCiAgICAgICBfX211dGV4X2xvY2srMHg2Yy8weDllMA0KICAgICAgIHF1ZXVlX2F0dHJfc3Rv
cmUrMHgzNS8weDgwDQogICAgICAga2VybmZzX2ZvcF93cml0ZSsweDEwOS8weDE5MA0KICAgICAg
IF9fdmZzX3dyaXRlKzB4MWUvMHgxMzANCiAgICAgICB2ZnNfd3JpdGUrMHhiOS8weDFiMA0KICAg
ICAgIFN5U193cml0ZSsweDQwLzB4YTANCiAgICAgICBlbnRyeV9TWVNDQUxMXzY0X2Zhc3RwYXRo
KzB4MjMvMHg5YQ0KDQpvdGhlciBpbmZvIHRoYXQgbWlnaHQgaGVscCB1cyBkZWJ1ZyB0aGlzOg0K
DQogUG9zc2libGUgdW5zYWZlIGxvY2tpbmcgc2NlbmFyaW86DQoNCiAgICAgICBDUFUwICAgICAg
ICAgICAgICAgICAgICBDUFUxDQogICAgICAgLS0tLSAgICAgICAgICAgICAgICAgICAgLS0tLQ0K
ICBsb2NrKGtuLT5jb3VudCMyMTMpOw0KICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIGxv
Y2soJnEtPnN5c2ZzX2xvY2spOw0KICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIGxvY2so
a24tPmNvdW50IzIxMyk7DQogIGxvY2soJnEtPnN5c2ZzX2xvY2spOw0KDQogKioqIERFQURMT0NL
ICoqKg0KDQozIGxvY2tzIGhlbGQgYnkgMDItbXEvMTIxMToNCiAjMDogIChzYl93cml0ZXJzIzYp
ey4rLit9LCBhdDogWzwwMDAwMDAwMGFmZGI2MWQzPl0gdmZzX3dyaXRlKzB4MTdmLzB4MWIwDQog
IzE6ICAoJm9mLT5tdXRleCl7Ky4rLn0sIGF0OiBbPDAwMDAwMDAwYjI5MWNhYmI+XSBrZXJuZnNf
Zm9wX3dyaXRlKzB4ZGQvMHgxOTANCiAjMjogIChrbi0+Y291bnQjMjEzKXsrKysrfSwgYXQ6IFs8
MDAwMDAwMDA3YTE4YWQxOD5dIGtlcm5mc19mb3Bfd3JpdGUrMHhlNS8weDE5MA0KDQpzdGFjayBi
YWNrdHJhY2U6DQpDUFU6IDIgUElEOiAxMjExIENvbW06IDAyLW1xIE5vdCB0YWludGVkIDQuMTUu
MC1yYzctZGJnKyAjMQ0KSGFyZHdhcmUgbmFtZTogUUVNVSBTdGFuZGFyZCBQQyAoaTQ0MEZYICsg
UElJWCwgMTk5NiksIEJJT1MgMS4wLjAtcHJlYnVpbHQucWVtdS1wcm9qZWN0Lm9yZyAwNC8wMS8y
MDE0DQpDYWxsIFRyYWNlOg0KIGR1bXBfc3RhY2srMHg4NS8weGJmDQogcHJpbnRfY2lyY3VsYXJf
YnVnLmlzcmEuMzUrMHgxZDcvMHgxZTQNCiBfX2xvY2tfYWNxdWlyZSsweDEyYWYvMHgxMzcwDQog
PyBsb2NrX2FjcXVpcmUrMHhhOC8weDIzMA0KIGxvY2tfYWNxdWlyZSsweGE4LzB4MjMwDQogPyBx
dWV1ZV9hdHRyX3N0b3JlKzB4MzUvMHg4MA0KID8gcXVldWVfYXR0cl9zdG9yZSsweDM1LzB4ODAN
CiBfX211dGV4X2xvY2srMHg2Yy8weDllMA0KID8gcXVldWVfYXR0cl9zdG9yZSsweDM1LzB4ODAN
CiA/IGtlcm5mc19mb3Bfd3JpdGUrMHhkZC8weDE5MA0KID8gX19tdXRleF9sb2NrKzB4NmMvMHg5
ZTANCiA/IF9fbXV0ZXhfbG9jaysweDNkMC8weDllMA0KID8gbG9ja19hY3F1aXJlKzB4YTgvMHgy
MzANCiA/IF9fbG9ja19pc19oZWxkKzB4NTUvMHg5MA0KID8gcXVldWVfYXR0cl9zdG9yZSsweDM1
LzB4ODANCiBxdWV1ZV9hdHRyX3N0b3JlKzB4MzUvMHg4MA0KIGtlcm5mc19mb3Bfd3JpdGUrMHgx
MDkvMHgxOTANCiBfX3Zmc193cml0ZSsweDFlLzB4MTMwDQogPyByY3VfcmVhZF9sb2NrX3NjaGVk
X2hlbGQrMHg2Ni8weDcwDQogPyByY3Vfc3luY19sb2NrZGVwX2Fzc2VydCsweDIzLzB4NTANCiA/
IF9fc2Jfc3RhcnRfd3JpdGUrMHgxNTIvMHgxZjANCiA/IF9fc2Jfc3RhcnRfd3JpdGUrMHgxNjgv
MHgxZjANCiB2ZnNfd3JpdGUrMHhiOS8weDFiMA0KIFN5U193cml0ZSsweDQwLzB4YTANCiBlbnRy
eV9TWVNDQUxMXzY0X2Zhc3RwYXRoKzB4MjMvMHg5YQ0KUklQOiAwMDMzOjB4N2ZhYjhiZThkMDU0
DQpSU1A6IDAwMmI6MDAwMDdmZmQzOWYwZGVmOCBFRkxBR1M6IDAwMDAwMjQ2DQoNClRoYW5rcywN
Cg0KQmFydC4=

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [for-4.16 PATCH v5 0/4] block/dm: allow DM to defer blk_register_queue() until ready
@ 2018-01-15 17:16   ` Bart Van Assche
  0 siblings, 0 replies; 27+ messages in thread
From: Bart Van Assche @ 2018-01-15 17:16 UTC (permalink / raw)
  To: snitzer, axboe; +Cc: linux-block, tom.leiming, dm-devel, djeffery

On Fri, 2018-01-12 at 10:06 -0500, Mike Snitzer wrote:
> I'm submitting this v5 with more feeling now ;)

Hello Mike,

Have these patches been tested with lockdep enabled? The following appeared in
the kernel log when after I started testing Jens' for-next tree of this morning:

======================================================
WARNING: possible circular locking dependency detected
4.15.0-rc7-dbg+ #1 Not tainted
------------------------------------------------------
02-mq/1211 is trying to acquire lock:
 (&q->sysfs_lock){+.+.}, at: [<000000008b65bdad>] queue_attr_store+0x35/0x80

but task is already holding lock:
 (kn->count#213){++++}, at: [<000000007a18ad18>] kernfs_fop_write+0xe5/0x190

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (kn->count#213){++++}:
       kernfs_remove+0x1a/0x30
       kobject_del.part.3+0xe/0x40
       blk_unregister_queue+0xa7/0xe0
       del_gendisk+0x12f/0x260
       sd_remove+0x58/0xc0 [sd_mod]
       device_release_driver_internal+0x15a/0x220
       bus_remove_device+0xf4/0x170
       device_del+0x12f/0x330
       __scsi_remove_device+0xef/0x120 [scsi_mod]
       scsi_forget_host+0x1b/0x60 [scsi_mod]
       scsi_remove_host+0x6f/0x110 [scsi_mod]
       0xffffffffc09ed6e4
       process_one_work+0x21c/0x6d0
       worker_thread+0x35/0x380
       kthread+0x117/0x130
       ret_from_fork+0x24/0x30

-> #0 (&q->sysfs_lock){+.+.}:
       __mutex_lock+0x6c/0x9e0
       queue_attr_store+0x35/0x80
       kernfs_fop_write+0x109/0x190
       __vfs_write+0x1e/0x130
       vfs_write+0xb9/0x1b0
       SyS_write+0x40/0xa0
       entry_SYSCALL_64_fastpath+0x23/0x9a

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(kn->count#213);
                               lock(&q->sysfs_lock);
                               lock(kn->count#213);
  lock(&q->sysfs_lock);

 *** DEADLOCK ***

3 locks held by 02-mq/1211:
 #0:  (sb_writers#6){.+.+}, at: [<00000000afdb61d3>] vfs_write+0x17f/0x1b0
 #1:  (&of->mutex){+.+.}, at: [<00000000b291cabb>] kernfs_fop_write+0xdd/0x190
 #2:  (kn->count#213){++++}, at: [<000000007a18ad18>] kernfs_fop_write+0xe5/0x190

stack backtrace:
CPU: 2 PID: 1211 Comm: 02-mq Not tainted 4.15.0-rc7-dbg+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
Call Trace:
 dump_stack+0x85/0xbf
 print_circular_bug.isra.35+0x1d7/0x1e4
 __lock_acquire+0x12af/0x1370
 ? lock_acquire+0xa8/0x230
 lock_acquire+0xa8/0x230
 ? queue_attr_store+0x35/0x80
 ? queue_attr_store+0x35/0x80
 __mutex_lock+0x6c/0x9e0
 ? queue_attr_store+0x35/0x80
 ? kernfs_fop_write+0xdd/0x190
 ? __mutex_lock+0x6c/0x9e0
 ? __mutex_lock+0x3d0/0x9e0
 ? lock_acquire+0xa8/0x230
 ? __lock_is_held+0x55/0x90
 ? queue_attr_store+0x35/0x80
 queue_attr_store+0x35/0x80
 kernfs_fop_write+0x109/0x190
 __vfs_write+0x1e/0x130
 ? rcu_read_lock_sched_held+0x66/0x70
 ? rcu_sync_lockdep_assert+0x23/0x50
 ? __sb_start_write+0x152/0x1f0
 ? __sb_start_write+0x168/0x1f0
 vfs_write+0xb9/0x1b0
 SyS_write+0x40/0xa0
 entry_SYSCALL_64_fastpath+0x23/0x9a
RIP: 0033:0x7fab8be8d054
RSP: 002b:00007ffd39f0def8 EFLAGS: 00000246

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [for-4.16 PATCH v5 0/4] block/dm: allow DM to defer blk_register_queue() until ready
  2018-01-15 17:16   ` Bart Van Assche
  (?)
@ 2018-01-15 17:29   ` Mike Snitzer
  2018-01-15 17:36       ` Bart Van Assche
  -1 siblings, 1 reply; 27+ messages in thread
From: Mike Snitzer @ 2018-01-15 17:29 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: axboe, dm-devel, hare, tom.leiming, djeffery, linux-block

On Mon, Jan 15 2018 at 12:16pm -0500,
Bart Van Assche <Bart.VanAssche@wdc.com> wrote:

> On Fri, 2018-01-12 at 10:06 -0500, Mike Snitzer wrote:
> > I'm submitting this v5 with more feeling now ;)
> 
> Hello Mike,
> 
> Have these patches been tested with lockdep enabled? The following appeared in
> the kernel log when after I started testing Jens' for-next tree of this morning:

So you replied to v5, I emailed a v6 out for the relevant patch.  Just
want to make sure you're testing with either Jens' latest tree or are
using my v6 that fixed blk_mq_unregister_dev() to require caller holds
q->sysfs_lock ?

With the latest code that Jens merged, I do have lockdep enabled in my
kernel... but haven't seen any issues on the 2 testbeds I've been
hammering the changes on.

Ming was concerned about the potential for a lockdep splat (said he
tried expanding the scope of q->sysfs_lock in blk_unregister_queue
before and in doing so got lockdep to trigger).

Once you respond with what it is you're testing, and if it is latest
code, I'll look closer at what you've provided and see if it is real or
that lockdep just needs some training.

Thanks,
Mike

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [for-4.16 PATCH v5 0/4] block/dm: allow DM to defer blk_register_queue() until ready
  2018-01-15 17:29   ` Mike Snitzer
@ 2018-01-15 17:36       ` Bart Van Assche
  0 siblings, 0 replies; 27+ messages in thread
From: Bart Van Assche @ 2018-01-15 17:36 UTC (permalink / raw)
  To: snitzer; +Cc: dm-devel, linux-block, hare, tom.leiming, djeffery, axboe

T24gTW9uLCAyMDE4LTAxLTE1IGF0IDEyOjI5IC0wNTAwLCBNaWtlIFNuaXR6ZXIgd3JvdGU6DQo+
IFNvIHlvdSByZXBsaWVkIHRvIHY1LCBJIGVtYWlsZWQgYSB2NiBvdXQgZm9yIHRoZSByZWxldmFu
dCBwYXRjaC4gIEp1c3QNCj4gd2FudCB0byBtYWtlIHN1cmUgeW91J3JlIHRlc3Rpbmcgd2l0aCBl
aXRoZXIgSmVucycgbGF0ZXN0IHRyZWUgb3IgYXJlDQo+IHVzaW5nIG15IHY2IHRoYXQgZml4ZWQg
YmxrX21xX3VucmVnaXN0ZXJfZGV2KCkgdG8gcmVxdWlyZSBjYWxsZXIgaG9sZHMNCj4gcS0+c3lz
ZnNfbG9jayA/DQoNCkhlbGxvIE1pa2UsDQoNCkluIG15IHRlc3QgSSB3YXMgdXNpbmcgSmVucycg
bGF0ZXN0IGZvci1uZXh0IHRyZWUgKGNvbW1pdCA1NjM4NzdhZTdkYWUpLg0KDQpCYXJ0Lg==

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [for-4.16 PATCH v5 0/4] block/dm: allow DM to defer blk_register_queue() until ready
@ 2018-01-15 17:36       ` Bart Van Assche
  0 siblings, 0 replies; 27+ messages in thread
From: Bart Van Assche @ 2018-01-15 17:36 UTC (permalink / raw)
  To: snitzer; +Cc: dm-devel, linux-block, hare, tom.leiming, djeffery, axboe

On Mon, 2018-01-15 at 12:29 -0500, Mike Snitzer wrote:
> So you replied to v5, I emailed a v6 out for the relevant patch.  Just
> want to make sure you're testing with either Jens' latest tree or are
> using my v6 that fixed blk_mq_unregister_dev() to require caller holds
> q->sysfs_lock ?

Hello Mike,

In my test I was using Jens' latest for-next tree (commit 563877ae7dae).

Bart.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [for-4.16 PATCH v5 0/4] block/dm: allow DM to defer blk_register_queue() until ready
  2018-01-15 17:36       ` Bart Van Assche
  (?)
@ 2018-01-15 17:48       ` Mike Snitzer
  2018-01-15 17:51           ` Bart Van Assche
  -1 siblings, 1 reply; 27+ messages in thread
From: Mike Snitzer @ 2018-01-15 17:48 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: dm-devel, linux-block, hare, tom.leiming, djeffery, axboe

On Mon, Jan 15 2018 at 12:36pm -0500,
Bart Van Assche <Bart.VanAssche@wdc.com> wrote:

> On Mon, 2018-01-15 at 12:29 -0500, Mike Snitzer wrote:
> > So you replied to v5, I emailed a v6 out for the relevant patch.  Just
> > want to make sure you're testing with either Jens' latest tree or are
> > using my v6 that fixed blk_mq_unregister_dev() to require caller holds
> > q->sysfs_lock ?
> 
> Hello Mike,
> 
> In my test I was using Jens' latest for-next tree (commit 563877ae7dae).

OK, that includes his latest for-4.16/block (commit c100ec49fdd22228) so
I'll take a closer look at this lockdep splat.

Do I need to do something more to enable lockdep aside from set
CONFIG_LOCKDEP_SUPPORT=y ?

Thanks,
Mike

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [for-4.16 PATCH v5 0/4] block/dm: allow DM to defer blk_register_queue() until ready
  2018-01-15 17:48       ` Mike Snitzer
@ 2018-01-15 17:51           ` Bart Van Assche
  0 siblings, 0 replies; 27+ messages in thread
From: Bart Van Assche @ 2018-01-15 17:51 UTC (permalink / raw)
  To: snitzer; +Cc: dm-devel, linux-block, hare, tom.leiming, axboe, djeffery

T24gTW9uLCAyMDE4LTAxLTE1IGF0IDEyOjQ4IC0wNTAwLCBNaWtlIFNuaXR6ZXIgd3JvdGU6DQo+
IERvIEkgbmVlZCB0byBkbyBzb21ldGhpbmcgbW9yZSB0byBlbmFibGUgbG9ja2RlcCBhc2lkZSBm
cm9tIHNldA0KPiBDT05GSUdfTE9DS0RFUF9TVVBQT1JUPXkgPw0KDQpIZWxsbyBNaWtlLA0KDQpJ
IHRoaW5rIHlvdSBhbHNvIG5lZWQgdG8gc2V0IENPTkZJR19QUk9WRV9MT0NLSU5HPXkuDQoNCkJh
cnQu

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [for-4.16 PATCH v5 0/4] block/dm: allow DM to defer blk_register_queue() until ready
@ 2018-01-15 17:51           ` Bart Van Assche
  0 siblings, 0 replies; 27+ messages in thread
From: Bart Van Assche @ 2018-01-15 17:51 UTC (permalink / raw)
  To: snitzer; +Cc: dm-devel, linux-block, hare, tom.leiming, axboe, djeffery

On Mon, 2018-01-15 at 12:48 -0500, Mike Snitzer wrote:
> Do I need to do something more to enable lockdep aside from set
> CONFIG_LOCKDEP_SUPPORT=y ?

Hello Mike,

I think you also need to set CONFIG_PROVE_LOCKING=y.

Bart.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [for-4.16 PATCH v5 0/4] block/dm: allow DM to defer blk_register_queue() until ready
  2018-01-15 17:51           ` Bart Van Assche
  (?)
@ 2018-01-15 17:57           ` Mike Snitzer
  -1 siblings, 0 replies; 27+ messages in thread
From: Mike Snitzer @ 2018-01-15 17:57 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: dm-devel, linux-block, hare, tom.leiming, axboe, djeffery

On Mon, Jan 15 2018 at 12:51pm -0500,
Bart Van Assche <Bart.VanAssche@wdc.com> wrote:

> On Mon, 2018-01-15 at 12:48 -0500, Mike Snitzer wrote:
> > Do I need to do something more to enable lockdep aside from set
> > CONFIG_LOCKDEP_SUPPORT=y ?
> 
> Hello Mike,
> 
> I think you also need to set CONFIG_PROVE_LOCKING=y.

Ah ok, was missing that, thanks.

Mike

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [for-4.16 PATCH v5 0/4] block/dm: allow DM to defer blk_register_queue() until ready
  2018-01-15 17:16   ` Bart Van Assche
  (?)
  (?)
@ 2018-01-15 22:15   ` Mike Snitzer
  2018-01-15 22:51       ` Bart Van Assche
  -1 siblings, 1 reply; 27+ messages in thread
From: Mike Snitzer @ 2018-01-15 22:15 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: axboe, dm-devel, hare, tom.leiming, djeffery, linux-block

On Mon, Jan 15 2018 at 12:16pm -0500,
Bart Van Assche <Bart.VanAssche@wdc.com> wrote:

> On Fri, 2018-01-12 at 10:06 -0500, Mike Snitzer wrote:
> > I'm submitting this v5 with more feeling now ;)
> 
> Hello Mike,
> 
> Have these patches been tested with lockdep enabled? The following appeared in
> the kernel log when after I started testing Jens' for-next tree of this morning:
> 
> ======================================================
> WARNING: possible circular locking dependency detected
> 4.15.0-rc7-dbg+ #1 Not tainted
> ------------------------------------------------------
> 02-mq/1211 is trying to acquire lock:
>  (&q->sysfs_lock){+.+.}, at: [<000000008b65bdad>] queue_attr_store+0x35/0x80
> 
> but task is already holding lock:
>  (kn->count#213){++++}, at: [<000000007a18ad18>] kernfs_fop_write+0xe5/0x190
> 
> which lock already depends on the new lock.
> 
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #1 (kn->count#213){++++}:
>        kernfs_remove+0x1a/0x30
>        kobject_del.part.3+0xe/0x40
>        blk_unregister_queue+0xa7/0xe0
>        del_gendisk+0x12f/0x260
>        sd_remove+0x58/0xc0 [sd_mod]
>        device_release_driver_internal+0x15a/0x220
>        bus_remove_device+0xf4/0x170
>        device_del+0x12f/0x330
>        __scsi_remove_device+0xef/0x120 [scsi_mod]
>        scsi_forget_host+0x1b/0x60 [scsi_mod]
>        scsi_remove_host+0x6f/0x110 [scsi_mod]
>        0xffffffffc09ed6e4
>        process_one_work+0x21c/0x6d0
>        worker_thread+0x35/0x380
>        kthread+0x117/0x130
>        ret_from_fork+0x24/0x30
> 
> -> #0 (&q->sysfs_lock){+.+.}:
>        __mutex_lock+0x6c/0x9e0
>        queue_attr_store+0x35/0x80
>        kernfs_fop_write+0x109/0x190
>        __vfs_write+0x1e/0x130
>        vfs_write+0xb9/0x1b0
>        SyS_write+0x40/0xa0
>        entry_SYSCALL_64_fastpath+0x23/0x9a
> 
> other info that might help us debug this:
> 
>  Possible unsafe locking scenario:
> 
>        CPU0                    CPU1
>        ----                    ----
>   lock(kn->count#213);
>                                lock(&q->sysfs_lock);
>                                lock(kn->count#213);
>   lock(&q->sysfs_lock);
> 
>  *** DEADLOCK ***
> 
> 3 locks held by 02-mq/1211:
>  #0:  (sb_writers#6){.+.+}, at: [<00000000afdb61d3>] vfs_write+0x17f/0x1b0
>  #1:  (&of->mutex){+.+.}, at: [<00000000b291cabb>] kernfs_fop_write+0xdd/0x190
>  #2:  (kn->count#213){++++}, at: [<000000007a18ad18>] kernfs_fop_write+0xe5/0x190

sysfs write op calls kernfs_fop_write which takes:
of->mutex then kn->count#213 (no idea what that is)
then q->sysfs_lock (via queue_attr_store)

vs 

blk_unregister_queue takes:
q->sysfs_lock then
kernfs_mutex (via kernfs_remove)
seems lockdep thinks "kernfs_mutex" is "kn->count#213"?

Feels like lockdep code in fs/kernfs/file.c and fs/kernfs/dir.c is
triggering false positives.. because these seem like different kernfs
locks yet they are reported as "kn->count#213".

Certainly feeling out of my depth with kernfs's locking though.

Mike

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [for-4.16 PATCH v5 0/4] block/dm: allow DM to defer blk_register_queue() until ready
  2018-01-15 22:15   ` Mike Snitzer
@ 2018-01-15 22:51       ` Bart Van Assche
  0 siblings, 0 replies; 27+ messages in thread
From: Bart Van Assche @ 2018-01-15 22:51 UTC (permalink / raw)
  To: snitzer; +Cc: dm-devel, linux-block, hare, tom.leiming, djeffery, axboe

T24gTW9uLCAyMDE4LTAxLTE1IGF0IDE3OjE1IC0wNTAwLCBNaWtlIFNuaXR6ZXIgd3JvdGU6DQo+
IHN5c2ZzIHdyaXRlIG9wIGNhbGxzIGtlcm5mc19mb3Bfd3JpdGUgd2hpY2ggdGFrZXM6DQo+IG9m
LT5tdXRleCB0aGVuIGtuLT5jb3VudCMyMTMgKG5vIGlkZWEgd2hhdCB0aGF0IGlzKQ0KPiB0aGVu
IHEtPnN5c2ZzX2xvY2sgKHZpYSBxdWV1ZV9hdHRyX3N0b3JlKQ0KPiANCj4gdnMgDQo+IA0KPiBi
bGtfdW5yZWdpc3Rlcl9xdWV1ZSB0YWtlczoNCj4gcS0+c3lzZnNfbG9jayB0aGVuDQo+IGtlcm5m
c19tdXRleCAodmlhIGtlcm5mc19yZW1vdmUpDQo+IHNlZW1zIGxvY2tkZXAgdGhpbmtzICJrZXJu
ZnNfbXV0ZXgiIGlzICJrbi0+Y291bnQjMjEzIj8NCj4gDQo+IEZlZWxzIGxpa2UgbG9ja2RlcCBj
b2RlIGluIGZzL2tlcm5mcy9maWxlLmMgYW5kIGZzL2tlcm5mcy9kaXIuYyBpcw0KPiB0cmlnZ2Vy
aW5nIGZhbHNlIHBvc2l0aXZlcy4uIGJlY2F1c2UgdGhlc2Ugc2VlbSBsaWtlIGRpZmZlcmVudCBr
ZXJuZnMNCj4gbG9ja3MgeWV0IHRoZXkgYXJlIHJlcG9ydGVkIGFzICJrbi0+Y291bnQjMjEzIi4N
Cj4gDQo+IENlcnRhaW5seSBmZWVsaW5nIG91dCBvZiBteSBkZXB0aCB3aXRoIGtlcm5mcydzIGxv
Y2tpbmcgdGhvdWdoLg0KDQpIZWxsbyBNaWtlLA0KDQpJIGRvbid0IHRoaW5rIHRoYXQgdGhpcyBp
cyBhIGZhbHNlIHBvc2l0aXZlIGJ1dCByYXRoZXIgdGhlIGZvbGxvd2luZyB0cmFkaXRpb25hbA0K
cGF0dGVybiBvZiBhIHBvdGVudGlhbCBkZWFkbG9jayBpbnZvbHZpbmcgc3lzZnMgYXR0cmlidXRl
czoNCiogT25lIGNvbnRleHQgb2J0YWlucyBhIG11dGV4IGZyb20gaW5zaWRlIGEgc3lzZnMgYXR0
cmlidXRlIG1ldGhvZDoNCiAgcXVldWVfYXR0cl9zdG9yZSgpIG9idGFpbnMgcS0+c3lzZnNfbG9j
ay4NCiogQW5vdGhlciBjb250ZXh0IHJlbW92ZXMgYSBzeXNmcyBhdHRyaWJ1dGUgd2hpbGUgaG9s
ZGluZyBhIG11dGV4Og0KICBibGtfdW5yZWdpc3Rlcl9xdWV1ZSgpIHJlbW92ZXMgdGhlIHF1ZXVl
IHN5c2ZzIGF0dHJpYnV0ZXMgd2hpbGUgaG9sZGluZw0KICBxLT5zeXNmc19sb2NrLg0KDQpUaGlz
IGNhbiByZXN1bHQgaW4gYSByZWFsIGRlYWRsb2NrIGJlY2F1c2UgdGhlIGNvZGUgdGhhdCByZW1v
dmVzIHN5c2ZzIG9iamVjdHMNCndhaXRzIHVudGlsIGFsbCBvbmdvaW5nIGF0dHJpYnV0ZSBjYWxs
YmFja3MgaGF2ZSBmaW5pc2hlZC4NCg0KU2luY2UgY29tbWl0IDY2NzI1N2U4YjI5OCAoImJsb2Nr
OiBwcm9wZXJseSBwcm90ZWN0IHRoZSAncXVldWUnIGtvYmogaW4NCmJsa191bnJlZ2lzdGVyX3F1
ZXVlIikgbW9kaWZpZWQgYmxrX3VucmVnaXN0ZXJfcXVldWUoKSBzdWNoIHRoYXQgcS0+c3lzZnNf
bG9jaw0KaXMgaGVsZCBhcm91bmQgdGhlIGtvYmplY3RfZGVsKCZxLT5rb2JqKSBjYWxsIEkgdGhp
bmsgdGhpcyBpcyBhIHJlZ3Jlc3Npb24NCmludHJvZHVjZWQgYnkgdGhhdCBjb21taXQuDQoNCkJh
cnQu

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [for-4.16 PATCH v5 0/4] block/dm: allow DM to defer blk_register_queue() until ready
@ 2018-01-15 22:51       ` Bart Van Assche
  0 siblings, 0 replies; 27+ messages in thread
From: Bart Van Assche @ 2018-01-15 22:51 UTC (permalink / raw)
  To: snitzer; +Cc: dm-devel, linux-block, hare, tom.leiming, djeffery, axboe

On Mon, 2018-01-15 at 17:15 -0500, Mike Snitzer wrote:
> sysfs write op calls kernfs_fop_write which takes:
> of->mutex then kn->count#213 (no idea what that is)
> then q->sysfs_lock (via queue_attr_store)
> 
> vs 
> 
> blk_unregister_queue takes:
> q->sysfs_lock then
> kernfs_mutex (via kernfs_remove)
> seems lockdep thinks "kernfs_mutex" is "kn->count#213"?
> 
> Feels like lockdep code in fs/kernfs/file.c and fs/kernfs/dir.c is
> triggering false positives.. because these seem like different kernfs
> locks yet they are reported as "kn->count#213".
> 
> Certainly feeling out of my depth with kernfs's locking though.

Hello Mike,

I don't think that this is a false positive but rather the following traditional
pattern of a potential deadlock involving sysfs attributes:
* One context obtains a mutex from inside a sysfs attribute method:
  queue_attr_store() obtains q->sysfs_lock.
* Another context removes a sysfs attribute while holding a mutex:
  blk_unregister_queue() removes the queue sysfs attributes while holding
  q->sysfs_lock.

This can result in a real deadlock because the code that removes sysfs objects
waits until all ongoing attribute callbacks have finished.

Since commit 667257e8b298 ("block: properly protect the 'queue' kobj in
blk_unregister_queue") modified blk_unregister_queue() such that q->sysfs_lock
is held around the kobject_del(&q->kobj) call I think this is a regression
introduced by that commit.

Bart.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [for-4.16 PATCH v5 0/4] block/dm: allow DM to defer blk_register_queue() until ready
  2018-01-15 22:51       ` Bart Van Assche
  (?)
@ 2018-01-15 23:10       ` Mike Snitzer
  2018-01-15 23:13         ` Mike Snitzer
  -1 siblings, 1 reply; 27+ messages in thread
From: Mike Snitzer @ 2018-01-15 23:10 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: dm-devel, linux-block, hare, tom.leiming, djeffery, axboe

On Mon, Jan 15 2018 at  5:51pm -0500,
Bart Van Assche <Bart.VanAssche@wdc.com> wrote:

> On Mon, 2018-01-15 at 17:15 -0500, Mike Snitzer wrote:
> > sysfs write op calls kernfs_fop_write which takes:
> > of->mutex then kn->count#213 (no idea what that is)
> > then q->sysfs_lock (via queue_attr_store)
> > 
> > vs 
> > 
> > blk_unregister_queue takes:
> > q->sysfs_lock then
> > kernfs_mutex (via kernfs_remove)
> > seems lockdep thinks "kernfs_mutex" is "kn->count#213"?
> > 
> > Feels like lockdep code in fs/kernfs/file.c and fs/kernfs/dir.c is
> > triggering false positives.. because these seem like different kernfs
> > locks yet they are reported as "kn->count#213".
> > 
> > Certainly feeling out of my depth with kernfs's locking though.
> 
> Hello Mike,
> 
> I don't think that this is a false positive but rather the following traditional
> pattern of a potential deadlock involving sysfs attributes:
> * One context obtains a mutex from inside a sysfs attribute method:
>   queue_attr_store() obtains q->sysfs_lock.
> * Another context removes a sysfs attribute while holding a mutex:
>   blk_unregister_queue() removes the queue sysfs attributes while holding
>   q->sysfs_lock.
> 
> This can result in a real deadlock because the code that removes sysfs objects
> waits until all ongoing attribute callbacks have finished.
> 
> Since commit 667257e8b298 ("block: properly protect the 'queue' kobj in
> blk_unregister_queue") modified blk_unregister_queue() such that q->sysfs_lock
> is held around the kobject_del(&q->kobj) call I think this is a regression
> introduced by that commit.

Sure, of course it is a regression.

Aside from moving the mutex_unlock(&q->sysfs_lock) above the
kobject_del(&q->kobj) I don't know how to fix it.

Though, realistically that'd be an adequate fix (given the way the code
was before).

Mike

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [for-4.16 PATCH v5 0/4] block/dm: allow DM to defer blk_register_queue() until ready
  2018-01-15 23:10       ` Mike Snitzer
@ 2018-01-15 23:13         ` Mike Snitzer
  2018-01-16  2:21           ` Ming Lei
  2018-01-16 18:19             ` Bart Van Assche
  0 siblings, 2 replies; 27+ messages in thread
From: Mike Snitzer @ 2018-01-15 23:13 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: dm-devel, linux-block, hare, tom.leiming, djeffery, axboe

On Mon, Jan 15 2018 at  6:10P -0500,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Mon, Jan 15 2018 at  5:51pm -0500,
> Bart Van Assche <Bart.VanAssche@wdc.com> wrote:
> 
> > On Mon, 2018-01-15 at 17:15 -0500, Mike Snitzer wrote:
> > > sysfs write op calls kernfs_fop_write which takes:
> > > of->mutex then kn->count#213 (no idea what that is)
> > > then q->sysfs_lock (via queue_attr_store)
> > > 
> > > vs 
> > > 
> > > blk_unregister_queue takes:
> > > q->sysfs_lock then
> > > kernfs_mutex (via kernfs_remove)
> > > seems lockdep thinks "kernfs_mutex" is "kn->count#213"?
> > > 
> > > Feels like lockdep code in fs/kernfs/file.c and fs/kernfs/dir.c is
> > > triggering false positives.. because these seem like different kernfs
> > > locks yet they are reported as "kn->count#213".
> > > 
> > > Certainly feeling out of my depth with kernfs's locking though.
> > 
> > Hello Mike,
> > 
> > I don't think that this is a false positive but rather the following traditional
> > pattern of a potential deadlock involving sysfs attributes:
> > * One context obtains a mutex from inside a sysfs attribute method:
> >   queue_attr_store() obtains q->sysfs_lock.
> > * Another context removes a sysfs attribute while holding a mutex:
> >   blk_unregister_queue() removes the queue sysfs attributes while holding
> >   q->sysfs_lock.
> > 
> > This can result in a real deadlock because the code that removes sysfs objects
> > waits until all ongoing attribute callbacks have finished.
> > 
> > Since commit 667257e8b298 ("block: properly protect the 'queue' kobj in
> > blk_unregister_queue") modified blk_unregister_queue() such that q->sysfs_lock
> > is held around the kobject_del(&q->kobj) call I think this is a regression
> > introduced by that commit.
> 
> Sure, of course it is a regression.
> 
> Aside from moving the mutex_unlock(&q->sysfs_lock) above the
> kobject_del(&q->kobj) I don't know how to fix it.
> 
> Though, realistically that'd be an adequate fix (given the way the code
> was before).

Any chance you apply this and re-run your srp_test that triggered the
lockdep splat?

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 4a6a40ffd78e..c50e08e9bf17 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -952,10 +952,10 @@ void blk_unregister_queue(struct gendisk *disk)
 	if (q->request_fn || (q->mq_ops && q->elevator))
 		elv_unregister_queue(q);
 
+	mutex_unlock(&q->sysfs_lock);
+
 	kobject_uevent(&q->kobj, KOBJ_REMOVE);
 	kobject_del(&q->kobj);
 	blk_trace_remove_sysfs(disk_to_dev(disk));
 	kobject_put(&disk_to_dev(disk)->kobj);
-
-	mutex_unlock(&q->sysfs_lock);
 }

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [for-4.16 PATCH v5 0/4] block/dm: allow DM to defer blk_register_queue() until ready
  2018-01-15 23:13         ` Mike Snitzer
@ 2018-01-16  2:21           ` Ming Lei
  2018-01-16 18:19             ` Bart Van Assche
  1 sibling, 0 replies; 27+ messages in thread
From: Ming Lei @ 2018-01-16  2:21 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Bart Van Assche, dm-devel, linux-block, hare, djeffery, axboe

On Tue, Jan 16, 2018 at 7:13 AM, Mike Snitzer <snitzer@redhat.com> wrote:
> On Mon, Jan 15 2018 at  6:10P -0500,
> Mike Snitzer <snitzer@redhat.com> wrote:
>
>> On Mon, Jan 15 2018 at  5:51pm -0500,
>> Bart Van Assche <Bart.VanAssche@wdc.com> wrote:
>>
>> > On Mon, 2018-01-15 at 17:15 -0500, Mike Snitzer wrote:
>> > > sysfs write op calls kernfs_fop_write which takes:
>> > > of->mutex then kn->count#213 (no idea what that is)
>> > > then q->sysfs_lock (via queue_attr_store)
>> > >
>> > > vs
>> > >
>> > > blk_unregister_queue takes:
>> > > q->sysfs_lock then
>> > > kernfs_mutex (via kernfs_remove)
>> > > seems lockdep thinks "kernfs_mutex" is "kn->count#213"?
>> > >
>> > > Feels like lockdep code in fs/kernfs/file.c and fs/kernfs/dir.c is
>> > > triggering false positives.. because these seem like different kernfs
>> > > locks yet they are reported as "kn->count#213".
>> > >
>> > > Certainly feeling out of my depth with kernfs's locking though.
>> >
>> > Hello Mike,
>> >
>> > I don't think that this is a false positive but rather the following traditional
>> > pattern of a potential deadlock involving sysfs attributes:
>> > * One context obtains a mutex from inside a sysfs attribute method:
>> >   queue_attr_store() obtains q->sysfs_lock.
>> > * Another context removes a sysfs attribute while holding a mutex:
>> >   blk_unregister_queue() removes the queue sysfs attributes while holding
>> >   q->sysfs_lock.
>> >
>> > This can result in a real deadlock because the code that removes sysfs objects
>> > waits until all ongoing attribute callbacks have finished.
>> >
>> > Since commit 667257e8b298 ("block: properly protect the 'queue' kobj in
>> > blk_unregister_queue") modified blk_unregister_queue() such that q->sysfs_lock
>> > is held around the kobject_del(&q->kobj) call I think this is a regression
>> > introduced by that commit.
>>
>> Sure, of course it is a regression.
>>
>> Aside from moving the mutex_unlock(&q->sysfs_lock) above the
>> kobject_del(&q->kobj) I don't know how to fix it.
>>
>> Though, realistically that'd be an adequate fix (given the way the code
>> was before).
>
> Any chance you apply this and re-run your srp_test that triggered the
> lockdep splat?
>
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 4a6a40ffd78e..c50e08e9bf17 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -952,10 +952,10 @@ void blk_unregister_queue(struct gendisk *disk)
>         if (q->request_fn || (q->mq_ops && q->elevator))
>                 elv_unregister_queue(q);
>
> +       mutex_unlock(&q->sysfs_lock);
> +
>         kobject_uevent(&q->kobj, KOBJ_REMOVE);
>         kobject_del(&q->kobj);
>         blk_trace_remove_sysfs(disk_to_dev(disk));
>         kobject_put(&disk_to_dev(disk)->kobj);
> -
> -       mutex_unlock(&q->sysfs_lock);
>  }

If this patch is required, similar change should be needed in failure path
of blk_register_queue() too.

-- 
Ming Lei

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [for-4.16 PATCH v5 0/4] block/dm: allow DM to defer blk_register_queue() until ready
  2018-01-15 23:13         ` Mike Snitzer
@ 2018-01-16 18:19             ` Bart Van Assche
  2018-01-16 18:19             ` Bart Van Assche
  1 sibling, 0 replies; 27+ messages in thread
From: Bart Van Assche @ 2018-01-16 18:19 UTC (permalink / raw)
  To: snitzer; +Cc: dm-devel, linux-block, hare, tom.leiming, axboe, djeffery

T24gTW9uLCAyMDE4LTAxLTE1IGF0IDE4OjEzIC0wNTAwLCBNaWtlIFNuaXR6ZXIgd3JvdGU6DQo+
IE9uIE1vbiwgSmFuIDE1IDIwMTggYXQgIDY6MTBQIC0wNTAwLA0KPiBNaWtlIFNuaXR6ZXIgPHNu
aXR6ZXJAcmVkaGF0LmNvbT4gd3JvdGU6DQo+IA0KPiA+IE9uIE1vbiwgSmFuIDE1IDIwMTggYXQg
IDU6NTFwbSAtMDUwMCwNCj4gPiBCYXJ0IFZhbiBBc3NjaGUgPEJhcnQuVmFuQXNzY2hlQHdkYy5j
b20+IHdyb3RlOg0KPiA+IA0KPiA+ID4gT24gTW9uLCAyMDE4LTAxLTE1IGF0IDE3OjE1IC0wNTAw
LCBNaWtlIFNuaXR6ZXIgd3JvdGU6DQo+ID4gPiA+IHN5c2ZzIHdyaXRlIG9wIGNhbGxzIGtlcm5m
c19mb3Bfd3JpdGUgd2hpY2ggdGFrZXM6DQo+ID4gPiA+IG9mLT5tdXRleCB0aGVuIGtuLT5jb3Vu
dCMyMTMgKG5vIGlkZWEgd2hhdCB0aGF0IGlzKQ0KPiA+ID4gPiB0aGVuIHEtPnN5c2ZzX2xvY2sg
KHZpYSBxdWV1ZV9hdHRyX3N0b3JlKQ0KPiA+ID4gPiANCj4gPiA+ID4gdnMgDQo+ID4gPiA+IA0K
PiA+ID4gPiBibGtfdW5yZWdpc3Rlcl9xdWV1ZSB0YWtlczoNCj4gPiA+ID4gcS0+c3lzZnNfbG9j
ayB0aGVuDQo+ID4gPiA+IGtlcm5mc19tdXRleCAodmlhIGtlcm5mc19yZW1vdmUpDQo+ID4gPiA+
IHNlZW1zIGxvY2tkZXAgdGhpbmtzICJrZXJuZnNfbXV0ZXgiIGlzICJrbi0+Y291bnQjMjEzIj8N
Cj4gPiA+ID4gDQo+ID4gPiA+IEZlZWxzIGxpa2UgbG9ja2RlcCBjb2RlIGluIGZzL2tlcm5mcy9m
aWxlLmMgYW5kIGZzL2tlcm5mcy9kaXIuYyBpcw0KPiA+ID4gPiB0cmlnZ2VyaW5nIGZhbHNlIHBv
c2l0aXZlcy4uIGJlY2F1c2UgdGhlc2Ugc2VlbSBsaWtlIGRpZmZlcmVudCBrZXJuZnMNCj4gPiA+
ID4gbG9ja3MgeWV0IHRoZXkgYXJlIHJlcG9ydGVkIGFzICJrbi0+Y291bnQjMjEzIi4NCj4gPiA+
ID4gDQo+ID4gPiA+IENlcnRhaW5seSBmZWVsaW5nIG91dCBvZiBteSBkZXB0aCB3aXRoIGtlcm5m
cydzIGxvY2tpbmcgdGhvdWdoLg0KPiA+ID4gDQo+ID4gPiBIZWxsbyBNaWtlLA0KPiA+ID4gDQo+
ID4gPiBJIGRvbid0IHRoaW5rIHRoYXQgdGhpcyBpcyBhIGZhbHNlIHBvc2l0aXZlIGJ1dCByYXRo
ZXIgdGhlIGZvbGxvd2luZyB0cmFkaXRpb25hbA0KPiA+ID4gcGF0dGVybiBvZiBhIHBvdGVudGlh
bCBkZWFkbG9jayBpbnZvbHZpbmcgc3lzZnMgYXR0cmlidXRlczoNCj4gPiA+ICogT25lIGNvbnRl
eHQgb2J0YWlucyBhIG11dGV4IGZyb20gaW5zaWRlIGEgc3lzZnMgYXR0cmlidXRlIG1ldGhvZDoN
Cj4gPiA+ICAgcXVldWVfYXR0cl9zdG9yZSgpIG9idGFpbnMgcS0+c3lzZnNfbG9jay4NCj4gPiA+
ICogQW5vdGhlciBjb250ZXh0IHJlbW92ZXMgYSBzeXNmcyBhdHRyaWJ1dGUgd2hpbGUgaG9sZGlu
ZyBhIG11dGV4Og0KPiA+ID4gICBibGtfdW5yZWdpc3Rlcl9xdWV1ZSgpIHJlbW92ZXMgdGhlIHF1
ZXVlIHN5c2ZzIGF0dHJpYnV0ZXMgd2hpbGUgaG9sZGluZw0KPiA+ID4gICBxLT5zeXNmc19sb2Nr
Lg0KPiA+ID4gDQo+ID4gPiBUaGlzIGNhbiByZXN1bHQgaW4gYSByZWFsIGRlYWRsb2NrIGJlY2F1
c2UgdGhlIGNvZGUgdGhhdCByZW1vdmVzIHN5c2ZzIG9iamVjdHMNCj4gPiA+IHdhaXRzIHVudGls
IGFsbCBvbmdvaW5nIGF0dHJpYnV0ZSBjYWxsYmFja3MgaGF2ZSBmaW5pc2hlZC4NCj4gPiA+IA0K
PiA+ID4gU2luY2UgY29tbWl0IDY2NzI1N2U4YjI5OCAoImJsb2NrOiBwcm9wZXJseSBwcm90ZWN0
IHRoZSAncXVldWUnIGtvYmogaW4NCj4gPiA+IGJsa191bnJlZ2lzdGVyX3F1ZXVlIikgbW9kaWZp
ZWQgYmxrX3VucmVnaXN0ZXJfcXVldWUoKSBzdWNoIHRoYXQgcS0+c3lzZnNfbG9jaw0KPiA+ID4g
aXMgaGVsZCBhcm91bmQgdGhlIGtvYmplY3RfZGVsKCZxLT5rb2JqKSBjYWxsIEkgdGhpbmsgdGhp
cyBpcyBhIHJlZ3Jlc3Npb24NCj4gPiA+IGludHJvZHVjZWQgYnkgdGhhdCBjb21taXQuDQo+ID4g
DQo+ID4gU3VyZSwgb2YgY291cnNlIGl0IGlzIGEgcmVncmVzc2lvbi4NCj4gPiANCj4gPiBBc2lk
ZSBmcm9tIG1vdmluZyB0aGUgbXV0ZXhfdW5sb2NrKCZxLT5zeXNmc19sb2NrKSBhYm92ZSB0aGUN
Cj4gPiBrb2JqZWN0X2RlbCgmcS0+a29iaikgSSBkb24ndCBrbm93IGhvdyB0byBmaXggaXQuDQo+
ID4gDQo+ID4gVGhvdWdoLCByZWFsaXN0aWNhbGx5IHRoYXQnZCBiZSBhbiBhZGVxdWF0ZSBmaXgg
KGdpdmVuIHRoZSB3YXkgdGhlIGNvZGUNCj4gPiB3YXMgYmVmb3JlKS4NCj4gDQo+IEFueSBjaGFu
Y2UgeW91IGFwcGx5IHRoaXMgYW5kIHJlLXJ1biB5b3VyIHNycF90ZXN0IHRoYXQgdHJpZ2dlcmVk
IHRoZQ0KPiBsb2NrZGVwIHNwbGF0Pw0KPiANCj4gZGlmZiAtLWdpdCBhL2Jsb2NrL2Jsay1zeXNm
cy5jIGIvYmxvY2svYmxrLXN5c2ZzLmMNCj4gaW5kZXggNGE2YTQwZmZkNzhlLi5jNTBlMDhlOWJm
MTcgMTAwNjQ0DQo+IC0tLSBhL2Jsb2NrL2Jsay1zeXNmcy5jDQo+ICsrKyBiL2Jsb2NrL2Jsay1z
eXNmcy5jDQo+IEBAIC05NTIsMTAgKzk1MiwxMCBAQCB2b2lkIGJsa191bnJlZ2lzdGVyX3F1ZXVl
KHN0cnVjdCBnZW5kaXNrICpkaXNrKQ0KPiAgCWlmIChxLT5yZXF1ZXN0X2ZuIHx8IChxLT5tcV9v
cHMgJiYgcS0+ZWxldmF0b3IpKQ0KPiAgCQllbHZfdW5yZWdpc3Rlcl9xdWV1ZShxKTsNCj4gIA0K
PiArCW11dGV4X3VubG9jaygmcS0+c3lzZnNfbG9jayk7DQo+ICsNCj4gIAlrb2JqZWN0X3VldmVu
dCgmcS0+a29iaiwgS09CSl9SRU1PVkUpOw0KPiAgCWtvYmplY3RfZGVsKCZxLT5rb2JqKTsNCj4g
IAlibGtfdHJhY2VfcmVtb3ZlX3N5c2ZzKGRpc2tfdG9fZGV2KGRpc2spKTsNCj4gIAlrb2JqZWN0
X3B1dCgmZGlza190b19kZXYoZGlzayktPmtvYmopOw0KPiAtDQo+IC0JbXV0ZXhfdW5sb2NrKCZx
LT5zeXNmc19sb2NrKTsNCj4gIH0NCg0KSGVsbG8gTWlrZSwNCg0KVG9kYXkgc3lzZnNfbG9jayBw
cm90ZWN0cyBhIHdob2xlIGJ1bmNoIG9mIHN0YXRlIGluZm9ybWF0aW9uLiBJIHRoaW5rIGZvciB0
aGUNCmxvbmdlciB0ZXJtIHdlIHNob3VsZCBzcGxpdCBpdCBpbnRvIG11bHRpcGxlIG11dGV4ZXMu
IEZvciB0aGUgc2hvcnQgdGVybSwgcGxlYXNlDQpoYXZlIGEgbG9vayBhdCB0aGUgcGF0Y2ggc2Vy
aWVzIEkganVzdCBwb3N0ZWQgb24gdGhlIGxpbnV4LWJsb2NrIG1haWxpbmcgbGlzdC4NCg0KVGhh
bmtzLA0KDQpCYXJ0Lg==

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [for-4.16 PATCH v5 0/4] block/dm: allow DM to defer blk_register_queue() until ready
@ 2018-01-16 18:19             ` Bart Van Assche
  0 siblings, 0 replies; 27+ messages in thread
From: Bart Van Assche @ 2018-01-16 18:19 UTC (permalink / raw)
  To: snitzer; +Cc: dm-devel, linux-block, hare, tom.leiming, axboe, djeffery

On Mon, 2018-01-15 at 18:13 -0500, Mike Snitzer wrote:
> On Mon, Jan 15 2018 at  6:10P -0500,
> Mike Snitzer <snitzer@redhat.com> wrote:
> 
> > On Mon, Jan 15 2018 at  5:51pm -0500,
> > Bart Van Assche <Bart.VanAssche@wdc.com> wrote:
> > 
> > > On Mon, 2018-01-15 at 17:15 -0500, Mike Snitzer wrote:
> > > > sysfs write op calls kernfs_fop_write which takes:
> > > > of->mutex then kn->count#213 (no idea what that is)
> > > > then q->sysfs_lock (via queue_attr_store)
> > > > 
> > > > vs 
> > > > 
> > > > blk_unregister_queue takes:
> > > > q->sysfs_lock then
> > > > kernfs_mutex (via kernfs_remove)
> > > > seems lockdep thinks "kernfs_mutex" is "kn->count#213"?
> > > > 
> > > > Feels like lockdep code in fs/kernfs/file.c and fs/kernfs/dir.c is
> > > > triggering false positives.. because these seem like different kernfs
> > > > locks yet they are reported as "kn->count#213".
> > > > 
> > > > Certainly feeling out of my depth with kernfs's locking though.
> > > 
> > > Hello Mike,
> > > 
> > > I don't think that this is a false positive but rather the following traditional
> > > pattern of a potential deadlock involving sysfs attributes:
> > > * One context obtains a mutex from inside a sysfs attribute method:
> > >   queue_attr_store() obtains q->sysfs_lock.
> > > * Another context removes a sysfs attribute while holding a mutex:
> > >   blk_unregister_queue() removes the queue sysfs attributes while holding
> > >   q->sysfs_lock.
> > > 
> > > This can result in a real deadlock because the code that removes sysfs objects
> > > waits until all ongoing attribute callbacks have finished.
> > > 
> > > Since commit 667257e8b298 ("block: properly protect the 'queue' kobj in
> > > blk_unregister_queue") modified blk_unregister_queue() such that q->sysfs_lock
> > > is held around the kobject_del(&q->kobj) call I think this is a regression
> > > introduced by that commit.
> > 
> > Sure, of course it is a regression.
> > 
> > Aside from moving the mutex_unlock(&q->sysfs_lock) above the
> > kobject_del(&q->kobj) I don't know how to fix it.
> > 
> > Though, realistically that'd be an adequate fix (given the way the code
> > was before).
> 
> Any chance you apply this and re-run your srp_test that triggered the
> lockdep splat?
> 
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 4a6a40ffd78e..c50e08e9bf17 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -952,10 +952,10 @@ void blk_unregister_queue(struct gendisk *disk)
>  	if (q->request_fn || (q->mq_ops && q->elevator))
>  		elv_unregister_queue(q);
>  
> +	mutex_unlock(&q->sysfs_lock);
> +
>  	kobject_uevent(&q->kobj, KOBJ_REMOVE);
>  	kobject_del(&q->kobj);
>  	blk_trace_remove_sysfs(disk_to_dev(disk));
>  	kobject_put(&disk_to_dev(disk)->kobj);
> -
> -	mutex_unlock(&q->sysfs_lock);
>  }

Hello Mike,

Today sysfs_lock protects a whole bunch of state information. I think for the
longer term we should split it into multiple mutexes. For the short term, please
have a look at the patch series I just posted on the linux-block mailing list.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2018-01-16 18:19 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-12 15:06 [for-4.16 PATCH v5 0/4] block/dm: allow DM to defer blk_register_queue() until ready Mike Snitzer
2018-01-12 15:06 ` [for-4.16 PATCH v5 1/4] block: only bdi_unregister() in del_gendisk() if !GENHD_FL_HIDDEN Mike Snitzer
2018-01-12 15:06 ` [for-4.16 PATCH v5 2/4] block: properly protect the 'queue' kobj in blk_unregister_queue Mike Snitzer
2018-01-12 15:17   ` Ming Lei
2018-01-12 15:29     ` Mike Snitzer
2018-01-12 16:03   ` [for-4.16 PATCH v6 " Mike Snitzer
2018-01-13 12:57     ` Ming Lei
2018-01-12 15:06 ` [for-4.16 PATCH v5 3/4] block: allow gendisk's request_queue registration to be deferred Mike Snitzer
2018-01-12 15:06 ` [for-4.16 PATCH v5 4/4] dm: fix incomplete request_queue initialization Mike Snitzer
2018-01-12 16:13 ` [for-4.16 PATCH v5 0/4] block/dm: allow DM to defer blk_register_queue() until ready Mike Snitzer
2018-01-15 17:16 ` Bart Van Assche
2018-01-15 17:16   ` Bart Van Assche
2018-01-15 17:29   ` Mike Snitzer
2018-01-15 17:36     ` Bart Van Assche
2018-01-15 17:36       ` Bart Van Assche
2018-01-15 17:48       ` Mike Snitzer
2018-01-15 17:51         ` Bart Van Assche
2018-01-15 17:51           ` Bart Van Assche
2018-01-15 17:57           ` Mike Snitzer
2018-01-15 22:15   ` Mike Snitzer
2018-01-15 22:51     ` Bart Van Assche
2018-01-15 22:51       ` Bart Van Assche
2018-01-15 23:10       ` Mike Snitzer
2018-01-15 23:13         ` Mike Snitzer
2018-01-16  2:21           ` Ming Lei
2018-01-16 18:19           ` Bart Van Assche
2018-01-16 18:19             ` Bart Van Assche

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.