* fix delayed holder tracking v2
From: Christoph Hellwig @ 2022-10-30 15:31 UTC (permalink / raw)
  To: Jens Axboe, Alasdair Kergon, Mike Snitzer; +Cc: Yu Kuai, dm-devel, linux-block

Hi all,

this series fixes the delayed holder tracking, which is only used by dm,
by moving it into dm itself, where the lifetimes can be tracked much better.

Changes since v1: 
 - don't blow away ->bd_holder_dir in del_gendisk or add_disk failure
   as the holder unregistration references it
 - add an extra cleanup patch

Diffstat:
 block/genhd.c          |    6 --
 block/holder.c         |   85 ++++++++++------------------------
 drivers/md/dm.c        |  122 ++++++++++++++++++++++++++-----------------------
 include/linux/blkdev.h |    5 --
 4 files changed, 93 insertions(+), 125 deletions(-)


* [PATCH 1/7] block: clear ->slave_dir when dropping the main slave_dir reference
From: Christoph Hellwig @ 2022-10-30 15:31 UTC (permalink / raw)
  To: Jens Axboe, Alasdair Kergon, Mike Snitzer; +Cc: Yu Kuai, dm-devel, linux-block

Zero out the pointer to ->slave_dir so that the holder code doesn't
incorrectly treat the object as alive when add_disk failed or after
del_gendisk was called.

Fixes: 89f871af1b26 ("dm: delay registering the gendisk")
Reported-by: Yu Kuai <yukuai1@huaweicloud.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/genhd.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/genhd.c b/block/genhd.c
index 17b33c62423df..aa0e2f5684543 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -528,6 +528,7 @@ int __must_check device_add_disk(struct device *parent, struct gendisk *disk,
 	blk_unregister_queue(disk);
 out_put_slave_dir:
 	kobject_put(disk->slave_dir);
+	disk->slave_dir = NULL;
 out_put_holder_dir:
 	kobject_put(disk->part0->bd_holder_dir);
 out_del_integrity:
@@ -624,6 +625,7 @@ void del_gendisk(struct gendisk *disk)
 
 	kobject_put(disk->part0->bd_holder_dir);
 	kobject_put(disk->slave_dir);
+	disk->slave_dir = NULL;
 
 	part_stat_set_all(disk->part0, 0);
 	disk->part0->bd_stamp = 0;
-- 
2.30.2
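The hazard this patch closes can be sketched outside the kernel. Below is a hypothetical userspace analogue (the `demo_*` names and the simplified refcounting are illustrative stand-ins, not the kernel API): once the final reference is dropped, a stale non-NULL pointer would lead later holder operations onto freed memory, so the owner clears the pointer at the same point it drops its reference.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-ins for kobject refcounting; not the kernel API. */
struct demo_kobj {
	int refcnt;
};

struct demo_disk {
	struct demo_kobj *slave_dir;
};

static void demo_kobj_put(struct demo_kobj *kobj)
{
	if (--kobj->refcnt == 0)
		free(kobj);
}

/* The holder code keys off the pointer to decide whether the sysfs
 * directory is still alive. */
static int holder_dir_live(const struct demo_disk *disk)
{
	return disk->slave_dir != NULL;
}

/* Teardown mirroring the patch: drop the reference AND clear the pointer,
 * so later holder operations see the directory as gone rather than
 * touching freed memory. */
static void demo_disk_teardown(struct demo_disk *disk)
{
	demo_kobj_put(disk->slave_dir);
	disk->slave_dir = NULL;
}
```

Without the `= NULL` line, `holder_dir_live()` would keep returning true on a dangling pointer, which is exactly the state the holder code could get into after `del_gendisk` or a failed `add_disk`.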



* [PATCH 2/7] dm: remove free_table_devices
From: Christoph Hellwig @ 2022-10-30 15:31 UTC (permalink / raw)
  To: Jens Axboe, Alasdair Kergon, Mike Snitzer; +Cc: Yu Kuai, dm-devel, linux-block

free_table_devices just warns and frees all table_device structures when
the target removal did not remove them.  This should never happen, but if
it did, just freeing the structures without deleting them from the list or
cleaning up their resources would not help at all.  So just WARN on a
non-empty list instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/md/dm.c | 15 +--------------
 1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 95a1ee3d314eb..19d25bf997be4 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -833,19 +833,6 @@ void dm_put_table_device(struct mapped_device *md, struct dm_dev *d)
 	mutex_unlock(&md->table_devices_lock);
 }
 
-static void free_table_devices(struct list_head *devices)
-{
-	struct list_head *tmp, *next;
-
-	list_for_each_safe(tmp, next, devices) {
-		struct table_device *td = list_entry(tmp, struct table_device, list);
-
-		DMWARN("dm_destroy: %s still exists with %d references",
-		       td->dm_dev.name, refcount_read(&td->count));
-		kfree(td);
-	}
-}
-
 /*
  * Get the geometry associated with a dm device
  */
@@ -2122,7 +2109,7 @@ static void free_dev(struct mapped_device *md)
 
 	cleanup_mapped_device(md);
 
-	free_table_devices(&md->table_devices);
+	WARN_ON_ONCE(!list_empty(&md->table_devices));
 	dm_stats_cleanup(&md->stats);
 	free_minor(minor);
 
-- 
2.30.2



* [PATCH 3/7] dm: cleanup open_table_device
From: Christoph Hellwig @ 2022-10-30 15:31 UTC (permalink / raw)
  To: Jens Axboe, Alasdair Kergon, Mike Snitzer; +Cc: Yu Kuai, dm-devel, linux-block

Move all the logic for allocating the table_device and linking it into
the list into open_table_device.  This keeps the code tidy and ensures
that table_devices only exist in a fully initialized state.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/md/dm.c | 56 ++++++++++++++++++++++++-------------------------
 1 file changed, 27 insertions(+), 29 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 19d25bf997be4..28d7581b6a826 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -732,28 +732,41 @@ static char *_dm_claim_ptr = "I belong to device-mapper";
 /*
  * Open a table device so we can use it as a map destination.
  */
-static int open_table_device(struct table_device *td, dev_t dev,
-			     struct mapped_device *md)
+static struct table_device *open_table_device(struct mapped_device *md,
+		dev_t dev, fmode_t mode)
 {
+	struct table_device *td;
 	struct block_device *bdev;
 	u64 part_off;
 	int r;
 
-	BUG_ON(td->dm_dev.bdev);
+	td = kmalloc_node(sizeof(*td), GFP_KERNEL, md->numa_node_id);
+	if (!td)
+		return ERR_PTR(-ENOMEM);
+	refcount_set(&td->count, 1);
 
-	bdev = blkdev_get_by_dev(dev, td->dm_dev.mode | FMODE_EXCL, _dm_claim_ptr);
-	if (IS_ERR(bdev))
-		return PTR_ERR(bdev);
+	bdev = blkdev_get_by_dev(dev, mode | FMODE_EXCL, _dm_claim_ptr);
+	if (IS_ERR(bdev)) {
+		r = PTR_ERR(bdev);
+		goto out_free_td;
+	}
 
 	r = bd_link_disk_holder(bdev, dm_disk(md));
-	if (r) {
-		blkdev_put(bdev, td->dm_dev.mode | FMODE_EXCL);
-		return r;
-	}
+	if (r)
+		goto out_blkdev_put;
 
+	td->dm_dev.mode = mode;
 	td->dm_dev.bdev = bdev;
 	td->dm_dev.dax_dev = fs_dax_get_by_bdev(bdev, &part_off, NULL, NULL);
-	return 0;
+	format_dev_t(td->dm_dev.name, dev);
+	list_add(&td->list, &md->table_devices);
+	return td;
+
+out_blkdev_put:
+	blkdev_put(bdev, mode | FMODE_EXCL);
+out_free_td:
+	kfree(td);
+	return ERR_PTR(r);
 }
 
 /*
@@ -786,31 +799,16 @@ static struct table_device *find_table_device(struct list_head *l, dev_t dev,
 int dm_get_table_device(struct mapped_device *md, dev_t dev, fmode_t mode,
 			struct dm_dev **result)
 {
-	int r;
 	struct table_device *td;
 
 	mutex_lock(&md->table_devices_lock);
 	td = find_table_device(&md->table_devices, dev, mode);
 	if (!td) {
-		td = kmalloc_node(sizeof(*td), GFP_KERNEL, md->numa_node_id);
-		if (!td) {
-			mutex_unlock(&md->table_devices_lock);
-			return -ENOMEM;
-		}
-
-		td->dm_dev.mode = mode;
-		td->dm_dev.bdev = NULL;
-
-		if ((r = open_table_device(td, dev, md))) {
+		td = open_table_device(md, dev, mode);
+		if (IS_ERR(td)) {
 			mutex_unlock(&md->table_devices_lock);
-			kfree(td);
-			return r;
+			return PTR_ERR(td);
 		}
-
-		format_dev_t(td->dm_dev.name, dev);
-
-		refcount_set(&td->count, 1);
-		list_add(&td->list, &md->table_devices);
 	} else {
 		refcount_inc(&td->count);
 	}
-- 
2.30.2
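The constructor style this patch moves to can be sketched in userspace C. The `ERR_PTR`/`IS_ERR`/`PTR_ERR` helpers below are simplified stand-ins for the kernel's, and `open_demo` is a hypothetical analogue of `open_table_device`: the function returns either a fully initialized object or an encoded errno, so callers never see a half-built structure.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Simplified userspace stand-ins for the kernel's ERR_PTR()/IS_ERR()/
 * PTR_ERR(): small negative errno values live at the very top of the
 * address space, so one pointer return carries an object or an error. */
#define MAX_ERRNO	4095

static void *ERR_PTR(long error) { return (void *)error; }
static long PTR_ERR(const void *ptr) { return (long)ptr; }
static int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical analogue of open_table_device: allocate, initialize and
 * return in one step, so the caller never sees a partial object. */
struct demo_table_device {
	int dev;
	int mode;
};

static struct demo_table_device *open_demo(int dev, int mode)
{
	struct demo_table_device *td;

	if (dev < 0)			/* stands in for blkdev_get_by_dev() failing */
		return ERR_PTR(-ENODEV);

	td = malloc(sizeof(*td));
	if (!td)
		return ERR_PTR(-ENOMEM);
	td->dev = dev;
	td->mode = mode;
	return td;
}
```

The caller's error handling collapses to a single `IS_ERR()` check, matching the simplified `dm_get_table_device` in the patch.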



* [PATCH 4/7] dm: cleanup close_table_device
From: Christoph Hellwig @ 2022-10-30 15:31 UTC (permalink / raw)
  To: Jens Axboe, Alasdair Kergon, Mike Snitzer; +Cc: Yu Kuai, dm-devel, linux-block

Move the list unlink and free into close_table_device so that no half
torn down table_devices exist.  Also remove the check for a NULL bdev, as
that can't happen: open_table_device never adds a table_device to the
list that does not have a valid block_device.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/md/dm.c | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 28d7581b6a826..2917700b1e15c 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -774,14 +774,11 @@ static struct table_device *open_table_device(struct mapped_device *md,
  */
 static void close_table_device(struct table_device *td, struct mapped_device *md)
 {
-	if (!td->dm_dev.bdev)
-		return;
-
 	bd_unlink_disk_holder(td->dm_dev.bdev, dm_disk(md));
 	blkdev_put(td->dm_dev.bdev, td->dm_dev.mode | FMODE_EXCL);
 	put_dax(td->dm_dev.dax_dev);
-	td->dm_dev.bdev = NULL;
-	td->dm_dev.dax_dev = NULL;
+	list_del(&td->list);
+	kfree(td);
 }
 
 static struct table_device *find_table_device(struct list_head *l, dev_t dev,
@@ -823,11 +820,8 @@ void dm_put_table_device(struct mapped_device *md, struct dm_dev *d)
 	struct table_device *td = container_of(d, struct table_device, dm_dev);
 
 	mutex_lock(&md->table_devices_lock);
-	if (refcount_dec_and_test(&td->count)) {
+	if (refcount_dec_and_test(&td->count))
 		close_table_device(td, md);
-		list_del(&td->list);
-		kfree(td);
-	}
 	mutex_unlock(&md->table_devices_lock);
 }
 
-- 
2.30.2
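The invariant the patch establishes (the last put both unlinks and frees, so an entry that is on the list is always fully alive) can be sketched as a small hypothetical userspace analogue, where a pointer slot stands in for membership of `md->table_devices`:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical analogue of a table_device; the slot pointer stands in for
 * membership of md->table_devices. */
struct demo_node {
	int refcnt;
	struct demo_node **slot;	/* where this node is linked */
	int *freed;			/* lets the caller observe the free */
};

/* Mirrors dm_put_table_device() after this patch: when the final reference
 * drops, the close path unlinks and frees in one place, leaving no
 * half-torn-down entry behind on the list. */
static void demo_node_put(struct demo_node *node)
{
	if (--node->refcnt)
		return;
	*node->slot = NULL;		/* the list_del() analogue */
	*node->freed = 1;
	free(node);
}
```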



* [PATCH 5/7] dm: track per-add_disk holder relations in DM
From: Christoph Hellwig @ 2022-10-30 15:31 UTC (permalink / raw)
  To: Jens Axboe, Alasdair Kergon, Mike Snitzer; +Cc: Yu Kuai, dm-devel, linux-block

dm is a bit special in that it opens the underlying devices.  Commit
89f871af1b26 ("dm: delay registering the gendisk") tried to accommodate
that by allowing holders to be added to the list before add_disk and then
adding them to sysfs once add_disk is called.  But that leads to really
odd lifetime and error handling problems, as we can't know the state of
the kobjects and don't unwind properly.  To fix this, switch to
registering all existing table_devices with the holder code right after
add_disk, and removing them before calling del_gendisk.

Fixes: 89f871af1b26 ("dm: delay registering the gendisk")
Reported-by: Yu Kuai <yukuai1@huaweicloud.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/md/dm.c | 45 +++++++++++++++++++++++++++++++++++++--------
 1 file changed, 37 insertions(+), 8 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 2917700b1e15c..7b0d6dc957549 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -751,9 +751,16 @@ static struct table_device *open_table_device(struct mapped_device *md,
 		goto out_free_td;
 	}
 
-	r = bd_link_disk_holder(bdev, dm_disk(md));
-	if (r)
-		goto out_blkdev_put;
+	/*
+	 * We can be called before the dm disk is added.  In that case we can't
+	 * register the holder relation here.  It will be done once add_disk was
+	 * called.
+	 */
+	if (md->disk->slave_dir) {
+		r = bd_link_disk_holder(bdev, md->disk);
+		if (r)
+			goto out_blkdev_put;
+	}
 
 	td->dm_dev.mode = mode;
 	td->dm_dev.bdev = bdev;
@@ -774,7 +781,8 @@ static struct table_device *open_table_device(struct mapped_device *md,
  */
 static void close_table_device(struct table_device *td, struct mapped_device *md)
 {
-	bd_unlink_disk_holder(td->dm_dev.bdev, dm_disk(md));
+	if (md->disk->slave_dir)
+		bd_unlink_disk_holder(td->dm_dev.bdev, md->disk);
 	blkdev_put(td->dm_dev.bdev, td->dm_dev.mode | FMODE_EXCL);
 	put_dax(td->dm_dev.dax_dev);
 	list_del(&td->list);
@@ -1951,7 +1959,13 @@ static void cleanup_mapped_device(struct mapped_device *md)
 		md->disk->private_data = NULL;
 		spin_unlock(&_minor_lock);
 		if (dm_get_md_type(md) != DM_TYPE_NONE) {
+			struct table_device *td;
+
 			dm_sysfs_exit(md);
+			list_for_each_entry(td, &md->table_devices, list) {
+				bd_unlink_disk_holder(td->dm_dev.bdev,
+						      md->disk);
+			}
 			del_gendisk(md->disk);
 		}
 		dm_queue_destroy_crypto_profile(md->queue);
@@ -2284,6 +2298,7 @@ int dm_setup_md_queue(struct mapped_device *md, struct dm_table *t)
 {
 	enum dm_queue_mode type = dm_table_get_type(t);
 	struct queue_limits limits;
+	struct table_device *td;
 	int r;
 
 	switch (type) {
@@ -2316,13 +2331,27 @@ int dm_setup_md_queue(struct mapped_device *md, struct dm_table *t)
 	if (r)
 		return r;
 
-	r = dm_sysfs_init(md);
-	if (r) {
-		del_gendisk(md->disk);
-		return r;
+	/*
+	 * Register the holder relationship for devices added before the disk
+	 * was live.
+	 */
+	list_for_each_entry(td, &md->table_devices, list) {
+		r = bd_link_disk_holder(td->dm_dev.bdev, md->disk);
+		if (r)
+			goto out_undo_holders;
 	}
+
+	r = dm_sysfs_init(md);
+	if (r)
+		goto out_undo_holders;
 	md->type = type;
 	return 0;
+
+out_undo_holders:
+	list_for_each_entry_continue_reverse(td, &md->table_devices, list)
+		bd_unlink_disk_holder(td->dm_dev.bdev, md->disk);
+	del_gendisk(md->disk);
+	return r;
 }
 
 struct mapped_device *dm_get_md(dev_t dev)
-- 
2.30.2
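The register-then-unwind shape used in `dm_setup_md_queue`, a forward `list_for_each_entry()` walk paired with `list_for_each_entry_continue_reverse()` on failure, can be sketched with an array standing in for the list (all names hypothetical): on a mid-walk failure, only the entries registered before the failing one are undone, newest first.

```c
#include <assert.h>

#define NR_ENTRIES	4

/* Hypothetical stand-ins: state[i] == 1 means entry i is registered. */
static int register_one(int idx, int fail_at, int state[])
{
	if (idx == fail_at)
		return -1;
	state[idx] = 1;
	return 0;
}

/* Forward walk with reverse unwind: on failure, undo exactly the entries
 * that were registered so far, in reverse order, and nothing else. */
static int register_all(int fail_at, int state[])
{
	int i, r;

	for (i = 0; i < NR_ENTRIES; i++) {
		r = register_one(i, fail_at, state);
		if (r)
			goto out_undo;
	}
	return 0;

out_undo:
	while (--i >= 0)	/* skip the entry that failed to register */
		state[i] = 0;
	return r;
}
```

Starting the unwind just before the failing entry is the point of `list_for_each_entry_continue_reverse`: the entry that failed was never registered, so it must not be unregistered.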



* [PATCH 6/7] block: remove delayed holder registration
From: Christoph Hellwig @ 2022-10-30 15:31 UTC (permalink / raw)
  To: Jens Axboe, Alasdair Kergon, Mike Snitzer; +Cc: Yu Kuai, dm-devel, linux-block

Now that dm has been fixed to keep track of holder registrations before
add_disk, the somewhat buggy block layer code can be safely removed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/genhd.c          |  4 ---
 block/holder.c         | 72 ++++++++++++------------------------------
 include/linux/blkdev.h |  5 ---
 3 files changed, 21 insertions(+), 60 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index aa0e2f5684543..97d2243f07827 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -478,10 +478,6 @@ int __must_check device_add_disk(struct device *parent, struct gendisk *disk,
 		goto out_put_holder_dir;
 	}
 
-	ret = bd_register_pending_holders(disk);
-	if (ret < 0)
-		goto out_put_slave_dir;
-
 	ret = blk_register_queue(disk);
 	if (ret)
 		goto out_put_slave_dir;
diff --git a/block/holder.c b/block/holder.c
index 5283bc804cc14..dd9327b43ce05 100644
--- a/block/holder.c
+++ b/block/holder.c
@@ -29,19 +29,6 @@ static void del_symlink(struct kobject *from, struct kobject *to)
 	sysfs_remove_link(from, kobject_name(to));
 }
 
-static int __link_disk_holder(struct block_device *bdev, struct gendisk *disk)
-{
-	int ret;
-
-	ret = add_symlink(disk->slave_dir, bdev_kobj(bdev));
-	if (ret)
-		return ret;
-	ret = add_symlink(bdev->bd_holder_dir, &disk_to_dev(disk)->kobj);
-	if (ret)
-		del_symlink(disk->slave_dir, bdev_kobj(bdev));
-	return ret;
-}
-
 /**
  * bd_link_disk_holder - create symlinks between holding disk and slave bdev
  * @bdev: the claimed slave bdev
@@ -75,6 +62,9 @@ int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk)
 	struct bd_holder_disk *holder;
 	int ret = 0;
 
+	if (WARN_ON_ONCE(!disk->slave_dir))
+		return -EINVAL;
+
 	mutex_lock(&disk->open_mutex);
 
 	WARN_ON_ONCE(!bdev->bd_holder);
@@ -94,34 +84,32 @@ int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk)
 	INIT_LIST_HEAD(&holder->list);
 	holder->bdev = bdev;
 	holder->refcnt = 1;
-	if (disk->slave_dir) {
-		ret = __link_disk_holder(bdev, disk);
-		if (ret) {
-			kfree(holder);
-			goto out_unlock;
-		}
-	}
-
+	ret = add_symlink(disk->slave_dir, bdev_kobj(bdev));
+	if (ret)
+		goto out_free_holder;
+	ret = add_symlink(bdev->bd_holder_dir, &disk_to_dev(disk)->kobj);
+	if (ret)
+		goto out_del_symlink;
 	list_add(&holder->list, &disk->slave_bdevs);
+
 	/*
 	 * del_gendisk drops the initial reference to bd_holder_dir, so we need
 	 * to keep our own here to allow for cleanup past that point.
 	 */
 	kobject_get(bdev->bd_holder_dir);
+	mutex_unlock(&disk->open_mutex);
+	return 0;
 
+out_del_symlink:
+	del_symlink(disk->slave_dir, bdev_kobj(bdev));
+out_free_holder:
+	kfree(holder);
 out_unlock:
 	mutex_unlock(&disk->open_mutex);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(bd_link_disk_holder);
 
-static void __unlink_disk_holder(struct block_device *bdev,
-		struct gendisk *disk)
-{
-	del_symlink(disk->slave_dir, bdev_kobj(bdev));
-	del_symlink(bdev->bd_holder_dir, &disk_to_dev(disk)->kobj);
-}
-
 /**
  * bd_unlink_disk_holder - destroy symlinks created by bd_link_disk_holder()
  * @bdev: the calimed slave bdev
@@ -136,11 +124,14 @@ void bd_unlink_disk_holder(struct block_device *bdev, struct gendisk *disk)
 {
 	struct bd_holder_disk *holder;
 
+	if (WARN_ON_ONCE(!disk->slave_dir))
+		return;
+
 	mutex_lock(&disk->open_mutex);
 	holder = bd_find_holder_disk(bdev, disk);
 	if (!WARN_ON_ONCE(holder == NULL) && !--holder->refcnt) {
-		if (disk->slave_dir)
-			__unlink_disk_holder(bdev, disk);
+		del_symlink(disk->slave_dir, bdev_kobj(bdev));
+		del_symlink(bdev->bd_holder_dir, &disk_to_dev(disk)->kobj);
 		kobject_put(bdev->bd_holder_dir);
 		list_del_init(&holder->list);
 		kfree(holder);
@@ -148,24 +139,3 @@ void bd_unlink_disk_holder(struct block_device *bdev, struct gendisk *disk)
 	mutex_unlock(&disk->open_mutex);
 }
 EXPORT_SYMBOL_GPL(bd_unlink_disk_holder);
-
-int bd_register_pending_holders(struct gendisk *disk)
-{
-	struct bd_holder_disk *holder;
-	int ret;
-
-	mutex_lock(&disk->open_mutex);
-	list_for_each_entry(holder, &disk->slave_bdevs, list) {
-		ret = __link_disk_holder(holder->bdev, disk);
-		if (ret)
-			goto out_undo;
-	}
-	mutex_unlock(&disk->open_mutex);
-	return 0;
-
-out_undo:
-	list_for_each_entry_continue_reverse(holder, &disk->slave_bdevs, list)
-		__unlink_disk_holder(holder->bdev, disk);
-	mutex_unlock(&disk->open_mutex);
-	return ret;
-}
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 57ed49f20d2eb..df45037815527 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -839,7 +839,6 @@ void set_capacity(struct gendisk *disk, sector_t size);
 #ifdef CONFIG_BLOCK_HOLDER_DEPRECATED
 int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk);
 void bd_unlink_disk_holder(struct block_device *bdev, struct gendisk *disk);
-int bd_register_pending_holders(struct gendisk *disk);
 #else
 static inline int bd_link_disk_holder(struct block_device *bdev,
 				      struct gendisk *disk)
@@ -850,10 +849,6 @@ static inline void bd_unlink_disk_holder(struct block_device *bdev,
 					 struct gendisk *disk)
 {
 }
-static inline int bd_register_pending_holders(struct gendisk *disk)
-{
-	return 0;
-}
 #endif /* CONFIG_BLOCK_HOLDER_DEPRECATED */
 
 dev_t part_devt(struct gendisk *disk, u8 partno);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 40+ messages in thread
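The reworked bd_link_disk_holder() above uses the kernel's usual goto-ladder unwinding: each step that succeeded is torn down in reverse order when a later step fails. A minimal userspace sketch of that pattern, with invented stand-in step functions rather than the real symlink helpers:

```c
#include <assert.h>
#include <stdbool.h>

/* Track which steps are currently "done" so the unwind is observable. */
static bool step_a_done, step_b_done;

static int do_step_a(void) { step_a_done = true; return 0; }
static void undo_step_a(void) { step_a_done = false; }
static int do_step_b(bool fail) { if (fail) return -1; step_b_done = true; return 0; }

/* Models the shape of bd_link_disk_holder(): on failure of a later
 * step, jump to a label that undoes the earlier ones in reverse. */
static int link_with_unwind(bool fail_second_step)
{
	int ret;

	ret = do_step_a();		/* e.g. add_symlink(slave_dir, ...) */
	if (ret)
		return ret;
	ret = do_step_b(fail_second_step); /* e.g. add_symlink(holder_dir, ...) */
	if (ret)
		goto out_undo_a;
	return 0;

out_undo_a:
	undo_step_a();			/* e.g. del_symlink(slave_dir, ...) */
	return ret;
}
```

On full success both steps stay done; when the second step fails, the first is rolled back and the error is propagated, matching the out_del_symlink/out_free_holder labels in the patch.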

* [PATCH 7/7] block: store the holder kobject in bd_holder_disk
  2022-10-30 15:31 ` [dm-devel] " Christoph Hellwig
@ 2022-10-30 15:31   ` Christoph Hellwig
  -1 siblings, 0 replies; 40+ messages in thread
From: Christoph Hellwig @ 2022-10-30 15:31 UTC (permalink / raw)
  To: Jens Axboe, Alasdair Kergon, Mike Snitzer; +Cc: Yu Kuai, dm-devel, linux-block

We already hold a reference to the holder kobject for each bd_holder_disk,
so to make the code a bit more robust, store a reference to it instead
of the block_device.  As long as no one clears ->bd_holder_dir before
freeing the disk this isn't strictly required, but it does make the code
clearer and more robust.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/holder.c | 23 ++++++++++-------------
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/block/holder.c b/block/holder.c
index dd9327b43ce05..a8c355b9d0806 100644
--- a/block/holder.c
+++ b/block/holder.c
@@ -4,7 +4,7 @@
 
 struct bd_holder_disk {
 	struct list_head	list;
-	struct block_device	*bdev;
+	struct kobject		*holder_dir;
 	int			refcnt;
 };
 
@@ -14,7 +14,7 @@ static struct bd_holder_disk *bd_find_holder_disk(struct block_device *bdev,
 	struct bd_holder_disk *holder;
 
 	list_for_each_entry(holder, &disk->slave_bdevs, list)
-		if (holder->bdev == bdev)
+		if (holder->holder_dir == bdev->bd_holder_dir)
 			return holder;
 	return NULL;
 }
@@ -82,27 +82,24 @@ int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk)
 	}
 
 	INIT_LIST_HEAD(&holder->list);
-	holder->bdev = bdev;
 	holder->refcnt = 1;
+	holder->holder_dir = kobject_get(bdev->bd_holder_dir);
+
 	ret = add_symlink(disk->slave_dir, bdev_kobj(bdev));
 	if (ret)
-		goto out_free_holder;
-	ret = add_symlink(bdev->bd_holder_dir, &disk_to_dev(disk)->kobj);
+		goto out_put_holder_dir;
+	ret = add_symlink(holder->holder_dir, &disk_to_dev(disk)->kobj);
 	if (ret)
 		goto out_del_symlink;
 	list_add(&holder->list, &disk->slave_bdevs);
 
-	/*
-	 * del_gendisk drops the initial reference to bd_holder_dir, so we need
-	 * to keep our own here to allow for cleanup past that point.
-	 */
-	kobject_get(bdev->bd_holder_dir);
 	mutex_unlock(&disk->open_mutex);
 	return 0;
 
 out_del_symlink:
 	del_symlink(disk->slave_dir, bdev_kobj(bdev));
-out_free_holder:
+out_put_holder_dir:
+	kobject_put(holder->holder_dir);
 	kfree(holder);
 out_unlock:
 	mutex_unlock(&disk->open_mutex);
@@ -131,8 +128,8 @@ void bd_unlink_disk_holder(struct block_device *bdev, struct gendisk *disk)
 	holder = bd_find_holder_disk(bdev, disk);
 	if (!WARN_ON_ONCE(holder == NULL) && !--holder->refcnt) {
 		del_symlink(disk->slave_dir, bdev_kobj(bdev));
-		del_symlink(bdev->bd_holder_dir, &disk_to_dev(disk)->kobj);
-		kobject_put(bdev->bd_holder_dir);
+		del_symlink(holder->holder_dir, &disk_to_dev(disk)->kobj);
+		kobject_put(holder->holder_dir);
 		list_del_init(&holder->list);
 		kfree(holder);
 	}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH 7/7] block: store the holder kobject in bd_holder_disk
  2022-10-30 15:31   ` [dm-devel] " Christoph Hellwig
@ 2022-10-31  1:52     ` Yu Kuai
  -1 siblings, 0 replies; 40+ messages in thread
From: Yu Kuai @ 2022-10-31  1:52 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Alasdair Kergon, Mike Snitzer
  Cc: Yu Kuai, dm-devel, linux-block, yukuai (C)

Hi

On 2022/10/30 23:31, Christoph Hellwig wrote:
> We hold a reference to the holder kobject for each bd_holder_disk,
> so to make the code a bit more robust, use a reference to it instead
> of the block_device.  As long as no one clears ->bd_holder_dir in
> before freeing the disk, this isn't strictly required, but it does
> make the code more clear and more robust.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   block/holder.c | 23 ++++++++++-------------
>   1 file changed, 10 insertions(+), 13 deletions(-)
> 
> diff --git a/block/holder.c b/block/holder.c
> index dd9327b43ce05..a8c355b9d0806 100644
> --- a/block/holder.c
> +++ b/block/holder.c
> @@ -4,7 +4,7 @@
>   
>   struct bd_holder_disk {
>   	struct list_head	list;
> -	struct block_device	*bdev;
> +	struct kobject		*holder_dir;
>   	int			refcnt;
>   };
>   
> @@ -14,7 +14,7 @@ static struct bd_holder_disk *bd_find_holder_disk(struct block_device *bdev,
>   	struct bd_holder_disk *holder;
>   
>   	list_for_each_entry(holder, &disk->slave_bdevs, list)
> -		if (holder->bdev == bdev)
> +		if (holder->holder_dir == bdev->bd_holder_dir)
>   			return holder;
>   	return NULL;
>   }
> @@ -82,27 +82,24 @@ int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk)
>   	}
>   
>   	INIT_LIST_HEAD(&holder->list);
> -	holder->bdev = bdev;
>   	holder->refcnt = 1;
> +	holder->holder_dir = kobject_get(bdev->bd_holder_dir);

I wonder whether this is safe: if the kobject reference count is already
0 here and bd_holder_dir is about to be freed, kref_get() inside
kobject_get() will warn about the use-after-free, and kobject_get() will
return an address that is about to be freed.

Thanks,
Kuai
> +
>   	ret = add_symlink(disk->slave_dir, bdev_kobj(bdev));
>   	if (ret)
> -		goto out_free_holder;
> -	ret = add_symlink(bdev->bd_holder_dir, &disk_to_dev(disk)->kobj);
> +		goto out_put_holder_dir;
> +	ret = add_symlink(holder->holder_dir, &disk_to_dev(disk)->kobj);
>   	if (ret)
>   		goto out_del_symlink;
>   	list_add(&holder->list, &disk->slave_bdevs);
>   
> -	/*
> -	 * del_gendisk drops the initial reference to bd_holder_dir, so we need
> -	 * to keep our own here to allow for cleanup past that point.
> -	 */
> -	kobject_get(bdev->bd_holder_dir);
>   	mutex_unlock(&disk->open_mutex);
>   	return 0;
>   
>   out_del_symlink:
>   	del_symlink(disk->slave_dir, bdev_kobj(bdev));
> -out_free_holder:
> +out_put_holder_dir:
> +	kobject_put(holder->holder_dir);
>   	kfree(holder);
>   out_unlock:
>   	mutex_unlock(&disk->open_mutex);
> @@ -131,8 +128,8 @@ void bd_unlink_disk_holder(struct block_device *bdev, struct gendisk *disk)
>   	holder = bd_find_holder_disk(bdev, disk);
>   	if (!WARN_ON_ONCE(holder == NULL) && !--holder->refcnt) {
>   		del_symlink(disk->slave_dir, bdev_kobj(bdev));
> -		del_symlink(bdev->bd_holder_dir, &disk_to_dev(disk)->kobj);
> -		kobject_put(bdev->bd_holder_dir);
> +		del_symlink(holder->holder_dir, &disk_to_dev(disk)->kobj);
> +		kobject_put(holder->holder_dir);
>   		list_del_init(&holder->list);
>   		kfree(holder);
>   	}
> 


^ permalink raw reply	[flat|nested] 40+ messages in thread
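For readers unfamiliar with the warning referenced above: kref_get() increments unconditionally and WARNs when the count was already zero, because a zero count means the object may already be on its way to being freed. A rough userspace model of that behavior (mock_kref and mock_kref_get are illustrative names, not the kernel API):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

struct mock_kref { atomic_int refcount; };

/* Models kref_get(): unconditional increment, but "warn" (return false)
 * if the count was already zero, i.e. the potential use-after-free case
 * Yu Kuai is pointing at. */
static bool mock_kref_get(struct mock_kref *k)
{
	int old = atomic_fetch_add(&k->refcount, 1);

	if (old == 0) {
		fprintf(stderr, "WARNING: refcount was 0 (object may already be freed)\n");
		return false;
	}
	return true;
}
```

The sketch only models the refcount arithmetic; in the kernel the danger is that by the time the warning fires, the backing memory may genuinely have been freed.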

* Re: [PATCH 7/7] block: store the holder kobject in bd_holder_disk
  2022-10-31  1:52     ` [dm-devel] " Yu Kuai
@ 2022-11-01 10:49       ` Christoph Hellwig
  -1 siblings, 0 replies; 40+ messages in thread
From: Christoph Hellwig @ 2022-11-01 10:49 UTC (permalink / raw)
  To: Yu Kuai
  Cc: Christoph Hellwig, Jens Axboe, Alasdair Kergon, Mike Snitzer,
	dm-devel, linux-block, yukuai (C)

On Mon, Oct 31, 2022 at 09:52:04AM +0800, Yu Kuai wrote:
>>     	INIT_LIST_HEAD(&holder->list);
>> -	holder->bdev = bdev;
>>   	holder->refcnt = 1;
>> +	holder->holder_dir = kobject_get(bdev->bd_holder_dir);
>
> I wonder is this safe here, if kobject reference is 0 here and
> bd_holder_dir is about to be freed. Here in kobject_get, kref_get() will
> warn about uaf, and kobject_get will return a address that is about to
> be freed.

But how could the reference be 0 here?  The driver that calls
bd_link_disk_holder must have the block device open and thus hold a
reference to it.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 7/7] block: store the holder kobject in bd_holder_disk
  2022-11-01 10:49       ` [dm-devel] " Christoph Hellwig
@ 2022-11-01 11:12         ` Yu Kuai
  -1 siblings, 0 replies; 40+ messages in thread
From: Yu Kuai @ 2022-11-01 11:12 UTC (permalink / raw)
  To: Christoph Hellwig, Yu Kuai
  Cc: Jens Axboe, Alasdair Kergon, Mike Snitzer, dm-devel, linux-block,
	yukuai (C)

Hi,

On 2022/11/01 18:49, Christoph Hellwig wrote:
> On Mon, Oct 31, 2022 at 09:52:04AM +0800, Yu Kuai wrote:
>>>      	INIT_LIST_HEAD(&holder->list);
>>> -	holder->bdev = bdev;
>>>    	holder->refcnt = 1;
>>> +	holder->holder_dir = kobject_get(bdev->bd_holder_dir);
>>
>> I wonder is this safe here, if kobject reference is 0 here and
>> bd_holder_dir is about to be freed. Here in kobject_get, kref_get() will
>> warn about uaf, and kobject_get will return a address that is about to
>> be freed.
> 
> But how could the reference be 0 here?  The driver that calls
> bd_link_disk_holder must have the block device open and thus hold a
> reference to it.

Like I said before, the caller of bd_link_disk_holder() gets the bdev
from blkdev_get_by_dev(), which does not grab a reference to holder_dir,
and holding a disk reference only prevents disk_release() from being
called, not del_gendisk(), while the holder_dir reference is dropped in
del_gendisk() and so can be decreased to 0.

If you agree with the above explanation, here are the fixes I tried:

1) move kobject_put(bd_holder_dir) from del_gendisk to disk_release, but
there seem to be a lot of other dependencies.

2) protect bd_holder_dir reference by open_mutex.

Thanks,
Kuai
> 
> .
> 


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 7/7] block: store the holder kobject in bd_holder_disk
  2022-11-01 11:12         ` [dm-devel] " Yu Kuai
@ 2022-11-01 11:21           ` Christoph Hellwig
  -1 siblings, 0 replies; 40+ messages in thread
From: Christoph Hellwig @ 2022-11-01 11:21 UTC (permalink / raw)
  To: Yu Kuai
  Cc: Christoph Hellwig, Jens Axboe, Alasdair Kergon, Mike Snitzer,
	dm-devel, linux-block, yukuai (C)

On Tue, Nov 01, 2022 at 07:12:51PM +0800, Yu Kuai wrote:
>> But how could the reference be 0 here?  The driver that calls
>> bd_link_disk_holder must have the block device open and thus hold a
>> reference to it.
>
> Like I said before, the caller of bd_link_disk_holder() get bdev by
> blkdev_get_by_dev(), which do not grab reference of holder_dir, and
> grab disk reference can only prevent disk_release() to be called, not
> del_gendisk() while holder_dir reference is dropped in del_gendisk()
> and can be decreased to 0.

Oh, the bd_holder_dir reference, not the block_device one.  So yes,
I agree.

> If you agree with above explanation, I tried to fix this:
>
> 1) move kobject_put(bd_holder_dir) from del_gendisk to disk_release,
> there seems to be a lot of other dependencies.
>
> 2) protect bd_holder_dir reference by open_mutex.

I think simply switching the kobject_get in bd_link_disk_holder
into a kobject_get_unless_zero and unwinding if there is no reference
should be enough:

diff --git a/block/holder.c b/block/holder.c
index a8c355b9d0806..cd18064f6ff80 100644
--- a/block/holder.c
+++ b/block/holder.c
@@ -83,7 +83,11 @@ int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk)
 
 	INIT_LIST_HEAD(&holder->list);
 	holder->refcnt = 1;
-	holder->holder_dir = kobject_get(bdev->bd_holder_dir);
+	if (!kobject_get_unless_zero(bdev->bd_holder_dir)) {
+		ret = -EBUSY;
+		goto out_free_holder;
+	}
+	holder->holder_dir = bdev->bd_holder_dir;
 
 	ret = add_symlink(disk->slave_dir, bdev_kobj(bdev));
 	if (ret)
@@ -100,6 +104,7 @@ int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk)
 	del_symlink(disk->slave_dir, bdev_kobj(bdev));
 out_put_holder_dir:
 	kobject_put(holder->holder_dir);
+out_free_holder:
 	kfree(holder);
 out_unlock:
 	mutex_unlock(&disk->open_mutex);

^ permalink raw reply related	[flat|nested] 40+ messages in thread
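kobject_get_unless_zero() differs from plain kobject_get() in that it refuses to take a reference once the count has hit zero, which is what the diff above relies on. A hedged userspace sketch of the underlying compare-and-swap idea (mock_kobj and the function name are invented; the kernel implements this via kref_get_unless_zero()):

```c
#include <stdatomic.h>
#include <stdbool.h>

struct mock_kobj { atomic_int refcount; };

/* Take a reference only if the count has not already reached zero:
 * never "resurrect" an object whose last reference was dropped. */
static bool mock_kobject_get_unless_zero(struct mock_kobj *k)
{
	int old = atomic_load(&k->refcount);

	/* CAS loop: retry on concurrent modification, bail out on 0. */
	while (old != 0) {
		if (atomic_compare_exchange_weak(&k->refcount, &old, old + 1))
			return true;	/* reference taken */
	}
	return false;			/* too late: caller must unwind */
}
```

The false return maps onto the -EBUSY unwind path in the proposed diff.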

* Re: [PATCH 7/7] block: store the holder kobject in bd_holder_disk
  2022-11-01 11:21           ` [dm-devel] " Christoph Hellwig
@ 2022-11-01 11:28             ` Yu Kuai
  -1 siblings, 0 replies; 40+ messages in thread
From: Yu Kuai @ 2022-11-01 11:28 UTC (permalink / raw)
  To: Christoph Hellwig, Yu Kuai
  Cc: Jens Axboe, Alasdair Kergon, Mike Snitzer, dm-devel, linux-block,
	yukuai (C)

Hi,

On 2022/11/01 19:21, Christoph Hellwig wrote:
> On Tue, Nov 01, 2022 at 07:12:51PM +0800, Yu Kuai wrote:
>>> But how could the reference be 0 here?  The driver that calls
>>> bd_link_disk_holder must have the block device open and thus hold a
>>> reference to it.
>>
>> Like I said before, the caller of bd_link_disk_holder() get bdev by
>> blkdev_get_by_dev(), which do not grab reference of holder_dir, and
>> grab disk reference can only prevent disk_release() to be called, not
>> del_gendisk() while holder_dir reference is dropped in del_gendisk()
>> and can be decreased to 0.
> 
> Oh, the bd_holder_dir reference, not the block_device one.  So yes,
> I agree.
> 
>> If you agree with above explanation, I tried to fix this:
>>
>> 1) move kobject_put(bd_holder_dir) from del_gendisk to disk_release,
>> there seems to be a lot of other dependencies.
>>
>> 2) protect bd_holder_dir reference by open_mutex.
> 
> I think simply switching the kobject_get in bd_link_disk_holder
> into a kobject_get_unless_zero and unwinding if there is no reference
> should be enough:
> 
> diff --git a/block/holder.c b/block/holder.c
> index a8c355b9d0806..cd18064f6ff80 100644
> --- a/block/holder.c
> +++ b/block/holder.c
> @@ -83,7 +83,11 @@ int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk)
>   
>   	INIT_LIST_HEAD(&holder->list);
>   	holder->refcnt = 1;
> -	holder->holder_dir = kobject_get(bdev->bd_holder_dir);
> +	if (!kobject_get_unless_zero(bdev->bd_holder_dir)) {
> +		ret = -EBUSY;
> +		goto out_free_holder;
> +	}
> +	holder->holder_dir = bdev->bd_holder_dir;

What if bd_holder_dir is already freed here? Then a use-after-free can be
triggered. Thus bd_holder_dir would need to be reset in del_gendisk()
when its reference drops to 0; however, the kobject APIs can't do that...

Thanks,
Kuai
>   
>   	ret = add_symlink(disk->slave_dir, bdev_kobj(bdev));
>   	if (ret)
> @@ -100,6 +104,7 @@ int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk)
>   	del_symlink(disk->slave_dir, bdev_kobj(bdev));
>   out_put_holder_dir:
>   	kobject_put(holder->holder_dir);
> +out_free_holder:
>   	kfree(holder);
>   out_unlock:
>   	mutex_unlock(&disk->open_mutex);
> 
> .
> 


^ permalink raw reply	[flat|nested] 40+ messages in thread
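The objection above is that get-unless-zero only closes the race if the lookup and the final put are serialized by a common lock; otherwise the memory can be freed between reading the pointer and touching its refcount. A single-threaded userspace sketch of the lock-protected pattern — all names (mock_obj, lookup_lock, registered) are illustrative, with the lock standing in for disk->open_mutex:

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct mock_obj {
	atomic_int refcount;
};

static pthread_mutex_t lookup_lock = PTHREAD_MUTEX_INITIALIZER;
static struct mock_obj *registered;	/* models bdev->bd_holder_dir */

/* Drop a reference; free and unpublish under the lock when it hits 0,
 * so a concurrent lookup can never see a dangling pointer. */
static void mock_put(struct mock_obj *obj)
{
	pthread_mutex_lock(&lookup_lock);
	if (atomic_fetch_sub(&obj->refcount, 1) == 1) {
		registered = NULL;	/* unpublish before freeing */
		free(obj);
	}
	pthread_mutex_unlock(&lookup_lock);
}

/* Look up and take a reference under the same lock: the object cannot
 * be freed while we hold it, so the increment is always on live memory. */
static struct mock_obj *mock_get(void)
{
	struct mock_obj *obj;

	pthread_mutex_lock(&lookup_lock);
	obj = registered;
	if (obj)
		atomic_fetch_add(&obj->refcount, 1);
	pthread_mutex_unlock(&lookup_lock);
	return obj;
}
```

This mirrors option 2) from the discussion (protect the bd_holder_dir reference with open_mutex): with both the final put and the lookup inside the lock, get-unless-zero semantics become unnecessary for safety.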


* Re: [PATCH 7/7] block: store the holder kobject in bd_holder_disk
  2022-11-01 11:28             ` [dm-devel] " Yu Kuai
@ 2022-11-01 13:18               ` Christoph Hellwig
  -1 siblings, 0 replies; 40+ messages in thread
From: Christoph Hellwig @ 2022-11-01 13:18 UTC (permalink / raw)
  To: Yu Kuai
  Cc: Christoph Hellwig, Jens Axboe, Alasdair Kergon, Mike Snitzer,
	dm-devel, linux-block, yukuai (C)

On Tue, Nov 01, 2022 at 07:28:17PM +0800, Yu Kuai wrote:
> What if bd_holder_dir is already freed here? Then a use-after-free can be
> triggered. Thus bd_holder_dir needs to be reset in del_gendisk() if its
> reference is dropped to 0; however, the kobject APIs can't do that...

Indeed.  I don't think we can simply move the dropping of the reference
as you suggested as that also implies taking it earlier, and the
device in the disk is only initialized in add_disk.

Now what I think we could do is:

 - hold open_mutex in bd_link_disk_holder as you suggested
 - check that the bdev inode is hashed inside open_mutex before doing
   the kobject_get

^ permalink raw reply	[flat|nested] 40+ messages in thread


* Re: [PATCH 7/7] block: store the holder kobject in bd_holder_disk
  2022-11-01 13:18               ` [dm-devel] " Christoph Hellwig
@ 2022-11-01 13:29                 ` Yu Kuai
  -1 siblings, 0 replies; 40+ messages in thread
From: Yu Kuai @ 2022-11-01 13:29 UTC (permalink / raw)
  To: Christoph Hellwig, Yu Kuai
  Cc: Jens Axboe, Alasdair Kergon, Mike Snitzer, dm-devel, linux-block,
	yukuai (C)

Hi,

在 2022/11/01 21:18, Christoph Hellwig 写道:
> On Tue, Nov 01, 2022 at 07:28:17PM +0800, Yu Kuai wrote:
>> What if bd_holder_dir is already freed here? Then a use-after-free can be
>> triggered. Thus bd_holder_dir needs to be reset in del_gendisk() if its
>> reference is dropped to 0; however, the kobject APIs can't do that...
> 
> Indeed.  I don't think we can simply move the dropping of the reference
> as you suggested as that also implies taking it earlier, and the
> device in the disk is only initialized in add_disk.
> 
> Now what I think we could do is:
> 
>   - hold open_mutex in bd_link_disk_holder as you suggested
>   - check that the bdev inode is hashed inside open_mutex before doing
>     the kobject_get

Yes, that sounds good; checking whether the inode is hashed is better than
introducing a new field, as I did in another thread.

Thanks,
Kuai
> 
> .
> 


^ permalink raw reply	[flat|nested] 40+ messages in thread


* Re: [PATCH 5/7] dm: track per-add_disk holder relations in DM
  2022-10-30 15:31   ` [dm-devel] " Christoph Hellwig
@ 2022-11-09  2:08     ` Yu Kuai
  -1 siblings, 0 replies; 40+ messages in thread
From: Yu Kuai @ 2022-11-09  2:08 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Alasdair Kergon, Mike Snitzer
  Cc: Yu Kuai, dm-devel, linux-block, yukuai (C)

Hi,

在 2022/10/30 23:31, Christoph Hellwig 写道:
> dm is a bit special in that it opens the underlying devices.  Commit
> 89f871af1b26 ("dm: delay registering the gendisk") tried to accommodate
> that by allowing to add the holder to the list before add_gendisk and
> then just add them to sysfs once add_disk is called.  But that leads to
> really odd lifetime problems and error handling problems as we can't
> know the state of the kobjects and don't unwind properly.  To fix this
> switch to just registering all existing table_devices with the holder
> code right after add_disk, and remove them before calling del_gendisk.
> 
> Fixes: 89f871af1b26 ("dm: delay registering the gendisk")
> Reported-by: Yu Kuai <yukuai1@huaweicloud.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/md/dm.c | 45 +++++++++++++++++++++++++++++++++++++--------
>   1 file changed, 37 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 2917700b1e15c..7b0d6dc957549 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -751,9 +751,16 @@ static struct table_device *open_table_device(struct mapped_device *md,
>   		goto out_free_td;
>   	}
>   
> -	r = bd_link_disk_holder(bdev, dm_disk(md));
> -	if (r)
> -		goto out_blkdev_put;
> +	/*
> +	 * We can be called before the dm disk is added.  In that case we can't
> +	 * register the holder relation here.  It will be done once add_disk was
> +	 * called.
> +	 */
> +	if (md->disk->slave_dir) {
If device_add_disk() or del_gendisk() can run concurrently with this, it
seems to me that using 'slave_dir' is not safe.

I'm not quite familiar with dm; can we guarantee that they can't run
concurrently?

Thanks,
Kuai
> +		r = bd_link_disk_holder(bdev, md->disk);
> +		if (r)
> +			goto out_blkdev_put;
> +	}
>   
>   	td->dm_dev.mode = mode;
>   	td->dm_dev.bdev = bdev;
> @@ -774,7 +781,8 @@ static struct table_device *open_table_device(struct mapped_device *md,
>    */
>   static void close_table_device(struct table_device *td, struct mapped_device *md)
>   {
> -	bd_unlink_disk_holder(td->dm_dev.bdev, dm_disk(md));
> +	if (md->disk->slave_dir)
> +		bd_unlink_disk_holder(td->dm_dev.bdev, md->disk);
>   	blkdev_put(td->dm_dev.bdev, td->dm_dev.mode | FMODE_EXCL);
>   	put_dax(td->dm_dev.dax_dev);
>   	list_del(&td->list);
> @@ -1951,7 +1959,13 @@ static void cleanup_mapped_device(struct mapped_device *md)
>   		md->disk->private_data = NULL;
>   		spin_unlock(&_minor_lock);
>   		if (dm_get_md_type(md) != DM_TYPE_NONE) {
> +			struct table_device *td;
> +
>   			dm_sysfs_exit(md);
> +			list_for_each_entry(td, &md->table_devices, list) {
> +				bd_unlink_disk_holder(td->dm_dev.bdev,
> +						      md->disk);
> +			}
>   			del_gendisk(md->disk);
>   		}
>   		dm_queue_destroy_crypto_profile(md->queue);
> @@ -2284,6 +2298,7 @@ int dm_setup_md_queue(struct mapped_device *md, struct dm_table *t)
>   {
>   	enum dm_queue_mode type = dm_table_get_type(t);
>   	struct queue_limits limits;
> +	struct table_device *td;
>   	int r;
>   
>   	switch (type) {
> @@ -2316,13 +2331,27 @@ int dm_setup_md_queue(struct mapped_device *md, struct dm_table *t)
>   	if (r)
>   		return r;
>   
> -	r = dm_sysfs_init(md);
> -	if (r) {
> -		del_gendisk(md->disk);
> -		return r;
> +	/*
> +	 * Register the holder relationship for devices added before the disk
> +	 * was live.
> +	 */
> +	list_for_each_entry(td, &md->table_devices, list) {
> +		r = bd_link_disk_holder(td->dm_dev.bdev, md->disk);
> +		if (r)
> +			goto out_undo_holders;
>   	}
> +
> +	r = dm_sysfs_init(md);
> +	if (r)
> +		goto out_undo_holders;
>   	md->type = type;
>   	return 0;
> +
> +out_undo_holders:
> +	list_for_each_entry_continue_reverse(td, &md->table_devices, list)
> +		bd_unlink_disk_holder(td->dm_dev.bdev, md->disk);
> +	del_gendisk(md->disk);
> +	return r;
>   }
>   
>   struct mapped_device *dm_get_md(dev_t dev)
> 


^ permalink raw reply	[flat|nested] 40+ messages in thread


* Re: [PATCH 5/7] dm: track per-add_disk holder relations in DM
  2022-11-09  2:08     ` [dm-devel] " Yu Kuai
@ 2022-11-09  8:26       ` Christoph Hellwig
  -1 siblings, 0 replies; 40+ messages in thread
From: Christoph Hellwig @ 2022-11-09  8:26 UTC (permalink / raw)
  To: Yu Kuai
  Cc: Christoph Hellwig, Jens Axboe, Alasdair Kergon, Mike Snitzer,
	dm-devel, linux-block, yukuai (C)

On Wed, Nov 09, 2022 at 10:08:14AM +0800, Yu Kuai wrote:
>> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
>> index 2917700b1e15c..7b0d6dc957549 100644
>> --- a/drivers/md/dm.c
>> +++ b/drivers/md/dm.c
>> @@ -751,9 +751,16 @@ static struct table_device *open_table_device(struct mapped_device *md,
>>   		goto out_free_td;
>>   	}
>>   -	r = bd_link_disk_holder(bdev, dm_disk(md));
>> -	if (r)
>> -		goto out_blkdev_put;
>> +	/*
>> +	 * We can be called before the dm disk is added.  In that case we can't
>> +	 * register the holder relation here.  It will be done once add_disk was
>> +	 * called.
>> +	 */
>> +	if (md->disk->slave_dir) {
> If device_add_disk() or del_gendisk() can run concurrently with this, it
> seems to me that using 'slave_dir' is not safe.
>
> I'm not quite familiar with dm; can we guarantee that they can't run
> concurrently?

I assumed dm would not get itself into territory where creating /
deleting the device could race with adding component devices, but
digging deeper I can't find anything.  This could be done
by holding table_devices_lock around add_disk/del_gendisk, but
I'm not that familiar with the dm code.

Mike, can you help out on this?

^ permalink raw reply	[flat|nested] 40+ messages in thread


* Re: [PATCH 5/7] dm: track per-add_disk holder relations in DM
  2022-11-09  8:26       ` [dm-devel] " Christoph Hellwig
@ 2022-11-10 18:09         ` Mike Snitzer
  -1 siblings, 0 replies; 40+ messages in thread
From: Mike Snitzer @ 2022-11-10 18:09 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Yu Kuai, Jens Axboe, Mike Snitzer, linux-block, dm-devel,
	yukuai (C),
	Alasdair Kergon

On Wed, Nov 09 2022 at  3:26P -0500,
Christoph Hellwig <hch@lst.de> wrote:

> On Wed, Nov 09, 2022 at 10:08:14AM +0800, Yu Kuai wrote:
> >> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> >> index 2917700b1e15c..7b0d6dc957549 100644
> >> --- a/drivers/md/dm.c
> >> +++ b/drivers/md/dm.c
> >> @@ -751,9 +751,16 @@ static struct table_device *open_table_device(struct mapped_device *md,
> >>   		goto out_free_td;
> >>   	}
> >>   -	r = bd_link_disk_holder(bdev, dm_disk(md));
> >> -	if (r)
> >> -		goto out_blkdev_put;
> >> +	/*
> >> +	 * We can be called before the dm disk is added.  In that case we can't
> >> +	 * register the holder relation here.  It will be done once add_disk was
> >> +	 * called.
> >> +	 */
> >> +	if (md->disk->slave_dir) {
> > If device_add_disk() or del_gendisk() can run concurrently with this, it
> > seems to me that using 'slave_dir' is not safe.
> >
> > I'm not quite familiar with dm; can we guarantee that they can't run
> > concurrently?
> 
> I assumed dm would not get itself into territory where creating /
> deleting the device could race with adding component devices, but
> digging deeper I can't find anything.  This could be done
> by holding table_devices_lock around add_disk/del_gendisk, but
> I'm not that familiar with the dm code.
> 
> Mike, can you help out on this?

Maybe :/

Underlying component devices can certainly come and go at any
time. And there is no DM code that can, or should, prevent that. All
we can do is cope with unavailability of devices. But pretty sure that
isn't the question.

I'm unclear about the specific race in question:
if open_table_device() doesn't see slave_dir, it is the first table
load. Otherwise, the DM device (and associated gendisk) shouldn't have
been torn down while a table is actively being loaded for it. But
_where_ the code lives, to ensure that, is also eluding me...

You could use a big lock (table_devices_lock) to disallow changes to
DM relations while loading the table. But I wouldn't think it needed
as long as the gendisk's lifecycle is protected vs table loads (or
other concurrent actions like table load vs dm device removal). Again,
more code inspection needed to page all this back into my head.

The concern for race aside:
I am concerned that your redundant bd_link_disk_holder() (first in
open_table_device and later in dm_setup_md_queue) will result in a
dangling refcount (e.g. an increase of 2 when it should only be 1) --
given bd_link_disk_holder will gladly just bump its holder->refcnt if
bd_find_holder_disk() returns an existing holder. This would occur if
a DM table is already loaded (and DM device's gendisk exists) and a
new DM table is being loaded.

Mike


^ permalink raw reply	[flat|nested] 40+ messages in thread


* Re: [dm-devel] [PATCH 5/7] dm: track per-add_disk holder relations in DM
  2022-11-10 18:09         ` [dm-devel] " Mike Snitzer
@ 2022-11-10 19:48           ` Mike Snitzer
  -1 siblings, 0 replies; 40+ messages in thread
From: Mike Snitzer @ 2022-11-10 19:48 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Mike Snitzer, linux-block, Yu Kuai, dm-devel,
	yukuai (C),
	Alasdair Kergon

On Thu, Nov 10 2022 at  1:09P -0500,
Mike Snitzer <snitzer@redhat.com> wrote:
 
> The concern for race aside:
> I am concerned that your redundant bd_link_disk_holder() (first in
> open_table_device and later in dm_setup_md_queue) will result in
> dangling refcount (e.g. increase of 2 when it should only be by 1) --
> given bd_link_disk_holder will gladly just bump its holder->refcnt if
> bd_find_holder_disk() returns an existing holder. This would occur if
> a DM table is already loaded (and DM device's gendisk exists) and a
> new DM table is being loaded.

Nevermind, dm_setup_md_queue should only ever be called once.



^ permalink raw reply	[flat|nested] 40+ messages in thread


* Re: [PATCH 5/7] dm: track per-add_disk holder relations in DM
  2022-11-10 18:09         ` [dm-devel] " Mike Snitzer
@ 2022-11-12  6:23           ` Yu Kuai
  -1 siblings, 0 replies; 40+ messages in thread
From: Yu Kuai @ 2022-11-12  6:23 UTC (permalink / raw)
  To: Mike Snitzer, Christoph Hellwig
  Cc: Yu Kuai, Jens Axboe, Mike Snitzer, linux-block, dm-devel,
	Alasdair Kergon, yukuai (C)



在 2022/11/11 2:09, Mike Snitzer 写道:
> On Wed, Nov 09 2022 at  3:26P -0500,
> Christoph Hellwig <hch@lst.de> wrote:
> 
>> On Wed, Nov 09, 2022 at 10:08:14AM +0800, Yu Kuai wrote:
>>>> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
>>>> index 2917700b1e15c..7b0d6dc957549 100644
>>>> --- a/drivers/md/dm.c
>>>> +++ b/drivers/md/dm.c
>>>> @@ -751,9 +751,16 @@ static struct table_device *open_table_device(struct mapped_device *md,
>>>>    		goto out_free_td;
>>>>    	}
>>>>    -	r = bd_link_disk_holder(bdev, dm_disk(md));
>>>> -	if (r)
>>>> -		goto out_blkdev_put;
>>>> +	/*
>>>> +	 * We can be called before the dm disk is added.  In that case we can't
>>>> +	 * register the holder relation here.  It will be done once add_disk was
>>>> +	 * called.
>>>> +	 */
>>>> +	if (md->disk->slave_dir) {
>>> If device_add_disk() or del_gendisk() can run concurrently with this, it
>>> seems to me that using 'slave_dir' is not safe.
>>>
>>> I'm not quite familiar with dm; can we guarantee that they can't run
>>> concurrently?
>>
>> I assumed dm would not get itself into territory where creating /
>> deleting the device could race with adding component devices, but
>> digging deeper I can't find anything.  This could be done
>> by holding table_devices_lock around add_disk/del_gendisk, but
>> I'm not that familiar with the dm code.
>>
>> Mike, can you help out on this?
> 
> Maybe :/
> 
> Underlying component devices can certainly come and go at any
> time. And there is no DM code that can, or should, prevent that. All
> we can do is cope with unavailability of devices. But pretty sure that
> isn't the question.
> 
> I'm unclear about the specific race in question:
> if open_table_device() doesn't see slave_dir it is the first table
> load. Otherwise, the DM device (and associated gendisk) shouldn't have
> been torn down while a table is actively being loaded for it. But
> _where_ the code lives, to ensure that, is also eluding me...
> 
> You could use a big lock (table_devices_lock) to disallow changes to
> DM relations while loading the table. But I wouldn't think it needed

How about using table_devices_lock to protect device addition and
removal, to explicitly forbid table loads racing with device creation
and deletion, as Christoph suggested?

Thanks,
Kuai

> as long as the gendisk's lifecycle is protected vs table loads (or
> other concurrent actions like table load vs dm device removal). Again,
> more code inspection needed to page all this back into my head.
> 
> The concern for race aside:
> I am concerned that your redundant bd_link_disk_holder() (first in
> open_table_device and later in dm_setup_md_queue) will result in
> dangling refcount (e.g. increase of 2 when it should only be by 1) --
> given bd_link_disk_holder will gladly just bump its holder->refcnt if
> bd_find_holder_disk() returns an existing holder. This would occur if
> a DM table is already loaded (and DM device's gendisk exists) and a
> new DM table is being loaded.
> 
> Mike
> 
> .
> 


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [dm-devel] [PATCH 5/7] dm: track per-add_disk holder relations in DM
@ 2022-11-12  6:23           ` Yu Kuai
  0 siblings, 0 replies; 40+ messages in thread
From: Yu Kuai @ 2022-11-12  6:23 UTC (permalink / raw)
  To: Mike Snitzer, Christoph Hellwig
  Cc: Jens Axboe, Mike Snitzer, linux-block, Yu Kuai, dm-devel,
	yukuai (C),
	Alasdair Kergon



在 2022/11/11 2:09, Mike Snitzer 写道:
> On Wed, Nov 09 2022 at  3:26P -0500,
> Christoph Hellwig <hch@lst.de> wrote:
> 
>> On Wed, Nov 09, 2022 at 10:08:14AM +0800, Yu Kuai wrote:
>>>> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
>>>> index 2917700b1e15c..7b0d6dc957549 100644
>>>> --- a/drivers/md/dm.c
>>>> +++ b/drivers/md/dm.c
>>>> @@ -751,9 +751,16 @@ static struct table_device *open_table_device(struct mapped_device *md,
>>>>    		goto out_free_td;
>>>>    	}
>>>>    -	r = bd_link_disk_holder(bdev, dm_disk(md));
>>>> -	if (r)
>>>> -		goto out_blkdev_put;
>>>> +	/*
>>>> +	 * We can be called before the dm disk is added.  In that case we can't
>>>> +	 * register the holder relation here.  It will be done once add_disk was
>>>> +	 * called.
>>>> +	 */
>>>> +	if (md->disk->slave_dir) {
>>> If device_add_disk() or del_gendisk() can concurrent with this, It seems
>>> to me that using 'slave_dir' is not safe.
>>>
>>> I'm not quite familiar with dm, can we guarantee that they can't
>>> concurrent?
>>
>> I assumed dm would not get itself into territory where creating /
>> deleting the device could race with adding component devices, but
>> digging deeper I can't find anything.  This could be done
>> by holding table_devices_lock around add_disk/del_gendisk, but
>> I'm not that familiar with the dm code.
>>
>> Mike, can you help out on this?
> 
> Maybe :/
> 
> Underlying component devices can certainly come and go at any
> time. And there is no DM code that can, or should, prevent that. All
> we can do is cope with unavailability of devices. But pretty sure that
> isn't the question.
> 
> I'm unclear about the specific race in question:
> if open_table_device() doesn't see slave_dir it is the first table
> load. Otherwise, the DM device (and associated gendisk) shouldn't have
> been torn down while a table is actively being loaded for it. But
> _where_ the code lives, to ensure that, is also eluding me...
> 
> You could use a big lock (table_devices_lock) to disallow changes to
> DM relations while loading the table. But I wouldn't think it needed

How about using table_devices_lock to protect device addition and
removal, so that table loads explicitly can't race with device creation
and deletion, as Christoph suggested?

Thanks,
Kuai

> as long as the gendisk's lifecycle is protected vs table loads (or
> other concurrent actions like table load vs dm device removal). Again,
> more code inspection needed to page all this back into my head.
> 
> The concern for race aside:
> I am concerned that your redundant bd_link_disk_holder() (first in
> open_table_device and later in dm_setup_md_queue) will result in a
> dangling refcount (e.g. an increase of 2 when it should only be 1) --
> given bd_link_disk_holder will gladly just bump its holder->refcnt if
> bd_find_holder_disk() returns an existing holder. This would occur if
> a DM table is already loaded (and DM device's gendisk exists) and a
> new DM table is being loaded.
> 
> Mike
> 
> .
> 


end of thread, other threads:[~2022-11-14  7:13 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-30 15:31 fix delayed holder tracking v2 Christoph Hellwig
2022-10-30 15:31 ` [dm-devel] " Christoph Hellwig
2022-10-30 15:31 ` [PATCH 1/7] block: clear ->slave_dir when dropping the main slave_dir reference Christoph Hellwig
2022-10-30 15:31   ` [dm-devel] " Christoph Hellwig
2022-10-30 15:31 ` [PATCH 2/7] dm: remove free_table_devices Christoph Hellwig
2022-10-30 15:31   ` [dm-devel] " Christoph Hellwig
2022-10-30 15:31 ` [PATCH 3/7] dm: cleanup open_table_device Christoph Hellwig
2022-10-30 15:31   ` [dm-devel] " Christoph Hellwig
2022-10-30 15:31 ` [PATCH 4/7] dm: cleanup close_table_device Christoph Hellwig
2022-10-30 15:31   ` [dm-devel] " Christoph Hellwig
2022-10-30 15:31 ` [PATCH 5/7] dm: track per-add_disk holder relations in DM Christoph Hellwig
2022-10-30 15:31   ` [dm-devel] " Christoph Hellwig
2022-11-09  2:08   ` Yu Kuai
2022-11-09  2:08     ` [dm-devel] " Yu Kuai
2022-11-09  8:26     ` Christoph Hellwig
2022-11-09  8:26       ` [dm-devel] " Christoph Hellwig
2022-11-10 18:09       ` Mike Snitzer
2022-11-10 18:09         ` [dm-devel] " Mike Snitzer
2022-11-10 19:48         ` Mike Snitzer
2022-11-10 19:48           ` Mike Snitzer
2022-11-12  6:23         ` Yu Kuai
2022-11-12  6:23           ` [dm-devel] " Yu Kuai
2022-10-30 15:31 ` [PATCH 6/7] block: remove delayed holder registration Christoph Hellwig
2022-10-30 15:31   ` [dm-devel] " Christoph Hellwig
2022-10-30 15:31 ` [PATCH 7/7] block: store the holder kobject in bd_holder_disk Christoph Hellwig
2022-10-30 15:31   ` [dm-devel] " Christoph Hellwig
2022-10-31  1:52   ` Yu Kuai
2022-10-31  1:52     ` [dm-devel] " Yu Kuai
2022-11-01 10:49     ` Christoph Hellwig
2022-11-01 10:49       ` [dm-devel] " Christoph Hellwig
2022-11-01 11:12       ` Yu Kuai
2022-11-01 11:12         ` [dm-devel] " Yu Kuai
2022-11-01 11:21         ` Christoph Hellwig
2022-11-01 11:21           ` [dm-devel] " Christoph Hellwig
2022-11-01 11:28           ` Yu Kuai
2022-11-01 11:28             ` [dm-devel] " Yu Kuai
2022-11-01 13:18             ` Christoph Hellwig
2022-11-01 13:18               ` [dm-devel] " Christoph Hellwig
2022-11-01 13:29               ` Yu Kuai
2022-11-01 13:29                 ` [dm-devel] " Yu Kuai
