All of lore.kernel.org
 help / color / mirror / Atom feed
* fix md disk_name lifetime problems v3
@ 2022-07-18  6:34 Christoph Hellwig
  2022-07-18  6:34 ` [PATCH 01/10] md: fix mddev->kobj lifetime Christoph Hellwig
                   ` (9 more replies)
  0 siblings, 10 replies; 38+ messages in thread
From: Christoph Hellwig @ 2022-07-18  6:34 UTC (permalink / raw)
  To: Song Liu; +Cc: Logan Gunthorpe, linux-raid, linux-block

Hi all,

this series tries to fix a problem repored by Logan where we see
duplicate sysfs file name in md.  It is due to the fact that the
md driver only checks for duplicates on currently live mddevs,
while the sysfs name can live on longer.  It is an old problem,
but the race window got longer due to waiting for the device freeze
earlier in del_gendisk.

Changes since v2:
 - delay mddex->kobj initialization

Changes since v1:
 - rebased to the md-next branch
 - fixed error handling in md_alloc for real
 - add a little cleanup patch

Diffstat:
 md.c |  304 +++++++++++++++++++++++++++++++++++--------------------------------
 md.h |    1 
 2 files changed, 162 insertions(+), 143 deletions(-)

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 01/10] md: fix mddev->kobj lifetime
  2022-07-18  6:34 fix md disk_name lifetime problems v3 Christoph Hellwig
@ 2022-07-18  6:34 ` Christoph Hellwig
  2022-07-18  6:55   ` Hannes Reinecke
  2022-07-18 19:48   ` Logan Gunthorpe
  2022-07-18  6:34 ` [PATCH 02/10] md: fix error handling in md_alloc Christoph Hellwig
                   ` (8 subsequent siblings)
  9 siblings, 2 replies; 38+ messages in thread
From: Christoph Hellwig @ 2022-07-18  6:34 UTC (permalink / raw)
  To: Song Liu; +Cc: Logan Gunthorpe, linux-raid, linux-block

Once a kobject is initialized, the containing object should not be
directly freed.  So delay initialization until it is added.  Also
remove the kobject_del call as the last put will remove the kobject as
well.  The explicitly delete isn't needed here, and dropping it will
simplify further fixes.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/md/md.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index b64de313838f2..a49ddc9454ff6 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -678,7 +678,6 @@ static void md_safemode_timeout(struct timer_list *t);
 
 void mddev_init(struct mddev *mddev)
 {
-	kobject_init(&mddev->kobj, &md_ktype);
 	mutex_init(&mddev->open_mutex);
 	mutex_init(&mddev->reconfig_mutex);
 	mutex_init(&mddev->bitmap_info.mutex);
@@ -5617,7 +5616,6 @@ static void mddev_delayed_delete(struct work_struct *ws)
 {
 	struct mddev *mddev = container_of(ws, struct mddev, del_work);
 
-	kobject_del(&mddev->kobj);
 	kobject_put(&mddev->kobj);
 }
 
@@ -5719,6 +5717,7 @@ int md_alloc(dev_t dev, char *name)
 	if (error)
 		goto out_cleanup_disk;
 
+	kobject_init(&mddev->kobj, &md_ktype);
 	error = kobject_add(&mddev->kobj, &disk_to_dev(disk)->kobj, "%s", "md");
 	if (error)
 		goto out_del_gendisk;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 02/10] md: fix error handling in md_alloc
  2022-07-18  6:34 fix md disk_name lifetime problems v3 Christoph Hellwig
  2022-07-18  6:34 ` [PATCH 01/10] md: fix mddev->kobj lifetime Christoph Hellwig
@ 2022-07-18  6:34 ` Christoph Hellwig
  2022-07-18  7:00   ` Hannes Reinecke
  2022-07-18 19:48   ` Logan Gunthorpe
  2022-07-18  6:34 ` [PATCH 03/10] md: implement ->free_disk Christoph Hellwig
                   ` (7 subsequent siblings)
  9 siblings, 2 replies; 38+ messages in thread
From: Christoph Hellwig @ 2022-07-18  6:34 UTC (permalink / raw)
  To: Song Liu; +Cc: Logan Gunthorpe, linux-raid, linux-block

Error handling in md_alloc is a mess.  Untangle it to just free the mddev
directly before add_disk is called and thus the gendisk is globally
visible.  After that clear the hold flag and let the mddev_put take care
of cleaning up the mddev through the usual mechanisms.

Fixes: 5e55e2f5fc95 ("[PATCH] md: convert compile time warnings into runtime warnings")
Fixes: 9be68dd7ac0e ("md: add error handling support for add_disk()")
Fixes: 7ad1069166c0 ("md: properly unwind when failing to add the kobject in md_alloc")
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/md/md.c | 45 ++++++++++++++++++++++++++++++++-------------
 1 file changed, 32 insertions(+), 13 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index a49ddc9454ff6..64c7c24d267bc 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -790,6 +790,15 @@ static struct mddev *mddev_alloc(dev_t unit)
 	return ERR_PTR(error);
 }
 
+static void mddev_free(struct mddev *mddev)
+{
+	spin_lock(&all_mddevs_lock);
+	list_del(&mddev->all_mddevs);
+	spin_unlock(&all_mddevs_lock);
+
+	kfree(mddev);
+}
+
 static const struct attribute_group md_redundancy_group;
 
 void mddev_unlock(struct mddev *mddev)
@@ -5662,8 +5671,8 @@ int md_alloc(dev_t dev, char *name)
 	mutex_lock(&disks_mutex);
 	mddev = mddev_alloc(dev);
 	if (IS_ERR(mddev)) {
-		mutex_unlock(&disks_mutex);
-		return PTR_ERR(mddev);
+		error = PTR_ERR(mddev);
+		goto out_unlock;
 	}
 
 	partitioned = (MAJOR(mddev->unit) != MD_MAJOR);
@@ -5681,7 +5690,7 @@ int md_alloc(dev_t dev, char *name)
 			    strcmp(mddev2->gendisk->disk_name, name) == 0) {
 				spin_unlock(&all_mddevs_lock);
 				error = -EEXIST;
-				goto out_unlock_disks_mutex;
+				goto out_free_mddev;
 			}
 		spin_unlock(&all_mddevs_lock);
 	}
@@ -5694,7 +5703,7 @@ int md_alloc(dev_t dev, char *name)
 	error = -ENOMEM;
 	disk = blk_alloc_disk(NUMA_NO_NODE);
 	if (!disk)
-		goto out_unlock_disks_mutex;
+		goto out_free_mddev;
 
 	disk->major = MAJOR(mddev->unit);
 	disk->first_minor = unit << shift;
@@ -5715,26 +5724,36 @@ int md_alloc(dev_t dev, char *name)
 	mddev->gendisk = disk;
 	error = add_disk(disk);
 	if (error)
-		goto out_cleanup_disk;
+		goto out_put_disk;
 
 	kobject_init(&mddev->kobj, &md_ktype);
 	error = kobject_add(&mddev->kobj, &disk_to_dev(disk)->kobj, "%s", "md");
-	if (error)
-		goto out_del_gendisk;
+	if (error) {
+		/*
+		 * The disk is already live at this point.  Clear the hold flag
+		 * and let mddev_put take care of the deletion, as it isn't any
+		 * different from a normal close on last release now.
+		 */
+		mddev->hold_active = 0;
+		goto done;
+	}
 
 	kobject_uevent(&mddev->kobj, KOBJ_ADD);
 	mddev->sysfs_state = sysfs_get_dirent_safe(mddev->kobj.sd, "array_state");
 	mddev->sysfs_level = sysfs_get_dirent_safe(mddev->kobj.sd, "level");
-	goto out_unlock_disks_mutex;
 
-out_del_gendisk:
-	del_gendisk(disk);
-out_cleanup_disk:
-	blk_cleanup_disk(disk);
-out_unlock_disks_mutex:
+done:
 	mutex_unlock(&disks_mutex);
 	mddev_put(mddev);
 	return error;
+
+out_put_disk:
+	put_disk(disk);
+out_free_mddev:
+	mddev_free(mddev);
+out_unlock:
+	mutex_unlock(&disks_mutex);
+	return error;
 }
 
 static void md_probe(dev_t dev)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 03/10] md: implement ->free_disk
  2022-07-18  6:34 fix md disk_name lifetime problems v3 Christoph Hellwig
  2022-07-18  6:34 ` [PATCH 01/10] md: fix mddev->kobj lifetime Christoph Hellwig
  2022-07-18  6:34 ` [PATCH 02/10] md: fix error handling in md_alloc Christoph Hellwig
@ 2022-07-18  6:34 ` Christoph Hellwig
  2022-07-18  7:01   ` Hannes Reinecke
  2022-07-18 20:01   ` Logan Gunthorpe
  2022-07-18  6:34 ` [PATCH 04/10] md: rename md_free to md_kobj_release Christoph Hellwig
                   ` (6 subsequent siblings)
  9 siblings, 2 replies; 38+ messages in thread
From: Christoph Hellwig @ 2022-07-18  6:34 UTC (permalink / raw)
  To: Song Liu; +Cc: Logan Gunthorpe, linux-raid, linux-block

Ensure that all private data is only freed once all accesses are done.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/md/md.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 64c7c24d267bc..1e658d5060842 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5602,11 +5602,6 @@ static void md_free(struct kobject *ko)
 		del_gendisk(mddev->gendisk);
 		blk_cleanup_disk(mddev->gendisk);
 	}
-	percpu_ref_exit(&mddev->writes_pending);
-
-	bioset_exit(&mddev->bio_set);
-	bioset_exit(&mddev->sync_set);
-	kfree(mddev);
 }
 
 static const struct sysfs_ops md_sysfs_ops = {
@@ -7876,6 +7871,17 @@ static unsigned int md_check_events(struct gendisk *disk, unsigned int clearing)
 	return ret;
 }
 
+static void md_free_disk(struct gendisk *disk)
+{
+	struct mddev *mddev = disk->private_data;
+
+	percpu_ref_exit(&mddev->writes_pending);
+	bioset_exit(&mddev->bio_set);
+	bioset_exit(&mddev->sync_set);
+
+	kfree(mddev);
+}
+
 const struct block_device_operations md_fops =
 {
 	.owner		= THIS_MODULE,
@@ -7889,6 +7895,7 @@ const struct block_device_operations md_fops =
 	.getgeo		= md_getgeo,
 	.check_events	= md_check_events,
 	.set_read_only	= md_set_read_only,
+	.free_disk	= md_free_disk,
 };
 
 static int md_thread(void *arg)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 04/10] md: rename md_free to md_kobj_release
  2022-07-18  6:34 fix md disk_name lifetime problems v3 Christoph Hellwig
                   ` (2 preceding siblings ...)
  2022-07-18  6:34 ` [PATCH 03/10] md: implement ->free_disk Christoph Hellwig
@ 2022-07-18  6:34 ` Christoph Hellwig
  2022-07-18  7:01   ` Hannes Reinecke
  2022-07-18 20:02   ` Logan Gunthorpe
  2022-07-18  6:34 ` [PATCH 05/10] md: factor out the rdev overlaps check from rdev_size_store Christoph Hellwig
                   ` (5 subsequent siblings)
  9 siblings, 2 replies; 38+ messages in thread
From: Christoph Hellwig @ 2022-07-18  6:34 UTC (permalink / raw)
  To: Song Liu; +Cc: Logan Gunthorpe, linux-raid, linux-block

The md_free name is rather misleading, so pick a better one.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/md/md.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 1e658d5060842..96b4e901ff6b5 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5589,7 +5589,7 @@ md_attr_store(struct kobject *kobj, struct attribute *attr,
 	return rv;
 }
 
-static void md_free(struct kobject *ko)
+static void md_kobj_release(struct kobject *ko)
 {
 	struct mddev *mddev = container_of(ko, struct mddev, kobj);
 
@@ -5609,7 +5609,7 @@ static const struct sysfs_ops md_sysfs_ops = {
 	.store	= md_attr_store,
 };
 static struct kobj_type md_ktype = {
-	.release	= md_free,
+	.release	= md_kobj_release,
 	.sysfs_ops	= &md_sysfs_ops,
 	.default_groups	= md_attr_groups,
 };
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 05/10] md: factor out the rdev overlaps check from rdev_size_store
  2022-07-18  6:34 fix md disk_name lifetime problems v3 Christoph Hellwig
                   ` (3 preceding siblings ...)
  2022-07-18  6:34 ` [PATCH 04/10] md: rename md_free to md_kobj_release Christoph Hellwig
@ 2022-07-18  6:34 ` Christoph Hellwig
  2022-07-18  7:02   ` Hannes Reinecke
  2022-07-18 20:02   ` Logan Gunthorpe
  2022-07-18  6:34 ` [PATCH 06/10] md: stop using for_each_mddev in md_do_sync Christoph Hellwig
                   ` (4 subsequent siblings)
  9 siblings, 2 replies; 38+ messages in thread
From: Christoph Hellwig @ 2022-07-18  6:34 UTC (permalink / raw)
  To: Song Liu; +Cc: Logan Gunthorpe, linux-raid, linux-block

This splits the code into nicely readable chunks and also avoids
the refcount inc/dec manipulations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/md/md.c | 84 +++++++++++++++++++++++--------------------------
 1 file changed, 39 insertions(+), 45 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 96b4e901ff6b5..c8a5f9340060a 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -3343,14 +3343,33 @@ rdev_size_show(struct md_rdev *rdev, char *page)
 	return sprintf(page, "%llu\n", (unsigned long long)rdev->sectors / 2);
 }
 
-static int overlaps(sector_t s1, sector_t l1, sector_t s2, sector_t l2)
+static int md_rdevs_overlap(struct md_rdev *a, struct md_rdev *b)
 {
 	/* check if two start/length pairs overlap */
-	if (s1+l1 <= s2)
-		return 0;
-	if (s2+l2 <= s1)
-		return 0;
-	return 1;
+	if (a->data_offset + a->sectors <= b->data_offset)
+		return false;
+	if (b->data_offset + b->sectors <= a->data_offset)
+		return false;
+	return true;
+}
+
+static bool md_rdev_overlaps(struct md_rdev *rdev)
+{
+	struct mddev *mddev;
+	struct md_rdev *rdev2;
+
+	spin_lock(&all_mddevs_lock);
+	list_for_each_entry(mddev, &all_mddevs, all_mddevs) {
+		rdev_for_each(rdev2, mddev) {
+			if (rdev != rdev2 && rdev->bdev == rdev2->bdev &&
+			    md_rdevs_overlap(rdev, rdev2)) {
+				spin_unlock(&all_mddevs_lock);
+				return true;
+			}
+		}
+	}
+	spin_unlock(&all_mddevs_lock);
+	return false;
 }
 
 static int strict_blocks_to_sectors(const char *buf, sector_t *sectors)
@@ -3402,46 +3421,21 @@ rdev_size_store(struct md_rdev *rdev, const char *buf, size_t len)
 		return -EINVAL; /* component must fit device */
 
 	rdev->sectors = sectors;
-	if (sectors > oldsectors && my_mddev->external) {
-		/* Need to check that all other rdevs with the same
-		 * ->bdev do not overlap.  'rcu' is sufficient to walk
-		 * the rdev lists safely.
-		 * This check does not provide a hard guarantee, it
-		 * just helps avoid dangerous mistakes.
-		 */
-		struct mddev *mddev;
-		int overlap = 0;
-		struct list_head *tmp;
 
-		rcu_read_lock();
-		for_each_mddev(mddev, tmp) {
-			struct md_rdev *rdev2;
-
-			rdev_for_each(rdev2, mddev)
-				if (rdev->bdev == rdev2->bdev &&
-				    rdev != rdev2 &&
-				    overlaps(rdev->data_offset, rdev->sectors,
-					     rdev2->data_offset,
-					     rdev2->sectors)) {
-					overlap = 1;
-					break;
-				}
-			if (overlap) {
-				mddev_put(mddev);
-				break;
-			}
-		}
-		rcu_read_unlock();
-		if (overlap) {
-			/* Someone else could have slipped in a size
-			 * change here, but doing so is just silly.
-			 * We put oldsectors back because we *know* it is
-			 * safe, and trust userspace not to race with
-			 * itself
-			 */
-			rdev->sectors = oldsectors;
-			return -EBUSY;
-		}
+	/*
+	 * Check that all other rdevs with the same bdev do not overlap.  This
+	 * check does not provide a hard guarantee, it just helps avoid
+	 * dangerous mistakes.
+	 */
+	if (sectors > oldsectors && my_mddev->external &&
+	    md_rdev_overlaps(rdev)) {
+		/*
+		 * Someone else could have slipped in a size change here, but
+		 * doing so is just silly.  We put oldsectors back because we
+		 * know it is safe, and trust userspace not to race with itself.
+		 */
+		rdev->sectors = oldsectors;
+		return -EBUSY;
 	}
 	return len;
 }
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 06/10] md: stop using for_each_mddev in md_do_sync
  2022-07-18  6:34 fix md disk_name lifetime problems v3 Christoph Hellwig
                   ` (4 preceding siblings ...)
  2022-07-18  6:34 ` [PATCH 05/10] md: factor out the rdev overlaps check from rdev_size_store Christoph Hellwig
@ 2022-07-18  6:34 ` Christoph Hellwig
  2022-07-18  7:03   ` Hannes Reinecke
  2022-07-18 20:02   ` Logan Gunthorpe
  2022-07-18  6:34 ` [PATCH 07/10] md: stop using for_each_mddev in md_notify_reboot Christoph Hellwig
                   ` (3 subsequent siblings)
  9 siblings, 2 replies; 38+ messages in thread
From: Christoph Hellwig @ 2022-07-18  6:34 UTC (permalink / raw)
  To: Song Liu; +Cc: Logan Gunthorpe, linux-raid, linux-block

Just do a plain list_for_each that only grabs a mddev reference in
the case where the thread sleeps and restarts the list iteration.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/md/md.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index c8a5f9340060a..687f320c7ec4a 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8725,7 +8725,6 @@ void md_do_sync(struct md_thread *thread)
 	unsigned long update_time;
 	sector_t mark_cnt[SYNC_MARKS];
 	int last_mark,m;
-	struct list_head *tmp;
 	sector_t last_check;
 	int skipped = 0;
 	struct md_rdev *rdev;
@@ -8789,7 +8788,8 @@ void md_do_sync(struct md_thread *thread)
 	try_again:
 		if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
 			goto skip;
-		for_each_mddev(mddev2, tmp) {
+		spin_lock(&all_mddevs_lock);
+		list_for_each_entry(mddev2, &all_mddevs, all_mddevs) {
 			if (mddev2 == mddev)
 				continue;
 			if (!mddev->parallel_resync
@@ -8821,7 +8821,8 @@ void md_do_sync(struct md_thread *thread)
 							desc, mdname(mddev),
 							mdname(mddev2));
 					}
-					mddev_put(mddev2);
+					spin_unlock(&all_mddevs_lock);
+
 					if (signal_pending(current))
 						flush_signals(current);
 					schedule();
@@ -8831,6 +8832,7 @@ void md_do_sync(struct md_thread *thread)
 				finish_wait(&resync_wait, &wq);
 			}
 		}
+		spin_unlock(&all_mddevs_lock);
 	} while (mddev->curr_resync < MD_RESYNC_DELAYED);
 
 	j = 0;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 07/10] md: stop using for_each_mddev in md_notify_reboot
  2022-07-18  6:34 fix md disk_name lifetime problems v3 Christoph Hellwig
                   ` (5 preceding siblings ...)
  2022-07-18  6:34 ` [PATCH 06/10] md: stop using for_each_mddev in md_do_sync Christoph Hellwig
@ 2022-07-18  6:34 ` Christoph Hellwig
  2022-07-18  7:05   ` Hannes Reinecke
  2022-07-18 20:03   ` Logan Gunthorpe
  2022-07-18  6:34 ` [PATCH 08/10] md: stop using for_each_mddev in md_exit Christoph Hellwig
                   ` (2 subsequent siblings)
  9 siblings, 2 replies; 38+ messages in thread
From: Christoph Hellwig @ 2022-07-18  6:34 UTC (permalink / raw)
  To: Song Liu; +Cc: Logan Gunthorpe, linux-raid, linux-block

Just do a simple list_for_each_entry_safe on all_mddevs, and only grab a
reference when we drop the lock.

Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/md/md.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 687f320c7ec4a..44e4071b43148 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9587,11 +9587,13 @@ EXPORT_SYMBOL_GPL(rdev_clear_badblocks);
 static int md_notify_reboot(struct notifier_block *this,
 			    unsigned long code, void *x)
 {
-	struct list_head *tmp;
-	struct mddev *mddev;
+	struct mddev *mddev, *n;
 	int need_delay = 0;
 
-	for_each_mddev(mddev, tmp) {
+	spin_lock(&all_mddevs_lock);
+	list_for_each_entry_safe(mddev, n, &all_mddevs, all_mddevs) {
+		mddev_get(mddev);
+		spin_unlock(&all_mddevs_lock);
 		if (mddev_trylock(mddev)) {
 			if (mddev->pers)
 				__md_stop_writes(mddev);
@@ -9600,7 +9602,11 @@ static int md_notify_reboot(struct notifier_block *this,
 			mddev_unlock(mddev);
 		}
 		need_delay = 1;
+		mddev_put(mddev);
+		spin_lock(&all_mddevs_lock);
 	}
+	spin_unlock(&all_mddevs_lock);
+
 	/*
 	 * certain more exotic SCSI devices are known to be
 	 * volatile wrt too early system reboots. While the
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 08/10] md: stop using for_each_mddev in md_exit
  2022-07-18  6:34 fix md disk_name lifetime problems v3 Christoph Hellwig
                   ` (6 preceding siblings ...)
  2022-07-18  6:34 ` [PATCH 07/10] md: stop using for_each_mddev in md_notify_reboot Christoph Hellwig
@ 2022-07-18  6:34 ` Christoph Hellwig
  2022-07-18  7:10   ` Hannes Reinecke
  2022-07-18 20:03   ` Logan Gunthorpe
  2022-07-18  6:34 ` [PATCH 09/10] md: only delete entries from all_mddevs when the disk is freed Christoph Hellwig
  2022-07-18  6:34 ` [PATCH 10/10] md: simplify md_open Christoph Hellwig
  9 siblings, 2 replies; 38+ messages in thread
From: Christoph Hellwig @ 2022-07-18  6:34 UTC (permalink / raw)
  To: Song Liu; +Cc: Logan Gunthorpe, linux-raid, linux-block

Just do a simple list_for_each_entry_safe on all_mddevs, and only grab a
reference when we drop the lock and delete the now unused for_each_mddev
macro.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/md/md.c | 39 +++++++++++----------------------------
 1 file changed, 11 insertions(+), 28 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 44e4071b43148..805f2b4ed9c0d 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -368,28 +368,6 @@ EXPORT_SYMBOL_GPL(md_new_event);
 static LIST_HEAD(all_mddevs);
 static DEFINE_SPINLOCK(all_mddevs_lock);
 
-/*
- * iterates through all used mddevs in the system.
- * We take care to grab the all_mddevs_lock whenever navigating
- * the list, and to always hold a refcount when unlocked.
- * Any code which breaks out of this loop while own
- * a reference to the current mddev and must mddev_put it.
- */
-#define for_each_mddev(_mddev,_tmp)					\
-									\
-	for (({ spin_lock(&all_mddevs_lock);				\
-		_tmp = all_mddevs.next;					\
-		_mddev = NULL;});					\
-	     ({ if (_tmp != &all_mddevs)				\
-			mddev_get(list_entry(_tmp, struct mddev, all_mddevs));\
-		spin_unlock(&all_mddevs_lock);				\
-		if (_mddev) mddev_put(_mddev);				\
-		_mddev = list_entry(_tmp, struct mddev, all_mddevs);	\
-		_tmp != &all_mddevs;});					\
-	     ({ spin_lock(&all_mddevs_lock);				\
-		_tmp = _tmp->next;})					\
-		)
-
 /* Rather than calling directly into the personality make_request function,
  * IO requests come here first so that we can check if the device is
  * being suspended pending a reconfiguration.
@@ -9925,8 +9903,7 @@ void md_autostart_arrays(int part)
 
 static __exit void md_exit(void)
 {
-	struct mddev *mddev;
-	struct list_head *tmp;
+	struct mddev *mddev, *n;
 	int delay = 1;
 
 	unregister_blkdev(MD_MAJOR,"md");
@@ -9946,17 +9923,23 @@ static __exit void md_exit(void)
 	}
 	remove_proc_entry("mdstat", NULL);
 
-	for_each_mddev(mddev, tmp) {
+	spin_lock(&all_mddevs_lock);
+	list_for_each_entry_safe(mddev, n, &all_mddevs, all_mddevs) {
+		mddev_get(mddev);
+		spin_unlock(&all_mddevs_lock);
 		export_array(mddev);
 		mddev->ctime = 0;
 		mddev->hold_active = 0;
 		/*
-		 * for_each_mddev() will call mddev_put() at the end of each
-		 * iteration.  As the mddev is now fully clear, this will
-		 * schedule the mddev for destruction by a workqueue, and the
+		 * As the mddev is now fully clear, mddev_put will schedule
+		 * the mddev for destruction by a workqueue, and the
 		 * destroy_workqueue() below will wait for that to complete.
 		 */
+		mddev_put(mddev);
+		spin_lock(&all_mddevs_lock);
 	}
+	spin_unlock(&all_mddevs_lock);
+
 	destroy_workqueue(md_rdev_misc_wq);
 	destroy_workqueue(md_misc_wq);
 	destroy_workqueue(md_wq);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 09/10] md: only delete entries from all_mddevs when the disk is freed
  2022-07-18  6:34 fix md disk_name lifetime problems v3 Christoph Hellwig
                   ` (7 preceding siblings ...)
  2022-07-18  6:34 ` [PATCH 08/10] md: stop using for_each_mddev in md_exit Christoph Hellwig
@ 2022-07-18  6:34 ` Christoph Hellwig
  2022-07-18  7:17   ` Hannes Reinecke
  2022-07-18  6:34 ` [PATCH 10/10] md: simplify md_open Christoph Hellwig
  9 siblings, 1 reply; 38+ messages in thread
From: Christoph Hellwig @ 2022-07-18  6:34 UTC (permalink / raw)
  To: Song Liu; +Cc: Logan Gunthorpe, linux-raid, linux-block

This ensures device names don't get prematurely reused.  Instead add a
deleted flag to skip already deleted devices in mddev_get and other
places that only want to see live mddevs.

Reported-by; Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/md/md.c | 56 +++++++++++++++++++++++++++++++++----------------
 drivers/md/md.h |  1 +
 2 files changed, 39 insertions(+), 18 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 805f2b4ed9c0d..08cf21ad4c2d7 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -625,6 +625,10 @@ EXPORT_SYMBOL(md_flush_request);
 
 static inline struct mddev *mddev_get(struct mddev *mddev)
 {
+	lockdep_assert_held(&all_mddevs_lock);
+
+	if (mddev->deleted)
+		return NULL;
 	atomic_inc(&mddev->active);
 	return mddev;
 }
@@ -639,7 +643,7 @@ static void mddev_put(struct mddev *mddev)
 	    mddev->ctime == 0 && !mddev->hold_active) {
 		/* Array is not configured at all, and not held active,
 		 * so destroy it */
-		list_del_init(&mddev->all_mddevs);
+		mddev->deleted = true;
 
 		/*
 		 * Call queue_work inside the spinlock so that
@@ -719,8 +723,8 @@ static struct mddev *mddev_find(dev_t unit)
 
 	spin_lock(&all_mddevs_lock);
 	mddev = mddev_find_locked(unit);
-	if (mddev)
-		mddev_get(mddev);
+	if (mddev && !mddev_get(mddev))
+		mddev = NULL;
 	spin_unlock(&all_mddevs_lock);
 
 	return mddev;
@@ -3338,6 +3342,8 @@ static bool md_rdev_overlaps(struct md_rdev *rdev)
 
 	spin_lock(&all_mddevs_lock);
 	list_for_each_entry(mddev, &all_mddevs, all_mddevs) {
+		if (mddev->deleted)
+			continue;
 		rdev_for_each(rdev2, mddev) {
 			if (rdev != rdev2 && rdev->bdev == rdev2->bdev &&
 			    md_rdevs_overlap(rdev, rdev2)) {
@@ -5525,11 +5531,10 @@ md_attr_show(struct kobject *kobj, struct attribute *attr, char *page)
 	if (!entry->show)
 		return -EIO;
 	spin_lock(&all_mddevs_lock);
-	if (list_empty(&mddev->all_mddevs)) {
+	if (!mddev_get(mddev)) {
 		spin_unlock(&all_mddevs_lock);
 		return -EBUSY;
 	}
-	mddev_get(mddev);
 	spin_unlock(&all_mddevs_lock);
 
 	rv = entry->show(mddev, page);
@@ -5550,11 +5555,10 @@ md_attr_store(struct kobject *kobj, struct attribute *attr,
 	if (!capable(CAP_SYS_ADMIN))
 		return -EACCES;
 	spin_lock(&all_mddevs_lock);
-	if (list_empty(&mddev->all_mddevs)) {
+	if (!mddev_get(mddev)) {
 		spin_unlock(&all_mddevs_lock);
 		return -EBUSY;
 	}
-	mddev_get(mddev);
 	spin_unlock(&all_mddevs_lock);
 	rv = entry->store(mddev, page, length);
 	mddev_put(mddev);
@@ -7851,7 +7855,7 @@ static void md_free_disk(struct gendisk *disk)
 	bioset_exit(&mddev->bio_set);
 	bioset_exit(&mddev->sync_set);
 
-	kfree(mddev);
+	mddev_free(mddev);
 }
 
 const struct block_device_operations md_fops =
@@ -8173,6 +8177,8 @@ static void *md_seq_start(struct seq_file *seq, loff_t *pos)
 		if (!l--) {
 			mddev = list_entry(tmp, struct mddev, all_mddevs);
 			mddev_get(mddev);
+			if (!mddev_get(mddev))
+				continue;
 			spin_unlock(&all_mddevs_lock);
 			return mddev;
 		}
@@ -8186,25 +8192,35 @@ static void *md_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 {
 	struct list_head *tmp;
 	struct mddev *next_mddev, *mddev = v;
+	struct mddev *to_put = NULL;
 
 	++*pos;
 	if (v == (void*)2)
 		return NULL;
 
 	spin_lock(&all_mddevs_lock);
-	if (v == (void*)1)
+	if (v == (void*)1) {
 		tmp = all_mddevs.next;
-	else
+	} else {
+		to_put = mddev;
 		tmp = mddev->all_mddevs.next;
-	if (tmp != &all_mddevs)
-		next_mddev = mddev_get(list_entry(tmp,struct mddev,all_mddevs));
-	else {
-		next_mddev = (void*)2;
-		*pos = 0x10000;
 	}
+
+	for (;;) {
+		if (tmp == &all_mddevs) {
+			next_mddev = (void*)2;
+			*pos = 0x10000;
+			break;
+		}
+		next_mddev = list_entry(tmp, struct mddev, all_mddevs);
+		if (mddev_get(next_mddev))
+			break;
+		mddev = next_mddev;
+		tmp = mddev->all_mddevs.next;
+	};
 	spin_unlock(&all_mddevs_lock);
 
-	if (v != (void*)1)
+	if (to_put)
 		mddev_put(mddev);
 	return next_mddev;
 
@@ -8768,6 +8784,8 @@ void md_do_sync(struct md_thread *thread)
 			goto skip;
 		spin_lock(&all_mddevs_lock);
 		list_for_each_entry(mddev2, &all_mddevs, all_mddevs) {
+			if (mddev2->deleted)
+				continue;
 			if (mddev2 == mddev)
 				continue;
 			if (!mddev->parallel_resync
@@ -9570,7 +9588,8 @@ static int md_notify_reboot(struct notifier_block *this,
 
 	spin_lock(&all_mddevs_lock);
 	list_for_each_entry_safe(mddev, n, &all_mddevs, all_mddevs) {
-		mddev_get(mddev);
+		if (!mddev_get(mddev))
+			continue;
 		spin_unlock(&all_mddevs_lock);
 		if (mddev_trylock(mddev)) {
 			if (mddev->pers)
@@ -9925,7 +9944,8 @@ static __exit void md_exit(void)
 
 	spin_lock(&all_mddevs_lock);
 	list_for_each_entry_safe(mddev, n, &all_mddevs, all_mddevs) {
-		mddev_get(mddev);
+		if (!mddev_get(mddev))
+			continue;
 		spin_unlock(&all_mddevs_lock);
 		export_array(mddev);
 		mddev->ctime = 0;
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 1a85dbe78a71c..bc870e1f1e8c2 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -503,6 +503,7 @@ struct mddev {
 
 	atomic_t			max_corr_read_errors; /* max read retries */
 	struct list_head		all_mddevs;
+	bool				deleted;
 
 	const struct attribute_group	*to_remove;
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 10/10] md: simplify md_open
  2022-07-18  6:34 fix md disk_name lifetime problems v3 Christoph Hellwig
                   ` (8 preceding siblings ...)
  2022-07-18  6:34 ` [PATCH 09/10] md: only delete entries from all_mddevs when the disk is freed Christoph Hellwig
@ 2022-07-18  6:34 ` Christoph Hellwig
  2022-07-18  7:19   ` Hannes Reinecke
  9 siblings, 1 reply; 38+ messages in thread
From: Christoph Hellwig @ 2022-07-18  6:34 UTC (permalink / raw)
  To: Song Liu; +Cc: Logan Gunthorpe, linux-raid, linux-block

Now that devices are on the all_mddevs list until the gendisk is freed,
there can't be any duplicates.  Remove the global list lookup and just
grab a reference.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/md/md.c | 42 +++++++++++++++---------------------------
 1 file changed, 15 insertions(+), 27 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 08cf21ad4c2d7..9b1e61b4bac8b 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -7785,45 +7785,33 @@ static int md_set_read_only(struct block_device *bdev, bool ro)
 
 static int md_open(struct block_device *bdev, fmode_t mode)
 {
-	/*
-	 * Succeed if we can lock the mddev, which confirms that
-	 * it isn't being stopped right now.
-	 */
-	struct mddev *mddev = mddev_find(bdev->bd_dev);
+	struct mddev *mddev;
 	int err;
 
+	spin_lock(&all_mddevs_lock);
+	mddev = mddev_get(bdev->bd_disk->private_data);
+	spin_unlock(&all_mddevs_lock);
 	if (!mddev)
 		return -ENODEV;
 
-	if (mddev->gendisk != bdev->bd_disk) {
-		/* we are racing with mddev_put which is discarding this
-		 * bd_disk.
-		 */
-		mddev_put(mddev);
-		/* Wait until bdev->bd_disk is definitely gone */
-		if (work_pending(&mddev->del_work))
-			flush_workqueue(md_misc_wq);
-		return -EBUSY;
-	}
-	BUG_ON(mddev != bdev->bd_disk->private_data);
-
-	if ((err = mutex_lock_interruptible(&mddev->open_mutex)))
+	err = mutex_lock_interruptible(&mddev->open_mutex);
+	if (err)
 		goto out;
 
-	if (test_bit(MD_CLOSING, &mddev->flags)) {
-		mutex_unlock(&mddev->open_mutex);
-		err = -ENODEV;
-		goto out;
-	}
+	err = -ENODEV;
+	if (test_bit(MD_CLOSING, &mddev->flags))
+		goto out_unlock;
 
-	err = 0;
 	atomic_inc(&mddev->openers);
 	mutex_unlock(&mddev->open_mutex);
 
 	bdev_check_media_change(bdev);
- out:
-	if (err)
-		mddev_put(mddev);
+	return 0;
+
+out_unlock:
+	mutex_unlock(&mddev->open_mutex);
+out:
+	mddev_put(mddev);
 	return err;
 }
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH 01/10] md: fix mddev->kobj lifetime
  2022-07-18  6:34 ` [PATCH 01/10] md: fix mddev->kobj lifetime Christoph Hellwig
@ 2022-07-18  6:55   ` Hannes Reinecke
  2022-07-18 19:48   ` Logan Gunthorpe
  1 sibling, 0 replies; 38+ messages in thread
From: Hannes Reinecke @ 2022-07-18  6:55 UTC (permalink / raw)
  To: Christoph Hellwig, Song Liu; +Cc: Logan Gunthorpe, linux-raid, linux-block

On 7/18/22 08:34, Christoph Hellwig wrote:
> Once a kobject is initialized, the containing object should not be
> directly freed.  So delay initialization until it is added.  Also
> remove the kobject_del call as the last put will remove the kobject as
> well.  The explicitly delete isn't needed here, and dropping it will
> simplify further fixes.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/md/md.c | 3 +--
>   1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index b64de313838f2..a49ddc9454ff6 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -678,7 +678,6 @@ static void md_safemode_timeout(struct timer_list *t);
>   
>   void mddev_init(struct mddev *mddev)
>   {
> -	kobject_init(&mddev->kobj, &md_ktype);
>   	mutex_init(&mddev->open_mutex);
>   	mutex_init(&mddev->reconfig_mutex);
>   	mutex_init(&mddev->bitmap_info.mutex);
> @@ -5617,7 +5616,6 @@ static void mddev_delayed_delete(struct work_struct *ws)
>   {
>   	struct mddev *mddev = container_of(ws, struct mddev, del_work);
>   
> -	kobject_del(&mddev->kobj);
>   	kobject_put(&mddev->kobj);
>   }
>   
> @@ -5719,6 +5717,7 @@ int md_alloc(dev_t dev, char *name)
>   	if (error)
>   		goto out_cleanup_disk;
>   
> +	kobject_init(&mddev->kobj, &md_ktype);
>   	error = kobject_add(&mddev->kobj, &disk_to_dev(disk)->kobj, "%s", "md");
>   	if (error)
>   		goto out_del_gendisk;
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 02/10] md: fix error handling in md_alloc
  2022-07-18  6:34 ` [PATCH 02/10] md: fix error handling in md_alloc Christoph Hellwig
@ 2022-07-18  7:00   ` Hannes Reinecke
  2022-07-18 19:48   ` Logan Gunthorpe
  1 sibling, 0 replies; 38+ messages in thread
From: Hannes Reinecke @ 2022-07-18  7:00 UTC (permalink / raw)
  To: Christoph Hellwig, Song Liu; +Cc: Logan Gunthorpe, linux-raid, linux-block

On 7/18/22 08:34, Christoph Hellwig wrote:
> Error handling in md_alloc is a mess.  Untangle it to just free the mddev
> directly before add_disk is called and thus the gendisk is globally
> visible.  After that clear the hold flag and let the mddev_put take care
> of cleaning up the mddev through the usual mechanisms.
> 
> Fixes: 5e55e2f5fc95 ("[PATCH] md: convert compile time warnings into runtime warnings")
> Fixes: 9be68dd7ac0e ("md: add error handling support for add_disk()")
> Fixes: 7ad1069166c0 ("md: properly unwind when failing to add the kobject in md_alloc")
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/md/md.c | 45 ++++++++++++++++++++++++++++++++-------------
>   1 file changed, 32 insertions(+), 13 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 03/10] md: implement ->free_disk
  2022-07-18  6:34 ` [PATCH 03/10] md: implement ->free_disk Christoph Hellwig
@ 2022-07-18  7:01   ` Hannes Reinecke
  2022-07-18 20:01   ` Logan Gunthorpe
  1 sibling, 0 replies; 38+ messages in thread
From: Hannes Reinecke @ 2022-07-18  7:01 UTC (permalink / raw)
  To: Christoph Hellwig, Song Liu; +Cc: Logan Gunthorpe, linux-raid, linux-block

On 7/18/22 08:34, Christoph Hellwig wrote:
> Ensure that all private data is only freed once all accesses are done.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/md/md.c | 17 ++++++++++++-----
>   1 file changed, 12 insertions(+), 5 deletions(-)
> 
Reviewd-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 04/10] md: rename md_free to md_kobj_release
  2022-07-18  6:34 ` [PATCH 04/10] md: rename md_free to md_kobj_release Christoph Hellwig
@ 2022-07-18  7:01   ` Hannes Reinecke
  2022-07-18 20:02   ` Logan Gunthorpe
  1 sibling, 0 replies; 38+ messages in thread
From: Hannes Reinecke @ 2022-07-18  7:01 UTC (permalink / raw)
  To: Christoph Hellwig, Song Liu; +Cc: Logan Gunthorpe, linux-raid, linux-block

On 7/18/22 08:34, Christoph Hellwig wrote:
> The md_free name is rather misleading, so pick a better one.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/md/md.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 1e658d5060842..96b4e901ff6b5 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -5589,7 +5589,7 @@ md_attr_store(struct kobject *kobj, struct attribute *attr,
>   	return rv;
>   }
>   
> -static void md_free(struct kobject *ko)
> +static void md_kobj_release(struct kobject *ko)
>   {
>   	struct mddev *mddev = container_of(ko, struct mddev, kobj);
>   
> @@ -5609,7 +5609,7 @@ static const struct sysfs_ops md_sysfs_ops = {
>   	.store	= md_attr_store,
>   };
>   static struct kobj_type md_ktype = {
> -	.release	= md_free,
> +	.release	= md_kobj_release,
>   	.sysfs_ops	= &md_sysfs_ops,
>   	.default_groups	= md_attr_groups,
>   };
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 05/10] md: factor out the rdev overlaps check from rdev_size_store
  2022-07-18  6:34 ` [PATCH 05/10] md: factor out the rdev overlaps check from rdev_size_store Christoph Hellwig
@ 2022-07-18  7:02   ` Hannes Reinecke
  2022-07-18 20:02   ` Logan Gunthorpe
  1 sibling, 0 replies; 38+ messages in thread
From: Hannes Reinecke @ 2022-07-18  7:02 UTC (permalink / raw)
  To: Christoph Hellwig, Song Liu; +Cc: Logan Gunthorpe, linux-raid, linux-block

On 7/18/22 08:34, Christoph Hellwig wrote:
> This splits the code into nicely readable chunks and also avoids
> the refcount inc/dec manipulations.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/md/md.c | 84 +++++++++++++++++++++++--------------------------
>   1 file changed, 39 insertions(+), 45 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 06/10] md: stop using for_each_mddev in md_do_sync
  2022-07-18  6:34 ` [PATCH 06/10] md: stop using for_each_mddev in md_do_sync Christoph Hellwig
@ 2022-07-18  7:03   ` Hannes Reinecke
  2022-07-18 20:02   ` Logan Gunthorpe
  1 sibling, 0 replies; 38+ messages in thread
From: Hannes Reinecke @ 2022-07-18  7:03 UTC (permalink / raw)
  To: Christoph Hellwig, Song Liu; +Cc: Logan Gunthorpe, linux-raid, linux-block

On 7/18/22 08:34, Christoph Hellwig wrote:
> Just do a plain list_for_each that only grabs a mddev reference in
> the case where the thread sleeps and restarts the list iteration.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/md/md.c | 8 +++++---
>   1 file changed, 5 insertions(+), 3 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 07/10] md: stop using for_each_mddev in md_notify_reboot
  2022-07-18  6:34 ` [PATCH 07/10] md: stop using for_each_mddev in md_notify_reboot Christoph Hellwig
@ 2022-07-18  7:05   ` Hannes Reinecke
  2022-07-18 20:03   ` Logan Gunthorpe
  1 sibling, 0 replies; 38+ messages in thread
From: Hannes Reinecke @ 2022-07-18  7:05 UTC (permalink / raw)
  To: Christoph Hellwig, Song Liu; +Cc: Logan Gunthorpe, linux-raid, linux-block

On 7/18/22 08:34, Christoph Hellwig wrote:
> Just do a simple list_for_each_entry_safe on all_mddevs, and only grab a
> reference when we drop the lock.
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/md/md.c | 12 +++++++++---
>   1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 687f320c7ec4a..44e4071b43148 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -9587,11 +9587,13 @@ EXPORT_SYMBOL_GPL(rdev_clear_badblocks);
>   static int md_notify_reboot(struct notifier_block *this,
>   			    unsigned long code, void *x)
>   {
> -	struct list_head *tmp;
> -	struct mddev *mddev;
> +	struct mddev *mddev, *n;
>   	int need_delay = 0;
>   
> -	for_each_mddev(mddev, tmp) {
> +	spin_lock(&all_mddevs_lock);
> +	list_for_each_entry_safe(mddev, n, &all_mddevs, all_mddevs) {
> +		mddev_get(mddev);
> +		spin_unlock(&all_mddevs_lock);
>   		if (mddev_trylock(mddev)) {
>   			if (mddev->pers)
>   				__md_stop_writes(mddev);
> @@ -9600,7 +9602,11 @@ static int md_notify_reboot(struct notifier_block *this,
>   			mddev_unlock(mddev);
>   		}
>   		need_delay = 1;
> +		mddev_put(mddev);
> +		spin_lock(&all_mddevs_lock);
>   	}
> +	spin_unlock(&all_mddevs_lock);
> +
>   	/*
>   	 * certain more exotic SCSI devices are known to be
>   	 * volatile wrt too early system reboots. While the

I wish 'mddev_get()' could fail if 'mddev->active' is zero when we call 
it ... but maybe for another time.

Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 08/10] md: stop using for_each_mddev in md_exit
  2022-07-18  6:34 ` [PATCH 08/10] md: stop using for_each_mddev in md_exit Christoph Hellwig
@ 2022-07-18  7:10   ` Hannes Reinecke
  2022-07-19  7:03     ` Christoph Hellwig
  2022-07-18 20:03   ` Logan Gunthorpe
  1 sibling, 1 reply; 38+ messages in thread
From: Hannes Reinecke @ 2022-07-18  7:10 UTC (permalink / raw)
  To: Christoph Hellwig, Song Liu; +Cc: Logan Gunthorpe, linux-raid, linux-block

On 7/18/22 08:34, Christoph Hellwig wrote:
> Just do a simple list_for_each_entry_safe on all_mddevs, and only grab a
> reference when we drop the lock and delete the now unused for_each_mddev
> macro.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/md/md.c | 39 +++++++++++----------------------------
>   1 file changed, 11 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 44e4071b43148..805f2b4ed9c0d 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -368,28 +368,6 @@ EXPORT_SYMBOL_GPL(md_new_event);
>   static LIST_HEAD(all_mddevs);
>   static DEFINE_SPINLOCK(all_mddevs_lock);
>   
> -/*
> - * iterates through all used mddevs in the system.
> - * We take care to grab the all_mddevs_lock whenever navigating
> - * the list, and to always hold a refcount when unlocked.
> - * Any code which breaks out of this loop while own
> - * a reference to the current mddev and must mddev_put it.
> - */
> -#define for_each_mddev(_mddev,_tmp)					\
> -									\
> -	for (({ spin_lock(&all_mddevs_lock);				\
> -		_tmp = all_mddevs.next;					\
> -		_mddev = NULL;});					\
> -	     ({ if (_tmp != &all_mddevs)				\
> -			mddev_get(list_entry(_tmp, struct mddev, all_mddevs));\
> -		spin_unlock(&all_mddevs_lock);				\
> -		if (_mddev) mddev_put(_mddev);				\
> -		_mddev = list_entry(_tmp, struct mddev, all_mddevs);	\
> -		_tmp != &all_mddevs;});					\
> -	     ({ spin_lock(&all_mddevs_lock);				\
> -		_tmp = _tmp->next;})					\
> -		)
> -
>   /* Rather than calling directly into the personality make_request function,
>    * IO requests come here first so that we can check if the device is
>    * being suspended pending a reconfiguration.
> @@ -9925,8 +9903,7 @@ void md_autostart_arrays(int part)
>   
>   static __exit void md_exit(void)
>   {
> -	struct mddev *mddev;
> -	struct list_head *tmp;
> +	struct mddev *mddev, *n;
>   	int delay = 1;
>   
>   	unregister_blkdev(MD_MAJOR,"md");
> @@ -9946,17 +9923,23 @@ static __exit void md_exit(void)
>   	}
>   	remove_proc_entry("mdstat", NULL);
>   
> -	for_each_mddev(mddev, tmp) {
> +	spin_lock(&all_mddevs_lock);
> +	list_for_each_entry_safe(mddev, n, &all_mddevs, all_mddevs) {
> +		mddev_get(mddev);
> +		spin_unlock(&all_mddevs_lock);
>   		export_array(mddev);
>   		mddev->ctime = 0;
>   		mddev->hold_active = 0;
>   		/*
> -		 * for_each_mddev() will call mddev_put() at the end of each
> -		 * iteration.  As the mddev is now fully clear, this will
> -		 * schedule the mddev for destruction by a workqueue, and the
> +		 * As the mddev is now fully clear, mddev_put will schedule
> +		 * the mddev for destruction by a workqueue, and the
>   		 * destroy_workqueue() below will wait for that to complete.
>   		 */
> +		mddev_put(mddev);
> +		spin_lock(&all_mddevs_lock);
>   	}
> +	spin_unlock(&all_mddevs_lock);
> +
>   	destroy_workqueue(md_rdev_misc_wq);
>   	destroy_workqueue(md_misc_wq);
>   	destroy_workqueue(md_wq);

Having thought about it some more ... wouldn't it make sense to modify 
mddev_get() to

if (atomic_inc_not_zero(&mddev->active))
     return NULL;

to ensure we're not missing any use-after-free issues, which previously 
would have been caught by the 'for_each_mddev()' macro?

Cheers,

Hannes

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 09/10] md: only delete entries from all_mddevs when the disk is freed
  2022-07-18  6:34 ` [PATCH 09/10] md: only delete entries from all_mddevs when the disk is freed Christoph Hellwig
@ 2022-07-18  7:17   ` Hannes Reinecke
  2022-07-18 18:20     ` Song Liu
  2022-07-19  7:14     ` Christoph Hellwig
  0 siblings, 2 replies; 38+ messages in thread
From: Hannes Reinecke @ 2022-07-18  7:17 UTC (permalink / raw)
  To: Christoph Hellwig, Song Liu; +Cc: Logan Gunthorpe, linux-raid, linux-block

On 7/18/22 08:34, Christoph Hellwig wrote:
> This ensures device names don't get prematurely reused.  Instead add a
> deleted flag to skip already deleted devices in mddev_get and other
> places that only want to see live mddevs.
> 
> Reported-by; Logan Gunthorpe <logang@deltatee.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/md/md.c | 56 +++++++++++++++++++++++++++++++++----------------
>   drivers/md/md.h |  1 +
>   2 files changed, 39 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 805f2b4ed9c0d..08cf21ad4c2d7 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -625,6 +625,10 @@ EXPORT_SYMBOL(md_flush_request);
>   
>   static inline struct mddev *mddev_get(struct mddev *mddev)
>   {
> +	lockdep_assert_held(&all_mddevs_lock);
> +
> +	if (mddev->deleted)
> +		return NULL;
>   	atomic_inc(&mddev->active);
>   	return mddev;
>   }

Why can't we use 'atomic_inc_unless_zero' here and do away with the 
additional 'deleted' flag?

> @@ -639,7 +643,7 @@ static void mddev_put(struct mddev *mddev)
>   	    mddev->ctime == 0 && !mddev->hold_active) {
>   		/* Array is not configured at all, and not held active,
>   		 * so destroy it */
> -		list_del_init(&mddev->all_mddevs);
> +		mddev->deleted = true;
>   
>   		/*
>   		 * Call queue_work inside the spinlock so that
> @@ -719,8 +723,8 @@ static struct mddev *mddev_find(dev_t unit)
>   
>   	spin_lock(&all_mddevs_lock);
>   	mddev = mddev_find_locked(unit);
> -	if (mddev)
> -		mddev_get(mddev);
> +	if (mddev && !mddev_get(mddev))
> +		mddev = NULL;
>   	spin_unlock(&all_mddevs_lock);
>   
>   	return mddev;
> @@ -3338,6 +3342,8 @@ static bool md_rdev_overlaps(struct md_rdev *rdev)
>   
>   	spin_lock(&all_mddevs_lock);
>   	list_for_each_entry(mddev, &all_mddevs, all_mddevs) {
> +		if (mddev->deleted)
> +			continue;

Any particular reason why you can't use the 'mddev_get()' / 
'mddev_put()' sequence here?
After all, I don't think that 'mddev' should vanish while being in this 
loop, no?

>   		rdev_for_each(rdev2, mddev) {
>   			if (rdev != rdev2 && rdev->bdev == rdev2->bdev &&
>   			    md_rdevs_overlap(rdev, rdev2)) {
> @@ -5525,11 +5531,10 @@ md_attr_show(struct kobject *kobj, struct attribute *attr, char *page)
>   	if (!entry->show)
>   		return -EIO;
>   	spin_lock(&all_mddevs_lock);
> -	if (list_empty(&mddev->all_mddevs)) {
> +	if (!mddev_get(mddev)) { >   		spin_unlock(&all_mddevs_lock);
>   		return -EBUSY;
>   	}
> -	mddev_get(mddev);
>   	spin_unlock(&all_mddevs_lock);
>   
>   	rv = entry->show(mddev, page);
> @@ -5550,11 +5555,10 @@ md_attr_store(struct kobject *kobj, struct attribute *attr,
>   	if (!capable(CAP_SYS_ADMIN))
>   		return -EACCES;
>   	spin_lock(&all_mddevs_lock);
> -	if (list_empty(&mddev->all_mddevs)) {
> +	if (!mddev_get(mddev)) {
>   		spin_unlock(&all_mddevs_lock);
>   		return -EBUSY;
>   	}
> -	mddev_get(mddev);
>   	spin_unlock(&all_mddevs_lock);
>   	rv = entry->store(mddev, page, length);
>   	mddev_put(mddev);
> @@ -7851,7 +7855,7 @@ static void md_free_disk(struct gendisk *disk)
>   	bioset_exit(&mddev->bio_set);
>   	bioset_exit(&mddev->sync_set);
>   
> -	kfree(mddev);
> +	mddev_free(mddev);
>   }
>   
>   const struct block_device_operations md_fops =
> @@ -8173,6 +8177,8 @@ static void *md_seq_start(struct seq_file *seq, loff_t *pos)
>   		if (!l--) {
>   			mddev = list_entry(tmp, struct mddev, all_mddevs);
>   			mddev_get(mddev);
> +			if (!mddev_get(mddev))
> +				continue;
>   			spin_unlock(&all_mddevs_lock);
>   			return mddev;
>   		}
> @@ -8186,25 +8192,35 @@ static void *md_seq_next(struct seq_file *seq, void *v, loff_t *pos)
>   {
>   	struct list_head *tmp;
>   	struct mddev *next_mddev, *mddev = v;
> +	struct mddev *to_put = NULL;
>   
>   	++*pos;
>   	if (v == (void*)2)
>   		return NULL;
>   
>   	spin_lock(&all_mddevs_lock);
> -	if (v == (void*)1)
> +	if (v == (void*)1) {
>   		tmp = all_mddevs.next;
> -	else
> +	} else {
> +		to_put = mddev;
>   		tmp = mddev->all_mddevs.next;
> -	if (tmp != &all_mddevs)
> -		next_mddev = mddev_get(list_entry(tmp,struct mddev,all_mddevs));
> -	else {
> -		next_mddev = (void*)2;
> -		*pos = 0x10000;
>   	}
> +
> +	for (;;) {
> +		if (tmp == &all_mddevs) {
> +			next_mddev = (void*)2;
> +			*pos = 0x10000;
> +			break;
> +		}
> +		next_mddev = list_entry(tmp, struct mddev, all_mddevs);
> +		if (mddev_get(next_mddev))
> +			break;
> +		mddev = next_mddev;
> +		tmp = mddev->all_mddevs.next;
> +	};
>   	spin_unlock(&all_mddevs_lock);
>   
> -	if (v != (void*)1)
> +	if (to_put)
>   		mddev_put(mddev);
>   	return next_mddev;
>   
> @@ -8768,6 +8784,8 @@ void md_do_sync(struct md_thread *thread)
>   			goto skip;
>   		spin_lock(&all_mddevs_lock);
>   		list_for_each_entry(mddev2, &all_mddevs, all_mddevs) {
> +			if (mddev2->deleted)
> +				continue;
>   			if (mddev2 == mddev)
>   				continue;
>   			if (!mddev->parallel_resync
> @@ -9570,7 +9588,8 @@ static int md_notify_reboot(struct notifier_block *this,
>   
>   	spin_lock(&all_mddevs_lock);
>   	list_for_each_entry_safe(mddev, n, &all_mddevs, all_mddevs) {
> -		mddev_get(mddev);
> +		if (!mddev_get(mddev))
> +			continue;
>   		spin_unlock(&all_mddevs_lock);
>   		if (mddev_trylock(mddev)) {
>   			if (mddev->pers)
> @@ -9925,7 +9944,8 @@ static __exit void md_exit(void)
>   
>   	spin_lock(&all_mddevs_lock);
>   	list_for_each_entry_safe(mddev, n, &all_mddevs, all_mddevs) {
> -		mddev_get(mddev);
> +		if (!mddev_get(mddev))
> +			continue;
>   		spin_unlock(&all_mddevs_lock);
>   		export_array(mddev);
>   		mddev->ctime = 0;
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index 1a85dbe78a71c..bc870e1f1e8c2 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -503,6 +503,7 @@ struct mddev {
>   
>   	atomic_t			max_corr_read_errors; /* max read retries */
>   	struct list_head		all_mddevs;
> +	bool				deleted;
>   
>   	const struct attribute_group	*to_remove;
>   

Cheers,

Hannes

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 10/10] md: simplify md_open
  2022-07-18  6:34 ` [PATCH 10/10] md: simplify md_open Christoph Hellwig
@ 2022-07-18  7:19   ` Hannes Reinecke
  0 siblings, 0 replies; 38+ messages in thread
From: Hannes Reinecke @ 2022-07-18  7:19 UTC (permalink / raw)
  To: Christoph Hellwig, Song Liu; +Cc: Logan Gunthorpe, linux-raid, linux-block

On 7/18/22 08:34, Christoph Hellwig wrote:
> Now that devices are on the all_mddevs list until the gendisk is freed,
> there can't be any duplicates.  Remove the global list lookup and just
> grab a reference.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/md/md.c | 42 +++++++++++++++---------------------------
>   1 file changed, 15 insertions(+), 27 deletions(-)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 08cf21ad4c2d7..9b1e61b4bac8b 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -7785,45 +7785,33 @@ static int md_set_read_only(struct block_device *bdev, bool ro)
>   
>   static int md_open(struct block_device *bdev, fmode_t mode)
>   {
> -	/*
> -	 * Succeed if we can lock the mddev, which confirms that
> -	 * it isn't being stopped right now.
> -	 */
> -	struct mddev *mddev = mddev_find(bdev->bd_dev);
> +	struct mddev *mddev;
>   	int err;
>   
> +	spin_lock(&all_mddevs_lock);
> +	mddev = mddev_get(bdev->bd_disk->private_data);
> +	spin_unlock(&all_mddevs_lock);
>   	if (!mddev)
>   		return -ENODEV;
>   
> -	if (mddev->gendisk != bdev->bd_disk) {
> -		/* we are racing with mddev_put which is discarding this
> -		 * bd_disk.
> -		 */
> -		mddev_put(mddev);
> -		/* Wait until bdev->bd_disk is definitely gone */
> -		if (work_pending(&mddev->del_work))
> -			flush_workqueue(md_misc_wq);
> -		return -EBUSY;
> -	}
> -	BUG_ON(mddev != bdev->bd_disk->private_data);
> -
> -	if ((err = mutex_lock_interruptible(&mddev->open_mutex)))
> +	err = mutex_lock_interruptible(&mddev->open_mutex);
> +	if (err)
>   		goto out;
>   
> -	if (test_bit(MD_CLOSING, &mddev->flags)) {
> -		mutex_unlock(&mddev->open_mutex);
> -		err = -ENODEV;
> -		goto out;
> -	}
> +	err = -ENODEV;
> +	if (test_bit(MD_CLOSING, &mddev->flags))
> +		goto out_unlock;
>   
> -	err = 0;
>   	atomic_inc(&mddev->openers);
>   	mutex_unlock(&mddev->open_mutex);
>   
>   	bdev_check_media_change(bdev);
> - out:
> -	if (err)
> -		mddev_put(mddev);
> +	return 0;
> +
> +out_unlock:
> +	mutex_unlock(&mddev->open_mutex);
> +out:
> +	mddev_put(mddev);
>   	return err;
>   }
>   

Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 09/10] md: only delete entries from all_mddevs when the disk is freed
  2022-07-18  7:17   ` Hannes Reinecke
@ 2022-07-18 18:20     ` Song Liu
  2022-07-19  5:06       ` Christoph Hellwig
  2022-07-19  7:14     ` Christoph Hellwig
  1 sibling, 1 reply; 38+ messages in thread
From: Song Liu @ 2022-07-18 18:20 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Christoph Hellwig, Logan Gunthorpe, linux-raid, linux-block

On Mon, Jul 18, 2022 at 12:17 AM Hannes Reinecke <hare@suse.de> wrote:
>
> On 7/18/22 08:34, Christoph Hellwig wrote:
> > This ensures device names don't get prematurely reused.  Instead add a
> > deleted flag to skip already deleted devices in mddev_get and other
> > places that only want to see live mddevs.
> >
> > Reported-by; Logan Gunthorpe <logang@deltatee.com>
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > ---
> >   drivers/md/md.c | 56 +++++++++++++++++++++++++++++++++----------------
> >   drivers/md/md.h |  1 +
> >   2 files changed, 39 insertions(+), 18 deletions(-)
> >
> > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > index 805f2b4ed9c0d..08cf21ad4c2d7 100644
> > --- a/drivers/md/md.c
> > +++ b/drivers/md/md.c
> > @@ -625,6 +625,10 @@ EXPORT_SYMBOL(md_flush_request);
> >
> >   static inline struct mddev *mddev_get(struct mddev *mddev)
> >   {
> > +     lockdep_assert_held(&all_mddevs_lock);
> > +
> > +     if (mddev->deleted)
> > +             return NULL;
> >       atomic_inc(&mddev->active);
> >       return mddev;
> >   }
>
> Why can't we use 'atomic_inc_unless_zero' here and do away with the
> additional 'deleted' flag?

Thanks Christoph for the set. And thanks Hannes for the quick review!

I think atomic_inc_unless_zero() should work here. Alternatively, we can
also grab a big from mddev->flags as MD_DELETED.

Thanks,
Song

[...]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 01/10] md: fix mddev->kobj lifetime
  2022-07-18  6:34 ` [PATCH 01/10] md: fix mddev->kobj lifetime Christoph Hellwig
  2022-07-18  6:55   ` Hannes Reinecke
@ 2022-07-18 19:48   ` Logan Gunthorpe
  1 sibling, 0 replies; 38+ messages in thread
From: Logan Gunthorpe @ 2022-07-18 19:48 UTC (permalink / raw)
  To: Christoph Hellwig, Song Liu; +Cc: linux-raid, linux-block



On 2022-07-18 00:34, Christoph Hellwig wrote:
> Once a kobject is initialized, the containing object should not be
> directly freed.  So delay initialization until it is added.  Also
> remove the kobject_del call as the last put will remove the kobject as
> well.  The explicitly delete isn't needed here, and dropping it will
> simplify further fixes.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Logan Gunthorpe <logang@deltatee.com>

Logan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 02/10] md: fix error handling in md_alloc
  2022-07-18  6:34 ` [PATCH 02/10] md: fix error handling in md_alloc Christoph Hellwig
  2022-07-18  7:00   ` Hannes Reinecke
@ 2022-07-18 19:48   ` Logan Gunthorpe
  1 sibling, 0 replies; 38+ messages in thread
From: Logan Gunthorpe @ 2022-07-18 19:48 UTC (permalink / raw)
  To: Christoph Hellwig, Song Liu; +Cc: linux-raid, linux-block



On 2022-07-18 00:34, Christoph Hellwig wrote:
> Error handling in md_alloc is a mess.  Untangle it to just free the mddev
> directly before add_disk is called and thus the gendisk is globally
> visible.  After that clear the hold flag and let the mddev_put take care
> of cleaning up the mddev through the usual mechanisms.
> 
> Fixes: 5e55e2f5fc95 ("[PATCH] md: convert compile time warnings into runtime warnings")
> Fixes: 9be68dd7ac0e ("md: add error handling support for add_disk()")
> Fixes: 7ad1069166c0 ("md: properly unwind when failing to add the kobject in md_alloc")
> Signed-off-by: Christoph Hellwig <hch@lst.de>


Reviewed-by: Logan Gunthorpe <logang@deltatee.com>

Logan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 03/10] md: implement ->free_disk
  2022-07-18  6:34 ` [PATCH 03/10] md: implement ->free_disk Christoph Hellwig
  2022-07-18  7:01   ` Hannes Reinecke
@ 2022-07-18 20:01   ` Logan Gunthorpe
  1 sibling, 0 replies; 38+ messages in thread
From: Logan Gunthorpe @ 2022-07-18 20:01 UTC (permalink / raw)
  To: Christoph Hellwig, Song Liu; +Cc: linux-raid, linux-block



On 2022-07-18 00:34, Christoph Hellwig wrote:
> Ensure that all private data is only freed once all accesses are done.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/md/md.c | 17 ++++++++++++-----
>  1 file changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 64c7c24d267bc..1e658d5060842 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -5602,11 +5602,6 @@ static void md_free(struct kobject *ko)
>  		del_gendisk(mddev->gendisk);
>  		blk_cleanup_disk(mddev->gendisk);
>  	}

Looks like, with the previous patch, we can now remove the conditional
on gendisk. Other than that it looks good:

Reviewed-by: Logan Gunthorpe <logang@deltatee.com>

Logan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 04/10] md: rename md_free to md_kobj_release
  2022-07-18  6:34 ` [PATCH 04/10] md: rename md_free to md_kobj_release Christoph Hellwig
  2022-07-18  7:01   ` Hannes Reinecke
@ 2022-07-18 20:02   ` Logan Gunthorpe
  1 sibling, 0 replies; 38+ messages in thread
From: Logan Gunthorpe @ 2022-07-18 20:02 UTC (permalink / raw)
  To: Christoph Hellwig, Song Liu; +Cc: linux-raid, linux-block



On 2022-07-18 00:34, Christoph Hellwig wrote:
> The md_free name is rather misleading, so pick a better one.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Logan Gunthorpe <logang@deltatee.com>

Logan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 05/10] md: factor out the rdev overlaps check from rdev_size_store
  2022-07-18  6:34 ` [PATCH 05/10] md: factor out the rdev overlaps check from rdev_size_store Christoph Hellwig
  2022-07-18  7:02   ` Hannes Reinecke
@ 2022-07-18 20:02   ` Logan Gunthorpe
  1 sibling, 0 replies; 38+ messages in thread
From: Logan Gunthorpe @ 2022-07-18 20:02 UTC (permalink / raw)
  To: Christoph Hellwig, Song Liu; +Cc: linux-raid, linux-block



On 2022-07-18 00:34, Christoph Hellwig wrote:
> This splits the code into nicely readable chunks and also avoids
> the refcount inc/dec manipulations.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Logan Gunthorpe <logang@deltatee.com>

Logan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 06/10] md: stop using for_each_mddev in md_do_sync
  2022-07-18  6:34 ` [PATCH 06/10] md: stop using for_each_mddev in md_do_sync Christoph Hellwig
  2022-07-18  7:03   ` Hannes Reinecke
@ 2022-07-18 20:02   ` Logan Gunthorpe
  1 sibling, 0 replies; 38+ messages in thread
From: Logan Gunthorpe @ 2022-07-18 20:02 UTC (permalink / raw)
  To: Christoph Hellwig, Song Liu; +Cc: linux-raid, linux-block



On 2022-07-18 00:34, Christoph Hellwig wrote:
> Just do a plain list_for_each that only grabs a mddev reference in
> the case where the thread sleeps and restarts the list iteration.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Logan Gunthorpe <logang@deltatee.com>

Logan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 07/10] md: stop using for_each_mddev in md_notify_reboot
  2022-07-18  6:34 ` [PATCH 07/10] md: stop using for_each_mddev in md_notify_reboot Christoph Hellwig
  2022-07-18  7:05   ` Hannes Reinecke
@ 2022-07-18 20:03   ` Logan Gunthorpe
  1 sibling, 0 replies; 38+ messages in thread
From: Logan Gunthorpe @ 2022-07-18 20:03 UTC (permalink / raw)
  To: Christoph Hellwig, Song Liu; +Cc: linux-raid, linux-block



On 2022-07-18 00:34, Christoph Hellwig wrote:
> Just do a simple list_for_each_entry_safe on all_mddevs, and only grab a
> reference when we drop the lock.
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Logan Gunthorpe <logang@deltatee.com>

Logan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 08/10] md: stop using for_each_mddev in md_exit
  2022-07-18  6:34 ` [PATCH 08/10] md: stop using for_each_mddev in md_exit Christoph Hellwig
  2022-07-18  7:10   ` Hannes Reinecke
@ 2022-07-18 20:03   ` Logan Gunthorpe
  1 sibling, 0 replies; 38+ messages in thread
From: Logan Gunthorpe @ 2022-07-18 20:03 UTC (permalink / raw)
  To: Christoph Hellwig, Song Liu; +Cc: linux-raid, linux-block



On 2022-07-18 00:34, Christoph Hellwig wrote:
> Just do a simple list_for_each_entry_safe on all_mddevs, and only grab a
> reference when we drop the lock and delete the now unused for_each_mddev
> macro.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Logan Gunthorpe <logang@deltatee.com>

Logan


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 09/10] md: only delete entries from all_mddevs when the disk is freed
  2022-07-18 18:20     ` Song Liu
@ 2022-07-19  5:06       ` Christoph Hellwig
  2022-07-19  6:28         ` Song Liu
  2022-07-19  7:00         ` Hannes Reinecke
  0 siblings, 2 replies; 38+ messages in thread
From: Christoph Hellwig @ 2022-07-19  5:06 UTC (permalink / raw)
  To: Song Liu
  Cc: Hannes Reinecke, Christoph Hellwig, Logan Gunthorpe, linux-raid,
	linux-block

On Mon, Jul 18, 2022 at 11:20:39AM -0700, Song Liu wrote:
> > >   static inline struct mddev *mddev_get(struct mddev *mddev)
> > >   {
> > > +     lockdep_assert_held(&all_mddevs_lock);
> > > +
> > > +     if (mddev->deleted)
> > > +             return NULL;
> > >       atomic_inc(&mddev->active);
> > >       return mddev;
> > >   }
> >
> > Why can't we use 'atomic_inc_unless_zero' here and do away with the
> > additional 'deleted' flag?

> I think atomic_inc_unless_zero() should work here. Alternatively, we can
> also grab a big from mddev->flags as MD_DELETED.

s/big/bit/ ?

Yes, this could use mddev->flags.  I don't think atomic_inc_unless_zero
would work, as an array can have a refcount of 0 but we still allow
to take new references to it, the deletion only happens if the other
conditions in mddev_put are met.

Do you want me to repin the series using a flag?

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 09/10] md: only delete entries from all_mddevs when the disk is freed
  2022-07-19  5:06       ` Christoph Hellwig
@ 2022-07-19  6:28         ` Song Liu
  2022-07-19  7:00         ` Hannes Reinecke
  1 sibling, 0 replies; 38+ messages in thread
From: Song Liu @ 2022-07-19  6:28 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Hannes Reinecke, Logan Gunthorpe, linux-raid, linux-block

On Mon, Jul 18, 2022 at 10:07 PM Christoph Hellwig <hch@lst.de> wrote:
>
> On Mon, Jul 18, 2022 at 11:20:39AM -0700, Song Liu wrote:
> > > >   static inline struct mddev *mddev_get(struct mddev *mddev)
> > > >   {
> > > > +     lockdep_assert_held(&all_mddevs_lock);
> > > > +
> > > > +     if (mddev->deleted)
> > > > +             return NULL;
> > > >       atomic_inc(&mddev->active);
> > > >       return mddev;
> > > >   }
> > >
> > > Why can't we use 'atomic_inc_unless_zero' here and do away with the
> > > additional 'deleted' flag?
>
> > I think atomic_inc_unless_zero() should work here. Alternatively, we can
> > also grab a big from mddev->flags as MD_DELETED.
>
> s/big/bit/ ?
yeah, bit...

>
> Yes, this could use mddev->flags.  I don't think atomic_inc_unless_zero
> would work, as an array can have a refcount of 0 but we still allow
> to take new references to it, the deletion only happens if the other
> conditions in mddev_put are met.
>
> Do you want me to repin the series using a flag?

Respin the series sounds good. Please also address Hannes' feedback
on 8/10.

Thanks,
Song

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 09/10] md: only delete entries from all_mddevs when the disk is freed
  2022-07-19  5:06       ` Christoph Hellwig
  2022-07-19  6:28         ` Song Liu
@ 2022-07-19  7:00         ` Hannes Reinecke
  1 sibling, 0 replies; 38+ messages in thread
From: Hannes Reinecke @ 2022-07-19  7:00 UTC (permalink / raw)
  To: Christoph Hellwig, Song Liu; +Cc: Logan Gunthorpe, linux-raid, linux-block

On 7/19/22 07:06, Christoph Hellwig wrote:
> On Mon, Jul 18, 2022 at 11:20:39AM -0700, Song Liu wrote:
>>>>    static inline struct mddev *mddev_get(struct mddev *mddev)
>>>>    {
>>>> +     lockdep_assert_held(&all_mddevs_lock);
>>>> +
>>>> +     if (mddev->deleted)
>>>> +             return NULL;
>>>>        atomic_inc(&mddev->active);
>>>>        return mddev;
>>>>    }
>>>
>>> Why can't we use 'atomic_inc_unless_zero' here and do away with the
>>> additional 'deleted' flag?
> 
>> I think atomic_inc_unless_zero() should work here. Alternatively, we can
>> also grab a big from mddev->flags as MD_DELETED.
> 
> s/big/bit/ ?
> 
> Yes, this could use mddev->flags.  I don't think atomic_inc_unless_zero
> would work, as an array can have a refcount of 0 but we still allow
> to take new references to it, the deletion only happens if the other
> conditions in mddev_put are met.
> 
> Do you want me to repin the series using a flag?

I would vote for it.

Cheers,

Hannes


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 08/10] md: stop using for_each_mddev in md_exit
  2022-07-18  7:10   ` Hannes Reinecke
@ 2022-07-19  7:03     ` Christoph Hellwig
  2022-07-19  7:06       ` Hannes Reinecke
  0 siblings, 1 reply; 38+ messages in thread
From: Christoph Hellwig @ 2022-07-19  7:03 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Christoph Hellwig, Song Liu, Logan Gunthorpe, linux-raid, linux-block

On Mon, Jul 18, 2022 at 09:10:59AM +0200, Hannes Reinecke wrote:
>
> Having thought about it some more ... wouldn't it make sense to modify 
> mddev_get() to
>
> if (atomic_inc_not_zero(&mddev->active))
>     return NULL;
>
> to ensure we're not missing any use-after-free issues, which previously 
> would have been caught by the 'for_each_mddev()' macro?

No.  mddev->active = 0 can still be a perfectly valid mddev and they
are not automatically deleted.  Check the conditions in mddev_put.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 08/10] md: stop using for_each_mddev in md_exit
  2022-07-19  7:03     ` Christoph Hellwig
@ 2022-07-19  7:06       ` Hannes Reinecke
  0 siblings, 0 replies; 38+ messages in thread
From: Hannes Reinecke @ 2022-07-19  7:06 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Song Liu, Logan Gunthorpe, linux-raid, linux-block

On 7/19/22 09:03, Christoph Hellwig wrote:
> On Mon, Jul 18, 2022 at 09:10:59AM +0200, Hannes Reinecke wrote:
>>
>> Having thought about it some more ... wouldn't it make sense to modify
>> mddev_get() to
>>
>> if (atomic_inc_not_zero(&mddev->active))
>>      return NULL;
>>
>> to ensure we're not missing any use-after-free issues, which previously
>> would have been caught by the 'for_each_mddev()' macro?
> 
> No.  mddev->active = 0 can still be a perfectly valid mddev and they
> are not automatically deleted.  Check the conditions in mddev_put.

Sigh. Okay.
MD is a mess anyway.

Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 09/10] md: only delete entries from all_mddevs when the disk is freed
  2022-07-18  7:17   ` Hannes Reinecke
  2022-07-18 18:20     ` Song Liu
@ 2022-07-19  7:14     ` Christoph Hellwig
  2022-07-19  7:59       ` Hannes Reinecke
  1 sibling, 1 reply; 38+ messages in thread
From: Christoph Hellwig @ 2022-07-19  7:14 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Christoph Hellwig, Song Liu, Logan Gunthorpe, linux-raid, linux-block

On Mon, Jul 18, 2022 at 09:17:42AM +0200, Hannes Reinecke wrote:
> Why can't we use 'atomic_inc_unless_zero' here and do away with the 
> additional 'deleted' flag?

See my previous reply.

>> @@ -3338,6 +3342,8 @@ static bool md_rdev_overlaps(struct md_rdev *rdev)
>>     	spin_lock(&all_mddevs_lock);
>>   	list_for_each_entry(mddev, &all_mddevs, all_mddevs) {
>> +		if (mddev->deleted)
>> +			continue;
>
> Any particular reason why you can't use the 'mddev_get()' / 'mddev_put()' 
> sequence here?

Mostly because it would be pretty mess.  mdev_put takes all_mddevs_lock,
so we'd have to add an unlock/relock cycle for each loop iteration.

> After all, I don't think that 'mddev' should vanish while being in this 
> loop, no?

It won't, at least without the call to mddev_put.  Once mddev_put is
in the game things aren't that easy, and I suspect the exising and
new code might have bugs in that area in the reboot notifier and
md_exit for extreme corner cases.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 09/10] md: only delete entries from all_mddevs when the disk is freed
  2022-07-19  7:14     ` Christoph Hellwig
@ 2022-07-19  7:59       ` Hannes Reinecke
  0 siblings, 0 replies; 38+ messages in thread
From: Hannes Reinecke @ 2022-07-19  7:59 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Song Liu, Logan Gunthorpe, linux-raid, linux-block

On 7/19/22 09:14, Christoph Hellwig wrote:
> On Mon, Jul 18, 2022 at 09:17:42AM +0200, Hannes Reinecke wrote:
>> Why can't we use 'atomic_inc_unless_zero' here and do away with the
>> additional 'deleted' flag?
> 
> See my previous reply.
> 
>>> @@ -3338,6 +3342,8 @@ static bool md_rdev_overlaps(struct md_rdev *rdev)
>>>      	spin_lock(&all_mddevs_lock);
>>>    	list_for_each_entry(mddev, &all_mddevs, all_mddevs) {
>>> +		if (mddev->deleted)
>>> +			continue;
>>
>> Any particular reason why you can't use the 'mddev_get()' / 'mddev_put()'
>> sequence here?
> 
> Mostly because it would be pretty mess.  mdev_put takes all_mddevs_lock,
> so we'd have to add an unlock/relock cycle for each loop iteration.
> 
>> After all, I don't think that 'mddev' should vanish while being in this
>> loop, no?
> 
> It won't, at least without the call to mddev_put.  Once mddev_put is
> in the game things aren't that easy, and I suspect the exising and
> new code might have bugs in that area in the reboot notifier and
> md_exit for extreme corner cases.

And again, MD mess.
But thanks for start clearing it up.

Cheers,

Hannes

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 02/10] md: fix error handling in md_alloc
  2022-07-19  9:18 fix md disk_name lifetime problems v4 Christoph Hellwig
@ 2022-07-19  9:18 ` Christoph Hellwig
  0 siblings, 0 replies; 38+ messages in thread
From: Christoph Hellwig @ 2022-07-19  9:18 UTC (permalink / raw)
  To: Song Liu; +Cc: Logan Gunthorpe, linux-raid, linux-block, Hannes Reinecke

Error handling in md_alloc is a mess.  Untangle it to just free the mddev
directly before add_disk is called and thus the gendisk is globally
visible.  After that clear the hold flag and let the mddev_put take care
of cleaning up the mddev through the usual mechanisms.

Fixes: 5e55e2f5fc95 ("[PATCH] md: convert compile time warnings into runtime warnings")
Fixes: 9be68dd7ac0e ("md: add error handling support for add_disk()")
Fixes: 7ad1069166c0 ("md: properly unwind when failing to add the kobject in md_alloc")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
 drivers/md/md.c | 45 ++++++++++++++++++++++++++++++++-------------
 1 file changed, 32 insertions(+), 13 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 7829045fc7fd9..f1c4f8ccb939d 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -790,6 +790,15 @@ static struct mddev *mddev_alloc(dev_t unit)
 	return ERR_PTR(error);
 }
 
+static void mddev_free(struct mddev *mddev)
+{
+	spin_lock(&all_mddevs_lock);
+	list_del(&mddev->all_mddevs);
+	spin_unlock(&all_mddevs_lock);
+
+	kfree(mddev);
+}
+
 static const struct attribute_group md_redundancy_group;
 
 void mddev_unlock(struct mddev *mddev)
@@ -5661,8 +5670,8 @@ int md_alloc(dev_t dev, char *name)
 	mutex_lock(&disks_mutex);
 	mddev = mddev_alloc(dev);
 	if (IS_ERR(mddev)) {
-		mutex_unlock(&disks_mutex);
-		return PTR_ERR(mddev);
+		error = PTR_ERR(mddev);
+		goto out_unlock;
 	}
 
 	partitioned = (MAJOR(mddev->unit) != MD_MAJOR);
@@ -5680,7 +5689,7 @@ int md_alloc(dev_t dev, char *name)
 			    strcmp(mddev2->gendisk->disk_name, name) == 0) {
 				spin_unlock(&all_mddevs_lock);
 				error = -EEXIST;
-				goto out_unlock_disks_mutex;
+				goto out_free_mddev;
 			}
 		spin_unlock(&all_mddevs_lock);
 	}
@@ -5693,7 +5702,7 @@ int md_alloc(dev_t dev, char *name)
 	error = -ENOMEM;
 	disk = blk_alloc_disk(NUMA_NO_NODE);
 	if (!disk)
-		goto out_unlock_disks_mutex;
+		goto out_free_mddev;
 
 	disk->major = MAJOR(mddev->unit);
 	disk->first_minor = unit << shift;
@@ -5714,26 +5723,36 @@ int md_alloc(dev_t dev, char *name)
 	mddev->gendisk = disk;
 	error = add_disk(disk);
 	if (error)
-		goto out_cleanup_disk;
+		goto out_put_disk;
 
 	kobject_init(&mddev->kobj, &md_ktype);
 	error = kobject_add(&mddev->kobj, &disk_to_dev(disk)->kobj, "%s", "md");
-	if (error)
-		goto out_del_gendisk;
+	if (error) {
+		/*
+		 * The disk is already live at this point.  Clear the hold flag
+		 * and let mddev_put take care of the deletion, as it isn't any
+		 * different from a normal close on last release now.
+		 */
+		mddev->hold_active = 0;
+		goto done;
+	}
 
 	kobject_uevent(&mddev->kobj, KOBJ_ADD);
 	mddev->sysfs_state = sysfs_get_dirent_safe(mddev->kobj.sd, "array_state");
 	mddev->sysfs_level = sysfs_get_dirent_safe(mddev->kobj.sd, "level");
-	goto out_unlock_disks_mutex;
 
-out_del_gendisk:
-	del_gendisk(disk);
-out_cleanup_disk:
-	blk_cleanup_disk(disk);
-out_unlock_disks_mutex:
+done:
 	mutex_unlock(&disks_mutex);
 	mddev_put(mddev);
 	return error;
+
+out_put_disk:
+	put_disk(disk);
+out_free_mddev:
+	mddev_free(mddev);
+out_unlock:
+	mutex_unlock(&disks_mutex);
+	return error;
 }
 
 static void md_probe(dev_t dev)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2022-07-19  9:19 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-18  6:34 fix md disk_name lifetime problems v3 Christoph Hellwig
2022-07-18  6:34 ` [PATCH 01/10] md: fix mddev->kobj lifetime Christoph Hellwig
2022-07-18  6:55   ` Hannes Reinecke
2022-07-18 19:48   ` Logan Gunthorpe
2022-07-18  6:34 ` [PATCH 02/10] md: fix error handling in md_alloc Christoph Hellwig
2022-07-18  7:00   ` Hannes Reinecke
2022-07-18 19:48   ` Logan Gunthorpe
2022-07-18  6:34 ` [PATCH 03/10] md: implement ->free_disk Christoph Hellwig
2022-07-18  7:01   ` Hannes Reinecke
2022-07-18 20:01   ` Logan Gunthorpe
2022-07-18  6:34 ` [PATCH 04/10] md: rename md_free to md_kobj_release Christoph Hellwig
2022-07-18  7:01   ` Hannes Reinecke
2022-07-18 20:02   ` Logan Gunthorpe
2022-07-18  6:34 ` [PATCH 05/10] md: factor out the rdev overlaps check from rdev_size_store Christoph Hellwig
2022-07-18  7:02   ` Hannes Reinecke
2022-07-18 20:02   ` Logan Gunthorpe
2022-07-18  6:34 ` [PATCH 06/10] md: stop using for_each_mddev in md_do_sync Christoph Hellwig
2022-07-18  7:03   ` Hannes Reinecke
2022-07-18 20:02   ` Logan Gunthorpe
2022-07-18  6:34 ` [PATCH 07/10] md: stop using for_each_mddev in md_notify_reboot Christoph Hellwig
2022-07-18  7:05   ` Hannes Reinecke
2022-07-18 20:03   ` Logan Gunthorpe
2022-07-18  6:34 ` [PATCH 08/10] md: stop using for_each_mddev in md_exit Christoph Hellwig
2022-07-18  7:10   ` Hannes Reinecke
2022-07-19  7:03     ` Christoph Hellwig
2022-07-19  7:06       ` Hannes Reinecke
2022-07-18 20:03   ` Logan Gunthorpe
2022-07-18  6:34 ` [PATCH 09/10] md: only delete entries from all_mddevs when the disk is freed Christoph Hellwig
2022-07-18  7:17   ` Hannes Reinecke
2022-07-18 18:20     ` Song Liu
2022-07-19  5:06       ` Christoph Hellwig
2022-07-19  6:28         ` Song Liu
2022-07-19  7:00         ` Hannes Reinecke
2022-07-19  7:14     ` Christoph Hellwig
2022-07-19  7:59       ` Hannes Reinecke
2022-07-18  6:34 ` [PATCH 10/10] md: simplify md_open Christoph Hellwig
2022-07-18  7:19   ` Hannes Reinecke
2022-07-19  9:18 fix md disk_name lifetime problems v4 Christoph Hellwig
2022-07-19  9:18 ` [PATCH 02/10] md: fix error handling in md_alloc Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.