* [PATCH 01/10] md: fix mddev->kobj lifetime
2022-07-18 6:34 fix md disk_name lifetime problems v3 Christoph Hellwig
@ 2022-07-18 6:34 ` Christoph Hellwig
2022-07-18 6:55 ` Hannes Reinecke
2022-07-18 19:48 ` Logan Gunthorpe
2022-07-18 6:34 ` [PATCH 02/10] md: fix error handling in md_alloc Christoph Hellwig
` (8 subsequent siblings)
9 siblings, 2 replies; 39+ messages in thread
From: Christoph Hellwig @ 2022-07-18 6:34 UTC (permalink / raw)
To: Song Liu; +Cc: Logan Gunthorpe, linux-raid, linux-block
Once a kobject is initialized, the containing object should not be
directly freed. So delay initialization until it is added. Also
remove the kobject_del call as the last put will remove the kobject as
well. The explicitly delete isn't needed here, and dropping it will
simplify further fixes.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/md/md.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index b64de313838f2..a49ddc9454ff6 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -678,7 +678,6 @@ static void md_safemode_timeout(struct timer_list *t);
void mddev_init(struct mddev *mddev)
{
- kobject_init(&mddev->kobj, &md_ktype);
mutex_init(&mddev->open_mutex);
mutex_init(&mddev->reconfig_mutex);
mutex_init(&mddev->bitmap_info.mutex);
@@ -5617,7 +5616,6 @@ static void mddev_delayed_delete(struct work_struct *ws)
{
struct mddev *mddev = container_of(ws, struct mddev, del_work);
- kobject_del(&mddev->kobj);
kobject_put(&mddev->kobj);
}
@@ -5719,6 +5717,7 @@ int md_alloc(dev_t dev, char *name)
if (error)
goto out_cleanup_disk;
+ kobject_init(&mddev->kobj, &md_ktype);
error = kobject_add(&mddev->kobj, &disk_to_dev(disk)->kobj, "%s", "md");
if (error)
goto out_del_gendisk;
--
2.30.2
^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH 01/10] md: fix mddev->kobj lifetime
2022-07-18 6:34 ` [PATCH 01/10] md: fix mddev->kobj lifetime Christoph Hellwig
@ 2022-07-18 6:55 ` Hannes Reinecke
2022-07-18 19:48 ` Logan Gunthorpe
1 sibling, 0 replies; 39+ messages in thread
From: Hannes Reinecke @ 2022-07-18 6:55 UTC (permalink / raw)
To: Christoph Hellwig, Song Liu; +Cc: Logan Gunthorpe, linux-raid, linux-block
On 7/18/22 08:34, Christoph Hellwig wrote:
> Once a kobject is initialized, the containing object should not be
> directly freed. So delay initialization until it is added. Also
> remove the kobject_del call as the last put will remove the kobject as
> well. The explicitly delete isn't needed here, and dropping it will
> simplify further fixes.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/md/md.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index b64de313838f2..a49ddc9454ff6 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -678,7 +678,6 @@ static void md_safemode_timeout(struct timer_list *t);
>
> void mddev_init(struct mddev *mddev)
> {
> - kobject_init(&mddev->kobj, &md_ktype);
> mutex_init(&mddev->open_mutex);
> mutex_init(&mddev->reconfig_mutex);
> mutex_init(&mddev->bitmap_info.mutex);
> @@ -5617,7 +5616,6 @@ static void mddev_delayed_delete(struct work_struct *ws)
> {
> struct mddev *mddev = container_of(ws, struct mddev, del_work);
>
> - kobject_del(&mddev->kobj);
> kobject_put(&mddev->kobj);
> }
>
> @@ -5719,6 +5717,7 @@ int md_alloc(dev_t dev, char *name)
> if (error)
> goto out_cleanup_disk;
>
> + kobject_init(&mddev->kobj, &md_ktype);
> error = kobject_add(&mddev->kobj, &disk_to_dev(disk)->kobj, "%s", "md");
> if (error)
> goto out_del_gendisk;
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cheers,
Hannes
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 01/10] md: fix mddev->kobj lifetime
2022-07-18 6:34 ` [PATCH 01/10] md: fix mddev->kobj lifetime Christoph Hellwig
2022-07-18 6:55 ` Hannes Reinecke
@ 2022-07-18 19:48 ` Logan Gunthorpe
1 sibling, 0 replies; 39+ messages in thread
From: Logan Gunthorpe @ 2022-07-18 19:48 UTC (permalink / raw)
To: Christoph Hellwig, Song Liu; +Cc: linux-raid, linux-block
On 2022-07-18 00:34, Christoph Hellwig wrote:
> Once a kobject is initialized, the containing object should not be
> directly freed. So delay initialization until it is added. Also
> remove the kobject_del call as the last put will remove the kobject as
> well. The explicitly delete isn't needed here, and dropping it will
> simplify further fixes.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Logan
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH 02/10] md: fix error handling in md_alloc
2022-07-18 6:34 fix md disk_name lifetime problems v3 Christoph Hellwig
2022-07-18 6:34 ` [PATCH 01/10] md: fix mddev->kobj lifetime Christoph Hellwig
@ 2022-07-18 6:34 ` Christoph Hellwig
2022-07-18 7:00 ` Hannes Reinecke
2022-07-18 19:48 ` Logan Gunthorpe
2022-07-18 6:34 ` [PATCH 03/10] md: implement ->free_disk Christoph Hellwig
` (7 subsequent siblings)
9 siblings, 2 replies; 39+ messages in thread
From: Christoph Hellwig @ 2022-07-18 6:34 UTC (permalink / raw)
To: Song Liu; +Cc: Logan Gunthorpe, linux-raid, linux-block
Error handling in md_alloc is a mess. Untangle it to just free the mddev
directly before add_disk is called and thus the gendisk is globally
visible. After that clear the hold flag and let the mddev_put take care
of cleaning up the mddev through the usual mechanisms.
Fixes: 5e55e2f5fc95 ("[PATCH] md: convert compile time warnings into runtime warnings")
Fixes: 9be68dd7ac0e ("md: add error handling support for add_disk()")
Fixes: 7ad1069166c0 ("md: properly unwind when failing to add the kobject in md_alloc")
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/md/md.c | 45 ++++++++++++++++++++++++++++++++-------------
1 file changed, 32 insertions(+), 13 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index a49ddc9454ff6..64c7c24d267bc 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -790,6 +790,15 @@ static struct mddev *mddev_alloc(dev_t unit)
return ERR_PTR(error);
}
+static void mddev_free(struct mddev *mddev)
+{
+ spin_lock(&all_mddevs_lock);
+ list_del(&mddev->all_mddevs);
+ spin_unlock(&all_mddevs_lock);
+
+ kfree(mddev);
+}
+
static const struct attribute_group md_redundancy_group;
void mddev_unlock(struct mddev *mddev)
@@ -5662,8 +5671,8 @@ int md_alloc(dev_t dev, char *name)
mutex_lock(&disks_mutex);
mddev = mddev_alloc(dev);
if (IS_ERR(mddev)) {
- mutex_unlock(&disks_mutex);
- return PTR_ERR(mddev);
+ error = PTR_ERR(mddev);
+ goto out_unlock;
}
partitioned = (MAJOR(mddev->unit) != MD_MAJOR);
@@ -5681,7 +5690,7 @@ int md_alloc(dev_t dev, char *name)
strcmp(mddev2->gendisk->disk_name, name) == 0) {
spin_unlock(&all_mddevs_lock);
error = -EEXIST;
- goto out_unlock_disks_mutex;
+ goto out_free_mddev;
}
spin_unlock(&all_mddevs_lock);
}
@@ -5694,7 +5703,7 @@ int md_alloc(dev_t dev, char *name)
error = -ENOMEM;
disk = blk_alloc_disk(NUMA_NO_NODE);
if (!disk)
- goto out_unlock_disks_mutex;
+ goto out_free_mddev;
disk->major = MAJOR(mddev->unit);
disk->first_minor = unit << shift;
@@ -5715,26 +5724,36 @@ int md_alloc(dev_t dev, char *name)
mddev->gendisk = disk;
error = add_disk(disk);
if (error)
- goto out_cleanup_disk;
+ goto out_put_disk;
kobject_init(&mddev->kobj, &md_ktype);
error = kobject_add(&mddev->kobj, &disk_to_dev(disk)->kobj, "%s", "md");
- if (error)
- goto out_del_gendisk;
+ if (error) {
+ /*
+ * The disk is already live at this point. Clear the hold flag
+ * and let mddev_put take care of the deletion, as it isn't any
+ * different from a normal close on last release now.
+ */
+ mddev->hold_active = 0;
+ goto done;
+ }
kobject_uevent(&mddev->kobj, KOBJ_ADD);
mddev->sysfs_state = sysfs_get_dirent_safe(mddev->kobj.sd, "array_state");
mddev->sysfs_level = sysfs_get_dirent_safe(mddev->kobj.sd, "level");
- goto out_unlock_disks_mutex;
-out_del_gendisk:
- del_gendisk(disk);
-out_cleanup_disk:
- blk_cleanup_disk(disk);
-out_unlock_disks_mutex:
+done:
mutex_unlock(&disks_mutex);
mddev_put(mddev);
return error;
+
+out_put_disk:
+ put_disk(disk);
+out_free_mddev:
+ mddev_free(mddev);
+out_unlock:
+ mutex_unlock(&disks_mutex);
+ return error;
}
static void md_probe(dev_t dev)
--
2.30.2
^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH 02/10] md: fix error handling in md_alloc
2022-07-18 6:34 ` [PATCH 02/10] md: fix error handling in md_alloc Christoph Hellwig
@ 2022-07-18 7:00 ` Hannes Reinecke
2022-07-18 19:48 ` Logan Gunthorpe
1 sibling, 0 replies; 39+ messages in thread
From: Hannes Reinecke @ 2022-07-18 7:00 UTC (permalink / raw)
To: Christoph Hellwig, Song Liu; +Cc: Logan Gunthorpe, linux-raid, linux-block
On 7/18/22 08:34, Christoph Hellwig wrote:
> Error handling in md_alloc is a mess. Untangle it to just free the mddev
> directly before add_disk is called and thus the gendisk is globally
> visible. After that clear the hold flag and let the mddev_put take care
> of cleaning up the mddev through the usual mechanisms.
>
> Fixes: 5e55e2f5fc95 ("[PATCH] md: convert compile time warnings into runtime warnings")
> Fixes: 9be68dd7ac0e ("md: add error handling support for add_disk()")
> Fixes: 7ad1069166c0 ("md: properly unwind when failing to add the kobject in md_alloc")
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/md/md.c | 45 ++++++++++++++++++++++++++++++++-------------
> 1 file changed, 32 insertions(+), 13 deletions(-)
>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cheers,
Hannes
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 02/10] md: fix error handling in md_alloc
2022-07-18 6:34 ` [PATCH 02/10] md: fix error handling in md_alloc Christoph Hellwig
2022-07-18 7:00 ` Hannes Reinecke
@ 2022-07-18 19:48 ` Logan Gunthorpe
1 sibling, 0 replies; 39+ messages in thread
From: Logan Gunthorpe @ 2022-07-18 19:48 UTC (permalink / raw)
To: Christoph Hellwig, Song Liu; +Cc: linux-raid, linux-block
On 2022-07-18 00:34, Christoph Hellwig wrote:
> Error handling in md_alloc is a mess. Untangle it to just free the mddev
> directly before add_disk is called and thus the gendisk is globally
> visible. After that clear the hold flag and let the mddev_put take care
> of cleaning up the mddev through the usual mechanisms.
>
> Fixes: 5e55e2f5fc95 ("[PATCH] md: convert compile time warnings into runtime warnings")
> Fixes: 9be68dd7ac0e ("md: add error handling support for add_disk()")
> Fixes: 7ad1069166c0 ("md: properly unwind when failing to add the kobject in md_alloc")
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Logan
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH 03/10] md: implement ->free_disk
2022-07-18 6:34 fix md disk_name lifetime problems v3 Christoph Hellwig
2022-07-18 6:34 ` [PATCH 01/10] md: fix mddev->kobj lifetime Christoph Hellwig
2022-07-18 6:34 ` [PATCH 02/10] md: fix error handling in md_alloc Christoph Hellwig
@ 2022-07-18 6:34 ` Christoph Hellwig
2022-07-18 7:01 ` Hannes Reinecke
2022-07-18 20:01 ` Logan Gunthorpe
2022-07-18 6:34 ` [PATCH 04/10] md: rename md_free to md_kobj_release Christoph Hellwig
` (6 subsequent siblings)
9 siblings, 2 replies; 39+ messages in thread
From: Christoph Hellwig @ 2022-07-18 6:34 UTC (permalink / raw)
To: Song Liu; +Cc: Logan Gunthorpe, linux-raid, linux-block
Ensure that all private data is only freed once all accesses are done.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/md/md.c | 17 ++++++++++++-----
1 file changed, 12 insertions(+), 5 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 64c7c24d267bc..1e658d5060842 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5602,11 +5602,6 @@ static void md_free(struct kobject *ko)
del_gendisk(mddev->gendisk);
blk_cleanup_disk(mddev->gendisk);
}
- percpu_ref_exit(&mddev->writes_pending);
-
- bioset_exit(&mddev->bio_set);
- bioset_exit(&mddev->sync_set);
- kfree(mddev);
}
static const struct sysfs_ops md_sysfs_ops = {
@@ -7876,6 +7871,17 @@ static unsigned int md_check_events(struct gendisk *disk, unsigned int clearing)
return ret;
}
+static void md_free_disk(struct gendisk *disk)
+{
+ struct mddev *mddev = disk->private_data;
+
+ percpu_ref_exit(&mddev->writes_pending);
+ bioset_exit(&mddev->bio_set);
+ bioset_exit(&mddev->sync_set);
+
+ kfree(mddev);
+}
+
const struct block_device_operations md_fops =
{
.owner = THIS_MODULE,
@@ -7889,6 +7895,7 @@ const struct block_device_operations md_fops =
.getgeo = md_getgeo,
.check_events = md_check_events,
.set_read_only = md_set_read_only,
+ .free_disk = md_free_disk,
};
static int md_thread(void *arg)
--
2.30.2
^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH 03/10] md: implement ->free_disk
2022-07-18 6:34 ` [PATCH 03/10] md: implement ->free_disk Christoph Hellwig
@ 2022-07-18 7:01 ` Hannes Reinecke
2022-07-18 20:01 ` Logan Gunthorpe
1 sibling, 0 replies; 39+ messages in thread
From: Hannes Reinecke @ 2022-07-18 7:01 UTC (permalink / raw)
To: Christoph Hellwig, Song Liu; +Cc: Logan Gunthorpe, linux-raid, linux-block
On 7/18/22 08:34, Christoph Hellwig wrote:
> Ensure that all private data is only freed once all accesses are done.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/md/md.c | 17 ++++++++++++-----
> 1 file changed, 12 insertions(+), 5 deletions(-)
>
Reviewd-by: Hannes Reinecke <hare@suse.de>
Cheers,
Hannes
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 03/10] md: implement ->free_disk
2022-07-18 6:34 ` [PATCH 03/10] md: implement ->free_disk Christoph Hellwig
2022-07-18 7:01 ` Hannes Reinecke
@ 2022-07-18 20:01 ` Logan Gunthorpe
1 sibling, 0 replies; 39+ messages in thread
From: Logan Gunthorpe @ 2022-07-18 20:01 UTC (permalink / raw)
To: Christoph Hellwig, Song Liu; +Cc: linux-raid, linux-block
On 2022-07-18 00:34, Christoph Hellwig wrote:
> Ensure that all private data is only freed once all accesses are done.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/md/md.c | 17 ++++++++++++-----
> 1 file changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 64c7c24d267bc..1e658d5060842 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -5602,11 +5602,6 @@ static void md_free(struct kobject *ko)
> del_gendisk(mddev->gendisk);
> blk_cleanup_disk(mddev->gendisk);
> }
Looks like, with the previous patch, we can now remove the conditional
on gendisk. Other than that it looks good:
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Logan
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH 04/10] md: rename md_free to md_kobj_release
2022-07-18 6:34 fix md disk_name lifetime problems v3 Christoph Hellwig
` (2 preceding siblings ...)
2022-07-18 6:34 ` [PATCH 03/10] md: implement ->free_disk Christoph Hellwig
@ 2022-07-18 6:34 ` Christoph Hellwig
2022-07-18 7:01 ` Hannes Reinecke
2022-07-18 20:02 ` Logan Gunthorpe
2022-07-18 6:34 ` [PATCH 05/10] md: factor out the rdev overlaps check from rdev_size_store Christoph Hellwig
` (5 subsequent siblings)
9 siblings, 2 replies; 39+ messages in thread
From: Christoph Hellwig @ 2022-07-18 6:34 UTC (permalink / raw)
To: Song Liu; +Cc: Logan Gunthorpe, linux-raid, linux-block
The md_free name is rather misleading, so pick a better one.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/md/md.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 1e658d5060842..96b4e901ff6b5 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5589,7 +5589,7 @@ md_attr_store(struct kobject *kobj, struct attribute *attr,
return rv;
}
-static void md_free(struct kobject *ko)
+static void md_kobj_release(struct kobject *ko)
{
struct mddev *mddev = container_of(ko, struct mddev, kobj);
@@ -5609,7 +5609,7 @@ static const struct sysfs_ops md_sysfs_ops = {
.store = md_attr_store,
};
static struct kobj_type md_ktype = {
- .release = md_free,
+ .release = md_kobj_release,
.sysfs_ops = &md_sysfs_ops,
.default_groups = md_attr_groups,
};
--
2.30.2
^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH 04/10] md: rename md_free to md_kobj_release
2022-07-18 6:34 ` [PATCH 04/10] md: rename md_free to md_kobj_release Christoph Hellwig
@ 2022-07-18 7:01 ` Hannes Reinecke
2022-07-18 20:02 ` Logan Gunthorpe
1 sibling, 0 replies; 39+ messages in thread
From: Hannes Reinecke @ 2022-07-18 7:01 UTC (permalink / raw)
To: Christoph Hellwig, Song Liu; +Cc: Logan Gunthorpe, linux-raid, linux-block
On 7/18/22 08:34, Christoph Hellwig wrote:
> The md_free name is rather misleading, so pick a better one.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/md/md.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 1e658d5060842..96b4e901ff6b5 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -5589,7 +5589,7 @@ md_attr_store(struct kobject *kobj, struct attribute *attr,
> return rv;
> }
>
> -static void md_free(struct kobject *ko)
> +static void md_kobj_release(struct kobject *ko)
> {
> struct mddev *mddev = container_of(ko, struct mddev, kobj);
>
> @@ -5609,7 +5609,7 @@ static const struct sysfs_ops md_sysfs_ops = {
> .store = md_attr_store,
> };
> static struct kobj_type md_ktype = {
> - .release = md_free,
> + .release = md_kobj_release,
> .sysfs_ops = &md_sysfs_ops,
> .default_groups = md_attr_groups,
> };
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cheers,
Hannes
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 04/10] md: rename md_free to md_kobj_release
2022-07-18 6:34 ` [PATCH 04/10] md: rename md_free to md_kobj_release Christoph Hellwig
2022-07-18 7:01 ` Hannes Reinecke
@ 2022-07-18 20:02 ` Logan Gunthorpe
1 sibling, 0 replies; 39+ messages in thread
From: Logan Gunthorpe @ 2022-07-18 20:02 UTC (permalink / raw)
To: Christoph Hellwig, Song Liu; +Cc: linux-raid, linux-block
On 2022-07-18 00:34, Christoph Hellwig wrote:
> The md_free name is rather misleading, so pick a better one.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Logan
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH 05/10] md: factor out the rdev overlaps check from rdev_size_store
2022-07-18 6:34 fix md disk_name lifetime problems v3 Christoph Hellwig
` (3 preceding siblings ...)
2022-07-18 6:34 ` [PATCH 04/10] md: rename md_free to md_kobj_release Christoph Hellwig
@ 2022-07-18 6:34 ` Christoph Hellwig
2022-07-18 7:02 ` Hannes Reinecke
2022-07-18 20:02 ` Logan Gunthorpe
2022-07-18 6:34 ` [PATCH 06/10] md: stop using for_each_mddev in md_do_sync Christoph Hellwig
` (4 subsequent siblings)
9 siblings, 2 replies; 39+ messages in thread
From: Christoph Hellwig @ 2022-07-18 6:34 UTC (permalink / raw)
To: Song Liu; +Cc: Logan Gunthorpe, linux-raid, linux-block
This splits the code into nicely readable chunks and also avoids
the refcount inc/dec manipulations.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/md/md.c | 84 +++++++++++++++++++++++--------------------------
1 file changed, 39 insertions(+), 45 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 96b4e901ff6b5..c8a5f9340060a 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -3343,14 +3343,33 @@ rdev_size_show(struct md_rdev *rdev, char *page)
return sprintf(page, "%llu\n", (unsigned long long)rdev->sectors / 2);
}
-static int overlaps(sector_t s1, sector_t l1, sector_t s2, sector_t l2)
+static int md_rdevs_overlap(struct md_rdev *a, struct md_rdev *b)
{
/* check if two start/length pairs overlap */
- if (s1+l1 <= s2)
- return 0;
- if (s2+l2 <= s1)
- return 0;
- return 1;
+ if (a->data_offset + a->sectors <= b->data_offset)
+ return false;
+ if (b->data_offset + b->sectors <= a->data_offset)
+ return false;
+ return true;
+}
+
+static bool md_rdev_overlaps(struct md_rdev *rdev)
+{
+ struct mddev *mddev;
+ struct md_rdev *rdev2;
+
+ spin_lock(&all_mddevs_lock);
+ list_for_each_entry(mddev, &all_mddevs, all_mddevs) {
+ rdev_for_each(rdev2, mddev) {
+ if (rdev != rdev2 && rdev->bdev == rdev2->bdev &&
+ md_rdevs_overlap(rdev, rdev2)) {
+ spin_unlock(&all_mddevs_lock);
+ return true;
+ }
+ }
+ }
+ spin_unlock(&all_mddevs_lock);
+ return false;
}
static int strict_blocks_to_sectors(const char *buf, sector_t *sectors)
@@ -3402,46 +3421,21 @@ rdev_size_store(struct md_rdev *rdev, const char *buf, size_t len)
return -EINVAL; /* component must fit device */
rdev->sectors = sectors;
- if (sectors > oldsectors && my_mddev->external) {
- /* Need to check that all other rdevs with the same
- * ->bdev do not overlap. 'rcu' is sufficient to walk
- * the rdev lists safely.
- * This check does not provide a hard guarantee, it
- * just helps avoid dangerous mistakes.
- */
- struct mddev *mddev;
- int overlap = 0;
- struct list_head *tmp;
- rcu_read_lock();
- for_each_mddev(mddev, tmp) {
- struct md_rdev *rdev2;
-
- rdev_for_each(rdev2, mddev)
- if (rdev->bdev == rdev2->bdev &&
- rdev != rdev2 &&
- overlaps(rdev->data_offset, rdev->sectors,
- rdev2->data_offset,
- rdev2->sectors)) {
- overlap = 1;
- break;
- }
- if (overlap) {
- mddev_put(mddev);
- break;
- }
- }
- rcu_read_unlock();
- if (overlap) {
- /* Someone else could have slipped in a size
- * change here, but doing so is just silly.
- * We put oldsectors back because we *know* it is
- * safe, and trust userspace not to race with
- * itself
- */
- rdev->sectors = oldsectors;
- return -EBUSY;
- }
+ /*
+ * Check that all other rdevs with the same bdev do not overlap. This
+ * check does not provide a hard guarantee, it just helps avoid
+ * dangerous mistakes.
+ */
+ if (sectors > oldsectors && my_mddev->external &&
+ md_rdev_overlaps(rdev)) {
+ /*
+ * Someone else could have slipped in a size change here, but
+ * doing so is just silly. We put oldsectors back because we
+ * know it is safe, and trust userspace not to race with itself.
+ */
+ rdev->sectors = oldsectors;
+ return -EBUSY;
}
return len;
}
--
2.30.2
^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH 05/10] md: factor out the rdev overlaps check from rdev_size_store
2022-07-18 6:34 ` [PATCH 05/10] md: factor out the rdev overlaps check from rdev_size_store Christoph Hellwig
@ 2022-07-18 7:02 ` Hannes Reinecke
2022-07-18 20:02 ` Logan Gunthorpe
1 sibling, 0 replies; 39+ messages in thread
From: Hannes Reinecke @ 2022-07-18 7:02 UTC (permalink / raw)
To: Christoph Hellwig, Song Liu; +Cc: Logan Gunthorpe, linux-raid, linux-block
On 7/18/22 08:34, Christoph Hellwig wrote:
> This splits the code into nicely readable chunks and also avoids
> the refcount inc/dec manipulations.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/md/md.c | 84 +++++++++++++++++++++++--------------------------
> 1 file changed, 39 insertions(+), 45 deletions(-)
>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cheers,
Hannes
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 05/10] md: factor out the rdev overlaps check from rdev_size_store
2022-07-18 6:34 ` [PATCH 05/10] md: factor out the rdev overlaps check from rdev_size_store Christoph Hellwig
2022-07-18 7:02 ` Hannes Reinecke
@ 2022-07-18 20:02 ` Logan Gunthorpe
1 sibling, 0 replies; 39+ messages in thread
From: Logan Gunthorpe @ 2022-07-18 20:02 UTC (permalink / raw)
To: Christoph Hellwig, Song Liu; +Cc: linux-raid, linux-block
On 2022-07-18 00:34, Christoph Hellwig wrote:
> This splits the code into nicely readable chunks and also avoids
> the refcount inc/dec manipulations.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Logan
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH 06/10] md: stop using for_each_mddev in md_do_sync
2022-07-18 6:34 fix md disk_name lifetime problems v3 Christoph Hellwig
` (4 preceding siblings ...)
2022-07-18 6:34 ` [PATCH 05/10] md: factor out the rdev overlaps check from rdev_size_store Christoph Hellwig
@ 2022-07-18 6:34 ` Christoph Hellwig
2022-07-18 7:03 ` Hannes Reinecke
2022-07-18 20:02 ` Logan Gunthorpe
2022-07-18 6:34 ` [PATCH 07/10] md: stop using for_each_mddev in md_notify_reboot Christoph Hellwig
` (3 subsequent siblings)
9 siblings, 2 replies; 39+ messages in thread
From: Christoph Hellwig @ 2022-07-18 6:34 UTC (permalink / raw)
To: Song Liu; +Cc: Logan Gunthorpe, linux-raid, linux-block
Just do a plain list_for_each that only grabs a mddev reference in
the case where the thread sleeps and restarts the list iteration.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/md/md.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index c8a5f9340060a..687f320c7ec4a 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8725,7 +8725,6 @@ void md_do_sync(struct md_thread *thread)
unsigned long update_time;
sector_t mark_cnt[SYNC_MARKS];
int last_mark,m;
- struct list_head *tmp;
sector_t last_check;
int skipped = 0;
struct md_rdev *rdev;
@@ -8789,7 +8788,8 @@ void md_do_sync(struct md_thread *thread)
try_again:
if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
goto skip;
- for_each_mddev(mddev2, tmp) {
+ spin_lock(&all_mddevs_lock);
+ list_for_each_entry(mddev2, &all_mddevs, all_mddevs) {
if (mddev2 == mddev)
continue;
if (!mddev->parallel_resync
@@ -8821,7 +8821,8 @@ void md_do_sync(struct md_thread *thread)
desc, mdname(mddev),
mdname(mddev2));
}
- mddev_put(mddev2);
+ spin_unlock(&all_mddevs_lock);
+
if (signal_pending(current))
flush_signals(current);
schedule();
@@ -8831,6 +8832,7 @@ void md_do_sync(struct md_thread *thread)
finish_wait(&resync_wait, &wq);
}
}
+ spin_unlock(&all_mddevs_lock);
} while (mddev->curr_resync < MD_RESYNC_DELAYED);
j = 0;
--
2.30.2
^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH 06/10] md: stop using for_each_mddev in md_do_sync
2022-07-18 6:34 ` [PATCH 06/10] md: stop using for_each_mddev in md_do_sync Christoph Hellwig
@ 2022-07-18 7:03 ` Hannes Reinecke
2022-07-18 20:02 ` Logan Gunthorpe
1 sibling, 0 replies; 39+ messages in thread
From: Hannes Reinecke @ 2022-07-18 7:03 UTC (permalink / raw)
To: Christoph Hellwig, Song Liu; +Cc: Logan Gunthorpe, linux-raid, linux-block
On 7/18/22 08:34, Christoph Hellwig wrote:
> Just do a plain list_for_each that only grabs a mddev reference in
> the case where the thread sleeps and restarts the list iteration.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/md/md.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cheers,
Hannes
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 06/10] md: stop using for_each_mddev in md_do_sync
2022-07-18 6:34 ` [PATCH 06/10] md: stop using for_each_mddev in md_do_sync Christoph Hellwig
2022-07-18 7:03 ` Hannes Reinecke
@ 2022-07-18 20:02 ` Logan Gunthorpe
1 sibling, 0 replies; 39+ messages in thread
From: Logan Gunthorpe @ 2022-07-18 20:02 UTC (permalink / raw)
To: Christoph Hellwig, Song Liu; +Cc: linux-raid, linux-block
On 2022-07-18 00:34, Christoph Hellwig wrote:
> Just do a plain list_for_each that only grabs a mddev reference in
> the case where the thread sleeps and restarts the list iteration.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Logan
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH 07/10] md: stop using for_each_mddev in md_notify_reboot
2022-07-18 6:34 fix md disk_name lifetime problems v3 Christoph Hellwig
` (5 preceding siblings ...)
2022-07-18 6:34 ` [PATCH 06/10] md: stop using for_each_mddev in md_do_sync Christoph Hellwig
@ 2022-07-18 6:34 ` Christoph Hellwig
2022-07-18 7:05 ` Hannes Reinecke
2022-07-18 20:03 ` Logan Gunthorpe
2022-07-18 6:34 ` [PATCH 08/10] md: stop using for_each_mddev in md_exit Christoph Hellwig
` (2 subsequent siblings)
9 siblings, 2 replies; 39+ messages in thread
From: Christoph Hellwig @ 2022-07-18 6:34 UTC (permalink / raw)
To: Song Liu; +Cc: Logan Gunthorpe, linux-raid, linux-block
Just do a simple list_for_each_entry_safe on all_mddevs, and only grab a
reference when we drop the lock.
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
drivers/md/md.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 687f320c7ec4a..44e4071b43148 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9587,11 +9587,13 @@ EXPORT_SYMBOL_GPL(rdev_clear_badblocks);
static int md_notify_reboot(struct notifier_block *this,
unsigned long code, void *x)
{
- struct list_head *tmp;
- struct mddev *mddev;
+ struct mddev *mddev, *n;
int need_delay = 0;
- for_each_mddev(mddev, tmp) {
+ spin_lock(&all_mddevs_lock);
+ list_for_each_entry_safe(mddev, n, &all_mddevs, all_mddevs) {
+ mddev_get(mddev);
+ spin_unlock(&all_mddevs_lock);
if (mddev_trylock(mddev)) {
if (mddev->pers)
__md_stop_writes(mddev);
@@ -9600,7 +9602,11 @@ static int md_notify_reboot(struct notifier_block *this,
mddev_unlock(mddev);
}
need_delay = 1;
+ mddev_put(mddev);
+ spin_lock(&all_mddevs_lock);
}
+ spin_unlock(&all_mddevs_lock);
+
/*
* certain more exotic SCSI devices are known to be
* volatile wrt too early system reboots. While the
--
2.30.2
^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH 07/10] md: stop using for_each_mddev in md_notify_reboot
2022-07-18 6:34 ` [PATCH 07/10] md: stop using for_each_mddev in md_notify_reboot Christoph Hellwig
@ 2022-07-18 7:05 ` Hannes Reinecke
2022-07-18 20:03 ` Logan Gunthorpe
1 sibling, 0 replies; 39+ messages in thread
From: Hannes Reinecke @ 2022-07-18 7:05 UTC (permalink / raw)
To: Christoph Hellwig, Song Liu; +Cc: Logan Gunthorpe, linux-raid, linux-block
On 7/18/22 08:34, Christoph Hellwig wrote:
> Just do a simple list_for_each_entry_safe on all_mddevs, and only grab a
> reference when we drop the lock.
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/md/md.c | 12 +++++++++---
> 1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 687f320c7ec4a..44e4071b43148 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -9587,11 +9587,13 @@ EXPORT_SYMBOL_GPL(rdev_clear_badblocks);
> static int md_notify_reboot(struct notifier_block *this,
> unsigned long code, void *x)
> {
> - struct list_head *tmp;
> - struct mddev *mddev;
> + struct mddev *mddev, *n;
> int need_delay = 0;
>
> - for_each_mddev(mddev, tmp) {
> + spin_lock(&all_mddevs_lock);
> + list_for_each_entry_safe(mddev, n, &all_mddevs, all_mddevs) {
> + mddev_get(mddev);
> + spin_unlock(&all_mddevs_lock);
> if (mddev_trylock(mddev)) {
> if (mddev->pers)
> __md_stop_writes(mddev);
> @@ -9600,7 +9602,11 @@ static int md_notify_reboot(struct notifier_block *this,
> mddev_unlock(mddev);
> }
> need_delay = 1;
> + mddev_put(mddev);
> + spin_lock(&all_mddevs_lock);
> }
> + spin_unlock(&all_mddevs_lock);
> +
> /*
> * certain more exotic SCSI devices are known to be
> * volatile wrt too early system reboots. While the
I wish 'mddev_get()' could fail if 'mddev->active' is zero when we call
it ... but maybe for another time.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cheers,
Hannes
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 07/10] md: stop using for_each_mddev in md_notify_reboot
2022-07-18 6:34 ` [PATCH 07/10] md: stop using for_each_mddev in md_notify_reboot Christoph Hellwig
2022-07-18 7:05 ` Hannes Reinecke
@ 2022-07-18 20:03 ` Logan Gunthorpe
1 sibling, 0 replies; 39+ messages in thread
From: Logan Gunthorpe @ 2022-07-18 20:03 UTC (permalink / raw)
To: Christoph Hellwig, Song Liu; +Cc: linux-raid, linux-block
On 2022-07-18 00:34, Christoph Hellwig wrote:
> Just do a simple list_for_each_entry_safe on all_mddevs, and only grab a
> reference when we drop the lock.
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Logan
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH 08/10] md: stop using for_each_mddev in md_exit
2022-07-18 6:34 fix md disk_name lifetime problems v3 Christoph Hellwig
` (6 preceding siblings ...)
2022-07-18 6:34 ` [PATCH 07/10] md: stop using for_each_mddev in md_notify_reboot Christoph Hellwig
@ 2022-07-18 6:34 ` Christoph Hellwig
2022-07-18 7:10 ` Hannes Reinecke
2022-07-18 20:03 ` Logan Gunthorpe
2022-07-18 6:34 ` [PATCH 09/10] md: only delete entries from all_mddevs when the disk is freed Christoph Hellwig
2022-07-18 6:34 ` [PATCH 10/10] md: simplify md_open Christoph Hellwig
9 siblings, 2 replies; 39+ messages in thread
From: Christoph Hellwig @ 2022-07-18 6:34 UTC (permalink / raw)
To: Song Liu; +Cc: Logan Gunthorpe, linux-raid, linux-block
Just do a simple list_for_each_entry_safe on all_mddevs, and only grab a
reference when we drop the lock and delete the now unused for_each_mddev
macro.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/md/md.c | 39 +++++++++++----------------------------
1 file changed, 11 insertions(+), 28 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 44e4071b43148..805f2b4ed9c0d 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -368,28 +368,6 @@ EXPORT_SYMBOL_GPL(md_new_event);
static LIST_HEAD(all_mddevs);
static DEFINE_SPINLOCK(all_mddevs_lock);
-/*
- * iterates through all used mddevs in the system.
- * We take care to grab the all_mddevs_lock whenever navigating
- * the list, and to always hold a refcount when unlocked.
- * Any code which breaks out of this loop while own
- * a reference to the current mddev and must mddev_put it.
- */
-#define for_each_mddev(_mddev,_tmp) \
- \
- for (({ spin_lock(&all_mddevs_lock); \
- _tmp = all_mddevs.next; \
- _mddev = NULL;}); \
- ({ if (_tmp != &all_mddevs) \
- mddev_get(list_entry(_tmp, struct mddev, all_mddevs));\
- spin_unlock(&all_mddevs_lock); \
- if (_mddev) mddev_put(_mddev); \
- _mddev = list_entry(_tmp, struct mddev, all_mddevs); \
- _tmp != &all_mddevs;}); \
- ({ spin_lock(&all_mddevs_lock); \
- _tmp = _tmp->next;}) \
- )
-
/* Rather than calling directly into the personality make_request function,
* IO requests come here first so that we can check if the device is
* being suspended pending a reconfiguration.
@@ -9925,8 +9903,7 @@ void md_autostart_arrays(int part)
static __exit void md_exit(void)
{
- struct mddev *mddev;
- struct list_head *tmp;
+ struct mddev *mddev, *n;
int delay = 1;
unregister_blkdev(MD_MAJOR,"md");
@@ -9946,17 +9923,23 @@ static __exit void md_exit(void)
}
remove_proc_entry("mdstat", NULL);
- for_each_mddev(mddev, tmp) {
+ spin_lock(&all_mddevs_lock);
+ list_for_each_entry_safe(mddev, n, &all_mddevs, all_mddevs) {
+ mddev_get(mddev);
+ spin_unlock(&all_mddevs_lock);
export_array(mddev);
mddev->ctime = 0;
mddev->hold_active = 0;
/*
- * for_each_mddev() will call mddev_put() at the end of each
- * iteration. As the mddev is now fully clear, this will
- * schedule the mddev for destruction by a workqueue, and the
+ * As the mddev is now fully clear, mddev_put will schedule
+ * the mddev for destruction by a workqueue, and the
* destroy_workqueue() below will wait for that to complete.
*/
+ mddev_put(mddev);
+ spin_lock(&all_mddevs_lock);
}
+ spin_unlock(&all_mddevs_lock);
+
destroy_workqueue(md_rdev_misc_wq);
destroy_workqueue(md_misc_wq);
destroy_workqueue(md_wq);
--
2.30.2
^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH 08/10] md: stop using for_each_mddev in md_exit
2022-07-18 6:34 ` [PATCH 08/10] md: stop using for_each_mddev in md_exit Christoph Hellwig
@ 2022-07-18 7:10 ` Hannes Reinecke
2022-07-19 7:03 ` Christoph Hellwig
2022-07-18 20:03 ` Logan Gunthorpe
1 sibling, 1 reply; 39+ messages in thread
From: Hannes Reinecke @ 2022-07-18 7:10 UTC (permalink / raw)
To: Christoph Hellwig, Song Liu; +Cc: Logan Gunthorpe, linux-raid, linux-block
On 7/18/22 08:34, Christoph Hellwig wrote:
> Just do a simple list_for_each_entry_safe on all_mddevs, and only grab a
> reference when we drop the lock and delete the now unused for_each_mddev
> macro.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/md/md.c | 39 +++++++++++----------------------------
> 1 file changed, 11 insertions(+), 28 deletions(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 44e4071b43148..805f2b4ed9c0d 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -368,28 +368,6 @@ EXPORT_SYMBOL_GPL(md_new_event);
> static LIST_HEAD(all_mddevs);
> static DEFINE_SPINLOCK(all_mddevs_lock);
>
> -/*
> - * iterates through all used mddevs in the system.
> - * We take care to grab the all_mddevs_lock whenever navigating
> - * the list, and to always hold a refcount when unlocked.
> - * Any code which breaks out of this loop while own
> - * a reference to the current mddev and must mddev_put it.
> - */
> -#define for_each_mddev(_mddev,_tmp) \
> - \
> - for (({ spin_lock(&all_mddevs_lock); \
> - _tmp = all_mddevs.next; \
> - _mddev = NULL;}); \
> - ({ if (_tmp != &all_mddevs) \
> - mddev_get(list_entry(_tmp, struct mddev, all_mddevs));\
> - spin_unlock(&all_mddevs_lock); \
> - if (_mddev) mddev_put(_mddev); \
> - _mddev = list_entry(_tmp, struct mddev, all_mddevs); \
> - _tmp != &all_mddevs;}); \
> - ({ spin_lock(&all_mddevs_lock); \
> - _tmp = _tmp->next;}) \
> - )
> -
> /* Rather than calling directly into the personality make_request function,
> * IO requests come here first so that we can check if the device is
> * being suspended pending a reconfiguration.
> @@ -9925,8 +9903,7 @@ void md_autostart_arrays(int part)
>
> static __exit void md_exit(void)
> {
> - struct mddev *mddev;
> - struct list_head *tmp;
> + struct mddev *mddev, *n;
> int delay = 1;
>
> unregister_blkdev(MD_MAJOR,"md");
> @@ -9946,17 +9923,23 @@ static __exit void md_exit(void)
> }
> remove_proc_entry("mdstat", NULL);
>
> - for_each_mddev(mddev, tmp) {
> + spin_lock(&all_mddevs_lock);
> + list_for_each_entry_safe(mddev, n, &all_mddevs, all_mddevs) {
> + mddev_get(mddev);
> + spin_unlock(&all_mddevs_lock);
> export_array(mddev);
> mddev->ctime = 0;
> mddev->hold_active = 0;
> /*
> - * for_each_mddev() will call mddev_put() at the end of each
> - * iteration. As the mddev is now fully clear, this will
> - * schedule the mddev for destruction by a workqueue, and the
> + * As the mddev is now fully clear, mddev_put will schedule
> + * the mddev for destruction by a workqueue, and the
> * destroy_workqueue() below will wait for that to complete.
> */
> + mddev_put(mddev);
> + spin_lock(&all_mddevs_lock);
> }
> + spin_unlock(&all_mddevs_lock);
> +
> destroy_workqueue(md_rdev_misc_wq);
> destroy_workqueue(md_misc_wq);
> destroy_workqueue(md_wq);
Having thought about it some more ... wouldn't it make sense to modify
mddev_get() to
if (atomic_inc_not_zero(&mddev->active))
return NULL;
to ensure we're not missing any use-after-free issues, which previously
would have been caught by the 'for_each_mddev()' macro?
Cheers,
Hannes
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 08/10] md: stop using for_each_mddev in md_exit
2022-07-18 7:10 ` Hannes Reinecke
@ 2022-07-19 7:03 ` Christoph Hellwig
2022-07-19 7:06 ` Hannes Reinecke
0 siblings, 1 reply; 39+ messages in thread
From: Christoph Hellwig @ 2022-07-19 7:03 UTC (permalink / raw)
To: Hannes Reinecke
Cc: Christoph Hellwig, Song Liu, Logan Gunthorpe, linux-raid, linux-block
On Mon, Jul 18, 2022 at 09:10:59AM +0200, Hannes Reinecke wrote:
>
> Having thought about it some more ... wouldn't it make sense to modify
> mddev_get() to
>
> if (atomic_inc_not_zero(&mddev->active))
> return NULL;
>
> to ensure we're not missing any use-after-free issues, which previously
> would have been caught by the 'for_each_mddev()' macro?
No. mddev->active = 0 can still be a perfectly valid mddev and they
are not automatically deleted. Check the conditions in mddev_put.
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 08/10] md: stop using for_each_mddev in md_exit
2022-07-19 7:03 ` Christoph Hellwig
@ 2022-07-19 7:06 ` Hannes Reinecke
0 siblings, 0 replies; 39+ messages in thread
From: Hannes Reinecke @ 2022-07-19 7:06 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Song Liu, Logan Gunthorpe, linux-raid, linux-block
On 7/19/22 09:03, Christoph Hellwig wrote:
> On Mon, Jul 18, 2022 at 09:10:59AM +0200, Hannes Reinecke wrote:
>>
>> Having thought about it some more ... wouldn't it make sense to modify
>> mddev_get() to
>>
>> if (atomic_inc_not_zero(&mddev->active))
>> return NULL;
>>
>> to ensure we're not missing any use-after-free issues, which previously
>> would have been caught by the 'for_each_mddev()' macro?
>
> No. mddev->active = 0 can still be a perfectly valid mddev and they
> are not automatically deleted. Check the conditions in mddev_put.
Sigh. Okay.
MD is a mess anyway.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cheers,
Hannes
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 08/10] md: stop using for_each_mddev in md_exit
2022-07-18 6:34 ` [PATCH 08/10] md: stop using for_each_mddev in md_exit Christoph Hellwig
2022-07-18 7:10 ` Hannes Reinecke
@ 2022-07-18 20:03 ` Logan Gunthorpe
1 sibling, 0 replies; 39+ messages in thread
From: Logan Gunthorpe @ 2022-07-18 20:03 UTC (permalink / raw)
To: Christoph Hellwig, Song Liu; +Cc: linux-raid, linux-block
On 2022-07-18 00:34, Christoph Hellwig wrote:
> Just do a simple list_for_each_entry_safe on all_mddevs, and only grab a
> reference when we drop the lock and delete the now unused for_each_mddev
> macro.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Logan
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH 09/10] md: only delete entries from all_mddevs when the disk is freed
2022-07-18 6:34 fix md disk_name lifetime problems v3 Christoph Hellwig
` (7 preceding siblings ...)
2022-07-18 6:34 ` [PATCH 08/10] md: stop using for_each_mddev in md_exit Christoph Hellwig
@ 2022-07-18 6:34 ` Christoph Hellwig
2022-07-18 7:17 ` Hannes Reinecke
2022-07-18 6:34 ` [PATCH 10/10] md: simplify md_open Christoph Hellwig
9 siblings, 1 reply; 39+ messages in thread
From: Christoph Hellwig @ 2022-07-18 6:34 UTC (permalink / raw)
To: Song Liu; +Cc: Logan Gunthorpe, linux-raid, linux-block
This ensures device names don't get prematurely reused. Instead add a
deleted flag to skip already deleted devices in mddev_get and other
places that only want to see live mddevs.
Reported-by; Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/md/md.c | 56 +++++++++++++++++++++++++++++++++----------------
drivers/md/md.h | 1 +
2 files changed, 39 insertions(+), 18 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 805f2b4ed9c0d..08cf21ad4c2d7 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -625,6 +625,10 @@ EXPORT_SYMBOL(md_flush_request);
static inline struct mddev *mddev_get(struct mddev *mddev)
{
+ lockdep_assert_held(&all_mddevs_lock);
+
+ if (mddev->deleted)
+ return NULL;
atomic_inc(&mddev->active);
return mddev;
}
@@ -639,7 +643,7 @@ static void mddev_put(struct mddev *mddev)
mddev->ctime == 0 && !mddev->hold_active) {
/* Array is not configured at all, and not held active,
* so destroy it */
- list_del_init(&mddev->all_mddevs);
+ mddev->deleted = true;
/*
* Call queue_work inside the spinlock so that
@@ -719,8 +723,8 @@ static struct mddev *mddev_find(dev_t unit)
spin_lock(&all_mddevs_lock);
mddev = mddev_find_locked(unit);
- if (mddev)
- mddev_get(mddev);
+ if (mddev && !mddev_get(mddev))
+ mddev = NULL;
spin_unlock(&all_mddevs_lock);
return mddev;
@@ -3338,6 +3342,8 @@ static bool md_rdev_overlaps(struct md_rdev *rdev)
spin_lock(&all_mddevs_lock);
list_for_each_entry(mddev, &all_mddevs, all_mddevs) {
+ if (mddev->deleted)
+ continue;
rdev_for_each(rdev2, mddev) {
if (rdev != rdev2 && rdev->bdev == rdev2->bdev &&
md_rdevs_overlap(rdev, rdev2)) {
@@ -5525,11 +5531,10 @@ md_attr_show(struct kobject *kobj, struct attribute *attr, char *page)
if (!entry->show)
return -EIO;
spin_lock(&all_mddevs_lock);
- if (list_empty(&mddev->all_mddevs)) {
+ if (!mddev_get(mddev)) {
spin_unlock(&all_mddevs_lock);
return -EBUSY;
}
- mddev_get(mddev);
spin_unlock(&all_mddevs_lock);
rv = entry->show(mddev, page);
@@ -5550,11 +5555,10 @@ md_attr_store(struct kobject *kobj, struct attribute *attr,
if (!capable(CAP_SYS_ADMIN))
return -EACCES;
spin_lock(&all_mddevs_lock);
- if (list_empty(&mddev->all_mddevs)) {
+ if (!mddev_get(mddev)) {
spin_unlock(&all_mddevs_lock);
return -EBUSY;
}
- mddev_get(mddev);
spin_unlock(&all_mddevs_lock);
rv = entry->store(mddev, page, length);
mddev_put(mddev);
@@ -7851,7 +7855,7 @@ static void md_free_disk(struct gendisk *disk)
bioset_exit(&mddev->bio_set);
bioset_exit(&mddev->sync_set);
- kfree(mddev);
+ mddev_free(mddev);
}
const struct block_device_operations md_fops =
@@ -8173,6 +8177,8 @@ static void *md_seq_start(struct seq_file *seq, loff_t *pos)
if (!l--) {
mddev = list_entry(tmp, struct mddev, all_mddevs);
mddev_get(mddev);
+ if (!mddev_get(mddev))
+ continue;
spin_unlock(&all_mddevs_lock);
return mddev;
}
@@ -8186,25 +8192,35 @@ static void *md_seq_next(struct seq_file *seq, void *v, loff_t *pos)
{
struct list_head *tmp;
struct mddev *next_mddev, *mddev = v;
+ struct mddev *to_put = NULL;
++*pos;
if (v == (void*)2)
return NULL;
spin_lock(&all_mddevs_lock);
- if (v == (void*)1)
+ if (v == (void*)1) {
tmp = all_mddevs.next;
- else
+ } else {
+ to_put = mddev;
tmp = mddev->all_mddevs.next;
- if (tmp != &all_mddevs)
- next_mddev = mddev_get(list_entry(tmp,struct mddev,all_mddevs));
- else {
- next_mddev = (void*)2;
- *pos = 0x10000;
}
+
+ for (;;) {
+ if (tmp == &all_mddevs) {
+ next_mddev = (void*)2;
+ *pos = 0x10000;
+ break;
+ }
+ next_mddev = list_entry(tmp, struct mddev, all_mddevs);
+ if (mddev_get(next_mddev))
+ break;
+ mddev = next_mddev;
+ tmp = mddev->all_mddevs.next;
+ };
spin_unlock(&all_mddevs_lock);
- if (v != (void*)1)
+ if (to_put)
mddev_put(mddev);
return next_mddev;
@@ -8768,6 +8784,8 @@ void md_do_sync(struct md_thread *thread)
goto skip;
spin_lock(&all_mddevs_lock);
list_for_each_entry(mddev2, &all_mddevs, all_mddevs) {
+ if (mddev2->deleted)
+ continue;
if (mddev2 == mddev)
continue;
if (!mddev->parallel_resync
@@ -9570,7 +9588,8 @@ static int md_notify_reboot(struct notifier_block *this,
spin_lock(&all_mddevs_lock);
list_for_each_entry_safe(mddev, n, &all_mddevs, all_mddevs) {
- mddev_get(mddev);
+ if (!mddev_get(mddev))
+ continue;
spin_unlock(&all_mddevs_lock);
if (mddev_trylock(mddev)) {
if (mddev->pers)
@@ -9925,7 +9944,8 @@ static __exit void md_exit(void)
spin_lock(&all_mddevs_lock);
list_for_each_entry_safe(mddev, n, &all_mddevs, all_mddevs) {
- mddev_get(mddev);
+ if (!mddev_get(mddev))
+ continue;
spin_unlock(&all_mddevs_lock);
export_array(mddev);
mddev->ctime = 0;
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 1a85dbe78a71c..bc870e1f1e8c2 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -503,6 +503,7 @@ struct mddev {
atomic_t max_corr_read_errors; /* max read retries */
struct list_head all_mddevs;
+ bool deleted;
const struct attribute_group *to_remove;
--
2.30.2
^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH 09/10] md: only delete entries from all_mddevs when the disk is freed
2022-07-18 6:34 ` [PATCH 09/10] md: only delete entries from all_mddevs when the disk is freed Christoph Hellwig
@ 2022-07-18 7:17 ` Hannes Reinecke
2022-07-18 18:20 ` Song Liu
2022-07-19 7:14 ` Christoph Hellwig
0 siblings, 2 replies; 39+ messages in thread
From: Hannes Reinecke @ 2022-07-18 7:17 UTC (permalink / raw)
To: Christoph Hellwig, Song Liu; +Cc: Logan Gunthorpe, linux-raid, linux-block
On 7/18/22 08:34, Christoph Hellwig wrote:
> This ensures device names don't get prematurely reused. Instead add a
> deleted flag to skip already deleted devices in mddev_get and other
> places that only want to see live mddevs.
>
> Reported-by; Logan Gunthorpe <logang@deltatee.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/md/md.c | 56 +++++++++++++++++++++++++++++++++----------------
> drivers/md/md.h | 1 +
> 2 files changed, 39 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 805f2b4ed9c0d..08cf21ad4c2d7 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -625,6 +625,10 @@ EXPORT_SYMBOL(md_flush_request);
>
> static inline struct mddev *mddev_get(struct mddev *mddev)
> {
> + lockdep_assert_held(&all_mddevs_lock);
> +
> + if (mddev->deleted)
> + return NULL;
> atomic_inc(&mddev->active);
> return mddev;
> }
Why can't we use 'atomic_inc_unless_zero' here and do away with the
additional 'deleted' flag?
> @@ -639,7 +643,7 @@ static void mddev_put(struct mddev *mddev)
> mddev->ctime == 0 && !mddev->hold_active) {
> /* Array is not configured at all, and not held active,
> * so destroy it */
> - list_del_init(&mddev->all_mddevs);
> + mddev->deleted = true;
>
> /*
> * Call queue_work inside the spinlock so that
> @@ -719,8 +723,8 @@ static struct mddev *mddev_find(dev_t unit)
>
> spin_lock(&all_mddevs_lock);
> mddev = mddev_find_locked(unit);
> - if (mddev)
> - mddev_get(mddev);
> + if (mddev && !mddev_get(mddev))
> + mddev = NULL;
> spin_unlock(&all_mddevs_lock);
>
> return mddev;
> @@ -3338,6 +3342,8 @@ static bool md_rdev_overlaps(struct md_rdev *rdev)
>
> spin_lock(&all_mddevs_lock);
> list_for_each_entry(mddev, &all_mddevs, all_mddevs) {
> + if (mddev->deleted)
> + continue;
Any particular reason why you can't use the 'mddev_get()' /
'mddev_put()' sequence here?
After all, I don't think that 'mddev' should vanish while being in this
loop, no?
> rdev_for_each(rdev2, mddev) {
> if (rdev != rdev2 && rdev->bdev == rdev2->bdev &&
> md_rdevs_overlap(rdev, rdev2)) {
> @@ -5525,11 +5531,10 @@ md_attr_show(struct kobject *kobj, struct attribute *attr, char *page)
> if (!entry->show)
> return -EIO;
> spin_lock(&all_mddevs_lock);
> - if (list_empty(&mddev->all_mddevs)) {
> + if (!mddev_get(mddev)) { > spin_unlock(&all_mddevs_lock);
> return -EBUSY;
> }
> - mddev_get(mddev);
> spin_unlock(&all_mddevs_lock);
>
> rv = entry->show(mddev, page);
> @@ -5550,11 +5555,10 @@ md_attr_store(struct kobject *kobj, struct attribute *attr,
> if (!capable(CAP_SYS_ADMIN))
> return -EACCES;
> spin_lock(&all_mddevs_lock);
> - if (list_empty(&mddev->all_mddevs)) {
> + if (!mddev_get(mddev)) {
> spin_unlock(&all_mddevs_lock);
> return -EBUSY;
> }
> - mddev_get(mddev);
> spin_unlock(&all_mddevs_lock);
> rv = entry->store(mddev, page, length);
> mddev_put(mddev);
> @@ -7851,7 +7855,7 @@ static void md_free_disk(struct gendisk *disk)
> bioset_exit(&mddev->bio_set);
> bioset_exit(&mddev->sync_set);
>
> - kfree(mddev);
> + mddev_free(mddev);
> }
>
> const struct block_device_operations md_fops =
> @@ -8173,6 +8177,8 @@ static void *md_seq_start(struct seq_file *seq, loff_t *pos)
> if (!l--) {
> mddev = list_entry(tmp, struct mddev, all_mddevs);
> mddev_get(mddev);
> + if (!mddev_get(mddev))
> + continue;
> spin_unlock(&all_mddevs_lock);
> return mddev;
> }
> @@ -8186,25 +8192,35 @@ static void *md_seq_next(struct seq_file *seq, void *v, loff_t *pos)
> {
> struct list_head *tmp;
> struct mddev *next_mddev, *mddev = v;
> + struct mddev *to_put = NULL;
>
> ++*pos;
> if (v == (void*)2)
> return NULL;
>
> spin_lock(&all_mddevs_lock);
> - if (v == (void*)1)
> + if (v == (void*)1) {
> tmp = all_mddevs.next;
> - else
> + } else {
> + to_put = mddev;
> tmp = mddev->all_mddevs.next;
> - if (tmp != &all_mddevs)
> - next_mddev = mddev_get(list_entry(tmp,struct mddev,all_mddevs));
> - else {
> - next_mddev = (void*)2;
> - *pos = 0x10000;
> }
> +
> + for (;;) {
> + if (tmp == &all_mddevs) {
> + next_mddev = (void*)2;
> + *pos = 0x10000;
> + break;
> + }
> + next_mddev = list_entry(tmp, struct mddev, all_mddevs);
> + if (mddev_get(next_mddev))
> + break;
> + mddev = next_mddev;
> + tmp = mddev->all_mddevs.next;
> + };
> spin_unlock(&all_mddevs_lock);
>
> - if (v != (void*)1)
> + if (to_put)
> mddev_put(mddev);
> return next_mddev;
>
> @@ -8768,6 +8784,8 @@ void md_do_sync(struct md_thread *thread)
> goto skip;
> spin_lock(&all_mddevs_lock);
> list_for_each_entry(mddev2, &all_mddevs, all_mddevs) {
> + if (mddev2->deleted)
> + continue;
> if (mddev2 == mddev)
> continue;
> if (!mddev->parallel_resync
> @@ -9570,7 +9588,8 @@ static int md_notify_reboot(struct notifier_block *this,
>
> spin_lock(&all_mddevs_lock);
> list_for_each_entry_safe(mddev, n, &all_mddevs, all_mddevs) {
> - mddev_get(mddev);
> + if (!mddev_get(mddev))
> + continue;
> spin_unlock(&all_mddevs_lock);
> if (mddev_trylock(mddev)) {
> if (mddev->pers)
> @@ -9925,7 +9944,8 @@ static __exit void md_exit(void)
>
> spin_lock(&all_mddevs_lock);
> list_for_each_entry_safe(mddev, n, &all_mddevs, all_mddevs) {
> - mddev_get(mddev);
> + if (!mddev_get(mddev))
> + continue;
> spin_unlock(&all_mddevs_lock);
> export_array(mddev);
> mddev->ctime = 0;
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index 1a85dbe78a71c..bc870e1f1e8c2 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -503,6 +503,7 @@ struct mddev {
>
> atomic_t max_corr_read_errors; /* max read retries */
> struct list_head all_mddevs;
> + bool deleted;
>
> const struct attribute_group *to_remove;
>
Cheers,
Hannes
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 09/10] md: only delete entries from all_mddevs when the disk is freed
2022-07-18 7:17 ` Hannes Reinecke
@ 2022-07-18 18:20 ` Song Liu
2022-07-19 5:06 ` Christoph Hellwig
2022-07-19 7:14 ` Christoph Hellwig
1 sibling, 1 reply; 39+ messages in thread
From: Song Liu @ 2022-07-18 18:20 UTC (permalink / raw)
To: Hannes Reinecke
Cc: Christoph Hellwig, Logan Gunthorpe, linux-raid, linux-block
On Mon, Jul 18, 2022 at 12:17 AM Hannes Reinecke <hare@suse.de> wrote:
>
> On 7/18/22 08:34, Christoph Hellwig wrote:
> > This ensures device names don't get prematurely reused. Instead add a
> > deleted flag to skip already deleted devices in mddev_get and other
> > places that only want to see live mddevs.
> >
> > Reported-by; Logan Gunthorpe <logang@deltatee.com>
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > ---
> > drivers/md/md.c | 56 +++++++++++++++++++++++++++++++++----------------
> > drivers/md/md.h | 1 +
> > 2 files changed, 39 insertions(+), 18 deletions(-)
> >
> > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > index 805f2b4ed9c0d..08cf21ad4c2d7 100644
> > --- a/drivers/md/md.c
> > +++ b/drivers/md/md.c
> > @@ -625,6 +625,10 @@ EXPORT_SYMBOL(md_flush_request);
> >
> > static inline struct mddev *mddev_get(struct mddev *mddev)
> > {
> > + lockdep_assert_held(&all_mddevs_lock);
> > +
> > + if (mddev->deleted)
> > + return NULL;
> > atomic_inc(&mddev->active);
> > return mddev;
> > }
>
> Why can't we use 'atomic_inc_unless_zero' here and do away with the
> additional 'deleted' flag?
Thanks Christoph for the set. And thanks Hannes for the quick review!
I think atomic_inc_unless_zero() should work here. Alternatively, we can
also grab a big from mddev->flags as MD_DELETED.
Thanks,
Song
[...]
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 09/10] md: only delete entries from all_mddevs when the disk is freed
2022-07-18 18:20 ` Song Liu
@ 2022-07-19 5:06 ` Christoph Hellwig
2022-07-19 6:28 ` Song Liu
2022-07-19 7:00 ` Hannes Reinecke
0 siblings, 2 replies; 39+ messages in thread
From: Christoph Hellwig @ 2022-07-19 5:06 UTC (permalink / raw)
To: Song Liu
Cc: Hannes Reinecke, Christoph Hellwig, Logan Gunthorpe, linux-raid,
linux-block
On Mon, Jul 18, 2022 at 11:20:39AM -0700, Song Liu wrote:
> > > static inline struct mddev *mddev_get(struct mddev *mddev)
> > > {
> > > + lockdep_assert_held(&all_mddevs_lock);
> > > +
> > > + if (mddev->deleted)
> > > + return NULL;
> > > atomic_inc(&mddev->active);
> > > return mddev;
> > > }
> >
> > Why can't we use 'atomic_inc_unless_zero' here and do away with the
> > additional 'deleted' flag?
> I think atomic_inc_unless_zero() should work here. Alternatively, we can
> also grab a big from mddev->flags as MD_DELETED.
s/big/bit/ ?
Yes, this could use mddev->flags. I don't think atomic_inc_unless_zero
would work, as an array can have a refcount of 0 but we still allow
to take new references to it, the deletion only happens if the other
conditions in mddev_put are met.
Do you want me to repin the series using a flag?
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 09/10] md: only delete entries from all_mddevs when the disk is freed
2022-07-19 5:06 ` Christoph Hellwig
@ 2022-07-19 6:28 ` Song Liu
2022-07-19 7:00 ` Hannes Reinecke
1 sibling, 0 replies; 39+ messages in thread
From: Song Liu @ 2022-07-19 6:28 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Hannes Reinecke, Logan Gunthorpe, linux-raid, linux-block
On Mon, Jul 18, 2022 at 10:07 PM Christoph Hellwig <hch@lst.de> wrote:
>
> On Mon, Jul 18, 2022 at 11:20:39AM -0700, Song Liu wrote:
> > > > static inline struct mddev *mddev_get(struct mddev *mddev)
> > > > {
> > > > + lockdep_assert_held(&all_mddevs_lock);
> > > > +
> > > > + if (mddev->deleted)
> > > > + return NULL;
> > > > atomic_inc(&mddev->active);
> > > > return mddev;
> > > > }
> > >
> > > Why can't we use 'atomic_inc_unless_zero' here and do away with the
> > > additional 'deleted' flag?
>
> > I think atomic_inc_unless_zero() should work here. Alternatively, we can
> > also grab a big from mddev->flags as MD_DELETED.
>
> s/big/bit/ ?
yeah, bit...
>
> Yes, this could use mddev->flags. I don't think atomic_inc_unless_zero
> would work, as an array can have a refcount of 0 but we still allow
> to take new references to it, the deletion only happens if the other
> conditions in mddev_put are met.
>
> Do you want me to repin the series using a flag?
Respin the series sounds good. Please also address Hannes' feedback
on 8/10.
Thanks,
Song
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 09/10] md: only delete entries from all_mddevs when the disk is freed
2022-07-19 5:06 ` Christoph Hellwig
2022-07-19 6:28 ` Song Liu
@ 2022-07-19 7:00 ` Hannes Reinecke
1 sibling, 0 replies; 39+ messages in thread
From: Hannes Reinecke @ 2022-07-19 7:00 UTC (permalink / raw)
To: Christoph Hellwig, Song Liu; +Cc: Logan Gunthorpe, linux-raid, linux-block
On 7/19/22 07:06, Christoph Hellwig wrote:
> On Mon, Jul 18, 2022 at 11:20:39AM -0700, Song Liu wrote:
>>>> static inline struct mddev *mddev_get(struct mddev *mddev)
>>>> {
>>>> + lockdep_assert_held(&all_mddevs_lock);
>>>> +
>>>> + if (mddev->deleted)
>>>> + return NULL;
>>>> atomic_inc(&mddev->active);
>>>> return mddev;
>>>> }
>>>
>>> Why can't we use 'atomic_inc_unless_zero' here and do away with the
>>> additional 'deleted' flag?
>
>> I think atomic_inc_unless_zero() should work here. Alternatively, we can
>> also grab a big from mddev->flags as MD_DELETED.
>
> s/big/bit/ ?
>
> Yes, this could use mddev->flags. I don't think atomic_inc_unless_zero
> would work, as an array can have a refcount of 0 but we still allow
> to take new references to it, the deletion only happens if the other
> conditions in mddev_put are met.
>
> Do you want me to repin the series using a flag?
I would vote for it.
Cheers,
Hannes
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 09/10] md: only delete entries from all_mddevs when the disk is freed
2022-07-18 7:17 ` Hannes Reinecke
2022-07-18 18:20 ` Song Liu
@ 2022-07-19 7:14 ` Christoph Hellwig
2022-07-19 7:59 ` Hannes Reinecke
1 sibling, 1 reply; 39+ messages in thread
From: Christoph Hellwig @ 2022-07-19 7:14 UTC (permalink / raw)
To: Hannes Reinecke
Cc: Christoph Hellwig, Song Liu, Logan Gunthorpe, linux-raid, linux-block
On Mon, Jul 18, 2022 at 09:17:42AM +0200, Hannes Reinecke wrote:
> Why can't we use 'atomic_inc_unless_zero' here and do away with the
> additional 'deleted' flag?
See my previous reply.
>> @@ -3338,6 +3342,8 @@ static bool md_rdev_overlaps(struct md_rdev *rdev)
>> spin_lock(&all_mddevs_lock);
>> list_for_each_entry(mddev, &all_mddevs, all_mddevs) {
>> + if (mddev->deleted)
>> + continue;
>
> Any particular reason why you can't use the 'mddev_get()' / 'mddev_put()'
> sequence here?
Mostly because it would be pretty mess. mdev_put takes all_mddevs_lock,
so we'd have to add an unlock/relock cycle for each loop iteration.
> After all, I don't think that 'mddev' should vanish while being in this
> loop, no?
It won't, at least without the call to mddev_put. Once mddev_put is
in the game things aren't that easy, and I suspect the exising and
new code might have bugs in that area in the reboot notifier and
md_exit for extreme corner cases.
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 09/10] md: only delete entries from all_mddevs when the disk is freed
2022-07-19 7:14 ` Christoph Hellwig
@ 2022-07-19 7:59 ` Hannes Reinecke
0 siblings, 0 replies; 39+ messages in thread
From: Hannes Reinecke @ 2022-07-19 7:59 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Song Liu, Logan Gunthorpe, linux-raid, linux-block
On 7/19/22 09:14, Christoph Hellwig wrote:
> On Mon, Jul 18, 2022 at 09:17:42AM +0200, Hannes Reinecke wrote:
>> Why can't we use 'atomic_inc_unless_zero' here and do away with the
>> additional 'deleted' flag?
>
> See my previous reply.
>
>>> @@ -3338,6 +3342,8 @@ static bool md_rdev_overlaps(struct md_rdev *rdev)
>>> spin_lock(&all_mddevs_lock);
>>> list_for_each_entry(mddev, &all_mddevs, all_mddevs) {
>>> + if (mddev->deleted)
>>> + continue;
>>
>> Any particular reason why you can't use the 'mddev_get()' / 'mddev_put()'
>> sequence here?
>
> Mostly because it would be pretty mess. mdev_put takes all_mddevs_lock,
> so we'd have to add an unlock/relock cycle for each loop iteration.
>
>> After all, I don't think that 'mddev' should vanish while being in this
>> loop, no?
>
> It won't, at least without the call to mddev_put. Once mddev_put is
> in the game things aren't that easy, and I suspect the exising and
> new code might have bugs in that area in the reboot notifier and
> md_exit for extreme corner cases.
And again, MD mess.
But thanks for start clearing it up.
Cheers,
Hannes
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH 10/10] md: simplify md_open
2022-07-18 6:34 fix md disk_name lifetime problems v3 Christoph Hellwig
` (8 preceding siblings ...)
2022-07-18 6:34 ` [PATCH 09/10] md: only delete entries from all_mddevs when the disk is freed Christoph Hellwig
@ 2022-07-18 6:34 ` Christoph Hellwig
2022-07-18 7:19 ` Hannes Reinecke
9 siblings, 1 reply; 39+ messages in thread
From: Christoph Hellwig @ 2022-07-18 6:34 UTC (permalink / raw)
To: Song Liu; +Cc: Logan Gunthorpe, linux-raid, linux-block
Now that devices are on the all_mddevs list until the gendisk is freed,
there can't be any duplicates. Remove the global list lookup and just
grab a reference.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/md/md.c | 42 +++++++++++++++---------------------------
1 file changed, 15 insertions(+), 27 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 08cf21ad4c2d7..9b1e61b4bac8b 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -7785,45 +7785,33 @@ static int md_set_read_only(struct block_device *bdev, bool ro)
static int md_open(struct block_device *bdev, fmode_t mode)
{
- /*
- * Succeed if we can lock the mddev, which confirms that
- * it isn't being stopped right now.
- */
- struct mddev *mddev = mddev_find(bdev->bd_dev);
+ struct mddev *mddev;
int err;
+ spin_lock(&all_mddevs_lock);
+ mddev = mddev_get(bdev->bd_disk->private_data);
+ spin_unlock(&all_mddevs_lock);
if (!mddev)
return -ENODEV;
- if (mddev->gendisk != bdev->bd_disk) {
- /* we are racing with mddev_put which is discarding this
- * bd_disk.
- */
- mddev_put(mddev);
- /* Wait until bdev->bd_disk is definitely gone */
- if (work_pending(&mddev->del_work))
- flush_workqueue(md_misc_wq);
- return -EBUSY;
- }
- BUG_ON(mddev != bdev->bd_disk->private_data);
-
- if ((err = mutex_lock_interruptible(&mddev->open_mutex)))
+ err = mutex_lock_interruptible(&mddev->open_mutex);
+ if (err)
goto out;
- if (test_bit(MD_CLOSING, &mddev->flags)) {
- mutex_unlock(&mddev->open_mutex);
- err = -ENODEV;
- goto out;
- }
+ err = -ENODEV;
+ if (test_bit(MD_CLOSING, &mddev->flags))
+ goto out_unlock;
- err = 0;
atomic_inc(&mddev->openers);
mutex_unlock(&mddev->open_mutex);
bdev_check_media_change(bdev);
- out:
- if (err)
- mddev_put(mddev);
+ return 0;
+
+out_unlock:
+ mutex_unlock(&mddev->open_mutex);
+out:
+ mddev_put(mddev);
return err;
}
--
2.30.2
^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH 10/10] md: simplify md_open
2022-07-18 6:34 ` [PATCH 10/10] md: simplify md_open Christoph Hellwig
@ 2022-07-18 7:19 ` Hannes Reinecke
0 siblings, 0 replies; 39+ messages in thread
From: Hannes Reinecke @ 2022-07-18 7:19 UTC (permalink / raw)
To: Christoph Hellwig, Song Liu; +Cc: Logan Gunthorpe, linux-raid, linux-block
On 7/18/22 08:34, Christoph Hellwig wrote:
> Now that devices are on the all_mddevs list until the gendisk is freed,
> there can't be any duplicates. Remove the global list lookup and just
> grab a reference.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/md/md.c | 42 +++++++++++++++---------------------------
> 1 file changed, 15 insertions(+), 27 deletions(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 08cf21ad4c2d7..9b1e61b4bac8b 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -7785,45 +7785,33 @@ static int md_set_read_only(struct block_device *bdev, bool ro)
>
> static int md_open(struct block_device *bdev, fmode_t mode)
> {
> - /*
> - * Succeed if we can lock the mddev, which confirms that
> - * it isn't being stopped right now.
> - */
> - struct mddev *mddev = mddev_find(bdev->bd_dev);
> + struct mddev *mddev;
> int err;
>
> + spin_lock(&all_mddevs_lock);
> + mddev = mddev_get(bdev->bd_disk->private_data);
> + spin_unlock(&all_mddevs_lock);
> if (!mddev)
> return -ENODEV;
>
> - if (mddev->gendisk != bdev->bd_disk) {
> - /* we are racing with mddev_put which is discarding this
> - * bd_disk.
> - */
> - mddev_put(mddev);
> - /* Wait until bdev->bd_disk is definitely gone */
> - if (work_pending(&mddev->del_work))
> - flush_workqueue(md_misc_wq);
> - return -EBUSY;
> - }
> - BUG_ON(mddev != bdev->bd_disk->private_data);
> -
> - if ((err = mutex_lock_interruptible(&mddev->open_mutex)))
> + err = mutex_lock_interruptible(&mddev->open_mutex);
> + if (err)
> goto out;
>
> - if (test_bit(MD_CLOSING, &mddev->flags)) {
> - mutex_unlock(&mddev->open_mutex);
> - err = -ENODEV;
> - goto out;
> - }
> + err = -ENODEV;
> + if (test_bit(MD_CLOSING, &mddev->flags))
> + goto out_unlock;
>
> - err = 0;
> atomic_inc(&mddev->openers);
> mutex_unlock(&mddev->open_mutex);
>
> bdev_check_media_change(bdev);
> - out:
> - if (err)
> - mddev_put(mddev);
> + return 0;
> +
> +out_unlock:
> + mutex_unlock(&mddev->open_mutex);
> +out:
> + mddev_put(mddev);
> return err;
> }
>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cheers,
Hannes
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH 09/10] md: only delete entries from all_mddevs when the disk is freed
2022-07-19 9:18 fix md disk_name lifetime problems v4 Christoph Hellwig
@ 2022-07-19 9:18 ` Christoph Hellwig
2022-07-19 9:25 ` Hannes Reinecke
0 siblings, 1 reply; 39+ messages in thread
From: Christoph Hellwig @ 2022-07-19 9:18 UTC (permalink / raw)
To: Song Liu; +Cc: Logan Gunthorpe, linux-raid, linux-block
This ensures device names don't get prematurely reused. Instead add a
deleted flag to skip already deleted devices in mddev_get and other
places that only want to see live mddevs.
Reported-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/md/md.c | 56 +++++++++++++++++++++++++++++++++----------------
drivers/md/md.h | 2 ++
2 files changed, 40 insertions(+), 18 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index ee3e9ba03fd25..6180d0a6b926d 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -625,6 +625,10 @@ EXPORT_SYMBOL(md_flush_request);
static inline struct mddev *mddev_get(struct mddev *mddev)
{
+ lockdep_assert_held(&all_mddevs_lock);
+
+ if (test_bit(MD_DELETED, &mddev->flags))
+ return NULL;
atomic_inc(&mddev->active);
return mddev;
}
@@ -639,7 +643,7 @@ static void mddev_put(struct mddev *mddev)
mddev->ctime == 0 && !mddev->hold_active) {
/* Array is not configured at all, and not held active,
* so destroy it */
- list_del_init(&mddev->all_mddevs);
+ set_bit(MD_DELETED, &mddev->flags);
/*
* Call queue_work inside the spinlock so that
@@ -719,8 +723,8 @@ static struct mddev *mddev_find(dev_t unit)
spin_lock(&all_mddevs_lock);
mddev = mddev_find_locked(unit);
- if (mddev)
- mddev_get(mddev);
+ if (mddev && !mddev_get(mddev))
+ mddev = NULL;
spin_unlock(&all_mddevs_lock);
return mddev;
@@ -3338,6 +3342,8 @@ static bool md_rdev_overlaps(struct md_rdev *rdev)
spin_lock(&all_mddevs_lock);
list_for_each_entry(mddev, &all_mddevs, all_mddevs) {
+ if (test_bit(MD_DELETED, &mddev->flags))
+ continue;
rdev_for_each(rdev2, mddev) {
if (rdev != rdev2 && rdev->bdev == rdev2->bdev &&
md_rdevs_overlap(rdev, rdev2)) {
@@ -5525,11 +5531,10 @@ md_attr_show(struct kobject *kobj, struct attribute *attr, char *page)
if (!entry->show)
return -EIO;
spin_lock(&all_mddevs_lock);
- if (list_empty(&mddev->all_mddevs)) {
+ if (!mddev_get(mddev)) {
spin_unlock(&all_mddevs_lock);
return -EBUSY;
}
- mddev_get(mddev);
spin_unlock(&all_mddevs_lock);
rv = entry->show(mddev, page);
@@ -5550,11 +5555,10 @@ md_attr_store(struct kobject *kobj, struct attribute *attr,
if (!capable(CAP_SYS_ADMIN))
return -EACCES;
spin_lock(&all_mddevs_lock);
- if (list_empty(&mddev->all_mddevs)) {
+ if (!mddev_get(mddev)) {
spin_unlock(&all_mddevs_lock);
return -EBUSY;
}
- mddev_get(mddev);
spin_unlock(&all_mddevs_lock);
rv = entry->store(mddev, page, length);
mddev_put(mddev);
@@ -7849,7 +7853,7 @@ static void md_free_disk(struct gendisk *disk)
bioset_exit(&mddev->bio_set);
bioset_exit(&mddev->sync_set);
- kfree(mddev);
+ mddev_free(mddev);
}
const struct block_device_operations md_fops =
@@ -8171,6 +8175,8 @@ static void *md_seq_start(struct seq_file *seq, loff_t *pos)
if (!l--) {
mddev = list_entry(tmp, struct mddev, all_mddevs);
mddev_get(mddev);
+ if (!mddev_get(mddev))
+ continue;
spin_unlock(&all_mddevs_lock);
return mddev;
}
@@ -8184,25 +8190,35 @@ static void *md_seq_next(struct seq_file *seq, void *v, loff_t *pos)
{
struct list_head *tmp;
struct mddev *next_mddev, *mddev = v;
+ struct mddev *to_put = NULL;
++*pos;
if (v == (void*)2)
return NULL;
spin_lock(&all_mddevs_lock);
- if (v == (void*)1)
+ if (v == (void*)1) {
tmp = all_mddevs.next;
- else
+ } else {
+ to_put = mddev;
tmp = mddev->all_mddevs.next;
- if (tmp != &all_mddevs)
- next_mddev = mddev_get(list_entry(tmp,struct mddev,all_mddevs));
- else {
- next_mddev = (void*)2;
- *pos = 0x10000;
}
+
+ for (;;) {
+ if (tmp == &all_mddevs) {
+ next_mddev = (void*)2;
+ *pos = 0x10000;
+ break;
+ }
+ next_mddev = list_entry(tmp, struct mddev, all_mddevs);
+ if (mddev_get(next_mddev))
+ break;
+ mddev = next_mddev;
+ tmp = mddev->all_mddevs.next;
+ };
spin_unlock(&all_mddevs_lock);
- if (v != (void*)1)
+ if (to_put)
mddev_put(mddev);
return next_mddev;
@@ -8766,6 +8782,8 @@ void md_do_sync(struct md_thread *thread)
goto skip;
spin_lock(&all_mddevs_lock);
list_for_each_entry(mddev2, &all_mddevs, all_mddevs) {
+ if (test_bit(MD_DELETED, &mddev2->flags))
+ continue;
if (mddev2 == mddev)
continue;
if (!mddev->parallel_resync
@@ -9568,7 +9586,8 @@ static int md_notify_reboot(struct notifier_block *this,
spin_lock(&all_mddevs_lock);
list_for_each_entry_safe(mddev, n, &all_mddevs, all_mddevs) {
- mddev_get(mddev);
+ if (!mddev_get(mddev))
+ continue;
spin_unlock(&all_mddevs_lock);
if (mddev_trylock(mddev)) {
if (mddev->pers)
@@ -9923,7 +9942,8 @@ static __exit void md_exit(void)
spin_lock(&all_mddevs_lock);
list_for_each_entry_safe(mddev, n, &all_mddevs, all_mddevs) {
- mddev_get(mddev);
+ if (!mddev_get(mddev))
+ continue;
spin_unlock(&all_mddevs_lock);
export_array(mddev);
mddev->ctime = 0;
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 1a85dbe78a71c..6d1e526065cca 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -254,6 +254,7 @@ struct md_cluster_info;
* @MD_NOT_READY: do_md_run() is active, so 'array_state', ust not report that
* array is ready yet.
* @MD_BROKEN: This is used to stop writes and mark array as failed.
+ * @MD_DELETED: This device is being deleted
*
* change UNSUPPORTED_MDDEV_FLAGS for each array type if new flag is added
*/
@@ -270,6 +271,7 @@ enum mddev_flags {
MD_UPDATING_SB,
MD_NOT_READY,
MD_BROKEN,
+ MD_DELETED,
};
enum mddev_sb_flags {
--
2.30.2
^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH 09/10] md: only delete entries from all_mddevs when the disk is freed
2022-07-19 9:18 ` [PATCH 09/10] md: only delete entries from all_mddevs when the disk is freed Christoph Hellwig
@ 2022-07-19 9:25 ` Hannes Reinecke
0 siblings, 0 replies; 39+ messages in thread
From: Hannes Reinecke @ 2022-07-19 9:25 UTC (permalink / raw)
To: Christoph Hellwig, Song Liu; +Cc: Logan Gunthorpe, linux-raid, linux-block
On 7/19/22 11:18, Christoph Hellwig wrote:
> This ensures device names don't get prematurely reused. Instead add a
> deleted flag to skip already deleted devices in mddev_get and other
> places that only want to see live mddevs.
>
> Reported-by: Logan Gunthorpe <logang@deltatee.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/md/md.c | 56 +++++++++++++++++++++++++++++++++----------------
> drivers/md/md.h | 2 ++
> 2 files changed, 40 insertions(+), 18 deletions(-)
>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cheers,
Hannes
^ permalink raw reply [flat|nested] 39+ messages in thread