linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* split mddev_find and don't create new instance in ->open
@ 2021-04-03 16:15 Christoph Hellwig
  2021-04-03 16:15 ` [PATCH 1/2] md: factor out a mddev_find_locked helper from mddev_find Christoph Hellwig
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Christoph Hellwig @ 2021-04-03 16:15 UTC (permalink / raw)
  To: song, Zhao Heming; +Cc: lidong.zhong, linux-raid


Hi Song,

this series split mddev_find so that no new instances get created
in ->open.  This was originally part of a larger cleanup, but it
turns out to actually fix a bug found by Zhao.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/2] md: factor out a mddev_find_locked helper from mddev_find
  2021-04-03 16:15 split mddev_find and don't create new instance in ->open Christoph Hellwig
@ 2021-04-03 16:15 ` Christoph Hellwig
  2021-04-04  2:24   ` heming.zhao
  2021-04-03 16:15 ` [PATCH 2/2] md: split mddev_find Christoph Hellwig
  2021-04-05 17:11 ` split mddev_find and don't create new instance in ->open Song Liu
  2 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2021-04-03 16:15 UTC (permalink / raw)
  To: song, Zhao Heming; +Cc: lidong.zhong, linux-raid

Factor out a self-contained helper to just lookup a mddev by the dev_t
"unit".

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/md/md.c | 32 +++++++++++++++++++-------------
 1 file changed, 19 insertions(+), 13 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 368cad6cd53a6e..5692427e78ba37 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -734,6 +734,17 @@ void mddev_init(struct mddev *mddev)
 }
 EXPORT_SYMBOL_GPL(mddev_init);
 
+static struct mddev *mddev_find_locked(dev_t unit)
+{
+	struct mddev *mddev;
+
+	list_for_each_entry(mddev, &all_mddevs, all_mddevs)
+		if (mddev->unit == unit)
+			return mddev;
+
+	return NULL;
+}
+
 static struct mddev *mddev_find(dev_t unit)
 {
 	struct mddev *mddev, *new = NULL;
@@ -745,13 +756,13 @@ static struct mddev *mddev_find(dev_t unit)
 	spin_lock(&all_mddevs_lock);
 
 	if (unit) {
-		list_for_each_entry(mddev, &all_mddevs, all_mddevs)
-			if (mddev->unit == unit) {
-				mddev_get(mddev);
-				spin_unlock(&all_mddevs_lock);
-				kfree(new);
-				return mddev;
-			}
+		mddev = mddev_find_locked(unit);
+		if (mddev) {
+			mddev_get(mddev);
+			spin_unlock(&all_mddevs_lock);
+			kfree(new);
+			return mddev;
+		}
 
 		if (new) {
 			list_add(&new->all_mddevs, &all_mddevs);
@@ -777,12 +788,7 @@ static struct mddev *mddev_find(dev_t unit)
 				return NULL;
 			}
 
-			is_free = 1;
-			list_for_each_entry(mddev, &all_mddevs, all_mddevs)
-				if (mddev->unit == dev) {
-					is_free = 0;
-					break;
-				}
+			is_free = !mddev_find_locked(dev);
 		}
 		new->unit = dev;
 		new->md_minor = MINOR(dev);
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/2] md: split mddev_find
  2021-04-03 16:15 split mddev_find and don't create new instance in ->open Christoph Hellwig
  2021-04-03 16:15 ` [PATCH 1/2] md: factor out a mddev_find_locked helper from mddev_find Christoph Hellwig
@ 2021-04-03 16:15 ` Christoph Hellwig
  2021-04-04  2:32   ` heming.zhao
  2021-04-05 17:11 ` split mddev_find and don't create new instance in ->open Song Liu
  2 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2021-04-03 16:15 UTC (permalink / raw)
  To: song, Zhao Heming; +Cc: lidong.zhong, linux-raid

Split mddev_find into a simple mddev_find that just finds an existing
mddev by the unit number, and a more complicated mddev_find that deals
with find or allocating a mddev.

This turns out to fix this bug reported by
Zhao Heming <heming.zhao@suse.com>:

------------------------------ snip ------------------------------
commit d3374825ce57 ("md: make devices disappear when they are no longer
needed.") introduced protection between mddev creating & removing. The
md_open shouldn't create mddev when all_mddevs list doesn't contain
mddev. With currently code logic, there will be very easy to trigger
soft lockup in non-preempt env.

*** env ***
kvm-qemu VM 2C1G with 2 iscsi luns
kernel should be non-preempt

*** script ***

about trigger 1 time with 10 tests

```
1  node1="15sp3-mdcluster1"
2  node2="15sp3-mdcluster2"
3
4  mdadm -Ss
5  ssh ${node2} "mdadm -Ss"
6  wipefs -a /dev/sda /dev/sdb
7  mdadm -CR /dev/md0 -b clustered -e 1.2 -n 2 -l mirror /dev/sda \
   /dev/sdb --assume-clean
8
9  for i in {1..100}; do
10    echo ==== $i ====;
11
12    echo "test  ...."
13    ssh ${node2} "mdadm -A /dev/md0 /dev/sda /dev/sdb"
14    sleep 1
15
16    echo "clean  ....."
17    ssh ${node2} "mdadm -Ss"
18 done
```

I use mdcluster env to trigger soft lockup, but it isn't mdcluster
speical bug. To stop md array in mdcluster env will do more jobs than
non-cluster array, which will leave enough time/gap to allow kernel to
run md_open.

*** stack ***

```
ID: 2831   TASK: ffff8dd7223b5040  CPU: 0   COMMAND: "mdadm"
 #0 [ffffa15d00a13b90] __schedule at ffffffffb8f1935f
 #1 [ffffa15d00a13ba8] exact_lock at ffffffffb8a4a66d
 #2 [ffffa15d00a13bb0] kobj_lookup at ffffffffb8c62fe3
 #3 [ffffa15d00a13c28] __blkdev_get at ffffffffb89273b9
 #4 [ffffa15d00a13c98] blkdev_get at ffffffffb8927964
 #5 [ffffa15d00a13cb0] do_dentry_open at ffffffffb88dc4b4
 #6 [ffffa15d00a13ce0] path_openat at ffffffffb88f0ccc
 #7 [ffffa15d00a13db8] do_filp_open at ffffffffb88f32bb
 #8 [ffffa15d00a13ee0] do_sys_open at ffffffffb88ddc7d
 #9 [ffffa15d00a13f38] do_syscall_64 at ffffffffb86053cb ffffffffb900008c

or:
[  884.226509]  mddev_put+0x1c/0xe0 [md_mod]
[  884.226515]  md_open+0x3c/0xe0 [md_mod]
[  884.226518]  __blkdev_get+0x30d/0x710
[  884.226520]  ? bd_acquire+0xd0/0xd0
[  884.226522]  blkdev_get+0x14/0x30
[  884.226524]  do_dentry_open+0x204/0x3a0
[  884.226531]  path_openat+0x2fc/0x1520
[  884.226534]  ? seq_printf+0x4e/0x70
[  884.226536]  do_filp_open+0x9b/0x110
[  884.226542]  ? md_release+0x20/0x20 [md_mod]
[  884.226543]  ? seq_read+0x1d8/0x3e0
[  884.226545]  ? kmem_cache_alloc+0x18a/0x270
[  884.226547]  ? do_sys_open+0x1bd/0x260
[  884.226548]  do_sys_open+0x1bd/0x260
[  884.226551]  do_syscall_64+0x5b/0x1e0
[  884.226554]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
```

*** rootcause ***

"mdadm -A" (or other array assemble commands) will start a daemon "mdadm
--monitor" by default. When "mdadm -Ss" is running, the stop action will
wakeup "mdadm --monitor". The "--monitor" daemon will immediately get
info from /proc/mdstat. This time mddev in kernel still exist, so
/proc/mdstat still show md device, which makes "mdadm --monitor" to open
/dev/md0.

The previously "mdadm -Ss" is removing action, the "mdadm --monitor"
open action will trigger md_open which is creating action. Racing is
happening.

```
<thread 1>: "mdadm -Ss"
md_release
  mddev_put deletes mddev from all_mddevs
  queue_work for mddev_delayed_delete
  at this time, "/dev/md0" is still available for opening

<thread 2>: "mdadm --monitor ..."
md_open
 + mddev_find can't find mddev of /dev/md0, and create a new mddev and
 |    return.
 + trigger "if (mddev->gendisk != bdev->bd_disk)" and return
      -ERESTARTSYS.
```

In non-preempt kernel, <thread 2> is occupying on current CPU. and
mddev_delayed_delete which was created in <thread 1> also can't be
schedule.

In preempt kernel, it can also trigger above racing. But kernel doesn't
allow one thread running on a CPU all the time. after <thread 2> running
some time, the later "mdadm -A" (refer above script line 13) will call
md_alloc to alloc a new gendisk for mddev. it will break md_open
statement "if (mddev->gendisk != bdev->bd_disk)" and return 0 to caller,
the soft lockup is broken.
------------------------------ snip ------------------------------

Fixes: d3374825ce57 ("md: make devices disappear when they are no longer needed.")
Reported-by: Zhao Heming <heming.zhao@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/md/md.c | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 5692427e78ba37..08d24177bbd4a4 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -746,6 +746,22 @@ static struct mddev *mddev_find_locked(dev_t unit)
 }
 
 static struct mddev *mddev_find(dev_t unit)
+{
+	struct mddev *mddev;
+
+	if (MAJOR(unit) != MD_MAJOR)
+		unit &= ~((1 << MdpMinorShift) - 1);
+
+	spin_lock(&all_mddevs_lock);
+	mddev = mddev_find_locked(unit);
+	if (mddev)
+		mddev_get(mddev);
+	spin_unlock(&all_mddevs_lock);
+
+	return mddev;
+}
+
+static struct mddev *mddev_find_or_alloc(dev_t unit)
 {
 	struct mddev *mddev, *new = NULL;
 
@@ -5650,7 +5666,7 @@ static int md_alloc(dev_t dev, char *name)
 	 * writing to /sys/module/md_mod/parameters/new_array.
 	 */
 	static DEFINE_MUTEX(disks_mutex);
-	struct mddev *mddev = mddev_find(dev);
+	struct mddev *mddev = mddev_find_or_alloc(dev);
 	struct gendisk *disk;
 	int partitioned;
 	int shift;
@@ -6530,11 +6546,9 @@ static void autorun_devices(int part)
 
 		md_probe(dev);
 		mddev = mddev_find(dev);
-		if (!mddev || !mddev->gendisk) {
-			if (mddev)
-				mddev_put(mddev);
+		if (!mddev)
 			break;
-		}
+
 		if (mddev_lock(mddev))
 			pr_warn("md: %s locked, cannot run\n", mdname(mddev));
 		else if (mddev->raid_disks || mddev->major_version
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] md: factor out a mddev_find_locked helper from mddev_find
  2021-04-03 16:15 ` [PATCH 1/2] md: factor out a mddev_find_locked helper from mddev_find Christoph Hellwig
@ 2021-04-04  2:24   ` heming.zhao
  2021-04-04  6:27     ` Christoph Hellwig
  0 siblings, 1 reply; 7+ messages in thread
From: heming.zhao @ 2021-04-04  2:24 UTC (permalink / raw)
  To: Christoph Hellwig, song; +Cc: lidong.zhong, linux-raid

On 4/4/21 12:15 AM, Christoph Hellwig wrote:
> Factor out a self-contained helper to just lookup a mddev by the dev_t
> "unit".
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/md/md.c | 32 +++++++++++++++++++-------------
>   1 file changed, 19 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 368cad6cd53a6e..5692427e78ba37 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -734,6 +734,17 @@ void mddev_init(struct mddev *mddev)
>   }
>   EXPORT_SYMBOL_GPL(mddev_init);
>   

I didn't know if I have right to give this patch "Reviewed-by: Heming Zhao <heming.zhao@suse.com>"
But, this patch looks good to me.

Thanks,
Heming


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] md: split mddev_find
  2021-04-03 16:15 ` [PATCH 2/2] md: split mddev_find Christoph Hellwig
@ 2021-04-04  2:32   ` heming.zhao
  0 siblings, 0 replies; 7+ messages in thread
From: heming.zhao @ 2021-04-04  2:32 UTC (permalink / raw)
  To: Christoph Hellwig, song; +Cc: lidong.zhong, linux-raid

On 4/4/21 12:15 AM, Christoph Hellwig wrote:
> Split mddev_find into a simple mddev_find that just finds an existing
> mddev by the unit number, and a more complicated mddev_find that deals
> with find or allocating a mddev.
> 
> ... ...
> Fixes: d3374825ce57 ("md: make devices disappear when they are no longer needed.")
> Reported-by: Zhao Heming <heming.zhao@suse.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/md/md.c | 24 +++++++++++++++++++-----
>   1 file changed, 19 insertions(+), 5 deletions(-)
> 
> ... ...
> @@ -6530,11 +6546,9 @@ static void autorun_devices(int part)
>   
>   		md_probe(dev);
>   		mddev = mddev_find(dev);
> -		if (!mddev || !mddev->gendisk) {
> -			if (mddev)
> -				mddev_put(mddev);
> +		if (!mddev)
>   			break;
> -		}
> +
>   		if (mddev_lock(mddev))
>   			pr_warn("md: %s locked, cannot run\n", mdname(mddev));
>   		else if (mddev->raid_disks || mddev->major_version
> 

autorun_devices use md_probe (when create_on_open is ture) to create mddev & gendisk,
mddev_find is only need to do finding job there.

looks good to me.
Reviewed-by: Heming Zhao <heming.zhao@suse.com>

Thanks,
Heming


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] md: factor out a mddev_find_locked helper from mddev_find
  2021-04-04  2:24   ` heming.zhao
@ 2021-04-04  6:27     ` Christoph Hellwig
  0 siblings, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2021-04-04  6:27 UTC (permalink / raw)
  To: heming.zhao; +Cc: Christoph Hellwig, song, lidong.zhong, linux-raid

On Sun, Apr 04, 2021 at 10:24:36AM +0800, heming.zhao@suse.com wrote:
>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>> index 368cad6cd53a6e..5692427e78ba37 100644
>> --- a/drivers/md/md.c
>> +++ b/drivers/md/md.c
>> @@ -734,6 +734,17 @@ void mddev_init(struct mddev *mddev)
>>   }
>>   EXPORT_SYMBOL_GPL(mddev_init);
>>   
>
> I didn't know if I have right to give this patch "Reviewed-by: Heming Zhao <heming.zhao@suse.com>"

You always do!

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: split mddev_find and don't create new instance in ->open
  2021-04-03 16:15 split mddev_find and don't create new instance in ->open Christoph Hellwig
  2021-04-03 16:15 ` [PATCH 1/2] md: factor out a mddev_find_locked helper from mddev_find Christoph Hellwig
  2021-04-03 16:15 ` [PATCH 2/2] md: split mddev_find Christoph Hellwig
@ 2021-04-05 17:11 ` Song Liu
  2 siblings, 0 replies; 7+ messages in thread
From: Song Liu @ 2021-04-05 17:11 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Zhao Heming, lidong.zhong, linux-raid

On Sat, Apr 3, 2021 at 9:15 AM Christoph Hellwig <hch@lst.de> wrote:
>
>
> Hi Song,
>
> this series split mddev_find so that no new instances get created
> in ->open.  This was originally part of a larger cleanup, but it
> turns out to actually fix a bug found by Zhao.

Applied this set and Heming's patch to md-next.

Thanks Christoph and Heming!
Song

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-04-05 17:11 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-03 16:15 split mddev_find and don't create new instance in ->open Christoph Hellwig
2021-04-03 16:15 ` [PATCH 1/2] md: factor out a mddev_find_locked helper from mddev_find Christoph Hellwig
2021-04-04  2:24   ` heming.zhao
2021-04-04  6:27     ` Christoph Hellwig
2021-04-03 16:15 ` [PATCH 2/2] md: split mddev_find Christoph Hellwig
2021-04-04  2:32   ` heming.zhao
2021-04-05 17:11 ` split mddev_find and don't create new instance in ->open Song Liu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).