* [PATCH] md: md_open returns -EBUSY when entering racing area
@ 2021-04-03 3:01 Zhao Heming
2021-04-03 16:11 ` Christoph Hellwig
0 siblings, 1 reply; 3+ messages in thread
From: Zhao Heming @ 2021-04-03 3:01 UTC (permalink / raw)
To: linux-raid, song, hch
Cc: Zhao Heming, guoqing.jiang, lidong.zhong, xni, neilb, colyli
commit d3374825ce57 ("md: make devices disappear when they are no longer
needed.") introduced protection between mddev creating & removing. The
md_open shouldn't create mddev when all_mddevs list doesn't contain
mddev. With currently code logic, there will be very easy to trigger
soft lockup in non-preempt env.
This patch changes md_open returning from -ERESTARTSYS to -EBUSY, which
will break the infinitely retry when md_open enter racing area.
This patch is partly fix soft lockup issue, full fix needs mddev_find
is split into two functions: mddev_find & mddev_find_or_alloc. And
md_open should call new mddev_find (it only does searching job).
For more detail, please refer with Christoph's "split mddev_find" patch
in later commits.
*** env ***
kvm-qemu VM 2C1G with 2 iscsi luns
kernel should be non-preempt
*** script ***
about trigger every time with below script
```
1 node1="mdcluster1"
2 node2="mdcluster2"
3
4 mdadm -Ss
5 ssh ${node2} "mdadm -Ss"
6 wipefs -a /dev/sda /dev/sdb
7 mdadm -CR /dev/md0 -b clustered -e 1.2 -n 2 -l mirror /dev/sda \
/dev/sdb --assume-clean
8
9 for i in {1..10}; do
10 echo ==== $i ====;
11
12 echo "test ...."
13 ssh ${node2} "mdadm -A /dev/md0 /dev/sda /dev/sdb"
14 sleep 1
15
16 echo "clean ....."
17 ssh ${node2} "mdadm -Ss"
18 done
```
I use mdcluster env to trigger soft lockup, but it isn't mdcluster
speical bug. To stop md array in mdcluster env will do more jobs than
non-cluster array, which will leave enough time/gap to allow kernel to
run md_open.
*** stack ***
```
[ 884.226509] mddev_put+0x1c/0xe0 [md_mod]
[ 884.226515] md_open+0x3c/0xe0 [md_mod]
[ 884.226518] __blkdev_get+0x30d/0x710
[ 884.226520] ? bd_acquire+0xd0/0xd0
[ 884.226522] blkdev_get+0x14/0x30
[ 884.226524] do_dentry_open+0x204/0x3a0
[ 884.226531] path_openat+0x2fc/0x1520
[ 884.226534] ? seq_printf+0x4e/0x70
[ 884.226536] do_filp_open+0x9b/0x110
[ 884.226542] ? md_release+0x20/0x20 [md_mod]
[ 884.226543] ? seq_read+0x1d8/0x3e0
[ 884.226545] ? kmem_cache_alloc+0x18a/0x270
[ 884.226547] ? do_sys_open+0x1bd/0x260
[ 884.226548] do_sys_open+0x1bd/0x260
[ 884.226551] do_syscall_64+0x5b/0x1e0
[ 884.226554] entry_SYSCALL_64_after_hwframe+0x44/0xa9
```
*** rootcause ***
"mdadm -A" (or other array assemble commands) will start a daemon "mdadm
--monitor" by default. When "mdadm -Ss" is running, the stop action will
wakeup "mdadm --monitor". The "--monitor" daemon will immediately get
info from /proc/mdstat. This time mddev in kernel still exist, so
/proc/mdstat still show md device, which makes "mdadm --monitor" to open
/dev/md0.
The previously "mdadm -Ss" is removing action, the "mdadm --monitor"
open action will trigger md_open which is creating action. Racing is
happening.
```
<thread 1>: "mdadm -Ss"
md_release
mddev_put deletes mddev from all_mddevs
queue_work for mddev_delayed_delete
at this time, "/dev/md0" is still available for opening
<thread 2>: "mdadm --monitor ..."
md_open
+ mddev_find can't find mddev of /dev/md0, and create a new mddev and
| return.
+ trigger "if (mddev->gendisk != bdev->bd_disk)" and return
-ERESTARTSYS.
```
In non-preempt kernel, <thread 2> is occupying on current CPU. and
mddev_delayed_delete which was created in <thread 1> also can't be
schedule.
In preempt kernel, it can also trigger above racing. But kernel doesn't
allow one thread running on a CPU all the time. after <thread 2> running
some time, the later "mdadm -A" (refer above script line 13) will call
md_alloc to alloc a new gendisk for mddev. it will break md_open
statement "if (mddev->gendisk != bdev->bd_disk)" and return 0 to caller,
the soft lockup is broken.
Signed-off-by: Zhao Heming <heming.zhao@suse.com>
---
drivers/md/md.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 21da0c48f6c2..917365fc9c47 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -7821,8 +7821,7 @@ static int md_open(struct block_device *bdev, fmode_t mode)
/* Wait until bdev->bd_disk is definitely gone */
if (work_pending(&mddev->del_work))
flush_workqueue(md_misc_wq);
- /* Then retry the open from the top */
- return -ERESTARTSYS;
+ return -EBUSY;
}
BUG_ON(mddev != bdev->bd_disk->private_data);
--
2.30.0
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH] md: md_open returns -EBUSY when entering racing area
2021-04-03 3:01 [PATCH] md: md_open returns -EBUSY when entering racing area Zhao Heming
@ 2021-04-03 16:11 ` Christoph Hellwig
2021-04-04 2:02 ` heming.zhao
0 siblings, 1 reply; 3+ messages in thread
From: Christoph Hellwig @ 2021-04-03 16:11 UTC (permalink / raw)
To: Zhao Heming
Cc: linux-raid, song, hch, guoqing.jiang, lidong.zhong, xni, neilb, colyli
This defintively looks better than the ERESTARTSYS hack.
I think I'll defer the whole blkdev_get refactoring and bd_mutex to
open_mutex conversion to the next merge window, so I think we should
be fine with this plus the mddev_find split.
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] md: md_open returns -EBUSY when entering racing area
2021-04-03 16:11 ` Christoph Hellwig
@ 2021-04-04 2:02 ` heming.zhao
0 siblings, 0 replies; 3+ messages in thread
From: heming.zhao @ 2021-04-04 2:02 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-raid, song, guoqing.jiang, lidong.zhong, xni, neilb, colyli
On 4/4/21 12:11 AM, Christoph Hellwig wrote:
> This defintively looks better than the ERESTARTSYS hack.
>
> I think I'll defer the whole blkdev_get refactoring and bd_mutex to
> open_mutex conversion to the next merge window, so I think we should
> be fine with this plus the mddev_find split.
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
>
Thank you for your review.
/Heming
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2021-04-04 2:02 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-03 3:01 [PATCH] md: md_open returns -EBUSY when entering racing area Zhao Heming
2021-04-03 16:11 ` Christoph Hellwig
2021-04-04 2:02 ` heming.zhao
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).