* [PATCH RFC V2 0/4] Fix regression bugs
@ 2024-02-20 15:30 Xiao Ni
  2024-02-20 15:30 ` [PATCH RFC 1/4] dm-raid/md: Clear MD_RECOVERY_WAIT when stopping dmraid Xiao Ni
                   ` (4 more replies)
  0 siblings, 5 replies; 23+ messages in thread
From: Xiao Ni @ 2024-02-20 15:30 UTC (permalink / raw)
  To: song
  Cc: yukuai1, bmarzins, heinzm, snitzer, ncroxon, neilb, linux-raid, dm-devel

Hi all

Sorry, I know this patch set conflicts with Yu Kuai's patch set, but
I have to send it out. We are facing some deadlock regression problems
now, so it's better to figure out the root cause and fix them. Kuai's
patch set looks too complicated to me, and, as we discussed in the
emails, it breaks some rules. It's not good to fix a problem by
breaking the original logic. If we really need to break some logic,
it's better to do that in a separate patch set that describes why we
need it.

This patch set is based on Linus's tree, tag v6.8-rc5. If this patch set
is accepted, we need to revert Kuai's patches that have been merged
into Song's tree (md-6.8-20240216 tag). This patch set has four patches.
The first two resolve deadlock problems; with these two patches, most
of the deadlocks are resolved. The third one fixes an active_io counter
bug. The fourth one fixes the raid5 reshape deadlock problem.

I have run the lvm2 regression tests. There are 4 failing cases:
shell/dmsetup-integrity-keys.sh
shell/lvresize-fs-crypt.sh
shell/pvck-dump.sh
shell/select-report.sh

Xiao Ni (4):
  Clear MD_RECOVERY_WAIT when stopping dmraid
  Set MD_RECOVERY_FROZEN before stop sync thread
  md: Missing decrease active_io for flush io
  Don't check crossing reshape when reshape hasn't started

 drivers/md/dm-raid.c |  2 ++
 drivers/md/md.c      |  8 +++++++-
 drivers/md/raid5.c   | 22 ++++++++++------------
 3 files changed, 19 insertions(+), 13 deletions(-)

-- 
2.32.0 (Apple Git-132)


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH RFC 1/4] dm-raid/md: Clear MD_RECOVERY_WAIT when stopping dmraid
  2024-02-20 15:30 [PATCH RFC V2 0/4] Fix regression bugs Xiao Ni
@ 2024-02-20 15:30 ` Xiao Ni
  2024-02-23  3:31   ` Yu Kuai
  2024-02-23 10:31   ` Yu Kuai
  2024-02-20 15:30 ` [PATCH RFC 2/4] md: Set MD_RECOVERY_FROZEN before stop sync thread Xiao Ni
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 23+ messages in thread
From: Xiao Ni @ 2024-02-20 15:30 UTC (permalink / raw)
  To: song
  Cc: yukuai1, bmarzins, heinzm, snitzer, ncroxon, neilb, linux-raid, dm-devel

MD_RECOVERY_WAIT is used by dmraid to delay the reshape process; it was
introduced by commit 644e2537fdc7 ("dm raid: fix stripe adding reshape
deadlock"). Before commit f52f5c71f3d4b ("md: fix stopping sync thread"),
dmraid stopped the sync thread directly by calling md_reap_sync_thread.
After that patch, dmraid stops the sync thread asynchronously, as md
does. This is correct. Now the dmraid stop process is like this:

1. raid_postsuspend->md_stop_writes->__md_stop_writes->stop_sync_thread.
stop_sync_thread sets MD_RECOVERY_INTR and waits until MD_RECOVERY_RUNNING
is cleared.
2. md_do_sync finds MD_RECOVERY_WAIT is set and returns. (This is the
root cause of this deadlock. We expect md_do_sync to set MD_RECOVERY_DONE.)
3. The md thread calls md_check_recovery. (This is the place that reaps
the sync thread. Because MD_RECOVERY_DONE is not set, the md thread can't
reap the sync thread.)
4. raid_dtr stops/frees struct mddev and releases dmraid related resources.

dmraid only sets MD_RECOVERY_WAIT but never clears it. It needs to clear
this bit when stopping dmraid, before stopping the sync thread.

But the deadlock can still happen sometimes even if MD_RECOVERY_WAIT is
cleared before stopping the sync thread. The reason is that
stop_sync_thread only wakes up the task directly; if the task isn't
running, it still needs to wake up the sync thread too.
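
For reference, the flow looks roughly like this (a simplified sketch of
md_do_sync/md_check_recovery, not the exact kernel code):

/* md_do_sync(), simplified: returns early while dmraid asks it to
 * wait, so MD_RECOVERY_DONE is never set for this run.
 */
if (test_bit(MD_RECOVERY_DONE, &mddev->recovery) ||
    test_bit(MD_RECOVERY_WAIT, &mddev->recovery))
	return;

/* md_check_recovery(), simplified: the sync thread is only reaped
 * once MD_RECOVERY_DONE is set, so the reap never happens and
 * stop_sync_thread() keeps waiting for MD_RECOVERY_RUNNING to clear.
 */
if (mddev->sync_thread && test_bit(MD_RECOVERY_DONE, &mddev->recovery))
	md_reap_sync_thread(mddev);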

This deadlock can be reproduced 100% by these commands:
modprobe brd rd_size=34816 rd_nr=5
while [ 1 ]; do
vgcreate test_vg /dev/ram*
lvcreate --type raid5 -L 16M -n test_lv test_vg
lvconvert -y --stripes 4 /dev/test_vg/test_lv
vgremove test_vg -ff
sleep 1
done

Fixes: 644e2537fdc7 ("dm raid: fix stripe adding reshape deadlock")
Fixes: f52f5c71f3d4 ("md: fix stopping sync thread")
Signed-off-by: Xiao Ni <xni@redhat.com>
---
 drivers/md/dm-raid.c | 2 ++
 drivers/md/md.c      | 1 +
 2 files changed, 3 insertions(+)

diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index eb009d6bb03a..325767c1140f 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -3796,6 +3796,8 @@ static void raid_postsuspend(struct dm_target *ti)
 	struct raid_set *rs = ti->private;
 
 	if (!test_and_set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) {
+		if (test_bit(MD_RECOVERY_WAIT, &rs->md.recovery))
+			clear_bit(MD_RECOVERY_WAIT, &rs->md.recovery);
 		/* Writes have to be stopped before suspending to avoid deadlocks. */
 		if (!test_bit(MD_RECOVERY_FROZEN, &rs->md.recovery))
 			md_stop_writes(&rs->md);
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 2266358d8074..54790261254d 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4904,6 +4904,7 @@ static void stop_sync_thread(struct mddev *mddev, bool locked, bool check_seq)
 	 * never happen
 	 */
 	md_wakeup_thread_directly(mddev->sync_thread);
+	md_wakeup_thread(mddev->sync_thread);
 	if (work_pending(&mddev->sync_work))
 		flush_work(&mddev->sync_work);
 
-- 
2.32.0 (Apple Git-132)


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH RFC 2/4] md: Set MD_RECOVERY_FROZEN before stop sync thread
  2024-02-20 15:30 [PATCH RFC V2 0/4] Fix regression bugs Xiao Ni
  2024-02-20 15:30 ` [PATCH RFC 1/4] dm-raid/md: Clear MD_RECOVERY_WAIT when stopping dmraid Xiao Ni
@ 2024-02-20 15:30 ` Xiao Ni
  2024-02-23  3:12   ` Yu Kuai
  2024-02-20 15:30 ` [PATCH RFC 3/4] md: Missing decrease active_io for flush io Xiao Ni
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 23+ messages in thread
From: Xiao Ni @ 2024-02-20 15:30 UTC (permalink / raw)
  To: song
  Cc: yukuai1, bmarzins, heinzm, snitzer, ncroxon, neilb, linux-raid, dm-devel

After commit f52f5c71f3d4b ("md: fix stopping sync thread"), dmraid
stops the sync thread asynchronously. The calling process is:
dev_remove->dm_destroy->__dm_destroy->raid_postsuspend->raid_dtr

raid_postsuspend does two jobs. First, it stops the sync thread. Then
it suspends the array. Now it can stop the sync thread successfully,
but it doesn't set MD_RECOVERY_FROZEN. This was introduced by patch
f52f5c71f3d4b. So after raid_postsuspend, the sync thread starts again,
and raid_dtr can't stop it because the array is already suspended.
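
For reference, the effect of the flag is roughly this (a simplified
sketch, not the exact kernel code):

/* md thread, simplified: with MD_RECOVERY_FROZEN set before
 * __md_stop_writes(), no new sync_thread is registered after the old
 * one has been stopped, so the sync thread does not start again
 * behind raid_postsuspend()'s back.
 */
if (test_bit(MD_RECOVERY_FROZEN, &mddev->recovery))
	return;		/* skip registering a new sync_thread */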

This can be reproduced easily by these commands:
while [ 1 ]; do
vgcreate test_vg /dev/loop0 /dev/loop1
lvcreate --type raid1 -L 400M -m 1 -n test_lv test_vg
lvchange -an test_vg
vgremove test_vg -ff
done

Fixes: f52f5c71f3d4 ("md: fix stopping sync thread")
Signed-off-by: Xiao Ni <xni@redhat.com>
---
 drivers/md/md.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 54790261254d..77e3af7cf6bb 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6337,6 +6337,7 @@ static void __md_stop_writes(struct mddev *mddev)
 void md_stop_writes(struct mddev *mddev)
 {
 	mddev_lock_nointr(mddev);
+	set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
 	__md_stop_writes(mddev);
 	mddev_unlock(mddev);
 }
-- 
2.32.0 (Apple Git-132)


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH RFC 3/4] md: Missing decrease active_io for flush io
  2024-02-20 15:30 [PATCH RFC V2 0/4] Fix regression bugs Xiao Ni
  2024-02-20 15:30 ` [PATCH RFC 1/4] dm-raid/md: Clear MD_RECOVERY_WAIT when stopping dmraid Xiao Ni
  2024-02-20 15:30 ` [PATCH RFC 2/4] md: Set MD_RECOVERY_FROZEN before stop sync thread Xiao Ni
@ 2024-02-20 15:30 ` Xiao Ni
  2024-02-23  3:06   ` Yu Kuai
  2024-02-20 15:30 ` [PATCH RFC V2 4/4] md/raid5: Don't check crossing reshape when reshape hasn't started Xiao Ni
  2024-02-21  5:45 ` [PATCH RFC V2 0/4] Fix regression bugs Benjamin Marzinski
  4 siblings, 1 reply; 23+ messages in thread
From: Xiao Ni @ 2024-02-20 15:30 UTC (permalink / raw)
  To: song
  Cc: yukuai1, bmarzins, heinzm, snitzer, ncroxon, neilb, linux-raid, dm-devel

If all flush bios finish fast, active_io is not decreased, and stopping
the array then gets stuck.

This can be reproduced by the lvm2 test shell/integrity-caching.sh,
but it doesn't reproduce 100% of the time.
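
For reference, the reference pairing involved looks roughly like this
(a simplified sketch, not the exact kernel code):

/* md_flush_request(), simplified: takes an active_io reference for
 * the flush; the hunk below adds the matching put once flush_pending
 * drops to zero.
 */
percpu_ref_get(&mddev->active_io);

/* suspending/stopping the array, simplified: waits for all active_io
 * references to be dropped, which is where it gets stuck when the put
 * is missing.
 */
percpu_ref_kill(&mddev->active_io);
wait_event(mddev->sb_wait, percpu_ref_is_zero(&mddev->active_io));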

Fixes: fa2bbff7b0b4 ("md: synchronize flush io with array reconfiguration")
Signed-off-by: Xiao Ni <xni@redhat.com>
---
 drivers/md/md.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 77e3af7cf6bb..a41bed286fd2 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -579,8 +579,12 @@ static void submit_flushes(struct work_struct *ws)
 			rcu_read_lock();
 		}
 	rcu_read_unlock();
-	if (atomic_dec_and_test(&mddev->flush_pending))
+	if (atomic_dec_and_test(&mddev->flush_pending)) {
+		/* The pair is percpu_ref_get() from md_flush_request() */
+		percpu_ref_put(&mddev->active_io);
+
 		queue_work(md_wq, &mddev->flush_work);
+	}
 }
 
 static void md_submit_flush_data(struct work_struct *ws)
-- 
2.32.0 (Apple Git-132)


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH RFC V2 4/4] md/raid5: Don't check crossing reshape when reshape hasn't started
  2024-02-20 15:30 [PATCH RFC V2 0/4] Fix regression bugs Xiao Ni
                   ` (2 preceding siblings ...)
  2024-02-20 15:30 ` [PATCH RFC 3/4] md: Missing decrease active_io for flush io Xiao Ni
@ 2024-02-20 15:30 ` Xiao Ni
  2024-02-23  3:08   ` Yu Kuai
  2024-02-21  5:45 ` [PATCH RFC V2 0/4] Fix regression bugs Benjamin Marzinski
  4 siblings, 1 reply; 23+ messages in thread
From: Xiao Ni @ 2024-02-20 15:30 UTC (permalink / raw)
  To: song
  Cc: yukuai1, bmarzins, heinzm, snitzer, ncroxon, neilb, linux-raid, dm-devel

stripe_ahead_of_reshape is used to check whether a stripe region crosses
the reshape position. So first, change the function name to
stripe_across_reshape to describe the usage of this function.

For a backwards reshape, the reshape starts from the end of the array
and conf->reshape_progress is initialized to raid5_size. During reshape,
if previous is true (set in make_stripe_request) and
max_sector >= conf->reshape_progress, ios should wait until the reshape
window moves forward. But ios don't need to wait if max_sector is
raid5_size.

Also, put the conditions directly into the function to make the code
easier to understand.

This can be reproduced easily by the lvm2 test
shell/lvconvert-raid-reshape.sh. For a dm raid reshape, the table needs
to be reloaded several times before the sync thread starts. In one of
these reloads dm raid uses MD_RECOVERY_WAIT to delay the reshape and
doesn't start the sync thread. Then an io comes in and waits, because
stripe_ahead_of_reshape returns true: it's a backwards reshape and
max_sector >= conf->reshape_progress. But the reshape hasn't started,
so skip this check when reshape_progress is raid5_size.

Fixes: 486f60558607 ("md/raid5: Check all disks in a stripe_head for reshape progress")
Signed-off-by: Xiao Ni <xni@redhat.com>
---
 drivers/md/raid5.c | 22 ++++++++++------------
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 8497880135ee..4c71df4e2370 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -5832,17 +5832,12 @@ static bool ahead_of_reshape(struct mddev *mddev, sector_t sector,
 					  sector >= reshape_sector;
 }
 
-static bool range_ahead_of_reshape(struct mddev *mddev, sector_t min,
-				   sector_t max, sector_t reshape_sector)
-{
-	return mddev->reshape_backwards ? max < reshape_sector :
-					  min >= reshape_sector;
-}
-
-static bool stripe_ahead_of_reshape(struct mddev *mddev, struct r5conf *conf,
+static sector_t raid5_size(struct mddev *mddev, sector_t sectors, int raid_disks);
+static bool stripe_across_reshape(struct mddev *mddev, struct r5conf *conf,
 				    struct stripe_head *sh)
 {
 	sector_t max_sector = 0, min_sector = MaxSector;
+	sector_t reshape_pos = 0;
 	bool ret = false;
 	int dd_idx;
 
@@ -5856,9 +5851,12 @@ static bool stripe_ahead_of_reshape(struct mddev *mddev, struct r5conf *conf,
 
 	spin_lock_irq(&conf->device_lock);
 
-	if (!range_ahead_of_reshape(mddev, min_sector, max_sector,
-				     conf->reshape_progress))
-		/* mismatch, need to try again */
+	reshape_pos = conf->reshape_progress;
+	if (mddev->reshape_backwards) {
+		if (max_sector >= reshape_pos &&
+		    reshape_pos != raid5_size(mddev, 0, 0))
+			ret = true;
+	} else if (min_sector < reshape_pos)
 		ret = true;
 
 	spin_unlock_irq(&conf->device_lock);
@@ -5969,7 +5967,7 @@ static enum stripe_result make_stripe_request(struct mddev *mddev,
 	}
 
 	if (unlikely(previous) &&
-	    stripe_ahead_of_reshape(mddev, conf, sh)) {
+	    stripe_across_reshape(mddev, conf, sh)) {
 		/*
 		 * Expansion moved on while waiting for a stripe.
 		 * Expansion could still move past after this
-- 
2.32.0 (Apple Git-132)


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC V2 0/4] Fix regression bugs
  2024-02-20 15:30 [PATCH RFC V2 0/4] Fix regression bugs Xiao Ni
                   ` (3 preceding siblings ...)
  2024-02-20 15:30 ` [PATCH RFC V2 4/4] md/raid5: Don't check crossing reshape when reshape hasn't started Xiao Ni
@ 2024-02-21  5:45 ` Benjamin Marzinski
  2024-02-23  2:42   ` Xiao Ni
  4 siblings, 1 reply; 23+ messages in thread
From: Benjamin Marzinski @ 2024-02-21  5:45 UTC (permalink / raw)
  To: Xiao Ni
  Cc: song, yukuai1, heinzm, snitzer, ncroxon, neilb, linux-raid, dm-devel

On Tue, Feb 20, 2024 at 11:30:55PM +0800, Xiao Ni wrote:
> Hi all
> 
> Sorry, I know this patch set conflict with Yu Kuai's patch set. But
> I have to send out this patch set. Now we're facing some deadlock
> regression problems. So it's better to figure out the root cause and
> fix them. But Kuai's patch set looks too complicate for me. And like
> we're talking in the emails, Kuai's patch set breaks some rules. It's
> not good to fix some problem by breaking the original logic. If we really
> need to break some logic. It's better to use a distinct patch set to
> describe why we need them.
> 
> This patch is based on linus's tree. The tag is 6.8-rc5. If this patch set
> can be accepted. We need to revert Kuai's patches which have been merged
> in Song's tree (md-6.8-20240216 tag). This patch set has four patches.
> The first two resolves deadlock problems. With these two patches, it can
> resolve most deadlock problem. The third one fixes active_io counter bug.
> The fouth one fixes the raid5 reshape deadlock problem.

With this patchset on top of the v6.8-rc5 kernel I can still see a hang
tearing down the devices at the end of lvconvert-raid-reshape.sh if I
run it repeatedly. I haven't dug into this enough to be certain, but it
appears that when this hangs, make_stripe_request() is returning
STRIPE_SCHEDULE_AND_RETRY because of

ahead_of_reshape(mddev, logical_sector, conf->reshape_safe)

so it never reaches stripe_across_reshape() from your last patch.
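
For context, that early-exit path in make_stripe_request() looks roughly
like this (a simplified sketch, not the verbatim code):

if (unlikely(conf->reshape_progress != MaxSector)) {
	if (ahead_of_reshape(mddev, logical_sector, conf->reshape_progress))
		previous = true;	/* use the old geometry */
	else if (ahead_of_reshape(mddev, logical_sector, conf->reshape_safe))
		/* reshape_safe has not caught up yet; the request is
		 * retried here and stripe_across_reshape() further down
		 * is never reached.
		 */
		return STRIPE_SCHEDULE_AND_RETRY;
}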

It hangs with the following hung-task backtrace:

[ 4569.331345] sysrq: Show Blocked State
[ 4569.332640] task:mdX_resync      state:D stack:0     pid:155469 tgid:155469 ppid:2      flags:0x00004000
[ 4569.335367] Call Trace:
[ 4569.336122]  <TASK>
[ 4569.336758]  __schedule+0x3ec/0x15c0
[ 4569.337789]  ? __schedule+0x3f4/0x15c0
[ 4569.338433]  ? __wake_up_klogd.part.0+0x3c/0x60
[ 4569.339186]  schedule+0x32/0xd0
[ 4569.339709]  md_do_sync+0xede/0x11c0
[ 4569.340324]  ? __pfx_autoremove_wake_function+0x10/0x10
[ 4569.341183]  ? __pfx_md_thread+0x10/0x10
[ 4569.341831]  md_thread+0xab/0x190
[ 4569.342397]  kthread+0xe5/0x120
[ 4569.342933]  ? __pfx_kthread+0x10/0x10
[ 4569.343554]  ret_from_fork+0x31/0x50
[ 4569.344152]  ? __pfx_kthread+0x10/0x10
[ 4569.344761]  ret_from_fork_asm+0x1b/0x30
[ 4569.345193]  </TASK>
[ 4569.345403] task:dmsetup         state:D stack:0     pid:156091 tgid:156091 ppid:155933 flags:0x00004002
[ 4569.346300] Call Trace:
[ 4569.346538]  <TASK>
[ 4569.346746]  __schedule+0x3ec/0x15c0
[ 4569.347097]  ? __schedule+0x3f4/0x15c0
[ 4569.347440]  ? sysvec_call_function_single+0xe/0x90
[ 4569.347905]  ? asm_sysvec_call_function_single+0x1a/0x20
[ 4569.348401]  ? __pfx_dev_remove+0x10/0x10
[ 4569.348779]  schedule+0x32/0xd0
[ 4569.349079]  stop_sync_thread+0x136/0x1d0
[ 4569.349465]  ? __pfx_autoremove_wake_function+0x10/0x10
[ 4569.349965]  __md_stop_writes+0x15/0xe0
[ 4569.350341]  md_stop_writes+0x29/0x40
[ 4569.350698]  raid_postsuspend+0x53/0x60 [dm_raid]
[ 4569.351159]  dm_table_postsuspend_targets+0x3d/0x60
[ 4569.351627]  __dm_destroy+0x1c5/0x1e0
[ 4569.351984]  dev_remove+0x11d/0x190
[ 4569.352328]  ctl_ioctl+0x30e/0x5e0
[ 4569.352659]  dm_ctl_ioctl+0xe/0x20
[ 4569.352992]  __x64_sys_ioctl+0x94/0xd0
[ 4569.353352]  do_syscall_64+0x86/0x170
[ 4569.353703]  ? dm_ctl_ioctl+0xe/0x20
[ 4569.354059]  ? syscall_exit_to_user_mode+0x89/0x230
[ 4569.354517]  ? do_syscall_64+0x96/0x170
[ 4569.354891]  ? exc_page_fault+0x7f/0x180
[ 4569.355258]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 4569.355744] RIP: 0033:0x7f49e5dbc13d
[ 4569.356113] RSP: 002b:00007ffc365585f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 4569.356804] RAX: ffffffffffffffda RBX: 000055638c4932c0 RCX: 00007f49e5dbc13d
[ 4569.357488] RDX: 000055638c493af0 RSI: 00000000c138fd04 RDI: 0000000000000003
[ 4569.358140] RBP: 00007ffc36558640 R08: 00007f49e5fbc690 R09: 00007ffc365584a8
[ 4569.358783] R10: 00007f49e5fbb97d R11: 0000000000000246 R12: 00007f49e5fbb97d
[ 4569.359442] R13: 000055638c493ba0 R14: 00007f49e5fbb97d R15: 00007f49e5fbb97d
[ 4569.360090]  </TASK>


> 
> I have run lvm2 regression test. There are 4 failed cases:
> shell/dmsetup-integrity-keys.sh
> shell/lvresize-fs-crypt.sh
> shell/pvck-dump.sh
> shell/select-report.sh
> 
> Xiao Ni (4):
>   Clear MD_RECOVERY_WAIT when stopping dmraid
>   Set MD_RECOVERY_FROZEN before stop sync thread
>   md: Missing decrease active_io for flush io
>   Don't check crossing reshape when reshape hasn't started
> 
>  drivers/md/dm-raid.c |  2 ++
>  drivers/md/md.c      |  8 +++++++-
>  drivers/md/raid5.c   | 22 ++++++++++------------
>  3 files changed, 19 insertions(+), 13 deletions(-)
> 
> -- 
> 2.32.0 (Apple Git-132)


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC V2 0/4] Fix regression bugs
  2024-02-21  5:45 ` [PATCH RFC V2 0/4] Fix regression bugs Benjamin Marzinski
@ 2024-02-23  2:42   ` Xiao Ni
  0 siblings, 0 replies; 23+ messages in thread
From: Xiao Ni @ 2024-02-23  2:42 UTC (permalink / raw)
  To: Benjamin Marzinski
  Cc: song, yukuai1, heinzm, snitzer, ncroxon, neilb, linux-raid, dm-devel

On Wed, Feb 21, 2024 at 1:45 PM Benjamin Marzinski <bmarzins@redhat.com> wrote:
>
> On Tue, Feb 20, 2024 at 11:30:55PM +0800, Xiao Ni wrote:
> > Hi all
> >
> > Sorry, I know this patch set conflict with Yu Kuai's patch set. But
> > I have to send out this patch set. Now we're facing some deadlock
> > regression problems. So it's better to figure out the root cause and
> > fix them. But Kuai's patch set looks too complicate for me. And like
> > we're talking in the emails, Kuai's patch set breaks some rules. It's
> > not good to fix some problem by breaking the original logic. If we really
> > need to break some logic. It's better to use a distinct patch set to
> > describe why we need them.
> >
> > This patch is based on linus's tree. The tag is 6.8-rc5. If this patch set
> > can be accepted. We need to revert Kuai's patches which have been merged
> > in Song's tree (md-6.8-20240216 tag). This patch set has four patches.
> > The first two resolves deadlock problems. With these two patches, it can
> > resolve most deadlock problem. The third one fixes active_io counter bug.
> > The fouth one fixes the raid5 reshape deadlock problem.
>
> With this patchset on top of the v6.8-rc5 kernel I can still see a hang
> tearing down the devices at the end of lvconvert-raid-reshape.sh if I
> run it repeatedly. I haven't dug into this enough to be certain, but it
> appears that when this hangs, stripe_result make_stripe_request() is
> returning STRIPE_SCHEDULE_AND_RETRY because of
>
> ahead_of_reshape(mddev, logical_sector, conf->reshape_safe))
>
> this never runs stripe_across_reshape() from you last patch.
>
> It hangs with the following hung-task backtrace:
>
> [ 4569.331345] sysrq: Show Blocked State
> [ 4569.332640] task:mdX_resync      state:D stack:0     pid:155469 tgid:155469 ppid:2      flags:0x00004000
> [ 4569.335367] Call Trace:
> [ 4569.336122]  <TASK>
> [ 4569.336758]  __schedule+0x3ec/0x15c0
> [ 4569.337789]  ? __schedule+0x3f4/0x15c0
> [ 4569.338433]  ? __wake_up_klogd.part.0+0x3c/0x60
> [ 4569.339186]  schedule+0x32/0xd0
> [ 4569.339709]  md_do_sync+0xede/0x11c0
> [ 4569.340324]  ? __pfx_autoremove_wake_function+0x10/0x10
> [ 4569.341183]  ? __pfx_md_thread+0x10/0x10
> [ 4569.341831]  md_thread+0xab/0x190
> [ 4569.342397]  kthread+0xe5/0x120
> [ 4569.342933]  ? __pfx_kthread+0x10/0x10
> [ 4569.343554]  ret_from_fork+0x31/0x50
> [ 4569.344152]  ? __pfx_kthread+0x10/0x10
> [ 4569.344761]  ret_from_fork_asm+0x1b/0x30
> [ 4569.345193]  </TASK>
> [ 4569.345403] task:dmsetup         state:D stack:0     pid:156091 tgid:156091 ppid:155933 flags:0x00004002
> [ 4569.346300] Call Trace:
> [ 4569.346538]  <TASK>
> [ 4569.346746]  __schedule+0x3ec/0x15c0
> [ 4569.347097]  ? __schedule+0x3f4/0x15c0
> [ 4569.347440]  ? sysvec_call_function_single+0xe/0x90
> [ 4569.347905]  ? asm_sysvec_call_function_single+0x1a/0x20
> [ 4569.348401]  ? __pfx_dev_remove+0x10/0x10
> [ 4569.348779]  schedule+0x32/0xd0
> [ 4569.349079]  stop_sync_thread+0x136/0x1d0
> [ 4569.349465]  ? __pfx_autoremove_wake_function+0x10/0x10
> [ 4569.349965]  __md_stop_writes+0x15/0xe0
> [ 4569.350341]  md_stop_writes+0x29/0x40
> [ 4569.350698]  raid_postsuspend+0x53/0x60 [dm_raid]
> [ 4569.351159]  dm_table_postsuspend_targets+0x3d/0x60
> [ 4569.351627]  __dm_destroy+0x1c5/0x1e0
> [ 4569.351984]  dev_remove+0x11d/0x190
> [ 4569.352328]  ctl_ioctl+0x30e/0x5e0
> [ 4569.352659]  dm_ctl_ioctl+0xe/0x20
> [ 4569.352992]  __x64_sys_ioctl+0x94/0xd0
> [ 4569.353352]  do_syscall_64+0x86/0x170
> [ 4569.353703]  ? dm_ctl_ioctl+0xe/0x20
> [ 4569.354059]  ? syscall_exit_to_user_mode+0x89/0x230
> [ 4569.354517]  ? do_syscall_64+0x96/0x170
> [ 4569.354891]  ? exc_page_fault+0x7f/0x180
> [ 4569.355258]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
> [ 4569.355744] RIP: 0033:0x7f49e5dbc13d
> [ 4569.356113] RSP: 002b:00007ffc365585f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [ 4569.356804] RAX: ffffffffffffffda RBX: 000055638c4932c0 RCX: 00007f49e5dbc13d
> [ 4569.357488] RDX: 000055638c493af0 RSI: 00000000c138fd04 RDI: 0000000000000003
> [ 4569.358140] RBP: 00007ffc36558640 R08: 00007f49e5fbc690 R09: 00007ffc365584a8
> [ 4569.358783] R10: 00007f49e5fbb97d R11: 0000000000000246 R12: 00007f49e5fbb97d
> [ 4569.359442] R13: 000055638c493ba0 R14: 00007f49e5fbb97d R15: 00007f49e5fbb97d
> [ 4569.360090]  </TASK>

Hi Ben

I can reproduce this with 6.6 too, so it's not a regression caused by
the change (stopping the sync thread asynchronously). I'm trying to
debug it and find the root cause. In 6.8 with my patch set, the logs
show it's stuck at:
wait_event(mddev->recovery_wait, !atomic_read(&mddev->recovery_active));

But raid5 conf->active_stripes is 0, so I'm still looking into why
this can happen.

Best Regards
Xiao
>
>
> >
> > I have run lvm2 regression test. There are 4 failed cases:
> > shell/dmsetup-integrity-keys.sh
> > shell/lvresize-fs-crypt.sh
> > shell/pvck-dump.sh
> > shell/select-report.sh
> >
> > Xiao Ni (4):
> >   Clear MD_RECOVERY_WAIT when stopping dmraid
> >   Set MD_RECOVERY_FROZEN before stop sync thread
> >   md: Missing decrease active_io for flush io
> >   Don't check crossing reshape when reshape hasn't started
> >
> >  drivers/md/dm-raid.c |  2 ++
> >  drivers/md/md.c      |  8 +++++++-
> >  drivers/md/raid5.c   | 22 ++++++++++------------
> >  3 files changed, 19 insertions(+), 13 deletions(-)
> >
> > --
> > 2.32.0 (Apple Git-132)
>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 3/4] md: Missing decrease active_io for flush io
  2024-02-20 15:30 ` [PATCH RFC 3/4] md: Missing decrease active_io for flush io Xiao Ni
@ 2024-02-23  3:06   ` Yu Kuai
  2024-02-23 13:49     ` Xiao Ni
  0 siblings, 1 reply; 23+ messages in thread
From: Yu Kuai @ 2024-02-23  3:06 UTC (permalink / raw)
  To: Xiao Ni, song
  Cc: yukuai1, bmarzins, heinzm, snitzer, ncroxon, neilb, linux-raid,
	dm-devel, yukuai (C)

Hi,

在 2024/02/20 23:30, Xiao Ni 写道:
> If all flush bios finish fast, it doesn't decrease active_io. And it will
> stuck when stopping array.
> 
> This can be reproduced by lvm2 test shell/integrity-caching.sh.
> But it can't reproduce 100%.
> 
> Fixes: fa2bbff7b0b4 ("md: synchronize flush io with array reconfiguration")
> Signed-off-by: Xiao Ni <xni@redhat.com>

The same patch is already applied:

855678ed8534 md: Fix missing release of 'active_io' for flush

Thanks,
Kuai

> ---
>   drivers/md/md.c | 6 +++++-
>   1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 77e3af7cf6bb..a41bed286fd2 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -579,8 +579,12 @@ static void submit_flushes(struct work_struct *ws)
>   			rcu_read_lock();
>   		}
>   	rcu_read_unlock();
> -	if (atomic_dec_and_test(&mddev->flush_pending))
> +	if (atomic_dec_and_test(&mddev->flush_pending)) {
> +		/* The pair is percpu_ref_get() from md_flush_request() */
> +		percpu_ref_put(&mddev->active_io);
> +
>   		queue_work(md_wq, &mddev->flush_work);
> +	}
>   }
>   
>   static void md_submit_flush_data(struct work_struct *ws)
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC V2 4/4] md/raid5: Don't check crossing reshape when reshape hasn't started
  2024-02-20 15:30 ` [PATCH RFC V2 4/4] md/raid5: Don't check crossing reshape when reshape hasn't started Xiao Ni
@ 2024-02-23  3:08   ` Yu Kuai
  2024-02-23 14:00     ` Xiao Ni
  0 siblings, 1 reply; 23+ messages in thread
From: Yu Kuai @ 2024-02-23  3:08 UTC (permalink / raw)
  To: Xiao Ni, song
  Cc: yukuai1, bmarzins, heinzm, snitzer, ncroxon, neilb, linux-raid,
	dm-devel, yukuai (C)

Hi,

在 2024/02/20 23:30, Xiao Ni 写道:
> stripe_ahead_of_reshape is used to check if a stripe region cross the
> reshape position. So first, change the function name to
> stripe_across_reshape to describe the usage of this function.
> 
> For reshape backwards, it starts reshape from the end of array and conf->
> reshape_progress is init to raid5_size. During reshape, if previous is true
> (set in make_stripe_request) and max_sector >= conf->reshape_progress, ios
> should wait until reshape window moves forward. But ios don't need to wait
> if max_sector is raid5_size.
> 
> And put the conditions into the function directly to make understand the
> codes easily.
> 
> This can be reproduced easily by lvm2 test shell/lvconvert-raid-reshape.sh
> For dm raid reshape, before starting sync thread, it needs to reload table
> some times. In one time dm raid uses MD_RECOVERY_WAIT to delay reshape and
> it doesn't start sync thread this time. Then one io comes in and it waits
> because stripe_ahead_of_reshape returns true because it's a backward
> reshape and max_sectors > conf->reshape_progress. But the reshape hasn't
> started. So skip this check when reshape_progress is raid5_size

As I said before, after the following merged patch:

ad39c08186f8 md: Don't register sync_thread for reshape directly

you should not see the sync_thread for reshape being registered while
MD_RECOVERY_WAIT is set, so this patch should not be needed.

Thanks,
Kuai

> 
> Fixes: 486f60558607 ("md/raid5: Check all disks in a stripe_head for reshape progress")
> Signed-off-by: Xiao Ni <xni@redhat.com>
> ---
>   drivers/md/raid5.c | 22 ++++++++++------------
>   1 file changed, 10 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 8497880135ee..4c71df4e2370 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -5832,17 +5832,12 @@ static bool ahead_of_reshape(struct mddev *mddev, sector_t sector,
>   					  sector >= reshape_sector;
>   }
>   
> -static bool range_ahead_of_reshape(struct mddev *mddev, sector_t min,
> -				   sector_t max, sector_t reshape_sector)
> -{
> -	return mddev->reshape_backwards ? max < reshape_sector :
> -					  min >= reshape_sector;
> -}
> -
> -static bool stripe_ahead_of_reshape(struct mddev *mddev, struct r5conf *conf,
> +static sector_t raid5_size(struct mddev *mddev, sector_t sectors, int raid_disks);
> +static bool stripe_across_reshape(struct mddev *mddev, struct r5conf *conf,
>   				    struct stripe_head *sh)
>   {
>   	sector_t max_sector = 0, min_sector = MaxSector;
> +	sector_t reshape_pos = 0;
>   	bool ret = false;
>   	int dd_idx;
>   
> @@ -5856,9 +5851,12 @@ static bool stripe_ahead_of_reshape(struct mddev *mddev, struct r5conf *conf,
>   
>   	spin_lock_irq(&conf->device_lock);
>   
> -	if (!range_ahead_of_reshape(mddev, min_sector, max_sector,
> -				     conf->reshape_progress))
> -		/* mismatch, need to try again */
> +	reshape_pos = conf->reshape_progress;
> +	if (mddev->reshape_backwards) {
> +		if (max_sector >= reshape_pos &&
> +		    reshape_pos != raid5_size(mddev, 0, 0))
> +			ret = true;
> +	} else if (min_sector < reshape_pos)
>   		ret = true;
>   
>   	spin_unlock_irq(&conf->device_lock);
> @@ -5969,7 +5967,7 @@ static enum stripe_result make_stripe_request(struct mddev *mddev,
>   	}
>   
>   	if (unlikely(previous) &&
> -	    stripe_ahead_of_reshape(mddev, conf, sh)) {
> +	    stripe_across_reshape(mddev, conf, sh)) {
>   		/*
>   		 * Expansion moved on while waiting for a stripe.
>   		 * Expansion could still move past after this
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 2/4] md: Set MD_RECOVERY_FROZEN before stop sync thread
  2024-02-20 15:30 ` [PATCH RFC 2/4] md: Set MD_RECOVERY_FROZEN before stop sync thread Xiao Ni
@ 2024-02-23  3:12   ` Yu Kuai
  2024-02-23  3:58     ` Song Liu
  0 siblings, 1 reply; 23+ messages in thread
From: Yu Kuai @ 2024-02-23  3:12 UTC (permalink / raw)
  To: Xiao Ni, song
  Cc: yukuai1, bmarzins, heinzm, snitzer, ncroxon, neilb, linux-raid,
	dm-devel, yukuai (C)

Hi,

在 2024/02/20 23:30, Xiao Ni 写道:
> After patch commit f52f5c71f3d4b ("md: fix stopping sync thread"), dmraid
> stops sync thread asynchronously. The calling process is:
> dev_remove->dm_destroy->__dm_destroy->raid_postsuspend->raid_dtr
> 
> raid_postsuspend does two jobs. First, it stops sync thread. Then it
> suspend array. Now it can stop sync thread successfully. But it doesn't
> set MD_RECOVERY_FROZEN. It's introduced by patch f52f5c71f3d4b. So after
> raid_postsuspend, the sync thread starts again. raid_dtr can't stop the
> sync thread because the array is already suspended.
> 
> This can be reproduced easily by those commands:
> while [ 1 ]; do
> vgcreate test_vg /dev/loop0 /dev/loop1
> lvcreate --type raid1 -L 400M -m 1 -n test_lv test_vg
> lvchange -an test_vg
> vgremove test_vg -ff
> done
> 
> Fixes: f52f5c71f3d4 ("md: fix stopping sync thread")
> Signed-off-by: Xiao Ni <xni@redhat.com>

I agree with this change, but this patch is part of my patch in the
other thread:

dm-raid: really frozen sync_thread during suspend

I still think that fixing the found problems completely is better;
however, we'll let Song make the decision.

Thanks,
Kuai

> ---
>   drivers/md/md.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 54790261254d..77e3af7cf6bb 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -6337,6 +6337,7 @@ static void __md_stop_writes(struct mddev *mddev)
>   void md_stop_writes(struct mddev *mddev)
>   {
>   	mddev_lock_nointr(mddev);
> +	set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
>   	__md_stop_writes(mddev);
>   	mddev_unlock(mddev);
>   }
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 1/4] dm-raid/md: Clear MD_RECOVERY_WAIT when stopping dmraid
  2024-02-20 15:30 ` [PATCH RFC 1/4] dm-raid/md: Clear MD_RECOVERY_WAIT when stopping dmraid Xiao Ni
@ 2024-02-23  3:31   ` Yu Kuai
  2024-02-23 13:20     ` Xiao Ni
  2024-02-23 10:31   ` Yu Kuai
  1 sibling, 1 reply; 23+ messages in thread
From: Yu Kuai @ 2024-02-23  3:31 UTC (permalink / raw)
  To: Xiao Ni, song
  Cc: yukuai1, bmarzins, heinzm, snitzer, ncroxon, neilb, linux-raid,
	dm-devel, yukuai (C)

Hi,

在 2024/02/20 23:30, Xiao Ni 写道:
> MD_RECOVERY_WAIT is used by dmraid to delay reshape process by patch
> commit 644e2537fdc7 ("dm raid: fix stripe adding reshape deadlock").
> Before patch commit f52f5c71f3d4b ("md: fix stopping sync thread")
> dmraid stopped sync thread directy by calling md_reap_sync_thread.
> After this patch dmraid stops sync thread asynchronously as md does.
> This is right. Now the dmraid stop process is like this:
> 
> 1. raid_postsuspend->md_stop_writes->__md_stop_writes->stop_sync_thread.
> stop_sync_thread sets MD_RECOVERY_INTR and wait until MD_RECOVERY_RUNNING
> is cleared
> 2. md_do_sync finds MD_RECOVERY_WAIT is set and return. (This is the
> root cause for this deadlock. We hope md_do_sync can set MD_RECOVERY_DONE)
> 3. md thread calls md_check_recovery (This is the place to reap sync
> thread. Because MD_RECOVERY_DONE is not set. md thread can't reap sync
> thread)
> 4. raid_dtr stops/free struct mddev and release dmraid related resources
> 
> dmraid only sets MD_RECOVERY_WAIT but doesn't clear it. It needs to clear
> this bit when stopping the dmraid before stopping sync thread.
> 
> But the deadlock still can happen sometimes even MD_RECOVERY_WAIT is
> cleared before stopping sync thread. It's the reason stop_sync_thread only
> wakes up task. If the task isn't running, it still needs to wake up sync
> thread too.
> 
> This deadlock can be reproduced 100% by these commands:
> modprobe brd rd_size=34816 rd_nr=5
> while [ 1 ]; do
> vgcreate test_vg /dev/ram*
> lvcreate --type raid5 -L 16M -n test_lv test_vg
> lvconvert -y --stripes 4 /dev/test_vg/test_lv
> vgremove test_vg -ff
> sleep 1
> done
> 
> Fixes: 644e2537fdc7 ("dm raid: fix stripe adding reshape deadlock")
> Fixes: f52f5c71f3d4 ("md: fix stopping sync thread")
> Signed-off-by: Xiao Ni <xni@redhat.com>

I'm not sure about this change. I think MD_RECOVERY_WAIT is hacky and
really breaks how the sync_thread works; it should just go away soon,
once we make sure a sync_thread can't be registered before pers->start()
is done.

Thanks,
Kuai
> ---
>   drivers/md/dm-raid.c | 2 ++
>   drivers/md/md.c      | 1 +
>   2 files changed, 3 insertions(+)
> 
> diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
> index eb009d6bb03a..325767c1140f 100644
> --- a/drivers/md/dm-raid.c
> +++ b/drivers/md/dm-raid.c
> @@ -3796,6 +3796,8 @@ static void raid_postsuspend(struct dm_target *ti)
>   	struct raid_set *rs = ti->private;
>   
>   	if (!test_and_set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) {
> +		if (test_bit(MD_RECOVERY_WAIT, &rs->md.recovery))
> +			clear_bit(MD_RECOVERY_WAIT, &rs->md.recovery);
>   		/* Writes have to be stopped before suspending to avoid deadlocks. */
>   		if (!test_bit(MD_RECOVERY_FROZEN, &rs->md.recovery))
>   			md_stop_writes(&rs->md);
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 2266358d8074..54790261254d 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -4904,6 +4904,7 @@ static void stop_sync_thread(struct mddev *mddev, bool locked, bool check_seq)
>   	 * never happen
>   	 */
>   	md_wakeup_thread_directly(mddev->sync_thread);
> +	md_wakeup_thread(mddev->sync_thread);
>   	if (work_pending(&mddev->sync_work))
>   		flush_work(&mddev->sync_work);
>   
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 2/4] md: Set MD_RECOVERY_FROZEN before stop sync thread
  2024-02-23  3:12   ` Yu Kuai
@ 2024-02-23  3:58     ` Song Liu
  0 siblings, 0 replies; 23+ messages in thread
From: Song Liu @ 2024-02-23  3:58 UTC (permalink / raw)
  To: Yu Kuai
  Cc: Xiao Ni, bmarzins, heinzm, snitzer, ncroxon, neilb, linux-raid,
	dm-devel, yukuai (C)

On Thu, Feb 22, 2024 at 7:12 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> 在 2024/02/20 23:30, Xiao Ni 写道:
> > After patch commit f52f5c71f3d4b ("md: fix stopping sync thread"), dmraid
> > stops sync thread asynchronously. The calling process is:
> > dev_remove->dm_destroy->__dm_destroy->raid_postsuspend->raid_dtr
> >
> > raid_postsuspend does two jobs. First, it stops sync thread. Then it
> > suspend array. Now it can stop sync thread successfully. But it doesn't
> > set MD_RECOVERY_FROZEN. It's introduced by patch f52f5c71f3d4b. So after
> > raid_postsuspend, the sync thread starts again. raid_dtr can't stop the
> > sync thread because the array is already suspended.
> >
> > This can be reproduced easily by those commands:
> > while [ 1 ]; do
> > vgcreate test_vg /dev/loop0 /dev/loop1
> > lvcreate --type raid1 -L 400M -m 1 -n test_lv test_vg
> > lvchange -an test_vg
> > vgremove test_vg -ff
> > done
> >
> > Fixes: f52f5c71f3d4 ("md: fix stopping sync thread")
> > Signed-off-by: Xiao Ni <xni@redhat.com>
>
> I agree with this change, but this patch is part of my patch in the
> other thread:
>
> dm-raid: really frozen sync_thread during suspend
>
> I still think that fix found problems completely is better, however,
> we'll let Song to make decision.

I think we still need more time (and maybe more iterations) for the
other thread, so we can ship this change sooner. We should add
Kuai's SoB here.

Thanks,
Song

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 1/4] dm-raid/md: Clear MD_RECOVERY_WAIT when stopping dmraid
  2024-02-20 15:30 ` [PATCH RFC 1/4] dm-raid/md: Clear MD_RECOVERY_WAIT when stopping dmraid Xiao Ni
  2024-02-23  3:31   ` Yu Kuai
@ 2024-02-23 10:31   ` Yu Kuai
  2024-02-23 13:40     ` Xiao Ni
  1 sibling, 1 reply; 23+ messages in thread
From: Yu Kuai @ 2024-02-23 10:31 UTC (permalink / raw)
  To: Xiao Ni, song
  Cc: yukuai1, bmarzins, heinzm, snitzer, ncroxon, neilb, linux-raid,
	dm-devel, yukuai (C)

Hi,

在 2024/02/20 23:30, Xiao Ni 写道:
> MD_RECOVERY_WAIT is used by dmraid to delay reshape process by patch
> commit 644e2537fdc7 ("dm raid: fix stripe adding reshape deadlock").
> Before patch commit f52f5c71f3d4b ("md: fix stopping sync thread")
> dmraid stopped sync thread directy by calling md_reap_sync_thread.
> After this patch dmraid stops sync thread asynchronously as md does.
> This is right. Now the dmraid stop process is like this:
> 
> 1. raid_postsuspend->md_stop_writes->__md_stop_writes->stop_sync_thread.
> stop_sync_thread sets MD_RECOVERY_INTR and wait until MD_RECOVERY_RUNNING
> is cleared
> 2. md_do_sync finds MD_RECOVERY_WAIT is set and return. (This is the
> root cause for this deadlock. We hope md_do_sync can set MD_RECOVERY_DONE)
> 3. md thread calls md_check_recovery (This is the place to reap sync
> thread. Because MD_RECOVERY_DONE is not set. md thread can't reap sync
> thread)
> 4. raid_dtr stops/free struct mddev and release dmraid related resources
> 
> dmraid only sets MD_RECOVERY_WAIT but doesn't clear it. It needs to clear
> this bit when stopping the dmraid before stopping sync thread.
> 
> But the deadlock still can happen sometimes even MD_RECOVERY_WAIT is
> cleared before stopping sync thread. It's the reason stop_sync_thread only
> wakes up task. If the task isn't running, it still needs to wake up sync
> thread too.
> 
> This deadlock can be reproduced 100% by these commands:
> modprobe brd rd_size=34816 rd_nr=5
> while [ 1 ]; do
> vgcreate test_vg /dev/ram*
> lvcreate --type raid5 -L 16M -n test_lv test_vg
> lvconvert -y --stripes 4 /dev/test_vg/test_lv
> vgremove test_vg -ff
> sleep 1
> done
> 
> Fixes: 644e2537fdc7 ("dm raid: fix stripe adding reshape deadlock")
> Fixes: f52f5c71f3d4 ("md: fix stopping sync thread")
> Signed-off-by: Xiao Ni <xni@redhat.com>
> ---
>   drivers/md/dm-raid.c | 2 ++
>   drivers/md/md.c      | 1 +
>   2 files changed, 3 insertions(+)
> 
> diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
> index eb009d6bb03a..325767c1140f 100644
> --- a/drivers/md/dm-raid.c
> +++ b/drivers/md/dm-raid.c
> @@ -3796,6 +3796,8 @@ static void raid_postsuspend(struct dm_target *ti)
>   	struct raid_set *rs = ti->private;
>   
>   	if (!test_and_set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) {
> +		if (test_bit(MD_RECOVERY_WAIT, &rs->md.recovery))
> +			clear_bit(MD_RECOVERY_WAIT, &rs->md.recovery);

Notice that 'MD_RECOVERY_WAIT' will never be cleared, hence sync_thread
will never make progress until table reload for dm-raid.

And other than stopping dm-raid, raid_postsuspend() can also be called
by an ioctl to suspend dm-raid, hence this change is wrong.

I think we can avoid ever clearing 'MD_RECOVERY_FROZEN' in this case so
that 'MD_RECOVERY_WAIT' can be removed, or use '!MD_RECOVERY_WAIT' as a
condition for registering a new sync_thread.
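
As an illustration only (register_sync_thread() below is a placeholder
for whatever the md thread actually does to start md_do_sync, not a
real function), the second option would look something like:

/* only register a new sync_thread when dm-raid is not asking md
 * to wait */
if (!test_bit(MD_RECOVERY_WAIT, &mddev->recovery))
	register_sync_thread(mddev);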

Thanks,
Kuai
>   		/* Writes have to be stopped before suspending to avoid deadlocks. */
>   		if (!test_bit(MD_RECOVERY_FROZEN, &rs->md.recovery))
>   			md_stop_writes(&rs->md);
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 2266358d8074..54790261254d 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -4904,6 +4904,7 @@ static void stop_sync_thread(struct mddev *mddev, bool locked, bool check_seq)
>   	 * never happen
>   	 */
>   	md_wakeup_thread_directly(mddev->sync_thread);
> +	md_wakeup_thread(mddev->sync_thread);
>   	if (work_pending(&mddev->sync_work))
>   		flush_work(&mddev->sync_work);
>   
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 1/4] dm-raid/md: Clear MD_RECOVERY_WAIT when stopping dmraid
  2024-02-23  3:31   ` Yu Kuai
@ 2024-02-23 13:20     ` Xiao Ni
  2024-02-26  1:31       ` Yu Kuai
  0 siblings, 1 reply; 23+ messages in thread
From: Xiao Ni @ 2024-02-23 13:20 UTC (permalink / raw)
  To: Yu Kuai
  Cc: song, bmarzins, heinzm, snitzer, ncroxon, neilb, linux-raid,
	dm-devel, yukuai (C)

On Fri, Feb 23, 2024 at 11:32 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> 在 2024/02/20 23:30, Xiao Ni 写道:
> > MD_RECOVERY_WAIT is used by dmraid to delay reshape process by patch
> > commit 644e2537fdc7 ("dm raid: fix stripe adding reshape deadlock").
> > Before patch commit f52f5c71f3d4b ("md: fix stopping sync thread")
> > dmraid stopped sync thread directy by calling md_reap_sync_thread.
> > After this patch dmraid stops sync thread asynchronously as md does.
> > This is right. Now the dmraid stop process is like this:
> >
> > 1. raid_postsuspend->md_stop_writes->__md_stop_writes->stop_sync_thread.
> > stop_sync_thread sets MD_RECOVERY_INTR and wait until MD_RECOVERY_RUNNING
> > is cleared
> > 2. md_do_sync finds MD_RECOVERY_WAIT is set and return. (This is the
> > root cause for this deadlock. We hope md_do_sync can set MD_RECOVERY_DONE)
> > 3. md thread calls md_check_recovery (This is the place to reap sync
> > thread. Because MD_RECOVERY_DONE is not set. md thread can't reap sync
> > thread)
> > 4. raid_dtr stops/free struct mddev and release dmraid related resources
> >
> > dmraid only sets MD_RECOVERY_WAIT but doesn't clear it. It needs to clear
> > this bit when stopping the dmraid before stopping sync thread.
> >
> > But the deadlock still can happen sometimes even MD_RECOVERY_WAIT is
> > cleared before stopping sync thread. It's the reason stop_sync_thread only
> > wakes up task. If the task isn't running, it still needs to wake up sync
> > thread too.
> >
> > This deadlock can be reproduced 100% by these commands:
> > modprobe brd rd_size=34816 rd_nr=5
> > while [ 1 ]; do
> > vgcreate test_vg /dev/ram*
> > lvcreate --type raid5 -L 16M -n test_lv test_vg
> > lvconvert -y --stripes 4 /dev/test_vg/test_lv
> > vgremove test_vg -ff
> > sleep 1
> > done
> >
> > Fixes: 644e2537fdc7 ("dm raid: fix stripe adding reshape deadlock")
> > Fixes: f52f5c71f3d4 ("md: fix stopping sync thread")
> > Signed-off-by: Xiao Ni <xni@redhat.com>
>
> I'm not sure about this change, I think MD_RECOVERY_WAIT is hacky and
> really breaks how sync_thread is working, it should just go away soon,
> once we make sure sync_thread can't be registered before pers->start()
> is done.

Hi Kuai

I just want to get to a stable state without changing any existing
logic. After fixing these regressions, we can consider other changes
(e.g. removing MD_RECOVERY_WAIT, or allowing the sync thread to
start/stop while the array is suspended). I talked with Heinz
yesterday: for dm-raid, no io is allowed while the array is suspended,
including the sync thread.

Regards
Xiao
>
> Thanks,
> Kuai
> > ---
> >   drivers/md/dm-raid.c | 2 ++
> >   drivers/md/md.c      | 1 +
> >   2 files changed, 3 insertions(+)
> >
> > diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
> > index eb009d6bb03a..325767c1140f 100644
> > --- a/drivers/md/dm-raid.c
> > +++ b/drivers/md/dm-raid.c
> > @@ -3796,6 +3796,8 @@ static void raid_postsuspend(struct dm_target *ti)
> >       struct raid_set *rs = ti->private;
> >
> >       if (!test_and_set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) {
> > +             if (test_bit(MD_RECOVERY_WAIT, &rs->md.recovery))
> > +                     clear_bit(MD_RECOVERY_WAIT, &rs->md.recovery);
> >               /* Writes have to be stopped before suspending to avoid deadlocks. */
> >               if (!test_bit(MD_RECOVERY_FROZEN, &rs->md.recovery))
> >                       md_stop_writes(&rs->md);
> > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > index 2266358d8074..54790261254d 100644
> > --- a/drivers/md/md.c
> > +++ b/drivers/md/md.c
> > @@ -4904,6 +4904,7 @@ static void stop_sync_thread(struct mddev *mddev, bool locked, bool check_seq)
> >        * never happen
> >        */
> >       md_wakeup_thread_directly(mddev->sync_thread);
> > +     md_wakeup_thread(mddev->sync_thread);
> >       if (work_pending(&mddev->sync_work))
> >               flush_work(&mddev->sync_work);
> >
> >
>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 1/4] dm-raid/md: Clear MD_RECOVERY_WAIT when stopping dmraid
  2024-02-23 10:31   ` Yu Kuai
@ 2024-02-23 13:40     ` Xiao Ni
  0 siblings, 0 replies; 23+ messages in thread
From: Xiao Ni @ 2024-02-23 13:40 UTC (permalink / raw)
  To: Yu Kuai
  Cc: song, bmarzins, heinzm, snitzer, ncroxon, neilb, linux-raid,
	dm-devel, yukuai (C)

On Fri, Feb 23, 2024 at 6:31 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> 在 2024/02/20 23:30, Xiao Ni 写道:
> > MD_RECOVERY_WAIT is used by dmraid to delay reshape process by patch
> > commit 644e2537fdc7 ("dm raid: fix stripe adding reshape deadlock").
> > Before patch commit f52f5c71f3d4b ("md: fix stopping sync thread")
> > dmraid stopped sync thread directy by calling md_reap_sync_thread.
> > After this patch dmraid stops sync thread asynchronously as md does.
> > This is right. Now the dmraid stop process is like this:
> >
> > 1. raid_postsuspend->md_stop_writes->__md_stop_writes->stop_sync_thread.
> > stop_sync_thread sets MD_RECOVERY_INTR and wait until MD_RECOVERY_RUNNING
> > is cleared
> > 2. md_do_sync finds MD_RECOVERY_WAIT is set and return. (This is the
> > root cause for this deadlock. We hope md_do_sync can set MD_RECOVERY_DONE)
> > 3. md thread calls md_check_recovery (This is the place to reap sync
> > thread. Because MD_RECOVERY_DONE is not set. md thread can't reap sync
> > thread)
> > 4. raid_dtr stops/free struct mddev and release dmraid related resources
> >
> > dmraid only sets MD_RECOVERY_WAIT but doesn't clear it. It needs to clear
> > this bit when stopping the dmraid before stopping sync thread.
> >
> > But the deadlock still can happen sometimes even MD_RECOVERY_WAIT is
> > cleared before stopping sync thread. It's the reason stop_sync_thread only
> > wakes up task. If the task isn't running, it still needs to wake up sync
> > thread too.
> >
> > This deadlock can be reproduced 100% by these commands:
> > modprobe brd rd_size=34816 rd_nr=5
> > while [ 1 ]; do
> > vgcreate test_vg /dev/ram*
> > lvcreate --type raid5 -L 16M -n test_lv test_vg
> > lvconvert -y --stripes 4 /dev/test_vg/test_lv
> > vgremove test_vg -ff
> > sleep 1
> > done
> >
> > Fixes: 644e2537fdc7 ("dm raid: fix stripe adding reshape deadlock")
> > Fixes: f52f5c71f3d4 ("md: fix stopping sync thread")
> > Signed-off-by: Xiao Ni <xni@redhat.com>
> > ---
> >   drivers/md/dm-raid.c | 2 ++
> >   drivers/md/md.c      | 1 +
> >   2 files changed, 3 insertions(+)
> >
> > diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
> > index eb009d6bb03a..325767c1140f 100644
> > --- a/drivers/md/dm-raid.c
> > +++ b/drivers/md/dm-raid.c
> > @@ -3796,6 +3796,8 @@ static void raid_postsuspend(struct dm_target *ti)
> >       struct raid_set *rs = ti->private;
> >
> >       if (!test_and_set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) {
> > +             if (test_bit(MD_RECOVERY_WAIT, &rs->md.recovery))
> > +                     clear_bit(MD_RECOVERY_WAIT, &rs->md.recovery);
>
> Notice that 'MD_RECOVERY_WAIT' will never be cleared, hence sync_thread
> will never make progress until table reload for dm-raid.

Hi Kuai

After this patch, it indeed fixes the problem mentioned in this patch,
so the bit can be cleared before stopping the sync thread. I don't know
why you say it can never be cleared.
>
> And other than stopping dm-raid, raid_postsuspend() call also be called
> by ioctl to suspend dm-raid, hence this change is wrong.

From code review, raid_postsuspend is used to stop the sync thread and
suspend the array. Maybe I don't understand it right. If I'm right, it
needs to clear MD_RECOVERY_WAIT before stopping the sync thread.

>
> I think we can never clear 'MD_RECOVERY_FROZEN' in this case so that
> 'MD_RECOVERY_WAIT' can be removed, or use '!MD_RECOVERY_WAIT' as a
> condition to register new sync_thread.

I don't understand you here. From debugging, there are three table
reloads during a dmraid reshape. In the second reload, it doesn't want
to start the sync thread; that's why it sets MD_RECOVERY_WAIT, because
dmraid is still not ready. In the third reload, it stops the mddev
created in the second reload and creates a new mddev. During this
process, MD_RECOVERY_WAIT keeps working until the mddev created in the
second reload is suspended. It has no relationship with
MD_RECOVERY_FROZEN.

Regards
Xiao
>
> Thanks,
> Kuai
> >               /* Writes have to be stopped before suspending to avoid deadlocks. */
> >               if (!test_bit(MD_RECOVERY_FROZEN, &rs->md.recovery))
> >                       md_stop_writes(&rs->md);
> > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > index 2266358d8074..54790261254d 100644
> > --- a/drivers/md/md.c
> > +++ b/drivers/md/md.c
> > @@ -4904,6 +4904,7 @@ static void stop_sync_thread(struct mddev *mddev, bool locked, bool check_seq)
> >        * never happen
> >        */
> >       md_wakeup_thread_directly(mddev->sync_thread);
> > +     md_wakeup_thread(mddev->sync_thread);
> >       if (work_pending(&mddev->sync_work))
> >               flush_work(&mddev->sync_work);
> >
> >
>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 3/4] md: Missing decrease active_io for flush io
  2024-02-23  3:06   ` Yu Kuai
@ 2024-02-23 13:49     ` Xiao Ni
  0 siblings, 0 replies; 23+ messages in thread
From: Xiao Ni @ 2024-02-23 13:49 UTC (permalink / raw)
  To: Yu Kuai
  Cc: song, bmarzins, heinzm, snitzer, ncroxon, neilb, linux-raid,
	dm-devel, yukuai (C)

On Fri, Feb 23, 2024 at 11:06 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> 在 2024/02/20 23:30, Xiao Ni 写道:
> > If all flush bios finish fast, it doesn't decrease active_io. And it will
> > stuck when stopping array.
> >
> > This can be reproduced by lvm2 test shell/integrity-caching.sh.
> > But it can't reproduce 100%.
> >
> > Fixes: fa2bbff7b0b4 ("md: synchronize flush io with array reconfiguration")
> > Signed-off-by: Xiao Ni <xni@redhat.com>
>
> Same patch is already applied:
>
> 855678ed8534 md: Fix missing release of 'active_io' for flush

Thanks for the information. I used Linus's tree so I missed this one.

Regards
Xiao
>
> Thanks,
> Kuai
>
> > ---
> >   drivers/md/md.c | 6 +++++-
> >   1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > index 77e3af7cf6bb..a41bed286fd2 100644
> > --- a/drivers/md/md.c
> > +++ b/drivers/md/md.c
> > @@ -579,8 +579,12 @@ static void submit_flushes(struct work_struct *ws)
> >                       rcu_read_lock();
> >               }
> >       rcu_read_unlock();
> > -     if (atomic_dec_and_test(&mddev->flush_pending))
> > +     if (atomic_dec_and_test(&mddev->flush_pending)) {
> > +             /* The pair is percpu_ref_get() from md_flush_request() */
> > +             percpu_ref_put(&mddev->active_io);
> > +
> >               queue_work(md_wq, &mddev->flush_work);
> > +     }
> >   }
> >
> >   static void md_submit_flush_data(struct work_struct *ws)
> >
>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC V2 4/4] md/raid5: Don't check crossing reshape when reshape hasn't started
  2024-02-23  3:08   ` Yu Kuai
@ 2024-02-23 14:00     ` Xiao Ni
  0 siblings, 0 replies; 23+ messages in thread
From: Xiao Ni @ 2024-02-23 14:00 UTC (permalink / raw)
  To: Yu Kuai
  Cc: song, bmarzins, heinzm, snitzer, ncroxon, neilb, linux-raid,
	dm-devel, yukuai (C)

On Fri, Feb 23, 2024 at 11:09 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> 在 2024/02/20 23:30, Xiao Ni 写道:
> > stripe_ahead_of_reshape is used to check if a stripe region cross the
> > reshape position. So first, change the function name to
> > stripe_across_reshape to describe the usage of this function.
> >
> > For reshape backwards, it starts reshape from the end of array and conf->
> > reshape_progress is init to raid5_size. During reshape, if previous is true
> > (set in make_stripe_request) and max_sector >= conf->reshape_progress, ios
> > should wait until reshape window moves forward. But ios don't need to wait
> > if max_sector is raid5_size.
> >
> > And put the conditions into the function directly to make understand the
> > codes easily.
> >
> > This can be reproduced easily by lvm2 test shell/lvconvert-raid-reshape.sh
> > For dm raid reshape, before starting sync thread, it needs to reload table
> > some times. In one time dm raid uses MD_RECOVERY_WAIT to delay reshape and
> > it doesn't start sync thread this time. Then one io comes in and it waits
> > because stripe_ahead_of_reshape returns true because it's a backward
> > reshape and max_sectors > conf->reshape_progress. But the reshape hasn't
> > started. So skip this check when reshape_progress is raid5_size
>
> Like I said before, after following merged patch:
>
> ad39c08186f8 md: Don't register sync_thread for reshape directly

Hi Kuai

The reason I sent this patch set is that I don't agree with patch 01 of
your patch set. Does the patch (md: Don't register sync_thread for
reshape directly) rely on patch 01 from your patch set? Your patch 01
only changes one piece of logic and the other patches rely on it. That's
the reason I can't accept the following patches.

Regards
Xiao
>
> You should not see that sync_thread fo reshape is registered while
> MD_RECOVERY_WAIT is set, so this patch should not be needed.
>
> Thanks,
> Kuai
>
> >
> > Fixes: 486f60558607 ("md/raid5: Check all disks in a stripe_head for reshape progress")
> > Signed-off-by: Xiao Ni <xni@redhat.com>
> > ---
> >   drivers/md/raid5.c | 22 ++++++++++------------
> >   1 file changed, 10 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> > index 8497880135ee..4c71df4e2370 100644
> > --- a/drivers/md/raid5.c
> > +++ b/drivers/md/raid5.c
> > @@ -5832,17 +5832,12 @@ static bool ahead_of_reshape(struct mddev *mddev, sector_t sector,
> >                                         sector >= reshape_sector;
> >   }
> >
> > -static bool range_ahead_of_reshape(struct mddev *mddev, sector_t min,
> > -                                sector_t max, sector_t reshape_sector)
> > -{
> > -     return mddev->reshape_backwards ? max < reshape_sector :
> > -                                       min >= reshape_sector;
> > -}
> > -
> > -static bool stripe_ahead_of_reshape(struct mddev *mddev, struct r5conf *conf,
> > +static sector_t raid5_size(struct mddev *mddev, sector_t sectors, int raid_disks);
> > +static bool stripe_across_reshape(struct mddev *mddev, struct r5conf *conf,
> >                                   struct stripe_head *sh)
> >   {
> >       sector_t max_sector = 0, min_sector = MaxSector;
> > +     sector_t reshape_pos = 0;
> >       bool ret = false;
> >       int dd_idx;
> >
> > @@ -5856,9 +5851,12 @@ static bool stripe_ahead_of_reshape(struct mddev *mddev, struct r5conf *conf,
> >
> >       spin_lock_irq(&conf->device_lock);
> >
> > -     if (!range_ahead_of_reshape(mddev, min_sector, max_sector,
> > -                                  conf->reshape_progress))
> > -             /* mismatch, need to try again */
> > +     reshape_pos = conf->reshape_progress;
> > +     if (mddev->reshape_backwards) {
> > +             if (max_sector >= reshape_pos &&
> > +                 reshape_pos != raid5_size(mddev, 0, 0))
> > +                     ret = true;
> > +     } else if (min_sector < reshape_pos)
> >               ret = true;
> >
> >       spin_unlock_irq(&conf->device_lock);
> > @@ -5969,7 +5967,7 @@ static enum stripe_result make_stripe_request(struct mddev *mddev,
> >       }
> >
> >       if (unlikely(previous) &&
> > -         stripe_ahead_of_reshape(mddev, conf, sh)) {
> > +         stripe_across_reshape(mddev, conf, sh)) {
> >               /*
> >                * Expansion moved on while waiting for a stripe.
> >                * Expansion could still move past after this
> >
>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 1/4] dm-raid/md: Clear MD_RECOVERY_WAIT when stopping dmraid
  2024-02-23 13:20     ` Xiao Ni
@ 2024-02-26  1:31       ` Yu Kuai
  2024-02-26  5:12         ` Xiao Ni
  0 siblings, 1 reply; 23+ messages in thread
From: Yu Kuai @ 2024-02-26  1:31 UTC (permalink / raw)
  To: Xiao Ni, Yu Kuai
  Cc: song, bmarzins, heinzm, snitzer, ncroxon, neilb, linux-raid,
	dm-devel, yukuai (C)

Hi,

On 2024/02/23 21:20, Xiao Ni wrote:
> On Fri, Feb 23, 2024 at 11:32 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>> Hi,
>>
>> 在 2024/02/20 23:30, Xiao Ni 写道:
>>> MD_RECOVERY_WAIT is used by dmraid to delay reshape process by patch
>>> commit 644e2537fdc7 ("dm raid: fix stripe adding reshape deadlock").
>>> Before patch commit f52f5c71f3d4b ("md: fix stopping sync thread")
>>> dmraid stopped sync thread directy by calling md_reap_sync_thread.
>>> After this patch dmraid stops sync thread asynchronously as md does.
>>> This is right. Now the dmraid stop process is like this:
>>>
>>> 1. raid_postsuspend->md_stop_writes->__md_stop_writes->stop_sync_thread.
>>> stop_sync_thread sets MD_RECOVERY_INTR and wait until MD_RECOVERY_RUNNING
>>> is cleared
>>> 2. md_do_sync finds MD_RECOVERY_WAIT is set and return. (This is the
>>> root cause for this deadlock. We hope md_do_sync can set MD_RECOVERY_DONE)
>>> 3. md thread calls md_check_recovery (This is the place to reap sync
>>> thread. Because MD_RECOVERY_DONE is not set. md thread can't reap sync
>>> thread)
>>> 4. raid_dtr stops/free struct mddev and release dmraid related resources
>>>
>>> dmraid only sets MD_RECOVERY_WAIT but doesn't clear it. It needs to clear
>>> this bit when stopping the dmraid before stopping sync thread.
>>>
>>> But the deadlock still can happen sometimes even MD_RECOVERY_WAIT is
>>> cleared before stopping sync thread. It's the reason stop_sync_thread only
>>> wakes up task. If the task isn't running, it still needs to wake up sync
>>> thread too.
>>>
>>> This deadlock can be reproduced 100% by these commands:
>>> modprobe brd rd_size=34816 rd_nr=5
>>> while [ 1 ]; do
>>> vgcreate test_vg /dev/ram*
>>> lvcreate --type raid5 -L 16M -n test_lv test_vg
>>> lvconvert -y --stripes 4 /dev/test_vg/test_lv
>>> vgremove test_vg -ff
>>> sleep 1
>>> done
>>>
>>> Fixes: 644e2537fdc7 ("dm raid: fix stripe adding reshape deadlock")
>>> Fixes: f52f5c71f3d4 ("md: fix stopping sync thread")
>>> Signed-off-by: Xiao Ni <xni@redhat.com>
>>
>> I'm not sure about this change, I think MD_RECOVERY_WAIT is hacky and
>> really breaks how sync_thread is working, it should just go away soon,
>> once we make sure sync_thread can't be registered before pers->start()
>> is done.
> 
> Hi Kuai
> 
> I just want to get to a stable state without changing any existing
> logic. After fixing these regressions, we can consider other changes.
> (e.g. remove MD_RECOVERY_WAIT. allow sync thread start/stop when array
> is suspend. )  I talked with Heinz yesterday, for dm-raid, it can't
> allow any io to happen including sync thread when array is suspended.

So, are you still thinking that my patchset will allow this for dm-raid?

I have already explained at length why patch 1 from my set is okay, and why
the set doesn't introduce any behaviour change, unlike what you said in your patch 0:

"Kuai's patch set breaks some rules".

The only thing that will change is that for md/raid, sync_thread can
start for a suspended array. However, I don't think this will be a problem,
because sync_thread can already be running for a suspended array, and
'MD_RECOVERY_FROZEN' is already used to prevent sync_thread from starting.

Thanks,
Kuai

> 
> Regards
> Xiao
>>
>> Thanks,
>> Kuai
>>> ---
>>>    drivers/md/dm-raid.c | 2 ++
>>>    drivers/md/md.c      | 1 +
>>>    2 files changed, 3 insertions(+)
>>>
>>> diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
>>> index eb009d6bb03a..325767c1140f 100644
>>> --- a/drivers/md/dm-raid.c
>>> +++ b/drivers/md/dm-raid.c
>>> @@ -3796,6 +3796,8 @@ static void raid_postsuspend(struct dm_target *ti)
>>>        struct raid_set *rs = ti->private;
>>>
>>>        if (!test_and_set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) {
>>> +             if (test_bit(MD_RECOVERY_WAIT, &rs->md.recovery))
>>> +                     clear_bit(MD_RECOVERY_WAIT, &rs->md.recovery);
>>>                /* Writes have to be stopped before suspending to avoid deadlocks. */
>>>                if (!test_bit(MD_RECOVERY_FROZEN, &rs->md.recovery))
>>>                        md_stop_writes(&rs->md);
>>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>>> index 2266358d8074..54790261254d 100644
>>> --- a/drivers/md/md.c
>>> +++ b/drivers/md/md.c
>>> @@ -4904,6 +4904,7 @@ static void stop_sync_thread(struct mddev *mddev, bool locked, bool check_seq)
>>>         * never happen
>>>         */
>>>        md_wakeup_thread_directly(mddev->sync_thread);
>>> +     md_wakeup_thread(mddev->sync_thread);
>>>        if (work_pending(&mddev->sync_work))
>>>                flush_work(&mddev->sync_work);
>>>
>>>
>>
> 
> .
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 1/4] dm-raid/md: Clear MD_RECOVERY_WAIT when stopping dmraid
  2024-02-26  1:31       ` Yu Kuai
@ 2024-02-26  5:12         ` Xiao Ni
  2024-02-26  9:36           ` Yu Kuai
  0 siblings, 1 reply; 23+ messages in thread
From: Xiao Ni @ 2024-02-26  5:12 UTC (permalink / raw)
  To: Yu Kuai
  Cc: song, bmarzins, heinzm, snitzer, ncroxon, neilb, linux-raid,
	dm-devel, yukuai (C)

On Mon, Feb 26, 2024 at 9:31 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> 在 2024/02/23 21:20, Xiao Ni 写道:
> > On Fri, Feb 23, 2024 at 11:32 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> >>
> >> Hi,
> >>
> >> 在 2024/02/20 23:30, Xiao Ni 写道:
> >>> MD_RECOVERY_WAIT is used by dmraid to delay reshape process by patch
> >>> commit 644e2537fdc7 ("dm raid: fix stripe adding reshape deadlock").
> >>> Before patch commit f52f5c71f3d4b ("md: fix stopping sync thread")
> >>> dmraid stopped sync thread directy by calling md_reap_sync_thread.
> >>> After this patch dmraid stops sync thread asynchronously as md does.
> >>> This is right. Now the dmraid stop process is like this:
> >>>
> >>> 1. raid_postsuspend->md_stop_writes->__md_stop_writes->stop_sync_thread.
> >>> stop_sync_thread sets MD_RECOVERY_INTR and wait until MD_RECOVERY_RUNNING
> >>> is cleared
> >>> 2. md_do_sync finds MD_RECOVERY_WAIT is set and return. (This is the
> >>> root cause for this deadlock. We hope md_do_sync can set MD_RECOVERY_DONE)
> >>> 3. md thread calls md_check_recovery (This is the place to reap sync
> >>> thread. Because MD_RECOVERY_DONE is not set. md thread can't reap sync
> >>> thread)
> >>> 4. raid_dtr stops/free struct mddev and release dmraid related resources
> >>>
> >>> dmraid only sets MD_RECOVERY_WAIT but doesn't clear it. It needs to clear
> >>> this bit when stopping the dmraid before stopping sync thread.
> >>>
> >>> But the deadlock still can happen sometimes even MD_RECOVERY_WAIT is
> >>> cleared before stopping sync thread. It's the reason stop_sync_thread only
> >>> wakes up task. If the task isn't running, it still needs to wake up sync
> >>> thread too.
> >>>
> >>> This deadlock can be reproduced 100% by these commands:
> >>> modprobe brd rd_size=34816 rd_nr=5
> >>> while [ 1 ]; do
> >>> vgcreate test_vg /dev/ram*
> >>> lvcreate --type raid5 -L 16M -n test_lv test_vg
> >>> lvconvert -y --stripes 4 /dev/test_vg/test_lv
> >>> vgremove test_vg -ff
> >>> sleep 1
> >>> done
> >>>
> >>> Fixes: 644e2537fdc7 ("dm raid: fix stripe adding reshape deadlock")
> >>> Fixes: f52f5c71f3d4 ("md: fix stopping sync thread")
> >>> Signed-off-by: Xiao Ni <xni@redhat.com>
> >>
> >> I'm not sure about this change, I think MD_RECOVERY_WAIT is hacky and
> >> really breaks how sync_thread is working, it should just go away soon,
> >> once we make sure sync_thread can't be registered before pers->start()
> >> is done.
> >
> > Hi Kuai
> >
> > I just want to get to a stable state without changing any existing
> > logic. After fixing these regressions, we can consider other changes.
> > (e.g. remove MD_RECOVERY_WAIT. allow sync thread start/stop when array
> > is suspend. )  I talked with Heinz yesterday, for dm-raid, it can't
> > allow any io to happen including sync thread when array is suspended.
>

Hi Kuai

> So, are you still thinking that my patchset will allow this for dm-raid?
>
> I already explain a lot why patch 1 from my set is okay, and why the set
> doesn't introduce any behaviour change like you said in this patch 0:


I'll read all your patches to make sure I understand your approach.

But as I have also mentioned many times, we're fixing regression problems,
and it's better to fix them with the smallest possible change. As you can
see, my patch set fixes these regressions with small changes (sorry, I
didn't notice that your patch set has some changes which are the same as
mine). So why do we need such a big change to fix the regressions? With my
patch set I can now reproduce, via the lvm2 test suite, a problem that also
happens in 6.6. That means this patch set brings us back to the same state
as 6.6.
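
For reference, a single lvm2 test case such as shell/lvconvert-raid-reshape.sh
(mentioned earlier in this thread) can be run on its own from an lvm2 source
tree. The invocation below is a sketch from memory of the lvm2 test harness,
not something taken from this thread:

# from a configured and built lvm2 source tree
make -C test check_local T=lvconvert-raid-reshape   # T= selects tests by name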



>
> "Kuai's patch set breaks some rules".
>
> The only thing that will change is that for md/raid, sync_thrad can
> start for suspended array, however, I don't think this will be a problem
> because sync_thread can be running for suspended array already, and
> 'MD_RECOVERY_FROZEN' is already used to prevent sync_thread to start.

We can't allow the sync thread to run for dm-raid while it is suspended,
because dm-raid needs to switch tables while suspended; that's a basic
design rule. If it can happen now, we should fix it.

Best Regards
Xiao
>
> Thanks,
> Kuai
>
> >
> > Regards
> > Xiao
> >>
> >> Thanks,
> >> Kuai
> >>> ---
> >>>    drivers/md/dm-raid.c | 2 ++
> >>>    drivers/md/md.c      | 1 +
> >>>    2 files changed, 3 insertions(+)
> >>>
> >>> diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
> >>> index eb009d6bb03a..325767c1140f 100644
> >>> --- a/drivers/md/dm-raid.c
> >>> +++ b/drivers/md/dm-raid.c
> >>> @@ -3796,6 +3796,8 @@ static void raid_postsuspend(struct dm_target *ti)
> >>>        struct raid_set *rs = ti->private;
> >>>
> >>>        if (!test_and_set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) {
> >>> +             if (test_bit(MD_RECOVERY_WAIT, &rs->md.recovery))
> >>> +                     clear_bit(MD_RECOVERY_WAIT, &rs->md.recovery);
> >>>                /* Writes have to be stopped before suspending to avoid deadlocks. */
> >>>                if (!test_bit(MD_RECOVERY_FROZEN, &rs->md.recovery))
> >>>                        md_stop_writes(&rs->md);
> >>> diff --git a/drivers/md/md.c b/drivers/md/md.c
> >>> index 2266358d8074..54790261254d 100644
> >>> --- a/drivers/md/md.c
> >>> +++ b/drivers/md/md.c
> >>> @@ -4904,6 +4904,7 @@ static void stop_sync_thread(struct mddev *mddev, bool locked, bool check_seq)
> >>>         * never happen
> >>>         */
> >>>        md_wakeup_thread_directly(mddev->sync_thread);
> >>> +     md_wakeup_thread(mddev->sync_thread);
> >>>        if (work_pending(&mddev->sync_work))
> >>>                flush_work(&mddev->sync_work);
> >>>
> >>>
> >>
> >
> > .
> >
>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 1/4] dm-raid/md: Clear MD_RECOVERY_WAIT when stopping dmraid
  2024-02-26  5:12         ` Xiao Ni
@ 2024-02-26  9:36           ` Yu Kuai
  2024-02-27  7:16             ` Xiao Ni
  2024-03-01  4:18             ` Xiao Ni
  0 siblings, 2 replies; 23+ messages in thread
From: Yu Kuai @ 2024-02-26  9:36 UTC (permalink / raw)
  To: Xiao Ni, Yu Kuai
  Cc: song, bmarzins, heinzm, snitzer, ncroxon, neilb, linux-raid,
	dm-devel, yukuai (C)

Hi,

On 2024/02/26 13:12, Xiao Ni wrote:
> On Mon, Feb 26, 2024 at 9:31 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>> Hi,
>>
>> 在 2024/02/23 21:20, Xiao Ni 写道:
>>> On Fri, Feb 23, 2024 at 11:32 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> 在 2024/02/20 23:30, Xiao Ni 写道:
>>>>> MD_RECOVERY_WAIT is used by dmraid to delay reshape process by patch
>>>>> commit 644e2537fdc7 ("dm raid: fix stripe adding reshape deadlock").
>>>>> Before patch commit f52f5c71f3d4b ("md: fix stopping sync thread")
>>>>> dmraid stopped sync thread directy by calling md_reap_sync_thread.
>>>>> After this patch dmraid stops sync thread asynchronously as md does.
>>>>> This is right. Now the dmraid stop process is like this:
>>>>>
>>>>> 1. raid_postsuspend->md_stop_writes->__md_stop_writes->stop_sync_thread.
>>>>> stop_sync_thread sets MD_RECOVERY_INTR and wait until MD_RECOVERY_RUNNING
>>>>> is cleared
>>>>> 2. md_do_sync finds MD_RECOVERY_WAIT is set and return. (This is the
>>>>> root cause for this deadlock. We hope md_do_sync can set MD_RECOVERY_DONE)
>>>>> 3. md thread calls md_check_recovery (This is the place to reap sync
>>>>> thread. Because MD_RECOVERY_DONE is not set. md thread can't reap sync
>>>>> thread)
>>>>> 4. raid_dtr stops/free struct mddev and release dmraid related resources
>>>>>
>>>>> dmraid only sets MD_RECOVERY_WAIT but doesn't clear it. It needs to clear
>>>>> this bit when stopping the dmraid before stopping sync thread.
>>>>>
>>>>> But the deadlock still can happen sometimes even MD_RECOVERY_WAIT is
>>>>> cleared before stopping sync thread. It's the reason stop_sync_thread only
>>>>> wakes up task. If the task isn't running, it still needs to wake up sync
>>>>> thread too.
>>>>>
>>>>> This deadlock can be reproduced 100% by these commands:
>>>>> modprobe brd rd_size=34816 rd_nr=5
>>>>> while [ 1 ]; do
>>>>> vgcreate test_vg /dev/ram*
>>>>> lvcreate --type raid5 -L 16M -n test_lv test_vg
>>>>> lvconvert -y --stripes 4 /dev/test_vg/test_lv
>>>>> vgremove test_vg -ff
>>>>> sleep 1
>>>>> done
>>>>>
>>>>> Fixes: 644e2537fdc7 ("dm raid: fix stripe adding reshape deadlock")
>>>>> Fixes: f52f5c71f3d4 ("md: fix stopping sync thread")
>>>>> Signed-off-by: Xiao Ni <xni@redhat.com>
>>>>
>>>> I'm not sure about this change, I think MD_RECOVERY_WAIT is hacky and
>>>> really breaks how sync_thread is working, it should just go away soon,
>>>> once we make sure sync_thread can't be registered before pers->start()
>>>> is done.
>>>
>>> Hi Kuai
>>>
>>> I just want to get to a stable state without changing any existing
>>> logic. After fixing these regressions, we can consider other changes.
>>> (e.g. remove MD_RECOVERY_WAIT. allow sync thread start/stop when array
>>> is suspend. )  I talked with Heinz yesterday, for dm-raid, it can't
>>> allow any io to happen including sync thread when array is suspended.
>>
> 
> Hi Kuai
> 
>> So, are you still thinking that my patchset will allow this for dm-raid?
>>
>> I already explain a lot why patch 1 from my set is okay, and why the set
>> doesn't introduce any behaviour change like you said in this patch 0:
> 
> 
> I'll read all your patches to understand you well.
> 
> But as I mentioned many times too, we're fixing regression problems.
> It's better for us to fix them with the smallest change. As you can
> see, in my patch set, we can fix these regression problems with small
> changes (Sorry I didn't notice your patch set has some changes which
> are the same with mine).  So why don't we need such a big change to
> fix the regression problems? Now with my patch set I can reproduce a
> problem by lvm2 test suit which happens in 6.6 too. It means with this
> patch set we can back to a state same with 6.6.

Regarding complexity, I agree that we can go back to the same state as v6.6
and then fix other problems on top of that. I don't have a preference here;
I'll post my patchset anyhow. But beyond the test suite, you still need to
make sure nothing is broken from a big-picture point of view.

For example, in the following 3 cases, can MD_RECOVERY_WAIT be cleared as
expected, and is a new sync_thread guaranteed not to start after a reload?

1)
ioctl suspend
ioctl reload
ioctl resume

2)
ioctl reload
ioctl resume

3)
ioctl suspend
// without a reload.
ioctl resume
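
For illustration, a rough dmsetup sketch of these three sequences. This is
only a sketch under assumptions: an existing dm-raid mapped device named
"test_vg-test_lv" and a prepared table file "new_table.txt" containing the
reshaped raid target line (the table contents themselves are omitted):

# case 1: suspend, reload, resume
dmsetup suspend test_vg-test_lv               # flush in-flight I/O and queue new I/O
dmsetup reload test_vg-test_lv new_table.txt  # load the new table into the inactive slot
dmsetup resume test_vg-test_lv                # swap tables and restart I/O

# case 2: reload, resume (no explicit suspend first)
dmsetup reload test_vg-test_lv new_table.txt
dmsetup resume test_vg-test_lv                # the resume swaps in the loaded table

# case 3: suspend, then resume without a reload
dmsetup suspend test_vg-test_lv
dmsetup resume test_vg-test_lv                # no inactive table, the old raid_set stays active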

> 
> 
> 
>>
>> "Kuai's patch set breaks some rules".
>>
>> The only thing that will change is that for md/raid, sync_thrad can
>> start for suspended array, however, I don't think this will be a problem
>> because sync_thread can be running for suspended array already, and
>> 'MD_RECOVERY_FROZEN' is already used to prevent sync_thread to start.
> 
> We can't allow sync thread happen for dmraid when it's suspended.
> Because it needs to switch table when suspended. It's a base design.
> If it can happen now. We should fix this.

Yes, and my patchset also fixes this, along with other problems related to
sync_thread, by managing sync_thread the same way as md/raid.

BTW, on top of my patchset I have already made some changes locally to
expand MD_RECOVERY_FROZEN and remove MD_RECOVERY_WAIT. If you're going to
review my patchset, you can take a look at the following change later; I
have already tested it.

Thanks,
Kuai

diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index e1b3b1917627..a0c8a5b92aab 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -213,6 +213,7 @@ struct raid_dev {
  #define RT_FLAG_RS_IN_SYNC             6
  #define RT_FLAG_RS_RESYNCING           7
  #define RT_FLAG_RS_GROW                        8
+#define RT_FLAG_WAIT_RELOAD            9

  /* Array elements of 64 bit needed for rebuild/failed disk bits */
  #define DISKS_ARRAY_ELEMS ((MAX_RAID_DEVICES + (sizeof(uint64_t) * 8 - 1)) / sizeof(uint64_t) / 8)
@@ -3728,6 +3729,9 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv,
         if (test_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags))
                 return -EBUSY;

+       if (test_bit(RT_FLAG_WAIT_RELOAD, &rs->runtime_flags))
+               return -EINVAL;
+
         if (!strcasecmp(argv[0], "frozen")) {
                 ret = mddev_lock(mddev);
                 if (ret)
@@ -3809,21 +3813,25 @@ static void raid_presuspend(struct dm_target *ti)
         struct raid_set *rs = ti->private;
         struct mddev *mddev = &rs->md;

-       mddev_lock_nointr(mddev);
-       md_frozen_sync_thread(mddev);
-       mddev_unlock(mddev);
+       if (!test_bit(RT_FLAG_WAIT_RELOAD, &rs->runtime_flags)) {
+               mddev_lock_nointr(mddev);
+               md_frozen_sync_thread(mddev);
+               mddev_unlock(mddev);

-       if (mddev->pers && mddev->pers->prepare_suspend)
-               mddev->pers->prepare_suspend(mddev);
+               if (mddev->pers && mddev->pers->prepare_suspend)
+                       mddev->pers->prepare_suspend(mddev);
+       }
  }

  static void raid_presuspend_undo(struct dm_target *ti)
  {
         struct raid_set *rs = ti->private;

-       mddev_lock_nointr(&rs->md);
-       md_unfrozen_sync_thread(&rs->md);
-       mddev_unlock(&rs->md);
+       if (!test_bit(RT_FLAG_WAIT_RELOAD, &rs->runtime_flags)) {
+               mddev_lock_nointr(&rs->md);
+               md_unfrozen_sync_thread(&rs->md);
+               mddev_unlock(&rs->md);
+       }
  }

  static void raid_postsuspend(struct dm_target *ti)
@@ -3958,9 +3966,6 @@ static int rs_start_reshape(struct raid_set *rs)
         struct mddev *mddev = &rs->md;
         struct md_personality *pers = mddev->pers;

-       /* Don't allow the sync thread to work until the table gets reloaded. */
-       set_bit(MD_RECOVERY_WAIT, &mddev->recovery);
-
         r = rs_setup_reshape(rs);
         if (r)
                 return r;
@@ -4055,6 +4060,7 @@ static int raid_preresume(struct dm_target *ti)
                 /* Initiate a reshape. */
                 rs_set_rdev_sectors(rs);
                 mddev_lock_nointr(mddev);
+               set_bit(RT_FLAG_WAIT_RELOAD, &rs->runtime_flags);
                 r = rs_start_reshape(rs);
                 mddev_unlock(mddev);
                 if (r)
@@ -4089,7 +4095,8 @@ static void raid_resume(struct dm_target *ti)
                 mddev_lock_nointr(mddev);
                 mddev->ro = 0;
                 mddev->in_sync = 0;
-               md_unfrozen_sync_thread(mddev);
+               if (!test_bit(RT_FLAG_WAIT_RELOAD, &rs->runtime_flags))
+                       md_unfrozen_sync_thread(mddev);
                 mddev_unlock(mddev);
         }
  }
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 81e6f49a14fc..595b1fbdce20 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6064,7 +6064,8 @@ int md_run(struct mddev *mddev)
                         pr_warn("True protection against single-disk 
failure might be compromised.\n");
         }

-       mddev->recovery = 0;
+       /* dm-raid expect sync_thread to be frozen until resume */
+       mddev->recovery &= BIT_ULL_MASK(MD_RECOVERY_FROZEN);
         /* may be over-ridden by personality */
         mddev->resync_max_sectors = mddev->dev_sectors;

@@ -6237,10 +6238,8 @@ int md_start(struct mddev *mddev)
         int ret = 0;

         if (mddev->pers->start) {
-               set_bit(MD_RECOVERY_WAIT, &mddev->recovery);
                 ret = mddev->pers->start(mddev);
-               clear_bit(MD_RECOVERY_WAIT, &mddev->recovery);
-               md_wakeup_thread(mddev->sync_thread);
+               WARN_ON_ONCE(test_bit(MD_RECOVERY_RUNNING, &mddev->recovery));
         }
         return ret;
  }
@@ -8827,8 +8825,7 @@ void md_do_sync(struct md_thread *thread)
         if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
                 goto skip;

-       if (test_bit(MD_RECOVERY_WAIT, &mddev->recovery) ||
-           !md_is_rdwr(mddev)) {/* never try to sync a read-only array */
+       if (!md_is_rdwr(mddev)) {/* never try to sync a read-only array */
                 set_bit(MD_RECOVERY_INTR, &mddev->recovery);
                 goto skip;
         }
diff --git a/drivers/md/md.h b/drivers/md/md.h
index c17b7e68c533..eb00cdadc9c5 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -554,7 +554,6 @@ enum recovery_flags {
         MD_RECOVERY_RESHAPE,    /* A reshape is happening */
         MD_RECOVERY_FROZEN,     /* User request to abort, and not restart, any action */
         MD_RECOVERY_ERROR,      /* sync-action interrupted because io-error */
-       MD_RECOVERY_WAIT,       /* waiting for pers->start() to finish */
         MD_RESYNCING_REMOTE,    /* remote node is running resync thread */
  };

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index acce2868e491..530a7f0b120b 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -5919,7 +5919,6 @@ static int add_all_stripe_bios(struct r5conf *conf,
  static bool reshape_disabled(struct mddev *mddev)
  {
         return !md_is_rdwr(mddev) ||
-              test_bit(MD_RECOVERY_WAIT, &mddev->recovery) ||
                test_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
  }

> 
> Best Regards
> Xiao
>>
>> Thanks,
>> Kuai
>>
>>>
>>> Regards
>>> Xiao
>>>>
>>>> Thanks,
>>>> Kuai
>>>>> ---
>>>>>     drivers/md/dm-raid.c | 2 ++
>>>>>     drivers/md/md.c      | 1 +
>>>>>     2 files changed, 3 insertions(+)
>>>>>
>>>>> diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
>>>>> index eb009d6bb03a..325767c1140f 100644
>>>>> --- a/drivers/md/dm-raid.c
>>>>> +++ b/drivers/md/dm-raid.c
>>>>> @@ -3796,6 +3796,8 @@ static void raid_postsuspend(struct dm_target *ti)
>>>>>         struct raid_set *rs = ti->private;
>>>>>
>>>>>         if (!test_and_set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) {
>>>>> +             if (test_bit(MD_RECOVERY_WAIT, &rs->md.recovery))
>>>>> +                     clear_bit(MD_RECOVERY_WAIT, &rs->md.recovery);
>>>>>                 /* Writes have to be stopped before suspending to avoid deadlocks. */
>>>>>                 if (!test_bit(MD_RECOVERY_FROZEN, &rs->md.recovery))
>>>>>                         md_stop_writes(&rs->md);
>>>>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>>>>> index 2266358d8074..54790261254d 100644
>>>>> --- a/drivers/md/md.c
>>>>> +++ b/drivers/md/md.c
>>>>> @@ -4904,6 +4904,7 @@ static void stop_sync_thread(struct mddev *mddev, bool locked, bool check_seq)
>>>>>          * never happen
>>>>>          */
>>>>>         md_wakeup_thread_directly(mddev->sync_thread);
>>>>> +     md_wakeup_thread(mddev->sync_thread);
>>>>>         if (work_pending(&mddev->sync_work))
>>>>>                 flush_work(&mddev->sync_work);
>>>>>
>>>>>
>>>>
>>>
>>> .
>>>
>>
> 
> .
> 


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 1/4] dm-raid/md: Clear MD_RECOVERY_WAIT when stopping dmraid
  2024-02-26  9:36           ` Yu Kuai
@ 2024-02-27  7:16             ` Xiao Ni
  2024-02-27  7:39               ` Yu Kuai
  2024-03-01  4:18             ` Xiao Ni
  1 sibling, 1 reply; 23+ messages in thread
From: Xiao Ni @ 2024-02-27  7:16 UTC (permalink / raw)
  To: Yu Kuai
  Cc: song, bmarzins, heinzm, snitzer, ncroxon, neilb, linux-raid,
	dm-devel, yukuai (C)

On Mon, Feb 26, 2024 at 5:36 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> 在 2024/02/26 13:12, Xiao Ni 写道:
> > On Mon, Feb 26, 2024 at 9:31 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> >>
> >> Hi,
> >>
> >> 在 2024/02/23 21:20, Xiao Ni 写道:
> >>> On Fri, Feb 23, 2024 at 11:32 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> 在 2024/02/20 23:30, Xiao Ni 写道:
> >>>>> MD_RECOVERY_WAIT is used by dmraid to delay reshape process by patch
> >>>>> commit 644e2537fdc7 ("dm raid: fix stripe adding reshape deadlock").
> >>>>> Before patch commit f52f5c71f3d4b ("md: fix stopping sync thread")
> >>>>> dmraid stopped sync thread directy by calling md_reap_sync_thread.
> >>>>> After this patch dmraid stops sync thread asynchronously as md does.
> >>>>> This is right. Now the dmraid stop process is like this:
> >>>>>
> >>>>> 1. raid_postsuspend->md_stop_writes->__md_stop_writes->stop_sync_thread.
> >>>>> stop_sync_thread sets MD_RECOVERY_INTR and wait until MD_RECOVERY_RUNNING
> >>>>> is cleared
> >>>>> 2. md_do_sync finds MD_RECOVERY_WAIT is set and return. (This is the
> >>>>> root cause for this deadlock. We hope md_do_sync can set MD_RECOVERY_DONE)
> >>>>> 3. md thread calls md_check_recovery (This is the place to reap sync
> >>>>> thread. Because MD_RECOVERY_DONE is not set. md thread can't reap sync
> >>>>> thread)
> >>>>> 4. raid_dtr stops/free struct mddev and release dmraid related resources
> >>>>>
> >>>>> dmraid only sets MD_RECOVERY_WAIT but doesn't clear it. It needs to clear
> >>>>> this bit when stopping the dmraid before stopping sync thread.
> >>>>>
> >>>>> But the deadlock still can happen sometimes even MD_RECOVERY_WAIT is
> >>>>> cleared before stopping sync thread. It's the reason stop_sync_thread only
> >>>>> wakes up task. If the task isn't running, it still needs to wake up sync
> >>>>> thread too.
> >>>>>
> >>>>> This deadlock can be reproduced 100% by these commands:
> >>>>> modprobe brd rd_size=34816 rd_nr=5
> >>>>> while [ 1 ]; do
> >>>>> vgcreate test_vg /dev/ram*
> >>>>> lvcreate --type raid5 -L 16M -n test_lv test_vg
> >>>>> lvconvert -y --stripes 4 /dev/test_vg/test_lv
> >>>>> vgremove test_vg -ff
> >>>>> sleep 1
> >>>>> done
> >>>>>
> >>>>> Fixes: 644e2537fdc7 ("dm raid: fix stripe adding reshape deadlock")
> >>>>> Fixes: f52f5c71f3d4 ("md: fix stopping sync thread")
> >>>>> Signed-off-by: Xiao Ni <xni@redhat.com>
> >>>>
> >>>> I'm not sure about this change, I think MD_RECOVERY_WAIT is hacky and
> >>>> really breaks how sync_thread is working, it should just go away soon,
> >>>> once we make sure sync_thread can't be registered before pers->start()
> >>>> is done.
> >>>
> >>> Hi Kuai
> >>>
> >>> I just want to get to a stable state without changing any existing
> >>> logic. After fixing these regressions, we can consider other changes.
> >>> (e.g. remove MD_RECOVERY_WAIT. allow sync thread start/stop when array
> >>> is suspend. )  I talked with Heinz yesterday, for dm-raid, it can't
> >>> allow any io to happen including sync thread when array is suspended.
> >>
> >
> > Hi Kuai
> >
> >> So, are you still thinking that my patchset will allow this for dm-raid?
> >>
> >> I already explain a lot why patch 1 from my set is okay, and why the set
> >> doesn't introduce any behaviour change like you said in this patch 0:
> >
> >
> > I'll read all your patches to understand you well.
> >
> > But as I mentioned many times too, we're fixing regression problems.
> > It's better for us to fix them with the smallest change. As you can
> > see, in my patch set, we can fix these regression problems with small
> > changes (Sorry I didn't notice your patch set has some changes which
> > are the same with mine).  So why don't we need such a big change to
> > fix the regression problems? Now with my patch set I can reproduce a
> > problem by lvm2 test suit which happens in 6.6 too. It means with this
> > patch set we can back to a state same with 6.6.

Hi Kuai

>
> For complexity, I agree that we can go back to the same state with v6.6,
> and then fix other problems on the top of that. I don't have preference
> for this, I'll post my patchset anyhow. But other than the test suite,
> you still need to make sure nothing is broken from the big picture.

Thanks very much for this. I'm glad that we can reach an agreement :)
What is the preference you mean? The branch with my patch set? I'll
send a formal patch set later.

>
> For example, in the following 3 cases, can MD_RECOVERY_WAIT be cleared
> as expected, and new sync_thread will not start after a reload?
>
> 1)
> ioctl suspend
> ioctl reload
> ioctl resume
>
> 2)
> ioctl reload
> ioctl resume
>
> 3)
> ioctl suspend
> // without a reload.
> ioctl resume

Are they all dmsetup message commands? MD_RECOVERY_WAIT is used to
delay the start of reshape; the sync thread (reshape) will start once
MD_RECOVERY_WAIT is cleared. I did my tests with lvm commands, so I still
need to check the three cases mentioned above.
>
> >
> >
> >
> >>
> >> "Kuai's patch set breaks some rules".
> >>
> >> The only thing that will change is that for md/raid, sync_thrad can
> >> start for suspended array, however, I don't think this will be a problem
> >> because sync_thread can be running for suspended array already, and
> >> 'MD_RECOVERY_FROZEN' is already used to prevent sync_thread to start.
> >
> > We can't allow sync thread happen for dmraid when it's suspended.
> > Because it needs to switch table when suspended. It's a base design.
> > If it can happen now. We should fix this.
>
> Yes, and my patchset also fix this, and other problems related to
> sync_thread by managing sync_thread the same as md/raid.
>
> BTW, on the top my patchset, I already made some change locally to
> expand MD_RECOVERY_FROZEN and remove MD_RECOVERY_WAIT, if you're going
> to review my patchset, you can take a look at following change later,
> I already tested the change.

We can work on this in the future and fix the problems one by one. Please
try to make each patch set resolve one problem; that makes review easier.

Best Regards
Xiao

>
> Thanks,
> Kuai
>
> diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
> index e1b3b1917627..a0c8a5b92aab 100644
> --- a/drivers/md/dm-raid.c
> +++ b/drivers/md/dm-raid.c
> @@ -213,6 +213,7 @@ struct raid_dev {
>   #define RT_FLAG_RS_IN_SYNC             6
>   #define RT_FLAG_RS_RESYNCING           7
>   #define RT_FLAG_RS_GROW                        8
> +#define RT_FLAG_WAIT_RELOAD            9
>
>   /* Array elements of 64 bit needed for rebuild/failed disk bits */
>   #define DISKS_ARRAY_ELEMS ((MAX_RAID_DEVICES + (sizeof(uint64_t) * 8 -
> 1)) / sizeof(uint64_t) / 8)
> @@ -3728,6 +3729,9 @@ static int raid_message(struct dm_target *ti,
> unsigned int argc, char **argv,
>          if (test_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags))
>                  return -EBUSY;
>
> +       if (test_bit(RT_FLAG_WAIT_RELOAD, &rs->runtime_flags))
> +               return -EINVAL;
> +
>          if (!strcasecmp(argv[0], "frozen")) {
>                  ret = mddev_lock(mddev);
>                  if (ret)
> @@ -3809,21 +3813,25 @@ static void raid_presuspend(struct dm_target *ti)
>          struct raid_set *rs = ti->private;
>          struct mddev *mddev = &rs->md;
>
> -       mddev_lock_nointr(mddev);
> -       md_frozen_sync_thread(mddev);
> -       mddev_unlock(mddev);
> +       if (!test_bit(RT_FLAG_WAIT_RELOAD, &rs->runtime_flags)) {
> +               mddev_lock_nointr(mddev);
> +               md_frozen_sync_thread(mddev);
> +               mddev_unlock(mddev);
>
> -       if (mddev->pers && mddev->pers->prepare_suspend)
> -               mddev->pers->prepare_suspend(mddev);
> +               if (mddev->pers && mddev->pers->prepare_suspend)
> +                       mddev->pers->prepare_suspend(mddev);
> +       }
>   }
>
>   static void raid_presuspend_undo(struct dm_target *ti)
>   {
>          struct raid_set *rs = ti->private;
>
> -       mddev_lock_nointr(&rs->md);
> -       md_unfrozen_sync_thread(&rs->md);
> -       mddev_unlock(&rs->md);
> +       if (!test_bit(RT_FLAG_WAIT_RELOAD, &rs->runtime_flags)) {
> +               mddev_lock_nointr(&rs->md);
> +               md_unfrozen_sync_thread(&rs->md);
> +               mddev_unlock(&rs->md);
> +       }
>   }
>
>   static void raid_postsuspend(struct dm_target *ti)
> @@ -3958,9 +3966,6 @@ static int rs_start_reshape(struct raid_set *rs)
>          struct mddev *mddev = &rs->md;
>          struct md_personality *pers = mddev->pers;
>
> -       /* Don't allow the sync thread to work until the table gets
> reloaded. */
> -       set_bit(MD_RECOVERY_WAIT, &mddev->recovery);
> -
>          r = rs_setup_reshape(rs);
>          if (r)
>                  return r;
> @@ -4055,6 +4060,7 @@ static int raid_preresume(struct dm_target *ti)
>                  /* Initiate a reshape. */
>                  rs_set_rdev_sectors(rs);
>                  mddev_lock_nointr(mddev);
> +               set_bit(RT_FLAG_WAIT_RELOAD, &rs->runtime_flags);
>                  r = rs_start_reshape(rs);
>                  mddev_unlock(mddev);
>                  if (r)
> @@ -4089,7 +4095,8 @@ static void raid_resume(struct dm_target *ti)
>                  mddev_lock_nointr(mddev);
>                  mddev->ro = 0;
>                  mddev->in_sync = 0;
> -               md_unfrozen_sync_thread(mddev);
> +               if (!test_bit(RT_FLAG_WAIT_RELOAD, &rs->runtime_flags))
> +                       md_unfrozen_sync_thread(mddev);
>                  mddev_unlock(mddev);
>          }
>   }
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 81e6f49a14fc..595b1fbdce20 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -6064,7 +6064,8 @@ int md_run(struct mddev *mddev)
>                          pr_warn("True protection against single-disk
> failure might be compromised.\n");
>          }
>
> -       mddev->recovery = 0;
> +       /* dm-raid expect sync_thread to be frozen until resume */
> +       mddev->recovery &= BIT_ULL_MASK(MD_RECOVERY_FROZEN);
>          /* may be over-ridden by personality */
>          mddev->resync_max_sectors = mddev->dev_sectors;
>
> @@ -6237,10 +6238,8 @@ int md_start(struct mddev *mddev)
>          int ret = 0;
>
>          if (mddev->pers->start) {
> -               set_bit(MD_RECOVERY_WAIT, &mddev->recovery);
>                  ret = mddev->pers->start(mddev);
> -               clear_bit(MD_RECOVERY_WAIT, &mddev->recovery);
> -               md_wakeup_thread(mddev->sync_thread);
> +               WARN_ON_ONCE(test_bit(MD_RECOVERY_RUNNING,
> &mddev->recovery));
>          }
>          return ret;
>   }
> @@ -8827,8 +8825,7 @@ void md_do_sync(struct md_thread *thread)
>          if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
>                  goto skip;
>
> -       if (test_bit(MD_RECOVERY_WAIT, &mddev->recovery) ||
> -           !md_is_rdwr(mddev)) {/* never try to sync a read-only array */
> +       if (!md_is_rdwr(mddev)) {/* never try to sync a read-only array */
>                  set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>                  goto skip;
>          }
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index c17b7e68c533..eb00cdadc9c5 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -554,7 +554,6 @@ enum recovery_flags {
>          MD_RECOVERY_RESHAPE,    /* A reshape is happening */
>          MD_RECOVERY_FROZEN,     /* User request to abort, and not
> restart, any action */
>          MD_RECOVERY_ERROR,      /* sync-action interrupted because
> io-error */
> -       MD_RECOVERY_WAIT,       /* waiting for pers->start() to finish */
>          MD_RESYNCING_REMOTE,    /* remote node is running resync thread */
>   };
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index acce2868e491..530a7f0b120b 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -5919,7 +5919,6 @@ static int add_all_stripe_bios(struct r5conf *conf,
>   static bool reshape_disabled(struct mddev *mddev)
>   {
>          return !md_is_rdwr(mddev) ||
> -              test_bit(MD_RECOVERY_WAIT, &mddev->recovery) ||
>                 test_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
>   }
>
> >
> > Best Regards
> > Xiao
> >>
> >> Thanks,
> >> Kuai
> >>
> >>>
> >>> Regards
> >>> Xiao
> >>>>
> >>>> Thanks,
> >>>> Kuai
> >>>>> ---
> >>>>>     drivers/md/dm-raid.c | 2 ++
> >>>>>     drivers/md/md.c      | 1 +
> >>>>>     2 files changed, 3 insertions(+)
> >>>>>
> >>>>> diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
> >>>>> index eb009d6bb03a..325767c1140f 100644
> >>>>> --- a/drivers/md/dm-raid.c
> >>>>> +++ b/drivers/md/dm-raid.c
> >>>>> @@ -3796,6 +3796,8 @@ static void raid_postsuspend(struct dm_target *ti)
> >>>>>         struct raid_set *rs = ti->private;
> >>>>>
> >>>>>         if (!test_and_set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) {
> >>>>> +             if (test_bit(MD_RECOVERY_WAIT, &rs->md.recovery))
> >>>>> +                     clear_bit(MD_RECOVERY_WAIT, &rs->md.recovery);
> >>>>>                 /* Writes have to be stopped before suspending to avoid deadlocks. */
> >>>>>                 if (!test_bit(MD_RECOVERY_FROZEN, &rs->md.recovery))
> >>>>>                         md_stop_writes(&rs->md);
> >>>>> diff --git a/drivers/md/md.c b/drivers/md/md.c
> >>>>> index 2266358d8074..54790261254d 100644
> >>>>> --- a/drivers/md/md.c
> >>>>> +++ b/drivers/md/md.c
> >>>>> @@ -4904,6 +4904,7 @@ static void stop_sync_thread(struct mddev *mddev, bool locked, bool check_seq)
> >>>>>          * never happen
> >>>>>          */
> >>>>>         md_wakeup_thread_directly(mddev->sync_thread);
> >>>>> +     md_wakeup_thread(mddev->sync_thread);
> >>>>>         if (work_pending(&mddev->sync_work))
> >>>>>                 flush_work(&mddev->sync_work);
> >>>>>
> >>>>>
> >>>>
> >>>
> >>> .
> >>>
> >>
> >
> > .
> >
>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 1/4] dm-raid/md: Clear MD_RECOVERY_WAIT when stopping dmraid
  2024-02-27  7:16             ` Xiao Ni
@ 2024-02-27  7:39               ` Yu Kuai
  0 siblings, 0 replies; 23+ messages in thread
From: Yu Kuai @ 2024-02-27  7:39 UTC (permalink / raw)
  To: Xiao Ni, Yu Kuai
  Cc: song, bmarzins, heinzm, snitzer, ncroxon, neilb, linux-raid,
	dm-devel, yukuai (C)

Hi,

On 2024/02/27 15:16, Xiao Ni wrote:
> On Mon, Feb 26, 2024 at 5:36 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>> Hi,
>>
>> 在 2024/02/26 13:12, Xiao Ni 写道:
>>> On Mon, Feb 26, 2024 at 9:31 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> 在 2024/02/23 21:20, Xiao Ni 写道:
>>>>> On Fri, Feb 23, 2024 at 11:32 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> 在 2024/02/20 23:30, Xiao Ni 写道:
>>>>>>> MD_RECOVERY_WAIT is used by dmraid to delay reshape process by patch
>>>>>>> commit 644e2537fdc7 ("dm raid: fix stripe adding reshape deadlock").
>>>>>>> Before patch commit f52f5c71f3d4b ("md: fix stopping sync thread")
>>>>>>> dmraid stopped sync thread directy by calling md_reap_sync_thread.
>>>>>>> After this patch dmraid stops sync thread asynchronously as md does.
>>>>>>> This is right. Now the dmraid stop process is like this:
>>>>>>>
>>>>>>> 1. raid_postsuspend->md_stop_writes->__md_stop_writes->stop_sync_thread.
>>>>>>> stop_sync_thread sets MD_RECOVERY_INTR and wait until MD_RECOVERY_RUNNING
>>>>>>> is cleared
>>>>>>> 2. md_do_sync finds MD_RECOVERY_WAIT is set and return. (This is the
>>>>>>> root cause for this deadlock. We hope md_do_sync can set MD_RECOVERY_DONE)
>>>>>>> 3. md thread calls md_check_recovery (This is the place to reap sync
>>>>>>> thread. Because MD_RECOVERY_DONE is not set. md thread can't reap sync
>>>>>>> thread)
>>>>>>> 4. raid_dtr stops/free struct mddev and release dmraid related resources
>>>>>>>
>>>>>>> dmraid only sets MD_RECOVERY_WAIT but doesn't clear it. It needs to clear
>>>>>>> this bit when stopping the dmraid before stopping sync thread.
>>>>>>>
>>>>>>> But the deadlock still can happen sometimes even MD_RECOVERY_WAIT is
>>>>>>> cleared before stopping sync thread. It's the reason stop_sync_thread only
>>>>>>> wakes up task. If the task isn't running, it still needs to wake up sync
>>>>>>> thread too.
>>>>>>>
>>>>>>> This deadlock can be reproduced 100% by these commands:
>>>>>>> modprobe brd rd_size=34816 rd_nr=5
>>>>>>> while [ 1 ]; do
>>>>>>> vgcreate test_vg /dev/ram*
>>>>>>> lvcreate --type raid5 -L 16M -n test_lv test_vg
>>>>>>> lvconvert -y --stripes 4 /dev/test_vg/test_lv
>>>>>>> vgremove test_vg -ff
>>>>>>> sleep 1
>>>>>>> done
>>>>>>>
>>>>>>> Fixes: 644e2537fdc7 ("dm raid: fix stripe adding reshape deadlock")
>>>>>>> Fixes: f52f5c71f3d4 ("md: fix stopping sync thread")
>>>>>>> Signed-off-by: Xiao Ni <xni@redhat.com>
>>>>>>
>>>>>> I'm not sure about this change, I think MD_RECOVERY_WAIT is hacky and
>>>>>> really breaks how sync_thread is working, it should just go away soon,
>>>>>> once we make sure sync_thread can't be registered before pers->start()
>>>>>> is done.
>>>>>
>>>>> Hi Kuai
>>>>>
>>>>> I just want to get to a stable state without changing any existing
>>>>> logic. After fixing these regressions, we can consider other changes.
>>>>> (e.g. remove MD_RECOVERY_WAIT. allow sync thread start/stop when array
>>>>> is suspend. )  I talked with Heinz yesterday, for dm-raid, it can't
>>>>> allow any io to happen including sync thread when array is suspended.
>>>>
>>>
>>> Hi Kuai
>>>
>>>> So, are you still thinking that my patchset will allow this for dm-raid?
>>>>
>>>> I already explain a lot why patch 1 from my set is okay, and why the set
>>>> doesn't introduce any behaviour change like you said in this patch 0:
>>>
>>>
>>> I'll read all your patches to understand you well.
>>>
>>> But as I mentioned many times too, we're fixing regression problems.
>>> It's better for us to fix them with the smallest change. As you can
>>> see, in my patch set, we can fix these regression problems with small
>>> changes (Sorry I didn't notice your patch set has some changes which
>>> are the same with mine).  So why don't we need such a big change to
>>> fix the regression problems? Now with my patch set I can reproduce a
>>> problem by lvm2 test suit which happens in 6.6 too. It means with this
>>> patch set we can back to a state same with 6.6.
> 
> Hi Kuai
> 
>>
>> For complexity, I agree that we can go back to the same state with v6.6,
>> and then fix other problems on the top of that. I don't have preference
>> for this, I'll post my patchset anyhow. But other than the test suite,
>> you still need to make sure nothing is broken from the big picture.
> 
> Thanks very much for this. I'm glad that we can reach an agreement :)
> What's the preference you mean? The branch with my patch set? I'll
> send a formal patch set later.
> 

I don't have a preference... that will depend on the dm-raid maintainer and
Song.

>>
>> For example, in the following 3 cases, can MD_RECOVERY_WAIT be cleared
>> as expected, and new sync_thread will not start after a reload?
>>
>> 1)
>> ioctl suspend
>> ioctl reload
>> ioctl resume
>>
>> 2)
>> ioctl reload
>> ioctl resume
>>
>> 3)
>> ioctl suspend
>> // without a reload.
>> ioctl resume
> 
> Are they all dmsetup message commands? MD_RECOVERY_WAIT is used to
> delay reshape to start. The sync thread (reshape) will start once
> MD_RECOVERY_WAIT is cleared. I did tests with lvm commands. For the
> three cases mentioned above, I need to check.

They are dm ioctls, and you can use the dmsetup suspend/reload/resume
commands as well.

 From what I see (with some debugging), once MD_RECOVERY_WAIT is set, dm-raid
never clears it; it relies on a reload to allocate a new raid_set without
the flag, and dm_swap_table() from the later resume then replaces the old
raid_set with this new one.

That's why I think this patch can work for cases 1) and 2) above; however,
in case 3), MD_RECOVERY_WAIT is cleared without a reload, and starting
reshape this way can corrupt data.

And by the way, for cases 1) and 2), in the race window between clearing
MD_RECOVERY_WAIT and mddev_suspend(), reshape can still start concurrently,
so this is still not a complete fix.

Thanks,
Kuai
>>
>>>
>>>
>>>
>>>>
>>>> "Kuai's patch set breaks some rules".
>>>>
>>>> The only thing that will change is that for md/raid, sync_thrad can
>>>> start for suspended array, however, I don't think this will be a problem
>>>> because sync_thread can be running for suspended array already, and
>>>> 'MD_RECOVERY_FROZEN' is already used to prevent sync_thread to start.
>>>
>>> We can't allow sync thread happen for dmraid when it's suspended.
>>> Because it needs to switch table when suspended. It's a base design.
>>> If it can happen now. We should fix this.
>>
>> Yes, and my patchset also fix this, and other problems related to
>> sync_thread by managing sync_thread the same as md/raid.
>>
>> BTW, on the top my patchset, I already made some change locally to
>> expand MD_RECOVERY_FROZEN and remove MD_RECOVERY_WAIT, if you're going
>> to review my patchset, you can take a look at following change later,
>> I already tested the change.
> 
> We can work on this in the future to fix problems one by one. Please
> try to make one patch set to resolve one problem. It's easy for
> review.
> 
> Best Regards
> Xiao
> 
>>
>> Thanks,
>> Kuai
>>
>> diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
>> index e1b3b1917627..a0c8a5b92aab 100644
>> --- a/drivers/md/dm-raid.c
>> +++ b/drivers/md/dm-raid.c
>> @@ -213,6 +213,7 @@ struct raid_dev {
>>    #define RT_FLAG_RS_IN_SYNC             6
>>    #define RT_FLAG_RS_RESYNCING           7
>>    #define RT_FLAG_RS_GROW                        8
>> +#define RT_FLAG_WAIT_RELOAD            9
>>
>>    /* Array elements of 64 bit needed for rebuild/failed disk bits */
>>    #define DISKS_ARRAY_ELEMS ((MAX_RAID_DEVICES + (sizeof(uint64_t) * 8 -
>> 1)) / sizeof(uint64_t) / 8)
>> @@ -3728,6 +3729,9 @@ static int raid_message(struct dm_target *ti,
>> unsigned int argc, char **argv,
>>           if (test_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags))
>>                   return -EBUSY;
>>
>> +       if (test_bit(RT_FLAG_WAIT_RELOAD, &rs->runtime_flags))
>> +               return -EINVAL;
>> +
>>           if (!strcasecmp(argv[0], "frozen")) {
>>                   ret = mddev_lock(mddev);
>>                   if (ret)
>> @@ -3809,21 +3813,25 @@ static void raid_presuspend(struct dm_target *ti)
>>           struct raid_set *rs = ti->private;
>>           struct mddev *mddev = &rs->md;
>>
>> -       mddev_lock_nointr(mddev);
>> -       md_frozen_sync_thread(mddev);
>> -       mddev_unlock(mddev);
>> +       if (!test_bit(RT_FLAG_WAIT_RELOAD, &rs->runtime_flags)) {
>> +               mddev_lock_nointr(mddev);
>> +               md_frozen_sync_thread(mddev);
>> +               mddev_unlock(mddev);
>>
>> -       if (mddev->pers && mddev->pers->prepare_suspend)
>> -               mddev->pers->prepare_suspend(mddev);
>> +               if (mddev->pers && mddev->pers->prepare_suspend)
>> +                       mddev->pers->prepare_suspend(mddev);
>> +       }
>>    }
>>
>>    static void raid_presuspend_undo(struct dm_target *ti)
>>    {
>>           struct raid_set *rs = ti->private;
>>
>> -       mddev_lock_nointr(&rs->md);
>> -       md_unfrozen_sync_thread(&rs->md);
>> -       mddev_unlock(&rs->md);
>> +       if (!test_bit(RT_FLAG_WAIT_RELOAD, &rs->runtime_flags)) {
>> +               mddev_lock_nointr(&rs->md);
>> +               md_unfrozen_sync_thread(&rs->md);
>> +               mddev_unlock(&rs->md);
>> +       }
>>    }
>>
>>    static void raid_postsuspend(struct dm_target *ti)
>> @@ -3958,9 +3966,6 @@ static int rs_start_reshape(struct raid_set *rs)
>>           struct mddev *mddev = &rs->md;
>>           struct md_personality *pers = mddev->pers;
>>
>> -       /* Don't allow the sync thread to work until the table gets
>> reloaded. */
>> -       set_bit(MD_RECOVERY_WAIT, &mddev->recovery);
>> -
>>           r = rs_setup_reshape(rs);
>>           if (r)
>>                   return r;
>> @@ -4055,6 +4060,7 @@ static int raid_preresume(struct dm_target *ti)
>>                   /* Initiate a reshape. */
>>                   rs_set_rdev_sectors(rs);
>>                   mddev_lock_nointr(mddev);
>> +               set_bit(RT_FLAG_WAIT_RELOAD, &rs->runtime_flags);
>>                   r = rs_start_reshape(rs);
>>                   mddev_unlock(mddev);
>>                   if (r)
>> @@ -4089,7 +4095,8 @@ static void raid_resume(struct dm_target *ti)
>>                   mddev_lock_nointr(mddev);
>>                   mddev->ro = 0;
>>                   mddev->in_sync = 0;
>> -               md_unfrozen_sync_thread(mddev);
>> +               if (!test_bit(RT_FLAG_WAIT_RELOAD, &rs->runtime_flags))
>> +                       md_unfrozen_sync_thread(mddev);
>>                   mddev_unlock(mddev);
>>           }
>>    }
>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>> index 81e6f49a14fc..595b1fbdce20 100644
>> --- a/drivers/md/md.c
>> +++ b/drivers/md/md.c
>> @@ -6064,7 +6064,8 @@ int md_run(struct mddev *mddev)
>>                           pr_warn("True protection against single-disk
>> failure might be compromised.\n");
>>           }
>>
>> -       mddev->recovery = 0;
>> +       /* dm-raid expect sync_thread to be frozen until resume */
>> +       mddev->recovery &= BIT_ULL_MASK(MD_RECOVERY_FROZEN);
>>           /* may be over-ridden by personality */
>>           mddev->resync_max_sectors = mddev->dev_sectors;
>>
>> @@ -6237,10 +6238,8 @@ int md_start(struct mddev *mddev)
>>           int ret = 0;
>>
>>           if (mddev->pers->start) {
>> -               set_bit(MD_RECOVERY_WAIT, &mddev->recovery);
>>                   ret = mddev->pers->start(mddev);
>> -               clear_bit(MD_RECOVERY_WAIT, &mddev->recovery);
>> -               md_wakeup_thread(mddev->sync_thread);
>> +               WARN_ON_ONCE(test_bit(MD_RECOVERY_RUNNING,
>> &mddev->recovery));
>>           }
>>           return ret;
>>    }
>> @@ -8827,8 +8825,7 @@ void md_do_sync(struct md_thread *thread)
>>           if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
>>                   goto skip;
>>
>> -       if (test_bit(MD_RECOVERY_WAIT, &mddev->recovery) ||
>> -           !md_is_rdwr(mddev)) {/* never try to sync a read-only array */
>> +       if (!md_is_rdwr(mddev)) {/* never try to sync a read-only array */
>>                   set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>>                   goto skip;
>>           }
>> diff --git a/drivers/md/md.h b/drivers/md/md.h
>> index c17b7e68c533..eb00cdadc9c5 100644
>> --- a/drivers/md/md.h
>> +++ b/drivers/md/md.h
>> @@ -554,7 +554,6 @@ enum recovery_flags {
>>           MD_RECOVERY_RESHAPE,    /* A reshape is happening */
>>           MD_RECOVERY_FROZEN,     /* User request to abort, and not
>> restart, any action */
>>           MD_RECOVERY_ERROR,      /* sync-action interrupted because
>> io-error */
>> -       MD_RECOVERY_WAIT,       /* waiting for pers->start() to finish */
>>           MD_RESYNCING_REMOTE,    /* remote node is running resync thread */
>>    };
>>
>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
>> index acce2868e491..530a7f0b120b 100644
>> --- a/drivers/md/raid5.c
>> +++ b/drivers/md/raid5.c
>> @@ -5919,7 +5919,6 @@ static int add_all_stripe_bios(struct r5conf *conf,
>>    static bool reshape_disabled(struct mddev *mddev)
>>    {
>>           return !md_is_rdwr(mddev) ||
>> -              test_bit(MD_RECOVERY_WAIT, &mddev->recovery) ||
>>                  test_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
>>    }
>>
>>>
>>> Best Regards
>>> Xiao
>>>>
>>>> Thanks,
>>>> Kuai
>>>>
>>>>>
>>>>> Regards
>>>>> Xiao
>>>>>>
>>>>>> Thanks,
>>>>>> Kuai
>>>>>>> ---
>>>>>>>      drivers/md/dm-raid.c | 2 ++
>>>>>>>      drivers/md/md.c      | 1 +
>>>>>>>      2 files changed, 3 insertions(+)
>>>>>>>
>>>>>>> diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
>>>>>>> index eb009d6bb03a..325767c1140f 100644
>>>>>>> --- a/drivers/md/dm-raid.c
>>>>>>> +++ b/drivers/md/dm-raid.c
>>>>>>> @@ -3796,6 +3796,8 @@ static void raid_postsuspend(struct dm_target *ti)
>>>>>>>          struct raid_set *rs = ti->private;
>>>>>>>
>>>>>>>          if (!test_and_set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) {
>>>>>>> +             if (test_bit(MD_RECOVERY_WAIT, &rs->md.recovery))
>>>>>>> +                     clear_bit(MD_RECOVERY_WAIT, &rs->md.recovery);
>>>>>>>                  /* Writes have to be stopped before suspending to avoid deadlocks. */
>>>>>>>                  if (!test_bit(MD_RECOVERY_FROZEN, &rs->md.recovery))
>>>>>>>                          md_stop_writes(&rs->md);
>>>>>>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>>>>>>> index 2266358d8074..54790261254d 100644
>>>>>>> --- a/drivers/md/md.c
>>>>>>> +++ b/drivers/md/md.c
>>>>>>> @@ -4904,6 +4904,7 @@ static void stop_sync_thread(struct mddev *mddev, bool locked, bool check_seq)
>>>>>>>           * never happen
>>>>>>>           */
>>>>>>>          md_wakeup_thread_directly(mddev->sync_thread);
>>>>>>> +     md_wakeup_thread(mddev->sync_thread);
>>>>>>>          if (work_pending(&mddev->sync_work))
>>>>>>>                  flush_work(&mddev->sync_work);
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>> .
>>>>>
>>>>
>>>
>>> .
>>>
>>
> 
> .
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC 1/4] dm-raid/md: Clear MD_RECOVERY_WAIT when stopping dmraid
  2024-02-26  9:36           ` Yu Kuai
  2024-02-27  7:16             ` Xiao Ni
@ 2024-03-01  4:18             ` Xiao Ni
  1 sibling, 0 replies; 23+ messages in thread
From: Xiao Ni @ 2024-03-01  4:18 UTC (permalink / raw)
  To: Yu Kuai
  Cc: song, bmarzins, heinzm, snitzer, ncroxon, neilb, linux-raid,
	dm-devel, yukuai (C)


On 2024/2/26 5:36 PM, Yu Kuai wrote:
> Hi,
>
> 在 2024/02/26 13:12, Xiao Ni 写道:
>> On Mon, Feb 26, 2024 at 9:31 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>>
>>> Hi,
>>>
>>> 在 2024/02/23 21:20, Xiao Ni 写道:
>>>> On Fri, Feb 23, 2024 at 11:32 AM Yu Kuai <yukuai1@huaweicloud.com> 
>>>> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> 在 2024/02/20 23:30, Xiao Ni 写道:
>>>>>> MD_RECOVERY_WAIT is used by dmraid to delay reshape process by patch
>>>>>> commit 644e2537fdc7 ("dm raid: fix stripe adding reshape deadlock").
>>>>>> Before patch commit f52f5c71f3d4b ("md: fix stopping sync thread")
>>>>>> dmraid stopped sync thread directy by calling md_reap_sync_thread.
>>>>>> After this patch dmraid stops sync thread asynchronously as md does.
>>>>>> This is right. Now the dmraid stop process is like this:
>>>>>>
>>>>>> 1. 
>>>>>> raid_postsuspend->md_stop_writes->__md_stop_writes->stop_sync_thread.
>>>>>> stop_sync_thread sets MD_RECOVERY_INTR and wait until 
>>>>>> MD_RECOVERY_RUNNING
>>>>>> is cleared
>>>>>> 2. md_do_sync finds MD_RECOVERY_WAIT is set and return. (This is the
>>>>>> root cause for this deadlock. We hope md_do_sync can set 
>>>>>> MD_RECOVERY_DONE)
>>>>>> 3. md thread calls md_check_recovery (This is the place to reap sync
>>>>>> thread. Because MD_RECOVERY_DONE is not set. md thread can't reap 
>>>>>> sync
>>>>>> thread)
>>>>>> 4. raid_dtr stops/free struct mddev and release dmraid related 
>>>>>> resources
>>>>>>
>>>>>> dmraid only sets MD_RECOVERY_WAIT but doesn't clear it. It needs 
>>>>>> to clear
>>>>>> this bit when stopping the dmraid before stopping sync thread.
>>>>>>
>>>>>> But the deadlock still can happen sometimes even MD_RECOVERY_WAIT is
>>>>>> cleared before stopping sync thread. It's the reason 
>>>>>> stop_sync_thread only
>>>>>> wakes up task. If the task isn't running, it still needs to wake 
>>>>>> up sync
>>>>>> thread too.
>>>>>>
>>>>>> This deadlock can be reproduced 100% by these commands:
>>>>>> modprobe brd rd_size=34816 rd_nr=5
>>>>>> while [ 1 ]; do
>>>>>> vgcreate test_vg /dev/ram*
>>>>>> lvcreate --type raid5 -L 16M -n test_lv test_vg
>>>>>> lvconvert -y --stripes 4 /dev/test_vg/test_lv
>>>>>> vgremove test_vg -ff
>>>>>> sleep 1
>>>>>> done
>>>>>>
>>>>>> Fixes: 644e2537fdc7 ("dm raid: fix stripe adding reshape deadlock")
>>>>>> Fixes: f52f5c71f3d4 ("md: fix stopping sync thread")
>>>>>> Signed-off-by: Xiao Ni <xni@redhat.com>
>>>>>
>>>>> I'm not sure about this change. I think MD_RECOVERY_WAIT is hacky and
>>>>> really breaks how the sync_thread works; it should just go away soon,
>>>>> once we make sure a sync_thread can't be registered before
>>>>> pers->start() is done.
>>>>
>>>> Hi Kuai
>>>>
>>>> I just want to get to a stable state without changing any existing
>>>> logic. After fixing these regressions, we can consider other changes
>>>> (e.g. removing MD_RECOVERY_WAIT, or allowing the sync thread to
>>>> start/stop while the array is suspended). I talked with Heinz
>>>> yesterday; for dm-raid, no I/O can be allowed to happen, including the
>>>> sync thread, while the array is suspended.
>>>
>>
>> Hi Kuai
>>
>>> So, are you still thinking that my patchset will allow this for
>>> dm-raid?
>>>
>>> I have already explained a lot about why patch 1 from my set is okay,
>>> and why the set doesn't introduce any behaviour change as you claimed
>>> in patch 0:
>>
>>
>> I'll read all your patches to understand you better.
>>
>> But as I mentioned many times too, we're fixing regression problems.
>> It's better for us to fix them with the smallest change. As you can
>> see, my patch set fixes these regression problems with small changes
>> (sorry, I didn't notice that your patch set has some changes which are
>> the same as mine). So why do we need such a big change to fix the
>> regression problems? Now, with my patch set, I can reproduce with the
>> lvm2 test suite a problem that happens in 6.6 too. It means that with
>> this patch set we can get back to the same state as 6.6.
>
> Regarding complexity, I agree that we can go back to the same state as
> v6.6 and then fix other problems on top of that. I don't have a
> preference here; I'll post my patchset anyhow. But beyond the test
> suite, you still need to make sure nothing is broken in the big picture.
>
> For example, in the following 3 cases, can MD_RECOVERY_WAIT be cleared
> as expected, and will a new sync_thread be prevented from starting
> after a reload?
>
> 1)
> ioctl suspend
> ioctl reload
> ioctl resume
>
> 2)
> ioctl reload
> ioctl resume
>
> 3)
> ioctl suspend
> // without a reload.
> ioctl resume


Hi Kuai

Is this the place where you think clearing MD_RECOVERY_WAIT is not safe?
For the above three cases, I didn't run tests with dmsetup commands.
Because the lvm2 tests use lv commands, I only checked the logs, where I
used printk to follow the function call sequence. I can make sure that
MD_RECOVERY_WAIT is cleared when running the lvconvert command. I'm not
familiar with dmsetup commands. Could you give some examples for the
three above cases? I can run the tests myself here.
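
As a rough starting point, my guess is that the three sequences would
look something like the sketch below. The device name is just a
placeholder and I simply reload the currently active table, so please
correct me if I got the commands wrong:

  dev=test_vg-test_lv                    # placeholder dm device name
  table=$(dmsetup table "$dev")          # reuse the currently loaded table

  # case 1: suspend -> reload -> resume
  dmsetup suspend "$dev"
  dmsetup reload "$dev" --table "$table"
  dmsetup resume "$dev"

  # case 2: reload -> resume (no explicit suspend first)
  dmsetup reload "$dev" --table "$table"
  dmsetup resume "$dev"

  # case 3: suspend -> resume, without a reload
  dmsetup suspend "$dev"
  dmsetup resume "$dev"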

Best Regards

Xiao

>
>>
>>
>>
>>>
>>> "Kuai's patch set breaks some rules".
>>>
>>> The only thing that will change is that, for md/raid, the sync_thread
>>> can start for a suspended array. However, I don't think this will be a
>>> problem, because the sync_thread can already be running for a
>>> suspended array, and 'MD_RECOVERY_FROZEN' is already used to prevent
>>> the sync_thread from starting.
>>
>> We can't allow the sync thread to run for dm-raid while it's suspended,
>> because the table needs to be switched while suspended. That's a basic
>> design rule. If it can happen now, we should fix it.
>
> Yes, and my patchset also fixes this, as well as other problems related
> to sync_thread, by managing the sync_thread the same way as md/raid.
>
> BTW, on top of my patchset I have already made some changes locally to
> expand MD_RECOVERY_FROZEN and remove MD_RECOVERY_WAIT. If you're going
> to review my patchset, you can take a look at the following change
> later; I have already tested it.
>
> Thanks,
> Kuai
>
> diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
> index e1b3b1917627..a0c8a5b92aab 100644
> --- a/drivers/md/dm-raid.c
> +++ b/drivers/md/dm-raid.c
> @@ -213,6 +213,7 @@ struct raid_dev {
>  #define RT_FLAG_RS_IN_SYNC             6
>  #define RT_FLAG_RS_RESYNCING           7
>  #define RT_FLAG_RS_GROW                        8
> +#define RT_FLAG_WAIT_RELOAD            9
>
>  /* Array elements of 64 bit needed for rebuild/failed disk bits */
>  #define DISKS_ARRAY_ELEMS ((MAX_RAID_DEVICES + (sizeof(uint64_t) * 8 - 1)) / sizeof(uint64_t) / 8)
> @@ -3728,6 +3729,9 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv,
>         if (test_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags))
>                 return -EBUSY;
>
> +       if (test_bit(RT_FLAG_WAIT_RELOAD, &rs->runtime_flags))
> +               return -EINVAL;
> +
>         if (!strcasecmp(argv[0], "frozen")) {
>                 ret = mddev_lock(mddev);
>                 if (ret)
> @@ -3809,21 +3813,25 @@ static void raid_presuspend(struct dm_target *ti)
>         struct raid_set *rs = ti->private;
>         struct mddev *mddev = &rs->md;
>
> -       mddev_lock_nointr(mddev);
> -       md_frozen_sync_thread(mddev);
> -       mddev_unlock(mddev);
> +       if (!test_bit(RT_FLAG_WAIT_RELOAD, &rs->runtime_flags)) {
> +               mddev_lock_nointr(mddev);
> +               md_frozen_sync_thread(mddev);
> +               mddev_unlock(mddev);
>
> -       if (mddev->pers && mddev->pers->prepare_suspend)
> -               mddev->pers->prepare_suspend(mddev);
> +               if (mddev->pers && mddev->pers->prepare_suspend)
> +                       mddev->pers->prepare_suspend(mddev);
> +       }
>  }
>
>  static void raid_presuspend_undo(struct dm_target *ti)
>  {
>         struct raid_set *rs = ti->private;
>
> -       mddev_lock_nointr(&rs->md);
> -       md_unfrozen_sync_thread(&rs->md);
> -       mddev_unlock(&rs->md);
> +       if (!test_bit(RT_FLAG_WAIT_RELOAD, &rs->runtime_flags)) {
> +               mddev_lock_nointr(&rs->md);
> +               md_unfrozen_sync_thread(&rs->md);
> +               mddev_unlock(&rs->md);
> +       }
>  }
>
>  static void raid_postsuspend(struct dm_target *ti)
> @@ -3958,9 +3966,6 @@ static int rs_start_reshape(struct raid_set *rs)
>         struct mddev *mddev = &rs->md;
>         struct md_personality *pers = mddev->pers;
>
> -       /* Don't allow the sync thread to work until the table gets reloaded. */
> -       set_bit(MD_RECOVERY_WAIT, &mddev->recovery);
> -
>         r = rs_setup_reshape(rs);
>         if (r)
>                 return r;
> @@ -4055,6 +4060,7 @@ static int raid_preresume(struct dm_target *ti)
>                 /* Initiate a reshape. */
>                 rs_set_rdev_sectors(rs);
>                 mddev_lock_nointr(mddev);
> +               set_bit(RT_FLAG_WAIT_RELOAD, &rs->runtime_flags);
>                 r = rs_start_reshape(rs);
>                 mddev_unlock(mddev);
>                 if (r)
> @@ -4089,7 +4095,8 @@ static void raid_resume(struct dm_target *ti)
>                 mddev_lock_nointr(mddev);
>                 mddev->ro = 0;
>                 mddev->in_sync = 0;
> -               md_unfrozen_sync_thread(mddev);
> +               if (!test_bit(RT_FLAG_WAIT_RELOAD, &rs->runtime_flags))
> +                       md_unfrozen_sync_thread(mddev);
>                 mddev_unlock(mddev);
>         }
>  }
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 81e6f49a14fc..595b1fbdce20 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -6064,7 +6064,8 @@ int md_run(struct mddev *mddev)
>                         pr_warn("True protection against single-disk failure might be compromised.\n");
>         }
>
> -       mddev->recovery = 0;
> +       /* dm-raid expect sync_thread to be frozen until resume */
> +       mddev->recovery &= BIT_ULL_MASK(MD_RECOVERY_FROZEN);
>         /* may be over-ridden by personality */
>         mddev->resync_max_sectors = mddev->dev_sectors;
>
> @@ -6237,10 +6238,8 @@ int md_start(struct mddev *mddev)
>         int ret = 0;
>
>         if (mddev->pers->start) {
> -               set_bit(MD_RECOVERY_WAIT, &mddev->recovery);
>                 ret = mddev->pers->start(mddev);
> -               clear_bit(MD_RECOVERY_WAIT, &mddev->recovery);
> -               md_wakeup_thread(mddev->sync_thread);
> +               WARN_ON_ONCE(test_bit(MD_RECOVERY_RUNNING, &mddev->recovery));
>         }
>         return ret;
>  }
> @@ -8827,8 +8825,7 @@ void md_do_sync(struct md_thread *thread)
>         if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
>                 goto skip;
>
> -       if (test_bit(MD_RECOVERY_WAIT, &mddev->recovery) ||
> -           !md_is_rdwr(mddev)) {/* never try to sync a read-only array */
> +       if (!md_is_rdwr(mddev)) {/* never try to sync a read-only array */
>                 set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>                 goto skip;
>         }
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index c17b7e68c533..eb00cdadc9c5 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -554,7 +554,6 @@ enum recovery_flags {
>         MD_RECOVERY_RESHAPE,    /* A reshape is happening */
>         MD_RECOVERY_FROZEN,     /* User request to abort, and not restart, any action */
>         MD_RECOVERY_ERROR,      /* sync-action interrupted because io-error */
> -       MD_RECOVERY_WAIT,       /* waiting for pers->start() to finish */
>         MD_RESYNCING_REMOTE,    /* remote node is running resync thread */
>  };
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index acce2868e491..530a7f0b120b 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -5919,7 +5919,6 @@ static int add_all_stripe_bios(struct r5conf *conf,
>  static bool reshape_disabled(struct mddev *mddev)
>  {
>         return !md_is_rdwr(mddev) ||
> -              test_bit(MD_RECOVERY_WAIT, &mddev->recovery) ||
>                test_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
>  }
>
>>
>> Best Regards
>> Xiao
>>>
>>> Thanks,
>>> Kuai
>>>
>>>>
>>>> Regards
>>>> Xiao
>>>>>
>>>>> Thanks,
>>>>> Kuai
>>>>>> ---
>>>>>>     drivers/md/dm-raid.c | 2 ++
>>>>>>     drivers/md/md.c      | 1 +
>>>>>>     2 files changed, 3 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
>>>>>> index eb009d6bb03a..325767c1140f 100644
>>>>>> --- a/drivers/md/dm-raid.c
>>>>>> +++ b/drivers/md/dm-raid.c
>>>>>> @@ -3796,6 +3796,8 @@ static void raid_postsuspend(struct dm_target *ti)
>>>>>>         struct raid_set *rs = ti->private;
>>>>>>
>>>>>>         if (!test_and_set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) {
>>>>>> +             if (test_bit(MD_RECOVERY_WAIT, &rs->md.recovery))
>>>>>> +                     clear_bit(MD_RECOVERY_WAIT, &rs->md.recovery);
>>>>>>                 /* Writes have to be stopped before suspending to avoid deadlocks. */
>>>>>>                 if (!test_bit(MD_RECOVERY_FROZEN, &rs->md.recovery))
>>>>>>                         md_stop_writes(&rs->md);
>>>>>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>>>>>> index 2266358d8074..54790261254d 100644
>>>>>> --- a/drivers/md/md.c
>>>>>> +++ b/drivers/md/md.c
>>>>>> @@ -4904,6 +4904,7 @@ static void stop_sync_thread(struct mddev *mddev, bool locked, bool check_seq)
>>>>>>          * never happen
>>>>>>          */
>>>>>>          md_wakeup_thread_directly(mddev->sync_thread);
>>>>>> +     md_wakeup_thread(mddev->sync_thread);
>>>>>>         if (work_pending(&mddev->sync_work))
>>>>>>                 flush_work(&mddev->sync_work);
>>>>>>
>>>>>>
>>>>>
>>>>
>>>> .
>>>>
>>>
>>
>> .
>>
>


^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2024-03-01  4:18 UTC | newest]

Thread overview: 23+ messages
2024-02-20 15:30 [PATCH RFC V2 0/4] Fix regression bugs Xiao Ni
2024-02-20 15:30 ` [PATCH RFC 1/4] dm-raid/md: Clear MD_RECOVERY_WAIT when stopping dmraid Xiao Ni
2024-02-23  3:31   ` Yu Kuai
2024-02-23 13:20     ` Xiao Ni
2024-02-26  1:31       ` Yu Kuai
2024-02-26  5:12         ` Xiao Ni
2024-02-26  9:36           ` Yu Kuai
2024-02-27  7:16             ` Xiao Ni
2024-02-27  7:39               ` Yu Kuai
2024-03-01  4:18             ` Xiao Ni
2024-02-23 10:31   ` Yu Kuai
2024-02-23 13:40     ` Xiao Ni
2024-02-20 15:30 ` [PATCH RFC 2/4] md: Set MD_RECOVERY_FROZEN before stop sync thread Xiao Ni
2024-02-23  3:12   ` Yu Kuai
2024-02-23  3:58     ` Song Liu
2024-02-20 15:30 ` [PATCH RFC 3/4] md: Missing decrease active_io for flush io Xiao Ni
2024-02-23  3:06   ` Yu Kuai
2024-02-23 13:49     ` Xiao Ni
2024-02-20 15:30 ` [PATCH RFC V2 4/4] md/raid5: Don't check crossing reshape when reshape hasn't started Xiao Ni
2024-02-23  3:08   ` Yu Kuai
2024-02-23 14:00     ` Xiao Ni
2024-02-21  5:45 ` [PATCH RFC V2 0/4] Fix regression bugs Benjamin Marzinski
2024-02-23  2:42   ` Xiao Ni
