* [PATCH v2 00/11] dm-raid: fix v6.7 regressions
@ 2024-01-24  9:14 Yu Kuai
From: Yu Kuai @ 2024-01-24  9:14 UTC (permalink / raw)
  To: agk, snitzer, mpatocka, dm-devel, xni, song, yukuai3, jbrassow,
	neilb, heinzm, shli, akpm
  Cc: linux-kernel, linux-raid, yukuai1, yi.zhang, yangerkun

First regression, related to stopping the sync thread:

The lifetime of sync_thread is designed as follows:

1) Decide to start sync_thread, set MD_RECOVERY_NEEDED, and wake up the
daemon thread;
2) The daemon thread detects that MD_RECOVERY_NEEDED is set, then sets
MD_RECOVERY_RUNNING and registers sync_thread;
3) md_do_sync() does the actual work; when it is done or interrupted, it
sets MD_RECOVERY_DONE and wakes up the daemon thread;
4) The daemon thread detects that MD_RECOVERY_DONE is set, then clears
MD_RECOVERY_RUNNING and unregisters sync_thread.
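
The four steps above can be sketched as a small userspace model. This is
only an illustration, not the kernel code: struct model_array, the
model_*() helpers and the RECOVERY_* values are hypothetical stand-ins,
and clearing NEEDED in step 2 and DONE in step 4 are assumptions of the
model.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the MD_RECOVERY_* flags; values are illustrative only. */
enum {
	RECOVERY_NEEDED  = 1u << 0,
	RECOVERY_RUNNING = 1u << 1,
	RECOVERY_DONE    = 1u << 2,
};

/* Hypothetical model of an array, not the kernel's struct mddev. */
struct model_array {
	unsigned int recovery;   /* flag bits */
	bool thread_registered;  /* is a sync_thread registered? */
};

/* Step 1: ask for a sync; the real code also wakes the daemon thread. */
static void model_request_sync(struct model_array *a)
{
	a->recovery |= RECOVERY_NEEDED;
}

/* Steps 2 and 4: one pass of the daemon thread. */
static void model_daemon_pass(struct model_array *a)
{
	if (a->recovery & RECOVERY_RUNNING) {
		/* Step 4: unregister only once the sync thread set DONE. */
		if (a->recovery & RECOVERY_DONE) {
			a->recovery &= ~(RECOVERY_RUNNING | RECOVERY_DONE);
			a->thread_registered = false;
		}
		return;
	}
	/* Step 2: NEEDED is set and nothing is running, so register. */
	if (a->recovery & RECOVERY_NEEDED) {
		a->recovery &= ~RECOVERY_NEEDED;
		a->recovery |= RECOVERY_RUNNING;
		a->thread_registered = true;
	}
}

/* Step 3: the sync thread is done or interrupted. */
static void model_sync_finish(struct model_array *a)
{
	assert(a->thread_registered);
	a->recovery |= RECOVERY_DONE;
}
```

In this model, a daemon pass while RUNNING is set but DONE is not leaves
the thread registered, which is the state the hanging tests get stuck in
when step 3 or step 4 is skipped.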

In v6.7, we fixed md/raid to follow this design with commit f52f5c71f3d4
("md: fix stopping sync thread"). However, dm-raid was not considered at
that time, and the following tests hang:

shell/integrity-caching.sh
shell/lvconvert-raid-reshape.sh

Patches 1-4 of this set fix the hanging tests:
 - patch 1 fixes step 4) being broken by a suspended array;
 - patch 2 fixes step 4) being broken by a read-only array;
 - patch 3 fixes step 3) being broken because md_do_sync() doesn't set
 MD_RECOVERY_DONE. Note that this patch introduces a new problem, data
 corruption, which is fixed by later patches;
 - patch 4 fixes step 1) being broken because sync_thread is registered
 and MD_RECOVERY_RUNNING is set directly.

With patches 1-4 applied, the above tests no longer hang. However, they
still fail and complain that ext4 is corrupted.

Second regression, related to freezing the sync thread:

Note that for raid456, if reshape is interrupted, calling
"pers->start_reshape" will corrupt data. This is because dm-raid relies
on md_do_sync() not setting MD_RECOVERY_DONE so that a new sync_thread
won't be registered, and patch 3 breaks exactly this.

 - Patches 5-6 fix this problem by interrupting reshape and freezing
 sync_thread in dm_suspend(), then unfreezing it and continuing reshape
 in dm_resume(). It's verified that the dm-raid tests no longer complain
 that ext4 is corrupted.
 - Patch 7 fixes the problem that raid_message() calls
 md_reap_sync_thread() directly, without holding 'reconfig_mutex'.
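
The freeze-on-suspend idea behind patches 5-6 can be pictured with a
small userspace model. It is a sketch under stated assumptions, not the
dm-raid implementation: struct model_sync and the model_*() helpers are
hypothetical, and the single 'frozen' flag only stands in for
MD_RECOVERY_FROZEN.

```c
#include <stdbool.h>

/* Hypothetical model state, not a kernel structure. */
struct model_sync {
	bool frozen;           /* stands in for MD_RECOVERY_FROZEN */
	bool reshape_running;  /* a reshape is currently making progress */
	bool reshape_pending;  /* an interrupted reshape waits to continue */
};

/* Suspend: freeze first, then interrupt a running reshape. */
static void model_suspend(struct model_sync *s)
{
	s->frozen = true;
	if (s->reshape_running) {
		s->reshape_running = false;
		s->reshape_pending = true;
	}
}

/* While frozen, no new reshape may be started. */
static bool model_try_start_reshape(struct model_sync *s)
{
	if (s->frozen || !s->reshape_pending)
		return false;
	s->reshape_pending = false;
	s->reshape_running = true;
	return true;
}

/* Resume: unfreeze, then let the interrupted reshape continue. */
static void model_resume(struct model_sync *s)
{
	s->frozen = false;
	(void)model_try_start_reshape(s);
}
```

The point of the model is the ordering: between model_suspend() and
model_resume() every attempt to start a reshape is refused, so an
interrupted reshape cannot be restarted behind dm-raid's back.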

Last regression, related to dm-raid456 IO concurrent with reshape:

For raid456, if reshape is still in progress, IO across the reshape
position waits for reshape to make progress. However, for dm-raid, in
the following cases reshape never makes progress, hence IO hangs:

1) the array is read-only;
2) MD_RECOVERY_WAIT is set;
3) MD_RECOVERY_FROZEN is set;

After commit c467e97f079f ("md/raid6: use valid sector values to determine
if an I/O should wait on the reshape") fixed the problem that IO across
the reshape position didn't wait for reshape, the dm-raid test
shell/lvconvert-raid-reshape.sh started to hang at raid5_make_request().

For md/raid, the problem doesn't exist because:

1) If the array is read-only, it can be switched to read-write by
   ioctl/sysfs;
2) md/raid never sets MD_RECOVERY_WAIT;
3) If MD_RECOVERY_FROZEN is set, mddev_suspend() doesn't hold
   'reconfig_mutex' anymore, so the flag can be cleared and reshape can
   continue via the sysfs api 'sync_action'.

However, I'm not sure yet how to avoid the problem in dm-raid.

 - Patches 9-11 fix this problem by detecting the above 3 cases in
 dm_suspend(), and failing those IO directly.
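
The decision described above (wait while reshape can progress, fail when
it cannot) can be sketched as a userspace model. This is an illustration
under assumptions, not raid5.c: struct model_reshape, model_classify_io()
and the enum are hypothetical, and the "crosses the reshape position"
check is simplified to a single sector boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum model_io_action { MODEL_IO_PROCEED, MODEL_IO_WAIT, MODEL_IO_FAIL };

/* Hypothetical model state, not a kernel structure. */
struct model_reshape {
	bool in_progress;
	uint64_t position;     /* sector the reshape has reached */
	bool read_only;        /* case 1) */
	bool recovery_wait;    /* case 2), MD_RECOVERY_WAIT */
	bool recovery_frozen;  /* case 3), MD_RECOVERY_FROZEN */
};

/* True if any of the three cases holds, so reshape cannot progress. */
static bool model_reshape_stuck(const struct model_reshape *r)
{
	return r->read_only || r->recovery_wait || r->recovery_frozen;
}

/* Classify an IO covering sectors [start, start + len). */
static enum model_io_action
model_classify_io(const struct model_reshape *r, uint64_t start, uint64_t len)
{
	assert(len > 0);
	if (!r->in_progress)
		return MODEL_IO_PROCEED;
	/* Simplification: "across" means the position lies inside the IO. */
	if (!(start < r->position && r->position < start + len))
		return MODEL_IO_PROCEED;
	/* Waiting would never end if reshape cannot move, so fail instead. */
	return model_reshape_stuck(r) ? MODEL_IO_FAIL : MODEL_IO_WAIT;
}
```

Only IO that actually spans the reshape position is affected; IO that
lies entirely on one side of it proceeds in all cases in this model.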

If users really hit the IO error, it means they were reading the wrong
data before c467e97f079f. And it's safe to read/write the array after
reshape makes progress successfully.

Tests:

I have already run the following two tests many times and verified that
they no longer fail:

shell/integrity-caching.sh
shell/lvconvert-raid-reshape.sh

The other tests are still running. However, I'm sending this patchset
now in case people think the fixes are not appropriate. Running the full
test suite takes a lot of time in my VM, and I'll post the full test
results soon.

Yu Kuai (11):
  md: don't ignore suspended array in md_check_recovery()
  md: don't ignore read-only array in md_check_recovery()
  md: make sure md_do_sync() will set MD_RECOVERY_DONE
  md: don't register sync_thread for reshape directly
  md: export helpers to stop sync_thread
  dm-raid: really frozen sync_thread during suspend
  md/dm-raid: don't call md_reap_sync_thread() directly
  dm-raid: remove mddev_suspend/resume()
  dm-raid: add a new helper prepare_suspend() in md_personality
  md: export helper md_is_rdwr()
  md/raid456: fix a deadlock for dm-raid456 while io concurrent with
    reshape

 drivers/md/dm-raid.c |  76 +++++++++++++++++++++----------
 drivers/md/md.c      | 104 ++++++++++++++++++++++++++++---------------
 drivers/md/md.h      |  16 +++++++
 drivers/md/raid10.c  |  16 +------
 drivers/md/raid5.c   |  61 +++++++++++++------------
 5 files changed, 171 insertions(+), 102 deletions(-)

-- 
2.39.2



Thread overview: 30+ messages
2024-01-24  9:14 [PATCH v2 00/11] dm-raid: fix v6.7 regressions Yu Kuai
2024-01-24  9:14 ` [PATCH v2 01/11] md: don't ignore suspended array in md_check_recovery() Yu Kuai
2024-01-24  9:14 ` [PATCH v2 02/11] md: don't ignore read-only " Yu Kuai
2024-01-24  9:14 ` [PATCH v2 03/11] md: make sure md_do_sync() will set MD_RECOVERY_DONE Yu Kuai
2024-01-24  9:14 ` [PATCH v2 04/11] md: don't register sync_thread for reshape directly Yu Kuai
2024-01-24  9:14 ` [PATCH v2 05/11] md: export helpers to stop sync_thread Yu Kuai
2024-01-25  7:51   ` Xiao Ni
2024-01-25  7:57     ` Yu Kuai
2024-01-26  2:38       ` Yu Kuai
2024-01-25 11:35   ` Xiao Ni
2024-01-25 11:42     ` Yu Kuai
2024-01-25 11:52       ` Xiao Ni
2024-01-25 11:56         ` Yu Kuai
2024-01-25 11:45     ` Yu Kuai
2024-01-25 13:33   ` Xiao Ni
2024-01-26  0:14     ` Song Liu
2024-01-26  6:54       ` Xiao Ni
2024-01-24  9:14 ` [PATCH v2 06/11] dm-raid: really frozen sync_thread during suspend Yu Kuai
2024-01-24  9:14 ` [PATCH v2 07/11] md/dm-raid: don't call md_reap_sync_thread() directly Yu Kuai
2024-01-24  9:14 ` [PATCH v2 08/11] dm-raid: remove mddev_suspend/resume() Yu Kuai
2024-01-24  9:14 ` [PATCH v2 09/11] dm-raid: add a new helper prepare_suspend() in md_personality Yu Kuai
2024-01-24  9:14 ` [PATCH v2 10/11] md: export helper md_is_rdwr() Yu Kuai
2024-01-24  9:14 ` [PATCH v2 11/11] md/raid456: fix a deadlock for dm-raid456 while io concurrent with reshape Yu Kuai
2024-01-24 12:19 ` [PATCH v2 00/11] dm-raid: fix v6.7 regressions Xiao Ni
2024-01-25  0:50   ` Xiao Ni
2024-01-25  1:40     ` Yu Kuai
2024-01-25  0:46 ` Song Liu
2024-01-25  1:08   ` Yu Kuai
2024-01-25  1:51     ` Song Liu
2024-01-25  2:36       ` Yu Kuai
