* [PATCH v6 0/2] raid10 bugfix
@ 2023-06-02 7:17 linan666
2023-06-02 7:17 ` [PATCH v6 1/2] md/raid10: Do not add spare disk when recovery fail linan666
2023-06-02 7:17 ` [PATCH v6 2/2] md/raid10: fix io loss while replacement replace rdev linan666
0 siblings, 2 replies; 3+ messages in thread
From: linan666 @ 2023-06-02 7:17 UTC (permalink / raw)
To: song, neilb
Cc: linux-raid, linux-kernel, linan122, yukuai3, yi.zhang, houtao1,
yangerkun
From: Li Nan <linan122@huawei.com>
Changes in v6:
- in patch 1, improve commit message summary and comment.
Changes in v5:
- v4 send wrong patch, correct and resend.
Changes in v4:
- improve commit log
- removed applied patches
Li Nan (2):
md/raid10: Do not add spare disk when recovery fail
md/raid10: fix io loss while replacement replace rdev
drivers/md/raid10.c | 42 ++++++++++++++++++++++++++++++++++++------
1 file changed, 36 insertions(+), 6 deletions(-)
--
2.39.2
^ permalink raw reply [flat|nested] 3+ messages in thread
* [PATCH v6 1/2] md/raid10: Do not add spare disk when recovery fail
2023-06-02 7:17 [PATCH v6 0/2] raid10 bugfix linan666
@ 2023-06-02 7:17 ` linan666
2023-06-02 7:17 ` [PATCH v6 2/2] md/raid10: fix io loss while replacement replace rdev linan666
1 sibling, 0 replies; 3+ messages in thread
From: linan666 @ 2023-06-02 7:17 UTC (permalink / raw)
To: song, neilb
Cc: linux-raid, linux-kernel, linan122, yukuai3, yi.zhang, houtao1,
yangerkun
From: Li Nan <linan122@huawei.com>
In raid10_sync_request(), if data cannot be read from any disk for
recovery, it will go to 'giveup' and let 'chunks_skipped' + 1. After
multiple 'giveup', when 'chunks_skipped >= geo.raid_disks', it will
return 'max_sector', indicating that the recovery has been completed.
However, the recovery is just aborted and the data remains inconsistent.
Fix it by setting mirror->recovery_disabled, which will prevent the spare
disk from being added to this mirror. The same issue also exists during
resync, it will be fixed afterwards.
Signed-off-by: Li Nan <linan122@huawei.com>
---
drivers/md/raid10.c | 20 ++++++++++++++++++--
1 file changed, 18 insertions(+), 2 deletions(-)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index d93d8cb2b620..9fb55902e58a 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -3303,6 +3303,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
int chunks_skipped = 0;
sector_t chunk_mask = conf->geo.chunk_mask;
int page_idx = 0;
+ int error_disk = -1;
/*
* Allow skipping a full rebuild for incremental assembly
@@ -3386,8 +3387,21 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
return reshape_request(mddev, sector_nr, skipped);
if (chunks_skipped >= conf->geo.raid_disks) {
- /* if there has been nothing to do on any drive,
- * then there is nothing to do at all..
+ pr_err("md/raid10:%s: %s fail\n", mdname(mddev),
+ test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ? "resync" : "recovery");
+ if (error_disk >= 0 &&
+ !test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
+ /*
+ * recovery fail, set mirrors.recovery_disabled,
+ * device shouldn't be added to there.
+ */
+ conf->mirrors[error_disk].recovery_disabled =
+ mddev->recovery_disabled;
+ return 0;
+ }
+ /*
+ * if there has been nothing to do on any drive,
+ * then there is nothing to do at all.
*/
*skipped = 1;
return (max_sector - sector_nr) + sectors_skipped;
@@ -3638,6 +3652,8 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
mdname(mddev));
mirror->recovery_disabled
= mddev->recovery_disabled;
+ } else {
+ error_disk = i;
}
put_buf(r10_bio);
if (rb2)
--
2.39.2
^ permalink raw reply related [flat|nested] 3+ messages in thread
* [PATCH v6 2/2] md/raid10: fix io loss while replacement replace rdev
2023-06-02 7:17 [PATCH v6 0/2] raid10 bugfix linan666
2023-06-02 7:17 ` [PATCH v6 1/2] md/raid10: Do not add spare disk when recovery fail linan666
@ 2023-06-02 7:17 ` linan666
1 sibling, 0 replies; 3+ messages in thread
From: linan666 @ 2023-06-02 7:17 UTC (permalink / raw)
To: song, neilb
Cc: linux-raid, linux-kernel, linan122, yukuai3, yi.zhang, houtao1,
yangerkun
From: Li Nan <linan122@huawei.com>
When removing a disk with replacement, the replacement will be used to
replace rdev. During this process, there is a brief window in which both
rdev and replacement are read as NULL in raid10_write_request(). This
will result in io not being submitted but it should be.
//remove //write
raid10_remove_disk raid10_write_request
mirror->rdev = NULL
read rdev -> NULL
mirror->rdev = mirror->replacement
mirror->replacement = NULL
read replacement -> NULL
Fix it by reading replacement first and rdev later, meanwhile, use smp_mb()
to prevent memory reordering.
Fixes: 475b0321a4df ("md/raid10: writes should get directed to replacement as well as original.")
Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
---
drivers/md/raid10.c | 22 ++++++++++++++++++----
1 file changed, 18 insertions(+), 4 deletions(-)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 9fb55902e58a..317f53e9ce03 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -779,8 +779,16 @@ static struct md_rdev *read_balance(struct r10conf *conf,
disk = r10_bio->devs[slot].devnum;
rdev = rcu_dereference(conf->mirrors[disk].replacement);
if (rdev == NULL || test_bit(Faulty, &rdev->flags) ||
- r10_bio->devs[slot].addr + sectors > rdev->recovery_offset)
+ r10_bio->devs[slot].addr + sectors >
+ rdev->recovery_offset) {
+ /*
+ * Read replacement first to prevent reading both rdev
+ * and replacement as NULL during replacement replace
+ * rdev.
+ */
+ smp_mb();
rdev = rcu_dereference(conf->mirrors[disk].rdev);
+ }
if (rdev == NULL ||
test_bit(Faulty, &rdev->flags))
continue;
@@ -1479,9 +1487,15 @@ static void raid10_write_request(struct mddev *mddev, struct bio *bio,
for (i = 0; i < conf->copies; i++) {
int d = r10_bio->devs[i].devnum;
- struct md_rdev *rdev = rcu_dereference(conf->mirrors[d].rdev);
- struct md_rdev *rrdev = rcu_dereference(
- conf->mirrors[d].replacement);
+ struct md_rdev *rdev, *rrdev;
+
+ rrdev = rcu_dereference(conf->mirrors[d].replacement);
+ /*
+ * Read replacement first to Prevent reading both rdev and
+ * replacement as NULL during replacement replace rdev.
+ */
+ smp_mb();
+ rdev = rcu_dereference(conf->mirrors[d].rdev);
if (rdev == rrdev)
rrdev = NULL;
if (rdev && (test_bit(Faulty, &rdev->flags)))
--
2.39.2
^ permalink raw reply related [flat|nested] 3+ messages in thread
end of thread, other threads:[~2023-06-02 7:21 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-02 7:17 [PATCH v6 0/2] raid10 bugfix linan666
2023-06-02 7:17 ` [PATCH v6 1/2] md/raid10: Do not add spare disk when recovery fail linan666
2023-06-02 7:17 ` [PATCH v6 2/2] md/raid10: fix io loss while replacement replace rdev linan666
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).