linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v7 0/2] raid10 bugfix
@ 2023-06-02  9:18 linan666
  2023-06-02  9:18 ` [PATCH v7 1/2] md/raid10: Do not add spare disk when recovery fails linan666
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: linan666 @ 2023-06-02  9:18 UTC (permalink / raw)
  To: song, neilb
  Cc: linux-raid, linux-kernel, linan122, yukuai3, yi.zhang, houtao1,
	yangerkun

From: Li Nan <linan122@huawei.com>

Changes in v7:
 - in patch 1, change "fail" to "fails".

Changes in v6:
 - in patch 1, improve commit message summary and comment.

Changes in v5:
 - v4 send wrong patch, correct and resend.

Changes in v4:
 - improve commit log
 - removed applied patches


Li Nan (2):
  md/raid10: Do not add spare disk when recovery fails
  md/raid10: fix io loss while replacement replace rdev

 drivers/md/raid10.c | 42 ++++++++++++++++++++++++++++++++++++------
 1 file changed, 36 insertions(+), 6 deletions(-)

-- 
2.39.2


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH v7 1/2] md/raid10: Do not add spare disk when recovery fails
  2023-06-02  9:18 [PATCH v7 0/2] raid10 bugfix linan666
@ 2023-06-02  9:18 ` linan666
  2023-06-02  9:18 ` [PATCH v7 2/2] md/raid10: fix io loss while replacement replace rdev linan666
  2023-06-06  0:07 ` [PATCH v7 0/2] raid10 bugfix Song Liu
  2 siblings, 0 replies; 4+ messages in thread
From: linan666 @ 2023-06-02  9:18 UTC (permalink / raw)
  To: song, neilb
  Cc: linux-raid, linux-kernel, linan122, yukuai3, yi.zhang, houtao1,
	yangerkun

From: Li Nan <linan122@huawei.com>

In raid10_sync_request(), if data cannot be read from any disk for
recovery, it will go to 'giveup' and let 'chunks_skipped' + 1. After
multiple 'giveup', when 'chunks_skipped >= geo.raid_disks', it will
return 'max_sector', indicating that the recovery has been completed.
However, the recovery is just aborted and the data remains inconsistent.

Fix it by setting mirror->recovery_disabled, which will prevent the spare
disk from being added to this mirror. The same issue also exists during
resync, it will be fixed afterwards.

Signed-off-by: Li Nan <linan122@huawei.com>
---
 drivers/md/raid10.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index d93d8cb2b620..3d52fb618571 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -3303,6 +3303,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 	int chunks_skipped = 0;
 	sector_t chunk_mask = conf->geo.chunk_mask;
 	int page_idx = 0;
+	int error_disk = -1;
 
 	/*
 	 * Allow skipping a full rebuild for incremental assembly
@@ -3386,8 +3387,21 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 		return reshape_request(mddev, sector_nr, skipped);
 
 	if (chunks_skipped >= conf->geo.raid_disks) {
-		/* if there has been nothing to do on any drive,
-		 * then there is nothing to do at all..
+		pr_err("md/raid10:%s: %s fails\n", mdname(mddev),
+			test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ?  "resync" : "recovery");
+		if (error_disk >= 0 &&
+		    !test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
+			/*
+			 * recovery fails, set mirrors.recovery_disabled,
+			 * device shouldn't be added to there.
+			 */
+			conf->mirrors[error_disk].recovery_disabled =
+						mddev->recovery_disabled;
+			return 0;
+		}
+		/*
+		 * if there has been nothing to do on any drive,
+		 * then there is nothing to do at all.
 		 */
 		*skipped = 1;
 		return (max_sector - sector_nr) + sectors_skipped;
@@ -3638,6 +3652,8 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 						       mdname(mddev));
 					mirror->recovery_disabled
 						= mddev->recovery_disabled;
+				} else {
+					error_disk = i;
 				}
 				put_buf(r10_bio);
 				if (rb2)
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH v7 2/2] md/raid10: fix io loss while replacement replace rdev
  2023-06-02  9:18 [PATCH v7 0/2] raid10 bugfix linan666
  2023-06-02  9:18 ` [PATCH v7 1/2] md/raid10: Do not add spare disk when recovery fails linan666
@ 2023-06-02  9:18 ` linan666
  2023-06-06  0:07 ` [PATCH v7 0/2] raid10 bugfix Song Liu
  2 siblings, 0 replies; 4+ messages in thread
From: linan666 @ 2023-06-02  9:18 UTC (permalink / raw)
  To: song, neilb
  Cc: linux-raid, linux-kernel, linan122, yukuai3, yi.zhang, houtao1,
	yangerkun

From: Li Nan <linan122@huawei.com>

When removing a disk with replacement, the replacement will be used to
replace rdev. During this process, there is a brief window in which both
rdev and replacement are read as NULL in raid10_write_request(). This
will result in io not being submitted but it should be.

  //remove				//write
  raid10_remove_disk			raid10_write_request
   mirror->rdev = NULL
					 read rdev -> NULL
   mirror->rdev = mirror->replacement
   mirror->replacement = NULL
					 read replacement -> NULL

Fix it by reading replacement first and rdev later, meanwhile, use smp_mb()
to prevent memory reordering.

Fixes: 475b0321a4df ("md/raid10: writes should get directed to replacement as well as original.")
Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/raid10.c | 22 ++++++++++++++++++----
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 3d52fb618571..c867efbc6197 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -779,8 +779,16 @@ static struct md_rdev *read_balance(struct r10conf *conf,
 		disk = r10_bio->devs[slot].devnum;
 		rdev = rcu_dereference(conf->mirrors[disk].replacement);
 		if (rdev == NULL || test_bit(Faulty, &rdev->flags) ||
-		    r10_bio->devs[slot].addr + sectors > rdev->recovery_offset)
+		    r10_bio->devs[slot].addr + sectors >
+		    rdev->recovery_offset) {
+			/*
+			 * Read replacement first to prevent reading both rdev
+			 * and replacement as NULL during replacement replace
+			 * rdev.
+			 */
+			smp_mb();
 			rdev = rcu_dereference(conf->mirrors[disk].rdev);
+		}
 		if (rdev == NULL ||
 		    test_bit(Faulty, &rdev->flags))
 			continue;
@@ -1479,9 +1487,15 @@ static void raid10_write_request(struct mddev *mddev, struct bio *bio,
 
 	for (i = 0;  i < conf->copies; i++) {
 		int d = r10_bio->devs[i].devnum;
-		struct md_rdev *rdev = rcu_dereference(conf->mirrors[d].rdev);
-		struct md_rdev *rrdev = rcu_dereference(
-			conf->mirrors[d].replacement);
+		struct md_rdev *rdev, *rrdev;
+
+		rrdev = rcu_dereference(conf->mirrors[d].replacement);
+		/*
+		 * Read replacement first to prevent reading both rdev and
+		 * replacement as NULL during replacement replace rdev.
+		 */
+		smp_mb();
+		rdev = rcu_dereference(conf->mirrors[d].rdev);
 		if (rdev == rrdev)
 			rrdev = NULL;
 		if (rdev && (test_bit(Faulty, &rdev->flags)))
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v7 0/2] raid10 bugfix
  2023-06-02  9:18 [PATCH v7 0/2] raid10 bugfix linan666
  2023-06-02  9:18 ` [PATCH v7 1/2] md/raid10: Do not add spare disk when recovery fails linan666
  2023-06-02  9:18 ` [PATCH v7 2/2] md/raid10: fix io loss while replacement replace rdev linan666
@ 2023-06-06  0:07 ` Song Liu
  2 siblings, 0 replies; 4+ messages in thread
From: Song Liu @ 2023-06-06  0:07 UTC (permalink / raw)
  To: linan666
  Cc: neilb, linux-raid, linux-kernel, linan122, yukuai3, yi.zhang,
	houtao1, yangerkun

On Fri, Jun 2, 2023 at 2:22 AM <linan666@huaweicloud.com> wrote:
>
> From: Li Nan <linan122@huawei.com>
>
> Changes in v7:
>  - in patch 1, change "fail" to "fails".

Applied v7 to md-next.

Thanks,
Song

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2023-06-06  0:08 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-02  9:18 [PATCH v7 0/2] raid10 bugfix linan666
2023-06-02  9:18 ` [PATCH v7 1/2] md/raid10: Do not add spare disk when recovery fails linan666
2023-06-02  9:18 ` [PATCH v7 2/2] md/raid10: fix io loss while replacement replace rdev linan666
2023-06-06  0:07 ` [PATCH v7 0/2] raid10 bugfix Song Liu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).