All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v7 0/2] raid10 bugfix
@ 2023-06-02  9:18 linan666
  2023-06-02  9:18 ` [PATCH v7 1/2] md/raid10: Do not add spare disk when recovery fails linan666
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: linan666 @ 2023-06-02  9:18 UTC (permalink / raw)
  To: song, neilb
  Cc: linux-raid, linux-kernel, linan122, yukuai3, yi.zhang, houtao1,
	yangerkun

From: Li Nan <linan122@huawei.com>

Changes in v7:
 - in patch 1, change "fail" to "fails".

Changes in v6:
 - in patch 1, improve commit message summary and comment.

Changes in v5:
 - v4 send wrong patch, correct and resend.

Changes in v4:
 - improve commit log
 - removed applied patches


Li Nan (2):
  md/raid10: Do not add spare disk when recovery fails
  md/raid10: fix io loss while replacement replace rdev

 drivers/md/raid10.c | 42 ++++++++++++++++++++++++++++++++++++------
 1 file changed, 36 insertions(+), 6 deletions(-)

-- 
2.39.2


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH v7 1/2] md/raid10: Do not add spare disk when recovery fails
  2023-06-02  9:18 [PATCH v7 0/2] raid10 bugfix linan666
@ 2023-06-02  9:18 ` linan666
  2023-06-02  9:18 ` [PATCH v7 2/2] md/raid10: fix io loss while replacement replace rdev linan666
  2023-06-06  0:07 ` [PATCH v7 0/2] raid10 bugfix Song Liu
  2 siblings, 0 replies; 4+ messages in thread
From: linan666 @ 2023-06-02  9:18 UTC (permalink / raw)
  To: song, neilb
  Cc: linux-raid, linux-kernel, linan122, yukuai3, yi.zhang, houtao1,
	yangerkun

From: Li Nan <linan122@huawei.com>

In raid10_sync_request(), if data cannot be read from any disk for
recovery, it will go to 'giveup' and let 'chunks_skipped' + 1. After
multiple 'giveup', when 'chunks_skipped >= geo.raid_disks', it will
return 'max_sector', indicating that the recovery has been completed.
However, the recovery is just aborted and the data remains inconsistent.

Fix it by setting mirror->recovery_disabled, which will prevent the spare
disk from being added to this mirror. The same issue also exists during
resync, it will be fixed afterwards.

Signed-off-by: Li Nan <linan122@huawei.com>
---
 drivers/md/raid10.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index d93d8cb2b620..3d52fb618571 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -3303,6 +3303,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 	int chunks_skipped = 0;
 	sector_t chunk_mask = conf->geo.chunk_mask;
 	int page_idx = 0;
+	int error_disk = -1;
 
 	/*
 	 * Allow skipping a full rebuild for incremental assembly
@@ -3386,8 +3387,21 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 		return reshape_request(mddev, sector_nr, skipped);
 
 	if (chunks_skipped >= conf->geo.raid_disks) {
-		/* if there has been nothing to do on any drive,
-		 * then there is nothing to do at all..
+		pr_err("md/raid10:%s: %s fails\n", mdname(mddev),
+			test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ?  "resync" : "recovery");
+		if (error_disk >= 0 &&
+		    !test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
+			/*
+			 * recovery fails, set mirrors.recovery_disabled,
+			 * device shouldn't be added to there.
+			 */
+			conf->mirrors[error_disk].recovery_disabled =
+						mddev->recovery_disabled;
+			return 0;
+		}
+		/*
+		 * if there has been nothing to do on any drive,
+		 * then there is nothing to do at all.
 		 */
 		*skipped = 1;
 		return (max_sector - sector_nr) + sectors_skipped;
@@ -3638,6 +3652,8 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 						       mdname(mddev));
 					mirror->recovery_disabled
 						= mddev->recovery_disabled;
+				} else {
+					error_disk = i;
 				}
 				put_buf(r10_bio);
 				if (rb2)
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH v7 2/2] md/raid10: fix io loss while replacement replace rdev
  2023-06-02  9:18 [PATCH v7 0/2] raid10 bugfix linan666
  2023-06-02  9:18 ` [PATCH v7 1/2] md/raid10: Do not add spare disk when recovery fails linan666
@ 2023-06-02  9:18 ` linan666
  2023-06-06  0:07 ` [PATCH v7 0/2] raid10 bugfix Song Liu
  2 siblings, 0 replies; 4+ messages in thread
From: linan666 @ 2023-06-02  9:18 UTC (permalink / raw)
  To: song, neilb
  Cc: linux-raid, linux-kernel, linan122, yukuai3, yi.zhang, houtao1,
	yangerkun

From: Li Nan <linan122@huawei.com>

When removing a disk with replacement, the replacement will be used to
replace rdev. During this process, there is a brief window in which both
rdev and replacement are read as NULL in raid10_write_request(). This
will result in io not being submitted but it should be.

  //remove				//write
  raid10_remove_disk			raid10_write_request
   mirror->rdev = NULL
					 read rdev -> NULL
   mirror->rdev = mirror->replacement
   mirror->replacement = NULL
					 read replacement -> NULL

Fix it by reading replacement first and rdev later, meanwhile, use smp_mb()
to prevent memory reordering.

Fixes: 475b0321a4df ("md/raid10: writes should get directed to replacement as well as original.")
Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/raid10.c | 22 ++++++++++++++++++----
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 3d52fb618571..c867efbc6197 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -779,8 +779,16 @@ static struct md_rdev *read_balance(struct r10conf *conf,
 		disk = r10_bio->devs[slot].devnum;
 		rdev = rcu_dereference(conf->mirrors[disk].replacement);
 		if (rdev == NULL || test_bit(Faulty, &rdev->flags) ||
-		    r10_bio->devs[slot].addr + sectors > rdev->recovery_offset)
+		    r10_bio->devs[slot].addr + sectors >
+		    rdev->recovery_offset) {
+			/*
+			 * Read replacement first to prevent reading both rdev
+			 * and replacement as NULL during replacement replace
+			 * rdev.
+			 */
+			smp_mb();
 			rdev = rcu_dereference(conf->mirrors[disk].rdev);
+		}
 		if (rdev == NULL ||
 		    test_bit(Faulty, &rdev->flags))
 			continue;
@@ -1479,9 +1487,15 @@ static void raid10_write_request(struct mddev *mddev, struct bio *bio,
 
 	for (i = 0;  i < conf->copies; i++) {
 		int d = r10_bio->devs[i].devnum;
-		struct md_rdev *rdev = rcu_dereference(conf->mirrors[d].rdev);
-		struct md_rdev *rrdev = rcu_dereference(
-			conf->mirrors[d].replacement);
+		struct md_rdev *rdev, *rrdev;
+
+		rrdev = rcu_dereference(conf->mirrors[d].replacement);
+		/*
+		 * Read replacement first to prevent reading both rdev and
+		 * replacement as NULL during replacement replace rdev.
+		 */
+		smp_mb();
+		rdev = rcu_dereference(conf->mirrors[d].rdev);
 		if (rdev == rrdev)
 			rrdev = NULL;
 		if (rdev && (test_bit(Faulty, &rdev->flags)))
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v7 0/2] raid10 bugfix
  2023-06-02  9:18 [PATCH v7 0/2] raid10 bugfix linan666
  2023-06-02  9:18 ` [PATCH v7 1/2] md/raid10: Do not add spare disk when recovery fails linan666
  2023-06-02  9:18 ` [PATCH v7 2/2] md/raid10: fix io loss while replacement replace rdev linan666
@ 2023-06-06  0:07 ` Song Liu
  2 siblings, 0 replies; 4+ messages in thread
From: Song Liu @ 2023-06-06  0:07 UTC (permalink / raw)
  To: linan666
  Cc: neilb, linux-raid, linux-kernel, linan122, yukuai3, yi.zhang,
	houtao1, yangerkun

On Fri, Jun 2, 2023 at 2:22 AM <linan666@huaweicloud.com> wrote:
>
> From: Li Nan <linan122@huawei.com>
>
> Changes in v7:
>  - in patch 1, change "fail" to "fails".

Applied v7 to md-next.

Thanks,
Song

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2023-06-06  0:08 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-02  9:18 [PATCH v7 0/2] raid10 bugfix linan666
2023-06-02  9:18 ` [PATCH v7 1/2] md/raid10: Do not add spare disk when recovery fails linan666
2023-06-02  9:18 ` [PATCH v7 2/2] md/raid10: fix io loss while replacement replace rdev linan666
2023-06-06  0:07 ` [PATCH v7 0/2] raid10 bugfix Song Liu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.