All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH 0/3] md/raid10: optimize and fix read error
  2023-06-23 17:32 [PATCH 0/3] md/raid10: optimize and fix read error linan666
@ 2023-06-23  9:52 ` Paul Menzel
  2023-06-25  1:23   ` Li Nan
  2023-06-23 17:32 ` [PATCH 1/3] md/raid10: optimize fix_read_error linan666
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Paul Menzel @ 2023-06-23  9:52 UTC (permalink / raw)
  To: linan666
  Cc: song, linux-raid, linux-kernel, linan122, yukuai3, yi.zhang,
	houtao1, yangerkun

Dear Li,


Thank you for your patches.

Am 23.06.23 um 19:32 schrieb linan666@huaweicloud.com:
> From: Li Nan <linan122@huawei.com>

Just a note, that the date/timestamps of your patches are from the 
future. (Yu’s patches had the same problem.) Could you please make sure 
to set up the system time correctly.

     Received: from huaweicloud.com (unknown [10.175.104.67])
             by APP4 (Coremail) with SMTP id 
gCh0CgCXaK8TZ5VkaZ1hMQ--.14505S4;
             Fri, 23 Jun 2023 17:34:11 +0800 (CST)
     […]
     Date:   Sat, 24 Jun 2023 01:32:33 +0800

[…]


Kind regards,

Paul

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/3] md/raid10: optimize fix_read_error
  2023-06-23 17:32 ` [PATCH 1/3] md/raid10: optimize fix_read_error linan666
@ 2023-06-23 10:03   ` Paul Menzel
  2023-06-25  6:44     ` Li Nan
  2023-06-26  2:28   ` Yu Kuai
  1 sibling, 1 reply; 12+ messages in thread
From: Paul Menzel @ 2023-06-23 10:03 UTC (permalink / raw)
  To: linan666
  Cc: song, linux-raid, linux-kernel, linan122, yukuai3, yi.zhang,
	houtao1, yangerkun

Dear Li,


Thank you for your patch.

Am 23.06.23 um 19:32 schrieb linan666@huaweicloud.com:
> From: Li Nan <linan122@huawei.com>
> 
> We dereference r10_bio->read_slot too many times in fix_read_error().
> Optimize it by using a variable to store read_slot.

I am always cautious reading about optimizations without any benchmarks 
or object code analysis. Although your explanation makes sense, did you 
check, that performance didn’t decrease in some way? (Maybe the compiler 
even generates the same code.)


Kind regards,

Paul


> Signed-off-by: Li Nan <linan122@huawei.com>
> ---
>   drivers/md/raid10.c | 20 ++++++++++----------
>   1 file changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 381c21f7fb06..94ae294c8a3c 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -2725,10 +2725,10 @@ static int r10_sync_page_io(struct md_rdev *rdev, sector_t sector,
>   static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10bio *r10_bio)
>   {
>   	int sect = 0; /* Offset from r10_bio->sector */
> -	int sectors = r10_bio->sectors;
> +	int sectors = r10_bio->sectors, slot = r10_bio->read_slot;
>   	struct md_rdev *rdev;
>   	int max_read_errors = atomic_read(&mddev->max_corr_read_errors);
> -	int d = r10_bio->devs[r10_bio->read_slot].devnum;
> +	int d = r10_bio->devs[slot].devnum;
>   
>   	/* still own a reference to this rdev, so it cannot
>   	 * have been cleared recently.
> @@ -2749,13 +2749,13 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
>   		pr_notice("md/raid10:%s: %pg: Failing raid device\n",
>   			  mdname(mddev), rdev->bdev);
>   		md_error(mddev, rdev);
> -		r10_bio->devs[r10_bio->read_slot].bio = IO_BLOCKED;
> +		r10_bio->devs[slot].bio = IO_BLOCKED;
>   		return;
>   	}
>   
>   	while(sectors) {
>   		int s = sectors;
> -		int sl = r10_bio->read_slot;
> +		int sl = slot;
>   		int success = 0;
>   		int start;
>   
> @@ -2790,7 +2790,7 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
>   			sl++;
>   			if (sl == conf->copies)
>   				sl = 0;
> -		} while (!success && sl != r10_bio->read_slot);
> +		} while (!success && sl != slot);
>   		rcu_read_unlock();
>   
>   		if (!success) {
> @@ -2798,16 +2798,16 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
>   			 * as bad on the first device to discourage future
>   			 * reads.
>   			 */
> -			int dn = r10_bio->devs[r10_bio->read_slot].devnum;
> +			int dn = r10_bio->devs[slot].devnum;
>   			rdev = conf->mirrors[dn].rdev;
>   
>   			if (!rdev_set_badblocks(
>   				    rdev,
> -				    r10_bio->devs[r10_bio->read_slot].addr
> +				    r10_bio->devs[slot].addr
>   				    + sect,
>   				    s, 0)) {
>   				md_error(mddev, rdev);
> -				r10_bio->devs[r10_bio->read_slot].bio
> +				r10_bio->devs[slot].bio
>   					= IO_BLOCKED;
>   			}
>   			break;
> @@ -2816,7 +2816,7 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
>   		start = sl;
>   		/* write it back and re-read */
>   		rcu_read_lock();
> -		while (sl != r10_bio->read_slot) {
> +		while (sl != slot) {
>   			if (sl==0)
>   				sl = conf->copies;
>   			sl--;
> @@ -2850,7 +2850,7 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
>   			rcu_read_lock();
>   		}
>   		sl = start;
> -		while (sl != r10_bio->read_slot) {
> +		while (sl != slot) {
>   			if (sl==0)
>   				sl = conf->copies;
>   			sl--;

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 0/3] md/raid10: optimize and fix read error
@ 2023-06-23 17:32 linan666
  2023-06-23  9:52 ` Paul Menzel
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: linan666 @ 2023-06-23 17:32 UTC (permalink / raw)
  To: song
  Cc: linux-raid, linux-kernel, linan122, yukuai3, yi.zhang, houtao1,
	yangerkun

From: Li Nan <linan122@huawei.com>

This patch series optimizes and fixes fix_read_error().

Li Nan (3):
  md/raid10: optimize fix_read_error
  md: remove redundant check in fix_read_error()
  md/raid10: handle replacement devices in fix_read_error

 drivers/md/raid1.c  |  2 +-
 drivers/md/raid10.c | 50 +++++++++++++++++++++++----------------------
 2 files changed, 27 insertions(+), 25 deletions(-)

-- 
2.39.2


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/3] md/raid10: optimize fix_read_error
  2023-06-23 17:32 [PATCH 0/3] md/raid10: optimize and fix read error linan666
  2023-06-23  9:52 ` Paul Menzel
@ 2023-06-23 17:32 ` linan666
  2023-06-23 10:03   ` Paul Menzel
  2023-06-26  2:28   ` Yu Kuai
  2023-06-23 17:32 ` [PATCH 2/3] md: remove redundant check in fix_read_error() linan666
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 12+ messages in thread
From: linan666 @ 2023-06-23 17:32 UTC (permalink / raw)
  To: song
  Cc: linux-raid, linux-kernel, linan122, yukuai3, yi.zhang, houtao1,
	yangerkun

From: Li Nan <linan122@huawei.com>

We dereference r10_bio->read_slot too many times in fix_read_error().
Optimize it by using a variable to store read_slot.

Signed-off-by: Li Nan <linan122@huawei.com>
---
 drivers/md/raid10.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 381c21f7fb06..94ae294c8a3c 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2725,10 +2725,10 @@ static int r10_sync_page_io(struct md_rdev *rdev, sector_t sector,
 static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10bio *r10_bio)
 {
 	int sect = 0; /* Offset from r10_bio->sector */
-	int sectors = r10_bio->sectors;
+	int sectors = r10_bio->sectors, slot = r10_bio->read_slot;
 	struct md_rdev *rdev;
 	int max_read_errors = atomic_read(&mddev->max_corr_read_errors);
-	int d = r10_bio->devs[r10_bio->read_slot].devnum;
+	int d = r10_bio->devs[slot].devnum;
 
 	/* still own a reference to this rdev, so it cannot
 	 * have been cleared recently.
@@ -2749,13 +2749,13 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
 		pr_notice("md/raid10:%s: %pg: Failing raid device\n",
 			  mdname(mddev), rdev->bdev);
 		md_error(mddev, rdev);
-		r10_bio->devs[r10_bio->read_slot].bio = IO_BLOCKED;
+		r10_bio->devs[slot].bio = IO_BLOCKED;
 		return;
 	}
 
 	while(sectors) {
 		int s = sectors;
-		int sl = r10_bio->read_slot;
+		int sl = slot;
 		int success = 0;
 		int start;
 
@@ -2790,7 +2790,7 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
 			sl++;
 			if (sl == conf->copies)
 				sl = 0;
-		} while (!success && sl != r10_bio->read_slot);
+		} while (!success && sl != slot);
 		rcu_read_unlock();
 
 		if (!success) {
@@ -2798,16 +2798,16 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
 			 * as bad on the first device to discourage future
 			 * reads.
 			 */
-			int dn = r10_bio->devs[r10_bio->read_slot].devnum;
+			int dn = r10_bio->devs[slot].devnum;
 			rdev = conf->mirrors[dn].rdev;
 
 			if (!rdev_set_badblocks(
 				    rdev,
-				    r10_bio->devs[r10_bio->read_slot].addr
+				    r10_bio->devs[slot].addr
 				    + sect,
 				    s, 0)) {
 				md_error(mddev, rdev);
-				r10_bio->devs[r10_bio->read_slot].bio
+				r10_bio->devs[slot].bio
 					= IO_BLOCKED;
 			}
 			break;
@@ -2816,7 +2816,7 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
 		start = sl;
 		/* write it back and re-read */
 		rcu_read_lock();
-		while (sl != r10_bio->read_slot) {
+		while (sl != slot) {
 			if (sl==0)
 				sl = conf->copies;
 			sl--;
@@ -2850,7 +2850,7 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
 			rcu_read_lock();
 		}
 		sl = start;
-		while (sl != r10_bio->read_slot) {
+		while (sl != slot) {
 			if (sl==0)
 				sl = conf->copies;
 			sl--;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/3] md: remove redundant check in fix_read_error()
  2023-06-23 17:32 [PATCH 0/3] md/raid10: optimize and fix read error linan666
  2023-06-23  9:52 ` Paul Menzel
  2023-06-23 17:32 ` [PATCH 1/3] md/raid10: optimize fix_read_error linan666
@ 2023-06-23 17:32 ` linan666
  2023-06-26  2:31   ` Yu Kuai
  2023-06-23 17:32 ` [PATCH 3/3] md/raid10: handle replacement devices in fix_read_error linan666
  2023-07-07  8:16 ` [PATCH 0/3] md/raid10: optimize and fix read error Song Liu
  4 siblings, 1 reply; 12+ messages in thread
From: linan666 @ 2023-06-23 17:32 UTC (permalink / raw)
  To: song
  Cc: linux-raid, linux-kernel, linan122, yukuai3, yi.zhang, houtao1,
	yangerkun

From: Li Nan <linan122@huawei.com>

In fix_read_error(), 'success' will be checked immediately after assigning
it, if it is set to 1 then the loop will break. Checking it again in
condition of loop is redundant. Clean it up.

Signed-off-by: Li Nan <linan122@huawei.com>
---
 drivers/md/raid1.c  | 2 +-
 drivers/md/raid10.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 3570da63969b..0391c2d0c109 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2318,7 +2318,7 @@ static void fix_read_error(struct r1conf *conf, int read_disk,
 			d++;
 			if (d == conf->raid_disks * 2)
 				d = 0;
-		} while (!success && d != read_disk);
+		} while (d != read_disk);
 
 		if (!success) {
 			/* Cannot read from anywhere - mark it bad */
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 94ae294c8a3c..a36e53fce21f 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2790,7 +2790,7 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
 			sl++;
 			if (sl == conf->copies)
 				sl = 0;
-		} while (!success && sl != slot);
+		} while (sl != slot);
 		rcu_read_unlock();
 
 		if (!success) {
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 3/3] md/raid10: handle replacement devices in fix_read_error
  2023-06-23 17:32 [PATCH 0/3] md/raid10: optimize and fix read error linan666
                   ` (2 preceding siblings ...)
  2023-06-23 17:32 ` [PATCH 2/3] md: remove redundant check in fix_read_error() linan666
@ 2023-06-23 17:32 ` linan666
  2023-06-26  2:42   ` Yu Kuai
  2023-07-07  8:16 ` [PATCH 0/3] md/raid10: optimize and fix read error Song Liu
  4 siblings, 1 reply; 12+ messages in thread
From: linan666 @ 2023-06-23 17:32 UTC (permalink / raw)
  To: song
  Cc: linux-raid, linux-kernel, linan122, yukuai3, yi.zhang, houtao1,
	yangerkun

From: Li Nan <linan122@huawei.com>

In fix_read_error(), the handling of replacement devices is missing. If
read replacement device errors, we will attempt to fix 'mirror->rdev'.
It is wrong. Get rdev from r10bio to ensure that the fixed device is the
one which read error occurred.

Signed-off-by: Li Nan <linan122@huawei.com>
---
 drivers/md/raid10.c | 32 +++++++++++++++++---------------
 1 file changed, 17 insertions(+), 15 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index a36e53fce21f..4a7c8eaf6ea0 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2726,15 +2726,10 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
 {
 	int sect = 0; /* Offset from r10_bio->sector */
 	int sectors = r10_bio->sectors, slot = r10_bio->read_slot;
-	struct md_rdev *rdev;
+	struct md_rdev *rdev = r10_bio->devs[slot].rdev;
 	int max_read_errors = atomic_read(&mddev->max_corr_read_errors);
 	int d = r10_bio->devs[slot].devnum;
 
-	/* still own a reference to this rdev, so it cannot
-	 * have been cleared recently.
-	 */
-	rdev = conf->mirrors[d].rdev;
-
 	if (test_bit(Faulty, &rdev->flags))
 		/* drive has already been failed, just ignore any
 		   more fix_read_error() attempts */
@@ -2763,12 +2758,11 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
 			s = PAGE_SIZE >> 9;
 
 		rcu_read_lock();
+		rdev = r10_bio->devs[slot].rdev;
 		do {
 			sector_t first_bad;
 			int bad_sectors;
 
-			d = r10_bio->devs[sl].devnum;
-			rdev = rcu_dereference(conf->mirrors[d].rdev);
 			if (rdev &&
 			    test_bit(In_sync, &rdev->flags) &&
 			    !test_bit(Faulty, &rdev->flags) &&
@@ -2790,6 +2784,8 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
 			sl++;
 			if (sl == conf->copies)
 				sl = 0;
+			d = r10_bio->devs[sl].devnum;
+			rdev = rcu_dereference(conf->mirrors[d].rdev);
 		} while (sl != slot);
 		rcu_read_unlock();
 
@@ -2798,9 +2794,7 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
 			 * as bad on the first device to discourage future
 			 * reads.
 			 */
-			int dn = r10_bio->devs[slot].devnum;
-			rdev = conf->mirrors[dn].rdev;
-
+			rdev = r10_bio->devs[slot].rdev;
 			if (!rdev_set_badblocks(
 				    rdev,
 				    r10_bio->devs[slot].addr
@@ -2820,8 +2814,12 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
 			if (sl==0)
 				sl = conf->copies;
 			sl--;
-			d = r10_bio->devs[sl].devnum;
-			rdev = rcu_dereference(conf->mirrors[d].rdev);
+			if (sl == slot) {
+				rdev = r10_bio->devs[slot].rdev;
+			} else {
+				d = r10_bio->devs[sl].devnum;
+				rdev = rcu_dereference(conf->mirrors[d].rdev);
+			}
 			if (!rdev ||
 			    test_bit(Faulty, &rdev->flags) ||
 			    !test_bit(In_sync, &rdev->flags))
@@ -2854,8 +2852,12 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
 			if (sl==0)
 				sl = conf->copies;
 			sl--;
-			d = r10_bio->devs[sl].devnum;
-			rdev = rcu_dereference(conf->mirrors[d].rdev);
+			if (sl == slot) {
+				rdev = r10_bio->devs[slot].rdev;
+			} else {
+				d = r10_bio->devs[sl].devnum;
+				rdev = rcu_dereference(conf->mirrors[d].rdev);
+			}
 			if (!rdev ||
 			    test_bit(Faulty, &rdev->flags) ||
 			    !test_bit(In_sync, &rdev->flags))
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/3] md/raid10: optimize and fix read error
  2023-06-23  9:52 ` Paul Menzel
@ 2023-06-25  1:23   ` Li Nan
  0 siblings, 0 replies; 12+ messages in thread
From: Li Nan @ 2023-06-25  1:23 UTC (permalink / raw)
  To: Paul Menzel
  Cc: song, linux-raid, linux-kernel, linan122, yukuai3, yi.zhang,
	houtao1, yangerkun



在 2023/6/23 17:52, Paul Menzel 写道:
> Dear Li,
> 
> 
> Thank you for your patches.
> 
> Am 23.06.23 um 19:32 schrieb linan666@huaweicloud.com:
>> From: Li Nan <linan122@huawei.com>
> 
> Just a note, that the date/timestamps of your patches are from the 
> future. (Yu’s patches had the same problem.) Could you please make sure 
> to set up the system time correctly.

System time is correct now. Thanks for your reminding :)

> 
>      Received: from huaweicloud.com (unknown [10.175.104.67])
>              by APP4 (Coremail) with SMTP id 
> gCh0CgCXaK8TZ5VkaZ1hMQ--.14505S4;
>              Fri, 23 Jun 2023 17:34:11 +0800 (CST)
>      […]
>      Date:   Sat, 24 Jun 2023 01:32:33 +0800
> 
> […]
> 
> 
> Kind regards,
> 
> Paul

-- 
Thanks,
Nan


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/3] md/raid10: optimize fix_read_error
  2023-06-23 10:03   ` Paul Menzel
@ 2023-06-25  6:44     ` Li Nan
  0 siblings, 0 replies; 12+ messages in thread
From: Li Nan @ 2023-06-25  6:44 UTC (permalink / raw)
  To: Paul Menzel
  Cc: song, linux-raid, linux-kernel, linan122, yukuai3, yi.zhang,
	houtao1, yangerkun



在 2023/6/23 18:03, Paul Menzel 写道:
> Dear Li,
> 
> 
> Thank you for your patch.
> 
> Am 23.06.23 um 19:32 schrieb linan666@huaweicloud.com:
>> From: Li Nan <linan122@huawei.com>
>>
>> We dereference r10_bio->read_slot too many times in fix_read_error().
>> Optimize it by using a variable to store read_slot.
> 
> I am always cautious reading about optimizations without any benchmarks 
> or object code analysis. Although your explanation makes sense, did you 
> check, that performance didn’t decrease in some way? (Maybe the compiler 
> even generates the same code.)
> 
> 
> Kind regards,
> 
> Paul
> 
> 

Compared assembly code before and after optimization:
  - With gcc 8.3.0, both are consistent.
  - With gcc 12.2.1, 'r10_bio->read_slot' mostly uses r10bio dirctly:
      2853    while (sl != r10_bio->read_slot) {
        0xffffffff8213f2a0 <+1328>:  cmp    %r14d,0x38(%rbp)

    'slot' is mostly saved to a register individually:
      2819    while (sl != slot) {
        0xffffffff8213f08a <+794>:   cmp    %r14d,%ebx

I have not tested the performance, and it won't bring significant 
performance optimization, which can also be seen from the analysis of 
the assembly code. In fact, I just want to make code more readable.

-- 
Thanks,
Nan


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/3] md/raid10: optimize fix_read_error
  2023-06-23 17:32 ` [PATCH 1/3] md/raid10: optimize fix_read_error linan666
  2023-06-23 10:03   ` Paul Menzel
@ 2023-06-26  2:28   ` Yu Kuai
  1 sibling, 0 replies; 12+ messages in thread
From: Yu Kuai @ 2023-06-26  2:28 UTC (permalink / raw)
  To: linan666, song
  Cc: linux-raid, linux-kernel, linan122, yi.zhang, houtao1, yangerkun,
	yukuai (C)

Hi,

在 2023/06/24 1:32, linan666@huaweicloud.com 写道:
> From: Li Nan <linan122@huawei.com>
> 
> We dereference r10_bio->read_slot too many times in fix_read_error().
> Optimize it by using a variable to store read_slot.
> 

Other than a nit below, this patch LGTM.

Reviewed-by: Yu Kuai <yukuai3@huawei.com>

> Signed-off-by: Li Nan <linan122@huawei.com>
> ---
>   drivers/md/raid10.c | 20 ++++++++++----------
>   1 file changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 381c21f7fb06..94ae294c8a3c 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -2725,10 +2725,10 @@ static int r10_sync_page_io(struct md_rdev *rdev, sector_t sector,
>   static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10bio *r10_bio)
>   {
>   	int sect = 0; /* Offset from r10_bio->sector */
> -	int sectors = r10_bio->sectors;
> +	int sectors = r10_bio->sectors, slot = r10_bio->read_slot;
>   	struct md_rdev *rdev;
>   	int max_read_errors = atomic_read(&mddev->max_corr_read_errors);
> -	int d = r10_bio->devs[r10_bio->read_slot].devnum;
> +	int d = r10_bio->devs[slot].devnum;
>   
>   	/* still own a reference to this rdev, so it cannot
>   	 * have been cleared recently.
> @@ -2749,13 +2749,13 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
>   		pr_notice("md/raid10:%s: %pg: Failing raid device\n",
>   			  mdname(mddev), rdev->bdev);
>   		md_error(mddev, rdev);
> -		r10_bio->devs[r10_bio->read_slot].bio = IO_BLOCKED;
> +		r10_bio->devs[slot].bio = IO_BLOCKED;
>   		return;
>   	}
>   
>   	while(sectors) {
>   		int s = sectors;
> -		int sl = r10_bio->read_slot;
> +		int sl = slot;
>   		int success = 0;
>   		int start;
>   
> @@ -2790,7 +2790,7 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
>   			sl++;
>   			if (sl == conf->copies)
>   				sl = 0;
> -		} while (!success && sl != r10_bio->read_slot);
> +		} while (!success && sl != slot);
>   		rcu_read_unlock();
>   
>   		if (!success) {
> @@ -2798,16 +2798,16 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
>   			 * as bad on the first device to discourage future
>   			 * reads.
>   			 */
> -			int dn = r10_bio->devs[r10_bio->read_slot].devnum;
> +			int dn = r10_bio->devs[slot].devnum;
>   			rdev = conf->mirrors[dn].rdev;
>   
>   			if (!rdev_set_badblocks(
>   				    rdev,
> -				    r10_bio->devs[r10_bio->read_slot].addr
> +				    r10_bio->devs[slot].addr
>   				    + sect,
>   				    s, 0)) {
>   				md_error(mddev, rdev);
> -				r10_bio->devs[r10_bio->read_slot].bio
> +				r10_bio->devs[slot].bio
>   					= IO_BLOCKED;

There is no need to split lines now.

Thanks,
Kuai
>   			}
>   			break;
> @@ -2816,7 +2816,7 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
>   		start = sl;
>   		/* write it back and re-read */
>   		rcu_read_lock();
> -		while (sl != r10_bio->read_slot) {
> +		while (sl != slot) {
>   			if (sl==0)
>   				sl = conf->copies;
>   			sl--;
> @@ -2850,7 +2850,7 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
>   			rcu_read_lock();
>   		}
>   		sl = start;
> -		while (sl != r10_bio->read_slot) {
> +		while (sl != slot) {
>   			if (sl==0)
>   				sl = conf->copies;
>   			sl--;
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3] md: remove redundant check in fix_read_error()
  2023-06-23 17:32 ` [PATCH 2/3] md: remove redundant check in fix_read_error() linan666
@ 2023-06-26  2:31   ` Yu Kuai
  0 siblings, 0 replies; 12+ messages in thread
From: Yu Kuai @ 2023-06-26  2:31 UTC (permalink / raw)
  To: linan666, song
  Cc: linux-raid, linux-kernel, linan122, yi.zhang, houtao1, yangerkun,
	yukuai (C)

在 2023/06/24 1:32, linan666@huaweicloud.com 写道:
> From: Li Nan <linan122@huawei.com>
> 
> In fix_read_error(), 'success' will be checked immediately after assigning
> it, if it is set to 1 then the loop will break. Checking it again in
> condition of loop is redundant. Clean it up.

LGTM
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
> 
> Signed-off-by: Li Nan <linan122@huawei.com>
> ---
>   drivers/md/raid1.c  | 2 +-
>   drivers/md/raid10.c | 2 +-
>   2 files changed, 2 insertions(+), 2 deletions(-)
> :
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 3570da63969b..0391c2d0c109 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -2318,7 +2318,7 @@ static void fix_read_error(struct r1conf *conf, int read_disk,
>   			d++;
>   			if (d == conf->raid_disks * 2)
>   				d = 0;
> -		} while (!success && d != read_disk);
> +		} while (d != read_disk);
>   
>   		if (!success) {
>   			/* Cannot read from anywhere - mark it bad */
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 94ae294c8a3c..a36e53fce21f 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -2790,7 +2790,7 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
>   			sl++;
>   			if (sl == conf->copies)
>   				sl = 0;
> -		} while (!success && sl != slot);
> +		} while (sl != slot);
>   		rcu_read_unlock();
>   
>   		if (!success) {
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] md/raid10: handle replacement devices in fix_read_error
  2023-06-23 17:32 ` [PATCH 3/3] md/raid10: handle replacement devices in fix_read_error linan666
@ 2023-06-26  2:42   ` Yu Kuai
  0 siblings, 0 replies; 12+ messages in thread
From: Yu Kuai @ 2023-06-26  2:42 UTC (permalink / raw)
  To: linan666, song
  Cc: linux-raid, linux-kernel, linan122, yi.zhang, houtao1, yangerkun,
	yukuai (C)

Hi,

在 2023/06/24 1:32, linan666@huaweicloud.com 写道:
> From: Li Nan <linan122@huawei.com>
> 
> In fix_read_error(), the handling of replacement devices is missing. If
> read replacement device errors, we will attempt to fix 'mirror->rdev'.
> It is wrong. Get rdev from r10bio to ensure that the fixed device is the
> one which read error occurred.
> 
I'll suggest not to fix read error for replacement, it's better to error
replacement directly instead of setting badblock, like what write does. 
Otherwise, there is risk of losing data after replacement replace rdev.

Thanks,
Kuai

> Signed-off-by: Li Nan <linan122@huawei.com>
> ---
>   drivers/md/raid10.c | 32 +++++++++++++++++---------------
>   1 file changed, 17 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index a36e53fce21f..4a7c8eaf6ea0 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -2726,15 +2726,10 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
>   {
>   	int sect = 0; /* Offset from r10_bio->sector */
>   	int sectors = r10_bio->sectors, slot = r10_bio->read_slot;
> -	struct md_rdev *rdev;
> +	struct md_rdev *rdev = r10_bio->devs[slot].rdev;
>   	int max_read_errors = atomic_read(&mddev->max_corr_read_errors);
>   	int d = r10_bio->devs[slot].devnum;
>   
> -	/* still own a reference to this rdev, so it cannot
> -	 * have been cleared recently.
> -	 */
> -	rdev = conf->mirrors[d].rdev;
> -
>   	if (test_bit(Faulty, &rdev->flags))
>   		/* drive has already been failed, just ignore any
>   		   more fix_read_error() attempts */
> @@ -2763,12 +2758,11 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
>   			s = PAGE_SIZE >> 9;
>   
>   		rcu_read_lock();
> +		rdev = r10_bio->devs[slot].rdev;
>   		do {
>   			sector_t first_bad;
>   			int bad_sectors;
>   
> -			d = r10_bio->devs[sl].devnum;
> -			rdev = rcu_dereference(conf->mirrors[d].rdev);
>   			if (rdev &&
>   			    test_bit(In_sync, &rdev->flags) &&
>   			    !test_bit(Faulty, &rdev->flags) &&
> @@ -2790,6 +2784,8 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
>   			sl++;
>   			if (sl == conf->copies)
>   				sl = 0;
> +			d = r10_bio->devs[sl].devnum;
> +			rdev = rcu_dereference(conf->mirrors[d].rdev);
>   		} while (sl != slot);
>   		rcu_read_unlock();
>   
> @@ -2798,9 +2794,7 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
>   			 * as bad on the first device to discourage future
>   			 * reads.
>   			 */
> -			int dn = r10_bio->devs[slot].devnum;
> -			rdev = conf->mirrors[dn].rdev;
> -
> +			rdev = r10_bio->devs[slot].rdev;
>   			if (!rdev_set_badblocks(
>   				    rdev,
>   				    r10_bio->devs[slot].addr
> @@ -2820,8 +2814,12 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
>   			if (sl==0)
>   				sl = conf->copies;
>   			sl--;
> -			d = r10_bio->devs[sl].devnum;
> -			rdev = rcu_dereference(conf->mirrors[d].rdev);
> +			if (sl == slot) {
> +				rdev = r10_bio->devs[slot].rdev;
> +			} else {
> +				d = r10_bio->devs[sl].devnum;
> +				rdev = rcu_dereference(conf->mirrors[d].rdev);
> +			}
>   			if (!rdev ||
>   			    test_bit(Faulty, &rdev->flags) ||
>   			    !test_bit(In_sync, &rdev->flags))
> @@ -2854,8 +2852,12 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
>   			if (sl==0)
>   				sl = conf->copies;
>   			sl--;
> -			d = r10_bio->devs[sl].devnum;
> -			rdev = rcu_dereference(conf->mirrors[d].rdev);
> +			if (sl == slot) {
> +				rdev = r10_bio->devs[slot].rdev;
> +			} else {
> +				d = r10_bio->devs[sl].devnum;
> +				rdev = rcu_dereference(conf->mirrors[d].rdev);
> +			}
>   			if (!rdev ||
>   			    test_bit(Faulty, &rdev->flags) ||
>   			    !test_bit(In_sync, &rdev->flags))
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/3] md/raid10: optimize and fix read error
  2023-06-23 17:32 [PATCH 0/3] md/raid10: optimize and fix read error linan666
                   ` (3 preceding siblings ...)
  2023-06-23 17:32 ` [PATCH 3/3] md/raid10: handle replacement devices in fix_read_error linan666
@ 2023-07-07  8:16 ` Song Liu
  4 siblings, 0 replies; 12+ messages in thread
From: Song Liu @ 2023-07-07  8:16 UTC (permalink / raw)
  To: linan666
  Cc: linux-raid, linux-kernel, linan122, yukuai3, yi.zhang, houtao1,
	yangerkun

On Fri, Jun 23, 2023 at 5:34 PM <linan666@huaweicloud.com> wrote:
>
> From: Li Nan <linan122@huawei.com>
>
> This patch series optimizes and fixes fix_read_error().
>
> Li Nan (3):
>   md/raid10: optimize fix_read_error
>   md: remove redundant check in fix_read_error()

Applied 1/3 and 2/3 to md-next. Thanks!

Song


>   md/raid10: handle replacement devices in fix_read_error
>
>  drivers/md/raid1.c  |  2 +-
>  drivers/md/raid10.c | 50 +++++++++++++++++++++++----------------------
>  2 files changed, 27 insertions(+), 25 deletions(-)
>
> --
> 2.39.2
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2023-07-07  8:17 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-23 17:32 [PATCH 0/3] md/raid10: optimize and fix read error linan666
2023-06-23  9:52 ` Paul Menzel
2023-06-25  1:23   ` Li Nan
2023-06-23 17:32 ` [PATCH 1/3] md/raid10: optimize fix_read_error linan666
2023-06-23 10:03   ` Paul Menzel
2023-06-25  6:44     ` Li Nan
2023-06-26  2:28   ` Yu Kuai
2023-06-23 17:32 ` [PATCH 2/3] md: remove redundant check in fix_read_error() linan666
2023-06-26  2:31   ` Yu Kuai
2023-06-23 17:32 ` [PATCH 3/3] md/raid10: handle replacement devices in fix_read_error linan666
2023-06-26  2:42   ` Yu Kuai
2023-07-07  8:16 ` [PATCH 0/3] md/raid10: optimize and fix read error Song Liu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.