[PATCH OLK-5.10 v3 0/4] md: bugfix of writing raid sysfs

* [PATCH OLK-5.10 v3 0/4] md: bugfix of writing raid sysfs
@ 2023-05-15 13:48 linan666
  2023-05-15 13:48 ` [PATCH OLK-5.10 v3 1/4] md/raid10: fix slab-out-of-bounds in md_bitmap_get_counter linan666
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: linan666 @ 2023-05-15 13:48 UTC (permalink / raw)
  To: song, neilb, Rob.Becker
  Cc: linux-raid, linux-kernel, linan122, yukuai3, yi.zhang, houtao1,
	yangerkun

From: Li Nan <linan122@huawei.com>

This patch series fixes bugs triggered by writing to raid sysfs attributes.

Changes in v3:
 - in patch 1, move check out of md_bitmap_checkpage().
 - in patch 2, use div64_u64() and DIV64_U64_ROUND_UP() instead of using
   '/' directly, and change old_delay/new_delay to unsigned int.
 - in patch 4, use 'goto' to make changes more readable.

Changes in v2:
 - add patch "md/raid10: optimize check_decay_read_errors()".
 - in patch 2, return ret-value of strict_strtoul_scaled if error occurs.
 - in patch 3, optimize format.

Li Nan (4):
  md/raid10: fix slab-out-of-bounds in md_bitmap_get_counter
  md/raid10: fix overflow in safe_delay_store
  md/raid10: fix wrong setting of max_corr_read_errors
  md/raid10: optimize check_decay_read_errors()

 drivers/md/md-bitmap.c | 17 ++++-----
 drivers/md/md.c        | 78 ++++++++++++++++++++++++++----------------
 drivers/md/raid10.c    | 41 +++++++++++++---------
 3 files changed, 82 insertions(+), 54 deletions(-)

-- 
2.31.1



* [PATCH OLK-5.10 v3 1/4] md/raid10: fix slab-out-of-bounds in md_bitmap_get_counter
  2023-05-15 13:48 [PATCH OLK-5.10 v3 0/4] md: bugfix of writing raid sysfs linan666
@ 2023-05-15 13:48 ` linan666
  2023-05-19 21:20   ` Song Liu
  2023-05-15 13:48 ` [PATCH OLK-5.10 v3 2/4] md/raid10: fix overflow in safe_delay_store linan666
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: linan666 @ 2023-05-15 13:48 UTC (permalink / raw)
  To: song, neilb, Rob.Becker
  Cc: linux-raid, linux-kernel, linan122, yukuai3, yi.zhang, houtao1,
	yangerkun

From: Li Nan <linan122@huawei.com>

If we write a large number to md/bitmap_set_bits, md_bitmap_checkpage()
returns -EINVAL because 'page >= bitmap->pages'. md_bitmap_get_counter()
does not check that return value immediately, because it sets *blocks
first. Setting *blocks reads bitmap->bp[page], so a slab-out-of-bounds
access occurs.

Move the 'page >= bitmap->pages' check into md_bitmap_get_counter() and
return immediately if it is true.

Fixes: ef4256733506 ("md/bitmap: optimise scanning of empty bitmaps.")
Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md-bitmap.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index 920bb68156d2..e122b19c124d 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -46,6 +46,7 @@ static inline char *bmname(struct bitmap *bitmap)
  *
  * if we find our page, we increment the page's refcount so that it stays
  * allocated while we're using it
+ * the caller must make sure 'page < bitmap->pages'
  */
 static int md_bitmap_checkpage(struct bitmap_counts *bitmap,
 			       unsigned long page, int create, int no_hijack)
@@ -54,14 +55,6 @@ __acquires(bitmap->lock)
 {
 	unsigned char *mappage;
 
-	if (page >= bitmap->pages) {
-		/* This can happen if bitmap_start_sync goes beyond
-		 * End-of-device while looking for a whole page.
-		 * It is harmless.
-		 */
-		return -EINVAL;
-	}
-
 	if (bitmap->bp[page].hijacked) /* it's hijacked, don't try to alloc */
 		return 0;
 
@@ -1387,6 +1380,14 @@ __acquires(bitmap->lock)
 	sector_t csize;
 	int err;
 
+	if (page >= bitmap->pages) {
+		/*
+		 * This can happen if bitmap_start_sync goes beyond
+		 * End-of-device while looking for a whole page or
+		 * user set a huge number to sysfs bitmap_set_bits.
+		 */
+		return NULL;
+	}
 	err = md_bitmap_checkpage(bitmap, page, create, 0);
 
 	if (bitmap->bp[page].hijacked ||
-- 
2.31.1
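
The ordering fix in this patch can be illustrated outside the kernel. The
following is a minimal userspace sketch, not the md code: the demo_*
structures and the function name are hypothetical stand-ins. It only shows
the pattern of validating the index before the array is touched.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct demo_page {
	unsigned int count;
	int hijacked;
};

struct demo_counts {
	struct demo_page *bp;	/* array of 'pages' entries */
	unsigned long pages;
};

/*
 * Return a pointer to the counter for 'page', or NULL when 'page' is out
 * of range. The range check comes first, so bp[page] is never touched
 * for an invalid index.
 */
static unsigned int *demo_get_counter(struct demo_counts *c, unsigned long page)
{
	if (page >= c->pages)
		return NULL;
	return &c->bp[page].count;
}
```

A caller that receives NULL must treat the request as out of range, which
matches the 'return NULL' added to md_bitmap_get_counter() above.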



* [PATCH OLK-5.10 v3 2/4] md/raid10: fix overflow in safe_delay_store
  2023-05-15 13:48 [PATCH OLK-5.10 v3 0/4] md: bugfix of writing raid sysfs linan666
  2023-05-15 13:48 ` [PATCH OLK-5.10 v3 1/4] md/raid10: fix slab-out-of-bounds in md_bitmap_get_counter linan666
@ 2023-05-15 13:48 ` linan666
  2023-05-19 22:01   ` Song Liu
  2023-05-15 13:48 ` [PATCH OLK-5.10 v3 3/4] md/raid10: fix wrong setting of max_corr_read_errors linan666
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: linan666 @ 2023-05-15 13:48 UTC (permalink / raw)
  To: song, neilb, Rob.Becker
  Cc: linux-raid, linux-kernel, linan122, yukuai3, yi.zhang, houtao1,
	yangerkun

From: Li Nan <linan122@huawei.com>

safe_delay_store() does not validate the value written to
md/safe_mode_delay, so a large input overflows. strict_strtoul_scaled()
can overflow as well. Fix both by using kstrtoul() instead of parsing the
input character by character.

Fixes: 72e02075a33f ("md: factor out parsing of fixed-point numbers")
Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c | 76 +++++++++++++++++++++++++++++++------------------
 1 file changed, 48 insertions(+), 28 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 8e344b4b3444..5bba071ea907 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -3767,56 +3767,76 @@ static int analyze_sbs(struct mddev *mddev)
  */
 int strict_strtoul_scaled(const char *cp, unsigned long *res, int scale)
 {
-	unsigned long result = 0;
-	long decimals = -1;
-	while (isdigit(*cp) || (*cp == '.' && decimals < 0)) {
-		if (*cp == '.')
-			decimals = 0;
-		else if (decimals < scale) {
-			unsigned int value;
-			value = *cp - '0';
-			result = result * 10 + value;
-			if (decimals >= 0)
-				decimals++;
-		}
-		cp++;
-	}
-	if (*cp == '\n')
-		cp++;
-	if (*cp)
+	unsigned long result = 0, decimals = 0;
+	char *pos, *str;
+	int rv;
+
+	str = kmemdup_nul(cp, strlen(cp), GFP_KERNEL);
+	if (!str)
+		return -ENOMEM;
+	pos = strchr(str, '.');
+	if (pos) {
+		int cnt = scale;
+
+		*pos = '\0';
+		while (isdigit(*(++pos))) {
+			if (cnt) {
+				decimals = decimals * 10 + *pos - '0';
+				cnt--;
+			}
+		}
+		if (*pos == '\n')
+			pos++;
+		if (*pos) {
+			kfree(str);
+			return -EINVAL;
+		}
+		decimals *= int_pow(10, cnt);
+	}
+
+	rv = kstrtoul(str, 10, &result);
+	kfree(str);
+	if (rv)
+		return rv;
+
+	if (result > div64_u64(ULONG_MAX - decimals, int_pow(10, scale)))
 		return -EINVAL;
-	if (decimals < 0)
-		decimals = 0;
-	*res = result * int_pow(10, scale - decimals);
-	return 0;
+	*res = result * int_pow(10, scale) + decimals;
+
+	return rv;
 }
 
 static ssize_t
 safe_delay_show(struct mddev *mddev, char *page)
 {
-	int msec = (mddev->safemode_delay*1000)/HZ;
-	return sprintf(page, "%d.%03d\n", msec/1000, msec%1000);
+	unsigned int msec = ((unsigned long)mddev->safemode_delay*1000)/HZ;
+
+	return sprintf(page, "%u.%03u\n", msec/1000, msec%1000);
 }
 static ssize_t
 safe_delay_store(struct mddev *mddev, const char *cbuf, size_t len)
 {
 	unsigned long msec;
+	int ret;
 
 	if (mddev_is_clustered(mddev)) {
 		pr_warn("md: Safemode is disabled for clustered mode\n");
 		return -EINVAL;
 	}
 
-	if (strict_strtoul_scaled(cbuf, &msec, 3) < 0)
+	ret = strict_strtoul_scaled(cbuf, &msec, 3);
+	if (ret < 0)
+		return ret;
+	if (msec > UINT_MAX)
 		return -EINVAL;
+
 	if (msec == 0)
 		mddev->safemode_delay = 0;
 	else {
-		unsigned long old_delay = mddev->safemode_delay;
-		unsigned long new_delay = (msec*HZ)/1000;
+		unsigned int old_delay = mddev->safemode_delay;
+		/* HZ <= 1000, so new_delay < UINT_MAX, too */
+		unsigned int new_delay = DIV64_U64_ROUND_UP(msec * HZ, 1000);
 
-		if (new_delay == 0)
-			new_delay = 1;
 		mddev->safemode_delay = new_delay;
 		if (new_delay < old_delay || old_delay == 0)
 			mod_timer(&mddev->safemode_timer, jiffies+1);
-- 
2.31.1
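
The overflow test added to strict_strtoul_scaled() compares the integer
part against (ULONG_MAX - decimals) / 10^scale before multiplying. A
minimal userspace sketch of that check follows. The demo_* names are
hypothetical, and plain '/' is used where the patch uses div64_u64().

```c
#include <limits.h>
#include <stdbool.h>

/* 10^exp for small exponents; hypothetical stand-in for int_pow(10, exp). */
static unsigned long demo_pow10(unsigned int exp)
{
	unsigned long r = 1;

	while (exp--)
		r *= 10;
	return r;
}

/*
 * Combine an integer part and an already-scaled fractional part into
 * whole * 10^scale + frac, refusing inputs that would wrap.
 * Assumes frac < 10^scale and that 10^scale fits in unsigned long.
 */
static bool demo_scale_checked(unsigned long whole, unsigned long frac,
			       unsigned int scale, unsigned long *res)
{
	unsigned long mult = demo_pow10(scale);

	/* whole <= (ULONG_MAX - frac) / mult implies no wrap below */
	if (whole > (ULONG_MAX - frac) / mult)
		return false;
	*res = whole * mult + frac;
	return true;
}
```

For example, the input "1.5" with scale 3 corresponds to whole = 1 and
frac = 500, and yields 1500.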



* [PATCH OLK-5.10 v3 3/4] md/raid10: fix wrong setting of max_corr_read_errors
  2023-05-15 13:48 [PATCH OLK-5.10 v3 0/4] md: bugfix of writing raid sysfs linan666
  2023-05-15 13:48 ` [PATCH OLK-5.10 v3 1/4] md/raid10: fix slab-out-of-bounds in md_bitmap_get_counter linan666
  2023-05-15 13:48 ` [PATCH OLK-5.10 v3 2/4] md/raid10: fix overflow in safe_delay_store linan666
@ 2023-05-15 13:48 ` linan666
  2023-05-19 22:06   ` Song Liu
  2023-05-15 13:48 ` [PATCH OLK-5.10 v3 4/4] md/raid10: optimize check_decay_read_errors() linan666
  2023-05-19 22:07 ` [PATCH OLK-5.10 v3 0/4] md: bugfix of writing raid sysfs Song Liu
  4 siblings, 1 reply; 13+ messages in thread
From: linan666 @ 2023-05-15 13:48 UTC (permalink / raw)
  To: song, neilb, Rob.Becker
  Cc: linux-raid, linux-kernel, linan122, yukuai3, yi.zhang, houtao1,
	yangerkun

From: Li Nan <linan122@huawei.com>

max_corr_read_errors should never be negative. Use unsigned int wherever
it is read or printed.

Fixes: 1e50915fe0bb ("raid: improve MD/raid10 handling of correctable read errors.")
Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c     | 2 +-
 drivers/md/raid10.c | 5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 5bba071ea907..b69ddfb1662a 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4484,7 +4484,7 @@ __ATTR_PREALLOC(array_state, S_IRUGO|S_IWUSR, array_state_show, array_state_stor
 
 static ssize_t
 max_corrected_read_errors_show(struct mddev *mddev, char *page) {
-	return sprintf(page, "%d\n",
+	return sprintf(page, "%u\n",
 		       atomic_read(&mddev->max_corr_read_errors));
 }
 
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 4fcfcb350d2b..4d615fcc6a50 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2727,7 +2727,8 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
 	int sect = 0; /* Offset from r10_bio->sector */
 	int sectors = r10_bio->sectors;
 	struct md_rdev *rdev;
-	int max_read_errors = atomic_read(&mddev->max_corr_read_errors);
+	unsigned int max_read_errors =
+			atomic_read(&mddev->max_corr_read_errors);
 	int d = r10_bio->devs[r10_bio->read_slot].devnum;
 
 	/* still own a reference to this rdev, so it cannot
@@ -2743,7 +2744,7 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
 	check_decay_read_errors(mddev, rdev);
 	atomic_inc(&rdev->read_errors);
 	if (atomic_read(&rdev->read_errors) > max_read_errors) {
-		pr_notice("md/raid10:%s: %pg: Raid device exceeded read_error threshold [cur %d:max %d]\n",
+		pr_notice("md/raid10:%s: %pg: Raid device exceeded read_error threshold [cur %u:max %u]\n",
 			  mdname(mddev), rdev->bdev,
 			  atomic_read(&rdev->read_errors), max_read_errors);
 		pr_notice("md/raid10:%s: %pg: Failing raid device\n",
-- 
2.31.1



* [PATCH OLK-5.10 v3 4/4] md/raid10: optimize check_decay_read_errors()
  2023-05-15 13:48 [PATCH OLK-5.10 v3 0/4] md: bugfix of writing raid sysfs linan666
                   ` (2 preceding siblings ...)
  2023-05-15 13:48 ` [PATCH OLK-5.10 v3 3/4] md/raid10: fix wrong setting of max_corr_read_errors linan666
@ 2023-05-15 13:48 ` linan666
  2023-05-19 22:07 ` [PATCH OLK-5.10 v3 0/4] md: bugfix of writing raid sysfs Song Liu
  4 siblings, 0 replies; 13+ messages in thread
From: linan666 @ 2023-05-15 13:48 UTC (permalink / raw)
  To: song, neilb, Rob.Becker
  Cc: linux-raid, linux-kernel, linan122, yukuai3, yi.zhang, houtao1,
	yangerkun

From: Li Nan <linan122@huawei.com>

check_decay_read_errors() is meant to handle rdev->read_errors, but
read_errors is still incremented and read in fix_read_error() after
check_decay_read_errors() returns.

Move all operations on read_errors into check_decay_read_errors() and
drop the unnecessary atomic_read() calls on read_errors.

Suggested-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Li Nan <linan122@huawei.com>
---
 drivers/md/raid10.c | 42 ++++++++++++++++++++++++------------------
 1 file changed, 24 insertions(+), 18 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 4d615fcc6a50..83b84116b686 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2655,23 +2655,24 @@ static void recovery_request_write(struct mddev *mddev, struct r10bio *r10_bio)
 }
 
 /*
- * Used by fix_read_error() to decay the per rdev read_errors.
+ * Used by fix_read_error() to decay the per rdev read_errors and check if
+ * read_error > max_read_errors.
  * We halve the read error count for every hour that has elapsed
  * since the last recorded read error.
  *
  */
-static void check_decay_read_errors(struct mddev *mddev, struct md_rdev *rdev)
+static bool check_decay_read_errors(struct mddev *mddev, struct md_rdev *rdev)
 {
-	long cur_time_mon;
+	time64_t cur_time_mon = ktime_get_seconds();
 	unsigned long hours_since_last;
-	unsigned int read_errors = atomic_read(&rdev->read_errors);
-
-	cur_time_mon = ktime_get_seconds();
+	unsigned int read_errors;
+	unsigned int max_read_errors =
+			atomic_read(&mddev->max_corr_read_errors);
 
 	if (rdev->last_read_error == 0) {
 		/* first time we've seen a read error */
 		rdev->last_read_error = cur_time_mon;
-		return;
+		goto increase;
 	}
 
 	hours_since_last = (long)(cur_time_mon -
@@ -2684,10 +2685,25 @@ static void check_decay_read_errors(struct mddev *mddev, struct md_rdev *rdev)
 	 * just set read errors to 0. We do this to avoid
 	 * overflowing the shift of read_errors by hours_since_last.
 	 */
+	read_errors = atomic_read(&rdev->read_errors);
 	if (hours_since_last >= 8 * sizeof(read_errors))
 		atomic_set(&rdev->read_errors, 0);
 	else
 		atomic_set(&rdev->read_errors, read_errors >> hours_since_last);
+
+increase:
+	read_errors = atomic_inc_return(&rdev->read_errors);
+	if (read_errors > max_read_errors) {
+		pr_notice("md/raid10:%s: %pg: Raid device exceeded read_error threshold [cur %u:max %u]\n",
+			  mdname(mddev), rdev->bdev,
+			  read_errors, max_read_errors);
+		pr_notice("md/raid10:%s: %pg: Failing raid device\n",
+			  mdname(mddev), rdev->bdev);
+		md_error(mddev, rdev);
+		return false;
+	}
+
+	return true;
 }
 
 static int r10_sync_page_io(struct md_rdev *rdev, sector_t sector,
@@ -2727,8 +2743,6 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
 	int sect = 0; /* Offset from r10_bio->sector */
 	int sectors = r10_bio->sectors;
 	struct md_rdev *rdev;
-	unsigned int max_read_errors =
-			atomic_read(&mddev->max_corr_read_errors);
 	int d = r10_bio->devs[r10_bio->read_slot].devnum;
 
 	/* still own a reference to this rdev, so it cannot
@@ -2741,15 +2755,7 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
 		   more fix_read_error() attempts */
 		return;
 
-	check_decay_read_errors(mddev, rdev);
-	atomic_inc(&rdev->read_errors);
-	if (atomic_read(&rdev->read_errors) > max_read_errors) {
-		pr_notice("md/raid10:%s: %pg: Raid device exceeded read_error threshold [cur %u:max %u]\n",
-			  mdname(mddev), rdev->bdev,
-			  atomic_read(&rdev->read_errors), max_read_errors);
-		pr_notice("md/raid10:%s: %pg: Failing raid device\n",
-			  mdname(mddev), rdev->bdev);
-		md_error(mddev, rdev);
+	if (!check_decay_read_errors(mddev, rdev)) {
 		r10_bio->devs[r10_bio->read_slot].bio = IO_BLOCKED;
 		return;
 	}
-- 
2.31.1
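
The decay-then-check flow that this patch gathers into one function can
be sketched in userspace as below. This is an illustration only: the
function name is hypothetical, atomics and timestamps are left out, and
the elapsed hours are passed in directly.

```c
#include <stdbool.h>

/*
 * Halve 'errors' once per elapsed hour, count the new error, and report
 * whether the device is still within 'max_errors'.
 * Shifting a value by its bit width or more is undefined in C, so long
 * gaps reset the count to zero instead of shifting.
 */
static bool demo_decay_and_check(unsigned int *errors, unsigned long hours,
				 unsigned int max_errors)
{
	if (hours >= 8 * sizeof(*errors))
		*errors = 0;
	else
		*errors >>= hours;

	*errors += 1;
	return *errors <= max_errors;
}
```

A false return corresponds to the path where the patch calls md_error()
and fix_read_error() marks the slot IO_BLOCKED.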



* Re: [PATCH OLK-5.10 v3 1/4] md/raid10: fix slab-out-of-bounds in md_bitmap_get_counter
  2023-05-15 13:48 ` [PATCH OLK-5.10 v3 1/4] md/raid10: fix slab-out-of-bounds in md_bitmap_get_counter linan666
@ 2023-05-19 21:20   ` Song Liu
  0 siblings, 0 replies; 13+ messages in thread
From: Song Liu @ 2023-05-19 21:20 UTC (permalink / raw)
  To: linan666
  Cc: neilb, Rob.Becker, linux-raid, linux-kernel, linan122, yukuai3,
	yi.zhang, houtao1, yangerkun

On Mon, May 15, 2023 at 6:49 AM <linan666@huaweicloud.com> wrote:
>
> From: Li Nan <linan122@huawei.com>
>
> If we write a large number to md/bitmap_set_bits, md_bitmap_checkpage()
> will return -EINVAL because 'page >= bitmap->pages', but the return value
> was not checked immediately in md_bitmap_get_counter() in order to set
> *blocks value and slab-out-of-bounds occurs.
>
> Move check of 'page >= bitmap->pages' to md_bitmap_get_counter() and
> return directly if true.
>
> Fixes: ef4256733506 ("md/bitmap: optimise scanning of empty bitmaps.")
> Signed-off-by: Li Nan <linan122@huawei.com>
> Reviewed-by: Yu Kuai <yukuai3@huawei.com>
> ---
>  drivers/md/md-bitmap.c | 17 +++++++++--------
>  1 file changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
> index 920bb68156d2..e122b19c124d 100644
> --- a/drivers/md/md-bitmap.c
> +++ b/drivers/md/md-bitmap.c
> @@ -46,6 +46,7 @@ static inline char *bmname(struct bitmap *bitmap)
>   *
>   * if we find our page, we increment the page's refcount so that it stays
>   * allocated while we're using it
> + * the caller must make sure 'page < bimap->pages'
>   */

I removed this comment, and added WARN_ON_ONCE().

Thanks,
Song

>  static int md_bitmap_checkpage(struct bitmap_counts *bitmap,
>                                unsigned long page, int create, int no_hijack)
> @@ -54,14 +55,6 @@ __acquires(bitmap->lock)
>  {
>         unsigned char *mappage;
>
> -       if (page >= bitmap->pages) {
> -               /* This can happen if bitmap_start_sync goes beyond
> -                * End-of-device while looking for a whole page.
> -                * It is harmless.
> -                */
> -               return -EINVAL;
> -       }
> -
>         if (bitmap->bp[page].hijacked) /* it's hijacked, don't try to alloc */
>                 return 0;
>
> @@ -1387,6 +1380,14 @@ __acquires(bitmap->lock)
>         sector_t csize;
>         int err;
>
> +       if (page >= bitmap->pages) {
> +               /*
> +                * This can happen if bitmap_start_sync goes beyond
> +                * End-of-device while looking for a whole page or
> +                * user set a huge number to sysfs bitmap_set_bits.
> +                */
> +               return NULL;
> +       }
>         err = md_bitmap_checkpage(bitmap, page, create, 0);
>
>         if (bitmap->bp[page].hijacked ||
> --
> 2.31.1
>


* Re: [PATCH OLK-5.10 v3 2/4] md/raid10: fix overflow in safe_delay_store
  2023-05-15 13:48 ` [PATCH OLK-5.10 v3 2/4] md/raid10: fix overflow in safe_delay_store linan666
@ 2023-05-19 22:01   ` Song Liu
  2023-05-20  0:43     ` Li Nan
  0 siblings, 1 reply; 13+ messages in thread
From: Song Liu @ 2023-05-19 22:01 UTC (permalink / raw)
  To: linan666
  Cc: neilb, Rob.Becker, linux-raid, linux-kernel, linan122, yukuai3,
	yi.zhang, houtao1, yangerkun

On Mon, May 15, 2023 at 6:49 AM <linan666@huaweicloud.com> wrote:
>
> From: Li Nan <linan122@huawei.com>
>
> There is no input check when echo md/safe_mode_delay and overflow will
> occur. There is risk of overflow in strict_strtoul_scaled(), too. Fix it
> by using kstrtoul instead of parsing word one by one.
>
> Fixes: 72e02075a33f ("md: factor out parsing of fixed-point numbers")
> Signed-off-by: Li Nan <linan122@huawei.com>
> Reviewed-by: Yu Kuai <yukuai3@huawei.com>
> ---
>  drivers/md/md.c | 76 +++++++++++++++++++++++++++++++------------------
>  1 file changed, 48 insertions(+), 28 deletions(-)

This patch adds more complexity, which I don't really think is necessary.
Can we just check for overflow in safe_delay_store()?

Thanks,
Song
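
One way to read the suggestion above is to leave the parser alone and
reject values whose conversion to ticks would overflow. The sketch below
is a userspace illustration under that assumption, not the v4 patch.
DEMO_HZ and the function name are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical tick rate; the kernel's HZ is a build-time constant. */
#define DEMO_HZ 250UL

/*
 * Convert milliseconds to ticks, rounding up so that a small non-zero
 * delay never becomes 0. Returns false if the product would wrap or the
 * result does not fit in unsigned int.
 */
static bool demo_msec_to_ticks(unsigned long msec, unsigned int *ticks)
{
	unsigned long prod, t;

	if (msec > ULONG_MAX / DEMO_HZ)
		return false;		/* msec * DEMO_HZ would wrap */
	prod = msec * DEMO_HZ;
	t = prod / 1000 + (prod % 1000 != 0);	/* round up without adding to prod */
	if (t > UINT_MAX)
		return false;
	*ticks = (unsigned int)t;
	return true;
}
```

The round-up is written as a quotient plus a remainder test so that no
intermediate addition can overflow.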


* Re: [PATCH OLK-5.10 v3 3/4] md/raid10: fix wrong setting of max_corr_read_errors
  2023-05-15 13:48 ` [PATCH OLK-5.10 v3 3/4] md/raid10: fix wrong setting of max_corr_read_errors linan666
@ 2023-05-19 22:06   ` Song Liu
  2023-05-20  0:46     ` Li Nan
  0 siblings, 1 reply; 13+ messages in thread
From: Song Liu @ 2023-05-19 22:06 UTC (permalink / raw)
  To: linan666
  Cc: neilb, Rob.Becker, linux-raid, linux-kernel, linan122, yukuai3,
	yi.zhang, houtao1, yangerkun

On Mon, May 15, 2023 at 6:49 AM <linan666@huaweicloud.com> wrote:
>
> From: Li Nan <linan122@huawei.com>
>
> max_corr_read_errors should not be negative number. Change it to
> unsigned int where use it.
>
> Fixes: 1e50915fe0bb ("raid: improve MD/raid10 handling of correctable read errors.")
> Signed-off-by: Li Nan <linan122@huawei.com>
> Reviewed-by: Yu Kuai <yukuai3@huawei.com>
> ---
>  drivers/md/md.c     | 2 +-
>  drivers/md/raid10.c | 5 +++--
>  2 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 5bba071ea907..b69ddfb1662a 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -4484,7 +4484,7 @@ __ATTR_PREALLOC(array_state, S_IRUGO|S_IWUSR, array_state_show, array_state_stor
>
>  static ssize_t
>  max_corrected_read_errors_show(struct mddev *mddev, char *page) {
> -       return sprintf(page, "%d\n",
> +       return sprintf(page, "%u\n",
>                        atomic_read(&mddev->max_corr_read_errors));
>  }

max_corr_read_errors is atomic_t, so a signed integer. So these
signed => unsigned changes are pretty error prone. Can we just
add check in max_corrected_read_errors_store() so we never store
a negative value?

Thanks,
Song

>
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 4fcfcb350d2b..4d615fcc6a50 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -2727,7 +2727,8 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
>         int sect = 0; /* Offset from r10_bio->sector */
>         int sectors = r10_bio->sectors;
>         struct md_rdev *rdev;
> -       int max_read_errors = atomic_read(&mddev->max_corr_read_errors);
> +       unsigned int max_read_errors =
> +                       atomic_read(&mddev->max_corr_read_errors);
>         int d = r10_bio->devs[r10_bio->read_slot].devnum;
>
>         /* still own a reference to this rdev, so it cannot
> @@ -2743,7 +2744,7 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
>         check_decay_read_errors(mddev, rdev);
>         atomic_inc(&rdev->read_errors);
>         if (atomic_read(&rdev->read_errors) > max_read_errors) {
> -               pr_notice("md/raid10:%s: %pg: Raid device exceeded read_error threshold [cur %d:max %d]\n",
> +               pr_notice("md/raid10:%s: %pg: Raid device exceeded read_error threshold [cur %u:max %u]\n",
>                           mdname(mddev), rdev->bdev,
>                           atomic_read(&rdev->read_errors), max_read_errors);
>                 pr_notice("md/raid10:%s: %pg: Failing raid device\n",
> --
> 2.31.1
>
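
The alternative raised in this reply, validating the value when it is
stored, might look like the userspace sketch below. It is an assumption
about the shape of such a check, not code from this series. The function
name is hypothetical and strtol() stands in for the kernel's kstrto*()
helpers.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse a decimal string and accept it only if it fits in a non-negative
 * int, so the value can stay in signed storage without ever being
 * negative. Returns 0 on success or a negative errno-style code.
 */
static int demo_parse_limit(const char *buf, int *out)
{
	char *end;
	long v;

	errno = 0;
	v = strtol(buf, &end, 10);
	if (end == buf)
		return -EINVAL;		/* no digits */
	if (*end == '\n')
		end++;			/* tolerate the newline that echo appends */
	if (*end != '\0')
		return -EINVAL;		/* trailing garbage */
	if (errno == ERANGE || v < 0 || v > INT_MAX)
		return -ERANGE;
	*out = (int)v;
	return 0;
}
```

With a check of this kind at store time, readers of the value can keep
using signed types and the existing "%d" format.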


* Re: [PATCH OLK-5.10 v3 0/4] md: bugfix of writing raid sysfs
  2023-05-15 13:48 [PATCH OLK-5.10 v3 0/4] md: bugfix of writing raid sysfs linan666
                   ` (3 preceding siblings ...)
  2023-05-15 13:48 ` [PATCH OLK-5.10 v3 4/4] md/raid10: optimize check_decay_read_errors() linan666
@ 2023-05-19 22:07 ` Song Liu
  2023-05-19 22:08   ` Song Liu
  4 siblings, 1 reply; 13+ messages in thread
From: Song Liu @ 2023-05-19 22:07 UTC (permalink / raw)
  To: linan666
  Cc: neilb, Rob.Becker, linux-raid, linux-kernel, linan122, yukuai3,
	yi.zhang, houtao1, yangerkun

On Mon, May 15, 2023 at 6:49 AM <linan666@huaweicloud.com> wrote:
>
> From: Li Nan <linan122@huawei.com>
>
> The patch series fix the bug of writing raid sysfs.
>
> Changes in v2:
>  - in patch 1, move check out of md_bitmap_checkpage().
>  - in patch 2, use div64_u64() and DIV64_U64_ROUND_UP() instead of directly
>    '/', and chang old_delay/old_delay to unsigned int.
>  - in patch 4, use 'goto' to make changes more readable.
>
> Changes in v2:
>  - add patch "md/raid10: optimize check_decay_read_errors()".
>  - in patch 2, return ret-value of strict_strtoul_scaled if error occurs.
>  - in patch 3, optimize format.
>
> Li Nan (4):
>   md/raid10: fix slab-out-of-bounds in md_bitmap_get_counter
>   md/raid10: fix overflow in safe_delay_store
>   md/raid10: fix wrong setting of max_corr_read_errors
>   md/raid10: optimize check_decay_read_errors()

I applied 1/4 to md-next.

Thanks,
Song


* Re: [PATCH OLK-5.10 v3 0/4] md: bugfix of writing raid sysfs
  2023-05-19 22:07 ` [PATCH OLK-5.10 v3 0/4] md: bugfix of writing raid sysfs Song Liu
@ 2023-05-19 22:08   ` Song Liu
  2023-05-20  0:51     ` Li Nan
  0 siblings, 1 reply; 13+ messages in thread
From: Song Liu @ 2023-05-19 22:08 UTC (permalink / raw)
  To: linan666
  Cc: neilb, Rob.Becker, linux-raid, linux-kernel, linan122, yukuai3,
	yi.zhang, houtao1, yangerkun

Btw, what does "OLK-5.10" mean?

Song


On Fri, May 19, 2023 at 3:07 PM Song Liu <song@kernel.org> wrote:
>
> On Mon, May 15, 2023 at 6:49 AM <linan666@huaweicloud.com> wrote:
> >
> > From: Li Nan <linan122@huawei.com>
> >
> > The patch series fix the bug of writing raid sysfs.
> >
> > Changes in v2:
> >  - in patch 1, move check out of md_bitmap_checkpage().
> >  - in patch 2, use div64_u64() and DIV64_U64_ROUND_UP() instead of directly
> >    '/', and chang old_delay/old_delay to unsigned int.
> >  - in patch 4, use 'goto' to make changes more readable.
> >
> > Changes in v2:
> >  - add patch "md/raid10: optimize check_decay_read_errors()".
> >  - in patch 2, return ret-value of strict_strtoul_scaled if error occurs.
> >  - in patch 3, optimize format.
> >
> > Li Nan (4):
> >   md/raid10: fix slab-out-of-bounds in md_bitmap_get_counter
> >   md/raid10: fix overflow in safe_delay_store
> >   md/raid10: fix wrong setting of max_corr_read_errors
> >   md/raid10: optimize check_decay_read_errors()
>
> I applied 1/4 to md-next.
>
> Thanks,
> Song


* Re: [PATCH OLK-5.10 v3 2/4] md/raid10: fix overflow in safe_delay_store
  2023-05-19 22:01   ` Song Liu
@ 2023-05-20  0:43     ` Li Nan
  0 siblings, 0 replies; 13+ messages in thread
From: Li Nan @ 2023-05-20  0:43 UTC (permalink / raw)
  To: Song Liu, linan666
  Cc: neilb, Rob.Becker, linux-raid, linux-kernel, yukuai3, yi.zhang,
	houtao1, yangerkun



On 2023/5/20 6:01, Song Liu wrote:
> On Mon, May 15, 2023 at 6:49 AM <linan666@huaweicloud.com> wrote:
>>
>> From: Li Nan <linan122@huawei.com>
>>
>> There is no input check when echo md/safe_mode_delay and overflow will
>> occur. There is risk of overflow in strict_strtoul_scaled(), too. Fix it
>> by using kstrtoul instead of parsing word one by one.
>>
>> Fixes: 72e02075a33f ("md: factor out parsing of fixed-point numbers")
>> Signed-off-by: Li Nan <linan122@huawei.com>
>> Reviewed-by: Yu Kuai <yukuai3@huawei.com>
>> ---
>>   drivers/md/md.c | 76 +++++++++++++++++++++++++++++++------------------
>>   1 file changed, 48 insertions(+), 28 deletions(-)
> 
> This patch adds more complexity, which I don't really think is necessary.
> Can we just check for overflow in safe_delay_store()?

Yes, checking for overflow is more convenient. I will do that in v4.

-- 
Thanks,
Nan



* Re: [PATCH OLK-5.10 v3 3/4] md/raid10: fix wrong setting of max_corr_read_errors
  2023-05-19 22:06   ` Song Liu
@ 2023-05-20  0:46     ` Li Nan
  0 siblings, 0 replies; 13+ messages in thread
From: Li Nan @ 2023-05-20  0:46 UTC (permalink / raw)
  To: Song Liu, linan666
  Cc: neilb, Rob.Becker, linux-raid, linux-kernel, yukuai3, yi.zhang,
	houtao1, yangerkun



On 2023/5/20 6:06, Song Liu wrote:
> On Mon, May 15, 2023 at 6:49 AM <linan666@huaweicloud.com> wrote:
>>
>> From: Li Nan <linan122@huawei.com>
>>
>> max_corr_read_errors should not be negative number. Change it to
>> unsigned int where use it.
>>
>> Fixes: 1e50915fe0bb ("raid: improve MD/raid10 handling of correctable read errors.")
>> Signed-off-by: Li Nan <linan122@huawei.com>
>> Reviewed-by: Yu Kuai <yukuai3@huawei.com>
>> ---
>>   drivers/md/md.c     | 2 +-
>>   drivers/md/raid10.c | 5 +++--
>>   2 files changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>> index 5bba071ea907..b69ddfb1662a 100644
>> --- a/drivers/md/md.c
>> +++ b/drivers/md/md.c
>> @@ -4484,7 +4484,7 @@ __ATTR_PREALLOC(array_state, S_IRUGO|S_IWUSR, array_state_show, array_state_stor
>>
>>   static ssize_t
>>   max_corrected_read_errors_show(struct mddev *mddev, char *page) {
>> -       return sprintf(page, "%d\n",
>> +       return sprintf(page, "%u\n",
>>                         atomic_read(&mddev->max_corr_read_errors));
>>   }
> 
> max_corr_read_errors is atomic_t, so a signed integer. So these
> signed => unsigned changes are pretty error prone. Can we just
> add check in max_corrected_read_errors_store() so we never store
> a negative value?
> 
> Thanks,
> Song
> 

I will check the input in v4.

-- 
Thanks,
Nan



* Re: [PATCH OLK-5.10 v3 0/4] md: bugfix of writing raid sysfs
  2023-05-19 22:08   ` Song Liu
@ 2023-05-20  0:51     ` Li Nan
  0 siblings, 0 replies; 13+ messages in thread
From: Li Nan @ 2023-05-20  0:51 UTC (permalink / raw)
  To: Song Liu, linan666
  Cc: neilb, Rob.Becker, linux-raid, linux-kernel, yukuai3, yi.zhang,
	houtao1, yangerkun



On 2023/5/20 6:08, Song Liu wrote:
> Btw, what does "OLK-5.10" mean?
> 
> Song
> 

Sorry, it was a slip of the pen.

-- 
Thanks,
Nan


