* [PATCH 0/2] Raid5 Bug Fixes
@ 2022-07-07 19:15 Logan Gunthorpe
From: Logan Gunthorpe @ 2022-07-07 19:15 UTC (permalink / raw)
  To: linux-kernel, linux-raid, Song Liu
  Cc: Guoqing Jiang, David Sloan, Logan Gunthorpe

Hey,

Please find two patches with fixes to the raid5 code.

The first patch fixes a bug in my recent commit that causes data
corruption in very specific circumstances.

The second patch fixes a theoretical issue with nested waits in
code that was recently cleaned up in the previous series (though the
issue always existed).

Thanks,

Logan

--

Logan Gunthorpe (2):
  md/raid5: Fix sectors_to_do bitmap overflow in raid5_make_request()
  md/raid5: Convert prepare_to_wait() to wait_woken() api

 drivers/md/raid5.c | 32 +++++++++++++++++---------------
 1 file changed, 17 insertions(+), 15 deletions(-)


base-commit: ff4ec5f79108cf82fe7168547c76fe754c4ade0a
--
2.30.2


* [PATCH 1/2] md/raid5: Fix sectors_to_do bitmap overflow in raid5_make_request()
From: Logan Gunthorpe @ 2022-07-07 19:15 UTC (permalink / raw)
  To: linux-kernel, linux-raid, Song Liu
  Cc: Guoqing Jiang, David Sloan, Logan Gunthorpe

For unaligned IOs that are nearly the maximum number of sectors, the
number of stripes will end up being one greater than the size of the
bitmap. When this happens, the last stripe in the IO is not processed,
resulting in data corruption.
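As a rough illustration of the off-by-one, the stripe count can be modelled in C. This is a sketch, not the kernel code: it assumes 8-sector (4 KiB) stripes, a 256-stripe request limit, and that the start sector is rounded down to a stripe boundary before counting. STRIPE_SECTORS, MAX_REQ_STRIPES and stripes_touched() are hypothetical names.

```c
#include <stdint.h>

/*
 * Assumed values, for illustration only: 8 sectors (4 KiB) per
 * stripe and a 256-stripe per-request limit.
 */
#define STRIPE_SECTORS   8ULL
#define MAX_REQ_STRIPES  256ULL

typedef uint64_t sector_t;

/* Round-up division, in the spirit of DIV_ROUND_UP_SECTOR_T(). */
static sector_t div_round_up(sector_t n, sector_t d)
{
	return (n + d - 1) / d;
}

/*
 * Number of stripes touched by a request of nr_sectors starting at
 * first_sector, assuming the start is rounded down to a stripe
 * boundary before the count is taken.
 */
static sector_t stripes_touched(sector_t first_sector, sector_t nr_sectors)
{
	sector_t aligned_start = first_sector & ~(STRIPE_SECTORS - 1);
	sector_t last_sector = first_sector + nr_sectors;

	return div_round_up(last_sector - aligned_start, STRIPE_SECTORS);
}
```

With these assumed constants, a 2048-sector request starting at sector 0 touches 256 stripes, while the same request starting at sector 1 touches 257, one more than a 256-bit bitmap can track.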

However, this is not normally seen when the backing block devices have
4K physical block sizes, since the block layer will split the request
before that happens.

To fix this, increase the bitmap size by one bit and ensure that the
full number of stripes is checked when calling find_first_bit().
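A userspace model of both halves of the fix, as a sketch only: the kernel uses DECLARE_BITMAP(), bitmap_set() and find_first_bit(), while the model_* helpers below are hypothetical stand-ins and the 256-stripe limit is an assumed value. Each loop iteration simply clears the bit it found, standing in for a successful make_stripe_request().

```c
#include <limits.h>
#include <string.h>

/* Assumed limit, for illustration only. */
#define MAX_REQ_STRIPES 256u

#define UL_BITS (CHAR_BIT * sizeof(unsigned long))
#define UL_WORDS(nbits) (((nbits) + UL_BITS - 1) / UL_BITS)

/* One extra bit so an unaligned, near-maximum request still fits. */
struct model_req_ctx {
	unsigned long sectors_to_do[UL_WORDS(MAX_REQ_STRIPES + 1)];
};

static void model_set_bit(unsigned long *map, unsigned int nr)
{
	map[nr / UL_BITS] |= 1UL << (nr % UL_BITS);
}

static void model_clear_bit(unsigned long *map, unsigned int nr)
{
	map[nr / UL_BITS] &= ~(1UL << (nr % UL_BITS));
}

/* Like find_first_bit(): returns size when no bit below size is set. */
static unsigned int model_find_first_bit(const unsigned long *map,
					 unsigned int size)
{
	for (unsigned int i = 0; i < size; i++)
		if (map[i / UL_BITS] & (1UL << (i % UL_BITS)))
			return i;
	return size;
}

/* Walks every stripe of a request; returns how many were handled. */
static unsigned int model_process_request(unsigned int stripe_cnt)
{
	struct model_req_ctx ctx;
	unsigned int handled = 0;

	if (stripe_cnt > MAX_REQ_STRIPES + 1)
		return 0;	/* would not fit even in the enlarged bitmap */

	memset(&ctx, 0, sizeof(ctx));
	for (unsigned int i = 0; i < stripe_cnt; i++)
		model_set_bit(ctx.sectors_to_do, i);

	for (;;) {
		/* Bound the search by stripe_cnt, not the bitmap size. */
		unsigned int s = model_find_first_bit(ctx.sectors_to_do,
						      stripe_cnt);
		if (s == stripe_cnt)
			break;
		model_clear_bit(ctx.sectors_to_do, s);	/* stripe s done */
		handled++;
	}
	return handled;
}
```

With the extra bit, a 257-stripe request is walked completely; with a 256-bit map the last stripe would have no bit to track it.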

Reported-by: David Sloan <David.Sloan@eideticom.com>
Fixes: a5b9c6a653fb ("md/raid5: Pivot raid5_make_request()")
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 drivers/md/raid5.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 184145b49b7c..e37ed93d130f 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -5872,8 +5872,11 @@ struct stripe_request_ctx {
 	/* last sector in the request */
 	sector_t last_sector;
 
-	/* bitmap to track stripe sectors that have been added to stripes */
-	DECLARE_BITMAP(sectors_to_do, RAID5_MAX_REQ_STRIPES);
+	/*
+	 * bitmap to track stripe sectors that have been added to stripes
+	 * add one to account for unaligned requests
+	 */
+	DECLARE_BITMAP(sectors_to_do, RAID5_MAX_REQ_STRIPES + 1);
 
 	/* the request had REQ_PREFLUSH, cleared after the first stripe_head */
 	bool do_flush;
@@ -6046,7 +6049,7 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 	const int rw = bio_data_dir(bi);
 	enum stripe_result res;
 	DEFINE_WAIT(w);
-	int s;
+	int s, stripe_cnt;
 
 	if (unlikely(bi->bi_opf & REQ_PREFLUSH)) {
 		int ret = log_handle_flush_request(conf, bi);
@@ -6090,9 +6093,9 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 	ctx.last_sector = bio_end_sector(bi);
 	bi->bi_next = NULL;
 
-	bitmap_set(ctx.sectors_to_do, 0,
-		   DIV_ROUND_UP_SECTOR_T(ctx.last_sector - logical_sector,
-					 RAID5_STRIPE_SECTORS(conf)));
+	stripe_cnt = DIV_ROUND_UP_SECTOR_T(ctx.last_sector - logical_sector,
+					   RAID5_STRIPE_SECTORS(conf));
+	bitmap_set(ctx.sectors_to_do, 0, stripe_cnt);
 
 	pr_debug("raid456: %s, logical %llu to %llu\n", __func__,
 		 bi->bi_iter.bi_sector, ctx.last_sector);
@@ -6137,8 +6140,8 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 			continue;
 		}
 
-		s = find_first_bit(ctx.sectors_to_do, RAID5_MAX_REQ_STRIPES);
-		if (s == RAID5_MAX_REQ_STRIPES)
+		s = find_first_bit(ctx.sectors_to_do, stripe_cnt);
+		if (s == stripe_cnt)
 			break;
 
 		logical_sector = ctx.first_sector +
-- 
2.30.2



* [PATCH 2/2] md/raid5: Convert prepare_to_wait() to wait_woken() api
From: Logan Gunthorpe @ 2022-07-07 19:15 UTC (permalink / raw)
  To: linux-kernel, linux-raid, Song Liu
  Cc: Guoqing Jiang, David Sloan, Logan Gunthorpe

raid5_get_active_stripe() can sleep in various situations, and it is
called by make_stripe_request() while inside the
prepare_to_wait()/finish_wait() section of raid5_make_request(). Nested
waits like this are not supported.

This was noticed while making other changes that add different sleeps
to raid5_get_active_stripe(), which caused a WARNING with
CONFIG_DEBUG_ATOMIC_SLEEP enabled.
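A toy model of why the warning fires, loosely based on the might_sleep() debug check ("do not call blocking ops when !TASK_RUNNING"). All names are hypothetical and only the task state is tracked: with prepare_to_wait() the task is already in TASK_UNINTERRUPTIBLE when the nested sleep happens, whereas with wait_woken() the state only changes inside the wait call itself.

```c
/* Hypothetical model: only the task state is tracked. */
enum model_state { MODEL_RUNNING, MODEL_UNINTERRUPTIBLE };

struct model_task {
	enum model_state state;
	int warnings;	/* count of "blocking op when !RUNNING" hits */
};

/* Models the debug check performed before a task blocks. */
static void model_might_sleep(struct model_task *t)
{
	if (t->state != MODEL_RUNNING)
		t->warnings++;
}

/* A nested sleep, e.g. inside raid5_get_active_stripe(). */
static void model_nested_sleep(struct model_task *t)
{
	model_might_sleep(t);
	t->state = MODEL_RUNNING;	/* a finished sleep leaves us running */
}

/* Old pattern: the state is set before the loop body runs. */
static int model_prepare_to_wait_pattern(void)
{
	struct model_task t = { MODEL_RUNNING, 0 };

	t.state = MODEL_UNINTERRUPTIBLE;	/* prepare_to_wait() */
	model_nested_sleep(&t);			/* make_stripe_request() sleeps */
	t.state = MODEL_RUNNING;		/* finish_wait() */
	return t.warnings;
}

/* New pattern: the state only changes inside wait_woken() itself. */
static int model_wait_woken_pattern(void)
{
	struct model_task t = { MODEL_RUNNING, 0 };

	/* add_wait_queue() leaves the task state alone */
	model_nested_sleep(&t);			/* make_stripe_request() sleeps */
	t.state = MODEL_UNINTERRUPTIBLE;	/* wait_woken(): set state, */
	t.state = MODEL_RUNNING;		/* schedule, then run again */
	return t.warnings;
}
```

The old pattern records one warning and the new one records none.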

No ill effects have been noticed with the code as is, but in theory a
nested sleep here could cause a deadlock, so it should be fixed.

To fix this, convert the prepare_to_wait() call to use wait_woken(),
which supports nested sleeps.
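What makes wait_woken() tolerate nested sleeps is that woken_wake_function() records the wakeup in the wait entry (WQ_FLAG_WOKEN), so a wakeup that arrives while the task is busy elsewhere is not lost. The sketch below is a userspace analogue built on POSIX threads, not the kernel API; every name in it is hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>

/* Userspace analogue of a wait entry carrying a "woken" flag. */
struct woken_waiter {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool woken;		/* plays the role of WQ_FLAG_WOKEN */
};

static void waiter_init(struct woken_waiter *w)
{
	pthread_mutex_init(&w->lock, NULL);
	pthread_cond_init(&w->cond, NULL);
	w->woken = false;
}

/* Waker side: record the wakeup, then wake the sleeper if any. */
static void waiter_wake(struct woken_waiter *w)
{
	pthread_mutex_lock(&w->lock);
	w->woken = true;
	pthread_cond_signal(&w->cond);
	pthread_mutex_unlock(&w->lock);
}

/* Waiter side: block only if no wakeup was recorded, then consume it. */
static void waiter_wait_woken(struct woken_waiter *w)
{
	pthread_mutex_lock(&w->lock);
	while (!w->woken)	/* loop guards against spurious wakeups */
		pthread_cond_wait(&w->cond, &w->lock);
	w->woken = false;
	pthread_mutex_unlock(&w->lock);
}

static void *waker_thread(void *arg)
{
	waiter_wake(arg);
	return NULL;
}

/* The wakeup arrives before the wait; the wait must not block. */
static bool demo_wakeup_not_lost(void)
{
	struct woken_waiter w;
	pthread_t t;
	bool consumed;

	waiter_init(&w);
	if (pthread_create(&t, NULL, waker_thread, &w) != 0)
		return false;
	pthread_join(t, NULL);		/* the wakeup has definitely been sent */
	waiter_wait_woken(&w);		/* flag already set: returns at once */
	consumed = !w.woken;		/* the recorded wakeup was consumed */
	pthread_mutex_destroy(&w.lock);
	pthread_cond_destroy(&w.cond);
	return consumed;
}
```

Unlike this analogue, the kernel's wait_woken() has no inner loop. It may return early, and the caller's own while (1) loop, as in the patch, re-checks the condition.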

Link: https://lwn.net/Articles/628628/
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 drivers/md/raid5.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index e37ed93d130f..88c22a5cc09a 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -6043,12 +6043,12 @@ static enum stripe_result make_stripe_request(struct mddev *mddev,
 
 static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 {
+	DEFINE_WAIT_FUNC(wait, woken_wake_function);
 	struct r5conf *conf = mddev->private;
 	sector_t logical_sector;
 	struct stripe_request_ctx ctx = {};
 	const int rw = bio_data_dir(bi);
 	enum stripe_result res;
-	DEFINE_WAIT(w);
 	int s, stripe_cnt;
 
 	if (unlikely(bi->bi_opf & REQ_PREFLUSH)) {
@@ -6111,7 +6111,8 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 		return true;
 	}
 	md_account_bio(mddev, &bi);
-	prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
+
+	add_wait_queue(&conf->wait_for_overlap, &wait);
 	while (1) {
 		res = make_stripe_request(mddev, conf, &ctx, logical_sector,
 					  bi);
@@ -6134,9 +6135,8 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 				ctx.batch_last = NULL;
 			}
 
-			schedule();
-			prepare_to_wait(&conf->wait_for_overlap, &w,
-					TASK_UNINTERRUPTIBLE);
+			wait_woken(&wait, TASK_UNINTERRUPTIBLE,
+				   MAX_SCHEDULE_TIMEOUT);
 			continue;
 		}
 
@@ -6147,8 +6147,7 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 		logical_sector = ctx.first_sector +
 			(s << RAID5_STRIPE_SHIFT(conf));
 	}
-
-	finish_wait(&conf->wait_for_overlap, &w);
+	remove_wait_queue(&conf->wait_for_overlap, &wait);
 
 	if (ctx.batch_last)
 		raid5_release_stripe(ctx.batch_last);
-- 
2.30.2



* Re: [PATCH 0/2] Raid5 Bug Fixes
From: Song Liu @ 2022-07-08  5:45 UTC (permalink / raw)
  To: Logan Gunthorpe; +Cc: open list, linux-raid, Guoqing Jiang, David Sloan

On Thu, Jul 7, 2022 at 12:15 PM Logan Gunthorpe <logang@deltatee.com> wrote:
>
> Hey,
>
> Please find two patches with fixes to the raid5 code.
>
> The first patch fixes a bug in my recent commit that causes data
> corruption in very specific circumstances.
>
> The second patch fixes a theoretical issue with nested waits in
> code that was recently cleaned up in the previous series (though the
> issue always existed).
>
> Thanks,

Applied to md-next after fixing a couple typos.

Thanks!
Song

