All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/6] Misc changes for md
@ 2021-10-04 15:34 Guoqing Jiang
  2021-10-04 15:34 ` [PATCH 1/6] md/raid1: only allocate write behind bio for WriteMostly device Guoqing Jiang
                   ` (6 more replies)
  0 siblings, 7 replies; 14+ messages in thread
From: Guoqing Jiang @ 2021-10-04 15:34 UTC (permalink / raw)
  To: song; +Cc: linux-raid

Hello,

The first patch fixes the same calltrace as commit 6607cd319b6b ("raid1:
ensure write behind bio has less than BIO_MAX_VECS sectors") tried 
before, but unfortunately the calltrace still could happen if array
without write mostly device is configured with write-behind enabled.
So the first patch is suitable for fix branch which others are materials
for next branch.

Pls review.

Thanks,
Guoqing

Guoqing Jiang (6):
  md/raid1: only allocate write behind bio for WriteMostly device
  md/bitmap: don't set max_write_behind if there is no write mostly
    device
  md/raid1: use rdev in raid1_write_request directly
  md/raid10: add 'read_err' to raid10_read_request
  md/raid5: call roundup_pow_of_two in raid5_run
  md: remove unused argument from md_new_event

 drivers/md/md-bitmap.c | 17 +++++++++++++++++
 drivers/md/md.c        | 30 +++++++++++++++---------------
 drivers/md/md.h        |  2 +-
 drivers/md/raid1.c     | 13 ++++++-------
 drivers/md/raid10.c    | 10 +++++-----
 drivers/md/raid5.c     |  7 ++-----
 6 files changed, 46 insertions(+), 33 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/6] md/raid1: only allocate write behind bio for WriteMostly device
  2021-10-04 15:34 [PATCH 0/6] Misc changes for md Guoqing Jiang
@ 2021-10-04 15:34 ` Guoqing Jiang
  2021-10-05  1:05   ` Guoqing Jiang
  2021-10-05  5:55   ` Jens Stutte
  2021-10-04 15:34 ` [PATCH 2/6] md/bitmap: don't set max_write_behind if there is no write mostly device Guoqing Jiang
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 14+ messages in thread
From: Guoqing Jiang @ 2021-10-04 15:34 UTC (permalink / raw)
  To: song; +Cc: linux-raid

Commit 6607cd319b6b91bff94e90f798a61c031650b514 ("raid1: ensure write
behind bio has less than BIO_MAX_VECS sectors") tried to guarantee the
size of behind bio is not bigger than BIO_MAX_VECS sectors.

Unfortunately the same calltrace still could happen since an array could
enable write-behind without write mostly device.

To match the manpage of mdadm (which says "write-behind is only attempted
on drives marked as write-mostly"), we need to check WriteMostly flag to
avoid such unexpected behavior.

[1]. https://bugzilla.kernel.org/show_bug.cgi?id=213181#c25

Cc: stable@vger.kernel.org # v5.12+
Cc: Jens Stutte <jens@chianterastutte.eu>
Reported-by: Jens Stutte <jens@chianterastutte.eu>
Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
---
 drivers/md/raid1.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 19598bd38939..6ba12f0f0f03 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1496,7 +1496,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
 		if (!r1_bio->bios[i])
 			continue;
 
-		if (first_clone) {
+		if (first_clone && test_bit(WriteMostly, &rdev->flags)) {
 			/* do behind I/O ?
 			 * Not if there are too many, or cannot
 			 * allocate memory, or a reader on WriteMostly
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/6] md/bitmap: don't set max_write_behind if there is no write mostly device
  2021-10-04 15:34 [PATCH 0/6] Misc changes for md Guoqing Jiang
  2021-10-04 15:34 ` [PATCH 1/6] md/raid1: only allocate write behind bio for WriteMostly device Guoqing Jiang
@ 2021-10-04 15:34 ` Guoqing Jiang
  2021-10-07  6:25   ` Song Liu
  2021-10-04 15:34 ` [PATCH 3/6] md/raid1: use rdev in raid1_write_request directly Guoqing Jiang
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Guoqing Jiang @ 2021-10-04 15:34 UTC (permalink / raw)
  To: song; +Cc: linux-raid

We shouldn't set it since write behind IO should only happen to write
mostly device.

Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
---
 drivers/md/md-bitmap.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index e29c6298ef5c..0346281b1555 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -2469,11 +2469,28 @@ backlog_store(struct mddev *mddev, const char *buf, size_t len)
 {
 	unsigned long backlog;
 	unsigned long old_mwb = mddev->bitmap_info.max_write_behind;
+	struct md_rdev *rdev;
+	bool has_write_mostly = false;
 	int rv = kstrtoul(buf, 10, &backlog);
 	if (rv)
 		return rv;
 	if (backlog > COUNTER_MAX)
 		return -EINVAL;
+
+	/*
+	 * Without write mostly device, it doesn't make sense to set
+	 * backlog for max_write_behind.
+	 */
+	rdev_for_each(rdev, mddev)
+		if (test_bit(WriteMostly, &rdev->flags)) {
+			has_write_mostly = true;
+			break;
+		}
+	if (!has_write_mostly) {
+		pr_warn_ratelimited("md: No write mostly device available\n");
+		return -EINVAL;
+	}
+
 	mddev->bitmap_info.max_write_behind = backlog;
 	if (!backlog && mddev->serial_info_pool) {
 		/* serial_info_pool is not needed if backlog is zero */
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/6] md/raid1: use rdev in raid1_write_request directly
  2021-10-04 15:34 [PATCH 0/6] Misc changes for md Guoqing Jiang
  2021-10-04 15:34 ` [PATCH 1/6] md/raid1: only allocate write behind bio for WriteMostly device Guoqing Jiang
  2021-10-04 15:34 ` [PATCH 2/6] md/bitmap: don't set max_write_behind if there is no write mostly device Guoqing Jiang
@ 2021-10-04 15:34 ` Guoqing Jiang
  2021-10-04 15:34 ` [PATCH 4/6] md/raid10: add 'read_err' to raid10_read_request Guoqing Jiang
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Guoqing Jiang @ 2021-10-04 15:34 UTC (permalink / raw)
  To: song; +Cc: linux-raid

We already get rdev from conf->mirrors[i].rdev at the beginning of the
loop, so just use it.

Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
---
 drivers/md/raid1.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 6ba12f0f0f03..7dc8026cf6ee 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1529,13 +1529,12 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
 
 		r1_bio->bios[i] = mbio;
 
-		mbio->bi_iter.bi_sector	= (r1_bio->sector +
-				   conf->mirrors[i].rdev->data_offset);
-		bio_set_dev(mbio, conf->mirrors[i].rdev->bdev);
+		mbio->bi_iter.bi_sector	= (r1_bio->sector + rdev->data_offset);
+		bio_set_dev(mbio, rdev->bdev);
 		mbio->bi_end_io	= raid1_end_write_request;
 		mbio->bi_opf = bio_op(bio) | (bio->bi_opf & (REQ_SYNC | REQ_FUA));
-		if (test_bit(FailFast, &conf->mirrors[i].rdev->flags) &&
-		    !test_bit(WriteMostly, &conf->mirrors[i].rdev->flags) &&
+		if (test_bit(FailFast, &rdev->flags) &&
+		    !test_bit(WriteMostly, &rdev->flags) &&
 		    conf->raid_disks - mddev->degraded > 1)
 			mbio->bi_opf |= MD_FAILFAST;
 		mbio->bi_private = r1_bio;
@@ -1546,7 +1545,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
 			trace_block_bio_remap(mbio, disk_devt(mddev->gendisk),
 					      r1_bio->sector);
 		/* flush_pending_writes() needs access to the rdev so...*/
-		mbio->bi_bdev = (void *)conf->mirrors[i].rdev;
+		mbio->bi_bdev = (void *)rdev;
 
 		cb = blk_check_plugged(raid1_unplug, mddev, sizeof(*plug));
 		if (cb)
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 4/6] md/raid10: add 'read_err' to raid10_read_request
  2021-10-04 15:34 [PATCH 0/6] Misc changes for md Guoqing Jiang
                   ` (2 preceding siblings ...)
  2021-10-04 15:34 ` [PATCH 3/6] md/raid1: use rdev in raid1_write_request directly Guoqing Jiang
@ 2021-10-04 15:34 ` Guoqing Jiang
  2021-10-07  6:20   ` Song Liu
  2021-10-04 15:34 ` [PATCH 5/6] md/raid5: call roundup_pow_of_two in raid5_run Guoqing Jiang
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Guoqing Jiang @ 2021-10-04 15:34 UTC (permalink / raw)
  To: song; +Cc: linux-raid

Add the paramenter since the err retry logic is only meaningful when
the caller is handle_read_error.

Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
---
 drivers/md/raid10.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index aa2636582841..29eb538db953 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1116,7 +1116,7 @@ static void regular_request_wait(struct mddev *mddev, struct r10conf *conf,
 }
 
 static void raid10_read_request(struct mddev *mddev, struct bio *bio,
-				struct r10bio *r10_bio)
+				struct r10bio *r10_bio, bool read_err)
 {
 	struct r10conf *conf = mddev->private;
 	struct bio *read_bio;
@@ -1129,7 +1129,7 @@ static void raid10_read_request(struct mddev *mddev, struct bio *bio,
 	struct md_rdev *err_rdev = NULL;
 	gfp_t gfp = GFP_NOIO;
 
-	if (slot >= 0 && r10_bio->devs[slot].rdev) {
+	if (read_err && slot >= 0 && r10_bio->devs[slot].rdev) {
 		/*
 		 * This is an error retry, but we cannot
 		 * safely dereference the rdev in the r10_bio,
@@ -1519,7 +1519,7 @@ static void __make_request(struct mddev *mddev, struct bio *bio, int sectors)
 			conf->geo.raid_disks);
 
 	if (bio_data_dir(bio) == READ)
-		raid10_read_request(mddev, bio, r10_bio);
+		raid10_read_request(mddev, bio, r10_bio, false);
 	else
 		raid10_write_request(mddev, bio, r10_bio);
 }
@@ -2918,7 +2918,7 @@ static void handle_read_error(struct mddev *mddev, struct r10bio *r10_bio)
 	rdev_dec_pending(rdev, mddev);
 	allow_barrier(conf);
 	r10_bio->state = 0;
-	raid10_read_request(mddev, r10_bio->master_bio, r10_bio);
+	raid10_read_request(mddev, r10_bio->master_bio, r10_bio, true);
 }
 
 static void handle_write_completed(struct r10conf *conf, struct r10bio *r10_bio)
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 5/6] md/raid5: call roundup_pow_of_two in raid5_run
  2021-10-04 15:34 [PATCH 0/6] Misc changes for md Guoqing Jiang
                   ` (3 preceding siblings ...)
  2021-10-04 15:34 ` [PATCH 4/6] md/raid10: add 'read_err' to raid10_read_request Guoqing Jiang
@ 2021-10-04 15:34 ` Guoqing Jiang
  2021-10-04 15:34 ` [PATCH 6/6] md: remove unused argument from md_new_event Guoqing Jiang
  2021-10-07  6:32 ` [PATCH 0/6] Misc changes for md Song Liu
  6 siblings, 0 replies; 14+ messages in thread
From: Guoqing Jiang @ 2021-10-04 15:34 UTC (permalink / raw)
  To: song; +Cc: linux-raid

Let's call roundup_pow_of_two here instead of open code.

Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
---
 drivers/md/raid5.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 02ed53b20654..4ea9e7b647b0 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7732,10 +7732,7 @@ static int raid5_run(struct mddev *mddev)
 		 * discard data disk but write parity disk
 		 */
 		stripe = stripe * PAGE_SIZE;
-		/* Round up to power of 2, as discard handling
-		 * currently assumes that */
-		while ((stripe-1) & stripe)
-			stripe = (stripe | (stripe-1)) + 1;
+		stripe = roundup_pow_of_two(stripe);
 		mddev->queue->limits.discard_alignment = stripe;
 		mddev->queue->limits.discard_granularity = stripe;
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 6/6] md: remove unused argument from md_new_event
  2021-10-04 15:34 [PATCH 0/6] Misc changes for md Guoqing Jiang
                   ` (4 preceding siblings ...)
  2021-10-04 15:34 ` [PATCH 5/6] md/raid5: call roundup_pow_of_two in raid5_run Guoqing Jiang
@ 2021-10-04 15:34 ` Guoqing Jiang
  2021-10-07  6:32 ` [PATCH 0/6] Misc changes for md Song Liu
  6 siblings, 0 replies; 14+ messages in thread
From: Guoqing Jiang @ 2021-10-04 15:34 UTC (permalink / raw)
  To: song; +Cc: linux-raid

Actually, mddev is not used by md_new_event.

Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
---
 drivers/md/md.c     | 30 +++++++++++++++---------------
 drivers/md/md.h     |  2 +-
 drivers/md/raid10.c |  2 +-
 drivers/md/raid5.c  |  2 +-
 4 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index c322841d4edc..90c624bec695 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -352,7 +352,7 @@ static bool create_on_open = true;
  */
 static DECLARE_WAIT_QUEUE_HEAD(md_event_waiters);
 static atomic_t md_event_count;
-void md_new_event(struct mddev *mddev)
+void md_new_event(void)
 {
 	atomic_inc(&md_event_count);
 	wake_up(&md_event_waiters);
@@ -2886,7 +2886,7 @@ static int add_bound_rdev(struct md_rdev *rdev)
 	if (mddev->degraded)
 		set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
 	set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
-	md_new_event(mddev);
+	md_new_event();
 	md_wakeup_thread(mddev->thread);
 	return 0;
 }
@@ -3002,7 +3002,7 @@ state_store(struct md_rdev *rdev, const char *buf, size_t len)
 					set_bit(MD_SB_CHANGE_DEVS, &mddev->sb_flags);
 					md_wakeup_thread(mddev->thread);
 				}
-				md_new_event(mddev);
+				md_new_event();
 			}
 		}
 	} else if (cmd_match(buf, "writemostly")) {
@@ -4099,7 +4099,7 @@ level_store(struct mddev *mddev, const char *buf, size_t len)
 	if (!mddev->thread)
 		md_update_sb(mddev, 1);
 	sysfs_notify_dirent_safe(mddev->sysfs_level);
-	md_new_event(mddev);
+	md_new_event();
 	rv = len;
 out_unlock:
 	mddev_unlock(mddev);
@@ -4620,7 +4620,7 @@ new_dev_store(struct mddev *mddev, const char *buf, size_t len)
 		export_rdev(rdev);
 	mddev_unlock(mddev);
 	if (!err)
-		md_new_event(mddev);
+		md_new_event();
 	return err ? err : len;
 }
 
@@ -6041,7 +6041,7 @@ int md_run(struct mddev *mddev)
 	if (mddev->sb_flags)
 		md_update_sb(mddev, 0);
 
-	md_new_event(mddev);
+	md_new_event();
 	return 0;
 
 bitmap_abort:
@@ -6431,7 +6431,7 @@ static int do_md_stop(struct mddev *mddev, int mode,
 		if (mddev->hold_active == UNTIL_STOP)
 			mddev->hold_active = 0;
 	}
-	md_new_event(mddev);
+	md_new_event();
 	sysfs_notify_dirent_safe(mddev->sysfs_state);
 	return 0;
 }
@@ -6935,7 +6935,7 @@ static int hot_remove_disk(struct mddev *mddev, dev_t dev)
 		md_wakeup_thread(mddev->thread);
 	else
 		md_update_sb(mddev, 1);
-	md_new_event(mddev);
+	md_new_event();
 
 	return 0;
 busy:
@@ -7008,7 +7008,7 @@ static int hot_add_disk(struct mddev *mddev, dev_t dev)
 	 */
 	set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
 	md_wakeup_thread(mddev->thread);
-	md_new_event(mddev);
+	md_new_event();
 	return 0;
 
 abort_export:
@@ -7982,7 +7982,7 @@ void md_error(struct mddev *mddev, struct md_rdev *rdev)
 	md_wakeup_thread(mddev->thread);
 	if (mddev->event_work.func)
 		queue_work(md_misc_wq, &mddev->event_work);
-	md_new_event(mddev);
+	md_new_event();
 }
 EXPORT_SYMBOL(md_error);
 
@@ -8866,7 +8866,7 @@ void md_do_sync(struct md_thread *thread)
 		mddev->curr_resync = 3; /* no longer delayed */
 	mddev->curr_resync_completed = j;
 	sysfs_notify_dirent_safe(mddev->sysfs_completed);
-	md_new_event(mddev);
+	md_new_event();
 	update_time = jiffies;
 
 	blk_start_plug(&plug);
@@ -8937,7 +8937,7 @@ void md_do_sync(struct md_thread *thread)
 			/* this is the earliest that rebuild will be
 			 * visible in /proc/mdstat
 			 */
-			md_new_event(mddev);
+			md_new_event();
 
 		if (last_check + window > io_sectors || j == max_sectors)
 			continue;
@@ -9161,7 +9161,7 @@ static int remove_and_add_spares(struct mddev *mddev,
 			sysfs_link_rdev(mddev, rdev);
 			if (!test_bit(Journal, &rdev->flags))
 				spares++;
-			md_new_event(mddev);
+			md_new_event();
 			set_bit(MD_SB_CHANGE_DEVS, &mddev->sb_flags);
 		}
 	}
@@ -9195,7 +9195,7 @@ static void md_start_sync(struct work_struct *ws)
 	} else
 		md_wakeup_thread(mddev->sync_thread);
 	sysfs_notify_dirent_safe(mddev->sysfs_action);
-	md_new_event(mddev);
+	md_new_event();
 }
 
 /*
@@ -9454,7 +9454,7 @@ void md_reap_sync_thread(struct mddev *mddev)
 	/* flag recovery needed just to double check */
 	set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
 	sysfs_notify_dirent_safe(mddev->sysfs_action);
-	md_new_event(mddev);
+	md_new_event();
 	if (mddev->event_work.func)
 		queue_work(md_misc_wq, &mddev->event_work);
 }
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 4c96c36bd01a..53ea7a6961de 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -731,7 +731,7 @@ extern int sync_page_io(struct md_rdev *rdev, sector_t sector, int size,
 			struct page *page, int op, int op_flags,
 			bool metadata_op);
 extern void md_do_sync(struct md_thread *thread);
-extern void md_new_event(struct mddev *mddev);
+extern void md_new_event(void);
 extern void md_allow_write(struct mddev *mddev);
 extern void md_wait_for_blocked_rdev(struct md_rdev *rdev, struct mddev *mddev);
 extern void md_set_array_sectors(struct mddev *mddev, sector_t array_sectors);
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 29eb538db953..0aee0675f18e 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -4647,7 +4647,7 @@ static int raid10_start_reshape(struct mddev *mddev)
 	}
 	conf->reshape_checkpoint = jiffies;
 	md_wakeup_thread(mddev->sync_thread);
-	md_new_event(mddev);
+	md_new_event();
 	return 0;
 
 abort:
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 4ea9e7b647b0..9c1a5877cf9f 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -8279,7 +8279,7 @@ static int raid5_start_reshape(struct mddev *mddev)
 	}
 	conf->reshape_checkpoint = jiffies;
 	md_wakeup_thread(mddev->sync_thread);
-	md_new_event(mddev);
+	md_new_event();
 	return 0;
 }
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/6] md/raid1: only allocate write behind bio for WriteMostly device
  2021-10-04 15:34 ` [PATCH 1/6] md/raid1: only allocate write behind bio for WriteMostly device Guoqing Jiang
@ 2021-10-05  1:05   ` Guoqing Jiang
  2021-10-05  5:55   ` Jens Stutte
  1 sibling, 0 replies; 14+ messages in thread
From: Guoqing Jiang @ 2021-10-05  1:05 UTC (permalink / raw)
  To: song; +Cc: linux-raid, jens

Now cc Jens actually.

On 10/4/21 11:34 PM, Guoqing Jiang wrote:
> Commit 6607cd319b6b91bff94e90f798a61c031650b514 ("raid1: ensure write
> behind bio has less than BIO_MAX_VECS sectors") tried to guarantee the
> size of behind bio is not bigger than BIO_MAX_VECS sectors.
>
> Unfortunately the same calltrace still could happen since an array could
> enable write-behind without write mostly device.
>
> To match the manpage of mdadm (which says "write-behind is only attempted
> on drives marked as write-mostly"), we need to check WriteMostly flag to
> avoid such unexpected behavior.
>
> [1]. https://bugzilla.kernel.org/show_bug.cgi?id=213181#c25
>
> Cc: stable@vger.kernel.org # v5.12+
> Cc: Jens Stutte <jens@chianterastutte.eu>
> Reported-by: Jens Stutte <jens@chianterastutte.eu>
> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
> ---
>   drivers/md/raid1.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 19598bd38939..6ba12f0f0f03 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -1496,7 +1496,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
>   		if (!r1_bio->bios[i])
>   			continue;
>   
> -		if (first_clone) {
> +		if (first_clone && test_bit(WriteMostly, &rdev->flags)) {
>   			/* do behind I/O ?
>   			 * Not if there are too many, or cannot
>   			 * allocate memory, or a reader on WriteMostly


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/6] md/raid1: only allocate write behind bio for WriteMostly device
  2021-10-04 15:34 ` [PATCH 1/6] md/raid1: only allocate write behind bio for WriteMostly device Guoqing Jiang
  2021-10-05  1:05   ` Guoqing Jiang
@ 2021-10-05  5:55   ` Jens Stutte
  1 sibling, 0 replies; 14+ messages in thread
From: Jens Stutte @ 2021-10-05  5:55 UTC (permalink / raw)
  To: Guoqing Jiang, song; +Cc: linux-raid

FWIW, I can confirm that an equivalent change (except for variable 
renaming, see [1]) is running well here now for a while now.

Thank you to work on this!

Am 04.10.21 um 17:34 schrieb Guoqing Jiang:
> Commit 6607cd319b6b91bff94e90f798a61c031650b514 ("raid1: ensure write
> behind bio has less than BIO_MAX_VECS sectors") tried to guarantee the
> size of behind bio is not bigger than BIO_MAX_VECS sectors.
>
> Unfortunately the same calltrace still could happen since an array could
> enable write-behind without write mostly device.
>
> To match the manpage of mdadm (which says "write-behind is only attempted
> on drives marked as write-mostly"), we need to check WriteMostly flag to
> avoid such unexpected behavior.
>
> [1]. https://bugzilla.kernel.org/show_bug.cgi?id=213181#c25
>
> Cc: stable@vger.kernel.org # v5.12+
> Cc: Jens Stutte <jens@chianterastutte.eu>
> Reported-by: Jens Stutte <jens@chianterastutte.eu>
> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
> ---
>   drivers/md/raid1.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 19598bd38939..6ba12f0f0f03 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -1496,7 +1496,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
>   		if (!r1_bio->bios[i])
>   			continue;
>   
> -		if (first_clone) {
> +		if (first_clone && test_bit(WriteMostly, &rdev->flags)) {
>   			/* do behind I/O ?
>   			 * Not if there are too many, or cannot
>   			 * allocate memory, or a reader on WriteMostly

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/6] md/raid10: add 'read_err' to raid10_read_request
  2021-10-04 15:34 ` [PATCH 4/6] md/raid10: add 'read_err' to raid10_read_request Guoqing Jiang
@ 2021-10-07  6:20   ` Song Liu
  2021-10-07 10:16     ` Guoqing Jiang
  0 siblings, 1 reply; 14+ messages in thread
From: Song Liu @ 2021-10-07  6:20 UTC (permalink / raw)
  To: Guoqing Jiang; +Cc: linux-raid

On Mon, Oct 4, 2021 at 8:40 AM Guoqing Jiang <guoqing.jiang@linux.dev> wrote:
>
> Add the paramenter since the err retry logic is only meaningful when
> the caller is handle_read_error.
>
> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
> ---
>  drivers/md/raid10.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index aa2636582841..29eb538db953 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -1116,7 +1116,7 @@ static void regular_request_wait(struct mddev *mddev, struct r10conf *conf,
>  }
>
>  static void raid10_read_request(struct mddev *mddev, struct bio *bio,
> -                               struct r10bio *r10_bio)
> +                               struct r10bio *r10_bio, bool read_err)
>  {
>         struct r10conf *conf = mddev->private;
>         struct bio *read_bio;
> @@ -1129,7 +1129,7 @@ static void raid10_read_request(struct mddev *mddev, struct bio *bio,
>         struct md_rdev *err_rdev = NULL;
>         gfp_t gfp = GFP_NOIO;
>
> -       if (slot >= 0 && r10_bio->devs[slot].rdev) {
> +       if (read_err && slot >= 0 && r10_bio->devs[slot].rdev) {

How about we just move this section to a separate function?

Thanks,
Song

>                 /*
>                  * This is an error retry, but we cannot
>                  * safely dereference the rdev in the r10_bio,
> @@ -1519,7 +1519,7 @@ static void __make_request(struct mddev *mddev, struct bio *bio, int sectors)
>                         conf->geo.raid_disks);
>
>         if (bio_data_dir(bio) == READ)
> -               raid10_read_request(mddev, bio, r10_bio);
> +               raid10_read_request(mddev, bio, r10_bio, false);
>         else
>                 raid10_write_request(mddev, bio, r10_bio);
>  }
> @@ -2918,7 +2918,7 @@ static void handle_read_error(struct mddev *mddev, struct r10bio *r10_bio)
>         rdev_dec_pending(rdev, mddev);
>         allow_barrier(conf);
>         r10_bio->state = 0;
> -       raid10_read_request(mddev, r10_bio->master_bio, r10_bio);
> +       raid10_read_request(mddev, r10_bio->master_bio, r10_bio, true);
>  }
>
>  static void handle_write_completed(struct r10conf *conf, struct r10bio *r10_bio)
> --
> 2.31.1
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/6] md/bitmap: don't set max_write_behind if there is no write mostly device
  2021-10-04 15:34 ` [PATCH 2/6] md/bitmap: don't set max_write_behind if there is no write mostly device Guoqing Jiang
@ 2021-10-07  6:25   ` Song Liu
  2021-10-07 10:20     ` Guoqing Jiang
  0 siblings, 1 reply; 14+ messages in thread
From: Song Liu @ 2021-10-07  6:25 UTC (permalink / raw)
  To: Guoqing Jiang; +Cc: linux-raid

On Mon, Oct 4, 2021 at 8:40 AM Guoqing Jiang <guoqing.jiang@linux.dev> wrote:
>
> We shouldn't set it since write behind IO should only happen to write
> mostly device.
>
> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
> ---
>  drivers/md/md-bitmap.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
>
> diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
> index e29c6298ef5c..0346281b1555 100644
> --- a/drivers/md/md-bitmap.c
> +++ b/drivers/md/md-bitmap.c
> @@ -2469,11 +2469,28 @@ backlog_store(struct mddev *mddev, const char *buf, size_t len)
>  {
>         unsigned long backlog;
>         unsigned long old_mwb = mddev->bitmap_info.max_write_behind;
> +       struct md_rdev *rdev;
> +       bool has_write_mostly = false;
>         int rv = kstrtoul(buf, 10, &backlog);
>         if (rv)
>                 return rv;
>         if (backlog > COUNTER_MAX)
>                 return -EINVAL;
> +
> +       /*
> +        * Without write mostly device, it doesn't make sense to set
> +        * backlog for max_write_behind.
> +        */
> +       rdev_for_each(rdev, mddev)
> +               if (test_bit(WriteMostly, &rdev->flags)) {
> +                       has_write_mostly = true;
> +                       break;
> +               }
> +       if (!has_write_mostly) {
> +               pr_warn_ratelimited("md: No write mostly device available\n");

Most of these _store functions do not print warnings for invalid changes. So
I am not sure whether we want to add this one. If we do want it, we should
make it clear, as
"md: No write mostly device available. Cannot set backlog\n".
We may also add the device name there.

Thanks,
Song

> +               return -EINVAL;
> +       }
> +
>         mddev->bitmap_info.max_write_behind = backlog;
>         if (!backlog && mddev->serial_info_pool) {
>                 /* serial_info_pool is not needed if backlog is zero */
> --
> 2.31.1
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/6] Misc changes for md
  2021-10-04 15:34 [PATCH 0/6] Misc changes for md Guoqing Jiang
                   ` (5 preceding siblings ...)
  2021-10-04 15:34 ` [PATCH 6/6] md: remove unused argument from md_new_event Guoqing Jiang
@ 2021-10-07  6:32 ` Song Liu
  6 siblings, 0 replies; 14+ messages in thread
From: Song Liu @ 2021-10-07  6:32 UTC (permalink / raw)
  To: Guoqing Jiang; +Cc: linux-raid

On Mon, Oct 4, 2021 at 8:40 AM Guoqing Jiang <guoqing.jiang@linux.dev> wrote:
>
> Hello,
>
> The first patch fixes the same calltrace as commit 6607cd319b6b ("raid1:
> ensure write behind bio has less than BIO_MAX_VECS sectors") tried
> before, but unfortunately the calltrace still could happen if array
> without write mostly device is configured with write-behind enabled.
> So the first patch is suitable for fix branch which others are materials
> for next branch.
>
> Pls review.
>
> Thanks,
> Guoqing
>
> Guoqing Jiang (6):
>   md/raid1: only allocate write behind bio for WriteMostly device
>   md/bitmap: don't set max_write_behind if there is no write mostly
>     device
>   md/raid1: use rdev in raid1_write_request directly
>   md/raid10: add 'read_err' to raid10_read_request
>   md/raid5: call roundup_pow_of_two in raid5_run
>   md: remove unused argument from md_new_event

Thanks for these fixes and cleanups! I applied 1, 3, 5, 6 to md-next.

Song

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/6] md/raid10: add 'read_err' to raid10_read_request
  2021-10-07  6:20   ` Song Liu
@ 2021-10-07 10:16     ` Guoqing Jiang
  0 siblings, 0 replies; 14+ messages in thread
From: Guoqing Jiang @ 2021-10-07 10:16 UTC (permalink / raw)
  To: Song Liu; +Cc: linux-raid



On 10/7/21 2:20 PM, Song Liu wrote:
> On Mon, Oct 4, 2021 at 8:40 AM Guoqing Jiang <guoqing.jiang@linux.dev> wrote:
>> Add the paramenter since the err retry logic is only meaningful when
>> the caller is handle_read_error.
>>
>> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
>> ---
>>   drivers/md/raid10.c | 8 ++++----
>>   1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
>> index aa2636582841..29eb538db953 100644
>> --- a/drivers/md/raid10.c
>> +++ b/drivers/md/raid10.c
>> @@ -1116,7 +1116,7 @@ static void regular_request_wait(struct mddev *mddev, struct r10conf *conf,
>>   }
>>
>>   static void raid10_read_request(struct mddev *mddev, struct bio *bio,
>> -                               struct r10bio *r10_bio)
>> +                               struct r10bio *r10_bio, bool read_err)
>>   {
>>          struct r10conf *conf = mddev->private;
>>          struct bio *read_bio;
>> @@ -1129,7 +1129,7 @@ static void raid10_read_request(struct mddev *mddev, struct bio *bio,
>>          struct md_rdev *err_rdev = NULL;
>>          gfp_t gfp = GFP_NOIO;
>>
>> -       if (slot >= 0 && r10_bio->devs[slot].rdev) {
>> +       if (read_err && slot >= 0 && r10_bio->devs[slot].rdev) {
> How about we just move this section to a separate function?

Will add an additional patch for it.

Thanks,
Guoqing

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/6] md/bitmap: don't set max_write_behind if there is no write mostly device
  2021-10-07  6:25   ` Song Liu
@ 2021-10-07 10:20     ` Guoqing Jiang
  0 siblings, 0 replies; 14+ messages in thread
From: Guoqing Jiang @ 2021-10-07 10:20 UTC (permalink / raw)
  To: Song Liu; +Cc: linux-raid



On 10/7/21 2:25 PM, Song Liu wrote:
> On Mon, Oct 4, 2021 at 8:40 AM Guoqing Jiang <guoqing.jiang@linux.dev> wrote:
>> We shouldn't set it since write behind IO should only happen to write
>> mostly device.
>>
>> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
>> ---
>>   drivers/md/md-bitmap.c | 17 +++++++++++++++++
>>   1 file changed, 17 insertions(+)
>>
>> diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
>> index e29c6298ef5c..0346281b1555 100644
>> --- a/drivers/md/md-bitmap.c
>> +++ b/drivers/md/md-bitmap.c
>> @@ -2469,11 +2469,28 @@ backlog_store(struct mddev *mddev, const char *buf, size_t len)
>>   {
>>          unsigned long backlog;
>>          unsigned long old_mwb = mddev->bitmap_info.max_write_behind;
>> +       struct md_rdev *rdev;
>> +       bool has_write_mostly = false;
>>          int rv = kstrtoul(buf, 10, &backlog);
>>          if (rv)
>>                  return rv;
>>          if (backlog > COUNTER_MAX)
>>                  return -EINVAL;
>> +
>> +       /*
>> +        * Without write mostly device, it doesn't make sense to set
>> +        * backlog for max_write_behind.
>> +        */
>> +       rdev_for_each(rdev, mddev)
>> +               if (test_bit(WriteMostly, &rdev->flags)) {
>> +                       has_write_mostly = true;
>> +                       break;
>> +               }
>> +       if (!has_write_mostly) {
>> +               pr_warn_ratelimited("md: No write mostly device available\n");
> Most of these _store functions do not print warnings for invalid changes. So
> I am not sure whether we want to add this one.

I think it is better to makes users know the reason of invalidation.

> If we do want it, we should make it clear, as
> "md: No write mostly device available. Cannot set backlog\n".
> We may also add the device name there.

Sure, I will make it clearer.

Thanks,
Guoqing

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2021-10-07 10:20 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-04 15:34 [PATCH 0/6] Misc changes for md Guoqing Jiang
2021-10-04 15:34 ` [PATCH 1/6] md/raid1: only allocate write behind bio for WriteMostly device Guoqing Jiang
2021-10-05  1:05   ` Guoqing Jiang
2021-10-05  5:55   ` Jens Stutte
2021-10-04 15:34 ` [PATCH 2/6] md/bitmap: don't set max_write_behind if there is no write mostly device Guoqing Jiang
2021-10-07  6:25   ` Song Liu
2021-10-07 10:20     ` Guoqing Jiang
2021-10-04 15:34 ` [PATCH 3/6] md/raid1: use rdev in raid1_write_request directly Guoqing Jiang
2021-10-04 15:34 ` [PATCH 4/6] md/raid10: add 'read_err' to raid10_read_request Guoqing Jiang
2021-10-07  6:20   ` Song Liu
2021-10-07 10:16     ` Guoqing Jiang
2021-10-04 15:34 ` [PATCH 5/6] md/raid5: call roundup_pow_of_two in raid5_run Guoqing Jiang
2021-10-04 15:34 ` [PATCH 6/6] md: remove unused argument from md_new_event Guoqing Jiang
2021-10-07  6:32 ` [PATCH 0/6] Misc changes for md Song Liu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.