All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v1 00/15] Bug fixes for mdadm tests
@ 2022-05-19 19:12 Logan Gunthorpe
  2022-05-19 19:12 ` [PATCH v1 01/15] md/raid5-log: Drop extern decorators for function prototypes Logan Gunthorpe
                   ` (15 more replies)
  0 siblings, 16 replies; 41+ messages in thread
From: Logan Gunthorpe @ 2022-05-19 19:12 UTC (permalink / raw)
  To: linux-kernel, linux-raid, Song Liu
  Cc: Christoph Hellwig, Guoqing Jiang, Xiao Ni, Stephen Bates,
	Martin Oliveira, David Sloan, Logan Gunthorpe

Hi,

This series includes fixes to fix all the kernel panics in the mdadm
tests and some, related, sparse issues. The first 10 patches
clean refactor the raid5-cache code so that the RCU usage of conf->log
can be cleaned up which is done in patches 11 and 12 -- fixing some
actual kernel NULL pointer dereference crashes in the mdadm test.

Patch 13 fixes some of the remaining sparse warnings that are just
missing __rcu annotations.

Patches 14 and 15 fix a couple additional hangs in an mdadm test.

This series also originally included a patch[1] to fix the
mddev->private=NULL issue in raid0. That bug caused an mdadm tests to
crash, but it seems Xiao beat me to the fix by a few days. Hopefully,
this work to improve mdadm tests will mean these types of bugs will
be caught much sooner, before merging.

This series will be followed by another series for mdadm which fixes
the segfaults and annotates some failing tests to make mdadm tests
runnable fairly reliably, but I'll wait for a stable hash for this
series to note the kernel version tested against. Following that,
v3 of my lock contention series will be sent with more confidence
of its correctness.

This series is based on the current md/md-next branch as of today
(6ad84d559b8c). A git branch is available here:

  https://github.com/sbates130272/linux-p2pmem md-bug

Thanks,

Logan

[1] https://github.com/sbates130272/linux-p2pmem/commit/5a538f9f48d77cba111773759256bbc3ccaaa74a

--

Logan Gunthorpe (15):
  md/raid5-log: Drop extern decorators for function prototypes
  md/raid5-cache: Refactor r5c_is_writeback() to take a struct r5conf
  md/raid5-cache: Refactor r5l_start() to take a struct r5conf
  md/raid5-cache: Refactor r5l_flush_stripe_to_raid() to take a struct
    r5conf
  md/raid5-cache: Refactor r5l_wake_reclaim() to take a struct r5conf
  md/raid5-cache: Refactor remaining functions to take a r5conf
  md/raid5-ppl: Drop unused argument from ppl_handle_flush_request()
  md/raid5-cache: Pass the log through to r5c_finish_cache_stripe()
  md/raid5-cache: Don't pass conf to r5c_calculate_new_cp()
  md/raid5-cache: Take struct r5l_log in
    r5c_log_required_to_flush_cache()
  md/raid5: Ensure array is suspended for calls to log_exit()
  md/raid5-cache: Add RCU protection to conf->log accesses
  md/raid5-cache: Annotate pslot with __rcu notation
  md: Ensure resync is reported after it starts
  md: Notify sysfs sync_completed in md_reap_sync_thread()

 drivers/md/md.c          |  13 ++-
 drivers/md/raid5-cache.c | 240 ++++++++++++++++++++++++---------------
 drivers/md/raid5-log.h   | 103 ++++++++---------
 drivers/md/raid5-ppl.c   |   2 +-
 drivers/md/raid5.c       |  50 ++++----
 drivers/md/raid5.h       |   2 +-
 6 files changed, 231 insertions(+), 179 deletions(-)


base-commit: 6ad84d559b8cbce9ab27a3a2658c438de867c98e
--
2.30.2

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v1 01/15] md/raid5-log: Drop extern decorators for function prototypes
  2022-05-19 19:12 [PATCH v1 00/15] Bug fixes for mdadm tests Logan Gunthorpe
@ 2022-05-19 19:12 ` Logan Gunthorpe
  2022-05-21 11:36   ` Christoph Hellwig
  2022-05-19 19:12 ` [PATCH v1 02/15] md/raid5-cache: Refactor r5c_is_writeback() to take a struct r5conf Logan Gunthorpe
                   ` (14 subsequent siblings)
  15 siblings, 1 reply; 41+ messages in thread
From: Logan Gunthorpe @ 2022-05-19 19:12 UTC (permalink / raw)
  To: linux-kernel, linux-raid, Song Liu
  Cc: Christoph Hellwig, Guoqing Jiang, Xiao Ni, Stephen Bates,
	Martin Oliveira, David Sloan, Logan Gunthorpe

extern is not necessary and recommended against when defining prototype
functions in headers. checkpatch.pl complains about these. So remove
them.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 drivers/md/raid5-log.h | 75 ++++++++++++++++++++----------------------
 1 file changed, 36 insertions(+), 39 deletions(-)

diff --git a/drivers/md/raid5-log.h b/drivers/md/raid5-log.h
index 43c714a8798c..270ced4f770f 100644
--- a/drivers/md/raid5-log.h
+++ b/drivers/md/raid5-log.h
@@ -2,49 +2,46 @@
 #ifndef _RAID5_LOG_H
 #define _RAID5_LOG_H
 
-extern int r5l_init_log(struct r5conf *conf, struct md_rdev *rdev);
-extern void r5l_exit_log(struct r5conf *conf);
-extern int r5l_write_stripe(struct r5l_log *log, struct stripe_head *head_sh);
-extern void r5l_write_stripe_run(struct r5l_log *log);
-extern void r5l_flush_stripe_to_raid(struct r5l_log *log);
-extern void r5l_stripe_write_finished(struct stripe_head *sh);
-extern int r5l_handle_flush_request(struct r5l_log *log, struct bio *bio);
-extern void r5l_quiesce(struct r5l_log *log, int quiesce);
-extern bool r5l_log_disk_error(struct r5conf *conf);
-extern bool r5c_is_writeback(struct r5l_log *log);
-extern int
-r5c_try_caching_write(struct r5conf *conf, struct stripe_head *sh,
-		      struct stripe_head_state *s, int disks);
-extern void
-r5c_finish_stripe_write_out(struct r5conf *conf, struct stripe_head *sh,
-			    struct stripe_head_state *s);
-extern void r5c_release_extra_page(struct stripe_head *sh);
-extern void r5c_use_extra_page(struct stripe_head *sh);
-extern void r5l_wake_reclaim(struct r5l_log *log, sector_t space);
-extern void r5c_handle_cached_data_endio(struct r5conf *conf,
-	struct stripe_head *sh, int disks);
-extern int r5c_cache_data(struct r5l_log *log, struct stripe_head *sh);
-extern void r5c_make_stripe_write_out(struct stripe_head *sh);
-extern void r5c_flush_cache(struct r5conf *conf, int num);
-extern void r5c_check_stripe_cache_usage(struct r5conf *conf);
-extern void r5c_check_cached_full_stripe(struct r5conf *conf);
+int r5l_init_log(struct r5conf *conf, struct md_rdev *rdev);
+void r5l_exit_log(struct r5conf *conf);
+int r5l_write_stripe(struct r5l_log *log, struct stripe_head *head_sh);
+void r5l_write_stripe_run(struct r5l_log *log);
+void r5l_flush_stripe_to_raid(struct r5l_log *log);
+void r5l_stripe_write_finished(struct stripe_head *sh);
+int r5l_handle_flush_request(struct r5l_log *log, struct bio *bio);
+void r5l_quiesce(struct r5l_log *log, int quiesce);
+bool r5l_log_disk_error(struct r5conf *conf);
+bool r5c_is_writeback(struct r5l_log *log);
+int r5c_try_caching_write(struct r5conf *conf, struct stripe_head *sh,
+			  struct stripe_head_state *s, int disks);
+void r5c_finish_stripe_write_out(struct r5conf *conf, struct stripe_head *sh,
+				 struct stripe_head_state *s);
+void r5c_release_extra_page(struct stripe_head *sh);
+void r5c_use_extra_page(struct stripe_head *sh);
+void r5l_wake_reclaim(struct r5l_log *log, sector_t space);
+void r5c_handle_cached_data_endio(struct r5conf *conf,
+				  struct stripe_head *sh, int disks);
+int r5c_cache_data(struct r5l_log *log, struct stripe_head *sh);
+void r5c_make_stripe_write_out(struct stripe_head *sh);
+void r5c_flush_cache(struct r5conf *conf, int num);
+void r5c_check_stripe_cache_usage(struct r5conf *conf);
+void r5c_check_cached_full_stripe(struct r5conf *conf);
 extern struct md_sysfs_entry r5c_journal_mode;
-extern void r5c_update_on_rdev_error(struct mddev *mddev,
-				     struct md_rdev *rdev);
-extern bool r5c_big_stripe_cached(struct r5conf *conf, sector_t sect);
-extern int r5l_start(struct r5l_log *log);
+void r5c_update_on_rdev_error(struct mddev *mddev, struct md_rdev *rdev);
+bool r5c_big_stripe_cached(struct r5conf *conf, sector_t sect);
+int r5l_start(struct r5l_log *log);
 
-extern struct dma_async_tx_descriptor *
+struct dma_async_tx_descriptor *
 ops_run_partial_parity(struct stripe_head *sh, struct raid5_percpu *percpu,
 		       struct dma_async_tx_descriptor *tx);
-extern int ppl_init_log(struct r5conf *conf);
-extern void ppl_exit_log(struct r5conf *conf);
-extern int ppl_write_stripe(struct r5conf *conf, struct stripe_head *sh);
-extern void ppl_write_stripe_run(struct r5conf *conf);
-extern void ppl_stripe_write_finished(struct stripe_head *sh);
-extern int ppl_modify_log(struct r5conf *conf, struct md_rdev *rdev, bool add);
-extern void ppl_quiesce(struct r5conf *conf, int quiesce);
-extern int ppl_handle_flush_request(struct r5l_log *log, struct bio *bio);
+int ppl_init_log(struct r5conf *conf);
+void ppl_exit_log(struct r5conf *conf);
+int ppl_write_stripe(struct r5conf *conf, struct stripe_head *sh);
+void ppl_write_stripe_run(struct r5conf *conf);
+void ppl_stripe_write_finished(struct stripe_head *sh);
+int ppl_modify_log(struct r5conf *conf, struct md_rdev *rdev, bool add);
+void ppl_quiesce(struct r5conf *conf, int quiesce);
+int ppl_handle_flush_request(struct r5l_log *log, struct bio *bio);
 extern struct md_sysfs_entry ppl_write_hint;
 
 static inline bool raid5_has_log(struct r5conf *conf)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v1 02/15] md/raid5-cache: Refactor r5c_is_writeback() to take a struct r5conf
  2022-05-19 19:12 [PATCH v1 00/15] Bug fixes for mdadm tests Logan Gunthorpe
  2022-05-19 19:12 ` [PATCH v1 01/15] md/raid5-log: Drop extern decorators for function prototypes Logan Gunthorpe
@ 2022-05-19 19:12 ` Logan Gunthorpe
  2022-05-21 11:37   ` Christoph Hellwig
  2022-05-19 19:12 ` [PATCH v1 03/15] md/raid5-cache: Refactor r5l_start() " Logan Gunthorpe
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 41+ messages in thread
From: Logan Gunthorpe @ 2022-05-19 19:12 UTC (permalink / raw)
  To: linux-kernel, linux-raid, Song Liu
  Cc: Christoph Hellwig, Guoqing Jiang, Xiao Ni, Stephen Bates,
	Martin Oliveira, David Sloan, Logan Gunthorpe

Most calls to r5c_is_writeback() have to dereference conf to get the
log. The log pointer should be protected by RCU, but isn't. Push
the dereference of conf->log into r5c_is_writeback() to reduce
the number of sites that dereference conf->log so this can be improved.

No functional changes intended.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 drivers/md/raid5-cache.c | 25 ++++++++++++++-----------
 drivers/md/raid5-log.h   |  2 +-
 drivers/md/raid5.c       | 16 ++++++++--------
 3 files changed, 23 insertions(+), 20 deletions(-)

diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 83c184eddbda..241c271e836d 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -247,10 +247,14 @@ enum r5l_io_unit_state {
 	IO_UNIT_STRIPE_END = 3,	/* stripes data finished writing to raid */
 };
 
-bool r5c_is_writeback(struct r5l_log *log)
+static bool __r5c_is_writeback(struct r5l_log *log)
 {
-	return (log != NULL &&
-		log->r5c_journal_mode == R5C_JOURNAL_MODE_WRITE_BACK);
+	return log && log->r5c_journal_mode == R5C_JOURNAL_MODE_WRITE_BACK;
+}
+
+bool r5c_is_writeback(struct r5conf *conf)
+{
+	return __r5c_is_writeback(conf->log);
 }
 
 static sector_t r5l_ring_add(struct r5l_log *log, sector_t start, sector_t inc)
@@ -328,7 +332,7 @@ void r5c_check_stripe_cache_usage(struct r5conf *conf)
 {
 	int total_cached;
 
-	if (!r5c_is_writeback(conf->log))
+	if (!r5c_is_writeback(conf))
 		return;
 
 	total_cached = atomic_read(&conf->r5c_cached_partial_stripes) +
@@ -353,7 +357,7 @@ void r5c_check_stripe_cache_usage(struct r5conf *conf)
  */
 void r5c_check_cached_full_stripe(struct r5conf *conf)
 {
-	if (!r5c_is_writeback(conf->log))
+	if (!r5c_is_writeback(conf))
 		return;
 
 	/*
@@ -398,7 +402,7 @@ static sector_t r5c_log_required_to_flush_cache(struct r5conf *conf)
 {
 	struct r5l_log *log = conf->log;
 
-	if (!r5c_is_writeback(log))
+	if (!r5c_is_writeback(conf))
 		return 0;
 
 	return BLOCK_SECTORS *
@@ -420,7 +424,7 @@ static inline void r5c_update_log_state(struct r5l_log *log)
 	sector_t reclaim_space;
 	bool wake_reclaim = false;
 
-	if (!r5c_is_writeback(log))
+	if (!__r5c_is_writeback(log))
 		return;
 
 	free_space = r5l_ring_distance(log, log->log_start,
@@ -449,9 +453,8 @@ static inline void r5c_update_log_state(struct r5l_log *log)
 void r5c_make_stripe_write_out(struct stripe_head *sh)
 {
 	struct r5conf *conf = sh->raid_conf;
-	struct r5l_log *log = conf->log;
 
-	BUG_ON(!r5c_is_writeback(log));
+	BUG_ON(!r5c_is_writeback(conf));
 
 	WARN_ON(!test_bit(STRIPE_R5C_CACHING, &sh->state));
 	clear_bit(STRIPE_R5C_CACHING, &sh->state);
@@ -1429,7 +1432,7 @@ static void r5c_do_reclaim(struct r5conf *conf)
 	int stripes_to_flush;
 	int flushing_partial, flushing_full;
 
-	if (!r5c_is_writeback(log))
+	if (!r5c_is_writeback(conf))
 		return;
 
 	flushing_partial = atomic_read(&conf->r5c_flushing_partial_stripes);
@@ -2644,7 +2647,7 @@ int r5c_try_caching_write(struct r5conf *conf,
 	int ret;
 	uintptr_t refcount;
 
-	BUG_ON(!r5c_is_writeback(log));
+	BUG_ON(!__r5c_is_writeback(log));
 
 	if (!test_bit(STRIPE_R5C_CACHING, &sh->state)) {
 		/*
diff --git a/drivers/md/raid5-log.h b/drivers/md/raid5-log.h
index 270ced4f770f..8a98b5e8c04b 100644
--- a/drivers/md/raid5-log.h
+++ b/drivers/md/raid5-log.h
@@ -11,7 +11,7 @@ void r5l_stripe_write_finished(struct stripe_head *sh);
 int r5l_handle_flush_request(struct r5l_log *log, struct bio *bio);
 void r5l_quiesce(struct r5l_log *log, int quiesce);
 bool r5l_log_disk_error(struct r5conf *conf);
-bool r5c_is_writeback(struct r5l_log *log);
+bool r5c_is_writeback(struct r5conf *conf);
 int r5c_try_caching_write(struct r5conf *conf, struct stripe_head *sh,
 			  struct stripe_head_state *s, int disks);
 void r5c_finish_stripe_write_out(struct r5conf *conf, struct stripe_head *sh,
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 5d09256d7f81..bbc0e1101d0f 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -224,7 +224,7 @@ static void do_release_stripe(struct r5conf *conf, struct stripe_head *sh,
 	BUG_ON(!list_empty(&sh->lru));
 	BUG_ON(atomic_read(&conf->active_stripes)==0);
 
-	if (r5c_is_writeback(conf->log))
+	if (r5c_is_writeback(conf))
 		for (i = sh->disks; i--; )
 			if (test_bit(R5_InJournal, &sh->dev[i].flags))
 				injournal++;
@@ -236,7 +236,7 @@ static void do_release_stripe(struct r5conf *conf, struct stripe_head *sh,
 	 *   2. when resync is requested fot the stripe.
 	 */
 	if (test_bit(STRIPE_SYNC_REQUESTED, &sh->state) ||
-	    (conf->quiesce && r5c_is_writeback(conf->log) &&
+	    (conf->quiesce && r5c_is_writeback(conf) &&
 	     !test_bit(STRIPE_HANDLE, &sh->state) && injournal != 0)) {
 		if (test_bit(STRIPE_R5C_CACHING, &sh->state))
 			r5c_make_stripe_write_out(sh);
@@ -274,7 +274,7 @@ static void do_release_stripe(struct r5conf *conf, struct stripe_head *sh,
 				md_wakeup_thread(conf->mddev->thread);
 		atomic_dec(&conf->active_stripes);
 		if (!test_bit(STRIPE_EXPANDING, &sh->state)) {
-			if (!r5c_is_writeback(conf->log))
+			if (!r5c_is_writeback(conf))
 				list_add_tail(&sh->lru, temp_inactive_list);
 			else {
 				WARN_ON(test_bit(R5_InJournal, &sh->dev[sh->pd_idx].flags));
@@ -1786,7 +1786,7 @@ static void ops_complete_prexor(void *stripe_head_ref)
 	pr_debug("%s: stripe %llu\n", __func__,
 		(unsigned long long)sh->sector);
 
-	if (r5c_is_writeback(sh->raid_conf->log))
+	if (r5c_is_writeback(sh->raid_conf))
 		/*
 		 * raid5-cache write back uses orig_page during prexor.
 		 * After prexor, it is time to free orig_page
@@ -1905,9 +1905,9 @@ ops_run_biodrain(struct stripe_head *sh, struct dma_async_tx_descriptor *tx)
 					tx = async_copy_data(1, wbi, &dev->page,
 							     dev->offset,
 							     dev->sector, tx, sh,
-							     r5c_is_writeback(conf->log));
+							     r5c_is_writeback(conf));
 					if (dev->page != dev->orig_page &&
-					    !r5c_is_writeback(conf->log)) {
+					    !r5c_is_writeback(conf)) {
 						set_bit(R5_SkipCopy, &dev->flags);
 						clear_bit(R5_UPTODATE, &dev->flags);
 						clear_bit(R5_OVERWRITE, &dev->flags);
@@ -5085,7 +5085,7 @@ static void handle_stripe(struct stripe_head *sh)
 	 */
 
 	if (!sh->reconstruct_state && !sh->check_state && !sh->log_io) {
-		if (!r5c_is_writeback(conf->log)) {
+		if (!r5c_is_writeback(conf)) {
 			if (s.to_write)
 				handle_stripe_dirtying(conf, sh, &s, disks);
 		} else { /* write back cache */
@@ -5533,7 +5533,7 @@ static struct stripe_head *__get_priority_stripe(struct r5conf *conf, int group)
 	struct stripe_head *sh, *tmp;
 	struct list_head *handle_list = NULL;
 	struct r5worker_group *wg;
-	bool second_try = !r5c_is_writeback(conf->log) &&
+	bool second_try = !r5c_is_writeback(conf) &&
 		!r5l_log_disk_error(conf);
 	bool try_loprio = test_bit(R5C_LOG_TIGHT, &conf->cache_state) ||
 		r5l_log_disk_error(conf);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v1 03/15] md/raid5-cache: Refactor r5l_start() to take a struct r5conf
  2022-05-19 19:12 [PATCH v1 00/15] Bug fixes for mdadm tests Logan Gunthorpe
  2022-05-19 19:12 ` [PATCH v1 01/15] md/raid5-log: Drop extern decorators for function prototypes Logan Gunthorpe
  2022-05-19 19:12 ` [PATCH v1 02/15] md/raid5-cache: Refactor r5c_is_writeback() to take a struct r5conf Logan Gunthorpe
@ 2022-05-19 19:12 ` Logan Gunthorpe
  2022-05-21 11:37   ` Christoph Hellwig
  2022-05-19 19:13 ` [PATCH v1 04/15] md/raid5-cache: Refactor r5l_flush_stripe_to_raid() " Logan Gunthorpe
                   ` (12 subsequent siblings)
  15 siblings, 1 reply; 41+ messages in thread
From: Logan Gunthorpe @ 2022-05-19 19:12 UTC (permalink / raw)
  To: linux-kernel, linux-raid, Song Liu
  Cc: Christoph Hellwig, Guoqing Jiang, Xiao Ni, Stephen Bates,
	Martin Oliveira, David Sloan, Logan Gunthorpe

All calls to r5l_start() have to dereference conf to get the
log. The log pointer should be protected by RCU, but isn't. Push
the dereference of conf->log into r5l_start() to reduce
the number of sites that dereference conf->log so this can be improved.

No functional changes intended.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 drivers/md/raid5-cache.c | 9 +++------
 drivers/md/raid5-log.h   | 2 +-
 drivers/md/raid5.c       | 4 ++--
 3 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 241c271e836d..0049c4f45e9c 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -3032,20 +3032,17 @@ static int r5l_load_log(struct r5l_log *log)
 	return ret;
 }
 
-int r5l_start(struct r5l_log *log)
+int r5l_start(struct r5conf *conf)
 {
+	struct r5l_log *log = conf->log;
 	int ret;
 
 	if (!log)
 		return 0;
 
 	ret = r5l_load_log(log);
-	if (ret) {
-		struct mddev *mddev = log->rdev->mddev;
-		struct r5conf *conf = mddev->private;
-
+	if (ret)
 		r5l_exit_log(conf);
-	}
 	return ret;
 }
 
diff --git a/drivers/md/raid5-log.h b/drivers/md/raid5-log.h
index 8a98b5e8c04b..c6f877df4f3e 100644
--- a/drivers/md/raid5-log.h
+++ b/drivers/md/raid5-log.h
@@ -29,7 +29,7 @@ void r5c_check_cached_full_stripe(struct r5conf *conf);
 extern struct md_sysfs_entry r5c_journal_mode;
 void r5c_update_on_rdev_error(struct mddev *mddev, struct md_rdev *rdev);
 bool r5c_big_stripe_cached(struct r5conf *conf, sector_t sect);
-int r5l_start(struct r5l_log *log);
+int r5l_start(struct r5conf *conf);
 
 struct dma_async_tx_descriptor *
 ops_run_partial_parity(struct stripe_head *sh, struct raid5_percpu *percpu,
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index bbc0e1101d0f..355760f34cb6 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -8040,7 +8040,7 @@ static int raid5_add_disk(struct mddev *mddev, struct md_rdev *rdev)
 		if (ret)
 			return ret;
 
-		ret = r5l_start(conf->log);
+		ret = r5l_start(conf);
 		if (ret)
 			return ret;
 
@@ -8743,7 +8743,7 @@ static int raid5_start(struct mddev *mddev)
 {
 	struct r5conf *conf = mddev->private;
 
-	return r5l_start(conf->log);
+	return r5l_start(conf);
 }
 
 static struct md_personality raid6_personality =
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v1 04/15] md/raid5-cache: Refactor r5l_flush_stripe_to_raid() to take a struct r5conf
  2022-05-19 19:12 [PATCH v1 00/15] Bug fixes for mdadm tests Logan Gunthorpe
                   ` (2 preceding siblings ...)
  2022-05-19 19:12 ` [PATCH v1 03/15] md/raid5-cache: Refactor r5l_start() " Logan Gunthorpe
@ 2022-05-19 19:13 ` Logan Gunthorpe
  2022-05-21 11:38   ` Christoph Hellwig
  2022-05-19 19:13 ` [PATCH v1 05/15] md/raid5-cache: Refactor r5l_wake_reclaim() " Logan Gunthorpe
                   ` (11 subsequent siblings)
  15 siblings, 1 reply; 41+ messages in thread
From: Logan Gunthorpe @ 2022-05-19 19:13 UTC (permalink / raw)
  To: linux-kernel, linux-raid, Song Liu
  Cc: Christoph Hellwig, Guoqing Jiang, Xiao Ni, Stephen Bates,
	Martin Oliveira, David Sloan, Logan Gunthorpe

All calls to r5l_flush_stripe_to_raid() have to dereference conf to
get the log. The log pointer should be protected by RCU, but isn't. Push
the dereference of conf->log into r5l_flush_stripe_to_raid() to reduce
the number of sites that dereference conf->log so this can be improved.

No functional changes intended.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 drivers/md/raid5-cache.c | 3 ++-
 drivers/md/raid5-log.h   | 4 ++--
 drivers/md/raid5.c       | 6 +++---
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 0049c4f45e9c..6db8040fb01b 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -1287,8 +1287,9 @@ static void r5l_log_flush_endio(struct bio *bio)
  * only write stripes of an io_unit to raid disks till the io_unit is the first
  * one whose data/parity is in log.
  */
-void r5l_flush_stripe_to_raid(struct r5l_log *log)
+void r5l_flush_stripe_to_raid(struct r5conf *conf)
 {
+	struct r5l_log *log = conf->log;
 	bool do_flush;
 
 	if (!log || !log->need_cache_flush)
diff --git a/drivers/md/raid5-log.h b/drivers/md/raid5-log.h
index c6f877df4f3e..ead541075528 100644
--- a/drivers/md/raid5-log.h
+++ b/drivers/md/raid5-log.h
@@ -6,7 +6,7 @@ int r5l_init_log(struct r5conf *conf, struct md_rdev *rdev);
 void r5l_exit_log(struct r5conf *conf);
 int r5l_write_stripe(struct r5l_log *log, struct stripe_head *head_sh);
 void r5l_write_stripe_run(struct r5l_log *log);
-void r5l_flush_stripe_to_raid(struct r5l_log *log);
+void r5l_flush_stripe_to_raid(struct r5conf *conf);
 void r5l_stripe_write_finished(struct stripe_head *sh);
 int r5l_handle_flush_request(struct r5l_log *log, struct bio *bio);
 void r5l_quiesce(struct r5l_log *log, int quiesce);
@@ -96,7 +96,7 @@ static inline void log_write_stripe_run(struct r5conf *conf)
 static inline void log_flush_stripe_to_raid(struct r5conf *conf)
 {
 	if (conf->log)
-		r5l_flush_stripe_to_raid(conf->log);
+		r5l_flush_stripe_to_raid(conf);
 	else if (raid5_has_ppl(conf))
 		ppl_write_stripe_run(conf);
 }
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 355760f34cb6..436be2a42cc1 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -6427,7 +6427,7 @@ static int handle_active_stripes(struct r5conf *conf, int group,
 	release_inactive_stripe_list(conf, temp_inactive_list,
 				     NR_STRIPE_HASH_LOCKS);
 
-	r5l_flush_stripe_to_raid(conf->log);
+	r5l_flush_stripe_to_raid(conf);
 	if (release_inactive) {
 		spin_lock_irq(&conf->device_lock);
 		return 0;
@@ -6483,7 +6483,7 @@ static void raid5_do_work(struct work_struct *work)
 
 	flush_deferred_bios(conf);
 
-	r5l_flush_stripe_to_raid(conf->log);
+	r5l_flush_stripe_to_raid(conf);
 
 	async_tx_issue_pending_all();
 	blk_finish_plug(&plug);
@@ -6570,7 +6570,7 @@ static void raid5d(struct md_thread *thread)
 
 	flush_deferred_bios(conf);
 
-	r5l_flush_stripe_to_raid(conf->log);
+	r5l_flush_stripe_to_raid(conf);
 
 	async_tx_issue_pending_all();
 	blk_finish_plug(&plug);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v1 05/15] md/raid5-cache: Refactor r5l_wake_reclaim() to take a struct r5conf
  2022-05-19 19:12 [PATCH v1 00/15] Bug fixes for mdadm tests Logan Gunthorpe
                   ` (3 preceding siblings ...)
  2022-05-19 19:13 ` [PATCH v1 04/15] md/raid5-cache: Refactor r5l_flush_stripe_to_raid() " Logan Gunthorpe
@ 2022-05-19 19:13 ` Logan Gunthorpe
  2022-05-21 11:38   ` Christoph Hellwig
  2022-05-19 19:13 ` [PATCH v1 06/15] md/raid5-cache: Refactor remaining functions to take a r5conf Logan Gunthorpe
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 41+ messages in thread
From: Logan Gunthorpe @ 2022-05-19 19:13 UTC (permalink / raw)
  To: linux-kernel, linux-raid, Song Liu
  Cc: Christoph Hellwig, Guoqing Jiang, Xiao Ni, Stephen Bates,
	Martin Oliveira, David Sloan, Logan Gunthorpe

All calls to r5l_wake_reclaim() have to dereference conf to get the
log. The log pointer should be protected by RCU, but isn't. Push
the dereference of conf->log into r5l_wake_reclaim() to reduce
the number of sites that dereference conf->log so this can be improved.

No functional changes intended.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 drivers/md/raid5-cache.c | 41 +++++++++++++++++++++-------------------
 drivers/md/raid5-log.h   |  2 +-
 drivers/md/raid5.c       |  2 +-
 3 files changed, 24 insertions(+), 21 deletions(-)

diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 6db8040fb01b..735666f9d793 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -325,7 +325,20 @@ void r5c_handle_cached_data_endio(struct r5conf *conf,
 	}
 }
 
-void r5l_wake_reclaim(struct r5l_log *log, sector_t space);
+static void __r5l_wake_reclaim(struct r5l_log *log, sector_t space)
+{
+	unsigned long target;
+	unsigned long new = (unsigned long)space; /* overflow in theory */
+
+	if (!log)
+		return;
+	do {
+		target = log->reclaim_target;
+		if (new < target)
+			return;
+	} while (cmpxchg(&log->reclaim_target, target, new) != target);
+	md_wakeup_thread(log->reclaim_thread);
+}
 
 /* Check whether we should flush some stripes to free up stripe cache */
 void r5c_check_stripe_cache_usage(struct r5conf *conf)
@@ -348,7 +361,7 @@ void r5c_check_stripe_cache_usage(struct r5conf *conf)
 	 */
 	if (total_cached > conf->min_nr_stripes * 1 / 2 ||
 	    atomic_read(&conf->empty_inactive_list_nr) > 0)
-		r5l_wake_reclaim(conf->log, 0);
+		__r5l_wake_reclaim(conf->log, 0);
 }
 
 /*
@@ -367,7 +380,7 @@ void r5c_check_cached_full_stripe(struct r5conf *conf)
 	if (atomic_read(&conf->r5c_cached_full_stripes) >=
 	    min(R5C_FULL_STRIPE_FLUSH_BATCH(conf),
 		conf->chunk_sectors >> RAID5_STRIPE_SHIFT(conf)))
-		r5l_wake_reclaim(conf->log, 0);
+		__r5l_wake_reclaim(conf->log, 0);
 }
 
 /*
@@ -443,7 +456,7 @@ static inline void r5c_update_log_state(struct r5l_log *log)
 		clear_bit(R5C_LOG_TIGHT, &conf->cache_state);
 
 	if (wake_reclaim)
-		r5l_wake_reclaim(log, 0);
+		__r5l_wake_reclaim(log, 0);
 }
 
 /*
@@ -1085,7 +1098,7 @@ int r5l_write_stripe(struct r5l_log *log, struct stripe_head *sh)
 
 	mutex_unlock(&log->io_mutex);
 	if (wake_reclaim)
-		r5l_wake_reclaim(log, reserve);
+		__r5l_wake_reclaim(log, reserve);
 	return 0;
 }
 
@@ -1237,7 +1250,7 @@ static void __r5l_stripe_write_finished(struct r5l_io_unit *io)
 
 	if (r5l_reclaimable_space(log) > log->max_free_space ||
 	    test_bit(R5C_LOG_TIGHT, &conf->cache_state))
-		r5l_wake_reclaim(log, 0);
+		__r5l_wake_reclaim(log, 0);
 
 	spin_unlock_irqrestore(&log->io_list_lock, flags);
 	wake_up(&log->iounit_wait);
@@ -1562,19 +1575,9 @@ static void r5l_reclaim_thread(struct md_thread *thread)
 	r5l_do_reclaim(log);
 }
 
-void r5l_wake_reclaim(struct r5l_log *log, sector_t space)
+void r5l_wake_reclaim(struct r5conf *conf, sector_t space)
 {
-	unsigned long target;
-	unsigned long new = (unsigned long)space; /* overflow in theory */
-
-	if (!log)
-		return;
-	do {
-		target = log->reclaim_target;
-		if (new < target)
-			return;
-	} while (cmpxchg(&log->reclaim_target, target, new) != target);
-	md_wakeup_thread(log->reclaim_thread);
+	__r5l_wake_reclaim(conf->log, space);
 }
 
 void r5l_quiesce(struct r5l_log *log, int quiesce)
@@ -1586,7 +1589,7 @@ void r5l_quiesce(struct r5l_log *log, int quiesce)
 		mddev = log->rdev->mddev;
 		wake_up(&mddev->sb_wait);
 		kthread_park(log->reclaim_thread->tsk);
-		r5l_wake_reclaim(log, MaxSector);
+		__r5l_wake_reclaim(log, MaxSector);
 		r5l_do_reclaim(log);
 	} else
 		kthread_unpark(log->reclaim_thread->tsk);
diff --git a/drivers/md/raid5-log.h b/drivers/md/raid5-log.h
index ead541075528..3dd59dd4257f 100644
--- a/drivers/md/raid5-log.h
+++ b/drivers/md/raid5-log.h
@@ -18,7 +18,7 @@ void r5c_finish_stripe_write_out(struct r5conf *conf, struct stripe_head *sh,
 				 struct stripe_head_state *s);
 void r5c_release_extra_page(struct stripe_head *sh);
 void r5c_use_extra_page(struct stripe_head *sh);
-void r5l_wake_reclaim(struct r5l_log *log, sector_t space);
+void r5l_wake_reclaim(struct r5conf *conf, sector_t space);
 void r5c_handle_cached_data_endio(struct r5conf *conf,
 				  struct stripe_head *sh, int disks);
 int r5c_cache_data(struct r5l_log *log, struct stripe_head *sh);
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 436be2a42cc1..09e768f2d32c 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -742,7 +742,7 @@ raid5_get_active_stripe(struct r5conf *conf, sector_t sector,
 			if (!sh) {
 				set_bit(R5_INACTIVE_BLOCKED,
 					&conf->cache_state);
-				r5l_wake_reclaim(conf->log, 0);
+				r5l_wake_reclaim(conf, 0);
 				wait_event_lock_irq(
 					conf->wait_for_stripe,
 					!list_empty(conf->inactive_list + hash) &&
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v1 06/15] md/raid5-cache: Refactor remaining functions to take a r5conf
  2022-05-19 19:12 [PATCH v1 00/15] Bug fixes for mdadm tests Logan Gunthorpe
                   ` (4 preceding siblings ...)
  2022-05-19 19:13 ` [PATCH v1 05/15] md/raid5-cache: Refactor r5l_wake_reclaim() " Logan Gunthorpe
@ 2022-05-19 19:13 ` Logan Gunthorpe
  2022-05-21 11:40   ` Christoph Hellwig
  2022-05-19 19:13 ` [PATCH v1 07/15] md/raid5-ppl: Drop unused argument from ppl_handle_flush_request() Logan Gunthorpe
                   ` (9 subsequent siblings)
  15 siblings, 1 reply; 41+ messages in thread
From: Logan Gunthorpe @ 2022-05-19 19:13 UTC (permalink / raw)
  To: linux-kernel, linux-raid, Song Liu
  Cc: Christoph Hellwig, Guoqing Jiang, Xiao Ni, Stephen Bates,
	Martin Oliveira, David Sloan, Logan Gunthorpe

Many of the interface functions dereference conf->log in inline
helpers in raid5-log.h. Refactor all off these to dereference
conf->log inside the function instead. This will help to fix the
rcu locking when accessing conf->log.

No functional changes intended.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 drivers/md/raid5-cache.c | 18 +++++++++++-------
 drivers/md/raid5-log.h   | 20 ++++++++++----------
 2 files changed, 21 insertions(+), 17 deletions(-)

diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 735666f9d793..cddc2c37d2c5 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -1004,9 +1004,9 @@ static inline void r5l_add_no_space_stripe(struct r5l_log *log,
  * running in raid5d, where reclaim could wait for raid5d too (when it flushes
  * data from log to raid disks), so we shouldn't wait for reclaim here
  */
-int r5l_write_stripe(struct r5l_log *log, struct stripe_head *sh)
+int r5l_write_stripe(struct r5conf *conf, struct stripe_head *sh)
 {
-	struct r5conf *conf = sh->raid_conf;
+	struct r5l_log *log = conf->log;
 	int write_disks = 0;
 	int data_pages, parity_pages;
 	int reserve;
@@ -1102,8 +1102,9 @@ int r5l_write_stripe(struct r5l_log *log, struct stripe_head *sh)
 	return 0;
 }
 
-void r5l_write_stripe_run(struct r5l_log *log)
+void r5l_write_stripe_run(struct r5conf *conf)
 {
+	struct r5l_log *log = conf->log;
 	if (!log)
 		return;
 	mutex_lock(&log->io_mutex);
@@ -1111,8 +1112,10 @@ void r5l_write_stripe_run(struct r5l_log *log)
 	mutex_unlock(&log->io_mutex);
 }
 
-int r5l_handle_flush_request(struct r5l_log *log, struct bio *bio)
+int r5l_handle_flush_request(struct r5conf *conf, struct bio *bio)
 {
+	struct r5l_log *log = conf->log;
+
 	if (log->r5c_journal_mode == R5C_JOURNAL_MODE_WRITE_THROUGH) {
 		/*
 		 * in write through (journal only)
@@ -1580,8 +1583,9 @@ void r5l_wake_reclaim(struct r5conf *conf, sector_t space)
 	__r5l_wake_reclaim(conf->log, space);
 }
 
-void r5l_quiesce(struct r5l_log *log, int quiesce)
+void r5l_quiesce(struct r5conf *conf, int quiesce)
 {
+	struct r5l_log *log = conf->log;
 	struct mddev *mddev;
 
 	if (quiesce) {
@@ -2890,9 +2894,9 @@ void r5c_finish_stripe_write_out(struct r5conf *conf,
 		set_bit(STRIPE_HANDLE, &sh->state);
 }
 
-int r5c_cache_data(struct r5l_log *log, struct stripe_head *sh)
+int r5c_cache_data(struct r5conf *conf, struct stripe_head *sh)
 {
-	struct r5conf *conf = sh->raid_conf;
+	struct r5l_log *log = conf->log;
 	int pages = 0;
 	int reserve;
 	int i;
diff --git a/drivers/md/raid5-log.h b/drivers/md/raid5-log.h
index 3dd59dd4257f..ccfbf8814753 100644
--- a/drivers/md/raid5-log.h
+++ b/drivers/md/raid5-log.h
@@ -4,12 +4,12 @@
 
 int r5l_init_log(struct r5conf *conf, struct md_rdev *rdev);
 void r5l_exit_log(struct r5conf *conf);
-int r5l_write_stripe(struct r5l_log *log, struct stripe_head *head_sh);
-void r5l_write_stripe_run(struct r5l_log *log);
+int r5l_write_stripe(struct r5conf *conf, struct stripe_head *head_sh);
+void r5l_write_stripe_run(struct r5conf *conf);
 void r5l_flush_stripe_to_raid(struct r5conf *conf);
 void r5l_stripe_write_finished(struct stripe_head *sh);
-int r5l_handle_flush_request(struct r5l_log *log, struct bio *bio);
-void r5l_quiesce(struct r5l_log *log, int quiesce);
+int r5l_handle_flush_request(struct r5conf *conf, struct bio *bio);
+void r5l_quiesce(struct r5conf *conf, int quiesce);
 bool r5l_log_disk_error(struct r5conf *conf);
 bool r5c_is_writeback(struct r5conf *conf);
 int r5c_try_caching_write(struct r5conf *conf, struct stripe_head *sh,
@@ -21,7 +21,7 @@ void r5c_use_extra_page(struct stripe_head *sh);
 void r5l_wake_reclaim(struct r5conf *conf, sector_t space);
 void r5c_handle_cached_data_endio(struct r5conf *conf,
 				  struct stripe_head *sh, int disks);
-int r5c_cache_data(struct r5l_log *log, struct stripe_head *sh);
+int r5c_cache_data(struct r5conf *conf, struct stripe_head *sh);
 void r5c_make_stripe_write_out(struct stripe_head *sh);
 void r5c_flush_cache(struct r5conf *conf, int num);
 void r5c_check_stripe_cache_usage(struct r5conf *conf);
@@ -63,10 +63,10 @@ static inline int log_stripe(struct stripe_head *sh, struct stripe_head_state *s
 			/* writing out phase */
 			if (s->waiting_extra_page)
 				return 0;
-			return r5l_write_stripe(conf->log, sh);
+			return r5l_write_stripe(conf, sh);
 		} else if (test_bit(STRIPE_LOG_TRAPPED, &sh->state)) {
 			/* caching phase */
-			return r5c_cache_data(conf->log, sh);
+			return r5c_cache_data(conf, sh);
 		}
 	} else if (raid5_has_ppl(conf)) {
 		return ppl_write_stripe(conf, sh);
@@ -88,7 +88,7 @@ static inline void log_stripe_write_finished(struct stripe_head *sh)
 static inline void log_write_stripe_run(struct r5conf *conf)
 {
 	if (conf->log)
-		r5l_write_stripe_run(conf->log);
+		r5l_write_stripe_run(conf);
 	else if (raid5_has_ppl(conf))
 		ppl_write_stripe_run(conf);
 }
@@ -106,7 +106,7 @@ static inline int log_handle_flush_request(struct r5conf *conf, struct bio *bio)
 	int ret = -ENODEV;
 
 	if (conf->log)
-		ret = r5l_handle_flush_request(conf->log, bio);
+		ret = r5l_handle_flush_request(conf, bio);
 	else if (raid5_has_ppl(conf))
 		ret = ppl_handle_flush_request(conf->log, bio);
 
@@ -116,7 +116,7 @@ static inline int log_handle_flush_request(struct r5conf *conf, struct bio *bio)
 static inline void log_quiesce(struct r5conf *conf, int quiesce)
 {
 	if (conf->log)
-		r5l_quiesce(conf->log, quiesce);
+		r5l_quiesce(conf, quiesce);
 	else if (raid5_has_ppl(conf))
 		ppl_quiesce(conf, quiesce);
 }
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v1 07/15] md/raid5-ppl: Drop unused argument from ppl_handle_flush_request()
  2022-05-19 19:12 [PATCH v1 00/15] Bug fixes for mdadm tests Logan Gunthorpe
                   ` (5 preceding siblings ...)
  2022-05-19 19:13 ` [PATCH v1 06/15] md/raid5-cache: Refactor remaining functions to take a r5conf Logan Gunthorpe
@ 2022-05-19 19:13 ` Logan Gunthorpe
  2022-05-21 11:41   ` Christoph Hellwig
  2022-05-19 19:13 ` [PATCH v1 08/15] md/raid5-cache: Pass the log through to r5c_finish_cache_stripe() Logan Gunthorpe
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 41+ messages in thread
From: Logan Gunthorpe @ 2022-05-19 19:13 UTC (permalink / raw)
  To: linux-kernel, linux-raid, Song Liu
  Cc: Christoph Hellwig, Guoqing Jiang, Xiao Ni, Stephen Bates,
	Martin Oliveira, David Sloan, Logan Gunthorpe

ppl_handle_flush_request() takes an struct r5log argument but doesn't
use it. It has no buisiness taking this argument as it is only used
by raid5-cache and has no way to derference it anyway. Remove
the argument.

No functional changes intended.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 drivers/md/raid5-log.h | 4 ++--
 drivers/md/raid5-ppl.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/md/raid5-log.h b/drivers/md/raid5-log.h
index ccfbf8814753..f26e6f4c7f9a 100644
--- a/drivers/md/raid5-log.h
+++ b/drivers/md/raid5-log.h
@@ -41,7 +41,7 @@ void ppl_write_stripe_run(struct r5conf *conf);
 void ppl_stripe_write_finished(struct stripe_head *sh);
 int ppl_modify_log(struct r5conf *conf, struct md_rdev *rdev, bool add);
 void ppl_quiesce(struct r5conf *conf, int quiesce);
-int ppl_handle_flush_request(struct r5l_log *log, struct bio *bio);
+int ppl_handle_flush_request(struct bio *bio);
 extern struct md_sysfs_entry ppl_write_hint;
 
 static inline bool raid5_has_log(struct r5conf *conf)
@@ -108,7 +108,7 @@ static inline int log_handle_flush_request(struct r5conf *conf, struct bio *bio)
 	if (conf->log)
 		ret = r5l_handle_flush_request(conf, bio);
 	else if (raid5_has_ppl(conf))
-		ret = ppl_handle_flush_request(conf->log, bio);
+		ret = ppl_handle_flush_request(bio);
 
 	return ret;
 }
diff --git a/drivers/md/raid5-ppl.c b/drivers/md/raid5-ppl.c
index 973e2e06f19c..4f5bdb4cad2b 100644
--- a/drivers/md/raid5-ppl.c
+++ b/drivers/md/raid5-ppl.c
@@ -679,7 +679,7 @@ void ppl_quiesce(struct r5conf *conf, int quiesce)
 	}
 }
 
-int ppl_handle_flush_request(struct r5l_log *log, struct bio *bio)
+int ppl_handle_flush_request(struct bio *bio)
 {
 	if (bio->bi_iter.bi_size == 0) {
 		bio_endio(bio);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v1 08/15] md/raid5-cache: Pass the log through to r5c_finish_cache_stripe()
  2022-05-19 19:12 [PATCH v1 00/15] Bug fixes for mdadm tests Logan Gunthorpe
                   ` (6 preceding siblings ...)
  2022-05-19 19:13 ` [PATCH v1 07/15] md/raid5-ppl: Drop unused argument from ppl_handle_flush_request() Logan Gunthorpe
@ 2022-05-19 19:13 ` Logan Gunthorpe
  2022-05-21 11:42   ` Christoph Hellwig
  2022-05-19 19:13 ` [PATCH v1 09/15] md/raid5-cache: Don't pass conf to r5c_calculate_new_cp() Logan Gunthorpe
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 41+ messages in thread
From: Logan Gunthorpe @ 2022-05-19 19:13 UTC (permalink / raw)
  To: linux-kernel, linux-raid, Song Liu
  Cc: Christoph Hellwig, Guoqing Jiang, Xiao Ni, Stephen Bates,
	Martin Oliveira, David Sloan, Logan Gunthorpe

r5c_finish_cache_stripe() dereferences conf->log, which will need an
rcu_read_lock(). But that is not necessary here as the log is already
available in the call sites through other means.

No functional changes intended.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 drivers/md/raid5-cache.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index cddc2c37d2c5..f7d013479a68 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -505,10 +505,9 @@ static void r5c_handle_parity_cached(struct stripe_head *sh)
  * Setting proper flags after writing (or flushing) data and/or parity to the
  * log device. This is called from r5l_log_endio() or r5l_log_flush_endio().
  */
-static void r5c_finish_cache_stripe(struct stripe_head *sh)
+static void r5c_finish_cache_stripe(struct r5l_log *log,
+				    struct stripe_head *sh)
 {
-	struct r5l_log *log = sh->raid_conf->log;
-
 	if (log->r5c_journal_mode == R5C_JOURNAL_MODE_WRITE_THROUGH) {
 		BUG_ON(test_bit(STRIPE_R5C_CACHING, &sh->state));
 		/*
@@ -526,14 +525,14 @@ static void r5c_finish_cache_stripe(struct stripe_head *sh)
 	}
 }
 
-static void r5l_io_run_stripes(struct r5l_io_unit *io)
+static void r5l_io_run_stripes(struct r5l_log *log, struct r5l_io_unit *io)
 {
 	struct stripe_head *sh, *next;
 
 	list_for_each_entry_safe(sh, next, &io->stripe_list, log_list) {
 		list_del_init(&sh->log_list);
 
-		r5c_finish_cache_stripe(sh);
+		r5c_finish_cache_stripe(log, sh);
 
 		set_bit(STRIPE_HANDLE, &sh->state);
 		raid5_release_stripe(sh);
@@ -552,7 +551,7 @@ static void r5l_log_run_stripes(struct r5l_log *log)
 			break;
 
 		list_move_tail(&io->log_sibling, &log->finished_ios);
-		r5l_io_run_stripes(io);
+		r5l_io_run_stripes(log, io);
 	}
 }
 
@@ -1282,7 +1281,7 @@ static void r5l_log_flush_endio(struct bio *bio)
 
 	spin_lock_irqsave(&log->io_list_lock, flags);
 	list_for_each_entry(io, &log->flushing_ios, log_sibling)
-		r5l_io_run_stripes(io);
+		r5l_io_run_stripes(log, io);
 	list_splice_tail_init(&log->flushing_ios, &log->finished_ios);
 	spin_unlock_irqrestore(&log->io_list_lock, flags);
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v1 09/15] md/raid5-cache: Don't pass conf to r5c_calculate_new_cp()
  2022-05-19 19:12 [PATCH v1 00/15] Bug fixes for mdadm tests Logan Gunthorpe
                   ` (7 preceding siblings ...)
  2022-05-19 19:13 ` [PATCH v1 08/15] md/raid5-cache: Pass the log through to r5c_finish_cache_stripe() Logan Gunthorpe
@ 2022-05-19 19:13 ` Logan Gunthorpe
  2022-05-21 11:42   ` Christoph Hellwig
  2022-05-19 19:13 ` [PATCH v1 10/15] md/raid5-cache: Take struct r5l_log in r5c_log_required_to_flush_cache() Logan Gunthorpe
                   ` (6 subsequent siblings)
  15 siblings, 1 reply; 41+ messages in thread
From: Logan Gunthorpe @ 2022-05-19 19:13 UTC (permalink / raw)
  To: linux-kernel, linux-raid, Song Liu
  Cc: Christoph Hellwig, Guoqing Jiang, Xiao Ni, Stephen Bates,
	Martin Oliveira, David Sloan, Logan Gunthorpe

r5c_calculate_new_cp() only uses conf to dereference the log which
both callers already have a pointer to and no longer need to
obtain a conf through a complicated dereference chain for this use.

No functional changes intended.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 drivers/md/raid5-cache.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index f7d013479a68..0ecff63202f8 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -1166,10 +1166,9 @@ static void r5l_run_no_space_stripes(struct r5l_log *log)
  * for write through mode, returns log->next_checkpoint
  * for write back, returns log_start of first sh in stripe_in_journal_list
  */
-static sector_t r5c_calculate_new_cp(struct r5conf *conf)
+static sector_t r5c_calculate_new_cp(struct r5l_log *log)
 {
 	struct stripe_head *sh;
-	struct r5l_log *log = conf->log;
 	sector_t new_cp;
 	unsigned long flags;
 
@@ -1177,12 +1176,12 @@ static sector_t r5c_calculate_new_cp(struct r5conf *conf)
 		return log->next_checkpoint;
 
 	spin_lock_irqsave(&log->stripe_in_journal_lock, flags);
-	if (list_empty(&conf->log->stripe_in_journal_list)) {
+	if (list_empty(&log->stripe_in_journal_list)) {
 		/* all stripes flushed */
 		spin_unlock_irqrestore(&log->stripe_in_journal_lock, flags);
 		return log->next_checkpoint;
 	}
-	sh = list_first_entry(&conf->log->stripe_in_journal_list,
+	sh = list_first_entry(&log->stripe_in_journal_list,
 			      struct stripe_head, r5c);
 	new_cp = sh->log_start;
 	spin_unlock_irqrestore(&log->stripe_in_journal_lock, flags);
@@ -1191,10 +1190,8 @@ static sector_t r5c_calculate_new_cp(struct r5conf *conf)
 
 static sector_t r5l_reclaimable_space(struct r5l_log *log)
 {
-	struct r5conf *conf = log->rdev->mddev->private;
-
 	return r5l_ring_distance(log, log->last_checkpoint,
-				 r5c_calculate_new_cp(conf));
+				 r5c_calculate_new_cp(log));
 }
 
 static void r5l_run_no_mem_stripe(struct r5l_log *log)
@@ -1515,7 +1512,6 @@ static void r5c_do_reclaim(struct r5conf *conf)
 
 static void r5l_do_reclaim(struct r5l_log *log)
 {
-	struct r5conf *conf = log->rdev->mddev->private;
 	sector_t reclaim_target = xchg(&log->reclaim_target, 0);
 	sector_t reclaimable;
 	sector_t next_checkpoint;
@@ -1544,7 +1540,7 @@ static void r5l_do_reclaim(struct r5l_log *log)
 				    log->io_list_lock);
 	}
 
-	next_checkpoint = r5c_calculate_new_cp(conf);
+	next_checkpoint = r5c_calculate_new_cp(log);
 	spin_unlock_irq(&log->io_list_lock);
 
 	if (reclaimable == 0 || !write_super)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v1 10/15] md/raid5-cache: Take struct r5l_log in r5c_log_required_to_flush_cache()
  2022-05-19 19:12 [PATCH v1 00/15] Bug fixes for mdadm tests Logan Gunthorpe
                   ` (8 preceding siblings ...)
  2022-05-19 19:13 ` [PATCH v1 09/15] md/raid5-cache: Don't pass conf to r5c_calculate_new_cp() Logan Gunthorpe
@ 2022-05-19 19:13 ` Logan Gunthorpe
  2022-05-21 11:43   ` Christoph Hellwig
  2022-05-19 19:13 ` [PATCH v1 11/15] md/raid5: Ensure array is suspended for calls to log_exit() Logan Gunthorpe
                   ` (5 subsequent siblings)
  15 siblings, 1 reply; 41+ messages in thread
From: Logan Gunthorpe @ 2022-05-19 19:13 UTC (permalink / raw)
  To: linux-kernel, linux-raid, Song Liu
  Cc: Christoph Hellwig, Guoqing Jiang, Xiao Ni, Stephen Bates,
	Martin Oliveira, David Sloan, Logan Gunthorpe

The only call site of r5c_log_required_to_flush_cache() already has
a log pointer and has already checked that it is writeback. So the
derference and writeback check is redundant. Just pass the log pointer.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 drivers/md/raid5-cache.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 0ecff63202f8..f7b402138d16 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -411,13 +411,9 @@ void r5c_check_cached_full_stripe(struct r5conf *conf)
  *     (stripe_in_journal_count) * (max_degraded + 1) +
  *     (group_cnt + 1) * (raid_disks - max_degraded)
  */
-static sector_t r5c_log_required_to_flush_cache(struct r5conf *conf)
+static sector_t r5c_log_required_to_flush_cache(struct r5conf *conf,
+						struct r5l_log *log)
 {
-	struct r5l_log *log = conf->log;
-
-	if (!r5c_is_writeback(conf))
-		return 0;
-
 	return BLOCK_SECTORS *
 		((conf->max_degraded + 1) * atomic_read(&log->stripe_in_journal_count) +
 		 (conf->raid_disks - conf->max_degraded) * (conf->group_cnt + 1));
@@ -442,7 +438,7 @@ static inline void r5c_update_log_state(struct r5l_log *log)
 
 	free_space = r5l_ring_distance(log, log->log_start,
 				       log->last_checkpoint);
-	reclaim_space = r5c_log_required_to_flush_cache(conf);
+	reclaim_space = r5c_log_required_to_flush_cache(conf, log);
 	if (free_space < 2 * reclaim_space)
 		set_bit(R5C_LOG_CRITICAL, &conf->cache_state);
 	else {
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v1 11/15] md/raid5: Ensure array is suspended for calls to log_exit()
  2022-05-19 19:12 [PATCH v1 00/15] Bug fixes for mdadm tests Logan Gunthorpe
                   ` (9 preceding siblings ...)
  2022-05-19 19:13 ` [PATCH v1 10/15] md/raid5-cache: Take struct r5l_log in r5c_log_required_to_flush_cache() Logan Gunthorpe
@ 2022-05-19 19:13 ` Logan Gunthorpe
  2022-05-21 11:44   ` Christoph Hellwig
  2022-05-19 19:13 ` [PATCH v1 12/15] md/raid5-cache: Add RCU protection to conf->log accesses Logan Gunthorpe
                   ` (4 subsequent siblings)
  15 siblings, 1 reply; 41+ messages in thread
From: Logan Gunthorpe @ 2022-05-19 19:13 UTC (permalink / raw)
  To: linux-kernel, linux-raid, Song Liu
  Cc: Christoph Hellwig, Guoqing Jiang, Xiao Ni, Stephen Bates,
	Martin Oliveira, David Sloan, Logan Gunthorpe

The raid5-cache code relies on there being no IO in flight when
log_exit() is called. There are two places where this is not
guaranteed so add mddev_suspend() and mddev_resume() calls to these
sites.

The site in raid5_remove_disk() has a comment saying that it is
called in raid5d and thus cannot wait for pending writes; however that
does not appear to be correct anymore (if it ever was) as
raid5_remove_disk() is called from hot_remove_disk() which only
appears to be called in the md_ioctl(). Thus, the comment is removed,
as well as the racy check and replaced with calls to suspend/resume.

The site in raid5_change_consistency_policy() is in the error path,
and another similar call site already has suspend/resume calls just
below it; so it should be equally safe to make that change here.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 drivers/md/raid5.c | 18 ++++++------------
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 09e768f2d32c..37fe2af77c93 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7938,18 +7938,9 @@ static int raid5_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
 
 	print_raid5_conf(conf);
 	if (test_bit(Journal, &rdev->flags) && conf->log) {
-		/*
-		 * we can't wait pending write here, as this is called in
-		 * raid5d, wait will deadlock.
-		 * neilb: there is no locking about new writes here,
-		 * so this cannot be safe.
-		 */
-		if (atomic_read(&conf->active_stripes) ||
-		    atomic_read(&conf->r5c_cached_full_stripes) ||
-		    atomic_read(&conf->r5c_cached_partial_stripes)) {
-			return -EBUSY;
-		}
+		mddev_suspend(mddev);
 		log_exit(conf);
+		mddev_resume(mddev);
 		return 0;
 	}
 	if (rdev == rcu_access_pointer(p->rdev))
@@ -8697,8 +8688,11 @@ static int raid5_change_consistency_policy(struct mddev *mddev, const char *buf)
 			err = log_init(conf, NULL, true);
 			if (!err) {
 				err = resize_stripes(conf, conf->pool_size);
-				if (err)
+				if (err) {
+					mddev_suspend(mddev);
 					log_exit(conf);
+					mddev_resume(mddev);
+				}
 			}
 		} else
 			err = -EINVAL;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v1 12/15] md/raid5-cache: Add RCU protection to conf->log accesses
  2022-05-19 19:12 [PATCH v1 00/15] Bug fixes for mdadm tests Logan Gunthorpe
                   ` (10 preceding siblings ...)
  2022-05-19 19:13 ` [PATCH v1 11/15] md/raid5: Ensure array is suspended for calls to log_exit() Logan Gunthorpe
@ 2022-05-19 19:13 ` Logan Gunthorpe
  2022-05-21 11:50   ` Christoph Hellwig
                     ` (2 more replies)
  2022-05-19 19:13 ` [PATCH v1 13/15] md/raid5-cache: Annotate pslot with __rcu notation Logan Gunthorpe
                   ` (3 subsequent siblings)
  15 siblings, 3 replies; 41+ messages in thread
From: Logan Gunthorpe @ 2022-05-19 19:13 UTC (permalink / raw)
  To: linux-kernel, linux-raid, Song Liu
  Cc: Christoph Hellwig, Guoqing Jiang, Xiao Ni, Stephen Bates,
	Martin Oliveira, David Sloan, Logan Gunthorpe

The mdadm test 21raid5cache randomly fails with NULL pointer accesses
conf->log when run repeatedly. conf->log was sort of protected with
a RCU, but most dereferences were not done with the correct functions.

Add rcu_read_locks() and rcu_access_pointers() to the appropriate
places.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 drivers/md/raid5-cache.c | 135 +++++++++++++++++++++++++++------------
 drivers/md/raid5-log.h   |  14 ++--
 drivers/md/raid5.c       |   4 +-
 drivers/md/raid5.h       |   2 +-
 4 files changed, 104 insertions(+), 51 deletions(-)

diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index f7b402138d16..1dbc7c4b9a15 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -254,7 +254,14 @@ static bool __r5c_is_writeback(struct r5l_log *log)
 
 bool r5c_is_writeback(struct r5conf *conf)
 {
-	return __r5c_is_writeback(conf->log);
+	struct r5l_log *log;
+	bool ret;
+
+	rcu_read_lock();
+	log = rcu_dereference(conf->log);
+	ret = __r5c_is_writeback(log);
+	rcu_read_unlock();
+	return ret;
 }
 
 static sector_t r5l_ring_add(struct r5l_log *log, sector_t start, sector_t inc)
@@ -340,12 +347,23 @@ static void __r5l_wake_reclaim(struct r5l_log *log, sector_t space)
 	md_wakeup_thread(log->reclaim_thread);
 }
 
+static struct r5l_log *get_log_for_io(struct r5conf *conf)
+{
+	/*
+	 * rcu_dereference_protected is safe because the array will be
+	 * quiesced before log_exit() so it can't be called while
+	 * an IO is in progress.
+	 */
+	return rcu_dereference_protected(conf->log, 1);
+}
+
 /* Check whether we should flush some stripes to free up stripe cache */
 void r5c_check_stripe_cache_usage(struct r5conf *conf)
 {
+	struct r5l_log *log = get_log_for_io(conf);
 	int total_cached;
 
-	if (!r5c_is_writeback(conf))
+	if (!__r5c_is_writeback(log))
 		return;
 
 	total_cached = atomic_read(&conf->r5c_cached_partial_stripes) +
@@ -361,7 +379,7 @@ void r5c_check_stripe_cache_usage(struct r5conf *conf)
 	 */
 	if (total_cached > conf->min_nr_stripes * 1 / 2 ||
 	    atomic_read(&conf->empty_inactive_list_nr) > 0)
-		__r5l_wake_reclaim(conf->log, 0);
+		__r5l_wake_reclaim(log, 0);
 }
 
 /*
@@ -370,8 +388,7 @@ void r5c_check_stripe_cache_usage(struct r5conf *conf)
  */
 void r5c_check_cached_full_stripe(struct r5conf *conf)
 {
-	if (!r5c_is_writeback(conf))
-		return;
+	struct r5l_log *log = get_log_for_io(conf);
 
 	/*
 	 * wake up reclaim for R5C_FULL_STRIPE_FLUSH_BATCH cached stripes
@@ -380,7 +397,7 @@ void r5c_check_cached_full_stripe(struct r5conf *conf)
 	if (atomic_read(&conf->r5c_cached_full_stripes) >=
 	    min(R5C_FULL_STRIPE_FLUSH_BATCH(conf),
 		conf->chunk_sectors >> RAID5_STRIPE_SHIFT(conf)))
-		__r5l_wake_reclaim(conf->log, 0);
+		__r5l_wake_reclaim(log, 0);
 }
 
 /*
@@ -703,7 +720,7 @@ static void r5c_disable_writeback_async(struct work_struct *work)
 
 	/* wait superblock change before suspend */
 	wait_event(mddev->sb_wait,
-		   conf->log == NULL ||
+		   rcu_access_pointer(conf->log) ||
 		   (!test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) &&
 		    (locked = mddev_trylock(mddev))));
 	if (locked) {
@@ -1001,7 +1018,7 @@ static inline void r5l_add_no_space_stripe(struct r5l_log *log,
  */
 int r5l_write_stripe(struct r5conf *conf, struct stripe_head *sh)
 {
-	struct r5l_log *log = conf->log;
+	struct r5l_log *log = get_log_for_io(conf);
 	int write_disks = 0;
 	int data_pages, parity_pages;
 	int reserve;
@@ -1099,7 +1116,8 @@ int r5l_write_stripe(struct r5conf *conf, struct stripe_head *sh)
 
 void r5l_write_stripe_run(struct r5conf *conf)
 {
-	struct r5l_log *log = conf->log;
+	struct r5l_log *log = get_log_for_io(conf);
+
 	if (!log)
 		return;
 	mutex_lock(&log->io_mutex);
@@ -1109,7 +1127,7 @@ void r5l_write_stripe_run(struct r5conf *conf)
 
 int r5l_handle_flush_request(struct r5conf *conf, struct bio *bio)
 {
-	struct r5l_log *log = conf->log;
+	struct r5l_log *log = get_log_for_io(conf);
 
 	if (log->r5c_journal_mode == R5C_JOURNAL_MODE_WRITE_THROUGH) {
 		/*
@@ -1297,7 +1315,7 @@ static void r5l_log_flush_endio(struct bio *bio)
  */
 void r5l_flush_stripe_to_raid(struct r5conf *conf)
 {
-	struct r5l_log *log = conf->log;
+	struct r5l_log *log = get_log_for_io(conf);
 	bool do_flush;
 
 	if (!log || !log->need_cache_flush)
@@ -1412,7 +1430,7 @@ void r5c_flush_cache(struct r5conf *conf, int num)
 	struct stripe_head *sh, *next;
 
 	lockdep_assert_held(&conf->device_lock);
-	if (!conf->log)
+	if (!rcu_access_pointer(conf->log))
 		return;
 
 	count = 0;
@@ -1431,9 +1449,8 @@ void r5c_flush_cache(struct r5conf *conf, int num)
 	}
 }
 
-static void r5c_do_reclaim(struct r5conf *conf)
+static void r5c_do_reclaim(struct r5conf *conf, struct r5l_log *log)
 {
-	struct r5l_log *log = conf->log;
 	struct stripe_head *sh;
 	int count = 0;
 	unsigned long flags;
@@ -1441,7 +1458,7 @@ static void r5c_do_reclaim(struct r5conf *conf)
 	int stripes_to_flush;
 	int flushing_partial, flushing_full;
 
-	if (!r5c_is_writeback(conf))
+	if (!__r5c_is_writeback(log))
 		return;
 
 	flushing_partial = atomic_read(&conf->r5c_flushing_partial_stripes);
@@ -1561,22 +1578,30 @@ static void r5l_reclaim_thread(struct md_thread *thread)
 {
 	struct mddev *mddev = thread->mddev;
 	struct r5conf *conf = mddev->private;
-	struct r5l_log *log = conf->log;
+	struct r5l_log *log;
 
+	/*
+	 * This is safe, because the reclaim thread will be stopped before
+	 * the log is freed in log_exit()
+	 */
+	log = rcu_dereference_protected(conf->log, 1);
 	if (!log)
 		return;
-	r5c_do_reclaim(conf);
+
+	r5c_do_reclaim(conf, log);
 	r5l_do_reclaim(log);
 }
 
 void r5l_wake_reclaim(struct r5conf *conf, sector_t space)
 {
-	__r5l_wake_reclaim(conf->log, space);
+	struct r5l_log *log = get_log_for_io(conf);
+
+	__r5l_wake_reclaim(log, space);
 }
 
 void r5l_quiesce(struct r5conf *conf, int quiesce)
 {
-	struct r5l_log *log = conf->log;
+	struct r5l_log *log = get_log_for_io(conf);
 	struct mddev *mddev;
 
 	if (quiesce) {
@@ -2534,16 +2559,20 @@ static void r5l_write_super(struct r5l_log *log, sector_t cp)
 static ssize_t r5c_journal_mode_show(struct mddev *mddev, char *page)
 {
 	struct r5conf *conf;
-	int ret;
+	struct r5l_log *log;
+	int ret = 0;
 
 	spin_lock(&mddev->lock);
 	conf = mddev->private;
-	if (!conf || !conf->log) {
-		spin_unlock(&mddev->lock);
-		return 0;
-	}
+	if (!conf)
+		goto out_spin_unlock;
+
+	rcu_read_lock();
+	log = rcu_dereference(conf->log);
+	if (!log)
+		goto out;
 
-	switch (conf->log->r5c_journal_mode) {
+	switch (log->r5c_journal_mode) {
 	case R5C_JOURNAL_MODE_WRITE_THROUGH:
 		ret = snprintf(
 			page, PAGE_SIZE, "[%s] %s\n",
@@ -2559,6 +2588,10 @@ static ssize_t r5c_journal_mode_show(struct mddev *mddev, char *page)
 	default:
 		ret = 0;
 	}
+
+out:
+	rcu_read_unlock();
+out_spin_unlock:
 	spin_unlock(&mddev->lock);
 	return ret;
 }
@@ -2572,13 +2605,15 @@ static ssize_t r5c_journal_mode_show(struct mddev *mddev, char *page)
 int r5c_journal_mode_set(struct mddev *mddev, int mode)
 {
 	struct r5conf *conf;
+	struct r5l_log *log;
+	int ret = 0;
 
 	if (mode < R5C_JOURNAL_MODE_WRITE_THROUGH ||
 	    mode > R5C_JOURNAL_MODE_WRITE_BACK)
 		return -EINVAL;
 
 	conf = mddev->private;
-	if (!conf || !conf->log)
+	if (!conf || !rcu_access_pointer(conf->log))
 		return -ENODEV;
 
 	if (raid5_calc_degraded(conf) > 0 &&
@@ -2586,12 +2621,19 @@ int r5c_journal_mode_set(struct mddev *mddev, int mode)
 		return -EINVAL;
 
 	mddev_suspend(mddev);
-	conf->log->r5c_journal_mode = mode;
+	rcu_read_lock();
+	log = rcu_dereference(conf->log);
+	if (log) {
+		log->r5c_journal_mode = mode;
+		pr_debug("md/raid:%s: setting r5c cache mode to %d: %s\n",
+			 mdname(mddev), mode, r5c_journal_mode_str[mode]);
+	} else {
+		ret = -ENODEV;
+	}
+	rcu_read_unlock();
 	mddev_resume(mddev);
 
-	pr_debug("md/raid:%s: setting r5c cache mode to %d: %s\n",
-		 mdname(mddev), mode, r5c_journal_mode_str[mode]);
-	return 0;
+	return ret;
 }
 EXPORT_SYMBOL(r5c_journal_mode_set);
 
@@ -2637,7 +2679,7 @@ int r5c_try_caching_write(struct r5conf *conf,
 			  struct stripe_head_state *s,
 			  int disks)
 {
-	struct r5l_log *log = conf->log;
+	struct r5l_log *log = get_log_for_io(conf);
 	int i;
 	struct r5dev *dev;
 	int to_cache = 0;
@@ -2804,7 +2846,7 @@ void r5c_finish_stripe_write_out(struct r5conf *conf,
 				 struct stripe_head *sh,
 				 struct stripe_head_state *s)
 {
-	struct r5l_log *log = conf->log;
+	struct r5l_log *log = get_log_for_io(conf);
 	int i;
 	int do_wakeup = 0;
 	sector_t tree_index;
@@ -2887,7 +2929,7 @@ void r5c_finish_stripe_write_out(struct r5conf *conf,
 
 int r5c_cache_data(struct r5conf *conf, struct stripe_head *sh)
 {
-	struct r5l_log *log = conf->log;
+	struct r5l_log *log = get_log_for_io(conf);
 	int pages = 0;
 	int reserve;
 	int i;
@@ -2943,7 +2985,7 @@ int r5c_cache_data(struct r5conf *conf, struct stripe_head *sh)
 /* check whether this big stripe is in write back cache. */
 bool r5c_big_stripe_cached(struct r5conf *conf, sector_t sect)
 {
-	struct r5l_log *log = conf->log;
+	struct r5l_log *log = get_log_for_io(conf);
 	sector_t tree_index;
 	void *slot;
 
@@ -2953,6 +2995,7 @@ bool r5c_big_stripe_cached(struct r5conf *conf, sector_t sect)
 	WARN_ON_ONCE(!rcu_read_lock_held());
 	tree_index = r5c_tree_index(conf, sect);
 	slot = radix_tree_lookup(&log->big_stripe_tree, tree_index);
+
 	return slot != NULL;
 }
 
@@ -3033,30 +3076,38 @@ static int r5l_load_log(struct r5l_log *log)
 
 int r5l_start(struct r5conf *conf)
 {
-	struct r5l_log *log = conf->log;
+	struct r5l_log *log;
 	int ret;
 
+	log = rcu_dereference_protected(conf->log,
+			lockdep_is_held(&conf->mddev->reconfig_mutex));
 	if (!log)
 		return 0;
 
 	ret = r5l_load_log(log);
 	if (ret)
 		r5l_exit_log(conf);
+
 	return ret;
 }
 
 void r5c_update_on_rdev_error(struct mddev *mddev, struct md_rdev *rdev)
 {
 	struct r5conf *conf = mddev->private;
-	struct r5l_log *log = conf->log;
+	struct r5l_log *log;
 
+	rcu_read_lock();
+	log = rcu_dereference(conf->log);
 	if (!log)
-		return;
+		goto out;
 
 	if ((raid5_calc_degraded(conf) > 0 ||
 	     test_bit(Journal, &rdev->flags)) &&
-	    conf->log->r5c_journal_mode == R5C_JOURNAL_MODE_WRITE_BACK)
+	    log->r5c_journal_mode == R5C_JOURNAL_MODE_WRITE_BACK)
 		schedule_work(&log->disable_writeback_work);
+
+out:
+	rcu_read_unlock();
 }
 
 int r5l_init_log(struct r5conf *conf, struct md_rdev *rdev)
@@ -3164,15 +3215,17 @@ int r5l_init_log(struct r5conf *conf, struct md_rdev *rdev)
 
 void r5l_exit_log(struct r5conf *conf)
 {
-	struct r5l_log *log = conf->log;
+	struct r5l_log *log;
 
-	conf->log = NULL;
-	synchronize_rcu();
+	lockdep_assert_held(&conf->mddev->reconfig_mutex);
+	log = rcu_replace_pointer(conf->log, NULL, 1);
 
 	/* Ensure disable_writeback_work wakes up and exits */
 	wake_up(&conf->mddev->sb_wait);
 	flush_work(&log->disable_writeback_work);
 	md_unregister_thread(&log->reclaim_thread);
+	synchronize_rcu();
+
 	mempool_exit(&log->meta_pool);
 	bioset_exit(&log->bs);
 	mempool_exit(&log->io_pool);
diff --git a/drivers/md/raid5-log.h b/drivers/md/raid5-log.h
index f26e6f4c7f9a..24b4dbd5b25c 100644
--- a/drivers/md/raid5-log.h
+++ b/drivers/md/raid5-log.h
@@ -58,7 +58,7 @@ static inline int log_stripe(struct stripe_head *sh, struct stripe_head_state *s
 {
 	struct r5conf *conf = sh->raid_conf;
 
-	if (conf->log) {
+	if (rcu_access_pointer(conf->log)) {
 		if (!test_bit(STRIPE_R5C_CACHING, &sh->state)) {
 			/* writing out phase */
 			if (s->waiting_extra_page)
@@ -79,7 +79,7 @@ static inline void log_stripe_write_finished(struct stripe_head *sh)
 {
 	struct r5conf *conf = sh->raid_conf;
 
-	if (conf->log)
+	if (rcu_access_pointer(conf->log))
 		r5l_stripe_write_finished(sh);
 	else if (raid5_has_ppl(conf))
 		ppl_stripe_write_finished(sh);
@@ -87,7 +87,7 @@ static inline void log_stripe_write_finished(struct stripe_head *sh)
 
 static inline void log_write_stripe_run(struct r5conf *conf)
 {
-	if (conf->log)
+	if (rcu_access_pointer(conf->log))
 		r5l_write_stripe_run(conf);
 	else if (raid5_has_ppl(conf))
 		ppl_write_stripe_run(conf);
@@ -95,7 +95,7 @@ static inline void log_write_stripe_run(struct r5conf *conf)
 
 static inline void log_flush_stripe_to_raid(struct r5conf *conf)
 {
-	if (conf->log)
+	if (rcu_access_pointer(conf->log))
 		r5l_flush_stripe_to_raid(conf);
 	else if (raid5_has_ppl(conf))
 		ppl_write_stripe_run(conf);
@@ -105,7 +105,7 @@ static inline int log_handle_flush_request(struct r5conf *conf, struct bio *bio)
 {
 	int ret = -ENODEV;
 
-	if (conf->log)
+	if (rcu_access_pointer(conf->log))
 		ret = r5l_handle_flush_request(conf, bio);
 	else if (raid5_has_ppl(conf))
 		ret = ppl_handle_flush_request(bio);
@@ -115,7 +115,7 @@ static inline int log_handle_flush_request(struct r5conf *conf, struct bio *bio)
 
 static inline void log_quiesce(struct r5conf *conf, int quiesce)
 {
-	if (conf->log)
+	if (rcu_access_pointer(conf->log))
 		r5l_quiesce(conf, quiesce);
 	else if (raid5_has_ppl(conf))
 		ppl_quiesce(conf, quiesce);
@@ -123,7 +123,7 @@ static inline void log_quiesce(struct r5conf *conf, int quiesce)
 
 static inline void log_exit(struct r5conf *conf)
 {
-	if (conf->log)
+	if (rcu_access_pointer(conf->log))
 		r5l_exit_log(conf);
 	else if (raid5_has_ppl(conf))
 		ppl_exit_log(conf);
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 37fe2af77c93..c06c9ef88417 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7937,7 +7937,7 @@ static int raid5_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
 	struct md_rdev *tmp;
 
 	print_raid5_conf(conf);
-	if (test_bit(Journal, &rdev->flags) && conf->log) {
+	if (test_bit(Journal, &rdev->flags) && rcu_access_pointer(conf->log)) {
 		mddev_suspend(mddev);
 		log_exit(conf);
 		mddev_resume(mddev);
@@ -8019,7 +8019,7 @@ static int raid5_add_disk(struct mddev *mddev, struct md_rdev *rdev)
 	int last = conf->raid_disks - 1;
 
 	if (test_bit(Journal, &rdev->flags)) {
-		if (conf->log)
+		if (rcu_access_pointer(conf->log))
 			return -EBUSY;
 
 		rdev->raid_disk = 0;
diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
index 638d29863503..04fe5b6f679c 100644
--- a/drivers/md/raid5.h
+++ b/drivers/md/raid5.h
@@ -684,7 +684,7 @@ struct r5conf {
 	struct r5worker_group	*worker_groups;
 	int			group_cnt;
 	int			worker_cnt_per_group;
-	struct r5l_log		*log;
+	struct r5l_log		__rcu *log;
 	void			*log_private;
 
 	spinlock_t		pending_bios_lock;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v1 13/15] md/raid5-cache: Annotate pslot with __rcu notation
  2022-05-19 19:12 [PATCH v1 00/15] Bug fixes for mdadm tests Logan Gunthorpe
                   ` (11 preceding siblings ...)
  2022-05-19 19:13 ` [PATCH v1 12/15] md/raid5-cache: Add RCU protection to conf->log accesses Logan Gunthorpe
@ 2022-05-19 19:13 ` Logan Gunthorpe
  2022-05-21 11:51   ` Christoph Hellwig
  2022-05-19 19:13 ` [PATCH v1 14/15] md: Ensure resync is reported after it starts Logan Gunthorpe
                   ` (2 subsequent siblings)
  15 siblings, 1 reply; 41+ messages in thread
From: Logan Gunthorpe @ 2022-05-19 19:13 UTC (permalink / raw)
  To: linux-kernel, linux-raid, Song Liu
  Cc: Christoph Hellwig, Guoqing Jiang, Xiao Ni, Stephen Bates,
	Martin Oliveira, David Sloan, Logan Gunthorpe

radix_tree_lookup_slot() and radix_tree_replace_slot() API expect the
slot returned and looked up to be marked with __rcu. Otherwise
sparse warnings are generated:

  drivers/md/raid5-cache.c:2939:23: warning: incorrect type in
			assignment (different address spaces)
  drivers/md/raid5-cache.c:2939:23:    expected void **pslot
  drivers/md/raid5-cache.c:2939:23:    got void [noderef] __rcu **

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 drivers/md/raid5-cache.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 1dbc7c4b9a15..a541d17f84f6 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -2683,7 +2683,7 @@ int r5c_try_caching_write(struct r5conf *conf,
 	int i;
 	struct r5dev *dev;
 	int to_cache = 0;
-	void **pslot;
+	void __rcu **pslot;
 	sector_t tree_index;
 	int ret;
 	uintptr_t refcount;
@@ -2850,7 +2850,7 @@ void r5c_finish_stripe_write_out(struct r5conf *conf,
 	int i;
 	int do_wakeup = 0;
 	sector_t tree_index;
-	void **pslot;
+	void __rcu **pslot;
 	uintptr_t refcount;
 
 	if (!log || !test_bit(R5_InJournal, &sh->dev[sh->pd_idx].flags))
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v1 14/15] md: Ensure resync is reported after it starts
  2022-05-19 19:12 [PATCH v1 00/15] Bug fixes for mdadm tests Logan Gunthorpe
                   ` (12 preceding siblings ...)
  2022-05-19 19:13 ` [PATCH v1 13/15] md/raid5-cache: Annotate pslot with __rcu notation Logan Gunthorpe
@ 2022-05-19 19:13 ` Logan Gunthorpe
  2022-05-21 11:51   ` Christoph Hellwig
  2022-05-19 19:13 ` [PATCH v1 15/15] md: Notify sysfs sync_completed in md_reap_sync_thread() Logan Gunthorpe
  2022-05-23  6:28 ` [PATCH v1 00/15] Bug fixes for mdadm tests Song Liu
  15 siblings, 1 reply; 41+ messages in thread
From: Logan Gunthorpe @ 2022-05-19 19:13 UTC (permalink / raw)
  To: linux-kernel, linux-raid, Song Liu
  Cc: Christoph Hellwig, Guoqing Jiang, Xiao Ni, Stephen Bates,
	Martin Oliveira, David Sloan, Logan Gunthorpe

The 07layouts test in mdadm fails on some systems. The failure
presents itself as the backup file not being removed before the next
layout is grown into:

  mdadm: /dev/md0: cannot create backup file /tmp/md-test-backup:
      File exists

This is because the background mdadm process, which is responsible for
cleaning up this backup file gets into an infinite loop waiting for
the reshape to start. mdadm checks the mdstat file if a reshape is
going and, if it is not, it waits for an event on the file or times
out in 5 seconds. On faster machines, the reshape may complete before
the 5 seconds times out, and thus the background mdadm process loops
waiting for a reshape to start that has already occurred.

mdadm reads the mdstat file to start, but mdstat does not report that the
reshape has begun, even though it has indeed begun. So the mdstat_wait()
call (in mdadm) which polls on the mdstat file won't ever return until
timing out.

The reason mdstat reports the reshape has started is due to an issue
in status_resync(). recovery_active is subtracted from curr_resync which
will result in a value of zero for the first chunk of reshaped data, and
the resulting read will report no reshape in progress.

To fix this, if "resync - recovery_active" is zero: force the value to
be 4 so the code reports a resync in progress.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 drivers/md/md.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 8273ac5eef06..dbac63c8e35c 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8022,10 +8022,18 @@ static int status_resync(struct seq_file *seq, struct mddev *mddev)
 		if (test_bit(MD_RECOVERY_DONE, &mddev->recovery))
 			/* Still cleaning up */
 			resync = max_sectors;
-	} else if (resync > max_sectors)
+	} else if (resync > max_sectors) {
 		resync = max_sectors;
-	else
+	} else {
 		resync -= atomic_read(&mddev->recovery_active);
+		if (!resync) {
+			/*
+			 * Resync has started, but if it's zero, ensure
+			 * it is still reported, by forcing it to be 4
+			 */
+			resync = 4;
+		}
+	}
 
 	if (resync == 0) {
 		if (test_bit(MD_RESYNCING_REMOTE, &mddev->recovery)) {
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v1 15/15] md: Notify sysfs sync_completed in md_reap_sync_thread()
  2022-05-19 19:12 [PATCH v1 00/15] Bug fixes for mdadm tests Logan Gunthorpe
                   ` (13 preceding siblings ...)
  2022-05-19 19:13 ` [PATCH v1 14/15] md: Ensure resync is reported after it starts Logan Gunthorpe
@ 2022-05-19 19:13 ` Logan Gunthorpe
  2022-05-21 11:52   ` Christoph Hellwig
  2022-05-23  6:28 ` [PATCH v1 00/15] Bug fixes for mdadm tests Song Liu
  15 siblings, 1 reply; 41+ messages in thread
From: Logan Gunthorpe @ 2022-05-19 19:13 UTC (permalink / raw)
  To: linux-kernel, linux-raid, Song Liu
  Cc: Christoph Hellwig, Guoqing Jiang, Xiao Ni, Stephen Bates,
	Martin Oliveira, David Sloan, Logan Gunthorpe

The mdadm test 07layouts randomly produces a kernel hung task deadlock.
The deadlock is caused by the suspend_lo/suspend_hi files being set by
the mdadm background process during reshape and not being cleared
because the process hangs. (Leaving aside the issue of the fragility of
freezing kernel tasks by buggy userspace processes...)

When the background mdadm process hangs it, is waiting (without a
timeout) on a change to the sync_completed file signalling that the
reshape has completed. The process is woken up a couple times when
the reshape finishes but it is woken up before MD_RECOVERY_RUNNING
is cleared so sync_completed_show() reports 0 instead of "none.

To fix this, notify the sysfs file in md_reap_sync_thread() after
MD_RECOVERY_RUNNING has been cleared. This wakes up mdadm and causes
it to continue and write to suspend_lo/suspend_hi to allow IO to
continue.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 drivers/md/md.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index dbac63c8e35c..54108de46679 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9478,6 +9478,7 @@ void md_reap_sync_thread(struct mddev *mddev, bool reconfig_mutex_held)
 	wake_up(&resync_wait);
 	/* flag recovery needed just to double check */
 	set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
+	sysfs_notify_dirent_safe(mddev->sysfs_completed);
 	sysfs_notify_dirent_safe(mddev->sysfs_action);
 	md_new_event();
 	if (mddev->event_work.func)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 01/15] md/raid5-log: Drop extern decorators for function prototypes
  2022-05-19 19:12 ` [PATCH v1 01/15] md/raid5-log: Drop extern decorators for function prototypes Logan Gunthorpe
@ 2022-05-21 11:36   ` Christoph Hellwig
  0 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2022-05-21 11:36 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: linux-kernel, linux-raid, Song Liu, Christoph Hellwig,
	Guoqing Jiang, Xiao Ni, Stephen Bates, Martin Oliveira,
	David Sloan

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 02/15] md/raid5-cache: Refactor r5c_is_writeback() to take a struct r5conf
  2022-05-19 19:12 ` [PATCH v1 02/15] md/raid5-cache: Refactor r5c_is_writeback() to take a struct r5conf Logan Gunthorpe
@ 2022-05-21 11:37   ` Christoph Hellwig
  0 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2022-05-21 11:37 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: linux-kernel, linux-raid, Song Liu, Christoph Hellwig,
	Guoqing Jiang, Xiao Ni, Stephen Bates, Martin Oliveira,
	David Sloan

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 03/15] md/raid5-cache: Refactor r5l_start() to take a struct r5conf
  2022-05-19 19:12 ` [PATCH v1 03/15] md/raid5-cache: Refactor r5l_start() " Logan Gunthorpe
@ 2022-05-21 11:37   ` Christoph Hellwig
  0 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2022-05-21 11:37 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: linux-kernel, linux-raid, Song Liu, Christoph Hellwig,
	Guoqing Jiang, Xiao Ni, Stephen Bates, Martin Oliveira,
	David Sloan

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 04/15] md/raid5-cache: Refactor r5l_flush_stripe_to_raid() to take a struct r5conf
  2022-05-19 19:13 ` [PATCH v1 04/15] md/raid5-cache: Refactor r5l_flush_stripe_to_raid() " Logan Gunthorpe
@ 2022-05-21 11:38   ` Christoph Hellwig
  0 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2022-05-21 11:38 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: linux-kernel, linux-raid, Song Liu, Christoph Hellwig,
	Guoqing Jiang, Xiao Ni, Stephen Bates, Martin Oliveira,
	David Sloan

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 05/15] md/raid5-cache: Refactor r5l_wake_reclaim() to take a struct r5conf
  2022-05-19 19:13 ` [PATCH v1 05/15] md/raid5-cache: Refactor r5l_wake_reclaim() " Logan Gunthorpe
@ 2022-05-21 11:38   ` Christoph Hellwig
  0 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2022-05-21 11:38 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: linux-kernel, linux-raid, Song Liu, Christoph Hellwig,
	Guoqing Jiang, Xiao Ni, Stephen Bates, Martin Oliveira,
	David Sloan

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 06/15] md/raid5-cache: Refactor remaining functions to take a r5conf
  2022-05-19 19:13 ` [PATCH v1 06/15] md/raid5-cache: Refactor remaining functions to take a r5conf Logan Gunthorpe
@ 2022-05-21 11:40   ` Christoph Hellwig
  0 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2022-05-21 11:40 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: linux-kernel, linux-raid, Song Liu, Christoph Hellwig,
	Guoqing Jiang, Xiao Ni, Stephen Bates, Martin Oliveira,
	David Sloan

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 07/15] md/raid5-ppl: Drop unused argument from ppl_handle_flush_request()
  2022-05-19 19:13 ` [PATCH v1 07/15] md/raid5-ppl: Drop unused argument from ppl_handle_flush_request() Logan Gunthorpe
@ 2022-05-21 11:41   ` Christoph Hellwig
  0 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2022-05-21 11:41 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: linux-kernel, linux-raid, Song Liu, Christoph Hellwig,
	Guoqing Jiang, Xiao Ni, Stephen Bates, Martin Oliveira,
	David Sloan

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 08/15] md/raid5-cache: Pass the log through to r5c_finish_cache_stripe()
  2022-05-19 19:13 ` [PATCH v1 08/15] md/raid5-cache: Pass the log through to r5c_finish_cache_stripe() Logan Gunthorpe
@ 2022-05-21 11:42   ` Christoph Hellwig
  0 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2022-05-21 11:42 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: linux-kernel, linux-raid, Song Liu, Christoph Hellwig,
	Guoqing Jiang, Xiao Ni, Stephen Bates, Martin Oliveira,
	David Sloan

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 09/15] md/raid5-cache: Don't pass conf to r5c_calculate_new_cp()
  2022-05-19 19:13 ` [PATCH v1 09/15] md/raid5-cache: Don't pass conf to r5c_calculate_new_cp() Logan Gunthorpe
@ 2022-05-21 11:42   ` Christoph Hellwig
  0 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2022-05-21 11:42 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: linux-kernel, linux-raid, Song Liu, Christoph Hellwig,
	Guoqing Jiang, Xiao Ni, Stephen Bates, Martin Oliveira,
	David Sloan

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 10/15] md/raid5-cache: Take struct r5l_log in r5c_log_required_to_flush_cache()
  2022-05-19 19:13 ` [PATCH v1 10/15] md/raid5-cache: Take struct r5l_log in r5c_log_required_to_flush_cache() Logan Gunthorpe
@ 2022-05-21 11:43   ` Christoph Hellwig
  0 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2022-05-21 11:43 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: linux-kernel, linux-raid, Song Liu, Christoph Hellwig,
	Guoqing Jiang, Xiao Ni, Stephen Bates, Martin Oliveira,
	David Sloan

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 11/15] md/raid5: Ensure array is suspended for calls to log_exit()
  2022-05-19 19:13 ` [PATCH v1 11/15] md/raid5: Ensure array is suspended for calls to log_exit() Logan Gunthorpe
@ 2022-05-21 11:44   ` Christoph Hellwig
  0 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2022-05-21 11:44 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: linux-kernel, linux-raid, Song Liu, Christoph Hellwig,
	Guoqing Jiang, Xiao Ni, Stephen Bates, Martin Oliveira,
	David Sloan

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 12/15] md/raid5-cache: Add RCU protection to conf->log accesses
  2022-05-19 19:13 ` [PATCH v1 12/15] md/raid5-cache: Add RCU protection to conf->log accesses Logan Gunthorpe
@ 2022-05-21 11:50   ` Christoph Hellwig
  2022-05-22  7:31   ` Donald Buczek
  2022-05-22  7:32   ` Donald Buczek
  2 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2022-05-21 11:50 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: linux-kernel, linux-raid, Song Liu, Christoph Hellwig,
	Guoqing Jiang, Xiao Ni, Stephen Bates, Martin Oliveira,
	David Sloan

On Thu, May 19, 2022 at 01:13:08PM -0600, Logan Gunthorpe wrote:
> The mdadm test 21raid5cache randomly fails with NULL pointer accesses
> conf->log when run repeatedly. conf->log was sort of protected with
> a RCU, but most dereferences were not done with the correct functions.
> 
> Add rcu_read_locks() and rcu_access_pointers() to the appropriate
> places.
> 
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> ---
>  drivers/md/raid5-cache.c | 135 +++++++++++++++++++++++++++------------
>  drivers/md/raid5-log.h   |  14 ++--
>  drivers/md/raid5.c       |   4 +-
>  drivers/md/raid5.h       |   2 +-
>  4 files changed, 104 insertions(+), 51 deletions(-)
> 
> diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
> index f7b402138d16..1dbc7c4b9a15 100644
> --- a/drivers/md/raid5-cache.c
> +++ b/drivers/md/raid5-cache.c
> @@ -254,7 +254,14 @@ static bool __r5c_is_writeback(struct r5l_log *log)
>  
>  bool r5c_is_writeback(struct r5conf *conf)
>  {
> -	return __r5c_is_writeback(conf->log);
> +	struct r5l_log *log;
> +	bool ret;
> +
> +	rcu_read_lock();
> +	log = rcu_dereference(conf->log);
> +	ret = __r5c_is_writeback(log);

Nit: I'd do away with the local variable

	ret = __r5c_is_writeback(rcu_dereference(conf->log));

> +static struct r5l_log *get_log_for_io(struct r5conf *conf)
> +{
> +	/*
> +	 * rcu_dereference_protected is safe because the array will be
> +	 * quiesced before log_exit() so it can't be called while
> +	 * an IO is in progress.
> +	 */
> +	return rcu_dereference_protected(conf->log, 1);
> +}

The hardcoded one (shouldn't that be a true, btw?) kinda defeats the
purpose of rcu_dereference_protected.  But I can't really think of any
good runtime assert that we could use here.

>  void r5c_check_stripe_cache_usage(struct r5conf *conf)
>  {
> +	struct r5l_log *log = get_log_for_io(conf);
>  	int total_cached;
>  
> -	if (!r5c_is_writeback(conf))
> +	if (!__r5c_is_writeback(log))

This mostly just undoes earlier chanes.  Maybe we should have just let
r5c_is_writeback as-is and have a r5c_conf_is_writeback helper on top and
avoid this churn?  In general it would also be nice to have all these
newly added or removal local variables in place before the big fixup.

>  void r5c_check_cached_full_stripe(struct r5conf *conf)
>  {
> -	if (!r5c_is_writeback(conf))
> -		return;
> +	struct r5l_log *log = get_log_for_io(conf);

This looks odd.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 13/15] md/raid5-cache: Annotate pslot with __rcu notation
  2022-05-19 19:13 ` [PATCH v1 13/15] md/raid5-cache: Annotate pslot with __rcu notation Logan Gunthorpe
@ 2022-05-21 11:51   ` Christoph Hellwig
  0 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2022-05-21 11:51 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: linux-kernel, linux-raid, Song Liu, Christoph Hellwig,
	Guoqing Jiang, Xiao Ni, Stephen Bates, Martin Oliveira,
	David Sloan

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 14/15] md: Ensure resync is reported after it starts
  2022-05-19 19:13 ` [PATCH v1 14/15] md: Ensure resync is reported after it starts Logan Gunthorpe
@ 2022-05-21 11:51   ` Christoph Hellwig
  2022-05-24 15:45     ` Logan Gunthorpe
  0 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2022-05-21 11:51 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: linux-kernel, linux-raid, Song Liu, Christoph Hellwig,
	Guoqing Jiang, Xiao Ni, Stephen Bates, Martin Oliveira,
	David Sloan

On Thu, May 19, 2022 at 01:13:10PM -0600, Logan Gunthorpe wrote:
> The 07layouts test in mdadm fails on some systems. The failure
> presents itself as the backup file not being removed before the next
> layout is grown into:
> 
>   mdadm: /dev/md0: cannot create backup file /tmp/md-test-backup:
>       File exists
> 
> This is because the background mdadm process, which is responsible for
> cleaning up this backup file gets into an infinite loop waiting for
> the reshape to start. mdadm checks the mdstat file if a reshape is
> going and, if it is not, it waits for an event on the file or times
> out in 5 seconds. On faster machines, the reshape may complete before
> the 5 seconds times out, and thus the background mdadm process loops
> waiting for a reshape to start that has already occurred.
> 
> mdadm reads the mdstat file to start, but mdstat does not report that the
> reshape has begun, even though it has indeed begun. So the mdstat_wait()
> call (in mdadm) which polls on the mdstat file won't ever return until
> timing out.
> 
> The reason mdstat reports the reshape has started is due to an issue
> in status_resync(). recovery_active is subtracted from curr_resync which
> will result in a value of zero for the first chunk of reshaped data, and
> the resulting read will report no reshape in progress.
> 
> To fix this, if "resync - recovery_active" is zero: force the value to
> be 4 so the code reports a resync in progress.
> 
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> ---
>  drivers/md/md.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 8273ac5eef06..dbac63c8e35c 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -8022,10 +8022,18 @@ static int status_resync(struct seq_file *seq, struct mddev *mddev)
>  		if (test_bit(MD_RECOVERY_DONE, &mddev->recovery))
>  			/* Still cleaning up */
>  			resync = max_sectors;
> -	} else if (resync > max_sectors)
> +	} else if (resync > max_sectors) {
>  		resync = max_sectors;
> -	else
> +	} else {
>  		resync -= atomic_read(&mddev->recovery_active);
> +		if (!resync) {
> +			/*
> +			 * Resync has started, but if it's zero, ensure
> +			 * it is still reported, by forcing it to be 4
> +			 */
> +			resync = 4;

Where does this magic 4 come from?

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 15/15] md: Notify sysfs sync_completed in md_reap_sync_thread()
  2022-05-19 19:13 ` [PATCH v1 15/15] md: Notify sysfs sync_completed in md_reap_sync_thread() Logan Gunthorpe
@ 2022-05-21 11:52   ` Christoph Hellwig
  0 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2022-05-21 11:52 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: linux-kernel, linux-raid, Song Liu, Christoph Hellwig,
	Guoqing Jiang, Xiao Ni, Stephen Bates, Martin Oliveira,
	David Sloan

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 12/15] md/raid5-cache: Add RCU protection to conf->log accesses
  2022-05-19 19:13 ` [PATCH v1 12/15] md/raid5-cache: Add RCU protection to conf->log accesses Logan Gunthorpe
  2022-05-21 11:50   ` Christoph Hellwig
@ 2022-05-22  7:31   ` Donald Buczek
  2022-05-23  6:47     ` Song Liu
  2022-05-22  7:32   ` Donald Buczek
  2 siblings, 1 reply; 41+ messages in thread
From: Donald Buczek @ 2022-05-22  7:31 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-raid, Song Liu
  Cc: Christoph Hellwig, Guoqing Jiang, Xiao Ni, Stephen Bates,
	Martin Oliveira, David Sloan

On 19.05.22 21:13, Logan Gunthorpe wrote:
> The mdadm test 21raid5cache randomly fails with NULL pointer accesses
> conf->log when run repeatedly. conf->log was sort of protected with
> a RCU, but most dereferences were not done with the correct functions.
> 
> Add rcu_read_locks() and rcu_access_pointers() to the appropriate
> places.
> 
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> ---
>   drivers/md/raid5-cache.c | 135 +++++++++++++++++++++++++++------------
>   drivers/md/raid5-log.h   |  14 ++--
>   drivers/md/raid5.c       |   4 +-
>   drivers/md/raid5.h       |   2 +-
>   4 files changed, 104 insertions(+), 51 deletions(-)
> 
> diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
> index f7b402138d16..1dbc7c4b9a15 100644
> --- a/drivers/md/raid5-cache.c
> +++ b/drivers/md/raid5-cache.c
> @@ -254,7 +254,14 @@ static bool __r5c_is_writeback(struct r5l_log *log)
>   
>   bool r5c_is_writeback(struct r5conf *conf)
>   {
> -	return __r5c_is_writeback(conf->log);
> +	struct r5l_log *log;
> +	bool ret;
> +
> +	rcu_read_lock();
> +	log = rcu_dereference(conf->log);
> +	ret = __r5c_is_writeback(log);
> +	rcu_read_unlock();
> +	return ret;
>   }
>   
>   static sector_t r5l_ring_add(struct r5l_log *log, sector_t start, sector_t inc)
> @@ -340,12 +347,23 @@ static void __r5l_wake_reclaim(struct r5l_log *log, sector_t space)
>   	md_wakeup_thread(log->reclaim_thread);
>   }
>   
> +static struct r5l_log *get_log_for_io(struct r5conf *conf)
> +{
> +	/*
> +	 * rcu_dereference_protected is safe because the array will be
> +	 * quiesced before log_exit() so it can't be called while
> +	 * an IO is in progress.
> +	 */
> +	return rcu_dereference_protected(conf->log, 1);
> +}
> +
>   /* Check whether we should flush some stripes to free up stripe cache */
>   void r5c_check_stripe_cache_usage(struct r5conf *conf)
>   {
> +	struct r5l_log *log = get_log_for_io(conf);
>   	int total_cached;
>   
> -	if (!r5c_is_writeback(conf))
> +	if (!__r5c_is_writeback(log))
>   		return;
>   
>   	total_cached = atomic_read(&conf->r5c_cached_partial_stripes) +
> @@ -361,7 +379,7 @@ void r5c_check_stripe_cache_usage(struct r5conf *conf)
>   	 */
>   	if (total_cached > conf->min_nr_stripes * 1 / 2 ||
>   	    atomic_read(&conf->empty_inactive_list_nr) > 0)
> -		__r5l_wake_reclaim(conf->log, 0);
> +		__r5l_wake_reclaim(log, 0);
>   }
>   
>   /*
> @@ -370,8 +388,7 @@ void r5c_check_stripe_cache_usage(struct r5conf *conf)
>    */
>   void r5c_check_cached_full_stripe(struct r5conf *conf)
>   {
> -	if (!r5c_is_writeback(conf))
> -		return;
> +	struct r5l_log *log = get_log_for_io(conf);
>   
>   	/*
>   	 * wake up reclaim for R5C_FULL_STRIPE_FLUSH_BATCH cached stripes
> @@ -380,7 +397,7 @@ void r5c_check_cached_full_stripe(struct r5conf *conf)
>   	if (atomic_read(&conf->r5c_cached_full_stripes) >=
>   	    min(R5C_FULL_STRIPE_FLUSH_BATCH(conf),
>   		conf->chunk_sectors >> RAID5_STRIPE_SHIFT(conf)))
> -		__r5l_wake_reclaim(conf->log, 0);
> +		__r5l_wake_reclaim(log, 0);
>   }
>   
>   /*
> @@ -703,7 +720,7 @@ static void r5c_disable_writeback_async(struct work_struct *work)
>   
>   	/* wait superblock change before suspend */
>   	wait_event(mddev->sb_wait,
> -		   conf->log == NULL ||
> +		   rcu_access_pointer(conf->log) ||
>   		   (!test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) &&
>   		    (locked = mddev_trylock(mddev))));
>   	if (locked) {
> @@ -1001,7 +1018,7 @@ static inline void r5l_add_no_space_stripe(struct r5l_log *log,
>    */
>   int r5l_write_stripe(struct r5conf *conf, struct stripe_head *sh)
>   {
> -	struct r5l_log *log = conf->log;
> +	struct r5l_log *log = get_log_for_io(conf);
>   	int write_disks = 0;
>   	int data_pages, parity_pages;
>   	int reserve;
> @@ -1099,7 +1116,8 @@ int r5l_write_stripe(struct r5conf *conf, struct stripe_head *sh)
>   
>   void r5l_write_stripe_run(struct r5conf *conf)
>   {
> -	struct r5l_log *log = conf->log;
> +	struct r5l_log *log = get_log_for_io(conf);
> +
>   	if (!log)
>   		return;
>   	mutex_lock(&log->io_mutex);
> @@ -1109,7 +1127,7 @@ void r5l_write_stripe_run(struct r5conf *conf)
>   
>   int r5l_handle_flush_request(struct r5conf *conf, struct bio *bio)
>   {
> -	struct r5l_log *log = conf->log;
> +	struct r5l_log *log = get_log_for_io(conf);
>   
>   	if (log->r5c_journal_mode == R5C_JOURNAL_MODE_WRITE_THROUGH) {
>   		/*
> @@ -1297,7 +1315,7 @@ static void r5l_log_flush_endio(struct bio *bio)
>    */
>   void r5l_flush_stripe_to_raid(struct r5conf *conf)
>   {
> -	struct r5l_log *log = conf->log;
> +	struct r5l_log *log = get_log_for_io(conf);
>   	bool do_flush;
>   
>   	if (!log || !log->need_cache_flush)
> @@ -1412,7 +1430,7 @@ void r5c_flush_cache(struct r5conf *conf, int num)
>   	struct stripe_head *sh, *next;
>   
>   	lockdep_assert_held(&conf->device_lock);
> -	if (!conf->log)
> +	if (!rcu_access_pointer(conf->log))
>   		return;
>   
>   	count = 0;
> @@ -1431,9 +1449,8 @@ void r5c_flush_cache(struct r5conf *conf, int num)
>   	}
>   }
>   
> -static void r5c_do_reclaim(struct r5conf *conf)
> +static void r5c_do_reclaim(struct r5conf *conf, struct r5l_log *log)
>   {
> -	struct r5l_log *log = conf->log;
>   	struct stripe_head *sh;
>   	int count = 0;
>   	unsigned long flags;
> @@ -1441,7 +1458,7 @@ static void r5c_do_reclaim(struct r5conf *conf)
>   	int stripes_to_flush;
>   	int flushing_partial, flushing_full;
>   
> -	if (!r5c_is_writeback(conf))
> +	if (!__r5c_is_writeback(log))
>   		return;
>   
>   	flushing_partial = atomic_read(&conf->r5c_flushing_partial_stripes);
> @@ -1561,22 +1578,30 @@ static void r5l_reclaim_thread(struct md_thread *thread)
>   {
>   	struct mddev *mddev = thread->mddev;
>   	struct r5conf *conf = mddev->private;
> -	struct r5l_log *log = conf->log;
> +	struct r5l_log *log;
>   
> +	/*
> +	 * This is safe, because the reclaim thread will be stopped before
> +	 * the log is freed in log_exit()
> +	 */
> +	log = rcu_dereference_protected(conf->log, 1);
>   	if (!log)
>   		return;
> -	r5c_do_reclaim(conf);
> +
> +	r5c_do_reclaim(conf, log);
>   	r5l_do_reclaim(log);
>   }
>   
>   void r5l_wake_reclaim(struct r5conf *conf, sector_t space)
>   {
> -	__r5l_wake_reclaim(conf->log, space);
> +	struct r5l_log *log = get_log_for_io(conf);
> +
> +	__r5l_wake_reclaim(log, space);
>   }
>   
>   void r5l_quiesce(struct r5conf *conf, int quiesce)
>   {
> -	struct r5l_log *log = conf->log;
> +	struct r5l_log *log = get_log_for_io(conf);
>   	struct mddev *mddev;
>   
>   	if (quiesce) {
> @@ -2534,16 +2559,20 @@ static void r5l_write_super(struct r5l_log *log, sector_t cp)
>   static ssize_t r5c_journal_mode_show(struct mddev *mddev, char *page)
>   {
>   	struct r5conf *conf;
> -	int ret;
> +	struct r5l_log *log;
> +	int ret = 0;
>   
>   	spin_lock(&mddev->lock);
>   	conf = mddev->private;
> -	if (!conf || !conf->log) {
> -		spin_unlock(&mddev->lock);
> -		return 0;
> -	}
> +	if (!conf)
> +		goto out_spin_unlock;
> +
> +	rcu_read_lock();
> +	log = rcu_dereference(conf->log);
> +	if (!log)
> +		goto out;
>   
> -	switch (conf->log->r5c_journal_mode) {
> +	switch (log->r5c_journal_mode) {
>   	case R5C_JOURNAL_MODE_WRITE_THROUGH:
>   		ret = snprintf(
>   			page, PAGE_SIZE, "[%s] %s\n",
> @@ -2559,6 +2588,10 @@ static ssize_t r5c_journal_mode_show(struct mddev *mddev, char *page)
>   	default:
>   		ret = 0;
>   	}
> +
> +out:
> +	rcu_read_unlock();
> +out_spin_unlock:
>   	spin_unlock(&mddev->lock);
>   	return ret;
>   }
> @@ -2572,13 +2605,15 @@ static ssize_t r5c_journal_mode_show(struct mddev *mddev, char *page)
>   int r5c_journal_mode_set(struct mddev *mddev, int mode)
>   {
>   	struct r5conf *conf;
> +	struct r5l_log *log;
> +	int ret = 0;
>   
>   	if (mode < R5C_JOURNAL_MODE_WRITE_THROUGH ||
>   	    mode > R5C_JOURNAL_MODE_WRITE_BACK)
>   		return -EINVAL;
>   
>   	conf = mddev->private;
> -	if (!conf || !conf->log)
> +	if (!conf || !rcu_access_pointer(conf->log))
>   		return -ENODEV;
>   
>   	if (raid5_calc_degraded(conf) > 0 &&
> @@ -2586,12 +2621,19 @@ int r5c_journal_mode_set(struct mddev *mddev, int mode)
>   		return -EINVAL;
>   
>   	mddev_suspend(mddev);
> -	conf->log->r5c_journal_mode = mode;
> +	rcu_read_lock();
> +	log = rcu_dereference(conf->log);
> +	if (log) {
> +		log->r5c_journal_mode = mode;
> +		pr_debug("md/raid:%s: setting r5c cache mode to %d: %s\n",
> +			 mdname(mddev), mode, r5c_journal_mode_str[mode]);
> +	} else {
> +		ret = -ENODEV;
> +	}
> +	rcu_read_unlock();
>   	mddev_resume(mddev);
>   
> -	pr_debug("md/raid:%s: setting r5c cache mode to %d: %s\n",
> -		 mdname(mddev), mode, r5c_journal_mode_str[mode]);
> -	return 0;
> +	return ret;
>   }
>   EXPORT_SYMBOL(r5c_journal_mode_set);
>   
> @@ -2637,7 +2679,7 @@ int r5c_try_caching_write(struct r5conf *conf,
>   			  struct stripe_head_state *s,
>   			  int disks)
>   {
> -	struct r5l_log *log = conf->log;
> +	struct r5l_log *log = get_log_for_io(conf);
>   	int i;
>   	struct r5dev *dev;
>   	int to_cache = 0;
> @@ -2804,7 +2846,7 @@ void r5c_finish_stripe_write_out(struct r5conf *conf,
>   				 struct stripe_head *sh,
>   				 struct stripe_head_state *s)
>   {
> -	struct r5l_log *log = conf->log;
> +	struct r5l_log *log = get_log_for_io(conf);
>   	int i;
>   	int do_wakeup = 0;
>   	sector_t tree_index;
> @@ -2887,7 +2929,7 @@ void r5c_finish_stripe_write_out(struct r5conf *conf,
>   
>   int r5c_cache_data(struct r5conf *conf, struct stripe_head *sh)
>   {
> -	struct r5l_log *log = conf->log;
> +	struct r5l_log *log = get_log_for_io(conf);
>   	int pages = 0;
>   	int reserve;
>   	int i;
> @@ -2943,7 +2985,7 @@ int r5c_cache_data(struct r5conf *conf, struct stripe_head *sh)
>   /* check whether this big stripe is in write back cache. */
>   bool r5c_big_stripe_cached(struct r5conf *conf, sector_t sect)
>   {
> -	struct r5l_log *log = conf->log;
> +	struct r5l_log *log = get_log_for_io(conf);
>   	sector_t tree_index;
>   	void *slot;
>   
> @@ -2953,6 +2995,7 @@ bool r5c_big_stripe_cached(struct r5conf *conf, sector_t sect)
>   	WARN_ON_ONCE(!rcu_read_lock_held());
>   	tree_index = r5c_tree_index(conf, sect);
>   	slot = radix_tree_lookup(&log->big_stripe_tree, tree_index);
> +
>   	return slot != NULL;
>   }
>   
> @@ -3033,30 +3076,38 @@ static int r5l_load_log(struct r5l_log *log)
>   
>   int r5l_start(struct r5conf *conf)
>   {
> -	struct r5l_log *log = conf->log;
> +	struct r5l_log *log;
>   	int ret;
>   
> +	log = rcu_dereference_protected(conf->log,
> +			lockdep_is_held(&conf->mddev->reconfig_mutex));
>   	if (!log)
>   		return 0;
>   
>   	ret = r5l_load_log(log);
>   	if (ret)
>   		r5l_exit_log(conf);
> +
>   	return ret;
>   }
>   
>   void r5c_update_on_rdev_error(struct mddev *mddev, struct md_rdev *rdev)
>   {
>   	struct r5conf *conf = mddev->private;
> -	struct r5l_log *log = conf->log;
> +	struct r5l_log *log;
>   
> +	rcu_read_lock();
> +	log = rcu_dereference(conf->log);
>   	if (!log)
> -		return;
> +		goto out;
>   
>   	if ((raid5_calc_degraded(conf) > 0 ||
>   	     test_bit(Journal, &rdev->flags)) &&
> -	    conf->log->r5c_journal_mode == R5C_JOURNAL_MODE_WRITE_BACK)
> +	    log->r5c_journal_mode == R5C_JOURNAL_MODE_WRITE_BACK)
>   		schedule_work(&log->disable_writeback_work);
> +
> +out:
> +	rcu_read_unlock();
>   }
>   
>   int r5l_init_log(struct r5conf *conf, struct md_rdev *rdev)
> @@ -3164,15 +3215,17 @@ int r5l_init_log(struct r5conf *conf, struct md_rdev *rdev)
>   
>   void r5l_exit_log(struct r5conf *conf)
>   {
> -	struct r5l_log *log = conf->log;
> +	struct r5l_log *log;
>   
> -	conf->log = NULL;
> -	synchronize_rcu();
> +	lockdep_assert_held(&conf->mddev->reconfig_mutex);
> +	log = rcu_replace_pointer(conf->log, NULL, 1);
>   
>   	/* Ensure disable_writeback_work wakes up and exits */
>   	wake_up(&conf->mddev->sb_wait);
>   	flush_work(&log->disable_writeback_work);
>   	md_unregister_thread(&log->reclaim_thread);
> +	synchronize_rcu();
> +
>   	mempool_exit(&log->meta_pool);
>   	bioset_exit(&log->bs);
>   	mempool_exit(&log->io_pool);
> diff --git a/drivers/md/raid5-log.h b/drivers/md/raid5-log.h
> index f26e6f4c7f9a..24b4dbd5b25c 100644
> --- a/drivers/md/raid5-log.h
> +++ b/drivers/md/raid5-log.h
> @@ -58,7 +58,7 @@ static inline int log_stripe(struct stripe_head *sh, struct stripe_head_state *s
>   {
>   	struct r5conf *conf = sh->raid_conf;
>   
> -	if (conf->log) {
> +	if (rcu_access_pointer(conf->log)) {


A problem here is that `struct r5l_log` of `conf->log` is private to raid5-cache.c and gcc below version 10 (wrongly) regards the `typeof(*p) *local` declaration of __rcu_access_pointer as a dereference:

   CC      drivers/md/raid5.o

In file included from ./include/linux/rculist.h:11:0,

                  from ./include/linux/dcache.h:8,

                  from ./include/linux/fs.h:8,

                  from ./include/linux/highmem.h:5,

                  from ./include/linux/bvec.h:10,

                  from ./include/linux/blk_types.h:10,

                  from ./include/linux/blkdev.h:9,

                  from drivers/md/raid5.c:38:

drivers/md/raid5-log.h: In function ‘log_stripe’:

./include/linux/rcupdate.h:384:9: error: dereferencing pointer to incomplete type ‘struct r5l_log’

   typeof(*p) *local = (typeof(*p) *__force)READ_ONCE(p); \

          ^

./include/linux/rcupdate.h:495:31: note: in expansion of macro ‘__rcu_access_pointer’

  #define rcu_access_pointer(p) __rcu_access_pointer((p), __UNIQUE_ID(rcu), __rcu)

                                ^~~~~~~~~~~~~~~~~~~~

drivers/md/raid5-log.h:61:6: note: in expansion of macro ‘rcu_access_pointer’

   if (rcu_access_pointer(conf->log)) {

       ^~~~~~~~~~~~~~~~~~

make[2]: *** [scripts/Makefile.build:288: drivers/md/raid5.o] Error 1

make[1]: *** [scripts/Makefile.build:550: drivers/md] Error 2

make: *** [Makefile:1834: drivers] Error 2


See https://godbolt.org/z/TPP8MdKbc to test compiler versions with this construct.

Best

   Donald


>   		if (!test_bit(STRIPE_R5C_CACHING, &sh->state)) {
>   			/* writing out phase */
>   			if (s->waiting_extra_page)
> @@ -79,7 +79,7 @@ static inline void log_stripe_write_finished(struct stripe_head *sh)
>   {
>   	struct r5conf *conf = sh->raid_conf;
>   
> -	if (conf->log)
> +	if (rcu_access_pointer(conf->log))
>   		r5l_stripe_write_finished(sh);
>   	else if (raid5_has_ppl(conf))
>   		ppl_stripe_write_finished(sh);
> @@ -87,7 +87,7 @@ static inline void log_stripe_write_finished(struct stripe_head *sh)
>   
>   static inline void log_write_stripe_run(struct r5conf *conf)
>   {
> -	if (conf->log)
> +	if (rcu_access_pointer(conf->log))
>   		r5l_write_stripe_run(conf);
>   	else if (raid5_has_ppl(conf))
>   		ppl_write_stripe_run(conf);
> @@ -95,7 +95,7 @@ static inline void log_write_stripe_run(struct r5conf *conf)
>   
>   static inline void log_flush_stripe_to_raid(struct r5conf *conf)
>   {
> -	if (conf->log)
> +	if (rcu_access_pointer(conf->log))
>   		r5l_flush_stripe_to_raid(conf);
>   	else if (raid5_has_ppl(conf))
>   		ppl_write_stripe_run(conf);
> @@ -105,7 +105,7 @@ static inline int log_handle_flush_request(struct r5conf *conf, struct bio *bio)
>   {
>   	int ret = -ENODEV;
>   
> -	if (conf->log)
> +	if (rcu_access_pointer(conf->log))
>   		ret = r5l_handle_flush_request(conf, bio);
>   	else if (raid5_has_ppl(conf))
>   		ret = ppl_handle_flush_request(bio);
> @@ -115,7 +115,7 @@ static inline int log_handle_flush_request(struct r5conf *conf, struct bio *bio)
>   
>   static inline void log_quiesce(struct r5conf *conf, int quiesce)
>   {
> -	if (conf->log)
> +	if (rcu_access_pointer(conf->log))
>   		r5l_quiesce(conf, quiesce);
>   	else if (raid5_has_ppl(conf))
>   		ppl_quiesce(conf, quiesce);
> @@ -123,7 +123,7 @@ static inline void log_quiesce(struct r5conf *conf, int quiesce)
>   
>   static inline void log_exit(struct r5conf *conf)
>   {
> -	if (conf->log)
> +	if (rcu_access_pointer(conf->log))
>   		r5l_exit_log(conf);
>   	else if (raid5_has_ppl(conf))
>   		ppl_exit_log(conf);
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 37fe2af77c93..c06c9ef88417 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -7937,7 +7937,7 @@ static int raid5_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
>   	struct md_rdev *tmp;
>   
>   	print_raid5_conf(conf);
> -	if (test_bit(Journal, &rdev->flags) && conf->log) {
> +	if (test_bit(Journal, &rdev->flags) && rcu_access_pointer(conf->log)) {
>   		mddev_suspend(mddev);
>   		log_exit(conf);
>   		mddev_resume(mddev);
> @@ -8019,7 +8019,7 @@ static int raid5_add_disk(struct mddev *mddev, struct md_rdev *rdev)
>   	int last = conf->raid_disks - 1;
>   
>   	if (test_bit(Journal, &rdev->flags)) {
> -		if (conf->log)
> +		if (rcu_access_pointer(conf->log))
>   			return -EBUSY;
>   
>   		rdev->raid_disk = 0;
> diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
> index 638d29863503..04fe5b6f679c 100644
> --- a/drivers/md/raid5.h
> +++ b/drivers/md/raid5.h
> @@ -684,7 +684,7 @@ struct r5conf {
>   	struct r5worker_group	*worker_groups;
>   	int			group_cnt;
>   	int			worker_cnt_per_group;
> -	struct r5l_log		*log;
> +	struct r5l_log		__rcu *log;
>   	void			*log_private;
>   
>   	spinlock_t		pending_bios_lock;

-- 
Donald Buczek
buczek@molgen.mpg.de
Tel: +49 30 8413 1433

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 12/15] md/raid5-cache: Add RCU protection to conf->log accesses
  2022-05-19 19:13 ` [PATCH v1 12/15] md/raid5-cache: Add RCU protection to conf->log accesses Logan Gunthorpe
  2022-05-21 11:50   ` Christoph Hellwig
  2022-05-22  7:31   ` Donald Buczek
@ 2022-05-22  7:32   ` Donald Buczek
  2022-05-24 15:55     ` Logan Gunthorpe
  2 siblings, 1 reply; 41+ messages in thread
From: Donald Buczek @ 2022-05-22  7:32 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-raid, Song Liu
  Cc: Christoph Hellwig, Guoqing Jiang, Xiao Ni, Stephen Bates,
	Martin Oliveira, David Sloan

On 19.05.22 21:13, Logan Gunthorpe wrote:
> The mdadm test 21raid5cache randomly fails with NULL pointer accesses
> conf->log when run repeatedly. conf->log was sort of protected with
> a RCU, but most dereferences were not done with the correct functions.
> 
> Add rcu_read_locks() and rcu_access_pointers() to the appropriate
> places.
> 
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> ---
>   drivers/md/raid5-cache.c | 135 +++++++++++++++++++++++++++------------
>   drivers/md/raid5-log.h   |  14 ++--
>   drivers/md/raid5.c       |   4 +-
>   drivers/md/raid5.h       |   2 +-
>   4 files changed, 104 insertions(+), 51 deletions(-)
> 
> diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
> index f7b402138d16..1dbc7c4b9a15 100644
> --- a/drivers/md/raid5-cache.c
> +++ b/drivers/md/raid5-cache.c
> @@ -254,7 +254,14 @@ static bool __r5c_is_writeback(struct r5l_log *log)
>   
>   bool r5c_is_writeback(struct r5conf *conf)
>   {
> -	return __r5c_is_writeback(conf->log);
> +	struct r5l_log *log;
> +	bool ret;
> +
> +	rcu_read_lock();
> +	log = rcu_dereference(conf->log);
> +	ret = __r5c_is_writeback(log);
> +	rcu_read_unlock();
> +	return ret;
>   }
>   
>   static sector_t r5l_ring_add(struct r5l_log *log, sector_t start, sector_t inc)
> @@ -340,12 +347,23 @@ static void __r5l_wake_reclaim(struct r5l_log *log, sector_t space)
>   	md_wakeup_thread(log->reclaim_thread);
>   }
>   
> +static struct r5l_log *get_log_for_io(struct r5conf *conf)
> +{
> +	/*
> +	 * rcu_dereference_protected is safe because the array will be
> +	 * quiesced before log_exit() so it can't be called while
> +	 * an IO is in progress.
> +	 */
> +	return rcu_dereference_protected(conf->log, 1);
> +}
> +
>   /* Check whether we should flush some stripes to free up stripe cache */
>   void r5c_check_stripe_cache_usage(struct r5conf *conf)
>   {
> +	struct r5l_log *log = get_log_for_io(conf);
>   	int total_cached;
>   
> -	if (!r5c_is_writeback(conf))
> +	if (!__r5c_is_writeback(log))
>   		return;
>   
>   	total_cached = atomic_read(&conf->r5c_cached_partial_stripes) +
> @@ -361,7 +379,7 @@ void r5c_check_stripe_cache_usage(struct r5conf *conf)
>   	 */
>   	if (total_cached > conf->min_nr_stripes * 1 / 2 ||
>   	    atomic_read(&conf->empty_inactive_list_nr) > 0)
> -		__r5l_wake_reclaim(conf->log, 0);
> +		__r5l_wake_reclaim(log, 0);
>   }
>   
>   /*
> @@ -370,8 +388,7 @@ void r5c_check_stripe_cache_usage(struct r5conf *conf)
>    */
>   void r5c_check_cached_full_stripe(struct r5conf *conf)
>   {
> -	if (!r5c_is_writeback(conf))
> -		return;
> +	struct r5l_log *log = get_log_for_io(conf);
>   
>   	/*
>   	 * wake up reclaim for R5C_FULL_STRIPE_FLUSH_BATCH cached stripes
> @@ -380,7 +397,7 @@ void r5c_check_cached_full_stripe(struct r5conf *conf)
>   	if (atomic_read(&conf->r5c_cached_full_stripes) >=
>   	    min(R5C_FULL_STRIPE_FLUSH_BATCH(conf),
>   		conf->chunk_sectors >> RAID5_STRIPE_SHIFT(conf)))
> -		__r5l_wake_reclaim(conf->log, 0);
> +		__r5l_wake_reclaim(log, 0);
>   }
>   
>   /*
> @@ -703,7 +720,7 @@ static void r5c_disable_writeback_async(struct work_struct *work)
>   
>   	/* wait superblock change before suspend */
>   	wait_event(mddev->sb_wait,
> -		   conf->log == NULL ||
> +		   rcu_access_pointer(conf->log) ||

Reversed condition?

I think, some examples in Documentation/RCU/Design/Requirements/Requirements.rst are reversed, too.

Best
   Donald


>   		   (!test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) &&
>   		    (locked = mddev_trylock(mddev))));
>   	if (locked) {
> @@ -1001,7 +1018,7 @@ static inline void r5l_add_no_space_stripe(struct r5l_log *log,
>    */
>   int r5l_write_stripe(struct r5conf *conf, struct stripe_head *sh)
>   {
> -	struct r5l_log *log = conf->log;
> +	struct r5l_log *log = get_log_for_io(conf);
>   	int write_disks = 0;
>   	int data_pages, parity_pages;
>   	int reserve;
> @@ -1099,7 +1116,8 @@ int r5l_write_stripe(struct r5conf *conf, struct stripe_head *sh)
>   
>   void r5l_write_stripe_run(struct r5conf *conf)
>   {
> -	struct r5l_log *log = conf->log;
> +	struct r5l_log *log = get_log_for_io(conf);
> +
>   	if (!log)
>   		return;
>   	mutex_lock(&log->io_mutex);
> @@ -1109,7 +1127,7 @@ void r5l_write_stripe_run(struct r5conf *conf)
>   
>   int r5l_handle_flush_request(struct r5conf *conf, struct bio *bio)
>   {
> -	struct r5l_log *log = conf->log;
> +	struct r5l_log *log = get_log_for_io(conf);
>   
>   	if (log->r5c_journal_mode == R5C_JOURNAL_MODE_WRITE_THROUGH) {
>   		/*
> @@ -1297,7 +1315,7 @@ static void r5l_log_flush_endio(struct bio *bio)
>    */
>   void r5l_flush_stripe_to_raid(struct r5conf *conf)
>   {
> -	struct r5l_log *log = conf->log;
> +	struct r5l_log *log = get_log_for_io(conf);
>   	bool do_flush;
>   
>   	if (!log || !log->need_cache_flush)
> @@ -1412,7 +1430,7 @@ void r5c_flush_cache(struct r5conf *conf, int num)
>   	struct stripe_head *sh, *next;
>   
>   	lockdep_assert_held(&conf->device_lock);
> -	if (!conf->log)
> +	if (!rcu_access_pointer(conf->log))
>   		return;
>   
>   	count = 0;
> @@ -1431,9 +1449,8 @@ void r5c_flush_cache(struct r5conf *conf, int num)
>   	}
>   }
>   
> -static void r5c_do_reclaim(struct r5conf *conf)
> +static void r5c_do_reclaim(struct r5conf *conf, struct r5l_log *log)
>   {
> -	struct r5l_log *log = conf->log;
>   	struct stripe_head *sh;
>   	int count = 0;
>   	unsigned long flags;
> @@ -1441,7 +1458,7 @@ static void r5c_do_reclaim(struct r5conf *conf)
>   	int stripes_to_flush;
>   	int flushing_partial, flushing_full;
>   
> -	if (!r5c_is_writeback(conf))
> +	if (!__r5c_is_writeback(log))
>   		return;
>   
>   	flushing_partial = atomic_read(&conf->r5c_flushing_partial_stripes);
> @@ -1561,22 +1578,30 @@ static void r5l_reclaim_thread(struct md_thread *thread)
>   {
>   	struct mddev *mddev = thread->mddev;
>   	struct r5conf *conf = mddev->private;
> -	struct r5l_log *log = conf->log;
> +	struct r5l_log *log;
>   
> +	/*
> +	 * This is safe, because the reclaim thread will be stopped before
> +	 * the log is freed in log_exit()
> +	 */
> +	log = rcu_dereference_protected(conf->log, 1);
>   	if (!log)
>   		return;
> -	r5c_do_reclaim(conf);
> +
> +	r5c_do_reclaim(conf, log);
>   	r5l_do_reclaim(log);
>   }
>   
>   void r5l_wake_reclaim(struct r5conf *conf, sector_t space)
>   {
> -	__r5l_wake_reclaim(conf->log, space);
> +	struct r5l_log *log = get_log_for_io(conf);
> +
> +	__r5l_wake_reclaim(log, space);
>   }
>   
>   void r5l_quiesce(struct r5conf *conf, int quiesce)
>   {
> -	struct r5l_log *log = conf->log;
> +	struct r5l_log *log = get_log_for_io(conf);
>   	struct mddev *mddev;
>   
>   	if (quiesce) {
> @@ -2534,16 +2559,20 @@ static void r5l_write_super(struct r5l_log *log, sector_t cp)
>   static ssize_t r5c_journal_mode_show(struct mddev *mddev, char *page)
>   {
>   	struct r5conf *conf;
> -	int ret;
> +	struct r5l_log *log;
> +	int ret = 0;
>   
>   	spin_lock(&mddev->lock);
>   	conf = mddev->private;
> -	if (!conf || !conf->log) {
> -		spin_unlock(&mddev->lock);
> -		return 0;
> -	}
> +	if (!conf)
> +		goto out_spin_unlock;
> +
> +	rcu_read_lock();
> +	log = rcu_dereference(conf->log);
> +	if (!log)
> +		goto out;
>   
> -	switch (conf->log->r5c_journal_mode) {
> +	switch (log->r5c_journal_mode) {
>   	case R5C_JOURNAL_MODE_WRITE_THROUGH:
>   		ret = snprintf(
>   			page, PAGE_SIZE, "[%s] %s\n",
> @@ -2559,6 +2588,10 @@ static ssize_t r5c_journal_mode_show(struct mddev *mddev, char *page)
>   	default:
>   		ret = 0;
>   	}
> +
> +out:
> +	rcu_read_unlock();
> +out_spin_unlock:
>   	spin_unlock(&mddev->lock);
>   	return ret;
>   }
> @@ -2572,13 +2605,15 @@ static ssize_t r5c_journal_mode_show(struct mddev *mddev, char *page)
>   int r5c_journal_mode_set(struct mddev *mddev, int mode)
>   {
>   	struct r5conf *conf;
> +	struct r5l_log *log;
> +	int ret = 0;
>   
>   	if (mode < R5C_JOURNAL_MODE_WRITE_THROUGH ||
>   	    mode > R5C_JOURNAL_MODE_WRITE_BACK)
>   		return -EINVAL;
>   
>   	conf = mddev->private;
> -	if (!conf || !conf->log)
> +	if (!conf || !rcu_access_pointer(conf->log))
>   		return -ENODEV;
>   
>   	if (raid5_calc_degraded(conf) > 0 &&
> @@ -2586,12 +2621,19 @@ int r5c_journal_mode_set(struct mddev *mddev, int mode)
>   		return -EINVAL;
>   
>   	mddev_suspend(mddev);
> -	conf->log->r5c_journal_mode = mode;
> +	rcu_read_lock();
> +	log = rcu_dereference(conf->log);
> +	if (log) {
> +		log->r5c_journal_mode = mode;
> +		pr_debug("md/raid:%s: setting r5c cache mode to %d: %s\n",
> +			 mdname(mddev), mode, r5c_journal_mode_str[mode]);
> +	} else {
> +		ret = -ENODEV;
> +	}
> +	rcu_read_unlock();
>   	mddev_resume(mddev);
>   
> -	pr_debug("md/raid:%s: setting r5c cache mode to %d: %s\n",
> -		 mdname(mddev), mode, r5c_journal_mode_str[mode]);
> -	return 0;
> +	return ret;
>   }
>   EXPORT_SYMBOL(r5c_journal_mode_set);
>   
> @@ -2637,7 +2679,7 @@ int r5c_try_caching_write(struct r5conf *conf,
>   			  struct stripe_head_state *s,
>   			  int disks)
>   {
> -	struct r5l_log *log = conf->log;
> +	struct r5l_log *log = get_log_for_io(conf);
>   	int i;
>   	struct r5dev *dev;
>   	int to_cache = 0;
> @@ -2804,7 +2846,7 @@ void r5c_finish_stripe_write_out(struct r5conf *conf,
>   				 struct stripe_head *sh,
>   				 struct stripe_head_state *s)
>   {
> -	struct r5l_log *log = conf->log;
> +	struct r5l_log *log = get_log_for_io(conf);
>   	int i;
>   	int do_wakeup = 0;
>   	sector_t tree_index;
> @@ -2887,7 +2929,7 @@ void r5c_finish_stripe_write_out(struct r5conf *conf,
>   
>   int r5c_cache_data(struct r5conf *conf, struct stripe_head *sh)
>   {
> -	struct r5l_log *log = conf->log;
> +	struct r5l_log *log = get_log_for_io(conf);
>   	int pages = 0;
>   	int reserve;
>   	int i;
> @@ -2943,7 +2985,7 @@ int r5c_cache_data(struct r5conf *conf, struct stripe_head *sh)
>   /* check whether this big stripe is in write back cache. */
>   bool r5c_big_stripe_cached(struct r5conf *conf, sector_t sect)
>   {
> -	struct r5l_log *log = conf->log;
> +	struct r5l_log *log = get_log_for_io(conf);
>   	sector_t tree_index;
>   	void *slot;
>   
> @@ -2953,6 +2995,7 @@ bool r5c_big_stripe_cached(struct r5conf *conf, sector_t sect)
>   	WARN_ON_ONCE(!rcu_read_lock_held());
>   	tree_index = r5c_tree_index(conf, sect);
>   	slot = radix_tree_lookup(&log->big_stripe_tree, tree_index);
> +
>   	return slot != NULL;
>   }
>   
> @@ -3033,30 +3076,38 @@ static int r5l_load_log(struct r5l_log *log)
>   
>   int r5l_start(struct r5conf *conf)
>   {
> -	struct r5l_log *log = conf->log;
> +	struct r5l_log *log;
>   	int ret;
>   
> +	log = rcu_dereference_protected(conf->log,
> +			lockdep_is_held(&conf->mddev->reconfig_mutex));
>   	if (!log)
>   		return 0;
>   
>   	ret = r5l_load_log(log);
>   	if (ret)
>   		r5l_exit_log(conf);
> +
>   	return ret;
>   }
>   
>   void r5c_update_on_rdev_error(struct mddev *mddev, struct md_rdev *rdev)
>   {
>   	struct r5conf *conf = mddev->private;
> -	struct r5l_log *log = conf->log;
> +	struct r5l_log *log;
>   
> +	rcu_read_lock();
> +	log = rcu_dereference(conf->log);
>   	if (!log)
> -		return;
> +		goto out;
>   
>   	if ((raid5_calc_degraded(conf) > 0 ||
>   	     test_bit(Journal, &rdev->flags)) &&
> -	    conf->log->r5c_journal_mode == R5C_JOURNAL_MODE_WRITE_BACK)
> +	    log->r5c_journal_mode == R5C_JOURNAL_MODE_WRITE_BACK)
>   		schedule_work(&log->disable_writeback_work);
> +
> +out:
> +	rcu_read_unlock();
>   }
>   
>   int r5l_init_log(struct r5conf *conf, struct md_rdev *rdev)
> @@ -3164,15 +3215,17 @@ int r5l_init_log(struct r5conf *conf, struct md_rdev *rdev)
>   
>   void r5l_exit_log(struct r5conf *conf)
>   {
> -	struct r5l_log *log = conf->log;
> +	struct r5l_log *log;
>   
> -	conf->log = NULL;
> -	synchronize_rcu();
> +	lockdep_assert_held(&conf->mddev->reconfig_mutex);
> +	log = rcu_replace_pointer(conf->log, NULL, 1);
>   
>   	/* Ensure disable_writeback_work wakes up and exits */
>   	wake_up(&conf->mddev->sb_wait);
>   	flush_work(&log->disable_writeback_work);
>   	md_unregister_thread(&log->reclaim_thread);
> +	synchronize_rcu();
> +
>   	mempool_exit(&log->meta_pool);
>   	bioset_exit(&log->bs);
>   	mempool_exit(&log->io_pool);
> diff --git a/drivers/md/raid5-log.h b/drivers/md/raid5-log.h
> index f26e6f4c7f9a..24b4dbd5b25c 100644
> --- a/drivers/md/raid5-log.h
> +++ b/drivers/md/raid5-log.h
> @@ -58,7 +58,7 @@ static inline int log_stripe(struct stripe_head *sh, struct stripe_head_state *s
>   {
>   	struct r5conf *conf = sh->raid_conf;
>   
> -	if (conf->log) {
> +	if (rcu_access_pointer(conf->log)) {
>   		if (!test_bit(STRIPE_R5C_CACHING, &sh->state)) {
>   			/* writing out phase */
>   			if (s->waiting_extra_page)
> @@ -79,7 +79,7 @@ static inline void log_stripe_write_finished(struct stripe_head *sh)
>   {
>   	struct r5conf *conf = sh->raid_conf;
>   
> -	if (conf->log)
> +	if (rcu_access_pointer(conf->log))
>   		r5l_stripe_write_finished(sh);
>   	else if (raid5_has_ppl(conf))
>   		ppl_stripe_write_finished(sh);
> @@ -87,7 +87,7 @@ static inline void log_stripe_write_finished(struct stripe_head *sh)
>   
>   static inline void log_write_stripe_run(struct r5conf *conf)
>   {
> -	if (conf->log)
> +	if (rcu_access_pointer(conf->log))
>   		r5l_write_stripe_run(conf);
>   	else if (raid5_has_ppl(conf))
>   		ppl_write_stripe_run(conf);
> @@ -95,7 +95,7 @@ static inline void log_write_stripe_run(struct r5conf *conf)
>   
>   static inline void log_flush_stripe_to_raid(struct r5conf *conf)
>   {
> -	if (conf->log)
> +	if (rcu_access_pointer(conf->log))
>   		r5l_flush_stripe_to_raid(conf);
>   	else if (raid5_has_ppl(conf))
>   		ppl_write_stripe_run(conf);
> @@ -105,7 +105,7 @@ static inline int log_handle_flush_request(struct r5conf *conf, struct bio *bio)
>   {
>   	int ret = -ENODEV;
>   
> -	if (conf->log)
> +	if (rcu_access_pointer(conf->log))
>   		ret = r5l_handle_flush_request(conf, bio);
>   	else if (raid5_has_ppl(conf))
>   		ret = ppl_handle_flush_request(bio);
> @@ -115,7 +115,7 @@ static inline int log_handle_flush_request(struct r5conf *conf, struct bio *bio)
>   
>   static inline void log_quiesce(struct r5conf *conf, int quiesce)
>   {
> -	if (conf->log)
> +	if (rcu_access_pointer(conf->log))
>   		r5l_quiesce(conf, quiesce);
>   	else if (raid5_has_ppl(conf))
>   		ppl_quiesce(conf, quiesce);
> @@ -123,7 +123,7 @@ static inline void log_quiesce(struct r5conf *conf, int quiesce)
>   
>   static inline void log_exit(struct r5conf *conf)
>   {
> -	if (conf->log)
> +	if (rcu_access_pointer(conf->log))
>   		r5l_exit_log(conf);
>   	else if (raid5_has_ppl(conf))
>   		ppl_exit_log(conf);
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 37fe2af77c93..c06c9ef88417 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -7937,7 +7937,7 @@ static int raid5_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
>   	struct md_rdev *tmp;
>   
>   	print_raid5_conf(conf);
> -	if (test_bit(Journal, &rdev->flags) && conf->log) {
> +	if (test_bit(Journal, &rdev->flags) && rcu_access_pointer(conf->log)) {
>   		mddev_suspend(mddev);
>   		log_exit(conf);
>   		mddev_resume(mddev);
> @@ -8019,7 +8019,7 @@ static int raid5_add_disk(struct mddev *mddev, struct md_rdev *rdev)
>   	int last = conf->raid_disks - 1;
>   
>   	if (test_bit(Journal, &rdev->flags)) {
> -		if (conf->log)
> +		if (rcu_access_pointer(conf->log))
>   			return -EBUSY;
>   
>   		rdev->raid_disk = 0;
> diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
> index 638d29863503..04fe5b6f679c 100644
> --- a/drivers/md/raid5.h
> +++ b/drivers/md/raid5.h
> @@ -684,7 +684,7 @@ struct r5conf {
>   	struct r5worker_group	*worker_groups;
>   	int			group_cnt;
>   	int			worker_cnt_per_group;
> -	struct r5l_log		*log;
> +	struct r5l_log		__rcu *log;
>   	void			*log_private;
>   
>   	spinlock_t		pending_bios_lock;

-- 
Donald Buczek
buczek@molgen.mpg.de
Tel: +49 30 8413 1433

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 00/15] Bug fixes for mdadm tests
  2022-05-19 19:12 [PATCH v1 00/15] Bug fixes for mdadm tests Logan Gunthorpe
                   ` (14 preceding siblings ...)
  2022-05-19 19:13 ` [PATCH v1 15/15] md: Notify sysfs sync_completed in md_reap_sync_thread() Logan Gunthorpe
@ 2022-05-23  6:28 ` Song Liu
  15 siblings, 0 replies; 41+ messages in thread
From: Song Liu @ 2022-05-23  6:28 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: open list, linux-raid, Christoph Hellwig, Guoqing Jiang, Xiao Ni,
	Stephen Bates, Martin Oliveira, David Sloan

On Thu, May 19, 2022 at 12:13 PM Logan Gunthorpe <logang@deltatee.com> wrote:
>
> Hi,
>
> This series includes fixes to fix all the kernel panics in the mdadm
> tests and some, related, sparse issues. The first 10 patches
> clean refactor the raid5-cache code so that the RCU usage of conf->log
> can be cleaned up which is done in patches 11 and 12 -- fixing some
> actual kernel NULL pointer dereference crashes in the mdadm test.
>
> Patch 13 fixes some of the remaining sparse warnings that are just
> missing __rcu annotations.
>
> Patches 14 and 15 fix a couple additional hangs in an mdadm test.
>
> This series also originally included a patch[1] to fix the
> mddev->private=NULL issue in raid0. That bug caused an mdadm tests to
> crash, but it seems Xiao beat me to the fix by a few days. Hopefully,
> this work to improve mdadm tests will mean these types of bugs will
> be caught much sooner, before merging.

Thanks for the fix! The set looks good overall. Please address feedback
from Christoph and Donald. We should be able to ship it soon.

Thanks,
Song

>
> This series will be followed by another series for mdadm which fixes
> the segfaults and annotates some failing tests to make mdadm tests
> runnable fairly reliably, but I'll wait for a stable hash for this
> series to note the kernel version tested against. Following that,
> v3 of my lock contention series will be sent with more confidence
> of its correctness.
>
> This series is based on the current md/md-next branch as of today
> (6ad84d559b8c). A git branch is available here:
>
>   https://github.com/sbates130272/linux-p2pmem md-bug
>
> Thanks,
>
> Logan
>
> [1] https://github.com/sbates130272/linux-p2pmem/commit/5a538f9f48d77cba111773759256bbc3ccaaa74a
>
> --
>
> Logan Gunthorpe (15):
>   md/raid5-log: Drop extern decorators for function prototypes
>   md/raid5-cache: Refactor r5c_is_writeback() to take a struct r5conf
>   md/raid5-cache: Refactor r5l_start() to take a struct r5conf
>   md/raid5-cache: Refactor r5l_flush_stripe_to_raid() to take a struct
>     r5conf
>   md/raid5-cache: Refactor r5l_wake_reclaim() to take a struct r5conf
>   md/raid5-cache: Refactor remaining functions to take a r5conf
>   md/raid5-ppl: Drop unused argument from ppl_handle_flush_request()
>   md/raid5-cache: Pass the log through to r5c_finish_cache_stripe()
>   md/raid5-cache: Don't pass conf to r5c_calculate_new_cp()
>   md/raid5-cache: Take struct r5l_log in
>     r5c_log_required_to_flush_cache()
>   md/raid5: Ensure array is suspended for calls to log_exit()
>   md/raid5-cache: Add RCU protection to conf->log accesses
>   md/raid5-cache: Annotate pslot with __rcu notation
>   md: Ensure resync is reported after it starts
>   md: Notify sysfs sync_completed in md_reap_sync_thread()
>
>  drivers/md/md.c          |  13 ++-
>  drivers/md/raid5-cache.c | 240 ++++++++++++++++++++++++---------------
>  drivers/md/raid5-log.h   | 103 ++++++++---------
>  drivers/md/raid5-ppl.c   |   2 +-
>  drivers/md/raid5.c       |  50 ++++----
>  drivers/md/raid5.h       |   2 +-
>  6 files changed, 231 insertions(+), 179 deletions(-)
>
>
> base-commit: 6ad84d559b8cbce9ab27a3a2658c438de867c98e
> --
> 2.30.2

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 12/15] md/raid5-cache: Add RCU protection to conf->log accesses
  2022-05-22  7:31   ` Donald Buczek
@ 2022-05-23  6:47     ` Song Liu
  2022-05-23 18:15       ` Song Liu
  2022-05-24 15:59       ` Logan Gunthorpe
  0 siblings, 2 replies; 41+ messages in thread
From: Song Liu @ 2022-05-23  6:47 UTC (permalink / raw)
  To: Donald Buczek
  Cc: Logan Gunthorpe, open list, linux-raid, Christoph Hellwig,
	Guoqing Jiang, Xiao Ni, Stephen Bates, Martin Oliveira,
	David Sloan

On Sun, May 22, 2022 at 12:32 AM Donald Buczek <buczek@molgen.mpg.de> wrote:
>
> On 19.05.22 21:13, Logan Gunthorpe wrote:
> > The mdadm test 21raid5cache randomly fails with NULL pointer accesses
> > conf->log when run repeatedly. conf->log was sort of protected with
> > a RCU, but most dereferences were not done with the correct functions.
> >
> > Add rcu_read_locks() and rcu_access_pointers() to the appropriate
> > places.
> >
> > Signed-off-by: Logan Gunthorpe <logang@deltatee.com>

[...]

> > diff --git a/drivers/md/raid5-log.h b/drivers/md/raid5-log.h
> > index f26e6f4c7f9a..24b4dbd5b25c 100644
> > --- a/drivers/md/raid5-log.h
> > +++ b/drivers/md/raid5-log.h
> > @@ -58,7 +58,7 @@ static inline int log_stripe(struct stripe_head *sh, struct stripe_head_state *s
> >   {
> >       struct r5conf *conf = sh->raid_conf;
> >
> > -     if (conf->log) {
> > +     if (rcu_access_pointer(conf->log)) {
>
>
> A problem here is that `struct r5l_log` of `conf->log` is private to raid5-cache.c and gcc below version 10 (wrongly) regards the `typeof(*p) *local` declaration of __rcu_access_pointer as a dereference:
>
>    CC      drivers/md/raid5.o
>
> In file included from ./include/linux/rculist.h:11:0,
>
>                   from ./include/linux/dcache.h:8,
>
>                   from ./include/linux/fs.h:8,
>
>                   from ./include/linux/highmem.h:5,
>
>                   from ./include/linux/bvec.h:10,
>
>                   from ./include/linux/blk_types.h:10,
>
>                   from ./include/linux/blkdev.h:9,
>
>                   from drivers/md/raid5.c:38:
>
> drivers/md/raid5-log.h: In function ‘log_stripe’:
>
> ./include/linux/rcupdate.h:384:9: error: dereferencing pointer to incomplete type ‘struct r5l_log’
>
>    typeof(*p) *local = (typeof(*p) *__force)READ_ONCE(p); \
>
>           ^
>
> ./include/linux/rcupdate.h:495:31: note: in expansion of macro ‘__rcu_access_pointer’
>
>   #define rcu_access_pointer(p) __rcu_access_pointer((p), __UNIQUE_ID(rcu), __rcu)
>
>                                 ^~~~~~~~~~~~~~~~~~~~
>
> drivers/md/raid5-log.h:61:6: note: in expansion of macro ‘rcu_access_pointer’
>
>    if (rcu_access_pointer(conf->log)) {
>
>        ^~~~~~~~~~~~~~~~~~
>
> make[2]: *** [scripts/Makefile.build:288: drivers/md/raid5.o] Error 1
>
> make[1]: *** [scripts/Makefile.build:550: drivers/md] Error 2
>
> make: *** [Makefile:1834: drivers] Error 2

This is annoying.. And there are a few other cases in raid5-log.h and
raid5.c.

Maybe we should move the definition of r5l_log to raid5-log.h?

Thanks,
Song


>
>
> See https://godbolt.org/z/TPP8MdKbc to test compiler versions with this construct.
>
> Best
>
>    Donald
>
>
> >               if (!test_bit(STRIPE_R5C_CACHING, &sh->state)) {
> >                       /* writing out phase */
> >                       if (s->waiting_extra_page)
> > @@ -79,7 +79,7 @@ static inline void log_stripe_write_finished(struct stripe_head *sh)
> >   {
> >       struct r5conf *conf = sh->raid_conf;
> >
> > -     if (conf->log)
> > +     if (rcu_access_pointer(conf->log))
> >               r5l_stripe_write_finished(sh);
> >       else if (raid5_has_ppl(conf))
> >               ppl_stripe_write_finished(sh);
> > @@ -87,7 +87,7 @@ static inline void log_stripe_write_finished(struct stripe_head *sh)
> >
> >   static inline void log_write_stripe_run(struct r5conf *conf)
> >   {
> > -     if (conf->log)
> > +     if (rcu_access_pointer(conf->log))
> >               r5l_write_stripe_run(conf);
> >       else if (raid5_has_ppl(conf))
> >               ppl_write_stripe_run(conf);
> > @@ -95,7 +95,7 @@ static inline void log_write_stripe_run(struct r5conf *conf)
> >
> >   static inline void log_flush_stripe_to_raid(struct r5conf *conf)
> >   {
> > -     if (conf->log)
> > +     if (rcu_access_pointer(conf->log))
> >               r5l_flush_stripe_to_raid(conf);
> >       else if (raid5_has_ppl(conf))
> >               ppl_write_stripe_run(conf);
> > @@ -105,7 +105,7 @@ static inline int log_handle_flush_request(struct r5conf *conf, struct bio *bio)
> >   {
> >       int ret = -ENODEV;
> >
> > -     if (conf->log)
> > +     if (rcu_access_pointer(conf->log))
> >               ret = r5l_handle_flush_request(conf, bio);
> >       else if (raid5_has_ppl(conf))
> >               ret = ppl_handle_flush_request(bio);
> > @@ -115,7 +115,7 @@ static inline int log_handle_flush_request(struct r5conf *conf, struct bio *bio)
> >
> >   static inline void log_quiesce(struct r5conf *conf, int quiesce)
> >   {
> > -     if (conf->log)
> > +     if (rcu_access_pointer(conf->log))
> >               r5l_quiesce(conf, quiesce);
> >       else if (raid5_has_ppl(conf))
> >               ppl_quiesce(conf, quiesce);
> > @@ -123,7 +123,7 @@ static inline void log_quiesce(struct r5conf *conf, int quiesce)
> >
> >   static inline void log_exit(struct r5conf *conf)
> >   {
> > -     if (conf->log)
> > +     if (rcu_access_pointer(conf->log))
> >               r5l_exit_log(conf);
> >       else if (raid5_has_ppl(conf))
> >               ppl_exit_log(conf);
> > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c

[...]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 12/15] md/raid5-cache: Add RCU protection to conf->log accesses
  2022-05-23  6:47     ` Song Liu
@ 2022-05-23 18:15       ` Song Liu
  2022-05-24 16:14         ` Logan Gunthorpe
  2022-05-24 15:59       ` Logan Gunthorpe
  1 sibling, 1 reply; 41+ messages in thread
From: Song Liu @ 2022-05-23 18:15 UTC (permalink / raw)
  To: Donald Buczek
  Cc: Logan Gunthorpe, open list, linux-raid, Christoph Hellwig,
	Guoqing Jiang, Xiao Ni, Stephen Bates, Martin Oliveira,
	David Sloan

On Sun, May 22, 2022 at 11:47 PM Song Liu <song@kernel.org> wrote:
>
> On Sun, May 22, 2022 at 12:32 AM Donald Buczek <buczek@molgen.mpg.de> wrote:
> >
> > On 19.05.22 21:13, Logan Gunthorpe wrote:
> > > The mdadm test 21raid5cache randomly fails with NULL pointer accesses
> > > conf->log when run repeatedly. conf->log was sort of protected with
> > > a RCU, but most dereferences were not done with the correct functions.
> > >
> > > Add rcu_read_locks() and rcu_access_pointers() to the appropriate
> > > places.
> > >
> > > Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
>
> [...]
>
> > > diff --git a/drivers/md/raid5-log.h b/drivers/md/raid5-log.h
> > > index f26e6f4c7f9a..24b4dbd5b25c 100644
> > > --- a/drivers/md/raid5-log.h
> > > +++ b/drivers/md/raid5-log.h
> > > @@ -58,7 +58,7 @@ static inline int log_stripe(struct stripe_head *sh, struct stripe_head_state *s
> > >   {
> > >       struct r5conf *conf = sh->raid_conf;
> > >
> > > -     if (conf->log) {
> > > +     if (rcu_access_pointer(conf->log)) {
> >
> >
> > A problem here is that `struct r5l_log` of `conf->log` is private to raid5-cache.c and gcc below version 10 (wrongly) regards the `typeof(*p) *local` declaration of __rcu_access_pointer as a dereference:
> >
> >    CC      drivers/md/raid5.o
> >
> > In file included from ./include/linux/rculist.h:11:0,
> >
> >                   from ./include/linux/dcache.h:8,
> >
> >                   from ./include/linux/fs.h:8,
> >
> >                   from ./include/linux/highmem.h:5,
> >
> >                   from ./include/linux/bvec.h:10,
> >
> >                   from ./include/linux/blk_types.h:10,
> >
> >                   from ./include/linux/blkdev.h:9,
> >
> >                   from drivers/md/raid5.c:38:
> >
> > drivers/md/raid5-log.h: In function ‘log_stripe’:
> >
> > ./include/linux/rcupdate.h:384:9: error: dereferencing pointer to incomplete type ‘struct r5l_log’
> >
> >    typeof(*p) *local = (typeof(*p) *__force)READ_ONCE(p); \
> >
> >           ^
> >
> > ./include/linux/rcupdate.h:495:31: note: in expansion of macro ‘__rcu_access_pointer’
> >
> >   #define rcu_access_pointer(p) __rcu_access_pointer((p), __UNIQUE_ID(rcu), __rcu)
> >
> >                                 ^~~~~~~~~~~~~~~~~~~~
> >
> > drivers/md/raid5-log.h:61:6: note: in expansion of macro ‘rcu_access_pointer’
> >
> >    if (rcu_access_pointer(conf->log)) {
> >
> >        ^~~~~~~~~~~~~~~~~~
> >
> > make[2]: *** [scripts/Makefile.build:288: drivers/md/raid5.o] Error 1
> >
> > make[1]: *** [scripts/Makefile.build:550: drivers/md] Error 2
> >
> > make: *** [Makefile:1834: drivers] Error 2
>
> This is annoying.. And there are a few other cases in raid5-log.h and
> raid5.c.
>
> Maybe we should move the definition of r5l_log to raid5-log.h?

Or we can use READ_ONCE(conf->log) for most cases.

Thought?
Song

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 14/15] md: Ensure resync is reported after it starts
  2022-05-21 11:51   ` Christoph Hellwig
@ 2022-05-24 15:45     ` Logan Gunthorpe
  0 siblings, 0 replies; 41+ messages in thread
From: Logan Gunthorpe @ 2022-05-24 15:45 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-kernel, linux-raid, Song Liu, Guoqing Jiang, Xiao Ni,
	Stephen Bates, Martin Oliveira, David Sloan



On 2022-05-21 05:51, Christoph Hellwig wrote:
> On Thu, May 19, 2022 at 01:13:10PM -0600, Logan Gunthorpe wrote:
>> The 07layouts test in mdadm fails on some systems. The failure
>> presents itself as the backup file not being removed before the next
>> layout is grown into:
>>
>>   mdadm: /dev/md0: cannot create backup file /tmp/md-test-backup:
>>       File exists
>>
>> This is because the background mdadm process, which is responsible for
>> cleaning up this backup file gets into an infinite loop waiting for
>> the reshape to start. mdadm checks the mdstat file if a reshape is
>> going and, if it is not, it waits for an event on the file or times
>> out in 5 seconds. On faster machines, the reshape may complete before
>> the 5 seconds times out, and thus the background mdadm process loops
>> waiting for a reshape to start that has already occurred.
>>
>> mdadm reads the mdstat file to start, but mdstat does not report that the
>> reshape has begun, even though it has indeed begun. So the mdstat_wait()
>> call (in mdadm) which polls on the mdstat file won't ever return until
>> timing out.
>>
>> The reason mdstat reports the reshape has started is due to an issue
>> in status_resync(). recovery_active is subtracted from curr_resync which
>> will result in a value of zero for the first chunk of reshaped data, and
>> the resulting read will report no reshape in progress.
>>
>> To fix this, if "resync - recovery_active" is zero: force the value to
>> be 4 so the code reports a resync in progress.
>>
>> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
>> ---
>>  drivers/md/md.c | 12 ++++++++++--
>>  1 file changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>> index 8273ac5eef06..dbac63c8e35c 100644
>> --- a/drivers/md/md.c
>> +++ b/drivers/md/md.c
>> @@ -8022,10 +8022,18 @@ static int status_resync(struct seq_file *seq, struct mddev *mddev)
>>  		if (test_bit(MD_RECOVERY_DONE, &mddev->recovery))
>>  			/* Still cleaning up */
>>  			resync = max_sectors;
>> -	} else if (resync > max_sectors)
>> +	} else if (resync > max_sectors) {
>>  		resync = max_sectors;
>> -	else
>> +	} else {
>>  		resync -= atomic_read(&mddev->recovery_active);
>> +		if (!resync) {
>> +			/*
>> +			 * Resync has started, but if it's zero, ensure
>> +			 * it is still reported, by forcing it to be 4
>> +			 */
>> +			resync = 4;
> 
> Where does this magic 4 come from?

There are a bunch of existing magic numbers in this code. Just before
this hunk there's a if (resync <= 3) and just after the code there's an
if (resync < 3).

There's a comment in md_do_sync() indicating overloaded values for 1 and
2. I can try and turn this into an enum for v2.

Logan


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 12/15] md/raid5-cache: Add RCU protection to conf->log accesses
  2022-05-22  7:32   ` Donald Buczek
@ 2022-05-24 15:55     ` Logan Gunthorpe
  0 siblings, 0 replies; 41+ messages in thread
From: Logan Gunthorpe @ 2022-05-24 15:55 UTC (permalink / raw)
  To: Donald Buczek, linux-kernel, linux-raid, Song Liu
  Cc: Christoph Hellwig, Guoqing Jiang, Xiao Ni, Stephen Bates,
	Martin Oliveira, David Sloan



On 2022-05-22 01:32, Donald Buczek wrote:
>>   /*
>> @@ -703,7 +720,7 @@ static void r5c_disable_writeback_async(struct work_struct *work)
>>   
>>   	/* wait superblock change before suspend */
>>   	wait_event(mddev->sb_wait,
>> -		   conf->log == NULL ||
>> +		   rcu_access_pointer(conf->log) ||
> 
> Reversed condition?
> 
> I think, some examples in Documentation/RCU/Design/Requirements/Requirements.rst are reversed, too.

Oops. Nice Catch!

Thanks,

Logan

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 12/15] md/raid5-cache: Add RCU protection to conf->log accesses
  2022-05-23  6:47     ` Song Liu
  2022-05-23 18:15       ` Song Liu
@ 2022-05-24 15:59       ` Logan Gunthorpe
  2022-05-24 18:13         ` Song Liu
  1 sibling, 1 reply; 41+ messages in thread
From: Logan Gunthorpe @ 2022-05-24 15:59 UTC (permalink / raw)
  To: Song Liu, Donald Buczek
  Cc: open list, linux-raid, Christoph Hellwig, Guoqing Jiang, Xiao Ni,
	Stephen Bates, Martin Oliveira, David Sloan



On 2022-05-23 00:47, Song Liu wrote:
>> A problem here is that `struct r5l_log` of `conf->log` is private to raid5-cache.c and gcc below version 10 (wrongly) regards the `typeof(*p) *local` declaration of __rcu_access_pointer as a dereference:
>>
>>    CC      drivers/md/raid5.o
>>
>> In file included from ./include/linux/rculist.h:11:0,
>>
>>                   from ./include/linux/dcache.h:8,
>>
>>                   from ./include/linux/fs.h:8,
>>
>>                   from ./include/linux/highmem.h:5,
>>
>>                   from ./include/linux/bvec.h:10,
>>
>>                   from ./include/linux/blk_types.h:10,
>>
>>                   from ./include/linux/blkdev.h:9,
>>
>>                   from drivers/md/raid5.c:38:
>>
>> drivers/md/raid5-log.h: In function ‘log_stripe’:
>>
>> ./include/linux/rcupdate.h:384:9: error: dereferencing pointer to incomplete type ‘struct r5l_log’
>>
>>    typeof(*p) *local = (typeof(*p) *__force)READ_ONCE(p); \
>>
>>           ^
>>
>> ./include/linux/rcupdate.h:495:31: note: in expansion of macro ‘__rcu_access_pointer’
>>
>>   #define rcu_access_pointer(p) __rcu_access_pointer((p), __UNIQUE_ID(rcu), __rcu)
>>
>>                                 ^~~~~~~~~~~~~~~~~~~~
>>
>> drivers/md/raid5-log.h:61:6: note: in expansion of macro ‘rcu_access_pointer’
>>
>>    if (rcu_access_pointer(conf->log)) {
>>
>>        ^~~~~~~~~~~~~~~~~~
>>
>> make[2]: *** [scripts/Makefile.build:288: drivers/md/raid5.o] Error 1
>>
>> make[1]: *** [scripts/Makefile.build:550: drivers/md] Error 2
>>
>> make: *** [Makefile:1834: drivers] Error 2
> 
> This is annoying.. And there are a few other cases in raid5-log.h and
> raid5.c.
> 
> Maybe we should move the definition of r5l_log to raid5-log.h?

That's the only solution I can think of, and what I'll likely do for v2.
If anyone has a better solution I'm open to it.

Logan

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 12/15] md/raid5-cache: Add RCU protection to conf->log accesses
  2022-05-23 18:15       ` Song Liu
@ 2022-05-24 16:14         ` Logan Gunthorpe
  0 siblings, 0 replies; 41+ messages in thread
From: Logan Gunthorpe @ 2022-05-24 16:14 UTC (permalink / raw)
  To: Song Liu, Donald Buczek
  Cc: open list, linux-raid, Christoph Hellwig, Guoqing Jiang, Xiao Ni,
	Stephen Bates, Martin Oliveira, David Sloan



On 2022-05-23 12:15, Song Liu wrote:
> On Sun, May 22, 2022 at 11:47 PM Song Liu <song@kernel.org> wrote:
>> Maybe we should move the definition of r5l_log to raid5-log.h?
> 
> Or we can use READ_ONCE(conf->log) for most cases.

READ_ONCE() will still generate sparse errors.

Logan

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 12/15] md/raid5-cache: Add RCU protection to conf->log accesses
  2022-05-24 15:59       ` Logan Gunthorpe
@ 2022-05-24 18:13         ` Song Liu
  0 siblings, 0 replies; 41+ messages in thread
From: Song Liu @ 2022-05-24 18:13 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: Donald Buczek, open list, linux-raid, Christoph Hellwig,
	Guoqing Jiang, Xiao Ni, Stephen Bates, Martin Oliveira,
	David Sloan

On Tue, May 24, 2022 at 8:59 AM Logan Gunthorpe <logang@deltatee.com> wrote:
>
>
>
> On 2022-05-23 00:47, Song Liu wrote:
> >> A problem here is that `struct r5l_log` of `conf->log` is private to raid5-cache.c and gcc below version 10 (wrongly) regards the `typeof(*p) *local` declaration of __rcu_access_pointer as a dereference:
> >>
> >>    CC      drivers/md/raid5.o
> >>
> >> In file included from ./include/linux/rculist.h:11:0,
> >>
> >>                   from ./include/linux/dcache.h:8,
> >>
> >>                   from ./include/linux/fs.h:8,
> >>
> >>                   from ./include/linux/highmem.h:5,
> >>
> >>                   from ./include/linux/bvec.h:10,
> >>
> >>                   from ./include/linux/blk_types.h:10,
> >>
> >>                   from ./include/linux/blkdev.h:9,
> >>
> >>                   from drivers/md/raid5.c:38:
> >>
> >> drivers/md/raid5-log.h: In function ‘log_stripe’:
> >>
> >> ./include/linux/rcupdate.h:384:9: error: dereferencing pointer to incomplete type ‘struct r5l_log’
> >>
> >>    typeof(*p) *local = (typeof(*p) *__force)READ_ONCE(p); \
> >>
> >>           ^
> >>
> >> ./include/linux/rcupdate.h:495:31: note: in expansion of macro ‘__rcu_access_pointer’
> >>
> >>   #define rcu_access_pointer(p) __rcu_access_pointer((p), __UNIQUE_ID(rcu), __rcu)
> >>
> >>                                 ^~~~~~~~~~~~~~~~~~~~
> >>
> >> drivers/md/raid5-log.h:61:6: note: in expansion of macro ‘rcu_access_pointer’
> >>
> >>    if (rcu_access_pointer(conf->log)) {
> >>
> >>        ^~~~~~~~~~~~~~~~~~
> >>
> >> make[2]: *** [scripts/Makefile.build:288: drivers/md/raid5.o] Error 1
> >>
> >> make[1]: *** [scripts/Makefile.build:550: drivers/md] Error 2
> >>
> >> make: *** [Makefile:1834: drivers] Error 2
> >
> > This is annoying.. And there are a few other cases in raid5-log.h and
> > raid5.c.
> >
> > Maybe we should move the definition of r5l_log to raid5-log.h?
>
> That's the only solution I can think of, and what I'll likely do for v2.
> If anyone has a better solution I'm open to it.

Let's move the definition.

Thanks,
Song

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2022-05-24 18:13 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-19 19:12 [PATCH v1 00/15] Bug fixes for mdadm tests Logan Gunthorpe
2022-05-19 19:12 ` [PATCH v1 01/15] md/raid5-log: Drop extern decorators for function prototypes Logan Gunthorpe
2022-05-21 11:36   ` Christoph Hellwig
2022-05-19 19:12 ` [PATCH v1 02/15] md/raid5-cache: Refactor r5c_is_writeback() to take a struct r5conf Logan Gunthorpe
2022-05-21 11:37   ` Christoph Hellwig
2022-05-19 19:12 ` [PATCH v1 03/15] md/raid5-cache: Refactor r5l_start() " Logan Gunthorpe
2022-05-21 11:37   ` Christoph Hellwig
2022-05-19 19:13 ` [PATCH v1 04/15] md/raid5-cache: Refactor r5l_flush_stripe_to_raid() " Logan Gunthorpe
2022-05-21 11:38   ` Christoph Hellwig
2022-05-19 19:13 ` [PATCH v1 05/15] md/raid5-cache: Refactor r5l_wake_reclaim() " Logan Gunthorpe
2022-05-21 11:38   ` Christoph Hellwig
2022-05-19 19:13 ` [PATCH v1 06/15] md/raid5-cache: Refactor remaining functions to take a r5conf Logan Gunthorpe
2022-05-21 11:40   ` Christoph Hellwig
2022-05-19 19:13 ` [PATCH v1 07/15] md/raid5-ppl: Drop unused argument from ppl_handle_flush_request() Logan Gunthorpe
2022-05-21 11:41   ` Christoph Hellwig
2022-05-19 19:13 ` [PATCH v1 08/15] md/raid5-cache: Pass the log through to r5c_finish_cache_stripe() Logan Gunthorpe
2022-05-21 11:42   ` Christoph Hellwig
2022-05-19 19:13 ` [PATCH v1 09/15] md/raid5-cache: Don't pass conf to r5c_calculate_new_cp() Logan Gunthorpe
2022-05-21 11:42   ` Christoph Hellwig
2022-05-19 19:13 ` [PATCH v1 10/15] md/raid5-cache: Take struct r5l_log in r5c_log_required_to_flush_cache() Logan Gunthorpe
2022-05-21 11:43   ` Christoph Hellwig
2022-05-19 19:13 ` [PATCH v1 11/15] md/raid5: Ensure array is suspended for calls to log_exit() Logan Gunthorpe
2022-05-21 11:44   ` Christoph Hellwig
2022-05-19 19:13 ` [PATCH v1 12/15] md/raid5-cache: Add RCU protection to conf->log accesses Logan Gunthorpe
2022-05-21 11:50   ` Christoph Hellwig
2022-05-22  7:31   ` Donald Buczek
2022-05-23  6:47     ` Song Liu
2022-05-23 18:15       ` Song Liu
2022-05-24 16:14         ` Logan Gunthorpe
2022-05-24 15:59       ` Logan Gunthorpe
2022-05-24 18:13         ` Song Liu
2022-05-22  7:32   ` Donald Buczek
2022-05-24 15:55     ` Logan Gunthorpe
2022-05-19 19:13 ` [PATCH v1 13/15] md/raid5-cache: Annotate pslot with __rcu notation Logan Gunthorpe
2022-05-21 11:51   ` Christoph Hellwig
2022-05-19 19:13 ` [PATCH v1 14/15] md: Ensure resync is reported after it starts Logan Gunthorpe
2022-05-21 11:51   ` Christoph Hellwig
2022-05-24 15:45     ` Logan Gunthorpe
2022-05-19 19:13 ` [PATCH v1 15/15] md: Notify sysfs sync_completed in md_reap_sync_thread() Logan Gunthorpe
2022-05-21 11:52   ` Christoph Hellwig
2022-05-23  6:28 ` [PATCH v1 00/15] Bug fixes for mdadm tests Song Liu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.