All of lore.kernel.org
 help / color / mirror / Atom feed
* [md PATCH 00/18] Assorted minor fixes, particularly RCU protection.
@ 2016-06-02  6:19 NeilBrown
  2016-06-02  6:19 ` [md PATCH 16/18] md/multipath: add rcu protection to rdev access in multipath_status NeilBrown
                   ` (18 more replies)
  0 siblings, 19 replies; 29+ messages in thread
From: NeilBrown @ 2016-06-02  6:19 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid

The 'rdev' fields in each personality's config data are often accessed
under rcu_read_lock() protection to avoid races with
->hot_remove_disk() removing the rdev.

Originally this was not necessary during resync/recovery etc because
->hot_remove_disk() was only called from md_check_recovery(), and it
would only make the call if there was no resync etc happening.

However we now call ->hot_remove_disk() (from
remove_and_add_spares()) from other contexts, so there could be a race
in the resync code.

So this patch set adds a lot of extra rcu_read_lock protection and
clean up some other bits and pieces on the way.

My goal was the final patch.  If you have a large raid10 array and
fail half of the devices at once (e.g. unplug a rack with half of the
mirrors) then synchronize_rcu() will be called once for each device,
which can add up to a big delay.  A single call should suffice.
The final patch makes that change.

Thanks,
NeilBrown
---

NeilBrown (18):
      md: disconnect device from personality before trying to remove it.
      md/raid1,raid10: don't recheck "Faulty" flag in read-balance.
      md/raid10: fix refounct imbalance when resyncing an array with a replacement device.
      md/raid10: add rcu protection in raid10_status.
      md/raid10: add rcu protection to rdev access in raid10_sync_request.
      md/raid10: add rcu protection to rdev access during reshape.
      md/raid10: minor code improvement in fix_read_error()
      md/raid10: simplify print_conf a little.
      md/raid10: stop print_conf from being too verbose.
      md/raid1: small cleanup in raid1_end_read/write_request
      md/raid1: small code cleanup in end_sync_write
      md/raid1: add rcu protection to rdev in fix_read_error
      md/raid5: add rcu protection to rdev accesses in handle_failed_sync.
      md/raid5: add rcu protection to rdev accesses in want_replace
      md/raid5: add rcu protection to rdev accesses in raid5_status.
      md/multipath: add rcu protection to rdev access in multipath_status.
      md: be extra careful not to take a reference to a Faulty device.
      md: reduce the number of synchronize_rcu() calls when multiple devices fail.


 drivers/md/md.c        |   29 ++++++-
 drivers/md/md.h        |    5 +
 drivers/md/multipath.c |   29 ++++---
 drivers/md/raid1.c     |  125 ++++++++++++++--------------
 drivers/md/raid10.c    |  213 +++++++++++++++++++++++++++++-------------------
 drivers/md/raid5.c     |   41 ++++++---
 6 files changed, 264 insertions(+), 178 deletions(-)

--
Signature


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [md PATCH 01/18] md: disconnect device from personality before trying to remove it.
  2016-06-02  6:19 [md PATCH 00/18] Assorted minor fixes, particularly RCU protection NeilBrown
                   ` (12 preceding siblings ...)
  2016-06-02  6:19 ` [md PATCH 12/18] md/raid1: add rcu protection to rdev in fix_read_error NeilBrown
@ 2016-06-02  6:19 ` NeilBrown
  2016-06-03 22:31   ` Shaohua Li
  2016-06-02  6:19 ` [md PATCH 13/18] md/raid5: add rcu protection to rdev accesses in handle_failed_sync NeilBrown
                   ` (4 subsequent siblings)
  18 siblings, 1 reply; 29+ messages in thread
From: NeilBrown @ 2016-06-02  6:19 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid

When the HOT_REMOVE_DISK ioctl is used to remove a device, we
call remove_and_add_spares() which will remove it from the personality
if possible.  This improves the chances that the removal will succeed.

When writing "remove" to dev-XX/state, we don't.  So that can fail more easily.

So add the remove_and_add_spares() into "remove" handling.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/md/md.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 866825f10b4c..2d26099e1160 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -2596,6 +2596,8 @@ state_store(struct md_rdev *rdev, const char *buf, size_t len)
 		else
 			err = -EBUSY;
 	} else if (cmd_match(buf, "remove")) {
+		clear_bit(Blocked, &rdev->flags);
+		remove_and_add_spares(rdev->mddev, rdev);
 		if (rdev->raid_disk >= 0)
 			err = -EBUSY;
 		else {



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [md PATCH 02/18] md/raid1, raid10: don't recheck "Faulty" flag in read-balance.
  2016-06-02  6:19 [md PATCH 00/18] Assorted minor fixes, particularly RCU protection NeilBrown
                   ` (14 preceding siblings ...)
  2016-06-02  6:19 ` [md PATCH 13/18] md/raid5: add rcu protection to rdev accesses in handle_failed_sync NeilBrown
@ 2016-06-02  6:19 ` NeilBrown
  2016-06-02  6:19 ` [md PATCH 18/18] md: reduce the number of synchronize_rcu() calls when multiple devices fail NeilBrown
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: NeilBrown @ 2016-06-02  6:19 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid

Re-checking the faulty flag here brings no value.
The comment about "risk" refers to the risk that the device could
be in the process of being removed by ->hot_remove_disk().
However providing that the ->nr_pending count is incremented inside
an rcu_read_locked() region, there is no risk of that happening.

This is because the rdev pointer (in the personalities array) is set
to NULL before synchronize_rcu(), and ->nr_pending is tested
afterwards.  If the rcu_read_locked region happens before the
synchronize_rcu(), the test will see that nr_pending has been incremented.
If it happens afterwards, the rdev pointer will be NULL so there is nothing
to increment.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/md/raid1.c  |    7 -------
 drivers/md/raid10.c |    8 --------
 2 files changed, 15 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index c7c8cde0ab21..a54edbe741ff 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -689,13 +689,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
 		if (!rdev)
 			goto retry;
 		atomic_inc(&rdev->nr_pending);
-		if (test_bit(Faulty, &rdev->flags)) {
-			/* cannot risk returning a device that failed
-			 * before we inc'ed nr_pending
-			 */
-			rdev_dec_pending(rdev, conf->mddev);
-			goto retry;
-		}
 		sectors = best_good_sectors;
 
 		if (conf->mirrors[best_disk].next_seq_sect != this_sector)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index c7de2a53e625..cc9e3813e1a4 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -707,7 +707,6 @@ static struct md_rdev *read_balance(struct r10conf *conf,
 
 	raid10_find_phys(conf, r10_bio);
 	rcu_read_lock();
-retry:
 	sectors = r10_bio->sectors;
 	best_slot = -1;
 	best_rdev = NULL;
@@ -804,13 +803,6 @@ retry:
 
 	if (slot >= 0) {
 		atomic_inc(&rdev->nr_pending);
-		if (test_bit(Faulty, &rdev->flags)) {
-			/* Cannot risk returning a device that failed
-			 * before we inc'ed nr_pending
-			 */
-			rdev_dec_pending(rdev, conf->mddev);
-			goto retry;
-		}
 		r10_bio->read_slot = slot;
 	} else
 		rdev = NULL;



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [md PATCH 04/18] md/raid10: add rcu protection in raid10_status.
  2016-06-02  6:19 [md PATCH 00/18] Assorted minor fixes, particularly RCU protection NeilBrown
  2016-06-02  6:19 ` [md PATCH 16/18] md/multipath: add rcu protection to rdev access in multipath_status NeilBrown
  2016-06-02  6:19 ` [md PATCH 08/18] md/raid10: simplify print_conf a little NeilBrown
@ 2016-06-02  6:19 ` NeilBrown
  2016-06-02  6:19 ` [md PATCH 07/18] md/raid10: minor code improvement in fix_read_error() NeilBrown
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: NeilBrown @ 2016-06-02  6:19 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid

mirrors[].rdev can become NULL at any point unless:
 - a counted reference is held
 - ->reconfig_mutex is held, or
 - rcu_read_lock() is held

raid10_status holds none of these.  So add rcu_read_lock()
protection.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/md/raid10.c |   10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index f371d45f786a..e6f3a11f8f70 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1495,10 +1495,12 @@ static void raid10_status(struct seq_file *seq, struct mddev *mddev)
 	}
 	seq_printf(seq, " [%d/%d] [", conf->geo.raid_disks,
 					conf->geo.raid_disks - mddev->degraded);
-	for (i = 0; i < conf->geo.raid_disks; i++)
-		seq_printf(seq, "%s",
-			      conf->mirrors[i].rdev &&
-			      test_bit(In_sync, &conf->mirrors[i].rdev->flags) ? "U" : "_");
+	rcu_read_lock();
+	for (i = 0; i < conf->geo.raid_disks; i++) {
+		struct md_rdev *rdev = rcu_dereference(conf->mirrors[i].rdev);
+		seq_printf(seq, "%s", rdev && test_bit(In_sync, &rdev->flags) ? "U" : "_");
+	}
+	rcu_read_unlock();
 	seq_printf(seq, "]");
 }
 



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [md PATCH 03/18] md/raid10: fix refounct imbalance when resyncing an array with a replacement device.
  2016-06-02  6:19 [md PATCH 00/18] Assorted minor fixes, particularly RCU protection NeilBrown
                   ` (5 preceding siblings ...)
  2016-06-02  6:19 ` [md PATCH 11/18] md/raid1: small code cleanup in end_sync_write NeilBrown
@ 2016-06-02  6:19 ` NeilBrown
  2016-06-02  6:19 ` [md PATCH 09/18] md/raid10: stop print_conf from being too verbose NeilBrown
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: NeilBrown @ 2016-06-02  6:19 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid

If you have a raid10 with a replacement device that is resyncing -
e.g. after a crash before the replacement was complete - the write to
the replacement will increment nr_pending on the wrong device, which
will lead to strangeness.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/md/raid10.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index cc9e3813e1a4..f371d45f786a 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -3222,7 +3222,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 			bio->bi_error = -EIO;
 
 			sector = r10_bio->devs[i].addr;
-			atomic_inc(&conf->mirrors[d].rdev->nr_pending);
+			atomic_inc(&conf->mirrors[d].replacement->nr_pending);
 			bio->bi_next = biolist;
 			biolist = bio;
 			bio->bi_private = r10_bio;



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [md PATCH 06/18] md/raid10: add rcu protection to rdev access during reshape.
  2016-06-02  6:19 [md PATCH 00/18] Assorted minor fixes, particularly RCU protection NeilBrown
                   ` (10 preceding siblings ...)
  2016-06-02  6:19 ` [md PATCH 15/18] md/raid5: add rcu protection to rdev accesses in raid5_status NeilBrown
@ 2016-06-02  6:19 ` NeilBrown
  2016-06-02  6:19 ` [md PATCH 12/18] md/raid1: add rcu protection to rdev in fix_read_error NeilBrown
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: NeilBrown @ 2016-06-02  6:19 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid

mirrors[].rdev can become NULL at any point unless:
   - a counted reference is held
   - ->reconfig_mutex is held, or
   - rcu_read_lock() is held

Reshape isn't always suitably careful as in the past rdev couldn't be
removed during reshape.  It can now, so add protection.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/md/raid10.c |   30 ++++++++++++++++++++++--------
 1 file changed, 22 insertions(+), 8 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index f6f21a253926..588cf6544f07 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -4352,15 +4352,16 @@ read_more:
 	blist = read_bio;
 	read_bio->bi_next = NULL;
 
+	rcu_read_lock();
 	for (s = 0; s < conf->copies*2; s++) {
 		struct bio *b;
 		int d = r10_bio->devs[s/2].devnum;
 		struct md_rdev *rdev2;
 		if (s&1) {
-			rdev2 = conf->mirrors[d].replacement;
+			rdev2 = rcu_dereference(conf->mirrors[d].replacement);
 			b = r10_bio->devs[s/2].repl_bio;
 		} else {
-			rdev2 = conf->mirrors[d].rdev;
+			rdev2 = rcu_dereference(conf->mirrors[d].rdev);
 			b = r10_bio->devs[s/2].bio;
 		}
 		if (!rdev2 || test_bit(Faulty, &rdev2->flags))
@@ -4405,6 +4406,7 @@ read_more:
 		nr_sectors += len >> 9;
 	}
 bio_full:
+	rcu_read_unlock();
 	r10_bio->sectors = nr_sectors;
 
 	/* Now submit the read */
@@ -4456,16 +4458,20 @@ static void reshape_request_write(struct mddev *mddev, struct r10bio *r10_bio)
 		struct bio *b;
 		int d = r10_bio->devs[s/2].devnum;
 		struct md_rdev *rdev;
+		rcu_read_lock();
 		if (s&1) {
-			rdev = conf->mirrors[d].replacement;
+			rdev = rcu_dereference(conf->mirrors[d].replacement);
 			b = r10_bio->devs[s/2].repl_bio;
 		} else {
-			rdev = conf->mirrors[d].rdev;
+			rdev = rcu_dereference(conf->mirrors[d].rdev);
 			b = r10_bio->devs[s/2].bio;
 		}
-		if (!rdev || test_bit(Faulty, &rdev->flags))
+		if (!rdev || test_bit(Faulty, &rdev->flags)) {
+			rcu_read_unlock();
 			continue;
+		}
 		atomic_inc(&rdev->nr_pending);
+		rcu_read_unlock();
 		md_sync_acct(b->bi_bdev, r10_bio->sectors);
 		atomic_inc(&r10_bio->remaining);
 		b->bi_next = NULL;
@@ -4526,9 +4532,10 @@ static int handle_reshape_read_error(struct mddev *mddev,
 		if (s > (PAGE_SIZE >> 9))
 			s = PAGE_SIZE >> 9;
 
+		rcu_read_lock();
 		while (!success) {
 			int d = r10b->devs[slot].devnum;
-			struct md_rdev *rdev = conf->mirrors[d].rdev;
+			struct md_rdev *rdev = rcu_dereference(conf->mirrors[d].rdev);
 			sector_t addr;
 			if (rdev == NULL ||
 			    test_bit(Faulty, &rdev->flags) ||
@@ -4536,11 +4543,15 @@ static int handle_reshape_read_error(struct mddev *mddev,
 				goto failed;
 
 			addr = r10b->devs[slot].addr + idx * PAGE_SIZE;
+			atomic_inc(&rdev->nr_pending);
+			rcu_read_unlock();
 			success = sync_page_io(rdev,
 					       addr,
 					       s << 9,
 					       bvec[idx].bv_page,
 					       READ, false);
+			rdev_dec_pending(rdev, mddev);
+			rcu_read_lock();
 			if (success)
 				break;
 		failed:
@@ -4550,6 +4561,7 @@ static int handle_reshape_read_error(struct mddev *mddev,
 			if (slot == first_slot)
 				break;
 		}
+		rcu_read_unlock();
 		if (!success) {
 			/* couldn't read this block, must give up */
 			set_bit(MD_RECOVERY_INTR,
@@ -4619,16 +4631,18 @@ static void raid10_finish_reshape(struct mddev *mddev)
 		}
 	} else {
 		int d;
+		rcu_read_lock();
 		for (d = conf->geo.raid_disks ;
 		     d < conf->geo.raid_disks - mddev->delta_disks;
 		     d++) {
-			struct md_rdev *rdev = conf->mirrors[d].rdev;
+			struct md_rdev *rdev = rcu_dereference(conf->mirrors[d].rdev);
 			if (rdev)
 				clear_bit(In_sync, &rdev->flags);
-			rdev = conf->mirrors[d].replacement;
+			rdev = rcu_dereference(conf->mirrors[d].replacement);
 			if (rdev)
 				clear_bit(In_sync, &rdev->flags);
 		}
+		rcu_read_unlock();
 	}
 	mddev->layout = mddev->new_layout;
 	mddev->chunk_sectors = 1 << conf->geo.chunk_shift;



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [md PATCH 05/18] md/raid10: add rcu protection to rdev access in raid10_sync_request.
  2016-06-02  6:19 [md PATCH 00/18] Assorted minor fixes, particularly RCU protection NeilBrown
                   ` (7 preceding siblings ...)
  2016-06-02  6:19 ` [md PATCH 09/18] md/raid10: stop print_conf from being too verbose NeilBrown
@ 2016-06-02  6:19 ` NeilBrown
  2016-06-03 22:33   ` Shaohua Li
  2016-06-02  6:19 ` [md PATCH 14/18] md/raid5: add rcu protection to rdev accesses in want_replace NeilBrown
                   ` (9 subsequent siblings)
  18 siblings, 1 reply; 29+ messages in thread
From: NeilBrown @ 2016-06-02  6:19 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid

mirrors[].rdev can become NULL at any point unless:
  - a counted reference is held
  - ->reconfig_mutex is held, or
  - rcu_read_lock() is held

Previously they could not become NULL during a resync/recovery/reshape either.
However when remove_and_add_spares() was added to hot_remove_disk(), that
changed.

So raid10_sync_request didn't previously need to protect rdev access,
but now it does.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/md/raid10.c |  120 +++++++++++++++++++++++++++++++--------------------
 1 file changed, 72 insertions(+), 48 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index e6f3a11f8f70..f6f21a253926 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2871,11 +2871,14 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 				/* Completed a full sync so the replacements
 				 * are now fully recovered.
 				 */
-				for (i = 0; i < conf->geo.raid_disks; i++)
-					if (conf->mirrors[i].replacement)
-						conf->mirrors[i].replacement
-							->recovery_offset
-							= MaxSector;
+				rcu_read_lock();
+				for (i = 0; i < conf->geo.raid_disks; i++) {
+					struct md_rdev *rdev =
+						rcu_dereference(conf->mirrors[i].replacement);
+					if (rdev)
+						rdev->recovery_offset = MaxSector;
+				}
+				rcu_read_unlock();
 			}
 			conf->fullsync = 0;
 		}
@@ -2934,14 +2937,17 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 			int must_sync;
 			int any_working;
 			struct raid10_info *mirror = &conf->mirrors[i];
+			struct md_rdev *mrdev, *mreplace;
 
-			if ((mirror->rdev == NULL ||
-			     test_bit(In_sync, &mirror->rdev->flags))
-			    &&
-			    (mirror->replacement == NULL ||
-			     test_bit(Faulty,
-				      &mirror->replacement->flags)))
+			rcu_read_lock();
+			mrdev = rcu_dereference(mirror->rdev);
+			mreplace = rcu_dereference(mirror->replacement);
+
+			if ((mrdev == NULL ||
+			     test_bit(In_sync, &mrdev->flags))) {
+				rcu_read_unlock();
 				continue;
+			}
 
 			still_degraded = 0;
 			/* want to reconstruct this device */
@@ -2951,6 +2957,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 				/* last stripe is not complete - don't
 				 * try to recover this sector.
 				 */
+				rcu_read_unlock();
 				continue;
 			}
 			/* Unless we are doing a full sync, or a replacement
@@ -2962,14 +2969,19 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 			if (sync_blocks < max_sync)
 				max_sync = sync_blocks;
 			if (!must_sync &&
-			    mirror->replacement == NULL &&
+			    mreplace == NULL &&
 			    !conf->fullsync) {
 				/* yep, skip the sync_blocks here, but don't assume
 				 * that there will never be anything to do here
 				 */
 				chunks_skipped = -1;
+				rcu_read_unlock();
 				continue;
 			}
+			atomic_inc(&mrdev->nr_pending);
+			if (mreplace)
+				atomic_inc(&mreplace->nr_pending);
+			rcu_read_unlock();
 
 			r10_bio = mempool_alloc(conf->r10buf_pool, GFP_NOIO);
 			r10_bio->state = 0;
@@ -2988,12 +3000,15 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 			/* Need to check if the array will still be
 			 * degraded
 			 */
-			for (j = 0; j < conf->geo.raid_disks; j++)
-				if (conf->mirrors[j].rdev == NULL ||
-				    test_bit(Faulty, &conf->mirrors[j].rdev->flags)) {
+			rcu_read_lock();
+			for (j = 0; j < conf->geo.raid_disks; j++) {
+				struct md_rdev *rdev = rcu_dereference(
+					conf->mirrors[j].rdev);
+				if (rdev == NULL || test_bit(Faulty, &rdev->flags)) {
 					still_degraded = 1;
 					break;
 				}
+			}
 
 			must_sync = bitmap_start_sync(mddev->bitmap, sect,
 						      &sync_blocks, still_degraded);
@@ -3003,15 +3018,15 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 				int k;
 				int d = r10_bio->devs[j].devnum;
 				sector_t from_addr, to_addr;
-				struct md_rdev *rdev;
+				struct md_rdev *rdev =
+					rcu_dereference(conf->mirrors[d].rdev);
 				sector_t sector, first_bad;
 				int bad_sectors;
-				if (!conf->mirrors[d].rdev ||
-				    !test_bit(In_sync, &conf->mirrors[d].rdev->flags))
+				if (!rdev ||
+				    !test_bit(In_sync, &rdev->flags))
 					continue;
 				/* This is where we read from */
 				any_working = 1;
-				rdev = conf->mirrors[d].rdev;
 				sector = r10_bio->devs[j].addr;
 
 				if (is_badblock(rdev, sector, max_sync,
@@ -3050,8 +3065,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 				r10_bio->devs[1].devnum = i;
 				r10_bio->devs[1].addr = to_addr;
 
-				rdev = mirror->rdev;
-				if (!test_bit(In_sync, &rdev->flags)) {
+				if (!test_bit(In_sync, &mrdev->flags)) {
 					bio = r10_bio->devs[1].bio;
 					bio_reset(bio);
 					bio->bi_next = biolist;
@@ -3060,8 +3074,8 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 					bio->bi_end_io = end_sync_write;
 					bio->bi_rw = WRITE;
 					bio->bi_iter.bi_sector = to_addr
-						+ rdev->data_offset;
-					bio->bi_bdev = rdev->bdev;
+						+ mrdev->data_offset;
+					bio->bi_bdev = mrdev->bdev;
 					atomic_inc(&r10_bio->remaining);
 				} else
 					r10_bio->devs[1].bio->bi_end_io = NULL;
@@ -3070,8 +3084,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 				bio = r10_bio->devs[1].repl_bio;
 				if (bio)
 					bio->bi_end_io = NULL;
-				rdev = mirror->replacement;
-				/* Note: if rdev != NULL, then bio
+				/* Note: if mreplace != NULL, then bio
 				 * cannot be NULL as r10buf_pool_alloc will
 				 * have allocated it.
 				 * So the second test here is pointless.
@@ -3079,8 +3092,8 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 				 * this comment keeps human reviewers
 				 * happy.
 				 */
-				if (rdev == NULL || bio == NULL ||
-				    test_bit(Faulty, &rdev->flags))
+				if (mreplace == NULL || bio == NULL ||
+				    test_bit(Faulty, &mreplace->flags))
 					break;
 				bio_reset(bio);
 				bio->bi_next = biolist;
@@ -3089,11 +3102,12 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 				bio->bi_end_io = end_sync_write;
 				bio->bi_rw = WRITE;
 				bio->bi_iter.bi_sector = to_addr +
-					rdev->data_offset;
-				bio->bi_bdev = rdev->bdev;
+					mreplace->data_offset;
+				bio->bi_bdev = mreplace->bdev;
 				atomic_inc(&r10_bio->remaining);
 				break;
 			}
+			rcu_read_unlock();
 			if (j == conf->copies) {
 				/* Cannot recover, so abort the recovery or
 				 * record a bad block */
@@ -3106,15 +3120,15 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 						if (r10_bio->devs[k].devnum == i)
 							break;
 					if (!test_bit(In_sync,
-						      &mirror->rdev->flags)
+						      &mrdev->flags)
 					    && !rdev_set_badblocks(
-						    mirror->rdev,
+						    mrdev,
 						    r10_bio->devs[k].addr,
 						    max_sync, 0))
 						any_working = 0;
-					if (mirror->replacement &&
+					if (mreplace &&
 					    !rdev_set_badblocks(
-						    mirror->replacement,
+						    mreplace,
 						    r10_bio->devs[k].addr,
 						    max_sync, 0))
 						any_working = 0;
@@ -3132,8 +3146,14 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 				if (rb2)
 					atomic_dec(&rb2->remaining);
 				r10_bio = rb2;
+				rdev_dec_pending(mrdev, mddev);
+				if (mreplace)
+					rdev_dec_pending(mreplace, mddev);
 				break;
 			}
+			rdev_dec_pending(mrdev, mddev);
+			if (mreplace)
+				rdev_dec_pending(mreplace, mddev);
 		}
 		if (biolist == NULL) {
 			while (r10_bio) {
@@ -3178,6 +3198,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 			int d = r10_bio->devs[i].devnum;
 			sector_t first_bad, sector;
 			int bad_sectors;
+			struct md_rdev *rdev;
 
 			if (r10_bio->devs[i].repl_bio)
 				r10_bio->devs[i].repl_bio->bi_end_io = NULL;
@@ -3185,12 +3206,14 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 			bio = r10_bio->devs[i].bio;
 			bio_reset(bio);
 			bio->bi_error = -EIO;
-			if (conf->mirrors[d].rdev == NULL ||
-			    test_bit(Faulty, &conf->mirrors[d].rdev->flags))
+			rcu_read_lock();
+			rdev = rcu_dereference(conf->mirrors[d].rdev);
+			if (rdev == NULL || test_bit(Faulty, &rdev->flags)) {
+				rcu_read_unlock();
 				continue;
+			}
 			sector = r10_bio->devs[i].addr;
-			if (is_badblock(conf->mirrors[d].rdev,
-					sector, max_sync,
+			if (is_badblock(rdev, sector, max_sync,
 					&first_bad, &bad_sectors)) {
 				if (first_bad > sector)
 					max_sync = first_bad - sector;
@@ -3198,25 +3221,28 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 					bad_sectors -= (sector - first_bad);
 					if (max_sync > bad_sectors)
 						max_sync = bad_sectors;
+					rcu_read_unlock();
 					continue;
 				}
 			}
-			atomic_inc(&conf->mirrors[d].rdev->nr_pending);
+			atomic_inc(&rdev->nr_pending);
 			atomic_inc(&r10_bio->remaining);
 			bio->bi_next = biolist;
 			biolist = bio;
 			bio->bi_private = r10_bio;
 			bio->bi_end_io = end_sync_read;
 			bio->bi_rw = READ;
-			bio->bi_iter.bi_sector = sector +
-				conf->mirrors[d].rdev->data_offset;
-			bio->bi_bdev = conf->mirrors[d].rdev->bdev;
+			bio->bi_iter.bi_sector = sector + rdev->data_offset;
+			bio->bi_bdev = rdev->bdev;
 			count++;
 
-			if (conf->mirrors[d].replacement == NULL ||
-			    test_bit(Faulty,
-				     &conf->mirrors[d].replacement->flags))
+			rdev = rcu_dereference(conf->mirrors[d].replacement);
+			if (rdev == NULL || test_bit(Faulty, &rdev->flags)) {
+				rcu_read_unlock();
 				continue;
+			}
+			atomic_inc(&rdev->nr_pending);
+			rcu_read_unlock();
 
 			/* Need to set up for writing to the replacement */
 			bio = r10_bio->devs[i].repl_bio;
@@ -3224,15 +3250,13 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 			bio->bi_error = -EIO;
 
 			sector = r10_bio->devs[i].addr;
-			atomic_inc(&conf->mirrors[d].replacement->nr_pending);
 			bio->bi_next = biolist;
 			biolist = bio;
 			bio->bi_private = r10_bio;
 			bio->bi_end_io = end_sync_write;
 			bio->bi_rw = WRITE;
-			bio->bi_iter.bi_sector = sector +
-				conf->mirrors[d].replacement->data_offset;
-			bio->bi_bdev = conf->mirrors[d].replacement->bdev;
+			bio->bi_iter.bi_sector = sector + rdev->data_offset;
+			bio->bi_bdev = rdev->bdev;
 			count++;
 		}
 



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [md PATCH 07/18] md/raid10: minor code improvement in fix_read_error()
  2016-06-02  6:19 [md PATCH 00/18] Assorted minor fixes, particularly RCU protection NeilBrown
                   ` (2 preceding siblings ...)
  2016-06-02  6:19 ` [md PATCH 04/18] md/raid10: add rcu protection in raid10_status NeilBrown
@ 2016-06-02  6:19 ` NeilBrown
  2016-06-02  6:19 ` [md PATCH 10/18] md/raid1: small cleanup in raid1_end_read/write_request NeilBrown
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: NeilBrown @ 2016-06-02  6:19 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid

rdev already holds conf->mirrors[d].rdev, so no need to load it again.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/md/raid10.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 588cf6544f07..819c6d5767a4 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2262,7 +2262,7 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
 		printk(KERN_NOTICE
 		       "md/raid10:%s: %s: Failing raid device\n",
 		       mdname(mddev), b);
-		md_error(mddev, conf->mirrors[d].rdev);
+		md_error(mddev, rdev);
 		r10_bio->devs[r10_bio->read_slot].bio = IO_BLOCKED;
 		return;
 	}



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [md PATCH 08/18] md/raid10: simplify print_conf a little.
  2016-06-02  6:19 [md PATCH 00/18] Assorted minor fixes, particularly RCU protection NeilBrown
  2016-06-02  6:19 ` [md PATCH 16/18] md/multipath: add rcu protection to rdev access in multipath_status NeilBrown
@ 2016-06-02  6:19 ` NeilBrown
  2016-06-02  6:19 ` [md PATCH 04/18] md/raid10: add rcu protection in raid10_status NeilBrown
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: NeilBrown @ 2016-06-02  6:19 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid

'tmp' is only ever used to extract 'tmp->rdev', so just use 'rdev' directly.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/md/raid10.c |   14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 819c6d5767a4..61091ab02a4b 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1598,7 +1598,7 @@ static void raid10_error(struct mddev *mddev, struct md_rdev *rdev)
 static void print_conf(struct r10conf *conf)
 {
 	int i;
-	struct raid10_info *tmp;
+	struct md_rdev *rdev;
 
 	printk(KERN_DEBUG "RAID10 conf printout:\n");
 	if (!conf) {
@@ -1608,14 +1608,16 @@ static void print_conf(struct r10conf *conf)
 	printk(KERN_DEBUG " --- wd:%d rd:%d\n", conf->geo.raid_disks - conf->mddev->degraded,
 		conf->geo.raid_disks);
 
+	/* This is only called with ->reconfix_mutex held, so
+	 * rcu protection of rdev is not needed */
 	for (i = 0; i < conf->geo.raid_disks; i++) {
 		char b[BDEVNAME_SIZE];
-		tmp = conf->mirrors + i;
-		if (tmp->rdev)
+		rdev = conf->mirrors[i].rdev;
+		if (rdev)
 			printk(KERN_DEBUG " disk %d, wo:%d, o:%d, dev:%s\n",
-				i, !test_bit(In_sync, &tmp->rdev->flags),
-			        !test_bit(Faulty, &tmp->rdev->flags),
-				bdevname(tmp->rdev->bdev,b));
+				i, !test_bit(In_sync, &rdev->flags),
+			        !test_bit(Faulty, &rdev->flags),
+				bdevname(rdev->bdev,b));
 	}
 }
 



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [md PATCH 09/18] md/raid10: stop print_conf from being too verbose.
  2016-06-02  6:19 [md PATCH 00/18] Assorted minor fixes, particularly RCU protection NeilBrown
                   ` (6 preceding siblings ...)
  2016-06-02  6:19 ` [md PATCH 03/18] md/raid10: fix refounct imbalance when resyncing an array with a replacement device NeilBrown
@ 2016-06-02  6:19 ` NeilBrown
  2016-06-02 18:47   ` John Stoffel
  2016-06-02  6:19 ` [md PATCH 05/18] md/raid10: add rcu protection to rdev access in raid10_sync_request NeilBrown
                   ` (10 subsequent siblings)
  18 siblings, 1 reply; 29+ messages in thread
From: NeilBrown @ 2016-06-02  6:19 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid

raid10 arrays can usefully be very large.  When they are, the noise generated by print_conf
can over-whelm logs.
So truncate the listing at 16 devices.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/md/raid10.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 61091ab02a4b..5d40612d6219 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1610,7 +1610,7 @@ static void print_conf(struct r10conf *conf)
 
 	/* This is only called with ->reconfix_mutex held, so
 	 * rcu protection of rdev is not needed */
-	for (i = 0; i < conf->geo.raid_disks; i++) {
+	for (i = 0; i < conf->geo.raid_disks && i < 16; i++) {
 		char b[BDEVNAME_SIZE];
 		rdev = conf->mirrors[i].rdev;
 		if (rdev)
@@ -1619,6 +1619,8 @@ static void print_conf(struct r10conf *conf)
 			        !test_bit(Faulty, &rdev->flags),
 				bdevname(rdev->bdev,b));
 	}
+	if (conf->geo.raid_disks > 16)
+		printk(KERN_DEBUG " remaining devices excluded for brevity\n");
 }
 
 static void close_sync(struct r10conf *conf)



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [md PATCH 12/18] md/raid1: add rcu protection to rdev in fix_read_error
  2016-06-02  6:19 [md PATCH 00/18] Assorted minor fixes, particularly RCU protection NeilBrown
                   ` (11 preceding siblings ...)
  2016-06-02  6:19 ` [md PATCH 06/18] md/raid10: add rcu protection to rdev access during reshape NeilBrown
@ 2016-06-02  6:19 ` NeilBrown
  2016-06-02  6:19 ` [md PATCH 01/18] md: disconnect device from personality before trying to remove it NeilBrown
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: NeilBrown @ 2016-06-02  6:19 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid

Since remove_and_add_spares() was added to hot_remove_disk() it has
been possible for an rdev to be hot-removed while fix_read_error()
was running, so we need to be more careful, and take a reference to
the rdev while performing IO.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/md/raid1.c |   52 ++++++++++++++++++++++++++++++++--------------------
 1 file changed, 32 insertions(+), 20 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index cc53120e4d68..0ee901ccc9c5 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2056,29 +2056,30 @@ static void fix_read_error(struct r1conf *conf, int read_disk,
 			s = PAGE_SIZE >> 9;
 
 		do {
-			/* Note: no rcu protection needed here
-			 * as this is synchronous in the raid1d thread
-			 * which is the thread that might remove
-			 * a device.  If raid1d ever becomes multi-threaded....
-			 */
 			sector_t first_bad;
 			int bad_sectors;
 
-			rdev = conf->mirrors[d].rdev;
+			rcu_read_lock();
+			rdev = rcu_dereference(conf->mirrors[d].rdev);
 			if (rdev &&
 			    (test_bit(In_sync, &rdev->flags) ||
 			     (!test_bit(Faulty, &rdev->flags) &&
 			      rdev->recovery_offset >= sect + s)) &&
 			    is_badblock(rdev, sect, s,
-					&first_bad, &bad_sectors) == 0 &&
-			    sync_page_io(rdev, sect, s<<9,
-					 conf->tmppage, READ, false))
-				success = 1;
-			else {
-				d++;
-				if (d == conf->raid_disks * 2)
-					d = 0;
-			}
+					&first_bad, &bad_sectors) == 0) {
+				atomic_inc(&rdev->nr_pending);
+				rcu_read_unlock();
+				if (sync_page_io(rdev, sect, s<<9,
+						 conf->tmppage, READ, false))
+					success = 1;
+				rdev_dec_pending(rdev, mddev);
+				if (success)
+					break;
+			} else
+				rcu_read_unlock();
+			d++;
+			if (d == conf->raid_disks * 2)
+				d = 0;
 		} while (!success && d != read_disk);
 
 		if (!success) {
@@ -2094,11 +2095,17 @@ static void fix_read_error(struct r1conf *conf, int read_disk,
 			if (d==0)
 				d = conf->raid_disks * 2;
 			d--;
-			rdev = conf->mirrors[d].rdev;
+			rcu_read_lock();
+			rdev = rcu_dereference(conf->mirrors[d].rdev);
 			if (rdev &&
-			    !test_bit(Faulty, &rdev->flags))
+			    !test_bit(Faulty, &rdev->flags)) {
+				atomic_inc(&rdev->nr_pending);
+				rcu_read_unlock();
 				r1_sync_page_io(rdev, sect, s,
 						conf->tmppage, WRITE);
+				rdev_dec_pending(rdev, mddev);
+			} else
+				rcu_read_unlock();
 		}
 		d = start;
 		while (d != read_disk) {
@@ -2106,9 +2113,12 @@ static void fix_read_error(struct r1conf *conf, int read_disk,
 			if (d==0)
 				d = conf->raid_disks * 2;
 			d--;
-			rdev = conf->mirrors[d].rdev;
+			rcu_read_lock();
+			rdev = rcu_dereference(conf->mirrors[d].rdev);
 			if (rdev &&
 			    !test_bit(Faulty, &rdev->flags)) {
+				atomic_inc(&rdev->nr_pending);
+				rcu_read_unlock();
 				if (r1_sync_page_io(rdev, sect, s,
 						    conf->tmppage, READ)) {
 					atomic_add(s, &rdev->corrected_errors);
@@ -2117,10 +2127,12 @@ static void fix_read_error(struct r1conf *conf, int read_disk,
 					       "(%d sectors at %llu on %s)\n",
 					       mdname(mddev), s,
 					       (unsigned long long)(sect +
-					           rdev->data_offset),
+								    rdev->data_offset),
 					       bdevname(rdev->bdev, b));
 				}
-			}
+				rdev_dec_pending(rdev, mddev);
+			} else
+				rcu_read_unlock();
 		}
 		sectors -= s;
 		sect += s;



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [md PATCH 10/18] md/raid1: small cleanup in raid1_end_read/write_request
  2016-06-02  6:19 [md PATCH 00/18] Assorted minor fixes, particularly RCU protection NeilBrown
                   ` (3 preceding siblings ...)
  2016-06-02  6:19 ` [md PATCH 07/18] md/raid10: minor code improvement in fix_read_error() NeilBrown
@ 2016-06-02  6:19 ` NeilBrown
  2016-06-02  6:19 ` [md PATCH 11/18] md/raid1: small code cleanup in end_sync_write NeilBrown
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: NeilBrown @ 2016-06-02  6:19 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid

Both functions use conf->mirrors[mirror].rdev several times, so
improve readability by storing this in a local variable.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/md/raid1.c |   35 +++++++++++++++--------------------
 1 file changed, 15 insertions(+), 20 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index a54edbe741ff..9585a6b62123 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -319,14 +319,13 @@ static void raid1_end_read_request(struct bio *bio)
 {
 	int uptodate = !bio->bi_error;
 	struct r1bio *r1_bio = bio->bi_private;
-	int mirror;
 	struct r1conf *conf = r1_bio->mddev->private;
+	struct md_rdev *rdev = conf->mirrors[r1_bio->read_disk].rdev;
 
-	mirror = r1_bio->read_disk;
 	/*
 	 * this branch is our 'one mirror IO has finished' event handler:
 	 */
-	update_head_pos(mirror, r1_bio);
+	update_head_pos(r1_bio->read_disk, r1_bio);
 
 	if (uptodate)
 		set_bit(R1BIO_Uptodate, &r1_bio->state);
@@ -339,14 +338,14 @@ static void raid1_end_read_request(struct bio *bio)
 		spin_lock_irqsave(&conf->device_lock, flags);
 		if (r1_bio->mddev->degraded == conf->raid_disks ||
 		    (r1_bio->mddev->degraded == conf->raid_disks-1 &&
-		     test_bit(In_sync, &conf->mirrors[mirror].rdev->flags)))
+		     test_bit(In_sync, &rdev->flags)))
 			uptodate = 1;
 		spin_unlock_irqrestore(&conf->device_lock, flags);
 	}
 
 	if (uptodate) {
 		raid_end_bio_io(r1_bio);
-		rdev_dec_pending(conf->mirrors[mirror].rdev, conf->mddev);
+		rdev_dec_pending(rdev, conf->mddev);
 	} else {
 		/*
 		 * oops, read error:
@@ -356,7 +355,7 @@ static void raid1_end_read_request(struct bio *bio)
 			KERN_ERR "md/raid1:%s: %s: "
 			"rescheduling sector %llu\n",
 			mdname(conf->mddev),
-			bdevname(conf->mirrors[mirror].rdev->bdev,
+			bdevname(rdev->bdev,
 				 b),
 			(unsigned long long)r1_bio->sector);
 		set_bit(R1BIO_ReadError, &r1_bio->state);
@@ -403,20 +402,18 @@ static void r1_bio_write_done(struct r1bio *r1_bio)
 static void raid1_end_write_request(struct bio *bio)
 {
 	struct r1bio *r1_bio = bio->bi_private;
-	int mirror, behind = test_bit(R1BIO_BehindIO, &r1_bio->state);
+	int behind = test_bit(R1BIO_BehindIO, &r1_bio->state);
 	struct r1conf *conf = r1_bio->mddev->private;
 	struct bio *to_put = NULL;
-
-	mirror = find_bio_disk(r1_bio, bio);
+	int mirror = find_bio_disk(r1_bio, bio);
+	struct md_rdev *rdev = conf->mirrors[mirror].rdev;
 
 	/*
 	 * 'one mirror IO has finished' event handler:
 	 */
 	if (bio->bi_error) {
-		set_bit(WriteErrorSeen,
-			&conf->mirrors[mirror].rdev->flags);
-		if (!test_and_set_bit(WantReplacement,
-				      &conf->mirrors[mirror].rdev->flags))
+		set_bit(WriteErrorSeen,	&rdev->flags);
+		if (!test_and_set_bit(WantReplacement, &rdev->flags))
 			set_bit(MD_RECOVERY_NEEDED, &
 				conf->mddev->recovery);
 
@@ -445,13 +442,12 @@ static void raid1_end_write_request(struct bio *bio)
 		 * before rdev->recovery_offset, but for simplicity we don't
 		 * check this here.
 		 */
-		if (test_bit(In_sync, &conf->mirrors[mirror].rdev->flags) &&
-		    !test_bit(Faulty, &conf->mirrors[mirror].rdev->flags))
+		if (test_bit(In_sync, &rdev->flags) &&
+		    !test_bit(Faulty, &rdev->flags))
 			set_bit(R1BIO_Uptodate, &r1_bio->state);
 
 		/* Maybe we can clear some bad blocks. */
-		if (is_badblock(conf->mirrors[mirror].rdev,
-				r1_bio->sector, r1_bio->sectors,
+		if (is_badblock(rdev, r1_bio->sector, r1_bio->sectors,
 				&first_bad, &bad_sectors)) {
 			r1_bio->bios[mirror] = IO_MADE_GOOD;
 			set_bit(R1BIO_MadeGood, &r1_bio->state);
@@ -459,7 +455,7 @@ static void raid1_end_write_request(struct bio *bio)
 	}
 
 	if (behind) {
-		if (test_bit(WriteMostly, &conf->mirrors[mirror].rdev->flags))
+		if (test_bit(WriteMostly, &rdev->flags))
 			atomic_dec(&r1_bio->behind_remaining);
 
 		/*
@@ -483,8 +479,7 @@ static void raid1_end_write_request(struct bio *bio)
 		}
 	}
 	if (r1_bio->bios[mirror] == NULL)
-		rdev_dec_pending(conf->mirrors[mirror].rdev,
-				 conf->mddev);
+		rdev_dec_pending(rdev, conf->mddev);
 
 	/*
 	 * Let's see if all mirrored write operations have finished



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [md PATCH 13/18] md/raid5: add rcu protection to rdev accesses in handle_failed_sync.
  2016-06-02  6:19 [md PATCH 00/18] Assorted minor fixes, particularly RCU protection NeilBrown
                   ` (13 preceding siblings ...)
  2016-06-02  6:19 ` [md PATCH 01/18] md: disconnect device from personality before trying to remove it NeilBrown
@ 2016-06-02  6:19 ` NeilBrown
  2016-06-02  6:19 ` [md PATCH 02/18] md/raid1, raid10: don't recheck "Faulty" flag in read-balance NeilBrown
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: NeilBrown @ 2016-06-02  6:19 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid

The rdev could be freed while handle_failed_sync is running, so
rcu protection is needed.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/md/raid5.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 8959e6dd31dd..e9ad22d64bd7 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3210,15 +3210,16 @@ handle_failed_sync(struct r5conf *conf, struct stripe_head *sh,
 		/* During recovery devices cannot be removed, so
 		 * locking and refcounting of rdevs is not needed
 		 */
+		rcu_read_lock();
 		for (i = 0; i < conf->raid_disks; i++) {
-			struct md_rdev *rdev = conf->disks[i].rdev;
+			struct md_rdev *rdev = rcu_dereference(conf->disks[i].rdev);
 			if (rdev
 			    && !test_bit(Faulty, &rdev->flags)
 			    && !test_bit(In_sync, &rdev->flags)
 			    && !rdev_set_badblocks(rdev, sh->sector,
 						   STRIPE_SECTORS, 0))
 				abort = 1;
-			rdev = conf->disks[i].replacement;
+			rdev = rcu_dereference(conf->disks[i].replacement);
 			if (rdev
 			    && !test_bit(Faulty, &rdev->flags)
 			    && !test_bit(In_sync, &rdev->flags)
@@ -3226,6 +3227,7 @@ handle_failed_sync(struct r5conf *conf, struct stripe_head *sh,
 						   STRIPE_SECTORS, 0))
 				abort = 1;
 		}
+		rcu_read_unlock();
 		if (abort)
 			conf->recovery_disabled =
 				conf->mddev->recovery_disabled;



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [md PATCH 11/18] md/raid1: small code cleanup in end_sync_write
  2016-06-02  6:19 [md PATCH 00/18] Assorted minor fixes, particularly RCU protection NeilBrown
                   ` (4 preceding siblings ...)
  2016-06-02  6:19 ` [md PATCH 10/18] md/raid1: small cleanup in raid1_end_read/write_request NeilBrown
@ 2016-06-02  6:19 ` NeilBrown
  2016-06-02  6:19 ` [md PATCH 03/18] md/raid10: fix refounct imbalance when resyncing an array with a replacement device NeilBrown
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: NeilBrown @ 2016-06-02  6:19 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid

'mirror' is only used to find 'rdev', several times.
So just find 'rdev' once, and use it instead.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/md/raid1.c |   14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 9585a6b62123..cc53120e4d68 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1709,11 +1709,9 @@ static void end_sync_write(struct bio *bio)
 	struct r1bio *r1_bio = bio->bi_private;
 	struct mddev *mddev = r1_bio->mddev;
 	struct r1conf *conf = mddev->private;
-	int mirror=0;
 	sector_t first_bad;
 	int bad_sectors;
-
-	mirror = find_bio_disk(r1_bio, bio);
+	struct md_rdev *rdev = conf->mirrors[find_bio_disk(r1_bio, bio)].rdev;
 
 	if (!uptodate) {
 		sector_t sync_blocks = 0;
@@ -1726,16 +1724,12 @@ static void end_sync_write(struct bio *bio)
 			s += sync_blocks;
 			sectors_to_go -= sync_blocks;
 		} while (sectors_to_go > 0);
-		set_bit(WriteErrorSeen,
-			&conf->mirrors[mirror].rdev->flags);
-		if (!test_and_set_bit(WantReplacement,
-				      &conf->mirrors[mirror].rdev->flags))
+		set_bit(WriteErrorSeen, &rdev->flags);
+		if (!test_and_set_bit(WantReplacement, &rdev->flags))
 			set_bit(MD_RECOVERY_NEEDED, &
 				mddev->recovery);
 		set_bit(R1BIO_WriteError, &r1_bio->state);
-	} else if (is_badblock(conf->mirrors[mirror].rdev,
-			       r1_bio->sector,
-			       r1_bio->sectors,
+	} else if (is_badblock(rdev, r1_bio->sector, r1_bio->sectors,
 			       &first_bad, &bad_sectors) &&
 		   !is_badblock(conf->mirrors[r1_bio->read_disk].rdev,
 				r1_bio->sector,



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [md PATCH 14/18] md/raid5: add rcu protection to rdev accesses in want_replace
  2016-06-02  6:19 [md PATCH 00/18] Assorted minor fixes, particularly RCU protection NeilBrown
                   ` (8 preceding siblings ...)
  2016-06-02  6:19 ` [md PATCH 05/18] md/raid10: add rcu protection to rdev access in raid10_sync_request NeilBrown
@ 2016-06-02  6:19 ` NeilBrown
  2016-06-02  6:19 ` [md PATCH 15/18] md/raid5: add rcu protection to rdev accesses in raid5_status NeilBrown
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: NeilBrown @ 2016-06-02  6:19 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid

Being in the middle of resync is no longer protection against failed
rdevs disappearing.  So add rcu protection.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/md/raid5.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index e9ad22d64bd7..7da89ee22a31 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3239,15 +3239,16 @@ static int want_replace(struct stripe_head *sh, int disk_idx)
 {
 	struct md_rdev *rdev;
 	int rv = 0;
-	/* Doing recovery so rcu locking not required */
-	rdev = sh->raid_conf->disks[disk_idx].replacement;
+
+	rcu_read_lock();
+	rdev = rcu_dereference(sh->raid_conf->disks[disk_idx].replacement);
 	if (rdev
 	    && !test_bit(Faulty, &rdev->flags)
 	    && !test_bit(In_sync, &rdev->flags)
 	    && (rdev->recovery_offset <= sh->sector
 		|| rdev->mddev->recovery_cp <= sh->sector))
 		rv = 1;
-
+	rcu_read_unlock();
 	return rv;
 }
 



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [md PATCH 15/18] md/raid5: add rcu protection to rdev accesses in raid5_status.
  2016-06-02  6:19 [md PATCH 00/18] Assorted minor fixes, particularly RCU protection NeilBrown
                   ` (9 preceding siblings ...)
  2016-06-02  6:19 ` [md PATCH 14/18] md/raid5: add rcu protection to rdev accesses in want_replace NeilBrown
@ 2016-06-02  6:19 ` NeilBrown
  2016-06-02  6:19 ` [md PATCH 06/18] md/raid10: add rcu protection to rdev access during reshape NeilBrown
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: NeilBrown @ 2016-06-02  6:19 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/md/raid5.c |   10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 7da89ee22a31..e9beba258f4f 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7069,10 +7069,12 @@ static void raid5_status(struct seq_file *seq, struct mddev *mddev)
 	seq_printf(seq, " level %d, %dk chunk, algorithm %d", mddev->level,
 		conf->chunk_sectors / 2, mddev->layout);
 	seq_printf (seq, " [%d/%d] [", conf->raid_disks, conf->raid_disks - mddev->degraded);
-	for (i = 0; i < conf->raid_disks; i++)
-		seq_printf (seq, "%s",
-			       conf->disks[i].rdev &&
-			       test_bit(In_sync, &conf->disks[i].rdev->flags) ? "U" : "_");
+	rcu_read_lock();
+	for (i = 0; i < conf->raid_disks; i++) {
+		struct md_rdev *rdev = rcu_dereference(conf->disks[i].rdev);
+		seq_printf (seq, "%s", rdev && test_bit(In_sync, &rdev->flags) ? "U" : "_");
+	}
+	rcu_read_unlock();
 	seq_printf (seq, "]");
 }
 



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [md PATCH 16/18] md/multipath: add rcu protection to rdev access in multipath_status.
  2016-06-02  6:19 [md PATCH 00/18] Assorted minor fixes, particularly RCU protection NeilBrown
@ 2016-06-02  6:19 ` NeilBrown
  2016-06-02  6:19 ` [md PATCH 08/18] md/raid10: simplify print_conf a little NeilBrown
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: NeilBrown @ 2016-06-02  6:19 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/md/multipath.c |   12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/md/multipath.c b/drivers/md/multipath.c
index dd483bb2e111..69244de2036b 100644
--- a/drivers/md/multipath.c
+++ b/drivers/md/multipath.c
@@ -141,17 +141,19 @@ static void multipath_make_request(struct mddev *mddev, struct bio * bio)
 	return;
 }
 
-static void multipath_status (struct seq_file *seq, struct mddev *mddev)
+static void multipath_status(struct seq_file *seq, struct mddev *mddev)
 {
 	struct mpconf *conf = mddev->private;
 	int i;
 
 	seq_printf (seq, " [%d/%d] [", conf->raid_disks,
 		    conf->raid_disks - mddev->degraded);
-	for (i = 0; i < conf->raid_disks; i++)
-		seq_printf (seq, "%s",
-			       conf->multipaths[i].rdev &&
-			       test_bit(In_sync, &conf->multipaths[i].rdev->flags) ? "U" : "_");
+	rcu_read_lock();
+	for (i = 0; i < conf->raid_disks; i++) {
+		struct md_rdev *rdev = rcu_dereference(conf->multipaths[i].rdev);
+		seq_printf (seq, "%s", rdev && test_bit(In_sync, &rdev->flags) ? "U" : "_");
+	}
+	rcu_read_unlock();
 	seq_printf (seq, "]");
 }
 



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [md PATCH 17/18] md: be extra careful not to take a reference to a Faulty device.
  2016-06-02  6:19 [md PATCH 00/18] Assorted minor fixes, particularly RCU protection NeilBrown
                   ` (16 preceding siblings ...)
  2016-06-02  6:19 ` [md PATCH 18/18] md: reduce the number of synchronize_rcu() calls when multiple devices fail NeilBrown
@ 2016-06-02  6:19 ` NeilBrown
  2016-06-03 22:28 ` [md PATCH 00/18] Assorted minor fixes, particularly RCU protection Shaohua Li
  18 siblings, 0 replies; 29+ messages in thread
From: NeilBrown @ 2016-06-02  6:19 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid

It is important that we never increment rdev->nr_pending on a Faulty
device as ->hot_remove_disk() assumes that once the Faulty flag is visible
no code will take a new reference.

Some places take a new reference after only check In_sync.  This should
be safe as the two are changed together.  However to make the code more
obviously safe, add checks for 'Faulty' as well.

Note: the actual rule is:
  Never increment nr_pending if  Faulty is set and Blocked is clear,
  never clear Faulty, and never set Blocked without holding a reference
  through nr_pending.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/md/multipath.c |    3 ++-
 drivers/md/raid10.c    |    6 ++++++
 drivers/md/raid5.c     |    3 ++-
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/md/multipath.c b/drivers/md/multipath.c
index 69244de2036b..7eb9972a37e6 100644
--- a/drivers/md/multipath.c
+++ b/drivers/md/multipath.c
@@ -43,7 +43,8 @@ static int multipath_map (struct mpconf *conf)
 	rcu_read_lock();
 	for (i = 0; i < disks; i++) {
 		struct md_rdev *rdev = rcu_dereference(conf->multipaths[i].rdev);
-		if (rdev && test_bit(In_sync, &rdev->flags)) {
+		if (rdev && test_bit(In_sync, &rdev->flags) &&
+		    !test_bit(Faulty, &rdev->flags)) {
 			atomic_inc(&rdev->nr_pending);
 			rcu_read_unlock();
 			return i;
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 5d40612d6219..78016667ec00 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2289,6 +2289,7 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
 			rdev = rcu_dereference(conf->mirrors[d].rdev);
 			if (rdev &&
 			    test_bit(In_sync, &rdev->flags) &&
+			    !test_bit(Faulty, &rdev->flags) &&
 			    is_badblock(rdev, r10_bio->devs[sl].addr + sect, s,
 					&first_bad, &bad_sectors) == 0) {
 				atomic_inc(&rdev->nr_pending);
@@ -2341,6 +2342,7 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
 			d = r10_bio->devs[sl].devnum;
 			rdev = rcu_dereference(conf->mirrors[d].rdev);
 			if (!rdev ||
+			    test_bit(Faulty, &rdev->flags) ||
 			    !test_bit(In_sync, &rdev->flags))
 				continue;
 
@@ -2380,6 +2382,7 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
 			d = r10_bio->devs[sl].devnum;
 			rdev = rcu_dereference(conf->mirrors[d].rdev);
 			if (!rdev ||
+			    test_bit(Faulty, &rdev->flags))
 			    !test_bit(In_sync, &rdev->flags))
 				continue;
 
@@ -2948,6 +2951,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 			mreplace = rcu_dereference(mirror->replacement);
 
 			if ((mrdev == NULL ||
+			     test_bit(Faulty, &mrdev->flags) ||
 			     test_bit(In_sync, &mrdev->flags))) {
 				rcu_read_unlock();
 				continue;
@@ -2964,6 +2968,8 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 				rcu_read_unlock();
 				continue;
 			}
+			if (mreplace && test_bit(Faulty, &mreplace->flags))
+				mreplace = NULL;
 			/* Unless we are doing a full sync, or a replacement
 			 * we only need to recover the block if it is set in
 			 * the bitmap
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index e9beba258f4f..94c180f16294 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3080,7 +3080,8 @@ handle_failed_stripe(struct r5conf *conf, struct stripe_head *sh,
 			struct md_rdev *rdev;
 			rcu_read_lock();
 			rdev = rcu_dereference(conf->disks[i].rdev);
-			if (rdev && test_bit(In_sync, &rdev->flags))
+			if (rdev && test_bit(In_sync, &rdev->flags) &&
+			    !test_bit(Faulty, &rdev->flags))
 				atomic_inc(&rdev->nr_pending);
 			else
 				rdev = NULL;



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [md PATCH 18/18] md: reduce the number of synchronize_rcu() calls when multiple devices fail.
  2016-06-02  6:19 [md PATCH 00/18] Assorted minor fixes, particularly RCU protection NeilBrown
                   ` (15 preceding siblings ...)
  2016-06-02  6:19 ` [md PATCH 02/18] md/raid1, raid10: don't recheck "Faulty" flag in read-balance NeilBrown
@ 2016-06-02  6:19 ` NeilBrown
  2016-06-02  6:19 ` [md PATCH 17/18] md: be extra careful not to take a reference to a Faulty device NeilBrown
  2016-06-03 22:28 ` [md PATCH 00/18] Assorted minor fixes, particularly RCU protection Shaohua Li
  18 siblings, 0 replies; 29+ messages in thread
From: NeilBrown @ 2016-06-02  6:19 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid

Every time a device is removed with ->hot_remove_disk() a synchronize_rcu() call is made
which can delay several milliseconds in some case.
If lots of devices fail at once - as could happen with a large RAID10 where one set
of devices are removed all at once - these delays can add up to be very inconcenient.

As failure is not reversible we can check for that first, setting a
separate flag if it is found, and then all synchronize_rcu() once for
all the flagged devices.  Then ->hot_remove_disk() function can skip the
synchronize_rcu() step if the flag is set.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/md/md.c        |   27 +++++++++++++++++++++++++--
 drivers/md/md.h        |    5 +++++
 drivers/md/multipath.c |   14 ++++++++------
 drivers/md/raid1.c     |   17 ++++++++++-------
 drivers/md/raid10.c    |   19 +++++++++++--------
 drivers/md/raid5.c     |   15 +++++++++------
 6 files changed, 68 insertions(+), 29 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 2d26099e1160..1af69d15d0df 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8186,12 +8186,31 @@ static int remove_and_add_spares(struct mddev *mddev,
 	struct md_rdev *rdev;
 	int spares = 0;
 	int removed = 0;
+	bool remove_some = false;
 
-	rdev_for_each(rdev, mddev)
+	rdev_for_each(rdev, mddev) {
 		if ((this == NULL || rdev == this) &&
 		    rdev->raid_disk >= 0 &&
 		    !test_bit(Blocked, &rdev->flags) &&
-		    (test_bit(Faulty, &rdev->flags) ||
+		    test_bit(Faulty, &rdev->flags) &&
+		    atomic_read(&rdev->nr_pending)==0) {
+			/* Faulty non-Blocked devices with nr_pending == 0
+			 * never get nr_pending incremented,
+			 * never get Faulty cleared, and never get Blocked set.
+			 * So we can synchronize_rcu now rather than once per device
+			 */
+			remove_some = true;
+			set_bit(RemoveSynchronized, &rdev->flags);
+		}
+	}
+
+	if (remove_some)
+		synchronize_rcu();
+	rdev_for_each(rdev, mddev) {
+		if ((this == NULL || rdev == this) &&
+		    rdev->raid_disk >= 0 &&
+		    !test_bit(Blocked, &rdev->flags) &&
+		    ((test_bit(RemoveSynchronized, &rdev->flags) ||
 		     (!test_bit(In_sync, &rdev->flags) &&
 		      !test_bit(Journal, &rdev->flags))) &&
 		    atomic_read(&rdev->nr_pending)==0) {
@@ -8202,6 +8221,10 @@ static int remove_and_add_spares(struct mddev *mddev,
 				removed++;
 			}
 		}
+		if (remove_some && test_bit(RemoveSynchronized, &rdev->flags))
+			clear_bit(RemoveSynchronized, &rdev->flags);
+	}
+
 	if (removed && mddev->kobj.sd)
 		sysfs_notify(&mddev->kobj, NULL, "degraded");
 
diff --git a/drivers/md/md.h b/drivers/md/md.h
index b5c4be73e6e4..cfd883e96a38 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -163,6 +163,11 @@ enum flag_bits {
 				 * than other devices in the array
 				 */
 	ClusterRemove,
+	RemoveSynchronized,	/* synchronize_rcu() was called after
+				 * this device was known to be faulty,
+				 * so it is safe to remove without
+				 * another synchronize_rcu() call.
+				 */
 };
 
 static inline int is_badblock(struct md_rdev *rdev, sector_t s, int sectors,
diff --git a/drivers/md/multipath.c b/drivers/md/multipath.c
index 7eb9972a37e6..c145a5a114eb 100644
--- a/drivers/md/multipath.c
+++ b/drivers/md/multipath.c
@@ -298,12 +298,14 @@ static int multipath_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
 			goto abort;
 		}
 		p->rdev = NULL;
-		synchronize_rcu();
-		if (atomic_read(&rdev->nr_pending)) {
-			/* lost the race, try later */
-			err = -EBUSY;
-			p->rdev = rdev;
-			goto abort;
+		if (!test_bit(RemoveSynchronized, &rdev->flags)) {
+			synchronize_rcu();
+			if (atomic_read(&rdev->nr_pending)) {
+				/* lost the race, try later */
+				err = -EBUSY;
+				p->rdev = rdev;
+				goto abort;
+			}
 		}
 		err = md_integrity_register(mddev);
 	}
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 0ee901ccc9c5..8b27904c4f37 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1656,13 +1656,16 @@ static int raid1_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
 			goto abort;
 		}
 		p->rdev = NULL;
-		synchronize_rcu();
-		if (atomic_read(&rdev->nr_pending)) {
-			/* lost the race, try later */
-			err = -EBUSY;
-			p->rdev = rdev;
-			goto abort;
-		} else if (conf->mirrors[conf->raid_disks + number].rdev) {
+		if (!test_bit(RemoveSynchronized, &rdev->flags)) {
+			synchronize_rcu();
+			if (atomic_read(&rdev->nr_pending)) {
+				/* lost the race, try later */
+				err = -EBUSY;
+				p->rdev = rdev;
+				goto abort;
+			}
+		}
+		if (conf->mirrors[conf->raid_disks + number].rdev) {
 			/* We just removed a device that is being replaced.
 			 * Move down the replacement.  We drain all IO before
 			 * doing this to avoid confusion.
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 78016667ec00..5e8fa7751920 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1768,7 +1768,7 @@ static int raid10_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
 		err = -EBUSY;
 		goto abort;
 	}
-	/* Only remove faulty devices if recovery
+	/* Only remove non-faulty devices if recovery
 	 * is not possible.
 	 */
 	if (!test_bit(Faulty, &rdev->flags) &&
@@ -1780,13 +1780,16 @@ static int raid10_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
 		goto abort;
 	}
 	*rdevp = NULL;
-	synchronize_rcu();
-	if (atomic_read(&rdev->nr_pending)) {
-		/* lost the race, try later */
-		err = -EBUSY;
-		*rdevp = rdev;
-		goto abort;
-	} else if (p->replacement) {
+	if (!test_bit(RemoveSynchronized, &rdev->flags)) {
+		synchronize_rcu();
+		if (atomic_read(&rdev->nr_pending)) {
+			/* lost the race, try later */
+			err = -EBUSY;
+			*rdevp = rdev;
+			goto abort;
+		}
+	}
+	if (p->replacement) {
 		/* We must have just cleared 'rdev' */
 		p->rdev = p->replacement;
 		clear_bit(Replacement, &p->replacement->flags);
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 94c180f16294..4c6270e10b12 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7197,12 +7197,15 @@ static int raid5_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
 		goto abort;
 	}
 	*rdevp = NULL;
-	synchronize_rcu();
-	if (atomic_read(&rdev->nr_pending)) {
-		/* lost the race, try later */
-		err = -EBUSY;
-		*rdevp = rdev;
-	} else if (p->replacement) {
+	if (!test_bit(RemoveSynchronized, &rdev->flags)) {
+		synchronize_rcu();
+		if (atomic_read(&rdev->nr_pending)) {
+			/* lost the race, try later */
+			err = -EBUSY;
+			*rdevp = rdev;
+		}
+	}
+	if (p->replacement) {
 		/* We must have just cleared 'rdev' */
 		p->rdev = p->replacement;
 		clear_bit(Replacement, &p->replacement->flags);



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [md PATCH 09/18] md/raid10: stop print_conf from being too verbose.
  2016-06-02  6:19 ` [md PATCH 09/18] md/raid10: stop print_conf from being too verbose NeilBrown
@ 2016-06-02 18:47   ` John Stoffel
  2016-06-02 22:48     ` NeilBrown
  0 siblings, 1 reply; 29+ messages in thread
From: John Stoffel @ 2016-06-02 18:47 UTC (permalink / raw)
  To: NeilBrown; +Cc: Shaohua Li, linux-raid


NeilBrown> raid10 arrays can usefully be very large.  When they are,
NeilBrown> the noise generated by print_conf can over-whelm logs.  So
NeilBrown> truncate the listing at 16 devices.

Why is this too noisy and how often does this print_conf() happen?
And why 16 devices?  I guess I don't like the magic number of 16 here,
I'd prefer it to be a define, and possibly even something that can by
dynamically changed.

But how big a problem is this really?  And what about for big RAID5/6
arrays as well?

Or would it be also good to condence the output of print_conf()
itself?

Of if it's noise, why not just remove it completely?  Can this
information be found in /proc/mdstat instead?

Sorry I havent' looked in the code deeply, but this just struck me as
a change that might not be ideal.

Thanks,
John



NeilBrown> Signed-off-by: NeilBrown <neilb@suse.com>
NeilBrown> ---
NeilBrown>  drivers/md/raid10.c |    4 +++-
NeilBrown>  1 file changed, 3 insertions(+), 1 deletion(-)

NeilBrown> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
NeilBrown> index 61091ab02a4b..5d40612d6219 100644
NeilBrown> --- a/drivers/md/raid10.c
NeilBrown> +++ b/drivers/md/raid10.c
NeilBrown> @@ -1610,7 +1610,7 @@ static void print_conf(struct r10conf *conf)
 
NeilBrown>  	/* This is only called with ->reconfix_mutex held, so
NeilBrown>  	 * rcu protection of rdev is not needed */
NeilBrown> -	for (i = 0; i < conf->geo.raid_disks; i++) {
NeilBrown> +	for (i = 0; i < conf->geo.raid_disks && i < 16; i++) {
NeilBrown>  		char b[BDEVNAME_SIZE];
NeilBrown>  		rdev = conf->mirrors[i].rdev;
NeilBrown>  		if (rdev)
NeilBrown> @@ -1619,6 +1619,8 @@ static void print_conf(struct r10conf *conf)
NeilBrown>  			        !test_bit(Faulty, &rdev->flags),
NeilBrown>  				bdevname(rdev->bdev,b));
NeilBrown>  	}
NeilBrown> +	if (conf->geo.raid_disks > 16)
NeilBrown> +		printk(KERN_DEBUG " remaining devices excluded for brevity\n");
NeilBrown>  }
 
NeilBrown>  static void close_sync(struct r10conf *conf)


NeilBrown> --
NeilBrown> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
NeilBrown> the body of a message to majordomo@vger.kernel.org
NeilBrown> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [md PATCH 09/18] md/raid10: stop print_conf from being too verbose.
  2016-06-02 18:47   ` John Stoffel
@ 2016-06-02 22:48     ` NeilBrown
  2016-06-03 22:39       ` Shaohua Li
  0 siblings, 1 reply; 29+ messages in thread
From: NeilBrown @ 2016-06-02 22:48 UTC (permalink / raw)
  To: John Stoffel; +Cc: Shaohua Li, linux-raid

[-- Attachment #1: Type: text/plain, Size: 2327 bytes --]

On Fri, Jun 03 2016, John Stoffel wrote:

> NeilBrown> raid10 arrays can usefully be very large.  When they are,
> NeilBrown> the noise generated by print_conf can over-whelm logs.  So
> NeilBrown> truncate the listing at 16 devices.
>
> Why is this too noisy and how often does this print_conf() happen?
> And why 16 devices?  I guess I don't like the magic number of 16 here,
> I'd prefer it to be a define, and possibly even something that can by
> dynamically changed.

print_conf happens whenever a device becomes active in the array, or a
device is removed from the array (usually because it has failed).

I got 16 by choosing a random number and multiplying by 4 (or maybe by
raising 2 to that power) :-)

More seriously, I guessed that most arrays were 16 devices or less, so
this would not affect most arrays.

I definitely don't think it needs a define.  I'm very tempted to remove
print_conf() completely, but it is sometimes useful.  So having it
present as long as it isn't a burden seems reasonable.

If configuration was important I would change it to use pr_debug(), but
then the default would be to not see the messages at all, and they can
be useful in diagnosing problems reported on mailing lists.

>
> But how big a problem is this really?  And what about for big RAID5/6
> arrays as well?

When you have 2000 devices in your RAID10 and half of them are removed
at once, it currently reports on 2,000,000 devices.  With the patch it
is only 32000.  Still possibly too many.

If you have 2000 devices in your RAID5/6 and half of them are removed,
you have other problems.

>
> Or would it be also good to condence the output of print_conf()
> itself?

Probably a very good idea.  Maybe the default could print a fixed-size
summary and the rest goes in pr_debug()??

>
> Of if it's noise, why not just remove it completely?  Can this
> information be found in /proc/mdstat instead?

Its value is historical - trying to understand a past sequence of
events.  For that, something in the logs beats something in /proc.

>
> Sorry I havent' looked in the code deeply, but this just struck me as
> a change that might not be ideal.

Fair enough - your comments are very valid.  I'm not really sure what is
best.  I only know what is worst :-) and want to avoid that.

Suggestions very welcome.

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 818 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [md PATCH 00/18] Assorted minor fixes, particularly RCU protection.
  2016-06-02  6:19 [md PATCH 00/18] Assorted minor fixes, particularly RCU protection NeilBrown
                   ` (17 preceding siblings ...)
  2016-06-02  6:19 ` [md PATCH 17/18] md: be extra careful not to take a reference to a Faulty device NeilBrown
@ 2016-06-03 22:28 ` Shaohua Li
  18 siblings, 0 replies; 29+ messages in thread
From: Shaohua Li @ 2016-06-03 22:28 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On Thu, Jun 02, 2016 at 04:19:52PM +1000, Neil Brown wrote:
> The 'rdev' fields in each personality's config data are often accessed
> under rcu_read_lock() protection to avoid races with
> ->hot_remove_disk() removing the rdev.
> 
> Originally this was not necessary during resync/recovery etc because
> ->hot_remove_disk() was only called from md_check_recovery(), and it
> would only make the call if there was no resync etc happening.
> 
> However we now call ->hot_remove_disk() (from
> remove_and_add_spares()) from other contexts, so there could be a race
> in the resync code.
> 
> So this patch set adds a lot of extra rcu_read_lock protection and
> clean up some other bits and pieces on the way.
> 
> My goal was the final patch.  If you have a large raid10 array and
> fail half of the devices at once (e.g. unplug a rack with half of the
> mirrors) then synchronize_rcu() will be called once for each device,
> which can add up to a big delay.  A single call should suffice.
> The final patch makes that change.

Thanks, make a lot of sense. Actually there is bug report related to RCU.
https://bugzilla.kernel.org/show_bug.cgi?id=116021
I had the same patch of your patch 3 and the bug submitter is trying.

I only had some minor issues, will reply separated. Otherwise this patch set is
great.

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [md PATCH 01/18] md: disconnect device from personality before trying to remove it.
  2016-06-02  6:19 ` [md PATCH 01/18] md: disconnect device from personality before trying to remove it NeilBrown
@ 2016-06-03 22:31   ` Shaohua Li
  2016-06-10  6:40     ` NeilBrown
  0 siblings, 1 reply; 29+ messages in thread
From: Shaohua Li @ 2016-06-03 22:31 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On Thu, Jun 02, 2016 at 04:19:52PM +1000, Neil Brown wrote:
> When the HOT_REMOVE_DISK ioctl is used to remove a device, we
> call remove_and_add_spares() which will remove it from the personality
> if possible.  This improves the chances that the removal will succeed.
> 
> When writing "remove" to dev-XX/state, we don't.  So that can fail more easily.
> 
> So add the remove_and_add_spares() into "remove" handling.
> 
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>  drivers/md/md.c |    2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 866825f10b4c..2d26099e1160 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -2596,6 +2596,8 @@ state_store(struct md_rdev *rdev, const char *buf, size_t len)
>  		else
>  			err = -EBUSY;
>  	} else if (cmd_match(buf, "remove")) {
> +		clear_bit(Blocked, &rdev->flags);
> +		remove_and_add_spares(rdev->mddev, rdev);
>  		if (rdev->raid_disk >= 0)
>  			err = -EBUSY;
>  		else {

Do we need wakeup rdev->blocked_wait here? I noticed some times we do the
wakeup but sometimes we not. This makes me worry about
md_wait_for_blocked_rdev. Will timeout cause anything bad?

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [md PATCH 05/18] md/raid10: add rcu protection to rdev access in raid10_sync_request.
  2016-06-02  6:19 ` [md PATCH 05/18] md/raid10: add rcu protection to rdev access in raid10_sync_request NeilBrown
@ 2016-06-03 22:33   ` Shaohua Li
  2016-06-10  6:46     ` NeilBrown
  0 siblings, 1 reply; 29+ messages in thread
From: Shaohua Li @ 2016-06-03 22:33 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On Thu, Jun 02, 2016 at 04:19:52PM +1000, Neil Brown wrote:
> mirrors[].rdev can become NULL at any point unless:
>   - a counted reference is held
>   - ->reconfig_mutex is held, or
>   - rcu_read_lock() is held
> 
> Previously they could not become NULL during a resync/recovery/reshape either.
> However when remove_and_add_spares() was added to hot_remove_disk(), that
> changed.
> 
> So raid10_sync_request didn't previously need to protect rdev access,
> but now it does.
> 
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>  drivers/md/raid10.c |  120 +++++++++++++++++++++++++++++++--------------------
>  1 file changed, 72 insertions(+), 48 deletions(-)
> 
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index e6f3a11f8f70..f6f21a253926 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -2871,11 +2871,14 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>  				/* Completed a full sync so the replacements
>  				 * are now fully recovered.
>  				 */
> -				for (i = 0; i < conf->geo.raid_disks; i++)
> -					if (conf->mirrors[i].replacement)
> -						conf->mirrors[i].replacement
> -							->recovery_offset
> -							= MaxSector;
> +				rcu_read_lock();
> +				for (i = 0; i < conf->geo.raid_disks; i++) {
> +					struct md_rdev *rdev =
> +						rcu_dereference(conf->mirrors[i].replacement);
> +					if (rdev)
> +						rdev->recovery_offset = MaxSector;
> +				}
> +				rcu_read_unlock();
>  			}
>  			conf->fullsync = 0;
>  		}
> @@ -2934,14 +2937,17 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>  			int must_sync;
>  			int any_working;
>  			struct raid10_info *mirror = &conf->mirrors[i];
> +			struct md_rdev *mrdev, *mreplace;
>  
> -			if ((mirror->rdev == NULL ||
> -			     test_bit(In_sync, &mirror->rdev->flags))
> -			    &&
> -			    (mirror->replacement == NULL ||
> -			     test_bit(Faulty,
> -				      &mirror->replacement->flags)))
> +			rcu_read_lock();
> +			mrdev = rcu_dereference(mirror->rdev);
> +			mreplace = rcu_dereference(mirror->replacement);
> +
> +			if ((mrdev == NULL ||
> +			     test_bit(In_sync, &mrdev->flags))) {
> +				rcu_read_unlock();
>  				continue;
> +			}

We don't check mreplace here.

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [md PATCH 09/18] md/raid10: stop print_conf from being too verbose.
  2016-06-02 22:48     ` NeilBrown
@ 2016-06-03 22:39       ` Shaohua Li
  2016-06-10  6:47         ` NeilBrown
  0 siblings, 1 reply; 29+ messages in thread
From: Shaohua Li @ 2016-06-03 22:39 UTC (permalink / raw)
  To: NeilBrown; +Cc: John Stoffel, linux-raid

On Fri, Jun 03, 2016 at 08:48:33AM +1000, Neil Brown wrote:
> On Fri, Jun 03 2016, John Stoffel wrote:
> 
> > NeilBrown> raid10 arrays can usefully be very large.  When they are,
> > NeilBrown> the noise generated by print_conf can over-whelm logs.  So
> > NeilBrown> truncate the listing at 16 devices.
> >
> > Why is this too noisy and how often does this print_conf() happen?
> > And why 16 devices?  I guess I don't like the magic number of 16 here,
> > I'd prefer it to be a define, and possibly even something that can by
> > dynamically changed.
> 
> print_conf happens whenever a device becomes active in the array, or a
> device is removed from the array (usually because it has failed).
> 
> I got 16 by choosing a random number and multiplying by 4 (or maybe by
> raising 2 to that power) :-)
> 
> More seriously, I guessed that most arrays were 16 devices or less, so
> this would not affect most arrays.
> 
> I definitely don't think it needs a define.  I'm very tempted to remove
> print_conf() completely, but it is sometimes useful.  So having it
> present as long as it isn't a burden seems reasonable.
> 
> If configuration was important I would change it to use pr_debug(), but
> then the default would be to not see the messages at all, and they can
> be useful in diagnosing problems reported on mailing lists.
> 
> >
> > But how big a problem is this really?  And what about for big RAID5/6
> > arrays as well?
> 
> When you have 2000 devices in your RAID10 and half of them are removed
> at once, it currently reports on 2,000,000 devices.  With the patch it
> is only 32000.  Still possibly too many.
> 
> If you have 2000 devices in your RAID5/6 and half of them are removed,
> you have other problems.
> 
> >
> > Or would it be also good to condence the output of print_conf()
> > itself?
> 
> Probably a very good idea.  Maybe the default could print a fixed-size
> summary and the rest goes in pr_debug()??
> 
> >
> > Of if it's noise, why not just remove it completely?  Can this
> > information be found in /proc/mdstat instead?
> 
> Its value is historical - trying to understand a past sequence of
> events.  For that, something in the logs beats something in /proc.
> 
> >
> > Sorry I havent' looked in the code deeply, but this just struck me as
> > a change that might not be ideal.
> 
> Fair enough - your comments are very valid.  I'm not really sure what is
> best.  I only know what is worst :-) and want to avoid that.

I would prefer pr_debug. pr_debug can be enabled dynamically. If the info is
helpful for diagnosing, enabling it is simple.

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [md PATCH 01/18] md: disconnect device from personality before trying to remove it.
  2016-06-03 22:31   ` Shaohua Li
@ 2016-06-10  6:40     ` NeilBrown
  0 siblings, 0 replies; 29+ messages in thread
From: NeilBrown @ 2016-06-10  6:40 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 1867 bytes --]

On Sat, Jun 04 2016, Shaohua Li wrote:

> On Thu, Jun 02, 2016 at 04:19:52PM +1000, Neil Brown wrote:
>> When the HOT_REMOVE_DISK ioctl is used to remove a device, we
>> call remove_and_add_spares() which will remove it from the personality
>> if possible.  This improves the chances that the removal will succeed.
>> 
>> When writing "remove" to dev-XX/state, we don't.  So that can fail more easily.
>> 
>> So add the remove_and_add_spares() into "remove" handling.
>> 
>> Signed-off-by: NeilBrown <neilb@suse.com>
>> ---
>>  drivers/md/md.c |    2 ++
>>  1 file changed, 2 insertions(+)
>> 
>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>> index 866825f10b4c..2d26099e1160 100644
>> --- a/drivers/md/md.c
>> +++ b/drivers/md/md.c
>> @@ -2596,6 +2596,8 @@ state_store(struct md_rdev *rdev, const char *buf, size_t len)
>>  		else
>>  			err = -EBUSY;
>>  	} else if (cmd_match(buf, "remove")) {
>> +		clear_bit(Blocked, &rdev->flags);
>> +		remove_and_add_spares(rdev->mddev, rdev);
>>  		if (rdev->raid_disk >= 0)
>>  			err = -EBUSY;
>>  		else {
>
> Do we need wakeup rdev->blocked_wait here? I noticed some times we do the
> wakeup but sometimes we not. This makes me worry about
> md_wait_for_blocked_rdev. Will timeout cause anything bad?

(sorry for delay)

Yes, we probably should but it isn't very important.  If any code is
waiting then the 'remove' will fail and whatever would normally unblock
the device will still unblock and wake up as it would have done.
It is import to wake blocked_rdev when we write out the bad block list
or other metadata which makes it safe to use the device.  If we clear
the flag at other time it doesn't really matter.

Adding a wakeup here might make it more likely for the "remove" to work
in some rare cases, but I think that is the most significant effect.

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 818 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [md PATCH 05/18] md/raid10: add rcu protection to rdev access in raid10_sync_request.
  2016-06-03 22:33   ` Shaohua Li
@ 2016-06-10  6:46     ` NeilBrown
  2016-06-10 16:22       ` Shaohua Li
  0 siblings, 1 reply; 29+ messages in thread
From: NeilBrown @ 2016-06-10  6:46 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 2651 bytes --]

On Sat, Jun 04 2016, Shaohua Li wrote:

> On Thu, Jun 02, 2016 at 04:19:52PM +1000, Neil Brown wrote:
>> mirrors[].rdev can become NULL at any point unless:
>>   - a counted reference is held
>>   - ->reconfig_mutex is held, or
>>   - rcu_read_lock() is held
>> 
>> Previously they could not become NULL during a resync/recovery/reshape either.
>> However when remove_and_add_spares() was added to hot_remove_disk(), that
>> changed.
>> 
>> So raid10_sync_request didn't previously need to protect rdev access,
>> but now it does.
>> 
>> Signed-off-by: NeilBrown <neilb@suse.com>
>> ---
>>  drivers/md/raid10.c |  120 +++++++++++++++++++++++++++++++--------------------
>>  1 file changed, 72 insertions(+), 48 deletions(-)
>> 
>> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
>> index e6f3a11f8f70..f6f21a253926 100644
>> --- a/drivers/md/raid10.c
>> +++ b/drivers/md/raid10.c
>> @@ -2871,11 +2871,14 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>>  				/* Completed a full sync so the replacements
>>  				 * are now fully recovered.
>>  				 */
>> -				for (i = 0; i < conf->geo.raid_disks; i++)
>> -					if (conf->mirrors[i].replacement)
>> -						conf->mirrors[i].replacement
>> -							->recovery_offset
>> -							= MaxSector;
>> +				rcu_read_lock();
>> +				for (i = 0; i < conf->geo.raid_disks; i++) {
>> +					struct md_rdev *rdev =
>> +						rcu_dereference(conf->mirrors[i].replacement);
>> +					if (rdev)
>> +						rdev->recovery_offset = MaxSector;
>> +				}
>> +				rcu_read_unlock();
>>  			}
>>  			conf->fullsync = 0;
>>  		}
>> @@ -2934,14 +2937,17 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>>  			int must_sync;
>>  			int any_working;
>>  			struct raid10_info *mirror = &conf->mirrors[i];
>> +			struct md_rdev *mrdev, *mreplace;
>>  
>> -			if ((mirror->rdev == NULL ||
>> -			     test_bit(In_sync, &mirror->rdev->flags))
>> -			    &&
>> -			    (mirror->replacement == NULL ||
>> -			     test_bit(Faulty,
>> -				      &mirror->replacement->flags)))
>> +			rcu_read_lock();
>> +			mrdev = rcu_dereference(mirror->rdev);
>> +			mreplace = rcu_dereference(mirror->replacement);
>> +
>> +			if ((mrdev == NULL ||
>> +			     test_bit(In_sync, &mrdev->flags))) {
>> +				rcu_read_unlock();
>>  				continue;
>> +			}
>
> We don't check mreplace here.

As you say ...  I wonder if I thought I was being clever somehow...

I'll resubmit with that fixed but it probably won't be for a couple of
week (I'm actually on leave, but thought I should reply).

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 818 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [md PATCH 09/18] md/raid10: stop print_conf from being too verbose.
  2016-06-03 22:39       ` Shaohua Li
@ 2016-06-10  6:47         ` NeilBrown
  0 siblings, 0 replies; 29+ messages in thread
From: NeilBrown @ 2016-06-10  6:47 UTC (permalink / raw)
  To: Shaohua Li; +Cc: John Stoffel, linux-raid

[-- Attachment #1: Type: text/plain, Size: 2873 bytes --]

On Sat, Jun 04 2016, Shaohua Li wrote:

> On Fri, Jun 03, 2016 at 08:48:33AM +1000, Neil Brown wrote:
>> On Fri, Jun 03 2016, John Stoffel wrote:
>> 
>> > NeilBrown> raid10 arrays can usefully be very large.  When they are,
>> > NeilBrown> the noise generated by print_conf can over-whelm logs.  So
>> > NeilBrown> truncate the listing at 16 devices.
>> >
>> > Why is this too noisy and how often does this print_conf() happen?
>> > And why 16 devices?  I guess I don't like the magic number of 16 here,
>> > I'd prefer it to be a define, and possibly even something that can by
>> > dynamically changed.
>> 
>> print_conf happens whenever a device becomes active in the array, or a
>> device is removed from the array (usually because it has failed).
>> 
>> I got 16 by choosing a random number and multiplying by 4 (or maybe by
>> raising 2 to that power) :-)
>> 
>> More seriously, I guessed that most arrays were 16 devices or less, so
>> this would not affect most arrays.
>> 
>> I definitely don't think it needs a define.  I'm very tempted to remove
>> print_conf() completely, but it is sometimes useful.  So having it
>> present as long as it isn't a burden seems reasonable.
>> 
>> If configuration was important I would change it to use pr_debug(), but
>> then the default would be to not see the messages at all, and they can
>> be useful in diagnosing problems reported on mailing lists.
>> 
>> >
>> > But how big a problem is this really?  And what about for big RAID5/6
>> > arrays as well?
>> 
>> When you have 2000 devices in your RAID10 and half of them are removed
>> at once, it currently reports on 2,000,000 devices.  With the patch it
>> is only 32000.  Still possibly too many.
>> 
>> If you have 2000 devices in your RAID5/6 and half of them are removed,
>> you have other problems.
>> 
>> >
>> > Or would it be also good to condence the output of print_conf()
>> > itself?
>> 
>> Probably a very good idea.  Maybe the default could print a fixed-size
>> summary and the rest goes in pr_debug()??
>> 
>> >
>> > Of if it's noise, why not just remove it completely?  Can this
>> > information be found in /proc/mdstat instead?
>> 
>> Its value is historical - trying to understand a past sequence of
>> events.  For that, something in the logs beats something in /proc.
>> 
>> >
>> > Sorry I havent' looked in the code deeply, but this just struck me as
>> > a change that might not be ideal.
>> 
>> Fair enough - your comments are very valid.  I'm not really sure what is
>> best.  I only know what is worst :-) and want to avoid that.
>
> I would prefer pr_debug. pr_debug can be enabled dynamically. If the info is
> helpful for diagnosing, enabling it is simple.

Fair enough.  Please drop this patch.  I'll come up with an improved proposal
and resubmit.

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 818 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [md PATCH 05/18] md/raid10: add rcu protection to rdev access in raid10_sync_request.
  2016-06-10  6:46     ` NeilBrown
@ 2016-06-10 16:22       ` Shaohua Li
  0 siblings, 0 replies; 29+ messages in thread
From: Shaohua Li @ 2016-06-10 16:22 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On Fri, Jun 10, 2016 at 04:46:45PM +1000, Neil Brown wrote:
> On Sat, Jun 04 2016, Shaohua Li wrote:
> 
> > On Thu, Jun 02, 2016 at 04:19:52PM +1000, Neil Brown wrote:
> >> mirrors[].rdev can become NULL at any point unless:
> >>   - a counted reference is held
> >>   - ->reconfig_mutex is held, or
> >>   - rcu_read_lock() is held
> >> 
> >> Previously they could not become NULL during a resync/recovery/reshape either.
> >> However when remove_and_add_spares() was added to hot_remove_disk(), that
> >> changed.
> >> 
> >> So raid10_sync_request didn't previously need to protect rdev access,
> >> but now it does.
> >> 
> >> Signed-off-by: NeilBrown <neilb@suse.com>
> >> ---
> >>  drivers/md/raid10.c |  120 +++++++++++++++++++++++++++++++--------------------
> >>  1 file changed, 72 insertions(+), 48 deletions(-)
> >> 
> >> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> >> index e6f3a11f8f70..f6f21a253926 100644
> >> --- a/drivers/md/raid10.c
> >> +++ b/drivers/md/raid10.c
> >> @@ -2871,11 +2871,14 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
> >>  				/* Completed a full sync so the replacements
> >>  				 * are now fully recovered.
> >>  				 */
> >> -				for (i = 0; i < conf->geo.raid_disks; i++)
> >> -					if (conf->mirrors[i].replacement)
> >> -						conf->mirrors[i].replacement
> >> -							->recovery_offset
> >> -							= MaxSector;
> >> +				rcu_read_lock();
> >> +				for (i = 0; i < conf->geo.raid_disks; i++) {
> >> +					struct md_rdev *rdev =
> >> +						rcu_dereference(conf->mirrors[i].replacement);
> >> +					if (rdev)
> >> +						rdev->recovery_offset = MaxSector;
> >> +				}
> >> +				rcu_read_unlock();
> >>  			}
> >>  			conf->fullsync = 0;
> >>  		}
> >> @@ -2934,14 +2937,17 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
> >>  			int must_sync;
> >>  			int any_working;
> >>  			struct raid10_info *mirror = &conf->mirrors[i];
> >> +			struct md_rdev *mrdev, *mreplace;
> >>  
> >> -			if ((mirror->rdev == NULL ||
> >> -			     test_bit(In_sync, &mirror->rdev->flags))
> >> -			    &&
> >> -			    (mirror->replacement == NULL ||
> >> -			     test_bit(Faulty,
> >> -				      &mirror->replacement->flags)))
> >> +			rcu_read_lock();
> >> +			mrdev = rcu_dereference(mirror->rdev);
> >> +			mreplace = rcu_dereference(mirror->replacement);
> >> +
> >> +			if ((mrdev == NULL ||
> >> +			     test_bit(In_sync, &mrdev->flags))) {
> >> +				rcu_read_unlock();
> >>  				continue;
> >> +			}
> >
> > We don't check mreplace here.
> 
> As you say ...  I wonder if I thought I was being clever somehow...
> 
> I'll resubmit with that fixed but it probably won't be for a couple of
> week (I'm actually on leave, but thought I should reply).

I can fix this one if you like.
So I'll drop patch 9 and apply others with this patch fixed. Sounds good?

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2016-06-10 16:22 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-02  6:19 [md PATCH 00/18] Assorted minor fixes, particularly RCU protection NeilBrown
2016-06-02  6:19 ` [md PATCH 16/18] md/multipath: add rcu protection to rdev access in multipath_status NeilBrown
2016-06-02  6:19 ` [md PATCH 08/18] md/raid10: simplify print_conf a little NeilBrown
2016-06-02  6:19 ` [md PATCH 04/18] md/raid10: add rcu protection in raid10_status NeilBrown
2016-06-02  6:19 ` [md PATCH 07/18] md/raid10: minor code improvement in fix_read_error() NeilBrown
2016-06-02  6:19 ` [md PATCH 10/18] md/raid1: small cleanup in raid1_end_read/write_request NeilBrown
2016-06-02  6:19 ` [md PATCH 11/18] md/raid1: small code cleanup in end_sync_write NeilBrown
2016-06-02  6:19 ` [md PATCH 03/18] md/raid10: fix refounct imbalance when resyncing an array with a replacement device NeilBrown
2016-06-02  6:19 ` [md PATCH 09/18] md/raid10: stop print_conf from being too verbose NeilBrown
2016-06-02 18:47   ` John Stoffel
2016-06-02 22:48     ` NeilBrown
2016-06-03 22:39       ` Shaohua Li
2016-06-10  6:47         ` NeilBrown
2016-06-02  6:19 ` [md PATCH 05/18] md/raid10: add rcu protection to rdev access in raid10_sync_request NeilBrown
2016-06-03 22:33   ` Shaohua Li
2016-06-10  6:46     ` NeilBrown
2016-06-10 16:22       ` Shaohua Li
2016-06-02  6:19 ` [md PATCH 14/18] md/raid5: add rcu protection to rdev accesses in want_replace NeilBrown
2016-06-02  6:19 ` [md PATCH 15/18] md/raid5: add rcu protection to rdev accesses in raid5_status NeilBrown
2016-06-02  6:19 ` [md PATCH 06/18] md/raid10: add rcu protection to rdev access during reshape NeilBrown
2016-06-02  6:19 ` [md PATCH 12/18] md/raid1: add rcu protection to rdev in fix_read_error NeilBrown
2016-06-02  6:19 ` [md PATCH 01/18] md: disconnect device from personality before trying to remove it NeilBrown
2016-06-03 22:31   ` Shaohua Li
2016-06-10  6:40     ` NeilBrown
2016-06-02  6:19 ` [md PATCH 13/18] md/raid5: add rcu protection to rdev accesses in handle_failed_sync NeilBrown
2016-06-02  6:19 ` [md PATCH 02/18] md/raid1, raid10: don't recheck "Faulty" flag in read-balance NeilBrown
2016-06-02  6:19 ` [md PATCH 18/18] md: reduce the number of synchronize_rcu() calls when multiple devices fail NeilBrown
2016-06-02  6:19 ` [md PATCH 17/18] md: be extra careful not to take a reference to a Faulty device NeilBrown
2016-06-03 22:28 ` [md PATCH 00/18] Assorted minor fixes, particularly RCU protection Shaohua Li

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.