* [md PATCH 00/10] Simplify bio splitting and related code.
@ 2017-04-05  4:05 NeilBrown
  2017-04-05  4:05 ` [md PATCH 03/10] Revert "block: introduce bio_copy_data_partial" NeilBrown
                   ` (10 more replies)
  0 siblings, 11 replies; 17+ messages in thread
From: NeilBrown @ 2017-04-05  4:05 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid

This is part of my little project to make bio splitting
in Linux uniform and deadlock-free, in a way that will let
us get rid of all the bioset threads.

The basic approach is that when a bio needs to be split, we call
bio_split() and bio_chain(), pass the remainder to
generic_make_request(), and then proceed to handle the front
piece without further splitting.
Recent changes to generic_make_request() ensure that this will
be safe from deadlocks, provided each bioset is used only once
in the stack.
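
Schematically, the pattern in each personality looks like the
following (a sketch only, not code lifted verbatim from one of
these patches; "max_sectors" stands for whatever limit the
personality computes and "conf->bio_split" for its private
bioset):

	if (max_sectors < bio_sectors(bio)) {
		struct bio *split = bio_split(bio, max_sectors,
					      GFP_NOIO, conf->bio_split);
		bio_chain(split, bio);	   /* completion of 'split' is
					    * accounted to 'bio' */
		generic_make_request(bio); /* requeue the tail; the block
					    * core dispatches it after the
					    * current make_request returns */
		bio = split;
	}
	/* 'bio' now fits the limit and is handled exactly once */

Note that bio_split() returns the front of the bio and leaves the
tail in the original, so it is the tail that is requeued and the
front that is handled immediately.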

This leads to simpler code in various places.  In particular, the
splitting of bios that is needed to work around known bad blocks
is now much less complex.  There is only ever one r1bio per bio.

As you can see from
 10 files changed, 335 insertions(+), 540 deletions(-)
there is a net reduction in code.

Thanks,
NeilBrown

---

NeilBrown (10):
      md/raid1: simplify the splitting of requests.
      md/raid1: simplify alloc_behind_master_bio()
      Revert "block: introduce bio_copy_data_partial"
      md/raid1: simplify handle_read_error().
      md/raid1: factor out flush_bio_list()
      md/raid10: simplify the splitting of requests.
      md/raid10: simplify handle_read_error()
      md/raid5: make chunk_aligned_read() split bios more cleanly.
      md/linear: improve bio splitting.
      md/raid0: fix up bio splitting.


 block/bio.c         |   60 ++-------
 drivers/md/linear.c |   75 +++++------
 drivers/md/raid0.c  |   73 +++++------
 drivers/md/raid1.c  |  346 ++++++++++++++++++++-------------------------------
 drivers/md/raid1.h  |    2 
 drivers/md/raid10.c |  282 ++++++++++++++----------------------------
 drivers/md/raid10.h |    1 
 drivers/md/raid5.c  |   33 +++--
 drivers/md/raid5.h  |    1 
 include/linux/bio.h |    2 
 10 files changed, 335 insertions(+), 540 deletions(-)

--
Signature


* [md PATCH 01/10] md/raid1: simplify the splitting of requests.
  2017-04-05  4:05 [md PATCH 00/10] Simplify bio splitting and related code NeilBrown
  2017-04-05  4:05 ` [md PATCH 03/10] Revert "block: introduce bio_copy_data_partial" NeilBrown
  2017-04-05  4:05 ` [md PATCH 02/10] md/raid1: simplify alloc_behind_master_bio() NeilBrown
@ 2017-04-05  4:05 ` NeilBrown
  2017-04-05  4:05 ` [md PATCH 04/10] md/raid1: simplify handle_read_error() NeilBrown
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: NeilBrown @ 2017-04-05  4:05 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid

raid1 currently splits requests in two different ways for
two different reasons.

First, bio_split() is used to ensure the bio fits within a
resync accounting region.
Second, multiple r1bios are allocated for each bio to handle
the possibility of known bad blocks on some devices.

This can be simplified to just use bio_split() once, and not
use multiple r1bios.
We delay the split until we know a maximum bio size that can
be handled with a single r1bio, and then split the bio and
queue the remainder for later handling.

This avoids all loops inside raid1.c request handling.  Just
a single read, or a single set of writes, is submitted to
lower-level devices for each bio that comes from
generic_make_request().

When the bio needs to be split, generic_make_request() will
do the necessary looping and call md_make_request() multiple
times.
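
For reference, the generic_make_request() behaviour relied on here
is roughly the following (a heavily simplified sketch of the
block-core loop, not part of this patch; locking, error handling
and the recent reordering logic are omitted):

	blk_qc_t generic_make_request(struct bio *bio)
	{
		struct bio_list bio_list_on_stack;

		if (current->bio_list) {
			/* Called from inside a make_request_fn: just
			 * queue the bio; the outer loop submits it. */
			bio_list_add(current->bio_list, bio);
			return BLK_QC_T_NONE;
		}

		bio_list_init(&bio_list_on_stack);
		current->bio_list = &bio_list_on_stack;
		do {
			struct request_queue *q = bdev_get_queue(bio->bi_bdev);

			q->make_request_fn(q, bio);
			/* Bios queued by the driver (e.g. the tail of
			 * a split) are picked up here, one at a time,
			 * only after the driver has returned. */
			bio = bio_list_pop(&bio_list_on_stack);
		} while (bio);
		current->bio_list = NULL;
		return BLK_QC_T_NONE;
	}

The recent changes mentioned in the cover letter additionally sort
the on-stack list so that bios for lower-level devices are
dispatched before requeued remainders at the same level, which is
what makes the split-and-requeue pattern safe from deadlocks.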

raid1_make_request() no longer queues requests for raid1d to
handle, so we can remove that branch from the 'if' in raid1d().

This patch also creates a new private bio_set
(conf->bio_split) for splitting bios.  Using fs_bio_set
is wrong, as it is meant to be used by filesystems, not
block devices.  Using it inside md can lead to deadlocks
under high memory pressure.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/md/raid1.c |  138 ++++++++++++++++++----------------------------------
 drivers/md/raid1.h |    2 +
 2 files changed, 51 insertions(+), 89 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 7d6723558fd8..83853f65462a 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1202,7 +1202,8 @@ alloc_r1bio(struct mddev *mddev, struct bio *bio, sector_t sectors_handled)
 	return r1_bio;
 }
 
-static void raid1_read_request(struct mddev *mddev, struct bio *bio)
+static void raid1_read_request(struct mddev *mddev, struct bio *bio,
+			       int max_read_sectors)
 {
 	struct r1conf *conf = mddev->private;
 	struct raid1_info *mirror;
@@ -1211,7 +1212,6 @@ static void raid1_read_request(struct mddev *mddev, struct bio *bio)
 	struct bitmap *bitmap = mddev->bitmap;
 	const int op = bio_op(bio);
 	const unsigned long do_sync = (bio->bi_opf & REQ_SYNC);
-	int sectors_handled;
 	int max_sectors;
 	int rdisk;
 
@@ -1222,12 +1222,12 @@ static void raid1_read_request(struct mddev *mddev, struct bio *bio)
 	wait_read_barrier(conf, bio->bi_iter.bi_sector);
 
 	r1_bio = alloc_r1bio(mddev, bio, 0);
+	r1_bio->sectors = max_read_sectors;
 
 	/*
 	 * make_request() can abort the operation when read-ahead is being
 	 * used and no empty request is available.
 	 */
-read_again:
 	rdisk = read_balance(conf, r1_bio, &max_sectors);
 
 	if (rdisk < 0) {
@@ -1247,11 +1247,20 @@ static void raid1_read_request(struct mddev *mddev, struct bio *bio)
 		wait_event(bitmap->behind_wait,
 			   atomic_read(&bitmap->behind_writes) == 0);
 	}
+
+	if (max_sectors < bio_sectors(bio)) {
+		struct bio *split = bio_split(bio, max_sectors,
+					      GFP_NOIO, conf->bio_split);
+		bio_chain(split, bio);
+		generic_make_request(bio);
+		bio = split;
+		r1_bio->master_bio = bio;
+		r1_bio->sectors = max_sectors;
+	}
+
 	r1_bio->read_disk = rdisk;
 
 	read_bio = bio_clone_fast(bio, GFP_NOIO, mddev->bio_set);
-	bio_trim(read_bio, r1_bio->sector - bio->bi_iter.bi_sector,
-		 max_sectors);
 
 	r1_bio->bios[rdisk] = read_bio;
 
@@ -1270,30 +1279,11 @@ static void raid1_read_request(struct mddev *mddev, struct bio *bio)
 	                              read_bio, disk_devt(mddev->gendisk),
 	                              r1_bio->sector);
 
-	if (max_sectors < r1_bio->sectors) {
-		/*
-		 * could not read all from this device, so we will need another
-		 * r1_bio.
-		 */
-		sectors_handled = (r1_bio->sector + max_sectors
-				   - bio->bi_iter.bi_sector);
-		r1_bio->sectors = max_sectors;
-		bio_inc_remaining(bio);
-
-		/*
-		 * Cannot call generic_make_request directly as that will be
-		 * queued in __make_request and subsequent mempool_alloc might
-		 * block waiting for it.  So hand bio over to raid1d.
-		 */
-		reschedule_retry(r1_bio);
-
-		r1_bio = alloc_r1bio(mddev, bio, sectors_handled);
-		goto read_again;
-	} else
-		generic_make_request(read_bio);
+	generic_make_request(read_bio);
 }
 
-static void raid1_write_request(struct mddev *mddev, struct bio *bio)
+static void raid1_write_request(struct mddev *mddev, struct bio *bio,
+				int max_write_sectors)
 {
 	struct r1conf *conf = mddev->private;
 	struct r1bio *r1_bio;
@@ -1304,7 +1294,6 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio)
 	struct blk_plug_cb *cb;
 	struct raid1_plug_cb *plug = NULL;
 	int first_clone;
-	int sectors_handled;
 	int max_sectors;
 	sector_t offset;
 
@@ -1345,6 +1334,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio)
 	wait_barrier(conf, bio->bi_iter.bi_sector);
 
 	r1_bio = alloc_r1bio(mddev, bio, 0);
+	r1_bio->sectors = max_write_sectors;
 
 	if (conf->pending_count >= max_queued_requests) {
 		md_wakeup_thread(mddev->thread);
@@ -1443,10 +1433,15 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio)
 		goto retry_write;
 	}
 
-	if (max_sectors < r1_bio->sectors)
+	if (max_sectors < bio_sectors(bio)) {
+		struct bio *split = bio_split(bio, max_sectors,
+					      GFP_NOIO, conf->bio_split);
+		bio_chain(split, bio);
+		generic_make_request(bio);
+		bio = split;
+		r1_bio->master_bio = bio;
 		r1_bio->sectors = max_sectors;
-
-	sectors_handled = r1_bio->sector + max_sectors - bio->bi_iter.bi_sector;
+	}
 
 	atomic_set(&r1_bio->remaining, 1);
 	atomic_set(&r1_bio->behind_remaining, 0);
@@ -1470,7 +1465,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio)
 			     < mddev->bitmap_info.max_write_behind) &&
 			    !waitqueue_active(&bitmap->behind_wait)) {
 				mbio = alloc_behind_master_bio(r1_bio, bio,
-							       offset << 9,
+							       0,
 							       max_sectors << 9);
 			}
 
@@ -1486,10 +1481,8 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio)
 				mbio = bio_clone_fast(r1_bio->behind_master_bio,
 						      GFP_NOIO,
 						      mddev->bio_set);
-			else {
+			else
 				mbio = bio_clone_fast(bio, GFP_NOIO, mddev->bio_set);
-				bio_trim(mbio, offset, max_sectors);
-			}
 		}
 
 		if (r1_bio->behind_master_bio) {
@@ -1536,19 +1529,6 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio)
 		if (!plug)
 			md_wakeup_thread(mddev->thread);
 	}
-	/* Mustn't call r1_bio_write_done before this next test,
-	 * as it could result in the bio being freed.
-	 */
-	if (sectors_handled < bio_sectors(bio)) {
-		/* We need another r1_bio, which must be counted */
-		sector_t sect = bio->bi_iter.bi_sector + sectors_handled;
-
-		inc_pending(conf, sect);
-		bio_inc_remaining(bio);
-		r1_bio_write_done(r1_bio);
-		r1_bio = alloc_r1bio(mddev, bio, sectors_handled);
-		goto retry_write;
-	}
 
 	r1_bio_write_done(r1_bio);
 
@@ -1558,7 +1538,6 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio)
 
 static void raid1_make_request(struct mddev *mddev, struct bio *bio)
 {
-	struct bio *split;
 	sector_t sectors;
 
 	if (unlikely(bio->bi_opf & REQ_PREFLUSH)) {
@@ -1566,43 +1545,19 @@ static void raid1_make_request(struct mddev *mddev, struct bio *bio)
 		return;
 	}
 
-	/* if bio exceeds barrier unit boundary, split it */
-	do {
-		sectors = align_to_barrier_unit_end(
-				bio->bi_iter.bi_sector, bio_sectors(bio));
-		if (sectors < bio_sectors(bio)) {
-			split = bio_split(bio, sectors, GFP_NOIO, fs_bio_set);
-			bio_chain(split, bio);
-		} else {
-			split = bio;
-		}
-
-		if (bio_data_dir(split) == READ) {
-			raid1_read_request(mddev, split);
+	/* There is a limit to the maximum size, but
+	 * the read/write handler might find a lower limit
+	 * due to bad blocks.  To avoid multiple splits,
+	 * we pass the maximum number of sectors down
+	 * and let the lower level perform the split.
+	 */
+	sectors = align_to_barrier_unit_end(
+		bio->bi_iter.bi_sector, bio_sectors(bio));
 
-			/*
-			 * If a bio is splitted, the first part of bio will
-			 * pass barrier but the bio is queued in
-			 * current->bio_list (see generic_make_request). If
-			 * there is a raise_barrier() called here, the second
-			 * part of bio can't pass barrier. But since the first
-			 * part bio isn't dispatched to underlaying disks yet,
-			 * the barrier is never released, hence raise_barrier
-			 * will alays wait. We have a deadlock.
-			 * Note, this only happens in read path. For write
-			 * path, the first part of bio is dispatched in a
-			 * schedule() call (because of blk plug) or offloaded
-			 * to raid10d.
-			 * Quitting from the function immediately can change
-			 * the bio order queued in bio_list and avoid the deadlock.
-			 */
-			if (split != bio) {
-				generic_make_request(bio);
-				break;
-			}
-		} else
-			raid1_write_request(mddev, split);
-	} while (split != bio);
+	if (bio_data_dir(bio) == READ)
+		raid1_read_request(mddev, bio, sectors);
+	else
+		raid1_write_request(mddev, bio, sectors);
 }
 
 static void raid1_status(struct seq_file *seq, struct mddev *mddev)
@@ -2645,10 +2600,7 @@ static void raid1d(struct md_thread *thread)
 		else if (test_bit(R1BIO_ReadError, &r1_bio->state))
 			handle_read_error(conf, r1_bio);
 		else
-			/* just a partial read to be scheduled from separate
-			 * context
-			 */
-			generic_make_request(r1_bio->bios[r1_bio->read_disk]);
+			WARN_ON_ONCE(1);
 
 		cond_resched();
 		if (mddev->sb_flags & ~(1<<MD_SB_CHANGE_PENDING))
@@ -3036,6 +2988,10 @@ static struct r1conf *setup_conf(struct mddev *mddev)
 	if (!conf->r1bio_pool)
 		goto abort;
 
+	conf->bio_split = bioset_create(BIO_POOL_SIZE, 0);
+	if (!conf->bio_split)
+		goto abort;
+
 	conf->poolinfo->mddev = mddev;
 
 	err = -EINVAL;
@@ -3117,6 +3073,8 @@ static struct r1conf *setup_conf(struct mddev *mddev)
 		kfree(conf->nr_waiting);
 		kfree(conf->nr_queued);
 		kfree(conf->barrier);
+		if (conf->bio_split)
+			bioset_free(conf->bio_split);
 		kfree(conf);
 	}
 	return ERR_PTR(err);
@@ -3222,6 +3180,8 @@ static void raid1_free(struct mddev *mddev, void *priv)
 	kfree(conf->nr_waiting);
 	kfree(conf->nr_queued);
 	kfree(conf->barrier);
+	if (conf->bio_split)
+		bioset_free(conf->bio_split);
 	kfree(conf);
 }
 
diff --git a/drivers/md/raid1.h b/drivers/md/raid1.h
index 4271cd7ac2de..b0ab0da6e39e 100644
--- a/drivers/md/raid1.h
+++ b/drivers/md/raid1.h
@@ -107,6 +107,8 @@ struct r1conf {
 	mempool_t		*r1bio_pool;
 	mempool_t		*r1buf_pool;
 
+	struct bio_set		*bio_split;
+
 	/* temporary buffer to synchronous IO when attempting to repair
 	 * a read error.
 	 */



* [md PATCH 02/10] md/raid1: simplify alloc_behind_master_bio()
  2017-04-05  4:05 [md PATCH 00/10] Simplify bio splitting and related code NeilBrown
  2017-04-05  4:05 ` [md PATCH 03/10] Revert "block: introduce bio_copy_data_partial" NeilBrown
@ 2017-04-05  4:05 ` NeilBrown
  2017-04-05  4:05 ` [md PATCH 01/10] md/raid1: simplify the splitting of requests NeilBrown
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: NeilBrown @ 2017-04-05  4:05 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid

Now that we always pass an offset of 0 and a size
that matches the bio to alloc_behind_master_bio(),
we can remove the offset/size args and simplify the code.

We could probably remove bio_copy_data_partial() too.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/md/raid1.c |   11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 83853f65462a..06a13010f1b7 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1091,9 +1091,9 @@ static void unfreeze_array(struct r1conf *conf)
 }
 
 static struct bio *alloc_behind_master_bio(struct r1bio *r1_bio,
-					   struct bio *bio,
-					   int offset, int size)
+					   struct bio *bio)
 {
+	int size = bio->bi_iter.bi_size;
 	unsigned vcnt = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
 	int i = 0;
 	struct bio *behind_bio = NULL;
@@ -1120,8 +1120,7 @@ static struct bio *alloc_behind_master_bio(struct r1bio *r1_bio,
 		i++;
 	}
 
-	bio_copy_data_partial(behind_bio, bio, offset,
-			      behind_bio->bi_iter.bi_size);
+	bio_copy_data(behind_bio, bio);
 skip_copy:
 	r1_bio->behind_master_bio = behind_bio;;
 	set_bit(R1BIO_BehindIO, &r1_bio->state);
@@ -1464,9 +1463,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
 			    (atomic_read(&bitmap->behind_writes)
 			     < mddev->bitmap_info.max_write_behind) &&
 			    !waitqueue_active(&bitmap->behind_wait)) {
-				mbio = alloc_behind_master_bio(r1_bio, bio,
-							       0,
-							       max_sectors << 9);
+				mbio = alloc_behind_master_bio(r1_bio, bio);
 			}
 
 			bitmap_startwrite(bitmap, r1_bio->sector,



* [md PATCH 03/10] Revert "block: introduce bio_copy_data_partial"
  2017-04-05  4:05 [md PATCH 00/10] Simplify bio splitting and related code NeilBrown
@ 2017-04-05  4:05 ` NeilBrown
  2017-04-05  4:05 ` [md PATCH 02/10] md/raid1: simplify alloc_behind_master_bio() NeilBrown
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: NeilBrown @ 2017-04-05  4:05 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid

This reverts commit 6f8802852f7e58a12177a86179803b9efaad98e2.
bio_copy_data_partial() is no longer needed.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 block/bio.c         |   60 +++++++++++----------------------------------------
 include/linux/bio.h |    2 --
 2 files changed, 13 insertions(+), 49 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 036435995c55..12c2837c4277 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -990,8 +990,19 @@ int bio_alloc_pages(struct bio *bio, gfp_t gfp_mask)
 }
 EXPORT_SYMBOL(bio_alloc_pages);
 
-static void __bio_copy_data(struct bio *dst, struct bio *src,
-			    int offset, int size)
+/**
+ * bio_copy_data - copy contents of data buffers from one chain of bios to
+ * another
+ * @src: source bio list
+ * @dst: destination bio list
+ *
+ * If @src and @dst are single bios, bi_next must be NULL - otherwise, treats
+ * @src and @dst as linked lists of bios.
+ *
+ * Stops when it reaches the end of either @src or @dst - that is, copies
+ * min(src->bi_size, dst->bi_size) bytes (or the equivalent for lists of bios).
+ */
+void bio_copy_data(struct bio *dst, struct bio *src)
 {
 	struct bvec_iter src_iter, dst_iter;
 	struct bio_vec src_bv, dst_bv;
@@ -1001,12 +1012,6 @@ static void __bio_copy_data(struct bio *dst, struct bio *src,
 	src_iter = src->bi_iter;
 	dst_iter = dst->bi_iter;
 
-	/* for supporting partial copy */
-	if (offset || size != src->bi_iter.bi_size) {
-		bio_advance_iter(src, &src_iter, offset);
-		src_iter.bi_size = size;
-	}
-
 	while (1) {
 		if (!src_iter.bi_size) {
 			src = src->bi_next;
@@ -1043,47 +1048,8 @@ static void __bio_copy_data(struct bio *dst, struct bio *src,
 		bio_advance_iter(dst, &dst_iter, bytes);
 	}
 }
-
-/**
- * bio_copy_data - copy contents of data buffers from one chain of bios to
- * another
- * @src: source bio list
- * @dst: destination bio list
- *
- * If @src and @dst are single bios, bi_next must be NULL - otherwise, treats
- * @src and @dst as linked lists of bios.
- *
- * Stops when it reaches the end of either @src or @dst - that is, copies
- * min(src->bi_size, dst->bi_size) bytes (or the equivalent for lists of bios).
- */
-void bio_copy_data(struct bio *dst, struct bio *src)
-{
-	__bio_copy_data(dst, src, 0, src->bi_iter.bi_size);
-}
 EXPORT_SYMBOL(bio_copy_data);
 
-/**
- * bio_copy_data_partial - copy partial contents of data buffers from one
- * chain of bios to another
- * @dst: destination bio list
- * @src: source bio list
- * @offset: starting copy from the offset
- * @size: how many bytes to copy
- *
- * If @src and @dst are single bios, bi_next must be NULL - otherwise, treats
- * @src and @dst as linked lists of bios.
- *
- * Stops when it reaches the end of either @src or @dst - that is, copies
- * min(src->bi_size, dst->bi_size) bytes (or the equivalent for lists of bios).
- */
-void bio_copy_data_partial(struct bio *dst, struct bio *src,
-			   int offset, int size)
-{
-	__bio_copy_data(dst, src, offset, size);
-
-}
-EXPORT_SYMBOL(bio_copy_data_partial);
-
 struct bio_map_data {
 	int is_our_pages;
 	struct iov_iter iter;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index fafef6343d1b..7cf8a6c70a3f 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -461,8 +461,6 @@ static inline void bio_flush_dcache_pages(struct bio *bi)
 #endif
 
 extern void bio_copy_data(struct bio *dst, struct bio *src);
-extern void bio_copy_data_partial(struct bio *dst, struct bio *src,
-				  int offset, int size);
 extern int bio_alloc_pages(struct bio *bio, gfp_t gfp);
 extern void bio_free_pages(struct bio *bio);
 



* [md PATCH 04/10] md/raid1: simplify handle_read_error().
  2017-04-05  4:05 [md PATCH 00/10] Simplify bio splitting and related code NeilBrown
                   ` (2 preceding siblings ...)
  2017-04-05  4:05 ` [md PATCH 01/10] md/raid1: simplify the splitting of requests NeilBrown
@ 2017-04-05  4:05 ` NeilBrown
  2017-04-05  4:05 ` [md PATCH 10/10] md/raid0: fix up bio splitting NeilBrown
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: NeilBrown @ 2017-04-05  4:05 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid

handle_read_error() duplicates a lot of the work that raid1_read_request()
does, so it makes sense to just use that function.
This doesn't quite work as handle_read_error() relies on the same r1bio
being re-used so that, in the case of a read-only array, setting
IO_BLOCKED in r1bio->bios[] ensures read_balance() won't re-use
that device.
So we need to allow an r1bio to be passed to raid1_read_request(), and to
have that function mostly initialise the r1bio, but leave the bios[]
array untouched.

The warning messages that handle_read_error() prints need to be
preserved, so they are conditionally added to raid1_read_request().

Note that this highlights a minor bug in alloc_r1bio(): it doesn't
initialise the bios[] array, so it is possible that old content is
there, which might cause read_balance() to ignore some devices for
no good reason.

With this change, we no longer need inc_pending(), or the sectors_handled
arg to alloc_r1bio().


As handle_read_error() is called from raid1d() and allocates memory,
there is a tiny chance of a deadlock.  All elements of the various
pools could be queued waiting for raid1d to handle them, and there
may be no free memory left.
Achieving guaranteed forward progress would probably require a second
thread and another mempool.  Instead of that complexity, add
__GFP_HIGH to any allocations made when raid1_read_request() is
called from raid1d.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/md/raid1.c |  138 ++++++++++++++++++++++------------------------------
 1 file changed, 58 insertions(+), 80 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 06a13010f1b7..d15268fff052 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -988,16 +988,6 @@ static void wait_read_barrier(struct r1conf *conf, sector_t sector_nr)
 	spin_unlock_irq(&conf->resync_lock);
 }
 
-static void inc_pending(struct r1conf *conf, sector_t bi_sector)
-{
-	/* The current request requires multiple r1_bio, so
-	 * we need to increment the pending count, and the corresponding
-	 * window count.
-	 */
-	int idx = sector_to_idx(bi_sector);
-	atomic_inc(&conf->nr_pending[idx]);
-}
-
 static void wait_barrier(struct r1conf *conf, sector_t sector_nr)
 {
 	int idx = sector_to_idx(sector_nr);
@@ -1184,35 +1174,58 @@ static void raid1_unplug(struct blk_plug_cb *cb, bool from_schedule)
 	kfree(plug);
 }
 
+static void init_r1bio(struct r1bio *r1_bio, struct mddev *mddev, struct bio *bio)
+{
+	r1_bio->master_bio = bio;
+	r1_bio->sectors = bio_sectors(bio);
+	r1_bio->state = 0;
+	r1_bio->mddev = mddev;
+	r1_bio->sector = bio->bi_iter.bi_sector;
+}
+
 static inline struct r1bio *
-alloc_r1bio(struct mddev *mddev, struct bio *bio, sector_t sectors_handled)
+alloc_r1bio(struct mddev *mddev, struct bio *bio)
 {
 	struct r1conf *conf = mddev->private;
 	struct r1bio *r1_bio;
 
 	r1_bio = mempool_alloc(conf->r1bio_pool, GFP_NOIO);
-
-	r1_bio->master_bio = bio;
-	r1_bio->sectors = bio_sectors(bio) - sectors_handled;
-	r1_bio->state = 0;
-	r1_bio->mddev = mddev;
-	r1_bio->sector = bio->bi_iter.bi_sector + sectors_handled;
-
+	/* Ensure no bio records IO_BLOCKED */
+	memset(r1_bio->bios, 0, conf->raid_disks * sizeof(r1_bio->bios[0]));
+	init_r1bio(r1_bio, mddev, bio);
 	return r1_bio;
 }
 
 static void raid1_read_request(struct mddev *mddev, struct bio *bio,
-			       int max_read_sectors)
+			       int max_read_sectors, struct r1bio *r1_bio)
 {
 	struct r1conf *conf = mddev->private;
 	struct raid1_info *mirror;
-	struct r1bio *r1_bio;
 	struct bio *read_bio;
 	struct bitmap *bitmap = mddev->bitmap;
 	const int op = bio_op(bio);
 	const unsigned long do_sync = (bio->bi_opf & REQ_SYNC);
 	int max_sectors;
 	int rdisk;
+	bool print_msg = !!r1_bio;
+	char b[BDEVNAME_SIZE];
+	/* If r1_bio is set, we are blocking the raid1d thread
+	 * so there is a tiny risk of deadlock.  So ask for
+	 * emergency memory if needed.
+	 */
+	gfp_t gfp = r1_bio ? (GFP_NOIO | __GFP_HIGH) : GFP_NOIO;
+
+	if (print_msg) {
+		/* Need to get the block device name carefully */
+		struct md_rdev *rdev;
+		rcu_read_lock();
+		rdev = rcu_dereference(conf->mirrors[r1_bio->read_disk].rdev);
+		if (rdev)
+			bdevname(rdev->bdev, b);
+		else
+			strcpy(b, "???");
+		rcu_read_unlock();
+	}
 
 	/*
 	 * Still need barrier for READ in case that whole
@@ -1220,7 +1233,10 @@ static void raid1_read_request(struct mddev *mddev, struct bio *bio,
 	 */
 	wait_read_barrier(conf, bio->bi_iter.bi_sector);
 
-	r1_bio = alloc_r1bio(mddev, bio, 0);
+	if (!r1_bio)
+		r1_bio = alloc_r1bio(mddev, bio);
+	else
+		init_r1bio(r1_bio, mddev, bio);
 	r1_bio->sectors = max_read_sectors;
 
 	/*
@@ -1231,11 +1247,23 @@ static void raid1_read_request(struct mddev *mddev, struct bio *bio,
 
 	if (rdisk < 0) {
 		/* couldn't find anywhere to read from */
+		if (print_msg) {
+			pr_crit_ratelimited("md/raid1:%s: %s: unrecoverable I/O read error for block %llu\n",
+					    mdname(mddev),
+					    b,
+					    (unsigned long long)r1_bio->sector);
+		}
 		raid_end_bio_io(r1_bio);
 		return;
 	}
 	mirror = conf->mirrors + rdisk;
 
+	if (print_msg)
+		pr_info_ratelimited("md/raid1:%s: redirecting sector %llu to other mirror: %s\n",
+				    mdname(mddev),
+				    (unsigned long long)r1_bio->sector,
+				    bdevname(mirror->rdev->bdev, b));
+
 	if (test_bit(WriteMostly, &mirror->rdev->flags) &&
 	    bitmap) {
 		/*
@@ -1249,7 +1277,7 @@ static void raid1_read_request(struct mddev *mddev, struct bio *bio,
 
 	if (max_sectors < bio_sectors(bio)) {
 		struct bio *split = bio_split(bio, max_sectors,
-					      GFP_NOIO, conf->bio_split);
+					      gfp, conf->bio_split);
 		bio_chain(split, bio);
 		generic_make_request(bio);
 		bio = split;
@@ -1259,7 +1287,7 @@ static void raid1_read_request(struct mddev *mddev, struct bio *bio,
 
 	r1_bio->read_disk = rdisk;
 
-	read_bio = bio_clone_fast(bio, GFP_NOIO, mddev->bio_set);
+	read_bio = bio_clone_fast(bio, gfp, mddev->bio_set);
 
 	r1_bio->bios[rdisk] = read_bio;
 
@@ -1332,7 +1360,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
 	}
 	wait_barrier(conf, bio->bi_iter.bi_sector);
 
-	r1_bio = alloc_r1bio(mddev, bio, 0);
+	r1_bio = alloc_r1bio(mddev, bio);
 	r1_bio->sectors = max_write_sectors;
 
 	if (conf->pending_count >= max_queued_requests) {
@@ -1552,7 +1580,7 @@ static void raid1_make_request(struct mddev *mddev, struct bio *bio)
 		bio->bi_iter.bi_sector, bio_sectors(bio));
 
 	if (bio_data_dir(bio) == READ)
-		raid1_read_request(mddev, bio, sectors);
+		raid1_read_request(mddev, bio, sectors, NULL);
 	else
 		raid1_write_request(mddev, bio, sectors);
 }
@@ -2442,11 +2470,8 @@ static void handle_write_finished(struct r1conf *conf, struct r1bio *r1_bio)
 
 static void handle_read_error(struct r1conf *conf, struct r1bio *r1_bio)
 {
-	int disk;
-	int max_sectors;
 	struct mddev *mddev = conf->mddev;
 	struct bio *bio;
-	char b[BDEVNAME_SIZE];
 	struct md_rdev *rdev;
 	dev_t bio_dev;
 	sector_t bio_sector;
@@ -2462,7 +2487,6 @@ static void handle_read_error(struct r1conf *conf, struct r1bio *r1_bio)
 	 */
 
 	bio = r1_bio->bios[r1_bio->read_disk];
-	bdevname(bio->bi_bdev, b);
 	bio_dev = bio->bi_bdev->bd_dev;
 	bio_sector = conf->mirrors[r1_bio->read_disk].rdev->data_offset + r1_bio->sector;
 	bio_put(bio);
@@ -2480,58 +2504,12 @@ static void handle_read_error(struct r1conf *conf, struct r1bio *r1_bio)
 	}
 
 	rdev_dec_pending(rdev, conf->mddev);
+	allow_barrier(conf, r1_bio->sector);
+	bio = r1_bio->master_bio;
 
-read_more:
-	disk = read_balance(conf, r1_bio, &max_sectors);
-	if (disk == -1) {
-		pr_crit_ratelimited("md/raid1:%s: %s: unrecoverable I/O read error for block %llu\n",
-				    mdname(mddev), b, (unsigned long long)r1_bio->sector);
-		raid_end_bio_io(r1_bio);
-	} else {
-		const unsigned long do_sync
-			= r1_bio->master_bio->bi_opf & REQ_SYNC;
-		r1_bio->read_disk = disk;
-		bio = bio_clone_fast(r1_bio->master_bio, GFP_NOIO,
-				     mddev->bio_set);
-		bio_trim(bio, r1_bio->sector - bio->bi_iter.bi_sector,
-			 max_sectors);
-		r1_bio->bios[r1_bio->read_disk] = bio;
-		rdev = conf->mirrors[disk].rdev;
-		pr_info_ratelimited("md/raid1:%s: redirecting sector %llu to other mirror: %s\n",
-				    mdname(mddev),
-				    (unsigned long long)r1_bio->sector,
-				    bdevname(rdev->bdev, b));
-		bio->bi_iter.bi_sector = r1_bio->sector + rdev->data_offset;
-		bio->bi_bdev = rdev->bdev;
-		bio->bi_end_io = raid1_end_read_request;
-		bio_set_op_attrs(bio, REQ_OP_READ, do_sync);
-		if (test_bit(FailFast, &rdev->flags) &&
-		    test_bit(R1BIO_FailFast, &r1_bio->state))
-			bio->bi_opf |= MD_FAILFAST;
-		bio->bi_private = r1_bio;
-		if (max_sectors < r1_bio->sectors) {
-			/* Drat - have to split this up more */
-			struct bio *mbio = r1_bio->master_bio;
-			int sectors_handled = (r1_bio->sector + max_sectors
-					       - mbio->bi_iter.bi_sector);
-			r1_bio->sectors = max_sectors;
-			bio_inc_remaining(mbio);
-			trace_block_bio_remap(bdev_get_queue(bio->bi_bdev),
-					      bio, bio_dev, bio_sector);
-			generic_make_request(bio);
-			bio = NULL;
-
-			r1_bio = alloc_r1bio(mddev, mbio, sectors_handled);
-			set_bit(R1BIO_ReadError, &r1_bio->state);
-			inc_pending(conf, r1_bio->sector);
-
-			goto read_more;
-		} else {
-			trace_block_bio_remap(bdev_get_queue(bio->bi_bdev),
-					      bio, bio_dev, bio_sector);
-			generic_make_request(bio);
-		}
-	}
+	/* Reuse the old r1_bio so that the IO_BLOCKED settings are preserved */
+	r1_bio->state = 0;
+	raid1_read_request(mddev, bio, r1_bio->sectors, r1_bio);
 }
 
 static void raid1d(struct md_thread *thread)



* [md PATCH 05/10] md/raid1: factor out flush_bio_list()
  2017-04-05  4:05 [md PATCH 00/10] Simplify bio splitting and related code NeilBrown
                   ` (7 preceding siblings ...)
  2017-04-05  4:05 ` [md PATCH 09/10] md/linear: improve bio splitting NeilBrown
@ 2017-04-05  4:05 ` NeilBrown
  2017-04-05  4:05 ` [md PATCH 06/10] md/raid10: simplify the splitting of requests NeilBrown
  2017-04-11 17:01 ` [md PATCH 00/10] Simplify bio splitting and related code Shaohua Li
  10 siblings, 0 replies; 17+ messages in thread
From: NeilBrown @ 2017-04-05  4:05 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid

flush_pending_writes() and raid1_unplug() each contain identical
copies of a fairly large slab of code.  So factor that out into a
new flush_bio_list() to simplify maintenance.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/md/raid1.c |   67 +++++++++++++++++++++-------------------------------
 1 file changed, 27 insertions(+), 40 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index d15268fff052..a70283753a35 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -787,6 +787,31 @@ static int raid1_congested(struct mddev *mddev, int bits)
 	return ret;
 }
 
+static void flush_bio_list(struct r1conf *conf, struct bio *bio)
+{
+	/* flush any pending bitmap writes to
+	 * disk before proceeding w/ I/O */
+	bitmap_unplug(conf->mddev->bitmap);
+	wake_up(&conf->wait_barrier);
+
+	while (bio) { /* submit pending writes */
+		struct bio *next = bio->bi_next;
+		struct md_rdev *rdev = (void*)bio->bi_bdev;
+		bio->bi_next = NULL;
+		bio->bi_bdev = rdev->bdev;
+		if (test_bit(Faulty, &rdev->flags)) {
+			bio->bi_error = -EIO;
+			bio_endio(bio);
+		} else if (unlikely((bio_op(bio) == REQ_OP_DISCARD) &&
+				    !blk_queue_discard(bdev_get_queue(bio->bi_bdev))))
+			/* Just ignore it */
+			bio_endio(bio);
+		else
+			generic_make_request(bio);
+		bio = next;
+	}
+}
+
 static void flush_pending_writes(struct r1conf *conf)
 {
 	/* Any writes that have been queued but are awaiting
@@ -799,27 +824,7 @@ static void flush_pending_writes(struct r1conf *conf)
 		bio = bio_list_get(&conf->pending_bio_list);
 		conf->pending_count = 0;
 		spin_unlock_irq(&conf->device_lock);
-		/* flush any pending bitmap writes to
-		 * disk before proceeding w/ I/O */
-		bitmap_unplug(conf->mddev->bitmap);
-		wake_up(&conf->wait_barrier);
-
-		while (bio) { /* submit pending writes */
-			struct bio *next = bio->bi_next;
-			struct md_rdev *rdev = (void*)bio->bi_bdev;
-			bio->bi_next = NULL;
-			bio->bi_bdev = rdev->bdev;
-			if (test_bit(Faulty, &rdev->flags)) {
-				bio->bi_error = -EIO;
-				bio_endio(bio);
-			} else if (unlikely((bio_op(bio) == REQ_OP_DISCARD) &&
-					    !blk_queue_discard(bdev_get_queue(bio->bi_bdev))))
-				/* Just ignore it */
-				bio_endio(bio);
-			else
-				generic_make_request(bio);
-			bio = next;
-		}
+		flush_bio_list(conf, bio);
 	} else
 		spin_unlock_irq(&conf->device_lock);
 }
@@ -1152,25 +1157,7 @@ static void raid1_unplug(struct blk_plug_cb *cb, bool from_schedule)
 
 	/* we aren't scheduling, so we can do the write-out directly. */
 	bio = bio_list_get(&plug->pending);
-	bitmap_unplug(mddev->bitmap);
-	wake_up(&conf->wait_barrier);
-
-	while (bio) { /* submit pending writes */
-		struct bio *next = bio->bi_next;
-		struct md_rdev *rdev = (void*)bio->bi_bdev;
-		bio->bi_next = NULL;
-		bio->bi_bdev = rdev->bdev;
-		if (test_bit(Faulty, &rdev->flags)) {
-			bio->bi_error = -EIO;
-			bio_endio(bio);
-		} else if (unlikely((bio_op(bio) == REQ_OP_DISCARD) &&
-				    !blk_queue_discard(bdev_get_queue(bio->bi_bdev))))
-			/* Just ignore it */
-			bio_endio(bio);
-		else
-			generic_make_request(bio);
-		bio = next;
-	}
+	flush_bio_list(conf, bio);
 	kfree(plug);
 }
 



* [md PATCH 06/10] md/raid10: simplify the splitting of requests.
  2017-04-05  4:05 [md PATCH 00/10] Simplify bio splitting and related code NeilBrown
                   ` (8 preceding siblings ...)
  2017-04-05  4:05 ` [md PATCH 05/10] md/raid1: factor out flush_bio_list() NeilBrown
@ 2017-04-05  4:05 ` NeilBrown
  2017-04-11 17:01 ` [md PATCH 00/10] Simplify bio splitting and related code Shaohua Li
  10 siblings, 0 replies; 17+ messages in thread
From: NeilBrown @ 2017-04-05  4:05 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid

raid10 splits requests in two different ways for two different
reasons.

First, bio_split() is used to ensure the bio fits within a chunk.
Second, multiple r10bio structures are allocated to represent the
different sections that need to go to different devices, to avoid
known bad blocks.

This can be simplified to just use bio_split() once, and not to use
multiple r10bios.
We delay the split until we know a maximum bio size that can
be handled with a single r10bio, and then split the bio and queue
the remainder for later handling.

As with raid1, we allocate a new bio_set to help with the splitting.
It is not correct to use fs_bio_set in a device driver.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/md/raid10.c |  164 ++++++++++++++++-----------------------------------
 drivers/md/raid10.h |    1 
 2 files changed, 51 insertions(+), 114 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 0f13d57ef646..ff02c058dbdd 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1127,7 +1127,6 @@ static void raid10_read_request(struct mddev *mddev, struct bio *bio,
 	struct bio *read_bio;
 	const int op = bio_op(bio);
 	const unsigned long do_sync = (bio->bi_opf & REQ_SYNC);
-	int sectors_handled;
 	int max_sectors;
 	sector_t sectors;
 	struct md_rdev *rdev;
@@ -1140,7 +1139,7 @@ static void raid10_read_request(struct mddev *mddev, struct bio *bio,
 	 */
 	wait_barrier(conf);
 
-	sectors = bio_sectors(bio);
+	sectors = r10_bio->sectors;
 	while (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
 	    bio->bi_iter.bi_sector < conf->reshape_progress &&
 	    bio->bi_iter.bi_sector + sectors > conf->reshape_progress) {
@@ -1157,17 +1156,23 @@ static void raid10_read_request(struct mddev *mddev, struct bio *bio,
 		wait_barrier(conf);
 	}
 
-read_again:
 	rdev = read_balance(conf, r10_bio, &max_sectors);
 	if (!rdev) {
 		raid_end_bio_io(r10_bio);
 		return;
 	}
+	if (max_sectors < bio_sectors(bio)) {
+		struct bio *split = bio_split(bio, max_sectors,
+					      GFP_NOIO, conf->bio_split);
+		bio_chain(split, bio);
+		generic_make_request(bio);
+		bio = split;
+		r10_bio->master_bio = bio;
+		r10_bio->sectors = max_sectors;
+	}
 	slot = r10_bio->read_slot;
 
 	read_bio = bio_clone_fast(bio, GFP_NOIO, mddev->bio_set);
-	bio_trim(read_bio, r10_bio->sector - bio->bi_iter.bi_sector,
-		 max_sectors);
 
 	r10_bio->devs[slot].bio = read_bio;
 	r10_bio->devs[slot].rdev = rdev;
@@ -1186,40 +1191,13 @@ static void raid10_read_request(struct mddev *mddev, struct bio *bio,
 	        trace_block_bio_remap(bdev_get_queue(read_bio->bi_bdev),
 	                              read_bio, disk_devt(mddev->gendisk),
 	                              r10_bio->sector);
-	if (max_sectors < r10_bio->sectors) {
-		/*
-		 * Could not read all from this device, so we will need another
-		 * r10_bio.
-		 */
-		sectors_handled = (r10_bio->sector + max_sectors
-				   - bio->bi_iter.bi_sector);
-		r10_bio->sectors = max_sectors;
-		inc_pending(conf);
-		bio_inc_remaining(bio);
-		/*
-		 * Cannot call generic_make_request directly as that will be
-		 * queued in __generic_make_request and subsequent
-		 * mempool_alloc might block waiting for it.  so hand bio over
-		 * to raid10d.
-		 */
-		reschedule_retry(r10_bio);
-
-		r10_bio = mempool_alloc(conf->r10bio_pool, GFP_NOIO);
-
-		r10_bio->master_bio = bio;
-		r10_bio->sectors = bio_sectors(bio) - sectors_handled;
-		r10_bio->state = 0;
-		r10_bio->mddev = mddev;
-		r10_bio->sector = bio->bi_iter.bi_sector + sectors_handled;
-		goto read_again;
-	} else
-		generic_make_request(read_bio);
+	generic_make_request(read_bio);
 	return;
 }
 
 static void raid10_write_one_disk(struct mddev *mddev, struct r10bio *r10_bio,
 				  struct bio *bio, bool replacement,
-				  int n_copy, int max_sectors)
+				  int n_copy)
 {
 	const int op = bio_op(bio);
 	const unsigned long do_sync = (bio->bi_opf & REQ_SYNC);
@@ -1243,7 +1221,6 @@ static void raid10_write_one_disk(struct mddev *mddev, struct r10bio *r10_bio,
 		rdev = conf->mirrors[devnum].rdev;
 
 	mbio = bio_clone_fast(bio, GFP_NOIO, mddev->bio_set);
-	bio_trim(mbio, r10_bio->sector - bio->bi_iter.bi_sector, max_sectors);
 	if (replacement)
 		r10_bio->devs[n_copy].repl_bio = mbio;
 	else
@@ -1294,7 +1271,6 @@ static void raid10_write_request(struct mddev *mddev, struct bio *bio,
 	int i;
 	struct md_rdev *blocked_rdev;
 	sector_t sectors;
-	int sectors_handled;
 	int max_sectors;
 
 	md_write_start(mddev, bio);
@@ -1306,7 +1282,7 @@ static void raid10_write_request(struct mddev *mddev, struct bio *bio,
 	 */
 	wait_barrier(conf);
 
-	sectors = bio_sectors(bio);
+	sectors = r10_bio->sectors;
 	while (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
 	    bio->bi_iter.bi_sector < conf->reshape_progress &&
 	    bio->bi_iter.bi_sector + sectors > conf->reshape_progress) {
@@ -1476,44 +1452,29 @@ static void raid10_write_request(struct mddev *mddev, struct bio *bio,
 
 	if (max_sectors < r10_bio->sectors)
 		r10_bio->sectors = max_sectors;
-	sectors_handled = r10_bio->sector + max_sectors -
-		bio->bi_iter.bi_sector;
+
+	if (r10_bio->sectors < bio_sectors(bio)) {
+		struct bio *split = bio_split(bio, r10_bio->sectors,
+					      GFP_NOIO, conf->bio_split);
+		bio_chain(split, bio);
+		generic_make_request(bio);
+		bio = split;
+		r10_bio->master_bio = bio;
+	}
 
 	atomic_set(&r10_bio->remaining, 1);
 	bitmap_startwrite(mddev->bitmap, r10_bio->sector, r10_bio->sectors, 0);
 
 	for (i = 0; i < conf->copies; i++) {
 		if (r10_bio->devs[i].bio)
-			raid10_write_one_disk(mddev, r10_bio, bio, false,
-					      i, max_sectors);
+			raid10_write_one_disk(mddev, r10_bio, bio, false, i);
 		if (r10_bio->devs[i].repl_bio)
-			raid10_write_one_disk(mddev, r10_bio, bio, true,
-					      i, max_sectors);
-	}
-
-	/* Don't remove the bias on 'remaining' (one_write_done) until
-	 * after checking if we need to go around again.
-	 */
-
-	if (sectors_handled < bio_sectors(bio)) {
-		/* We need another r10_bio and it needs to be counted */
-		inc_pending(conf);
-		bio_inc_remaining(bio);
-		one_write_done(r10_bio);
-		r10_bio = mempool_alloc(conf->r10bio_pool, GFP_NOIO);
-
-		r10_bio->master_bio = bio;
-		r10_bio->sectors = bio_sectors(bio) - sectors_handled;
-
-		r10_bio->mddev = mddev;
-		r10_bio->sector = bio->bi_iter.bi_sector + sectors_handled;
-		r10_bio->state = 0;
-		goto retry_write;
+			raid10_write_one_disk(mddev, r10_bio, bio, true, i);
 	}
 	one_write_done(r10_bio);
 }
 
-static void __make_request(struct mddev *mddev, struct bio *bio)
+static void __make_request(struct mddev *mddev, struct bio *bio, int sectors)
 {
 	struct r10conf *conf = mddev->private;
 	struct r10bio *r10_bio;
@@ -1521,7 +1482,7 @@ static void __make_request(struct mddev *mddev, struct bio *bio)
 	r10_bio = mempool_alloc(conf->r10bio_pool, GFP_NOIO);
 
 	r10_bio->master_bio = bio;
-	r10_bio->sectors = bio_sectors(bio);
+	r10_bio->sectors = sectors;
 
 	r10_bio->mddev = mddev;
 	r10_bio->sector = bio->bi_iter.bi_sector;
@@ -1538,54 +1499,26 @@ static void raid10_make_request(struct mddev *mddev, struct bio *bio)
 	struct r10conf *conf = mddev->private;
 	sector_t chunk_mask = (conf->geo.chunk_mask & conf->prev.chunk_mask);
 	int chunk_sects = chunk_mask + 1;
-
-	struct bio *split;
+	int sectors = bio_sectors(bio);
 
 	if (unlikely(bio->bi_opf & REQ_PREFLUSH)) {
 		md_flush_request(mddev, bio);
 		return;
 	}
 
-	do {
-
-		/*
-		 * If this request crosses a chunk boundary, we need to split
-		 * it.
-		 */
-		if (unlikely((bio->bi_iter.bi_sector & chunk_mask) +
-			     bio_sectors(bio) > chunk_sects
-			     && (conf->geo.near_copies < conf->geo.raid_disks
-				 || conf->prev.near_copies <
-				 conf->prev.raid_disks))) {
-			split = bio_split(bio, chunk_sects -
-					  (bio->bi_iter.bi_sector &
-					   (chunk_sects - 1)),
-					  GFP_NOIO, fs_bio_set);
-			bio_chain(split, bio);
-		} else {
-			split = bio;
-		}
-
-		/*
-		 * If a bio is splitted, the first part of bio will pass
-		 * barrier but the bio is queued in current->bio_list (see
-		 * generic_make_request). If there is a raise_barrier() called
-		 * here, the second part of bio can't pass barrier. But since
-		 * the first part bio isn't dispatched to underlaying disks
-		 * yet, the barrier is never released, hence raise_barrier will
-		 * alays wait. We have a deadlock.
-		 * Note, this only happens in read path. For write path, the
-		 * first part of bio is dispatched in a schedule() call
-		 * (because of blk plug) or offloaded to raid10d.
-		 * Quitting from the function immediately can change the bio
-		 * order queued in bio_list and avoid the deadlock.
-		 */
-		__make_request(mddev, split);
-		if (split != bio && bio_data_dir(bio) == READ) {
-			generic_make_request(bio);
-			break;
-		}
-	} while (split != bio);
+	/*
+	 * If this request crosses a chunk boundary, we need to split
+	 * it.
+	 */
+	if (unlikely((bio->bi_iter.bi_sector & chunk_mask) +
+		     sectors > chunk_sects
+		     && (conf->geo.near_copies < conf->geo.raid_disks
+			 || conf->prev.near_copies <
+			 conf->prev.raid_disks)))
+		sectors = chunk_sects -
+			(bio->bi_iter.bi_sector &
+			 (chunk_sects - 1));
+	__make_request(mddev, bio, sectors);
 
 	/* In case raid10d snuck in to freeze_array */
 	wake_up(&conf->wait_barrier);
@@ -2873,13 +2806,8 @@ static void raid10d(struct md_thread *thread)
 			recovery_request_write(mddev, r10_bio);
 		else if (test_bit(R10BIO_ReadError, &r10_bio->state))
 			handle_read_error(mddev, r10_bio);
-		else {
-			/* just a partial read to be scheduled from a
-			 * separate context
-			 */
-			int slot = r10_bio->read_slot;
-			generic_make_request(r10_bio->devs[slot].bio);
-		}
+		else
+			WARN_ON_ONCE(1);
 
 		cond_resched();
 		if (mddev->sb_flags & ~(1<<MD_SB_CHANGE_PENDING))
@@ -3652,6 +3580,10 @@ static struct r10conf *setup_conf(struct mddev *mddev)
 	if (!conf->r10bio_pool)
 		goto out;
 
+	conf->bio_split = bioset_create(BIO_POOL_SIZE, 0);
+	if (!conf->bio_split)
+		goto out;
+
 	calc_sectors(conf, mddev->dev_sectors);
 	if (mddev->reshape_position == MaxSector) {
 		conf->prev = conf->geo;
@@ -3689,6 +3621,8 @@ static struct r10conf *setup_conf(struct mddev *mddev)
 		mempool_destroy(conf->r10bio_pool);
 		kfree(conf->mirrors);
 		safe_put_page(conf->tmppage);
+		if (conf->bio_split)
+			bioset_free(conf->bio_split);
 		kfree(conf);
 	}
 	return ERR_PTR(err);
@@ -3898,6 +3832,8 @@ static void raid10_free(struct mddev *mddev, void *priv)
 	kfree(conf->mirrors);
 	kfree(conf->mirrors_old);
 	kfree(conf->mirrors_new);
+	if (conf->bio_split)
+		bioset_free(conf->bio_split);
 	kfree(conf);
 }
 
diff --git a/drivers/md/raid10.h b/drivers/md/raid10.h
index 3162615e57bd..735ce1a3d260 100644
--- a/drivers/md/raid10.h
+++ b/drivers/md/raid10.h
@@ -82,6 +82,7 @@ struct r10conf {
 	mempool_t		*r10bio_pool;
 	mempool_t		*r10buf_pool;
 	struct page		*tmppage;
+	struct bio_set		*bio_split;
 
 	/* When taking over an array from a different personality, we store
 	 * the new thread here until we fully activate the array.



* [md PATCH 07/10] md/raid10: simplify handle_read_error()
  2017-04-05  4:05 [md PATCH 00/10] Simplify bio splitting and related code NeilBrown
                   ` (5 preceding siblings ...)
  2017-04-05  4:05 ` [md PATCH 08/10] md/raid5: make chunk_aligned_read() split bios more cleanly NeilBrown
@ 2017-04-05  4:05 ` NeilBrown
  2017-04-05  4:05 ` [md PATCH 09/10] md/linear: improve bio splitting NeilBrown
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: NeilBrown @ 2017-04-05  4:05 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid

handle_read_error() duplicates a lot of the work that raid10_read_request()
does, so it makes sense to just use that function.

handle_read_error() relies on the same r10bio being re-used so that,
in the case of a read-only array, setting IO_BLOCKED in r10bio->devs[].bio
ensures read_balance() won't re-use that device.
So when called from raid10_make_request() we clear that array, but not
when called from handle_read_error().

The warning messages that handle_read_error() prints need to be
preserved, so they are conditionally added to raid10_read_request():
if the failing rdev can be found, the messages are printed;
otherwise they aren't.

Note that as rdev_dec_pending() has already been called on the failing
rdev, we need to use rcu_read_lock() to get a new reference from
the conf.  We only use this to get the name of the failing block device.

With this change, we no longer need inc_pending().

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/md/raid10.c |  120 +++++++++++++++++++--------------------------------
 1 file changed, 45 insertions(+), 75 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index ff02c058dbdd..ca722570bd38 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1008,15 +1008,6 @@ static void wait_barrier(struct r10conf *conf)
 	spin_unlock_irq(&conf->resync_lock);
 }
 
-static void inc_pending(struct r10conf *conf)
-{
-	/* The current request requires multiple r10_bio, so
-	 * we need to increment the pending count.
-	 */
-	WARN_ON(!atomic_read(&conf->nr_pending));
-	atomic_inc(&conf->nr_pending);
-}
-
 static void allow_barrier(struct r10conf *conf)
 {
 	if ((atomic_dec_and_test(&conf->nr_pending)) ||
@@ -1130,8 +1121,36 @@ static void raid10_read_request(struct mddev *mddev, struct bio *bio,
 	int max_sectors;
 	sector_t sectors;
 	struct md_rdev *rdev;
-	int slot;
+	char b[BDEVNAME_SIZE];
+	int slot = r10_bio->read_slot;
+	struct md_rdev *err_rdev = NULL;
+	gfp_t gfp = GFP_NOIO;
+
+	if (r10_bio->devs[slot].rdev) {
+		/* This is an error retry, but we cannot
+		 * safely dereference the rdev in the r10_bio,
+		 * we must use the one in conf.
+		 * If it has already been disconnected (unlikely)
+		 * we lose the device name in error messages.
+		 */
+		int disk;
+		/* As we are blocking raid10, it is a little safer to
+		 * use __GFP_HIGH.
+		 */
+		gfp = GFP_NOIO | __GFP_HIGH;
 
+		rcu_read_lock();
+		disk = r10_bio->devs[slot].devnum;
+		err_rdev = rcu_dereference(conf->mirrors[disk].rdev);
+		if (err_rdev)
+			bdevname(err_rdev->bdev, b);
+		else {
+			strcpy(b, "???");
+			/* This never gets dereferenced */
+			err_rdev = r10_bio->devs[slot].rdev;
+		}
+		rcu_read_unlock();
+	}
 	/*
 	 * Register the new request and wait if the reconstruction
 	 * thread has put up a bar for new requests.
@@ -1158,12 +1177,22 @@ static void raid10_read_request(struct mddev *mddev, struct bio *bio,
 
 	rdev = read_balance(conf, r10_bio, &max_sectors);
 	if (!rdev) {
+		if (err_rdev) {
+			pr_crit_ratelimited("md/raid10:%s: %s: unrecoverable I/O read error for block %llu\n",
+					    mdname(mddev), b,
+					    (unsigned long long)r10_bio->sector);
+		}
 		raid_end_bio_io(r10_bio);
 		return;
 	}
+	if (err_rdev)
+		pr_err_ratelimited("md/raid10:%s: %s: redirecting sector %llu to another mirror\n",
+				   mdname(mddev),
+				   bdevname(rdev->bdev, b),
+				   (unsigned long long)r10_bio->sector);
 	if (max_sectors < bio_sectors(bio)) {
 		struct bio *split = bio_split(bio, max_sectors,
-					      GFP_NOIO, conf->bio_split);
+					      gfp, conf->bio_split);
 		bio_chain(split, bio);
 		generic_make_request(bio);
 		bio = split;
@@ -1172,7 +1201,7 @@ static void raid10_read_request(struct mddev *mddev, struct bio *bio,
 	}
 	slot = r10_bio->read_slot;
 
-	read_bio = bio_clone_fast(bio, GFP_NOIO, mddev->bio_set);
+	read_bio = bio_clone_fast(bio, gfp, mddev->bio_set);
 
 	r10_bio->devs[slot].bio = read_bio;
 	r10_bio->devs[slot].rdev = rdev;
@@ -1487,6 +1516,7 @@ static void __make_request(struct mddev *mddev, struct bio *bio, int sectors)
 	r10_bio->mddev = mddev;
 	r10_bio->sector = bio->bi_iter.bi_sector;
 	r10_bio->state = 0;
+	memset(r10_bio->devs, 0, sizeof(r10_bio->devs[0]) * conf->copies);
 
 	if (bio_data_dir(bio) == READ)
 		raid10_read_request(mddev, bio, r10_bio);
@@ -2556,9 +2586,6 @@ static void handle_read_error(struct mddev *mddev, struct r10bio *r10_bio)
 	struct bio *bio;
 	struct r10conf *conf = mddev->private;
 	struct md_rdev *rdev = r10_bio->devs[slot].rdev;
-	char b[BDEVNAME_SIZE];
-	unsigned long do_sync;
-	int max_sectors;
 	dev_t bio_dev;
 	sector_t bio_last_sector;
 
@@ -2571,7 +2598,6 @@ static void handle_read_error(struct mddev *mddev, struct r10bio *r10_bio)
 	 * frozen.
 	 */
 	bio = r10_bio->devs[slot].bio;
-	bdevname(bio->bi_bdev, b);
 	bio_dev = bio->bi_bdev->bd_dev;
 	bio_last_sector = r10_bio->devs[slot].addr + rdev->data_offset + r10_bio->sectors;
 	bio_put(bio);
@@ -2587,65 +2613,9 @@ static void handle_read_error(struct mddev *mddev, struct r10bio *r10_bio)
 		md_error(mddev, rdev);
 
 	rdev_dec_pending(rdev, mddev);
-
-read_more:
-	rdev = read_balance(conf, r10_bio, &max_sectors);
-	if (rdev == NULL) {
-		pr_crit_ratelimited("md/raid10:%s: %s: unrecoverable I/O read error for block %llu\n",
-				    mdname(mddev), b,
-				    (unsigned long long)r10_bio->sector);
-		raid_end_bio_io(r10_bio);
-		return;
-	}
-
-	do_sync = (r10_bio->master_bio->bi_opf & REQ_SYNC);
-	slot = r10_bio->read_slot;
-	pr_err_ratelimited("md/raid10:%s: %s: redirecting sector %llu to another mirror\n",
-			   mdname(mddev),
-			   bdevname(rdev->bdev, b),
-			   (unsigned long long)r10_bio->sector);
-	bio = bio_clone_fast(r10_bio->master_bio, GFP_NOIO, mddev->bio_set);
-	bio_trim(bio, r10_bio->sector - bio->bi_iter.bi_sector, max_sectors);
-	r10_bio->devs[slot].bio = bio;
-	r10_bio->devs[slot].rdev = rdev;
-	bio->bi_iter.bi_sector = r10_bio->devs[slot].addr
-		+ choose_data_offset(r10_bio, rdev);
-	bio->bi_bdev = rdev->bdev;
-	bio_set_op_attrs(bio, REQ_OP_READ, do_sync);
-	if (test_bit(FailFast, &rdev->flags) &&
-	    test_bit(R10BIO_FailFast, &r10_bio->state))
-		bio->bi_opf |= MD_FAILFAST;
-	bio->bi_private = r10_bio;
-	bio->bi_end_io = raid10_end_read_request;
-	trace_block_bio_remap(bdev_get_queue(bio->bi_bdev),
-			      bio, bio_dev,
-			      bio_last_sector - r10_bio->sectors);
-
-	if (max_sectors < r10_bio->sectors) {
-		/* Drat - have to split this up more */
-		struct bio *mbio = r10_bio->master_bio;
-		int sectors_handled =
-			r10_bio->sector + max_sectors
-			- mbio->bi_iter.bi_sector;
-		r10_bio->sectors = max_sectors;
-		bio_inc_remaining(mbio);
-		inc_pending(conf);
-		generic_make_request(bio);
-
-		r10_bio = mempool_alloc(conf->r10bio_pool,
-					GFP_NOIO);
-		r10_bio->master_bio = mbio;
-		r10_bio->sectors = bio_sectors(mbio) - sectors_handled;
-		r10_bio->state = 0;
-		set_bit(R10BIO_ReadError,
-			&r10_bio->state);
-		r10_bio->mddev = mddev;
-		r10_bio->sector = mbio->bi_iter.bi_sector
-			+ sectors_handled;
-
-		goto read_more;
-	} else
-		generic_make_request(bio);
+	allow_barrier(conf);
+	r10_bio->state = 0;
+	raid10_read_request(mddev, r10_bio->master_bio, r10_bio);
 }
 
 static void handle_write_completed(struct r10conf *conf, struct r10bio *r10_bio)



* [md PATCH 08/10] md/raid5: make chunk_aligned_read() split bios more cleanly.
  2017-04-05  4:05 [md PATCH 00/10] Simplify bio splitting and related code NeilBrown
                   ` (4 preceding siblings ...)
  2017-04-05  4:05 ` [md PATCH 10/10] md/raid0: fix up bio splitting NeilBrown
@ 2017-04-05  4:05 ` NeilBrown
  2017-04-05 22:15   ` kbuild test robot
  2017-04-06  0:13   ` NeilBrown
  2017-04-05  4:05 ` [md PATCH 07/10] md/raid10: simplify handle_read_error() NeilBrown
                   ` (4 subsequent siblings)
  10 siblings, 2 replies; 17+ messages in thread
From: NeilBrown @ 2017-04-05  4:05 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid

chunk_aligned_read() currently uses fs_bio_set - which is meant for
filesystems to use - and loops if multiple splits are needed, which is
not best practice.
As this is only used for READ requests, not writes, it is unlikely
to cause a problem.  However, it is best to be consistent in how
we split bios, and to follow the pattern used in raid1/raid10.

So create a private bioset, bio_split, and use it to perform a single
split, submitting the remainder to generic_make_request() for later
processing.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/md/raid5.c |   33 +++++++++++++++++----------------
 drivers/md/raid5.h |    1 +
 2 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 6036d5e41ddd..c523fd69a7bc 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -5234,24 +5234,20 @@ static int raid5_read_one_chunk(struct mddev *mddev, struct bio *raid_bio)
 static struct bio *chunk_aligned_read(struct mddev *mddev, struct bio *raid_bio)
 {
 	struct bio *split;
+	sector_t sector = raid_bio->bi_iter.bi_sector;
+	unsigned chunk_sects = mddev->chunk_sectors;
+	unsigned sectors = chunk_sects - (sector & (chunk_sects-1));
 
-	do {
-		sector_t sector = raid_bio->bi_iter.bi_sector;
-		unsigned chunk_sects = mddev->chunk_sectors;
-		unsigned sectors = chunk_sects - (sector & (chunk_sects-1));
-
-		if (sectors < bio_sectors(raid_bio)) {
-			split = bio_split(raid_bio, sectors, GFP_NOIO, fs_bio_set);
-			bio_chain(split, raid_bio);
-		} else
-			split = raid_bio;
+	if (sectors < bio_sectors(raid_bio)) {
+		struct r5conf *conf = mddev->private;
+		split = bio_split(raid_bio, sectors, GFP_NOIO, conf->bio_split);
+		bio_chain(split, raid_bio);
+		generic_make_request(raid_bio);
+		raid_bio = split;
+	}
 
-		if (!raid5_read_one_chunk(mddev, split)) {
-			if (split != raid_bio)
-				generic_make_request(raid_bio);
-			return split;
-		}
-	} while (split != raid_bio);
+	if (!raid5_read_one_chunk(mddev, raid_bio))
+		return split;
 
 	return NULL;
 }
@@ -6735,6 +6731,8 @@ static void free_conf(struct r5conf *conf)
 		if (conf->disks[i].extra_page)
 			put_page(conf->disks[i].extra_page);
 	kfree(conf->disks);
+	if (conf->bio_split)
+		bioset_free(conf->bio_split);
 	kfree(conf->stripe_hashtbl);
 	kfree(conf->pending_data);
 	kfree(conf);
@@ -6910,6 +6908,9 @@ static struct r5conf *setup_conf(struct mddev *mddev)
 			goto abort;
 	}
 
+	conf->bio_split = bioset_create(BIO_POOL_SIZE, 0);
+	if (!conf->bio_split)
+		goto abort;
 	conf->mddev = mddev;
 
 	if ((conf->stripe_hashtbl = kzalloc(PAGE_SIZE, GFP_KERNEL)) == NULL)
diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
index cdc7f92e1806..625c7f16fd6b 100644
--- a/drivers/md/raid5.h
+++ b/drivers/md/raid5.h
@@ -646,6 +646,7 @@ struct r5conf {
 	int			pool_size; /* number of disks in stripeheads in pool */
 	spinlock_t		device_lock;
 	struct disk_info	*disks;
+	struct bio_set		*bio_split;
 
 	/* When taking over an array from a different personality, we store
 	 * the new thread here until we fully activate the array.




* [md PATCH 09/10] md/linear: improve bio splitting.
  2017-04-05  4:05 [md PATCH 00/10] Simplify bio splitting and related code NeilBrown
                   ` (6 preceding siblings ...)
  2017-04-05  4:05 ` [md PATCH 07/10] md/raid10: simplify handle_read_error() NeilBrown
@ 2017-04-05  4:05 ` NeilBrown
  2017-04-05  4:05 ` [md PATCH 05/10] md/raid1: factor out flush_bio_list() NeilBrown
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: NeilBrown @ 2017-04-05  4:05 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid

linear_make_request() uses fs_bio_set, which is meant for filesystems
to use, and loops, possibly allocating from the same bio set multiple
times.
These behaviors can theoretically cause deadlocks: a later allocation
can block waiting for a bio that this same thread submitted earlier
but that cannot be issued until we return.  As linear requests are
hardly ever split, this is unlikely in practice.

Change to use mddev->bio_set - otherwise unused for linear - and
submit the tail of a split request to generic_make_request() for it
to handle.
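
For context, the hazard in the old loop is roughly this (schematic
pseudo-code only, not the previous driver code):

	do {
		split = bio_split(bio, sectors, GFP_NOIO, shared_set);
		bio_chain(split, bio);
		generic_make_request(split);
		/*
		 * The split sits on current->bio_list and is not issued
		 * until this function returns.  If the next bio_split()
		 * finds shared_set's mempool empty, we sleep waiting for
		 * a bio that is stuck behind us: deadlock.
		 */
	} while (!done);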

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/md/linear.c |   75 ++++++++++++++++++++++++---------------------------
 1 file changed, 36 insertions(+), 39 deletions(-)

diff --git a/drivers/md/linear.c b/drivers/md/linear.c
index 3e38e0207a3e..d67105a15a39 100644
--- a/drivers/md/linear.c
+++ b/drivers/md/linear.c
@@ -249,53 +249,50 @@ static void linear_make_request(struct mddev *mddev, struct bio *bio)
 {
 	char b[BDEVNAME_SIZE];
 	struct dev_info *tmp_dev;
-	struct bio *split;
 	sector_t start_sector, end_sector, data_offset;
+	sector_t bio_sector = bio->bi_iter.bi_sector;
 
 	if (unlikely(bio->bi_opf & REQ_PREFLUSH)) {
 		md_flush_request(mddev, bio);
 		return;
 	}
 
-	do {
-		sector_t bio_sector = bio->bi_iter.bi_sector;
-		tmp_dev = which_dev(mddev, bio_sector);
-		start_sector = tmp_dev->end_sector - tmp_dev->rdev->sectors;
-		end_sector = tmp_dev->end_sector;
-		data_offset = tmp_dev->rdev->data_offset;
-		bio->bi_bdev = tmp_dev->rdev->bdev;
-
-		if (unlikely(bio_sector >= end_sector ||
-			     bio_sector < start_sector))
-			goto out_of_bounds;
-
-		if (unlikely(bio_end_sector(bio) > end_sector)) {
-			/* This bio crosses a device boundary, so we have to
-			 * split it.
-			 */
-			split = bio_split(bio, end_sector - bio_sector,
-					  GFP_NOIO, fs_bio_set);
-			bio_chain(split, bio);
-		} else {
-			split = bio;
-		}
+	tmp_dev = which_dev(mddev, bio_sector);
+	start_sector = tmp_dev->end_sector - tmp_dev->rdev->sectors;
+	end_sector = tmp_dev->end_sector;
+	data_offset = tmp_dev->rdev->data_offset;
+
+	if (unlikely(bio_sector >= end_sector ||
+		     bio_sector < start_sector))
+		goto out_of_bounds;
+
+	if (unlikely(bio_end_sector(bio) > end_sector)) {
+		/* This bio crosses a device boundary, so we have to
+		 * split it.
+		 */
+		struct bio *split = bio_split(bio, end_sector - bio_sector,
+					      GFP_NOIO, mddev->bio_set);
+		bio_chain(split, bio);
+		generic_make_request(bio);
+		bio = split;
+	}
 
-		split->bi_iter.bi_sector = split->bi_iter.bi_sector -
-			start_sector + data_offset;
-
-		if (unlikely((bio_op(split) == REQ_OP_DISCARD) &&
-			 !blk_queue_discard(bdev_get_queue(split->bi_bdev)))) {
-			/* Just ignore it */
-			bio_endio(split);
-		} else {
-			if (mddev->gendisk)
-				trace_block_bio_remap(bdev_get_queue(split->bi_bdev),
-						      split, disk_devt(mddev->gendisk),
-						      bio_sector);
-			mddev_check_writesame(mddev, split);
-			generic_make_request(split);
-		}
-	} while (split != bio);
+	bio->bi_bdev = tmp_dev->rdev->bdev;
+	bio->bi_iter.bi_sector = bio->bi_iter.bi_sector -
+		start_sector + data_offset;
+
+	if (unlikely((bio_op(bio) == REQ_OP_DISCARD) &&
+		     !blk_queue_discard(bdev_get_queue(bio->bi_bdev)))) {
+		/* Just ignore it */
+		bio_endio(bio);
+	} else {
+		if (mddev->gendisk)
+			trace_block_bio_remap(bdev_get_queue(bio->bi_bdev),
+					      bio, disk_devt(mddev->gendisk),
+					      bio_sector);
+		mddev_check_writesame(mddev, bio);
+		generic_make_request(bio);
+	}
 	return;
 
 out_of_bounds:




* [md PATCH 10/10] md/raid0: fix up bio splitting.
  2017-04-05  4:05 [md PATCH 00/10] Simplify bio splitting and related code NeilBrown
                   ` (3 preceding siblings ...)
  2017-04-05  4:05 ` [md PATCH 04/10] md/raid1: simplify handle_read_error() NeilBrown
@ 2017-04-05  4:05 ` NeilBrown
  2017-04-05  4:05 ` [md PATCH 08/10] md/raid5: make chunk_aligned_read() split bios more cleanly NeilBrown
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: NeilBrown @ 2017-04-05  4:05 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid

raid0_make_request() should use a private bio_set rather than the
shared fs_bio_set, which is only meant for filesystems to use.

raid0_make_request() shouldn't loop, using the bio_set multiple
times, as that can deadlock.

So use mddev->bio_set and pass the tail to generic_make_request()
instead of looping on it.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/md/raid0.c |   73 ++++++++++++++++++++++++++--------------------------
 1 file changed, 37 insertions(+), 36 deletions(-)

diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index 56f70c3ad37c..e777e48f55f6 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -462,52 +462,53 @@ static void raid0_make_request(struct mddev *mddev, struct bio *bio)
 {
 	struct strip_zone *zone;
 	struct md_rdev *tmp_dev;
-	struct bio *split;
+	sector_t bio_sector;
+	sector_t sector;
+	unsigned chunk_sects;
+	unsigned sectors;
 
 	if (unlikely(bio->bi_opf & REQ_PREFLUSH)) {
 		md_flush_request(mddev, bio);
 		return;
 	}
 
-	do {
-		sector_t bio_sector = bio->bi_iter.bi_sector;
-		sector_t sector = bio_sector;
-		unsigned chunk_sects = mddev->chunk_sectors;
+	bio_sector = bio->bi_iter.bi_sector;
+	sector = bio_sector;
+	chunk_sects = mddev->chunk_sectors;
 
-		unsigned sectors = chunk_sects -
-			(likely(is_power_of_2(chunk_sects))
-			 ? (sector & (chunk_sects-1))
-			 : sector_div(sector, chunk_sects));
+	sectors = chunk_sects -
+		(likely(is_power_of_2(chunk_sects))
+		 ? (sector & (chunk_sects-1))
+		 : sector_div(sector, chunk_sects));
 
-		/* Restore due to sector_div */
-		sector = bio_sector;
+	/* Restore due to sector_div */
+	sector = bio_sector;
 
-		if (sectors < bio_sectors(bio)) {
-			split = bio_split(bio, sectors, GFP_NOIO, fs_bio_set);
-			bio_chain(split, bio);
-		} else {
-			split = bio;
-		}
+	if (sectors < bio_sectors(bio)) {
+		struct bio *split = bio_split(bio, sectors, GFP_NOIO, mddev->bio_set);
+		bio_chain(split, bio);
+		generic_make_request(bio);
+		bio = split;
+	}
 
-		zone = find_zone(mddev->private, &sector);
-		tmp_dev = map_sector(mddev, zone, sector, &sector);
-		split->bi_bdev = tmp_dev->bdev;
-		split->bi_iter.bi_sector = sector + zone->dev_start +
-			tmp_dev->data_offset;
-
-		if (unlikely((bio_op(split) == REQ_OP_DISCARD) &&
-			 !blk_queue_discard(bdev_get_queue(split->bi_bdev)))) {
-			/* Just ignore it */
-			bio_endio(split);
-		} else {
-			if (mddev->gendisk)
-				trace_block_bio_remap(bdev_get_queue(split->bi_bdev),
-						      split, disk_devt(mddev->gendisk),
-						      bio_sector);
-			mddev_check_writesame(mddev, split);
-			generic_make_request(split);
-		}
-	} while (split != bio);
+	zone = find_zone(mddev->private, &sector);
+	tmp_dev = map_sector(mddev, zone, sector, &sector);
+	bio->bi_bdev = tmp_dev->bdev;
+	bio->bi_iter.bi_sector = sector + zone->dev_start +
+		tmp_dev->data_offset;
+
+	if (unlikely((bio_op(bio) == REQ_OP_DISCARD) &&
+		     !blk_queue_discard(bdev_get_queue(bio->bi_bdev)))) {
+		/* Just ignore it */
+		bio_endio(bio);
+	} else {
+		if (mddev->gendisk)
+			trace_block_bio_remap(bdev_get_queue(bio->bi_bdev),
+					      bio, disk_devt(mddev->gendisk),
+					      bio_sector);
+		mddev_check_writesame(mddev, bio);
+		generic_make_request(bio);
+	}
 }
 
 static void raid0_status(struct seq_file *seq, struct mddev *mddev)




* Re: [md PATCH 08/10] md/raid5: make chunk_aligned_read() split bios more cleanly.
  2017-04-05  4:05 ` [md PATCH 08/10] md/raid5: make chunk_aligned_read() split bios more cleanly NeilBrown
@ 2017-04-05 22:15   ` kbuild test robot
  2017-04-06  0:13   ` NeilBrown
  1 sibling, 0 replies; 17+ messages in thread
From: kbuild test robot @ 2017-04-05 22:15 UTC (permalink / raw)
  To: NeilBrown; +Cc: kbuild-all, Shaohua Li, linux-raid


Hi NeilBrown,

[auto build test WARNING on next-20170330]
[cannot apply to md/master md/for-next block/for-next linus/master v4.9-rc8 v4.9-rc7 v4.9-rc6 v4.11-rc5]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/NeilBrown/Simplify-bio-splitting-and-related-code/20170406-045741
config: x86_64-randconfig-x007-201714 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

Note: it may well be a FALSE warning. FWIW you are at least aware of it now.
http://gcc.gnu.org/wiki/Better_Uninitialized_Warnings

All warnings (new ones prefixed by >>):

   drivers/md/raid5.c: In function 'raid5_make_request':
>> drivers/md/raid5.c:5554:6: warning: 'split' may be used uninitialized in this function [-Wmaybe-uninitialized]
      if (!bi)
         ^

vim +/split +5554 drivers/md/raid5.c

828cbe98 Shaohua Li           2015-09-02  5538  		/* ret == -EAGAIN, fallback */
3bddb7f8 Song Liu             2016-11-18  5539  		/*
3bddb7f8 Song Liu             2016-11-18  5540  		 * if r5l_handle_flush_request() didn't clear REQ_PREFLUSH,
3bddb7f8 Song Liu             2016-11-18  5541  		 * we need to flush journal device
3bddb7f8 Song Liu             2016-11-18  5542  		 */
3bddb7f8 Song Liu             2016-11-18  5543  		do_flush = bi->bi_opf & REQ_PREFLUSH;
828cbe98 Shaohua Li           2015-09-02  5544  	}
e5dcdd80 NeilBrown            2005-09-09  5545  
9ffc8f7c Eric Mei             2015-03-18  5546  	/*
9ffc8f7c Eric Mei             2015-03-18  5547  	 * If array is degraded, better not do chunk aligned read because
9ffc8f7c Eric Mei             2015-03-18  5548  	 * later we might have to read it again in order to reconstruct
9ffc8f7c Eric Mei             2015-03-18  5549  	 * data on failed drives.
9ffc8f7c Eric Mei             2015-03-18  5550  	 */
9ffc8f7c Eric Mei             2015-03-18  5551  	if (rw == READ && mddev->degraded == 0 &&
7ef6b12a Ming Lin             2015-05-06  5552  	    mddev->reshape_position == MaxSector) {
7ef6b12a Ming Lin             2015-05-06  5553  		bi = chunk_aligned_read(mddev, bi);
7ef6b12a Ming Lin             2015-05-06 @5554  		if (!bi)
5a7bbad2 Christoph Hellwig    2011-09-12  5555  			return;
7ef6b12a Ming Lin             2015-05-06  5556  	}
52488615 Raz Ben-Jehuda(caro  2006-12-10  5557) 
796a5cf0 Mike Christie        2016-06-05  5558  	if (unlikely(bio_op(bi) == REQ_OP_DISCARD)) {
620125f2 Shaohua Li           2012-10-11  5559  		make_discard_request(mddev, bi);
620125f2 Shaohua Li           2012-10-11  5560  		return;
620125f2 Shaohua Li           2012-10-11  5561  	}
620125f2 Shaohua Li           2012-10-11  5562  

:::::: The code at line 5554 was first introduced by commit
:::::: 7ef6b12a1966f273afb750e19e1e8129bea48fec md/raid5: split bio for chunk_aligned_read

:::::: TO: Ming Lin <ming.l@ssi.samsung.com>
:::::: CC: Jens Axboe <axboe@fb.com>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


* Re: [md PATCH 08/10] md/raid5: make chunk_aligned_read() split bios more cleanly.
  2017-04-05  4:05 ` [md PATCH 08/10] md/raid5: make chunk_aligned_read() split bios more cleanly NeilBrown
  2017-04-05 22:15   ` kbuild test robot
@ 2017-04-06  0:13   ` NeilBrown
  1 sibling, 0 replies; 17+ messages in thread
From: NeilBrown @ 2017-04-06  0:13 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid



Missed a split->raid_bio conversion here

Signed-off-by: NeilBrown <neilb@suse.com>

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index c523fd69a7bc..683fbf7f22a8 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -5247,7 +5247,7 @@ static struct bio *chunk_aligned_read(struct mddev *mddev, struct bio *raid_bio)
 	}
 
 	if (!raid5_read_one_chunk(mddev, raid_bio))
-		return split;
+		return raid_bio;
 
 	return NULL;
 }


* Re: [md PATCH 00/10] Simplify bio splitting and related code.
  2017-04-05  4:05 [md PATCH 00/10] Simplify bio splitting and related code NeilBrown
                   ` (9 preceding siblings ...)
  2017-04-05  4:05 ` [md PATCH 06/10] md/raid10: simplify the splitting of requests NeilBrown
@ 2017-04-11 17:01 ` Shaohua Li
  2017-04-11 23:27   ` NeilBrown
  10 siblings, 1 reply; 17+ messages in thread
From: Shaohua Li @ 2017-04-11 17:01 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On Wed, Apr 05, 2017 at 02:05:50PM +1000, Neil Brown wrote:
> This is part of my little project to make bio splitting
> in Linux uniform and dead-lock free, in a way that will mean
> that we can get rid of all the bioset threads.
> 
> The basic approach is that when a bio needs to be split, we call
> bio_split(), bio_chain() and then generic_make_request().
> We then proceed to handle the remainder without further splitting.
> Recent changes to generic_make_request() ensure that this will
> be safe from deadlocks, providing each bioset is used only once
> in the stack.
> 
> This leads to simpler code in various places.  In particular, the
> splitting of bios that is needed to work around known bad blocks
> is now much less complex.  There is only ever one r1bio per bio.
> 
> As you can see from
>  10 files changed, 335 insertions(+), 540 deletions(-)
> there is a net reduction in code.

Looks good and makes code simpler, applied, thanks Neil! Patches 1 and 6 need
comments in the code to explain how deadlock is avoided though. Care to send a
new patch?

Thanks,
Shaohua

> 
> Thanks,
> NeilBrown
> 
> ---
> 
> NeilBrown (10):
>       md/raid1: simplify the splitting of requests.
>       md/raid1: simplify alloc_behind_master_bio()
>       Revert "block: introduce bio_copy_data_partial"
>       md/raid1: simplify handle_read_error().
>       md/raid1: factor out flush_bio_list()
>       md/raid10: simplify the splitting of requests.
>       md/raid10: simplify handle_read_error()
>       md/raid5: make chunk_aligned_read() split bios more cleanly.
>       md/linear: improve bio splitting.
>       md/raid0: fix up bio splitting.
> 
> 
>  block/bio.c         |   60 ++-------
>  drivers/md/linear.c |   75 +++++------
>  drivers/md/raid0.c  |   73 +++++------
>  drivers/md/raid1.c  |  346 ++++++++++++++++++++-------------------------------
>  drivers/md/raid1.h  |    2 
>  drivers/md/raid10.c |  282 ++++++++++++++----------------------------
>  drivers/md/raid10.h |    1 
>  drivers/md/raid5.c  |   33 +++--
>  drivers/md/raid5.h  |    1 
>  include/linux/bio.h |    2 
>  10 files changed, 335 insertions(+), 540 deletions(-)
> 
> --
> Signature
> 


* Re: [md PATCH 00/10] Simplify bio splitting and related code.
  2017-04-11 17:01 ` [md PATCH 00/10] Simplify bio splitting and related code Shaohua Li
@ 2017-04-11 23:27   ` NeilBrown
  2017-04-12  2:51     ` Shaohua Li
  0 siblings, 1 reply; 17+ messages in thread
From: NeilBrown @ 2017-04-11 23:27 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid


On Tue, Apr 11 2017, Shaohua Li wrote:

> On Wed, Apr 05, 2017 at 02:05:50PM +1000, Neil Brown wrote:
>> This is part of my little project to make bio splitting
>> in Linux uniform and dead-lock free, in a way that will mean
>> that we can get rid of all the bioset threads.
>> 
>> The basic approach is that when a bio needs to be split, we call
>> bio_split(), bio_chain() and then generic_make_request().
>> We then proceed to handle the remainder without further splitting.
>> Recent changes to generic_make_request() ensure that this will
>> be safe from deadlocks, providing each bioset is used only once
>> in the stack.
>> 
>> This leads to simpler code in various places.  In particular, the
>> splitting of bios that is needed to work around known bad blocks
>> is now much less complex.  There is only ever one r1bio per bio.
>> 
>> As you can see from
>>  10 files changed, 335 insertions(+), 540 deletions(-)
>> there is a net reduction in code.
>
> Looks good and makes code simpler, applied, thanks Neil! Patches 1 and 6 need
> comments in the code to explain how deadlock is avoided though. Care to send a
> new patch?

It isn't clear to me what sort of comment you want, or where it should
go.
It might make sense to have a comment near bio_split() explaining how to
use it (i.e. explaining the pattern used in various patches here), but
I don't see what sort of comments would help in raid1.c or raid10.c
??

Thanks,
NeilBrown


* Re: [md PATCH 00/10] Simplify bio splitting and related code.
  2017-04-11 23:27   ` NeilBrown
@ 2017-04-12  2:51     ` Shaohua Li
  2017-04-20  1:37       ` NeilBrown
  0 siblings, 1 reply; 17+ messages in thread
From: Shaohua Li @ 2017-04-12  2:51 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On Wed, Apr 12, 2017 at 09:27:07AM +1000, Neil Brown wrote:
> On Tue, Apr 11 2017, Shaohua Li wrote:
> 
> > On Wed, Apr 05, 2017 at 02:05:50PM +1000, Neil Brown wrote:
> >> This is part of my little project to make bio splitting
> >> in Linux uniform and dead-lock free, in a way that will mean
> >> that we can get rid of all the bioset threads.
> >> 
> >> The basic approach is that when a bio needs to be split, we call
> >> bio_split(), bio_chain() and then generic_make_request().
> >> We then proceed to handle the remainder without further splitting.
> >> Recent changes to generic_make_request() ensure that this will
> >> be safe from deadlocks, providing each bioset is used only once
> >> in the stack.
> >> 
> >> This leads to simpler code in various places.  In particular, the
> >> splitting of bios that is needed to work around known bad blocks
> >> is now much less complex.  There is only ever one r1bio per bio.
> >> 
> >> As you can see from
> >>  10 files changed, 335 insertions(+), 540 deletions(-)
> >> there is a net reduction in code.
> >
> > Looks good and makes code simpler, applied, thanks Neil! Patches 1 and 6 need
> > comments in the code to explain how deadlock is avoided though. Care to send a
> > new patch?
> 
> It isn't clear to me what sort of comment you want, or where it should
> go.
> It might make sense to have a comment near bio_split() explaining how to
> use it (i.e. explaining the pattern used in various patches here), but
> I don't see what sort of comments would help in raid1.c or raid10.c
> ??

Both raid1.c and raid10.c used to have comments explaining why we need to
offload the bio to raid1d/raid10d to avoid deadlock, and we also had comments
explaining why we do bio_split() and then generic_make_request(). Now that
information is lost, so I hope we can add back an explanation of why the new
way (bio_split() followed by generic_make_request() on the remainder) avoids
deadlock. That would be very helpful for others.

Thanks,
Shaohua


* Re: [md PATCH 00/10] Simplify bio splitting and related code.
  2017-04-12  2:51     ` Shaohua Li
@ 2017-04-20  1:37       ` NeilBrown
  0 siblings, 0 replies; 17+ messages in thread
From: NeilBrown @ 2017-04-20  1:37 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid


On Tue, Apr 11 2017, Shaohua Li wrote:

> On Wed, Apr 12, 2017 at 09:27:07AM +1000, Neil Brown wrote:
>> On Tue, Apr 11 2017, Shaohua Li wrote:
>> 
>> > On Wed, Apr 05, 2017 at 02:05:50PM +1000, Neil Brown wrote:
>> >> This is part of my little project to make bio splitting
>> >> in Linux uniform and dead-lock free, in a way that will mean
>> >> that we can get rid of all the bioset threads.
>> >> 
>> >> The basic approach is that when a bio needs to be split, we call
>> >> bio_split(), bio_chain() and then generic_make_request().
>> >> We then proceed to handle the remainder without further splitting.
>> >> Recent changes to generic_make_request() ensure that this will
>> >> be safe from deadlocks, providing each bioset is used only once
>> >> in the stack.
>> >> 
>> >> This leads to simpler code in various places.  In particular, the
>> >> splitting of bios that is needed to work around known bad blocks
>> >> is now much less complex.  There is only ever one r1bio per bio.
>> >> 
>> >> As you can see from
>> >>  10 files changed, 335 insertions(+), 540 deletions(-)
>> >> there is a net reduction in code.
>> >
>> > Looks good and makes code simpler, applied, thanks Neil! Patches 1 and 6 need
>> > comments in the code to explain how deadlock is avoided though. Care to send a
>> > new patch?
>> 
>> It isn't clear to me what sort of comment you want, or where it should
>> go.
>> It might make sense to have a comment near bio_split() explaining how to
>> use it (i.e. explaining the pattern used in various patches here), but
>> I don't see what sort of comments would help in raid1.c or raid10.c
>> ??
>
> Both raid1.c and raid10.c used to have comments explaining why we need to
> offload the bio to raid1d/raid10d to avoid deadlock, and we also had comments
> explaining why we do bio_split() and then generic_make_request(). Now that
> information is lost, so I hope we can add back an explanation of why the new
> way (bio_split() followed by generic_make_request() on the remainder) avoids
> deadlock. That would be very helpful for others.

Those comments were needed before because of design flaws in
generic_make_request().  It made it too easy to trigger deadlocks.
With the recent changes to generic_make_request(), deadlocks are much
harder to trigger so there is less need for documentation on how to be
careful.
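
For reference, the new ordering can be sketched like this (schematic
only, not the real block-layer code):

	/* generic_make_request(), very roughly: */
	struct bio_list pending;

	bio_list_init(&pending);
	current->bio_list = &pending;
	do {
		struct request_queue *q = bdev_get_queue(bio->bi_bdev);

		q->make_request_fn(q, bio);	/* may queue further bios */
		/*
		 * Bios submitted to lower-level devices are taken before
		 * the remainder of a split at this level, so a chained
		 * split always makes progress before the remainder is
		 * looked at, and each bioset is waited on at most once.
		 */
		bio = bio_list_pop(&pending);
	} while (bio);
	current->bio_list = NULL;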

It would be good to have some clear documentation for bio_split()
describing how it should be used.  Once I have tidied up all the users
of bio_split() and removed the bioset work queues, I plan to add that
documentation.

Thanks,
NeilBrown


