linux-kernel.vger.kernel.org archive mirror
* Make generic_make_request() handle arbitrary size bios
@ 2014-02-26 23:39 Kent Overstreet
  2014-02-26 23:39 ` [PATCH 1/9] block: Make generic_make_request handle arbitrary sized bios Kent Overstreet
                   ` (8 more replies)
  0 siblings, 9 replies; 16+ messages in thread
From: Kent Overstreet @ 2014-02-26 23:39 UTC (permalink / raw)
  To: axboe; +Cc: linux-kernel, linux-fsdevel

Jens, the following patches are for 3.15. This builds on immutable biovecs; in
turn, the series enables the DIO rewrite I've been working on off and on for
ages, and it will also let us delete merge_bvec_fn piecemeal (altogether,
merge_bvec_fn is about 1500 lines of code we'll be able to delete).

This will also get us performance gains (that I've measured myself, benchmarking
on a p320h) once we push the work of splitting too-big bios all the way down to
the drivers, and I expect bigger gains for stacking block drivers due to just
processing larger bios.

I've got a patch I'll send separately for the mtip32xx driver to make it take
arbitrary size bios; it's pretty small.

Also a few related small refactoring patches.

This is on top of your for-3.15/core branch.  If you want to pull these patches
from my tree, here it is:

The following changes since commit c46fff2a3b29794b35d717b5680a27f31a6a6bc0:

  smp: Rename __smp_call_function_single() to smp_call_function_single_async() (2014-02-24 14:47:15 -0800)

are available in the git repository at:

  git://evilpiepirate.org/~kent/linux-bcache.git for-jens

for you to fetch changes up to 089f8de5c42a121f351ef9d240d66e1128fa0ea2:

  iov_iter: Kill written arg to iov_iter_init() (2014-02-26 15:17:36 -0800)

----------------------------------------------------------------
Kent Overstreet (9):
      block: Make generic_make_request handle arbitrary sized bios
      block: Gut bio_add_page()
      blk-lib.c: generic_make_request() handles large bios now
      bcache: generic_make_request() handles large bios now
      btrfs: generic_make_request() handles arbitrary size bios now
      btrfs: Convert to bio_for_each_segment()
      iov_iter: Move iov_iter to uio.h
      iov_iter: Kill iov_iter_single_seg_count()
      iov_iter: Kill written arg to iov_iter_init()

 block/blk-core.c                            |  19 ++-
 block/blk-lib.c                             | 175 +++++-----------------------
 block/blk-merge.c                           | 150 ++++++++++++++++++++++--
 block/blk-mq.c                              |   2 +
 drivers/block/drbd/drbd_req.c               |   2 +
 drivers/block/mtip32xx/mtip32xx.c           |   6 +-
 drivers/block/nvme-core.c                   |   2 +
 drivers/block/pktcdvd.c                     |   6 +-
 drivers/block/ps3vram.c                     |   2 +
 drivers/block/rsxx/dev.c                    |   2 +
 drivers/block/umem.c                        |   2 +
 drivers/block/zram/zram_drv.c               |   2 +
 drivers/md/bcache/bcache.h                  |  18 ---
 drivers/md/bcache/io.c                      | 100 +---------------
 drivers/md/bcache/journal.c                 |   4 +-
 drivers/md/bcache/request.c                 |  16 +--
 drivers/md/bcache/super.c                   |  32 +----
 drivers/md/bcache/util.h                    |   5 +-
 drivers/md/bcache/writeback.c               |   4 +-
 drivers/md/dm.c                             |   2 +
 drivers/md/md.c                             |   2 +
 drivers/s390/block/dcssblk.c                |   2 +
 drivers/s390/block/xpram.c                  |   2 +
 drivers/staging/lustre/lustre/llite/lloop.c |   2 +
 fs/bio.c                                    | 137 +++++++++-------------
 fs/btrfs/extent_io.c                        |  12 +-
 fs/btrfs/file-item.c                        |  59 ++++------
 fs/btrfs/file.c                             |   8 +-
 fs/btrfs/inode.c                            |  22 ++--
 fs/btrfs/volumes.c                          |  73 ------------
 fs/ceph/file.c                              |   7 +-
 fs/cifs/file.c                              |   4 +-
 fs/fuse/file.c                              |  17 +--
 include/linux/blkdev.h                      |   3 +
 include/linux/fs.h                          |  32 -----
 include/linux/uio.h                         |  46 ++++++++
 mm/filemap.c                                |  19 +--
 37 files changed, 399 insertions(+), 599 deletions(-)



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/9] block: Make generic_make_request handle arbitrary sized bios
  2014-02-26 23:39 Make generic_make_request() handle arbitrary size bios Kent Overstreet
@ 2014-02-26 23:39 ` Kent Overstreet
  2014-02-27 17:22   ` Matthew Wilcox
  2014-03-02 20:31   ` Muthu Kumar
  2014-02-26 23:39 ` [PATCH 2/9] block: Gut bio_add_page() Kent Overstreet
                   ` (7 subsequent siblings)
  8 siblings, 2 replies; 16+ messages in thread
From: Kent Overstreet @ 2014-02-26 23:39 UTC (permalink / raw)
  To: axboe
  Cc: linux-kernel, linux-fsdevel, Kent Overstreet, Neil Brown,
	Alasdair Kergon, dm-devel, Lars Ellenberg, drbd-user,
	Asai Thambi S P, Sam Bradshaw, Matthew Wilcox, linux-nvme,
	Jiri Kosina, Geoff Levand, Jim Paris, Joshua Morris,
	Philip Kelleher, Minchan Kim, Nitin Gupta, Martin Schwidefsky,
	Heiko Carstens, Peng Tao

The way the block layer is currently written, it goes to great lengths
to avoid having to split bios; upper layer code (such as bio_add_page())
checks what the underlying device can handle and tries to always create
bios that don't need to be split.

But this approach becomes unwieldy and eventually breaks down with
stacked devices and devices with dynamic limits, and it adds a lot of
complexity. If the block layer could split bios as needed, we could
eliminate a lot of complexity elsewhere - particularly in stacked
drivers. Code that creates bios can then create whatever size bios are
convenient, and more importantly stacked drivers don't have to deal with
both their own bio size limitations and the limitations of the
(potentially multiple) devices underneath them.  In the future this will
let us delete merge_bvec_fn and a bunch of other code.

We do this by adding calls to blk_queue_split() to the various
make_request functions that need it - a few can already handle arbitrary
size bios. Note that we add the call _after_ any call to blk_queue_bounce();
this means that blk_queue_split() and blk_recalc_rq_segments() don't need to be
concerned with bouncing affecting segment merging.

Some make_request_fns were simple enough to audit and verify they don't
need blk_queue_split() calls. The skipped ones are:

 * nfhd_make_request (arch/m68k/emu/nfblock.c)
 * axon_ram_make_request (arch/powerpc/sysdev/axonram.c)
 * simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c)
 * brd_make_request (ramdisk - drivers/block/brd.c)
 * loop_make_request
 * null_queue_bio
 * bcache's make_request fns

Some others are almost certainly safe to remove now, but will be left for future
patches.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Neil Brown <neilb@suse.de>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: dm-devel@redhat.com
Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
Cc: drbd-user@lists.linbit.com
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: linux-nvme@lists.infradead.org
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Jim Paris <jim@jtan.com>
Cc: Joshua Morris <josh.h.morris@us.ibm.com>
Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Peng Tao <bergwolf@gmail.com>
---
 block/blk-core.c                            |  19 ++--
 block/blk-merge.c                           | 150 ++++++++++++++++++++++++++--
 block/blk-mq.c                              |   2 +
 drivers/block/drbd/drbd_req.c               |   2 +
 drivers/block/mtip32xx/mtip32xx.c           |   6 +-
 drivers/block/nvme-core.c                   |   2 +
 drivers/block/pktcdvd.c                     |   6 +-
 drivers/block/ps3vram.c                     |   2 +
 drivers/block/rsxx/dev.c                    |   2 +
 drivers/block/umem.c                        |   2 +
 drivers/block/zram/zram_drv.c               |   2 +
 drivers/md/dm.c                             |   2 +
 drivers/md/md.c                             |   2 +
 drivers/s390/block/dcssblk.c                |   2 +
 drivers/s390/block/xpram.c                  |   2 +
 drivers/staging/lustre/lustre/llite/lloop.c |   2 +
 include/linux/blkdev.h                      |   3 +
 17 files changed, 185 insertions(+), 23 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 853f927492..d3b0782ec3 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -581,6 +581,10 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 	if (q->id < 0)
 		goto fail_c;
 
+	q->bio_split = bioset_create(4, 0);
+	if (!q->bio_split)
+		goto fail_id;
+
 	q->backing_dev_info.ra_pages =
 			(VM_MAX_READAHEAD * 1024) / PAGE_CACHE_SIZE;
 	q->backing_dev_info.state = 0;
@@ -590,7 +594,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 
 	err = bdi_init(&q->backing_dev_info);
 	if (err)
-		goto fail_id;
+		goto fail_split;
 
 	setup_timer(&q->backing_dev_info.laptop_mode_wb_timer,
 		    laptop_mode_timer_fn, (unsigned long) q);
@@ -635,6 +639,8 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 
 fail_bdi:
 	bdi_destroy(&q->backing_dev_info);
+fail_split:
+	bioset_free(q->bio_split);
 fail_id:
 	ida_simple_remove(&blk_queue_ida, q->id);
 fail_c:
@@ -1501,6 +1507,8 @@ void blk_queue_bio(struct request_queue *q, struct bio *bio)
 	struct request *req;
 	unsigned int request_count = 0;
 
+	blk_queue_split(q, &bio, q->bio_split);
+
 	/*
 	 * low level driver can indicate that it wants pages above a
 	 * certain limit bounced to low memory (ie for highmem, or even
@@ -1723,15 +1731,6 @@ generic_make_request_checks(struct bio *bio)
 		goto end_io;
 	}
 
-	if (likely(bio_is_rw(bio) &&
-		   nr_sectors > queue_max_hw_sectors(q))) {
-		printk(KERN_ERR "bio too big device %s (%u > %u)\n",
-		       bdevname(bio->bi_bdev, b),
-		       bio_sectors(bio),
-		       queue_max_hw_sectors(q));
-		goto end_io;
-	}
-
 	part = bio->bi_bdev->bd_part;
 	if (should_fail_request(part, bio->bi_iter.bi_size) ||
 	    should_fail_request(&part_to_disk(part)->part0,
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 6c583f9c5b..0afbe3f1c2 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -9,11 +9,149 @@
 
 #include "blk.h"
 
+static struct bio *blk_bio_discard_split(struct request_queue *q,
+					 struct bio *bio,
+					 struct bio_set *bs)
+{
+	unsigned int max_discard_sectors, granularity;
+	int alignment;
+	sector_t tmp;
+	unsigned split_sectors;
+
+	/* Zero-sector (unknown) and one-sector granularities are the same.  */
+	granularity = max(q->limits.discard_granularity >> 9, 1U);
+
+	max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9);
+	max_discard_sectors -= max_discard_sectors % granularity;
+
+	if (unlikely(!max_discard_sectors)) {
+		/* XXX: warn */
+		return NULL;
+	}
+
+	if (bio_sectors(bio) <= max_discard_sectors)
+		return NULL;
+
+	split_sectors = max_discard_sectors;
+
+	/*
+	 * If the next starting sector would be misaligned, stop the discard at
+	 * the previous aligned sector.
+	 */
+	alignment = (q->limits.discard_alignment >> 9) % granularity;
+
+	tmp = bio->bi_iter.bi_sector + split_sectors - alignment;
+	tmp = sector_div(tmp, granularity);
+
+	if (split_sectors > tmp)
+		split_sectors -= tmp;
+
+	return bio_split(bio, split_sectors, GFP_NOIO, bs);
+}
+
+static struct bio *blk_bio_write_same_split(struct request_queue *q,
+					    struct bio *bio,
+					    struct bio_set *bs)
+{
+	if (!q->limits.max_write_same_sectors)
+		return NULL;
+
+	if (bio_sectors(bio) <= q->limits.max_write_same_sectors)
+		return NULL;
+
+	return bio_split(bio, q->limits.max_write_same_sectors, GFP_NOIO, bs);
+}
+
+static struct bio *blk_bio_segment_split(struct request_queue *q,
+					 struct bio *bio,
+					 struct bio_set *bs)
+{
+	struct bio *split;
+	struct bio_vec bv, bvprv;
+	struct bvec_iter iter;
+	unsigned seg_size = 0, nsegs = 0;
+	int prev = 0;
+
+	struct bvec_merge_data bvm = {
+		.bi_bdev	= bio->bi_bdev,
+		.bi_sector	= bio->bi_iter.bi_sector,
+		.bi_size	= 0,
+		.bi_rw		= bio->bi_rw,
+	};
+
+	bio_for_each_segment(bv, bio, iter) {
+		if (q->merge_bvec_fn &&
+		    q->merge_bvec_fn(q, &bvm, &bv) < (int) bv.bv_len)
+			goto split;
+
+		bvm.bi_size += bv.bv_len;
+
+		if (bvm.bi_size >> 9 > queue_max_sectors(q))
+			goto split;
+
+		if (prev && blk_queue_cluster(q)) {
+			if (seg_size + bv.bv_len > queue_max_segment_size(q))
+				goto new_segment;
+			if (!BIOVEC_PHYS_MERGEABLE(&bvprv, &bv))
+				goto new_segment;
+			if (!BIOVEC_SEG_BOUNDARY(q, &bvprv, &bv))
+				goto new_segment;
+
+			seg_size += bv.bv_len;
+			bvprv = bv;
+			prev = 1;
+			continue;
+		}
+new_segment:
+		if (nsegs == queue_max_segments(q))
+			goto split;
+
+		nsegs++;
+		bvprv = bv;
+		prev = 1;
+		seg_size = bv.bv_len;
+	}
+
+	return NULL;
+split:
+	split = bio_clone_bioset(bio, GFP_NOIO, bs);
+
+	split->bi_iter.bi_size -= iter.bi_size;
+	bio->bi_iter = iter;
+
+	if (bio_integrity(bio)) {
+		bio_integrity_advance(bio, split->bi_iter.bi_size);
+		bio_integrity_trim(split, 0, bio_sectors(split));
+	}
+
+	return split;
+}
+
+void blk_queue_split(struct request_queue *q, struct bio **bio,
+		     struct bio_set *bs)
+{
+	struct bio *split;
+
+	if ((*bio)->bi_rw & REQ_DISCARD)
+		split = blk_bio_discard_split(q, *bio, bs);
+	else if ((*bio)->bi_rw & REQ_WRITE_SAME)
+		split = blk_bio_write_same_split(q, *bio, bs);
+	else
+		split = blk_bio_segment_split(q, *bio, q->bio_split);
+
+	if (split) {
+		bio_chain(split, *bio);
+		generic_make_request(*bio);
+		*bio = split;
+	}
+}
+EXPORT_SYMBOL(blk_queue_split);
+
 static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
 					     struct bio *bio)
 {
 	struct bio_vec bv, bvprv = { NULL };
-	int cluster, high, highprv = 1;
+	int cluster, prev = 0;
 	unsigned int seg_size, nr_phys_segs;
 	struct bio *fbio, *bbio;
 	struct bvec_iter iter;
@@ -37,13 +175,7 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
 	nr_phys_segs = 0;
 	for_each_bio(bio) {
 		bio_for_each_segment(bv, bio, iter) {
-			/*
-			 * the trick here is making sure that a high page is
-			 * never considered part of another segment, since that
-			 * might change with the bounce page.
-			 */
-			high = page_to_pfn(bv.bv_page) > queue_bounce_pfn(q);
-			if (!high && !highprv && cluster) {
+			if (prev && cluster) {
 				if (seg_size + bv.bv_len
 				    > queue_max_segment_size(q))
 					goto new_segment;
@@ -63,8 +195,8 @@ new_segment:
 
 			nr_phys_segs++;
 			bvprv = bv;
+			prev = 1;
 			seg_size = bv.bv_len;
-			highprv = high;
 		}
 		bbio = bio;
 	}
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 6468a715a0..7893e254d8 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -915,6 +915,8 @@ static void blk_mq_make_request(struct request_queue *q, struct bio *bio)
 		return;
 	}
 
+	blk_queue_split(q, &bio, q->bio_split);
+
 	if (use_plug && blk_attempt_plug_merge(q, bio, &request_count))
 		return;
 
diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
index 104a040f24..941a69c50c 100644
--- a/drivers/block/drbd/drbd_req.c
+++ b/drivers/block/drbd/drbd_req.c
@@ -1275,6 +1275,8 @@ void drbd_make_request(struct request_queue *q, struct bio *bio)
 	struct drbd_conf *mdev = (struct drbd_conf *) q->queuedata;
 	unsigned long start_time;
 
+	blk_queue_split(q, &bio, q->bio_split);
+
 	start_time = jiffies;
 
 	/*
diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c
index 516026954b..df733ca685 100644
--- a/drivers/block/mtip32xx/mtip32xx.c
+++ b/drivers/block/mtip32xx/mtip32xx.c
@@ -4033,6 +4033,10 @@ static void mtip_make_request(struct request_queue *queue, struct bio *bio)
 	int nents = 0;
 	int tag = 0, unaligned = 0;
 
+	blk_queue_bounce(queue, &bio);
+
+	blk_queue_split(queue, &bio, queue->bio_split);
+
 	if (unlikely(dd->dd_flag & MTIP_DDF_STOP_IO)) {
 		if (unlikely(test_bit(MTIP_DDF_REMOVE_PENDING_BIT,
 							&dd->dd_flag))) {
@@ -4082,8 +4086,6 @@ static void mtip_make_request(struct request_queue *queue, struct bio *bio)
 
 	sg = mtip_hw_get_scatterlist(dd, &tag, unaligned);
 	if (likely(sg != NULL)) {
-		blk_queue_bounce(queue, &bio);
-
 		if (unlikely((bio)->bi_vcnt > MTIP_MAX_SG)) {
 			dev_warn(&dd->pdev->dev,
 				"Maximum number of SGL entries exceeded\n");
diff --git a/drivers/block/nvme-core.c b/drivers/block/nvme-core.c
index 51824d1f23..e4376b9613 100644
--- a/drivers/block/nvme-core.c
+++ b/drivers/block/nvme-core.c
@@ -737,6 +737,8 @@ static void nvme_make_request(struct request_queue *q, struct bio *bio)
 	struct nvme_queue *nvmeq = get_nvmeq(ns->dev);
 	int result = -EBUSY;
 
+	blk_queue_split(q, &bio, q->bio_split);
+
 	if (!nvmeq) {
 		put_nvmeq(NULL);
 		bio_endio(bio, -EIO);
diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index a2af73db18..a37acf722b 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -2444,6 +2444,10 @@ static void pkt_make_request(struct request_queue *q, struct bio *bio)
 	char b[BDEVNAME_SIZE];
 	struct bio *split;
 
+	blk_queue_bounce(q, &bio);
+
+	blk_queue_split(q, &bio, q->bio_split);
+
 	pd = q->queuedata;
 	if (!pd) {
 		pr_err("%s incorrect request queue\n",
@@ -2474,8 +2478,6 @@ static void pkt_make_request(struct request_queue *q, struct bio *bio)
 		goto end_io;
 	}
 
-	blk_queue_bounce(q, &bio);
-
 	do {
 		sector_t zone = get_zone(bio->bi_iter.bi_sector, pd);
 		sector_t last_zone = get_zone(bio_end_sector(bio) - 1, pd);
diff --git a/drivers/block/ps3vram.c b/drivers/block/ps3vram.c
index ef45cfb98f..a995972961 100644
--- a/drivers/block/ps3vram.c
+++ b/drivers/block/ps3vram.c
@@ -603,6 +603,8 @@ static void ps3vram_make_request(struct request_queue *q, struct bio *bio)
 	struct ps3vram_priv *priv = ps3_system_bus_get_drvdata(dev);
 	int busy;
 
+	blk_queue_split(q, &bio, q->bio_split);
+
 	dev_dbg(&dev->core, "%s\n", __func__);
 
 	spin_lock_irq(&priv->lock);
diff --git a/drivers/block/rsxx/dev.c b/drivers/block/rsxx/dev.c
index 2839d37e5a..ff074a3cd4 100644
--- a/drivers/block/rsxx/dev.c
+++ b/drivers/block/rsxx/dev.c
@@ -169,6 +169,8 @@ static void rsxx_make_request(struct request_queue *q, struct bio *bio)
 	struct rsxx_bio_meta *bio_meta;
 	int st = -EINVAL;
 
+	blk_queue_split(q, &bio, q->bio_split);
+
 	might_sleep();
 
 	if (!card)
diff --git a/drivers/block/umem.c b/drivers/block/umem.c
index 4cf81b5bf0..13d577cfbc 100644
--- a/drivers/block/umem.c
+++ b/drivers/block/umem.c
@@ -531,6 +531,8 @@ static void mm_make_request(struct request_queue *q, struct bio *bio)
 		 (unsigned long long)bio->bi_iter.bi_sector,
 		 bio->bi_iter.bi_size);
 
+	blk_queue_split(q, &bio, q->bio_split);
+
 	spin_lock_irq(&card->lock);
 	*card->biotail = bio;
 	bio->bi_next = NULL;
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 011e55d820..ecf9daa01c 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -733,6 +733,8 @@ static void zram_make_request(struct request_queue *queue, struct bio *bio)
 {
 	struct zram *zram = queue->queuedata;
 
+	blk_queue_split(queue, &bio, queue->bio_split);
+
 	down_read(&zram->init_lock);
 	if (unlikely(!zram->init_done))
 		goto error;
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 8c53b09b9a..97f70420f2 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1500,6 +1500,8 @@ static void dm_request(struct request_queue *q, struct bio *bio)
 {
 	struct mapped_device *md = q->queuedata;
 
+	blk_queue_split(q, &bio, q->bio_split);
+
 	if (dm_request_based(md))
 		blk_queue_bio(q, bio);
 	else
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 4ad5cc4e63..1421bc3f7b 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -256,6 +256,8 @@ static void md_make_request(struct request_queue *q, struct bio *bio)
 	int cpu;
 	unsigned int sectors;
 
+	blk_queue_split(q, &bio, q->bio_split);
+
 	if (mddev == NULL || mddev->pers == NULL
 	    || !mddev->ready) {
 		bio_io_error(bio);
diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
index ebf41e228e..db33cd3e4c 100644
--- a/drivers/s390/block/dcssblk.c
+++ b/drivers/s390/block/dcssblk.c
@@ -815,6 +815,8 @@ dcssblk_make_request(struct request_queue *q, struct bio *bio)
 	unsigned long source_addr;
 	unsigned long bytes_done;
 
+	blk_queue_split(q, &bio, q->bio_split);
+
 	bytes_done = 0;
 	dev_info = bio->bi_bdev->bd_disk->private_data;
 	if (dev_info == NULL)
diff --git a/drivers/s390/block/xpram.c b/drivers/s390/block/xpram.c
index 6969d39f1e..f03c103f13 100644
--- a/drivers/s390/block/xpram.c
+++ b/drivers/s390/block/xpram.c
@@ -190,6 +190,8 @@ static void xpram_make_request(struct request_queue *q, struct bio *bio)
 	unsigned long page_addr;
 	unsigned long bytes;
 
+	blk_queue_split(q, &bio, q->bio_split);
+
 	if ((bio->bi_iter.bi_sector & 7) != 0 ||
 	    (bio->bi_iter.bi_size & 4095) != 0)
 		/* Request is not page-aligned. */
diff --git a/drivers/staging/lustre/lustre/llite/lloop.c b/drivers/staging/lustre/lustre/llite/lloop.c
index 0718905ade..a3f6dc930b 100644
--- a/drivers/staging/lustre/lustre/llite/lloop.c
+++ b/drivers/staging/lustre/lustre/llite/lloop.c
@@ -344,6 +344,8 @@ static void loop_make_request(struct request_queue *q, struct bio *old_bio)
 	int rw = bio_rw(old_bio);
 	int inactive;
 
+	blk_queue_split(q, &old_bio, q->bio_split);
+
 	if (!lo)
 		goto err;
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 1e1fa3f93d..99e9955c4d 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -470,6 +470,7 @@ struct request_queue {
 	wait_queue_head_t	mq_freeze_wq;
 	struct percpu_counter	mq_usage_counter;
 	struct list_head	all_q_node;
+	struct bio_set		*bio_split;
 };
 
 #define QUEUE_FLAG_QUEUED	1	/* uses generic tag queueing */
@@ -781,6 +782,8 @@ extern void blk_rq_unprep_clone(struct request *rq);
 extern int blk_insert_cloned_request(struct request_queue *q,
 				     struct request *rq);
 extern void blk_delay_queue(struct request_queue *, unsigned long);
+extern void blk_queue_split(struct request_queue *, struct bio **,
+			    struct bio_set *);
 extern void blk_recount_segments(struct request_queue *, struct bio *);
 extern int scsi_verify_blk_ioctl(struct block_device *, unsigned int);
 extern int scsi_cmd_blk_ioctl(struct block_device *, fmode_t,
-- 
1.9.0



* [PATCH 2/9] block: Gut bio_add_page()
  2014-02-26 23:39 Make generic_make_request() handle arbitrary size bios Kent Overstreet
  2014-02-26 23:39 ` [PATCH 1/9] block: Make generic_make_request handle arbitrary sized bios Kent Overstreet
@ 2014-02-26 23:39 ` Kent Overstreet
  2014-02-26 23:39 ` [PATCH 3/9] blk-lib.c: generic_make_request() handles large bios now Kent Overstreet
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Kent Overstreet @ 2014-02-26 23:39 UTC (permalink / raw)
  To: axboe; +Cc: linux-kernel, linux-fsdevel, Kent Overstreet

Since generic_make_request() can now handle arbitrary size bios, all we
have to do is make sure the bvec array doesn't overflow.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
---
 fs/bio.c | 137 ++++++++++++++++++++++++++-------------------------------------
 1 file changed, 57 insertions(+), 80 deletions(-)

diff --git a/fs/bio.c b/fs/bio.c
index b2dd42ed9e..8985cc784d 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -694,9 +694,23 @@ int bio_get_nr_vecs(struct block_device *bdev)
 }
 EXPORT_SYMBOL(bio_get_nr_vecs);
 
-static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
-			  *page, unsigned int len, unsigned int offset,
-			  unsigned int max_sectors)
+/**
+ *	bio_add_pc_page	-	attempt to add page to bio
+ *	@q: the target queue
+ *	@bio: destination bio
+ *	@page: page to add
+ *	@len: vec entry length
+ *	@offset: vec entry offset
+ *
+ *	Attempt to add a page to the bio_vec maplist. This can fail for a
+ *	number of reasons, such as the bio being full or target block device
+ *	limitations. The target block device must allow bio's up to PAGE_SIZE,
+ *	so it is always possible to add a single page to an empty bio.
+ *
+ *	This should only be used by REQ_PC bios.
+ */
+int bio_add_pc_page(struct request_queue *q, struct bio *bio, struct page
+		    *page, unsigned int len, unsigned int offset)
 {
 	int retried_segments = 0;
 	struct bio_vec *bvec;
@@ -707,7 +721,7 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
 	if (unlikely(bio_flagged(bio, BIO_CLONED)))
 		return 0;
 
-	if (((bio->bi_iter.bi_size + len) >> 9) > max_sectors)
+	if (((bio->bi_iter.bi_size + len) >> 9) > queue_max_hw_sectors(q))
 		return 0;
 
 	/*
@@ -720,28 +734,7 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
 
 		if (page == prev->bv_page &&
 		    offset == prev->bv_offset + prev->bv_len) {
-			unsigned int prev_bv_len = prev->bv_len;
 			prev->bv_len += len;
-
-			if (q->merge_bvec_fn) {
-				struct bvec_merge_data bvm = {
-					/* prev_bvec is already charged in
-					   bi_size, discharge it in order to
-					   simulate merging updated prev_bvec
-					   as new bvec. */
-					.bi_bdev = bio->bi_bdev,
-					.bi_sector = bio->bi_iter.bi_sector,
-					.bi_size = bio->bi_iter.bi_size -
-						prev_bv_len,
-					.bi_rw = bio->bi_rw,
-				};
-
-				if (q->merge_bvec_fn(q, &bvm, prev) < prev->bv_len) {
-					prev->bv_len -= len;
-					return 0;
-				}
-			}
-
 			goto done;
 		}
 	}
@@ -772,31 +765,6 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
 	bvec->bv_len = len;
 	bvec->bv_offset = offset;
 
-	/*
-	 * if queue has other restrictions (eg varying max sector size
-	 * depending on offset), it can specify a merge_bvec_fn in the
-	 * queue to get further control
-	 */
-	if (q->merge_bvec_fn) {
-		struct bvec_merge_data bvm = {
-			.bi_bdev = bio->bi_bdev,
-			.bi_sector = bio->bi_iter.bi_sector,
-			.bi_size = bio->bi_iter.bi_size,
-			.bi_rw = bio->bi_rw,
-		};
-
-		/*
-		 * merge_bvec_fn() returns number of bytes it can accept
-		 * at this offset
-		 */
-		if (q->merge_bvec_fn(q, &bvm, bvec) < bvec->bv_len) {
-			bvec->bv_page = NULL;
-			bvec->bv_len = 0;
-			bvec->bv_offset = 0;
-			return 0;
-		}
-	}
-
 	/* If we may be able to merge these biovecs, force a recount */
 	if (bio->bi_vcnt && (BIOVEC_PHYS_MERGEABLE(bvec-1, bvec)))
 		bio->bi_flags &= ~(1 << BIO_SEG_VALID);
@@ -807,28 +775,6 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
 	bio->bi_iter.bi_size += len;
 	return len;
 }
-
-/**
- *	bio_add_pc_page	-	attempt to add page to bio
- *	@q: the target queue
- *	@bio: destination bio
- *	@page: page to add
- *	@len: vec entry length
- *	@offset: vec entry offset
- *
- *	Attempt to add a page to the bio_vec maplist. This can fail for a
- *	number of reasons, such as the bio being full or target block device
- *	limitations. The target block device must allow bio's up to PAGE_SIZE,
- *	so it is always possible to add a single page to an empty bio.
- *
- *	This should only be used by REQ_PC bios.
- */
-int bio_add_pc_page(struct request_queue *q, struct bio *bio, struct page *page,
-		    unsigned int len, unsigned int offset)
-{
-	return __bio_add_page(q, bio, page, len, offset,
-			      queue_max_hw_sectors(q));
-}
 EXPORT_SYMBOL(bio_add_pc_page);
 
 /**
@@ -838,16 +784,47 @@ EXPORT_SYMBOL(bio_add_pc_page);
  *	@len: vec entry length
  *	@offset: vec entry offset
  *
- *	Attempt to add a page to the bio_vec maplist. This can fail for a
- *	number of reasons, such as the bio being full or target block device
- *	limitations. The target block device must allow bio's up to PAGE_SIZE,
- *	so it is always possible to add a single page to an empty bio.
+ *	Attempt to add a page to the bio_vec maplist. This will only fail if
+ *	bio->bi_vcnt == bio->bi_max_vecs.
  */
-int bio_add_page(struct bio *bio, struct page *page, unsigned int len,
-		 unsigned int offset)
+int bio_add_page(struct bio *bio, struct page *page,
+		 unsigned int len, unsigned int offset)
 {
-	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
-	return __bio_add_page(q, bio, page, len, offset, queue_max_sectors(q));
+	struct bio_vec *bv;
+
+	/*
+	 * cloned bio must not modify vec list
+	 */
+	if (unlikely(bio_flagged(bio, BIO_CLONED)))
+		return 0;
+
+	/*
+	 * For filesystems with a blocksize smaller than the pagesize
+	 * we will often be called with the same page as last time and
+	 * a consecutive offset.  Optimize this special case.
+	 */
+	if (bio->bi_vcnt > 0) {
+		bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
+
+		if (page == bv->bv_page &&
+		    offset == bv->bv_offset + bv->bv_len) {
+			bv->bv_len += len;
+			goto done;
+		}
+	}
+
+	if (bio->bi_vcnt >= bio->bi_max_vecs)
+		return 0;
+
+	bv		= &bio->bi_io_vec[bio->bi_vcnt];
+	bv->bv_page	= page;
+	bv->bv_len	= len;
+	bv->bv_offset	= offset;
+
+	bio->bi_vcnt++;
+done:
+	bio->bi_iter.bi_size += len;
+	return len;
 }
 EXPORT_SYMBOL(bio_add_page);
 
-- 
1.9.0



* [PATCH 3/9] blk-lib.c: generic_make_request() handles large bios now
  2014-02-26 23:39 Make generic_make_request() handle arbitrary size bios Kent Overstreet
  2014-02-26 23:39 ` [PATCH 1/9] block: Make generic_make_request handle arbitrary sized bios Kent Overstreet
  2014-02-26 23:39 ` [PATCH 2/9] block: Gut bio_add_page() Kent Overstreet
@ 2014-02-26 23:39 ` Kent Overstreet
  2014-02-26 23:39 ` [PATCH 4/9] bcache: " Kent Overstreet
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Kent Overstreet @ 2014-02-26 23:39 UTC (permalink / raw)
  To: axboe; +Cc: linux-kernel, linux-fsdevel, Kent Overstreet

generic_make_request() will now do for us what the code in blk-lib.c was
doing manually with the bio_batch machinery. We still need some looping in
case we're trying to discard/zeroout more than around a gigabyte at once,
but when we can submit that much per bio, doing the submissions in parallel
really shouldn't matter.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
---
 block/blk-lib.c | 175 ++++++++++----------------------------------------------
 1 file changed, 30 insertions(+), 145 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 97a733cf3d..117467c2ff 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -9,23 +9,6 @@
 
 #include "blk.h"
 
-struct bio_batch {
-	atomic_t		done;
-	unsigned long		flags;
-	struct completion	*wait;
-};
-
-static void bio_batch_end_io(struct bio *bio, int err)
-{
-	struct bio_batch *bb = bio->bi_private;
-
-	if (err && (err != -EOPNOTSUPP))
-		clear_bit(BIO_UPTODATE, &bb->flags);
-	if (atomic_dec_and_test(&bb->done))
-		complete(bb->wait);
-	bio_put(bio);
-}
-
 /**
  * blkdev_issue_discard - queue a discard
  * @bdev:	blockdev to issue discard for
@@ -40,15 +23,10 @@ static void bio_batch_end_io(struct bio *bio, int err)
 int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 		sector_t nr_sects, gfp_t gfp_mask, unsigned long flags)
 {
-	DECLARE_COMPLETION_ONSTACK(wait);
 	struct request_queue *q = bdev_get_queue(bdev);
 	int type = REQ_WRITE | REQ_DISCARD;
-	unsigned int max_discard_sectors, granularity;
-	int alignment;
-	struct bio_batch bb;
 	struct bio *bio;
 	int ret = 0;
-	struct blk_plug plug;
 
 	if (!q)
 		return -ENXIO;
@@ -56,69 +34,27 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 	if (!blk_queue_discard(q))
 		return -EOPNOTSUPP;
 
-	/* Zero-sector (unknown) and one-sector granularities are the same.  */
-	granularity = max(q->limits.discard_granularity >> 9, 1U);
-	alignment = (bdev_discard_alignment(bdev) >> 9) % granularity;
-
-	/*
-	 * Ensure that max_discard_sectors is of the proper
-	 * granularity, so that requests stay aligned after a split.
-	 */
-	max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9);
-	max_discard_sectors -= max_discard_sectors % granularity;
-	if (unlikely(!max_discard_sectors)) {
-		/* Avoid infinite loop below. Being cautious never hurts. */
-		return -EOPNOTSUPP;
-	}
-
 	if (flags & BLKDEV_DISCARD_SECURE) {
 		if (!blk_queue_secdiscard(q))
 			return -EOPNOTSUPP;
 		type |= REQ_SECURE;
 	}
 
-	atomic_set(&bb.done, 1);
-	bb.flags = 1 << BIO_UPTODATE;
-	bb.wait = &wait;
-
-	blk_start_plug(&plug);
 	while (nr_sects) {
-		unsigned int req_sects;
-		sector_t end_sect, tmp;
-
 		bio = bio_alloc(gfp_mask, 1);
-		if (!bio) {
-			ret = -ENOMEM;
-			break;
-		}
-
-		req_sects = min_t(sector_t, nr_sects, max_discard_sectors);
+		if (!bio)
+			return -ENOMEM;
 
-		/*
-		 * If splitting a request, and the next starting sector would be
-		 * misaligned, stop the discard at the previous aligned sector.
-		 */
-		end_sect = sector + req_sects;
-		tmp = end_sect;
-		if (req_sects < nr_sects &&
-		    sector_div(tmp, granularity) != alignment) {
-			end_sect = end_sect - alignment;
-			sector_div(end_sect, granularity);
-			end_sect = end_sect * granularity + alignment;
-			req_sects = end_sect - sector;
-		}
-
-		bio->bi_iter.bi_sector = sector;
-		bio->bi_end_io = bio_batch_end_io;
 		bio->bi_bdev = bdev;
-		bio->bi_private = &bb;
+		bio->bi_iter.bi_sector = sector;
+		bio->bi_iter.bi_size = min_t(sector_t, nr_sects, 1 << 20) << 9;
 
-		bio->bi_iter.bi_size = req_sects << 9;
-		nr_sects -= req_sects;
-		sector = end_sect;
+		sector += bio_sectors(bio);
+		nr_sects -= bio_sectors(bio);
 
-		atomic_inc(&bb.done);
-		submit_bio(type, bio);
+		ret = submit_bio_wait(type, bio);
+		if (ret)
+			break;
 
 		/*
 		 * We can loop for a long time in here, if someone does
@@ -128,14 +64,6 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 		 */
 		cond_resched();
 	}
-	blk_finish_plug(&plug);
-
-	/* Wait for bios in-flight */
-	if (!atomic_dec_and_test(&bb.done))
-		wait_for_completion_io(&wait);
-
-	if (!test_bit(BIO_UPTODATE, &bb.flags))
-		ret = -EIO;
 
 	return ret;
 }
@@ -156,61 +84,37 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
 			    sector_t nr_sects, gfp_t gfp_mask,
 			    struct page *page)
 {
-	DECLARE_COMPLETION_ONSTACK(wait);
 	struct request_queue *q = bdev_get_queue(bdev);
-	unsigned int max_write_same_sectors;
-	struct bio_batch bb;
 	struct bio *bio;
 	int ret = 0;
 
 	if (!q)
 		return -ENXIO;
 
-	max_write_same_sectors = q->limits.max_write_same_sectors;
-
-	if (max_write_same_sectors == 0)
+	if (!q->limits.max_write_same_sectors)
 		return -EOPNOTSUPP;
 
-	atomic_set(&bb.done, 1);
-	bb.flags = 1 << BIO_UPTODATE;
-	bb.wait = &wait;
-
 	while (nr_sects) {
 		bio = bio_alloc(gfp_mask, 1);
-		if (!bio) {
-			ret = -ENOMEM;
-			break;
-		}
+		if (!bio)
+			return -ENOMEM;
 
-		bio->bi_iter.bi_sector = sector;
-		bio->bi_end_io = bio_batch_end_io;
 		bio->bi_bdev = bdev;
-		bio->bi_private = &bb;
+		bio->bi_iter.bi_sector = sector;
+		bio->bi_iter.bi_size = min_t(sector_t, nr_sects, 1 << 20) << 9;
 		bio->bi_vcnt = 1;
 		bio->bi_io_vec->bv_page = page;
 		bio->bi_io_vec->bv_offset = 0;
 		bio->bi_io_vec->bv_len = bdev_logical_block_size(bdev);
 
-		if (nr_sects > max_write_same_sectors) {
-			bio->bi_iter.bi_size = max_write_same_sectors << 9;
-			nr_sects -= max_write_same_sectors;
-			sector += max_write_same_sectors;
-		} else {
-			bio->bi_iter.bi_size = nr_sects << 9;
-			nr_sects = 0;
-		}
+		sector += bio_sectors(bio);
+		nr_sects -= bio_sectors(bio);
 
-		atomic_inc(&bb.done);
-		submit_bio(REQ_WRITE | REQ_WRITE_SAME, bio);
+		ret = submit_bio_wait(REQ_WRITE | REQ_WRITE_SAME, bio);
+		if (ret)
+			break;
 	}
 
-	/* Wait for bios in-flight */
-	if (!atomic_dec_and_test(&bb.done))
-		wait_for_completion_io(&wait);
-
-	if (!test_bit(BIO_UPTODATE, &bb.flags))
-		ret = -ENOTSUPP;
-
 	return ret;
 }
 EXPORT_SYMBOL(blkdev_issue_write_same);
@@ -225,33 +129,22 @@ EXPORT_SYMBOL(blkdev_issue_write_same);
  * Description:
  *  Generate and issue number of bios with zerofiled pages.
  */
-
 int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
-			sector_t nr_sects, gfp_t gfp_mask)
+			   sector_t nr_sects, gfp_t gfp_mask)
 {
-	int ret;
+	int ret = 0;
 	struct bio *bio;
-	struct bio_batch bb;
 	unsigned int sz;
-	DECLARE_COMPLETION_ONSTACK(wait);
-
-	atomic_set(&bb.done, 1);
-	bb.flags = 1 << BIO_UPTODATE;
-	bb.wait = &wait;
 
-	ret = 0;
-	while (nr_sects != 0) {
+	while (nr_sects) {
 		bio = bio_alloc(gfp_mask,
-				min(nr_sects, (sector_t)BIO_MAX_PAGES));
-		if (!bio) {
-			ret = -ENOMEM;
-			break;
-		}
+				min(nr_sects / (PAGE_SIZE >> 9),
+				    (sector_t)BIO_MAX_PAGES));
+		if (!bio)
+			return -ENOMEM;
 
 		bio->bi_iter.bi_sector = sector;
 		bio->bi_bdev   = bdev;
-		bio->bi_end_io = bio_batch_end_io;
-		bio->bi_private = &bb;
 
 		while (nr_sects != 0) {
 			sz = min((sector_t) PAGE_SIZE >> 9 , nr_sects);
@@ -261,18 +154,11 @@ int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 			if (ret < (sz << 9))
 				break;
 		}
-		ret = 0;
-		atomic_inc(&bb.done);
-		submit_bio(WRITE, bio);
-	}
-
-	/* Wait for bios in-flight */
-	if (!atomic_dec_and_test(&bb.done))
-		wait_for_completion_io(&wait);
 
-	if (!test_bit(BIO_UPTODATE, &bb.flags))
-		/* One of bios in the batch was completed with error.*/
-		ret = -EIO;
+		ret = submit_bio_wait(WRITE, bio);
+		if (ret)
+			break;
+	}
 
 	return ret;
 }
@@ -287,7 +173,6 @@ int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
  * Description:
  *  Generate and issue number of bios with zerofiled pages.
  */
-
 int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 			 sector_t nr_sects, gfp_t gfp_mask)
 {
-- 
1.9.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 4/9] bcache: generic_make_request() handles large bios now
  2014-02-26 23:39 Make generic_make_request() handle arbitrary size bios Kent Overstreet
                   ` (2 preceding siblings ...)
  2014-02-26 23:39 ` [PATCH 3/9] blk-lib.c: generic_make_request() handles large bios now Kent Overstreet
@ 2014-02-26 23:39 ` Kent Overstreet
  2014-02-26 23:39 ` [PATCH 5/9] btrfs: generic_make_request() handles arbitrary size " Kent Overstreet
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Kent Overstreet @ 2014-02-26 23:39 UTC (permalink / raw)
  To: axboe; +Cc: linux-kernel, linux-fsdevel, Kent Overstreet

So we get to delete our hacky workaround.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
---
 drivers/md/bcache/bcache.h    |  18 --------
 drivers/md/bcache/io.c        | 100 +-----------------------------------------
 drivers/md/bcache/journal.c   |   4 +-
 drivers/md/bcache/request.c   |  16 +++----
 drivers/md/bcache/super.c     |  32 +-------------
 drivers/md/bcache/util.h      |   5 ++-
 drivers/md/bcache/writeback.c |   4 +-
 7 files changed, 18 insertions(+), 161 deletions(-)

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index a4c7306ff4..d669947650 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -245,19 +245,6 @@ struct keybuf {
 	DECLARE_ARRAY_ALLOCATOR(struct keybuf_key, freelist, KEYBUF_NR);
 };
 
-struct bio_split_pool {
-	struct bio_set		*bio_split;
-	mempool_t		*bio_split_hook;
-};
-
-struct bio_split_hook {
-	struct closure		cl;
-	struct bio_split_pool	*p;
-	struct bio		*bio;
-	bio_end_io_t		*bi_end_io;
-	void			*bi_private;
-};
-
 struct bcache_device {
 	struct closure		cl;
 
@@ -290,8 +277,6 @@ struct bcache_device {
 	int (*cache_miss)(struct btree *, struct search *,
 			  struct bio *, unsigned);
 	int (*ioctl) (struct bcache_device *, fmode_t, unsigned, unsigned long);
-
-	struct bio_split_pool	bio_split_hook;
 };
 
 struct io {
@@ -467,8 +452,6 @@ struct cache {
 	atomic_long_t		meta_sectors_written;
 	atomic_long_t		btree_sectors_written;
 	atomic_long_t		sectors_written;
-
-	struct bio_split_pool	bio_split_hook;
 };
 
 struct gc_stat {
@@ -893,7 +876,6 @@ void bch_bbio_endio(struct cache_set *, struct bio *, int, const char *);
 void bch_bbio_free(struct bio *, struct cache_set *);
 struct bio *bch_bbio_alloc(struct cache_set *);
 
-void bch_generic_make_request(struct bio *, struct bio_split_pool *);
 void __bch_submit_bbio(struct bio *, struct cache_set *);
 void bch_submit_bbio(struct bio *, struct cache_set *, struct bkey *, unsigned);
 
diff --git a/drivers/md/bcache/io.c b/drivers/md/bcache/io.c
index fa028fa82d..86a0bb8712 100644
--- a/drivers/md/bcache/io.c
+++ b/drivers/md/bcache/io.c
@@ -11,104 +11,6 @@
 
 #include <linux/blkdev.h>
 
-static unsigned bch_bio_max_sectors(struct bio *bio)
-{
-	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
-	struct bio_vec bv;
-	struct bvec_iter iter;
-	unsigned ret = 0, seg = 0;
-
-	if (bio->bi_rw & REQ_DISCARD)
-		return min(bio_sectors(bio), q->limits.max_discard_sectors);
-
-	bio_for_each_segment(bv, bio, iter) {
-		struct bvec_merge_data bvm = {
-			.bi_bdev	= bio->bi_bdev,
-			.bi_sector	= bio->bi_iter.bi_sector,
-			.bi_size	= ret << 9,
-			.bi_rw		= bio->bi_rw,
-		};
-
-		if (seg == min_t(unsigned, BIO_MAX_PAGES,
-				 queue_max_segments(q)))
-			break;
-
-		if (q->merge_bvec_fn &&
-		    q->merge_bvec_fn(q, &bvm, &bv) < (int) bv.bv_len)
-			break;
-
-		seg++;
-		ret += bv.bv_len >> 9;
-	}
-
-	ret = min(ret, queue_max_sectors(q));
-
-	WARN_ON(!ret);
-	ret = max_t(int, ret, bio_iovec(bio).bv_len >> 9);
-
-	return ret;
-}
-
-static void bch_bio_submit_split_done(struct closure *cl)
-{
-	struct bio_split_hook *s = container_of(cl, struct bio_split_hook, cl);
-
-	s->bio->bi_end_io = s->bi_end_io;
-	s->bio->bi_private = s->bi_private;
-	bio_endio_nodec(s->bio, 0);
-
-	closure_debug_destroy(&s->cl);
-	mempool_free(s, s->p->bio_split_hook);
-}
-
-static void bch_bio_submit_split_endio(struct bio *bio, int error)
-{
-	struct closure *cl = bio->bi_private;
-	struct bio_split_hook *s = container_of(cl, struct bio_split_hook, cl);
-
-	if (error)
-		clear_bit(BIO_UPTODATE, &s->bio->bi_flags);
-
-	bio_put(bio);
-	closure_put(cl);
-}
-
-void bch_generic_make_request(struct bio *bio, struct bio_split_pool *p)
-{
-	struct bio_split_hook *s;
-	struct bio *n;
-
-	if (!bio_has_data(bio) && !(bio->bi_rw & REQ_DISCARD))
-		goto submit;
-
-	if (bio_sectors(bio) <= bch_bio_max_sectors(bio))
-		goto submit;
-
-	s = mempool_alloc(p->bio_split_hook, GFP_NOIO);
-	closure_init(&s->cl, NULL);
-
-	s->bio		= bio;
-	s->p		= p;
-	s->bi_end_io	= bio->bi_end_io;
-	s->bi_private	= bio->bi_private;
-	bio_get(bio);
-
-	do {
-		n = bio_next_split(bio, bch_bio_max_sectors(bio),
-				   GFP_NOIO, s->p->bio_split);
-
-		n->bi_end_io	= bch_bio_submit_split_endio;
-		n->bi_private	= &s->cl;
-
-		closure_get(&s->cl);
-		generic_make_request(n);
-	} while (n != bio);
-
-	continue_at(&s->cl, bch_bio_submit_split_done, NULL);
-submit:
-	generic_make_request(bio);
-}
-
 /* Bios with headers */
 
 void bch_bbio_free(struct bio *bio, struct cache_set *c)
@@ -138,7 +40,7 @@ void __bch_submit_bbio(struct bio *bio, struct cache_set *c)
 	bio->bi_bdev		= PTR_CACHE(c, &b->key, 0)->bdev;
 
 	b->submit_time_us = local_clock_us();
-	closure_bio_submit(bio, bio->bi_private, PTR_CACHE(c, &b->key, 0));
+	closure_bio_submit(bio, bio->bi_private);
 }
 
 void bch_submit_bbio(struct bio *bio, struct cache_set *c,
diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index 18039affc3..3246497c1a 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -60,7 +60,7 @@ reread:		left = ca->sb.bucket_size - offset;
 		bio->bi_private = &cl;
 		bch_bio_map(bio, data);
 
-		closure_bio_submit(bio, &cl, ca);
+		closure_bio_submit(bio, &cl);
 		closure_sync(&cl);
 
 		/* This function could be simpler now since we no longer write
@@ -642,7 +642,7 @@ static void journal_write_unlocked(struct closure *cl)
 	spin_unlock(&c->journal.lock);
 
 	while ((bio = bio_list_pop(&list)))
-		closure_bio_submit(bio, cl, c->cache[0]);
+		closure_bio_submit(bio, cl);
 
 	continue_at(cl, journal_write_done, NULL);
 }
diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index 5d5d031cf3..29f77f9da2 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -873,7 +873,7 @@ static void cached_dev_read_error(struct closure *cl)
 
 		/* XXX: invalidate cache */
 
-		closure_bio_submit(bio, cl, s->d);
+		closure_bio_submit(bio, cl);
 	}
 
 	continue_at(cl, cached_dev_cache_miss_done, NULL);
@@ -996,7 +996,7 @@ static int cached_dev_cache_miss(struct btree *b, struct search *s,
 	s->cache_miss	= miss;
 	s->iop.bio	= cache_bio;
 	bio_get(cache_bio);
-	closure_bio_submit(cache_bio, &s->cl, s->d);
+	closure_bio_submit(cache_bio, &s->cl);
 
 	return ret;
 out_put:
@@ -1004,7 +1004,7 @@ out_put:
 out_submit:
 	miss->bi_end_io		= request_endio;
 	miss->bi_private	= &s->cl;
-	closure_bio_submit(miss, &s->cl, s->d);
+	closure_bio_submit(miss, &s->cl);
 	return ret;
 }
 
@@ -1069,7 +1069,7 @@ static void cached_dev_write(struct cached_dev *dc, struct search *s)
 
 		if (!(bio->bi_rw & REQ_DISCARD) ||
 		    blk_queue_discard(bdev_get_queue(dc->bdev)))
-			closure_bio_submit(bio, cl, s->d);
+			closure_bio_submit(bio, cl);
 	} else if (s->iop.writeback) {
 		bch_writeback_add(dc);
 		s->iop.bio = bio;
@@ -1084,12 +1084,12 @@ static void cached_dev_write(struct cached_dev *dc, struct search *s)
 			flush->bi_end_io = request_endio;
 			flush->bi_private = cl;
 
-			closure_bio_submit(flush, cl, s->d);
+			closure_bio_submit(flush, cl);
 		}
 	} else {
 		s->iop.bio = bio_clone_fast(bio, GFP_NOIO, dc->disk.bio_split);
 
-		closure_bio_submit(bio, cl, s->d);
+		closure_bio_submit(bio, cl);
 	}
 
 	closure_call(&s->iop.cl, bch_data_insert, NULL, cl);
@@ -1105,7 +1105,7 @@ static void cached_dev_nodata(struct closure *cl)
 		bch_journal_meta(s->iop.c, cl);
 
 	/* If it's a flush, we send the flush to the backing device too */
-	closure_bio_submit(bio, cl, s->d);
+	closure_bio_submit(bio, cl);
 
 	continue_at(cl, cached_dev_bio_complete, NULL);
 }
@@ -1152,7 +1152,7 @@ static void cached_dev_make_request(struct request_queue *q, struct bio *bio)
 		    !blk_queue_discard(bdev_get_queue(dc->bdev)))
 			bio_endio(bio, 0);
 		else
-			bch_generic_make_request(bio, &d->bio_split_hook);
+			generic_make_request(bio);
 	}
 }
 
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 24a3a1546c..d4e328386f 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -59,29 +59,6 @@ struct workqueue_struct *bcache_wq;
 
 #define BTREE_MAX_PAGES		(256 * 1024 / PAGE_SIZE)
 
-static void bio_split_pool_free(struct bio_split_pool *p)
-{
-	if (p->bio_split_hook)
-		mempool_destroy(p->bio_split_hook);
-
-	if (p->bio_split)
-		bioset_free(p->bio_split);
-}
-
-static int bio_split_pool_init(struct bio_split_pool *p)
-{
-	p->bio_split = bioset_create(4, 0);
-	if (!p->bio_split)
-		return -ENOMEM;
-
-	p->bio_split_hook = mempool_create_kmalloc_pool(4,
-				sizeof(struct bio_split_hook));
-	if (!p->bio_split_hook)
-		return -ENOMEM;
-
-	return 0;
-}
-
 /* Superblock */
 
 static const char *read_super(struct cache_sb *sb, struct block_device *bdev,
@@ -537,7 +514,7 @@ static void prio_io(struct cache *ca, uint64_t bucket, unsigned long rw)
 	bio->bi_private = ca;
 	bch_bio_map(bio, ca->disk_buckets);
 
-	closure_bio_submit(bio, &ca->prio, ca);
+	closure_bio_submit(bio, &ca->prio);
 	closure_sync(cl);
 }
 
@@ -763,7 +740,6 @@ static void bcache_device_free(struct bcache_device *d)
 		put_disk(d->disk);
 	}
 
-	bio_split_pool_free(&d->bio_split_hook);
 	if (d->bio_split)
 		bioset_free(d->bio_split);
 	if (is_vmalloc_addr(d->full_dirty_stripes))
@@ -816,7 +792,6 @@ static int bcache_device_init(struct bcache_device *d, unsigned block_size,
 		return minor;
 
 	if (!(d->bio_split = bioset_create(4, offsetof(struct bbio, bio))) ||
-	    bio_split_pool_init(&d->bio_split_hook) ||
 	    !(d->disk = alloc_disk(1))) {
 		ida_simple_remove(&bcache_minor, minor);
 		return -ENOMEM;
@@ -1775,8 +1750,6 @@ void bch_cache_release(struct kobject *kobj)
 	if (ca->set)
 		ca->set->cache[ca->sb.nr_this_dev] = NULL;
 
-	bio_split_pool_free(&ca->bio_split_hook);
-
 	free_pages((unsigned long) ca->disk_buckets, ilog2(bucket_pages(ca)));
 	kfree(ca->prio_buckets);
 	vfree(ca->buckets);
@@ -1825,8 +1798,7 @@ static int cache_alloc(struct cache_sb *sb, struct cache *ca)
 					  ca->sb.nbuckets)) ||
 	    !(ca->prio_buckets	= kzalloc(sizeof(uint64_t) * prio_buckets(ca) *
 					  2, GFP_KERNEL)) ||
-	    !(ca->disk_buckets	= alloc_bucket_pages(GFP_KERNEL, ca)) ||
-	    bio_split_pool_init(&ca->bio_split_hook))
+	    !(ca->disk_buckets	= alloc_bucket_pages(GFP_KERNEL, ca)))
 		return -ENOMEM;
 
 	ca->prio_last_buckets = ca->prio_buckets + prio_buckets(ca);
diff --git a/drivers/md/bcache/util.h b/drivers/md/bcache/util.h
index ac7d0d1f70..9ac17568f8 100644
--- a/drivers/md/bcache/util.h
+++ b/drivers/md/bcache/util.h
@@ -4,6 +4,7 @@
 
 #include <linux/blkdev.h>
 #include <linux/errno.h>
+#include <linux/blkdev.h>
 #include <linux/kernel.h>
 #include <linux/llist.h>
 #include <linux/ratelimit.h>
@@ -576,10 +577,10 @@ static inline sector_t bdev_sectors(struct block_device *bdev)
 	return bdev->bd_inode->i_size >> 9;
 }
 
-#define closure_bio_submit(bio, cl, dev)				\
+#define closure_bio_submit(bio, cl)					\
 do {									\
 	closure_get(cl);						\
-	bch_generic_make_request(bio, &(dev)->bio_split_hook);		\
+	generic_make_request(bio);					\
 } while (0)
 
 uint64_t bch_crc64_update(uint64_t, const void *, size_t);
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index f4300e4c01..a01d97bc2e 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -188,7 +188,7 @@ static void write_dirty(struct closure *cl)
 	io->bio.bi_bdev		= io->dc->bdev;
 	io->bio.bi_end_io	= dirty_endio;
 
-	closure_bio_submit(&io->bio, cl, &io->dc->disk);
+	closure_bio_submit(&io->bio, cl);
 
 	continue_at(cl, write_dirty_finish, system_wq);
 }
@@ -208,7 +208,7 @@ static void read_dirty_submit(struct closure *cl)
 {
 	struct dirty_io *io = container_of(cl, struct dirty_io, cl);
 
-	closure_bio_submit(&io->bio, cl, &io->dc->disk);
+	closure_bio_submit(&io->bio, cl);
 
 	continue_at(cl, write_dirty, system_wq);
 }
-- 
1.9.0



* [PATCH 5/9] btrfs: generic_make_request() handles arbitrary size bios now
  2014-02-26 23:39 Make generic_make_request() handle arbitrary size bios Kent Overstreet
                   ` (3 preceding siblings ...)
  2014-02-26 23:39 ` [PATCH 4/9] bcache: " Kent Overstreet
@ 2014-02-26 23:39 ` Kent Overstreet
  2014-02-26 23:39 ` [PATCH 6/9] btrfs: Convert to bio_for_each_segment() Kent Overstreet
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Kent Overstreet @ 2014-02-26 23:39 UTC (permalink / raw)
  To: axboe
  Cc: linux-kernel, linux-fsdevel, Kent Overstreet, Chris Mason, linux-btrfs

So there's no need for btrfs to break up bios for device limits anymore.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Chris Mason <clm@fb.com>
Cc: linux-btrfs@vger.kernel.org
---
 fs/btrfs/volumes.c | 73 ------------------------------------------------------
 1 file changed, 73 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index bab0b84d8f..3ce1a8aed6 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5395,34 +5395,6 @@ static noinline void btrfs_schedule_bio(struct btrfs_root *root,
 				   &device->work);
 }
 
-static int bio_size_ok(struct block_device *bdev, struct bio *bio,
-		       sector_t sector)
-{
-	struct bio_vec *prev;
-	struct request_queue *q = bdev_get_queue(bdev);
-	unsigned int max_sectors = queue_max_sectors(q);
-	struct bvec_merge_data bvm = {
-		.bi_bdev = bdev,
-		.bi_sector = sector,
-		.bi_rw = bio->bi_rw,
-	};
-
-	if (WARN_ON(bio->bi_vcnt == 0))
-		return 1;
-
-	prev = &bio->bi_io_vec[bio->bi_vcnt - 1];
-	if (bio_sectors(bio) > max_sectors)
-		return 0;
-
-	if (!q->merge_bvec_fn)
-		return 1;
-
-	bvm.bi_size = bio->bi_iter.bi_size - prev->bv_len;
-	if (q->merge_bvec_fn(q, &bvm, prev) < prev->bv_len)
-		return 0;
-	return 1;
-}
-
 static void submit_stripe_bio(struct btrfs_root *root, struct btrfs_bio *bbio,
 			      struct bio *bio, u64 physical, int dev_nr,
 			      int rw, int async)
@@ -5453,38 +5425,6 @@ static void submit_stripe_bio(struct btrfs_root *root, struct btrfs_bio *bbio,
 		btrfsic_submit_bio(rw, bio);
 }
 
-static int breakup_stripe_bio(struct btrfs_root *root, struct btrfs_bio *bbio,
-			      struct bio *first_bio, struct btrfs_device *dev,
-			      int dev_nr, int rw, int async)
-{
-	struct bio_vec *bvec = first_bio->bi_io_vec;
-	struct bio *bio;
-	int nr_vecs = bio_get_nr_vecs(dev->bdev);
-	u64 physical = bbio->stripes[dev_nr].physical;
-
-again:
-	bio = btrfs_bio_alloc(dev->bdev, physical >> 9, nr_vecs, GFP_NOFS);
-	if (!bio)
-		return -ENOMEM;
-
-	while (bvec <= (first_bio->bi_io_vec + first_bio->bi_vcnt - 1)) {
-		if (bio_add_page(bio, bvec->bv_page, bvec->bv_len,
-				 bvec->bv_offset) < bvec->bv_len) {
-			u64 len = bio->bi_iter.bi_size;
-
-			atomic_inc(&bbio->stripes_pending);
-			submit_stripe_bio(root, bbio, bio, physical, dev_nr,
-					  rw, async);
-			physical += len;
-			goto again;
-		}
-		bvec++;
-	}
-
-	submit_stripe_bio(root, bbio, bio, physical, dev_nr, rw, async);
-	return 0;
-}
-
 static void bbio_error(struct btrfs_bio *bbio, struct bio *bio, u64 logical)
 {
 	atomic_inc(&bbio->error);
@@ -5553,19 +5493,6 @@ int btrfs_map_bio(struct btrfs_root *root, int rw, struct bio *bio,
 			continue;
 		}
 
-		/*
-		 * Check and see if we're ok with this bio based on it's size
-		 * and offset with the given device.
-		 */
-		if (!bio_size_ok(dev->bdev, first_bio,
-				 bbio->stripes[dev_nr].physical >> 9)) {
-			ret = breakup_stripe_bio(root, bbio, first_bio, dev,
-						 dev_nr, rw, async_submit);
-			BUG_ON(ret);
-			dev_nr++;
-			continue;
-		}
-
 		if (dev_nr < total_devs - 1) {
 			bio = btrfs_bio_clone(first_bio, GFP_NOFS);
 			BUG_ON(!bio); /* -ENOMEM */
-- 
1.9.0



* [PATCH 6/9] btrfs: Convert to bio_for_each_segment()
  2014-02-26 23:39 Make generic_make_request() handle arbitrary size bios Kent Overstreet
                   ` (4 preceding siblings ...)
  2014-02-26 23:39 ` [PATCH 5/9] btrfs: generic_make_request() handles arbitrary size " Kent Overstreet
@ 2014-02-26 23:39 ` Kent Overstreet
  2014-02-26 23:39 ` [PATCH 7/9] iov_iter: Move iov_iter to uio.h Kent Overstreet
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Kent Overstreet @ 2014-02-26 23:39 UTC (permalink / raw)
  To: axboe
  Cc: linux-kernel, linux-fsdevel, Kent Overstreet, Chris Mason, linux-btrfs

This is going to be important for future (hopeful) block layer refactoring, and
using the standard primitives makes the code easier to audit.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Chris Mason <clm@fb.com>
Cc: linux-btrfs@vger.kernel.org
---
 fs/btrfs/extent_io.c | 12 ++++++++---
 fs/btrfs/file-item.c | 59 ++++++++++++++++++++--------------------------------
 fs/btrfs/inode.c     | 22 +++++++-------------
 3 files changed, 39 insertions(+), 54 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 85bbd01f12..0a84847123 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2632,12 +2632,18 @@ static int __must_check submit_one_bio(int rw, struct bio *bio,
 				       int mirror_num, unsigned long bio_flags)
 {
 	int ret = 0;
-	struct bio_vec *bvec = bio->bi_io_vec + bio->bi_vcnt - 1;
-	struct page *page = bvec->bv_page;
+	struct bio_vec bvec = { 0 };
+	struct bvec_iter iter;
+	struct page *page;
 	struct extent_io_tree *tree = bio->bi_private;
 	u64 start;
 
-	start = page_offset(page) + bvec->bv_offset;
+	bio_for_each_segment(bvec, bio, iter)
+		if (bio_iter_last(bvec, iter))
+			break;
+
+	page = bvec.bv_page;
+	start = page_offset(page) + bvec.bv_offset;
 
 	bio->bi_private = NULL;
 
diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index 127555b29f..c41642ea69 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -162,7 +162,7 @@ static int __btrfs_lookup_bio_sums(struct btrfs_root *root,
 				   struct inode *inode, struct bio *bio,
 				   u64 logical_offset, u32 *dst, int dio)
 {
-	struct bio_vec *bvec = bio->bi_io_vec;
+	struct bvec_iter iter = bio->bi_iter;
 	struct btrfs_io_bio *btrfs_bio = btrfs_io_bio(bio);
 	struct btrfs_csum_item *item = NULL;
 	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
@@ -171,10 +171,8 @@ static int __btrfs_lookup_bio_sums(struct btrfs_root *root,
 	u64 offset = 0;
 	u64 item_start_offset = 0;
 	u64 item_last_offset = 0;
-	u64 disk_bytenr;
 	u32 diff;
 	int nblocks;
-	int bio_index = 0;
 	int count;
 	u16 csum_size = btrfs_super_csum_size(root->fs_info->super_copy);
 
@@ -204,8 +202,6 @@ static int __btrfs_lookup_bio_sums(struct btrfs_root *root,
 	if (bio->bi_iter.bi_size > PAGE_CACHE_SIZE * 8)
 		path->reada = 2;
 
-	WARN_ON(bio->bi_vcnt <= 0);
-
 	/*
 	 * the free space stuff is only read when it hasn't been
 	 * updated in the current transaction.  So, we can safely
@@ -217,12 +213,13 @@ static int __btrfs_lookup_bio_sums(struct btrfs_root *root,
 		path->skip_locking = 1;
 	}
 
-	disk_bytenr = (u64)bio->bi_iter.bi_sector << 9;
 	if (dio)
 		offset = logical_offset;
-	while (bio_index < bio->bi_vcnt) {
+	while (iter.bi_size) {
+		u64 disk_bytenr = (u64)iter.bi_sector << 9;
+		struct bio_vec bvec = bio_iter_iovec(bio, iter);
 		if (!dio)
-			offset = page_offset(bvec->bv_page) + bvec->bv_offset;
+			offset = page_offset(bvec.bv_page) + bvec.bv_offset;
 		count = btrfs_find_ordered_sum(inode, offset, disk_bytenr,
 					       (u32 *)csum, nblocks);
 		if (count)
@@ -243,7 +240,7 @@ static int __btrfs_lookup_bio_sums(struct btrfs_root *root,
 				if (BTRFS_I(inode)->root->root_key.objectid ==
 				    BTRFS_DATA_RELOC_TREE_OBJECTID) {
 					set_extent_bits(io_tree, offset,
-						offset + bvec->bv_len - 1,
+						offset + bvec.bv_len - 1,
 						EXTENT_NODATASUM, GFP_NOFS);
 				} else {
 					btrfs_info(BTRFS_I(inode)->root->fs_info,
@@ -281,12 +278,9 @@ static int __btrfs_lookup_bio_sums(struct btrfs_root *root,
 found:
 		csum += count * csum_size;
 		nblocks -= count;
-		while (count--) {
-			disk_bytenr += bvec->bv_len;
-			offset += bvec->bv_len;
-			bio_index++;
-			bvec++;
-		}
+		bio_advance_iter(bio, &iter,
+				 count << inode->i_sb->s_blocksize_bits);
+		offset += count << inode->i_sb->s_blocksize_bits;
 	}
 	btrfs_free_path(path);
 	return 0;
@@ -439,14 +433,12 @@ int btrfs_csum_one_bio(struct btrfs_root *root, struct inode *inode,
 	struct btrfs_ordered_sum *sums;
 	struct btrfs_ordered_extent *ordered;
 	char *data;
-	struct bio_vec *bvec = bio->bi_io_vec;
-	int bio_index = 0;
+	struct bio_vec bvec;
+	struct bvec_iter iter;
 	int index;
-	unsigned long total_bytes = 0;
 	unsigned long this_sum_bytes = 0;
 	u64 offset;
 
-	WARN_ON(bio->bi_vcnt <= 0);
 	sums = kzalloc(btrfs_ordered_sum_size(root, bio->bi_iter.bi_size),
 		       GFP_NOFS);
 	if (!sums)
@@ -458,53 +450,46 @@ int btrfs_csum_one_bio(struct btrfs_root *root, struct inode *inode,
 	if (contig)
 		offset = file_start;
 	else
-		offset = page_offset(bvec->bv_page) + bvec->bv_offset;
+		offset = page_offset(bio_page(bio)) + bio_offset(bio);
 
 	ordered = btrfs_lookup_ordered_extent(inode, offset);
 	BUG_ON(!ordered); /* Logic error */
 	sums->bytenr = (u64)bio->bi_iter.bi_sector << 9;
 	index = 0;
 
-	while (bio_index < bio->bi_vcnt) {
+	bio_for_each_segment(bvec, bio, iter) {
 		if (!contig)
-			offset = page_offset(bvec->bv_page) + bvec->bv_offset;
+			offset = page_offset(bvec.bv_page) + bvec.bv_offset;
 
 		if (offset >= ordered->file_offset + ordered->len ||
 		    offset < ordered->file_offset) {
-			unsigned long bytes_left;
 			sums->len = this_sum_bytes;
 			this_sum_bytes = 0;
 			btrfs_add_ordered_sum(inode, ordered, sums);
 			btrfs_put_ordered_extent(ordered);
 
-			bytes_left = bio->bi_iter.bi_size - total_bytes;
-
-			sums = kzalloc(btrfs_ordered_sum_size(root, bytes_left),
+			sums = kzalloc(btrfs_ordered_sum_size(root, iter.bi_size),
 				       GFP_NOFS);
 			BUG_ON(!sums); /* -ENOMEM */
-			sums->len = bytes_left;
+			sums->len = iter.bi_size;
 			ordered = btrfs_lookup_ordered_extent(inode, offset);
 			BUG_ON(!ordered); /* Logic error */
-			sums->bytenr = ((u64)bio->bi_iter.bi_sector << 9) +
-				       total_bytes;
+			sums->bytenr = ((u64)iter.bi_sector) << 9;
 			index = 0;
 		}
 
-		data = kmap_atomic(bvec->bv_page);
+		data = kmap_atomic(bvec.bv_page);
 		sums->sums[index] = ~(u32)0;
-		sums->sums[index] = btrfs_csum_data(data + bvec->bv_offset,
+		sums->sums[index] = btrfs_csum_data(data + bvec.bv_offset,
 						    sums->sums[index],
-						    bvec->bv_len);
+						    bvec.bv_len);
 		kunmap_atomic(data);
 		btrfs_csum_final(sums->sums[index],
 				 (char *)(sums->sums + index));
 
-		bio_index++;
 		index++;
-		total_bytes += bvec->bv_len;
-		this_sum_bytes += bvec->bv_len;
-		offset += bvec->bv_len;
-		bvec++;
+		offset += bvec.bv_len;
+		this_sum_bytes += bvec.bv_len;
 	}
 	this_sum_bytes = 0;
 	btrfs_add_ordered_sum(inode, ordered, sums);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d3d4448629..2475908a7b 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7161,12 +7161,11 @@ static int btrfs_submit_direct_hook(int rw, struct btrfs_dio_private *dip,
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 	struct bio *bio;
 	struct bio *orig_bio = dip->orig_bio;
-	struct bio_vec *bvec = orig_bio->bi_io_vec;
+	struct bio_vec bvec;
+	struct bvec_iter iter;
 	u64 start_sector = orig_bio->bi_iter.bi_sector;
 	u64 file_offset = dip->logical_offset;
-	u64 submit_len = 0;
 	u64 map_length;
-	int nr_pages = 0;
 	int ret = 0;
 	int async_submit = 0;
 
@@ -7197,10 +7196,12 @@ static int btrfs_submit_direct_hook(int rw, struct btrfs_dio_private *dip,
 	bio->bi_end_io = btrfs_end_dio_bio;
 	atomic_inc(&dip->pending_bios);
 
-	while (bvec <= (orig_bio->bi_io_vec + orig_bio->bi_vcnt - 1)) {
-		if (unlikely(map_length < submit_len + bvec->bv_len ||
-		    bio_add_page(bio, bvec->bv_page, bvec->bv_len,
-				 bvec->bv_offset) < bvec->bv_len)) {
+	bio_for_each_segment(bvec, orig_bio, iter) {
+		if (unlikely(map_length < bio->bi_iter.bi_size + bvec.bv_len ||
+		    bio_add_page(bio, bvec.bv_page, bvec.bv_len,
+				 bvec.bv_offset) < bvec.bv_len)) {
+			unsigned submit_len = bio->bi_iter.bi_size;
+
 			/*
 			 * inc the count before we submit the bio so
 			 * we know the end IO handler won't happen before
@@ -7220,9 +7221,6 @@ static int btrfs_submit_direct_hook(int rw, struct btrfs_dio_private *dip,
 			start_sector += submit_len >> 9;
 			file_offset += submit_len;
 
-			submit_len = 0;
-			nr_pages = 0;
-
 			bio = btrfs_dio_bio_alloc(orig_bio->bi_bdev,
 						  start_sector, GFP_NOFS);
 			if (!bio)
@@ -7238,10 +7236,6 @@ static int btrfs_submit_direct_hook(int rw, struct btrfs_dio_private *dip,
 				bio_put(bio);
 				goto out_err;
 			}
-		} else {
-			submit_len += bvec->bv_len;
-			nr_pages++;
-			bvec++;
 		}
 	}
 
-- 
1.9.0



* [PATCH 7/9] iov_iter: Move iov_iter to uio.h
  2014-02-26 23:39 Make generic_make_request() handle arbitrary size bios Kent Overstreet
                   ` (5 preceding siblings ...)
  2014-02-26 23:39 ` [PATCH 6/9] btrfs: Convert to bio_for_each_segment() Kent Overstreet
@ 2014-02-26 23:39 ` Kent Overstreet
  2014-02-26 23:39 ` [PATCH 8/9] iov_iter: Kill iov_iter_single_seg_count() Kent Overstreet
  2014-02-26 23:39 ` [PATCH 9/9] iov_iter: Kill written arg to iov_iter_init() Kent Overstreet
  8 siblings, 0 replies; 16+ messages in thread
From: Kent Overstreet @ 2014-02-26 23:39 UTC (permalink / raw)
  To: axboe; +Cc: linux-kernel, linux-fsdevel, Kent Overstreet, Alexander Viro

Going to be consolidating all the iov_iter code in one place, and fs.h is way
too big. This also adds a new helper, iov_iter_iovec(), which returns the
current struct iovec.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <axboe@kernel.dk>
---
 include/linux/fs.h  | 32 --------------------------------
 include/linux/uio.h | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 50 insertions(+), 32 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 60829565e5..e42211f044 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -290,38 +290,6 @@ struct page;
 struct address_space;
 struct writeback_control;
 
-struct iov_iter {
-	const struct iovec *iov;
-	unsigned long nr_segs;
-	size_t iov_offset;
-	size_t count;
-};
-
-size_t iov_iter_copy_from_user_atomic(struct page *page,
-		struct iov_iter *i, unsigned long offset, size_t bytes);
-size_t iov_iter_copy_from_user(struct page *page,
-		struct iov_iter *i, unsigned long offset, size_t bytes);
-void iov_iter_advance(struct iov_iter *i, size_t bytes);
-int iov_iter_fault_in_readable(struct iov_iter *i, size_t bytes);
-size_t iov_iter_single_seg_count(const struct iov_iter *i);
-
-static inline void iov_iter_init(struct iov_iter *i,
-			const struct iovec *iov, unsigned long nr_segs,
-			size_t count, size_t written)
-{
-	i->iov = iov;
-	i->nr_segs = nr_segs;
-	i->iov_offset = 0;
-	i->count = count + written;
-
-	iov_iter_advance(i, written);
-}
-
-static inline size_t iov_iter_count(struct iov_iter *i)
-{
-	return i->count;
-}
-
 /*
  * "descriptor" for what we're up to with a read.
  * This allows us to use the same read code yet
diff --git a/include/linux/uio.h b/include/linux/uio.h
index c55ce243cc..347d70ce09 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -9,14 +9,23 @@
 #ifndef __LINUX_UIO_H
 #define __LINUX_UIO_H
 
+#include <linux/kernel.h>
 #include <uapi/linux/uio.h>
 
+struct page;
 
 struct kvec {
 	void *iov_base; /* and that should *never* hold a userland pointer */
 	size_t iov_len;
 };
 
+struct iov_iter {
+	const struct iovec *iov;
+	unsigned long nr_segs;
+	size_t iov_offset;
+	size_t count;
+};
+
 /*
  * Total number of bytes covered by an iovec.
  *
@@ -34,8 +43,49 @@ static inline size_t iov_length(const struct iovec *iov, unsigned long nr_segs)
 	return ret;
 }
 
+static inline struct iovec iov_iter_iovec(const struct iov_iter *iter)
+{
+	return (struct iovec) {
+		.iov_base = iter->iov->iov_base + iter->iov_offset,
+		.iov_len = min(iter->count,
+			       iter->iov->iov_len - iter->iov_offset),
+	};
+}
+
+#define iov_for_each(iov, iter, start)				\
+	for (iter = (start);					\
+	     (iter).count &&					\
+	     ((iov = iov_iter_iovec(&(iter))), 1);		\
+	     iov_iter_advance(&(iter), (iov).iov_len))
+
 unsigned long iov_shorten(struct iovec *iov, unsigned long nr_segs, size_t to);
 
+size_t iov_iter_copy_from_user_atomic(struct page *page,
+		struct iov_iter *i, unsigned long offset, size_t bytes);
+size_t iov_iter_copy_from_user(struct page *page,
+		struct iov_iter *i, unsigned long offset, size_t bytes);
+void iov_iter_advance(struct iov_iter *i, size_t bytes);
+int iov_iter_fault_in_readable(struct iov_iter *i, size_t bytes);
+size_t iov_iter_single_seg_count(const struct iov_iter *i);
+
+static inline void iov_iter_init(struct iov_iter *i,
+			const struct iovec *iov, unsigned long nr_segs,
+			size_t count, size_t written)
+{
+	i->iov = iov;
+	i->nr_segs = nr_segs;
+	i->iov_offset = 0;
+	i->count = count + written;
+
+	iov_iter_advance(i, written);
+}
+
+static inline size_t iov_iter_count(struct iov_iter *i)
+{
+	return i->count;
+}
+
 int memcpy_fromiovec(unsigned char *kdata, struct iovec *iov, int len);
 int memcpy_toiovec(struct iovec *iov, unsigned char *kdata, int len);
+
 #endif
-- 
1.9.0



* [PATCH 8/9] iov_iter: Kill iov_iter_single_seg_count()
  2014-02-26 23:39 Make generic_make_request() handle arbitrary size bios Kent Overstreet
                   ` (6 preceding siblings ...)
  2014-02-26 23:39 ` [PATCH 7/9] iov_iter: Move iov_iter to uio.h Kent Overstreet
@ 2014-02-26 23:39 ` Kent Overstreet
  2014-02-26 23:39 ` [PATCH 9/9] iov_iter: Kill written arg to iov_iter_init() Kent Overstreet
  8 siblings, 0 replies; 16+ messages in thread
From: Kent Overstreet @ 2014-02-26 23:39 UTC (permalink / raw)
  To: axboe; +Cc: linux-kernel, linux-fsdevel, Kent Overstreet, Alexander Viro

The new iov_iter_iovec() is a more general replacement.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <axboe@kernel.dk>
---
 fs/fuse/file.c      |  6 +++---
 include/linux/uio.h |  1 -
 mm/filemap.c        | 15 +--------------
 3 files changed, 4 insertions(+), 18 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 77bcc303c3..b4c9a14e70 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1013,7 +1013,7 @@ static ssize_t fuse_fill_write_pages(struct fuse_req *req,
 		if (!tmp) {
 			unlock_page(page);
 			page_cache_release(page);
-			bytes = min(bytes, iov_iter_single_seg_count(ii));
+			bytes = min(bytes, iov_iter_iovec(ii).iov_len);
 			goto again;
 		}
 
@@ -1204,7 +1204,7 @@ static inline unsigned long fuse_get_user_addr(const struct iov_iter *ii)
 static inline size_t fuse_get_frag_size(const struct iov_iter *ii,
 					size_t max_size)
 {
-	return min(iov_iter_single_seg_count(ii), max_size);
+	return min(iov_iter_iovec(ii).iov_len, max_size);
 }
 
 static int fuse_get_user_pages(struct fuse_req *req, struct iov_iter *ii,
@@ -1278,7 +1278,7 @@ static inline int fuse_iter_npages(const struct iov_iter *ii_p)
 	while (iov_iter_count(&ii) && npages < FUSE_MAX_PAGES_PER_REQ) {
 		unsigned long user_addr = fuse_get_user_addr(&ii);
 		unsigned offset = user_addr & ~PAGE_MASK;
-		size_t frag_size = iov_iter_single_seg_count(&ii);
+		size_t frag_size = iov_iter_iovec(&ii).iov_len;
 
 		npages += (frag_size + offset + PAGE_SIZE - 1) >> PAGE_SHIFT;
 		iov_iter_advance(&ii, frag_size);
diff --git a/include/linux/uio.h b/include/linux/uio.h
index 347d70ce09..2b99e0d2a1 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -66,7 +66,6 @@ size_t iov_iter_copy_from_user(struct page *page,
 		struct iov_iter *i, unsigned long offset, size_t bytes);
 void iov_iter_advance(struct iov_iter *i, size_t bytes);
 int iov_iter_fault_in_readable(struct iov_iter *i, size_t bytes);
-size_t iov_iter_single_seg_count(const struct iov_iter *i);
 
 static inline void iov_iter_init(struct iov_iter *i,
 			const struct iovec *iov, unsigned long nr_segs,
diff --git a/mm/filemap.c b/mm/filemap.c
index 7a13f6ac54..8e509b3225 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2073,19 +2073,6 @@ int iov_iter_fault_in_readable(struct iov_iter *i, size_t bytes)
 EXPORT_SYMBOL(iov_iter_fault_in_readable);
 
 /*
- * Return the count of just the current iov_iter segment.
- */
-size_t iov_iter_single_seg_count(const struct iov_iter *i)
-{
-	const struct iovec *iov = i->iov;
-	if (i->nr_segs == 1)
-		return i->count;
-	else
-		return min(i->count, iov->iov_len - i->iov_offset);
-}
-EXPORT_SYMBOL(iov_iter_single_seg_count);
-
-/*
  * Performs necessary checks before doing a write
  *
  * Can adjust writing position or amount of bytes to write.
@@ -2373,7 +2360,7 @@ again:
 			 * once without a pagefault.
 			 */
 			bytes = min_t(unsigned long, PAGE_CACHE_SIZE - offset,
-						iov_iter_single_seg_count(i));
+				      iov_iter_iovec(i).iov_len);
 			goto again;
 		}
 		pos += copied;
-- 
1.9.0



* [PATCH 9/9] iov_iter: Kill written arg to iov_iter_init()
  2014-02-26 23:39 Make generic_make_request() handle arbitrary size bios Kent Overstreet
                   ` (7 preceding siblings ...)
  2014-02-26 23:39 ` [PATCH 8/9] iov_iter: Kill iov_iter_single_seg_count() Kent Overstreet
@ 2014-02-26 23:39 ` Kent Overstreet
  8 siblings, 0 replies; 16+ messages in thread
From: Kent Overstreet @ 2014-02-26 23:39 UTC (permalink / raw)
  To: axboe
  Cc: linux-kernel, linux-fsdevel, Kent Overstreet, Alexander Viro,
	Chris Mason, linux-btrfs, Steve French, linux-cifs,
	Miklos Szeredi, fuse-devel, Sage Weil, ceph-devel

This gets rid of a usually needless call to iov_iter_advance().

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Chris Mason <clm@fb.com>
Cc: linux-btrfs@vger.kernel.org
Cc: Steve French <sfrench@samba.org>
Cc: linux-cifs@vger.kernel.org
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: fuse-devel@lists.sourceforge.net
Cc: Sage Weil <sage@inktank.com>
Cc: ceph-devel@vger.kernel.org
---
 fs/btrfs/file.c     |  8 +++++---
 fs/ceph/file.c      |  7 ++++---
 fs/cifs/file.c      |  4 ++--
 fs/fuse/file.c      | 11 ++++++-----
 include/linux/uio.h |  9 +++------
 mm/filemap.c        |  4 +++-
 6 files changed, 23 insertions(+), 20 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 0165b8672f..18b2e127e8 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1652,8 +1652,9 @@ static ssize_t __btrfs_direct_write(struct kiocb *iocb,
 		return written;
 
 	pos += written;
-	count -= written;
-	iov_iter_init(&i, iov, nr_segs, count, written);
+	iov_iter_init(&i, iov, nr_segs, count);
+	iov_iter_advance(&i, written);
+
 	written_buffered = __btrfs_buffered_write(file, &i, pos);
 	if (written_buffered < 0) {
 		err = written_buffered;
@@ -1768,7 +1769,8 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
 	} else {
 		struct iov_iter i;
 
-		iov_iter_init(&i, iov, nr_segs, count, num_written);
+		iov_iter_init(&i, iov, nr_segs, count + num_written);
+		iov_iter_advance(&i, num_written);
 
 		num_written = __btrfs_buffered_write(file, &i, pos);
 		if (num_written > 0)
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index dfd2ce3419..163ae0fe20 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -580,7 +580,7 @@ ceph_sync_direct_write(struct kiocb *iocb, const struct iovec *iov,
 		CEPH_OSD_FLAG_ONDISK |
 		CEPH_OSD_FLAG_WRITE;
 
-	iov_iter_init(&i, iov, nr_segs, count, 0);
+	iov_iter_init(&i, iov, nr_segs, count);
 
 	while (iov_iter_count(&i) > 0) {
 		void __user *data = i.iov->iov_base + i.iov_offset;
@@ -701,7 +701,7 @@ static ssize_t ceph_sync_write(struct kiocb *iocb, const struct iovec *iov,
 		CEPH_OSD_FLAG_WRITE |
 		CEPH_OSD_FLAG_ACK;
 
-	iov_iter_init(&i, iov, nr_segs, count, 0);
+	iov_iter_init(&i, iov, nr_segs, count);
 
 	while ((len = iov_iter_count(&i)) > 0) {
 		size_t left;
@@ -834,7 +834,8 @@ again:
 				goto out;
 		}
 
-		iov_iter_init(&i, iov, nr_segs, len, read);
+		iov_iter_init(&i, iov, nr_segs, len);
+		iov_iter_advance(&i, read);
 
 		/* hmm, this isn't really async... */
 		ret = ceph_sync_read(iocb, &i, &checkeof);
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 755584684f..8a03eeb95e 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -2424,7 +2424,7 @@ cifs_iovec_write(struct file *file, const struct iovec *iov,
 	else
 		pid = current->tgid;
 
-	iov_iter_init(&it, iov, nr_segs, len, 0);
+	iov_iter_init(&it, iov, nr_segs, len);
 	do {
 		size_t save_len;
 
@@ -2729,7 +2729,7 @@ cifs_readdata_to_iov(struct cifs_readdata *rdata, const struct iovec *iov,
 	unsigned int i;
 
 	/* set up iov_iter and advance to the correct offset */
-	iov_iter_init(&ii, iov, nr_segs, iov_length(iov, nr_segs), 0);
+	iov_iter_init(&ii, iov, nr_segs, iov_length(iov, nr_segs));
 	iov_iter_advance(&ii, pos);
 
 	*copied = 0;
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index b4c9a14e70..67542da685 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1152,9 +1152,10 @@ static ssize_t fuse_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
 			goto out;
 
 		pos += written;
-		count -= written;
 
-		iov_iter_init(&i, iov, nr_segs, count, written);
+		iov_iter_init(&i, iov, nr_segs, count);
+		iov_iter_advance(&i, written);
+
 		written_buffered = fuse_perform_write(file, mapping, &i, pos);
 		if (written_buffered < 0) {
 			err = written_buffered;
@@ -1174,7 +1175,7 @@ static ssize_t fuse_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
 		written += written_buffered;
 		iocb->ki_pos = pos + written_buffered;
 	} else {
-		iov_iter_init(&i, iov, nr_segs, count, 0);
+		iov_iter_init(&i, iov, nr_segs, count);
 		written = fuse_perform_write(file, mapping, &i, pos);
 		if (written >= 0)
 			iocb->ki_pos = pos + written;
@@ -1300,7 +1301,7 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, const struct iovec *iov,
 	struct fuse_req *req;
 	struct iov_iter ii;
 
-	iov_iter_init(&ii, iov, nr_segs, count, 0);
+	iov_iter_init(&ii, iov, nr_segs, count);
 
 	if (io->async)
 		req = fuse_get_req_for_background(fc, fuse_iter_npages(&ii));
@@ -2188,7 +2189,7 @@ static int fuse_ioctl_copy_user(struct page **pages, struct iovec *iov,
 	if (!bytes)
 		return 0;
 
-	iov_iter_init(&ii, iov, nr_segs, bytes, 0);
+	iov_iter_init(&ii, iov, nr_segs, bytes);
 
 	while (iov_iter_count(&ii)) {
 		struct page *page = pages[page_idx++];
diff --git a/include/linux/uio.h b/include/linux/uio.h
index 2b99e0d2a1..d653feeeb1 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -67,16 +67,13 @@ size_t iov_iter_copy_from_user(struct page *page,
 void iov_iter_advance(struct iov_iter *i, size_t bytes);
 int iov_iter_fault_in_readable(struct iov_iter *i, size_t bytes);
 
-static inline void iov_iter_init(struct iov_iter *i,
-			const struct iovec *iov, unsigned long nr_segs,
-			size_t count, size_t written)
+static inline void iov_iter_init(struct iov_iter *i, const struct iovec *iov,
+				 unsigned long nr_segs, size_t count)
 {
 	i->iov = iov;
 	i->nr_segs = nr_segs;
 	i->iov_offset = 0;
-	i->count = count + written;
-
-	iov_iter_advance(i, written);
+	i->count = count;
 }
 
 static inline size_t iov_iter_count(struct iov_iter *i)
diff --git a/mm/filemap.c b/mm/filemap.c
index 8e509b3225..e756a0cf21 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2385,7 +2385,9 @@ generic_file_buffered_write(struct kiocb *iocb, const struct iovec *iov,
 	ssize_t status;
 	struct iov_iter i;
 
-	iov_iter_init(&i, iov, nr_segs, count, written);
+	iov_iter_init(&i, iov, nr_segs, count + written);
+	iov_iter_advance(&i, written);
+
 	status = generic_perform_write(file, &i, pos);
 
 	if (likely(status >= 0)) {
-- 
1.9.0



* Re: [PATCH 1/9] block: Make generic_make_request handle arbitrary sized bios
  2014-02-26 23:39 ` [PATCH 1/9] block: Make generic_make_request handle arbitrary sized bios Kent Overstreet
@ 2014-02-27 17:22   ` Matthew Wilcox
  2014-02-27 21:27     ` Kent Overstreet
  2014-02-28 23:30     ` Kent Overstreet
  2014-03-02 20:31   ` Muthu Kumar
  1 sibling, 2 replies; 16+ messages in thread
From: Matthew Wilcox @ 2014-02-27 17:22 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: axboe, linux-kernel, linux-fsdevel, Neil Brown, Alasdair Kergon,
	dm-devel, Lars Ellenberg, drbd-user, Asai Thambi S P,
	Sam Bradshaw, linux-nvme, Jiri Kosina, Geoff Levand, Jim Paris,
	Joshua Morris, Philip Kelleher, Minchan Kim, Nitin Gupta,
	Martin Schwidefsky, Heiko Carstens, Peng Tao

On Wed, Feb 26, 2014 at 03:39:49PM -0800, Kent Overstreet wrote:
> We do this by adding calls to blk_queue_split() to the various
> make_request functions that need it - a few can already handle arbitrary
> size bios. Note that we add the call _after_ any call to blk_queue_bounce();
> this means that blk_queue_split() and blk_recalc_rq_segments() don't need to
> be concerned with bouncing affecting segment merging.

> diff --git a/drivers/block/nvme-core.c b/drivers/block/nvme-core.c
> index 51824d1f23..e4376b9613 100644
> --- a/drivers/block/nvme-core.c
> +++ b/drivers/block/nvme-core.c
> @@ -737,6 +737,8 @@ static void nvme_make_request(struct request_queue *q, struct bio *bio)
>  	struct nvme_queue *nvmeq = get_nvmeq(ns->dev);
>  	int result = -EBUSY;
>  
> +	blk_queue_split(q, &bio, q->bio_split);
> +
>  	if (!nvmeq) {
>  		put_nvmeq(NULL);
>  		bio_endio(bio, -EIO);

I'd suggest that we do:

-	struct nvme_queue *nvmeq = get_nvmeq(ns->dev);
+	struct nvme_queue *nvmeq;
 	int result = -EBUSY;

+	blk_queue_split(q, &bio, q->bio_split);
+
+	nvmeq = get_nvmeq(ns->dev);
 	if (!nvmeq) {

so that we're running the blk_queue_split() code outside the get_cpu()
call.

Now, the NVMe driver has its own rules about when BIOs have to be split.
Right now, that's way down inside the nvme_map_bio() call when we're
walking the bio to compose the scatterlist.  Should we instead have an
nvme_bio_split() routine that is called instead of blk_queue_split(),
and we can simplify nvme_map_bio() since it'll know that it's working
with bios that don't have to be split.

In fact, I think it would have little NVMe-specific in it at that point,
so we could name __blk_bios_map_sg() better, export it to drivers and
call it from nvme_map_bio(), which I think would make everybody happier.

> diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
> index a2af73db18..a37acf722b 100644
> --- a/drivers/block/pktcdvd.c
> +++ b/drivers/block/pktcdvd.c
> @@ -2444,6 +2444,10 @@ static void pkt_make_request(struct request_queue *q, struct bio *bio)
>  	char b[BDEVNAME_SIZE];
>  	struct bio *split;
>  
> +	blk_queue_bounce(q, &bio);
> +
> +	blk_queue_split(q, &bio, q->bio_split);
> +
>  	pd = q->queuedata;
>  	if (!pd) {
>  		pr_err("%s incorrect request queue\n",
> @@ -2474,8 +2478,6 @@ static void pkt_make_request(struct request_queue *q, struct bio *bio)
>  		goto end_io;
>  	}
>  
> -	blk_queue_bounce(q, &bio);
> -
>  	do {
>  		sector_t zone = get_zone(bio->bi_iter.bi_sector, pd);
>  		sector_t last_zone = get_zone(bio_end_sector(bio) - 1, pd);
> diff --git a/drivers/block/ps3vram.c b/drivers/block/ps3vram.c
> index ef45cfb98f..a995972961 100644
> --- a/drivers/block/ps3vram.c
> +++ b/drivers/block/ps3vram.c
> @@ -603,6 +603,8 @@ static void ps3vram_make_request(struct request_queue *q, struct bio *bio)
>  	struct ps3vram_priv *priv = ps3_system_bus_get_drvdata(dev);
>  	int busy;
>  
> +	blk_queue_split(q, &bio, q->bio_split);
> +
>  	dev_dbg(&dev->core, "%s\n", __func__);
>  
>  	spin_lock_irq(&priv->lock);
> diff --git a/drivers/block/rsxx/dev.c b/drivers/block/rsxx/dev.c
> index 2839d37e5a..ff074a3cd4 100644
> --- a/drivers/block/rsxx/dev.c
> +++ b/drivers/block/rsxx/dev.c
> @@ -169,6 +169,8 @@ static void rsxx_make_request(struct request_queue *q, struct bio *bio)
>  	struct rsxx_bio_meta *bio_meta;
>  	int st = -EINVAL;
>  
> +	blk_queue_split(q, &bio, q->bio_split);
> +
>  	might_sleep();
>  
>  	if (!card)
> diff --git a/drivers/block/umem.c b/drivers/block/umem.c
> index 4cf81b5bf0..13d577cfbc 100644
> --- a/drivers/block/umem.c
> +++ b/drivers/block/umem.c
> @@ -531,6 +531,8 @@ static void mm_make_request(struct request_queue *q, struct bio *bio)
>  		 (unsigned long long)bio->bi_iter.bi_sector,
>  		 bio->bi_iter.bi_size);
>  
> +	blk_queue_split(q, &bio, q->bio_split);
> +
>  	spin_lock_irq(&card->lock);
>  	*card->biotail = bio;
>  	bio->bi_next = NULL;
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index 011e55d820..ecf9daa01c 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -733,6 +733,8 @@ static void zram_make_request(struct request_queue *queue, struct bio *bio)
>  {
>  	struct zram *zram = queue->queuedata;
>  
> +	blk_queue_split(queue, &bio, queue->bio_split);
> +
>  	down_read(&zram->init_lock);
>  	if (unlikely(!zram->init_done))
>  		goto error;
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 8c53b09b9a..97f70420f2 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -1500,6 +1500,8 @@ static void dm_request(struct request_queue *q, struct bio *bio)
>  {
>  	struct mapped_device *md = q->queuedata;
>  
> +	blk_queue_split(q, &bio, q->bio_split);
> +
>  	if (dm_request_based(md))
>  		blk_queue_bio(q, bio);
>  	else
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 4ad5cc4e63..1421bc3f7b 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -256,6 +256,8 @@ static void md_make_request(struct request_queue *q, struct bio *bio)
>  	int cpu;
>  	unsigned int sectors;
>  
> +	blk_queue_split(q, &bio, q->bio_split);
> +
>  	if (mddev == NULL || mddev->pers == NULL
>  	    || !mddev->ready) {
>  		bio_io_error(bio);
> diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
> index ebf41e228e..db33cd3e4c 100644
> --- a/drivers/s390/block/dcssblk.c
> +++ b/drivers/s390/block/dcssblk.c
> @@ -815,6 +815,8 @@ dcssblk_make_request(struct request_queue *q, struct bio *bio)
>  	unsigned long source_addr;
>  	unsigned long bytes_done;
>  
> +	blk_queue_split(q, &bio, q->bio_split);
> +
>  	bytes_done = 0;
>  	dev_info = bio->bi_bdev->bd_disk->private_data;
>  	if (dev_info == NULL)
> diff --git a/drivers/s390/block/xpram.c b/drivers/s390/block/xpram.c
> index 6969d39f1e..f03c103f13 100644
> --- a/drivers/s390/block/xpram.c
> +++ b/drivers/s390/block/xpram.c
> @@ -190,6 +190,8 @@ static void xpram_make_request(struct request_queue *q, struct bio *bio)
>  	unsigned long page_addr;
>  	unsigned long bytes;
>  
> +	blk_queue_split(q, &bio, q->bio_split);
> +
>  	if ((bio->bi_iter.bi_sector & 7) != 0 ||
>  	    (bio->bi_iter.bi_size & 4095) != 0)
>  		/* Request is not page-aligned. */
> diff --git a/drivers/staging/lustre/lustre/llite/lloop.c b/drivers/staging/lustre/lustre/llite/lloop.c
> index 0718905ade..a3f6dc930b 100644
> --- a/drivers/staging/lustre/lustre/llite/lloop.c
> +++ b/drivers/staging/lustre/lustre/llite/lloop.c
> @@ -344,6 +344,8 @@ static void loop_make_request(struct request_queue *q, struct bio *old_bio)
>  	int rw = bio_rw(old_bio);
>  	int inactive;
>  
> +	blk_queue_split(q, &old_bio, q->bio_split);
> +
>  	if (!lo)
>  		goto err;
>  
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 1e1fa3f93d..99e9955c4d 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -470,6 +470,7 @@ struct request_queue {
>  	wait_queue_head_t	mq_freeze_wq;
>  	struct percpu_counter	mq_usage_counter;
>  	struct list_head	all_q_node;
> +	struct bio_set		*bio_split;
>  };
>  
>  #define QUEUE_FLAG_QUEUED	1	/* uses generic tag queueing */
> @@ -781,6 +782,8 @@ extern void blk_rq_unprep_clone(struct request *rq);
>  extern int blk_insert_cloned_request(struct request_queue *q,
>  				     struct request *rq);
>  extern void blk_delay_queue(struct request_queue *, unsigned long);
> +extern void blk_queue_split(struct request_queue *, struct bio **,
> +			    struct bio_set *);
>  extern void blk_recount_segments(struct request_queue *, struct bio *);
>  extern int scsi_verify_blk_ioctl(struct block_device *, unsigned int);
>  extern int scsi_cmd_blk_ioctl(struct block_device *, fmode_t,
> -- 
> 1.9.0


* Re: [PATCH 1/9] block: Make generic_make_request handle arbitrary sized bios
  2014-02-27 17:22   ` Matthew Wilcox
@ 2014-02-27 21:27     ` Kent Overstreet
  2014-02-28 23:30     ` Kent Overstreet
  1 sibling, 0 replies; 16+ messages in thread
From: Kent Overstreet @ 2014-02-27 21:27 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: axboe, linux-kernel, linux-fsdevel, Neil Brown, Alasdair Kergon,
	dm-devel, Lars Ellenberg, drbd-user, Asai Thambi S P,
	Sam Bradshaw, linux-nvme, Jiri Kosina, Geoff Levand, Jim Paris,
	Joshua Morris, Philip Kelleher, Minchan Kim, Nitin Gupta,
	Martin Schwidefsky, Heiko Carstens, Peng Tao

On Thu, Feb 27, 2014 at 12:22:54PM -0500, Matthew Wilcox wrote:
> On Wed, Feb 26, 2014 at 03:39:49PM -0800, Kent Overstreet wrote:
> > We do this by adding calls to blk_queue_split() to the various
> > make_request functions that need it - a few can already handle arbitrary
> > size bios. Note that we add the call _after_ any call to blk_queue_bounce();
> > this means that blk_queue_split() and blk_recalc_rq_segments() don't need to
> > be concerned with bouncing affecting segment merging.
> 
> > diff --git a/drivers/block/nvme-core.c b/drivers/block/nvme-core.c
> > index 51824d1f23..e4376b9613 100644
> > --- a/drivers/block/nvme-core.c
> > +++ b/drivers/block/nvme-core.c
> > @@ -737,6 +737,8 @@ static void nvme_make_request(struct request_queue *q, struct bio *bio)
> >  	struct nvme_queue *nvmeq = get_nvmeq(ns->dev);
> >  	int result = -EBUSY;
> >  
> > +	blk_queue_split(q, &bio, q->bio_split);
> > +
> >  	if (!nvmeq) {
> >  		put_nvmeq(NULL);
> >  		bio_endio(bio, -EIO);
> 
> I'd suggest that we do:
> 
> -	struct nvme_queue *nvmeq = get_nvmeq(ns->dev);
> +	struct nvme_queue *nvmeq;
>  	int result = -EBUSY;
> 
> +	blk_queue_split(q, &bio, q->bio_split);
> +
> +	nvmeq = get_nvmeq(ns->dev);
>  	if (!nvmeq) {
> 
> so that we're running the blk_queue_split() code outside the get_cpu()
> call.

Whoops, that's definitely a bug.

> Now, the NVMe driver has its own rules about when BIOs have to be split.
> Right now, that's way down inside the nvme_map_bio() call when we're
> walking the bio to compose the scatterlist.  Should we instead have an
> nvme_bio_split() routine that is called instead of blk_queue_split(),
> and we can simplify nvme_map_bio() since it'll know that it's working
> with bios that don't have to be split.
> 
> In fact, I think it would have little NVMe-specific in it at that point,
> so we could name __blk_bios_map_sg() better, export it to drivers and
> call it from nvme_map_bio(), which I think would make everybody happier.

Yes, definitely - and by doing it there we shouldn't even have to split
the bios, we can just process them incrementally. I can write a patch
for it later if you want to test it.


* Re: [PATCH 1/9] block: Make generic_make_request handle arbitrary sized bios
  2014-02-27 17:22   ` Matthew Wilcox
  2014-02-27 21:27     ` Kent Overstreet
@ 2014-02-28 23:30     ` Kent Overstreet
  2014-03-01 17:52       ` Keith Busch
  1 sibling, 1 reply; 16+ messages in thread
From: Kent Overstreet @ 2014-02-28 23:30 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: axboe, linux-kernel, linux-fsdevel, Neil Brown, Alasdair Kergon,
	dm-devel, Lars Ellenberg, drbd-user, Asai Thambi S P,
	Sam Bradshaw, linux-nvme, Jiri Kosina, Geoff Levand, Jim Paris,
	Joshua Morris, Philip Kelleher, Minchan Kim, Nitin Gupta,
	Martin Schwidefsky, Heiko Carstens, Peng Tao

On Thu, Feb 27, 2014 at 12:22:54PM -0500, Matthew Wilcox wrote:
> On Wed, Feb 26, 2014 at 03:39:49PM -0800, Kent Overstreet wrote:
> > We do this by adding calls to blk_queue_split() to the various
> > make_request functions that need it - a few can already handle arbitrary
> > size bios. Note that we add the call _after_ any call to blk_queue_bounce();
> > this means that blk_queue_split() and blk_recalc_rq_segments() don't need to
> > be concerned with bouncing affecting segment merging.
> 
> > diff --git a/drivers/block/nvme-core.c b/drivers/block/nvme-core.c
> > index 51824d1f23..e4376b9613 100644
> > --- a/drivers/block/nvme-core.c
> > +++ b/drivers/block/nvme-core.c
> > @@ -737,6 +737,8 @@ static void nvme_make_request(struct request_queue *q, struct bio *bio)
> >  	struct nvme_queue *nvmeq = get_nvmeq(ns->dev);
> >  	int result = -EBUSY;
> >  
> > +	blk_queue_split(q, &bio, q->bio_split);
> > +
> >  	if (!nvmeq) {
> >  		put_nvmeq(NULL);
> >  		bio_endio(bio, -EIO);
> 
> I'd suggest that we do:
> 
> -	struct nvme_queue *nvmeq = get_nvmeq(ns->dev);
> +	struct nvme_queue *nvmeq;
>  	int result = -EBUSY;
> 
> +	blk_queue_split(q, &bio, q->bio_split);
> +
> +	nvmeq = get_nvmeq(ns->dev);
>  	if (!nvmeq) {
> 
> so that we're running the blk_queue_split() code outside the get_cpu()
> call.
> 
> Now, the NVMe driver has its own rules about when BIOs have to be split.
> Right now, that's way down inside the nvme_map_bio() call when we're
> walking the bio to compose the scatterlist.  Should we instead have an
> nvme_bio_split() routine that is called instead of blk_queue_split(),
> and we can simplify nvme_map_bio() since it'll know that it's working
> with bios that don't have to be split.
> 
> In fact, I think it would have little NVMe-specific in it at that point,
> so we could name __blk_bios_map_sg() better, export it to drivers and
> call it from nvme_map_bio(), which I think would make everybody happier.

Actually, reading nvme_map_bio() (it's different since last I looked at
it) it looks like nvme should already be able to handle arbitrary size
bios?

I do intend to rework the blk_bio_map_sg() (or add a new one?) to
incrementally map as much of a bio as will fit in the provided
scatterlist, but it looks like nvme has some odd restrictions where it's
using BIOVEC_PHYS_MERGABLE()/BIOVEC_NOT_VIRT_MERGABLE() so I dunno if
it's worth bothering to try and have it use generic code.

However we don't need an explicit split here: if the sg fills up (i.e.
the places nvme_split_and_submit() is called), we can just mark the bio
as partially completed (set bio->bi_iter = iter, i.e. use the iterator
you passed to bio_for_each_segment), then increment bi_remaining (which
just counts completions, i.e. bio_endio() calls before the bio is really
completed) and resubmit the original bio. No need to allocate a split
bio, or loop over the bio again in bio_split().


* Re: [PATCH 1/9] block: Make generic_make_request handle arbitrary sized bios
  2014-02-28 23:30     ` Kent Overstreet
@ 2014-03-01 17:52       ` Keith Busch
  0 siblings, 0 replies; 16+ messages in thread
From: Keith Busch @ 2014-03-01 17:52 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Matthew Wilcox, axboe, Martin Schwidefsky, Minchan Kim,
	Neil Brown, Asai Thambi S P, Peng Tao, Heiko Carstens,
	linux-kernel, linux-nvme, Philip Kelleher, Geoff Levand,
	dm-devel, drbd-user, Jiri Kosina, linux-fsdevel, Jim Paris,
	Nitin Gupta, Sam Bradshaw, Joshua Morris, Alasdair Kergon,
	Lars Ellenberg

On Fri, 28 Feb 2014, Kent Overstreet wrote:
> On Thu, Feb 27, 2014 at 12:22:54PM -0500, Matthew Wilcox wrote:
>> On Wed, Feb 26, 2014 at 03:39:49PM -0800, Kent Overstreet wrote:
>>> We do this by adding calls to blk_queue_split() to the various
>>> make_request functions that need it - a few can already handle arbitrary
>>> size bios. Note that we add the call _after_ any call to blk_queue_bounce();
>>> this means that blk_queue_split() and blk_recalc_rq_segments() don't need to
>>> be concerned with bouncing affecting segment merging.
>>
>>> diff --git a/drivers/block/nvme-core.c b/drivers/block/nvme-core.c
>>> index 51824d1f23..e4376b9613 100644
>>> --- a/drivers/block/nvme-core.c
>>> +++ b/drivers/block/nvme-core.c
>>> @@ -737,6 +737,8 @@ static void nvme_make_request(struct request_queue *q, struct bio *bio)
>>>  	struct nvme_queue *nvmeq = get_nvmeq(ns->dev);
>>>  	int result = -EBUSY;
>>>
>>> +	blk_queue_split(q, &bio, q->bio_split);
>>> +
>>>  	if (!nvmeq) {
>>>  		put_nvmeq(NULL);
>>>  		bio_endio(bio, -EIO);
>>
>> I'd suggest that we do:
>>
>> -	struct nvme_queue *nvmeq = get_nvmeq(ns->dev);
>> +	struct nvme_queue *nvmeq;
>>  	int result = -EBUSY;
>>
>> +	blk_queue_split(q, &bio, q->bio_split);
>> +
>> +	nvmeq = get_nvmeq(ns->dev);
>>  	if (!nvmeq) {
>>
>> so that we're running the blk_queue_split() code outside the get_cpu()
>> call.
>>
>> Now, the NVMe driver has its own rules about when BIOs have to be split.
>> Right now, that's way down inside the nvme_map_bio() call when we're
>> walking the bio to compose the scatterlist.  Should we instead have an
>> nvme_bio_split() routine that is called instead of blk_queue_split(),
>> and we can simplify nvme_map_bio() since it'll know that it's working
>> with bios that don't have to be split.
>>
>> In fact, I think it would have little NVMe-specific in it at that point,
>> so we could name __blk_bios_map_sg() better, export it to drivers and
>> call it from nvme_map_bio(), which I think would make everybody happier.
>
> Actually, reading nvme_map_bio() (it's changed since I last looked at
> it), it looks like nvme should already be able to handle arbitrary
> size bios?
>
> I do intend to rework the blk_bio_map_sg() (or add a new one?) to
> incrementally map as much of a bio as will fit in the provided
> scatterlist, but it looks like nvme has some odd restrictions where it's
> using BIOVEC_PHYS_MERGABLE()/BIOVEC_NOT_VIRT_MERGABLE() so I dunno if
> it's worth bothering to try and have it use generic code.

Is nvme the only driver that has these kinds of restrictions on segment
address offsets? If so, I guess there's no reason to make it generic.

> However we don't need an explicit split here: if the sg fills up (i.e.
> the places nvme_split_and_submit() is called), we can just mark the bio
> as partially completed (set bio->bi_iter = iter, i.e. use the iterator
> you passed to bio_for_each_segment), then increment bi_remaining (which
> just counts completions, i.e. bio_endio() calls before the bio is really
> completed) and resubmit the original bio. No need to allocate a split
> bio, or loop over the bio again in bio_split().

We used to manipulate the original bio to track partial completions,
but I changed that for reasons that haven't quite yet materialized. If we
move the bio's bi_iter, it will make it difficult to retry the original
request on intermittent failures, and it will break the integrity verify
if the device format supports protection information. It's also more
performant to submit all parts at once rather than wait for the previous
part to complete before sending the next.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/9] block: Make generic_make_request handle arbitrary sized bios
  2014-02-26 23:39 ` [PATCH 1/9] block: Make generic_make_request handle arbitrary sized bios Kent Overstreet
  2014-02-27 17:22   ` Matthew Wilcox
@ 2014-03-02 20:31   ` Muthu Kumar
  2014-03-02 20:50     ` Muthu Kumar
  1 sibling, 1 reply; 16+ messages in thread
From: Muthu Kumar @ 2014-03-02 20:31 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Jens Axboe, linux-kernel, linux-fsdevel, Neil Brown,
	Alasdair Kergon, device-mapper development, Lars Ellenberg,
	drbd-user, Asai Thambi S P, Sam Bradshaw, Matthew Wilcox,
	linux-nvme, Jiri Kosina, Geoff Levand, Jim Paris, Joshua Morris,
	Philip Kelleher, Minchan Kim, Nitin Gupta, Martin Schwidefsky,
	Heiko Carstens, Peng Tao

Kent,
blk_queue_split() splits a bio into at most two bios, right? So if
the original bio spans a larger range than two bios can cover (a
restriction imposed by the lower driver in the stack), this might not
work? Am I reading it incorrectly?

Thanks!

Regards,
Muthu



On Wed, Feb 26, 2014 at 3:39 PM, Kent Overstreet <kmo@daterainc.com> wrote:
> The way the block layer is currently written, it goes to great lengths
> to avoid having to split bios; upper layer code (such as bio_add_page())
> checks what the underlying device can handle and tries to always create
> bios that don't need to be split.
>
> But this approach becomes unwieldy and eventually breaks down with
> stacked devices and devices with dynamic limits, and it adds a lot of
> complexity. If the block layer could split bios as needed, we could
> eliminate a lot of complexity elsewhere - particularly in stacked
> drivers. Code that creates bios can then create whatever size bios are
> convenient, and more importantly stacked drivers don't have to deal with
> both their own bio size limitations and the limitations of the
> (potentially multiple) devices underneath them.  In the future this will
> let us delete merge_bvec_fn and a bunch of other code.
>
> We do this by adding calls to blk_queue_split() to the various
> make_request functions that need it - a few can already handle arbitrary
> size bios. Note that we add the call _after_ any call to blk_queue_bounce();
> this means that blk_queue_split() and blk_recalc_rq_segments() don't need to be
> concerned with bouncing affecting segment merging.
>
> Some make_request_fns were simple enough to audit and verify they don't
> need blk_queue_split() calls. The skipped ones are:
>
>  * nfhd_make_request (arch/m68k/emu/nfblock.c)
>  * axon_ram_make_request (arch/powerpc/sysdev/axonram.c)
>  * simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c)
>  * brd_make_request (ramdisk - drivers/block/brd.c)
>  * loop_make_request
>  * null_queue_bio
>  * bcache's make_request fns
>
> Some others are almost certainly safe to remove now, but will be left for future
> patches.
>
> Signed-off-by: Kent Overstreet <kmo@daterainc.com>
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: Neil Brown <neilb@suse.de>
> Cc: Alasdair Kergon <agk@redhat.com>
> Cc: dm-devel@redhat.com
> Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
> Cc: drbd-user@lists.linbit.com
> Cc: Asai Thambi S P <asamymuthupa@micron.com>
> Cc: Sam Bradshaw <sbradshaw@micron.com>
> Cc: Matthew Wilcox <willy@linux.intel.com>
> Cc: linux-nvme@lists.infradead.org
> Cc: Jiri Kosina <jkosina@suse.cz>
> Cc: Geoff Levand <geoff@infradead.org>
> Cc: Jim Paris <jim@jtan.com>
> Cc: Joshua Morris <josh.h.morris@us.ibm.com>
> Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Nitin Gupta <ngupta@vflare.org>
> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> Cc: Peng Tao <bergwolf@gmail.com>
> ---
>  block/blk-core.c                            |  19 ++--
>  block/blk-merge.c                           | 150 ++++++++++++++++++++++++++--
>  block/blk-mq.c                              |   2 +
>  drivers/block/drbd/drbd_req.c               |   2 +
>  drivers/block/mtip32xx/mtip32xx.c           |   6 +-
>  drivers/block/nvme-core.c                   |   2 +
>  drivers/block/pktcdvd.c                     |   6 +-
>  drivers/block/ps3vram.c                     |   2 +
>  drivers/block/rsxx/dev.c                    |   2 +
>  drivers/block/umem.c                        |   2 +
>  drivers/block/zram/zram_drv.c               |   2 +
>  drivers/md/dm.c                             |   2 +
>  drivers/md/md.c                             |   2 +
>  drivers/s390/block/dcssblk.c                |   2 +
>  drivers/s390/block/xpram.c                  |   2 +
>  drivers/staging/lustre/lustre/llite/lloop.c |   2 +
>  include/linux/blkdev.h                      |   3 +
>  17 files changed, 185 insertions(+), 23 deletions(-)
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 853f927492..d3b0782ec3 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -581,6 +581,10 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
>         if (q->id < 0)
>                 goto fail_c;
>
> +       q->bio_split = bioset_create(4, 0);
> +       if (!q->bio_split)
> +               goto fail_id;
> +
>         q->backing_dev_info.ra_pages =
>                         (VM_MAX_READAHEAD * 1024) / PAGE_CACHE_SIZE;
>         q->backing_dev_info.state = 0;
> @@ -590,7 +594,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
>
>         err = bdi_init(&q->backing_dev_info);
>         if (err)
> -               goto fail_id;
> +               goto fail_split;
>
>         setup_timer(&q->backing_dev_info.laptop_mode_wb_timer,
>                     laptop_mode_timer_fn, (unsigned long) q);
> @@ -635,6 +639,8 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
>
>  fail_bdi:
>         bdi_destroy(&q->backing_dev_info);
> +fail_split:
> +       bioset_free(q->bio_split);
>  fail_id:
>         ida_simple_remove(&blk_queue_ida, q->id);
>  fail_c:
> @@ -1501,6 +1507,8 @@ void blk_queue_bio(struct request_queue *q, struct bio *bio)
>         struct request *req;
>         unsigned int request_count = 0;
>
> +       blk_queue_split(q, &bio, q->bio_split);
> +
>         /*
>          * low level driver can indicate that it wants pages above a
>          * certain limit bounced to low memory (ie for highmem, or even
> @@ -1723,15 +1731,6 @@ generic_make_request_checks(struct bio *bio)
>                 goto end_io;
>         }
>
> -       if (likely(bio_is_rw(bio) &&
> -                  nr_sectors > queue_max_hw_sectors(q))) {
> -               printk(KERN_ERR "bio too big device %s (%u > %u)\n",
> -                      bdevname(bio->bi_bdev, b),
> -                      bio_sectors(bio),
> -                      queue_max_hw_sectors(q));
> -               goto end_io;
> -       }
> -
>         part = bio->bi_bdev->bd_part;
>         if (should_fail_request(part, bio->bi_iter.bi_size) ||
>             should_fail_request(&part_to_disk(part)->part0,
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index 6c583f9c5b..0afbe3f1c2 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -9,11 +9,149 @@
>
>  #include "blk.h"
>
> +static struct bio *blk_bio_discard_split(struct request_queue *q,
> +                                        struct bio *bio,
> +                                        struct bio_set *bs)
> +{
> +       unsigned int max_discard_sectors, granularity;
> +       int alignment;
> +       sector_t tmp;
> +       unsigned split_sectors;
> +
> +       /* Zero-sector (unknown) and one-sector granularities are the same.  */
> +       granularity = max(q->limits.discard_granularity >> 9, 1U);
> +
> +       max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9);
> +       max_discard_sectors -= max_discard_sectors % granularity;
> +
> +       if (unlikely(!max_discard_sectors)) {
> +               /* XXX: warn */
> +               return NULL;
> +       }
> +
> +       if (bio_sectors(bio) <= max_discard_sectors)
> +               return NULL;
> +
> +       split_sectors = max_discard_sectors;
> +
> +       /*
> +        * If the next starting sector would be misaligned, stop the discard at
> +        * the previous aligned sector.
> +        */
> +       alignment = (q->limits.discard_alignment >> 9) % granularity;
> +
> +       tmp = bio->bi_iter.bi_sector + split_sectors - alignment;
> +       tmp = sector_div(tmp, granularity);
> +
> +       if (split_sectors > tmp)
> +               split_sectors -= tmp;
> +
> +       return bio_split(bio, split_sectors, GFP_NOIO, bs);
> +}
> +
> +static struct bio *blk_bio_write_same_split(struct request_queue *q,
> +                                           struct bio *bio,
> +                                           struct bio_set *bs)
> +{
> +       if (!q->limits.max_write_same_sectors)
> +               return NULL;
> +
> +       if (bio_sectors(bio) <= q->limits.max_write_same_sectors)
> +               return NULL;
> +
> +       return bio_split(bio, q->limits.max_write_same_sectors, GFP_NOIO, bs);
> +}
> +
> +static struct bio *blk_bio_segment_split(struct request_queue *q,
> +                                        struct bio *bio,
> +                                        struct bio_set *bs)
> +{
> +       struct bio *split;
> +       struct bio_vec bv, bvprv;
> +       struct bvec_iter iter;
> +       unsigned seg_size = 0, nsegs = 0;
> +       int prev = 0;
> +
> +       struct bvec_merge_data bvm = {
> +               .bi_bdev        = bio->bi_bdev,
> +               .bi_sector      = bio->bi_iter.bi_sector,
> +               .bi_size        = 0,
> +               .bi_rw          = bio->bi_rw,
> +       };
> +
> +       bio_for_each_segment(bv, bio, iter) {
> +               if (q->merge_bvec_fn &&
> +                   q->merge_bvec_fn(q, &bvm, &bv) < (int) bv.bv_len)
> +                       goto split;
> +
> +               bvm.bi_size += bv.bv_len;
> +
> +               if (bvm.bi_size >> 9 > queue_max_sectors(q))
> +                       goto split;
> +
> +               if (prev && blk_queue_cluster(q)) {
> +                       if (seg_size + bv.bv_len > queue_max_segment_size(q))
> +                               goto new_segment;
> +                       if (!BIOVEC_PHYS_MERGEABLE(&bvprv, &bv))
> +                               goto new_segment;
> +                       if (!BIOVEC_SEG_BOUNDARY(q, &bvprv, &bv))
> +                               goto new_segment;
> +
> +                       seg_size += bv.bv_len;
> +                       bvprv = bv;
> +                       prev = 1;
> +                       continue;
> +               }
> +new_segment:
> +               if (nsegs == queue_max_segments(q))
> +                       goto split;
> +
> +               nsegs++;
> +               bvprv = bv;
> +               prev = 1;
> +               seg_size = bv.bv_len;
> +       }
> +
> +       return NULL;
> +split:
> +       split = bio_clone_bioset(bio, GFP_NOIO, bs);
> +
> +       split->bi_iter.bi_size -= iter.bi_size;
> +       bio->bi_iter = iter;
> +
> +       if (bio_integrity(bio)) {
> +               bio_integrity_advance(bio, split->bi_iter.bi_size);
> +               bio_integrity_trim(split, 0, bio_sectors(split));
> +       }
> +
> +       return split;
> +}
> +
> +void blk_queue_split(struct request_queue *q, struct bio **bio,
> +                    struct bio_set *bs)
> +{
> +       struct bio *split;
> +
> +       if ((*bio)->bi_rw & REQ_DISCARD)
> +               split = blk_bio_discard_split(q, *bio, bs);
> +       else if ((*bio)->bi_rw & REQ_WRITE_SAME)
> +               split = blk_bio_write_same_split(q, *bio, bs);
> +       else
> +               split = blk_bio_segment_split(q, *bio, q->bio_split);
> +
> +       if (split) {
> +               bio_chain(split, *bio);
> +               generic_make_request(*bio);
> +               *bio = split;
> +       }
> +}
> +EXPORT_SYMBOL(blk_queue_split);
> +
>  static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
>                                              struct bio *bio)
>  {
>         struct bio_vec bv, bvprv = { NULL };
> -       int cluster, high, highprv = 1;
> +       int cluster, prev = 0;
>         unsigned int seg_size, nr_phys_segs;
>         struct bio *fbio, *bbio;
>         struct bvec_iter iter;
> @@ -37,13 +175,7 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
>         nr_phys_segs = 0;
>         for_each_bio(bio) {
>                 bio_for_each_segment(bv, bio, iter) {
> -                       /*
> -                        * the trick here is making sure that a high page is
> -                        * never considered part of another segment, since that
> -                        * might change with the bounce page.
> -                        */
> -                       high = page_to_pfn(bv.bv_page) > queue_bounce_pfn(q);
> -                       if (!high && !highprv && cluster) {
> +                       if (prev && cluster) {
>                                 if (seg_size + bv.bv_len
>                                     > queue_max_segment_size(q))
>                                         goto new_segment;
> @@ -63,8 +195,8 @@ new_segment:
>
>                         nr_phys_segs++;
>                         bvprv = bv;
> +                       prev = 1;
>                         seg_size = bv.bv_len;
> -                       highprv = high;
>                 }
>                 bbio = bio;
>         }
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 6468a715a0..7893e254d8 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -915,6 +915,8 @@ static void blk_mq_make_request(struct request_queue *q, struct bio *bio)
>                 return;
>         }
>
> +       blk_queue_split(q, &bio, q->bio_split);
> +
>         if (use_plug && blk_attempt_plug_merge(q, bio, &request_count))
>                 return;
>
> diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
> index 104a040f24..941a69c50c 100644
> --- a/drivers/block/drbd/drbd_req.c
> +++ b/drivers/block/drbd/drbd_req.c
> @@ -1275,6 +1275,8 @@ void drbd_make_request(struct request_queue *q, struct bio *bio)
>         struct drbd_conf *mdev = (struct drbd_conf *) q->queuedata;
>         unsigned long start_time;
>
> +       blk_queue_split(q, &bio, q->bio_split);
> +
>         start_time = jiffies;
>
>         /*
> diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c
> index 516026954b..df733ca685 100644
> --- a/drivers/block/mtip32xx/mtip32xx.c
> +++ b/drivers/block/mtip32xx/mtip32xx.c
> @@ -4033,6 +4033,10 @@ static void mtip_make_request(struct request_queue *queue, struct bio *bio)
>         int nents = 0;
>         int tag = 0, unaligned = 0;
>
> +       blk_queue_bounce(queue, &bio);
> +
> +       blk_queue_split(queue, &bio, queue->bio_split);
> +
>         if (unlikely(dd->dd_flag & MTIP_DDF_STOP_IO)) {
>                 if (unlikely(test_bit(MTIP_DDF_REMOVE_PENDING_BIT,
>                                                         &dd->dd_flag))) {
> @@ -4082,8 +4086,6 @@ static void mtip_make_request(struct request_queue *queue, struct bio *bio)
>
>         sg = mtip_hw_get_scatterlist(dd, &tag, unaligned);
>         if (likely(sg != NULL)) {
> -               blk_queue_bounce(queue, &bio);
> -
>                 if (unlikely((bio)->bi_vcnt > MTIP_MAX_SG)) {
>                         dev_warn(&dd->pdev->dev,
>                                 "Maximum number of SGL entries exceeded\n");
> diff --git a/drivers/block/nvme-core.c b/drivers/block/nvme-core.c
> index 51824d1f23..e4376b9613 100644
> --- a/drivers/block/nvme-core.c
> +++ b/drivers/block/nvme-core.c
> @@ -737,6 +737,8 @@ static void nvme_make_request(struct request_queue *q, struct bio *bio)
>         struct nvme_queue *nvmeq = get_nvmeq(ns->dev);
>         int result = -EBUSY;
>
> +       blk_queue_split(q, &bio, q->bio_split);
> +
>         if (!nvmeq) {
>                 put_nvmeq(NULL);
>                 bio_endio(bio, -EIO);
> diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
> index a2af73db18..a37acf722b 100644
> --- a/drivers/block/pktcdvd.c
> +++ b/drivers/block/pktcdvd.c
> @@ -2444,6 +2444,10 @@ static void pkt_make_request(struct request_queue *q, struct bio *bio)
>         char b[BDEVNAME_SIZE];
>         struct bio *split;
>
> +       blk_queue_bounce(q, &bio);
> +
> +       blk_queue_split(q, &bio, q->bio_split);
> +
>         pd = q->queuedata;
>         if (!pd) {
>                 pr_err("%s incorrect request queue\n",
> @@ -2474,8 +2478,6 @@ static void pkt_make_request(struct request_queue *q, struct bio *bio)
>                 goto end_io;
>         }
>
> -       blk_queue_bounce(q, &bio);
> -
>         do {
>                 sector_t zone = get_zone(bio->bi_iter.bi_sector, pd);
>                 sector_t last_zone = get_zone(bio_end_sector(bio) - 1, pd);
> diff --git a/drivers/block/ps3vram.c b/drivers/block/ps3vram.c
> index ef45cfb98f..a995972961 100644
> --- a/drivers/block/ps3vram.c
> +++ b/drivers/block/ps3vram.c
> @@ -603,6 +603,8 @@ static void ps3vram_make_request(struct request_queue *q, struct bio *bio)
>         struct ps3vram_priv *priv = ps3_system_bus_get_drvdata(dev);
>         int busy;
>
> +       blk_queue_split(q, &bio, q->bio_split);
> +
>         dev_dbg(&dev->core, "%s\n", __func__);
>
>         spin_lock_irq(&priv->lock);
> diff --git a/drivers/block/rsxx/dev.c b/drivers/block/rsxx/dev.c
> index 2839d37e5a..ff074a3cd4 100644
> --- a/drivers/block/rsxx/dev.c
> +++ b/drivers/block/rsxx/dev.c
> @@ -169,6 +169,8 @@ static void rsxx_make_request(struct request_queue *q, struct bio *bio)
>         struct rsxx_bio_meta *bio_meta;
>         int st = -EINVAL;
>
> +       blk_queue_split(q, &bio, q->bio_split);
> +
>         might_sleep();
>
>         if (!card)
> diff --git a/drivers/block/umem.c b/drivers/block/umem.c
> index 4cf81b5bf0..13d577cfbc 100644
> --- a/drivers/block/umem.c
> +++ b/drivers/block/umem.c
> @@ -531,6 +531,8 @@ static void mm_make_request(struct request_queue *q, struct bio *bio)
>                  (unsigned long long)bio->bi_iter.bi_sector,
>                  bio->bi_iter.bi_size);
>
> +       blk_queue_split(q, &bio, q->bio_split);
> +
>         spin_lock_irq(&card->lock);
>         *card->biotail = bio;
>         bio->bi_next = NULL;
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index 011e55d820..ecf9daa01c 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -733,6 +733,8 @@ static void zram_make_request(struct request_queue *queue, struct bio *bio)
>  {
>         struct zram *zram = queue->queuedata;
>
> +       blk_queue_split(queue, &bio, queue->bio_split);
> +
>         down_read(&zram->init_lock);
>         if (unlikely(!zram->init_done))
>                 goto error;
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 8c53b09b9a..97f70420f2 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -1500,6 +1500,8 @@ static void dm_request(struct request_queue *q, struct bio *bio)
>  {
>         struct mapped_device *md = q->queuedata;
>
> +       blk_queue_split(q, &bio, q->bio_split);
> +
>         if (dm_request_based(md))
>                 blk_queue_bio(q, bio);
>         else
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 4ad5cc4e63..1421bc3f7b 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -256,6 +256,8 @@ static void md_make_request(struct request_queue *q, struct bio *bio)
>         int cpu;
>         unsigned int sectors;
>
> +       blk_queue_split(q, &bio, q->bio_split);
> +
>         if (mddev == NULL || mddev->pers == NULL
>             || !mddev->ready) {
>                 bio_io_error(bio);
> diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
> index ebf41e228e..db33cd3e4c 100644
> --- a/drivers/s390/block/dcssblk.c
> +++ b/drivers/s390/block/dcssblk.c
> @@ -815,6 +815,8 @@ dcssblk_make_request(struct request_queue *q, struct bio *bio)
>         unsigned long source_addr;
>         unsigned long bytes_done;
>
> +       blk_queue_split(q, &bio, q->bio_split);
> +
>         bytes_done = 0;
>         dev_info = bio->bi_bdev->bd_disk->private_data;
>         if (dev_info == NULL)
> diff --git a/drivers/s390/block/xpram.c b/drivers/s390/block/xpram.c
> index 6969d39f1e..f03c103f13 100644
> --- a/drivers/s390/block/xpram.c
> +++ b/drivers/s390/block/xpram.c
> @@ -190,6 +190,8 @@ static void xpram_make_request(struct request_queue *q, struct bio *bio)
>         unsigned long page_addr;
>         unsigned long bytes;
>
> +       blk_queue_split(q, &bio, q->bio_split);
> +
>         if ((bio->bi_iter.bi_sector & 7) != 0 ||
>             (bio->bi_iter.bi_size & 4095) != 0)
>                 /* Request is not page-aligned. */
> diff --git a/drivers/staging/lustre/lustre/llite/lloop.c b/drivers/staging/lustre/lustre/llite/lloop.c
> index 0718905ade..a3f6dc930b 100644
> --- a/drivers/staging/lustre/lustre/llite/lloop.c
> +++ b/drivers/staging/lustre/lustre/llite/lloop.c
> @@ -344,6 +344,8 @@ static void loop_make_request(struct request_queue *q, struct bio *old_bio)
>         int rw = bio_rw(old_bio);
>         int inactive;
>
> +       blk_queue_split(q, &old_bio, q->bio_split);
> +
>         if (!lo)
>                 goto err;
>
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 1e1fa3f93d..99e9955c4d 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -470,6 +470,7 @@ struct request_queue {
>         wait_queue_head_t       mq_freeze_wq;
>         struct percpu_counter   mq_usage_counter;
>         struct list_head        all_q_node;
> +       struct bio_set          *bio_split;
>  };
>
>  #define QUEUE_FLAG_QUEUED      1       /* uses generic tag queueing */
> @@ -781,6 +782,8 @@ extern void blk_rq_unprep_clone(struct request *rq);
>  extern int blk_insert_cloned_request(struct request_queue *q,
>                                      struct request *rq);
>  extern void blk_delay_queue(struct request_queue *, unsigned long);
> +extern void blk_queue_split(struct request_queue *, struct bio **,
> +                           struct bio_set *);
>  extern void blk_recount_segments(struct request_queue *, struct bio *);
>  extern int scsi_verify_blk_ioctl(struct block_device *, unsigned int);
>  extern int scsi_cmd_blk_ioctl(struct block_device *, fmode_t,
> --
> 1.9.0
>


* Re: [PATCH 1/9] block: Make generic_make_request handle arbitrary sized bios
  2014-03-02 20:31   ` Muthu Kumar
@ 2014-03-02 20:50     ` Muthu Kumar
  0 siblings, 0 replies; 16+ messages in thread
From: Muthu Kumar @ 2014-03-02 20:50 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Jens Axboe, linux-kernel, linux-fsdevel, Neil Brown,
	Alasdair Kergon, device-mapper development, Lars Ellenberg,
	drbd-user, Asai Thambi S P, Sam Bradshaw, Matthew Wilcox,
	linux-nvme, Jiri Kosina, Geoff Levand, Jim Paris, Joshua Morris,
	Philip Kelleher, Minchan Kim, Nitin Gupta, Martin Schwidefsky,
	Heiko Carstens, Peng Tao

Never mind...

The following code covers it:


+       if (split) {
+               bio_chain(split, *bio);
+               generic_make_request(*bio);
+               *bio = split;
+       }


My other question is: can we avoid calling blk_queue_split() from the
individual drivers' make_request() functions, and move the
functionality into generic_make_request() instead?

Thanks.

Regards,
Muthu


On Sun, Mar 2, 2014 at 12:31 PM, Muthu Kumar <muthu.lkml@gmail.com> wrote:
> Kent,
> blk_queue_split() splits a bio into at most two bios, right? So if
> the original bio spans a larger range than two bios can cover (a
> restriction imposed by the lower driver in the stack), this might not
> work? Am I reading it incorrectly?
>
> Thanks!
>
> Regards,
> Muthu
>
>
>
> On Wed, Feb 26, 2014 at 3:39 PM, Kent Overstreet <kmo@daterainc.com> wrote:
>> [full patch quoted upthread at 2014-02-26 23:39; trimmed here]
>> +
>> +       return bio_split(bio, split_sectors, GFP_NOIO, bs);
>> +}
>> +
>> +static struct bio *blk_bio_write_same_split(struct request_queue *q,
>> +                                           struct bio *bio,
>> +                                           struct bio_set *bs)
>> +{
>> +       if (!q->limits.max_write_same_sectors)
>> +               return NULL;
>> +
>> +       if (bio_sectors(bio) <= q->limits.max_write_same_sectors)
>> +               return NULL;
>> +
>> +       return bio_split(bio, q->limits.max_write_same_sectors, GFP_NOIO, bs);
>> +}
>> +
>> +static struct bio *blk_bio_segment_split(struct request_queue *q,
>> +                                        struct bio *bio,
>> +                                        struct bio_set *bs)
>> +{
>> +       struct bio *split;
>> +       struct bio_vec bv, bvprv;
>> +       struct bvec_iter iter;
>> +       unsigned seg_size = 0, nsegs = 0;
>> +       int prev = 0;
>> +
>> +       struct bvec_merge_data bvm = {
>> +               .bi_bdev        = bio->bi_bdev,
>> +               .bi_sector      = bio->bi_iter.bi_sector,
>> +               .bi_size        = 0,
>> +               .bi_rw          = bio->bi_rw,
>> +       };
>> +
>> +       bio_for_each_segment(bv, bio, iter) {
>> +               if (q->merge_bvec_fn &&
>> +                   q->merge_bvec_fn(q, &bvm, &bv) < (int) bv.bv_len)
>> +                       goto split;
>> +
>> +               bvm.bi_size += bv.bv_len;
>> +
>> +               if (bvm.bi_size >> 9 > queue_max_sectors(q))
>> +                       goto split;
>> +
>> +               if (prev && blk_queue_cluster(q)) {
>> +                       if (seg_size + bv.bv_len > queue_max_segment_size(q))
>> +                               goto new_segment;
>> +                       if (!BIOVEC_PHYS_MERGEABLE(&bvprv, &bv))
>> +                               goto new_segment;
>> +                       if (!BIOVEC_SEG_BOUNDARY(q, &bvprv, &bv))
>> +                               goto new_segment;
>> +
>> +                       seg_size += bv.bv_len;
>> +                       bvprv = bv;
>> +                       prev = 1;
>> +                       continue;
>> +               }
>> +new_segment:
>> +               if (nsegs == queue_max_segments(q))
>> +                       goto split;
>> +
>> +               nsegs++;
>> +               bvprv = bv;
>> +               prev = 1;
>> +               seg_size = bv.bv_len;
>> +       }
>> +
>> +       return NULL;
>> +split:
>> +       split = bio_clone_bioset(bio, GFP_NOIO, bs);
>> +
>> +       split->bi_iter.bi_size -= iter.bi_size;
>> +       bio->bi_iter = iter;
>> +
>> +       if (bio_integrity(bio)) {
>> +               bio_integrity_advance(bio, split->bi_iter.bi_size);
>> +               bio_integrity_trim(split, 0, bio_sectors(split));
>> +       }
>> +
>> +       return split;
>> +}
>> +
>> +void blk_queue_split(struct request_queue *q, struct bio **bio,
>> +                    struct bio_set *bs)
>> +{
>> +       struct bio *split;
>> +
>> +       if ((*bio)->bi_rw & REQ_DISCARD)
>> +               split = blk_bio_discard_split(q, *bio, bs);
>> +       else if ((*bio)->bi_rw & REQ_WRITE_SAME)
>> +               split = blk_bio_write_same_split(q, *bio, bs);
>> +       else
>> +               split = blk_bio_segment_split(q, *bio, bs);
>> +
>> +       if (split) {
>> +               bio_chain(split, *bio);
>> +               generic_make_request(*bio);
>> +               *bio = split;
>> +       }
>> +}
>> +EXPORT_SYMBOL(blk_queue_split);
>> +
>>  static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
>>                                              struct bio *bio)
>>  {
>>         struct bio_vec bv, bvprv = { NULL };
>> -       int cluster, high, highprv = 1;
>> +       int cluster, prev = 0;
>>         unsigned int seg_size, nr_phys_segs;
>>         struct bio *fbio, *bbio;
>>         struct bvec_iter iter;
>> @@ -37,13 +175,7 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
>>         nr_phys_segs = 0;
>>         for_each_bio(bio) {
>>                 bio_for_each_segment(bv, bio, iter) {
>> -                       /*
>> -                        * the trick here is making sure that a high page is
>> -                        * never considered part of another segment, since that
>> -                        * might change with the bounce page.
>> -                        */
>> -                       high = page_to_pfn(bv.bv_page) > queue_bounce_pfn(q);
>> -                       if (!high && !highprv && cluster) {
>> +                       if (prev && cluster) {
>>                                 if (seg_size + bv.bv_len
>>                                     > queue_max_segment_size(q))
>>                                         goto new_segment;
>> @@ -63,8 +195,8 @@ new_segment:
>>
>>                         nr_phys_segs++;
>>                         bvprv = bv;
>> +                       prev = 1;
>>                         seg_size = bv.bv_len;
>> -                       highprv = high;
>>                 }
>>                 bbio = bio;
>>         }
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index 6468a715a0..7893e254d8 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -915,6 +915,8 @@ static void blk_mq_make_request(struct request_queue *q, struct bio *bio)
>>                 return;
>>         }
>>
>> +       blk_queue_split(q, &bio, q->bio_split);
>> +
>>         if (use_plug && blk_attempt_plug_merge(q, bio, &request_count))
>>                 return;
>>
>> diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
>> index 104a040f24..941a69c50c 100644
>> --- a/drivers/block/drbd/drbd_req.c
>> +++ b/drivers/block/drbd/drbd_req.c
>> @@ -1275,6 +1275,8 @@ void drbd_make_request(struct request_queue *q, struct bio *bio)
>>         struct drbd_conf *mdev = (struct drbd_conf *) q->queuedata;
>>         unsigned long start_time;
>>
>> +       blk_queue_split(q, &bio, q->bio_split);
>> +
>>         start_time = jiffies;
>>
>>         /*
>> diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c
>> index 516026954b..df733ca685 100644
>> --- a/drivers/block/mtip32xx/mtip32xx.c
>> +++ b/drivers/block/mtip32xx/mtip32xx.c
>> @@ -4033,6 +4033,10 @@ static void mtip_make_request(struct request_queue *queue, struct bio *bio)
>>         int nents = 0;
>>         int tag = 0, unaligned = 0;
>>
>> +       blk_queue_bounce(queue, &bio);
>> +
>> +       blk_queue_split(queue, &bio, queue->bio_split);
>> +
>>         if (unlikely(dd->dd_flag & MTIP_DDF_STOP_IO)) {
>>                 if (unlikely(test_bit(MTIP_DDF_REMOVE_PENDING_BIT,
>>                                                         &dd->dd_flag))) {
>> @@ -4082,8 +4086,6 @@ static void mtip_make_request(struct request_queue *queue, struct bio *bio)
>>
>>         sg = mtip_hw_get_scatterlist(dd, &tag, unaligned);
>>         if (likely(sg != NULL)) {
>> -               blk_queue_bounce(queue, &bio);
>> -
>>                 if (unlikely((bio)->bi_vcnt > MTIP_MAX_SG)) {
>>                         dev_warn(&dd->pdev->dev,
>>                                 "Maximum number of SGL entries exceeded\n");
>> diff --git a/drivers/block/nvme-core.c b/drivers/block/nvme-core.c
>> index 51824d1f23..e4376b9613 100644
>> --- a/drivers/block/nvme-core.c
>> +++ b/drivers/block/nvme-core.c
>> @@ -737,6 +737,8 @@ static void nvme_make_request(struct request_queue *q, struct bio *bio)
>>         struct nvme_queue *nvmeq = get_nvmeq(ns->dev);
>>         int result = -EBUSY;
>>
>> +       blk_queue_split(q, &bio, q->bio_split);
>> +
>>         if (!nvmeq) {
>>                 put_nvmeq(NULL);
>>                 bio_endio(bio, -EIO);
>> diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
>> index a2af73db18..a37acf722b 100644
>> --- a/drivers/block/pktcdvd.c
>> +++ b/drivers/block/pktcdvd.c
>> @@ -2444,6 +2444,10 @@ static void pkt_make_request(struct request_queue *q, struct bio *bio)
>>         char b[BDEVNAME_SIZE];
>>         struct bio *split;
>>
>> +       blk_queue_bounce(q, &bio);
>> +
>> +       blk_queue_split(q, &bio, q->bio_split);
>> +
>>         pd = q->queuedata;
>>         if (!pd) {
>>                 pr_err("%s incorrect request queue\n",
>> @@ -2474,8 +2478,6 @@ static void pkt_make_request(struct request_queue *q, struct bio *bio)
>>                 goto end_io;
>>         }
>>
>> -       blk_queue_bounce(q, &bio);
>> -
>>         do {
>>                 sector_t zone = get_zone(bio->bi_iter.bi_sector, pd);
>>                 sector_t last_zone = get_zone(bio_end_sector(bio) - 1, pd);
>> diff --git a/drivers/block/ps3vram.c b/drivers/block/ps3vram.c
>> index ef45cfb98f..a995972961 100644
>> --- a/drivers/block/ps3vram.c
>> +++ b/drivers/block/ps3vram.c
>> @@ -603,6 +603,8 @@ static void ps3vram_make_request(struct request_queue *q, struct bio *bio)
>>         struct ps3vram_priv *priv = ps3_system_bus_get_drvdata(dev);
>>         int busy;
>>
>> +       blk_queue_split(q, &bio, q->bio_split);
>> +
>>         dev_dbg(&dev->core, "%s\n", __func__);
>>
>>         spin_lock_irq(&priv->lock);
>> diff --git a/drivers/block/rsxx/dev.c b/drivers/block/rsxx/dev.c
>> index 2839d37e5a..ff074a3cd4 100644
>> --- a/drivers/block/rsxx/dev.c
>> +++ b/drivers/block/rsxx/dev.c
>> @@ -169,6 +169,8 @@ static void rsxx_make_request(struct request_queue *q, struct bio *bio)
>>         struct rsxx_bio_meta *bio_meta;
>>         int st = -EINVAL;
>>
>> +       blk_queue_split(q, &bio, q->bio_split);
>> +
>>         might_sleep();
>>
>>         if (!card)
>> diff --git a/drivers/block/umem.c b/drivers/block/umem.c
>> index 4cf81b5bf0..13d577cfbc 100644
>> --- a/drivers/block/umem.c
>> +++ b/drivers/block/umem.c
>> @@ -531,6 +531,8 @@ static void mm_make_request(struct request_queue *q, struct bio *bio)
>>                  (unsigned long long)bio->bi_iter.bi_sector,
>>                  bio->bi_iter.bi_size);
>>
>> +       blk_queue_split(q, &bio, q->bio_split);
>> +
>>         spin_lock_irq(&card->lock);
>>         *card->biotail = bio;
>>         bio->bi_next = NULL;
>> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
>> index 011e55d820..ecf9daa01c 100644
>> --- a/drivers/block/zram/zram_drv.c
>> +++ b/drivers/block/zram/zram_drv.c
>> @@ -733,6 +733,8 @@ static void zram_make_request(struct request_queue *queue, struct bio *bio)
>>  {
>>         struct zram *zram = queue->queuedata;
>>
>> +       blk_queue_split(queue, &bio, queue->bio_split);
>> +
>>         down_read(&zram->init_lock);
>>         if (unlikely(!zram->init_done))
>>                 goto error;
>> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
>> index 8c53b09b9a..97f70420f2 100644
>> --- a/drivers/md/dm.c
>> +++ b/drivers/md/dm.c
>> @@ -1500,6 +1500,8 @@ static void dm_request(struct request_queue *q, struct bio *bio)
>>  {
>>         struct mapped_device *md = q->queuedata;
>>
>> +       blk_queue_split(q, &bio, q->bio_split);
>> +
>>         if (dm_request_based(md))
>>                 blk_queue_bio(q, bio);
>>         else
>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>> index 4ad5cc4e63..1421bc3f7b 100644
>> --- a/drivers/md/md.c
>> +++ b/drivers/md/md.c
>> @@ -256,6 +256,8 @@ static void md_make_request(struct request_queue *q, struct bio *bio)
>>         int cpu;
>>         unsigned int sectors;
>>
>> +       blk_queue_split(q, &bio, q->bio_split);
>> +
>>         if (mddev == NULL || mddev->pers == NULL
>>             || !mddev->ready) {
>>                 bio_io_error(bio);
>> diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
>> index ebf41e228e..db33cd3e4c 100644
>> --- a/drivers/s390/block/dcssblk.c
>> +++ b/drivers/s390/block/dcssblk.c
>> @@ -815,6 +815,8 @@ dcssblk_make_request(struct request_queue *q, struct bio *bio)
>>         unsigned long source_addr;
>>         unsigned long bytes_done;
>>
>> +       blk_queue_split(q, &bio, q->bio_split);
>> +
>>         bytes_done = 0;
>>         dev_info = bio->bi_bdev->bd_disk->private_data;
>>         if (dev_info == NULL)
>> diff --git a/drivers/s390/block/xpram.c b/drivers/s390/block/xpram.c
>> index 6969d39f1e..f03c103f13 100644
>> --- a/drivers/s390/block/xpram.c
>> +++ b/drivers/s390/block/xpram.c
>> @@ -190,6 +190,8 @@ static void xpram_make_request(struct request_queue *q, struct bio *bio)
>>         unsigned long page_addr;
>>         unsigned long bytes;
>>
>> +       blk_queue_split(q, &bio, q->bio_split);
>> +
>>         if ((bio->bi_iter.bi_sector & 7) != 0 ||
>>             (bio->bi_iter.bi_size & 4095) != 0)
>>                 /* Request is not page-aligned. */
>> diff --git a/drivers/staging/lustre/lustre/llite/lloop.c b/drivers/staging/lustre/lustre/llite/lloop.c
>> index 0718905ade..a3f6dc930b 100644
>> --- a/drivers/staging/lustre/lustre/llite/lloop.c
>> +++ b/drivers/staging/lustre/lustre/llite/lloop.c
>> @@ -344,6 +344,8 @@ static void loop_make_request(struct request_queue *q, struct bio *old_bio)
>>         int rw = bio_rw(old_bio);
>>         int inactive;
>>
>> +       blk_queue_split(q, &old_bio, q->bio_split);
>> +
>>         if (!lo)
>>                 goto err;
>>
>> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
>> index 1e1fa3f93d..99e9955c4d 100644
>> --- a/include/linux/blkdev.h
>> +++ b/include/linux/blkdev.h
>> @@ -470,6 +470,7 @@ struct request_queue {
>>         wait_queue_head_t       mq_freeze_wq;
>>         struct percpu_counter   mq_usage_counter;
>>         struct list_head        all_q_node;
>> +       struct bio_set          *bio_split;
>>  };
>>
>>  #define QUEUE_FLAG_QUEUED      1       /* uses generic tag queueing */
>> @@ -781,6 +782,8 @@ extern void blk_rq_unprep_clone(struct request *rq);
>>  extern int blk_insert_cloned_request(struct request_queue *q,
>>                                      struct request *rq);
>>  extern void blk_delay_queue(struct request_queue *, unsigned long);
>> +extern void blk_queue_split(struct request_queue *, struct bio **,
>> +                           struct bio_set *);
>>  extern void blk_recount_segments(struct request_queue *, struct bio *);
>>  extern int scsi_verify_blk_ioctl(struct block_device *, unsigned int);
>>  extern int scsi_cmd_blk_ioctl(struct block_device *, fmode_t,
>> --
>> 1.9.0
>>


end of thread, other threads:[~2014-03-02 20:50 UTC | newest]

Thread overview: 16+ messages
2014-02-26 23:39 Make generic_make_request() handle arbitrary size bios Kent Overstreet
2014-02-26 23:39 ` [PATCH 1/9] block: Make generic_make_request handle arbitrary sized bios Kent Overstreet
2014-02-27 17:22   ` Matthew Wilcox
2014-02-27 21:27     ` Kent Overstreet
2014-02-28 23:30     ` Kent Overstreet
2014-03-01 17:52       ` Keith Busch
2014-03-02 20:31   ` Muthu Kumar
2014-03-02 20:50     ` Muthu Kumar
2014-02-26 23:39 ` [PATCH 2/9] block: Gut bio_add_page() Kent Overstreet
2014-02-26 23:39 ` [PATCH 3/9] blk-lib.c: generic_make_request() handles large bios now Kent Overstreet
2014-02-26 23:39 ` [PATCH 4/9] bcache: " Kent Overstreet
2014-02-26 23:39 ` [PATCH 5/9] btrfs: generic_make_request() handles arbitrary size " Kent Overstreet
2014-02-26 23:39 ` [PATCH 6/9] btrfs: Convert to bio_for_each_segment() Kent Overstreet
2014-02-26 23:39 ` [PATCH 7/9] iov_iter: Move iov_iter to uio.h Kent Overstreet
2014-02-26 23:39 ` [PATCH 8/9] iov_iter: Kill iov_iter_single_seg_count() Kent Overstreet
2014-02-26 23:39 ` [PATCH 9/9] iov_iter: Kill written arg to iov_iter_init() Kent Overstreet
