linux-kernel.vger.kernel.org archive mirror
* [PATCH] Block layer stuff/DIO rewrite prep for 3.14
@ 2013-11-04 23:36 Kent Overstreet
  2013-11-04 23:36 ` [PATCH 1/9] block: Convert various code to bio_for_each_segment() Kent Overstreet
                   ` (8 more replies)
  0 siblings, 9 replies; 16+ messages in thread
From: Kent Overstreet @ 2013-11-04 23:36 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, linux-btrfs; +Cc: axboe, hch

Now that immutable biovecs is in, these are the remaining patches required for
my DIO rewrite, along with some related cleanup/refactoring.

The key enabler is patch 4 - making generic_make_request() handle arbitrarily
sized bios. This takes what was once bio_add_page()'s responsibility and pushes
it down the stack; the long term plan is to push it all the way down to the
driver level, which should simplify a lot of code (a lot of it very fragile
code!) and improve performance at the same time.
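
To make that concrete, here's roughly what a submitter can look like once the
splitting lives in generic_make_request() - a hypothetical helper, not part of
this series, that just sizes the bio to its pages and submits it without
consulting any queue limits (error handling kept minimal):

static int write_pages_sync(struct block_device *bdev, sector_t sector,
			    struct page **pages, unsigned nr_pages)
{
	struct bio *bio;
	unsigned i;
	int ret;

	/* assumes nr_pages <= BIO_MAX_PAGES so the bvec array is big enough */
	bio = bio_alloc(GFP_NOIO, nr_pages);
	if (!bio)
		return -ENOMEM;

	bio->bi_bdev		= bdev;
	bio->bi_iter.bi_sector	= sector;

	for (i = 0; i < nr_pages; i++)
		bio_add_page(bio, pages[i], PAGE_SIZE, 0);

	/* no per-device size checks - generic_make_request() splits as needed */
	ret = submit_bio_wait(WRITE, bio);
	bio_put(bio);
	return ret;
}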

The DIO rewrite needs some more work before it'll be ready, but I wanted to get
this patch series out now because these patches should all be ready and they're
useful on their own - in particular, I think they'll make it a lot easier for
the btrfs people to do what they need with their DIO code.

Other stuff enabled by this:

 * With this and the bio_split() rewrite already in, we can just delete
   merge_bvec_fn. I have patches for this, but they'll need more testing.

 * Multipage bvecs - bvecs pointing to an arbitrary amount of contiguous
   physical memory. I have this working but it'll need more testing and code
   auditing - this is what lets us kill bi_seg_front_size and bi_seg_back_size
   though.

Patch series is based on Jens' for-next tree, and it's available in my git
repository - git://evilpiepirate.org/~kent/linux-bcache.git for-jens



* [PATCH 1/9] block: Convert various code to bio_for_each_segment()
  2013-11-04 23:36 [PATCH] Block layer stuff/DIO rewrite prep for 3.14 Kent Overstreet
@ 2013-11-04 23:36 ` Kent Overstreet
  2013-11-05 13:53   ` Josef Bacik
  2013-11-07 11:26   ` Jan Kara
  2013-11-04 23:36 ` [PATCH 2/9] block: submit_bio_wait() conversions Kent Overstreet
                   ` (7 subsequent siblings)
  8 siblings, 2 replies; 16+ messages in thread
From: Kent Overstreet @ 2013-11-04 23:36 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, linux-btrfs
  Cc: axboe, hch, Kent Overstreet, Alexander Viro, Chris Mason,
	Jaegeuk Kim, Joern Engel, Prasad Joshi, Trond Myklebust

With immutable biovecs we don't want code accessing bi_io_vec directly - the
uses this patch changes weren't incorrect, since they all own the bio, but
direct access makes the code harder to audit for no good reason. Converting
them will also help with multipage bvecs later.
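
Schematically, the conversion looks like this (do_something() is just a
placeholder, not taken from any of the files below). bio_for_each_segment_all()
walks the bio's own bvec array, which is why it's only safe in code that owns
the bio - the same condition that made the open-coded loops correct:

	/* before: open-coded walk of bi_io_vec */
	struct bio_vec *bvec = bio->bi_io_vec;
	int i = 0;

	while (i < bio->bi_vcnt) {
		do_something(bvec->bv_page);
		bvec++;
		i++;
	}

	/* after: */
	struct bio_vec *bvec;
	int i;

	bio_for_each_segment_all(bvec, bio, i)
		do_something(bvec->bv_page);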

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Cc: Joern Engel <joern@logfs.org>
Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
---
 fs/btrfs/compression.c           | 10 ++++------
 fs/btrfs/disk-io.c               | 11 ++++-------
 fs/btrfs/extent_io.c             | 37 ++++++++++++++-----------------------
 fs/btrfs/inode.c                 | 15 ++++++---------
 fs/f2fs/data.c                   | 13 +++++--------
 fs/f2fs/segment.c                | 12 +++++-------
 fs/logfs/dev_bdev.c              | 18 +++++++-----------
 fs/mpage.c                       | 17 ++++++++---------
 fs/nfs/blocklayout/blocklayout.c | 34 +++++++++++++---------------------
 9 files changed, 66 insertions(+), 101 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 06ab821..52e7848 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -203,18 +203,16 @@ csum_failed:
 	if (cb->errors) {
 		bio_io_error(cb->orig_bio);
 	} else {
-		int bio_index = 0;
-		struct bio_vec *bvec = cb->orig_bio->bi_io_vec;
+		int i;
+		struct bio_vec *bvec;
 
 		/*
 		 * we have verified the checksum already, set page
 		 * checked so the end_io handlers know about it
 		 */
-		while (bio_index < cb->orig_bio->bi_vcnt) {
+		bio_for_each_segment_all(bvec, cb->orig_bio, i)
 			SetPageChecked(bvec->bv_page);
-			bvec++;
-			bio_index++;
-		}
+
 		bio_endio(cb->orig_bio, 0);
 	}
 
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 62176ad..733182e 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -850,20 +850,17 @@ int btrfs_wq_submit_bio(struct btrfs_fs_info *fs_info, struct inode *inode,
 
 static int btree_csum_one_bio(struct bio *bio)
 {
-	struct bio_vec *bvec = bio->bi_io_vec;
-	int bio_index = 0;
+	struct bio_vec *bvec;
 	struct btrfs_root *root;
-	int ret = 0;
+	int i, ret = 0;
 
-	WARN_ON(bio->bi_vcnt <= 0);
-	while (bio_index < bio->bi_vcnt) {
+	bio_for_each_segment_all(bvec, bio, i) {
 		root = BTRFS_I(bvec->bv_page->mapping->host)->root;
 		ret = csum_dirty_buffer(root, bvec->bv_page);
 		if (ret)
 			break;
-		bio_index++;
-		bvec++;
 	}
+
 	return ret;
 }
 
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 0df176a..ea5a08b 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2014,7 +2014,7 @@ int repair_io_failure(struct btrfs_fs_info *fs_info, u64 start,
 	}
 	bio->bi_bdev = dev->bdev;
 	bio_add_page(bio, page, length, start - page_offset(page));
-	btrfsic_submit_bio(WRITE_SYNC, bio);
+	btrfsic_submit_bio(WRITE_SYNC, bio); /* XXX: submit_bio_wait() */
 	wait_for_completion(&compl);
 
 	if (!test_bit(BIO_UPTODATE, &bio->bi_flags)) {
@@ -2340,12 +2340,13 @@ int end_extent_writepage(struct page *page, int err, u64 start, u64 end)
  */
 static void end_bio_extent_writepage(struct bio *bio, int err)
 {
-	struct bio_vec *bvec = bio->bi_io_vec + bio->bi_vcnt - 1;
+	struct bio_vec *bvec;
 	struct extent_io_tree *tree;
 	u64 start;
 	u64 end;
+	int i;
 
-	do {
+	bio_for_each_segment_all(bvec, bio, i) {
 		struct page *page = bvec->bv_page;
 		tree = &BTRFS_I(page->mapping->host)->io_tree;
 
@@ -2363,14 +2364,11 @@ static void end_bio_extent_writepage(struct bio *bio, int err)
 		start = page_offset(page);
 		end = start + bvec->bv_offset + bvec->bv_len - 1;
 
-		if (--bvec >= bio->bi_io_vec)
-			prefetchw(&bvec->bv_page->flags);
-
 		if (end_extent_writepage(page, err, start, end))
 			continue;
 
 		end_page_writeback(page);
-	} while (bvec >= bio->bi_io_vec);
+	}
 
 	bio_put(bio);
 }
@@ -2400,9 +2398,8 @@ endio_readpage_release_extent(struct extent_io_tree *tree, u64 start, u64 len,
  */
 static void end_bio_extent_readpage(struct bio *bio, int err)
 {
+	struct bio_vec *bvec;
 	int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
-	struct bio_vec *bvec_end = bio->bi_io_vec + bio->bi_vcnt - 1;
-	struct bio_vec *bvec = bio->bi_io_vec;
 	struct btrfs_io_bio *io_bio = btrfs_io_bio(bio);
 	struct extent_io_tree *tree;
 	u64 offset = 0;
@@ -2413,11 +2410,12 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
 	u64 extent_len = 0;
 	int mirror;
 	int ret;
+	int i;
 
 	if (err)
 		uptodate = 0;
 
-	do {
+	bio_for_each_segment_all(bvec, bio, i) {
 		struct page *page = bvec->bv_page;
 		struct inode *inode = page->mapping->host;
 
@@ -2441,9 +2439,6 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
 		end = start + bvec->bv_offset + bvec->bv_len - 1;
 		len = bvec->bv_len;
 
-		if (++bvec <= bvec_end)
-			prefetchw(&bvec->bv_page->flags);
-
 		mirror = io_bio->mirror_num;
 		if (likely(uptodate && tree->ops &&
 			   tree->ops->readpage_end_io_hook)) {
@@ -2524,7 +2519,7 @@ readpage_ok:
 			extent_start = start;
 			extent_len = end + 1 - start;
 		}
-	} while (bvec <= bvec_end);
+	}
 
 	if (extent_len)
 		endio_readpage_release_extent(tree, extent_start, extent_len,
@@ -2555,7 +2550,6 @@ btrfs_bio_alloc(struct block_device *bdev, u64 first_sector, int nr_vecs,
 	}
 
 	if (bio) {
-		bio->bi_iter.bi_size = 0;
 		bio->bi_bdev = bdev;
 		bio->bi_iter.bi_sector = first_sector;
 		btrfs_bio = btrfs_io_bio(bio);
@@ -3418,20 +3412,18 @@ static void end_extent_buffer_writeback(struct extent_buffer *eb)
 
 static void end_bio_extent_buffer_writepage(struct bio *bio, int err)
 {
-	int uptodate = err == 0;
-	struct bio_vec *bvec = bio->bi_io_vec + bio->bi_vcnt - 1;
+	struct bio_vec *bvec;
 	struct extent_buffer *eb;
-	int done;
+	int i, done;
 
-	do {
+	bio_for_each_segment_all(bvec, bio, i) {
 		struct page *page = bvec->bv_page;
 
-		bvec--;
 		eb = (struct extent_buffer *)page->private;
 		BUG_ON(!eb);
 		done = atomic_dec_and_test(&eb->io_pages);
 
-		if (!uptodate || test_bit(EXTENT_BUFFER_IOERR, &eb->bflags)) {
+		if (err || test_bit(EXTENT_BUFFER_IOERR, &eb->bflags)) {
 			set_bit(EXTENT_BUFFER_IOERR, &eb->bflags);
 			ClearPageUptodate(page);
 			SetPageError(page);
@@ -3443,10 +3435,9 @@ static void end_bio_extent_buffer_writepage(struct bio *bio, int err)
 			continue;
 
 		end_extent_buffer_writeback(eb);
-	} while (bvec >= bio->bi_io_vec);
+	}
 
 	bio_put(bio);
-
 }
 
 static int write_one_eb(struct extent_buffer *eb,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 6f5a64d..b7209a6 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6765,17 +6765,16 @@ unlock_err:
 static void btrfs_endio_direct_read(struct bio *bio, int err)
 {
 	struct btrfs_dio_private *dip = bio->bi_private;
-	struct bio_vec *bvec_end = bio->bi_io_vec + bio->bi_vcnt - 1;
-	struct bio_vec *bvec = bio->bi_io_vec;
+	struct bio_vec *bvec;
 	struct inode *inode = dip->inode;
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 	struct bio *dio_bio;
 	u32 *csums = (u32 *)dip->csum;
-	int index = 0;
 	u64 start;
+	int i;
 
 	start = dip->logical_offset;
-	do {
+	bio_for_each_segment_all(bvec, bio, i) {
 		if (!(BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM)) {
 			struct page *page = bvec->bv_page;
 			char *kaddr;
@@ -6791,18 +6790,16 @@ static void btrfs_endio_direct_read(struct bio *bio, int err)
 			local_irq_restore(flags);
 
 			flush_dcache_page(bvec->bv_page);
-			if (csum != csums[index]) {
+			if (csum != csums[i]) {
 				btrfs_err(root->fs_info, "csum failed ino %llu off %llu csum %u expected csum %u",
 					  btrfs_ino(inode), start, csum,
-					  csums[index]);
+					  csums[i]);
 				err = -EIO;
 			}
 		}
 
 		start += bvec->bv_len;
-		bvec++;
-		index++;
-	} while (bvec <= bvec_end);
+	}
 
 	unlock_extent(&BTRFS_I(inode)->io_tree, dip->logical_offset,
 		      dip->logical_offset + dip->bytes - 1);
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 97d8b34..dd02271 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -357,23 +357,20 @@ repeat:
 
 static void read_end_io(struct bio *bio, int err)
 {
-	const int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
-	struct bio_vec *bvec = bio->bi_io_vec + bio->bi_vcnt - 1;
+	struct bio_vec *bvec;
+	int i;
 
-	do {
+	bio_for_each_segment_all(bvec, bio, i) {
 		struct page *page = bvec->bv_page;
 
-		if (--bvec >= bio->bi_io_vec)
-			prefetchw(&bvec->bv_page->flags);
-
-		if (uptodate) {
+		if (!err) {
 			SetPageUptodate(page);
 		} else {
 			ClearPageUptodate(page);
 			SetPageError(page);
 		}
 		unlock_page(page);
-	} while (bvec >= bio->bi_io_vec);
+	}
 	bio_put(bio);
 }
 
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 9d77ce1..4382c90 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -575,16 +575,14 @@ static const struct segment_allocation default_salloc_ops = {
 
 static void f2fs_end_io_write(struct bio *bio, int err)
 {
-	const int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
-	struct bio_vec *bvec = bio->bi_io_vec + bio->bi_vcnt - 1;
 	struct bio_private *p = bio->bi_private;
+	struct bio_vec *bvec;
+	int i;
 
-	do {
+	bio_for_each_segment_all(bvec, bio, i) {
 		struct page *page = bvec->bv_page;
 
-		if (--bvec >= bio->bi_io_vec)
-			prefetchw(&bvec->bv_page->flags);
-		if (!uptodate) {
+		if (err) {
 			SetPageError(page);
 			if (page->mapping)
 				set_bit(AS_EIO, &page->mapping->flags);
@@ -593,7 +591,7 @@ static void f2fs_end_io_write(struct bio *bio, int err)
 		}
 		end_page_writeback(page);
 		dec_page_count(p->sbi, F2FS_WRITEBACK);
-	} while (bvec >= bio->bi_io_vec);
+	}
 
 	if (p->is_sync)
 		complete(p->wait);
diff --git a/fs/logfs/dev_bdev.c b/fs/logfs/dev_bdev.c
index a1b161f..ca42715 100644
--- a/fs/logfs/dev_bdev.c
+++ b/fs/logfs/dev_bdev.c
@@ -67,22 +67,18 @@ static DECLARE_WAIT_QUEUE_HEAD(wq);
 static void writeseg_end_io(struct bio *bio, int err)
 {
 	const int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
-	struct bio_vec *bvec = bio->bi_io_vec + bio->bi_vcnt - 1;
+	struct bio_vec *bvec;
+	int i;
 	struct super_block *sb = bio->bi_private;
 	struct logfs_super *super = logfs_super(sb);
-	struct page *page;
 
 	BUG_ON(!uptodate); /* FIXME: Retry io or write elsewhere */
 	BUG_ON(err);
-	BUG_ON(bio->bi_vcnt == 0);
-	do {
-		page = bvec->bv_page;
-		if (--bvec >= bio->bi_io_vec)
-			prefetchw(&bvec->bv_page->flags);
-
-		end_page_writeback(page);
-		page_cache_release(page);
-	} while (bvec >= bio->bi_io_vec);
+
+	bio_for_each_segment_all(bvec, bio, i) {
+		end_page_writeback(bvec->bv_page);
+		page_cache_release(bvec->bv_page);
+	}
 	bio_put(bio);
 	if (atomic_dec_and_test(&super->s_pending_writes))
 		wake_up(&wq);
diff --git a/fs/mpage.c b/fs/mpage.c
index 92b125f..4979ffa 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -43,16 +43,14 @@
  */
 static void mpage_end_io(struct bio *bio, int err)
 {
-	const int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
-	struct bio_vec *bvec = bio->bi_io_vec + bio->bi_vcnt - 1;
+	struct bio_vec *bv;
+	int i;
 
-	do {
-		struct page *page = bvec->bv_page;
+	bio_for_each_segment_all(bv, bio, i) {
+		struct page *page = bv->bv_page;
 
-		if (--bvec >= bio->bi_io_vec)
-			prefetchw(&bvec->bv_page->flags);
 		if (bio_data_dir(bio) == READ) {
-			if (uptodate) {
+			if (!err) {
 				SetPageUptodate(page);
 			} else {
 				ClearPageUptodate(page);
@@ -60,14 +58,15 @@ static void mpage_end_io(struct bio *bio, int err)
 			}
 			unlock_page(page);
 		} else { /* bio_data_dir(bio) == WRITE */
-			if (!uptodate) {
+			if (err) {
 				SetPageError(page);
 				if (page->mapping)
 					set_bit(AS_EIO, &page->mapping->flags);
 			}
 			end_page_writeback(page);
 		}
-	} while (bvec >= bio->bi_io_vec);
+	}
+
 	bio_put(bio);
 }
 
diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
index af73896..56ff823 100644
--- a/fs/nfs/blocklayout/blocklayout.c
+++ b/fs/nfs/blocklayout/blocklayout.c
@@ -202,18 +202,14 @@ static struct bio *bl_add_page_to_bio(struct bio *bio, int npg, int rw,
 static void bl_end_io_read(struct bio *bio, int err)
 {
 	struct parallel_io *par = bio->bi_private;
-	const int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
-	struct bio_vec *bvec = bio->bi_io_vec + bio->bi_vcnt - 1;
+	struct bio_vec *bvec;
+	int i;
 
-	do {
-		struct page *page = bvec->bv_page;
+	if (!err)
+		bio_for_each_segment_all(bvec, bio, i)
+			SetPageUptodate(bvec->bv_page);
 
-		if (--bvec >= bio->bi_io_vec)
-			prefetchw(&bvec->bv_page->flags);
-		if (uptodate)
-			SetPageUptodate(page);
-	} while (bvec >= bio->bi_io_vec);
-	if (!uptodate) {
+	if (err) {
 		struct nfs_read_data *rdata = par->data;
 		struct nfs_pgio_header *header = rdata->header;
 
@@ -384,20 +380,16 @@ static void mark_extents_written(struct pnfs_block_layout *bl,
 static void bl_end_io_write_zero(struct bio *bio, int err)
 {
 	struct parallel_io *par = bio->bi_private;
-	const int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
-	struct bio_vec *bvec = bio->bi_io_vec + bio->bi_vcnt - 1;
-
-	do {
-		struct page *page = bvec->bv_page;
+	struct bio_vec *bvec;
+	int i;
 
-		if (--bvec >= bio->bi_io_vec)
-			prefetchw(&bvec->bv_page->flags);
+	bio_for_each_segment_all(bvec, bio, i) {
 		/* This is the zeroing page we added */
-		end_page_writeback(page);
-		page_cache_release(page);
-	} while (bvec >= bio->bi_io_vec);
+		end_page_writeback(bvec->bv_page);
+		page_cache_release(bvec->bv_page);
+	}
 
-	if (unlikely(!uptodate)) {
+	if (unlikely(err)) {
 		struct nfs_write_data *data = par->data;
 		struct nfs_pgio_header *header = data->header;
 
-- 
1.8.4.rc3



* [PATCH 2/9] block: submit_bio_wait() conversions
  2013-11-04 23:36 [PATCH] Block layer stuff/DIO rewrite prep for 3.14 Kent Overstreet
  2013-11-04 23:36 ` [PATCH 1/9] block: Convert various code to bio_for_each_segment() Kent Overstreet
@ 2013-11-04 23:36 ` Kent Overstreet
  2013-11-04 23:36 ` [PATCH 3/9] block: Move bouncing to generic_make_request() Kent Overstreet
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Kent Overstreet @ 2013-11-04 23:36 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, linux-btrfs
  Cc: axboe, hch, Kent Overstreet, Joern Engel, Prasad Joshi

It was being open coded in a few places.
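
For reference, submit_bio_wait() is roughly the same completion-on-the-stack
pattern the converted callers were open coding, just in one shared place -
approximately (paraphrased from fs/bio.c, not part of this patch):

struct submit_bio_ret {
	struct completion event;
	int error;
};

static void submit_bio_wait_endio(struct bio *bio, int error)
{
	struct submit_bio_ret *ret = bio->bi_private;

	ret->error = error;
	complete(&ret->event);
}

int submit_bio_wait(int rw, struct bio *bio)
{
	struct submit_bio_ret ret;

	rw |= REQ_SYNC;
	init_completion(&ret.event);
	bio->bi_private = &ret;
	bio->bi_end_io = submit_bio_wait_endio;
	submit_bio(rw, bio);
	wait_for_completion(&ret.event);

	return ret.error;
}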

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Joern Engel <joern@logfs.org>
Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
---
 block/blk-flush.c   | 19 +------------------
 fs/logfs/dev_bdev.c |  8 +-------
 2 files changed, 2 insertions(+), 25 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 5580b05..9288aaf 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -502,15 +502,6 @@ void blk_abort_flushes(struct request_queue *q)
 	}
 }
 
-static void bio_end_flush(struct bio *bio, int err)
-{
-	if (err)
-		clear_bit(BIO_UPTODATE, &bio->bi_flags);
-	if (bio->bi_private)
-		complete(bio->bi_private);
-	bio_put(bio);
-}
-
 /**
  * blkdev_issue_flush - queue a flush
  * @bdev:	blockdev to issue flush for
@@ -526,7 +517,6 @@ static void bio_end_flush(struct bio *bio, int err)
 int blkdev_issue_flush(struct block_device *bdev, gfp_t gfp_mask,
 		sector_t *error_sector)
 {
-	DECLARE_COMPLETION_ONSTACK(wait);
 	struct request_queue *q;
 	struct bio *bio;
 	int ret = 0;
@@ -548,13 +538,9 @@ int blkdev_issue_flush(struct block_device *bdev, gfp_t gfp_mask,
 		return -ENXIO;
 
 	bio = bio_alloc(gfp_mask, 0);
-	bio->bi_end_io = bio_end_flush;
 	bio->bi_bdev = bdev;
-	bio->bi_private = &wait;
 
-	bio_get(bio);
-	submit_bio(WRITE_FLUSH, bio);
-	wait_for_completion_io(&wait);
+	ret = submit_bio_wait(WRITE_FLUSH, bio);
 
 	/*
 	 * The driver must store the error location in ->bi_sector, if
@@ -564,9 +550,6 @@ int blkdev_issue_flush(struct block_device *bdev, gfp_t gfp_mask,
 	if (error_sector)
 		*error_sector = bio->bi_iter.bi_sector;
 
-	if (!bio_flagged(bio, BIO_UPTODATE))
-		ret = -EIO;
-
 	bio_put(bio);
 	return ret;
 }
diff --git a/fs/logfs/dev_bdev.c b/fs/logfs/dev_bdev.c
index ca42715..80adce7 100644
--- a/fs/logfs/dev_bdev.c
+++ b/fs/logfs/dev_bdev.c
@@ -23,7 +23,6 @@ static int sync_request(struct page *page, struct block_device *bdev, int rw)
 {
 	struct bio bio;
 	struct bio_vec bio_vec;
-	struct completion complete;
 
 	bio_init(&bio);
 	bio.bi_max_vecs = 1;
@@ -35,13 +34,8 @@ static int sync_request(struct page *page, struct block_device *bdev, int rw)
 	bio.bi_iter.bi_size = PAGE_SIZE;
 	bio.bi_bdev = bdev;
 	bio.bi_iter.bi_sector = page->index * (PAGE_SIZE >> 9);
-	init_completion(&complete);
-	bio.bi_private = &complete;
-	bio.bi_end_io = request_complete;
 
-	submit_bio(rw, &bio);
-	wait_for_completion(&complete);
-	return test_bit(BIO_UPTODATE, &bio.bi_flags) ? 0 : -EIO;
+	return submit_bio_wait(rw, &bio);
 }
 
 static int bdev_readpage(void *_sb, struct page *page)
-- 
1.8.4.rc3



* [PATCH 3/9] block: Move bouncing to generic_make_request()
  2013-11-04 23:36 [PATCH] Block layer stuff/DIO rewrite prep for 3.14 Kent Overstreet
  2013-11-04 23:36 ` [PATCH 1/9] block: Convert various code to bio_for_each_segment() Kent Overstreet
  2013-11-04 23:36 ` [PATCH 2/9] block: submit_bio_wait() conversions Kent Overstreet
@ 2013-11-04 23:36 ` Kent Overstreet
  2013-11-04 23:36 ` [PATCH 4/9] block: Make generic_make_request handle arbitrary sized bios Kent Overstreet
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Kent Overstreet @ 2013-11-04 23:36 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, linux-btrfs
  Cc: axboe, hch, Kent Overstreet, Jiri Kosina, Asai Thambi S P

The next patch is going to make generic_make_request() handle arbitrary
sized bios by splitting them if necessary. It makes more sense to call
blk_queue_bounce() first - partly so the bouncing code is working on the
larger, unsplit bios, but also so that the code that splits bios and
__blk_recalc_rq_segments() no longer have to take bouncing into account,
since by that point it will already have been done.
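
After this patch and the next one, the per-bio step inside
generic_make_request() ends up ordered like this (condensed from those two
diffs):

	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
	struct bio *split = NULL;

	blk_queue_bounce(q, &bio);			/* 1: bounce first, on the full-size bio */

	if (!blk_queue_largebios(q))
		split = blk_bio_segment_split(q, bio, q->bio_split);
	if (split) {					/* 2: then split to the queue's limits */
		bio_chain(split, bio);
		bio_list_add(current->bio_list, bio);	/* remainder is resubmitted later */
		bio = split;
	}

	q->make_request_fn(q, bio);			/* 3: hand the bounced, split bio to the driver */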

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
---
 block/blk-core.c                  | 14 +++++++-------
 block/blk-merge.c                 | 13 ++++---------
 drivers/block/mtip32xx/mtip32xx.c |  2 --
 drivers/block/pktcdvd.c           |  2 --
 4 files changed, 11 insertions(+), 20 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index d9cab97..3c7467e 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1466,13 +1466,6 @@ void blk_queue_bio(struct request_queue *q, struct bio *bio)
 	struct request *req;
 	unsigned int request_count = 0;
 
-	/*
-	 * low level driver can indicate that it wants pages above a
-	 * certain limit bounced to low memory (ie for highmem, or even
-	 * ISA dma in theory)
-	 */
-	blk_queue_bounce(q, &bio);
-
 	if (bio_integrity_enabled(bio) && bio_integrity_prep(bio)) {
 		bio_endio(bio, -EIO);
 		return;
@@ -1822,6 +1815,13 @@ void generic_make_request(struct bio *bio)
 	do {
 		struct request_queue *q = bdev_get_queue(bio->bi_bdev);
 
+		/*
+		 * low level driver can indicate that it wants pages above a
+		 * certain limit bounced to low memory (ie for highmem, or even
+		 * ISA dma in theory)
+		 */
+		blk_queue_bounce(q, &bio);
+
 		q->make_request_fn(q, bio);
 
 		bio = bio_list_pop(current->bio_list);
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 953b8df..9680ec73 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -13,7 +13,7 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
 					     struct bio *bio)
 {
 	struct bio_vec bv, bvprv = { NULL };
-	int cluster, high, highprv = 1;
+	int cluster, prev = 0;
 	unsigned int seg_size, nr_phys_segs;
 	struct bio *fbio, *bbio;
 	struct bvec_iter iter;
@@ -27,13 +27,7 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
 	nr_phys_segs = 0;
 	for_each_bio(bio) {
 		bio_for_each_segment(bv, bio, iter) {
-			/*
-			 * the trick here is making sure that a high page is
-			 * never considered part of another segment, since that
-			 * might change with the bounce page.
-			 */
-			high = page_to_pfn(bv.bv_page) > queue_bounce_pfn(q);
-			if (!high && !highprv && cluster) {
+			if (prev && cluster) {
 				if (seg_size + bv.bv_len
 				    > queue_max_segment_size(q))
 					goto new_segment;
@@ -44,6 +38,7 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
 
 				seg_size += bv.bv_len;
 				bvprv = bv;
+				prev = 1;
 				continue;
 			}
 new_segment:
@@ -53,8 +48,8 @@ new_segment:
 
 			nr_phys_segs++;
 			bvprv = bv;
+			prev = 1;
 			seg_size = bv.bv_len;
-			highprv = high;
 		}
 		bbio = bio;
 	}
diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c
index 52b2f2a..d4c669b 100644
--- a/drivers/block/mtip32xx/mtip32xx.c
+++ b/drivers/block/mtip32xx/mtip32xx.c
@@ -4016,8 +4016,6 @@ static void mtip_make_request(struct request_queue *queue, struct bio *bio)
 
 	sg = mtip_hw_get_scatterlist(dd, &tag, unaligned);
 	if (likely(sg != NULL)) {
-		blk_queue_bounce(queue, &bio);
-
 		if (unlikely((bio)->bi_vcnt > MTIP_MAX_SG)) {
 			dev_warn(&dd->pdev->dev,
 				"Maximum number of SGL entries exceeded\n");
diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 1bf1f22..7991cc8 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -2486,8 +2486,6 @@ static void pkt_make_request(struct request_queue *q, struct bio *bio)
 		goto end_io;
 	}
 
-	blk_queue_bounce(q, &bio);
-
 	do {
 		sector_t zone = get_zone(bio->bi_iter.bi_sector, pd);
 		sector_t last_zone = get_zone(bio_end_sector(bio) - 1, pd);
-- 
1.8.4.rc3



* [PATCH 4/9] block: Make generic_make_request handle arbitrary sized bios
  2013-11-04 23:36 [PATCH] Block layer stuff/DIO rewrite prep for 3.14 Kent Overstreet
                   ` (2 preceding siblings ...)
  2013-11-04 23:36 ` [PATCH 3/9] block: Move bouncing to generic_make_request() Kent Overstreet
@ 2013-11-04 23:36 ` Kent Overstreet
  2013-11-04 23:56   ` [dm-devel] " Mike Christie
  2013-11-04 23:36 ` [PATCH 5/9] block: Gut bio_add_page(), kill bio_add_pc_page() Kent Overstreet
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 16+ messages in thread
From: Kent Overstreet @ 2013-11-04 23:36 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, linux-btrfs
  Cc: axboe, hch, Kent Overstreet, Neil Brown, Alasdair Kergon, dm-devel

The way the block layer is currently written, it goes to great lengths
to avoid having to split bios; upper layer code (such as bio_add_page())
checks what the underlying device can handle and tries to always create
bios that don't need to be split.

But this approach becomes unwieldy and eventually breaks down with
stacked devices and devices with dynamic limits, and it adds a lot of
complexity. If the block layer could split bios as needed, we could
eliminate a lot of complexity elsewhere - particularly in stacked
drivers. Code that creates bios can then create whatever size bios are
convenient, and more importantly stacked drivers don't have to deal with
both their own bio size limitations and the limitations of the
(potentially multiple) devices underneath them.

In the future this will let us delete merge_bvec_fn and a bunch of other code.
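
As an illustration of the stacked-driver point, a simple remapping driver's
make_request_fn can eventually shrink to something like the sketch below - a
hypothetical driver (struct stacked_dev and its fields are made up), with no
merge_bvec_fn and no size checks of its own, since the block layer has already
split the bio to fit whatever is underneath:

static void stacked_make_request(struct request_queue *q, struct bio *bio)
{
	struct stacked_dev *d = q->queuedata;

	/* just remap and pass down; splitting was handled for us */
	bio->bi_bdev = d->backing_bdev;
	bio->bi_iter.bi_sector += d->data_offset;
	generic_make_request(bio);
}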

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Neil Brown <neilb@suse.de>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: dm-devel@redhat.com
---
 block/blk-core.c       |  26 ++++++-----
 block/blk-merge.c      | 120 +++++++++++++++++++++++++++++++++++++++++++++++++
 block/blk.h            |   3 ++
 include/linux/blkdev.h |   4 ++
 4 files changed, 143 insertions(+), 10 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 3c7467e..abc5d23 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -566,6 +566,10 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 	if (q->id < 0)
 		goto fail_c;
 
+	q->bio_split = bioset_create(4, 0);
+	if (!q->bio_split)
+		goto fail_id;
+
 	q->backing_dev_info.ra_pages =
 			(VM_MAX_READAHEAD * 1024) / PAGE_CACHE_SIZE;
 	q->backing_dev_info.state = 0;
@@ -575,7 +579,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 
 	err = bdi_init(&q->backing_dev_info);
 	if (err)
-		goto fail_id;
+		goto fail_split;
 
 	setup_timer(&q->backing_dev_info.laptop_mode_wb_timer,
 		    laptop_mode_timer_fn, (unsigned long) q);
@@ -620,6 +624,8 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 
 fail_bdi:
 	bdi_destroy(&q->backing_dev_info);
+fail_split:
+	bioset_free(q->bio_split);
 fail_id:
 	ida_simple_remove(&blk_queue_ida, q->id);
 fail_c:
@@ -1681,15 +1687,6 @@ generic_make_request_checks(struct bio *bio)
 		goto end_io;
 	}
 
-	if (likely(bio_is_rw(bio) &&
-		   nr_sectors > queue_max_hw_sectors(q))) {
-		printk(KERN_ERR "bio too big device %s (%u > %u)\n",
-		       bdevname(bio->bi_bdev, b),
-		       bio_sectors(bio),
-		       queue_max_hw_sectors(q));
-		goto end_io;
-	}
-
 	part = bio->bi_bdev->bd_part;
 	if (should_fail_request(part, bio->bi_iter.bi_size) ||
 	    should_fail_request(&part_to_disk(part)->part0,
@@ -1814,6 +1811,7 @@ void generic_make_request(struct bio *bio)
 	current->bio_list = &bio_list_on_stack;
 	do {
 		struct request_queue *q = bdev_get_queue(bio->bi_bdev);
+		struct bio *split = NULL;
 
 		/*
 		 * low level driver can indicate that it wants pages above a
@@ -1822,6 +1820,14 @@ void generic_make_request(struct bio *bio)
 		 */
 		blk_queue_bounce(q, &bio);
 
+		if (!blk_queue_largebios(q))
+			split = blk_bio_segment_split(q, bio, q->bio_split);
+		if (split) {
+			bio_chain(split, bio);
+			bio_list_add(current->bio_list, bio);
+			bio = split;
+		}
+
 		q->make_request_fn(q, bio);
 
 		bio = bio_list_pop(current->bio_list);
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 9680ec73..6e07213 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -9,6 +9,126 @@
 
 #include "blk.h"
 
+static struct bio *blk_bio_discard_split(struct request_queue *q,
+					 struct bio *bio,
+					 struct bio_set *bs)
+{
+	unsigned int max_discard_sectors, granularity;
+	int alignment;
+	sector_t tmp;
+	unsigned split_sectors;
+
+	/* Zero-sector (unknown) and one-sector granularities are the same.  */
+	granularity = max(q->limits.discard_granularity >> 9, 1U);
+
+	max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9);
+	max_discard_sectors -= max_discard_sectors % granularity;
+
+	if (unlikely(!max_discard_sectors)) {
+		/* XXX: warn */
+		return NULL;
+	}
+
+	if (bio_sectors(bio) <= max_discard_sectors)
+		return NULL;
+
+	split_sectors = max_discard_sectors;
+
+	/*
+	 * If the next starting sector would be misaligned, stop the discard at
+	 * the previous aligned sector.
+	 */
+	alignment = (q->limits.discard_alignment >> 9) % granularity;
+
+	tmp = bio->bi_iter.bi_sector + split_sectors - alignment;
+	tmp = sector_div(tmp, granularity);
+
+	if (split_sectors > tmp)
+		split_sectors -= tmp;
+
+	return bio_split(bio, split_sectors, GFP_NOIO, bs);
+}
+
+static struct bio *blk_bio_write_same_split(struct request_queue *q,
+					    struct bio *bio,
+					    struct bio_set *bs)
+{
+	if (!q->limits.max_write_same_sectors)
+		return NULL;
+
+	if (bio_sectors(bio) <= q->limits.max_write_same_sectors)
+		return NULL;
+
+	return bio_split(bio, q->limits.max_write_same_sectors, GFP_NOIO, bs);
+}
+
+struct bio *blk_bio_segment_split(struct request_queue *q, struct bio *bio,
+				  struct bio_set *bs)
+{
+	struct bio *split;
+	struct bio_vec bv, bvprv;
+	struct bvec_iter iter;
+	unsigned seg_size = 0, nsegs = 0;
+	int prev = 0;
+
+	struct bvec_merge_data bvm = {
+		.bi_bdev	= bio->bi_bdev,
+		.bi_sector	= bio->bi_iter.bi_sector,
+		.bi_size	= 0,
+		.bi_rw		= bio->bi_rw,
+	};
+
+	if (bio->bi_rw & REQ_DISCARD)
+		return blk_bio_discard_split(q, bio, bs);
+
+	if (bio->bi_rw & REQ_WRITE_SAME)
+		return blk_bio_write_same_split(q, bio, bs);
+
+	bio_for_each_segment(bv, bio, iter) {
+		if (q->merge_bvec_fn &&
+		    q->merge_bvec_fn(q, &bvm, &bv) < (int) bv.bv_len)
+			goto split;
+
+		bvm.bi_size += bv.bv_len;
+
+		if (prev && blk_queue_cluster(q)) {
+			if (seg_size + bv.bv_len > queue_max_segment_size(q))
+				goto new_segment;
+			if (!BIOVEC_PHYS_MERGEABLE(&bvprv, &bv))
+				goto new_segment;
+			if (!BIOVEC_SEG_BOUNDARY(q, &bvprv, &bv))
+				goto new_segment;
+
+			seg_size += bv.bv_len;
+			bvprv = bv;
+			prev = 1;
+			continue;
+		}
+new_segment:
+		if (nsegs == queue_max_segments(q))
+			goto split;
+
+		nsegs++;
+		bvprv = bv;
+		prev = 1;
+		seg_size = bv.bv_len;
+	}
+
+	return NULL;
+split:
+	split = bio_clone_bioset(bio, GFP_NOIO, bs);
+
+	split->bi_iter.bi_size -= iter.bi_size;
+	bio->bi_iter = iter;
+
+	if (bio_integrity(bio)) {
+		bio_integrity_advance(bio, split->bi_iter.bi_size);
+		bio_integrity_trim(split, 0, bio_sectors(split));
+	}
+
+	return split;
+}
+
 static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
 					     struct bio *bio)
 {
diff --git a/block/blk.h b/block/blk.h
index c90e1d8..1e18330 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -147,6 +147,9 @@ static inline int blk_should_fake_timeout(struct request_queue *q)
 }
 #endif
 
+struct bio *blk_bio_segment_split(struct request_queue *q, struct bio *bio,
+				  struct bio_set *bs);
+
 int ll_back_merge_fn(struct request_queue *q, struct request *req,
 		     struct bio *bio);
 int ll_front_merge_fn(struct request_queue *q, struct request *req, 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index ca0119d..e590a08 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -476,6 +476,7 @@ struct request_queue {
 	wait_queue_head_t	mq_freeze_wq;
 	struct percpu_counter	mq_usage_counter;
 	struct list_head	all_q_node;
+	struct bio_set		*bio_split;
 };
 
 #define QUEUE_FLAG_QUEUED	1	/* uses generic tag queueing */
@@ -499,6 +500,7 @@ struct request_queue {
 #define QUEUE_FLAG_SAME_FORCE  18	/* force complete on same CPU */
 #define QUEUE_FLAG_DEAD        19	/* queue tear-down finished */
 #define QUEUE_FLAG_INIT_DONE   20	/* queue is initialized */
+#define QUEUE_FLAG_LARGEBIOS   21	/* no limits on bio size */
 
 #define QUEUE_FLAG_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
 				 (1 << QUEUE_FLAG_STACKABLE)	|	\
@@ -583,6 +585,8 @@ static inline void queue_flag_clear(unsigned int flag, struct request_queue *q)
 #define blk_queue_discard(q)	test_bit(QUEUE_FLAG_DISCARD, &(q)->queue_flags)
 #define blk_queue_secdiscard(q)	(blk_queue_discard(q) && \
 	test_bit(QUEUE_FLAG_SECDISCARD, &(q)->queue_flags))
+#define blk_queue_largebios(q)				\
+	test_bit(QUEUE_FLAG_LARGEBIOS, &(q)->queue_flags)
 
 #define blk_noretry_request(rq) \
 	((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \
-- 
1.8.4.rc3



* [PATCH 5/9] block: Gut bio_add_page(), kill bio_add_pc_page()
  2013-11-04 23:36 [PATCH] Block layer stuff/DIO rewrite prep for 3.14 Kent Overstreet
                   ` (3 preceding siblings ...)
  2013-11-04 23:36 ` [PATCH 4/9] block: Make generic_make_request handle arbitrary sized bios Kent Overstreet
@ 2013-11-04 23:36 ` Kent Overstreet
  2013-11-04 23:36 ` [PATCH 6/9] mtip32xx: handle arbitrary size bios Kent Overstreet
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Kent Overstreet @ 2013-11-04 23:36 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, linux-btrfs; +Cc: axboe, hch, Kent Overstreet

Since generic_make_request() can now handle arbitrary size bios, all we
have to do is make sure the bvec array doesn't overflow.
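
The caller-side contract this leaves behind, sketched with hypothetical names
(struct write_ctx and queue_page() are made up; allocation failure and
bi_end_io setup are omitted): the only way bio_add_page() can now return less
than len is a full bvec array, so callers have exactly one failure mode to
handle - start a new bio.

struct write_ctx {
	struct bio		*bio;
	struct block_device	*bdev;
	sector_t		next_sector;
};

static void queue_page(struct write_ctx *ctx, struct page *page,
		       unsigned len, unsigned offset)
{
	if (!ctx->bio || bio_add_page(ctx->bio, page, len, offset) < len) {
		/* no bio yet, or the bvec array is full: start a new one */
		if (ctx->bio)
			submit_bio(WRITE, ctx->bio);

		ctx->bio = bio_alloc(GFP_NOIO, BIO_MAX_PAGES);
		ctx->bio->bi_bdev = ctx->bdev;
		ctx->bio->bi_iter.bi_sector = ctx->next_sector;
		bio_add_page(ctx->bio, page, len, offset);
	}

	ctx->next_sector += len >> 9;
}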

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
---
 drivers/scsi/osd/osd_initiator.c   |   5 +-
 drivers/target/target_core_pscsi.c |   5 +-
 fs/bio.c                           | 158 +++++++------------------------------
 fs/exofs/ore.c                     |   8 +-
 fs/exofs/ore_raid.c                |   8 +-
 include/linux/bio.h                |   2 -
 6 files changed, 37 insertions(+), 149 deletions(-)

diff --git a/drivers/scsi/osd/osd_initiator.c b/drivers/scsi/osd/osd_initiator.c
index bac04c2..e52b30d 100644
--- a/drivers/scsi/osd/osd_initiator.c
+++ b/drivers/scsi/osd/osd_initiator.c
@@ -1043,7 +1043,6 @@ EXPORT_SYMBOL(osd_req_read_sg);
 static struct bio *_create_sg_bios(struct osd_request *or,
 	void **buff, const struct osd_sg_entry *sglist, unsigned numentries)
 {
-	struct request_queue *q = osd_request_queue(or->osd_dev);
 	struct bio *bio;
 	unsigned i;
 
@@ -1060,9 +1059,9 @@ static struct bio *_create_sg_bios(struct osd_request *or,
 		unsigned added_len;
 
 		BUG_ON(offset + len > PAGE_SIZE);
-		added_len = bio_add_pc_page(q, bio, page, len, offset);
+		added_len = bio_add_page(bio, page, len, offset);
 		if (unlikely(len != added_len)) {
-			OSD_DEBUG("bio_add_pc_page len(%d) != added_len(%d)\n",
+			OSD_DEBUG("bio_add_page len(%d) != added_len(%d)\n",
 				  len, added_len);
 			bio_put(bio);
 			return ERR_PTR(-ENOMEM);
diff --git a/drivers/target/target_core_pscsi.c b/drivers/target/target_core_pscsi.c
index 551c96c..d65d512 100644
--- a/drivers/target/target_core_pscsi.c
+++ b/drivers/target/target_core_pscsi.c
@@ -922,12 +922,11 @@ pscsi_map_sg(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
 					tbio = tbio->bi_next = bio;
 			}
 
-			pr_debug("PSCSI: Calling bio_add_pc_page() i: %d"
+			pr_debug("PSCSI: Calling bio_add_page() i: %d"
 				" bio: %p page: %p len: %d off: %d\n", i, bio,
 				page, len, off);
 
-			rc = bio_add_pc_page(pdv->pdv_sd->request_queue,
-					bio, page, bytes, off);
+			rc = bio_add_page(bio, page, bytes, off);
 			if (rc != bytes)
 				goto fail;
 
diff --git a/fs/bio.c b/fs/bio.c
index 7d538a1..c60bfcb 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -665,12 +665,22 @@ int bio_get_nr_vecs(struct block_device *bdev)
 }
 EXPORT_SYMBOL(bio_get_nr_vecs);
 
-static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
-			  *page, unsigned int len, unsigned int offset,
-			  unsigned short max_sectors)
+/**
+ *	bio_add_page	-	attempt to add page to bio
+ *	@bio: destination bio
+ *	@page: page to add
+ *	@len: vec entry length
+ *	@offset: vec entry offset
+ *
+ *	Attempt to add a page to the bio_vec maplist. This can fail for a
+ *	number of reasons, such as the bio being full or target block device
+ *	limitations. The target block device must allow bio's up to PAGE_SIZE,
+ *	so it is always possible to add a single page to an empty bio.
+ */
+int bio_add_page(struct bio *bio, struct page *page,
+		 unsigned int len, unsigned int offset)
 {
-	int retried_segments = 0;
-	struct bio_vec *bvec;
+	struct bio_vec *bv;
 
 	/*
 	 * cloned bio must not modify vec list
@@ -678,41 +688,17 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
 	if (unlikely(bio_flagged(bio, BIO_CLONED)))
 		return 0;
 
-	if (((bio->bi_iter.bi_size + len) >> 9) > max_sectors)
-		return 0;
-
 	/*
 	 * For filesystems with a blocksize smaller than the pagesize
 	 * we will often be called with the same page as last time and
 	 * a consecutive offset.  Optimize this special case.
 	 */
 	if (bio->bi_vcnt > 0) {
-		struct bio_vec *prev = &bio->bi_io_vec[bio->bi_vcnt - 1];
-
-		if (page == prev->bv_page &&
-		    offset == prev->bv_offset + prev->bv_len) {
-			unsigned int prev_bv_len = prev->bv_len;
-			prev->bv_len += len;
-
-			if (q->merge_bvec_fn) {
-				struct bvec_merge_data bvm = {
-					/* prev_bvec is already charged in
-					   bi_size, discharge it in order to
-					   simulate merging updated prev_bvec
-					   as new bvec. */
-					.bi_bdev = bio->bi_bdev,
-					.bi_sector = bio->bi_iter.bi_sector,
-					.bi_size = bio->bi_iter.bi_size -
-						prev_bv_len,
-					.bi_rw = bio->bi_rw,
-				};
-
-				if (q->merge_bvec_fn(q, &bvm, prev) < prev->bv_len) {
-					prev->bv_len -= len;
-					return 0;
-				}
-			}
+		bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
 
+		if (page == bv->bv_page &&
+		    offset == bv->bv_offset + bv->bv_len) {
+			bv->bv_len += len;
 			goto done;
 		}
 	}
@@ -720,106 +706,16 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
 	if (bio->bi_vcnt >= bio->bi_max_vecs)
 		return 0;
 
-	/*
-	 * we might lose a segment or two here, but rather that than
-	 * make this too complex.
-	 */
-
-	while (bio->bi_phys_segments >= queue_max_segments(q)) {
-
-		if (retried_segments)
-			return 0;
-
-		retried_segments = 1;
-		blk_recount_segments(q, bio);
-	}
-
-	/*
-	 * setup the new entry, we might clear it again later if we
-	 * cannot add the page
-	 */
-	bvec = &bio->bi_io_vec[bio->bi_vcnt];
-	bvec->bv_page = page;
-	bvec->bv_len = len;
-	bvec->bv_offset = offset;
-
-	/*
-	 * if queue has other restrictions (eg varying max sector size
-	 * depending on offset), it can specify a merge_bvec_fn in the
-	 * queue to get further control
-	 */
-	if (q->merge_bvec_fn) {
-		struct bvec_merge_data bvm = {
-			.bi_bdev = bio->bi_bdev,
-			.bi_sector = bio->bi_iter.bi_sector,
-			.bi_size = bio->bi_iter.bi_size,
-			.bi_rw = bio->bi_rw,
-		};
-
-		/*
-		 * merge_bvec_fn() returns number of bytes it can accept
-		 * at this offset
-		 */
-		if (q->merge_bvec_fn(q, &bvm, bvec) < bvec->bv_len) {
-			bvec->bv_page = NULL;
-			bvec->bv_len = 0;
-			bvec->bv_offset = 0;
-			return 0;
-		}
-	}
-
-	/* If we may be able to merge these biovecs, force a recount */
-	if (bio->bi_vcnt && (BIOVEC_PHYS_MERGEABLE(bvec-1, bvec)))
-		bio->bi_flags &= ~(1 << BIO_SEG_VALID);
+	bv		= &bio->bi_io_vec[bio->bi_vcnt];
+	bv->bv_page	= page;
+	bv->bv_len	= len;
+	bv->bv_offset	= offset;
 
 	bio->bi_vcnt++;
-	bio->bi_phys_segments++;
- done:
+done:
 	bio->bi_iter.bi_size += len;
 	return len;
 }
-
-/**
- *	bio_add_pc_page	-	attempt to add page to bio
- *	@q: the target queue
- *	@bio: destination bio
- *	@page: page to add
- *	@len: vec entry length
- *	@offset: vec entry offset
- *
- *	Attempt to add a page to the bio_vec maplist. This can fail for a
- *	number of reasons, such as the bio being full or target block device
- *	limitations. The target block device must allow bio's up to PAGE_SIZE,
- *	so it is always possible to add a single page to an empty bio.
- *
- *	This should only be used by REQ_PC bios.
- */
-int bio_add_pc_page(struct request_queue *q, struct bio *bio, struct page *page,
-		    unsigned int len, unsigned int offset)
-{
-	return __bio_add_page(q, bio, page, len, offset,
-			      queue_max_hw_sectors(q));
-}
-EXPORT_SYMBOL(bio_add_pc_page);
-
-/**
- *	bio_add_page	-	attempt to add page to bio
- *	@bio: destination bio
- *	@page: page to add
- *	@len: vec entry length
- *	@offset: vec entry offset
- *
- *	Attempt to add a page to the bio_vec maplist. This can fail for a
- *	number of reasons, such as the bio being full or target block device
- *	limitations. The target block device must allow bio's up to PAGE_SIZE,
- *	so it is always possible to add a single page to an empty bio.
- */
-int bio_add_page(struct bio *bio, struct page *page, unsigned int len,
-		 unsigned int offset)
-{
-	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
-	return __bio_add_page(q, bio, page, len, offset, queue_max_sectors(q));
-}
 EXPORT_SYMBOL(bio_add_page);
 
 struct submit_bio_ret {
@@ -1169,7 +1065,7 @@ struct bio *bio_copy_user_iov(struct request_queue *q,
 			}
 		}
 
-		if (bio_add_pc_page(q, bio, page, bytes, offset) < bytes)
+		if (bio_add_page(bio, page, bytes, offset) < bytes)
 			break;
 
 		len -= bytes;
@@ -1462,8 +1358,8 @@ static struct bio *__bio_map_kern(struct request_queue *q, void *data,
 		if (bytes > len)
 			bytes = len;
 
-		if (bio_add_pc_page(q, bio, virt_to_page(data), bytes,
-				    offset) < bytes)
+		if (bio_add_page(bio, virt_to_page(data), bytes,
+				 offset) < bytes)
 			break;
 
 		data += bytes;
diff --git a/fs/exofs/ore.c b/fs/exofs/ore.c
index b744228..c694d6d 100644
--- a/fs/exofs/ore.c
+++ b/fs/exofs/ore.c
@@ -577,8 +577,6 @@ int _ore_add_stripe_unit(struct ore_io_state *ios,  unsigned *cur_pg,
 			 struct ore_per_dev_state *per_dev, int cur_len)
 {
 	unsigned pg = *cur_pg;
-	struct request_queue *q =
-			osd_request_queue(_ios_od(ios, per_dev->dev));
 	unsigned len = cur_len;
 	int ret;
 
@@ -606,10 +604,10 @@ int _ore_add_stripe_unit(struct ore_io_state *ios,  unsigned *cur_pg,
 
 		cur_len -= pglen;
 
-		added_len = bio_add_pc_page(q, per_dev->bio, pages[pg],
-					    pglen, pgbase);
+		added_len = bio_add_page(per_dev->bio, pages[pg],
+					 pglen, pgbase);
 		if (unlikely(pglen != added_len)) {
-			ORE_DBGMSG("Failed bio_add_pc_page bi_vcnt=%u\n",
+			ORE_DBGMSG("Failed bio_add_page bi_vcnt=%u\n",
 				   per_dev->bio->bi_vcnt);
 			ret = -ENOMEM;
 			goto out;
diff --git a/fs/exofs/ore_raid.c b/fs/exofs/ore_raid.c
index 7682b97..bbd627f 100644
--- a/fs/exofs/ore_raid.c
+++ b/fs/exofs/ore_raid.c
@@ -331,7 +331,6 @@ static int _alloc_read_4_write(struct ore_io_state *ios)
 static int _add_to_r4w(struct ore_io_state *ios, struct ore_striping_info *si,
 		       struct page *page, unsigned pg_len)
 {
-	struct request_queue *q;
 	struct ore_per_dev_state *per_dev;
 	struct ore_io_state *read_ios;
 	unsigned first_dev = si->dev - (si->dev %
@@ -365,11 +364,10 @@ static int _add_to_r4w(struct ore_io_state *ios, struct ore_striping_info *si,
 
 		_ore_add_sg_seg(per_dev, gap, true);
 	}
-	q = osd_request_queue(ore_comp_dev(read_ios->oc, per_dev->dev));
-	added_len = bio_add_pc_page(q, per_dev->bio, page, pg_len,
-				    si->obj_offset % PAGE_SIZE);
+	added_len = bio_add_page(per_dev->bio, page, pg_len,
+				 si->obj_offset % PAGE_SIZE);
 	if (unlikely(added_len != pg_len)) {
-		ORE_DBGMSG("Failed to bio_add_pc_page bi_vcnt=%d\n",
+		ORE_DBGMSG("Failed to bio_add_page bi_vcnt=%d\n",
 			      per_dev->bio->bi_vcnt);
 		return -ENOMEM;
 	}
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 204489e..a293b78 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -367,8 +367,6 @@ extern void bio_reset(struct bio *);
 void bio_chain(struct bio *, struct bio *);
 
 extern int bio_add_page(struct bio *, struct page *, unsigned int,unsigned int);
-extern int bio_add_pc_page(struct request_queue *, struct bio *, struct page *,
-			   unsigned int, unsigned int);
 extern int bio_get_nr_vecs(struct block_device *);
 extern struct bio *bio_map_user(struct request_queue *, struct block_device *,
 				unsigned long, unsigned int, int, gfp_t);
-- 
1.8.4.rc3



* [PATCH 6/9] mtip32xx: handle arbitrary size bios
  2013-11-04 23:36 [PATCH] Block layer stuff/DIO rewrite prep for 3.14 Kent Overstreet
                   ` (4 preceding siblings ...)
  2013-11-04 23:36 ` [PATCH 5/9] block: Gut bio_add_page(), kill bio_add_pc_page() Kent Overstreet
@ 2013-11-04 23:36 ` Kent Overstreet
  2013-11-05 13:49   ` Josef Bacik
  2013-11-04 23:36 ` [PATCH 7/9] blk-lib.c: generic_make_request() handles large bios now Kent Overstreet
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 16+ messages in thread
From: Kent Overstreet @ 2013-11-04 23:36 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, linux-btrfs; +Cc: axboe, hch, Kent Overstreet

We get a measurable performance increase by handling the splitting in the
driver, where we're already looping over the biovec, instead of handling it
separately in generic_make_request() (or, originally, in bio_add_page()).
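
The splitting idiom in the hunk below, annotated (same code as the diff, just
with comments spelling out the bio_chain() semantics): when the driver hits
MTIP_MAX_SG partway through the biovec, it clones the bio, points the clone at
the unprocessed remainder, and resubmits it.

	struct bio *split = bio_clone(bio, GFP_NOIO);

	split->bi_iter = iter;			/* clone covers the unprocessed remainder */
	bio->bi_iter.bi_size -= iter.bi_size;	/* original keeps only what we're issuing now */
	bio_chain(split, bio);			/* original won't complete until the clone does */
	generic_make_request(split);		/* resubmit the remainder */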

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
---
 drivers/block/mtip32xx/mtip32xx.c | 46 +++++++++++++--------------------------
 1 file changed, 15 insertions(+), 31 deletions(-)

diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c
index d4c669b..c5a7a96 100644
--- a/drivers/block/mtip32xx/mtip32xx.c
+++ b/drivers/block/mtip32xx/mtip32xx.c
@@ -2648,24 +2648,6 @@ static void mtip_hw_submit_io(struct driver_data *dd, sector_t sector,
 }
 
 /*
- * Release a command slot.
- *
- * @dd  Pointer to the driver data structure.
- * @tag Slot tag
- *
- * return value
- *      None
- */
-static void mtip_hw_release_scatterlist(struct driver_data *dd, int tag,
-								int unaligned)
-{
-	struct semaphore *sem = unaligned ? &dd->port->cmd_slot_unal :
-							&dd->port->cmd_slot;
-	release_slot(dd->port, tag);
-	up(sem);
-}
-
-/*
  * Obtain a command slot and return its associated scatter list.
  *
  * @dd  Pointer to the driver data structure.
@@ -4016,21 +3998,22 @@ static void mtip_make_request(struct request_queue *queue, struct bio *bio)
 
 	sg = mtip_hw_get_scatterlist(dd, &tag, unaligned);
 	if (likely(sg != NULL)) {
-		if (unlikely((bio)->bi_vcnt > MTIP_MAX_SG)) {
-			dev_warn(&dd->pdev->dev,
-				"Maximum number of SGL entries exceeded\n");
-			bio_io_error(bio);
-			mtip_hw_release_scatterlist(dd, tag, unaligned);
-			return;
-		}
-
 		/* Create the scatter list for this bio. */
 		bio_for_each_segment(bvec, bio, iter) {
-			sg_set_page(&sg[nents],
-					bvec.bv_page,
-					bvec.bv_len,
-					bvec.bv_offset);
-			nents++;
+			if (unlikely(nents == MTIP_MAX_SG)) {
+				struct bio *split = bio_clone(bio, GFP_NOIO);
+
+				split->bi_iter = iter;
+				bio->bi_iter.bi_size -= iter.bi_size;
+				bio_chain(split, bio);
+				generic_make_request(split);
+				break;
+			}
+
+			sg_set_page(&sg[nents++],
+				    bvec.bv_page,
+				    bvec.bv_len,
+				    bvec.bv_offset);
 		}
 
 		/* Issue the read/write. */
@@ -4145,6 +4128,7 @@ skip_create_disk:
 	blk_queue_max_hw_sectors(dd->queue, 0xffff);
 	blk_queue_max_segment_size(dd->queue, 0x400000);
 	blk_queue_io_min(dd->queue, 4096);
+	set_bit(QUEUE_FLAG_LARGEBIOS,	&dd->queue->queue_flags);
 
 	/*
 	 * write back cache is not supported in the device. FUA depends on
-- 
1.8.4.rc3



* [PATCH 7/9] blk-lib.c: generic_make_request() handles large bios now
  2013-11-04 23:36 [PATCH] Block layer stuff/DIO rewrite prep for 3.14 Kent Overstreet
                   ` (5 preceding siblings ...)
  2013-11-04 23:36 ` [PATCH 6/9] mtip32xx: handle arbitrary size bios Kent Overstreet
@ 2013-11-04 23:36 ` Kent Overstreet
  2013-11-04 23:36 ` [PATCH 8/9] bcache: " Kent Overstreet
  2013-11-04 23:36 ` [PATCH 9/9] block: Add bio_get_user_pages() Kent Overstreet
  8 siblings, 0 replies; 16+ messages in thread
From: Kent Overstreet @ 2013-11-04 23:36 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, linux-btrfs; +Cc: axboe, hch, Kent Overstreet

generic_make_request() will now do for us what the code in blk-lib.c was
doing manually with the bio_batch stuff - we still need some looping in
case we're trying to discard/zeroout more than around a gigabyte, but when
we can submit that much at a time, doing the submissions in parallel really
shouldn't matter.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
---
 block/blk-lib.c | 175 ++++++++++----------------------------------------------
 1 file changed, 30 insertions(+), 145 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 2da76c9..368c36a 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -9,23 +9,6 @@
 
 #include "blk.h"
 
-struct bio_batch {
-	atomic_t		done;
-	unsigned long		flags;
-	struct completion	*wait;
-};
-
-static void bio_batch_end_io(struct bio *bio, int err)
-{
-	struct bio_batch *bb = bio->bi_private;
-
-	if (err && (err != -EOPNOTSUPP))
-		clear_bit(BIO_UPTODATE, &bb->flags);
-	if (atomic_dec_and_test(&bb->done))
-		complete(bb->wait);
-	bio_put(bio);
-}
-
 /**
  * blkdev_issue_discard - queue a discard
  * @bdev:	blockdev to issue discard for
@@ -40,15 +23,10 @@ static void bio_batch_end_io(struct bio *bio, int err)
 int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 		sector_t nr_sects, gfp_t gfp_mask, unsigned long flags)
 {
-	DECLARE_COMPLETION_ONSTACK(wait);
 	struct request_queue *q = bdev_get_queue(bdev);
 	int type = REQ_WRITE | REQ_DISCARD;
-	unsigned int max_discard_sectors, granularity;
-	int alignment;
-	struct bio_batch bb;
 	struct bio *bio;
 	int ret = 0;
-	struct blk_plug plug;
 
 	if (!q)
 		return -ENXIO;
@@ -56,78 +34,28 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 	if (!blk_queue_discard(q))
 		return -EOPNOTSUPP;
 
-	/* Zero-sector (unknown) and one-sector granularities are the same.  */
-	granularity = max(q->limits.discard_granularity >> 9, 1U);
-	alignment = (bdev_discard_alignment(bdev) >> 9) % granularity;
-
-	/*
-	 * Ensure that max_discard_sectors is of the proper
-	 * granularity, so that requests stay aligned after a split.
-	 */
-	max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9);
-	max_discard_sectors -= max_discard_sectors % granularity;
-	if (unlikely(!max_discard_sectors)) {
-		/* Avoid infinite loop below. Being cautious never hurts. */
-		return -EOPNOTSUPP;
-	}
-
 	if (flags & BLKDEV_DISCARD_SECURE) {
 		if (!blk_queue_secdiscard(q))
 			return -EOPNOTSUPP;
 		type |= REQ_SECURE;
 	}
 
-	atomic_set(&bb.done, 1);
-	bb.flags = 1 << BIO_UPTODATE;
-	bb.wait = &wait;
-
-	blk_start_plug(&plug);
 	while (nr_sects) {
-		unsigned int req_sects;
-		sector_t end_sect, tmp;
-
 		bio = bio_alloc(gfp_mask, 1);
-		if (!bio) {
-			ret = -ENOMEM;
-			break;
-		}
-
-		req_sects = min_t(sector_t, nr_sects, max_discard_sectors);
-
-		/*
-		 * If splitting a request, and the next starting sector would be
-		 * misaligned, stop the discard at the previous aligned sector.
-		 */
-		end_sect = sector + req_sects;
-		tmp = end_sect;
-		if (req_sects < nr_sects &&
-		    sector_div(tmp, granularity) != alignment) {
-			end_sect = end_sect - alignment;
-			sector_div(end_sect, granularity);
-			end_sect = end_sect * granularity + alignment;
-			req_sects = end_sect - sector;
-		}
+		if (!bio)
+			return -ENOMEM;
 
-		bio->bi_iter.bi_sector = sector;
-		bio->bi_end_io = bio_batch_end_io;
 		bio->bi_bdev = bdev;
-		bio->bi_private = &bb;
+		bio->bi_iter.bi_sector = sector;
+		bio->bi_iter.bi_size = min_t(sector_t, nr_sects, 1 << 20) << 9;
 
-		bio->bi_iter.bi_size = req_sects << 9;
-		nr_sects -= req_sects;
-		sector = end_sect;
+		sector += bio_sectors(bio);
+		nr_sects -= bio_sectors(bio);
 
-		atomic_inc(&bb.done);
-		submit_bio(type, bio);
+		ret = submit_bio_wait(type, bio);
+		if (ret)
+			break;
 	}
-	blk_finish_plug(&plug);
-
-	/* Wait for bios in-flight */
-	if (!atomic_dec_and_test(&bb.done))
-		wait_for_completion_io(&wait);
-
-	if (!test_bit(BIO_UPTODATE, &bb.flags))
-		ret = -EIO;
 
 	return ret;
 }
@@ -148,61 +76,37 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
 			    sector_t nr_sects, gfp_t gfp_mask,
 			    struct page *page)
 {
-	DECLARE_COMPLETION_ONSTACK(wait);
 	struct request_queue *q = bdev_get_queue(bdev);
-	unsigned int max_write_same_sectors;
-	struct bio_batch bb;
 	struct bio *bio;
 	int ret = 0;
 
 	if (!q)
 		return -ENXIO;
 
-	max_write_same_sectors = q->limits.max_write_same_sectors;
-
-	if (max_write_same_sectors == 0)
+	if (!q->limits.max_write_same_sectors)
 		return -EOPNOTSUPP;
 
-	atomic_set(&bb.done, 1);
-	bb.flags = 1 << BIO_UPTODATE;
-	bb.wait = &wait;
-
 	while (nr_sects) {
 		bio = bio_alloc(gfp_mask, 1);
-		if (!bio) {
-			ret = -ENOMEM;
-			break;
-		}
+		if (!bio)
+			return -ENOMEM;
 
-		bio->bi_iter.bi_sector = sector;
-		bio->bi_end_io = bio_batch_end_io;
 		bio->bi_bdev = bdev;
-		bio->bi_private = &bb;
+		bio->bi_iter.bi_sector = sector;
+		bio->bi_iter.bi_size = min_t(sector_t, nr_sects, 1 << 20) << 9;
 		bio->bi_vcnt = 1;
 		bio->bi_io_vec->bv_page = page;
 		bio->bi_io_vec->bv_offset = 0;
 		bio->bi_io_vec->bv_len = bdev_logical_block_size(bdev);
 
-		if (nr_sects > max_write_same_sectors) {
-			bio->bi_iter.bi_size = max_write_same_sectors << 9;
-			nr_sects -= max_write_same_sectors;
-			sector += max_write_same_sectors;
-		} else {
-			bio->bi_iter.bi_size = nr_sects << 9;
-			nr_sects = 0;
-		}
+		sector += bio_sectors(bio);
+		nr_sects -= bio_sectors(bio);
 
-		atomic_inc(&bb.done);
-		submit_bio(REQ_WRITE | REQ_WRITE_SAME, bio);
+		ret = submit_bio_wait(REQ_WRITE | REQ_WRITE_SAME, bio);
+		if (ret)
+			break;
 	}
 
-	/* Wait for bios in-flight */
-	if (!atomic_dec_and_test(&bb.done))
-		wait_for_completion_io(&wait);
-
-	if (!test_bit(BIO_UPTODATE, &bb.flags))
-		ret = -ENOTSUPP;
-
 	return ret;
 }
 EXPORT_SYMBOL(blkdev_issue_write_same);
@@ -217,33 +121,22 @@ EXPORT_SYMBOL(blkdev_issue_write_same);
  * Description:
  *  Generate and issue number of bios with zerofiled pages.
  */
-
 int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
-			sector_t nr_sects, gfp_t gfp_mask)
+			   sector_t nr_sects, gfp_t gfp_mask)
 {
-	int ret;
+	int ret = 0;
 	struct bio *bio;
-	struct bio_batch bb;
 	unsigned int sz;
-	DECLARE_COMPLETION_ONSTACK(wait);
-
-	atomic_set(&bb.done, 1);
-	bb.flags = 1 << BIO_UPTODATE;
-	bb.wait = &wait;
 
-	ret = 0;
-	while (nr_sects != 0) {
+	while (nr_sects) {
 		bio = bio_alloc(gfp_mask,
-				min(nr_sects, (sector_t)BIO_MAX_PAGES));
-		if (!bio) {
-			ret = -ENOMEM;
-			break;
-		}
+				min(nr_sects / (PAGE_SIZE >> 9),
+				    (sector_t)BIO_MAX_PAGES));
+		if (!bio)
+			return -ENOMEM;
 
 		bio->bi_iter.bi_sector = sector;
 		bio->bi_bdev   = bdev;
-		bio->bi_end_io = bio_batch_end_io;
-		bio->bi_private = &bb;
 
 		while (nr_sects != 0) {
 			sz = min((sector_t) PAGE_SIZE >> 9 , nr_sects);
@@ -253,18 +146,11 @@ int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 			if (ret < (sz << 9))
 				break;
 		}
-		ret = 0;
-		atomic_inc(&bb.done);
-		submit_bio(WRITE, bio);
-	}
-
-	/* Wait for bios in-flight */
-	if (!atomic_dec_and_test(&bb.done))
-		wait_for_completion_io(&wait);
 
-	if (!test_bit(BIO_UPTODATE, &bb.flags))
-		/* One of bios in the batch was completed with error.*/
-		ret = -EIO;
+		ret = submit_bio_wait(WRITE, bio);
+		if (ret)
+			break;
+	}
 
 	return ret;
 }
@@ -279,7 +165,6 @@ int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
  * Description:
  *  Generate and issue number of bios with zerofiled pages.
  */
-
 int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 			 sector_t nr_sects, gfp_t gfp_mask)
 {
-- 
1.8.4.rc3
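
For reference, callers of these helpers are unchanged by the conversion - e.g.
zeroing a range still looks roughly like this (illustrative only, not taken
from the series):

	int ret = blkdev_issue_zeroout(bdev, sector, nr_sects, GFP_NOFS);

	if (ret)
		pr_err("zeroout of %llu sectors failed: %d\n",
		       (unsigned long long) nr_sects, ret);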


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 8/9] bcache: generic_make_request() handles large bios now
  2013-11-04 23:36 [PATCH] Block layer stuff/DIO rewrite prep for 3.14 Kent Overstreet
                   ` (6 preceding siblings ...)
  2013-11-04 23:36 ` [PATCH 7/9] blk-lib.c: generic_make_request() handles large bios now Kent Overstreet
@ 2013-11-04 23:36 ` Kent Overstreet
  2013-11-04 23:36 ` [PATCH 9/9] block: Add bio_get_user_pages() Kent Overstreet
  8 siblings, 0 replies; 16+ messages in thread
From: Kent Overstreet @ 2013-11-04 23:36 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, linux-btrfs; +Cc: axboe, hch, Kent Overstreet

So we get to delete our hacky workaround.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
---
 drivers/md/bcache/bcache.h    |  18 --------
 drivers/md/bcache/io.c        | 100 +-----------------------------------------
 drivers/md/bcache/journal.c   |   4 +-
 drivers/md/bcache/request.c   |  16 +++----
 drivers/md/bcache/super.c     |  33 ++------------
 drivers/md/bcache/util.h      |   5 ++-
 drivers/md/bcache/writeback.c |   4 +-
 include/linux/bio.h           |  12 -----
 8 files changed, 19 insertions(+), 173 deletions(-)

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 964353c..8f65331 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -241,19 +241,6 @@ struct keybuf {
 	DECLARE_ARRAY_ALLOCATOR(struct keybuf_key, freelist, KEYBUF_NR);
 };
 
-struct bio_split_pool {
-	struct bio_set		*bio_split;
-	mempool_t		*bio_split_hook;
-};
-
-struct bio_split_hook {
-	struct closure		cl;
-	struct bio_split_pool	*p;
-	struct bio		*bio;
-	bio_end_io_t		*bi_end_io;
-	void			*bi_private;
-};
-
 struct bcache_device {
 	struct closure		cl;
 
@@ -286,8 +273,6 @@ struct bcache_device {
 	int (*cache_miss)(struct btree *, struct search *,
 			  struct bio *, unsigned);
 	int (*ioctl) (struct bcache_device *, fmode_t, unsigned, unsigned long);
-
-	struct bio_split_pool	bio_split_hook;
 };
 
 struct io {
@@ -465,8 +450,6 @@ struct cache {
 	atomic_long_t		meta_sectors_written;
 	atomic_long_t		btree_sectors_written;
 	atomic_long_t		sectors_written;
-
-	struct bio_split_pool	bio_split_hook;
 };
 
 struct gc_stat {
@@ -901,7 +884,6 @@ void bch_bbio_endio(struct cache_set *, struct bio *, int, const char *);
 void bch_bbio_free(struct bio *, struct cache_set *);
 struct bio *bch_bbio_alloc(struct cache_set *);
 
-void bch_generic_make_request(struct bio *, struct bio_split_pool *);
 void __bch_submit_bbio(struct bio *, struct cache_set *);
 void bch_submit_bbio(struct bio *, struct cache_set *, struct bkey *, unsigned);
 
diff --git a/drivers/md/bcache/io.c b/drivers/md/bcache/io.c
index fa028fa..86a0bb8 100644
--- a/drivers/md/bcache/io.c
+++ b/drivers/md/bcache/io.c
@@ -11,104 +11,6 @@
 
 #include <linux/blkdev.h>
 
-static unsigned bch_bio_max_sectors(struct bio *bio)
-{
-	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
-	struct bio_vec bv;
-	struct bvec_iter iter;
-	unsigned ret = 0, seg = 0;
-
-	if (bio->bi_rw & REQ_DISCARD)
-		return min(bio_sectors(bio), q->limits.max_discard_sectors);
-
-	bio_for_each_segment(bv, bio, iter) {
-		struct bvec_merge_data bvm = {
-			.bi_bdev	= bio->bi_bdev,
-			.bi_sector	= bio->bi_iter.bi_sector,
-			.bi_size	= ret << 9,
-			.bi_rw		= bio->bi_rw,
-		};
-
-		if (seg == min_t(unsigned, BIO_MAX_PAGES,
-				 queue_max_segments(q)))
-			break;
-
-		if (q->merge_bvec_fn &&
-		    q->merge_bvec_fn(q, &bvm, &bv) < (int) bv.bv_len)
-			break;
-
-		seg++;
-		ret += bv.bv_len >> 9;
-	}
-
-	ret = min(ret, queue_max_sectors(q));
-
-	WARN_ON(!ret);
-	ret = max_t(int, ret, bio_iovec(bio).bv_len >> 9);
-
-	return ret;
-}
-
-static void bch_bio_submit_split_done(struct closure *cl)
-{
-	struct bio_split_hook *s = container_of(cl, struct bio_split_hook, cl);
-
-	s->bio->bi_end_io = s->bi_end_io;
-	s->bio->bi_private = s->bi_private;
-	bio_endio_nodec(s->bio, 0);
-
-	closure_debug_destroy(&s->cl);
-	mempool_free(s, s->p->bio_split_hook);
-}
-
-static void bch_bio_submit_split_endio(struct bio *bio, int error)
-{
-	struct closure *cl = bio->bi_private;
-	struct bio_split_hook *s = container_of(cl, struct bio_split_hook, cl);
-
-	if (error)
-		clear_bit(BIO_UPTODATE, &s->bio->bi_flags);
-
-	bio_put(bio);
-	closure_put(cl);
-}
-
-void bch_generic_make_request(struct bio *bio, struct bio_split_pool *p)
-{
-	struct bio_split_hook *s;
-	struct bio *n;
-
-	if (!bio_has_data(bio) && !(bio->bi_rw & REQ_DISCARD))
-		goto submit;
-
-	if (bio_sectors(bio) <= bch_bio_max_sectors(bio))
-		goto submit;
-
-	s = mempool_alloc(p->bio_split_hook, GFP_NOIO);
-	closure_init(&s->cl, NULL);
-
-	s->bio		= bio;
-	s->p		= p;
-	s->bi_end_io	= bio->bi_end_io;
-	s->bi_private	= bio->bi_private;
-	bio_get(bio);
-
-	do {
-		n = bio_next_split(bio, bch_bio_max_sectors(bio),
-				   GFP_NOIO, s->p->bio_split);
-
-		n->bi_end_io	= bch_bio_submit_split_endio;
-		n->bi_private	= &s->cl;
-
-		closure_get(&s->cl);
-		generic_make_request(n);
-	} while (n != bio);
-
-	continue_at(&s->cl, bch_bio_submit_split_done, NULL);
-submit:
-	generic_make_request(bio);
-}
-
 /* Bios with headers */
 
 void bch_bbio_free(struct bio *bio, struct cache_set *c)
@@ -138,7 +40,7 @@ void __bch_submit_bbio(struct bio *bio, struct cache_set *c)
 	bio->bi_bdev		= PTR_CACHE(c, &b->key, 0)->bdev;
 
 	b->submit_time_us = local_clock_us();
-	closure_bio_submit(bio, bio->bi_private, PTR_CACHE(c, &b->key, 0));
+	closure_bio_submit(bio, bio->bi_private);
 }
 
 void bch_submit_bbio(struct bio *bio, struct cache_set *c,
diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index 7eafdf0..15aeebe 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -60,7 +60,7 @@ reread:		left = ca->sb.bucket_size - offset;
 		bio->bi_private = &cl;
 		bch_bio_map(bio, data);
 
-		closure_bio_submit(bio, &cl, ca);
+		closure_bio_submit(bio, &cl);
 		closure_sync(&cl);
 
 		/* This function could be simpler now since we no longer write
@@ -632,7 +632,7 @@ static void journal_write_unlocked(struct closure *cl)
 	spin_unlock(&c->journal.lock);
 
 	while ((bio = bio_list_pop(&list)))
-		closure_bio_submit(bio, cl, c->cache[0]);
+		closure_bio_submit(bio, cl);
 
 	continue_at(cl, journal_write_done, NULL);
 }
diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index be49d0f..134f4ed 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -849,7 +849,7 @@ static void cached_dev_read_error(struct closure *cl)
 
 		/* XXX: invalidate cache */
 
-		closure_bio_submit(bio, cl, s->d);
+		closure_bio_submit(bio, cl);
 	}
 
 	continue_at(cl, cached_dev_cache_miss_done, NULL);
@@ -973,7 +973,7 @@ static int cached_dev_cache_miss(struct btree *b, struct search *s,
 	s->cache_miss	= miss;
 	s->iop.bio	= cache_bio;
 	bio_get(cache_bio);
-	closure_bio_submit(cache_bio, &s->cl, s->d);
+	closure_bio_submit(cache_bio, &s->cl);
 
 	return ret;
 out_put:
@@ -981,7 +981,7 @@ out_put:
 out_submit:
 	miss->bi_end_io		= request_endio;
 	miss->bi_private	= &s->cl;
-	closure_bio_submit(miss, &s->cl, s->d);
+	closure_bio_submit(miss, &s->cl);
 	return ret;
 }
 
@@ -1046,7 +1046,7 @@ static void cached_dev_write(struct cached_dev *dc, struct search *s)
 
 		if (!(bio->bi_rw & REQ_DISCARD) ||
 		    blk_queue_discard(bdev_get_queue(dc->bdev)))
-			closure_bio_submit(bio, cl, s->d);
+			closure_bio_submit(bio, cl);
 	} else if (s->iop.writeback) {
 		bch_writeback_add(dc);
 		s->iop.bio = bio;
@@ -1061,13 +1061,13 @@ static void cached_dev_write(struct cached_dev *dc, struct search *s)
 			flush->bi_end_io = request_endio;
 			flush->bi_private = cl;
 
-			closure_bio_submit(flush, cl, s->d);
+			closure_bio_submit(flush, cl);
 		}
 	} else {
 		s->iop.bio = bio_clone_bioset(bio, GFP_NOIO,
 					      dc->disk.bio_split);
 
-		closure_bio_submit(bio, cl, s->d);
+		closure_bio_submit(bio, cl);
 	}
 
 	closure_call(&s->iop.cl, bch_data_insert, NULL, cl);
@@ -1083,7 +1083,7 @@ static void cached_dev_nodata(struct closure *cl)
 		bch_journal_meta(s->iop.c, cl);
 
 	/* If it's a flush, we send the flush to the backing device too */
-	closure_bio_submit(bio, cl, s->d);
+	closure_bio_submit(bio, cl);
 
 	continue_at(cl, cached_dev_bio_complete, NULL);
 }
@@ -1130,7 +1130,7 @@ static void cached_dev_make_request(struct request_queue *q, struct bio *bio)
 		    !blk_queue_discard(bdev_get_queue(dc->bdev)))
 			bio_endio(bio, 0);
 		else
-			bch_generic_make_request(bio, &d->bio_split_hook);
+			generic_make_request(bio);
 	}
 }
 
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 60fb604..2947ea6 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -58,29 +58,6 @@ struct workqueue_struct *bcache_wq;
 
 #define BTREE_MAX_PAGES		(256 * 1024 / PAGE_SIZE)
 
-static void bio_split_pool_free(struct bio_split_pool *p)
-{
-	if (p->bio_split_hook)
-		mempool_destroy(p->bio_split_hook);
-
-	if (p->bio_split)
-		bioset_free(p->bio_split);
-}
-
-static int bio_split_pool_init(struct bio_split_pool *p)
-{
-	p->bio_split = bioset_create(4, 0);
-	if (!p->bio_split)
-		return -ENOMEM;
-
-	p->bio_split_hook = mempool_create_kmalloc_pool(4,
-				sizeof(struct bio_split_hook));
-	if (!p->bio_split_hook)
-		return -ENOMEM;
-
-	return 0;
-}
-
 /* Superblock */
 
 static const char *read_super(struct cache_sb *sb, struct block_device *bdev,
@@ -512,7 +489,7 @@ static void prio_io(struct cache *ca, uint64_t bucket, unsigned long rw)
 	bio->bi_private = ca;
 	bch_bio_map(bio, ca->disk_buckets);
 
-	closure_bio_submit(bio, &ca->prio, ca);
+	closure_bio_submit(bio, &ca->prio);
 	closure_sync(cl);
 }
 
@@ -738,7 +715,6 @@ static void bcache_device_free(struct bcache_device *d)
 		put_disk(d->disk);
 	}
 
-	bio_split_pool_free(&d->bio_split_hook);
 	if (d->bio_split)
 		bioset_free(d->bio_split);
 	if (is_vmalloc_addr(d->full_dirty_stripes))
@@ -791,7 +767,6 @@ static int bcache_device_init(struct bcache_device *d, unsigned block_size,
 		return minor;
 
 	if (!(d->bio_split = bioset_create(4, offsetof(struct bbio, bio))) ||
-	    bio_split_pool_init(&d->bio_split_hook) ||
 	    !(d->disk = alloc_disk(1))) {
 		ida_simple_remove(&bcache_minor, minor);
 		return -ENOMEM;
@@ -823,6 +798,7 @@ static int bcache_device_init(struct bcache_device *d, unsigned block_size,
 	q->limits.physical_block_size	= block_size;
 	set_bit(QUEUE_FLAG_NONROT,	&d->disk->queue->queue_flags);
 	set_bit(QUEUE_FLAG_DISCARD,	&d->disk->queue->queue_flags);
+	set_bit(QUEUE_FLAG_LARGEBIOS,	&d->disk->queue->queue_flags);
 
 	blk_queue_flush(q, REQ_FLUSH|REQ_FUA);
 
@@ -1747,8 +1723,6 @@ void bch_cache_release(struct kobject *kobj)
 	if (ca->set)
 		ca->set->cache[ca->sb.nr_this_dev] = NULL;
 
-	bio_split_pool_free(&ca->bio_split_hook);
-
 	free_pages((unsigned long) ca->disk_buckets, ilog2(bucket_pages(ca)));
 	kfree(ca->prio_buckets);
 	vfree(ca->buckets);
@@ -1793,8 +1767,7 @@ static int cache_alloc(struct cache_sb *sb, struct cache *ca)
 					  ca->sb.nbuckets)) ||
 	    !(ca->prio_buckets	= kzalloc(sizeof(uint64_t) * prio_buckets(ca) *
 					  2, GFP_KERNEL)) ||
-	    !(ca->disk_buckets	= alloc_bucket_pages(GFP_KERNEL, ca)) ||
-	    bio_split_pool_init(&ca->bio_split_hook))
+	    !(ca->disk_buckets	= alloc_bucket_pages(GFP_KERNEL, ca)))
 		return -ENOMEM;
 
 	ca->prio_last_buckets = ca->prio_buckets + prio_buckets(ca);
diff --git a/drivers/md/bcache/util.h b/drivers/md/bcache/util.h
index 362c4b3..4cafce4 100644
--- a/drivers/md/bcache/util.h
+++ b/drivers/md/bcache/util.h
@@ -3,6 +3,7 @@
 #define _BCACHE_UTIL_H
 
 #include <linux/errno.h>
+#include <linux/blkdev.h>
 #include <linux/kernel.h>
 #include <linux/llist.h>
 #include <linux/ratelimit.h>
@@ -568,10 +569,10 @@ static inline sector_t bdev_sectors(struct block_device *bdev)
 	return bdev->bd_inode->i_size >> 9;
 }
 
-#define closure_bio_submit(bio, cl, dev)				\
+#define closure_bio_submit(bio, cl)					\
 do {									\
 	closure_get(cl);						\
-	bch_generic_make_request(bio, &(dev)->bio_split_hook);		\
+	generic_make_request(bio);					\
 } while (0)
 
 uint64_t bch_crc64_update(uint64_t, const void *, size_t);
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index 04657e9..b3ce66e 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -190,7 +190,7 @@ static void write_dirty(struct closure *cl)
 	io->bio.bi_bdev		= io->dc->bdev;
 	io->bio.bi_end_io	= dirty_endio;
 
-	closure_bio_submit(&io->bio, cl, &io->dc->disk);
+	closure_bio_submit(&io->bio, cl);
 
 	continue_at(cl, write_dirty_finish, system_wq);
 }
@@ -210,7 +210,7 @@ static void read_dirty_submit(struct closure *cl)
 {
 	struct dirty_io *io = container_of(cl, struct dirty_io, cl);
 
-	closure_bio_submit(&io->bio, cl, &io->dc->disk);
+	closure_bio_submit(&io->bio, cl);
 
 	continue_at(cl, write_dirty, system_wq);
 }
diff --git a/include/linux/bio.h b/include/linux/bio.h
index a293b78..d0c0cc7 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -244,18 +244,6 @@ static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
 
 #define bio_iter_last(bvec, iter) ((iter).bi_size == (bvec).bv_len)
 
-static inline unsigned bio_segments(struct bio *bio)
-{
-	unsigned segs = 0;
-	struct bio_vec bv;
-	struct bvec_iter iter;
-
-	bio_for_each_segment(bv, bio, iter)
-		segs++;
-
-	return segs;
-}
-
 /*
  * get a reference to a bio, so it won't disappear. the intended use is
  * something like:
-- 
1.8.4.rc3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 9/9] block: Add bio_get_user_pages()
  2013-11-04 23:36 [PATCH] Block layer stuff/DIO rewrite prep for 3.14 Kent Overstreet
                   ` (7 preceding siblings ...)
  2013-11-04 23:36 ` [PATCH 8/9] bcache: " Kent Overstreet
@ 2013-11-04 23:36 ` Kent Overstreet
  8 siblings, 0 replies; 16+ messages in thread
From: Kent Overstreet @ 2013-11-04 23:36 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, linux-btrfs; +Cc: axboe, hch, Kent Overstreet

This replaces some of the code that was in __bio_map_user_iov(), and
soon we're going to use this helper in the dio code.

Note that this relies on the recent change to make
generic_make_request() take arbitrary sized bios - we're not using
bio_add_page() here.
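
A rough sketch of how a caller - e.g. the planned DIO code - might drive the
new helper; everything around the call is illustrative and assumed, only
bio_get_user_pages() itself is defined in this patch:

	/*
	 * Hypothetical caller (not part of this series): pin a user buffer
	 * into a single bio, submit it synchronously, then drop the pages.
	 */
	static int submit_user_buf(struct block_device *bdev, sector_t sector,
				   unsigned long uaddr, unsigned long len,
				   int is_read)
	{
		unsigned nr_pages = DIV_ROUND_UP(offset_in_page(uaddr) + len,
						 PAGE_SIZE);
		struct bio *bio = bio_kmalloc(GFP_KERNEL, nr_pages);
		struct bio_vec *bv;
		int i, ret;

		if (!bio)
			return -ENOMEM;

		bio->bi_bdev		= bdev;
		bio->bi_iter.bi_sector	= sector;

		while (len) {
			ssize_t bytes = bio_get_user_pages(bio, uaddr, len,
							   is_read);
			if (bytes <= 0) {
				ret = bytes ?: -EFAULT;
				goto out;
			}
			uaddr += bytes;
			len -= bytes;
		}

		/* generic_make_request() now splits this as needed */
		ret = submit_bio_wait(is_read ? READ : WRITE, bio);
	out:
		/* release whatever pages were pinned */
		bio_for_each_segment_all(bv, bio, i)
			page_cache_release(bv->bv_page);
		bio_put(bio);
		return ret;
	}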

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
---
 fs/bio.c            | 124 +++++++++++++++++++++++++++-------------------------
 include/linux/bio.h |   2 +
 2 files changed, 67 insertions(+), 59 deletions(-)

diff --git a/fs/bio.c b/fs/bio.c
index c60bfcb..a10b350 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -1124,17 +1124,70 @@ struct bio *bio_copy_user(struct request_queue *q, struct rq_map_data *map_data,
 }
 EXPORT_SYMBOL(bio_copy_user);
 
+/**
+ * bio_get_user_pages - pin user pages and add them to a biovec
+ * @bio: bio to add pages to
+ * @uaddr: start of user address
+ * @len: length in bytes
+ * @write_to_vm: bool indicating writing to pages or not
+ *
+ * Pins pages for up to @len bytes and appends them to @bio's bvec array. May
+ * pin only part of the requested pages - @bio need not have room for all the
+ * pages and can already have had pages added to it.
+ *
+ * Returns the number of bytes from @len added to @bio.
+ */
+ssize_t bio_get_user_pages(struct bio *bio, unsigned long uaddr,
+			   unsigned long len, int write_to_vm)
+{
+	int ret;
+	unsigned nr_pages, bytes;
+	unsigned offset = offset_in_page(uaddr);
+	struct bio_vec *bv;
+	struct page **pages;
+
+	nr_pages = min_t(size_t,
+			 DIV_ROUND_UP(len + offset, PAGE_SIZE),
+			 bio->bi_max_vecs - bio->bi_vcnt);
+
+	bv = &bio->bi_io_vec[bio->bi_vcnt];
+	pages = (void *) bv;
+
+	ret = get_user_pages_fast(uaddr, nr_pages, write_to_vm, pages);
+	if (ret < 0)
+		return ret;
+
+	bio->bi_vcnt += ret;
+	bytes = ret * PAGE_SIZE - offset;
+
+	while (ret--) {
+		bv[ret].bv_page = pages[ret];
+		bv[ret].bv_len = PAGE_SIZE;
+		bv[ret].bv_offset = 0;
+	}
+
+	bv[0].bv_offset += offset;
+	bv[0].bv_len -= offset;
+
+	if (bytes > len) {
+		bio->bi_io_vec[bio->bi_vcnt - 1].bv_len -= bytes - len;
+		bytes = len;
+	}
+
+	bio->bi_iter.bi_size += bytes;
+
+	return bytes;
+}
+EXPORT_SYMBOL(bio_get_user_pages);
+
 static struct bio *__bio_map_user_iov(struct request_queue *q,
 				      struct block_device *bdev,
 				      struct sg_iovec *iov, int iov_count,
 				      int write_to_vm, gfp_t gfp_mask)
 {
-	int i, j;
-	int nr_pages = 0;
-	struct page **pages;
+	ssize_t ret;
+	int i, nr_pages = 0;
 	struct bio *bio;
-	int cur_page = 0;
-	int ret, offset;
 
 	for (i = 0; i < iov_count; i++) {
 		unsigned long uaddr = (unsigned long)iov[i].iov_base;
@@ -1163,57 +1216,17 @@ static struct bio *__bio_map_user_iov(struct request_queue *q,
 	if (!bio)
 		return ERR_PTR(-ENOMEM);
 
-	ret = -ENOMEM;
-	pages = kcalloc(nr_pages, sizeof(struct page *), gfp_mask);
-	if (!pages)
-		goto out;
-
 	for (i = 0; i < iov_count; i++) {
-		unsigned long uaddr = (unsigned long)iov[i].iov_base;
-		unsigned long len = iov[i].iov_len;
-		unsigned long end = (uaddr + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
-		unsigned long start = uaddr >> PAGE_SHIFT;
-		const int local_nr_pages = end - start;
-		const int page_limit = cur_page + local_nr_pages;
-
-		ret = get_user_pages_fast(uaddr, local_nr_pages,
-				write_to_vm, &pages[cur_page]);
-		if (ret < local_nr_pages) {
-			ret = -EFAULT;
-			goto out_unmap;
-		}
-
-		offset = uaddr & ~PAGE_MASK;
-		for (j = cur_page; j < page_limit; j++) {
-			unsigned int bytes = PAGE_SIZE - offset;
+		ret = bio_get_user_pages(bio, (size_t) iov[i].iov_base,
+					 iov[i].iov_len,
+					 write_to_vm);
+		if (ret < 0)
+			goto out;
 
-			if (len <= 0)
-				break;
-			
-			if (bytes > len)
-				bytes = len;
-
-			/*
-			 * sorry...
-			 */
-			if (bio_add_pc_page(q, bio, pages[j], bytes, offset) <
-					    bytes)
-				break;
-
-			len -= bytes;
-			offset = 0;
-		}
-
-		cur_page = j;
-		/*
-		 * release the pages we didn't map into the bio, if any
-		 */
-		while (j < page_limit)
-			page_cache_release(pages[j++]);
+		if (ret != iov[i].iov_len)
+			break;
 	}
 
-	kfree(pages);
-
 	/*
 	 * set data direction, and check if mapped pages need bouncing
 	 */
@@ -1224,14 +1237,7 @@ static struct bio *__bio_map_user_iov(struct request_queue *q,
 	bio->bi_flags |= (1 << BIO_USER_MAPPED);
 	return bio;
 
- out_unmap:
-	for (i = 0; i < nr_pages; i++) {
-		if(!pages[i])
-			break;
-		page_cache_release(pages[i]);
-	}
  out:
-	kfree(pages);
 	bio_put(bio);
 	return ERR_PTR(ret);
 }
diff --git a/include/linux/bio.h b/include/linux/bio.h
index d0c0cc7..af4976e 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -356,6 +356,8 @@ void bio_chain(struct bio *, struct bio *);
 
 extern int bio_add_page(struct bio *, struct page *, unsigned int,unsigned int);
 extern int bio_get_nr_vecs(struct block_device *);
+extern ssize_t bio_get_user_pages(struct bio *, unsigned long,
+				  unsigned long, int);
 extern struct bio *bio_map_user(struct request_queue *, struct block_device *,
 				unsigned long, unsigned int, int, gfp_t);
 struct sg_iovec;
-- 
1.8.4.rc3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [dm-devel] [PATCH 4/9] block: Make generic_make_request handle arbitrary sized bios
  2013-11-04 23:36 ` [PATCH 4/9] block: Make generic_make_request handle arbitrary sized bios Kent Overstreet
@ 2013-11-04 23:56   ` Mike Christie
  2013-11-05  0:55     ` Kent Overstreet
  0 siblings, 1 reply; 16+ messages in thread
From: Mike Christie @ 2013-11-04 23:56 UTC (permalink / raw)
  To: device-mapper development
  Cc: Kent Overstreet, linux-kernel, linux-fsdevel, linux-btrfs, axboe,
	hch, Alasdair Kergon

On 11/04/2013 03:36 PM, Kent Overstreet wrote:
> @@ -1822,6 +1820,14 @@ void generic_make_request(struct bio *bio)
>  		 */
>  		blk_queue_bounce(q, &bio);
>  
> +		if (!blk_queue_largebios(q))
> +			split = blk_bio_segment_split(q, bio, q->bio_split);


Is it assumed bios coming down this path are created using bio_add_page?
If not, does blk_bio_segment_split need a queue_max_sectors or
queue_max_hw_sectors check? I only saw a segment count check below.


> +
> +struct bio *blk_bio_segment_split(struct request_queue *q, struct bio *bio,
> +				  struct bio_set *bs)
> +{
> +	struct bio *split;
> +	struct bio_vec bv, bvprv;
> +	struct bvec_iter iter;
> +	unsigned seg_size = 0, nsegs = 0;
> +	int prev = 0;
> +
> +	struct bvec_merge_data bvm = {
> +		.bi_bdev	= bio->bi_bdev,
> +		.bi_sector	= bio->bi_iter.bi_sector,
> +		.bi_size	= 0,
> +		.bi_rw		= bio->bi_rw,
> +	};
> +
> +	if (bio->bi_rw & REQ_DISCARD)
> +		return blk_bio_discard_split(q, bio, bs);
> +
> +	if (bio->bi_rw & REQ_WRITE_SAME)
> +		return blk_bio_write_same_split(q, bio, bs);
> +
> +	bio_for_each_segment(bv, bio, iter) {
> +		if (q->merge_bvec_fn &&
> +		    q->merge_bvec_fn(q, &bvm, &bv) < (int) bv.bv_len)
> +			goto split;
> +
> +		bvm.bi_size += bv.bv_len;
> +
> +		if (prev && blk_queue_cluster(q)) {
> +			if (seg_size + bv.bv_len > queue_max_segment_size(q))
> +				goto new_segment;
> +			if (!BIOVEC_PHYS_MERGEABLE(&bvprv, &bv))
> +				goto new_segment;
> +			if (!BIOVEC_SEG_BOUNDARY(q, &bvprv, &bv))
> +				goto new_segment;
> +
> +			seg_size += bv.bv_len;
> +			bvprv = bv;
> +			prev = 1;
> +			continue;
> +		}
> +new_segment:
> +		if (nsegs == queue_max_segments(q))
> +			goto split;
> +
> +		nsegs++;
> +		bvprv = bv;
> +		prev = 1;
> +		seg_size = bv.bv_len;
> +	}
> +
> +	return NULL;
> +split:
> +	split = bio_clone_bioset(bio, GFP_NOIO, bs);
> +
> +	split->bi_iter.bi_size -= iter.bi_size;
> +	bio->bi_iter = iter;
> +
> +	if (bio_integrity(bio)) {
> +		bio_integrity_advance(bio, split->bi_iter.bi_size);
> +		bio_integrity_trim(split, 0, bio_sectors(split));
> +	}
> +
> +	return split;
> +}

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dm-devel] [PATCH 4/9] block: Make generic_make_request handle arbitrary sized bios
  2013-11-04 23:56   ` [dm-devel] " Mike Christie
@ 2013-11-05  0:55     ` Kent Overstreet
  0 siblings, 0 replies; 16+ messages in thread
From: Kent Overstreet @ 2013-11-05  0:55 UTC (permalink / raw)
  To: Mike Christie
  Cc: device-mapper development, linux-kernel, linux-fsdevel,
	linux-btrfs, axboe, hch, Alasdair Kergon

On Mon, Nov 04, 2013 at 03:56:52PM -0800, Mike Christie wrote:
> On 11/04/2013 03:36 PM, Kent Overstreet wrote:
> > @@ -1822,6 +1820,14 @@ void generic_make_request(struct bio *bio)
> >  		 */
> >  		blk_queue_bounce(q, &bio);
> >  
> > +		if (!blk_queue_largebios(q))
> > +			split = blk_bio_segment_split(q, bio, q->bio_split);
> 
> 
> Is it assumed bios coming down this path are created using bio_add_page?
> If not, does blk_bio_segment_split need a queue_max_sectors or
> queue_max_hw_sectors check? I only saw a segment count check below.

Shoot, you're absolutely right - thanks, I'll have this fixed in the next
version.
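
For reference, the missing check could look something like this (a sketch
only, the actual v2 may differ) - bounding the accumulated size against
queue_max_sectors() at the top of the bio_for_each_segment() loop in
blk_bio_segment_split():

	/* sketch: also split when we exceed the queue's max_sectors limit */
	if (bvm.bi_size + bv.bv_len > (queue_max_sectors(q) << 9))
		goto split;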

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 6/9] mtip32xx: handle arbitrary size bios
  2013-11-04 23:36 ` [PATCH 6/9] mtip32xx: handle arbitrary size bios Kent Overstreet
@ 2013-11-05 13:49   ` Josef Bacik
  0 siblings, 0 replies; 16+ messages in thread
From: Josef Bacik @ 2013-11-05 13:49 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-kernel, linux-fsdevel, linux-btrfs, axboe, hch

On Mon, Nov 04, 2013 at 03:36:24PM -0800, Kent Overstreet wrote:
> We get a measurable performance increase by handling this in the driver when
> we're already looping over the biovec, instead of handling it separately in
> generic_make_request() (or bio_add_page() originally)
> 
> Signed-off-by: Kent Overstreet <kmo@daterainc.com>
> ---
>  drivers/block/mtip32xx/mtip32xx.c | 46 +++++++++++++--------------------------
>  1 file changed, 15 insertions(+), 31 deletions(-)
> 
> diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c
> index d4c669b..c5a7a96 100644
> --- a/drivers/block/mtip32xx/mtip32xx.c
> +++ b/drivers/block/mtip32xx/mtip32xx.c
> @@ -2648,24 +2648,6 @@ static void mtip_hw_submit_io(struct driver_data *dd, sector_t sector,
>  }
>  
>  /*
> - * Release a command slot.
> - *
> - * @dd  Pointer to the driver data structure.
> - * @tag Slot tag
> - *
> - * return value
> - *      None
> - */
> -static void mtip_hw_release_scatterlist(struct driver_data *dd, int tag,
> -								int unaligned)
> -{
> -	struct semaphore *sem = unaligned ? &dd->port->cmd_slot_unal :
> -							&dd->port->cmd_slot;
> -	release_slot(dd->port, tag);
> -	up(sem);
> -}
> -
> -/*
>   * Obtain a command slot and return its associated scatter list.
>   *
>   * @dd  Pointer to the driver data structure.
> @@ -4016,21 +3998,22 @@ static void mtip_make_request(struct request_queue *queue, struct bio *bio)
>  
>  	sg = mtip_hw_get_scatterlist(dd, &tag, unaligned);
>  	if (likely(sg != NULL)) {
> -		if (unlikely((bio)->bi_vcnt > MTIP_MAX_SG)) {
> -			dev_warn(&dd->pdev->dev,
> -				"Maximum number of SGL entries exceeded\n");
> -			bio_io_error(bio);
> -			mtip_hw_release_scatterlist(dd, tag, unaligned);
> -			return;
> -		}
> -
>  		/* Create the scatter list for this bio. */
>  		bio_for_each_segment(bvec, bio, iter) {
> -			sg_set_page(&sg[nents],
> -					bvec.bv_page,
> -					bvec.bv_len,
> -					bvec.bv_offset);
> -			nents++;
> +			if (unlikely(nents == MTIP_MAX_SG)) {
> +				struct bio *split = bio_clone(bio, GFP_NOIO);
> +

Need to check for memory allocation failure here.  Thanks,

Josef
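
For reference, a minimal version of that check might be the following (a
sketch only - the real fix also has to release the command slot taken earlier
in mtip_make_request()):

	struct bio *split = bio_clone(bio, GFP_NOIO);

	if (!split) {
		bio_io_error(bio);
		return;
	}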

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/9] block: Convert various code to bio_for_each_segment()
  2013-11-04 23:36 ` [PATCH 1/9] block: Convert various code to bio_for_each_segment() Kent Overstreet
@ 2013-11-05 13:53   ` Josef Bacik
  2013-11-07 11:26   ` Jan Kara
  1 sibling, 0 replies; 16+ messages in thread
From: Josef Bacik @ 2013-11-05 13:53 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: linux-kernel, linux-fsdevel, linux-btrfs, axboe, hch,
	Alexander Viro, Chris Mason, Jaegeuk Kim, Joern Engel,
	Prasad Joshi, Trond Myklebust

On Mon, Nov 04, 2013 at 03:36:19PM -0800, Kent Overstreet wrote:
> With immutable biovecs we don't want code accessing bi_io_vec directly -
> the uses this patch changes weren't incorrect since they all own the
> bio, but it makes the code harder to audit for no good reason - also,
> this will help with multipage bvecs later.
> 

The btrfs parts of this look good to me; you can add

Reviewed-by: Josef Bacik <jbacik@fusionio.com>

Thanks for this,

Josef

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/9] block: Convert various code to bio_for_each_segment()
  2013-11-04 23:36 ` [PATCH 1/9] block: Convert various code to bio_for_each_segment() Kent Overstreet
  2013-11-05 13:53   ` Josef Bacik
@ 2013-11-07 11:26   ` Jan Kara
  2013-11-07 20:21     ` Kent Overstreet
  1 sibling, 1 reply; 16+ messages in thread
From: Jan Kara @ 2013-11-07 11:26 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: linux-kernel, linux-fsdevel, linux-btrfs, axboe, hch,
	Alexander Viro, Chris Mason, Jaegeuk Kim, Joern Engel,
	Prasad Joshi, Trond Myklebust

On Mon 04-11-13 15:36:19, Kent Overstreet wrote:
> With immutable biovecs we don't want code accessing bi_io_vec directly -
> the uses this patch changes weren't incorrect since they all own the
> bio, but it makes the code harder to audit for no good reason - also,
> this will help with multipage bvecs later.
  I think you've missed the code in fs/ext4/page-io.c in the conversion
(likely because it was added relatively recently).

								Honza
> 
> Signed-off-by: Kent Overstreet <kmo@daterainc.com>
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Cc: Chris Mason <chris.mason@fusionio.com>
> Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
> Cc: Joern Engel <joern@logfs.org>
> Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
> Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
> ---
>  fs/btrfs/compression.c           | 10 ++++------
>  fs/btrfs/disk-io.c               | 11 ++++-------
>  fs/btrfs/extent_io.c             | 37 ++++++++++++++-----------------------
>  fs/btrfs/inode.c                 | 15 ++++++---------
>  fs/f2fs/data.c                   | 13 +++++--------
>  fs/f2fs/segment.c                | 12 +++++-------
>  fs/logfs/dev_bdev.c              | 18 +++++++-----------
>  fs/mpage.c                       | 17 ++++++++---------
>  fs/nfs/blocklayout/blocklayout.c | 34 +++++++++++++---------------------
>  9 files changed, 66 insertions(+), 101 deletions(-)
> 
> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
> index 06ab821..52e7848 100644
> --- a/fs/btrfs/compression.c
> +++ b/fs/btrfs/compression.c
> @@ -203,18 +203,16 @@ csum_failed:
>  	if (cb->errors) {
>  		bio_io_error(cb->orig_bio);
>  	} else {
> -		int bio_index = 0;
> -		struct bio_vec *bvec = cb->orig_bio->bi_io_vec;
> +		int i;
> +		struct bio_vec *bvec;
>  
>  		/*
>  		 * we have verified the checksum already, set page
>  		 * checked so the end_io handlers know about it
>  		 */
> -		while (bio_index < cb->orig_bio->bi_vcnt) {
> +		bio_for_each_segment_all(bvec, cb->orig_bio, i)
>  			SetPageChecked(bvec->bv_page);
> -			bvec++;
> -			bio_index++;
> -		}
> +
>  		bio_endio(cb->orig_bio, 0);
>  	}
>  
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 62176ad..733182e 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -850,20 +850,17 @@ int btrfs_wq_submit_bio(struct btrfs_fs_info *fs_info, struct inode *inode,
>  
>  static int btree_csum_one_bio(struct bio *bio)
>  {
> -	struct bio_vec *bvec = bio->bi_io_vec;
> -	int bio_index = 0;
> +	struct bio_vec *bvec;
>  	struct btrfs_root *root;
> -	int ret = 0;
> +	int i, ret = 0;
>  
> -	WARN_ON(bio->bi_vcnt <= 0);
> -	while (bio_index < bio->bi_vcnt) {
> +	bio_for_each_segment_all(bvec, bio, i) {
>  		root = BTRFS_I(bvec->bv_page->mapping->host)->root;
>  		ret = csum_dirty_buffer(root, bvec->bv_page);
>  		if (ret)
>  			break;
> -		bio_index++;
> -		bvec++;
>  	}
> +
>  	return ret;
>  }
>  
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 0df176a..ea5a08b 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -2014,7 +2014,7 @@ int repair_io_failure(struct btrfs_fs_info *fs_info, u64 start,
>  	}
>  	bio->bi_bdev = dev->bdev;
>  	bio_add_page(bio, page, length, start - page_offset(page));
> -	btrfsic_submit_bio(WRITE_SYNC, bio);
> +	btrfsic_submit_bio(WRITE_SYNC, bio); /* XXX: submit_bio_wait() */
>  	wait_for_completion(&compl);
>  
>  	if (!test_bit(BIO_UPTODATE, &bio->bi_flags)) {
> @@ -2340,12 +2340,13 @@ int end_extent_writepage(struct page *page, int err, u64 start, u64 end)
>   */
>  static void end_bio_extent_writepage(struct bio *bio, int err)
>  {
> -	struct bio_vec *bvec = bio->bi_io_vec + bio->bi_vcnt - 1;
> +	struct bio_vec *bvec;
>  	struct extent_io_tree *tree;
>  	u64 start;
>  	u64 end;
> +	int i;
>  
> -	do {
> +	bio_for_each_segment_all(bvec, bio, i) {
>  		struct page *page = bvec->bv_page;
>  		tree = &BTRFS_I(page->mapping->host)->io_tree;
>  
> @@ -2363,14 +2364,11 @@ static void end_bio_extent_writepage(struct bio *bio, int err)
>  		start = page_offset(page);
>  		end = start + bvec->bv_offset + bvec->bv_len - 1;
>  
> -		if (--bvec >= bio->bi_io_vec)
> -			prefetchw(&bvec->bv_page->flags);
> -
>  		if (end_extent_writepage(page, err, start, end))
>  			continue;
>  
>  		end_page_writeback(page);
> -	} while (bvec >= bio->bi_io_vec);
> +	}
>  
>  	bio_put(bio);
>  }
> @@ -2400,9 +2398,8 @@ endio_readpage_release_extent(struct extent_io_tree *tree, u64 start, u64 len,
>   */
>  static void end_bio_extent_readpage(struct bio *bio, int err)
>  {
> +	struct bio_vec *bvec;
>  	int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
> -	struct bio_vec *bvec_end = bio->bi_io_vec + bio->bi_vcnt - 1;
> -	struct bio_vec *bvec = bio->bi_io_vec;
>  	struct btrfs_io_bio *io_bio = btrfs_io_bio(bio);
>  	struct extent_io_tree *tree;
>  	u64 offset = 0;
> @@ -2413,11 +2410,12 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
>  	u64 extent_len = 0;
>  	int mirror;
>  	int ret;
> +	int i;
>  
>  	if (err)
>  		uptodate = 0;
>  
> -	do {
> +	bio_for_each_segment_all(bvec, bio, i) {
>  		struct page *page = bvec->bv_page;
>  		struct inode *inode = page->mapping->host;
>  
> @@ -2441,9 +2439,6 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
>  		end = start + bvec->bv_offset + bvec->bv_len - 1;
>  		len = bvec->bv_len;
>  
> -		if (++bvec <= bvec_end)
> -			prefetchw(&bvec->bv_page->flags);
> -
>  		mirror = io_bio->mirror_num;
>  		if (likely(uptodate && tree->ops &&
>  			   tree->ops->readpage_end_io_hook)) {
> @@ -2524,7 +2519,7 @@ readpage_ok:
>  			extent_start = start;
>  			extent_len = end + 1 - start;
>  		}
> -	} while (bvec <= bvec_end);
> +	}
>  
>  	if (extent_len)
>  		endio_readpage_release_extent(tree, extent_start, extent_len,
> @@ -2555,7 +2550,6 @@ btrfs_bio_alloc(struct block_device *bdev, u64 first_sector, int nr_vecs,
>  	}
>  
>  	if (bio) {
> -		bio->bi_iter.bi_size = 0;
>  		bio->bi_bdev = bdev;
>  		bio->bi_iter.bi_sector = first_sector;
>  		btrfs_bio = btrfs_io_bio(bio);
> @@ -3418,20 +3412,18 @@ static void end_extent_buffer_writeback(struct extent_buffer *eb)
>  
>  static void end_bio_extent_buffer_writepage(struct bio *bio, int err)
>  {
> -	int uptodate = err == 0;
> -	struct bio_vec *bvec = bio->bi_io_vec + bio->bi_vcnt - 1;
> +	struct bio_vec *bvec;
>  	struct extent_buffer *eb;
> -	int done;
> +	int i, done;
>  
> -	do {
> +	bio_for_each_segment_all(bvec, bio, i) {
>  		struct page *page = bvec->bv_page;
>  
> -		bvec--;
>  		eb = (struct extent_buffer *)page->private;
>  		BUG_ON(!eb);
>  		done = atomic_dec_and_test(&eb->io_pages);
>  
> -		if (!uptodate || test_bit(EXTENT_BUFFER_IOERR, &eb->bflags)) {
> +		if (err || test_bit(EXTENT_BUFFER_IOERR, &eb->bflags)) {
>  			set_bit(EXTENT_BUFFER_IOERR, &eb->bflags);
>  			ClearPageUptodate(page);
>  			SetPageError(page);
> @@ -3443,10 +3435,9 @@ static void end_bio_extent_buffer_writepage(struct bio *bio, int err)
>  			continue;
>  
>  		end_extent_buffer_writeback(eb);
> -	} while (bvec >= bio->bi_io_vec);
> +	}
>  
>  	bio_put(bio);
> -
>  }
>  
>  static int write_one_eb(struct extent_buffer *eb,
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 6f5a64d..b7209a6 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -6765,17 +6765,16 @@ unlock_err:
>  static void btrfs_endio_direct_read(struct bio *bio, int err)
>  {
>  	struct btrfs_dio_private *dip = bio->bi_private;
> -	struct bio_vec *bvec_end = bio->bi_io_vec + bio->bi_vcnt - 1;
> -	struct bio_vec *bvec = bio->bi_io_vec;
> +	struct bio_vec *bvec;
>  	struct inode *inode = dip->inode;
>  	struct btrfs_root *root = BTRFS_I(inode)->root;
>  	struct bio *dio_bio;
>  	u32 *csums = (u32 *)dip->csum;
> -	int index = 0;
>  	u64 start;
> +	int i;
>  
>  	start = dip->logical_offset;
> -	do {
> +	bio_for_each_segment_all(bvec, bio, i) {
>  		if (!(BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM)) {
>  			struct page *page = bvec->bv_page;
>  			char *kaddr;
> @@ -6791,18 +6790,16 @@ static void btrfs_endio_direct_read(struct bio *bio, int err)
>  			local_irq_restore(flags);
>  
>  			flush_dcache_page(bvec->bv_page);
> -			if (csum != csums[index]) {
> +			if (csum != csums[i]) {
>  				btrfs_err(root->fs_info, "csum failed ino %llu off %llu csum %u expected csum %u",
>  					  btrfs_ino(inode), start, csum,
> -					  csums[index]);
> +					  csums[i]);
>  				err = -EIO;
>  			}
>  		}
>  
>  		start += bvec->bv_len;
> -		bvec++;
> -		index++;
> -	} while (bvec <= bvec_end);
> +	}
>  
>  	unlock_extent(&BTRFS_I(inode)->io_tree, dip->logical_offset,
>  		      dip->logical_offset + dip->bytes - 1);
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index 97d8b34..dd02271 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -357,23 +357,20 @@ repeat:
>  
>  static void read_end_io(struct bio *bio, int err)
>  {
> -	const int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
> -	struct bio_vec *bvec = bio->bi_io_vec + bio->bi_vcnt - 1;
> +	struct bio_vec *bvec;
> +	int i;
>  
> -	do {
> +	bio_for_each_segment_all(bvec, bio, i) {
>  		struct page *page = bvec->bv_page;
>  
> -		if (--bvec >= bio->bi_io_vec)
> -			prefetchw(&bvec->bv_page->flags);
> -
> -		if (uptodate) {
> +		if (!err) {
>  			SetPageUptodate(page);
>  		} else {
>  			ClearPageUptodate(page);
>  			SetPageError(page);
>  		}
>  		unlock_page(page);
> -	} while (bvec >= bio->bi_io_vec);
> +	}
>  	bio_put(bio);
>  }
>  
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index 9d77ce1..4382c90 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -575,16 +575,14 @@ static const struct segment_allocation default_salloc_ops = {
>  
>  static void f2fs_end_io_write(struct bio *bio, int err)
>  {
> -	const int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
> -	struct bio_vec *bvec = bio->bi_io_vec + bio->bi_vcnt - 1;
>  	struct bio_private *p = bio->bi_private;
> +	struct bio_vec *bvec;
> +	int i;
>  
> -	do {
> +	bio_for_each_segment_all(bvec, bio, i) {
>  		struct page *page = bvec->bv_page;
>  
> -		if (--bvec >= bio->bi_io_vec)
> -			prefetchw(&bvec->bv_page->flags);
> -		if (!uptodate) {
> +		if (err) {
>  			SetPageError(page);
>  			if (page->mapping)
>  				set_bit(AS_EIO, &page->mapping->flags);
> @@ -593,7 +591,7 @@ static void f2fs_end_io_write(struct bio *bio, int err)
>  		}
>  		end_page_writeback(page);
>  		dec_page_count(p->sbi, F2FS_WRITEBACK);
> -	} while (bvec >= bio->bi_io_vec);
> +	}
>  
>  	if (p->is_sync)
>  		complete(p->wait);
> diff --git a/fs/logfs/dev_bdev.c b/fs/logfs/dev_bdev.c
> index a1b161f..ca42715 100644
> --- a/fs/logfs/dev_bdev.c
> +++ b/fs/logfs/dev_bdev.c
> @@ -67,22 +67,18 @@ static DECLARE_WAIT_QUEUE_HEAD(wq);
>  static void writeseg_end_io(struct bio *bio, int err)
>  {
>  	const int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
> -	struct bio_vec *bvec = bio->bi_io_vec + bio->bi_vcnt - 1;
> +	struct bio_vec *bvec;
> +	int i;
>  	struct super_block *sb = bio->bi_private;
>  	struct logfs_super *super = logfs_super(sb);
> -	struct page *page;
>  
>  	BUG_ON(!uptodate); /* FIXME: Retry io or write elsewhere */
>  	BUG_ON(err);
> -	BUG_ON(bio->bi_vcnt == 0);
> -	do {
> -		page = bvec->bv_page;
> -		if (--bvec >= bio->bi_io_vec)
> -			prefetchw(&bvec->bv_page->flags);
> -
> -		end_page_writeback(page);
> -		page_cache_release(page);
> -	} while (bvec >= bio->bi_io_vec);
> +
> +	bio_for_each_segment_all(bvec, bio, i) {
> +		end_page_writeback(bvec->bv_page);
> +		page_cache_release(bvec->bv_page);
> +	}
>  	bio_put(bio);
>  	if (atomic_dec_and_test(&super->s_pending_writes))
>  		wake_up(&wq);
> diff --git a/fs/mpage.c b/fs/mpage.c
> index 92b125f..4979ffa 100644
> --- a/fs/mpage.c
> +++ b/fs/mpage.c
> @@ -43,16 +43,14 @@
>   */
>  static void mpage_end_io(struct bio *bio, int err)
>  {
> -	const int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
> -	struct bio_vec *bvec = bio->bi_io_vec + bio->bi_vcnt - 1;
> +	struct bio_vec *bv;
> +	int i;
>  
> -	do {
> -		struct page *page = bvec->bv_page;
> +	bio_for_each_segment_all(bv, bio, i) {
> +		struct page *page = bv->bv_page;
>  
> -		if (--bvec >= bio->bi_io_vec)
> -			prefetchw(&bvec->bv_page->flags);
>  		if (bio_data_dir(bio) == READ) {
> -			if (uptodate) {
> +			if (!err) {
>  				SetPageUptodate(page);
>  			} else {
>  				ClearPageUptodate(page);
> @@ -60,14 +58,15 @@ static void mpage_end_io(struct bio *bio, int err)
>  			}
>  			unlock_page(page);
>  		} else { /* bio_data_dir(bio) == WRITE */
> -			if (!uptodate) {
> +			if (err) {
>  				SetPageError(page);
>  				if (page->mapping)
>  					set_bit(AS_EIO, &page->mapping->flags);
>  			}
>  			end_page_writeback(page);
>  		}
> -	} while (bvec >= bio->bi_io_vec);
> +	}
> +
>  	bio_put(bio);
>  }
>  
> diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
> index af73896..56ff823 100644
> --- a/fs/nfs/blocklayout/blocklayout.c
> +++ b/fs/nfs/blocklayout/blocklayout.c
> @@ -202,18 +202,14 @@ static struct bio *bl_add_page_to_bio(struct bio *bio, int npg, int rw,
>  static void bl_end_io_read(struct bio *bio, int err)
>  {
>  	struct parallel_io *par = bio->bi_private;
> -	const int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
> -	struct bio_vec *bvec = bio->bi_io_vec + bio->bi_vcnt - 1;
> +	struct bio_vec *bvec;
> +	int i;
>  
> -	do {
> -		struct page *page = bvec->bv_page;
> +	if (!err)
> +		bio_for_each_segment_all(bvec, bio, i)
> +			SetPageUptodate(bvec->bv_page);
>  
> -		if (--bvec >= bio->bi_io_vec)
> -			prefetchw(&bvec->bv_page->flags);
> -		if (uptodate)
> -			SetPageUptodate(page);
> -	} while (bvec >= bio->bi_io_vec);
> -	if (!uptodate) {
> +	if (err) {
>  		struct nfs_read_data *rdata = par->data;
>  		struct nfs_pgio_header *header = rdata->header;
>  
> @@ -384,20 +380,16 @@ static void mark_extents_written(struct pnfs_block_layout *bl,
>  static void bl_end_io_write_zero(struct bio *bio, int err)
>  {
>  	struct parallel_io *par = bio->bi_private;
> -	const int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
> -	struct bio_vec *bvec = bio->bi_io_vec + bio->bi_vcnt - 1;
> -
> -	do {
> -		struct page *page = bvec->bv_page;
> +	struct bio_vec *bvec;
> +	int i;
>  
> -		if (--bvec >= bio->bi_io_vec)
> -			prefetchw(&bvec->bv_page->flags);
> +	bio_for_each_segment_all(bvec, bio, i) {
>  		/* This is the zeroing page we added */
> -		end_page_writeback(page);
> -		page_cache_release(page);
> -	} while (bvec >= bio->bi_io_vec);
> +		end_page_writeback(bvec->bv_page);
> +		page_cache_release(bvec->bv_page);
> +	}
>  
> -	if (unlikely(!uptodate)) {
> +	if (unlikely(err)) {
>  		struct nfs_write_data *data = par->data;
>  		struct nfs_pgio_header *header = data->header;
>  
> -- 
> 1.8.4.rc3
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/9] block: Convert various code to bio_for_each_segment()
  2013-11-07 11:26   ` Jan Kara
@ 2013-11-07 20:21     ` Kent Overstreet
  0 siblings, 0 replies; 16+ messages in thread
From: Kent Overstreet @ 2013-11-07 20:21 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-kernel, linux-fsdevel, linux-btrfs, axboe, hch,
	Alexander Viro, Chris Mason, Jaegeuk Kim, Joern Engel,
	Prasad Joshi, Trond Myklebust

On Thu, Nov 07, 2013 at 12:26:30PM +0100, Jan Kara wrote:
> On Mon 04-11-13 15:36:19, Kent Overstreet wrote:
> > With immutable biovecs we don't want code accessing bi_io_vec directly -
> > the uses this patch changes weren't incorrect since they all own the
> > bio, but it makes the code harder to audit for no good reason - also,
> > this will help with multipage bvecs later.
>   I think you've missed the code in fs/ext4/page-io.c in the conversion
> (likely because it was added relatively recently).

Thanks!

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread

Thread overview: 16+ messages
2013-11-04 23:36 [PATCH] Block layer stuff/DIO rewrite prep for 3.14 Kent Overstreet
2013-11-04 23:36 ` [PATCH 1/9] block: Convert various code to bio_for_each_segment() Kent Overstreet
2013-11-05 13:53   ` Josef Bacik
2013-11-07 11:26   ` Jan Kara
2013-11-07 20:21     ` Kent Overstreet
2013-11-04 23:36 ` [PATCH 2/9] block: submit_bio_wait() conversions Kent Overstreet
2013-11-04 23:36 ` [PATCH 3/9] block: Move bouncing to generic_make_request() Kent Overstreet
2013-11-04 23:36 ` [PATCH 4/9] block: Make generic_make_request handle arbitrary sized bios Kent Overstreet
2013-11-04 23:56   ` [dm-devel] " Mike Christie
2013-11-05  0:55     ` Kent Overstreet
2013-11-04 23:36 ` [PATCH 5/9] block: Gut bio_add_page(), kill bio_add_pc_page() Kent Overstreet
2013-11-04 23:36 ` [PATCH 6/9] mtip32xx: handle arbitrary size bios Kent Overstreet
2013-11-05 13:49   ` Josef Bacik
2013-11-04 23:36 ` [PATCH 7/9] blk-lib.c: generic_make_request() handles large bios now Kent Overstreet
2013-11-04 23:36 ` [PATCH 8/9] bcache: " Kent Overstreet
2013-11-04 23:36 ` [PATCH 9/9] block: Add bio_get_user_pages() Kent Overstreet
