* [PATCH 0/6 RFC] utilize bio_clone_fast to clean up
@ 2017-04-18  1:16 Liu Bo
  2017-04-18  1:16 ` [PATCH 1/6] Btrfs: use bio_clone_fast to clone our bio Liu Bo
                   ` (6 more replies)
  0 siblings, 7 replies; 18+ messages in thread
From: Liu Bo @ 2017-04-18  1:16 UTC (permalink / raw)
  To: linux-btrfs

This series uses bio_clone_fast() in the places where we clone a bio,
such as when a bio is cloned for multiple disks and when a bio is split
during dio submit.

One benefit is that dio submit gets simpler, since it no longer adds pages
with bio_add_page() one by one.

Another benefit is that, compared to bio_clone_bioset(), bio_clone_fast()
is faster because it copies the vector pointer directly.  However,
bio_clone_fast() doesn't modify bi_vcnt, so the extra work is to fix up
our current bi_vcnt users to iterate bvecs with bi_iter instead.

Liu Bo (6):
  Btrfs: use bio_clone_fast to clone our bio
  Btrfs: use bio_clone_bioset_partial to simplify DIO submit
  Btrfs: change how we iterate bios in endio
  Btrfs: record error if one block has failed to retry
  Btrfs: change check-integrity to use bvec_iter
  Btrfs: unify naming of btrfs_io_bio

 fs/btrfs/check-integrity.c |  27 +++---
 fs/btrfs/extent_io.c       |  18 +++-
 fs/btrfs/extent_io.h       |   1 +
 fs/btrfs/file-item.c       |  31 ++++---
 fs/btrfs/inode.c           | 203 ++++++++++++++++++++-------------------------
 fs/btrfs/volumes.h         |   1 +
 6 files changed, 138 insertions(+), 143 deletions(-)

-- 
2.5.5
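
The difference between the two clone styles described in the cover letter
can be modelled in userspace.  This is only a sketch under stated
assumptions: toy_vec, toy_iter, toy_bio and the toy_* functions are
hypothetical stand-ins, not the kernel's struct bio API, and leaving vcnt
at 0 in the shallow clone is just one way to represent "bi_vcnt is not
reliable".

```c
#include <stdlib.h>
#include <string.h>

/* Toy stand-ins for struct bio_vec, struct bvec_iter and struct bio. */
struct toy_vec {
	unsigned int len;	/* bytes covered by this entry */
};

struct toy_iter {
	unsigned int idx;	/* index of the current entry */
	unsigned int size;	/* bytes still to be processed */
};

struct toy_bio {
	struct toy_vec *vecs;	/* vector table */
	unsigned int vcnt;	/* entries this bio itself filled in */
	struct toy_iter iter;
};

/* Deep clone: allocate a new table and copy every entry. */
static int toy_clone_deep(struct toy_bio *dst, const struct toy_bio *src)
{
	dst->vecs = malloc(src->vcnt * sizeof(*src->vecs));
	if (!dst->vecs)
		return -1;
	memcpy(dst->vecs, src->vecs, src->vcnt * sizeof(*src->vecs));
	dst->vcnt = src->vcnt;
	dst->iter = src->iter;
	return 0;
}

/* Shallow clone: share the table and copy only the iterator. */
static void toy_clone_fast(struct toy_bio *dst, const struct toy_bio *src)
{
	dst->vecs = src->vecs;
	dst->vcnt = 0;		/* deliberately not a usable count */
	dst->iter = src->iter;
}

/* Count segments by walking the iterator; assumes non-zero entry lengths. */
static unsigned int toy_segments(const struct toy_bio *bio)
{
	struct toy_iter it = bio->iter;
	unsigned int n = 0;

	while (it.size) {
		unsigned int len = bio->vecs[it.idx].len;

		if (len > it.size)
			len = it.size;
		it.size -= len;
		it.idx++;
		n++;
	}
	return n;
}
```

In this model a caller that sizes an array from vcnt gets 0 for a shallow
clone, while toy_segments() still reports the right number of entries,
which is why the later patches switch from bi_vcnt to iterator-based
walks.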



* [PATCH 1/6] Btrfs: use bio_clone_fast to clone our bio
  2017-04-18  1:16 [PATCH 0/6 RFC] utilize bio_clone_fast to clean up Liu Bo
@ 2017-04-18  1:16 ` Liu Bo
  2017-05-17 17:53   ` David Sterba
  2017-04-18  1:16 ` [PATCH 2/6] Btrfs: use bio_clone_bioset_partial to simplify DIO submit Liu Bo
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 18+ messages in thread
From: Liu Bo @ 2017-04-18  1:16 UTC (permalink / raw)
  To: linux-btrfs

For raid1 and raid10, we clone the original bio to the bios which are then
sent to different disks.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/extent_io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 27fdb25..0d4aea4 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2700,7 +2700,7 @@ struct bio *btrfs_bio_clone(struct bio *bio, gfp_t gfp_mask)
 	struct btrfs_io_bio *btrfs_bio;
 	struct bio *new;
 
-	new = bio_clone_bioset(bio, gfp_mask, btrfs_bioset);
+	new = bio_clone_fast(bio, gfp_mask, btrfs_bioset);
 	if (new) {
 		btrfs_bio = btrfs_io_bio(new);
 		btrfs_bio->csum = NULL;
-- 
2.5.5



* [PATCH 2/6] Btrfs: use bio_clone_bioset_partial to simplify DIO submit
  2017-04-18  1:16 [PATCH 0/6 RFC] utilize bio_clone_fast to clean up Liu Bo
  2017-04-18  1:16 ` [PATCH 1/6] Btrfs: use bio_clone_fast to clone our bio Liu Bo
@ 2017-04-18  1:16 ` Liu Bo
  2017-05-11 14:16   ` David Sterba
  2017-05-16 14:37   ` Christoph Hellwig
  2017-04-18  1:16 ` [PATCH 3/6] Btrfs: change how we iterate bios in endio Liu Bo
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 18+ messages in thread
From: Liu Bo @ 2017-04-18  1:16 UTC (permalink / raw)
  To: linux-btrfs

Currently, when mapping a bio to limit it to a single stripe length, we
split the bio by adding pages to a new bio one by one.  Since we never
modify the bio vector afterwards, we can use bio_clone_fast() to reuse the
original bio vector directly.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/extent_io.c |  15 +++++++
 fs/btrfs/extent_io.h |   1 +
 fs/btrfs/inode.c     | 122 +++++++++++++++++++--------------------------------
 3 files changed, 62 insertions(+), 76 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 0d4aea4..1b7156c 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2726,6 +2726,21 @@ struct bio *btrfs_io_bio_alloc(gfp_t gfp_mask, unsigned int nr_iovecs)
 	return bio;
 }
 
+struct bio *btrfs_bio_clone_partial(struct bio *orig, gfp_t gfp_mask, int offset, int size)
+{
+	struct bio *bio;
+
+	bio = bio_clone_fast(orig, gfp_mask, btrfs_bioset);
+	if (bio) {
+		struct btrfs_io_bio *btrfs_bio = btrfs_io_bio(bio);
+		btrfs_bio->csum = NULL;
+		btrfs_bio->csum_allocated = NULL;
+		btrfs_bio->end_io = NULL;
+
+		bio_trim(bio, (offset >> 9), (size >> 9));
+	}
+	return bio;
+}
 
 static int __must_check submit_one_bio(struct bio *bio, int mirror_num,
 				       unsigned long bio_flags)
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 3e4fad4..3b2bc88 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -460,6 +460,7 @@ btrfs_bio_alloc(struct block_device *bdev, u64 first_sector, int nr_vecs,
 		gfp_t gfp_flags);
 struct bio *btrfs_io_bio_alloc(gfp_t gfp_mask, unsigned int nr_iovecs);
 struct bio *btrfs_bio_clone(struct bio *bio, gfp_t gfp_mask);
+struct bio *btrfs_bio_clone_partial(struct bio *orig, gfp_t gfp_mask, int offset, int size);
 
 struct btrfs_fs_info;
 struct btrfs_inode;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index a18510b..6215720 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8230,16 +8230,6 @@ static void btrfs_end_dio_bio(struct bio *bio)
 	bio_put(bio);
 }
 
-static struct bio *btrfs_dio_bio_alloc(struct block_device *bdev,
-				       u64 first_sector, gfp_t gfp_flags)
-{
-	struct bio *bio;
-	bio = btrfs_bio_alloc(bdev, first_sector, BIO_MAX_PAGES, gfp_flags);
-	if (bio)
-		bio_associate_current(bio);
-	return bio;
-}
-
 static inline int btrfs_lookup_and_bind_dio_csum(struct inode *inode,
 						 struct btrfs_dio_private *dip,
 						 struct bio *bio,
@@ -8329,24 +8319,22 @@ static int btrfs_submit_direct_hook(struct btrfs_dio_private *dip,
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 	struct bio *bio;
 	struct bio *orig_bio = dip->orig_bio;
-	struct bio_vec *bvec;
 	u64 start_sector = orig_bio->bi_iter.bi_sector;
 	u64 file_offset = dip->logical_offset;
-	u64 submit_len = 0;
 	u64 map_length;
-	u32 blocksize = fs_info->sectorsize;
 	int async_submit = 0;
-	int nr_sectors;
+	int submit_len;
+	int clone_offset = 0;
+	int clone_len;
 	int ret;
-	int i, j;
 
-	map_length = orig_bio->bi_iter.bi_size;
+	submit_len = map_length = orig_bio->bi_iter.bi_size;
 	ret = btrfs_map_block(fs_info, btrfs_op(orig_bio), start_sector << 9,
 			      &map_length, NULL, 0);
 	if (ret)
 		return -EIO;
 
-	if (map_length >= orig_bio->bi_iter.bi_size) {
+	if (map_length >= submit_len) {
 		bio = orig_bio;
 		dip->flags |= BTRFS_DIO_ORIG_BIO_SUBMITTED;
 		goto submit;
@@ -8358,70 +8346,52 @@ static int btrfs_submit_direct_hook(struct btrfs_dio_private *dip,
 	else
 		async_submit = 1;
 
-	bio = btrfs_dio_bio_alloc(orig_bio->bi_bdev, start_sector, GFP_NOFS);
-	if (!bio)
-		return -ENOMEM;
-
-	bio->bi_opf = orig_bio->bi_opf;
-	bio->bi_private = dip;
-	bio->bi_end_io = btrfs_end_dio_bio;
-	btrfs_io_bio(bio)->logical = file_offset;
+	/* bio split */
 	atomic_inc(&dip->pending_bios);
+	while (submit_len > 0) {
+		/* map_length < submit_len, so it fits in an int */
+		clone_len = min(submit_len, (int)map_length);
+		bio = btrfs_bio_clone_partial(orig_bio, GFP_NOFS, clone_offset, clone_len);
+		if (!bio)
+			goto out_err;
+		/* the above clone call also clones the blkcg of orig_bio */
+
+		bio->bi_private = dip;
+		bio->bi_end_io = btrfs_end_dio_bio;
+		btrfs_io_bio(bio)->logical = file_offset;
+
+		ASSERT(submit_len >= clone_len);
+		submit_len -= clone_len;
+		if (submit_len == 0)
+			break;
 
-	bio_for_each_segment_all(bvec, orig_bio, j) {
-		nr_sectors = BTRFS_BYTES_TO_BLKS(fs_info, bvec->bv_len);
-		i = 0;
-next_block:
-		if (unlikely(map_length < submit_len + blocksize ||
-		    bio_add_page(bio, bvec->bv_page, blocksize,
-			    bvec->bv_offset + (i * blocksize)) < blocksize)) {
-			/*
-			 * inc the count before we submit the bio so
-			 * we know the end IO handler won't happen before
-			 * we inc the count. Otherwise, the dip might get freed
-			 * before we're done setting it up
-			 */
-			atomic_inc(&dip->pending_bios);
-			ret = __btrfs_submit_dio_bio(bio, inode,
-						     file_offset, skip_sum,
-						     async_submit);
-			if (ret) {
-				bio_put(bio);
-				atomic_dec(&dip->pending_bios);
-				goto out_err;
-			}
-
-			start_sector += submit_len >> 9;
-			file_offset += submit_len;
-
-			submit_len = 0;
+		/*
+		 * increase the count before we submit the bio so we know the
+		 * end IO handler won't happen before we increase the
+		 * count. Otherwise, the dip might get freed before we're done
+		 * setting it up.
+		 */
+		atomic_inc(&dip->pending_bios);
 
-			bio = btrfs_dio_bio_alloc(orig_bio->bi_bdev,
-						  start_sector, GFP_NOFS);
-			if (!bio)
-				goto out_err;
-			bio->bi_opf = orig_bio->bi_opf;
-			bio->bi_private = dip;
-			bio->bi_end_io = btrfs_end_dio_bio;
-			btrfs_io_bio(bio)->logical = file_offset;
+		ret = __btrfs_submit_dio_bio(bio, inode,
+					     file_offset, skip_sum,
+					     async_submit);
+		if (ret) {
+			bio_put(bio);
+			atomic_dec(&dip->pending_bios);
+			goto out_err;
+		}
 
-			map_length = orig_bio->bi_iter.bi_size;
-			ret = btrfs_map_block(fs_info, btrfs_op(orig_bio),
-					      start_sector << 9,
-					      &map_length, NULL, 0);
-			if (ret) {
-				bio_put(bio);
-				goto out_err;
-			}
+		clone_offset += clone_len;
+		start_sector += clone_len >> 9;
+		file_offset += clone_len;
 
-			goto next_block;
-		} else {
-			submit_len += blocksize;
-			if (--nr_sectors) {
-				i++;
-				goto next_block;
-			}
-		}
+		map_length = submit_len;
+		ret = btrfs_map_block(fs_info, btrfs_op(orig_bio),
+				      (start_sector << 9),
+				      &map_length, NULL, 0);
+		if (ret)
+			goto out_err;
 	}
 
 submit:
-- 
2.5.5
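
The split loop in the patch above can be followed with a small userspace
model.  This is a sketch only: toy_map_length() is a hypothetical
stand-in for btrfs_map_block() that assumes fixed-size stripes, and
struct chunk just records the (start, clone_offset, clone_len) triple that
the real loop would pass to btrfs_bio_clone_partial().

```c
#include <stddef.h>
#include <stdint.h>

/* One piece of the original request, in bytes. */
struct chunk {
	uint64_t start;		/* logical start of this piece */
	uint32_t offset;	/* offset into the original bio */
	uint32_t len;		/* length of this piece */
};

/*
 * Hypothetical stand-in for btrfs_map_block(): return how many of the
 * 'want' bytes starting at 'pos' fit before the next stripe boundary.
 * stripe_len must be non-zero.
 */
static uint64_t toy_map_length(uint64_t pos, uint64_t want, uint64_t stripe_len)
{
	uint64_t left = stripe_len - (pos % stripe_len);

	return want < left ? want : left;
}

/* Split 'total' bytes at stripe boundaries; returns the number of chunks. */
static size_t toy_split(uint64_t start, uint32_t total, uint64_t stripe_len,
			struct chunk *out, size_t max_out)
{
	uint32_t offset = 0;
	uint32_t remaining = total;
	size_t n = 0;

	while (remaining > 0 && n < max_out) {
		/* Never larger than 'remaining', so it fits in 32 bits. */
		uint32_t len = (uint32_t)toy_map_length(start, remaining,
							stripe_len);

		out[n].start = start;
		out[n].offset = offset;
		out[n].len = len;
		n++;

		offset += len;
		start += len;
		remaining -= len;
	}
	return n;
}
```

For example, a 136 KiB request that starts 4 KiB before a 64 KiB stripe
boundary is cut into pieces of 4 KiB, 64 KiB, 64 KiB and 4 KiB, each
described only by an offset and a length into the shared vector.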



* [PATCH 3/6] Btrfs: change how we iterate bios in endio
  2017-04-18  1:16 [PATCH 0/6 RFC] utilize bio_clone_fast to clean up Liu Bo
  2017-04-18  1:16 ` [PATCH 1/6] Btrfs: use bio_clone_fast to clone our bio Liu Bo
  2017-04-18  1:16 ` [PATCH 2/6] Btrfs: use bio_clone_bioset_partial to simplify DIO submit Liu Bo
@ 2017-04-18  1:16 ` Liu Bo
  2017-04-18  1:16 ` [PATCH 4/6] Btrfs: record error if one block has failed to retry Liu Bo
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: Liu Bo @ 2017-04-18  1:16 UTC (permalink / raw)
  To: linux-btrfs

Since dio submit now uses bio_clone_fast(), the submitted bio may not have
a reliable bi_vcnt.  For the bio vector iterations in the checksum-related
functions, bio->bi_iter has not been modified yet, so it's safe to use
bio_for_each_segment().  For the bio vector iterations in dio's read endio,
we now save a copy of bvec_iter in struct btrfs_io_bio when cloning bios
and use the helper __bio_for_each_segment() with the saved bvec_iter to
access each bvec.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/extent_io.c |  1 +
 fs/btrfs/file-item.c | 31 +++++++++++++++----------------
 fs/btrfs/inode.c     | 33 +++++++++++++++++----------------
 fs/btrfs/volumes.h   |  1 +
 4 files changed, 34 insertions(+), 32 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 1b7156c..54108d1 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2738,6 +2738,7 @@ struct bio *btrfs_bio_clone_partial(struct bio *orig, gfp_t gfp_mask, int offset
 		btrfs_bio->end_io = NULL;
 
 		bio_trim(bio, (offset >> 9), (size >> 9));
+		btrfs_bio->iter = bio->bi_iter;
 	}
 	return bio;
 }
diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index 64fcb31..9f6062c 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -164,7 +164,8 @@ static int __btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio,
 				   u64 logical_offset, u32 *dst, int dio)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-	struct bio_vec *bvec;
+	struct bio_vec bvec;
+	struct bvec_iter iter;
 	struct btrfs_io_bio *btrfs_bio = btrfs_io_bio(bio);
 	struct btrfs_csum_item *item = NULL;
 	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
@@ -177,7 +178,7 @@ static int __btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio,
 	u64 page_bytes_left;
 	u32 diff;
 	int nblocks;
-	int count = 0, i;
+	int count = 0;
 	u16 csum_size = btrfs_super_csum_size(fs_info->super_copy);
 
 	path = btrfs_alloc_path();
@@ -206,8 +207,6 @@ static int __btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio,
 	if (bio->bi_iter.bi_size > PAGE_SIZE * 8)
 		path->reada = READA_FORWARD;
 
-	WARN_ON(bio->bi_vcnt <= 0);
-
 	/*
 	 * the free space stuff is only read when it hasn't been
 	 * updated in the current transaction.  So, we can safely
@@ -223,13 +222,13 @@ static int __btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio,
 	if (dio)
 		offset = logical_offset;
 
-	bio_for_each_segment_all(bvec, bio, i) {
-		page_bytes_left = bvec->bv_len;
+	bio_for_each_segment(bvec, bio, iter) {
+		page_bytes_left = bvec.bv_len;
 		if (count)
 			goto next;
 
 		if (!dio)
-			offset = page_offset(bvec->bv_page) + bvec->bv_offset;
+			offset = page_offset(bvec.bv_page) + bvec.bv_offset;
 		count = btrfs_find_ordered_sum(inode, offset, disk_bytenr,
 					       (u32 *)csum, nblocks);
 		if (count)
@@ -440,15 +439,15 @@ int btrfs_csum_one_bio(struct inode *inode, struct bio *bio,
 	struct btrfs_ordered_sum *sums;
 	struct btrfs_ordered_extent *ordered = NULL;
 	char *data;
-	struct bio_vec *bvec;
+	struct bvec_iter iter;
+	struct bio_vec bvec;
 	int index;
 	int nr_sectors;
-	int i, j;
 	unsigned long total_bytes = 0;
 	unsigned long this_sum_bytes = 0;
+	int i;
 	u64 offset;
 
-	WARN_ON(bio->bi_vcnt <= 0);
 	sums = kzalloc(btrfs_ordered_sum_size(fs_info, bio->bi_iter.bi_size),
 		       GFP_NOFS);
 	if (!sums)
@@ -465,19 +464,19 @@ int btrfs_csum_one_bio(struct inode *inode, struct bio *bio,
 	sums->bytenr = (u64)bio->bi_iter.bi_sector << 9;
 	index = 0;
 
-	bio_for_each_segment_all(bvec, bio, j) {
+	bio_for_each_segment(bvec, bio, iter) {
 		if (!contig)
-			offset = page_offset(bvec->bv_page) + bvec->bv_offset;
+			offset = page_offset(bvec.bv_page) + bvec.bv_offset;
 
 		if (!ordered) {
 			ordered = btrfs_lookup_ordered_extent(inode, offset);
 			BUG_ON(!ordered); /* Logic error */
 		}
 
-		data = kmap_atomic(bvec->bv_page);
+		data = kmap_atomic(bvec.bv_page);
 
 		nr_sectors = BTRFS_BYTES_TO_BLKS(fs_info,
-						 bvec->bv_len + fs_info->sectorsize
+						 bvec.bv_len + fs_info->sectorsize
 						 - 1);
 
 		for (i = 0; i < nr_sectors; i++) {
@@ -504,12 +503,12 @@ int btrfs_csum_one_bio(struct inode *inode, struct bio *bio,
 					+ total_bytes;
 				index = 0;
 
-				data = kmap_atomic(bvec->bv_page);
+				data = kmap_atomic(bvec.bv_page);
 			}
 
 			sums->sums[index] = ~(u32)0;
 			sums->sums[index]
-				= btrfs_csum_data(data + bvec->bv_offset
+				= btrfs_csum_data(data + bvec.bv_offset
 						+ (i * fs_info->sectorsize),
 						sums->sums[index],
 						fs_info->sectorsize);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 6215720..fca2f1f 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7857,6 +7857,7 @@ static int dio_read_error(struct inode *inode, struct bio *failed_bio,
 	struct bio *bio;
 	int isector;
 	int read_mode = 0;
+	int segs;
 	int ret;
 
 	BUG_ON(bio_op(failed_bio) == REQ_OP_WRITE);
@@ -7872,9 +7873,9 @@ static int dio_read_error(struct inode *inode, struct bio *failed_bio,
 		return -EIO;
 	}
 
-	if ((failed_bio->bi_vcnt > 1)
-		|| (failed_bio->bi_io_vec->bv_len
-			> btrfs_inode_sectorsize(inode)))
+	segs = __bio_segments(failed_bio, &btrfs_io_bio(failed_bio)->iter);
+	if (segs > 1 ||
+	    (failed_bio->bi_io_vec->bv_len > btrfs_inode_sectorsize(inode)))
 		read_mode |= REQ_FAILFAST_DEV;
 
 	isector = start - btrfs_io_bio(failed_bio)->logical;
@@ -7933,13 +7934,13 @@ static int __btrfs_correct_data_nocsum(struct inode *inode,
 				       struct btrfs_io_bio *io_bio)
 {
 	struct btrfs_fs_info *fs_info;
-	struct bio_vec *bvec;
+	struct bio_vec bvec;
+	struct bvec_iter iter;
 	struct btrfs_retry_complete done;
 	u64 start;
 	unsigned int pgoff;
 	u32 sectorsize;
 	int nr_sectors;
-	int i;
 	int ret;
 
 	fs_info = BTRFS_I(inode)->root->fs_info;
@@ -7948,16 +7949,16 @@ static int __btrfs_correct_data_nocsum(struct inode *inode,
 	start = io_bio->logical;
 	done.inode = inode;
 
-	bio_for_each_segment_all(bvec, &io_bio->bio, i) {
-		nr_sectors = BTRFS_BYTES_TO_BLKS(fs_info, bvec->bv_len);
-		pgoff = bvec->bv_offset;
+	__bio_for_each_segment(bvec, &io_bio->bio, iter, io_bio->iter) {
+		nr_sectors = BTRFS_BYTES_TO_BLKS(fs_info, bvec.bv_len);
+		pgoff = bvec.bv_offset;
 
 next_block_or_try_again:
 		done.uptodate = 0;
 		done.start = start;
 		init_completion(&done.done);
 
-		ret = dio_read_error(inode, &io_bio->bio, bvec->bv_page,
+		ret = dio_read_error(inode, &io_bio->bio, bvec.bv_page,
 				pgoff, start, start + sectorsize - 1,
 				io_bio->mirror_num,
 				btrfs_retry_endio_nocsum, &done);
@@ -8025,7 +8026,8 @@ static int __btrfs_subio_endio_read(struct inode *inode,
 				    struct btrfs_io_bio *io_bio, int err)
 {
 	struct btrfs_fs_info *fs_info;
-	struct bio_vec *bvec;
+	struct bio_vec bvec;
+	struct bvec_iter iter;
 	struct btrfs_retry_complete done;
 	u64 start;
 	u64 offset = 0;
@@ -8033,7 +8035,6 @@ static int __btrfs_subio_endio_read(struct inode *inode,
 	int nr_sectors;
 	unsigned int pgoff;
 	int csum_pos;
-	int i;
 	int ret;
 
 	fs_info = BTRFS_I(inode)->root->fs_info;
@@ -8043,14 +8044,14 @@ static int __btrfs_subio_endio_read(struct inode *inode,
 	start = io_bio->logical;
 	done.inode = inode;
 
-	bio_for_each_segment_all(bvec, &io_bio->bio, i) {
-		nr_sectors = BTRFS_BYTES_TO_BLKS(fs_info, bvec->bv_len);
+	__bio_for_each_segment(bvec, &io_bio->bio, iter, io_bio->iter) {
+		nr_sectors = BTRFS_BYTES_TO_BLKS(fs_info, bvec.bv_len);
 
-		pgoff = bvec->bv_offset;
+		pgoff = bvec.bv_offset;
 next_block:
 		csum_pos = BTRFS_BYTES_TO_BLKS(fs_info, offset);
 		ret = __readpage_endio_check(inode, io_bio, csum_pos,
-					bvec->bv_page, pgoff, start,
+					bvec.bv_page, pgoff, start,
 					sectorsize);
 		if (likely(!ret))
 			goto next;
@@ -8059,7 +8060,7 @@ static int __btrfs_subio_endio_read(struct inode *inode,
 		done.start = start;
 		init_completion(&done.done);
 
-		ret = dio_read_error(inode, &io_bio->bio, bvec->bv_page,
+		ret = dio_read_error(inode, &io_bio->bio, bvec.bv_page,
 				pgoff, start, start + sectorsize - 1,
 				io_bio->mirror_num,
 				btrfs_retry_endio, &done);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 59be812..558d73c 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -280,6 +280,7 @@ struct btrfs_io_bio {
 	u8 csum_inline[BTRFS_BIO_INLINE_CSUM_SIZE];
 	u8 *csum_allocated;
 	btrfs_io_bio_end_io_t *end_io;
+	struct bvec_iter iter;
 	struct bio bio;
 };
 
-- 
2.5.5
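
Why the endio path needs a saved iterator can be shown with a userspace
model.  This is a sketch under assumptions: the toy_* types and functions
are hypothetical stand-ins for bvec_iter and __bio_for_each_segment(), and
the model simply treats the live iterator as fully consumed by the time
the completion handler runs.

```c
#include <stddef.h>

struct toy_vec {
	unsigned int len;	/* bytes covered by this entry */
};

struct toy_iter {
	unsigned int idx;	/* index of the current entry */
	unsigned int done;	/* bytes already consumed in that entry */
	unsigned int size;	/* bytes still to be processed */
};

struct toy_bio {
	const struct toy_vec *vecs;
	struct toy_iter iter;	/* live iterator, advanced during I/O */
	struct toy_iter saved;	/* snapshot taken when the bio was cloned */
};

/* Advance an iterator by 'bytes', moving across entries as needed. */
static void toy_advance(const struct toy_vec *vecs, struct toy_iter *it,
			unsigned int bytes)
{
	while (bytes && it->size) {
		unsigned int step = vecs[it->idx].len - it->done;

		if (step > bytes)
			step = bytes;
		if (step > it->size)
			step = it->size;
		it->done += step;
		it->size -= step;
		bytes -= step;
		if (it->done == vecs[it->idx].len) {
			it->idx++;
			it->done = 0;
		}
	}
}

/* Walk from 'start'; return bytes visited and store the segment count. */
static unsigned int toy_walk(const struct toy_bio *bio, struct toy_iter start,
			     unsigned int *segs)
{
	struct toy_iter it = start;
	unsigned int total = 0;

	*segs = 0;
	while (it.size) {
		unsigned int len = bio->vecs[it.idx].len - it.done;

		if (len == 0) {		/* skip empty entries */
			it.idx++;
			it.done = 0;
			continue;
		}
		if (len > it.size)
			len = it.size;
		total += len;
		(*segs)++;
		toy_advance(bio->vecs, &it, len);
	}
	return total;
}
```

Walking from the live iterator after it has been advanced to the end
visits nothing, while walking from the snapshot still visits every
segment; the patch stores that snapshot in btrfs_io_bio::iter.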



* [PATCH 4/6] Btrfs: record error if one block has failed to retry
  2017-04-18  1:16 [PATCH 0/6 RFC] utilize bio_clone_fast to clean up Liu Bo
                   ` (2 preceding siblings ...)
  2017-04-18  1:16 ` [PATCH 3/6] Btrfs: change how we iterate bios in endio Liu Bo
@ 2017-04-18  1:16 ` Liu Bo
  2017-05-17 18:32   ` David Sterba
  2017-04-18  1:16 ` [PATCH 5/6] Btrfs: change check-integrity to use bvec_iter Liu Bo
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 18+ messages in thread
From: Liu Bo @ 2017-04-18  1:16 UTC (permalink / raw)
  To: linux-btrfs

In the nocsum case of dio read endio, the repair loop returns immediately
if repairing a block returns an error, which leaves the remaining blocks
unrepaired.  This differs from how buffered read endio behaves in the same
case.  Change it to only record the error and go on repairing the
remaining blocks.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/inode.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index fca2f1f..cc46d21 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7942,6 +7942,7 @@ static int __btrfs_correct_data_nocsum(struct inode *inode,
 	u32 sectorsize;
 	int nr_sectors;
 	int ret;
+	int err = 0;
 
 	fs_info = BTRFS_I(inode)->root->fs_info;
 	sectorsize = fs_info->sectorsize;
@@ -7962,8 +7963,10 @@ static int __btrfs_correct_data_nocsum(struct inode *inode,
 				pgoff, start, start + sectorsize - 1,
 				io_bio->mirror_num,
 				btrfs_retry_endio_nocsum, &done);
-		if (ret)
-			return ret;
+		if (ret) {
+			err = ret;
+			goto next;
+		}
 
 		wait_for_completion(&done.done);
 
@@ -7972,6 +7975,7 @@ static int __btrfs_correct_data_nocsum(struct inode *inode,
 			goto next_block_or_try_again;
 		}
 
+next:
 		start += sectorsize;
 
 		if (nr_sectors--) {
@@ -7980,7 +7984,7 @@ static int __btrfs_correct_data_nocsum(struct inode *inode,
 		}
 	}
 
-	return 0;
+	return err;
 }
 
 static void btrfs_retry_endio(struct bio *bio)
-- 
2.5.5
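
The "record the error and keep going" pattern from the changelog above can
be sketched in plain C.  Everything here is hypothetical illustration
(toy_repair_all(), toy_repair_one() and struct toy_ctx are not btrfs
functions); the point is only the control flow, including that the
accumulated error starts at 0 so the all-success path returns success.

```c
#include <stddef.h>
#include <stdint.h>

typedef int (*toy_repair_fn)(size_t block, void *ctx);

/*
 * Try to repair every block.  A failure is remembered but does not stop
 * the loop; the last error seen (or 0) is returned.
 */
static int toy_repair_all(size_t nblocks, toy_repair_fn repair, void *ctx)
{
	int err = 0;	/* stays 0 if no block fails */
	size_t i;

	for (i = 0; i < nblocks; i++) {
		int ret = repair(i, ctx);

		if (ret)
			err = ret;	/* record and continue */
	}
	return err;
}

/* Example callback state: counts attempts and fails on one chosen block. */
struct toy_ctx {
	size_t attempts;
	size_t bad_block;	/* SIZE_MAX means "no block fails" */
};

static int toy_repair_one(size_t block, void *ctx)
{
	struct toy_ctx *c = ctx;

	c->attempts++;
	return block == c->bad_block ? -5 : 0;	/* -5 mirrors -EIO */
}
```

With a failing block in the middle, every block is still attempted and the
caller sees the error once the loop finishes.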



* [PATCH 5/6] Btrfs: change check-integrity to use bvec_iter
  2017-04-18  1:16 [PATCH 0/6 RFC] utilize bio_clone_fast to clean up Liu Bo
                   ` (3 preceding siblings ...)
  2017-04-18  1:16 ` [PATCH 4/6] Btrfs: record error if one block has failed to retry Liu Bo
@ 2017-04-18  1:16 ` Liu Bo
  2017-05-05 17:13   ` David Sterba
  2017-04-18  1:16 ` [PATCH 6/6] Btrfs: unify naming of btrfs_io_bio Liu Bo
  2017-05-05 14:24 ` [PATCH 0/6 RFC] utilize bio_clone_fast to clean up David Sterba
  6 siblings, 1 reply; 18+ messages in thread
From: Liu Bo @ 2017-04-18  1:16 UTC (permalink / raw)
  To: linux-btrfs

Some check-integrity code depends on bio->bi_vcnt.  Change it to use bio
segments, because some bios passed here may not have a reliable bi_vcnt.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/check-integrity.c | 27 +++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index ab14c2e..8e7ce48 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -2822,44 +2822,47 @@ static void __btrfsic_submit_bio(struct bio *bio)
 	dev_state = btrfsic_dev_state_lookup(bio->bi_bdev);
 	if (NULL != dev_state &&
 	    (bio_op(bio) == REQ_OP_WRITE) && bio_has_data(bio)) {
-		unsigned int i;
+		unsigned int i = 0;
 		u64 dev_bytenr;
 		u64 cur_bytenr;
-		struct bio_vec *bvec;
+		struct bio_vec bvec;
+		struct bvec_iter iter;
 		int bio_is_patched;
 		char **mapped_datav;
+		int segs = bio_segments(bio);
 
 		dev_bytenr = 512 * bio->bi_iter.bi_sector;
 		bio_is_patched = 0;
 		if (dev_state->state->print_mask &
 		    BTRFSIC_PRINT_MASK_SUBMIT_BIO_BH)
 			pr_info("submit_bio(rw=%d,0x%x, bi_vcnt=%u, bi_sector=%llu (bytenr %llu), bi_bdev=%p)\n",
-			       bio_op(bio), bio->bi_opf, bio->bi_vcnt,
+			       bio_op(bio), bio->bi_opf, segs,
 			       (unsigned long long)bio->bi_iter.bi_sector,
 			       dev_bytenr, bio->bi_bdev);
 
-		mapped_datav = kmalloc_array(bio->bi_vcnt,
+		mapped_datav = kmalloc_array(segs,
 					     sizeof(*mapped_datav), GFP_NOFS);
 		if (!mapped_datav)
 			goto leave;
 		cur_bytenr = dev_bytenr;
 
-		bio_for_each_segment_all(bvec, bio, i) {
-			BUG_ON(bvec->bv_len != PAGE_SIZE);
-			mapped_datav[i] = kmap(bvec->bv_page);
+		bio_for_each_segment(bvec, bio, iter) {
+			BUG_ON(bvec.bv_len != PAGE_SIZE);
+			mapped_datav[i] = kmap(bvec.bv_page);
+			i++;
 
 			if (dev_state->state->print_mask &
 			    BTRFSIC_PRINT_MASK_SUBMIT_BIO_BH_VERBOSE)
 				pr_info("#%u: bytenr=%llu, len=%u, offset=%u\n",
-				       i, cur_bytenr, bvec->bv_len, bvec->bv_offset);
-			cur_bytenr += bvec->bv_len;
+				       i, cur_bytenr, bvec.bv_len, bvec.bv_offset);
+			cur_bytenr += bvec.bv_len;
 		}
 		btrfsic_process_written_block(dev_state, dev_bytenr,
-					      mapped_datav, bio->bi_vcnt,
+					      mapped_datav, segs,
 					      bio, &bio_is_patched,
 					      NULL, bio->bi_opf);
-		bio_for_each_segment_all(bvec, bio, i)
-			kunmap(bvec->bv_page);
+		bio_for_each_segment(bvec, bio, iter)
+			kunmap(bvec.bv_page);
 		kfree(mapped_datav);
 	} else if (NULL != dev_state && (bio->bi_opf & REQ_PREFLUSH)) {
 		if (dev_state->state->print_mask &
-- 
2.5.5



* [PATCH 6/6] Btrfs: unify naming of btrfs_io_bio
  2017-04-18  1:16 [PATCH 0/6 RFC] utilize bio_clone_fast to clean up Liu Bo
                   ` (4 preceding siblings ...)
  2017-04-18  1:16 ` [PATCH 5/6] Btrfs: change check-integrity to use bvec_iter Liu Bo
@ 2017-04-18  1:16 ` Liu Bo
  2017-05-17 18:32   ` David Sterba
  2017-05-05 14:24 ` [PATCH 0/6 RFC] utilize bio_clone_fast to clean up David Sterba
  6 siblings, 1 reply; 18+ messages in thread
From: Liu Bo @ 2017-04-18  1:16 UTC (permalink / raw)
  To: linux-btrfs

All dio endio functions use io_bio for struct btrfs_io_bio; make
btrfs_submit_direct() follow this convention.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/inode.c | 38 +++++++++++++++++++-------------------
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index cc46d21..73e7a44 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8424,16 +8424,16 @@ static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode,
 				loff_t file_offset)
 {
 	struct btrfs_dio_private *dip = NULL;
-	struct bio *io_bio = NULL;
-	struct btrfs_io_bio *btrfs_bio;
+	struct bio *bio = NULL;
+	struct btrfs_io_bio *io_bio;
 	int skip_sum;
 	bool write = (bio_op(dio_bio) == REQ_OP_WRITE);
 	int ret = 0;
 
 	skip_sum = BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM;
 
-	io_bio = btrfs_bio_clone(dio_bio, GFP_NOFS);
-	if (!io_bio) {
+	bio = btrfs_bio_clone(dio_bio, GFP_NOFS);
+	if (!bio) {
 		ret = -ENOMEM;
 		goto free_ordered;
 	}
@@ -8449,17 +8449,17 @@ static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode,
 	dip->logical_offset = file_offset;
 	dip->bytes = dio_bio->bi_iter.bi_size;
 	dip->disk_bytenr = (u64)dio_bio->bi_iter.bi_sector << 9;
-	io_bio->bi_private = dip;
-	dip->orig_bio = io_bio;
+	bio->bi_private = dip;
+	dip->orig_bio = bio;
 	dip->dio_bio = dio_bio;
 	atomic_set(&dip->pending_bios, 0);
-	btrfs_bio = btrfs_io_bio(io_bio);
-	btrfs_bio->logical = file_offset;
+	io_bio = btrfs_io_bio(bio);
+	io_bio->logical = file_offset;
 
 	if (write) {
-		io_bio->bi_end_io = btrfs_endio_direct_write;
+		bio->bi_end_io = btrfs_endio_direct_write;
 	} else {
-		io_bio->bi_end_io = btrfs_endio_direct_read;
+		bio->bi_end_io = btrfs_endio_direct_read;
 		dip->subio_endio = btrfs_subio_endio_read;
 	}
 
@@ -8482,8 +8482,8 @@ static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode,
 	if (!ret)
 		return;
 
-	if (btrfs_bio->end_io)
-		btrfs_bio->end_io(btrfs_bio, ret);
+	if (io_bio->end_io)
+		io_bio->end_io(io_bio, ret);
 
 free_ordered:
 	/*
@@ -8495,16 +8495,16 @@ static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode,
 	 * same as btrfs_endio_direct_[write|read] because we can't call these
 	 * callbacks - they require an allocated dip and a clone of dio_bio.
 	 */
-	if (io_bio && dip) {
-		io_bio->bi_error = -EIO;
-		bio_endio(io_bio);
+	if (bio && dip) {
+		bio->bi_error = -EIO;
+		bio_endio(bio);
 		/*
-		 * The end io callbacks free our dip, do the final put on io_bio
+		 * The end io callbacks free our dip, do the final put on bio
 		 * and all the cleanup and final put for dio_bio (through
 		 * dio_end_io()).
 		 */
 		dip = NULL;
-		io_bio = NULL;
+		bio = NULL;
 	} else {
 		if (write)
 			btrfs_endio_direct_write_update_ordered(inode,
@@ -8522,8 +8522,8 @@ static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode,
 		 */
 		dio_end_io(dio_bio, ret);
 	}
-	if (io_bio)
-		bio_put(io_bio);
+	if (bio)
+		bio_put(bio);
 	kfree(dip);
 }
 
-- 
2.5.5



* Re: [PATCH 0/6 RFC] utilize bio_clone_fast to clean up
  2017-04-18  1:16 [PATCH 0/6 RFC] utilize bio_clone_fast to clean up Liu Bo
                   ` (5 preceding siblings ...)
  2017-04-18  1:16 ` [PATCH 6/6] Btrfs: unify naming of btrfs_io_bio Liu Bo
@ 2017-05-05 14:24 ` David Sterba
  2017-05-09 22:49   ` Liu Bo
  6 siblings, 1 reply; 18+ messages in thread
From: David Sterba @ 2017-05-05 14:24 UTC (permalink / raw)
  To: Liu Bo; +Cc: linux-btrfs

On Mon, Apr 17, 2017 at 06:16:21PM -0700, Liu Bo wrote:
> This attempts to use bio_clone_fast() in the places where we clone bio,
> such as when bio got cloned for multiple disks and when bio got split
> during dio submit.
> 
> One benefit is to simplify dio submit to avoid calling bio_add_page one by
> one.
> 
> Another benefit is that comparing to bio_clone_bioset, bio_clone_fast is
> faster because of copying the vector pointer directly, and bio_clone_fast
> doesn't modify bi_vcnt, so the extra work is to fix up bi_vcnt usage we
> currently have to use bi_iter to iterate bvec.
> 
> Liu Bo (6):
>   Btrfs: use bio_clone_fast to clone our bio

Please extend the changelog of this patch, use the text in the cover
letter.

>   Btrfs: use bio_clone_bioset_partial to simplify DIO submit

This patch is too big; can you split it into smaller chunks? I was not
able to review it. It seems to touch several things at once, so it's hard
to keep the context.

>   Btrfs: change how we iterate bios in endio
>   Btrfs: record error if one block has failed to retry
>   Btrfs: change check-integrity to use bvec_iter
>   Btrfs: unify naming of btrfs_io_bio

The rest looks ok.

Have you done performance tests? Not that it's necessary, but it would be
interesting to see the effects. The effects of the simplified code are
likely unmeasurable, but the _fast version skips some mempool exercises
so this could lead to improvements under memory pressure. Those are
hardly deterministic conditions, so it could be hard to measure. I'm
expecting some latency improvements.


* Re: [PATCH 5/6] Btrfs: change check-integrity to use bvec_iter
  2017-04-18  1:16 ` [PATCH 5/6] Btrfs: change check-integrity to use bvec_iter Liu Bo
@ 2017-05-05 17:13   ` David Sterba
  0 siblings, 0 replies; 18+ messages in thread
From: David Sterba @ 2017-05-05 17:13 UTC (permalink / raw)
  To: Liu Bo; +Cc: linux-btrfs

On Mon, Apr 17, 2017 at 06:16:26PM -0700, Liu Bo wrote:
> Some check-integrity code depends on bio->bi_vcnt, this changes it to use
> bio segments because some bios passing here may not have a reliable
> bi_vcnt.
> 
> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> ---
>  fs/btrfs/check-integrity.c | 27 +++++++++++++++------------
>  1 file changed, 15 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
> index ab14c2e..8e7ce48 100644
> --- a/fs/btrfs/check-integrity.c
> +++ b/fs/btrfs/check-integrity.c
> @@ -2822,44 +2822,47 @@ static void __btrfsic_submit_bio(struct bio *bio)
>  	dev_state = btrfsic_dev_state_lookup(bio->bi_bdev);
>  	if (NULL != dev_state &&
>  	    (bio_op(bio) == REQ_OP_WRITE) && bio_has_data(bio)) {
> -		unsigned int i;
> +		unsigned int i = 0;
>  		u64 dev_bytenr;
>  		u64 cur_bytenr;
> -		struct bio_vec *bvec;
> +		struct bio_vec bvec;
> +		struct bvec_iter iter;
>  		int bio_is_patched;
>  		char **mapped_datav;
> +		int segs = bio_segments(bio);

Type mismatch, bio_segments() returns unsigned.

>  
>  		dev_bytenr = 512 * bio->bi_iter.bi_sector;
>  		bio_is_patched = 0;
>  		if (dev_state->state->print_mask &
>  		    BTRFSIC_PRINT_MASK_SUBMIT_BIO_BH)
>  			pr_info("submit_bio(rw=%d,0x%x, bi_vcnt=%u, bi_sector=%llu (bytenr %llu), bi_bdev=%p)\n",
> -			       bio_op(bio), bio->bi_opf, bio->bi_vcnt,
> +			       bio_op(bio), bio->bi_opf, segs,
>  			       (unsigned long long)bio->bi_iter.bi_sector,
>  			       dev_bytenr, bio->bi_bdev);
>  
> -		mapped_datav = kmalloc_array(bio->bi_vcnt,
> +		mapped_datav = kmalloc_array(segs,
>  					     sizeof(*mapped_datav), GFP_NOFS);
>  		if (!mapped_datav)
>  			goto leave;
>  		cur_bytenr = dev_bytenr;
>  
> -		bio_for_each_segment_all(bvec, bio, i) {
> -			BUG_ON(bvec->bv_len != PAGE_SIZE);
> -			mapped_datav[i] = kmap(bvec->bv_page);
> +		bio_for_each_segment(bvec, bio, iter) {
> +			BUG_ON(bvec.bv_len != PAGE_SIZE);
> +			mapped_datav[i] = kmap(bvec.bv_page);
> +			i++;
>  
>  			if (dev_state->state->print_mask &
>  			    BTRFSIC_PRINT_MASK_SUBMIT_BIO_BH_VERBOSE)
>  				pr_info("#%u: bytenr=%llu, len=%u, offset=%u\n",
> -				       i, cur_bytenr, bvec->bv_len, bvec->bv_offset);
> -			cur_bytenr += bvec->bv_len;
> +				       i, cur_bytenr, bvec.bv_len, bvec.bv_offset);
> +			cur_bytenr += bvec.bv_len;
>  		}
>  		btrfsic_process_written_block(dev_state, dev_bytenr,
> -					      mapped_datav, bio->bi_vcnt,
> +					      mapped_datav, segs,
>  					      bio, &bio_is_patched,
>  					      NULL, bio->bi_opf);
> -		bio_for_each_segment_all(bvec, bio, i)
> -			kunmap(bvec->bv_page);
> +		bio_for_each_segment(bvec, bio, iter)
> +			kunmap(bvec.bv_page);
>  		kfree(mapped_datav);
>  	} else if (NULL != dev_state && (bio->bi_opf & REQ_PREFLUSH)) {
>  		if (dev_state->state->print_mask &
> -- 
> 2.5.5
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH 0/6 RFC] utilize bio_clone_fast to clean up
  2017-05-05 14:24 ` [PATCH 0/6 RFC] utilize bio_clone_fast to clean up David Sterba
@ 2017-05-09 22:49   ` Liu Bo
  2017-05-10  4:28     ` Liu Bo
  2017-05-10 17:53     ` David Sterba
  0 siblings, 2 replies; 18+ messages in thread
From: Liu Bo @ 2017-05-09 22:49 UTC (permalink / raw)
  To: dsterba, linux-btrfs

On Fri, May 05, 2017 at 04:24:47PM +0200, David Sterba wrote:
> On Mon, Apr 17, 2017 at 06:16:21PM -0700, Liu Bo wrote:
> > This attempts to use bio_clone_fast() in the places where we clone bio,
> > such as when bio got cloned for multiple disks and when bio got split
> > during dio submit.
> > 
> > One benefit is to simplify dio submit to avoid calling bio_add_page one by
> > one.
> > 
> > Another benefit is that comparing to bio_clone_bioset, bio_clone_fast is
> > faster because of copying the vector pointer directly, and bio_clone_fast
> > doesn't modify bi_vcnt, so the extra work is to fix up bi_vcnt usage we
> > currently have to use bi_iter to iterate bvec.
> > 
> > Liu Bo (6):
> >   Btrfs: use bio_clone_fast to clone our bio
> 
> Please extend the changelog of this patch, use the text in the cover
> letter.
>

OK.

> >   Btrfs: use bio_clone_bioset_partial to simplify DIO submit
> 
> This patch is too big, can you split it to smaller chunks? I was not
> able to review it, it seems to touch several things at once, it's hard
> to keep the context.
>

Oh I see, the diff does look scary but the changes are in fact not
intrusive, I'll try to do something.

> >   Btrfs: change how we iterate bios in endio
> >   Btrfs: record error if one block has failed to retry
> >   Btrfs: change check-integrity to use bvec_iter
> >   Btrfs: unify naming of btrfs_io_bio
> 
> The rest looks ok.
> 
> Have you done performance tests? Not that it's necessary, but would be
> interesting to see the effects. The effects of simplified code are
> likely unmeasurable, but the _fast version skips some mempool exercises
> so this could lead to improvements under memory pressure. And these are
> hardly deterministic conditions, could be hard. I'm expecting some
> latency improvements.

I haven't done the perf. test since it is an RFC that I basically hope
to check whether the idea makes sense.

And yes, using bio_clone_fast could save us some memory which is
allocated for bio->bi_io_vec if (nr_iovecs > inline_vecs).

I'll do some tests to see if there is any perf. difference and drop a
note to Intel's test robot to ask if they can do much broader perf. tests
against it.

Thank you for the comments.

Thanks,

-liubo


* Re: [PATCH 0/6 RFC] utilize bio_clone_fast to clean up
  2017-05-09 22:49   ` Liu Bo
@ 2017-05-10  4:28     ` Liu Bo
  2017-05-10 17:53     ` David Sterba
  1 sibling, 0 replies; 18+ messages in thread
From: Liu Bo @ 2017-05-10  4:28 UTC (permalink / raw)
  To: dsterba, linux-btrfs

On Tue, May 09, 2017 at 03:49:13PM -0700, Liu Bo wrote:
> On Fri, May 05, 2017 at 04:24:47PM +0200, David Sterba wrote:
> > On Mon, Apr 17, 2017 at 06:16:21PM -0700, Liu Bo wrote:
> > > This attempts to use bio_clone_fast() in the places where we clone bio,
> > > such as when bio got cloned for multiple disks and when bio got split
> > > during dio submit.
> > > 
> > > One benefit is to simplify dio submit to avoid calling bio_add_page one by
> > > one.
> > > 
> > > Another benefit is that comparing to bio_clone_bioset, bio_clone_fast is
> > > faster because of copying the vector pointer directly, and bio_clone_fast
> > > doesn't modify bi_vcnt, so the extra work is to fix up bi_vcnt usage we
> > > currently have to use bi_iter to iterate bvec.
> > > 
> > > Liu Bo (6):
> > >   Btrfs: use bio_clone_fast to clone our bio
> > 
> > Please extend the changelog of this patch, use the text in the cover
> > letter.
> >
> 
> OK.
> 
> > >   Btrfs: use bio_clone_bioset_partial to simplify DIO submit
> > 
> > This patch is too big, can you split it to smaller chunks? I was not
> > able to review it, it seems to touch several things at once, it's hard
> > to keep the context.
> >
> 
> Oh I see, the diff does look scary but the changes are in fact not
> intrusive, I'll try to do something.
> 
> > >   Btrfs: change how we iterate bios in endio
> > >   Btrfs: record error if one block has failed to retry
> > >   Btrfs: change check-integrity to use bvec_iter
> > >   Btrfs: unify naming of btrfs_io_bio
> > 
> > The rest looks ok.
> > 
> > Have you done performance tests? Not that it's necessary, but would be
> > interesting to see the effects. The effects of simplified code are
> > likely unmeasurable, but the _fast version skips some mempool exercises
> > so this could lead to improvements under memory pressure. And these are
> > hardly deterministic conditions, could be hard. I'm expecting some
> > latency improvements.
> 
> I haven't done the perf. test since it is an RFC that I basically hope
> to check whether the idea makes sense.
> 
> And yes, using bio_clone_fast could save us some memory which is
> allocated for bio->bi_io_vec if (nr_iovecs > inline_vecs).

A quick test[1] showed a slightly better result. I ran the test 5 times
each round and took the average value:

vanilla 4.11-rc5: 15.39s
patched 4.11-rc5: 14.12s

[1]
M=/mnt/btrfs

D1=/dev/pmem0p1
D2=/dev/pmem0p2

umount $M
mkfs.btrfs -f $D1 $D2 >/dev/null || exit

mount $D1 $M -onodatasum || exit

xfs_io -f -c "falloc 0 2G" $M/foo

time xfs_io -d -c "pwrite -b 128K 0 2G" $M/foo


Thanks,

-liubo


* Re: [PATCH 0/6 RFC] utilize bio_clone_fast to clean up
  2017-05-09 22:49   ` Liu Bo
  2017-05-10  4:28     ` Liu Bo
@ 2017-05-10 17:53     ` David Sterba
  1 sibling, 0 replies; 18+ messages in thread
From: David Sterba @ 2017-05-10 17:53 UTC (permalink / raw)
  To: Liu Bo; +Cc: dsterba, linux-btrfs

On Tue, May 09, 2017 at 03:49:13PM -0700, Liu Bo wrote:
> > This patch is too big, can you split it to smaller chunks? I was not
> > able to review it, it seems to touch several things at once, it's hard
> > to keep the context.
> 
> Oh I see, the diff does look scary but the changes are in fact not
> intrusive, I'll try to do something.

On second read, I think it's not as bad as it looked. Some of the code
is just moved/indented differently and I see the core of the change now.

I'm concerned about the types used for length/size, but this can be done
as a followup patch. If you have split the patch already, please send
it anyway.


* Re: [PATCH 2/6] Btrfs: use bio_clone_bioset_partial to simplify DIO submit
  2017-04-18  1:16 ` [PATCH 2/6] Btrfs: use bio_clone_bioset_partial to simplify DIO submit Liu Bo
@ 2017-05-11 14:16   ` David Sterba
  2017-05-16 14:37   ` Christoph Hellwig
  1 sibling, 0 replies; 18+ messages in thread
From: David Sterba @ 2017-05-11 14:16 UTC (permalink / raw)
  To: Liu Bo; +Cc: linux-btrfs

On Mon, Apr 17, 2017 at 06:16:23PM -0700, Liu Bo wrote:
> Currently when mapping bio to limit bio to a single stripe length, we
> split bio by adding page to bio one by one, but later we don't modify
> the vector of bio at all, thus we can use bio_clone_fast to use the
> original bio vector directly.
> 
> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> ---
>  fs/btrfs/extent_io.c |  15 +++++++
>  fs/btrfs/extent_io.h |   1 +
>  fs/btrfs/inode.c     | 122 +++++++++++++++++++--------------------------------
>  3 files changed, 62 insertions(+), 76 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 0d4aea4..1b7156c 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -2726,6 +2726,21 @@ struct bio *btrfs_io_bio_alloc(gfp_t gfp_mask, unsigned int nr_iovecs)
>  	return bio;
>  }
>  
> +struct bio *btrfs_bio_clone_partial(struct bio *orig, gfp_t gfp_mask, int offset, int size)
> +{
> +	struct bio *bio;
> +
> +	bio = bio_clone_fast(orig, gfp_mask, btrfs_bioset);
> +	if (bio) {

Please switch that to

	bio = ...;
	if (!bio)
		return NULL;

	(the rest)

	return bio;

> +		struct btrfs_io_bio *btrfs_bio = btrfs_io_bio(bio);
> +		btrfs_bio->csum = NULL;
> +		btrfs_bio->csum_allocated = NULL;
> +		btrfs_bio->end_io = NULL;
> +
> +		bio_trim(bio, (offset >> 9), (size >> 9));

Hm, so bio_trim also uses ints for the parameters, let's stick to that.

> +	}
> +	return bio;
> +}
>  
>  static int __must_check submit_one_bio(struct bio *bio, int mirror_num,
>  				       unsigned long bio_flags)
> diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
> index 3e4fad4..3b2bc88 100644
> --- a/fs/btrfs/extent_io.h
> +++ b/fs/btrfs/extent_io.h
> @@ -460,6 +460,7 @@ btrfs_bio_alloc(struct block_device *bdev, u64 first_sector, int nr_vecs,
>  		gfp_t gfp_flags);
>  struct bio *btrfs_io_bio_alloc(gfp_t gfp_mask, unsigned int nr_iovecs);
>  struct bio *btrfs_bio_clone(struct bio *bio, gfp_t gfp_mask);
> +struct bio *btrfs_bio_clone_partial(struct bio *orig, gfp_t gfp_mask, int offset, int size);

line over 80 chars

>  
>  struct btrfs_fs_info;
>  struct btrfs_inode;
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index a18510b..6215720 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -8230,16 +8230,6 @@ static void btrfs_end_dio_bio(struct bio *bio)
>  	bio_put(bio);
>  }
>  
> -static struct bio *btrfs_dio_bio_alloc(struct block_device *bdev,
> -				       u64 first_sector, gfp_t gfp_flags)
> -{
> -	struct bio *bio;
> -	bio = btrfs_bio_alloc(bdev, first_sector, BIO_MAX_PAGES, gfp_flags);
> -	if (bio)
> -		bio_associate_current(bio);
> -	return bio;
> -}
> -
>  static inline int btrfs_lookup_and_bind_dio_csum(struct inode *inode,
>  						 struct btrfs_dio_private *dip,
>  						 struct bio *bio,
> @@ -8329,24 +8319,22 @@ static int btrfs_submit_direct_hook(struct btrfs_dio_private *dip,
>  	struct btrfs_root *root = BTRFS_I(inode)->root;
>  	struct bio *bio;
>  	struct bio *orig_bio = dip->orig_bio;
> -	struct bio_vec *bvec;
>  	u64 start_sector = orig_bio->bi_iter.bi_sector;
>  	u64 file_offset = dip->logical_offset;
> -	u64 submit_len = 0;
>  	u64 map_length;
> -	u32 blocksize = fs_info->sectorsize;
>  	int async_submit = 0;
> -	int nr_sectors;
> +	int submit_len;
> +	int clone_offset = 0;
> +	int clone_len;
>  	int ret;
> -	int i, j;
>  
> -	map_length = orig_bio->bi_iter.bi_size;
> +	submit_len = map_length = orig_bio->bi_iter.bi_size;

Please do 2 separate initialization statements.

>  	ret = btrfs_map_block(fs_info, btrfs_op(orig_bio), start_sector << 9,
>  			      &map_length, NULL, 0);
>  	if (ret)
>  		return -EIO;
>  
> -	if (map_length >= orig_bio->bi_iter.bi_size) {
> +	if (map_length >= submit_len) {
>  		bio = orig_bio;
>  		dip->flags |= BTRFS_DIO_ORIG_BIO_SUBMITTED;
>  		goto submit;
> @@ -8358,70 +8346,52 @@ static int btrfs_submit_direct_hook(struct btrfs_dio_private *dip,
>  	else
>  		async_submit = 1;
>  
> -	bio = btrfs_dio_bio_alloc(orig_bio->bi_bdev, start_sector, GFP_NOFS);
> -	if (!bio)
> -		return -ENOMEM;
> -
> -	bio->bi_opf = orig_bio->bi_opf;
> -	bio->bi_private = dip;
> -	bio->bi_end_io = btrfs_end_dio_bio;
> -	btrfs_io_bio(bio)->logical = file_offset;
> +	/* bio split */
>  	atomic_inc(&dip->pending_bios);
> +	while (submit_len > 0) {
> +		/* map_length < submit_len, it's a int */
> +		clone_len = min(submit_len, (int)map_length);

The types are mixed, map_length is u64 and cannot be easily switched to
int (cascading change to several btrfs functions). The other way would
require similar changes outside of btrfs.

At least please use min_t here. I'd rather see some sanity check
regarding silent trimming of map_length than just relying on the output
of btrfs_map_block.

> +		bio = btrfs_bio_clone_partial(orig_bio, GFP_NOFS, clone_offset, clone_len);
> +		if (!bio)
> +			goto out_err;
> +		/* the above clone call also clone blkcg of orig_bio */
> +
> +		bio->bi_private = dip;
> +		bio->bi_end_io = btrfs_end_dio_bio;
> +		btrfs_io_bio(bio)->logical = file_offset;
> +
> +		ASSERT(submit_len >= clone_len);
> +		submit_len -= clone_len;
> +		if (submit_len == 0)
> +			break;
>  
> -	bio_for_each_segment_all(bvec, orig_bio, j) {
> -		nr_sectors = BTRFS_BYTES_TO_BLKS(fs_info, bvec->bv_len);
> -		i = 0;
> -next_block:
> -		if (unlikely(map_length < submit_len + blocksize ||
> -		    bio_add_page(bio, bvec->bv_page, blocksize,
> -			    bvec->bv_offset + (i * blocksize)) < blocksize)) {
> -			/*
> -			 * inc the count before we submit the bio so
> -			 * we know the end IO handler won't happen before
> -			 * we inc the count. Otherwise, the dip might get freed
> -			 * before we're done setting it up
> -			 */
> -			atomic_inc(&dip->pending_bios);
> -			ret = __btrfs_submit_dio_bio(bio, inode,
> -						     file_offset, skip_sum,
> -						     async_submit);
> -			if (ret) {
> -				bio_put(bio);
> -				atomic_dec(&dip->pending_bios);
> -				goto out_err;
> -			}
> -
> -			start_sector += submit_len >> 9;
> -			file_offset += submit_len;
> -
> -			submit_len = 0;
> +		/*
> +		 * increase the count before we submit the bio so we know the
> +		 * end IO handler won't happen before we increase the
> +		 * count. Otherwise, the dip might get freed before we're done
> +		 * setting it up.

Small nit, as you already reformat and improve the comment, "Increase
the count ..."

> +		 */
> +		atomic_inc(&dip->pending_bios);
>  
> -			bio = btrfs_dio_bio_alloc(orig_bio->bi_bdev,
> -						  start_sector, GFP_NOFS);
> -			if (!bio)
> -				goto out_err;
> -			bio->bi_opf = orig_bio->bi_opf;
> -			bio->bi_private = dip;
> -			bio->bi_end_io = btrfs_end_dio_bio;
> -			btrfs_io_bio(bio)->logical = file_offset;
> +		ret = __btrfs_submit_dio_bio(bio, inode,
> +					     file_offset, skip_sum,
> +					     async_submit);

Also here, an indentation level is removed, so the arguments can be
reformatted.

> +		if (ret) {
> +			bio_put(bio);
> +			atomic_dec(&dip->pending_bios);
> +			goto out_err;
> +		}
>  
> -			map_length = orig_bio->bi_iter.bi_size;
> -			ret = btrfs_map_block(fs_info, btrfs_op(orig_bio),
> -					      start_sector << 9,
> -					      &map_length, NULL, 0);
> -			if (ret) {
> -				bio_put(bio);
> -				goto out_err;
> -			}
> +		clone_offset += clone_len;
> +		start_sector += clone_len >> 9;
> +		file_offset += clone_len;
>  
> -			goto next_block;
> -		} else {
> -			submit_len += blocksize;
> -			if (--nr_sectors) {
> -				i++;
> -				goto next_block;
> -			}
> -		}
> +		map_length = submit_len;
> +		ret = btrfs_map_block(fs_info, btrfs_op(orig_bio),
> +				      (start_sector << 9),
> +				      &map_length, NULL, 0);
> +		if (ret)
> +			goto out_err;
>  	}

So, I think I understand the change, at least enough to make me
comfortable to put the series into for-next, once you update it. Thanks.


* Re: [PATCH 2/6] Btrfs: use bio_clone_bioset_partial to simplify DIO submit
  2017-04-18  1:16 ` [PATCH 2/6] Btrfs: use bio_clone_bioset_partial to simplify DIO submit Liu Bo
  2017-05-11 14:16   ` David Sterba
@ 2017-05-16 14:37   ` Christoph Hellwig
  2017-05-16 17:15     ` Liu Bo
  1 sibling, 1 reply; 18+ messages in thread
From: Christoph Hellwig @ 2017-05-16 14:37 UTC (permalink / raw)
  To: Liu Bo; +Cc: linux-btrfs

>  }
>  
> +struct bio *btrfs_bio_clone_partial(struct bio *orig, gfp_t gfp_mask, int offset, int size)
> +{
> +	struct bio *bio;
> +
> +	bio = bio_clone_fast(orig, gfp_mask, btrfs_bioset);
> +	if (bio) {

bio_clone_fast will never fail when backed by a bioset, which this
one always is.  Also you always pass GFP_NOFS as the gfp_mask argument,
it might make sense to hardcode that here.

> +		struct btrfs_io_bio *btrfs_bio = btrfs_io_bio(bio);
> +		btrfs_bio->csum = NULL;
> +		btrfs_bio->csum_allocated = NULL;
> +		btrfs_bio->end_io = NULL;
> +
> +		bio_trim(bio, (offset >> 9), (size >> 9));

No need for the inner parentheses here.

Last but not least do you even need this as a separate helper?

> +struct bio *btrfs_bio_clone_partial(struct bio *orig, gfp_t gfp_mask, int offset, int size);

Over long line, please trim to 80 characters


* Re: [PATCH 2/6] Btrfs: use bio_clone_bioset_partial to simplify DIO submit
  2017-05-16 14:37   ` Christoph Hellwig
@ 2017-05-16 17:15     ` Liu Bo
  0 siblings, 0 replies; 18+ messages in thread
From: Liu Bo @ 2017-05-16 17:15 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-btrfs

On Tue, May 16, 2017 at 07:37:37AM -0700, Christoph Hellwig wrote:
> >  }
> >  
> > +struct bio *btrfs_bio_clone_partial(struct bio *orig, gfp_t gfp_mask, int offset, int size)
> > +{
> > +	struct bio *bio;
> > +
> > +	bio = bio_clone_fast(orig, gfp_mask, btrfs_bioset);
> > +	if (bio) {
> 
> bio_clone_fast will never fail when backed by a bioset, which this
> one always is.  Also you always pass GFP_NOFS as the gfp_mask argument,
> it might make sense to hardcode that here.
> 

I see.

> > +		struct btrfs_io_bio *btrfs_bio = btrfs_io_bio(bio);
> > +		btrfs_bio->csum = NULL;
> > +		btrfs_bio->csum_allocated = NULL;
> > +		btrfs_bio->end_io = NULL;
> > +
> > +		bio_trim(bio, (offset >> 9), (size >> 9));
> 
> No need for the inner parentheses here.
> 
> Last but not least do you even need this as a separate helper?
> 

Not strictly necessary indeed, but I need to access btrfs_bioset, which is
defined 'static' in extent_io.c.

> > +struct bio *btrfs_bio_clone_partial(struct bio *orig, gfp_t gfp_mask, int offset, int size);
> 
> Over long line, please trim to 80 characters

OK, fixed.

Thanks for the comments.

Thanks,

-liubo


* Re: [PATCH 1/6] Btrfs: use bio_clone_fast to clone our bio
  2017-04-18  1:16 ` [PATCH 1/6] Btrfs: use bio_clone_fast to clone our bio Liu Bo
@ 2017-05-17 17:53   ` David Sterba
  0 siblings, 0 replies; 18+ messages in thread
From: David Sterba @ 2017-05-17 17:53 UTC (permalink / raw)
  To: Liu Bo; +Cc: linux-btrfs

On Mon, Apr 17, 2017 at 06:16:22PM -0700, Liu Bo wrote:
> For raid1 and raid10, we clone the original bio to the bios which are then
> sent to different disks.
> 
> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>

Reviewed-by: David Sterba <dsterba@suse.com>


* Re: [PATCH 4/6] Btrfs: record error if one block has failed to retry
  2017-04-18  1:16 ` [PATCH 4/6] Btrfs: record error if one block has failed to retry Liu Bo
@ 2017-05-17 18:32   ` David Sterba
  0 siblings, 0 replies; 18+ messages in thread
From: David Sterba @ 2017-05-17 18:32 UTC (permalink / raw)
  To: Liu Bo; +Cc: linux-btrfs

On Mon, Apr 17, 2017 at 06:16:25PM -0700, Liu Bo wrote:
> In the nocsum case of dio read endio, it will return immediately if an
> error got returned when repairing, which left the rest blocks unrepaired.
> The behavior is different from how buffered read endio works in the same
> case.  This changes it to record error only and go on repairing the rest
> blocks.
> 
> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>

Reviewed-by: David Sterba <dsterba@suse.com>

> ---
>  fs/btrfs/inode.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index fca2f1f..cc46d21 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -7942,6 +7942,7 @@ static int __btrfs_correct_data_nocsum(struct inode *inode,
>  	u32 sectorsize;
>  	int nr_sectors;
>  	int ret;
> +	int err;
>  
>  	fs_info = BTRFS_I(inode)->root->fs_info;
>  	sectorsize = fs_info->sectorsize;
> @@ -7962,8 +7963,10 @@ static int __btrfs_correct_data_nocsum(struct inode *inode,
>  				pgoff, start, start + sectorsize - 1,
>  				io_bio->mirror_num,
>  				btrfs_retry_endio_nocsum, &done);
> -		if (ret)
> -			return ret;
> +		if (ret) {
> +			err = ret;
> +			goto next;
> +		}
>  
>  		wait_for_completion(&done.done);
>  
> @@ -7972,6 +7975,7 @@ static int __btrfs_correct_data_nocsum(struct inode *inode,
>  			goto next_block_or_try_again;
>  		}
>  
> +next:
>  		start += sectorsize;
>  
>  		if (nr_sectors--) {
> @@ -7980,7 +7984,7 @@ static int __btrfs_correct_data_nocsum(struct inode *inode,
>  		}
>  	}
>  
> -	return 0;
> +	return err;
>  }
>  
>  static void btrfs_retry_endio(struct bio *bio)
> -- 
> 2.5.5
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH 6/6] Btrfs: unify naming of btrfs_io_bio
  2017-04-18  1:16 ` [PATCH 6/6] Btrfs: unify naming of btrfs_io_bio Liu Bo
@ 2017-05-17 18:32   ` David Sterba
  0 siblings, 0 replies; 18+ messages in thread
From: David Sterba @ 2017-05-17 18:32 UTC (permalink / raw)
  To: Liu Bo; +Cc: linux-btrfs

On Mon, Apr 17, 2017 at 06:16:27PM -0700, Liu Bo wrote:
> All dio endio functions are using io_bio for struct btrfs_io_bio, this
> makes btrfs_submit_direct to follow this convention.
> 
> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>

Reviewed-by: David Sterba <dsterba@suse.com>


Thread overview: 18+ messages
2017-04-18  1:16 [PATCH 0/6 RFC] utilize bio_clone_fast to clean up Liu Bo
2017-04-18  1:16 ` [PATCH 1/6] Btrfs: use bio_clone_fast to clone our bio Liu Bo
2017-05-17 17:53   ` David Sterba
2017-04-18  1:16 ` [PATCH 2/6] Btrfs: use bio_clone_bioset_partial to simplify DIO submit Liu Bo
2017-05-11 14:16   ` David Sterba
2017-05-16 14:37   ` Christoph Hellwig
2017-05-16 17:15     ` Liu Bo
2017-04-18  1:16 ` [PATCH 3/6] Btrfs: change how we iterate bios in endio Liu Bo
2017-04-18  1:16 ` [PATCH 4/6] Btrfs: record error if one block has failed to retry Liu Bo
2017-05-17 18:32   ` David Sterba
2017-04-18  1:16 ` [PATCH 5/6] Btrfs: change check-integrity to use bvec_iter Liu Bo
2017-05-05 17:13   ` David Sterba
2017-04-18  1:16 ` [PATCH 6/6] Btrfs: unify naming of btrfs_io_bio Liu Bo
2017-05-17 18:32   ` David Sterba
2017-05-05 14:24 ` [PATCH 0/6 RFC] utilize bio_clone_fast to clean up David Sterba
2017-05-09 22:49   ` Liu Bo
2017-05-10  4:28     ` Liu Bo
2017-05-10 17:53     ` David Sterba
