linux-bcache.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 00/26] Prep work for immutable bio vecs
@ 2012-09-24 22:34 Kent Overstreet
  2012-09-24 22:34 ` [PATCH v3 01/26] block: Fix a buffer overrun in bio_integrity_split() Kent Overstreet
                   ` (24 more replies)
  0 siblings, 25 replies; 69+ messages in thread
From: Kent Overstreet @ 2012-09-24 22:34 UTC (permalink / raw)
  To: linux-bcache-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA
  Cc: Kent Overstreet, tj-DgEjT+Ai2ygdnm+yROfE0A,
	axboe-tSWWG44O7X1aa/9Udqfwiw, neilb-l3A5Bk7waGM,
	vgoyal-H+wXaHxf7aLQT0dZR+AlfA

Random assortment of refactoring and trivial cleanups;

Previous posting: http://thread.gmane.org/gmane.linux.kernel.bcache.devel/1055

No big changes since v2 - incorporated review feedback - mostly documentation,
did more testing, and fixed the deadlock Vivek noticed in my "convert integrity
to bvec_alloc_bs()" patch.

Patch series is on top of my patches for fixing a bio allocation deadlock with
stacked blocked devices, which is on top of Jens' tree.

Git tree is at:
http://evilpiepirate.org/git/linux-bcache.git block_stuff

v2: Patch series now has all the prep work to be done before abstracting
out the bio iterator, I think.

Kent Overstreet (26):
  block: Fix a buffer overrun in bio_integrity_split()
  block: Convert integrity to bvec_alloc_bs()
  block: Add bio_advance()
  block: Refactor blk_update_request()
  md: Convert md_trim_bio() to use bio_advance()
  block: Add bio_end_sector()
  block: Use bio_sectors() more consistently
  block: Change bio_split() to respect the current value of bi_idx
  block: Remove bi_idx references
  block: Remove some unnecessary bi_vcnt usage
  block: Add submit_bio_wait(), remove from md
  raid10: Use bio_reset()
  raid1: use bio_reset()
  raid5: use bio_reset()
  raid1: Refactor narrow_write_error() to not use bi_idx
  block: Add bio_copy_data()
  pktcdvd: use bio_copy_data()
  pktcdvd: Use bio_reset() in disabled code to kill bi_idx usage
  raid1: use bio_copy_data()
  bounce: Refactor __blk_queue_bounce to not use bi_io_vec
  block: Add bio_for_each_segment_all()
  block: Convert some code to bio_for_each_segment_all()
  block: Add bio_alloc_pages()
  block: Add an explicit bio flag for bios that own their bvec
  bio-integrity: Add explicit field for owner of bip_buf
  block: Add BIO_SUBMITTED flag, kill BIO_CLONED

 block/blk-core.c                         |  84 ++--------
 block/cfq-iosched.c                      |   7 +-
 block/deadline-iosched.c                 |   2 +-
 drivers/block/aoe/aoeblk.c               |   2 +-
 drivers/block/aoe/aoecmd.c               |   2 +-
 drivers/block/brd.c                      |   3 +-
 drivers/block/drbd/drbd_req.c            |   8 +-
 drivers/block/floppy.c                   |   1 -
 drivers/block/pktcdvd.c                  | 102 +++---------
 drivers/block/ps3vram.c                  |   2 +-
 drivers/md/dm-crypt.c                    |   3 +-
 drivers/md/dm-raid1.c                    |   2 +-
 drivers/md/dm-stripe.c                   |   2 +-
 drivers/md/dm-verity.c                   |   4 +-
 drivers/md/dm.c                          |   1 -
 drivers/md/faulty.c                      |   6 +-
 drivers/md/linear.c                      |   3 +-
 drivers/md/md.c                          |  19 +--
 drivers/md/raid0.c                       |   9 +-
 drivers/md/raid1.c                       | 133 +++++-----------
 drivers/md/raid10.c                      |  78 +++-------
 drivers/md/raid5.c                       |  49 +++---
 drivers/message/fusion/mptsas.c          |   6 +-
 drivers/s390/block/dcssblk.c             |   3 +-
 drivers/scsi/libsas/sas_expander.c       |   6 +-
 drivers/scsi/mpt2sas/mpt2sas_transport.c |  10 +-
 fs/bio-integrity.c                       | 143 +++++++----------
 fs/bio.c                                 | 260 +++++++++++++++++++++++++------
 fs/btrfs/extent_io.c                     |   3 +-
 fs/buffer.c                              |   1 -
 fs/direct-io.c                           |   8 +-
 fs/exofs/ore.c                           |   2 +-
 fs/exofs/ore_raid.c                      |   2 +-
 fs/gfs2/lops.c                           |   2 +-
 fs/jfs/jfs_logmgr.c                      |   2 -
 fs/logfs/dev_bdev.c                      |   5 -
 include/linux/bio.h                      |  48 +++---
 include/linux/blk_types.h                |   3 +-
 include/trace/events/block.h             |  10 +-
 mm/bounce.c                              |  75 +++------
 mm/page_io.c                             |   1 -
 41 files changed, 499 insertions(+), 613 deletions(-)

-- 
1.7.12

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH v3 01/26] block: Fix a buffer overrun in bio_integrity_split()
  2012-09-24 22:34 [PATCH v3 00/26] Prep work for immutable bio vecs Kent Overstreet
@ 2012-09-24 22:34 ` Kent Overstreet
       [not found]   ` <1348526106-17074-2-git-send-email-koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
  2012-09-24 22:34 ` [PATCH v3 02/26] block: Convert integrity to bvec_alloc_bs() Kent Overstreet
                   ` (23 subsequent siblings)
  24 siblings, 1 reply; 69+ messages in thread
From: Kent Overstreet @ 2012-09-24 22:34 UTC (permalink / raw)
  To: linux-bcache, linux-kernel, dm-devel
  Cc: axboe, Martin K. Petersen, Kent Overstreet, tj, vgoyal

bio_integrity_split() seemed to be confusing pointers and arrays -
bip_vec in bio_integrity_payload is an array appended to the end of the
payload, so the bio_vecs in struct bio_pair need to come immediately
after the bio_integrity_payload they're for, and there was an assignment
in bio_integrity_split() that didn't make any sense.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: Martin K. Petersen <martin.petersen@oracle.com>
---
 fs/bio-integrity.c  | 3 ---
 include/linux/bio.h | 6 ++++--
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/bio-integrity.c b/fs/bio-integrity.c
index a3f28f3..c7b6b52 100644
--- a/fs/bio-integrity.c
+++ b/fs/bio-integrity.c
@@ -697,9 +697,6 @@ void bio_integrity_split(struct bio *bio, struct bio_pair *bp, int sectors)
 	bp->iv1 = bip->bip_vec[0];
 	bp->iv2 = bip->bip_vec[0];
 
-	bp->bip1.bip_vec[0] = bp->iv1;
-	bp->bip2.bip_vec[0] = bp->iv2;
-
 	bp->iv1.bv_len = sectors * bi->tuple_size;
 	bp->iv2.bv_offset += sectors * bi->tuple_size;
 	bp->iv2.bv_len -= sectors * bi->tuple_size;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index b31036f..8e2d108 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -200,8 +200,10 @@ struct bio_pair {
 	struct bio			bio1, bio2;
 	struct bio_vec			bv1, bv2;
 #if defined(CONFIG_BLK_DEV_INTEGRITY)
-	struct bio_integrity_payload	bip1, bip2;
-	struct bio_vec			iv1, iv2;
+	struct bio_integrity_payload	bip1;
+	struct bio_vec			iv1;
+	struct bio_integrity_payload	bip2;
+	struct bio_vec			iv2;
 #endif
 	atomic_t			cnt;
 	int				error;
-- 
1.7.12

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH v3 02/26] block: Convert integrity to bvec_alloc_bs()
  2012-09-24 22:34 [PATCH v3 00/26] Prep work for immutable bio vecs Kent Overstreet
  2012-09-24 22:34 ` [PATCH v3 01/26] block: Fix a buffer overrun in bio_integrity_split() Kent Overstreet
@ 2012-09-24 22:34 ` Kent Overstreet
  2012-10-02 15:12   ` [dm-devel] " Vivek Goyal
       [not found]   ` <1348526106-17074-3-git-send-email-koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
  2012-09-24 22:34 ` [PATCH v3 03/26] block: Add bio_advance() Kent Overstreet
                   ` (22 subsequent siblings)
  24 siblings, 2 replies; 69+ messages in thread
From: Kent Overstreet @ 2012-09-24 22:34 UTC (permalink / raw)
  To: linux-bcache, linux-kernel, dm-devel
  Cc: Kent Overstreet, tj, axboe, neilb, vgoyal, Martin K. Petersen

This adds a pointer to the bvec array to struct bio_integrity_payload,
instead of the bvecs always being inline; then the bvecs are allocated
with bvec_alloc_bs().

Changed bvec_alloc_bs() and bvec_free_bs() to take a pointer to a
mempool instead of the bioset, so that bio integrity can use a different
mempool for its bvecs, and thus avoid a potential deadlock.

This is eventually for immutable bio vecs - immutable bvecs aren't
useful if we still have to copy them, hence the need for the pointer.
Less code is always nice too, though.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: Martin K. Petersen <martin.petersen@oracle.com>
---
 fs/bio-integrity.c  | 132 +++++++++++++++++++---------------------------------
 fs/bio.c            |  34 ++++++--------
 include/linux/bio.h |  13 ++++--
 3 files changed, 72 insertions(+), 107 deletions(-)

diff --git a/fs/bio-integrity.c b/fs/bio-integrity.c
index c7b6b52..65b708d 100644
--- a/fs/bio-integrity.c
+++ b/fs/bio-integrity.c
@@ -27,48 +27,11 @@
 #include <linux/workqueue.h>
 #include <linux/slab.h>
 
-struct integrity_slab {
-	struct kmem_cache *slab;
-	unsigned short nr_vecs;
-	char name[8];
-};
-
-#define IS(x) { .nr_vecs = x, .name = "bip-"__stringify(x) }
-struct integrity_slab bip_slab[BIOVEC_NR_POOLS] __read_mostly = {
-	IS(1), IS(4), IS(16), IS(64), IS(128), IS(BIO_MAX_PAGES),
-};
-#undef IS
+#define BIP_INLINE_VECS	4
 
+static struct kmem_cache *bip_slab;
 static struct workqueue_struct *kintegrityd_wq;
 
-static inline unsigned int vecs_to_idx(unsigned int nr)
-{
-	switch (nr) {
-	case 1:
-		return 0;
-	case 2 ... 4:
-		return 1;
-	case 5 ... 16:
-		return 2;
-	case 17 ... 64:
-		return 3;
-	case 65 ... 128:
-		return 4;
-	case 129 ... BIO_MAX_PAGES:
-		return 5;
-	default:
-		BUG();
-	}
-}
-
-static inline int use_bip_pool(unsigned int idx)
-{
-	if (idx == BIOVEC_MAX_IDX)
-		return 1;
-
-	return 0;
-}
-
 /**
  * bio_integrity_alloc - Allocate integrity payload and attach it to bio
  * @bio:	bio to attach integrity metadata to
@@ -84,37 +47,39 @@ struct bio_integrity_payload *bio_integrity_alloc(struct bio *bio,
 						  unsigned int nr_vecs)
 {
 	struct bio_integrity_payload *bip;
-	unsigned int idx = vecs_to_idx(nr_vecs);
 	struct bio_set *bs = bio->bi_pool;
+	unsigned long idx = BIO_POOL_NONE;
+	unsigned inline_vecs;
+
+	if (!bs) {
+		bip = kmalloc(sizeof(struct bio_integrity_payload) +
+			      sizeof(struct bio_vec) * nr_vecs, gfp_mask);
+		inline_vecs = nr_vecs;
+	} else {
+		bip = mempool_alloc(bs->bio_integrity_pool, gfp_mask);
+		inline_vecs = BIP_INLINE_VECS;
+	}
 
-	if (!bs)
-		bs = fs_bio_set;
-
-	BUG_ON(bio == NULL);
-	bip = NULL;
-
-	/* Lower order allocations come straight from slab */
-	if (!use_bip_pool(idx))
-		bip = kmem_cache_alloc(bip_slab[idx].slab, gfp_mask);
+	if (unlikely(!bip))
+		return NULL;
 
-	/* Use mempool if lower order alloc failed or max vecs were requested */
-	if (bip == NULL) {
-		idx = BIOVEC_MAX_IDX;  /* so we free the payload properly later */
-		bip = mempool_alloc(bs->bio_integrity_pool, gfp_mask);
+	memset(bip, 0, sizeof(struct bio_integrity_payload));
 
-		if (unlikely(bip == NULL)) {
-			printk(KERN_ERR "%s: could not alloc bip\n", __func__);
-			return NULL;
-		}
+	if (nr_vecs > inline_vecs) {
+		bip->bip_vec = bvec_alloc_bs(gfp_mask, nr_vecs, &idx,
+					     bs->bvec_integrity_pool);
+		if (!bip->bip_vec)
+			goto err;
 	}
 
-	memset(bip, 0, sizeof(*bip));
-
 	bip->bip_slab = idx;
 	bip->bip_bio = bio;
 	bio->bi_integrity = bip;
 
 	return bip;
+err:
+	mempool_free(bip, bs->bio_integrity_pool);
+	return NULL;
 }
 EXPORT_SYMBOL(bio_integrity_alloc);
 
@@ -130,20 +95,20 @@ void bio_integrity_free(struct bio *bio)
 	struct bio_integrity_payload *bip = bio->bi_integrity;
 	struct bio_set *bs = bio->bi_pool;
 
-	if (!bs)
-		bs = fs_bio_set;
-
-	BUG_ON(bip == NULL);
-
 	/* A cloned bio doesn't own the integrity metadata */
 	if (!bio_flagged(bio, BIO_CLONED) && !bio_flagged(bio, BIO_FS_INTEGRITY)
 	    && bip->bip_buf != NULL)
 		kfree(bip->bip_buf);
 
-	if (use_bip_pool(bip->bip_slab))
+	if (bs) {
+		if (bip->bip_slab != BIO_POOL_NONE)
+			bvec_free_bs(bs->bvec_integrity_pool, bip->bip_vec,
+				     bip->bip_slab);
+
 		mempool_free(bip, bs->bio_integrity_pool);
-	else
-		kmem_cache_free(bip_slab[bip->bip_slab].slab, bip);
+	} else {
+		kfree(bip);
+	}
 
 	bio->bi_integrity = NULL;
 }
@@ -697,6 +662,9 @@ void bio_integrity_split(struct bio *bio, struct bio_pair *bp, int sectors)
 	bp->iv1 = bip->bip_vec[0];
 	bp->iv2 = bip->bip_vec[0];
 
+	bp->bip1.bip_vec = &bp->iv1;
+	bp->bip2.bip_vec = &bp->iv2;
+
 	bp->iv1.bv_len = sectors * bi->tuple_size;
 	bp->iv2.bv_offset += sectors * bi->tuple_size;
 	bp->iv2.bv_len -= sectors * bi->tuple_size;
@@ -743,13 +711,14 @@ EXPORT_SYMBOL(bio_integrity_clone);
 
 int bioset_integrity_create(struct bio_set *bs, int pool_size)
 {
-	unsigned int max_slab = vecs_to_idx(BIO_MAX_PAGES);
-
 	if (bs->bio_integrity_pool)
 		return 0;
 
-	bs->bio_integrity_pool =
-		mempool_create_slab_pool(pool_size, bip_slab[max_slab].slab);
+	bs->bio_integrity_pool = mempool_create_slab_pool(pool_size, bip_slab);
+
+	bs->bvec_integrity_pool = biovec_create_pool(bs, pool_size);
+	if (!bs->bvec_integrity_pool)
+		return -1;
 
 	if (!bs->bio_integrity_pool)
 		return -1;
@@ -762,13 +731,14 @@ void bioset_integrity_free(struct bio_set *bs)
 {
 	if (bs->bio_integrity_pool)
 		mempool_destroy(bs->bio_integrity_pool);
+
+	if (bs->bvec_integrity_pool)
+		mempool_destroy(bs->bio_integrity_pool);
 }
 EXPORT_SYMBOL(bioset_integrity_free);
 
 void __init bio_integrity_init(void)
 {
-	unsigned int i;
-
 	/*
 	 * kintegrityd won't block much but may burn a lot of CPU cycles.
 	 * Make it highpri CPU intensive wq with max concurrency of 1.
@@ -778,14 +748,10 @@ void __init bio_integrity_init(void)
 	if (!kintegrityd_wq)
 		panic("Failed to create kintegrityd\n");
 
-	for (i = 0 ; i < BIOVEC_NR_POOLS ; i++) {
-		unsigned int size;
-
-		size = sizeof(struct bio_integrity_payload)
-			+ bip_slab[i].nr_vecs * sizeof(struct bio_vec);
-
-		bip_slab[i].slab =
-			kmem_cache_create(bip_slab[i].name, size, 0,
-					  SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL);
-	}
+	bip_slab = kmem_cache_create("bio_integrity_payload",
+				     sizeof(struct bio_integrity_payload) +
+				     sizeof(struct bio_vec) * BIP_INLINE_VECS,
+				     0, SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL);
+	if (!bip_slab)
+		panic("Failed to create slab\n");
 }
diff --git a/fs/bio.c b/fs/bio.c
index 63f05a9..ac8b443 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -158,12 +158,12 @@ unsigned int bvec_nr_vecs(unsigned short idx)
 	return bvec_slabs[idx].nr_vecs;
 }
 
-void bvec_free_bs(struct bio_set *bs, struct bio_vec *bv, unsigned int idx)
+void bvec_free_bs(mempool_t *pool, struct bio_vec *bv, unsigned int idx)
 {
 	BIO_BUG_ON(idx >= BIOVEC_NR_POOLS);
 
 	if (idx == BIOVEC_MAX_IDX)
-		mempool_free(bv, bs->bvec_pool);
+		mempool_free(bv, pool);
 	else {
 		struct biovec_slab *bvs = bvec_slabs + idx;
 
@@ -172,7 +172,7 @@ void bvec_free_bs(struct bio_set *bs, struct bio_vec *bv, unsigned int idx)
 }
 
 struct bio_vec *bvec_alloc_bs(gfp_t gfp_mask, int nr, unsigned long *idx,
-			      struct bio_set *bs)
+			      mempool_t *pool)
 {
 	struct bio_vec *bvl;
 
@@ -208,7 +208,7 @@ struct bio_vec *bvec_alloc_bs(gfp_t gfp_mask, int nr, unsigned long *idx,
 	 */
 	if (*idx == BIOVEC_MAX_IDX) {
 fallback:
-		bvl = mempool_alloc(bs->bvec_pool, gfp_mask);
+		bvl = mempool_alloc(pool, gfp_mask);
 	} else {
 		struct biovec_slab *bvs = bvec_slabs + *idx;
 		gfp_t __gfp_mask = gfp_mask & ~(__GFP_WAIT | __GFP_IO);
@@ -251,7 +251,7 @@ static void bio_free(struct bio *bio)
 
 	if (bs) {
 		if (bio_has_allocated_vec(bio))
-			bvec_free_bs(bs, bio->bi_io_vec, BIO_POOL_IDX(bio));
+			bvec_free_bs(bs->bvec_pool, bio->bi_io_vec, BIO_POOL_IDX(bio));
 
 		/*
 		 * If we have front padding, adjust the bio pointer before freeing
@@ -440,11 +440,11 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
 	bio_init(bio);
 
 	if (nr_iovecs > inline_vecs) {
-		bvl = bvec_alloc_bs(gfp_mask, nr_iovecs, &idx, bs);
+		bvl = bvec_alloc_bs(gfp_mask, nr_iovecs, &idx, bs->bvec_pool);
 		if (!bvl && gfp_mask != saved_gfp) {
 			punt_bios_to_rescuer(bs);
 			gfp_mask = saved_gfp;
-			bvl = bvec_alloc_bs(gfp_mask, nr_iovecs, &idx, bs);
+			bvl = bvec_alloc_bs(gfp_mask, nr_iovecs, &idx, bs->bvec_pool);
 		}
 
 		if (unlikely(!bvl))
@@ -1655,20 +1655,11 @@ EXPORT_SYMBOL(bio_sector_offset);
  * create memory pools for biovec's in a bio_set.
  * use the global biovec slabs created for general use.
  */
-static int biovec_create_pools(struct bio_set *bs, int pool_entries)
+mempool_t *biovec_create_pool(struct bio_set *bs, int pool_entries)
 {
 	struct biovec_slab *bp = bvec_slabs + BIOVEC_MAX_IDX;
 
-	bs->bvec_pool = mempool_create_slab_pool(pool_entries, bp->slab);
-	if (!bs->bvec_pool)
-		return -ENOMEM;
-
-	return 0;
-}
-
-static void biovec_free_pools(struct bio_set *bs)
-{
-	mempool_destroy(bs->bvec_pool);
+	return mempool_create_slab_pool(pool_entries, bp->slab);
 }
 
 void bioset_free(struct bio_set *bs)
@@ -1679,8 +1670,10 @@ void bioset_free(struct bio_set *bs)
 	if (bs->bio_pool)
 		mempool_destroy(bs->bio_pool);
 
+	if (bs->bvec_pool)
+		mempool_destroy(bs->bvec_pool);
+
 	bioset_integrity_free(bs);
-	biovec_free_pools(bs);
 	bio_put_slab(bs);
 
 	kfree(bs);
@@ -1725,7 +1718,8 @@ struct bio_set *bioset_create(unsigned int pool_size, unsigned int front_pad)
 	if (!bs->bio_pool)
 		goto bad;
 
-	if (biovec_create_pools(bs, pool_size))
+	bs->bvec_pool = biovec_create_pool(bs, pool_size);
+	if (!bs->bvec_pool)
 		goto bad;
 
 	bs->rescue_workqueue = alloc_workqueue("bioset", WQ_MEM_RECLAIM, 0);
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 8e2d108..a7c021d 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -182,7 +182,10 @@ struct bio_integrity_payload {
 	unsigned short		bip_idx;	/* current bip_vec index */
 
 	struct work_struct	bip_work;	/* I/O completion */
-	struct bio_vec		bip_vec[0];	/* embedded bvec array */
+
+	struct bio_vec		*bip_vec;
+	struct bio_vec		bip_inline_vecs[0];/* embedded bvec array */
+
 };
 #endif /* CONFIG_BLK_DEV_INTEGRITY */
 
@@ -213,6 +216,7 @@ extern void bio_pair_release(struct bio_pair *dbio);
 
 extern struct bio_set *bioset_create(unsigned int, unsigned int);
 extern void bioset_free(struct bio_set *);
+mempool_t *biovec_create_pool(struct bio_set *bs, int pool_entries);
 
 extern struct bio *bio_alloc_bioset(gfp_t, int, struct bio_set *);
 extern void bio_put(struct bio *);
@@ -288,8 +292,8 @@ extern struct bio *bio_copy_user_iov(struct request_queue *,
 				     int, int, gfp_t);
 extern int bio_uncopy_user(struct bio *);
 void zero_fill_bio(struct bio *bio);
-extern struct bio_vec *bvec_alloc_bs(gfp_t, int, unsigned long *, struct bio_set *);
-extern void bvec_free_bs(struct bio_set *, struct bio_vec *, unsigned int);
+extern struct bio_vec *bvec_alloc_bs(gfp_t, int, unsigned long *, mempool_t *);
+extern void bvec_free_bs(mempool_t *, struct bio_vec *, unsigned int);
 extern unsigned int bvec_nr_vecs(unsigned short idx);
 
 #ifdef CONFIG_BLK_CGROUP
@@ -511,10 +515,11 @@ struct bio_set {
 	unsigned int front_pad;
 
 	mempool_t *bio_pool;
+	mempool_t *bvec_pool;
 #if defined(CONFIG_BLK_DEV_INTEGRITY)
 	mempool_t *bio_integrity_pool;
+	mempool_t *bvec_integrity_pool;
 #endif
-	mempool_t *bvec_pool;
 
 	/*
 	 * Deadlock avoidance for stacking block drivers: see comments in
-- 
1.7.12

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH v3 03/26] block: Add bio_advance()
  2012-09-24 22:34 [PATCH v3 00/26] Prep work for immutable bio vecs Kent Overstreet
  2012-09-24 22:34 ` [PATCH v3 01/26] block: Fix a buffer overrun in bio_integrity_split() Kent Overstreet
  2012-09-24 22:34 ` [PATCH v3 02/26] block: Convert integrity to bvec_alloc_bs() Kent Overstreet
@ 2012-09-24 22:34 ` Kent Overstreet
  2012-09-24 22:34 ` [PATCH v3 04/26] block: Refactor blk_update_request() Kent Overstreet
                   ` (21 subsequent siblings)
  24 siblings, 0 replies; 69+ messages in thread
From: Kent Overstreet @ 2012-09-24 22:34 UTC (permalink / raw)
  To: linux-bcache, linux-kernel, dm-devel
  Cc: Kent Overstreet, tj, axboe, neilb, vgoyal

This is prep work for immutable bio vecs; we first want to centralize
where bvecs are modified.

Next two patches convert some existing code to use this function.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
---
 fs/bio.c            | 41 +++++++++++++++++++++++++++++++++++++++++
 include/linux/bio.h |  2 ++
 2 files changed, 43 insertions(+)

diff --git a/fs/bio.c b/fs/bio.c
index ac8b443..9a3935f 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -750,6 +750,47 @@ int bio_add_page(struct bio *bio, struct page *page, unsigned int len,
 }
 EXPORT_SYMBOL(bio_add_page);
 
+/**
+ * bio_advance - increment/complete a bio by some number of bytes
+ * @bio:	bio to advance
+ * @bytes:	number of bytes to complete
+ *
+ * This updates bi_sector, bi_size and bi_idx; if the number of bytes to
+ * complete doesn't align with a bvec boundary, then bv_len and bv_offset will
+ * be updated on the last bvec as well.
+ *
+ * @bio will then represent the remaining, uncompleted portion of the io.
+ */
+void bio_advance(struct bio *bio, unsigned bytes)
+{
+	if (bio_integrity(bio))
+		bio_integrity_advance(bio, bytes);
+
+	bio->bi_sector += bytes >> 9;
+	bio->bi_size -= bytes;
+
+	if (!bio->bi_size)
+		return;
+
+	while (bytes) {
+		if (unlikely(bio->bi_idx >= bio->bi_vcnt)) {
+			WARN_ONCE(1, "bio idx %d >= vcnt %d\n",
+				  bio->bi_idx, bio->bi_vcnt);
+			break;
+		}
+
+		if (bytes >= bio_iovec(bio)->bv_len) {
+			bytes -= bio_iovec(bio)->bv_len;
+			bio->bi_idx++;
+		} else {
+			bio_iovec(bio)->bv_len -= bytes;
+			bio_iovec(bio)->bv_offset += bytes;
+			bytes = 0;
+		}
+	}
+}
+EXPORT_SYMBOL(bio_advance);
+
 struct bio_map_data {
 	struct bio_vec *iovecs;
 	struct sg_iovec *sgvecs;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index a7c021d..4e32be1 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -251,6 +251,8 @@ extern void bio_endio(struct bio *, int);
 struct request_queue;
 extern int bio_phys_segments(struct request_queue *, struct bio *);
 
+extern void bio_advance(struct bio *, unsigned);
+
 extern void bio_init(struct bio *);
 extern void bio_reset(struct bio *);
 
-- 
1.7.12

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH v3 04/26] block: Refactor blk_update_request()
  2012-09-24 22:34 [PATCH v3 00/26] Prep work for immutable bio vecs Kent Overstreet
                   ` (2 preceding siblings ...)
  2012-09-24 22:34 ` [PATCH v3 03/26] block: Add bio_advance() Kent Overstreet
@ 2012-09-24 22:34 ` Kent Overstreet
  2012-10-02 18:43   ` [dm-devel] " Vivek Goyal
  2012-10-02 18:59   ` Vivek Goyal
       [not found] ` <1348526106-17074-1-git-send-email-koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
                   ` (20 subsequent siblings)
  24 siblings, 2 replies; 69+ messages in thread
From: Kent Overstreet @ 2012-09-24 22:34 UTC (permalink / raw)
  To: linux-bcache, linux-kernel, dm-devel; +Cc: tj, axboe, Kent Overstreet, vgoyal

Converts it to use bio_advance(), simplifying it quite a bit in the
process.

Note that req_bio_endio() now always calls bio_advance() - which means
it always loops over the biovec, not just on partial completions. Don't
expect it to affect performance, but worth noting.

Tested it by forcing partial updates, and dumping before and after on
various bio/bvec fields when doing a partial update.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
---
 block/blk-core.c | 80 +++++++++-----------------------------------------------
 1 file changed, 12 insertions(+), 68 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index a17869f..a8a1a9e 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -158,20 +158,10 @@ static void req_bio_endio(struct request *rq, struct bio *bio,
 	else if (!test_bit(BIO_UPTODATE, &bio->bi_flags))
 		error = -EIO;
 
-	if (unlikely(nbytes > bio->bi_size)) {
-		printk(KERN_ERR "%s: want %u bytes done, %u left\n",
-		       __func__, nbytes, bio->bi_size);
-		nbytes = bio->bi_size;
-	}
-
 	if (unlikely(rq->cmd_flags & REQ_QUIET))
 		set_bit(BIO_QUIET, &bio->bi_flags);
 
-	bio->bi_size -= nbytes;
-	bio->bi_sector += (nbytes >> 9);
-
-	if (bio_integrity(bio))
-		bio_integrity_advance(bio, nbytes);
+	bio_advance(bio, nbytes);
 
 	/* don't actually finish bio if it's part of flush sequence */
 	if (bio->bi_size == 0 && !(rq->cmd_flags & REQ_FLUSH_SEQ))
@@ -2219,8 +2209,7 @@ EXPORT_SYMBOL(blk_fetch_request);
  **/
 bool blk_update_request(struct request *req, int error, unsigned int nr_bytes)
 {
-	int total_bytes, bio_nbytes, next_idx = 0;
-	struct bio *bio;
+	int total_bytes;
 
 	if (!req->bio)
 		return false;
@@ -2264,56 +2253,21 @@ bool blk_update_request(struct request *req, int error, unsigned int nr_bytes)
 
 	blk_account_io_completion(req, nr_bytes);
 
-	total_bytes = bio_nbytes = 0;
-	while ((bio = req->bio) != NULL) {
-		int nbytes;
+	total_bytes = 0;
+	while (req->bio) {
+		struct bio *bio = req->bio;
+		unsigned bio_bytes = min(bio->bi_size, nr_bytes);
 
-		if (nr_bytes >= bio->bi_size) {
+		if (bio_bytes == bio->bi_size)
 			req->bio = bio->bi_next;
-			nbytes = bio->bi_size;
-			req_bio_endio(req, bio, nbytes, error);
-			next_idx = 0;
-			bio_nbytes = 0;
-		} else {
-			int idx = bio->bi_idx + next_idx;
-
-			if (unlikely(idx >= bio->bi_vcnt)) {
-				blk_dump_rq_flags(req, "__end_that");
-				printk(KERN_ERR "%s: bio idx %d >= vcnt %d\n",
-				       __func__, idx, bio->bi_vcnt);
-				break;
-			}
-
-			nbytes = bio_iovec_idx(bio, idx)->bv_len;
-			BIO_BUG_ON(nbytes > bio->bi_size);
-
-			/*
-			 * not a complete bvec done
-			 */
-			if (unlikely(nbytes > nr_bytes)) {
-				bio_nbytes += nr_bytes;
-				total_bytes += nr_bytes;
-				break;
-			}
 
-			/*
-			 * advance to the next vector
-			 */
-			next_idx++;
-			bio_nbytes += nbytes;
-		}
+		req_bio_endio(req, bio, bio_bytes, error);
 
-		total_bytes += nbytes;
-		nr_bytes -= nbytes;
+		total_bytes += bio_bytes;
+		nr_bytes -= bio_bytes;
 
-		bio = req->bio;
-		if (bio) {
-			/*
-			 * end more in this run, or just return 'not-done'
-			 */
-			if (unlikely(nr_bytes <= 0))
-				break;
-		}
+		if (!nr_bytes)
+			break;
 	}
 
 	/*
@@ -2329,16 +2283,6 @@ bool blk_update_request(struct request *req, int error, unsigned int nr_bytes)
 		return false;
 	}
 
-	/*
-	 * if the request wasn't completed, update state
-	 */
-	if (bio_nbytes) {
-		req_bio_endio(req, bio, bio_nbytes, error);
-		bio->bi_idx += next_idx;
-		bio_iovec(bio)->bv_offset += nr_bytes;
-		bio_iovec(bio)->bv_len -= nr_bytes;
-	}
-
 	req->__data_len -= total_bytes;
 	req->buffer = bio_data(req->bio);
 
-- 
1.7.12

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH v3 05/26] md: Convert md_trim_bio() to use bio_advance()
       [not found] ` <1348526106-17074-1-git-send-email-koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2012-09-24 22:34   ` Kent Overstreet
       [not found]     ` <1348526106-17074-6-git-send-email-koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
  2012-09-24 22:35   ` [PATCH v3 23/26] block: Add bio_alloc_pages() Kent Overstreet
  1 sibling, 1 reply; 69+ messages in thread
From: Kent Overstreet @ 2012-09-24 22:34 UTC (permalink / raw)
  To: linux-bcache-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA
  Cc: Kent Overstreet, tj-DgEjT+Ai2ygdnm+yROfE0A,
	axboe-tSWWG44O7X1aa/9Udqfwiw, neilb-l3A5Bk7waGM,
	vgoyal-H+wXaHxf7aLQT0dZR+AlfA

Signed-off-by: Kent Overstreet <koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
CC: Jens Axboe <axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>
CC: NeilBrown <neilb-l3A5Bk7waGM@public.gmane.org>
---
 drivers/md/md.c | 19 +++++--------------
 1 file changed, 5 insertions(+), 14 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 7a2b079..51ce48c 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -190,25 +190,16 @@ void md_trim_bio(struct bio *bio, int offset, int size)
 	struct bio_vec *bvec;
 	int sofar = 0;
 
-	size <<= 9;
 	if (offset == 0 && size == bio->bi_size)
 		return;
 
-	bio->bi_sector += offset;
-	bio->bi_size = size;
-	offset <<= 9;
 	clear_bit(BIO_SEG_VALID, &bio->bi_flags);
 
-	while (bio->bi_idx < bio->bi_vcnt &&
-	       bio->bi_io_vec[bio->bi_idx].bv_len <= offset) {
-		/* remove this whole bio_vec */
-		offset -= bio->bi_io_vec[bio->bi_idx].bv_len;
-		bio->bi_idx++;
-	}
-	if (bio->bi_idx < bio->bi_vcnt) {
-		bio->bi_io_vec[bio->bi_idx].bv_offset += offset;
-		bio->bi_io_vec[bio->bi_idx].bv_len -= offset;
-	}
+	bio_advance(bio, offset << 9);
+
+	size <<= 9;
+	bio->bi_size = size;
+
 	/* avoid any complications with bi_idx being non-zero*/
 	if (bio->bi_idx) {
 		memmove(bio->bi_io_vec, bio->bi_io_vec+bio->bi_idx,
-- 
1.7.12

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH v3 06/26] block: Add bio_end_sector()
  2012-09-24 22:34 [PATCH v3 00/26] Prep work for immutable bio vecs Kent Overstreet
                   ` (4 preceding siblings ...)
       [not found] ` <1348526106-17074-1-git-send-email-koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2012-09-24 22:34 ` Kent Overstreet
       [not found]   ` <1348526106-17074-7-git-send-email-koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
  2012-09-24 22:34 ` [PATCH v3 07/26] block: Use bio_sectors() more consistently Kent Overstreet
                   ` (18 subsequent siblings)
  24 siblings, 1 reply; 69+ messages in thread
From: Kent Overstreet @ 2012-09-24 22:34 UTC (permalink / raw)
  To: linux-bcache, linux-kernel, dm-devel
  Cc: Kent Overstreet, tj, axboe, neilb, vgoyal, Lars Ellenberg,
	Jiri Kosina, Alasdair Kergon, Martin Schwidefsky, Heiko Carstens,
	linux-s390, Chris Mason, Steven Whitehouse

Just a little convenience macro - main reason to add it now is preparing
for immutable bio vecs, it'll reduce the size of the patch that puts
bi_sector/bi_size/bi_idx into a struct bvec_iter.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: Lars Ellenberg <drbd-dev@lists.linbit.com>
CC: Jiri Kosina <jkosina@suse.cz>
CC: Alasdair Kergon <agk@redhat.com>
CC: dm-devel@redhat.com
CC: Neil Brown <neilb@suse.de>
CC: Martin Schwidefsky <schwidefsky@de.ibm.com>
CC: Heiko Carstens <heiko.carstens@de.ibm.com>
CC: linux-s390@vger.kernel.org
CC: Chris Mason <chris.mason@fusionio.com>
CC: Steven Whitehouse <swhiteho@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
---
 block/blk-core.c              |  2 +-
 block/cfq-iosched.c           |  7 ++-----
 block/deadline-iosched.c      |  2 +-
 drivers/block/drbd/drbd_req.c |  2 +-
 drivers/block/pktcdvd.c       |  6 +++---
 drivers/md/dm-stripe.c        |  2 +-
 drivers/md/dm-verity.c        |  2 +-
 drivers/md/faulty.c           |  6 ++----
 drivers/md/linear.c           |  3 +--
 drivers/md/raid1.c            |  4 ++--
 drivers/md/raid5.c            | 14 +++++++-------
 drivers/s390/block/dcssblk.c  |  3 +--
 fs/btrfs/extent_io.c          |  3 +--
 fs/gfs2/lops.c                |  2 +-
 include/linux/bio.h           |  1 +
 15 files changed, 26 insertions(+), 33 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index a8a1a9e..df9bb5f 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1550,7 +1550,7 @@ static void handle_bad_sector(struct bio *bio)
 	printk(KERN_INFO "%s: rw=%ld, want=%Lu, limit=%Lu\n",
 			bdevname(bio->bi_bdev, b),
 			bio->bi_rw,
-			(unsigned long long)bio->bi_sector + bio_sectors(bio),
+			(unsigned long long)bio_end_sector(bio),
 			(long long)(i_size_read(bio->bi_bdev->bd_inode) >> 9));
 
 	set_bit(BIO_EOF, &bio->bi_flags);
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index fb52df9..59b64b7 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1883,11 +1883,8 @@ cfq_find_rq_fmerge(struct cfq_data *cfqd, struct bio *bio)
 		return NULL;
 
 	cfqq = cic_to_cfqq(cic, cfq_bio_sync(bio));
-	if (cfqq) {
-		sector_t sector = bio->bi_sector + bio_sectors(bio);
-
-		return elv_rb_find(&cfqq->sort_list, sector);
-	}
+	if (cfqq)
+		return elv_rb_find(&cfqq->sort_list, bio_end_sector(bio));
 
 	return NULL;
 }
diff --git a/block/deadline-iosched.c b/block/deadline-iosched.c
index 599b12e..ef19d5f 100644
--- a/block/deadline-iosched.c
+++ b/block/deadline-iosched.c
@@ -132,7 +132,7 @@ deadline_merge(struct request_queue *q, struct request **req, struct bio *bio)
 	 * check for front merge
 	 */
 	if (dd->front_merges) {
-		sector_t sector = bio->bi_sector + bio_sectors(bio);
+		sector_t sector = bio_end_sector(bio);
 
 		__rq = elv_rb_find(&dd->sort_list[bio_data_dir(bio)], sector);
 		if (__rq) {
diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
index 01b2ac6..d90a1fd 100644
--- a/drivers/block/drbd/drbd_req.c
+++ b/drivers/block/drbd/drbd_req.c
@@ -1144,7 +1144,7 @@ void drbd_make_request(struct request_queue *q, struct bio *bio)
 	/* to make some things easier, force alignment of requests within the
 	 * granularity of our hash tables */
 	s_enr = bio->bi_sector >> HT_SHIFT;
-	e_enr = bio->bi_size ? (bio->bi_sector+(bio->bi_size>>9)-1) >> HT_SHIFT : s_enr;
+	e_enr = (bio_end_sector(bio) - 1) >> HT_SHIFT;
 
 	if (likely(s_enr == e_enr)) {
 		do {
diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 2e7de7a..26938e8 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -901,7 +901,7 @@ static void pkt_iosched_process_queue(struct pktcdvd_device *pd)
 			pd->iosched.successive_reads += bio->bi_size >> 10;
 		else {
 			pd->iosched.successive_reads = 0;
-			pd->iosched.last_write = bio->bi_sector + bio_sectors(bio);
+			pd->iosched.last_write = bio_end_sector(bio);
 		}
 		if (pd->iosched.successive_reads >= HI_SPEED_SWITCH) {
 			if (pd->read_speed == pd->write_speed) {
@@ -2454,7 +2454,7 @@ static void pkt_make_request(struct request_queue *q, struct bio *bio)
 	zone = ZONE(bio->bi_sector, pd);
 	VPRINTK("pkt_make_request: start = %6llx stop = %6llx\n",
 		(unsigned long long)bio->bi_sector,
-		(unsigned long long)(bio->bi_sector + bio_sectors(bio)));
+		(unsigned long long)bio_end_sector(bio));
 
 	/* Check if we have to split the bio */
 	{
@@ -2462,7 +2462,7 @@ static void pkt_make_request(struct request_queue *q, struct bio *bio)
 		sector_t last_zone;
 		int first_sectors;
 
-		last_zone = ZONE(bio->bi_sector + bio_sectors(bio) - 1, pd);
+		last_zone = ZONE(bio_end_sector(bio) - 1, pd);
 		if (last_zone != zone) {
 			BUG_ON(last_zone != zone + pd->settings.size);
 			first_sectors = last_zone - bio->bi_sector;
diff --git a/drivers/md/dm-stripe.c b/drivers/md/dm-stripe.c
index a087bf2..ac1a8cd 100644
--- a/drivers/md/dm-stripe.c
+++ b/drivers/md/dm-stripe.c
@@ -257,7 +257,7 @@ static int stripe_map_discard(struct stripe_c *sc, struct bio *bio,
 	sector_t begin, end;
 
 	stripe_map_range_sector(sc, bio->bi_sector, target_stripe, &begin);
-	stripe_map_range_sector(sc, bio->bi_sector + bio_sectors(bio),
+	stripe_map_range_sector(sc, bio_end_sector(bio),
 				target_stripe, &end);
 	if (begin < end) {
 		bio->bi_bdev = sc->stripe[target_stripe].dev->bdev;
diff --git a/drivers/md/dm-verity.c b/drivers/md/dm-verity.c
index 254d192..8f95664 100644
--- a/drivers/md/dm-verity.c
+++ b/drivers/md/dm-verity.c
@@ -477,7 +477,7 @@ static int verity_map(struct dm_target *ti, struct bio *bio,
 		return -EIO;
 	}
 
-	if ((bio->bi_sector + bio_sectors(bio)) >>
+	if (bio_end_sector(bio) >>
 	    (v->data_dev_block_bits - SECTOR_SHIFT) > v->data_blocks) {
 		DMERR_LIMIT("io out of range");
 		return -EIO;
diff --git a/drivers/md/faulty.c b/drivers/md/faulty.c
index 45135f6..4a5ca4b 100644
--- a/drivers/md/faulty.c
+++ b/drivers/md/faulty.c
@@ -185,8 +185,7 @@ static void make_request(struct mddev *mddev, struct bio *bio)
 			return;
 		}
 
-		if (check_sector(conf, bio->bi_sector, bio->bi_sector+(bio->bi_size>>9),
-				 WRITE))
+		if (check_sector(conf, bio->bi_sector, bio_end_sector(bio), WRITE))
 			failit = 1;
 		if (check_mode(conf, WritePersistent)) {
 			add_sector(conf, bio->bi_sector, WritePersistent);
@@ -196,8 +195,7 @@ static void make_request(struct mddev *mddev, struct bio *bio)
 			failit = 1;
 	} else {
 		/* read request */
-		if (check_sector(conf, bio->bi_sector, bio->bi_sector + (bio->bi_size>>9),
-				 READ))
+		if (check_sector(conf, bio->bi_sector, bio_end_sector(bio), READ))
 			failit = 1;
 		if (check_mode(conf, ReadTransient))
 			failit = 1;
diff --git a/drivers/md/linear.c b/drivers/md/linear.c
index fa211d8..af391db 100644
--- a/drivers/md/linear.c
+++ b/drivers/md/linear.c
@@ -304,8 +304,7 @@ static void linear_make_request(struct mddev *mddev, struct bio *bio)
 		bio_io_error(bio);
 		return;
 	}
-	if (unlikely(bio->bi_sector + (bio->bi_size >> 9) >
-		     tmp_dev->end_sector)) {
+	if (unlikely(bio_end_sector(bio) > tmp_dev->end_sector)) {
 		/* This bio crosses a device boundary, so we have to
 		 * split it.
 		 */
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 611b5f7..32eab5b 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1010,7 +1010,7 @@ static void make_request(struct mddev *mddev, struct bio * bio)
 	md_write_start(mddev, bio); /* wait on superblock update early */
 
 	if (bio_data_dir(bio) == WRITE &&
-	    bio->bi_sector + bio->bi_size/512 > mddev->suspend_lo &&
+	    bio_end_sector(bio) > mddev->suspend_lo &&
 	    bio->bi_sector < mddev->suspend_hi) {
 		/* As the suspend_* range is controlled by
 		 * userspace, we want an interruptible
@@ -1021,7 +1021,7 @@ static void make_request(struct mddev *mddev, struct bio * bio)
 			flush_signals(current);
 			prepare_to_wait(&conf->wait_barrier,
 					&w, TASK_INTERRUPTIBLE);
-			if (bio->bi_sector + bio->bi_size/512 <= mddev->suspend_lo ||
+			if (bio_end_sector(bio) <= mddev->suspend_lo ||
 			    bio->bi_sector >= mddev->suspend_hi)
 				break;
 			schedule();
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index adda94d..45a632c 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -2377,11 +2377,11 @@ static int add_stripe_bio(struct stripe_head *sh, struct bio *bi, int dd_idx, in
 	} else
 		bip = &sh->dev[dd_idx].toread;
 	while (*bip && (*bip)->bi_sector < bi->bi_sector) {
-		if ((*bip)->bi_sector + ((*bip)->bi_size >> 9) > bi->bi_sector)
+		if (bio_end_sector(*bip) > bi->bi_sector)
 			goto overlap;
 		bip = & (*bip)->bi_next;
 	}
-	if (*bip && (*bip)->bi_sector < bi->bi_sector + ((bi->bi_size)>>9))
+	if (*bip && (*bip)->bi_sector < bio_end_sector(bi))
 		goto overlap;
 
 	BUG_ON(*bip && bi->bi_next && (*bip) != bi->bi_next);
@@ -2397,8 +2397,8 @@ static int add_stripe_bio(struct stripe_head *sh, struct bio *bi, int dd_idx, in
 		     sector < sh->dev[dd_idx].sector + STRIPE_SECTORS &&
 			     bi && bi->bi_sector <= sector;
 		     bi = r5_next_bio(bi, sh->dev[dd_idx].sector)) {
-			if (bi->bi_sector + (bi->bi_size>>9) >= sector)
-				sector = bi->bi_sector + (bi->bi_size>>9);
+			if (bio_end_sector(bi) >= sector)
+				sector = bio_end_sector(bi);
 		}
 		if (sector >= sh->dev[dd_idx].sector + STRIPE_SECTORS)
 			set_bit(R5_OVERWRITE, &sh->dev[dd_idx].flags);
@@ -3908,7 +3908,7 @@ static int chunk_aligned_read(struct mddev *mddev, struct bio * raid_bio)
 						    0,
 						    &dd_idx, NULL);
 
-	end_sector = align_bi->bi_sector + (align_bi->bi_size >> 9);
+	end_sector = bio_end_sector(align_bi);
 	rcu_read_lock();
 	rdev = rcu_dereference(conf->disks[dd_idx].replacement);
 	if (!rdev || test_bit(Faulty, &rdev->flags) ||
@@ -4090,7 +4090,7 @@ static void make_request(struct mddev *mddev, struct bio * bi)
 		return;
 
 	logical_sector = bi->bi_sector & ~((sector_t)STRIPE_SECTORS-1);
-	last_sector = bi->bi_sector + (bi->bi_size>>9);
+	last_sector = bio_end_sector(bi);
 	bi->bi_next = NULL;
 	bi->bi_phys_segments = 1;	/* over-loaded to count active stripes */
 
@@ -4553,7 +4553,7 @@ static int  retry_aligned_read(struct r5conf *conf, struct bio *raid_bio)
 	logical_sector = raid_bio->bi_sector & ~((sector_t)STRIPE_SECTORS-1);
 	sector = raid5_compute_sector(conf, logical_sector,
 				      0, &dd_idx, NULL);
-	last_sector = raid_bio->bi_sector + (raid_bio->bi_size>>9);
+	last_sector = bio_end_sector(raid_bio);
 
 	for (; logical_sector < last_sector;
 	     logical_sector += STRIPE_SECTORS,
diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
index a5a55da..1d2826b 100644
--- a/drivers/s390/block/dcssblk.c
+++ b/drivers/s390/block/dcssblk.c
@@ -832,8 +832,7 @@ dcssblk_make_request(struct request_queue *q, struct bio *bio)
 	if ((bio->bi_sector & 7) != 0 || (bio->bi_size & 4095) != 0)
 		/* Request is not page-aligned. */
 		goto fail;
-	if (((bio->bi_size >> 9) + bio->bi_sector)
-			> get_capacity(bio->bi_bdev->bd_disk)) {
+	if (bio_end_sector(bio) > get_capacity(bio->bi_bdev->bd_disk)) {
 		/* Request beyond end of DCSS segment. */
 		goto fail;
 	}
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 4c87847..e1122a6 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2480,8 +2480,7 @@ static int submit_extent_page(int rw, struct extent_io_tree *tree,
 		if (old_compressed)
 			contig = bio->bi_sector == sector;
 		else
-			contig = bio->bi_sector + (bio->bi_size >> 9) ==
-				sector;
+			contig = bio_end_sector(bio) == sector;
 
 		if (prev_bio_flags != bio_flags || !contig ||
 		    merge_bio(tree, page, offset, page_size, bio, bio_flags) ||
diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index 8ff95a2..4245a7e 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -300,7 +300,7 @@ static struct bio *gfs2_log_get_bio(struct gfs2_sbd *sdp, u64 blkno)
 	u64 nblk;
 
 	if (bio) {
-		nblk = bio->bi_sector + bio_sectors(bio);
+		nblk = bio_end_sector(bio);
 		nblk >>= sdp->sd_fsb2bb_shift;
 		if (blkno == nblk)
 			return bio;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 4e32be1..d985e90 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -67,6 +67,7 @@
 #define bio_offset(bio)		bio_iovec((bio))->bv_offset
 #define bio_segments(bio)	((bio)->bi_vcnt - (bio)->bi_idx)
 #define bio_sectors(bio)	((bio)->bi_size >> 9)
+#define bio_end_sector(bio)	((bio)->bi_sector + bio_sectors(bio))
 
 static inline unsigned int bio_cur_bytes(struct bio *bio)
 {
-- 
1.7.12

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH v3 07/26] block: Use bio_sectors() more consistently
  2012-09-24 22:34 [PATCH v3 00/26] Prep work for immutable bio vecs Kent Overstreet
                   ` (5 preceding siblings ...)
  2012-09-24 22:34 ` [PATCH v3 06/26] block: Add bio_end_sector() Kent Overstreet
@ 2012-09-24 22:34 ` Kent Overstreet
       [not found]   ` <1348526106-17074-8-git-send-email-koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
  2012-09-25  0:54   ` Ed Cashin
  2012-09-24 22:34 ` [PATCH v3 08/26] block: Change bio_split() to respect the current value of bi_idx Kent Overstreet
                   ` (17 subsequent siblings)
  24 siblings, 2 replies; 69+ messages in thread
From: Kent Overstreet @ 2012-09-24 22:34 UTC (permalink / raw)
  To: linux-bcache, linux-kernel, dm-devel
  Cc: axboe, Nick Piggin, Ed L. Cashin, Jiri Kosina, Kent Overstreet,
	Jim Paris, Geoff Levand, Steven Rostedt, Alasdair Kergon, tj,
	vgoyal

Bunch of places in the code weren't using it where they could be -
this'll reduce the size of the patch that puts bi_sector/bi_size/bi_idx
into a struct bvec_iter.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: "Ed L. Cashin" <ecashin@coraid.com>
CC: Nick Piggin <npiggin@kernel.dk>
CC: Jiri Kosina <jkosina@suse.cz>
CC: Jim Paris <jim@jtan.com>
CC: Geoff Levand <geoff@infradead.org>
CC: Alasdair Kergon <agk@redhat.com>
CC: dm-devel@redhat.com
CC: Neil Brown <neilb@suse.de>
CC: Steven Rostedt <rostedt@goodmis.org>
---
 drivers/block/aoe/aoecmd.c   |  2 +-
 drivers/block/brd.c          |  3 +--
 drivers/block/pktcdvd.c      |  2 +-
 drivers/block/ps3vram.c      |  2 +-
 drivers/md/dm-raid1.c        |  2 +-
 drivers/md/raid0.c           |  6 +++---
 drivers/md/raid1.c           | 17 ++++++++---------
 drivers/md/raid10.c          | 24 +++++++++++-------------
 drivers/md/raid5.c           |  8 ++++----
 include/trace/events/block.h | 10 +++++-----
 10 files changed, 36 insertions(+), 40 deletions(-)

diff --git a/drivers/block/aoe/aoecmd.c b/drivers/block/aoe/aoecmd.c
index de0435e..2b52ebc 100644
--- a/drivers/block/aoe/aoecmd.c
+++ b/drivers/block/aoe/aoecmd.c
@@ -720,7 +720,7 @@ gettgt(struct aoedev *d, char *addr)
 static inline void
 diskstats(struct gendisk *disk, struct bio *bio, ulong duration, sector_t sector)
 {
-	unsigned long n_sect = bio->bi_size >> 9;
+	unsigned long n_sect = bio_sectors(bio);
 	const int rw = bio_data_dir(bio);
 	struct hd_struct *part;
 	int cpu;
diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index 531ceb3..d5c4978 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -334,8 +334,7 @@ static void brd_make_request(struct request_queue *q, struct bio *bio)
 	int err = -EIO;
 
 	sector = bio->bi_sector;
-	if (sector + (bio->bi_size >> SECTOR_SHIFT) >
-						get_capacity(bdev->bd_disk))
+	if (sector + bio_sectors(bio) > get_capacity(bdev->bd_disk))
 		goto out;
 
 	if (unlikely(bio->bi_rw & REQ_DISCARD)) {
diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 26938e8..2c27744 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -2433,7 +2433,7 @@ static void pkt_make_request(struct request_queue *q, struct bio *bio)
 		cloned_bio->bi_bdev = pd->bdev;
 		cloned_bio->bi_private = psd;
 		cloned_bio->bi_end_io = pkt_end_io_read_cloned;
-		pd->stats.secs_r += bio->bi_size >> 9;
+		pd->stats.secs_r += bio_sectors(bio);
 		pkt_queue_bio(pd, cloned_bio);
 		return;
 	}
diff --git a/drivers/block/ps3vram.c b/drivers/block/ps3vram.c
index f58cdcf..1ff38e8 100644
--- a/drivers/block/ps3vram.c
+++ b/drivers/block/ps3vram.c
@@ -553,7 +553,7 @@ static struct bio *ps3vram_do_bio(struct ps3_system_bus_device *dev,
 	struct ps3vram_priv *priv = ps3_system_bus_get_drvdata(dev);
 	int write = bio_data_dir(bio) == WRITE;
 	const char *op = write ? "write" : "read";
-	loff_t offset = bio->bi_sector << 9;
+	loff_t offset = bio_sectors(bio);
 	int error = 0;
 	struct bio_vec *bvec;
 	unsigned int i;
diff --git a/drivers/md/dm-raid1.c b/drivers/md/dm-raid1.c
index bc5ddba8..3dac2de 100644
--- a/drivers/md/dm-raid1.c
+++ b/drivers/md/dm-raid1.c
@@ -457,7 +457,7 @@ static void map_region(struct dm_io_region *io, struct mirror *m,
 {
 	io->bdev = m->dev->bdev;
 	io->sector = map_sector(m, bio);
-	io->count = bio->bi_size >> 9;
+	io->count = bio_sectors(bio);
 }
 
 static void hold_bio(struct mirror_set *ms, struct bio *bio)
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index a9e4fa9..89016cb 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -485,11 +485,11 @@ static inline int is_io_in_chunk_boundary(struct mddev *mddev,
 {
 	if (likely(is_power_of_2(chunk_sects))) {
 		return chunk_sects >= ((bio->bi_sector & (chunk_sects-1))
-					+ (bio->bi_size >> 9));
+					+ bio_sectors(bio));
 	} else{
 		sector_t sector = bio->bi_sector;
 		return chunk_sects >= (sector_div(sector, chunk_sects)
-						+ (bio->bi_size >> 9));
+						+ bio_sectors(bio));
 	}
 }
 
@@ -543,7 +543,7 @@ bad_map:
 	printk("md/raid0:%s: make_request bug: can't convert block across chunks"
 	       " or bigger than %dk %llu %d\n",
 	       mdname(mddev), chunk_sects / 2,
-	       (unsigned long long)bio->bi_sector, bio->bi_size >> 10);
+	       (unsigned long long)bio->bi_sector, bio_sectors(bio) / 2);
 
 	bio_io_error(bio);
 	return;
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 32eab5b..deff0cd 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -267,7 +267,7 @@ static void raid_end_bio_io(struct r1bio *r1_bio)
 			 (bio_data_dir(bio) == WRITE) ? "write" : "read",
 			 (unsigned long long) bio->bi_sector,
 			 (unsigned long long) bio->bi_sector +
-			 (bio->bi_size >> 9) - 1);
+			 bio_sectors(bio) - 1);
 
 		call_bio_endio(r1_bio);
 	}
@@ -458,7 +458,7 @@ static void raid1_end_write_request(struct bio *bio, int error)
 					 " %llu-%llu\n",
 					 (unsigned long long) mbio->bi_sector,
 					 (unsigned long long) mbio->bi_sector +
-					 (mbio->bi_size >> 9) - 1);
+					 bio_sectors(mbio) - 1);
 				call_bio_endio(r1_bio);
 			}
 		}
@@ -1041,7 +1041,7 @@ static void make_request(struct mddev *mddev, struct bio * bio)
 	r1_bio = mempool_alloc(conf->r1bio_pool, GFP_NOIO);
 
 	r1_bio->master_bio = bio;
-	r1_bio->sectors = bio->bi_size >> 9;
+	r1_bio->sectors = bio_sectors(bio);
 	r1_bio->state = 0;
 	r1_bio->mddev = mddev;
 	r1_bio->sector = bio->bi_sector;
@@ -1119,7 +1119,7 @@ read_again:
 			r1_bio = mempool_alloc(conf->r1bio_pool, GFP_NOIO);
 
 			r1_bio->master_bio = bio;
-			r1_bio->sectors = (bio->bi_size >> 9) - sectors_handled;
+			r1_bio->sectors = bio_sectors(bio) - sectors_handled;
 			r1_bio->state = 0;
 			r1_bio->mddev = mddev;
 			r1_bio->sector = bio->bi_sector + sectors_handled;
@@ -1320,14 +1320,14 @@ read_again:
 	/* Mustn't call r1_bio_write_done before this next test,
 	 * as it could result in the bio being freed.
 	 */
-	if (sectors_handled < (bio->bi_size >> 9)) {
+	if (sectors_handled < bio_sectors(bio)) {
 		r1_bio_write_done(r1_bio);
 		/* We need another r1_bio.  It has already been counted
 		 * in bio->bi_phys_segments
 		 */
 		r1_bio = mempool_alloc(conf->r1bio_pool, GFP_NOIO);
 		r1_bio->master_bio = bio;
-		r1_bio->sectors = (bio->bi_size >> 9) - sectors_handled;
+		r1_bio->sectors = bio_sectors(bio) - sectors_handled;
 		r1_bio->state = 0;
 		r1_bio->mddev = mddev;
 		r1_bio->sector = bio->bi_sector + sectors_handled;
@@ -1936,7 +1936,7 @@ static void sync_request_write(struct mddev *mddev, struct r1bio *r1_bio)
 		wbio->bi_rw = WRITE;
 		wbio->bi_end_io = end_sync_write;
 		atomic_inc(&r1_bio->remaining);
-		md_sync_acct(conf->mirrors[i].rdev->bdev, wbio->bi_size >> 9);
+		md_sync_acct(conf->mirrors[i].rdev->bdev, bio_sectors(wbio));
 
 		generic_make_request(wbio);
 	}
@@ -2272,8 +2272,7 @@ read_more:
 			r1_bio = mempool_alloc(conf->r1bio_pool, GFP_NOIO);
 
 			r1_bio->master_bio = mbio;
-			r1_bio->sectors = (mbio->bi_size >> 9)
-					  - sectors_handled;
+			r1_bio->sectors = bio_sectors(mbio) - sectors_handled;
 			r1_bio->state = 0;
 			set_bit(R1BIO_ReadError, &r1_bio->state);
 			r1_bio->mddev = mddev;
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 1c2eb38..9715aaf 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1075,7 +1075,7 @@ static void make_request(struct mddev *mddev, struct bio * bio)
 	/* If this request crosses a chunk boundary, we need to
 	 * split it.  This will only happen for 1 PAGE (or less) requests.
 	 */
-	if (unlikely((bio->bi_sector & chunk_mask) + (bio->bi_size >> 9)
+	if (unlikely((bio->bi_sector & chunk_mask) + bio_sectors(bio)
 		     > chunk_sects
 		     && (conf->geo.near_copies < conf->geo.raid_disks
 			 || conf->prev.near_copies < conf->prev.raid_disks))) {
@@ -1115,7 +1115,7 @@ static void make_request(struct mddev *mddev, struct bio * bio)
 	bad_map:
 		printk("md/raid10:%s: make_request bug: can't convert block across chunks"
 		       " or bigger than %dk %llu %d\n", mdname(mddev), chunk_sects/2,
-		       (unsigned long long)bio->bi_sector, bio->bi_size >> 10);
+		       (unsigned long long)bio->bi_sector, bio_sectors(bio) / 2);
 
 		bio_io_error(bio);
 		return;
@@ -1130,7 +1130,7 @@ static void make_request(struct mddev *mddev, struct bio * bio)
 	 */
 	wait_barrier(conf);
 
-	sectors = bio->bi_size >> 9;
+	sectors = bio_sectors(bio);
 	while (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
 	    bio->bi_sector < conf->reshape_progress &&
 	    bio->bi_sector + sectors > conf->reshape_progress) {
@@ -1232,8 +1232,7 @@ read_again:
 			r10_bio = mempool_alloc(conf->r10bio_pool, GFP_NOIO);
 
 			r10_bio->master_bio = bio;
-			r10_bio->sectors = ((bio->bi_size >> 9)
-					    - sectors_handled);
+			r10_bio->sectors = bio_sectors(bio) - sectors_handled;
 			r10_bio->state = 0;
 			r10_bio->mddev = mddev;
 			r10_bio->sector = bio->bi_sector + sectors_handled;
@@ -1455,7 +1454,7 @@ retry_write:
 	 * after checking if we need to go around again.
 	 */
 
-	if (sectors_handled < (bio->bi_size >> 9)) {
+	if (sectors_handled < bio_sectors(bio)) {
 		one_write_done(r10_bio);
 		/* We need another r10_bio.  It has already been counted
 		 * in bio->bi_phys_segments.
@@ -1463,7 +1462,7 @@ retry_write:
 		r10_bio = mempool_alloc(conf->r10bio_pool, GFP_NOIO);
 
 		r10_bio->master_bio = bio;
-		r10_bio->sectors = (bio->bi_size >> 9) - sectors_handled;
+		r10_bio->sectors = bio_sectors(bio) - sectors_handled;
 
 		r10_bio->mddev = mddev;
 		r10_bio->sector = bio->bi_sector + sectors_handled;
@@ -1984,7 +1983,7 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
 		d = r10_bio->devs[i].devnum;
 		atomic_inc(&conf->mirrors[d].rdev->nr_pending);
 		atomic_inc(&r10_bio->remaining);
-		md_sync_acct(conf->mirrors[d].rdev->bdev, tbio->bi_size >> 9);
+		md_sync_acct(conf->mirrors[d].rdev->bdev, bio_sectors(tbio));
 
 		tbio->bi_sector += conf->mirrors[d].rdev->data_offset;
 		tbio->bi_bdev = conf->mirrors[d].rdev->bdev;
@@ -2009,7 +2008,7 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
 		d = r10_bio->devs[i].devnum;
 		atomic_inc(&r10_bio->remaining);
 		md_sync_acct(conf->mirrors[d].replacement->bdev,
-			     tbio->bi_size >> 9);
+			     bio_sectors(tbio));
 		generic_make_request(tbio);
 	}
 
@@ -2135,13 +2134,13 @@ static void recovery_request_write(struct mddev *mddev, struct r10bio *r10_bio)
 	wbio2 = r10_bio->devs[1].repl_bio;
 	if (wbio->bi_end_io) {
 		atomic_inc(&conf->mirrors[d].rdev->nr_pending);
-		md_sync_acct(conf->mirrors[d].rdev->bdev, wbio->bi_size >> 9);
+		md_sync_acct(conf->mirrors[d].rdev->bdev, bio_sectors(wbio));
 		generic_make_request(wbio);
 	}
 	if (wbio2 && wbio2->bi_end_io) {
 		atomic_inc(&conf->mirrors[d].replacement->nr_pending);
 		md_sync_acct(conf->mirrors[d].replacement->bdev,
-			     wbio2->bi_size >> 9);
+			     bio_sectors(wbio2));
 		generic_make_request(wbio2);
 	}
 }
@@ -2571,8 +2570,7 @@ read_more:
 		r10_bio = mempool_alloc(conf->r10bio_pool,
 					GFP_NOIO);
 		r10_bio->master_bio = mbio;
-		r10_bio->sectors = (mbio->bi_size >> 9)
-			- sectors_handled;
+		r10_bio->sectors = bio_sectors(mbio) - sectors_handled;
 		r10_bio->state = 0;
 		set_bit(R10BIO_ReadError,
 			&r10_bio->state);
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 45a632c..cbfc424 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -88,7 +88,7 @@ static inline struct hlist_head *stripe_hash(struct r5conf *conf, sector_t sect)
  */
 static inline struct bio *r5_next_bio(struct bio *bio, sector_t sector)
 {
-	int sectors = bio->bi_size >> 9;
+	int sectors = bio_sectors(bio);
 	if (bio->bi_sector + sectors < sector + STRIPE_SECTORS)
 		return bio->bi_next;
 	else
@@ -3771,7 +3771,7 @@ static int in_chunk_boundary(struct mddev *mddev, struct bio *bio)
 {
 	sector_t sector = bio->bi_sector + get_start_sect(bio->bi_bdev);
 	unsigned int chunk_sectors = mddev->chunk_sectors;
-	unsigned int bio_sectors = bio->bi_size >> 9;
+	unsigned int bio_sectors = bio_sectors(bio);
 
 	if (mddev->new_chunk_sectors < mddev->chunk_sectors)
 		chunk_sectors = mddev->new_chunk_sectors;
@@ -3861,7 +3861,7 @@ static int bio_fits_rdev(struct bio *bi)
 {
 	struct request_queue *q = bdev_get_queue(bi->bi_bdev);
 
-	if ((bi->bi_size>>9) > queue_max_sectors(q))
+	if (bio_sectors(bi) > queue_max_sectors(q))
 		return 0;
 	blk_recount_segments(q, bi);
 	if (bi->bi_phys_segments > queue_max_segments(q))
@@ -3931,7 +3931,7 @@ static int chunk_aligned_read(struct mddev *mddev, struct bio * raid_bio)
 		align_bi->bi_flags &= ~(1 << BIO_SEG_VALID);
 
 		if (!bio_fits_rdev(align_bi) ||
-		    is_badblock(rdev, align_bi->bi_sector, align_bi->bi_size>>9,
+		    is_badblock(rdev, align_bi->bi_sector, bio_sectors(align_bi),
 				&first_bad, &bad_sectors)) {
 			/* too big in some way, or has a known bad block */
 			bio_put(align_bi);
diff --git a/include/trace/events/block.h b/include/trace/events/block.h
index 05c5e61..3c210ac 100644
--- a/include/trace/events/block.h
+++ b/include/trace/events/block.h
@@ -193,7 +193,7 @@ TRACE_EVENT(block_bio_bounce,
 		__entry->dev		= bio->bi_bdev ?
 					  bio->bi_bdev->bd_dev : 0;
 		__entry->sector		= bio->bi_sector;
-		__entry->nr_sector	= bio->bi_size >> 9;
+		__entry->nr_sector	= bio_sectors(bio);
 		blk_fill_rwbs(__entry->rwbs, bio->bi_rw, bio->bi_size);
 		memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
 	),
@@ -230,7 +230,7 @@ TRACE_EVENT(block_bio_complete,
 	TP_fast_assign(
 		__entry->dev		= bio->bi_bdev->bd_dev;
 		__entry->sector		= bio->bi_sector;
-		__entry->nr_sector	= bio->bi_size >> 9;
+		__entry->nr_sector	= bio_sectors(bio);
 		__entry->error		= error;
 		blk_fill_rwbs(__entry->rwbs, bio->bi_rw, bio->bi_size);
 	),
@@ -258,7 +258,7 @@ DECLARE_EVENT_CLASS(block_bio,
 	TP_fast_assign(
 		__entry->dev		= bio->bi_bdev->bd_dev;
 		__entry->sector		= bio->bi_sector;
-		__entry->nr_sector	= bio->bi_size >> 9;
+		__entry->nr_sector	= bio_sectors(bio);
 		blk_fill_rwbs(__entry->rwbs, bio->bi_rw, bio->bi_size);
 		memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
 	),
@@ -330,7 +330,7 @@ DECLARE_EVENT_CLASS(block_get_rq,
 	TP_fast_assign(
 		__entry->dev		= bio ? bio->bi_bdev->bd_dev : 0;
 		__entry->sector		= bio ? bio->bi_sector : 0;
-		__entry->nr_sector	= bio ? bio->bi_size >> 9 : 0;
+		__entry->nr_sector	= bio ? bio_sectors(bio) : 0;
 		blk_fill_rwbs(__entry->rwbs,
 			      bio ? bio->bi_rw : 0, __entry->nr_sector);
 		memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
@@ -506,7 +506,7 @@ TRACE_EVENT(block_bio_remap,
 	TP_fast_assign(
 		__entry->dev		= bio->bi_bdev->bd_dev;
 		__entry->sector		= bio->bi_sector;
-		__entry->nr_sector	= bio->bi_size >> 9;
+		__entry->nr_sector	= bio_sectors(bio);
 		__entry->old_dev	= dev;
 		__entry->old_sector	= from;
 		blk_fill_rwbs(__entry->rwbs, bio->bi_rw, bio->bi_size);
-- 
1.7.12

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH v3 08/26] block: Change bio_split() to respect the current value of bi_idx
  2012-09-24 22:34 [PATCH v3 00/26] Prep work for immutable bio vecs Kent Overstreet
                   ` (6 preceding siblings ...)
  2012-09-24 22:34 ` [PATCH v3 07/26] block: Use bio_sectors() more consistently Kent Overstreet
@ 2012-09-24 22:34 ` Kent Overstreet
  2012-09-24 22:34 ` [PATCH v3 09/26] block: Remove bi_idx references Kent Overstreet
                   ` (16 subsequent siblings)
  24 siblings, 0 replies; 69+ messages in thread
From: Kent Overstreet @ 2012-09-24 22:34 UTC (permalink / raw)
  To: linux-bcache, linux-kernel, dm-devel
  Cc: Kent Overstreet, tj, axboe, neilb, vgoyal, Lars Ellenberg,
	Martin K. Petersen

In the current code bio_split() won't be seeing partially completed bios
so this doesn't change any behaviour, but this makes the code a bit
clearer as to what bio_split() actually requires.

The immediate purpose of the patch is removing unnecessary bi_idx
references, but the end goal is to allow partial completed bios to be
submitted, which along with immutable biovecs enables effecient bio
splitting.

Some of the callers were (double) checking that bios could be split, so
update their checks too.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: Lars Ellenberg <drbd-dev@lists.linbit.com>
CC: Neil Brown <neilb@suse.de>
CC: Martin K. Petersen <martin.petersen@oracle.com>
---
 drivers/block/drbd/drbd_req.c | 6 +++---
 drivers/md/raid0.c            | 3 +--
 drivers/md/raid10.c           | 3 +--
 fs/bio-integrity.c            | 4 ++--
 fs/bio.c                      | 7 +++----
 5 files changed, 10 insertions(+), 13 deletions(-)

diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
index d90a1fd..4d09e74 100644
--- a/drivers/block/drbd/drbd_req.c
+++ b/drivers/block/drbd/drbd_req.c
@@ -1155,11 +1155,11 @@ void drbd_make_request(struct request_queue *q, struct bio *bio)
 
 	/* can this bio be split generically?
 	 * Maybe add our own split-arbitrary-bios function. */
-	if (bio->bi_vcnt != 1 || bio->bi_idx != 0 || bio->bi_size > DRBD_MAX_BIO_SIZE) {
+	if (bio_segments(bio) != 1 || bio->bi_size > DRBD_MAX_BIO_SIZE) {
 		/* rather error out here than BUG in bio_split */
 		dev_err(DEV, "bio would need to, but cannot, be split: "
-		    "(vcnt=%u,idx=%u,size=%u,sector=%llu)\n",
-		    bio->bi_vcnt, bio->bi_idx, bio->bi_size,
+		    "(segments=%u,size=%u,sector=%llu)\n",
+		    bio_segments(bio), bio->bi_size,
 		    (unsigned long long)bio->bi_sector);
 		bio_endio(bio, -EINVAL);
 	} else {
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index 89016cb..769b3cb 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -510,8 +510,7 @@ static void raid0_make_request(struct mddev *mddev, struct bio *bio)
 		sector_t sector = bio->bi_sector;
 		struct bio_pair *bp;
 		/* Sanity check -- queue functions should prevent this happening */
-		if (bio->bi_vcnt != 1 ||
-		    bio->bi_idx != 0)
+		if (bio_segments(bio) != 1)
 			goto bad_map;
 		/* This is a one page bio that upper layers
 		 * refuse to split for us, so we need to split it.
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 9715aaf..bbd08f5 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1081,8 +1081,7 @@ static void make_request(struct mddev *mddev, struct bio * bio)
 			 || conf->prev.near_copies < conf->prev.raid_disks))) {
 		struct bio_pair *bp;
 		/* Sanity check -- queue functions should prevent this happening */
-		if (bio->bi_vcnt != 1 ||
-		    bio->bi_idx != 0)
+		if (bio_segments(bio) != 1)
 			goto bad_map;
 		/* This is a one page bio that upper layers
 		 * refuse to split for us, so we need to split it.
diff --git a/fs/bio-integrity.c b/fs/bio-integrity.c
index 65b708d..f175814 100644
--- a/fs/bio-integrity.c
+++ b/fs/bio-integrity.c
@@ -659,8 +659,8 @@ void bio_integrity_split(struct bio *bio, struct bio_pair *bp, int sectors)
 	bp->bio1.bi_integrity = &bp->bip1;
 	bp->bio2.bi_integrity = &bp->bip2;
 
-	bp->iv1 = bip->bip_vec[0];
-	bp->iv2 = bip->bip_vec[0];
+	bp->iv1 = bip->bip_vec[bip->bip_idx];
+	bp->iv2 = bip->bip_vec[bip->bip_idx];
 
 	bp->bip1.bip_vec = &bp->iv1;
 	bp->bip2.bip_vec = &bp->iv2;
diff --git a/fs/bio.c b/fs/bio.c
index 9a3935f..062ba8f 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -1616,8 +1616,7 @@ struct bio_pair *bio_split(struct bio *bi, int first_sectors)
 	trace_block_split(bdev_get_queue(bi->bi_bdev), bi,
 				bi->bi_sector + first_sectors);
 
-	BUG_ON(bi->bi_vcnt != 1);
-	BUG_ON(bi->bi_idx != 0);
+	BUG_ON(bio_segments(bi) != 1);
 	atomic_set(&bp->cnt, 3);
 	bp->error = 0;
 	bp->bio1 = *bi;
@@ -1626,8 +1625,8 @@ struct bio_pair *bio_split(struct bio *bi, int first_sectors)
 	bp->bio2.bi_size -= first_sectors << 9;
 	bp->bio1.bi_size = first_sectors << 9;
 
-	bp->bv1 = bi->bi_io_vec[0];
-	bp->bv2 = bi->bi_io_vec[0];
+	bp->bv1 = *bio_iovec(bi);
+	bp->bv2 = *bio_iovec(bi);
 
 	if (bio_is_rw(bi)) {
 		bp->bv2.bv_offset += first_sectors << 9;
-- 
1.7.12

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH v3 09/26] block: Remove bi_idx references
  2012-09-24 22:34 [PATCH v3 00/26] Prep work for immutable bio vecs Kent Overstreet
                   ` (7 preceding siblings ...)
  2012-09-24 22:34 ` [PATCH v3 08/26] block: Change bio_split() to respect the current value of bi_idx Kent Overstreet
@ 2012-09-24 22:34 ` Kent Overstreet
       [not found]   ` <1348526106-17074-10-git-send-email-koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
  2012-09-24 22:34 ` [PATCH v3 10/26] block: Remove some unnecessary bi_vcnt usage Kent Overstreet
                   ` (15 subsequent siblings)
  24 siblings, 1 reply; 69+ messages in thread
From: Kent Overstreet @ 2012-09-24 22:34 UTC (permalink / raw)
  To: linux-bcache, linux-kernel, dm-devel
  Cc: Kent Overstreet, tj, axboe, neilb, vgoyal

For immutable bvecs, all bi_idx usage needs to be audited - so here
we're removing all the unnecessary uses.

Most of these are places where it was being initialized on a bio that
was just allocated, a few others are conversions to standard macros.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
---
 drivers/block/aoe/aoeblk.c | 2 +-
 drivers/block/floppy.c     | 1 -
 drivers/md/dm-verity.c     | 2 +-
 drivers/md/raid10.c        | 1 -
 fs/buffer.c                | 1 -
 fs/jfs/jfs_logmgr.c        | 2 --
 fs/logfs/dev_bdev.c        | 5 -----
 mm/page_io.c               | 1 -
 8 files changed, 2 insertions(+), 13 deletions(-)

diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
index 321de7b..6e4420a 100644
--- a/drivers/block/aoe/aoeblk.c
+++ b/drivers/block/aoe/aoeblk.c
@@ -199,7 +199,7 @@ aoeblk_make_request(struct request_queue *q, struct bio *bio)
 	buf->bio = bio;
 	buf->resid = bio->bi_size;
 	buf->sector = bio->bi_sector;
-	buf->bv = &bio->bi_io_vec[bio->bi_idx];
+	buf->bv = bio_iovec(bio);
 	buf->bv_resid = buf->bv->bv_len;
 	WARN_ON(buf->bv_resid == 0);
 	buf->bv_off = buf->bv->bv_offset;
diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
index 95e52879..24e5cef 100644
--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -3778,7 +3778,6 @@ static int __floppy_read_block_0(struct block_device *bdev)
 	bio_vec.bv_len = size;
 	bio_vec.bv_offset = 0;
 	bio.bi_vcnt = 1;
-	bio.bi_idx = 0;
 	bio.bi_size = size;
 	bio.bi_bdev = bdev;
 	bio.bi_sector = 0;
diff --git a/drivers/md/dm-verity.c b/drivers/md/dm-verity.c
index 8f95664..4431503 100644
--- a/drivers/md/dm-verity.c
+++ b/drivers/md/dm-verity.c
@@ -496,7 +496,7 @@ static int verity_map(struct dm_target *ti, struct bio *bio,
 
 	bio->bi_end_io = verity_end_io;
 	bio->bi_private = io;
-	io->io_vec_size = bio->bi_vcnt - bio->bi_idx;
+	io->io_vec_size = bio_segments(bio);
 	if (io->io_vec_size < DM_VERITY_IO_VEC_INLINE)
 		io->io_vec = io->io_vec_inline;
 	else
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index bbd08f5..6d06d83 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -4249,7 +4249,6 @@ read_more:
 	read_bio->bi_flags &= ~(BIO_POOL_MASK - 1);
 	read_bio->bi_flags |= 1 << BIO_UPTODATE;
 	read_bio->bi_vcnt = 0;
-	read_bio->bi_idx = 0;
 	read_bio->bi_size = 0;
 	r10_bio->master_bio = read_bio;
 	r10_bio->read_slot = r10_bio->devs[r10_bio->read_slot].devnum;
diff --git a/fs/buffer.c b/fs/buffer.c
index 58e2e7b..38d8793 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2893,7 +2893,6 @@ int submit_bh(int rw, struct buffer_head * bh)
 	bio->bi_io_vec[0].bv_offset = bh_offset(bh);
 
 	bio->bi_vcnt = 1;
-	bio->bi_idx = 0;
 	bio->bi_size = bh->b_size;
 
 	bio->bi_end_io = end_bio_bh_io_sync;
diff --git a/fs/jfs/jfs_logmgr.c b/fs/jfs/jfs_logmgr.c
index 2eb952c..8ae5e35 100644
--- a/fs/jfs/jfs_logmgr.c
+++ b/fs/jfs/jfs_logmgr.c
@@ -2004,7 +2004,6 @@ static int lbmRead(struct jfs_log * log, int pn, struct lbuf ** bpp)
 	bio->bi_io_vec[0].bv_offset = bp->l_offset;
 
 	bio->bi_vcnt = 1;
-	bio->bi_idx = 0;
 	bio->bi_size = LOGPSIZE;
 
 	bio->bi_end_io = lbmIODone;
@@ -2145,7 +2144,6 @@ static void lbmStartIO(struct lbuf * bp)
 	bio->bi_io_vec[0].bv_offset = bp->l_offset;
 
 	bio->bi_vcnt = 1;
-	bio->bi_idx = 0;
 	bio->bi_size = LOGPSIZE;
 
 	bio->bi_end_io = lbmIODone;
diff --git a/fs/logfs/dev_bdev.c b/fs/logfs/dev_bdev.c
index e784a21..550475c 100644
--- a/fs/logfs/dev_bdev.c
+++ b/fs/logfs/dev_bdev.c
@@ -32,7 +32,6 @@ static int sync_request(struct page *page, struct block_device *bdev, int rw)
 	bio_vec.bv_len = PAGE_SIZE;
 	bio_vec.bv_offset = 0;
 	bio.bi_vcnt = 1;
-	bio.bi_idx = 0;
 	bio.bi_size = PAGE_SIZE;
 	bio.bi_bdev = bdev;
 	bio.bi_sector = page->index * (PAGE_SIZE >> 9);
@@ -108,7 +107,6 @@ static int __bdev_writeseg(struct super_block *sb, u64 ofs, pgoff_t index,
 		if (i >= max_pages) {
 			/* Block layer cannot split bios :( */
 			bio->bi_vcnt = i;
-			bio->bi_idx = 0;
 			bio->bi_size = i * PAGE_SIZE;
 			bio->bi_bdev = super->s_bdev;
 			bio->bi_sector = ofs >> 9;
@@ -136,7 +134,6 @@ static int __bdev_writeseg(struct super_block *sb, u64 ofs, pgoff_t index,
 		unlock_page(page);
 	}
 	bio->bi_vcnt = nr_pages;
-	bio->bi_idx = 0;
 	bio->bi_size = nr_pages * PAGE_SIZE;
 	bio->bi_bdev = super->s_bdev;
 	bio->bi_sector = ofs >> 9;
@@ -202,7 +199,6 @@ static int do_erase(struct super_block *sb, u64 ofs, pgoff_t index,
 		if (i >= max_pages) {
 			/* Block layer cannot split bios :( */
 			bio->bi_vcnt = i;
-			bio->bi_idx = 0;
 			bio->bi_size = i * PAGE_SIZE;
 			bio->bi_bdev = super->s_bdev;
 			bio->bi_sector = ofs >> 9;
@@ -224,7 +220,6 @@ static int do_erase(struct super_block *sb, u64 ofs, pgoff_t index,
 		bio->bi_io_vec[i].bv_offset = 0;
 	}
 	bio->bi_vcnt = nr_pages;
-	bio->bi_idx = 0;
 	bio->bi_size = nr_pages * PAGE_SIZE;
 	bio->bi_bdev = super->s_bdev;
 	bio->bi_sector = ofs >> 9;
diff --git a/mm/page_io.c b/mm/page_io.c
index 78eee32..8d3c0c0 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -35,7 +35,6 @@ static struct bio *get_swap_bio(gfp_t gfp_flags,
 		bio->bi_io_vec[0].bv_len = PAGE_SIZE;
 		bio->bi_io_vec[0].bv_offset = 0;
 		bio->bi_vcnt = 1;
-		bio->bi_idx = 0;
 		bio->bi_size = PAGE_SIZE;
 		bio->bi_end_io = end_io;
 	}
-- 
1.7.12

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH v3 10/26] block: Remove some unnecessary bi_vcnt usage
  2012-09-24 22:34 [PATCH v3 00/26] Prep work for immutable bio vecs Kent Overstreet
                   ` (8 preceding siblings ...)
  2012-09-24 22:34 ` [PATCH v3 09/26] block: Remove bi_idx references Kent Overstreet
@ 2012-09-24 22:34 ` Kent Overstreet
  2012-09-24 22:34 ` [PATCH v3 11/26] block: Add submit_bio_wait(), remove from md Kent Overstreet
                   ` (14 subsequent siblings)
  24 siblings, 0 replies; 69+ messages in thread
From: Kent Overstreet @ 2012-09-24 22:34 UTC (permalink / raw)
  To: linux-bcache, linux-kernel, dm-devel
  Cc: Kent Overstreet, tj, axboe, neilb, vgoyal, Eric Moore,
	James E.J. Bottomley, linux-scsi

More prep work for immutable bvecs/effecient bio splitting - usage of
bi_vcnt has to be auditing, so getting rid of all the unnecessary usage
makes that easier.

Plus, bio_segments() is really what this code wanted, as it respects the
current value of bi_idx.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: Eric Moore <Eric.Moore@lsi.com>
CC: "James E.J. Bottomley" <JBottomley@parallels.com>
CC: linux-scsi@vger.kernel.org
---
 drivers/message/fusion/mptsas.c          |  6 +++---
 drivers/scsi/libsas/sas_expander.c       |  6 +++---
 drivers/scsi/mpt2sas/mpt2sas_transport.c | 10 +++++-----
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/message/fusion/mptsas.c b/drivers/message/fusion/mptsas.c
index 551262e..5406a9f 100644
--- a/drivers/message/fusion/mptsas.c
+++ b/drivers/message/fusion/mptsas.c
@@ -2235,10 +2235,10 @@ static int mptsas_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,
 	}
 
 	/* do we need to support multiple segments? */
-	if (req->bio->bi_vcnt > 1 || rsp->bio->bi_vcnt > 1) {
+	if (bio_segments(req->bio) > 1 || bio_segments(rsp->bio) > 1) {
 		printk(MYIOC_s_ERR_FMT "%s: multiple segments req %u %u, rsp %u %u\n",
-		    ioc->name, __func__, req->bio->bi_vcnt, blk_rq_bytes(req),
-		    rsp->bio->bi_vcnt, blk_rq_bytes(rsp));
+		    ioc->name, __func__, bio_segments(req->bio), blk_rq_bytes(req),
+		    bio_segments(rsp->bio), blk_rq_bytes(rsp));
 		return -EINVAL;
 	}
 
diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index efc6e72..ee331a7 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -2151,10 +2151,10 @@ int sas_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,
 	}
 
 	/* do we need to support multiple segments? */
-	if (req->bio->bi_vcnt > 1 || rsp->bio->bi_vcnt > 1) {
+	if (bio_segments(req->bio) > 1 || bio_segments(rsp->bio) > 1) {
 		printk("%s: multiple segments req %u %u, rsp %u %u\n",
-		       __func__, req->bio->bi_vcnt, blk_rq_bytes(req),
-		       rsp->bio->bi_vcnt, blk_rq_bytes(rsp));
+		       __func__, bio_segments(req->bio), blk_rq_bytes(req),
+		       bio_segments(rsp->bio), blk_rq_bytes(rsp));
 		return -EINVAL;
 	}
 
diff --git a/drivers/scsi/mpt2sas/mpt2sas_transport.c b/drivers/scsi/mpt2sas/mpt2sas_transport.c
index c6cf20f..403a57b 100644
--- a/drivers/scsi/mpt2sas/mpt2sas_transport.c
+++ b/drivers/scsi/mpt2sas/mpt2sas_transport.c
@@ -1939,7 +1939,7 @@ _transport_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,
 	ioc->transport_cmds.status = MPT2_CMD_PENDING;
 
 	/* Check if the request is split across multiple segments */
-	if (req->bio->bi_vcnt > 1) {
+	if (bio_segments(req->bio) > 1) {
 		u32 offset = 0;
 
 		/* Allocate memory and copy the request */
@@ -1971,7 +1971,7 @@ _transport_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,
 
 	/* Check if the response needs to be populated across
 	 * multiple segments */
-	if (rsp->bio->bi_vcnt > 1) {
+	if (bio_segments(rsp->bio) > 1) {
 		pci_addr_in = pci_alloc_consistent(ioc->pdev, blk_rq_bytes(rsp),
 		    &pci_dma_in);
 		if (!pci_addr_in) {
@@ -2038,7 +2038,7 @@ _transport_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,
 	sgl_flags = (MPI2_SGE_FLAGS_SIMPLE_ELEMENT |
 	    MPI2_SGE_FLAGS_END_OF_BUFFER | MPI2_SGE_FLAGS_HOST_TO_IOC);
 	sgl_flags = sgl_flags << MPI2_SGE_FLAGS_SHIFT;
-	if (req->bio->bi_vcnt > 1) {
+	if (bio_segments(req->bio) > 1) {
 		ioc->base_add_sg_single(psge, sgl_flags |
 		    (blk_rq_bytes(req) - 4), pci_dma_out);
 	} else {
@@ -2054,7 +2054,7 @@ _transport_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,
 	    MPI2_SGE_FLAGS_LAST_ELEMENT | MPI2_SGE_FLAGS_END_OF_BUFFER |
 	    MPI2_SGE_FLAGS_END_OF_LIST);
 	sgl_flags = sgl_flags << MPI2_SGE_FLAGS_SHIFT;
-	if (rsp->bio->bi_vcnt > 1) {
+	if (bio_segments(rsp->bio) > 1) {
 		ioc->base_add_sg_single(psge, sgl_flags |
 		    (blk_rq_bytes(rsp) + 4), pci_dma_in);
 	} else {
@@ -2099,7 +2099,7 @@ _transport_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,
 		    le16_to_cpu(mpi_reply->ResponseDataLength);
 		/* check if the resp needs to be copied from the allocated
 		 * pci mem */
-		if (rsp->bio->bi_vcnt > 1) {
+		if (bio_segments(rsp->bio) > 1) {
 			u32 offset = 0;
 			u32 bytes_to_copy =
 			    le16_to_cpu(mpi_reply->ResponseDataLength);
-- 
1.7.12

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH v3 11/26] block: Add submit_bio_wait(), remove from md
  2012-09-24 22:34 [PATCH v3 00/26] Prep work for immutable bio vecs Kent Overstreet
                   ` (9 preceding siblings ...)
  2012-09-24 22:34 ` [PATCH v3 10/26] block: Remove some unnecessary bi_vcnt usage Kent Overstreet
@ 2012-09-24 22:34 ` Kent Overstreet
       [not found]   ` <1348526106-17074-12-git-send-email-koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
  2012-09-24 22:34 ` [PATCH v3 12/26] raid10: Use bio_reset() Kent Overstreet
                   ` (13 subsequent siblings)
  24 siblings, 1 reply; 69+ messages in thread
From: Kent Overstreet @ 2012-09-24 22:34 UTC (permalink / raw)
  To: linux-bcache, linux-kernel, dm-devel; +Cc: tj, axboe, Kent Overstreet, vgoyal

Random cleanup - this code was duplicated and it's not really specific
to md.

Also added the ability to return the actual error code.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: NeilBrown <neilb@suse.de>
Acked-by: Tejun Heo <tj@kernel.org>
---
 drivers/md/raid1.c  | 19 -------------------
 drivers/md/raid10.c | 19 -------------------
 fs/bio.c            | 36 ++++++++++++++++++++++++++++++++++++
 include/linux/bio.h |  1 +
 4 files changed, 37 insertions(+), 38 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index deff0cd..28f506a 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2048,25 +2048,6 @@ static void fix_read_error(struct r1conf *conf, int read_disk,
 	}
 }
 
-static void bi_complete(struct bio *bio, int error)
-{
-	complete((struct completion *)bio->bi_private);
-}
-
-static int submit_bio_wait(int rw, struct bio *bio)
-{
-	struct completion event;
-	rw |= REQ_SYNC;
-
-	init_completion(&event);
-	bio->bi_private = &event;
-	bio->bi_end_io = bi_complete;
-	submit_bio(rw, bio);
-	wait_for_completion(&event);
-
-	return test_bit(BIO_UPTODATE, &bio->bi_flags);
-}
-
 static int narrow_write_error(struct r1bio *r1_bio, int i)
 {
 	struct mddev *mddev = r1_bio->mddev;
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 6d06d83..f001c1b 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2410,25 +2410,6 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
 	}
 }
 
-static void bi_complete(struct bio *bio, int error)
-{
-	complete((struct completion *)bio->bi_private);
-}
-
-static int submit_bio_wait(int rw, struct bio *bio)
-{
-	struct completion event;
-	rw |= REQ_SYNC;
-
-	init_completion(&event);
-	bio->bi_private = &event;
-	bio->bi_end_io = bi_complete;
-	submit_bio(rw, bio);
-	wait_for_completion(&event);
-
-	return test_bit(BIO_UPTODATE, &bio->bi_flags);
-}
-
 static int narrow_write_error(struct r10bio *r10_bio, int i)
 {
 	struct bio *bio = r10_bio->master_bio;
diff --git a/fs/bio.c b/fs/bio.c
index 062ba8f..9a30dcf 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -750,6 +750,42 @@ int bio_add_page(struct bio *bio, struct page *page, unsigned int len,
 }
 EXPORT_SYMBOL(bio_add_page);
 
+struct submit_bio_ret {
+	struct completion event;
+	int error;
+};
+
+static void submit_bio_wait_endio(struct bio *bio, int error)
+{
+	struct submit_bio_ret *ret = bio->bi_private;
+
+	ret->error = error;
+	complete(&ret->event);
+}
+
+/**
+ * submit_bio_wait - submit a bio, and wait until it completes
+ * @rw: whether to %READ or %WRITE, or maybe to %READA (read ahead)
+ * @bio: The &struct bio which describes the I/O
+ *
+ * Simple wrapper around submit_bio(). Returns 0 on success, or the error from
+ * bio_endio() on failure.
+ */
+int submit_bio_wait(int rw, struct bio *bio)
+{
+	struct submit_bio_ret ret;
+
+	rw |= REQ_SYNC;
+	init_completion(&ret.event);
+	bio->bi_private = &ret;
+	bio->bi_end_io = submit_bio_wait_endio;
+	submit_bio(rw, bio);
+	wait_for_completion(&ret.event);
+
+	return ret.error;
+}
+EXPORT_SYMBOL(submit_bio_wait);
+
 /**
  * bio_advance - increment/complete a bio by some number of bytes
  * @bio:	bio to advance
diff --git a/include/linux/bio.h b/include/linux/bio.h
index d985e90..3bf696b 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -252,6 +252,7 @@ extern void bio_endio(struct bio *, int);
 struct request_queue;
 extern int bio_phys_segments(struct request_queue *, struct bio *);
 
+extern int submit_bio_wait(int rw, struct bio *bio);
 extern void bio_advance(struct bio *, unsigned);
 
 extern void bio_init(struct bio *);
-- 
1.7.12

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH v3 12/26] raid10: Use bio_reset()
  2012-09-24 22:34 [PATCH v3 00/26] Prep work for immutable bio vecs Kent Overstreet
                   ` (10 preceding siblings ...)
  2012-09-24 22:34 ` [PATCH v3 11/26] block: Add submit_bio_wait(), remove from md Kent Overstreet
@ 2012-09-24 22:34 ` Kent Overstreet
  2012-09-24 22:34 ` [PATCH v3 13/26] raid1: use bio_reset() Kent Overstreet
                   ` (12 subsequent siblings)
  24 siblings, 0 replies; 69+ messages in thread
From: Kent Overstreet @ 2012-09-24 22:34 UTC (permalink / raw)
  To: linux-bcache, linux-kernel, dm-devel; +Cc: tj, axboe, Kent Overstreet, vgoyal

More prep work for immutable bio vecs, mainly getting rid of references
to bi_idx.

bio_reset was being open coded in a few places. The one in sync_request
was a bit nontrivial to convert, so could use some extra eyeballs.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: NeilBrown <neilb@suse.de>
Acked-by: NeilBrown <neilb@suse.de>
---
 drivers/md/raid10.c | 31 +++++++++----------------------
 1 file changed, 9 insertions(+), 22 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index f001c1b..6b83207 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1958,13 +1958,10 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
 		 * First we need to fixup bv_offset, bv_len and
 		 * bi_vecs, as the read request might have corrupted these
 		 */
+		bio_reset(tbio);
+
 		tbio->bi_vcnt = vcnt;
 		tbio->bi_size = r10_bio->sectors << 9;
-		tbio->bi_idx = 0;
-		tbio->bi_phys_segments = 0;
-		tbio->bi_flags &= ~(BIO_POOL_MASK - 1);
-		tbio->bi_flags |= 1 << BIO_UPTODATE;
-		tbio->bi_next = NULL;
 		tbio->bi_rw = WRITE;
 		tbio->bi_private = r10_bio;
 		tbio->bi_sector = r10_bio->devs[i].addr;
@@ -2970,6 +2967,7 @@ static sector_t sync_request(struct mddev *mddev, sector_t sector_nr,
 					}
 				}
 				bio = r10_bio->devs[0].bio;
+				bio_reset(bio);
 				bio->bi_next = biolist;
 				biolist = bio;
 				bio->bi_private = r10_bio;
@@ -2994,6 +2992,7 @@ static sector_t sync_request(struct mddev *mddev, sector_t sector_nr,
 				rdev = mirror->rdev;
 				if (!test_bit(In_sync, &rdev->flags)) {
 					bio = r10_bio->devs[1].bio;
+					bio_reset(bio);
 					bio->bi_next = biolist;
 					biolist = bio;
 					bio->bi_private = r10_bio;
@@ -3022,6 +3021,7 @@ static sector_t sync_request(struct mddev *mddev, sector_t sector_nr,
 				if (rdev == NULL || bio == NULL ||
 				    test_bit(Faulty, &rdev->flags))
 					break;
+				bio_reset(bio);
 				bio->bi_next = biolist;
 				biolist = bio;
 				bio->bi_private = r10_bio;
@@ -3120,7 +3120,7 @@ static sector_t sync_request(struct mddev *mddev, sector_t sector_nr,
 				r10_bio->devs[i].repl_bio->bi_end_io = NULL;
 
 			bio = r10_bio->devs[i].bio;
-			bio->bi_end_io = NULL;
+			bio_reset(bio);
 			clear_bit(BIO_UPTODATE, &bio->bi_flags);
 			if (conf->mirrors[d].rdev == NULL ||
 			    test_bit(Faulty, &conf->mirrors[d].rdev->flags))
@@ -3157,6 +3157,7 @@ static sector_t sync_request(struct mddev *mddev, sector_t sector_nr,
 
 			/* Need to set up for writing to the replacement */
 			bio = r10_bio->devs[i].repl_bio;
+			bio_reset(bio);
 			clear_bit(BIO_UPTODATE, &bio->bi_flags);
 
 			sector = r10_bio->devs[i].addr;
@@ -3190,17 +3191,6 @@ static sector_t sync_request(struct mddev *mddev, sector_t sector_nr,
 		}
 	}
 
-	for (bio = biolist; bio ; bio=bio->bi_next) {
-
-		bio->bi_flags &= ~(BIO_POOL_MASK - 1);
-		if (bio->bi_end_io)
-			bio->bi_flags |= 1 << BIO_UPTODATE;
-		bio->bi_vcnt = 0;
-		bio->bi_idx = 0;
-		bio->bi_phys_segments = 0;
-		bio->bi_size = 0;
-	}
-
 	nr_sectors = 0;
 	if (sector_nr + max_sync < max_sector)
 		max_sector = sector_nr + max_sync;
@@ -4253,17 +4243,14 @@ read_more:
 		}
 		if (!rdev2 || test_bit(Faulty, &rdev2->flags))
 			continue;
+
+		bio_reset(b);
 		b->bi_bdev = rdev2->bdev;
 		b->bi_sector = r10_bio->devs[s/2].addr + rdev2->new_data_offset;
 		b->bi_private = r10_bio;
 		b->bi_end_io = end_reshape_write;
 		b->bi_rw = WRITE;
-		b->bi_flags &= ~(BIO_POOL_MASK - 1);
-		b->bi_flags |= 1 << BIO_UPTODATE;
 		b->bi_next = blist;
-		b->bi_vcnt = 0;
-		b->bi_idx = 0;
-		b->bi_size = 0;
 		blist = b;
 	}
 
-- 
1.7.12

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH v3 13/26] raid1: use bio_reset()
  2012-09-24 22:34 [PATCH v3 00/26] Prep work for immutable bio vecs Kent Overstreet
                   ` (11 preceding siblings ...)
  2012-09-24 22:34 ` [PATCH v3 12/26] raid10: Use bio_reset() Kent Overstreet
@ 2012-09-24 22:34 ` Kent Overstreet
  2012-09-24 22:34 ` [PATCH v3 14/26] raid5: " Kent Overstreet
                   ` (11 subsequent siblings)
  24 siblings, 0 replies; 69+ messages in thread
From: Kent Overstreet @ 2012-09-24 22:34 UTC (permalink / raw)
  To: linux-bcache, linux-kernel, dm-devel; +Cc: tj, axboe, Kent Overstreet, vgoyal

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: NeilBrown <neilb@suse.de>
---
 drivers/md/raid1.c | 24 ++++++------------------
 1 file changed, 6 insertions(+), 18 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 28f506a..4614b9e 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1851,7 +1851,7 @@ static int process_checks(struct r1bio *r1_bio)
 		struct bio *sbio = r1_bio->bios[i];
 		int size;
 
-		if (r1_bio->bios[i]->bi_end_io != end_sync_read)
+		if (sbio->bi_end_io != end_sync_read)
 			continue;
 
 		if (test_bit(BIO_UPTODATE, &sbio->bi_flags)) {
@@ -1876,16 +1876,15 @@ static int process_checks(struct r1bio *r1_bio)
 			continue;
 		}
 		/* fixup the bio for reuse */
+		bio_reset(sbio);
 		sbio->bi_vcnt = vcnt;
 		sbio->bi_size = r1_bio->sectors << 9;
-		sbio->bi_idx = 0;
-		sbio->bi_phys_segments = 0;
-		sbio->bi_flags &= ~(BIO_POOL_MASK - 1);
-		sbio->bi_flags |= 1 << BIO_UPTODATE;
-		sbio->bi_next = NULL;
 		sbio->bi_sector = r1_bio->sector +
 			conf->mirrors[i].rdev->data_offset;
 		sbio->bi_bdev = conf->mirrors[i].rdev->bdev;
+		sbio->bi_end_io = end_sync_read;
+		sbio->bi_private = r1_bio;
+
 		size = sbio->bi_size;
 		for (j = 0; j < vcnt ; j++) {
 			struct bio_vec *bi;
@@ -2426,18 +2425,7 @@ static sector_t sync_request(struct mddev *mddev, sector_t sector_nr, int *skipp
 	for (i = 0; i < conf->raid_disks * 2; i++) {
 		struct md_rdev *rdev;
 		bio = r1_bio->bios[i];
-
-		/* take from bio_init */
-		bio->bi_next = NULL;
-		bio->bi_flags &= ~(BIO_POOL_MASK-1);
-		bio->bi_flags |= 1 << BIO_UPTODATE;
-		bio->bi_rw = READ;
-		bio->bi_vcnt = 0;
-		bio->bi_idx = 0;
-		bio->bi_phys_segments = 0;
-		bio->bi_size = 0;
-		bio->bi_end_io = NULL;
-		bio->bi_private = NULL;
+		bio_reset(bio);
 
 		rdev = rcu_dereference(conf->mirrors[i].rdev);
 		if (rdev == NULL ||
-- 
1.7.12

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH v3 14/26] raid5: use bio_reset()
  2012-09-24 22:34 [PATCH v3 00/26] Prep work for immutable bio vecs Kent Overstreet
                   ` (12 preceding siblings ...)
  2012-09-24 22:34 ` [PATCH v3 13/26] raid1: use bio_reset() Kent Overstreet
@ 2012-09-24 22:34 ` Kent Overstreet
  2012-09-24 22:34 ` [PATCH v3 15/26] raid1: Refactor narrow_write_error() to not use bi_idx Kent Overstreet
                   ` (10 subsequent siblings)
  24 siblings, 0 replies; 69+ messages in thread
From: Kent Overstreet @ 2012-09-24 22:34 UTC (permalink / raw)
  To: linux-bcache, linux-kernel, dm-devel; +Cc: tj, axboe, Kent Overstreet, vgoyal

Had to shuffle the code around a bit (where bi_rw and bi_end_io were
set), but shouldn't really be anything tricky here

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: NeilBrown <neilb@suse.de>
---
 drivers/md/raid5.c | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index cbfc424..5e327ae 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -561,14 +561,6 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
 		bi = &sh->dev[i].req;
 		rbi = &sh->dev[i].rreq; /* For writing to replacement */
 
-		bi->bi_rw = rw;
-		rbi->bi_rw = rw;
-		if (rw & WRITE) {
-			bi->bi_end_io = raid5_end_write_request;
-			rbi->bi_end_io = raid5_end_write_request;
-		} else
-			bi->bi_end_io = raid5_end_read_request;
-
 		rcu_read_lock();
 		rrdev = rcu_dereference(conf->disks[i].replacement);
 		smp_mb(); /* Ensure that if rrdev is NULL, rdev won't be */
@@ -643,7 +635,14 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
 
 			set_bit(STRIPE_IO_STARTED, &sh->state);
 
+			bio_reset(bi);
 			bi->bi_bdev = rdev->bdev;
+			bi->bi_rw = rw;
+			bi->bi_end_io = (rw & WRITE)
+				? raid5_end_write_request
+				: raid5_end_read_request;
+			bi->bi_private = sh;
+
 			pr_debug("%s: for %llu schedule op %ld on disc %d\n",
 				__func__, (unsigned long long)sh->sector,
 				bi->bi_rw, i);
@@ -657,12 +656,9 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
 			if (test_bit(R5_ReadNoMerge, &sh->dev[i].flags))
 				bi->bi_rw |= REQ_FLUSH;
 
-			bi->bi_flags = 1 << BIO_UPTODATE;
-			bi->bi_idx = 0;
 			bi->bi_io_vec[0].bv_len = STRIPE_SIZE;
 			bi->bi_io_vec[0].bv_offset = 0;
 			bi->bi_size = STRIPE_SIZE;
-			bi->bi_next = NULL;
 			if (rrdev)
 				set_bit(R5_DOUBLE_LOCKED, &sh->dev[i].flags);
 			generic_make_request(bi);
@@ -674,7 +670,13 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
 
 			set_bit(STRIPE_IO_STARTED, &sh->state);
 
+			bio_reset(rbi);
 			rbi->bi_bdev = rrdev->bdev;
+			rbi->bi_rw = rw;
+			BUG_ON(!(rw & WRITE));
+			rbi->bi_end_io = raid5_end_write_request;
+			rbi->bi_private = sh;
+
 			pr_debug("%s: for %llu schedule op %ld on "
 				 "replacement disc %d\n",
 				__func__, (unsigned long long)sh->sector,
@@ -686,12 +688,9 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
 			else
 				rbi->bi_sector = (sh->sector
 						  + rrdev->data_offset);
-			rbi->bi_flags = 1 << BIO_UPTODATE;
-			rbi->bi_idx = 0;
 			rbi->bi_io_vec[0].bv_len = STRIPE_SIZE;
 			rbi->bi_io_vec[0].bv_offset = 0;
 			rbi->bi_size = STRIPE_SIZE;
-			rbi->bi_next = NULL;
 			generic_make_request(rbi);
 		}
 		if (!rdev && !rrdev) {
-- 
1.7.12

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH v3 15/26] raid1: Refactor narrow_write_error() to not use bi_idx
  2012-09-24 22:34 [PATCH v3 00/26] Prep work for immutable bio vecs Kent Overstreet
                   ` (13 preceding siblings ...)
  2012-09-24 22:34 ` [PATCH v3 14/26] raid5: " Kent Overstreet
@ 2012-09-24 22:34 ` Kent Overstreet
  2012-09-24 22:34 ` [PATCH v3 16/26] block: Add bio_copy_data() Kent Overstreet
                   ` (9 subsequent siblings)
  24 siblings, 0 replies; 69+ messages in thread
From: Kent Overstreet @ 2012-09-24 22:34 UTC (permalink / raw)
  To: linux-bcache, linux-kernel, dm-devel
  Cc: Kent Overstreet, tj, axboe, neilb, vgoyal

More bi_idx removal. This code was just open coding bio_clone(). This
could probably be further improved by using bio_advance() instead of
skipping over null pages, but that'd be a larger rework.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: NeilBrown <neilb@suse.de>
---
 drivers/md/raid1.c | 36 ++++++++++++++++++------------------
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 4614b9e..af39f74 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2052,8 +2052,6 @@ static int narrow_write_error(struct r1bio *r1_bio, int i)
 	struct mddev *mddev = r1_bio->mddev;
 	struct r1conf *conf = mddev->private;
 	struct md_rdev *rdev = conf->mirrors[i].rdev;
-	int vcnt, idx;
-	struct bio_vec *vec;
 
 	/* bio has the data to be written to device 'i' where
 	 * we just recently had a write error.
@@ -2081,30 +2079,32 @@ static int narrow_write_error(struct r1bio *r1_bio, int i)
 		   & ~(sector_t)(block_sectors - 1))
 		- sector;
 
-	if (test_bit(R1BIO_BehindIO, &r1_bio->state)) {
-		vcnt = r1_bio->behind_page_count;
-		vec = r1_bio->behind_bvecs;
-		idx = 0;
-		while (vec[idx].bv_page == NULL)
-			idx++;
-	} else {
-		vcnt = r1_bio->master_bio->bi_vcnt;
-		vec = r1_bio->master_bio->bi_io_vec;
-		idx = r1_bio->master_bio->bi_idx;
-	}
 	while (sect_to_write) {
 		struct bio *wbio;
 		if (sectors > sect_to_write)
 			sectors = sect_to_write;
 		/* Write at 'sector' for 'sectors'*/
 
-		wbio = bio_alloc_mddev(GFP_NOIO, vcnt, mddev);
-		memcpy(wbio->bi_io_vec, vec, vcnt * sizeof(struct bio_vec));
-		wbio->bi_sector = r1_bio->sector;
+		if (test_bit(R1BIO_BehindIO, &r1_bio->state)) {
+			unsigned vcnt = r1_bio->behind_page_count;
+			struct bio_vec *vec = r1_bio->behind_bvecs;
+
+			while (!vec->bv_page) {
+				vec++;
+				vcnt--;
+			}
+
+			wbio = bio_alloc_mddev(GFP_NOIO, vcnt, mddev);
+			memcpy(wbio->bi_io_vec, vec, vcnt * sizeof(struct bio_vec));
+
+			wbio->bi_vcnt = vcnt;
+		} else {
+			wbio = bio_clone_mddev(r1_bio->master_bio, GFP_NOIO, mddev);
+		}
+
 		wbio->bi_rw = WRITE;
-		wbio->bi_vcnt = vcnt;
+		wbio->bi_sector = r1_bio->sector;
 		wbio->bi_size = r1_bio->sectors << 9;
-		wbio->bi_idx = idx;
 
 		md_trim_bio(wbio, sector - r1_bio->sector, sectors);
 		wbio->bi_sector += rdev->data_offset;
-- 
1.7.12

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH v3 16/26] block: Add bio_copy_data()
  2012-09-24 22:34 [PATCH v3 00/26] Prep work for immutable bio vecs Kent Overstreet
                   ` (14 preceding siblings ...)
  2012-09-24 22:34 ` [PATCH v3 15/26] raid1: Refactor narrow_write_error() to not use bi_idx Kent Overstreet
@ 2012-09-24 22:34 ` Kent Overstreet
  2012-09-24 22:34 ` [PATCH v3 17/26] pktcdvd: use bio_copy_data() Kent Overstreet
                   ` (8 subsequent siblings)
  24 siblings, 0 replies; 69+ messages in thread
From: Kent Overstreet @ 2012-09-24 22:34 UTC (permalink / raw)
  To: linux-bcache, linux-kernel, dm-devel
  Cc: Kent Overstreet, tj, axboe, neilb, vgoyal

This gets open coded quite a bit and it's tricky to get right, so make a
generic version and convert some existing users over to it instead.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
---
 fs/bio.c            | 70 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/bio.h |  2 ++
 2 files changed, 72 insertions(+)

diff --git a/fs/bio.c b/fs/bio.c
index 9a30dcf..d433825 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -827,6 +827,76 @@ void bio_advance(struct bio *bio, unsigned bytes)
 }
 EXPORT_SYMBOL(bio_advance);
 
+/**
+ * bio_copy_data - copy contents of data buffers from one chain of bios to
+ * another
+ * @src: source bio list
+ * @dst: destination bio list
+ *
+ * If @src and @dst are single bios, bi_next must be NULL - otherwise, treats
+ * @src and @dst as linked lists of bios.
+ *
+ * Stops when it reaches the end of either @src or @dst - that is, copies
+ * min(src->bi_size, dst->bi_size) bytes (or the equivalent for lists of bios).
+ */
+void bio_copy_data(struct bio *dst, struct bio *src)
+{
+	struct bio_vec *src_bv, *dst_bv;
+	unsigned src_offset, dst_offset, bytes;
+	void *src_p, *dst_p;
+
+	src_bv = bio_iovec(src);
+	dst_bv = bio_iovec(dst);
+
+	src_offset = src_bv->bv_offset;
+	dst_offset = dst_bv->bv_offset;
+
+	while (1) {
+		if (src_offset == src_bv->bv_offset + src_bv->bv_len) {
+			src_bv++;
+			if (src_bv == bio_iovec_idx(src, src->bi_vcnt)) {
+				src = src->bi_next;
+				if (!src)
+					break;
+
+				src_bv = bio_iovec(src);
+			}
+
+			src_offset = src_bv->bv_offset;
+		}
+
+		if (dst_offset == dst_bv->bv_offset + dst_bv->bv_len) {
+			dst_bv++;
+			if (dst_bv == bio_iovec_idx(dst, dst->bi_vcnt)) {
+				dst = dst->bi_next;
+				if (!dst)
+					break;
+
+				dst_bv = bio_iovec(dst);
+			}
+
+			dst_offset = dst_bv->bv_offset;
+		}
+
+		bytes = min(dst_bv->bv_offset + dst_bv->bv_len - dst_offset,
+			    src_bv->bv_offset + src_bv->bv_len - src_offset);
+
+		src_p = kmap_atomic(src_bv->bv_page);
+		dst_p = kmap_atomic(dst_bv->bv_page);
+
+		memcpy(dst_p + dst_bv->bv_offset,
+		       src_p + src_bv->bv_offset,
+		       bytes);
+
+		kunmap_atomic(dst_p);
+		kunmap_atomic(src_p);
+
+		src_offset += bytes;
+		dst_offset += bytes;
+	}
+}
+EXPORT_SYMBOL(bio_copy_data);
+
 struct bio_map_data {
 	struct bio_vec *iovecs;
 	struct sg_iovec *sgvecs;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 3bf696b..467e101 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -289,6 +289,8 @@ static inline void bio_flush_dcache_pages(struct bio *bi)
 }
 #endif
 
+extern void bio_copy_data(struct bio *dst, struct bio *src);
+
 extern struct bio *bio_copy_user(struct request_queue *, struct rq_map_data *,
 				 unsigned long, unsigned int, int, gfp_t);
 extern struct bio *bio_copy_user_iov(struct request_queue *,
-- 
1.7.12

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH v3 17/26] pktcdvd: use bio_copy_data()
  2012-09-24 22:34 [PATCH v3 00/26] Prep work for immutable bio vecs Kent Overstreet
                   ` (15 preceding siblings ...)
  2012-09-24 22:34 ` [PATCH v3 16/26] block: Add bio_copy_data() Kent Overstreet
@ 2012-09-24 22:34 ` Kent Overstreet
  2012-09-24 22:34 ` [PATCH v3 18/26] pktcdvd: Use bio_reset() in disabled code to kill bi_idx usage Kent Overstreet
                   ` (7 subsequent siblings)
  24 siblings, 0 replies; 69+ messages in thread
From: Kent Overstreet @ 2012-09-24 22:34 UTC (permalink / raw)
  To: linux-bcache, linux-kernel, dm-devel
  Cc: Kent Overstreet, tj, axboe, neilb, vgoyal, Jiri Kosina

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: Jiri Kosina <jkosina@suse.cz>
---
 drivers/block/pktcdvd.c | 79 ++++++++-----------------------------------------
 1 file changed, 12 insertions(+), 67 deletions(-)

diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 2c27744..783c96c 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -948,31 +948,6 @@ static int pkt_set_segment_merging(struct pktcdvd_device *pd, struct request_que
 }
 
 /*
- * Copy CD_FRAMESIZE bytes from src_bio into a destination page
- */
-static void pkt_copy_bio_data(struct bio *src_bio, int seg, int offs, struct page *dst_page, int dst_offs)
-{
-	unsigned int copy_size = CD_FRAMESIZE;
-
-	while (copy_size > 0) {
-		struct bio_vec *src_bvl = bio_iovec_idx(src_bio, seg);
-		void *vfrom = kmap_atomic(src_bvl->bv_page) +
-			src_bvl->bv_offset + offs;
-		void *vto = page_address(dst_page) + dst_offs;
-		int len = min_t(int, copy_size, src_bvl->bv_len - offs);
-
-		BUG_ON(len < 0);
-		memcpy(vto, vfrom, len);
-		kunmap_atomic(vfrom);
-
-		seg++;
-		offs = 0;
-		dst_offs += len;
-		copy_size -= len;
-	}
-}
-
-/*
  * Copy all data for this packet to pkt->pages[], so that
  * a) The number of required segments for the write bio is minimized, which
  *    is necessary for some scsi controllers.
@@ -1325,55 +1300,35 @@ try_next_bio:
  */
 static void pkt_start_write(struct pktcdvd_device *pd, struct packet_data *pkt)
 {
-	struct bio *bio;
 	int f;
-	int frames_write;
 	struct bio_vec *bvec = pkt->w_bio->bi_io_vec;
 
+	bio_reset(pkt->w_bio);
+	pkt->w_bio->bi_sector = pkt->sector;
+	pkt->w_bio->bi_bdev = pd->bdev;
+	pkt->w_bio->bi_end_io = pkt_end_io_packet_write;
+	pkt->w_bio->bi_private = pkt;
+
+	/* XXX: locking? */
 	for (f = 0; f < pkt->frames; f++) {
 		bvec[f].bv_page = pkt->pages[(f * CD_FRAMESIZE) / PAGE_SIZE];
 		bvec[f].bv_offset = (f * CD_FRAMESIZE) % PAGE_SIZE;
+		if (!bio_add_page(pkt->w_bio, bvec[f].bv_page, CD_FRAMESIZE, bvec[f].bv_offset))
+			BUG();
 	}
+	VPRINTK(DRIVER_NAME": vcnt=%d\n", pkt->w_bio->bi_vcnt);
 
 	/*
 	 * Fill-in bvec with data from orig_bios.
 	 */
-	frames_write = 0;
 	spin_lock(&pkt->lock);
-	bio_list_for_each(bio, &pkt->orig_bios) {
-		int segment = bio->bi_idx;
-		int src_offs = 0;
-		int first_frame = (bio->bi_sector - pkt->sector) / (CD_FRAMESIZE >> 9);
-		int num_frames = bio->bi_size / CD_FRAMESIZE;
-		BUG_ON(first_frame < 0);
-		BUG_ON(first_frame + num_frames > pkt->frames);
-		for (f = first_frame; f < first_frame + num_frames; f++) {
-			struct bio_vec *src_bvl = bio_iovec_idx(bio, segment);
-
-			while (src_offs >= src_bvl->bv_len) {
-				src_offs -= src_bvl->bv_len;
-				segment++;
-				BUG_ON(segment >= bio->bi_vcnt);
-				src_bvl = bio_iovec_idx(bio, segment);
-			}
+	bio_copy_data(pkt->w_bio, pkt->orig_bios.head);
 
-			if (src_bvl->bv_len - src_offs >= CD_FRAMESIZE) {
-				bvec[f].bv_page = src_bvl->bv_page;
-				bvec[f].bv_offset = src_bvl->bv_offset + src_offs;
-			} else {
-				pkt_copy_bio_data(bio, segment, src_offs,
-						  bvec[f].bv_page, bvec[f].bv_offset);
-			}
-			src_offs += CD_FRAMESIZE;
-			frames_write++;
-		}
-	}
 	pkt_set_state(pkt, PACKET_WRITE_WAIT_STATE);
 	spin_unlock(&pkt->lock);
 
 	VPRINTK("pkt_start_write: Writing %d frames for zone %llx\n",
-		frames_write, (unsigned long long)pkt->sector);
-	BUG_ON(frames_write != pkt->write_size);
+		pkt->write_size, (unsigned long long)pkt->sector);
 
 	if (test_bit(PACKET_MERGE_SEGS, &pd->flags) || (pkt->write_size < pkt->frames)) {
 		pkt_make_local_copy(pkt, bvec);
@@ -1383,16 +1338,6 @@ static void pkt_start_write(struct pktcdvd_device *pd, struct packet_data *pkt)
 	}
 
 	/* Start the write request */
-	bio_reset(pkt->w_bio);
-	pkt->w_bio->bi_sector = pkt->sector;
-	pkt->w_bio->bi_bdev = pd->bdev;
-	pkt->w_bio->bi_end_io = pkt_end_io_packet_write;
-	pkt->w_bio->bi_private = pkt;
-	for (f = 0; f < pkt->frames; f++)
-		if (!bio_add_page(pkt->w_bio, bvec[f].bv_page, CD_FRAMESIZE, bvec[f].bv_offset))
-			BUG();
-	VPRINTK(DRIVER_NAME": vcnt=%d\n", pkt->w_bio->bi_vcnt);
-
 	atomic_set(&pkt->io_wait, 1);
 	pkt->w_bio->bi_rw = WRITE;
 	pkt_queue_bio(pd, pkt->w_bio);
-- 
1.7.12

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH v3 18/26] pktcdvd: Use bio_reset() in disabled code to kill bi_idx usage
  2012-09-24 22:34 [PATCH v3 00/26] Prep work for immutable bio vecs Kent Overstreet
                   ` (16 preceding siblings ...)
  2012-09-24 22:34 ` [PATCH v3 17/26] pktcdvd: use bio_copy_data() Kent Overstreet
@ 2012-09-24 22:34 ` Kent Overstreet
  2012-09-24 22:34 ` [PATCH v3 19/26] raid1: use bio_copy_data() Kent Overstreet
                   ` (6 subsequent siblings)
  24 siblings, 0 replies; 69+ messages in thread
From: Kent Overstreet @ 2012-09-24 22:34 UTC (permalink / raw)
  To: linux-bcache, linux-kernel, dm-devel
  Cc: axboe, Jiri Kosina, Kent Overstreet, tj, vgoyal

In the short term this'll help with code auditing, and if this code ever
gets used now it's converted :)

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jiri Kosina <jkosina@suse.cz>
---
 drivers/block/pktcdvd.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 783c96c..1119042 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -1156,16 +1156,15 @@ static int pkt_start_recovery(struct packet_data *pkt)
 	new_sector = new_block * (CD_FRAMESIZE >> 9);
 	pkt->sector = new_sector;
 
+	bio_reset(pkt->bio);
+	pkt->bio->bi_bdev = pd->bdev;
+	pkt->bio->bi_rw = REQ_WRITE;
 	pkt->bio->bi_sector = new_sector;
-	pkt->bio->bi_next = NULL;
-	pkt->bio->bi_flags = 1 << BIO_UPTODATE;
-	pkt->bio->bi_idx = 0;
-
-	BUG_ON(pkt->bio->bi_rw != REQ_WRITE);
-	BUG_ON(pkt->bio->bi_vcnt != pkt->frames);
-	BUG_ON(pkt->bio->bi_size != pkt->frames * CD_FRAMESIZE);
-	BUG_ON(pkt->bio->bi_end_io != pkt_end_io_packet_write);
-	BUG_ON(pkt->bio->bi_private != pkt);
+	pkt->bio->bi_size = pkt->frames * CD_FRAMESIZE;
+	pkt->bio->bi_vcnt = pkt->frames;
+
+	pkt->bio->bi_end_io = pkt_end_io_packet_write;
+	pkt->bio->bi_private = pkt;
 
 	drop_super(sb);
 	return 1;
-- 
1.7.12

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH v3 19/26] raid1: use bio_copy_data()
  2012-09-24 22:34 [PATCH v3 00/26] Prep work for immutable bio vecs Kent Overstreet
                   ` (17 preceding siblings ...)
  2012-09-24 22:34 ` [PATCH v3 18/26] pktcdvd: Use bio_reset() in disabled code to kill bi_idx usage Kent Overstreet
@ 2012-09-24 22:34 ` Kent Overstreet
  2012-09-24 22:35 ` [PATCH v3 20/26] bounce: Refactor __blk_queue_bounce to not use bi_io_vec Kent Overstreet
                   ` (5 subsequent siblings)
  24 siblings, 0 replies; 69+ messages in thread
From: Kent Overstreet @ 2012-09-24 22:34 UTC (permalink / raw)
  To: linux-bcache, linux-kernel, dm-devel; +Cc: tj, axboe, Kent Overstreet, vgoyal

This doesn't really delete any code _yet_, but once immutable bvecs are
done we can just delete the rest of the code in that loop.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: NeilBrown <neilb@suse.de>
---
 drivers/md/raid1.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index af39f74..1c40d90 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1895,10 +1895,9 @@ static int process_checks(struct r1bio *r1_bio)
 			else
 				bi->bv_len = size;
 			size -= PAGE_SIZE;
-			memcpy(page_address(bi->bv_page),
-			       page_address(pbio->bi_io_vec[j].bv_page),
-			       PAGE_SIZE);
 		}
+
+		bio_copy_data(sbio, pbio);
 	}
 	return 0;
 }
-- 
1.7.12

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH v3 20/26] bounce: Refactor __blk_queue_bounce to not use bi_io_vec
  2012-09-24 22:34 [PATCH v3 00/26] Prep work for immutable bio vecs Kent Overstreet
                   ` (18 preceding siblings ...)
  2012-09-24 22:34 ` [PATCH v3 19/26] raid1: use bio_copy_data() Kent Overstreet
@ 2012-09-24 22:35 ` Kent Overstreet
  2012-09-24 22:35 ` [PATCH v3 21/26] block: Add bio_for_each_segment_all() Kent Overstreet
                   ` (4 subsequent siblings)
  24 siblings, 0 replies; 69+ messages in thread
From: Kent Overstreet @ 2012-09-24 22:35 UTC (permalink / raw)
  To: linux-bcache, linux-kernel, dm-devel; +Cc: tj, axboe, Kent Overstreet, vgoyal

A bunch of what __blk_queue_bounce() was doing was problematic for the
immutable bvec work; this cleans that up and the code is quite a bit
smaller, too.

The __bio_for_each_segment() in copy_to_high_bio_irq() was changed
because that one's looping over the original bio, not the bounce bio -
a later patch renames __bio_for_each_segment() ->
bio_for_each_segment_all(), and documents that
bio_for_each_segment_all() is only for code that owns the bio.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
---
 mm/bounce.c | 73 ++++++++++++++++---------------------------------------------
 1 file changed, 19 insertions(+), 54 deletions(-)

diff --git a/mm/bounce.c b/mm/bounce.c
index 0420867..3068300 100644
--- a/mm/bounce.c
+++ b/mm/bounce.c
@@ -101,7 +101,7 @@ static void copy_to_high_bio_irq(struct bio *to, struct bio *from)
 	struct bio_vec *tovec, *fromvec;
 	int i;
 
-	__bio_for_each_segment(tovec, to, i, 0) {
+	bio_for_each_segment(tovec, to, i) {
 		fromvec = from->bi_io_vec + i;
 
 		/*
@@ -181,78 +181,43 @@ static void bounce_end_io_read_isa(struct bio *bio, int err)
 static void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig,
 			       mempool_t *pool)
 {
-	struct page *page;
-	struct bio *bio = NULL;
-	int i, rw = bio_data_dir(*bio_orig);
+	struct bio *bio;
+	int rw = bio_data_dir(*bio_orig);
 	struct bio_vec *to, *from;
+	unsigned i;
 
-	bio_for_each_segment(from, *bio_orig, i) {
-		page = from->bv_page;
+	bio_for_each_segment(from, *bio_orig, i)
+		if (page_to_pfn(from->bv_page) > queue_bounce_pfn(q))
+			goto bounce;
 
-		/*
-		 * is destination page below bounce pfn?
-		 */
-		if (page_to_pfn(page) <= queue_bounce_pfn(q))
-			continue;
-
-		/*
-		 * irk, bounce it
-		 */
-		if (!bio) {
-			unsigned int cnt = (*bio_orig)->bi_vcnt;
+	return;
+bounce:
+	bio = bio_clone_bioset(*bio_orig, GFP_NOIO, fs_bio_set);
 
-			bio = bio_alloc(GFP_NOIO, cnt);
-			memset(bio->bi_io_vec, 0, cnt * sizeof(struct bio_vec));
-		}
-			
+	bio_for_each_segment(to, bio, i) {
+		struct page *page = to->bv_page;
 
-		to = bio->bi_io_vec + i;
+		if (page_to_pfn(page) <= queue_bounce_pfn(q))
+			continue;
 
-		to->bv_page = mempool_alloc(pool, q->bounce_gfp);
-		to->bv_len = from->bv_len;
-		to->bv_offset = from->bv_offset;
 		inc_zone_page_state(to->bv_page, NR_BOUNCE);
+		to->bv_page = mempool_alloc(pool, q->bounce_gfp);
 
 		if (rw == WRITE) {
 			char *vto, *vfrom;
 
-			flush_dcache_page(from->bv_page);
+			flush_dcache_page(page);
+
 			vto = page_address(to->bv_page) + to->bv_offset;
-			vfrom = kmap(from->bv_page) + from->bv_offset;
+			vfrom = kmap_atomic(page) + to->bv_offset;
 			memcpy(vto, vfrom, to->bv_len);
-			kunmap(from->bv_page);
+			kunmap_atomic(vfrom);
 		}
 	}
 
-	/*
-	 * no pages bounced
-	 */
-	if (!bio)
-		return;
-
 	trace_block_bio_bounce(q, *bio_orig);
 
-	/*
-	 * at least one page was bounced, fill in possible non-highmem
-	 * pages
-	 */
-	__bio_for_each_segment(from, *bio_orig, i, 0) {
-		to = bio_iovec_idx(bio, i);
-		if (!to->bv_page) {
-			to->bv_page = from->bv_page;
-			to->bv_len = from->bv_len;
-			to->bv_offset = from->bv_offset;
-		}
-	}
-
-	bio->bi_bdev = (*bio_orig)->bi_bdev;
 	bio->bi_flags |= (1 << BIO_BOUNCED);
-	bio->bi_sector = (*bio_orig)->bi_sector;
-	bio->bi_rw = (*bio_orig)->bi_rw;
-
-	bio->bi_vcnt = (*bio_orig)->bi_vcnt;
-	bio->bi_idx = (*bio_orig)->bi_idx;
-	bio->bi_size = (*bio_orig)->bi_size;
 
 	if (pool == page_pool) {
 		bio->bi_end_io = bounce_end_io_write;
-- 
1.7.12

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH v3 21/26] block: Add bio_for_each_segment_all()
  2012-09-24 22:34 [PATCH v3 00/26] Prep work for immutable bio vecs Kent Overstreet
                   ` (19 preceding siblings ...)
  2012-09-24 22:35 ` [PATCH v3 20/26] bounce: Refactor __blk_queue_bounce to not use bi_io_vec Kent Overstreet
@ 2012-09-24 22:35 ` Kent Overstreet
  2012-09-24 22:35 ` [PATCH v3 22/26] block: Convert some code to bio_for_each_segment_all() Kent Overstreet
                   ` (3 subsequent siblings)
  24 siblings, 0 replies; 69+ messages in thread
From: Kent Overstreet @ 2012-09-24 22:35 UTC (permalink / raw)
  To: linux-bcache, linux-kernel, dm-devel
  Cc: axboe, Kent Overstreet, Boaz Harrosh, tj, vgoyal

__bio_for_each_segment() iterates bvecs from the specified index
instead of bio->bv_idx.  Currently, the only usage is to walk all the
bvecs after the bio has been advanced by specifying 0 index.

For immutable bvecs, we need to split these apart;
bio_for_each_segment() is going to have a different implementation.
This will also help document the intent of code that's using it -
bio_for_each_segment_all() is only legal to use for code that owns the
bio.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: Neil Brown <neilb@suse.de>
CC: Boaz Harrosh <bharrosh@panasas.com>
---
 drivers/md/raid1.c  |  2 +-
 fs/bio.c            | 12 ++++++------
 fs/exofs/ore.c      |  2 +-
 fs/exofs/ore_raid.c |  2 +-
 include/linux/bio.h | 16 +++++++++-------
 mm/bounce.c         |  2 +-
 6 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 1c40d90..6a96e1f 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1283,7 +1283,7 @@ read_again:
 			 * know the original bi_idx, so we just free
 			 * them all
 			 */
-			__bio_for_each_segment(bvec, mbio, j, 0)
+			bio_for_each_segment_all(bvec, mbio, j)
 				bvec->bv_page = r1_bio->behind_bvecs[j].bv_page;
 			if (test_bit(WriteMostly, &conf->mirrors[i].rdev->flags))
 				atomic_inc(&r1_bio->behind_remaining);
diff --git a/fs/bio.c b/fs/bio.c
index d433825..a641efd 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -959,7 +959,7 @@ static int __bio_copy_iov(struct bio *bio, struct bio_vec *iovecs,
 	int iov_idx = 0;
 	unsigned int iov_off = 0;
 
-	__bio_for_each_segment(bvec, bio, i, 0) {
+	bio_for_each_segment_all(bvec, bio, i) {
 		char *bv_addr = page_address(bvec->bv_page);
 		unsigned int bv_len = iovecs[i].bv_len;
 
@@ -1141,7 +1141,7 @@ struct bio *bio_copy_user_iov(struct request_queue *q,
 	return bio;
 cleanup:
 	if (!map_data)
-		bio_for_each_segment(bvec, bio, i)
+		bio_for_each_segment_all(bvec, bio, i)
 			__free_page(bvec->bv_page);
 
 	bio_put(bio);
@@ -1355,7 +1355,7 @@ static void __bio_unmap_user(struct bio *bio)
 	/*
 	 * make sure we dirty pages we wrote to
 	 */
-	__bio_for_each_segment(bvec, bio, i, 0) {
+	bio_for_each_segment_all(bvec, bio, i) {
 		if (bio_data_dir(bio) == READ)
 			set_page_dirty_lock(bvec->bv_page);
 
@@ -1461,7 +1461,7 @@ static void bio_copy_kern_endio(struct bio *bio, int err)
 	int i;
 	char *p = bmd->sgvecs[0].iov_base;
 
-	__bio_for_each_segment(bvec, bio, i, 0) {
+	bio_for_each_segment_all(bvec, bio, i) {
 		char *addr = page_address(bvec->bv_page);
 		int len = bmd->iovecs[i].bv_len;
 
@@ -1501,7 +1501,7 @@ struct bio *bio_copy_kern(struct request_queue *q, void *data, unsigned int len,
 	if (!reading) {
 		void *p = data;
 
-		bio_for_each_segment(bvec, bio, i) {
+		bio_for_each_segment_all(bvec, bio, i) {
 			char *addr = page_address(bvec->bv_page);
 
 			memcpy(addr, p, bvec->bv_len);
@@ -1783,7 +1783,7 @@ sector_t bio_sector_offset(struct bio *bio, unsigned short index,
 	if (index >= bio->bi_idx)
 		index = bio->bi_vcnt - 1;
 
-	__bio_for_each_segment(bv, bio, i, 0) {
+	bio_for_each_segment_all(bv, bio, i) {
 		if (i == index) {
 			if (offset > bv->bv_offset)
 				sectors += (offset - bv->bv_offset) / sector_sz;
diff --git a/fs/exofs/ore.c b/fs/exofs/ore.c
index f936cb5..b744228 100644
--- a/fs/exofs/ore.c
+++ b/fs/exofs/ore.c
@@ -401,7 +401,7 @@ static void _clear_bio(struct bio *bio)
 	struct bio_vec *bv;
 	unsigned i;
 
-	__bio_for_each_segment(bv, bio, i, 0) {
+	bio_for_each_segment_all(bv, bio, i) {
 		unsigned this_count = bv->bv_len;
 
 		if (likely(PAGE_SIZE == this_count))
diff --git a/fs/exofs/ore_raid.c b/fs/exofs/ore_raid.c
index 5f376d1..4dec928 100644
--- a/fs/exofs/ore_raid.c
+++ b/fs/exofs/ore_raid.c
@@ -432,7 +432,7 @@ static void _mark_read4write_pages_uptodate(struct ore_io_state *ios, int ret)
 		if (!bio)
 			continue;
 
-		__bio_for_each_segment(bv, bio, i, 0) {
+		bio_for_each_segment_all(bv, bio, i) {
 			struct page *page = bv->bv_page;
 
 			SetPageUptodate(page);
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 467e101..32966c9 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -137,16 +137,18 @@ static inline int bio_has_allocated_vec(struct bio *bio)
 #define bio_io_error(bio) bio_endio((bio), -EIO)
 
 /*
- * drivers should not use the __ version unless they _really_ want to
- * run through the entire bio and not just pending pieces
+ * drivers should _never_ use the all version - the bio may have been split
+ * before it got to the driver and the driver won't own all of it
  */
-#define __bio_for_each_segment(bvl, bio, i, start_idx)			\
-	for (bvl = bio_iovec_idx((bio), (start_idx)), i = (start_idx);	\
-	     i < (bio)->bi_vcnt;					\
-	     bvl++, i++)
+#define bio_for_each_segment_all(bvl, bio, i)				\
+	for (i = 0;							\
+	     bvl = bio_iovec_idx((bio), (i)), i < (bio)->bi_vcnt;	\
+	     i++)
 
 #define bio_for_each_segment(bvl, bio, i)				\
-	__bio_for_each_segment(bvl, bio, i, (bio)->bi_idx)
+	for (i = (bio)->bi_idx;						\
+	     bvl = bio_iovec_idx((bio), (i)), i < (bio)->bi_vcnt;	\
+	     i++)
 
 /*
  * get a reference to a bio, so it won't disappear. the intended use is
diff --git a/mm/bounce.c b/mm/bounce.c
index 3068300..89324e2 100644
--- a/mm/bounce.c
+++ b/mm/bounce.c
@@ -134,7 +134,7 @@ static void bounce_end_io(struct bio *bio, mempool_t *pool, int err)
 	/*
 	 * free up bounce indirect pages used
 	 */
-	__bio_for_each_segment(bvec, bio, i, 0) {
+	bio_for_each_segment_all(bvec, bio, i) {
 		org_vec = bio_orig->bi_io_vec + i;
 		if (bvec->bv_page == org_vec->bv_page)
 			continue;
-- 
1.7.12

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH v3 22/26] block: Convert some code to bio_for_each_segment_all()
  2012-09-24 22:34 [PATCH v3 00/26] Prep work for immutable bio vecs Kent Overstreet
                   ` (20 preceding siblings ...)
  2012-09-24 22:35 ` [PATCH v3 21/26] block: Add bio_for_each_segment_all() Kent Overstreet
@ 2012-09-24 22:35 ` Kent Overstreet
  2012-09-24 22:35 ` [PATCH v3 24/26] block: Add an explicit bio flag for bios that own their bvec Kent Overstreet
                   ` (2 subsequent siblings)
  24 siblings, 0 replies; 69+ messages in thread
From: Kent Overstreet @ 2012-09-24 22:35 UTC (permalink / raw)
  To: linux-bcache, linux-kernel, dm-devel
  Cc: axboe, Kent Overstreet, vgoyal, tj, Alasdair Kergon, Alexander Viro

More prep work for immutable bvecs:

A few places in the code were either open coding or using the wrong
version - fix.

After we introduce the bvec iter, it'll no longer be possible to modify
the biovec through bio_for_each_segment_all() - it doesn't increment a
pointer to the current bvec, you pass in a struct bio_vec (not a
pointer) which is updated with what the current biovec would be (taking
into account bi_bvec_done and bi_size).

So because of that it's more worthwhile to be consistent about
bio_for_each_segment()/bio_for_each_segment_all() usage.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: NeilBrown <neilb@suse.de>
CC: Alasdair Kergon <agk@redhat.com>
CC: dm-devel@redhat.com
CC: Alexander Viro <viro@zeniv.linux.org.uk>
---
 drivers/md/dm-crypt.c |  3 +--
 drivers/md/raid1.c    | 10 +++-------
 fs/bio.c              | 20 ++++++++++----------
 fs/direct-io.c        |  8 ++++----
 mm/bounce.c           |  2 +-
 5 files changed, 19 insertions(+), 24 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index bbf459b..f50798e 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -858,8 +858,7 @@ static void crypt_free_buffer_pages(struct crypt_config *cc, struct bio *clone)
 	unsigned int i;
 	struct bio_vec *bv;
 
-	for (i = 0; i < clone->bi_vcnt; i++) {
-		bv = bio_iovec_idx(clone, i);
+	bio_for_each_segment_all(bv, clone, i) {
 		BUG_ON(!bv->bv_page);
 		mempool_free(bv->bv_page, cc->page_pool);
 		bv->bv_page = NULL;
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 6a96e1f..dff0473 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -921,7 +921,7 @@ static void alloc_behind_pages(struct bio *bio, struct r1bio *r1_bio)
 	if (unlikely(!bvecs))
 		return;
 
-	bio_for_each_segment(bvec, bio, i) {
+	bio_for_each_segment_all(bvec, bio, i) {
 		bvecs[i] = *bvec;
 		bvecs[i].bv_page = alloc_page(GFP_NOIO);
 		if (unlikely(!bvecs[i].bv_page))
@@ -1276,12 +1276,8 @@ read_again:
 			struct bio_vec *bvec;
 			int j;
 
-			/* Yes, I really want the '__' version so that
-			 * we clear any unused pointer in the io_vec, rather
-			 * than leave them unchanged.  This is important
-			 * because when we come to free the pages, we won't
-			 * know the original bi_idx, so we just free
-			 * them all
+			/*
+			 * We trimmed the bio, so _all is legit
 			 */
 			bio_for_each_segment_all(bvec, mbio, j)
 				bvec->bv_page = r1_bio->behind_bvecs[j].bv_page;
diff --git a/fs/bio.c b/fs/bio.c
index a641efd..6ba9f27 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -1546,11 +1546,11 @@ EXPORT_SYMBOL(bio_copy_kern);
  */
 void bio_set_pages_dirty(struct bio *bio)
 {
-	struct bio_vec *bvec = bio->bi_io_vec;
+	struct bio_vec *bvec;
 	int i;
 
-	for (i = 0; i < bio->bi_vcnt; i++) {
-		struct page *page = bvec[i].bv_page;
+	bio_for_each_segment_all(bvec, bio, i) {
+		struct page *page = bvec->bv_page;
 
 		if (page && !PageCompound(page))
 			set_page_dirty_lock(page);
@@ -1559,11 +1559,11 @@ void bio_set_pages_dirty(struct bio *bio)
 
 static void bio_release_pages(struct bio *bio)
 {
-	struct bio_vec *bvec = bio->bi_io_vec;
+	struct bio_vec *bvec;
 	int i;
 
-	for (i = 0; i < bio->bi_vcnt; i++) {
-		struct page *page = bvec[i].bv_page;
+	bio_for_each_segment_all(bvec, bio, i) {
+		struct page *page = bvec->bv_page;
 
 		if (page)
 			put_page(page);
@@ -1612,16 +1612,16 @@ static void bio_dirty_fn(struct work_struct *work)
 
 void bio_check_pages_dirty(struct bio *bio)
 {
-	struct bio_vec *bvec = bio->bi_io_vec;
+	struct bio_vec *bvec;
 	int nr_clean_pages = 0;
 	int i;
 
-	for (i = 0; i < bio->bi_vcnt; i++) {
-		struct page *page = bvec[i].bv_page;
+	bio_for_each_segment_all(bvec, bio, i) {
+		struct page *page = bvec->bv_page;
 
 		if (PageDirty(page) || PageCompound(page)) {
 			page_cache_release(page);
-			bvec[i].bv_page = NULL;
+			bvec->bv_page = NULL;
 		} else {
 			nr_clean_pages++;
 		}
diff --git a/fs/direct-io.c b/fs/direct-io.c
index f86c720..6089916 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -441,8 +441,8 @@ static struct bio *dio_await_one(struct dio *dio)
 static int dio_bio_complete(struct dio *dio, struct bio *bio)
 {
 	const int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
-	struct bio_vec *bvec = bio->bi_io_vec;
-	int page_no;
+	struct bio_vec *bvec;
+	unsigned i;
 
 	if (!uptodate)
 		dio->io_error = -EIO;
@@ -450,8 +450,8 @@ static int dio_bio_complete(struct dio *dio, struct bio *bio)
 	if (dio->is_async && dio->rw == READ) {
 		bio_check_pages_dirty(bio);	/* transfers ownership */
 	} else {
-		for (page_no = 0; page_no < bio->bi_vcnt; page_no++) {
-			struct page *page = bvec[page_no].bv_page;
+		bio_for_each_segment_all(bvec, bio, i) {
+			struct page *page = bvec->bv_page;
 
 			if (dio->rw == READ && !PageCompound(page))
 				set_page_dirty_lock(page);
diff --git a/mm/bounce.c b/mm/bounce.c
index 89324e2..bd7079a 100644
--- a/mm/bounce.c
+++ b/mm/bounce.c
@@ -194,7 +194,7 @@ static void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig,
 bounce:
 	bio = bio_clone_bioset(*bio_orig, GFP_NOIO, fs_bio_set);
 
-	bio_for_each_segment(to, bio, i) {
+	bio_for_each_segment_all(to, bio, i) {
 		struct page *page = to->bv_page;
 
 		if (page_to_pfn(page) <= queue_bounce_pfn(q))
-- 
1.7.12

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH v3 23/26] block: Add bio_alloc_pages()
       [not found] ` <1348526106-17074-1-git-send-email-koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
  2012-09-24 22:34   ` [PATCH v3 05/26] md: Convert md_trim_bio() to use bio_advance() Kent Overstreet
@ 2012-09-24 22:35   ` Kent Overstreet
  1 sibling, 0 replies; 69+ messages in thread
From: Kent Overstreet @ 2012-09-24 22:35 UTC (permalink / raw)
  To: linux-bcache-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA
  Cc: Kent Overstreet, tj-DgEjT+Ai2ygdnm+yROfE0A,
	axboe-tSWWG44O7X1aa/9Udqfwiw, neilb-l3A5Bk7waGM,
	vgoyal-H+wXaHxf7aLQT0dZR+AlfA

More utility code to replace stuff that's getting open coded.

Signed-off-by: Kent Overstreet <koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
CC: Jens Axboe <axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>
CC: NeilBrown <neilb-l3A5Bk7waGM@public.gmane.org>
---
 drivers/md/raid1.c  | 16 +++-------------
 fs/bio.c            | 28 ++++++++++++++++++++++++++++
 include/linux/bio.h |  1 +
 3 files changed, 32 insertions(+), 13 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index dff0473..97f80b9 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -92,7 +92,6 @@ static void r1bio_pool_free(void *r1_bio, void *data)
 static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
 {
 	struct pool_info *pi = data;
-	struct page *page;
 	struct r1bio *r1_bio;
 	struct bio *bio;
 	int i, j;
@@ -122,14 +121,10 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
 		j = 1;
 	while(j--) {
 		bio = r1_bio->bios[j];
-		for (i = 0; i < RESYNC_PAGES; i++) {
-			page = alloc_page(gfp_flags);
-			if (unlikely(!page))
-				goto out_free_pages;
+		bio->bi_vcnt = RESYNC_PAGES;
 
-			bio->bi_io_vec[i].bv_page = page;
-			bio->bi_vcnt = i+1;
-		}
+		if (bio_alloc_pages(bio, gfp_flags))
+			goto out_free_bio;
 	}
 	/* If not user-requests, copy the page pointers to all bios */
 	if (!test_bit(MD_RECOVERY_REQUESTED, &pi->mddev->recovery)) {
@@ -143,11 +138,6 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
 
 	return r1_bio;
 
-out_free_pages:
-	for (j=0 ; j < pi->raid_disks; j++)
-		for (i=0; i < r1_bio->bios[j]->bi_vcnt ; i++)
-			put_page(r1_bio->bios[j]->bi_io_vec[i].bv_page);
-	j = -1;
 out_free_bio:
 	while (++j < pi->raid_disks)
 		bio_put(r1_bio->bios[j]);
diff --git a/fs/bio.c b/fs/bio.c
index 6ba9f27..8e10d49 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -828,6 +828,34 @@ void bio_advance(struct bio *bio, unsigned bytes)
 EXPORT_SYMBOL(bio_advance);
 
 /**
+ * bio_alloc_pages - allocates a single page for each bvec in a bio
+ * @bio: bio to allocate pages for
+ * @gfp_mask: flags for allocation
+ *
+ * Allocates pages up to @bio->bi_vcnt.
+ *
+ * Returns 0 on success, -ENOMEM on failure. On failure, any allocated pages are
+ * freed.
+ */
+int bio_alloc_pages(struct bio *bio, gfp_t gfp_mask)
+{
+	int i;
+	struct bio_vec *bv;
+
+	bio_for_each_segment_all(bv, bio, i) {
+		bv->bv_page = alloc_page(gfp_mask);
+		if (!bv->bv_page) {
+			while (--bv >= bio->bi_io_vec)
+				__free_page(bv->bv_page);
+			return -ENOMEM;
+		}
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(bio_alloc_pages);
+
+/**
  * bio_copy_data - copy contents of data buffers from one chain of bios to
  * another
  * @src: source bio list
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 32966c9..5a55ce5 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -292,6 +292,7 @@ static inline void bio_flush_dcache_pages(struct bio *bi)
 #endif
 
 extern void bio_copy_data(struct bio *dst, struct bio *src);
+extern int bio_alloc_pages(struct bio *bio, gfp_t gfp);
 
 extern struct bio *bio_copy_user(struct request_queue *, struct rq_map_data *,
 				 unsigned long, unsigned int, int, gfp_t);
-- 
1.7.12

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH v3 24/26] block: Add an explicit bio flag for bios that own their bvec
  2012-09-24 22:34 [PATCH v3 00/26] Prep work for immutable bio vecs Kent Overstreet
                   ` (21 preceding siblings ...)
  2012-09-24 22:35 ` [PATCH v3 22/26] block: Convert some code to bio_for_each_segment_all() Kent Overstreet
@ 2012-09-24 22:35 ` Kent Overstreet
  2012-09-24 22:35 ` [PATCH v3 25/26] bio-integrity: Add explicit field for owner of bip_buf Kent Overstreet
  2012-09-24 22:35 ` [PATCH v3 26/26] block: Add BIO_SUBMITTED flag, kill BIO_CLONED Kent Overstreet
  24 siblings, 0 replies; 69+ messages in thread
From: Kent Overstreet @ 2012-09-24 22:35 UTC (permalink / raw)
  To: linux-bcache, linux-kernel, dm-devel
  Cc: Kent Overstreet, tj, axboe, neilb, vgoyal

This is for the new bio splitting code. When we split a bio, if the
split occured on a bvec boundry we reuse the bvec for the new bio. But
that means bio_free() can't free it, hence the explicit flag.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
Acked-by: Tejun Heo <tj@kernel.org>
---
 fs/bio.c                  | 4 +++-
 include/linux/bio.h       | 5 -----
 include/linux/blk_types.h | 1 +
 3 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/fs/bio.c b/fs/bio.c
index 8e10d49..00639e6 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -250,7 +250,7 @@ static void bio_free(struct bio *bio)
 	__bio_free(bio);
 
 	if (bs) {
-		if (bio_has_allocated_vec(bio))
+		if (bio_flagged(bio, BIO_OWNS_VEC))
 			bvec_free_bs(bs->bvec_pool, bio->bi_io_vec, BIO_POOL_IDX(bio));
 
 		/*
@@ -449,6 +449,8 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
 
 		if (unlikely(!bvl))
 			goto err_free;
+
+		bio->bi_flags |= 1 << BIO_OWNS_VEC;
 	} else if (nr_iovecs) {
 		bvl = bio->bi_inline_vecs;
 	}
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 5a55ce5..bd130b6 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -85,11 +85,6 @@ static inline void *bio_data(struct bio *bio)
 	return NULL;
 }
 
-static inline int bio_has_allocated_vec(struct bio *bio)
-{
-	return bio->bi_io_vec && bio->bi_io_vec != bio->bi_inline_vecs;
-}
-
 /*
  * will die
  */
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index cdf1119..6547103 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -117,6 +117,7 @@ struct bio {
  * BIO_POOL_IDX()
  */
 #define BIO_RESET_BITS	12
+#define BIO_OWNS_VEC	12	/* bio_free() should free bvec */
 
 #define bio_flagged(bio, flag)	((bio)->bi_flags & (1 << (flag)))
 
-- 
1.7.12

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH v3 25/26] bio-integrity: Add explicit field for owner of bip_buf
  2012-09-24 22:34 [PATCH v3 00/26] Prep work for immutable bio vecs Kent Overstreet
                   ` (22 preceding siblings ...)
  2012-09-24 22:35 ` [PATCH v3 24/26] block: Add an explicit bio flag for bios that own their bvec Kent Overstreet
@ 2012-09-24 22:35 ` Kent Overstreet
  2012-09-24 22:35 ` [PATCH v3 26/26] block: Add BIO_SUBMITTED flag, kill BIO_CLONED Kent Overstreet
  24 siblings, 0 replies; 69+ messages in thread
From: Kent Overstreet @ 2012-09-24 22:35 UTC (permalink / raw)
  To: linux-bcache, linux-kernel, dm-devel
  Cc: Kent Overstreet, tj, axboe, neilb, vgoyal, Martin K. Petersen

This was the only real user of BIO_CLONED, which didn't have very clear
semantics. Convert to its own flag so we can get rid of BIO_CLONED.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: Martin K. Petersen <martin.petersen@oracle.com>
---
 fs/bio-integrity.c  | 5 ++---
 include/linux/bio.h | 1 +
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/bio-integrity.c b/fs/bio-integrity.c
index f175814..c5f33b4 100644
--- a/fs/bio-integrity.c
+++ b/fs/bio-integrity.c
@@ -95,9 +95,7 @@ void bio_integrity_free(struct bio *bio)
 	struct bio_integrity_payload *bip = bio->bi_integrity;
 	struct bio_set *bs = bio->bi_pool;
 
-	/* A cloned bio doesn't own the integrity metadata */
-	if (!bio_flagged(bio, BIO_CLONED) && !bio_flagged(bio, BIO_FS_INTEGRITY)
-	    && bip->bip_buf != NULL)
+	if (bip->bip_owns_buf)
 		kfree(bip->bip_buf);
 
 	if (bs) {
@@ -384,6 +382,7 @@ int bio_integrity_prep(struct bio *bio)
 		return -EIO;
 	}
 
+	bip->bip_owns_buf = 1;
 	bip->bip_buf = buf;
 	bip->bip_size = len;
 	bip->bip_sector = bio->bi_sector;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index bd130b6..d316fb3 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -178,6 +178,7 @@ struct bio_integrity_payload {
 	unsigned short		bip_slab;	/* slab the bip came from */
 	unsigned short		bip_vcnt;	/* # of integrity bio_vecs */
 	unsigned short		bip_idx;	/* current bip_vec index */
+	unsigned		bip_owns_buf:1;	/* should free bip_buf */
 
 	struct work_struct	bip_work;	/* I/O completion */
 
-- 
1.7.12

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH v3 26/26] block: Add BIO_SUBMITTED flag, kill BIO_CLONED
  2012-09-24 22:34 [PATCH v3 00/26] Prep work for immutable bio vecs Kent Overstreet
                   ` (23 preceding siblings ...)
  2012-09-24 22:35 ` [PATCH v3 25/26] bio-integrity: Add explicit field for owner of bip_buf Kent Overstreet
@ 2012-09-24 22:35 ` Kent Overstreet
  24 siblings, 0 replies; 69+ messages in thread
From: Kent Overstreet @ 2012-09-24 22:35 UTC (permalink / raw)
  To: linux-bcache, linux-kernel, dm-devel
  Cc: Kent Overstreet, tj, axboe, neilb, vgoyal, Alasdair Kergon,
	Martin K. Petersen

BIO_CLONED wasn't very useful, and didn't have very clear semantics, so
kill it.

Replace it with a more useful flag - BIO_SUBMITTED means the bio has
been passed to generic_make_request() and the bvec can no longer be
modified.

Roll both changes into the same patch so we can steal the old bit for
the new flag.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: Alasdair Kergon <agk@redhat.com>
CC: dm-devel@redhat.com
CC: Martin K. Petersen <martin.petersen@oracle.com>
---
 block/blk-core.c          | 2 ++
 drivers/md/dm.c           | 1 -
 fs/bio-integrity.c        | 1 -
 fs/bio.c                  | 8 +++++---
 include/linux/blk_types.h | 2 +-
 5 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index df9bb5f..fcf4335 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1631,6 +1631,8 @@ generic_make_request_checks(struct bio *bio)
 
 	might_sleep();
 
+	bio->bi_flags |= 1 << BIO_SUBMITTED;
+
 	if (bio_check_eod(bio, nr_sectors))
 		goto end_io;
 
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 8378797..777e70d 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1065,7 +1065,6 @@ static struct bio *split_bvec(struct bio *bio, sector_t sector,
 	clone->bi_size = to_bytes(len);
 	clone->bi_io_vec->bv_offset = offset;
 	clone->bi_io_vec->bv_len = clone->bi_size;
-	clone->bi_flags |= 1 << BIO_CLONED;
 
 	if (bio_integrity(bio)) {
 		bio_integrity_clone(clone, bio, GFP_NOIO);
diff --git a/fs/bio-integrity.c b/fs/bio-integrity.c
index c5f33b4..93ec7d9 100644
--- a/fs/bio-integrity.c
+++ b/fs/bio-integrity.c
@@ -623,7 +623,6 @@ void bio_integrity_trim(struct bio *bio, unsigned int offset,
 
 	BUG_ON(bip == NULL);
 	BUG_ON(bi == NULL);
-	BUG_ON(!bio_flagged(bio, BIO_CLONED));
 
 	nr_sectors = bio_integrity_hw_sectors(bi, sectors);
 	bip->bip_sector = bip->bip_sector + offset;
diff --git a/fs/bio.c b/fs/bio.c
index 00639e6..dab51bf 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -531,7 +531,7 @@ void __bio_clone(struct bio *bio, struct bio *bio_src)
 	 */
 	bio->bi_sector = bio_src->bi_sector;
 	bio->bi_bdev = bio_src->bi_bdev;
-	bio->bi_flags |= 1 << BIO_CLONED;
+	bio->bi_flags |= (bio_src->bi_flags & (1 << BIO_SUBMITTED));
 	bio->bi_rw = bio_src->bi_rw;
 	bio->bi_vcnt = bio_src->bi_vcnt;
 	bio->bi_size = bio_src->bi_size;
@@ -604,9 +604,9 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
 	struct bio_vec *bvec;
 
 	/*
-	 * cloned bio must not modify vec list
+	 * submitted bio must not modify vec list
 	 */
-	if (unlikely(bio_flagged(bio, BIO_CLONED)))
+	if (unlikely(bio_flagged(bio, BIO_SUBMITTED)))
 		return 0;
 
 	if (((bio->bi_size + len) >> 9) > max_sectors)
@@ -844,6 +844,8 @@ int bio_alloc_pages(struct bio *bio, gfp_t gfp_mask)
 	int i;
 	struct bio_vec *bv;
 
+	BUG_ON(bio_flagged(bio, BIO_SUBMITTED));
+
 	bio_for_each_segment_all(bv, bio, i) {
 		bv->bv_page = alloc_page(gfp_mask);
 		if (!bv->bv_page) {
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 6547103..3958888 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -103,7 +103,7 @@ struct bio {
 #define BIO_RW_BLOCK	1	/* RW_AHEAD set, and read/write would block */
 #define BIO_EOF		2	/* out-out-bounds error */
 #define BIO_SEG_VALID	3	/* bi_phys_segments valid */
-#define BIO_CLONED	4	/* doesn't own data */
+#define BIO_SUBMITTED	4	/* bio has been submitted */
 #define BIO_BOUNCED	5	/* bio is a bounce bio */
 #define BIO_USER_MAPPED 6	/* contains user pages */
 #define BIO_EOPNOTSUPP	7	/* not supported */
-- 
1.7.12

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* Re: [PATCH v3 07/26] block: Use bio_sectors() more consistently
       [not found]   ` <1348526106-17074-8-git-send-email-koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2012-09-24 23:04     ` Jim Paris
       [not found]       ` <20120924230449.GA2040-SRSuHwkuBJlaX0KmTac7FA@public.gmane.org>
  0 siblings, 1 reply; 69+ messages in thread
From: Jim Paris @ 2012-09-24 23:04 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA, tj-DgEjT+Ai2ygdnm+yROfE0A,
	axboe-tSWWG44O7X1aa/9Udqfwiw, neilb-l3A5Bk7waGM,
	vgoyal-H+wXaHxf7aLQT0dZR+AlfA, Ed L. Cashin, Nick Piggin,
	Jiri Kosina, Geoff Levand, Alasdair Kergon, Steven Rostedt

Kent Overstreet wrote:
> Bunch of places in the code weren't using it where they could be -
> this'll reduce the size of the patch that puts bi_sector/bi_size/bi_idx
> into a struct bvec_iter.
> 
> Signed-off-by: Kent Overstreet <koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> CC: Jens Axboe <axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>
> CC: "Ed L. Cashin" <ecashin-kFqnmIA2sJXQT0dZR+AlfA@public.gmane.org>
> CC: Nick Piggin <npiggin-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>
> CC: Jiri Kosina <jkosina-AlSwsSmVLrQ@public.gmane.org>
> CC: Jim Paris <jim-XrPbb/hENzg@public.gmane.org>
> CC: Geoff Levand <geoff-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
> CC: Alasdair Kergon <agk-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> CC: dm-devel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
> CC: Neil Brown <neilb-l3A5Bk7waGM@public.gmane.org>
> CC: Steven Rostedt <rostedt-nx8X9YLhiw1AfugRpC6u6w@public.gmane.org>
> ---
>  drivers/block/aoe/aoecmd.c   |  2 +-
>  drivers/block/brd.c          |  3 +--
>  drivers/block/pktcdvd.c      |  2 +-
>  drivers/block/ps3vram.c      |  2 +-
>  drivers/md/dm-raid1.c        |  2 +-
>  drivers/md/raid0.c           |  6 +++---
>  drivers/md/raid1.c           | 17 ++++++++---------
>  drivers/md/raid10.c          | 24 +++++++++++-------------
>  drivers/md/raid5.c           |  8 ++++----
>  include/trace/events/block.h | 10 +++++-----
>  10 files changed, 36 insertions(+), 40 deletions(-)
...
> diff --git a/drivers/block/ps3vram.c b/drivers/block/ps3vram.c
> index f58cdcf..1ff38e8 100644
> --- a/drivers/block/ps3vram.c
> +++ b/drivers/block/ps3vram.c
> @@ -553,7 +553,7 @@ static struct bio *ps3vram_do_bio(struct ps3_system_bus_device *dev,
>  	struct ps3vram_priv *priv = ps3_system_bus_get_drvdata(dev);
>  	int write = bio_data_dir(bio) == WRITE;
>  	const char *op = write ? "write" : "read";
> -	loff_t offset = bio->bi_sector << 9;
> +	loff_t offset = bio_sectors(bio);

This doesn't look right to me: bio_sectors(bio) is (bio->bi_size>>9),
not (bio->bi_sector<<9~)?

-jim

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v3 07/26] block: Use bio_sectors() more consistently
       [not found]       ` <20120924230449.GA2040-SRSuHwkuBJlaX0KmTac7FA@public.gmane.org>
@ 2012-09-24 23:09         ` Kent Overstreet
  0 siblings, 0 replies; 69+ messages in thread
From: Kent Overstreet @ 2012-09-24 23:09 UTC (permalink / raw)
  To: Jim Paris
  Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA, tj-DgEjT+Ai2ygdnm+yROfE0A,
	axboe-tSWWG44O7X1aa/9Udqfwiw, neilb-l3A5Bk7waGM,
	vgoyal-H+wXaHxf7aLQT0dZR+AlfA, Ed L. Cashin, Nick Piggin,
	Jiri Kosina, Geoff Levand, Alasdair Kergon, Steven Rostedt

On Mon, Sep 24, 2012 at 07:04:50PM -0400, Jim Paris wrote:
> Kent Overstreet wrote:
> > Bunch of places in the code weren't using it where they could be -
> > this'll reduce the size of the patch that puts bi_sector/bi_size/bi_idx
> > into a struct bvec_iter.
> > 
> > Signed-off-by: Kent Overstreet <koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> > CC: Jens Axboe <axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>
> > CC: "Ed L. Cashin" <ecashin-kFqnmIA2sJXQT0dZR+AlfA@public.gmane.org>
> > CC: Nick Piggin <npiggin-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>
> > CC: Jiri Kosina <jkosina-AlSwsSmVLrQ@public.gmane.org>
> > CC: Jim Paris <jim-XrPbb/hENzg@public.gmane.org>
> > CC: Geoff Levand <geoff-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
> > CC: Alasdair Kergon <agk-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > CC: dm-devel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
> > CC: Neil Brown <neilb-l3A5Bk7waGM@public.gmane.org>
> > CC: Steven Rostedt <rostedt-nx8X9YLhiw1AfugRpC6u6w@public.gmane.org>
> > ---
> >  drivers/block/aoe/aoecmd.c   |  2 +-
> >  drivers/block/brd.c          |  3 +--
> >  drivers/block/pktcdvd.c      |  2 +-
> >  drivers/block/ps3vram.c      |  2 +-
> >  drivers/md/dm-raid1.c        |  2 +-
> >  drivers/md/raid0.c           |  6 +++---
> >  drivers/md/raid1.c           | 17 ++++++++---------
> >  drivers/md/raid10.c          | 24 +++++++++++-------------
> >  drivers/md/raid5.c           |  8 ++++----
> >  include/trace/events/block.h | 10 +++++-----
> >  10 files changed, 36 insertions(+), 40 deletions(-)
> ...
> > diff --git a/drivers/block/ps3vram.c b/drivers/block/ps3vram.c
> > index f58cdcf..1ff38e8 100644
> > --- a/drivers/block/ps3vram.c
> > +++ b/drivers/block/ps3vram.c
> > @@ -553,7 +553,7 @@ static struct bio *ps3vram_do_bio(struct ps3_system_bus_device *dev,
> >  	struct ps3vram_priv *priv = ps3_system_bus_get_drvdata(dev);
> >  	int write = bio_data_dir(bio) == WRITE;
> >  	const char *op = write ? "write" : "read";
> > -	loff_t offset = bio->bi_sector << 9;
> > +	loff_t offset = bio_sectors(bio);
> 
> This doesn't look right to me: bio_sectors(bio) is (bio->bi_size>>9),
> not (bio->bi_sector<<9~)?

Whoops, you're right - that's completely wrong. Thanks.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v3 07/26] block: Use bio_sectors() more consistently
  2012-09-24 22:34 ` [PATCH v3 07/26] block: Use bio_sectors() more consistently Kent Overstreet
       [not found]   ` <1348526106-17074-8-git-send-email-koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2012-09-25  0:54   ` Ed Cashin
  1 sibling, 0 replies; 69+ messages in thread
From: Ed Cashin @ 2012-09-25  0:54 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: linux-bcache, linux-kernel, dm-devel, tj, axboe, neilb, vgoyal,
	Nick Piggin, Jiri Kosina, Jim Paris, Geoff Levand,
	Alasdair Kergon, Steven Rostedt

On Sep 24, 2012, at 6:34 PM, Kent Overstreet wrote:

> Bunch of places in the code weren't using it where they could be -
> this'll reduce the size of the patch that puts bi_sector/bi_size/bi_idx
> into a struct bvec_iter.

The aoe changes look fine to me, thanks.

-- 
  Ed Cashin
  ecashin@coraid.com	ecashin@noserose.net
  http://www.coraid.com	http://noserose.net/e/

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [dm-devel] [PATCH v3 11/26] block: Add submit_bio_wait(), remove from md
       [not found]   ` <1348526106-17074-12-git-send-email-koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2012-09-25  5:51     ` Hannes Reinecke
  2012-09-25 22:15       ` Kent Overstreet
  2012-10-02 19:41     ` Vivek Goyal
  1 sibling, 1 reply; 69+ messages in thread
From: Hannes Reinecke @ 2012-09-25  5:51 UTC (permalink / raw)
  To: device-mapper development
  Cc: Kent Overstreet, linux-bcache-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, tj-DgEjT+Ai2ygdnm+yROfE0A,
	axboe-tSWWG44O7X1aa/9Udqfwiw, vgoyal-H+wXaHxf7aLQT0dZR+AlfA

On 09/25/2012 12:34 AM, Kent Overstreet wrote:
> Random cleanup - this code was duplicated and it's not really specific
> to md.
> 
> Also added the ability to return the actual error code.
> 
> Signed-off-by: Kent Overstreet <koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> CC: Jens Axboe <axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>
> CC: NeilBrown <neilb-l3A5Bk7waGM@public.gmane.org>
> Acked-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> ---
>  drivers/md/raid1.c  | 19 -------------------
>  drivers/md/raid10.c | 19 -------------------
>  fs/bio.c            | 36 ++++++++++++++++++++++++++++++++++++
>  include/linux/bio.h |  1 +
>  4 files changed, 37 insertions(+), 38 deletions(-)
> 
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index deff0cd..28f506a 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -2048,25 +2048,6 @@ static void fix_read_error(struct r1conf *conf, int read_disk,
>  	}
>  }
>  
> -static void bi_complete(struct bio *bio, int error)
> -{
> -	complete((struct completion *)bio->bi_private);
> -}
> -
> -static int submit_bio_wait(int rw, struct bio *bio)
> -{
> -	struct completion event;
> -	rw |= REQ_SYNC;
> -
> -	init_completion(&event);
> -	bio->bi_private = &event;
> -	bio->bi_end_io = bi_complete;
> -	submit_bio(rw, bio);
> -	wait_for_completion(&event);
> -
> -	return test_bit(BIO_UPTODATE, &bio->bi_flags);
> -}
> -
>  static int narrow_write_error(struct r1bio *r1_bio, int i)
>  {
>  	struct mddev *mddev = r1_bio->mddev;
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 6d06d83..f001c1b 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -2410,25 +2410,6 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
>  	}
>  }
>  
> -static void bi_complete(struct bio *bio, int error)
> -{
> -	complete((struct completion *)bio->bi_private);
> -}
> -
> -static int submit_bio_wait(int rw, struct bio *bio)
> -{
> -	struct completion event;
> -	rw |= REQ_SYNC;
> -
> -	init_completion(&event);
> -	bio->bi_private = &event;
> -	bio->bi_end_io = bi_complete;
> -	submit_bio(rw, bio);
> -	wait_for_completion(&event);
> -
> -	return test_bit(BIO_UPTODATE, &bio->bi_flags);
> -}
> -
>  static int narrow_write_error(struct r10bio *r10_bio, int i)
>  {
>  	struct bio *bio = r10_bio->master_bio;
> diff --git a/fs/bio.c b/fs/bio.c
> index 062ba8f..9a30dcf 100644
> --- a/fs/bio.c
> +++ b/fs/bio.c
> @@ -750,6 +750,42 @@ int bio_add_page(struct bio *bio, struct page *page, unsigned int len,
>  }
>  EXPORT_SYMBOL(bio_add_page);
>  
> +struct submit_bio_ret {
> +	struct completion event;
> +	int error;
> +};
> +
> +static void submit_bio_wait_endio(struct bio *bio, int error)
> +{
> +	struct submit_bio_ret *ret = bio->bi_private;
> +
> +	ret->error = error;
> +	complete(&ret->event);
> +}
> +
> +/**
> + * submit_bio_wait - submit a bio, and wait until it completes
> + * @rw: whether to %READ or %WRITE, or maybe to %READA (read ahead)
> + * @bio: The &struct bio which describes the I/O
> + *
> + * Simple wrapper around submit_bio(). Returns 0 on success, or the error from
> + * bio_endio() on failure.
> + */
> +int submit_bio_wait(int rw, struct bio *bio)
> +{
> +	struct submit_bio_ret ret;
> +
> +	rw |= REQ_SYNC;
> +	init_completion(&ret.event);
> +	bio->bi_private = &ret;
> +	bio->bi_end_io = submit_bio_wait_endio;

Hmm. As this is meant to be a generic function, blindly overwriting
the bi_end_io pointer doesn't look like a good idea; the caller
could have set something there.

Please add at least a WARN_ON(bio->bi_end_io) prior to modifying it.

> +	submit_bio(rw, bio);
> +	wait_for_completion(&ret.event);
> +
> +	return ret.error;
> +}
> +EXPORT_SYMBOL(submit_bio_wait);
> +
>  /**
>   * bio_advance - increment/complete a bio by some number of bytes
>   * @bio:	bio to advance
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index d985e90..3bf696b 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -252,6 +252,7 @@ extern void bio_endio(struct bio *, int);
>  struct request_queue;
>  extern int bio_phys_segments(struct request_queue *, struct bio *);
>  
> +extern int submit_bio_wait(int rw, struct bio *bio);
>  extern void bio_advance(struct bio *, unsigned);
>  
>  extern void bio_init(struct bio *);
> 

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare-l3A5Bk7waGM@public.gmane.org			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v3 06/26] block: Add bio_end_sector()
       [not found]   ` <1348526106-17074-7-git-send-email-koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2012-09-25 11:54     ` Lars Ellenberg
       [not found]       ` <20120925115452.GF8143-w1SgEEioFePxa46PmUWvFg@public.gmane.org>
  2012-10-02 18:10     ` [dm-devel] " Vivek Goyal
  1 sibling, 1 reply; 69+ messages in thread
From: Lars Ellenberg @ 2012-09-25 11:54 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: axboe-tSWWG44O7X1aa/9Udqfwiw, linux-s390-u79uwXL29TY76Z2rM5mHXA,
	Chris Mason, dm-devel-H+wXaHxf7aLQT0dZR+AlfA, neilb-l3A5Bk7waGM,
	Jiri Kosina, Heiko Carstens, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	tj-DgEjT+Ai2ygdnm+yROfE0A, linux-bcache-u79uwXL29TY76Z2rM5mHXA,
	vgoyal-H+wXaHxf7aLQT0dZR+AlfA, Martin Schwidefsky,
	Steven Whitehouse, Alasdair Kergon, Lars Ellenberg

On Mon, Sep 24, 2012 at 03:34:46PM -0700, Kent Overstreet wrote:
> Just a little convenience macro - main reason to add it now is preparing
> for immutable bio vecs, it'll reduce the size of the patch that puts
> bi_sector/bi_size/bi_idx into a struct bvec_iter.


For the DRBD part:

> diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
> index 01b2ac6..d90a1fd 100644
> --- a/drivers/block/drbd/drbd_req.c
> +++ b/drivers/block/drbd/drbd_req.c
> @@ -1144,7 +1144,7 @@ void drbd_make_request(struct request_queue *q, struct bio *bio)
>  	/* to make some things easier, force alignment of requests within the
>  	 * granularity of our hash tables */
>  	s_enr = bio->bi_sector >> HT_SHIFT;
> -	e_enr = bio->bi_size ? (bio->bi_sector+(bio->bi_size>>9)-1) >> HT_SHIFT : s_enr;
> +	e_enr = (bio_end_sector(bio) - 1) >> HT_SHIFT;

You ignored the bio->bi_size ? : ;

#define bio_end_sector(bio)	((bio)->bi_sector + bio_sectors(bio))
which turns out (bio->bi_sector + (bio->bi_size >> 9))

Note that bi_size may be 0, bio_end_sector(bio)-1 then is bi_sector -1,
for an empty flush with bi_sector == 0, this ends up as (sector_t)-1ULL,
and this code path breaks horribly.


+	e_enr = bio->bi_size ? (bio_end_sector(bio) - 1) >> HT_SHIFT : s_enr;


Thanks,
	Lars

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Drbd-dev] [PATCH v3 06/26] block: Add bio_end_sector()
       [not found]       ` <20120925115452.GF8143-w1SgEEioFePxa46PmUWvFg@public.gmane.org>
@ 2012-09-25 22:06         ` Kent Overstreet
       [not found]           ` <20120925220624.GC22647-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 69+ messages in thread
From: Kent Overstreet @ 2012-09-25 22:06 UTC (permalink / raw)
  To: linux-bcache-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA, axboe-tSWWG44O7X1aa/9Udqfwiw,
	linux-s390-u79uwXL29TY76Z2rM5mHXA, Chris Mason,
	neilb-l3A5Bk7waGM, Jiri Kosina, Heiko Carstens,
	Martin Schwidefsky, Alasdair Kergon, tj-DgEjT+Ai2ygdnm+yROfE0A,
	Steven Whitehouse, vgoyal-H+wXaHxf7aLQT0dZR+AlfA, Lars Ellenberg

On Tue, Sep 25, 2012 at 01:54:52PM +0200, Lars Ellenberg wrote:
> On Mon, Sep 24, 2012 at 03:34:46PM -0700, Kent Overstreet wrote:
> > Just a little convenience macro - main reason to add it now is preparing
> > for immutable bio vecs, it'll reduce the size of the patch that puts
> > bi_sector/bi_size/bi_idx into a struct bvec_iter.
> 
> 
> For the DRBD part:
> 
> > diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
> > index 01b2ac6..d90a1fd 100644
> > --- a/drivers/block/drbd/drbd_req.c
> > +++ b/drivers/block/drbd/drbd_req.c
> > @@ -1144,7 +1144,7 @@ void drbd_make_request(struct request_queue *q, struct bio *bio)
> >  	/* to make some things easier, force alignment of requests within the
> >  	 * granularity of our hash tables */
> >  	s_enr = bio->bi_sector >> HT_SHIFT;
> > -	e_enr = bio->bi_size ? (bio->bi_sector+(bio->bi_size>>9)-1) >> HT_SHIFT : s_enr;
> > +	e_enr = (bio_end_sector(bio) - 1) >> HT_SHIFT;
> 
> You ignored the bio->bi_size ? : ;
> 
> #define bio_end_sector(bio)	((bio)->bi_sector + bio_sectors(bio))
> which turns out (bio->bi_sector + (bio->bi_size >> 9))
> 
> Note that bi_size may be 0, bio_end_sector(bio)-1 then is bi_sector -1,
> for an empty flush with bi_sector == 0, this ends up as (sector_t)-1ULL,
> and this code path breaks horribly.

Man, that was dumb of me - thanks for catching it. Version below look
good?

diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
index 01b2ac6..47f55db 100644
--- a/drivers/block/drbd/drbd_req.c
+++ b/drivers/block/drbd/drbd_req.c
@@ -1144,7 +1144,7 @@ void drbd_make_request(struct request_queue *q, struct bio *bio)
 	/* to make some things easier, force alignment of requests within the
 	 * granularity of our hash tables */
 	s_enr = bio->bi_sector >> HT_SHIFT;
-	e_enr = bio->bi_size ? (bio->bi_sector+(bio->bi_size>>9)-1) >> HT_SHIFT : s_enr;
+	e_enr = bio->bi_size ? (bio_end_sector(bio) - 1) >> HT_SHIFT : s_enr;
 
 	if (likely(s_enr == e_enr)) {
 		do {

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* Re: [dm-devel] [PATCH v3 11/26] block: Add submit_bio_wait(), remove from md
  2012-09-25  5:51     ` [dm-devel] " Hannes Reinecke
@ 2012-09-25 22:15       ` Kent Overstreet
  0 siblings, 0 replies; 69+ messages in thread
From: Kent Overstreet @ 2012-09-25 22:15 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: device-mapper development, linux-bcache, linux-kernel, tj, axboe, vgoyal

On Tue, Sep 25, 2012 at 07:51:07AM +0200, Hannes Reinecke wrote:
> On 09/25/2012 12:34 AM, Kent Overstreet wrote:
> > +/**
> > + * submit_bio_wait - submit a bio, and wait until it completes
> > + * @rw: whether to %READ or %WRITE, or maybe to %READA (read ahead)
> > + * @bio: The &struct bio which describes the I/O
> > + *
> > + * Simple wrapper around submit_bio(). Returns 0 on success, or the error from
> > + * bio_endio() on failure.
> > + */
> > +int submit_bio_wait(int rw, struct bio *bio)
> > +{
> > +	struct submit_bio_ret ret;
> > +
> > +	rw |= REQ_SYNC;
> > +	init_completion(&ret.event);
> > +	bio->bi_private = &ret;
> > +	bio->bi_end_io = submit_bio_wait_endio;
> 
> Hmm. As this is meant to be a generic function, blindly overwriting
> the bi_end_io pointer doesn't look like a good idea; the caller
> could have set something there.
> 
> Please add at least a WARN_ON(bio->bi_end_io) prior to modifying it.

Nah, the general rule with bios is after it's completed anything
could've been modified; we don't document or enforce otherwise with
bi_end_io (and there's a fair amount of code that saves/sets bi_end_io,
and I don't think it all restores the original before calling it).
I'm not going to special case this unless we start documenting/enforcing
it in general.

Besides that, setting a callback on something that's being used
synchronously is just dumb. Personally, I make damn sure to read and
understand code I'm using. I mean, maybe if this restriction was in the
slightest way subtle, but... how else would submit_bio_wait() be
implemented? It's kind of obvious if you think for two seconds about it.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v3 05/26] md: Convert md_trim_bio() to use bio_advance()
       [not found]     ` <1348526106-17074-6-git-send-email-koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2012-09-26  0:38       ` NeilBrown
       [not found]         ` <20120926103827.4d880cf4-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
  0 siblings, 1 reply; 69+ messages in thread
From: NeilBrown @ 2012-09-26  0:38 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA, tj-DgEjT+Ai2ygdnm+yROfE0A,
	axboe-tSWWG44O7X1aa/9Udqfwiw, vgoyal-H+wXaHxf7aLQT0dZR+AlfA

[-- Attachment #1: Type: text/plain, Size: 1698 bytes --]

On Mon, 24 Sep 2012 15:34:45 -0700 Kent Overstreet <koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
wrote:

> Signed-off-by: Kent Overstreet <koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> CC: Jens Axboe <axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>
> CC: NeilBrown <neilb-l3A5Bk7waGM@public.gmane.org>
> ---
>  drivers/md/md.c | 19 +++++--------------
>  1 file changed, 5 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 7a2b079..51ce48c 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -190,25 +190,16 @@ void md_trim_bio(struct bio *bio, int offset, int size)
>  	struct bio_vec *bvec;
>  	int sofar = 0;
>  
> -	size <<= 9;
>  	if (offset == 0 && size == bio->bi_size)
>  		return;

Safer to leave the "size <<= 9" where it is, otherwise that condition will
always fail.

Otherwise,
 Acked-by: NeilBrown <neilb-l3A5Bk7waGM@public.gmane.org>

NeilBrown


>  
> -	bio->bi_sector += offset;
> -	bio->bi_size = size;
> -	offset <<= 9;
>  	clear_bit(BIO_SEG_VALID, &bio->bi_flags);
>  
> -	while (bio->bi_idx < bio->bi_vcnt &&
> -	       bio->bi_io_vec[bio->bi_idx].bv_len <= offset) {
> -		/* remove this whole bio_vec */
> -		offset -= bio->bi_io_vec[bio->bi_idx].bv_len;
> -		bio->bi_idx++;
> -	}
> -	if (bio->bi_idx < bio->bi_vcnt) {
> -		bio->bi_io_vec[bio->bi_idx].bv_offset += offset;
> -		bio->bi_io_vec[bio->bi_idx].bv_len -= offset;
> -	}
> +	bio_advance(bio, offset << 9);
> +
> +	size <<= 9;
> +	bio->bi_size = size;
> +
>  	/* avoid any complications with bi_idx being non-zero*/
>  	if (bio->bi_idx) {
>  		memmove(bio->bi_io_vec, bio->bi_io_vec+bio->bi_idx,


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v3 06/26] block: Add bio_end_sector()
       [not found]           ` <20120925220624.GC22647-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2012-09-26 15:16             ` Lars Ellenberg
  0 siblings, 0 replies; 69+ messages in thread
From: Lars Ellenberg @ 2012-09-26 15:16 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA, axboe-tSWWG44O7X1aa/9Udqfwiw,
	linux-s390-u79uwXL29TY76Z2rM5mHXA, Chris Mason,
	neilb-l3A5Bk7waGM, Jiri Kosina, Heiko Carstens,
	Martin Schwidefsky, Alasdair Kergon, tj-DgEjT+Ai2ygdnm+yROfE0A,
	Steven Whitehouse, vgoyal-H+wXaHxf7aLQT0dZR+AlfA, Lars Ellenberg

On Tue, Sep 25, 2012 at 03:06:24PM -0700, Kent Overstreet wrote:
> On Tue, Sep 25, 2012 at 01:54:52PM +0200, Lars Ellenberg wrote:
> > On Mon, Sep 24, 2012 at 03:34:46PM -0700, Kent Overstreet wrote:
> > > Just a little convenience macro - main reason to add it now is preparing
> > > for immutable bio vecs, it'll reduce the size of the patch that puts
> > > bi_sector/bi_size/bi_idx into a struct bvec_iter.
> > 
> > 
> > For the DRBD part:

> Version below look good?
> 
> diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
> index 01b2ac6..47f55db 100644
> --- a/drivers/block/drbd/drbd_req.c
> +++ b/drivers/block/drbd/drbd_req.c
> @@ -1144,7 +1144,7 @@ void drbd_make_request(struct request_queue *q, struct bio *bio)
>  	/* to make some things easier, force alignment of requests within the
>  	 * granularity of our hash tables */
>  	s_enr = bio->bi_sector >> HT_SHIFT;
> -	e_enr = bio->bi_size ? (bio->bi_sector+(bio->bi_size>>9)-1) >> HT_SHIFT : s_enr;
> +	e_enr = bio->bi_size ? (bio_end_sector(bio) - 1) >> HT_SHIFT : s_enr;
>  
>  	if (likely(s_enr == e_enr)) {
>  		do {

Sure.

Cheers,

	Lars

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v3 05/26] md: Convert md_trim_bio() to use bio_advance()
       [not found]         ` <20120926103827.4d880cf4-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
@ 2012-09-27  4:41           ` Kent Overstreet
  0 siblings, 0 replies; 69+ messages in thread
From: Kent Overstreet @ 2012-09-27  4:41 UTC (permalink / raw)
  To: NeilBrown
  Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA, tj-DgEjT+Ai2ygdnm+yROfE0A,
	axboe-tSWWG44O7X1aa/9Udqfwiw, vgoyal-H+wXaHxf7aLQT0dZR+AlfA

On Wed, Sep 26, 2012 at 10:38:27AM +1000, NeilBrown wrote:
> On Mon, 24 Sep 2012 15:34:45 -0700 Kent Overstreet <koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> wrote:
> 
> > Signed-off-by: Kent Overstreet <koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> > CC: Jens Axboe <axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>
> > CC: NeilBrown <neilb-l3A5Bk7waGM@public.gmane.org>
> > ---
> >  drivers/md/md.c | 19 +++++--------------
> >  1 file changed, 5 insertions(+), 14 deletions(-)
> > 
> > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > index 7a2b079..51ce48c 100644
> > --- a/drivers/md/md.c
> > +++ b/drivers/md/md.c
> > @@ -190,25 +190,16 @@ void md_trim_bio(struct bio *bio, int offset, int size)
> >  	struct bio_vec *bvec;
> >  	int sofar = 0;
> >  
> > -	size <<= 9;
> >  	if (offset == 0 && size == bio->bi_size)
> >  		return;
> 
> Safer to leave the "size <<= 9" where it is, otherwise that condition will
> always fail.

Whoops, moving that certainly didn't make sense. Thanks!

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [dm-devel] [PATCH v3 01/26] block: Fix a buffer overrun in bio_integrity_split()
       [not found]   ` <1348526106-17074-2-git-send-email-koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2012-10-01 21:23     ` Vivek Goyal
       [not found]       ` <20121001212336.GA17165-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 69+ messages in thread
From: Vivek Goyal @ 2012-10-01 21:23 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA, axboe-tSWWG44O7X1aa/9Udqfwiw,
	Martin K. Petersen, tj-DgEjT+Ai2ygdnm+yROfE0A

On Mon, Sep 24, 2012 at 03:34:41PM -0700, Kent Overstreet wrote:
> bio_integrity_split() seemed to be confusing pointers and arrays -
> bip_vec in bio_integrity_payload is an array appended to the end of the
> payload, so the bio_vecs in struct bio_pair need to come immediately
> after the bio_integrity_payload they're for, and there was an assignment
> in bio_integrity_split() that didn't make any sense.
> 
> Signed-off-by: Kent Overstreet <koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> CC: Jens Axboe <axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>
> CC: Martin K. Petersen <martin.petersen-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
> ---
>  fs/bio-integrity.c  | 3 ---
>  include/linux/bio.h | 6 ++++--
>  2 files changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/bio-integrity.c b/fs/bio-integrity.c
> index a3f28f3..c7b6b52 100644
> --- a/fs/bio-integrity.c
> +++ b/fs/bio-integrity.c
> @@ -697,9 +697,6 @@ void bio_integrity_split(struct bio *bio, struct bio_pair *bp, int sectors)
>  	bp->iv1 = bip->bip_vec[0];
>  	bp->iv2 = bip->bip_vec[0];
>  
> -	bp->bip1.bip_vec[0] = bp->iv1;
> -	bp->bip2.bip_vec[0] = bp->iv2;
> -
>  	bp->iv1.bv_len = sectors * bi->tuple_size;
>  	bp->iv2.bv_offset += sectors * bi->tuple_size;
>  	bp->iv2.bv_len -= sectors * bi->tuple_size;
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index b31036f..8e2d108 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -200,8 +200,10 @@ struct bio_pair {
>  	struct bio			bio1, bio2;
>  	struct bio_vec			bv1, bv2;
>  #if defined(CONFIG_BLK_DEV_INTEGRITY)
> -	struct bio_integrity_payload	bip1, bip2;
> -	struct bio_vec			iv1, iv2;
> +	struct bio_integrity_payload	bip1;
> +	struct bio_vec			iv1;
> +	struct bio_integrity_payload	bip2;
> +	struct bio_vec			iv2;
>  #endif

I think it probably is a good idea to put a comment here so that we
know that certain elements of structure assume ordering.

Also I am wondering that what's the gurantee that there are no padding
bytes between bipi1 and iv1 (or bip2 or iv2). I think if there are padding
bytes then the assumption that bio_vec is always following bip will be
broken?

Also had a general question about split logic. We seem to have only one
global pool for bio pair (bio_split_pool). So in the IO stack if we split
a bio more than once, we have the deadlock possibility again?

Thanks
Vivek

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [dm-devel] [PATCH v3 01/26] block: Fix a buffer overrun in bio_integrity_split()
       [not found]       ` <20121001212336.GA17165-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2012-10-01 21:36         ` Kent Overstreet
  2012-10-01 21:42         ` Kent Overstreet
  1 sibling, 0 replies; 69+ messages in thread
From: Kent Overstreet @ 2012-10-01 21:36 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA, axboe-tSWWG44O7X1aa/9Udqfwiw,
	Martin K. Petersen, tj-DgEjT+Ai2ygdnm+yROfE0A

On Mon, Oct 01, 2012 at 05:23:36PM -0400, Vivek Goyal wrote:
> On Mon, Sep 24, 2012 at 03:34:41PM -0700, Kent Overstreet wrote:
> > bio_integrity_split() seemed to be confusing pointers and arrays -
> > bip_vec in bio_integrity_payload is an array appended to the end of the
> > payload, so the bio_vecs in struct bio_pair need to come immediately
> > after the bio_integrity_payload they're for, and there was an assignment
> > in bio_integrity_split() that didn't make any sense.
> > 
> > Signed-off-by: Kent Overstreet <koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> > CC: Jens Axboe <axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>
> > CC: Martin K. Petersen <martin.petersen-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
> > ---
> >  fs/bio-integrity.c  | 3 ---
> >  include/linux/bio.h | 6 ++++--
> >  2 files changed, 4 insertions(+), 5 deletions(-)
> > 
> > diff --git a/fs/bio-integrity.c b/fs/bio-integrity.c
> > index a3f28f3..c7b6b52 100644
> > --- a/fs/bio-integrity.c
> > +++ b/fs/bio-integrity.c
> > @@ -697,9 +697,6 @@ void bio_integrity_split(struct bio *bio, struct bio_pair *bp, int sectors)
> >  	bp->iv1 = bip->bip_vec[0];
> >  	bp->iv2 = bip->bip_vec[0];
> >  
> > -	bp->bip1.bip_vec[0] = bp->iv1;
> > -	bp->bip2.bip_vec[0] = bp->iv2;
> > -
> >  	bp->iv1.bv_len = sectors * bi->tuple_size;
> >  	bp->iv2.bv_offset += sectors * bi->tuple_size;
> >  	bp->iv2.bv_len -= sectors * bi->tuple_size;
> > diff --git a/include/linux/bio.h b/include/linux/bio.h
> > index b31036f..8e2d108 100644
> > --- a/include/linux/bio.h
> > +++ b/include/linux/bio.h
> > @@ -200,8 +200,10 @@ struct bio_pair {
> >  	struct bio			bio1, bio2;
> >  	struct bio_vec			bv1, bv2;
> >  #if defined(CONFIG_BLK_DEV_INTEGRITY)
> > -	struct bio_integrity_payload	bip1, bip2;
> > -	struct bio_vec			iv1, iv2;
> > +	struct bio_integrity_payload	bip1;
> > +	struct bio_vec			iv1;
> > +	struct bio_integrity_payload	bip2;
> > +	struct bio_vec			iv2;
> >  #endif
> 
> I think it probably is a good idea to put a comment here so that we
> know that certain elements of structure assume ordering.

I'd agree, but I am getting rid of that requirement in the next patch...

> Also I am wondering that what's the gurantee that there are no padding
> bytes between bipi1 and iv1 (or bip2 or iv2). I think if there are padding
> bytes then the assumption that bio_vec is always following bip will be
> broken?

Feh, that is an issue. It wouldn't be an issue if we never referred to
the embedded bvecs - and only referred to bip->bip_inline_vecs - but we
don't.

I'll have to fix that.

> Also had a general question about split logic. We seem to have only one
> global pool for bio pair (bio_split_pool). So in the IO stack if we split
> a bio more than once, we have the deadlock possibility again?

Yes.

I have a fix for that in my patch queue. There's no trivial fix because
the current bio_split implementation requires its own mempool - either
we'd have to add that mempool to struct bio_set (ew, no) or we'd have to
have all the callers also allocate their own bio_pairi mempool.

My approach gets rid of the need for the bio_pair mempool by adding
generic bio chaining, which requires adding a single refcount to struct
bio - bi_remaining, and bio_endio() does an atomic_dec_and_test() on
that refcount. Chaining is also done with a flag indicating that
bi_private points to a bio, instead of a bio_chain_endio function.

A bio_chain_endio() function would be cleaner, but the problem is with
arbitrary and unlimited bio splitting, completing a bio can complete an
unlimited number of splits and use an unbounded amount of stack. (tail
call optimization would be another way of solving that, but building
with frame pointers also disables sibling call optimization so we can't
depend on that).

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [dm-devel] [PATCH v3 01/26] block: Fix a buffer overrun in bio_integrity_split()
       [not found]       ` <20121001212336.GA17165-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2012-10-01 21:36         ` Kent Overstreet
@ 2012-10-01 21:42         ` Kent Overstreet
       [not found]           ` <20121001214241.GE26488-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
  1 sibling, 1 reply; 69+ messages in thread
From: Kent Overstreet @ 2012-10-01 21:42 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA, axboe-tSWWG44O7X1aa/9Udqfwiw,
	Martin K. Petersen, tj-DgEjT+Ai2ygdnm+yROfE0A

On Mon, Oct 01, 2012 at 05:23:36PM -0400, Vivek Goyal wrote:
> On Mon, Sep 24, 2012 at 03:34:41PM -0700, Kent Overstreet wrote:
> > bio_integrity_split() seemed to be confusing pointers and arrays -
> > bip_vec in bio_integrity_payload is an array appended to the end of the
> > payload, so the bio_vecs in struct bio_pair need to come immediately
> > after the bio_integrity_payload they're for, and there was an assignment
> > in bio_integrity_split() that didn't make any sense.
> > 
> > Signed-off-by: Kent Overstreet <koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> > CC: Jens Axboe <axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>
> > CC: Martin K. Petersen <martin.petersen-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
> > ---
> >  fs/bio-integrity.c  | 3 ---
> >  include/linux/bio.h | 6 ++++--
> >  2 files changed, 4 insertions(+), 5 deletions(-)
> > 
> > diff --git a/fs/bio-integrity.c b/fs/bio-integrity.c
> > index a3f28f3..c7b6b52 100644
> > --- a/fs/bio-integrity.c
> > +++ b/fs/bio-integrity.c
> > @@ -697,9 +697,6 @@ void bio_integrity_split(struct bio *bio, struct bio_pair *bp, int sectors)
> >  	bp->iv1 = bip->bip_vec[0];
> >  	bp->iv2 = bip->bip_vec[0];
> >  
> > -	bp->bip1.bip_vec[0] = bp->iv1;
> > -	bp->bip2.bip_vec[0] = bp->iv2;
> > -
> >  	bp->iv1.bv_len = sectors * bi->tuple_size;
> >  	bp->iv2.bv_offset += sectors * bi->tuple_size;
> >  	bp->iv2.bv_len -= sectors * bi->tuple_size;
> > diff --git a/include/linux/bio.h b/include/linux/bio.h
> > index b31036f..8e2d108 100644
> > --- a/include/linux/bio.h
> > +++ b/include/linux/bio.h
> > @@ -200,8 +200,10 @@ struct bio_pair {
> >  	struct bio			bio1, bio2;
> >  	struct bio_vec			bv1, bv2;
> >  #if defined(CONFIG_BLK_DEV_INTEGRITY)
> > -	struct bio_integrity_payload	bip1, bip2;
> > -	struct bio_vec			iv1, iv2;
> > +	struct bio_integrity_payload	bip1;
> > +	struct bio_vec			iv1;
> > +	struct bio_integrity_payload	bip2;
> > +	struct bio_vec			iv2;
> >  #endif
> 
> I think it probably is a good idea to put a comment here so that we
> know that certain elements of structure assume ordering.
> 
> Also I am wondering that what's the gurantee that there are no padding
> bytes between bipi1 and iv1 (or bip2 or iv2). I think if there are padding
> bytes then the assumption that bio_vec is always following bip will be
> broken?

Here's the new patch:


commit e270c9ca843b5c86d59431b0d7a676b7846946d6
Author: Kent Overstreet <koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Date:   Mon Oct 1 14:41:08 2012 -0700

    block: Fix a buffer overrun in bio_integrity_split()
    
    bio_integrity_split() seemed to be confusing pointers and arrays -
    bip_vec in bio_integrity_payload is an array appended to the end of the
    payload, so the bio_vecs in struct bio_pair need to come immediately
    after the bio_integrity_payload they're for, and there was an assignment
    in bio_integrity_split() that didn't make any sense.
    
    Also, changed bio_integrity_split() to not refer to the bvecs embedded
    in struct bio_pair, in case there's padding between them and
    bip->bip_vec.
    
    Signed-off-by: Kent Overstreet <koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
    CC: Jens Axboe <axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>
    CC: Martin K. Petersen <martin.petersen-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>

diff --git a/fs/bio-integrity.c b/fs/bio-integrity.c
index a3f28f3..4ae22a8 100644
--- a/fs/bio-integrity.c
+++ b/fs/bio-integrity.c
@@ -694,15 +694,12 @@ void bio_integrity_split(struct bio *bio, struct bio_pair *bp, int sectors)
 	bp->bio1.bi_integrity = &bp->bip1;
 	bp->bio2.bi_integrity = &bp->bip2;
 
-	bp->iv1 = bip->bip_vec[0];
-	bp->iv2 = bip->bip_vec[0];
+	*bp->bip1.bip_vec = bip->bip_vec[0];
+	*bp->bip2.bip_vec = bip->bip_vec[0];
 
-	bp->bip1.bip_vec[0] = bp->iv1;
-	bp->bip2.bip_vec[0] = bp->iv2;
-
-	bp->iv1.bv_len = sectors * bi->tuple_size;
-	bp->iv2.bv_offset += sectors * bi->tuple_size;
-	bp->iv2.bv_len -= sectors * bi->tuple_size;
+	bp->bip1.bip_vec->bv_len = sectors * bi->tuple_size;
+	bp->bip2.bip_vec->bv_offset += sectors * bi->tuple_size;
+	bp->bip2.bip_vec->bv_len -= sectors * bi->tuple_size;
 
 	bp->bip1.bip_sector = bio->bi_integrity->bip_sector;
 	bp->bip2.bip_sector = bio->bi_integrity->bip_sector + nr_sectors;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index b31036f..8e2d108 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -200,8 +200,10 @@ struct bio_pair {
 	struct bio			bio1, bio2;
 	struct bio_vec			bv1, bv2;
 #if defined(CONFIG_BLK_DEV_INTEGRITY)
-	struct bio_integrity_payload	bip1, bip2;
-	struct bio_vec			iv1, iv2;
+	struct bio_integrity_payload	bip1;
+	struct bio_vec			iv1;
+	struct bio_integrity_payload	bip2;
+	struct bio_vec			iv2;
 #endif
 	atomic_t			cnt;
 	int				error;

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* Re: [dm-devel] [PATCH v3 01/26] block: Fix a buffer overrun in bio_integrity_split()
       [not found]           ` <20121001214241.GE26488-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2012-10-02 14:08             ` Vivek Goyal
       [not found]               ` <20121002140847.GD758-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 69+ messages in thread
From: Vivek Goyal @ 2012-10-02 14:08 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA, axboe-tSWWG44O7X1aa/9Udqfwiw,
	Martin K. Petersen, tj-DgEjT+Ai2ygdnm+yROfE0A

On Mon, Oct 01, 2012 at 02:42:41PM -0700, Kent Overstreet wrote:

[..]
> Here's the new patch:
> 
> 
> commit e270c9ca843b5c86d59431b0d7a676b7846946d6
> Author: Kent Overstreet <koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> Date:   Mon Oct 1 14:41:08 2012 -0700
> 
>     block: Fix a buffer overrun in bio_integrity_split()
>     
>     bio_integrity_split() seemed to be confusing pointers and arrays -
>     bip_vec in bio_integrity_payload is an array appended to the end of the
>     payload, so the bio_vecs in struct bio_pair need to come immediately
>     after the bio_integrity_payload they're for, and there was an assignment
>     in bio_integrity_split() that didn't make any sense.
>     
>     Also, changed bio_integrity_split() to not refer to the bvecs embedded
>     in struct bio_pair, in case there's padding between them and
>     bip->bip_vec.
>     
>     Signed-off-by: Kent Overstreet <koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
>     CC: Jens Axboe <axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>
>     CC: Martin K. Petersen <martin.petersen-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
> 
> diff --git a/fs/bio-integrity.c b/fs/bio-integrity.c
> index a3f28f3..4ae22a8 100644
> --- a/fs/bio-integrity.c
> +++ b/fs/bio-integrity.c
> @@ -694,15 +694,12 @@ void bio_integrity_split(struct bio *bio, struct bio_pair *bp, int sectors)
>  	bp->bio1.bi_integrity = &bp->bip1;
>  	bp->bio2.bi_integrity = &bp->bip2;
>  
> -	bp->iv1 = bip->bip_vec[0];
> -	bp->iv2 = bip->bip_vec[0];
> +	*bp->bip1.bip_vec = bip->bip_vec[0];
> +	*bp->bip2.bip_vec = bip->bip_vec[0];

I think this is horrible. Why not introduce bvec pointer in bip (like bio),
to cover the case when bvec are not inline.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [dm-devel] [PATCH v3 02/26] block: Convert integrity to bvec_alloc_bs()
  2012-09-24 22:34 ` [PATCH v3 02/26] block: Convert integrity to bvec_alloc_bs() Kent Overstreet
@ 2012-10-02 15:12   ` Vivek Goyal
  2012-10-02 20:52     ` Kent Overstreet
       [not found]   ` <1348526106-17074-3-git-send-email-koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
  1 sibling, 1 reply; 69+ messages in thread
From: Vivek Goyal @ 2012-10-02 15:12 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: linux-bcache, linux-kernel, dm-devel, axboe, Martin K. Petersen, tj

On Mon, Sep 24, 2012 at 03:34:42PM -0700, Kent Overstreet wrote:
> This adds a pointer to the bvec array to struct bio_integrity_payload,
> instead of the bvecs always being inline; then the bvecs are allocated
> with bvec_alloc_bs().

Ok, you are introducing bio_vec pointer in this patch. May be we can
do it earlier so that we take care of bio pair related issue.

> 
> Changed bvec_alloc_bs() and bvec_free_bs() to take a pointer to a
> mempool instead of the bioset, so that bio integrity can use a different
> mempool for its bvecs, and thus avoid a potential deadlock.

If you are taking mempool as input, then bvec_alloc_bs() name does not
make sense, as I think the very fact "bs" in the name their suggests
that you are taking a pointer to bio set (and not mempool).

Given the fact that bio_vec pool for integrity inside bio set, why not
take additional boolean/enum argument to function bvec_alloc_bs() and
that argument can tell whehter to allocate bvec from which bvec pool.

> 
> This is eventually for immutable bio vecs - immutable bvecs aren't
> useful if we still have to copy them, hence the need for the pointer.
> Less code is always nice too, though.

Can you explain a bit more about immutable bio vecs. I know there was
some discussion and I missed it. So either point me to right mail thread
or just explain a bit more what are immutable bio vecs, how we are
planning to use these.

Is it about during split we don't want to copy bio vecs and that's why we
need a pointer so that newly created bio/bip can point to parent's bio_vec?

Thanks
Vivek

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [dm-devel] [PATCH v3 02/26] block: Convert integrity to bvec_alloc_bs()
       [not found]   ` <1348526106-17074-3-git-send-email-koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2012-10-02 15:37     ` Vivek Goyal
  2012-10-02 21:00       ` Kent Overstreet
  0 siblings, 1 reply; 69+ messages in thread
From: Vivek Goyal @ 2012-10-02 15:37 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA, axboe-tSWWG44O7X1aa/9Udqfwiw,
	Martin K. Petersen, tj-DgEjT+Ai2ygdnm+yROfE0A

On Mon, Sep 24, 2012 at 03:34:42PM -0700, Kent Overstreet wrote:

[..]
>  /**
>   * bio_integrity_alloc - Allocate integrity payload and attach it to bio
>   * @bio:	bio to attach integrity metadata to
> @@ -84,37 +47,39 @@ struct bio_integrity_payload *bio_integrity_alloc(struct bio *bio,
>  						  unsigned int nr_vecs)
>  {
>  	struct bio_integrity_payload *bip;
> -	unsigned int idx = vecs_to_idx(nr_vecs);
>  	struct bio_set *bs = bio->bi_pool;
> +	unsigned long idx = BIO_POOL_NONE;
> +	unsigned inline_vecs;
> +
> +	if (!bs) {
> +		bip = kmalloc(sizeof(struct bio_integrity_payload) +
> +			      sizeof(struct bio_vec) * nr_vecs, gfp_mask);
> +		inline_vecs = nr_vecs;
> +	} else {
> +		bip = mempool_alloc(bs->bio_integrity_pool, gfp_mask);
> +		inline_vecs = BIP_INLINE_VECS;
> +	}
>  
> -	if (!bs)
> -		bs = fs_bio_set;

Ok, this is change of behavior. Previously we will fall back to fs_bio_set
and now you do kmalloc. This change looks to be independent of bip_vec
pointer. Can you please break it out in a separate patch and also explain
that how does this change help.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [dm-devel] [PATCH v3 06/26] block: Add bio_end_sector()
       [not found]   ` <1348526106-17074-7-git-send-email-koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
  2012-09-25 11:54     ` Lars Ellenberg
@ 2012-10-02 18:10     ` Vivek Goyal
       [not found]       ` <20121002181001.GB3283-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  1 sibling, 1 reply; 69+ messages in thread
From: Vivek Goyal @ 2012-10-02 18:10 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA, axboe-tSWWG44O7X1aa/9Udqfwiw,
	linux-s390-u79uwXL29TY76Z2rM5mHXA, Chris Mason, Jiri Kosina,
	Heiko Carstens, Martin Schwidefsky, Alasdair Kergon,
	tj-DgEjT+Ai2ygdnm+yROfE0A, Steven Whitehouse, Lars Ellenberg

On Mon, Sep 24, 2012 at 03:34:46PM -0700, Kent Overstreet wrote:

[..]
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index 4e32be1..d985e90 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -67,6 +67,7 @@
>  #define bio_offset(bio)		bio_iovec((bio))->bv_offset
>  #define bio_segments(bio)	((bio)->bi_vcnt - (bio)->bi_idx)
>  #define bio_sectors(bio)	((bio)->bi_size >> 9)
> +#define bio_end_sector(bio)	((bio)->bi_sector + bio_sectors(bio))

May be it is just me. But bio_end_sector() kind of sounds that it will
calculate to the last sector of bio. So I thought of it more as
bio_last_sector() and not the sector which is next to the last sector.

Will it make sense to introduce bio_last_sector() and use +1 everywhere.
Or may be we need a better name. Can't think of one though.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [dm-devel] [PATCH v3 04/26] block: Refactor blk_update_request()
  2012-09-24 22:34 ` [PATCH v3 04/26] block: Refactor blk_update_request() Kent Overstreet
@ 2012-10-02 18:43   ` Vivek Goyal
       [not found]     ` <20121002184359.GC3283-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2012-10-02 18:59   ` Vivek Goyal
  1 sibling, 1 reply; 69+ messages in thread
From: Vivek Goyal @ 2012-10-02 18:43 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-bcache, linux-kernel, dm-devel, tj, axboe

On Mon, Sep 24, 2012 at 03:34:44PM -0700, Kent Overstreet wrote:
> Converts it to use bio_advance(), simplifying it quite a bit in the
> process.
> 
> Note that req_bio_endio() now always calls bio_advance() - which means
> it always loops over the biovec, not just on partial completions. Don't
> expect it to affect performance, but worth noting.
> 
> Tested it by forcing partial updates, and dumping before and after on
> various bio/bvec fields when doing a partial update.
> 
> Signed-off-by: Kent Overstreet <koverstreet@google.com>
> CC: Jens Axboe <axboe@kernel.dk>
> ---
>  block/blk-core.c | 80 +++++++++-----------------------------------------------
>  1 file changed, 12 insertions(+), 68 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index a17869f..a8a1a9e 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -158,20 +158,10 @@ static void req_bio_endio(struct request *rq, struct bio *bio,
>  	else if (!test_bit(BIO_UPTODATE, &bio->bi_flags))
>  		error = -EIO;
>  
> -	if (unlikely(nbytes > bio->bi_size)) {
> -		printk(KERN_ERR "%s: want %u bytes done, %u left\n",
> -		       __func__, nbytes, bio->bi_size);
> -		nbytes = bio->bi_size;
> -	}
> -

You are dropping this warning because nobody is calling req_bio_endio()
with bytes greater than bio size in current code?

Otherwise this looks like a good simplification of blk_update_request()
code.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v3 04/26] block: Refactor blk_update_request()
  2012-09-24 22:34 ` [PATCH v3 04/26] block: Refactor blk_update_request() Kent Overstreet
  2012-10-02 18:43   ` [dm-devel] " Vivek Goyal
@ 2012-10-02 18:59   ` Vivek Goyal
       [not found]     ` <20121002185955.GD3283-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  1 sibling, 1 reply; 69+ messages in thread
From: Vivek Goyal @ 2012-10-02 18:59 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: tj, axboe, linux-bcache, linux-kernel, dm-devel

On Mon, Sep 24, 2012 at 03:34:44PM -0700, Kent Overstreet wrote:
> Converts it to use bio_advance(), simplifying it quite a bit in the
> process.
> 
> Note that req_bio_endio() now always calls bio_advance() - which means
> it always loops over the biovec, not just on partial completions. Don't
> expect it to affect performance, but worth noting.
> 
> Tested it by forcing partial updates, and dumping before and after on
> various bio/bvec fields when doing a partial update.

Kent,

In general, have you tested all these changes with FLUSH and DISCARD 
bios/requests. Some of the code paths tend to break down because of these
special bios don't have payload.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [dm-devel] [PATCH v3 09/26] block: Remove bi_idx references
       [not found]   ` <1348526106-17074-10-git-send-email-koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2012-10-02 19:24     ` Vivek Goyal
  2012-10-02 20:16       ` Kent Overstreet
  0 siblings, 1 reply; 69+ messages in thread
From: Vivek Goyal @ 2012-10-02 19:24 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA, tj-DgEjT+Ai2ygdnm+yROfE0A,
	axboe-tSWWG44O7X1aa/9Udqfwiw

On Mon, Sep 24, 2012 at 03:34:49PM -0700, Kent Overstreet wrote:

[..]
> diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
> index 95e52879..24e5cef 100644
> --- a/drivers/block/floppy.c
> +++ b/drivers/block/floppy.c
> @@ -3778,7 +3778,6 @@ static int __floppy_read_block_0(struct block_device *bdev)
>  	bio_vec.bv_len = size;
>  	bio_vec.bv_offset = 0;
>  	bio.bi_vcnt = 1;
> -	bio.bi_idx = 0;

So leaving bio.bi_idx uninitialized is safer than setting it to zero? Can
you explain a bit more that why this change is required? 

Thanks
Vivek

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [dm-devel] [PATCH v3 11/26] block: Add submit_bio_wait(), remove from md
       [not found]   ` <1348526106-17074-12-git-send-email-koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
  2012-09-25  5:51     ` [dm-devel] " Hannes Reinecke
@ 2012-10-02 19:41     ` Vivek Goyal
       [not found]       ` <20121002194132.GF3283-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  1 sibling, 1 reply; 69+ messages in thread
From: Vivek Goyal @ 2012-10-02 19:41 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA, tj-DgEjT+Ai2ygdnm+yROfE0A,
	axboe-tSWWG44O7X1aa/9Udqfwiw

On Mon, Sep 24, 2012 at 03:34:51PM -0700, Kent Overstreet wrote:
> Random cleanup - this code was duplicated and it's not really specific
> to md.
> 
> Also added the ability to return the actual error code.

Who is going to make use of actual error code and why checking
BIO_UPTODATE is not sufficient (as existing code is doing)?

Thanks
Vivek 

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [dm-devel] [PATCH v3 04/26] block: Refactor blk_update_request()
       [not found]     ` <20121002185955.GD3283-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2012-10-02 20:09       ` Kent Overstreet
  0 siblings, 0 replies; 69+ messages in thread
From: Kent Overstreet @ 2012-10-02 20:09 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA, tj-DgEjT+Ai2ygdnm+yROfE0A,
	axboe-tSWWG44O7X1aa/9Udqfwiw

On Tue, Oct 02, 2012 at 02:59:55PM -0400, Vivek Goyal wrote:
> On Mon, Sep 24, 2012 at 03:34:44PM -0700, Kent Overstreet wrote:
> > Converts it to use bio_advance(), simplifying it quite a bit in the
> > process.
> > 
> > Note that req_bio_endio() now always calls bio_advance() - which means
> > it always loops over the biovec, not just on partial completions. Don't
> > expect it to affect performance, but worth noting.
> > 
> > Tested it by forcing partial updates, and dumping before and after on
> > various bio/bvec fields when doing a partial update.
> 
> Kent,
> 
> In general, have you tested all these changes with FLUSH and DISCARD 
> bios/requests. Some of the code paths tend to break down because of these
> special bios don't have payload.

Believe so, I should double check - also, I changed the bio_advance()
patch since the one I posted to make it handle DISCARD and WRITE_SAME
requests more explicitly.

(At this level, discards do (sometimes) have payloads, it's just always
a single page)

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [dm-devel] [PATCH v3 04/26] block: Refactor blk_update_request()
       [not found]     ` <20121002184359.GC3283-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2012-10-02 20:10       ` Kent Overstreet
  2012-10-02 20:14         ` Vivek Goyal
  0 siblings, 1 reply; 69+ messages in thread
From: Kent Overstreet @ 2012-10-02 20:10 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA, tj-DgEjT+Ai2ygdnm+yROfE0A,
	axboe-tSWWG44O7X1aa/9Udqfwiw

On Tue, Oct 02, 2012 at 02:43:59PM -0400, Vivek Goyal wrote:
> On Mon, Sep 24, 2012 at 03:34:44PM -0700, Kent Overstreet wrote:
> > Converts it to use bio_advance(), simplifying it quite a bit in the
> > process.
> > 
> > Note that req_bio_endio() now always calls bio_advance() - which means
> > it always loops over the biovec, not just on partial completions. Don't
> > expect it to affect performance, but worth noting.
> > 
> > Tested it by forcing partial updates, and dumping before and after on
> > various bio/bvec fields when doing a partial update.
> > 
> > Signed-off-by: Kent Overstreet <koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> > CC: Jens Axboe <axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>
> > ---
> >  block/blk-core.c | 80 +++++++++-----------------------------------------------
> >  1 file changed, 12 insertions(+), 68 deletions(-)
> > 
> > diff --git a/block/blk-core.c b/block/blk-core.c
> > index a17869f..a8a1a9e 100644
> > --- a/block/blk-core.c
> > +++ b/block/blk-core.c
> > @@ -158,20 +158,10 @@ static void req_bio_endio(struct request *rq, struct bio *bio,
> >  	else if (!test_bit(BIO_UPTODATE, &bio->bi_flags))
> >  		error = -EIO;
> >  
> > -	if (unlikely(nbytes > bio->bi_size)) {
> > -		printk(KERN_ERR "%s: want %u bytes done, %u left\n",
> > -		       __func__, nbytes, bio->bi_size);
> > -		nbytes = bio->bi_size;
> > -	}
> > -
> 
> You are dropping this warning because nobody is calling req_bio_endio()
> with bytes greater than bio size in current code?

Not dropping it, just moved it to bio_advance()

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [dm-devel] [PATCH v3 11/26] block: Add submit_bio_wait(), remove from md
       [not found]       ` <20121002194132.GF3283-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2012-10-02 20:11         ` Kent Overstreet
       [not found]           ` <20121002201105.GL26488-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
  2012-10-04  6:07           ` Hannes Reinecke
  0 siblings, 2 replies; 69+ messages in thread
From: Kent Overstreet @ 2012-10-02 20:11 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA, tj-DgEjT+Ai2ygdnm+yROfE0A,
	axboe-tSWWG44O7X1aa/9Udqfwiw

On Tue, Oct 02, 2012 at 03:41:32PM -0400, Vivek Goyal wrote:
> On Mon, Sep 24, 2012 at 03:34:51PM -0700, Kent Overstreet wrote:
> > Random cleanup - this code was duplicated and it's not really specific
> > to md.
> > 
> > Also added the ability to return the actual error code.
> 
> Who is going to make use of actual error code and why checking
> BIO_UPTODATE is not sufficient (as existing code is doing)?

Some things do, though it's not common and I forget where I saw it -
checking for -ENOTSUPPORTED vs. other stuff

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [dm-devel] [PATCH v3 04/26] block: Refactor blk_update_request()
  2012-10-02 20:10       ` Kent Overstreet
@ 2012-10-02 20:14         ` Vivek Goyal
       [not found]           ` <20121002201451.GH758-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 69+ messages in thread
From: Vivek Goyal @ 2012-10-02 20:14 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-bcache, linux-kernel, dm-devel, tj, axboe

On Tue, Oct 02, 2012 at 01:10:14PM -0700, Kent Overstreet wrote:
> On Tue, Oct 02, 2012 at 02:43:59PM -0400, Vivek Goyal wrote:
> > On Mon, Sep 24, 2012 at 03:34:44PM -0700, Kent Overstreet wrote:
> > > Converts it to use bio_advance(), simplifying it quite a bit in the
> > > process.
> > > 
> > > Note that req_bio_endio() now always calls bio_advance() - which means
> > > it always loops over the biovec, not just on partial completions. Don't
> > > expect it to affect performance, but worth noting.
> > > 
> > > Tested it by forcing partial updates, and dumping before and after on
> > > various bio/bvec fields when doing a partial update.
> > > 
> > > Signed-off-by: Kent Overstreet <koverstreet@google.com>
> > > CC: Jens Axboe <axboe@kernel.dk>
> > > ---
> > >  block/blk-core.c | 80 +++++++++-----------------------------------------------
> > >  1 file changed, 12 insertions(+), 68 deletions(-)
> > > 
> > > diff --git a/block/blk-core.c b/block/blk-core.c
> > > index a17869f..a8a1a9e 100644
> > > --- a/block/blk-core.c
> > > +++ b/block/blk-core.c
> > > @@ -158,20 +158,10 @@ static void req_bio_endio(struct request *rq, struct bio *bio,
> > >  	else if (!test_bit(BIO_UPTODATE, &bio->bi_flags))
> > >  		error = -EIO;
> > >  
> > > -	if (unlikely(nbytes > bio->bi_size)) {
> > > -		printk(KERN_ERR "%s: want %u bytes done, %u left\n",
> > > -		       __func__, nbytes, bio->bi_size);
> > > -		nbytes = bio->bi_size;
> > > -	}
> > > -
> > 
> > You are dropping this warning because nobody is calling req_bio_endio()
> > with bytes greater than bio size in current code?
> 
> Not dropping it, just moved it to bio_advance()

bio_advance() is checking bio vec count and idx and not nr_bytes.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [dm-devel] [PATCH v3 09/26] block: Remove bi_idx references
  2012-10-02 19:24     ` [dm-devel] " Vivek Goyal
@ 2012-10-02 20:16       ` Kent Overstreet
  0 siblings, 0 replies; 69+ messages in thread
From: Kent Overstreet @ 2012-10-02 20:16 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: linux-bcache, linux-kernel, dm-devel, tj, axboe

On Tue, Oct 02, 2012 at 03:24:34PM -0400, Vivek Goyal wrote:
> On Mon, Sep 24, 2012 at 03:34:49PM -0700, Kent Overstreet wrote:
> 
> [..]
> > diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
> > index 95e52879..24e5cef 100644
> > --- a/drivers/block/floppy.c
> > +++ b/drivers/block/floppy.c
> > @@ -3778,7 +3778,6 @@ static int __floppy_read_block_0(struct block_device *bdev)
> >  	bio_vec.bv_len = size;
> >  	bio_vec.bv_offset = 0;
> >  	bio.bi_vcnt = 1;
> > -	bio.bi_idx = 0;
> 
> So leaving bio.bi_idx uninitialized is safer than setting it to zero? Can
> you explain a bit more that why this change is required? 

It's not uninitialized, because it just called bio_init() - it was
entirely redundant before.

It's not strictly necessary, but immutable bio vecs require getting rid
of a lot of bi_idx usage, and everything that's left is either
unecessary or better/cleaner done some other way, so it's best to just
get rid of all of them - immutable bvecs changes the semantics of bi_idx
just enough that directly using it tends to be wrong.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [dm-devel] [PATCH v3 11/26] block: Add submit_bio_wait(), remove from md
       [not found]           ` <20121002201105.GL26488-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2012-10-02 20:16             ` Vivek Goyal
       [not found]               ` <20121002201630.GI758-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 69+ messages in thread
From: Vivek Goyal @ 2012-10-02 20:16 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA, tj-DgEjT+Ai2ygdnm+yROfE0A,
	axboe-tSWWG44O7X1aa/9Udqfwiw

On Tue, Oct 02, 2012 at 01:11:05PM -0700, Kent Overstreet wrote:
> On Tue, Oct 02, 2012 at 03:41:32PM -0400, Vivek Goyal wrote:
> > On Mon, Sep 24, 2012 at 03:34:51PM -0700, Kent Overstreet wrote:
> > > Random cleanup - this code was duplicated and it's not really specific
> > > to md.
> > > 
> > > Also added the ability to return the actual error code.
> > 
> > Who is going to make use of actual error code and why checking
> > BIO_UPTODATE is not sufficient (as existing code is doing)?
> 
> Some things do, though it's not common and I forget where I saw it -
> checking for -ENOTSUPPORTED vs. other stuff

May be we can introduce "submit_bio_ret" stuff when we find the actual
user in the series. Justifying code change becomes easier. 

Thanks
Vivek

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [dm-devel] [PATCH v3 06/26] block: Add bio_end_sector()
       [not found]       ` <20121002181001.GB3283-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2012-10-02 20:20         ` Kent Overstreet
  0 siblings, 0 replies; 69+ messages in thread
From: Kent Overstreet @ 2012-10-02 20:20 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA, axboe-tSWWG44O7X1aa/9Udqfwiw,
	linux-s390-u79uwXL29TY76Z2rM5mHXA, Chris Mason, Jiri Kosina,
	Heiko Carstens, Martin Schwidefsky, Alasdair Kergon,
	tj-DgEjT+Ai2ygdnm+yROfE0A, Steven Whitehouse, Lars Ellenberg

On Tue, Oct 02, 2012 at 02:10:01PM -0400, Vivek Goyal wrote:
> On Mon, Sep 24, 2012 at 03:34:46PM -0700, Kent Overstreet wrote:
> 
> [..]
> > diff --git a/include/linux/bio.h b/include/linux/bio.h
> > index 4e32be1..d985e90 100644
> > --- a/include/linux/bio.h
> > +++ b/include/linux/bio.h
> > @@ -67,6 +67,7 @@
> >  #define bio_offset(bio)		bio_iovec((bio))->bv_offset
> >  #define bio_segments(bio)	((bio)->bi_vcnt - (bio)->bi_idx)
> >  #define bio_sectors(bio)	((bio)->bi_size >> 9)
> > +#define bio_end_sector(bio)	((bio)->bi_sector + bio_sectors(bio))
> 
> May be it is just me. But bio_end_sector() kind of sounds that it will
> calculate to the last sector of bio. So I thought of it more as
> bio_last_sector() and not the sector which is next to the last sector.
> 
> Will it make sense to introduce bio_last_sector() and use +1 everywhere.
> Or may be we need a better name. Can't think of one though.

Ugh, that sounds like it'd be just begging for fencepost errors. I've
never ran into a situation where I needed bio->bi_sector +
bio_sectors(bio) - 1, either.

I kind of see your point... it seems like there should be a name for
this concept (same as a pointer to the end of an array), but I can't
think of one.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [dm-devel] [PATCH v3 11/26] block: Add submit_bio_wait(), remove from md
       [not found]               ` <20121002201630.GI758-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2012-10-02 20:22                 ` Kent Overstreet
  0 siblings, 0 replies; 69+ messages in thread
From: Kent Overstreet @ 2012-10-02 20:22 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA, tj-DgEjT+Ai2ygdnm+yROfE0A,
	axboe-tSWWG44O7X1aa/9Udqfwiw

On Tue, Oct 02, 2012 at 04:16:30PM -0400, Vivek Goyal wrote:
> On Tue, Oct 02, 2012 at 01:11:05PM -0700, Kent Overstreet wrote:
> > On Tue, Oct 02, 2012 at 03:41:32PM -0400, Vivek Goyal wrote:
> > > On Mon, Sep 24, 2012 at 03:34:51PM -0700, Kent Overstreet wrote:
> > > > Random cleanup - this code was duplicated and it's not really specific
> > > > to md.
> > > > 
> > > > Also added the ability to return the actual error code.
> > > 
> > > Who is going to make use of actual error code and why checking
> > > BIO_UPTODATE is not sufficient (as existing code is doing)?
> > 
> > Some things do, though it's not common and I forget where I saw it -
> > checking for -ENOTSUPPORTED vs. other stuff
> 
> May be we can introduce "submit_bio_ret" stuff when we find the actual
> user in the series. Justifying code change becomes easier. 

Eh, IMO as generic code it's just better/more sensible that way;
bio_endio() does pass an actual error code, so the sync version should
pass it up too. Otherwise it's a needless inconsistency.

Honestly I would prefer sticking an error field in struct bio. That'd be
useful for other things, too.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [dm-devel] [PATCH v3 04/26] block: Refactor blk_update_request()
       [not found]           ` <20121002201451.GH758-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2012-10-02 20:25             ` Kent Overstreet
  0 siblings, 0 replies; 69+ messages in thread
From: Kent Overstreet @ 2012-10-02 20:25 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA, tj-DgEjT+Ai2ygdnm+yROfE0A,
	axboe-tSWWG44O7X1aa/9Udqfwiw

On Tue, Oct 02, 2012 at 04:14:51PM -0400, Vivek Goyal wrote:
> On Tue, Oct 02, 2012 at 01:10:14PM -0700, Kent Overstreet wrote:
> > On Tue, Oct 02, 2012 at 02:43:59PM -0400, Vivek Goyal wrote:
> > > On Mon, Sep 24, 2012 at 03:34:44PM -0700, Kent Overstreet wrote:
> > > > Converts it to use bio_advance(), simplifying it quite a bit in the
> > > > process.
> > > > 
> > > > Note that req_bio_endio() now always calls bio_advance() - which means
> > > > it always loops over the biovec, not just on partial completions. Don't
> > > > expect it to affect performance, but worth noting.
> > > > 
> > > > Tested it by forcing partial updates, and dumping before and after on
> > > > various bio/bvec fields when doing a partial update.
> > > > 
> > > > Signed-off-by: Kent Overstreet <koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> > > > CC: Jens Axboe <axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>
> > > > ---
> > > >  block/blk-core.c | 80 +++++++++-----------------------------------------------
> > > >  1 file changed, 12 insertions(+), 68 deletions(-)
> > > > 
> > > > diff --git a/block/blk-core.c b/block/blk-core.c
> > > > index a17869f..a8a1a9e 100644
> > > > --- a/block/blk-core.c
> > > > +++ b/block/blk-core.c
> > > > @@ -158,20 +158,10 @@ static void req_bio_endio(struct request *rq, struct bio *bio,
> > > >  	else if (!test_bit(BIO_UPTODATE, &bio->bi_flags))
> > > >  		error = -EIO;
> > > >  
> > > > -	if (unlikely(nbytes > bio->bi_size)) {
> > > > -		printk(KERN_ERR "%s: want %u bytes done, %u left\n",
> > > > -		       __func__, nbytes, bio->bi_size);
> > > > -		nbytes = bio->bi_size;
> > > > -	}
> > > > -
> > > 
> > > You are dropping this warning because nobody is calling req_bio_endio()
> > > with bytes greater than bio size in current code?
> > 
> > Not dropping it, just moved it to bio_advance()
> 
> bio_advance() is checking bio vec count and idx and not nr_bytes.

Whoops, -ENOCOFFEE... I didn't fully read that code fragment.

Yes, req_bio_endio() is only called from one place, and
blk_update_request() never calls it with nbytes > bio->bi_size (and
after the refactor it's more obviously impossible).

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [dm-devel] [PATCH v3 01/26] block: Fix a buffer overrun in bio_integrity_split()
       [not found]               ` <20121002140847.GD758-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2012-10-02 20:26                 ` Kent Overstreet
       [not found]                   ` <20121002202643.GQ26488-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
  2012-10-02 22:30                   ` Martin K. Petersen
  0 siblings, 2 replies; 69+ messages in thread
From: Kent Overstreet @ 2012-10-02 20:26 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA, axboe-tSWWG44O7X1aa/9Udqfwiw,
	Martin K. Petersen, tj-DgEjT+Ai2ygdnm+yROfE0A

On Tue, Oct 02, 2012 at 10:08:47AM -0400, Vivek Goyal wrote:
> On Mon, Oct 01, 2012 at 02:42:41PM -0700, Kent Overstreet wrote:
> 
> [..]
> > Here's the new patch:
> > 
> > 
> > commit e270c9ca843b5c86d59431b0d7a676b7846946d6
> > Author: Kent Overstreet <koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> > Date:   Mon Oct 1 14:41:08 2012 -0700
> > 
> >     block: Fix a buffer overrun in bio_integrity_split()
> >     
> >     bio_integrity_split() seemed to be confusing pointers and arrays -
> >     bip_vec in bio_integrity_payload is an array appended to the end of the
> >     payload, so the bio_vecs in struct bio_pair need to come immediately
> >     after the bio_integrity_payload they're for, and there was an assignment
> >     in bio_integrity_split() that didn't make any sense.
> >     
> >     Also, changed bio_integrity_split() to not refer to the bvecs embedded
> >     in struct bio_pair, in case there's padding between them and
> >     bip->bip_vec.
> >     
> >     Signed-off-by: Kent Overstreet <koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> >     CC: Jens Axboe <axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>
> >     CC: Martin K. Petersen <martin.petersen-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
> > 
> > diff --git a/fs/bio-integrity.c b/fs/bio-integrity.c
> > index a3f28f3..4ae22a8 100644
> > --- a/fs/bio-integrity.c
> > +++ b/fs/bio-integrity.c
> > @@ -694,15 +694,12 @@ void bio_integrity_split(struct bio *bio, struct bio_pair *bp, int sectors)
> >  	bp->bio1.bi_integrity = &bp->bip1;
> >  	bp->bio2.bi_integrity = &bp->bip2;
> >  
> > -	bp->iv1 = bip->bip_vec[0];
> > -	bp->iv2 = bip->bip_vec[0];
> > +	*bp->bip1.bip_vec = bip->bip_vec[0];
> > +	*bp->bip2.bip_vec = bip->bip_vec[0];
> 
> I think this is horrible. Why not introduce bvec pointer in bip (like bio),
> to cover the case when bvec are not inline.

That's... exactly what the next patch in the series does.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [dm-devel] [PATCH v3 01/26] block: Fix a buffer overrun in bio_integrity_split()
       [not found]                   ` <20121002202643.GQ26488-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2012-10-02 20:32                     ` Vivek Goyal
  2012-10-02 21:01                       ` Kent Overstreet
  0 siblings, 1 reply; 69+ messages in thread
From: Vivek Goyal @ 2012-10-02 20:32 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA, axboe-tSWWG44O7X1aa/9Udqfwiw,
	Martin K. Petersen, tj-DgEjT+Ai2ygdnm+yROfE0A

On Tue, Oct 02, 2012 at 01:26:43PM -0700, Kent Overstreet wrote:
> On Tue, Oct 02, 2012 at 10:08:47AM -0400, Vivek Goyal wrote:
> > On Mon, Oct 01, 2012 at 02:42:41PM -0700, Kent Overstreet wrote:
> > 
> > [..]
> > > Here's the new patch:
> > > 
> > > 
> > > commit e270c9ca843b5c86d59431b0d7a676b7846946d6
> > > Author: Kent Overstreet <koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> > > Date:   Mon Oct 1 14:41:08 2012 -0700
> > > 
> > >     block: Fix a buffer overrun in bio_integrity_split()
> > >     
> > >     bio_integrity_split() seemed to be confusing pointers and arrays -
> > >     bip_vec in bio_integrity_payload is an array appended to the end of the
> > >     payload, so the bio_vecs in struct bio_pair need to come immediately
> > >     after the bio_integrity_payload they're for, and there was an assignment
> > >     in bio_integrity_split() that didn't make any sense.
> > >     
> > >     Also, changed bio_integrity_split() to not refer to the bvecs embedded
> > >     in struct bio_pair, in case there's padding between them and
> > >     bip->bip_vec.
> > >     
> > >     Signed-off-by: Kent Overstreet <koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> > >     CC: Jens Axboe <axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>
> > >     CC: Martin K. Petersen <martin.petersen-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
> > > 
> > > diff --git a/fs/bio-integrity.c b/fs/bio-integrity.c
> > > index a3f28f3..4ae22a8 100644
> > > --- a/fs/bio-integrity.c
> > > +++ b/fs/bio-integrity.c
> > > @@ -694,15 +694,12 @@ void bio_integrity_split(struct bio *bio, struct bio_pair *bp, int sectors)
> > >  	bp->bio1.bi_integrity = &bp->bip1;
> > >  	bp->bio2.bi_integrity = &bp->bip2;
> > >  
> > > -	bp->iv1 = bip->bip_vec[0];
> > > -	bp->iv2 = bip->bip_vec[0];
> > > +	*bp->bip1.bip_vec = bip->bip_vec[0];
> > > +	*bp->bip2.bip_vec = bip->bip_vec[0];
> > 
> > I think this is horrible. Why not introduce bvec pointer in bip (like bio),
> > to cover the case when bvec are not inline.
> 
> That's... exactly what the next patch in the series does.

Yes, but if you want to push some of the these bug fixes in stable (as martin
had said), we need to introduce that bip->bio_vec pointer early. Also that
next patch is doing lot other other things like getting rid of bip_slabs
and we don't require all that to fix this particular bug.

In fact I would say that it is beter to fix this blk integrity bug in a
separate patchset so that it can be pushed out earlier.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v3 02/26] block: Convert integrity to bvec_alloc_bs()
  2012-10-02 15:12   ` [dm-devel] " Vivek Goyal
@ 2012-10-02 20:52     ` Kent Overstreet
       [not found]       ` <20121002205249.GR26488-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 69+ messages in thread
From: Kent Overstreet @ 2012-10-02 20:52 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: axboe, Martin K. Petersen, dm-devel, linux-kernel, linux-bcache, tj

On Tue, Oct 02, 2012 at 11:12:02AM -0400, Vivek Goyal wrote:
> On Mon, Sep 24, 2012 at 03:34:42PM -0700, Kent Overstreet wrote:
> > This adds a pointer to the bvec array to struct bio_integrity_payload,
> > instead of the bvecs always being inline; then the bvecs are allocated
> > with bvec_alloc_bs().
> 
> Ok, you are introducing bio_vec pointer in this patch. May be we can
> do it earlier so that we take care of bio pair related issue.

I was just trying to make the bugfix patch small, since people
complained about too much stuff being done in one patch.

> > Changed bvec_alloc_bs() and bvec_free_bs() to take a pointer to a
> > mempool instead of the bioset, so that bio integrity can use a different
> > mempool for its bvecs, and thus avoid a potential deadlock.
> 
> If you are taking mempool as input, then bvec_alloc_bs() name does not
> make sense, as I think the very fact "bs" in the name their suggests
> that you are taking a pointer to bio set (and not mempool).
> 
> Given the fact that bio_vec pool for integrity inside bio set, why not
> take additional boolean/enum argument to function bvec_alloc_bs() and
> that argument can tell whehter to allocate bvec from which bvec pool.

Eww, boolean arguments are always bad. 

You are right about bvec_alloc_bs() being misnamed now... honestly it
doesn't bother me, but it should probably just be renamed to
bvec_alloc().

Much prefer the mempool arg to a boolean, though.

> > This is eventually for immutable bio vecs - immutable bvecs aren't
> > useful if we still have to copy them, hence the need for the pointer.
> > Less code is always nice too, though.
> 
> Can you explain a bit more about immutable bio vecs. I know there was
> some discussion and I missed it. So either point me to right mail thread
> or just explain a bit more what are immutable bio vecs, how we are
> planning to use these.

http://article.gmane.org/gmane.linux.kernel.bcache.devel/890

That's my writeup - Tejun and I have been talking about this for ages,
and then Martin Petersen independently brought it up recently and that
was when I decided to just dive in and do it - but they'd probably have
their own ideas about what it's good for. (Tejun wants to reduce the
number of different scatter/gather list implementations).

> Is it about during split we don't want to copy bio vecs and that's why we
> need a pointer so that newly created bio/bip can point to parent's bio_vec?

That's one reason. It enables efficient arbitrary bio splitting, which
enables a whole host of cleanups and simplifications (I've deleted ~1500
lines of code in my tree).

Also, the fact that bv_len and bv_offset are modified is a pain; the
owner of the bio can't use them after the bio has completed, if it
needed to know what memory the bio pointed to it's got to store that
somewhere else. This fixes that.

It's a real pain for error handling in stacking block drivers - if they
want to be able to retry the bio, they have to clone not just the bio
itself but the entire bvec.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [dm-devel] [PATCH v3 02/26] block: Convert integrity to bvec_alloc_bs()
  2012-10-02 15:37     ` Vivek Goyal
@ 2012-10-02 21:00       ` Kent Overstreet
       [not found]         ` <20121002210006.GS26488-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 69+ messages in thread
From: Kent Overstreet @ 2012-10-02 21:00 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: linux-bcache, linux-kernel, dm-devel, axboe, Martin K. Petersen, tj

On Tue, Oct 02, 2012 at 11:37:37AM -0400, Vivek Goyal wrote:
> On Mon, Sep 24, 2012 at 03:34:42PM -0700, Kent Overstreet wrote:
> 
> [..]
> >  /**
> >   * bio_integrity_alloc - Allocate integrity payload and attach it to bio
> >   * @bio:	bio to attach integrity metadata to
> > @@ -84,37 +47,39 @@ struct bio_integrity_payload *bio_integrity_alloc(struct bio *bio,
> >  						  unsigned int nr_vecs)
> >  {
> >  	struct bio_integrity_payload *bip;
> > -	unsigned int idx = vecs_to_idx(nr_vecs);
> >  	struct bio_set *bs = bio->bi_pool;
> > +	unsigned long idx = BIO_POOL_NONE;
> > +	unsigned inline_vecs;
> > +
> > +	if (!bs) {
> > +		bip = kmalloc(sizeof(struct bio_integrity_payload) +
> > +			      sizeof(struct bio_vec) * nr_vecs, gfp_mask);
> > +		inline_vecs = nr_vecs;
> > +	} else {
> > +		bip = mempool_alloc(bs->bio_integrity_pool, gfp_mask);
> > +		inline_vecs = BIP_INLINE_VECS;
> > +	}
> >  
> > -	if (!bs)
> > -		bs = fs_bio_set;
> 
> Ok, this is change of behavior. Previously we will fall back to fs_bio_set
> and now you do kmalloc. This change looks to be independent of bip_vec
> pointer. Can you please break it out in a separate patch and also explain
> that how does this change help.

I'm not sure it's worth breaking out into a separate patch, but I
definitely should've mentioned it in the patch description.

It just didn't make sense to be using fs_bio_set if a bio_set wasn't
specified before - if a bio set wasn't specified we're still using
kmalloc for the bio_integrity_payload, so we're not protected from
memory allocation failures and it doesn't buy us anything.

All it does is introduce the possibility of deadlock, if we weren't
supposed to be using fs_bio_set for whatever reason.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v3 01/26] block: Fix a buffer overrun in bio_integrity_split()
  2012-10-02 20:32                     ` Vivek Goyal
@ 2012-10-02 21:01                       ` Kent Overstreet
       [not found]                         ` <20121002210143.GT26488-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 69+ messages in thread
From: Kent Overstreet @ 2012-10-02 21:01 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: axboe, Martin K. Petersen, dm-devel, linux-kernel, linux-bcache, tj

On Tue, Oct 02, 2012 at 04:32:53PM -0400, Vivek Goyal wrote:
> On Tue, Oct 02, 2012 at 01:26:43PM -0700, Kent Overstreet wrote:
> > On Tue, Oct 02, 2012 at 10:08:47AM -0400, Vivek Goyal wrote:
> > > On Mon, Oct 01, 2012 at 02:42:41PM -0700, Kent Overstreet wrote:
> > > 
> > > [..]
> > > > Here's the new patch:
> > > > 
> > > > 
> > > > commit e270c9ca843b5c86d59431b0d7a676b7846946d6
> > > > Author: Kent Overstreet <koverstreet@google.com>
> > > > Date:   Mon Oct 1 14:41:08 2012 -0700
> > > > 
> > > >     block: Fix a buffer overrun in bio_integrity_split()
> > > >     
> > > >     bio_integrity_split() seemed to be confusing pointers and arrays -
> > > >     bip_vec in bio_integrity_payload is an array appended to the end of the
> > > >     payload, so the bio_vecs in struct bio_pair need to come immediately
> > > >     after the bio_integrity_payload they're for, and there was an assignment
> > > >     in bio_integrity_split() that didn't make any sense.
> > > >     
> > > >     Also, changed bio_integrity_split() to not refer to the bvecs embedded
> > > >     in struct bio_pair, in case there's padding between them and
> > > >     bip->bip_vec.
> > > >     
> > > >     Signed-off-by: Kent Overstreet <koverstreet@google.com>
> > > >     CC: Jens Axboe <axboe@kernel.dk>
> > > >     CC: Martin K. Petersen <martin.petersen@oracle.com>
> > > > 
> > > > diff --git a/fs/bio-integrity.c b/fs/bio-integrity.c
> > > > index a3f28f3..4ae22a8 100644
> > > > --- a/fs/bio-integrity.c
> > > > +++ b/fs/bio-integrity.c
> > > > @@ -694,15 +694,12 @@ void bio_integrity_split(struct bio *bio, struct bio_pair *bp, int sectors)
> > > >  	bp->bio1.bi_integrity = &bp->bip1;
> > > >  	bp->bio2.bi_integrity = &bp->bip2;
> > > >  
> > > > -	bp->iv1 = bip->bip_vec[0];
> > > > -	bp->iv2 = bip->bip_vec[0];
> > > > +	*bp->bip1.bip_vec = bip->bip_vec[0];
> > > > +	*bp->bip2.bip_vec = bip->bip_vec[0];
> > > 
> > > I think this is horrible. Why not introduce bvec pointer in bip (like bio),
> > > to cover the case when bvec are not inline.
> > 
> > That's... exactly what the next patch in the series does.
> 
> Yes, but if you want to push some of the these bug fixes in stable (as martin
> had said), we need to introduce that bip->bio_vec pointer early. Also that
> next patch is doing lot other other things like getting rid of bip_slabs
> and we don't require all that to fix this particular bug.
> 
> In fact I would say that it is beter to fix this blk integrity bug in a
> separate patchset so that it can be pushed out earlier.

I'm honestly not sure what your complaint about my bugfix patch was -
it's small and complete, it does fix the bug. I don't follow why you
think we need to introduce the bip->bio_vec pointer early...

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [dm-devel] [PATCH v3 01/26] block: Fix a buffer overrun in bio_integrity_split()
       [not found]                         ` <20121002210143.GT26488-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2012-10-02 21:58                           ` Vivek Goyal
       [not found]                             ` <20121002215845.GB14471-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 69+ messages in thread
From: Vivek Goyal @ 2012-10-02 21:58 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA, axboe-tSWWG44O7X1aa/9Udqfwiw,
	Martin K. Petersen, tj-DgEjT+Ai2ygdnm+yROfE0A

On Tue, Oct 02, 2012 at 02:01:43PM -0700, Kent Overstreet wrote:
> On Tue, Oct 02, 2012 at 04:32:53PM -0400, Vivek Goyal wrote:
> > On Tue, Oct 02, 2012 at 01:26:43PM -0700, Kent Overstreet wrote:
> > > On Tue, Oct 02, 2012 at 10:08:47AM -0400, Vivek Goyal wrote:
> > > > On Mon, Oct 01, 2012 at 02:42:41PM -0700, Kent Overstreet wrote:
> > > > 
> > > > [..]
> > > > > Here's the new patch:
> > > > > 
> > > > > 
> > > > > commit e270c9ca843b5c86d59431b0d7a676b7846946d6
> > > > > Author: Kent Overstreet <koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> > > > > Date:   Mon Oct 1 14:41:08 2012 -0700
> > > > > 
> > > > >     block: Fix a buffer overrun in bio_integrity_split()
> > > > >     
> > > > >     bio_integrity_split() seemed to be confusing pointers and arrays -
> > > > >     bip_vec in bio_integrity_payload is an array appended to the end of the
> > > > >     payload, so the bio_vecs in struct bio_pair need to come immediately
> > > > >     after the bio_integrity_payload they're for, and there was an assignment
> > > > >     in bio_integrity_split() that didn't make any sense.
> > > > >     
> > > > >     Also, changed bio_integrity_split() to not refer to the bvecs embedded
> > > > >     in struct bio_pair, in case there's padding between them and
> > > > >     bip->bip_vec.
> > > > >     
> > > > >     Signed-off-by: Kent Overstreet <koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> > > > >     CC: Jens Axboe <axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>
> > > > >     CC: Martin K. Petersen <martin.petersen-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
> > > > > 
> > > > > diff --git a/fs/bio-integrity.c b/fs/bio-integrity.c
> > > > > index a3f28f3..4ae22a8 100644
> > > > > --- a/fs/bio-integrity.c
> > > > > +++ b/fs/bio-integrity.c
> > > > > @@ -694,15 +694,12 @@ void bio_integrity_split(struct bio *bio, struct bio_pair *bp, int sectors)
> > > > >  	bp->bio1.bi_integrity = &bp->bip1;
> > > > >  	bp->bio2.bi_integrity = &bp->bip2;
> > > > >  
> > > > > -	bp->iv1 = bip->bip_vec[0];
> > > > > -	bp->iv2 = bip->bip_vec[0];
> > > > > +	*bp->bip1.bip_vec = bip->bip_vec[0];
> > > > > +	*bp->bip2.bip_vec = bip->bip_vec[0];
> > > > 
> > > > I think this is horrible. Why not introduce bvec pointer in bip (like bio),
> > > > to cover the case when bvec are not inline.
> > > 
> > > That's... exactly what the next patch in the series does.
> > 
> > Yes, but if you want to push some of the these bug fixes in stable (as martin
> > had said), we need to introduce that bip->bio_vec pointer early. Also that
> > next patch is doing lot other other things like getting rid of bip_slabs
> > and we don't require all that to fix this particular bug.
> > 
> > In fact I would say that it is beter to fix this blk integrity bug in a
> > separate patchset so that it can be pushed out earlier.
> 
> I'm honestly not sure what your complaint about my bugfix patch was -
> it's small and complete, it does fix the bug. I don't follow why you
> think we need to introduce the bip->bio_vec pointer early...

I think having iv1 and iv2 and then not even accessing these using 
bp->iv1 and bp->iv2 is a bad idea even for bugfix.

I have never seen a code which says, hey I have defined two fields in a
struct but, don't access those fields directly(as there might be padding
issues). These fields are just there for blocking a chunk of memory but are
never meant to be accessed directly. I think, that's what my issue is. It
is bad programming (does not matter whether it is bug fix or not).

For your series it probably is still fine as you will overide it pretty
soon but what about stable. Anybody looking at that code might want
to say, hey why not directly initialize bp->iv1 instead of trying to
do *bp->bip1.bip_vec. And everybody will say, yes looks fine and boom
a bug is introduced because we did bad programming.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [dm-devel] [PATCH v3 02/26] block: Convert integrity to bvec_alloc_bs()
       [not found]         ` <20121002210006.GS26488-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2012-10-02 22:02           ` Vivek Goyal
  0 siblings, 0 replies; 69+ messages in thread
From: Vivek Goyal @ 2012-10-02 22:02 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA, axboe-tSWWG44O7X1aa/9Udqfwiw,
	Martin K. Petersen, tj-DgEjT+Ai2ygdnm+yROfE0A

On Tue, Oct 02, 2012 at 02:00:06PM -0700, Kent Overstreet wrote:
> On Tue, Oct 02, 2012 at 11:37:37AM -0400, Vivek Goyal wrote:
> > On Mon, Sep 24, 2012 at 03:34:42PM -0700, Kent Overstreet wrote:
> > 
> > [..]
> > >  /**
> > >   * bio_integrity_alloc - Allocate integrity payload and attach it to bio
> > >   * @bio:	bio to attach integrity metadata to
> > > @@ -84,37 +47,39 @@ struct bio_integrity_payload *bio_integrity_alloc(struct bio *bio,
> > >  						  unsigned int nr_vecs)
> > >  {
> > >  	struct bio_integrity_payload *bip;
> > > -	unsigned int idx = vecs_to_idx(nr_vecs);
> > >  	struct bio_set *bs = bio->bi_pool;
> > > +	unsigned long idx = BIO_POOL_NONE;
> > > +	unsigned inline_vecs;
> > > +
> > > +	if (!bs) {
> > > +		bip = kmalloc(sizeof(struct bio_integrity_payload) +
> > > +			      sizeof(struct bio_vec) * nr_vecs, gfp_mask);
> > > +		inline_vecs = nr_vecs;
> > > +	} else {
> > > +		bip = mempool_alloc(bs->bio_integrity_pool, gfp_mask);
> > > +		inline_vecs = BIP_INLINE_VECS;
> > > +	}
> > >  
> > > -	if (!bs)
> > > -		bs = fs_bio_set;
> > 
> > Ok, this is change of behavior. Previously we will fall back to fs_bio_set
> > and now you do kmalloc. This change looks to be independent of bip_vec
> > pointer. Can you please break it out in a separate patch and also explain
> > that how does this change help.
> 
> I'm not sure it's worth breaking out into a separate patch, but I
> definitely should've mentioned it in the patch description.

It helps a lot with reviewing the patches. 
> 
> It just didn't make sense to be using fs_bio_set if a bio_set wasn't
> specified before - if a bio set wasn't specified we're still using
> kmalloc for the bio_integrity_payload, so we're not protected from
> memory allocation failures and it doesn't buy us anything.
> 
> All it does is introduce the possibility of deadlock, if we weren't
> supposed to be using fs_bio_set for whatever reason.

I think more people will be willing to review patches if you break
them down in to small patches with proper explanation. This particular
change is orthogonal to allocating integrity vecs from bioset. So
really think it does make sense to keep this change in a separate
patch with proper changelog.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [dm-devel] [PATCH v3 02/26] block: Convert integrity to bvec_alloc_bs()
       [not found]       ` <20121002205249.GR26488-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2012-10-02 22:05         ` Vivek Goyal
  2012-10-02 22:17           ` Kent Overstreet
  0 siblings, 1 reply; 69+ messages in thread
From: Vivek Goyal @ 2012-10-02 22:05 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA, axboe-tSWWG44O7X1aa/9Udqfwiw,
	Martin K. Petersen, tj-DgEjT+Ai2ygdnm+yROfE0A

On Tue, Oct 02, 2012 at 01:52:50PM -0700, Kent Overstreet wrote:
> On Tue, Oct 02, 2012 at 11:12:02AM -0400, Vivek Goyal wrote:
> > On Mon, Sep 24, 2012 at 03:34:42PM -0700, Kent Overstreet wrote:
> > > This adds a pointer to the bvec array to struct bio_integrity_payload,
> > > instead of the bvecs always being inline; then the bvecs are allocated
> > > with bvec_alloc_bs().
> > 
> > Ok, you are introducing bio_vec pointer in this patch. May be we can
> > do it earlier so that we take care of bio pair related issue.
> 
> I was just trying to make the bugfix patch small, since people
> complained about too much stuff being done in one patch.

Can't we introduce the pointer while we retain bip_slabs as it is. Then
this patch will be small. I think this patch is big only because you
are trying to allocate integrity vecs from bio_set out of line.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [dm-devel] [PATCH v3 01/26] block: Fix a buffer overrun in bio_integrity_split()
       [not found]                             ` <20121002215845.GB14471-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2012-10-02 22:07                               ` Kent Overstreet
  0 siblings, 0 replies; 69+ messages in thread
From: Kent Overstreet @ 2012-10-02 22:07 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA, axboe-tSWWG44O7X1aa/9Udqfwiw,
	Martin K. Petersen, tj-DgEjT+Ai2ygdnm+yROfE0A

On Tue, Oct 02, 2012 at 05:58:45PM -0400, Vivek Goyal wrote:
> On Tue, Oct 02, 2012 at 02:01:43PM -0700, Kent Overstreet wrote:
> > I'm honestly not sure what your complaint about my bugfix patch was -
> > it's small and complete, it does fix the bug. I don't follow why you
> > think we need to introduce the bip->bio_vec pointer early...
> 
> I think having iv1 and iv2 and then not even accessing these using 
> bp->iv1 and bp->iv2 is a bad idea even for bugfix.
> 
> I have never seen a code which says, hey I have defined two fields in a
> struct but, don't access those fields directly(as there might be padding
> issues). These fields are just there for blocking a chunk of memory but are
> never meant to be accessed directly. I think, that's what my issue is. It
> is bad programming (does not matter whether it is bug fix or not).
> 
> For your series it probably is still fine as you will overide it pretty
> soon but what about stable. Anybody looking at that code might want
> to say, hey why not directly initialize bp->iv1 instead of trying to
> do *bp->bip1.bip_vec. And everybody will say, yes looks fine and boom
> a bug is introduced because we did bad programming.

Ok. It's definitely a bit weird and unusual, and if I wasn't getting rid
of it in the next patch it would definitely merit a comment.

For stable... wtf would they be making that kind of change for, and
without reading the relevant code?

Eh, maybe I will stick in that comment and take it out in the next
patch.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [dm-devel] [PATCH v3 02/26] block: Convert integrity to bvec_alloc_bs()
  2012-10-02 22:05         ` [dm-devel] " Vivek Goyal
@ 2012-10-02 22:17           ` Kent Overstreet
  0 siblings, 0 replies; 69+ messages in thread
From: Kent Overstreet @ 2012-10-02 22:17 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: linux-bcache, linux-kernel, dm-devel, axboe, Martin K. Petersen, tj

On Tue, Oct 02, 2012 at 06:05:53PM -0400, Vivek Goyal wrote:
> On Tue, Oct 02, 2012 at 01:52:50PM -0700, Kent Overstreet wrote:
> > On Tue, Oct 02, 2012 at 11:12:02AM -0400, Vivek Goyal wrote:
> > > On Mon, Sep 24, 2012 at 03:34:42PM -0700, Kent Overstreet wrote:
> > > > This adds a pointer to the bvec array to struct bio_integrity_payload,
> > > > instead of the bvecs always being inline; then the bvecs are allocated
> > > > with bvec_alloc_bs().
> > > 
> > > Ok, you are introducing bio_vec pointer in this patch. May be we can
> > > do it earlier so that we take care of bio pair related issue.
> > 
> > I was just trying to make the bugfix patch small, since people
> > complained about too much stuff being done in one patch.
> 
> Can't we introduce the pointer while we retain bip_slabs as it is. Then
> this patch will be small. I think this patch is big only because you
> are trying to allocate integrity vecs from bio_set out of line.

Ok - yeah, that makes sense to break out. I think I'll just make the
integrity stuff a separate patch series, it's going to be like 4 patches
now.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [dm-devel] [PATCH v3 01/26] block: Fix a buffer overrun in bio_integrity_split()
  2012-10-02 20:26                 ` Kent Overstreet
       [not found]                   ` <20121002202643.GQ26488-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2012-10-02 22:30                   ` Martin K. Petersen
  1 sibling, 0 replies; 69+ messages in thread
From: Martin K. Petersen @ 2012-10-02 22:30 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Vivek Goyal, linux-bcache, linux-kernel, dm-devel, axboe,
	Martin K. Petersen, tj

>>>>> "Kent" == Kent Overstreet <koverstreet@google.com> writes:

>> > + *bp->bip1.bip_vec = bip->bip_vec[0];
>> > + *bp->bip2.bip_vec = bip->bip_vec[0];
>> 
>> I think this is horrible.

Yep.


>> Why not introduce bvec pointer in bip (like bio), to cover the case
>> when bvec are not inline.

Kent> That's... exactly what the next patch in the series does.

I'm perfectly ok with a patch that introduces the pointer and fixes the
bio_pair case. As long as that's all it does.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [dm-devel] [PATCH v3 11/26] block: Add submit_bio_wait(), remove from md
  2012-10-02 20:11         ` Kent Overstreet
       [not found]           ` <20121002201105.GL26488-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2012-10-04  6:07           ` Hannes Reinecke
  1 sibling, 0 replies; 69+ messages in thread
From: Hannes Reinecke @ 2012-10-04  6:07 UTC (permalink / raw)
  To: device-mapper development
  Cc: Kent Overstreet, Vivek Goyal, tj, axboe, linux-bcache, linux-kernel

On 10/02/2012 10:11 PM, Kent Overstreet wrote:
> On Tue, Oct 02, 2012 at 03:41:32PM -0400, Vivek Goyal wrote:
>> On Mon, Sep 24, 2012 at 03:34:51PM -0700, Kent Overstreet wrote:
>>> Random cleanup - this code was duplicated and it's not really specific
>>> to md.
>>>
>>> Also added the ability to return the actual error code.
>>
>> Who is going to make use of actual error code and why checking
>> BIO_UPTODATE is not sufficient (as existing code is doing)?
>
> Some things do, though it's not common and I forget where I saw it -
> checking for -ENOTSUPPORTED vs. other stuff
>
There are also additional error codes like -ENOLINK when the nexus 
is dropped. Quite useful in multipathed or clustered environs.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 69+ messages in thread

end of thread, other threads:[~2012-10-04  6:07 UTC | newest]

Thread overview: 69+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-09-24 22:34 [PATCH v3 00/26] Prep work for immutable bio vecs Kent Overstreet
2012-09-24 22:34 ` [PATCH v3 01/26] block: Fix a buffer overrun in bio_integrity_split() Kent Overstreet
     [not found]   ` <1348526106-17074-2-git-send-email-koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2012-10-01 21:23     ` [dm-devel] " Vivek Goyal
     [not found]       ` <20121001212336.GA17165-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-10-01 21:36         ` Kent Overstreet
2012-10-01 21:42         ` Kent Overstreet
     [not found]           ` <20121001214241.GE26488-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2012-10-02 14:08             ` Vivek Goyal
     [not found]               ` <20121002140847.GD758-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-10-02 20:26                 ` Kent Overstreet
     [not found]                   ` <20121002202643.GQ26488-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2012-10-02 20:32                     ` Vivek Goyal
2012-10-02 21:01                       ` Kent Overstreet
     [not found]                         ` <20121002210143.GT26488-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2012-10-02 21:58                           ` [dm-devel] " Vivek Goyal
     [not found]                             ` <20121002215845.GB14471-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-10-02 22:07                               ` Kent Overstreet
2012-10-02 22:30                   ` Martin K. Petersen
2012-09-24 22:34 ` [PATCH v3 02/26] block: Convert integrity to bvec_alloc_bs() Kent Overstreet
2012-10-02 15:12   ` [dm-devel] " Vivek Goyal
2012-10-02 20:52     ` Kent Overstreet
     [not found]       ` <20121002205249.GR26488-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2012-10-02 22:05         ` [dm-devel] " Vivek Goyal
2012-10-02 22:17           ` Kent Overstreet
     [not found]   ` <1348526106-17074-3-git-send-email-koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2012-10-02 15:37     ` Vivek Goyal
2012-10-02 21:00       ` Kent Overstreet
     [not found]         ` <20121002210006.GS26488-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2012-10-02 22:02           ` Vivek Goyal
2012-09-24 22:34 ` [PATCH v3 03/26] block: Add bio_advance() Kent Overstreet
2012-09-24 22:34 ` [PATCH v3 04/26] block: Refactor blk_update_request() Kent Overstreet
2012-10-02 18:43   ` [dm-devel] " Vivek Goyal
     [not found]     ` <20121002184359.GC3283-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-10-02 20:10       ` Kent Overstreet
2012-10-02 20:14         ` Vivek Goyal
     [not found]           ` <20121002201451.GH758-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-10-02 20:25             ` Kent Overstreet
2012-10-02 18:59   ` Vivek Goyal
     [not found]     ` <20121002185955.GD3283-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-10-02 20:09       ` [dm-devel] " Kent Overstreet
     [not found] ` <1348526106-17074-1-git-send-email-koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2012-09-24 22:34   ` [PATCH v3 05/26] md: Convert md_trim_bio() to use bio_advance() Kent Overstreet
     [not found]     ` <1348526106-17074-6-git-send-email-koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2012-09-26  0:38       ` NeilBrown
     [not found]         ` <20120926103827.4d880cf4-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
2012-09-27  4:41           ` Kent Overstreet
2012-09-24 22:35   ` [PATCH v3 23/26] block: Add bio_alloc_pages() Kent Overstreet
2012-09-24 22:34 ` [PATCH v3 06/26] block: Add bio_end_sector() Kent Overstreet
     [not found]   ` <1348526106-17074-7-git-send-email-koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2012-09-25 11:54     ` Lars Ellenberg
     [not found]       ` <20120925115452.GF8143-w1SgEEioFePxa46PmUWvFg@public.gmane.org>
2012-09-25 22:06         ` [Drbd-dev] " Kent Overstreet
     [not found]           ` <20120925220624.GC22647-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2012-09-26 15:16             ` Lars Ellenberg
2012-10-02 18:10     ` [dm-devel] " Vivek Goyal
     [not found]       ` <20121002181001.GB3283-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-10-02 20:20         ` Kent Overstreet
2012-09-24 22:34 ` [PATCH v3 07/26] block: Use bio_sectors() more consistently Kent Overstreet
     [not found]   ` <1348526106-17074-8-git-send-email-koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2012-09-24 23:04     ` Jim Paris
     [not found]       ` <20120924230449.GA2040-SRSuHwkuBJlaX0KmTac7FA@public.gmane.org>
2012-09-24 23:09         ` Kent Overstreet
2012-09-25  0:54   ` Ed Cashin
2012-09-24 22:34 ` [PATCH v3 08/26] block: Change bio_split() to respect the current value of bi_idx Kent Overstreet
2012-09-24 22:34 ` [PATCH v3 09/26] block: Remove bi_idx references Kent Overstreet
     [not found]   ` <1348526106-17074-10-git-send-email-koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2012-10-02 19:24     ` [dm-devel] " Vivek Goyal
2012-10-02 20:16       ` Kent Overstreet
2012-09-24 22:34 ` [PATCH v3 10/26] block: Remove some unnecessary bi_vcnt usage Kent Overstreet
2012-09-24 22:34 ` [PATCH v3 11/26] block: Add submit_bio_wait(), remove from md Kent Overstreet
     [not found]   ` <1348526106-17074-12-git-send-email-koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2012-09-25  5:51     ` [dm-devel] " Hannes Reinecke
2012-09-25 22:15       ` Kent Overstreet
2012-10-02 19:41     ` Vivek Goyal
     [not found]       ` <20121002194132.GF3283-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-10-02 20:11         ` Kent Overstreet
     [not found]           ` <20121002201105.GL26488-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2012-10-02 20:16             ` Vivek Goyal
     [not found]               ` <20121002201630.GI758-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-10-02 20:22                 ` Kent Overstreet
2012-10-04  6:07           ` Hannes Reinecke
2012-09-24 22:34 ` [PATCH v3 12/26] raid10: Use bio_reset() Kent Overstreet
2012-09-24 22:34 ` [PATCH v3 13/26] raid1: use bio_reset() Kent Overstreet
2012-09-24 22:34 ` [PATCH v3 14/26] raid5: " Kent Overstreet
2012-09-24 22:34 ` [PATCH v3 15/26] raid1: Refactor narrow_write_error() to not use bi_idx Kent Overstreet
2012-09-24 22:34 ` [PATCH v3 16/26] block: Add bio_copy_data() Kent Overstreet
2012-09-24 22:34 ` [PATCH v3 17/26] pktcdvd: use bio_copy_data() Kent Overstreet
2012-09-24 22:34 ` [PATCH v3 18/26] pktcdvd: Use bio_reset() in disabled code to kill bi_idx usage Kent Overstreet
2012-09-24 22:34 ` [PATCH v3 19/26] raid1: use bio_copy_data() Kent Overstreet
2012-09-24 22:35 ` [PATCH v3 20/26] bounce: Refactor __blk_queue_bounce to not use bi_io_vec Kent Overstreet
2012-09-24 22:35 ` [PATCH v3 21/26] block: Add bio_for_each_segment_all() Kent Overstreet
2012-09-24 22:35 ` [PATCH v3 22/26] block: Convert some code to bio_for_each_segment_all() Kent Overstreet
2012-09-24 22:35 ` [PATCH v3 24/26] block: Add an explicit bio flag for bios that own their bvec Kent Overstreet
2012-09-24 22:35 ` [PATCH v3 25/26] bio-integrity: Add explicit field for owner of bip_buf Kent Overstreet
2012-09-24 22:35 ` [PATCH v3 26/26] block: Add BIO_SUBMITTED flag, kill BIO_CLONED Kent Overstreet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).