linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 00/12] Block cleanups
@ 2012-07-24 20:11 Kent Overstreet
  2012-07-24 20:11 ` [PATCH v4 01/12] block: Generalized bio pool freeing Kent Overstreet
                   ` (11 more replies)
  0 siblings, 12 replies; 29+ messages in thread
From: Kent Overstreet @ 2012-07-24 20:11 UTC (permalink / raw)
  To: linux-bcache, linux-kernel, dm-devel
  Cc: Kent Overstreet, tj, axboe, agk, neilb, drbd-dev, bharrosh,
	vgoyal, mpatocka, sage, yehuda

CHANGES SINCE LAST VERSION:

Split off the "make generic_make_request() handle arbitrary size bios" - this
patch series now just kills bi_destructor and reworks bio splitting.

Split out the "rework bio splitting" patch based on Boaz's feedback - now I've
got one patch that introduces the new bio_split() and another that rewrites
bio_pair_split(), which makes the patches much more readable.

Going to try and finish off the rest of the patches that get rid of
merge_bvec_fn, but that'll be a separate patch series.

Kent Overstreet (12):
  block: Generalized bio pool freeing
  dm: Use bioset's front_pad for dm_rq_clone_bio_info
  block: Add bio_reset()
  pktcdvd: Switch to bio_kmalloc()
  block: Kill bi_destructor
  block: Add an explicit bio flag for bios that own their bvec
  block: Rename bio_split() -> bio_pair_split()
  block: Introduce new bio_split()
  block: Rework bio_pair_split()
  block: Add bio_clone_kmalloc()
  block: Add bio_clone_bioset()
  block: Only clone bio vecs that are in use

 Documentation/block/biodoc.txt      |    5 -
 block/blk-core.c                    |   10 +-
 drivers/block/drbd/drbd_main.c      |   13 +--
 drivers/block/drbd/drbd_req.c       |   18 +--
 drivers/block/osdblk.c              |    3 +-
 drivers/block/pktcdvd.c             |  121 ++++++-----------
 drivers/block/rbd.c                 |    8 +-
 drivers/md/dm-crypt.c               |    9 --
 drivers/md/dm-io.c                  |   11 --
 drivers/md/dm.c                     |   60 ++-------
 drivers/md/linear.c                 |    6 +-
 drivers/md/md.c                     |   44 +------
 drivers/md/raid0.c                  |    8 +-
 drivers/md/raid10.c                 |   23 +---
 drivers/target/target_core_iblock.c |    9 --
 fs/bio-integrity.c                  |   44 ------
 fs/bio.c                            |  264 ++++++++++++++++++++++++-----------
 fs/exofs/ore.c                      |    5 +-
 include/linux/bio.h                 |   37 ++---
 include/linux/blk_types.h           |    9 +-
 20 files changed, 281 insertions(+), 426 deletions(-)

-- 
1.7.7.3


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v4 01/12] block: Generalized bio pool freeing
  2012-07-24 20:11 [PATCH v4 00/12] Block cleanups Kent Overstreet
@ 2012-07-24 20:11 ` Kent Overstreet
  2012-07-25 11:06   ` Boaz Harrosh
  2012-07-24 20:11 ` [PATCH v4 02/12] dm: Use bioset's front_pad for dm_rq_clone_bio_info Kent Overstreet
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 29+ messages in thread
From: Kent Overstreet @ 2012-07-24 20:11 UTC (permalink / raw)
  To: linux-bcache, linux-kernel, dm-devel
  Cc: Kent Overstreet, tj, axboe, agk, neilb, drbd-dev, bharrosh,
	vgoyal, mpatocka, sage, yehuda

With the old code, when you allocate a bio from a bio pool you have to
implement your own destructor that knows how to find the bio pool the
bio was originally allocated from.

This adds a new field to struct bio (bi_pool) and changes
bio_alloc_bioset() to use it. This makes various bio destructors
unnecessary, so they're then deleted.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
---
 drivers/md/dm-crypt.c               |    9 ---------
 drivers/md/dm-io.c                  |   11 -----------
 drivers/md/dm.c                     |   20 --------------------
 drivers/md/md.c                     |   28 ++++------------------------
 drivers/target/target_core_iblock.c |   15 +++++++--------
 fs/bio.c                            |   26 ++++++++------------------
 include/linux/blk_types.h           |    3 +++
 7 files changed, 22 insertions(+), 90 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 3f06df5..40716d8 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -808,14 +808,6 @@ static int crypt_convert(struct crypt_config *cc,
 	return 0;
 }
 
-static void dm_crypt_bio_destructor(struct bio *bio)
-{
-	struct dm_crypt_io *io = bio->bi_private;
-	struct crypt_config *cc = io->target->private;
-
-	bio_free(bio, cc->bs);
-}
-
 /*
  * Generate a new unfragmented bio with the given size
  * This should never violate the device limitations
@@ -985,7 +977,6 @@ static void clone_init(struct dm_crypt_io *io, struct bio *clone)
 	clone->bi_end_io  = crypt_endio;
 	clone->bi_bdev    = cc->dev->bdev;
 	clone->bi_rw      = io->base_bio->bi_rw;
-	clone->bi_destructor = dm_crypt_bio_destructor;
 }
 
 static int kcryptd_io_read(struct dm_crypt_io *io, gfp_t gfp)
diff --git a/drivers/md/dm-io.c b/drivers/md/dm-io.c
index ea5dd28..1c46f97 100644
--- a/drivers/md/dm-io.c
+++ b/drivers/md/dm-io.c
@@ -249,16 +249,6 @@ static void vm_dp_init(struct dpages *dp, void *data)
 	dp->context_ptr = data;
 }
 
-static void dm_bio_destructor(struct bio *bio)
-{
-	unsigned region;
-	struct io *io;
-
-	retrieve_io_and_region_from_bio(bio, &io, &region);
-
-	bio_free(bio, io->client->bios);
-}
-
 /*
  * Functions for getting the pages from kernel memory.
  */
@@ -317,7 +307,6 @@ static void do_region(int rw, unsigned region, struct dm_io_region *where,
 		bio->bi_sector = where->sector + (where->count - remaining);
 		bio->bi_bdev = where->bdev;
 		bio->bi_end_io = endio;
-		bio->bi_destructor = dm_bio_destructor;
 		store_io_and_region_in_bio(bio, io, region);
 
 		if (rw & REQ_DISCARD) {
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index e24143c..40b7735 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -681,11 +681,6 @@ static void clone_endio(struct bio *bio, int error)
 		}
 	}
 
-	/*
-	 * Store md for cleanup instead of tio which is about to get freed.
-	 */
-	bio->bi_private = md->bs;
-
 	free_tio(md, tio);
 	bio_put(bio);
 	dec_pending(io, error);
@@ -1013,11 +1008,6 @@ static void __map_bio(struct dm_target *ti, struct bio *clone,
 		/* error the io and bail out, or requeue it if needed */
 		md = tio->io->md;
 		dec_pending(tio->io, r);
-		/*
-		 * Store bio_set for cleanup.
-		 */
-		clone->bi_end_io = NULL;
-		clone->bi_private = md->bs;
 		bio_put(clone);
 		free_tio(md, tio);
 	} else if (r) {
@@ -1036,13 +1026,6 @@ struct clone_info {
 	unsigned short idx;
 };
 
-static void dm_bio_destructor(struct bio *bio)
-{
-	struct bio_set *bs = bio->bi_private;
-
-	bio_free(bio, bs);
-}
-
 /*
  * Creates a little bio that just does part of a bvec.
  */
@@ -1054,7 +1037,6 @@ static struct bio *split_bvec(struct bio *bio, sector_t sector,
 	struct bio_vec *bv = bio->bi_io_vec + idx;
 
 	clone = bio_alloc_bioset(GFP_NOIO, 1, bs);
-	clone->bi_destructor = dm_bio_destructor;
 	*clone->bi_io_vec = *bv;
 
 	clone->bi_sector = sector;
@@ -1086,7 +1068,6 @@ static struct bio *clone_bio(struct bio *bio, sector_t sector,
 
 	clone = bio_alloc_bioset(GFP_NOIO, bio->bi_max_vecs, bs);
 	__bio_clone(clone, bio);
-	clone->bi_destructor = dm_bio_destructor;
 	clone->bi_sector = sector;
 	clone->bi_idx = idx;
 	clone->bi_vcnt = idx + bv_count;
@@ -1131,7 +1112,6 @@ static void __issue_target_request(struct clone_info *ci, struct dm_target *ti,
 	 */
 	clone = bio_alloc_bioset(GFP_NOIO, ci->bio->bi_max_vecs, ci->md->bs);
 	__bio_clone(clone, ci->bio);
-	clone->bi_destructor = dm_bio_destructor;
 	if (len) {
 		clone->bi_sector = ci->sector;
 		clone->bi_size = to_bytes(len);
diff --git a/drivers/md/md.c b/drivers/md/md.c
index d5ab449..f9d16dc 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -155,32 +155,17 @@ static int start_readonly;
  * like bio_clone, but with a local bio set
  */
 
-static void mddev_bio_destructor(struct bio *bio)
-{
-	struct mddev *mddev, **mddevp;
-
-	mddevp = (void*)bio;
-	mddev = mddevp[-1];
-
-	bio_free(bio, mddev->bio_set);
-}
-
 struct bio *bio_alloc_mddev(gfp_t gfp_mask, int nr_iovecs,
 			    struct mddev *mddev)
 {
 	struct bio *b;
-	struct mddev **mddevp;
 
 	if (!mddev || !mddev->bio_set)
 		return bio_alloc(gfp_mask, nr_iovecs);
 
-	b = bio_alloc_bioset(gfp_mask, nr_iovecs,
-			     mddev->bio_set);
+	b = bio_alloc_bioset(gfp_mask, nr_iovecs, mddev->bio_set);
 	if (!b)
 		return NULL;
-	mddevp = (void*)b;
-	mddevp[-1] = mddev;
-	b->bi_destructor = mddev_bio_destructor;
 	return b;
 }
 EXPORT_SYMBOL_GPL(bio_alloc_mddev);
@@ -189,18 +174,14 @@ struct bio *bio_clone_mddev(struct bio *bio, gfp_t gfp_mask,
 			    struct mddev *mddev)
 {
 	struct bio *b;
-	struct mddev **mddevp;
 
 	if (!mddev || !mddev->bio_set)
 		return bio_clone(bio, gfp_mask);
 
-	b = bio_alloc_bioset(gfp_mask, bio->bi_max_vecs,
-			     mddev->bio_set);
+	b = bio_alloc_bioset(gfp_mask, bio->bi_max_vecs, mddev->bio_set);
 	if (!b)
 		return NULL;
-	mddevp = (void*)b;
-	mddevp[-1] = mddev;
-	b->bi_destructor = mddev_bio_destructor;
+
 	__bio_clone(b, bio);
 	if (bio_integrity(bio)) {
 		int ret;
@@ -5056,8 +5037,7 @@ int md_run(struct mddev *mddev)
 	}
 
 	if (mddev->bio_set == NULL)
-		mddev->bio_set = bioset_create(BIO_POOL_SIZE,
-					       sizeof(struct mddev *));
+		mddev->bio_set = bioset_create(BIO_POOL_SIZE, 0);
 
 	spin_lock(&pers_lock);
 	pers = find_pers(mddev->level, mddev->clevel);
diff --git a/drivers/target/target_core_iblock.c b/drivers/target/target_core_iblock.c
index fd47950..be65582 100644
--- a/drivers/target/target_core_iblock.c
+++ b/drivers/target/target_core_iblock.c
@@ -447,14 +447,6 @@ static void iblock_complete_cmd(struct se_cmd *cmd)
 	kfree(ibr);
 }
 
-static void iblock_bio_destructor(struct bio *bio)
-{
-	struct se_cmd *cmd = bio->bi_private;
-	struct iblock_dev *ib_dev = cmd->se_dev->dev_ptr;
-
-	bio_free(bio, ib_dev->ibd_bio_set);
-}
-
 static struct bio *
 iblock_get_bio(struct se_cmd *cmd, sector_t lba, u32 sg_num)
 {
@@ -475,8 +467,15 @@ iblock_get_bio(struct se_cmd *cmd, sector_t lba, u32 sg_num)
 	}
 
 	bio->bi_bdev = ib_dev->ibd_bd;
+<<<<<<< HEAD
 	bio->bi_private = cmd;
 	bio->bi_destructor = iblock_bio_destructor;
+||||||| merged common ancestors
+	bio->bi_private = task;
+	bio->bi_destructor = iblock_bio_destructor;
+=======
+	bio->bi_private = task;
+>>>>>>> block: Generalized bio pool freeing
 	bio->bi_end_io = &iblock_bio_done;
 	bio->bi_sector = lba;
 	return bio;
diff --git a/fs/bio.c b/fs/bio.c
index 73922ab..1c6c8b7 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -271,10 +271,6 @@ EXPORT_SYMBOL(bio_init);
  *   bio_alloc_bioset will try its own mempool to satisfy the allocation.
  *   If %__GFP_WAIT is set then we will block on the internal pool waiting
  *   for a &struct bio to become free.
- *
- *   Note that the caller must set ->bi_destructor on successful return
- *   of a bio, to do the appropriate freeing of the bio once the reference
- *   count drops to zero.
  **/
 struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
 {
@@ -289,6 +285,7 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
 	bio = p + bs->front_pad;
 
 	bio_init(bio);
+	bio->bi_pool = bs;
 
 	if (unlikely(!nr_iovecs))
 		goto out_set;
@@ -315,11 +312,6 @@ err_free:
 }
 EXPORT_SYMBOL(bio_alloc_bioset);
 
-static void bio_fs_destructor(struct bio *bio)
-{
-	bio_free(bio, fs_bio_set);
-}
-
 /**
  *	bio_alloc - allocate a new bio, memory pool backed
  *	@gfp_mask: allocation mask to use
@@ -341,12 +333,7 @@ static void bio_fs_destructor(struct bio *bio)
  */
 struct bio *bio_alloc(gfp_t gfp_mask, unsigned int nr_iovecs)
 {
-	struct bio *bio = bio_alloc_bioset(gfp_mask, nr_iovecs, fs_bio_set);
-
-	if (bio)
-		bio->bi_destructor = bio_fs_destructor;
-
-	return bio;
+	return bio_alloc_bioset(gfp_mask, nr_iovecs, fs_bio_set);
 }
 EXPORT_SYMBOL(bio_alloc);
 
@@ -422,7 +409,11 @@ void bio_put(struct bio *bio)
 	if (atomic_dec_and_test(&bio->bi_cnt)) {
 		bio_disassociate_task(bio);
 		bio->bi_next = NULL;
-		bio->bi_destructor(bio);
+
+		if (bio->bi_pool)
+			bio_free(bio, bio->bi_pool);
+		else
+			bio->bi_destructor(bio);
 	}
 }
 EXPORT_SYMBOL(bio_put);
@@ -473,12 +464,11 @@ EXPORT_SYMBOL(__bio_clone);
  */
 struct bio *bio_clone(struct bio *bio, gfp_t gfp_mask)
 {
-	struct bio *b = bio_alloc_bioset(gfp_mask, bio->bi_max_vecs, fs_bio_set);
+	struct bio *b = bio_alloc(gfp_mask, bio->bi_max_vecs);
 
 	if (!b)
 		return NULL;
 
-	b->bi_destructor = bio_fs_destructor;
 	__bio_clone(b, bio);
 
 	if (bio_integrity(bio)) {
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 0edb65d..293547e 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -80,6 +80,9 @@ struct bio {
 	struct bio_integrity_payload *bi_integrity;  /* data integrity */
 #endif
 
+	/* If bi_pool is non NULL, bi_destructor is not called */
+	struct bio_set		*bi_pool;
+
 	bio_destructor_t	*bi_destructor;	/* destructor */
 
 	/*
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v4 02/12] dm: Use bioset's front_pad for dm_rq_clone_bio_info
  2012-07-24 20:11 [PATCH v4 00/12] Block cleanups Kent Overstreet
  2012-07-24 20:11 ` [PATCH v4 01/12] block: Generalized bio pool freeing Kent Overstreet
@ 2012-07-24 20:11 ` Kent Overstreet
  2012-07-24 20:11 ` [PATCH v4 03/12] block: Add bio_reset() Kent Overstreet
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 29+ messages in thread
From: Kent Overstreet @ 2012-07-24 20:11 UTC (permalink / raw)
  To: linux-bcache, linux-kernel, dm-devel
  Cc: Kent Overstreet, tj, axboe, agk, neilb, drbd-dev, bharrosh,
	vgoyal, mpatocka, sage, yehuda

Previously, dm_rq_clone_bio_info needed to be freed by the bio's
destructor to avoid a memory leak in the blk_rq_prep_clone() error path.
This gets rid of a memory allocation and means we can kill
dm_rq_bio_destructor.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
---
 drivers/md/dm.c |   31 +++++--------------------------
 1 files changed, 5 insertions(+), 26 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 40b7735..4014696 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -92,6 +92,7 @@ struct dm_rq_target_io {
 struct dm_rq_clone_bio_info {
 	struct bio *orig;
 	struct dm_rq_target_io *tio;
+	struct bio clone;
 };
 
 union map_info *dm_get_mapinfo(struct bio *bio)
@@ -467,16 +468,6 @@ static void free_rq_tio(struct dm_rq_target_io *tio)
 	mempool_free(tio, tio->md->tio_pool);
 }
 
-static struct dm_rq_clone_bio_info *alloc_bio_info(struct mapped_device *md)
-{
-	return mempool_alloc(md->io_pool, GFP_ATOMIC);
-}
-
-static void free_bio_info(struct dm_rq_clone_bio_info *info)
-{
-	mempool_free(info, info->tio->md->io_pool);
-}
-
 static int md_in_flight(struct mapped_device *md)
 {
 	return atomic_read(&md->pending[READ]) +
@@ -1438,30 +1429,17 @@ void dm_dispatch_request(struct request *rq)
 }
 EXPORT_SYMBOL_GPL(dm_dispatch_request);
 
-static void dm_rq_bio_destructor(struct bio *bio)
-{
-	struct dm_rq_clone_bio_info *info = bio->bi_private;
-	struct mapped_device *md = info->tio->md;
-
-	free_bio_info(info);
-	bio_free(bio, md->bs);
-}
-
 static int dm_rq_bio_constructor(struct bio *bio, struct bio *bio_orig,
 				 void *data)
 {
 	struct dm_rq_target_io *tio = data;
-	struct mapped_device *md = tio->md;
-	struct dm_rq_clone_bio_info *info = alloc_bio_info(md);
-
-	if (!info)
-		return -ENOMEM;
+	struct dm_rq_clone_bio_info *info =
+		container_of(bio, struct dm_rq_clone_bio_info, clone);
 
 	info->orig = bio_orig;
 	info->tio = tio;
 	bio->bi_end_io = end_clone_bio;
 	bio->bi_private = info;
-	bio->bi_destructor = dm_rq_bio_destructor;
 
 	return 0;
 }
@@ -2696,7 +2674,8 @@ struct dm_md_mempools *dm_alloc_md_mempools(unsigned type, unsigned integrity)
 	if (!pools->tio_pool)
 		goto free_io_pool_and_out;
 
-	pools->bs = bioset_create(pool_size, 0);
+	pools->bs = bioset_create(pool_size,
+				  offsetof(struct dm_rq_clone_bio_info, orig));
 	if (!pools->bs)
 		goto free_tio_pool_and_out;
 
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v4 03/12] block: Add bio_reset()
  2012-07-24 20:11 [PATCH v4 00/12] Block cleanups Kent Overstreet
  2012-07-24 20:11 ` [PATCH v4 01/12] block: Generalized bio pool freeing Kent Overstreet
  2012-07-24 20:11 ` [PATCH v4 02/12] dm: Use bioset's front_pad for dm_rq_clone_bio_info Kent Overstreet
@ 2012-07-24 20:11 ` Kent Overstreet
  2012-07-25 11:19   ` Boaz Harrosh
  2012-07-24 20:11 ` [PATCH v4 04/12] pktcdvd: Switch to bio_kmalloc() Kent Overstreet
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 29+ messages in thread
From: Kent Overstreet @ 2012-07-24 20:11 UTC (permalink / raw)
  To: linux-bcache, linux-kernel, dm-devel
  Cc: Kent Overstreet, tj, axboe, agk, neilb, drbd-dev, bharrosh,
	vgoyal, mpatocka, sage, yehuda

Reusing bios is something that's been highly frowned upon in the past,
but driver code keeps doing it anyways. If it's going to happen anyways,
we should provide a generic method.

This'll help with getting rid of bi_destructor - drivers/block/pktcdvd.c
was open coding it, by doing a bio_init() and resetting bi_destructor.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
---
 fs/bio.c                  |   10 ++++++++++
 include/linux/bio.h       |    1 +
 include/linux/blk_types.h |    6 ++++++
 3 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/fs/bio.c b/fs/bio.c
index 1c6c8b7..252e253 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -261,6 +261,16 @@ void bio_init(struct bio *bio)
 }
 EXPORT_SYMBOL(bio_init);
 
+void bio_reset(struct bio *bio)
+{
+	/* Clear all flags below BIO_OWNS_VEC */
+	unsigned long flags = bio->bi_flags & (~0UL << BIO_OWNS_VEC);
+
+	memset(bio, 0, BIO_RESET_BYTES);
+	bio->bi_flags = flags|(1 << BIO_UPTODATE);
+}
+EXPORT_SYMBOL(bio_reset);
+
 /**
  * bio_alloc_bioset - allocate a bio for I/O
  * @gfp_mask:   the GFP_ mask given to the slab allocator
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 2643589..ba60319 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -226,6 +226,7 @@ extern void __bio_clone(struct bio *, struct bio *);
 extern struct bio *bio_clone(struct bio *, gfp_t);
 
 extern void bio_init(struct bio *);
+extern void bio_reset(struct bio *);
 
 extern int bio_add_page(struct bio *, struct page *, unsigned int,unsigned int);
 extern int bio_add_pc_page(struct request_queue *, struct bio *, struct page *,
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 293547e..40411e2 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -59,6 +59,10 @@ struct bio {
 	unsigned int		bi_seg_front_size;
 	unsigned int		bi_seg_back_size;
 
+	/*
+	 * Everything starting with bi_max_vecs will be preserved by bio_reset()
+	 */
+
 	unsigned int		bi_max_vecs;	/* max bvl_vecs we can hold */
 
 	atomic_t		bi_cnt;		/* pin count */
@@ -93,6 +97,8 @@ struct bio {
 	struct bio_vec		bi_inline_vecs[0];
 };
 
+#define BIO_RESET_BYTES		offsetof(struct bio, bi_max_vecs)
+
 /*
  * bio flags
  */
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v4 04/12] pktcdvd: Switch to bio_kmalloc()
  2012-07-24 20:11 [PATCH v4 00/12] Block cleanups Kent Overstreet
                   ` (2 preceding siblings ...)
  2012-07-24 20:11 ` [PATCH v4 03/12] block: Add bio_reset() Kent Overstreet
@ 2012-07-24 20:11 ` Kent Overstreet
  2012-07-25 11:29   ` Boaz Harrosh
  2012-07-24 20:11 ` [PATCH v4 05/12] block: Kill bi_destructor Kent Overstreet
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 29+ messages in thread
From: Kent Overstreet @ 2012-07-24 20:11 UTC (permalink / raw)
  To: linux-bcache, linux-kernel, dm-devel
  Cc: Kent Overstreet, tj, axboe, agk, neilb, drbd-dev, bharrosh,
	vgoyal, mpatocka, sage, yehuda

This is prep work for killing bi_destructor - previously, pktcdvd had
its own pkt_bio_alloc which was basically duplication bio_kmalloc(),
necessitating its own bi_destructor implementation.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
---
 drivers/block/pktcdvd.c |  115 ++++++++++++++++-------------------------------
 1 files changed, 39 insertions(+), 76 deletions(-)

diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index ba66e44..6fe693a 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -522,36 +522,38 @@ static void pkt_bio_finished(struct pktcdvd_device *pd)
 	}
 }
 
-static void pkt_bio_destructor(struct bio *bio)
+static void pkt_end_io_read(struct bio *bio, int err)
 {
-	kfree(bio->bi_io_vec);
-	kfree(bio);
-}
+	struct packet_data *pkt = bio->bi_private;
+	struct pktcdvd_device *pd = pkt->pd;
+	BUG_ON(!pd);
 
-static struct bio *pkt_bio_alloc(int nr_iovecs)
-{
-	struct bio_vec *bvl = NULL;
-	struct bio *bio;
+	VPRINTK("pkt_end_io_read: bio=%p sec0=%llx sec=%llx err=%d\n", bio,
+		(unsigned long long)pkt->sector, (unsigned long long)bio->bi_sector, err);
 
-	bio = kmalloc(sizeof(struct bio), GFP_KERNEL);
-	if (!bio)
-		goto no_bio;
-	bio_init(bio);
+	if (err)
+		atomic_inc(&pkt->io_errors);
+	if (atomic_dec_and_test(&pkt->io_wait)) {
+		atomic_inc(&pkt->run_sm);
+		wake_up(&pd->wqueue);
+	}
+	pkt_bio_finished(pd);
+}
 
-	bvl = kcalloc(nr_iovecs, sizeof(struct bio_vec), GFP_KERNEL);
-	if (!bvl)
-		goto no_bvl;
+static void pkt_end_io_packet_write(struct bio *bio, int err)
+{
+	struct packet_data *pkt = bio->bi_private;
+	struct pktcdvd_device *pd = pkt->pd;
+	BUG_ON(!pd);
 
-	bio->bi_max_vecs = nr_iovecs;
-	bio->bi_io_vec = bvl;
-	bio->bi_destructor = pkt_bio_destructor;
+	VPRINTK("pkt_end_io_packet_write: id=%d, err=%d\n", pkt->id, err);
 
-	return bio;
+	pd->stats.pkt_ended++;
 
- no_bvl:
-	kfree(bio);
- no_bio:
-	return NULL;
+	pkt_bio_finished(pd);
+	atomic_dec(&pkt->io_wait);
+	atomic_inc(&pkt->run_sm);
+	wake_up(&pd->wqueue);
 }
 
 /*
@@ -567,10 +569,13 @@ static struct packet_data *pkt_alloc_packet_data(int frames)
 		goto no_pkt;
 
 	pkt->frames = frames;
-	pkt->w_bio = pkt_bio_alloc(frames);
+	pkt->w_bio = bio_kmalloc(GFP_KERNEL, frames);
 	if (!pkt->w_bio)
 		goto no_bio;
 
+	pkt->w_bio->bi_end_io = pkt_end_io_packet_write;
+	pkt->w_bio->bi_private = pkt;
+
 	for (i = 0; i < frames / FRAMES_PER_PAGE; i++) {
 		pkt->pages[i] = alloc_page(GFP_KERNEL|__GFP_ZERO);
 		if (!pkt->pages[i])
@@ -581,9 +586,12 @@ static struct packet_data *pkt_alloc_packet_data(int frames)
 	bio_list_init(&pkt->orig_bios);
 
 	for (i = 0; i < frames; i++) {
-		struct bio *bio = pkt_bio_alloc(1);
+		struct bio *bio = bio_kmalloc(GFP_KERNEL, 1);
 		if (!bio)
 			goto no_rd_bio;
+
+		bio->bi_end_io = pkt_end_io_read;
+		bio->bi_private = pkt;
 		pkt->r_bios[i] = bio;
 	}
 
@@ -1036,40 +1044,6 @@ static void pkt_make_local_copy(struct packet_data *pkt, struct bio_vec *bvec)
 	}
 }
 
-static void pkt_end_io_read(struct bio *bio, int err)
-{
-	struct packet_data *pkt = bio->bi_private;
-	struct pktcdvd_device *pd = pkt->pd;
-	BUG_ON(!pd);
-
-	VPRINTK("pkt_end_io_read: bio=%p sec0=%llx sec=%llx err=%d\n", bio,
-		(unsigned long long)pkt->sector, (unsigned long long)bio->bi_sector, err);
-
-	if (err)
-		atomic_inc(&pkt->io_errors);
-	if (atomic_dec_and_test(&pkt->io_wait)) {
-		atomic_inc(&pkt->run_sm);
-		wake_up(&pd->wqueue);
-	}
-	pkt_bio_finished(pd);
-}
-
-static void pkt_end_io_packet_write(struct bio *bio, int err)
-{
-	struct packet_data *pkt = bio->bi_private;
-	struct pktcdvd_device *pd = pkt->pd;
-	BUG_ON(!pd);
-
-	VPRINTK("pkt_end_io_packet_write: id=%d, err=%d\n", pkt->id, err);
-
-	pd->stats.pkt_ended++;
-
-	pkt_bio_finished(pd);
-	atomic_dec(&pkt->io_wait);
-	atomic_inc(&pkt->run_sm);
-	wake_up(&pd->wqueue);
-}
-
 /*
  * Schedule reads for the holes in a packet
  */
@@ -1111,21 +1085,15 @@ static void pkt_gather_data(struct pktcdvd_device *pd, struct packet_data *pkt)
 	 * Schedule reads for missing parts of the packet.
 	 */
 	for (f = 0; f < pkt->frames; f++) {
-		struct bio_vec *vec;
-
 		int p, offset;
+
 		if (written[f])
 			continue;
+
 		bio = pkt->r_bios[f];
-		vec = bio->bi_io_vec;
-		bio_init(bio);
-		bio->bi_max_vecs = 1;
-		bio->bi_sector = pkt->sector + f * (CD_FRAMESIZE >> 9);
-		bio->bi_bdev = pd->bdev;
-		bio->bi_end_io = pkt_end_io_read;
-		bio->bi_private = pkt;
-		bio->bi_io_vec = vec;
-		bio->bi_destructor = pkt_bio_destructor;
+		bio_reset(bio);
+		bio->bi_sector	= pkt->sector + f * (CD_FRAMESIZE >> 9);
+		bio->bi_bdev	= pd->bdev;
 
 		p = (f * CD_FRAMESIZE) / PAGE_SIZE;
 		offset = (f * CD_FRAMESIZE) % PAGE_SIZE;
@@ -1418,14 +1386,9 @@ static void pkt_start_write(struct pktcdvd_device *pd, struct packet_data *pkt)
 	}
 
 	/* Start the write request */
-	bio_init(pkt->w_bio);
-	pkt->w_bio->bi_max_vecs = PACKET_MAX_SIZE;
+	bio_reset(pkt->w_bio);
 	pkt->w_bio->bi_sector = pkt->sector;
 	pkt->w_bio->bi_bdev = pd->bdev;
-	pkt->w_bio->bi_end_io = pkt_end_io_packet_write;
-	pkt->w_bio->bi_private = pkt;
-	pkt->w_bio->bi_io_vec = bvec;
-	pkt->w_bio->bi_destructor = pkt_bio_destructor;
 	for (f = 0; f < pkt->frames; f++)
 		if (!bio_add_page(pkt->w_bio, bvec[f].bv_page, CD_FRAMESIZE, bvec[f].bv_offset))
 			BUG();
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v4 05/12] block: Kill bi_destructor
  2012-07-24 20:11 [PATCH v4 00/12] Block cleanups Kent Overstreet
                   ` (3 preceding siblings ...)
  2012-07-24 20:11 ` [PATCH v4 04/12] pktcdvd: Switch to bio_kmalloc() Kent Overstreet
@ 2012-07-24 20:11 ` Kent Overstreet
  2012-07-25 11:39   ` Boaz Harrosh
  2012-07-24 20:11 ` [PATCH v4 06/12] block: Add an explicit bio flag for bios that own their bvec Kent Overstreet
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 29+ messages in thread
From: Kent Overstreet @ 2012-07-24 20:11 UTC (permalink / raw)
  To: linux-bcache, linux-kernel, dm-devel
  Cc: Kent Overstreet, tj, axboe, agk, neilb, drbd-dev, bharrosh,
	vgoyal, mpatocka, sage, yehuda

Now that we've got generic code for freeing bios allocated from bio
pools, this isn't needed anymore.

This also changes the semantics of bio_free() a bit - it now also frees
bios allocated by bio_kmalloc(). It's also no longer exported, as
without bi_destructor there should be no need for it to be called
anywhere else.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
---
 Documentation/block/biodoc.txt      |    5 -----
 block/blk-core.c                    |    2 +-
 drivers/block/drbd/drbd_main.c      |   13 +------------
 drivers/target/target_core_iblock.c |    8 --------
 fs/bio.c                            |   32 +++++++++++++++++---------------
 include/linux/bio.h                 |    2 +-
 include/linux/blk_types.h           |    3 ---
 7 files changed, 20 insertions(+), 45 deletions(-)

diff --git a/Documentation/block/biodoc.txt b/Documentation/block/biodoc.txt
index e418dc0..8df5e8e 100644
--- a/Documentation/block/biodoc.txt
+++ b/Documentation/block/biodoc.txt
@@ -465,7 +465,6 @@ struct bio {
        bio_end_io_t	*bi_end_io;  /* bi_end_io (bio) */
        atomic_t		bi_cnt;	     /* pin count: free when it hits zero */
        void             *bi_private;
-       bio_destructor_t *bi_destructor; /* bi_destructor (bio) */
 };
 
 With this multipage bio design:
@@ -647,10 +646,6 @@ for a non-clone bio. There are the 6 pools setup for different size biovecs,
 so bio_alloc(gfp_mask, nr_iovecs) will allocate a vec_list of the
 given size from these slabs.
 
-The bi_destructor() routine takes into account the possibility of the bio
-having originated from a different source (see later discussions on
-n/w to block transfers and kvec_cb)
-
 The bio_get() routine may be used to hold an extra reference on a bio prior
 to i/o submission, if the bio fields are likely to be accessed after the
 i/o is issued (since the bio may otherwise get freed in case i/o completion
diff --git a/block/blk-core.c b/block/blk-core.c
index 93eb3e4..e9058c2 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2794,7 +2794,7 @@ int blk_rq_prep_clone(struct request *rq, struct request *rq_src,
 
 free_and_out:
 	if (bio)
-		bio_free(bio, bs);
+		bio_free(bio);
 	blk_rq_unprep_clone(rq);
 
 	return -ENOMEM;
diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index 920ede2..19bf632 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -161,23 +161,12 @@ static const struct block_device_operations drbd_ops = {
 	.release = drbd_release,
 };
 
-static void bio_destructor_drbd(struct bio *bio)
-{
-	bio_free(bio, drbd_md_io_bio_set);
-}
-
 struct bio *bio_alloc_drbd(gfp_t gfp_mask)
 {
-	struct bio *bio;
-
 	if (!drbd_md_io_bio_set)
 		return bio_alloc(gfp_mask, 1);
 
-	bio = bio_alloc_bioset(gfp_mask, 1, drbd_md_io_bio_set);
-	if (!bio)
-		return NULL;
-	bio->bi_destructor = bio_destructor_drbd;
-	return bio;
+	return bio_alloc_bioset(gfp_mask, 1, drbd_md_io_bio_set);
 }
 
 #ifdef __CHECKER__
diff --git a/drivers/target/target_core_iblock.c b/drivers/target/target_core_iblock.c
index be65582..9338d0602 100644
--- a/drivers/target/target_core_iblock.c
+++ b/drivers/target/target_core_iblock.c
@@ -467,15 +467,7 @@ iblock_get_bio(struct se_cmd *cmd, sector_t lba, u32 sg_num)
 	}
 
 	bio->bi_bdev = ib_dev->ibd_bd;
-<<<<<<< HEAD
 	bio->bi_private = cmd;
-	bio->bi_destructor = iblock_bio_destructor;
-||||||| merged common ancestors
-	bio->bi_private = task;
-	bio->bi_destructor = iblock_bio_destructor;
-=======
-	bio->bi_private = task;
->>>>>>> block: Generalized bio pool freeing
 	bio->bi_end_io = &iblock_bio_done;
 	bio->bi_sector = lba;
 	return bio;
diff --git a/fs/bio.c b/fs/bio.c
index 252e253..a301071 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -56,6 +56,9 @@ static struct biovec_slab bvec_slabs[BIOVEC_NR_POOLS] __read_mostly = {
  */
 struct bio_set *fs_bio_set;
 
+/* Only used as a sentinal value */
+static struct bio_set bio_kmalloc_pool;
+
 /*
  * Our slab pool management
  */
@@ -232,10 +235,21 @@ fallback:
 	return bvl;
 }
 
-void bio_free(struct bio *bio, struct bio_set *bs)
+void bio_free(struct bio *bio)
 {
+	struct bio_set *bs = bio->bi_pool;
 	void *p;
 
+	BUG_ON(!bs);
+
+	if (bs == &bio_kmalloc_pool) {
+		/* Bio was allocated by bio_kmalloc() */
+		if (bio_integrity(bio))
+			bio_integrity_free(bio, fs_bio_set);
+		kfree(bio);
+		return;
+	}
+
 	if (bio_has_allocated_vec(bio))
 		bvec_free_bs(bs, bio->bi_io_vec, BIO_POOL_IDX(bio));
 
@@ -347,13 +361,6 @@ struct bio *bio_alloc(gfp_t gfp_mask, unsigned int nr_iovecs)
 }
 EXPORT_SYMBOL(bio_alloc);
 
-static void bio_kmalloc_destructor(struct bio *bio)
-{
-	if (bio_integrity(bio))
-		bio_integrity_free(bio, fs_bio_set);
-	kfree(bio);
-}
-
 /**
  * bio_kmalloc - allocate a bio for I/O using kmalloc()
  * @gfp_mask:   the GFP_ mask given to the slab allocator
@@ -380,7 +387,7 @@ struct bio *bio_kmalloc(gfp_t gfp_mask, unsigned int nr_iovecs)
 	bio->bi_flags |= BIO_POOL_NONE << BIO_POOL_OFFSET;
 	bio->bi_max_vecs = nr_iovecs;
 	bio->bi_io_vec = bio->bi_inline_vecs;
-	bio->bi_destructor = bio_kmalloc_destructor;
+	bio->bi_pool = &bio_kmalloc_pool;
 
 	return bio;
 }
@@ -418,12 +425,7 @@ void bio_put(struct bio *bio)
 	 */
 	if (atomic_dec_and_test(&bio->bi_cnt)) {
 		bio_disassociate_task(bio);
-		bio->bi_next = NULL;
-
-		if (bio->bi_pool)
-			bio_free(bio, bio->bi_pool);
-		else
-			bio->bi_destructor(bio);
+		bio_free(bio);
 	}
 }
 EXPORT_SYMBOL(bio_put);
diff --git a/include/linux/bio.h b/include/linux/bio.h
index ba60319..393c87e 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -216,7 +216,7 @@ extern struct bio *bio_alloc(gfp_t, unsigned int);
 extern struct bio *bio_kmalloc(gfp_t, unsigned int);
 extern struct bio *bio_alloc_bioset(gfp_t, int, struct bio_set *);
 extern void bio_put(struct bio *);
-extern void bio_free(struct bio *, struct bio_set *);
+extern void bio_free(struct bio *);
 
 extern void bio_endio(struct bio *, int);
 struct request_queue;
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 40411e2..fa45a12 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -84,11 +84,8 @@ struct bio {
 	struct bio_integrity_payload *bi_integrity;  /* data integrity */
 #endif
 
-	/* If bi_pool is non NULL, bi_destructor is not called */
 	struct bio_set		*bi_pool;
 
-	bio_destructor_t	*bi_destructor;	/* destructor */
-
 	/*
 	 * We can inline a number of vecs at the end of the bio, to avoid
 	 * double allocations for a small number of bio_vecs. This member
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v4 06/12] block: Add an explicit bio flag for bios that own their bvec
  2012-07-24 20:11 [PATCH v4 00/12] Block cleanups Kent Overstreet
                   ` (4 preceding siblings ...)
  2012-07-24 20:11 ` [PATCH v4 05/12] block: Kill bi_destructor Kent Overstreet
@ 2012-07-24 20:11 ` Kent Overstreet
  2012-07-24 20:11 ` [PATCH v4 07/12] block: Rename bio_split() -> bio_pair_split() Kent Overstreet
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 29+ messages in thread
From: Kent Overstreet @ 2012-07-24 20:11 UTC (permalink / raw)
  To: linux-bcache, linux-kernel, dm-devel
  Cc: Kent Overstreet, tj, axboe, agk, neilb, drbd-dev, bharrosh,
	vgoyal, mpatocka, sage, yehuda

This is for the new bio splitting code. When we split a bio, if the
split occured on a bvec boundry we reuse the bvec for the new bio. But
that means bio_free() can't free it, hence the explicit flag.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
---
 fs/bio.c                  |    3 ++-
 include/linux/bio.h       |    5 -----
 include/linux/blk_types.h |    1 +
 3 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/fs/bio.c b/fs/bio.c
index a301071..2631d0b 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -250,7 +250,7 @@ void bio_free(struct bio *bio)
 		return;
 	}
 
-	if (bio_has_allocated_vec(bio))
+	if (bio_flagged(bio, BIO_OWNS_VEC))
 		bvec_free_bs(bs, bio->bi_io_vec, BIO_POOL_IDX(bio));
 
 	if (bio_integrity(bio))
@@ -323,6 +323,7 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
 			goto err_free;
 
 		nr_iovecs = bvec_nr_vecs(idx);
+		bio->bi_flags |= 1 << BIO_OWNS_VEC;
 	}
 out_set:
 	bio->bi_flags |= idx << BIO_POOL_OFFSET;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 393c87e..484b96e 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -84,11 +84,6 @@ static inline void *bio_data(struct bio *bio)
 	return NULL;
 }
 
-static inline int bio_has_allocated_vec(struct bio *bio)
-{
-	return bio->bi_io_vec && bio->bi_io_vec != bio->bi_inline_vecs;
-}
-
 /*
  * will die
  */
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index fa45a12..44245af 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -111,6 +111,7 @@ struct bio {
 #define BIO_FS_INTEGRITY 9	/* fs owns integrity data, not block layer */
 #define BIO_QUIET	10	/* Make BIO Quiet */
 #define BIO_MAPPED_INTEGRITY 11/* integrity metadata has been remapped */
+#define BIO_OWNS_VEC	12	/* bio_free() should free bvec */
 #define bio_flagged(bio, flag)	((bio)->bi_flags & (1 << (flag)))
 
 /*
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v4 07/12] block: Rename bio_split() -> bio_pair_split()
  2012-07-24 20:11 [PATCH v4 00/12] Block cleanups Kent Overstreet
                   ` (5 preceding siblings ...)
  2012-07-24 20:11 ` [PATCH v4 06/12] block: Add an explicit bio flag for bios that own their bvec Kent Overstreet
@ 2012-07-24 20:11 ` Kent Overstreet
  2012-07-24 20:11 ` [PATCH v4 08/12] block: Introduce new bio_split() Kent Overstreet
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 29+ messages in thread
From: Kent Overstreet @ 2012-07-24 20:11 UTC (permalink / raw)
  To: linux-bcache, linux-kernel, dm-devel
  Cc: Kent Overstreet, tj, axboe, agk, neilb, drbd-dev, bharrosh,
	vgoyal, mpatocka, sage, yehuda

This is prep work for introducing a more general bio_split()

Signed-off-by: Kent Overstreet <koverstreet@google.com>
---
 drivers/block/drbd/drbd_req.c |    2 +-
 drivers/block/pktcdvd.c       |    2 +-
 drivers/block/rbd.c           |    3 ++-
 drivers/md/linear.c           |    2 +-
 drivers/md/raid0.c            |    6 +++---
 drivers/md/raid10.c           |    4 ++--
 fs/bio.c                      |    4 ++--
 include/linux/bio.h           |    2 +-
 8 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
index 8e93a6a..b5cfa3b 100644
--- a/drivers/block/drbd/drbd_req.c
+++ b/drivers/block/drbd/drbd_req.c
@@ -1150,7 +1150,7 @@ void drbd_make_request(struct request_queue *q, struct bio *bio)
 		const int sps = 1 << HT_SHIFT; /* sectors per slot */
 		const int mask = sps - 1;
 		const sector_t first_sectors = sps - (sect & mask);
-		bp = bio_split(bio, first_sectors);
+		bp = bio_pair_split(bio, first_sectors);
 
 		/* we need to get a "reference count" (ap_bio_cnt)
 		 * to avoid races with the disconnect/reconnect/suspend code.
diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 6fe693a..12a14c0 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -2467,7 +2467,7 @@ static void pkt_make_request(struct request_queue *q, struct bio *bio)
 		if (last_zone != zone) {
 			BUG_ON(last_zone != zone + pd->settings.size);
 			first_sectors = last_zone - bio->bi_sector;
-			bp = bio_split(bio, first_sectors);
+			bp = bio_pair_split(bio, first_sectors);
 			BUG_ON(!bp);
 			pkt_make_request(q, &bp->bio1);
 			pkt_make_request(q, &bp->bio2);
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 8f428a8..e33c224 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -732,7 +732,8 @@ static struct bio *bio_chain_clone(struct bio **old, struct bio **next,
 
 			/* split the bio. We'll release it either in the next
 			   call, or it will have to be released outside */
-			bp = bio_split(old_chain, (len - total) / SECTOR_SIZE);
+			bp = bio_pair_split(old_chain,
+					    (len - total) / SECTOR_SIZE);
 			if (!bp)
 				goto err_out;
 
diff --git a/drivers/md/linear.c b/drivers/md/linear.c
index fa211d8..e860cb9 100644
--- a/drivers/md/linear.c
+++ b/drivers/md/linear.c
@@ -314,7 +314,7 @@ static void linear_make_request(struct mddev *mddev, struct bio *bio)
 
 		rcu_read_unlock();
 
-		bp = bio_split(bio, end_sector - bio->bi_sector);
+		bp = bio_pair_split(bio, end_sector - bio->bi_sector);
 
 		linear_make_request(mddev, &bp->bio1);
 		linear_make_request(mddev, &bp->bio2);
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index de63a1f..c89c8aa 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -516,11 +516,11 @@ static void raid0_make_request(struct mddev *mddev, struct bio *bio)
 		 * refuse to split for us, so we need to split it.
 		 */
 		if (likely(is_power_of_2(chunk_sects)))
-			bp = bio_split(bio, chunk_sects - (sector &
+			bp = bio_pair_split(bio, chunk_sects - (sector &
 							   (chunk_sects-1)));
 		else
-			bp = bio_split(bio, chunk_sects -
-				       sector_div(sector, chunk_sects));
+			bp = bio_pair_split(bio, chunk_sects -
+					    sector_div(sector, chunk_sects));
 		raid0_make_request(mddev, &bp->bio1);
 		raid0_make_request(mddev, &bp->bio2);
 		bio_pair_release(bp);
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 8da6282..be75924 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1063,8 +1063,8 @@ static void make_request(struct mddev *mddev, struct bio * bio)
 		/* This is a one page bio that upper layers
 		 * refuse to split for us, so we need to split it.
 		 */
-		bp = bio_split(bio,
-			       chunk_sects - (bio->bi_sector & (chunk_sects - 1)) );
+		bp = bio_pair_split(bio,
+				    chunk_sects - (bio->bi_sector & (chunk_sects - 1)) );
 
 		/* Each of these 'make_request' calls will call 'wait_barrier'.
 		 * If the first succeeds but the second blocks due to the resync
diff --git a/fs/bio.c b/fs/bio.c
index 2631d0b..5d02aa5 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -1493,7 +1493,7 @@ static void bio_pair_end_2(struct bio *bi, int err)
 /*
  * split a bio - only worry about a bio with a single page in its iovec
  */
-struct bio_pair *bio_split(struct bio *bi, int first_sectors)
+struct bio_pair *bio_pair_split(struct bio *bi, int first_sectors)
 {
 	struct bio_pair *bp = mempool_alloc(bio_split_pool, GFP_NOIO);
 
@@ -1536,7 +1536,7 @@ struct bio_pair *bio_split(struct bio *bi, int first_sectors)
 
 	return bp;
 }
-EXPORT_SYMBOL(bio_split);
+EXPORT_SYMBOL(bio_pair_split);
 
 /**
  *      bio_sector_offset - Find hardware sector offset in bio
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 484b96e..fdcc8dc 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -201,7 +201,7 @@ struct bio_pair {
 	atomic_t			cnt;
 	int				error;
 };
-extern struct bio_pair *bio_split(struct bio *bi, int first_sectors);
+extern struct bio_pair *bio_pair_split(struct bio *bi, int first_sectors);
 extern void bio_pair_release(struct bio_pair *dbio);
 
 extern struct bio_set *bioset_create(unsigned int, unsigned int);
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v4 08/12] block: Introduce new bio_split()
  2012-07-24 20:11 [PATCH v4 00/12] Block cleanups Kent Overstreet
                   ` (6 preceding siblings ...)
  2012-07-24 20:11 ` [PATCH v4 07/12] block: Rename bio_split() -> bio_pair_split() Kent Overstreet
@ 2012-07-24 20:11 ` Kent Overstreet
  2012-07-25 11:55   ` Boaz Harrosh
  2012-07-24 20:11 ` [PATCH v4 09/12] block: Rework bio_pair_split() Kent Overstreet
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 29+ messages in thread
From: Kent Overstreet @ 2012-07-24 20:11 UTC (permalink / raw)
  To: linux-bcache, linux-kernel, dm-devel
  Cc: Kent Overstreet, tj, axboe, agk, neilb, drbd-dev, bharrosh,
	vgoyal, mpatocka, sage, yehuda

The new bio_split() can split arbitrary bios - it's not restricted to
single page bios, like the old bio_split() (previously renamed to
bio_pair_split()). It also has different semantics - it doesn't allocate
a struct bio_pair, leaving it up to the caller to handle completions.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
---
 fs/bio.c |   99 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 99 insertions(+), 0 deletions(-)

diff --git a/fs/bio.c b/fs/bio.c
index 5d02aa5..a15e121 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -1539,6 +1539,105 @@ struct bio_pair *bio_pair_split(struct bio *bi, int first_sectors)
 EXPORT_SYMBOL(bio_pair_split);
 
 /**
+ * bio_split - split a bio
+ * @bio:	bio to split
+ * @sectors:	number of sectors to split from the front of @bio
+ * @gfp:	gfp mask
+ * @bs:		bio set to allocate from
+ *
+ * Allocates and returns a new bio which represents @sectors from the start of
+ * @bio, and updates @bio to represent the remaining sectors.
+ *
+ * If bio_sectors(@bio) was less than or equal to @sectors, returns @bio
+ * unchanged.
+ *
+ * The newly allocated bio will point to @bio's bi_io_vec, if the split was on a
+ * bvec boundry; it is the caller's responsibility to ensure that @bio is not
+ * freed before the split.
+ *
+ * If bio_split() is running under generic_make_request(), it's not safe to
+ * allocate more than one bio from the same bio set. Therefore, if it is running
+ * under generic_make_request() it masks out __GFP_WAIT when doing the
+ * allocation. The caller must check for failure if there's any possibility of
+ * it being called from under generic_make_request(); it is then the caller's
+ * responsibility to retry from a safe context (by e.g. punting to workqueue).
+ */
+struct bio *bio_split(struct bio *bio, int sectors,
+		      gfp_t gfp, struct bio_set *bs)
+{
+	unsigned idx, vcnt = 0, nbytes = sectors << 9;
+	struct bio_vec *bv;
+	struct bio *ret = NULL;
+
+	BUG_ON(sectors <= 0);
+
+	/*
+	 * If we're being called from underneath generic_make_request() and we
+	 * already allocated any bios from this bio set, we risk deadlock if we
+	 * use the mempool. So instead, we possibly fail and let the caller punt
+	 * to workqueue or somesuch and retry in a safe context.
+	 */
+	if (current->bio_list)
+		gfp &= ~__GFP_WAIT;
+
+	if (sectors >= bio_sectors(bio))
+		return bio;
+
+	trace_block_split(bdev_get_queue(bio->bi_bdev), bio,
+			  bio->bi_sector + sectors);
+
+	bio_for_each_segment(bv, bio, idx) {
+		vcnt = idx - bio->bi_idx;
+
+		if (!nbytes) {
+			ret = bio_alloc_bioset(gfp, 0, bs);
+			if (!ret)
+				return NULL;
+
+			ret->bi_io_vec = bio_iovec(bio);
+			ret->bi_flags |= 1 << BIO_CLONED;
+			break;
+		} else if (nbytes < bv->bv_len) {
+			ret = bio_alloc_bioset(gfp, ++vcnt, bs);
+			if (!ret)
+				return NULL;
+
+			memcpy(ret->bi_io_vec, bio_iovec(bio),
+			       sizeof(struct bio_vec) * vcnt);
+
+			ret->bi_io_vec[vcnt - 1].bv_len = nbytes;
+			bv->bv_offset	+= nbytes;
+			bv->bv_len	-= nbytes;
+			break;
+		}
+
+		nbytes -= bv->bv_len;
+	}
+
+	ret->bi_bdev	= bio->bi_bdev;
+	ret->bi_sector	= bio->bi_sector;
+	ret->bi_size	= sectors << 9;
+	ret->bi_rw	= bio->bi_rw;
+	ret->bi_vcnt	= vcnt;
+	ret->bi_max_vecs = vcnt;
+	ret->bi_end_io	= bio->bi_end_io;
+	ret->bi_private	= bio->bi_private;
+
+	bio->bi_sector	+= sectors;
+	bio->bi_size	-= sectors << 9;
+	bio->bi_idx	 = idx;
+
+	if (bio_integrity(bio)) {
+		bio_integrity_clone(ret, bio, gfp, bs);
+		bio_integrity_trim(ret, 0, bio_sectors(ret));
+		bio_integrity_trim(bio, bio_sectors(ret), bio_sectors(bio));
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(bio_split);
+
+/**
  *      bio_sector_offset - Find hardware sector offset in bio
  *      @bio:           bio to inspect
  *      @index:         bio_vec index
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v4 09/12] block: Rework bio_pair_split()
  2012-07-24 20:11 [PATCH v4 00/12] Block cleanups Kent Overstreet
                   ` (7 preceding siblings ...)
  2012-07-24 20:11 ` [PATCH v4 08/12] block: Introduce new bio_split() Kent Overstreet
@ 2012-07-24 20:11 ` Kent Overstreet
  2012-07-25 12:03   ` Boaz Harrosh
  2012-07-24 20:11 ` [PATCH v4 10/12] block: Add bio_clone_kmalloc() Kent Overstreet
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 29+ messages in thread
From: Kent Overstreet @ 2012-07-24 20:11 UTC (permalink / raw)
  To: linux-bcache, linux-kernel, dm-devel
  Cc: Kent Overstreet, tj, axboe, agk, neilb, drbd-dev, bharrosh,
	vgoyal, mpatocka, sage, yehuda

This changes bio_pair_split() to use the new bio_split() underneath,
which gets rid of the single page bio limitation. The various callers
are fixed up for the slightly different struct bio_pair, and to remove
the unnecessary checks.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
---
 drivers/block/drbd/drbd_req.c |   16 +-------
 drivers/block/pktcdvd.c       |    4 +-
 drivers/block/rbd.c           |    7 ++--
 drivers/md/linear.c           |    4 +-
 drivers/md/raid0.c            |    6 ++--
 drivers/md/raid10.c           |   21 ++---------
 fs/bio-integrity.c            |   44 -----------------------
 fs/bio.c                      |   78 ++++++++++++----------------------------
 include/linux/bio.h           |   25 +++++--------
 9 files changed, 50 insertions(+), 155 deletions(-)

diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
index b5cfa3b..fea4a40 100644
--- a/drivers/block/drbd/drbd_req.c
+++ b/drivers/block/drbd/drbd_req.c
@@ -1123,18 +1123,6 @@ void drbd_make_request(struct request_queue *q, struct bio *bio)
 		do {
 			inc_ap_bio(mdev, 1);
 		} while (drbd_make_request_common(mdev, bio, start_time));
-		return;
-	}
-
-	/* can this bio be split generically?
-	 * Maybe add our own split-arbitrary-bios function. */
-	if (bio->bi_vcnt != 1 || bio->bi_idx != 0 || bio->bi_size > DRBD_MAX_BIO_SIZE) {
-		/* rather error out here than BUG in bio_split */
-		dev_err(DEV, "bio would need to, but cannot, be split: "
-		    "(vcnt=%u,idx=%u,size=%u,sector=%llu)\n",
-		    bio->bi_vcnt, bio->bi_idx, bio->bi_size,
-		    (unsigned long long)bio->bi_sector);
-		bio_endio(bio, -EINVAL);
 	} else {
 		/* This bio crosses some boundary, so we have to split it. */
 		struct bio_pair *bp;
@@ -1161,10 +1149,10 @@ void drbd_make_request(struct request_queue *q, struct bio *bio)
 
 		D_ASSERT(e_enr == s_enr + 1);
 
-		while (drbd_make_request_common(mdev, &bp->bio1, start_time))
+		while (drbd_make_request_common(mdev, &bp->split, start_time))
 			inc_ap_bio(mdev, 1);
 
-		while (drbd_make_request_common(mdev, &bp->bio2, start_time))
+		while (drbd_make_request_common(mdev, bio, start_time))
 			inc_ap_bio(mdev, 1);
 
 		dec_ap_bio(mdev);
diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 12a14c0..1465155 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -2469,8 +2469,8 @@ static void pkt_make_request(struct request_queue *q, struct bio *bio)
 			first_sectors = last_zone - bio->bi_sector;
 			bp = bio_pair_split(bio, first_sectors);
 			BUG_ON(!bp);
-			pkt_make_request(q, &bp->bio1);
-			pkt_make_request(q, &bp->bio2);
+			pkt_make_request(q, &bp->split);
+			pkt_make_request(q, bio);
 			bio_pair_release(bp);
 			return;
 		}
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index e33c224..692cf05 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -732,14 +732,13 @@ static struct bio *bio_chain_clone(struct bio **old, struct bio **next,
 
 			/* split the bio. We'll release it either in the next
 			   call, or it will have to be released outside */
-			bp = bio_pair_split(old_chain,
-					    (len - total) / SECTOR_SIZE);
+			bp = bio_pair_split(old_chain, (len - total) / SECTOR_SIZE);
 			if (!bp)
 				goto err_out;
 
-			__bio_clone(tmp, &bp->bio1);
+			__bio_clone(tmp, &bp->split);
 
-			*next = &bp->bio2;
+			*next = bp->orig;
 		} else {
 			__bio_clone(tmp, old_chain);
 			*next = old_chain->bi_next;
diff --git a/drivers/md/linear.c b/drivers/md/linear.c
index e860cb9..7c6cafd 100644
--- a/drivers/md/linear.c
+++ b/drivers/md/linear.c
@@ -316,8 +316,8 @@ static void linear_make_request(struct mddev *mddev, struct bio *bio)
 
 		bp = bio_pair_split(bio, end_sector - bio->bi_sector);
 
-		linear_make_request(mddev, &bp->bio1);
-		linear_make_request(mddev, &bp->bio2);
+		linear_make_request(mddev, &bp->split);
+		linear_make_request(mddev, bio);
 		bio_pair_release(bp);
 		return;
 	}
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index c89c8aa..3469adf 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -520,9 +520,9 @@ static void raid0_make_request(struct mddev *mddev, struct bio *bio)
 							   (chunk_sects-1)));
 		else
 			bp = bio_pair_split(bio, chunk_sects -
-					    sector_div(sector, chunk_sects));
-		raid0_make_request(mddev, &bp->bio1);
-		raid0_make_request(mddev, &bp->bio2);
+				       sector_div(sector, chunk_sects));
+		raid0_make_request(mddev, &bp->split);
+		raid0_make_request(mddev, bio);
 		bio_pair_release(bp);
 		return;
 	}
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index be75924..1a67a78 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1056,15 +1056,9 @@ static void make_request(struct mddev *mddev, struct bio * bio)
 		     && (conf->geo.near_copies < conf->geo.raid_disks
 			 || conf->prev.near_copies < conf->prev.raid_disks))) {
 		struct bio_pair *bp;
-		/* Sanity check -- queue functions should prevent this happening */
-		if (bio->bi_vcnt != 1 ||
-		    bio->bi_idx != 0)
-			goto bad_map;
-		/* This is a one page bio that upper layers
-		 * refuse to split for us, so we need to split it.
-		 */
+
 		bp = bio_pair_split(bio,
-				    chunk_sects - (bio->bi_sector & (chunk_sects - 1)) );
+				    chunk_sects - (bio->bi_sector & (chunk_sects - 1)));
 
 		/* Each of these 'make_request' calls will call 'wait_barrier'.
 		 * If the first succeeds but the second blocks due to the resync
@@ -1078,8 +1072,8 @@ static void make_request(struct mddev *mddev, struct bio * bio)
 		conf->nr_waiting++;
 		spin_unlock_irq(&conf->resync_lock);
 
-		make_request(mddev, &bp->bio1);
-		make_request(mddev, &bp->bio2);
+		make_request(mddev, &bp->split);
+		make_request(mddev, bio);
 
 		spin_lock_irq(&conf->resync_lock);
 		conf->nr_waiting--;
@@ -1088,13 +1082,6 @@ static void make_request(struct mddev *mddev, struct bio * bio)
 
 		bio_pair_release(bp);
 		return;
-	bad_map:
-		printk("md/raid10:%s: make_request bug: can't convert block across chunks"
-		       " or bigger than %dk %llu %d\n", mdname(mddev), chunk_sects/2,
-		       (unsigned long long)bio->bi_sector, bio->bi_size >> 10);
-
-		bio_io_error(bio);
-		return;
 	}
 
 	md_write_start(mddev, bio);
diff --git a/fs/bio-integrity.c b/fs/bio-integrity.c
index e85c04b..9ed2c44 100644
--- a/fs/bio-integrity.c
+++ b/fs/bio-integrity.c
@@ -682,50 +682,6 @@ void bio_integrity_trim(struct bio *bio, unsigned int offset,
 EXPORT_SYMBOL(bio_integrity_trim);
 
 /**
- * bio_integrity_split - Split integrity metadata
- * @bio:	Protected bio
- * @bp:		Resulting bio_pair
- * @sectors:	Offset
- *
- * Description: Splits an integrity page into a bio_pair.
- */
-void bio_integrity_split(struct bio *bio, struct bio_pair *bp, int sectors)
-{
-	struct blk_integrity *bi;
-	struct bio_integrity_payload *bip = bio->bi_integrity;
-	unsigned int nr_sectors;
-
-	if (bio_integrity(bio) == 0)
-		return;
-
-	bi = bdev_get_integrity(bio->bi_bdev);
-	BUG_ON(bi == NULL);
-	BUG_ON(bip->bip_vcnt != 1);
-
-	nr_sectors = bio_integrity_hw_sectors(bi, sectors);
-
-	bp->bio1.bi_integrity = &bp->bip1;
-	bp->bio2.bi_integrity = &bp->bip2;
-
-	bp->iv1 = bip->bip_vec[0];
-	bp->iv2 = bip->bip_vec[0];
-
-	bp->bip1.bip_vec[0] = bp->iv1;
-	bp->bip2.bip_vec[0] = bp->iv2;
-
-	bp->iv1.bv_len = sectors * bi->tuple_size;
-	bp->iv2.bv_offset += sectors * bi->tuple_size;
-	bp->iv2.bv_len -= sectors * bi->tuple_size;
-
-	bp->bip1.bip_sector = bio->bi_integrity->bip_sector;
-	bp->bip2.bip_sector = bio->bi_integrity->bip_sector + nr_sectors;
-
-	bp->bip1.bip_vcnt = bp->bip2.bip_vcnt = 1;
-	bp->bip1.bip_idx = bp->bip2.bip_idx = 0;
-}
-EXPORT_SYMBOL(bio_integrity_split);
-
-/**
  * bio_integrity_clone - Callback for cloning bios with integrity metadata
  * @bio:	New bio
  * @bio_src:	Original bio
diff --git a/fs/bio.c b/fs/bio.c
index a15e121..fa6dee4 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -37,7 +37,7 @@
  */
 #define BIO_INLINE_VECS		4
 
-static mempool_t *bio_split_pool __read_mostly;
+static struct bio_set *bio_split_pool __read_mostly;
 
 /*
  * if you change this list, also change bvec_alloc or things will
@@ -1462,77 +1462,48 @@ EXPORT_SYMBOL(bio_endio);
 void bio_pair_release(struct bio_pair *bp)
 {
 	if (atomic_dec_and_test(&bp->cnt)) {
-		struct bio *master = bp->bio1.bi_private;
+		bp->orig->bi_end_io = bp->bi_end_io;
+		bp->orig->bi_private = bp->bi_private;
 
-		bio_endio(master, bp->error);
-		mempool_free(bp, bp->bio2.bi_private);
+		bio_endio(bp->orig, 0);
+		bio_put(&bp->split);
 	}
 }
 EXPORT_SYMBOL(bio_pair_release);
 
-static void bio_pair_end_1(struct bio *bi, int err)
+static void bio_pair_end(struct bio *bio, int error)
 {
-	struct bio_pair *bp = container_of(bi, struct bio_pair, bio1);
+	struct bio_pair *bp = bio->bi_private;
 
-	if (err)
-		bp->error = err;
+	if (error)
+		clear_bit(BIO_UPTODATE, &bp->orig->bi_flags);
 
 	bio_pair_release(bp);
 }
 
-static void bio_pair_end_2(struct bio *bi, int err)
+struct bio_pair *bio_pair_split(struct bio *bio, int first_sectors)
 {
-	struct bio_pair *bp = container_of(bi, struct bio_pair, bio2);
-
-	if (err)
-		bp->error = err;
-
-	bio_pair_release(bp);
-}
+	struct bio_pair *bp;
+	struct bio *split = bio_split(bio, first_sectors,
+				      GFP_NOIO, bio_split_pool);
 
-/*
- * split a bio - only worry about a bio with a single page in its iovec
- */
-struct bio_pair *bio_pair_split(struct bio *bi, int first_sectors)
-{
-	struct bio_pair *bp = mempool_alloc(bio_split_pool, GFP_NOIO);
+	if (!split)
+		return NULL;
 
-	if (!bp)
-		return bp;
+	BUG_ON(split == bio);
 
-	trace_block_split(bdev_get_queue(bi->bi_bdev), bi,
-				bi->bi_sector + first_sectors);
+	bp = container_of(split, struct bio_pair, split);
 
-	BUG_ON(bi->bi_vcnt != 1);
-	BUG_ON(bi->bi_idx != 0);
 	atomic_set(&bp->cnt, 3);
-	bp->error = 0;
-	bp->bio1 = *bi;
-	bp->bio2 = *bi;
-	bp->bio2.bi_sector += first_sectors;
-	bp->bio2.bi_size -= first_sectors << 9;
-	bp->bio1.bi_size = first_sectors << 9;
-
-	bp->bv1 = bi->bi_io_vec[0];
-	bp->bv2 = bi->bi_io_vec[0];
-	bp->bv2.bv_offset += first_sectors << 9;
-	bp->bv2.bv_len -= first_sectors << 9;
-	bp->bv1.bv_len = first_sectors << 9;
-
-	bp->bio1.bi_io_vec = &bp->bv1;
-	bp->bio2.bi_io_vec = &bp->bv2;
-
-	bp->bio1.bi_max_vecs = 1;
-	bp->bio2.bi_max_vecs = 1;
 
-	bp->bio1.bi_end_io = bio_pair_end_1;
-	bp->bio2.bi_end_io = bio_pair_end_2;
+	bp->bi_end_io = bio->bi_end_io;
+	bp->bi_private = bio->bi_private;
 
-	bp->bio1.bi_private = bi;
-	bp->bio2.bi_private = bio_split_pool;
+	bio->bi_private = bp;
+	bio->bi_end_io = bio_pair_end;
 
-	if (bio_integrity(bi))
-		bio_integrity_split(bi, bp, first_sectors);
+	split->bi_private = bp;
+	split->bi_end_io = bio_pair_end;
 
 	return bp;
 }
@@ -1846,8 +1817,7 @@ static int __init init_bio(void)
 	if (bioset_integrity_create(fs_bio_set, BIO_POOL_SIZE))
 		panic("bio: can't create integrity pool\n");
 
-	bio_split_pool = mempool_create_kmalloc_pool(BIO_SPLIT_ENTRIES,
-						     sizeof(struct bio_pair));
+	bio_split_pool = bioset_create(BIO_POOL_SIZE, offsetof(struct bio_pair, split));
 	if (!bio_split_pool)
 		panic("bio: can't create split pool\n");
 
diff --git a/include/linux/bio.h b/include/linux/bio.h
index fdcc8dc..9720544 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -192,15 +192,17 @@ struct bio_integrity_payload {
  *   in bio2.bi_private
  */
 struct bio_pair {
-	struct bio			bio1, bio2;
-	struct bio_vec			bv1, bv2;
-#if defined(CONFIG_BLK_DEV_INTEGRITY)
-	struct bio_integrity_payload	bip1, bip2;
-	struct bio_vec			iv1, iv2;
-#endif
-	atomic_t			cnt;
-	int				error;
+	atomic_t		cnt;
+
+	bio_end_io_t		*bi_end_io;
+	void			*bi_private;
+
+	struct bio		*orig;
+	struct bio		split;
 };
+
+extern struct bio *bio_split(struct bio *bio, int sectors,
+			     gfp_t gfp, struct bio_set *bs);
 extern struct bio_pair *bio_pair_split(struct bio *bi, int first_sectors);
 extern void bio_pair_release(struct bio_pair *dbio);
 
@@ -512,7 +514,6 @@ extern int bio_integrity_prep(struct bio *);
 extern void bio_integrity_endio(struct bio *, int);
 extern void bio_integrity_advance(struct bio *, unsigned int);
 extern void bio_integrity_trim(struct bio *, unsigned int, unsigned int);
-extern void bio_integrity_split(struct bio *, struct bio_pair *, int);
 extern int bio_integrity_clone(struct bio *, struct bio *, gfp_t, struct bio_set *);
 extern int bioset_integrity_create(struct bio_set *, int);
 extern void bioset_integrity_free(struct bio_set *);
@@ -556,12 +557,6 @@ static inline int bio_integrity_clone(struct bio *bio, struct bio *bio_src,
 	return 0;
 }
 
-static inline void bio_integrity_split(struct bio *bio, struct bio_pair *bp,
-				       int sectors)
-{
-	return;
-}
-
 static inline void bio_integrity_advance(struct bio *bio,
 					 unsigned int bytes_done)
 {
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v4 10/12] block: Add bio_clone_kmalloc()
  2012-07-24 20:11 [PATCH v4 00/12] Block cleanups Kent Overstreet
                   ` (8 preceding siblings ...)
  2012-07-24 20:11 ` [PATCH v4 09/12] block: Rework bio_pair_split() Kent Overstreet
@ 2012-07-24 20:11 ` Kent Overstreet
  2012-07-25 12:05   ` Boaz Harrosh
  2012-07-24 20:11 ` [PATCH v4 11/12] block: Add bio_clone_bioset() Kent Overstreet
  2012-07-24 20:11 ` [PATCH v4 12/12] block: Only clone bio vecs that are in use Kent Overstreet
  11 siblings, 1 reply; 29+ messages in thread
From: Kent Overstreet @ 2012-07-24 20:11 UTC (permalink / raw)
  To: linux-bcache, linux-kernel, dm-devel
  Cc: Kent Overstreet, tj, axboe, agk, neilb, drbd-dev, bharrosh,
	vgoyal, mpatocka, sage, yehuda

Signed-off-by: Kent Overstreet <koverstreet@google.com>
---
 drivers/block/osdblk.c |    3 +--
 fs/bio.c               |   13 +++++++++++++
 fs/exofs/ore.c         |    5 ++---
 include/linux/bio.h    |    1 +
 4 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/drivers/block/osdblk.c b/drivers/block/osdblk.c
index 87311eb..1bbc681 100644
--- a/drivers/block/osdblk.c
+++ b/drivers/block/osdblk.c
@@ -266,11 +266,10 @@ static struct bio *bio_chain_clone(struct bio *old_chain, gfp_t gfpmask)
 	struct bio *tmp, *new_chain = NULL, *tail = NULL;
 
 	while (old_chain) {
-		tmp = bio_kmalloc(gfpmask, old_chain->bi_max_vecs);
+		tmp = bio_clone_kmalloc(old_chain, gfpmask);
 		if (!tmp)
 			goto err_out;
 
-		__bio_clone(tmp, old_chain);
 		tmp->bi_bdev = NULL;
 		gfpmask &= ~__GFP_WAIT;
 		tmp->bi_next = NULL;
diff --git a/fs/bio.c b/fs/bio.c
index fa6dee4..9d0ceb2 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -499,6 +499,19 @@ struct bio *bio_clone(struct bio *bio, gfp_t gfp_mask)
 }
 EXPORT_SYMBOL(bio_clone);
 
+struct bio *bio_clone_kmalloc(struct bio *bio, gfp_t gfp_mask)
+{
+	struct bio *b = bio_kmalloc(gfp_mask, bio->bi_max_vecs);
+
+	if (!b)
+		return NULL;
+
+	__bio_clone(b, bio);
+
+	return b;
+}
+EXPORT_SYMBOL(bio_clone_kmalloc);
+
 /**
  *	bio_get_nr_vecs		- return approx number of vecs
  *	@bdev:  I/O target
diff --git a/fs/exofs/ore.c b/fs/exofs/ore.c
index 24a49d4..a8d92fc 100644
--- a/fs/exofs/ore.c
+++ b/fs/exofs/ore.c
@@ -814,8 +814,8 @@ static int _write_mirror(struct ore_io_state *ios, int cur_comp)
 			struct bio *bio;
 
 			if (per_dev != master_dev) {
-				bio = bio_kmalloc(GFP_KERNEL,
-						  master_dev->bio->bi_max_vecs);
+				bio = bio_clone_kmalloc(master_dev->bio,
+							GFP_KERNEL);
 				if (unlikely(!bio)) {
 					ORE_DBGMSG(
 					      "Failed to allocate BIO size=%u\n",
@@ -824,7 +824,6 @@ static int _write_mirror(struct ore_io_state *ios, int cur_comp)
 					goto out;
 				}
 
-				__bio_clone(bio, master_dev->bio);
 				bio->bi_bdev = NULL;
 				bio->bi_next = NULL;
 				per_dev->offset = master_dev->offset;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 9720544..e180f1d 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -221,6 +221,7 @@ extern int bio_phys_segments(struct request_queue *, struct bio *);
 
 extern void __bio_clone(struct bio *, struct bio *);
 extern struct bio *bio_clone(struct bio *, gfp_t);
+struct bio *bio_clone_kmalloc(struct bio *, gfp_t);
 
 extern void bio_init(struct bio *);
 extern void bio_reset(struct bio *);
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v4 11/12] block: Add bio_clone_bioset()
  2012-07-24 20:11 [PATCH v4 00/12] Block cleanups Kent Overstreet
                   ` (9 preceding siblings ...)
  2012-07-24 20:11 ` [PATCH v4 10/12] block: Add bio_clone_kmalloc() Kent Overstreet
@ 2012-07-24 20:11 ` Kent Overstreet
  2012-07-24 20:11 ` [PATCH v4 12/12] block: Only clone bio vecs that are in use Kent Overstreet
  11 siblings, 0 replies; 29+ messages in thread
From: Kent Overstreet @ 2012-07-24 20:11 UTC (permalink / raw)
  To: linux-bcache, linux-kernel, dm-devel
  Cc: Kent Overstreet, tj, axboe, agk, neilb, drbd-dev, bharrosh,
	vgoyal, mpatocka, sage, yehuda

This consolidates some code, and will help in a later patch changing how
bio cloning works.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
---
 block/blk-core.c    |    8 +-------
 drivers/md/dm.c     |    4 ++--
 drivers/md/md.c     |   20 +-------------------
 fs/bio.c            |   16 ++++++++++++----
 include/linux/bio.h |    1 +
 5 files changed, 17 insertions(+), 32 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index e9058c2..10a6e08 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2768,16 +2768,10 @@ int blk_rq_prep_clone(struct request *rq, struct request *rq_src,
 	blk_rq_init(NULL, rq);
 
 	__rq_for_each_bio(bio_src, rq_src) {
-		bio = bio_alloc_bioset(gfp_mask, bio_src->bi_max_vecs, bs);
+		bio = bio_clone_bioset(bio_src, gfp_mask, bs);
 		if (!bio)
 			goto free_and_out;
 
-		__bio_clone(bio, bio_src);
-
-		if (bio_integrity(bio_src) &&
-		    bio_integrity_clone(bio, bio_src, gfp_mask, bs))
-			goto free_and_out;
-
 		if (bio_ctr && bio_ctr(bio, bio_src, data))
 			goto free_and_out;
 
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 4014696..3f3c26e 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1101,8 +1101,8 @@ static void __issue_target_request(struct clone_info *ci, struct dm_target *ti,
 	 * ci->bio->bi_max_vecs is BIO_INLINE_VECS anyway, for both flush
 	 * and discard, so no need for concern about wasted bvec allocations.
 	 */
-	clone = bio_alloc_bioset(GFP_NOIO, ci->bio->bi_max_vecs, ci->md->bs);
-	__bio_clone(clone, ci->bio);
+	clone = bio_clone_bioset(ci->bio, GFP_NOIO, ci->md->bs);
+
 	if (len) {
 		clone->bi_sector = ci->sector;
 		clone->bi_size = to_bytes(len);
diff --git a/drivers/md/md.c b/drivers/md/md.c
index f9d16dc..069c3bc 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -173,28 +173,10 @@ EXPORT_SYMBOL_GPL(bio_alloc_mddev);
 struct bio *bio_clone_mddev(struct bio *bio, gfp_t gfp_mask,
 			    struct mddev *mddev)
 {
-	struct bio *b;
-
 	if (!mddev || !mddev->bio_set)
 		return bio_clone(bio, gfp_mask);
 
-	b = bio_alloc_bioset(gfp_mask, bio->bi_max_vecs, mddev->bio_set);
-	if (!b)
-		return NULL;
-
-	__bio_clone(b, bio);
-	if (bio_integrity(bio)) {
-		int ret;
-
-		ret = bio_integrity_clone(b, bio, gfp_mask, mddev->bio_set);
-
-		if (ret < 0) {
-			bio_put(b);
-			return NULL;
-		}
-	}
-
-	return b;
+	return bio_clone_bioset(bio, gfp_mask, mddev->bio_set);
 }
 EXPORT_SYMBOL_GPL(bio_clone_mddev);
 
diff --git a/fs/bio.c b/fs/bio.c
index 9d0ceb2..7a0801d 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -469,15 +469,17 @@ void __bio_clone(struct bio *bio, struct bio *bio_src)
 EXPORT_SYMBOL(__bio_clone);
 
 /**
- *	bio_clone	-	clone a bio
+ *	bio_clone_bioset -	clone a bio
  *	@bio: bio to clone
  *	@gfp_mask: allocation priority
+ *	@bs: bio_set to allocate from
  *
  * 	Like __bio_clone, only also allocates the returned bio
  */
-struct bio *bio_clone(struct bio *bio, gfp_t gfp_mask)
+struct bio *bio_clone_bioset(struct bio *bio, gfp_t gfp_mask,
+			     struct bio_set *bs)
 {
-	struct bio *b = bio_alloc(gfp_mask, bio->bi_max_vecs);
+	struct bio *b = bio_alloc_bioset(gfp_mask, bio->bi_max_vecs, bs);
 
 	if (!b)
 		return NULL;
@@ -487,7 +489,7 @@ struct bio *bio_clone(struct bio *bio, gfp_t gfp_mask)
 	if (bio_integrity(bio)) {
 		int ret;
 
-		ret = bio_integrity_clone(b, bio, gfp_mask, fs_bio_set);
+		ret = bio_integrity_clone(b, bio, gfp_mask, bs);
 
 		if (ret < 0) {
 			bio_put(b);
@@ -497,6 +499,12 @@ struct bio *bio_clone(struct bio *bio, gfp_t gfp_mask)
 
 	return b;
 }
+EXPORT_SYMBOL(bio_clone_bioset);
+
+struct bio *bio_clone(struct bio *bio, gfp_t gfp_mask)
+{
+	return bio_clone_bioset(bio, gfp_mask, fs_bio_set);
+}
 EXPORT_SYMBOL(bio_clone);
 
 struct bio *bio_clone_kmalloc(struct bio *bio, gfp_t gfp_mask)
diff --git a/include/linux/bio.h b/include/linux/bio.h
index e180f1d..fb90584 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -220,6 +220,7 @@ struct request_queue;
 extern int bio_phys_segments(struct request_queue *, struct bio *);
 
 extern void __bio_clone(struct bio *, struct bio *);
+extern struct bio *bio_clone_bioset(struct bio *, gfp_t, struct bio_set *bs);
 extern struct bio *bio_clone(struct bio *, gfp_t);
 struct bio *bio_clone_kmalloc(struct bio *, gfp_t);
 
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v4 12/12] block: Only clone bio vecs that are in use
  2012-07-24 20:11 [PATCH v4 00/12] Block cleanups Kent Overstreet
                   ` (10 preceding siblings ...)
  2012-07-24 20:11 ` [PATCH v4 11/12] block: Add bio_clone_bioset() Kent Overstreet
@ 2012-07-24 20:11 ` Kent Overstreet
  2012-08-07  3:17   ` Muthu Kumar
  11 siblings, 1 reply; 29+ messages in thread
From: Kent Overstreet @ 2012-07-24 20:11 UTC (permalink / raw)
  To: linux-bcache, linux-kernel, dm-devel
  Cc: Kent Overstreet, tj, axboe, agk, neilb, drbd-dev, bharrosh,
	vgoyal, mpatocka, sage, yehuda

bcache creates large bios internally, and then splits them according to
the device requirements before it sends them down. If a lower level
device tries to clone the bio, and the original bio had more than
BIO_MAX_PAGES, the clone will fail unecessarily.

We can fix this by only cloning the bio vecs that are actually in use.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
---
 drivers/block/rbd.c |    2 +-
 drivers/md/dm.c     |    5 ++---
 fs/bio.c            |   13 +++++++------
 3 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 692cf05..21edfe5 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -714,7 +714,7 @@ static struct bio *bio_chain_clone(struct bio **old, struct bio **next,
 	}
 
 	while (old_chain && (total < len)) {
-		tmp = bio_kmalloc(gfpmask, old_chain->bi_max_vecs);
+		tmp = bio_kmalloc(gfpmask, bio_segments(old_chain));
 		if (!tmp)
 			goto err_out;
 
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 3f3c26e..193fb19 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1057,11 +1057,10 @@ static struct bio *clone_bio(struct bio *bio, sector_t sector,
 {
 	struct bio *clone;
 
-	clone = bio_alloc_bioset(GFP_NOIO, bio->bi_max_vecs, bs);
+	clone = bio_alloc_bioset(GFP_NOIO, bv_count, bs);
 	__bio_clone(clone, bio);
 	clone->bi_sector = sector;
-	clone->bi_idx = idx;
-	clone->bi_vcnt = idx + bv_count;
+	clone->bi_vcnt = bv_count;
 	clone->bi_size = to_bytes(len);
 	clone->bi_flags &= ~(1 << BIO_SEG_VALID);
 
diff --git a/fs/bio.c b/fs/bio.c
index 7a0801d..ec6a357 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -451,8 +451,9 @@ EXPORT_SYMBOL(bio_phys_segments);
  */
 void __bio_clone(struct bio *bio, struct bio *bio_src)
 {
-	memcpy(bio->bi_io_vec, bio_src->bi_io_vec,
-		bio_src->bi_max_vecs * sizeof(struct bio_vec));
+	memcpy(bio->bi_io_vec,
+	       bio_iovec(bio_src),
+	       bio_segments(bio_src) * sizeof(struct bio_vec));
 
 	/*
 	 * most users will be overriding ->bi_bdev with a new target,
@@ -461,10 +462,10 @@ void __bio_clone(struct bio *bio, struct bio *bio_src)
 	bio->bi_sector = bio_src->bi_sector;
 	bio->bi_bdev = bio_src->bi_bdev;
 	bio->bi_flags |= 1 << BIO_CLONED;
+	bio->bi_flags &= ~(1 << BIO_SEG_VALID);
 	bio->bi_rw = bio_src->bi_rw;
-	bio->bi_vcnt = bio_src->bi_vcnt;
+	bio->bi_vcnt = bio_segments(bio_src);
 	bio->bi_size = bio_src->bi_size;
-	bio->bi_idx = bio_src->bi_idx;
 }
 EXPORT_SYMBOL(__bio_clone);
 
@@ -479,7 +480,7 @@ EXPORT_SYMBOL(__bio_clone);
 struct bio *bio_clone_bioset(struct bio *bio, gfp_t gfp_mask,
 			     struct bio_set *bs)
 {
-	struct bio *b = bio_alloc_bioset(gfp_mask, bio->bi_max_vecs, bs);
+	struct bio *b = bio_alloc_bioset(gfp_mask, bio_segments(bio), bs);
 
 	if (!b)
 		return NULL;
@@ -509,7 +510,7 @@ EXPORT_SYMBOL(bio_clone);
 
 struct bio *bio_clone_kmalloc(struct bio *bio, gfp_t gfp_mask)
 {
-	struct bio *b = bio_kmalloc(gfp_mask, bio->bi_max_vecs);
+	struct bio *b = bio_kmalloc(gfp_mask, bio_segments(bio));
 
 	if (!b)
 		return NULL;
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 01/12] block: Generalized bio pool freeing
  2012-07-24 20:11 ` [PATCH v4 01/12] block: Generalized bio pool freeing Kent Overstreet
@ 2012-07-25 11:06   ` Boaz Harrosh
  2012-07-25 23:38     ` Kent Overstreet
  0 siblings, 1 reply; 29+ messages in thread
From: Boaz Harrosh @ 2012-07-25 11:06 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: linux-bcache, linux-kernel, dm-devel, tj, axboe, agk, neilb,
	drbd-dev, vgoyal, mpatocka, sage, yehuda

On 07/24/2012 11:11 PM, Kent Overstreet wrote:

> With the old code, when you allocate a bio from a bio pool you have to
> implement your own destructor that knows how to find the bio pool the
> bio was originally allocated from.
> 
> This adds a new field to struct bio (bi_pool) and changes
> bio_alloc_bioset() to use it. This makes various bio destructors
> unnecessary, so they're then deleted.
> 
> Signed-off-by: Kent Overstreet <koverstreet@google.com>

<snip>

> diff --git a/drivers/target/target_core_iblock.c b/drivers/target/target_core_iblock.c
> index fd47950..be65582 100644
> --- a/drivers/target/target_core_iblock.c
> +++ b/drivers/target/target_core_iblock.c
> @@ -447,14 +447,6 @@ static void iblock_complete_cmd(struct se_cmd *cmd)
>  	kfree(ibr);
>  }
>  
> -static void iblock_bio_destructor(struct bio *bio)
> -{
> -	struct se_cmd *cmd = bio->bi_private;
> -	struct iblock_dev *ib_dev = cmd->se_dev->dev_ptr;
> -
> -	bio_free(bio, ib_dev->ibd_bio_set);
> -}
> -
>  static struct bio *
>  iblock_get_bio(struct se_cmd *cmd, sector_t lba, u32 sg_num)
>  {
> @@ -475,8 +467,15 @@ iblock_get_bio(struct se_cmd *cmd, sector_t lba, u32 sg_num)
>  	}
>  
>  	bio->bi_bdev = ib_dev->ibd_bd;
> +<<<<<<< HEAD
>  	bio->bi_private = cmd;
>  	bio->bi_destructor = iblock_bio_destructor;
> +||||||| merged common ancestors
> +	bio->bi_private = task;
> +	bio->bi_destructor = iblock_bio_destructor;
> +=======
> +	bio->bi_private = task;
> +>>>>>>> block: Generalized bio pool freeing
>  	bio->bi_end_io = &iblock_bio_done;
>  	bio->bi_sector = lba;
>  	return bio;


You left out a rebase merge conflict. Did you allmodconfig compile
these patches?

Boaz

> diff --git a/fs/bio.c b/fs/bio.c
> index 73922ab..1c6c8b7 100644
> --- a/fs/bio.c
> +++ b/fs/bio.c
> @@ -271,10 +271,6 @@ EXPORT_SYMBOL(bio_init);
>   *   bio_alloc_bioset will try its own mempool to satisfy the allocation.
>   *   If %__GFP_WAIT is set then we will block on the internal pool waiting
>   *   for a &struct bio to become free.
> - *
> - *   Note that the caller must set ->bi_destructor on successful return
> - *   of a bio, to do the appropriate freeing of the bio once the reference
> - *   count drops to zero.
>   **/
>  struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
>  {
> @@ -289,6 +285,7 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
>  	bio = p + bs->front_pad;
>  
>  	bio_init(bio);
> +	bio->bi_pool = bs;
>  
>  	if (unlikely(!nr_iovecs))
>  		goto out_set;
> @@ -315,11 +312,6 @@ err_free:
>  }
>  EXPORT_SYMBOL(bio_alloc_bioset);
>  
> -static void bio_fs_destructor(struct bio *bio)
> -{
> -	bio_free(bio, fs_bio_set);
> -}
> -
>  /**
>   *	bio_alloc - allocate a new bio, memory pool backed
>   *	@gfp_mask: allocation mask to use
> @@ -341,12 +333,7 @@ static void bio_fs_destructor(struct bio *bio)
>   */
>  struct bio *bio_alloc(gfp_t gfp_mask, unsigned int nr_iovecs)
>  {
> -	struct bio *bio = bio_alloc_bioset(gfp_mask, nr_iovecs, fs_bio_set);
> -
> -	if (bio)
> -		bio->bi_destructor = bio_fs_destructor;
> -
> -	return bio;
> +	return bio_alloc_bioset(gfp_mask, nr_iovecs, fs_bio_set);
>  }
>  EXPORT_SYMBOL(bio_alloc);
>  
> @@ -422,7 +409,11 @@ void bio_put(struct bio *bio)
>  	if (atomic_dec_and_test(&bio->bi_cnt)) {
>  		bio_disassociate_task(bio);
>  		bio->bi_next = NULL;
> -		bio->bi_destructor(bio);
> +
> +		if (bio->bi_pool)
> +			bio_free(bio, bio->bi_pool);
> +		else
> +			bio->bi_destructor(bio);
>  	}
>  }
>  EXPORT_SYMBOL(bio_put);
> @@ -473,12 +464,11 @@ EXPORT_SYMBOL(__bio_clone);
>   */
>  struct bio *bio_clone(struct bio *bio, gfp_t gfp_mask)
>  {
> -	struct bio *b = bio_alloc_bioset(gfp_mask, bio->bi_max_vecs, fs_bio_set);
> +	struct bio *b = bio_alloc(gfp_mask, bio->bi_max_vecs);
>  
>  	if (!b)
>  		return NULL;
>  
> -	b->bi_destructor = bio_fs_destructor;
>  	__bio_clone(b, bio);
>  
>  	if (bio_integrity(bio)) {
> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> index 0edb65d..293547e 100644
> --- a/include/linux/blk_types.h
> +++ b/include/linux/blk_types.h
> @@ -80,6 +80,9 @@ struct bio {
>  	struct bio_integrity_payload *bi_integrity;  /* data integrity */
>  #endif
>  
> +	/* If bi_pool is non NULL, bi_destructor is not called */
> +	struct bio_set		*bi_pool;
> +
>  	bio_destructor_t	*bi_destructor;	/* destructor */
>  
>  	/*



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 03/12] block: Add bio_reset()
  2012-07-24 20:11 ` [PATCH v4 03/12] block: Add bio_reset() Kent Overstreet
@ 2012-07-25 11:19   ` Boaz Harrosh
  2012-07-25 22:56     ` Kent Overstreet
  0 siblings, 1 reply; 29+ messages in thread
From: Boaz Harrosh @ 2012-07-25 11:19 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: linux-bcache, linux-kernel, dm-devel, tj, axboe, agk, neilb,
	drbd-dev, vgoyal, mpatocka, sage, yehuda

On 07/24/2012 11:11 PM, Kent Overstreet wrote:

> Reusing bios is something that's been highly frowned upon in the past,
> but driver code keeps doing it anyways. If it's going to happen anyways,
> we should provide a generic method.
> 
> This'll help with getting rid of bi_destructor - drivers/block/pktcdvd.c
> was open coding it, by doing a bio_init() and resetting bi_destructor.
> 
> Signed-off-by: Kent Overstreet <koverstreet@google.com>
> ---
>  fs/bio.c                  |   10 ++++++++++
>  include/linux/bio.h       |    1 +
>  include/linux/blk_types.h |    6 ++++++
>  3 files changed, 17 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/bio.c b/fs/bio.c
> index 1c6c8b7..252e253 100644
> --- a/fs/bio.c
> +++ b/fs/bio.c
> @@ -261,6 +261,16 @@ void bio_init(struct bio *bio)
>  }
>  EXPORT_SYMBOL(bio_init);
>  
> +void bio_reset(struct bio *bio)
> +{
> +	/* Clear all flags below BIO_OWNS_VEC */
> +	unsigned long flags = bio->bi_flags & (~0UL << BIO_OWNS_VEC);
> +


Hey I have not seen these FLAGS thing before. Are these new?

Anyway. Please NO!!! for one you need to put a big fat comment
over at flags definitions. And two what happens when one adds
a new flag. Is it reset or not reset?

I'd rather you define a flags mask for those that need to be
preserved, at header, plus a comment that any needed-to-be-preserved
cross init flags, must be added to the mask.

Never, ever, hide such crap so deep in the code.

Boaz

> +	memset(bio, 0, BIO_RESET_BYTES);
> +	bio->bi_flags = flags|(1 << BIO_UPTODATE);
> +}
> +EXPORT_SYMBOL(bio_reset);
> +
>  /**
>   * bio_alloc_bioset - allocate a bio for I/O
>   * @gfp_mask:   the GFP_ mask given to the slab allocator
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index 2643589..ba60319 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -226,6 +226,7 @@ extern void __bio_clone(struct bio *, struct bio *);
>  extern struct bio *bio_clone(struct bio *, gfp_t);
>  
>  extern void bio_init(struct bio *);
> +extern void bio_reset(struct bio *);
>  
>  extern int bio_add_page(struct bio *, struct page *, unsigned int,unsigned int);
>  extern int bio_add_pc_page(struct request_queue *, struct bio *, struct page *,
> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> index 293547e..40411e2 100644
> --- a/include/linux/blk_types.h
> +++ b/include/linux/blk_types.h
> @@ -59,6 +59,10 @@ struct bio {
>  	unsigned int		bi_seg_front_size;
>  	unsigned int		bi_seg_back_size;
>  
> +	/*
> +	 * Everything starting with bi_max_vecs will be preserved by bio_reset()
> +	 */
> +
>  	unsigned int		bi_max_vecs;	/* max bvl_vecs we can hold */
>  
>  	atomic_t		bi_cnt;		/* pin count */
> @@ -93,6 +97,8 @@ struct bio {
>  	struct bio_vec		bi_inline_vecs[0];
>  };
>  
> +#define BIO_RESET_BYTES		offsetof(struct bio, bi_max_vecs)
> +
>  /*
>   * bio flags
>   */



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 04/12] pktcdvd: Switch to bio_kmalloc()
  2012-07-24 20:11 ` [PATCH v4 04/12] pktcdvd: Switch to bio_kmalloc() Kent Overstreet
@ 2012-07-25 11:29   ` Boaz Harrosh
  2012-07-25 23:01     ` Kent Overstreet
  0 siblings, 1 reply; 29+ messages in thread
From: Boaz Harrosh @ 2012-07-25 11:29 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: linux-bcache, linux-kernel, dm-devel, tj, axboe, agk, neilb,
	drbd-dev, vgoyal, mpatocka, sage, yehuda

On 07/24/2012 11:11 PM, Kent Overstreet wrote:

> This is prep work for killing bi_destructor - previously, pktcdvd had
> its own pkt_bio_alloc which was basically duplication bio_kmalloc(),
> necessitating its own bi_destructor implementation.
> 
> Signed-off-by: Kent Overstreet <koverstreet@google.com>
> ---
>  drivers/block/pktcdvd.c |  115 ++++++++++++++++-------------------------------
>  1 files changed, 39 insertions(+), 76 deletions(-)
> 
> diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
> index ba66e44..6fe693a 100644
> --- a/drivers/block/pktcdvd.c
> +++ b/drivers/block/pktcdvd.c
> @@ -522,36 +522,38 @@ static void pkt_bio_finished(struct pktcdvd_device *pd)
>  	}
>  }
>  
> -static void pkt_bio_destructor(struct bio *bio)
> +static void pkt_end_io_read(struct bio *bio, int err)
>  {
> -	kfree(bio->bi_io_vec);
> -	kfree(bio);
> -}


Again here, you decided to move the pkt_end_io_read && pkt_end_io_packet_write
functions from below, to above here. Which makes it impossible to find
any bugs by just reviewing the patch.

So I have not reviewed it. 

I know that you wanted so you can reference them at pkt_alloc_packet_data.
I'd use a forward reference, in this case. And a move in a next patch. But
this is just me. Perhaps the owner of this code can review it?

Cheers
Boaz

> +	struct packet_data *pkt = bio->bi_private;
> +	struct pktcdvd_device *pd = pkt->pd;
> +	BUG_ON(!pd);
>  
> -static struct bio *pkt_bio_alloc(int nr_iovecs)
> -{
> -	struct bio_vec *bvl = NULL;
> -	struct bio *bio;
> +	VPRINTK("pkt_end_io_read: bio=%p sec0=%llx sec=%llx err=%d\n", bio,
> +		(unsigned long long)pkt->sector, (unsigned long long)bio->bi_sector, err);
>  
> -	bio = kmalloc(sizeof(struct bio), GFP_KERNEL);
> -	if (!bio)
> -		goto no_bio;
> -	bio_init(bio);
> +	if (err)
> +		atomic_inc(&pkt->io_errors);
> +	if (atomic_dec_and_test(&pkt->io_wait)) {
> +		atomic_inc(&pkt->run_sm);
> +		wake_up(&pd->wqueue);
> +	}
> +	pkt_bio_finished(pd);
> +}
>  
> -	bvl = kcalloc(nr_iovecs, sizeof(struct bio_vec), GFP_KERNEL);
> -	if (!bvl)
> -		goto no_bvl;
> +static void pkt_end_io_packet_write(struct bio *bio, int err)
> +{
> +	struct packet_data *pkt = bio->bi_private;
> +	struct pktcdvd_device *pd = pkt->pd;
> +	BUG_ON(!pd);
>  
> -	bio->bi_max_vecs = nr_iovecs;
> -	bio->bi_io_vec = bvl;
> -	bio->bi_destructor = pkt_bio_destructor;
> +	VPRINTK("pkt_end_io_packet_write: id=%d, err=%d\n", pkt->id, err);
>  
> -	return bio;
> +	pd->stats.pkt_ended++;
>  
> - no_bvl:
> -	kfree(bio);
> - no_bio:
> -	return NULL;
> +	pkt_bio_finished(pd);
> +	atomic_dec(&pkt->io_wait);
> +	atomic_inc(&pkt->run_sm);
> +	wake_up(&pd->wqueue);
>  }
>  
>  /*
> @@ -567,10 +569,13 @@ static struct packet_data *pkt_alloc_packet_data(int frames)
>  		goto no_pkt;
>  
>  	pkt->frames = frames;
> -	pkt->w_bio = pkt_bio_alloc(frames);
> +	pkt->w_bio = bio_kmalloc(GFP_KERNEL, frames);
>  	if (!pkt->w_bio)
>  		goto no_bio;
>  
> +	pkt->w_bio->bi_end_io = pkt_end_io_packet_write;
> +	pkt->w_bio->bi_private = pkt;
> +
>  	for (i = 0; i < frames / FRAMES_PER_PAGE; i++) {
>  		pkt->pages[i] = alloc_page(GFP_KERNEL|__GFP_ZERO);
>  		if (!pkt->pages[i])
> @@ -581,9 +586,12 @@ static struct packet_data *pkt_alloc_packet_data(int frames)
>  	bio_list_init(&pkt->orig_bios);
>  
>  	for (i = 0; i < frames; i++) {
> -		struct bio *bio = pkt_bio_alloc(1);
> +		struct bio *bio = bio_kmalloc(GFP_KERNEL, 1);
>  		if (!bio)
>  			goto no_rd_bio;
> +
> +		bio->bi_end_io = pkt_end_io_read;
> +		bio->bi_private = pkt;
>  		pkt->r_bios[i] = bio;
>  	}
>  
> @@ -1036,40 +1044,6 @@ static void pkt_make_local_copy(struct packet_data *pkt, struct bio_vec *bvec)
>  	}
>  }
>  
> -static void pkt_end_io_read(struct bio *bio, int err)
> -{
> -	struct packet_data *pkt = bio->bi_private;
> -	struct pktcdvd_device *pd = pkt->pd;
> -	BUG_ON(!pd);
> -
> -	VPRINTK("pkt_end_io_read: bio=%p sec0=%llx sec=%llx err=%d\n", bio,
> -		(unsigned long long)pkt->sector, (unsigned long long)bio->bi_sector, err);
> -
> -	if (err)
> -		atomic_inc(&pkt->io_errors);
> -	if (atomic_dec_and_test(&pkt->io_wait)) {
> -		atomic_inc(&pkt->run_sm);
> -		wake_up(&pd->wqueue);
> -	}
> -	pkt_bio_finished(pd);
> -}
> -
> -static void pkt_end_io_packet_write(struct bio *bio, int err)
> -{
> -	struct packet_data *pkt = bio->bi_private;
> -	struct pktcdvd_device *pd = pkt->pd;
> -	BUG_ON(!pd);
> -
> -	VPRINTK("pkt_end_io_packet_write: id=%d, err=%d\n", pkt->id, err);
> -
> -	pd->stats.pkt_ended++;
> -
> -	pkt_bio_finished(pd);
> -	atomic_dec(&pkt->io_wait);
> -	atomic_inc(&pkt->run_sm);
> -	wake_up(&pd->wqueue);
> -}
> -
>  /*
>   * Schedule reads for the holes in a packet
>   */
> @@ -1111,21 +1085,15 @@ static void pkt_gather_data(struct pktcdvd_device *pd, struct packet_data *pkt)
>  	 * Schedule reads for missing parts of the packet.
>  	 */
>  	for (f = 0; f < pkt->frames; f++) {
> -		struct bio_vec *vec;
> -
>  		int p, offset;
> +
>  		if (written[f])
>  			continue;
> +
>  		bio = pkt->r_bios[f];
> -		vec = bio->bi_io_vec;
> -		bio_init(bio);
> -		bio->bi_max_vecs = 1;
> -		bio->bi_sector = pkt->sector + f * (CD_FRAMESIZE >> 9);
> -		bio->bi_bdev = pd->bdev;
> -		bio->bi_end_io = pkt_end_io_read;
> -		bio->bi_private = pkt;
> -		bio->bi_io_vec = vec;
> -		bio->bi_destructor = pkt_bio_destructor;
> +		bio_reset(bio);
> +		bio->bi_sector	= pkt->sector + f * (CD_FRAMESIZE >> 9);
> +		bio->bi_bdev	= pd->bdev;
>  
>  		p = (f * CD_FRAMESIZE) / PAGE_SIZE;
>  		offset = (f * CD_FRAMESIZE) % PAGE_SIZE;
> @@ -1418,14 +1386,9 @@ static void pkt_start_write(struct pktcdvd_device *pd, struct packet_data *pkt)
>  	}
>  
>  	/* Start the write request */
> -	bio_init(pkt->w_bio);
> -	pkt->w_bio->bi_max_vecs = PACKET_MAX_SIZE;
> +	bio_reset(pkt->w_bio);
>  	pkt->w_bio->bi_sector = pkt->sector;
>  	pkt->w_bio->bi_bdev = pd->bdev;
> -	pkt->w_bio->bi_end_io = pkt_end_io_packet_write;
> -	pkt->w_bio->bi_private = pkt;
> -	pkt->w_bio->bi_io_vec = bvec;
> -	pkt->w_bio->bi_destructor = pkt_bio_destructor;
>  	for (f = 0; f < pkt->frames; f++)
>  		if (!bio_add_page(pkt->w_bio, bvec[f].bv_page, CD_FRAMESIZE, bvec[f].bv_offset))
>  			BUG();



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 05/12] block: Kill bi_destructor
  2012-07-24 20:11 ` [PATCH v4 05/12] block: Kill bi_destructor Kent Overstreet
@ 2012-07-25 11:39   ` Boaz Harrosh
  2012-07-25 23:15     ` Kent Overstreet
  0 siblings, 1 reply; 29+ messages in thread
From: Boaz Harrosh @ 2012-07-25 11:39 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: linux-bcache, linux-kernel, dm-devel, tj, axboe, agk, neilb,
	drbd-dev, vgoyal, mpatocka, sage, yehuda

On 07/24/2012 11:11 PM, Kent Overstreet wrote:

> Now that we've got generic code for freeing bios allocated from bio
> pools, this isn't needed anymore.
> 
> This also changes the semantics of bio_free() a bit - it now also frees
> bios allocated by bio_kmalloc(). It's also no longer exported, as
> without bi_destructor there should be no need for it to be called
> anywhere else.
> 
> Signed-off-by: Kent Overstreet <koverstreet@google.com>

<snip>

> diff --git a/drivers/target/target_core_iblock.c b/drivers/target/target_core_iblock.c
> index be65582..9338d0602 100644
> --- a/drivers/target/target_core_iblock.c
> +++ b/drivers/target/target_core_iblock.c
> @@ -467,15 +467,7 @@ iblock_get_bio(struct se_cmd *cmd, sector_t lba, u32 sg_num)
>  	}
>  
>  	bio->bi_bdev = ib_dev->ibd_bd;
> -<<<<<<< HEAD
>  	bio->bi_private = cmd;
> -	bio->bi_destructor = iblock_bio_destructor;
> -||||||| merged common ancestors
> -	bio->bi_private = task;
> -	bio->bi_destructor = iblock_bio_destructor;
> -=======
> -	bio->bi_private = task;
> ->>>>>>> block: Generalized bio pool freeing
>  	bio->bi_end_io = &iblock_bio_done;
>  	bio->bi_sector = lba;
>  	return bio;


Merge conflict allmodconfig compilation please?

> diff --git a/fs/bio.c b/fs/bio.c
> index 252e253..a301071 100644
> --- a/fs/bio.c
> +++ b/fs/bio.c
> @@ -56,6 +56,9 @@ static struct biovec_slab bvec_slabs[BIOVEC_NR_POOLS] __read_mostly = {
>   */
>  struct bio_set *fs_bio_set;
>  
> +/* Only used as a sentinal value */
> +static struct bio_set bio_kmalloc_pool;
> +


Id rather if you use a define like:
#define BIO_KMALLOC_POOL ((void *)~0)

So any code access actually crashes in the bug case where
this leaks out. (And there is no actual unused storage allocated)

BTW I like this reuse of the bi_pool member as a flag as well.

Thanks
Boaz

>  /*
>   * Our slab pool management
>   */
> @@ -232,10 +235,21 @@ fallback:
>  	return bvl;
>  }
>  
> -void bio_free(struct bio *bio, struct bio_set *bs)
> +void bio_free(struct bio *bio)
>  {
> +	struct bio_set *bs = bio->bi_pool;
>  	void *p;
>  
> +	BUG_ON(!bs);
> +
> +	if (bs == &bio_kmalloc_pool) {
> +		/* Bio was allocated by bio_kmalloc() */
> +		if (bio_integrity(bio))
> +			bio_integrity_free(bio, fs_bio_set);
> +		kfree(bio);
> +		return;
> +	}
> +
>  	if (bio_has_allocated_vec(bio))
>  		bvec_free_bs(bs, bio->bi_io_vec, BIO_POOL_IDX(bio));
>  
> @@ -347,13 +361,6 @@ struct bio *bio_alloc(gfp_t gfp_mask, unsigned int nr_iovecs)
>  }
>  EXPORT_SYMBOL(bio_alloc);
>  
> -static void bio_kmalloc_destructor(struct bio *bio)
> -{
> -	if (bio_integrity(bio))
> -		bio_integrity_free(bio, fs_bio_set);
> -	kfree(bio);
> -}
> -
>  /**
>   * bio_kmalloc - allocate a bio for I/O using kmalloc()
>   * @gfp_mask:   the GFP_ mask given to the slab allocator
> @@ -380,7 +387,7 @@ struct bio *bio_kmalloc(gfp_t gfp_mask, unsigned int nr_iovecs)
>  	bio->bi_flags |= BIO_POOL_NONE << BIO_POOL_OFFSET;
>  	bio->bi_max_vecs = nr_iovecs;
>  	bio->bi_io_vec = bio->bi_inline_vecs;
> -	bio->bi_destructor = bio_kmalloc_destructor;
> +	bio->bi_pool = &bio_kmalloc_pool;
>  
>  	return bio;
>  }
> @@ -418,12 +425,7 @@ void bio_put(struct bio *bio)
>  	 */
>  	if (atomic_dec_and_test(&bio->bi_cnt)) {
>  		bio_disassociate_task(bio);
> -		bio->bi_next = NULL;
> -
> -		if (bio->bi_pool)
> -			bio_free(bio, bio->bi_pool);
> -		else
> -			bio->bi_destructor(bio);
> +		bio_free(bio);
>  	}
>  }
>  EXPORT_SYMBOL(bio_put);
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index ba60319..393c87e 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -216,7 +216,7 @@ extern struct bio *bio_alloc(gfp_t, unsigned int);
>  extern struct bio *bio_kmalloc(gfp_t, unsigned int);
>  extern struct bio *bio_alloc_bioset(gfp_t, int, struct bio_set *);
>  extern void bio_put(struct bio *);
> -extern void bio_free(struct bio *, struct bio_set *);
> +extern void bio_free(struct bio *);
>  
>  extern void bio_endio(struct bio *, int);
>  struct request_queue;
> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> index 40411e2..fa45a12 100644
> --- a/include/linux/blk_types.h
> +++ b/include/linux/blk_types.h
> @@ -84,11 +84,8 @@ struct bio {
>  	struct bio_integrity_payload *bi_integrity;  /* data integrity */
>  #endif
>  
> -	/* If bi_pool is non NULL, bi_destructor is not called */
>  	struct bio_set		*bi_pool;
>  
> -	bio_destructor_t	*bi_destructor;	/* destructor */
> -
>  	/*
>  	 * We can inline a number of vecs at the end of the bio, to avoid
>  	 * double allocations for a small number of bio_vecs. This member



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 08/12] block: Introduce new bio_split()
  2012-07-24 20:11 ` [PATCH v4 08/12] block: Introduce new bio_split() Kent Overstreet
@ 2012-07-25 11:55   ` Boaz Harrosh
  2012-07-25 23:26     ` Kent Overstreet
  0 siblings, 1 reply; 29+ messages in thread
From: Boaz Harrosh @ 2012-07-25 11:55 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: linux-bcache, linux-kernel, dm-devel, tj, axboe, agk, neilb,
	drbd-dev, vgoyal, mpatocka, sage, yehuda

On 07/24/2012 11:11 PM, Kent Overstreet wrote:

> The new bio_split() can split arbitrary bios - it's not restricted to
> single page bios, like the old bio_split() (previously renamed to
> bio_pair_split()). It also has different semantics - it doesn't allocate
> a struct bio_pair, leaving it up to the caller to handle completions.
> 
> Signed-off-by: Kent Overstreet <koverstreet@google.com>
> ---
>  fs/bio.c |   99 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 99 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/bio.c b/fs/bio.c
> index 5d02aa5..a15e121 100644
> --- a/fs/bio.c
> +++ b/fs/bio.c
> @@ -1539,6 +1539,105 @@ struct bio_pair *bio_pair_split(struct bio *bi, int first_sectors)
>  EXPORT_SYMBOL(bio_pair_split);
>  
>  /**
> + * bio_split - split a bio
> + * @bio:	bio to split
> + * @sectors:	number of sectors to split from the front of @bio
> + * @gfp:	gfp mask
> + * @bs:		bio set to allocate from
> + *
> + * Allocates and returns a new bio which represents @sectors from the start of
> + * @bio, and updates @bio to represent the remaining sectors.
> + *
> + * If bio_sectors(@bio) was less than or equal to @sectors, returns @bio
> + * unchanged.
> + *
> + * The newly allocated bio will point to @bio's bi_io_vec, if the split was on a
> + * bvec boundry; it is the caller's responsibility to ensure that @bio is not
> + * freed before the split.
> + *
> + * If bio_split() is running under generic_make_request(), it's not safe to
> + * allocate more than one bio from the same bio set. Therefore, if it is running
> + * under generic_make_request() it masks out __GFP_WAIT when doing the
> + * allocation. The caller must check for failure if there's any possibility of
> + * it being called from under generic_make_request(); it is then the caller's
> + * responsibility to retry from a safe context (by e.g. punting to workqueue).
> + */
> +struct bio *bio_split(struct bio *bio, int sectors,
> +		      gfp_t gfp, struct bio_set *bs)
> +{
> +	unsigned idx, vcnt = 0, nbytes = sectors << 9;
> +	struct bio_vec *bv;
> +	struct bio *ret = NULL;
> +
> +	BUG_ON(sectors <= 0);
> +
> +	/*
> +	 * If we're being called from underneath generic_make_request() and we
> +	 * already allocated any bios from this bio set, we risk deadlock if we
> +	 * use the mempool. So instead, we possibly fail and let the caller punt
> +	 * to workqueue or somesuch and retry in a safe context.
> +	 */
> +	if (current->bio_list)
> +		gfp &= ~__GFP_WAIT;


NACK!

If as you said above in the comment:
	if there's any possibility of it being called from under generic_make_request();
        it is then the caller's responsibility to ...

So all the comment needs to say is: 
	... caller's responsibility to not set __GFP_WAIT at gfp.

And drop this here. It is up to the caller to decide. If the caller wants he can do
"if (current->bio_list)" by his own.

This is a general purpose utility you might not know it's context.
for example with osdblk above will break.

Thanks
Boaz

> +
> +	if (sectors >= bio_sectors(bio))
> +		return bio;
> +
> +	trace_block_split(bdev_get_queue(bio->bi_bdev), bio,
> +			  bio->bi_sector + sectors);
> +
> +	bio_for_each_segment(bv, bio, idx) {
> +		vcnt = idx - bio->bi_idx;
> +
> +		if (!nbytes) {
> +			ret = bio_alloc_bioset(gfp, 0, bs);
> +			if (!ret)
> +				return NULL;
> +
> +			ret->bi_io_vec = bio_iovec(bio);
> +			ret->bi_flags |= 1 << BIO_CLONED;
> +			break;
> +		} else if (nbytes < bv->bv_len) {
> +			ret = bio_alloc_bioset(gfp, ++vcnt, bs);
> +			if (!ret)
> +				return NULL;
> +
> +			memcpy(ret->bi_io_vec, bio_iovec(bio),
> +			       sizeof(struct bio_vec) * vcnt);
> +
> +			ret->bi_io_vec[vcnt - 1].bv_len = nbytes;
> +			bv->bv_offset	+= nbytes;
> +			bv->bv_len	-= nbytes;
> +			break;
> +		}
> +
> +		nbytes -= bv->bv_len;
> +	}
> +
> +	ret->bi_bdev	= bio->bi_bdev;
> +	ret->bi_sector	= bio->bi_sector;
> +	ret->bi_size	= sectors << 9;
> +	ret->bi_rw	= bio->bi_rw;
> +	ret->bi_vcnt	= vcnt;
> +	ret->bi_max_vecs = vcnt;
> +	ret->bi_end_io	= bio->bi_end_io;
> +	ret->bi_private	= bio->bi_private;
> +
> +	bio->bi_sector	+= sectors;
> +	bio->bi_size	-= sectors << 9;
> +	bio->bi_idx	 = idx;
> +
> +	if (bio_integrity(bio)) {
> +		bio_integrity_clone(ret, bio, gfp, bs);
> +		bio_integrity_trim(ret, 0, bio_sectors(ret));
> +		bio_integrity_trim(bio, bio_sectors(ret), bio_sectors(bio));
> +	}
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(bio_split);
> +
> +/**
>   *      bio_sector_offset - Find hardware sector offset in bio
>   *      @bio:           bio to inspect
>   *      @index:         bio_vec index



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 09/12] block: Rework bio_pair_split()
  2012-07-24 20:11 ` [PATCH v4 09/12] block: Rework bio_pair_split() Kent Overstreet
@ 2012-07-25 12:03   ` Boaz Harrosh
  0 siblings, 0 replies; 29+ messages in thread
From: Boaz Harrosh @ 2012-07-25 12:03 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: linux-bcache, linux-kernel, dm-devel, tj, axboe, agk, neilb,
	drbd-dev, vgoyal, mpatocka, sage, yehuda

On 07/24/2012 11:11 PM, Kent Overstreet wrote:

> This changes bio_pair_split() to use the new bio_split() underneath,
> which gets rid of the single page bio limitation. The various callers
> are fixed up for the slightly different struct bio_pair, and to remove
> the unnecessary checks.
> 
> Signed-off-by: Kent Overstreet <koverstreet@google.com>

<snip>

> +
> +extern struct bio *bio_split(struct bio *bio, int sectors,
> +			     gfp_t gfp, struct bio_set *bs);


nit:

Just for taking pride in my work, I'd move this extern declaration
to the previous patch that actually introduces it.

Cheers
Boaz

>  extern struct bio_pair *bio_pair_split(struct bio *bi, int first_sectors);
>  extern void bio_pair_release(struct bio_pair *dbio);
>  
> @@ -512,7 +514,6 @@ extern int bio_integrity_prep(struct bio *);
>  extern void bio_integrity_endio(struct bio *, int);
>  extern void bio_integrity_advance(struct bio *, unsigned int);
>  extern void bio_integrity_trim(struct bio *, unsigned int, unsigned int);
> -extern void bio_integrity_split(struct bio *, struct bio_pair *, int);
>  extern int bio_integrity_clone(struct bio *, struct bio *, gfp_t, struct bio_set *);
>  extern int bioset_integrity_create(struct bio_set *, int);
>  extern void bioset_integrity_free(struct bio_set *);
> @@ -556,12 +557,6 @@ static inline int bio_integrity_clone(struct bio *bio, struct bio *bio_src,
>  	return 0;
>  }
>  
> -static inline void bio_integrity_split(struct bio *bio, struct bio_pair *bp,
> -				       int sectors)
> -{
> -	return;
> -}
> -
>  static inline void bio_integrity_advance(struct bio *bio,
>  					 unsigned int bytes_done)
>  {



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 10/12] block: Add bio_clone_kmalloc()
  2012-07-24 20:11 ` [PATCH v4 10/12] block: Add bio_clone_kmalloc() Kent Overstreet
@ 2012-07-25 12:05   ` Boaz Harrosh
  0 siblings, 0 replies; 29+ messages in thread
From: Boaz Harrosh @ 2012-07-25 12:05 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: linux-bcache, linux-kernel, dm-devel, tj, axboe, agk, neilb,
	drbd-dev, vgoyal, mpatocka, sage, yehuda

On 07/24/2012 11:11 PM, Kent Overstreet wrote:

ACK-by: Boaz Harrosh <bharrosh@panasas.com>

> Signed-off-by: Kent Overstreet <koverstreet@google.com>


Thanks looks nice

> ---
>  drivers/block/osdblk.c |    3 +--
>  fs/bio.c               |   13 +++++++++++++
>  fs/exofs/ore.c         |    5 ++---
>  include/linux/bio.h    |    1 +
>  4 files changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/block/osdblk.c b/drivers/block/osdblk.c
> index 87311eb..1bbc681 100644
> --- a/drivers/block/osdblk.c
> +++ b/drivers/block/osdblk.c
> @@ -266,11 +266,10 @@ static struct bio *bio_chain_clone(struct bio *old_chain, gfp_t gfpmask)
>  	struct bio *tmp, *new_chain = NULL, *tail = NULL;
>  
>  	while (old_chain) {
> -		tmp = bio_kmalloc(gfpmask, old_chain->bi_max_vecs);
> +		tmp = bio_clone_kmalloc(old_chain, gfpmask);
>  		if (!tmp)
>  			goto err_out;
>  
> -		__bio_clone(tmp, old_chain);
>  		tmp->bi_bdev = NULL;
>  		gfpmask &= ~__GFP_WAIT;
>  		tmp->bi_next = NULL;
> diff --git a/fs/bio.c b/fs/bio.c
> index fa6dee4..9d0ceb2 100644
> --- a/fs/bio.c
> +++ b/fs/bio.c
> @@ -499,6 +499,19 @@ struct bio *bio_clone(struct bio *bio, gfp_t gfp_mask)
>  }
>  EXPORT_SYMBOL(bio_clone);
>  
> +struct bio *bio_clone_kmalloc(struct bio *bio, gfp_t gfp_mask)
> +{
> +	struct bio *b = bio_kmalloc(gfp_mask, bio->bi_max_vecs);
> +
> +	if (!b)
> +		return NULL;
> +
> +	__bio_clone(b, bio);
> +
> +	return b;
> +}
> +EXPORT_SYMBOL(bio_clone_kmalloc);
> +
>  /**
>   *	bio_get_nr_vecs		- return approx number of vecs
>   *	@bdev:  I/O target
> diff --git a/fs/exofs/ore.c b/fs/exofs/ore.c
> index 24a49d4..a8d92fc 100644
> --- a/fs/exofs/ore.c
> +++ b/fs/exofs/ore.c
> @@ -814,8 +814,8 @@ static int _write_mirror(struct ore_io_state *ios, int cur_comp)
>  			struct bio *bio;
>  
>  			if (per_dev != master_dev) {
> -				bio = bio_kmalloc(GFP_KERNEL,
> -						  master_dev->bio->bi_max_vecs);
> +				bio = bio_clone_kmalloc(master_dev->bio,
> +							GFP_KERNEL);
>  				if (unlikely(!bio)) {
>  					ORE_DBGMSG(
>  					      "Failed to allocate BIO size=%u\n",
> @@ -824,7 +824,6 @@ static int _write_mirror(struct ore_io_state *ios, int cur_comp)
>  					goto out;
>  				}
>  
> -				__bio_clone(bio, master_dev->bio);
>  				bio->bi_bdev = NULL;
>  				bio->bi_next = NULL;
>  				per_dev->offset = master_dev->offset;
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index 9720544..e180f1d 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -221,6 +221,7 @@ extern int bio_phys_segments(struct request_queue *, struct bio *);
>  
>  extern void __bio_clone(struct bio *, struct bio *);
>  extern struct bio *bio_clone(struct bio *, gfp_t);
> +struct bio *bio_clone_kmalloc(struct bio *, gfp_t);
>  
>  extern void bio_init(struct bio *);
>  extern void bio_reset(struct bio *);



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 03/12] block: Add bio_reset()
  2012-07-25 11:19   ` Boaz Harrosh
@ 2012-07-25 22:56     ` Kent Overstreet
  0 siblings, 0 replies; 29+ messages in thread
From: Kent Overstreet @ 2012-07-25 22:56 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: linux-bcache, linux-kernel, dm-devel, tj, axboe, agk, neilb,
	drbd-dev, vgoyal, mpatocka, sage, yehuda

On Wed, Jul 25, 2012 at 02:19:27PM +0300, Boaz Harrosh wrote:
> On 07/24/2012 11:11 PM, Kent Overstreet wrote:
> 
> > Reusing bios is something that's been highly frowned upon in the past,
> > but driver code keeps doing it anyways. If it's going to happen anyways,
> > we should provide a generic method.
> > 
> > This'll help with getting rid of bi_destructor - drivers/block/pktcdvd.c
> > was open coding it, by doing a bio_init() and resetting bi_destructor.
> > 
> > Signed-off-by: Kent Overstreet <koverstreet@google.com>
> > ---
> >  fs/bio.c                  |   10 ++++++++++
> >  include/linux/bio.h       |    1 +
> >  include/linux/blk_types.h |    6 ++++++
> >  3 files changed, 17 insertions(+), 0 deletions(-)
> > 
> > diff --git a/fs/bio.c b/fs/bio.c
> > index 1c6c8b7..252e253 100644
> > --- a/fs/bio.c
> > +++ b/fs/bio.c
> > @@ -261,6 +261,16 @@ void bio_init(struct bio *bio)
> >  }
> >  EXPORT_SYMBOL(bio_init);
> >  
> > +void bio_reset(struct bio *bio)
> > +{
> > +	/* Clear all flags below BIO_OWNS_VEC */
> > +	unsigned long flags = bio->bi_flags & (~0UL << BIO_OWNS_VEC);
> > +
> 
> 
> Hey I have not seen these FLAGS thing before. Are these new?
> 
> Anyway. Please NO!!! for one you need to put a big fat comment
> over at flags definitions. And two what happens when one adds
> a new flag. Is it reset or not reset?
> 
> I'd rather you define a flags mask for those that need to be
> preserved, at header, plus a comment that any needed-to-be-preserved
> cross init flags, must be added to the mask.

Yeah, that's a much better idea. Do you think this is clear enough?

diff --git a/fs/bio.c b/fs/bio.c
index 1c6c8b7..c7f3442 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -261,6 +261,15 @@ void bio_init(struct bio *bio)
 }
 EXPORT_SYMBOL(bio_init);
 
+void bio_reset(struct bio *bio)
+{
+	unsigned long flags = bio->bi_flags & (~0UL << BIO_RESET_BITS);
+
+	memset(bio, 0, BIO_RESET_BYTES);
+	bio->bi_flags = flags|(1 << BIO_UPTODATE);
+}
+EXPORT_SYMBOL(bio_reset);
+
 /**
  * bio_alloc_bioset - allocate a bio for I/O
  * @gfp_mask:   the GFP_ mask given to the slab allocator
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 2643589..ba60319 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -226,6 +226,7 @@ extern void __bio_clone(struct bio *, struct bio *);
 extern struct bio *bio_clone(struct bio *, gfp_t);
 
 extern void bio_init(struct bio *);
+extern void bio_reset(struct bio *);
 
 extern int bio_add_page(struct bio *, struct page *, unsigned int,unsigned int);
 extern int bio_add_pc_page(struct request_queue *, struct bio *, struct page *,
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 293547e..769799f 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -59,6 +59,10 @@ struct bio {
 	unsigned int		bi_seg_front_size;
 	unsigned int		bi_seg_back_size;
 
+	/*
+	 * Everything starting with bi_max_vecs will be preserved by bio_reset()
+	 */
+
 	unsigned int		bi_max_vecs;	/* max bvl_vecs we can hold */
 
 	atomic_t		bi_cnt;		/* pin count */
@@ -93,6 +97,8 @@ struct bio {
 	struct bio_vec		bi_inline_vecs[0];
 };
 
+#define BIO_RESET_BYTES		offsetof(struct bio, bi_max_vecs)
+
 /*
  * bio flags
  */
@@ -108,6 +114,9 @@ struct bio {
 #define BIO_FS_INTEGRITY 9	/* fs owns integrity data, not block layer */
 #define BIO_QUIET	10	/* Make BIO Quiet */
 #define BIO_MAPPED_INTEGRITY 11/* integrity metadata has been remapped */
+
+#define BIO_RESET_BITS 12	/* Flags starting here get preserved by bio_reset() */
+
 #define bio_flagged(bio, flag)	((bio)->bi_flags & (1 << (flag)))
 
 /*

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 04/12] pktcdvd: Switch to bio_kmalloc()
  2012-07-25 11:29   ` Boaz Harrosh
@ 2012-07-25 23:01     ` Kent Overstreet
  0 siblings, 0 replies; 29+ messages in thread
From: Kent Overstreet @ 2012-07-25 23:01 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: linux-bcache, linux-kernel, dm-devel, tj, axboe, agk, neilb,
	drbd-dev, vgoyal, mpatocka, sage, yehuda

On Wed, Jul 25, 2012 at 02:29:59PM +0300, Boaz Harrosh wrote:
> On 07/24/2012 11:11 PM, Kent Overstreet wrote:
> 
> > This is prep work for killing bi_destructor - previously, pktcdvd had
> > its own pkt_bio_alloc which was basically duplication bio_kmalloc(),
> > necessitating its own bi_destructor implementation.
> > 
> > Signed-off-by: Kent Overstreet <koverstreet@google.com>
> > ---
> >  drivers/block/pktcdvd.c |  115 ++++++++++++++++-------------------------------
> >  1 files changed, 39 insertions(+), 76 deletions(-)
> > 
> > diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
> > index ba66e44..6fe693a 100644
> > --- a/drivers/block/pktcdvd.c
> > +++ b/drivers/block/pktcdvd.c
> > @@ -522,36 +522,38 @@ static void pkt_bio_finished(struct pktcdvd_device *pd)
> >  	}
> >  }
> >  
> > -static void pkt_bio_destructor(struct bio *bio)
> > +static void pkt_end_io_read(struct bio *bio, int err)
> >  {
> > -	kfree(bio->bi_io_vec);
> > -	kfree(bio);
> > -}
> 
> 
> Again here, you decided to move the pkt_end_io_read && pkt_end_io_packet_write
> functions from below, to above here. Which makes it impossible to find
> any bugs by just reviewing the patch.
> 
> So I have not reviewed it. 
> 
> I know that you wanted so you can reference them at pkt_alloc_packet_data.
> I'd use a forward reference, in this case. And a move in a next patch. But
> this is just me. Perhaps the owner of this code can review it?

Yeah, I tend to be pretty anal about unnecessary forward declarations.
Overly anal in this case, I suppose. Here's a better patch:


commit 88cb170314bfa6cc90af37e433c07927b3b79ed2
Author: Kent Overstreet <koverstreet@google.com>
Date:   Wed Jul 25 16:00:00 2012 -0700

    pktcdvd: Switch to bio_kmalloc()
    
    This is prep work for killing bi_destructor - previously, pktcdvd had
    its own pkt_bio_alloc which was basically duplication bio_kmalloc(),
    necessitating its own bi_destructor implementation.
    
    Signed-off-by: Kent Overstreet <koverstreet@google.com>

diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index ba66e44..ae55f08 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -101,6 +101,8 @@ static struct dentry	*pkt_debugfs_root = NULL; /* /sys/kernel/debug/pktcdvd */
 static int pkt_setup_dev(dev_t dev, dev_t* pkt_dev);
 static int pkt_remove_dev(dev_t pkt_dev);
 static int pkt_seq_show(struct seq_file *m, void *p);
+static void pkt_end_io_read(struct bio *bio, int err);
+static void pkt_end_io_packet_write(struct bio *bio, int err);
 
 
 
@@ -522,38 +524,6 @@ static void pkt_bio_finished(struct pktcdvd_device *pd)
 	}
 }
 
-static void pkt_bio_destructor(struct bio *bio)
-{
-	kfree(bio->bi_io_vec);
-	kfree(bio);
-}
-
-static struct bio *pkt_bio_alloc(int nr_iovecs)
-{
-	struct bio_vec *bvl = NULL;
-	struct bio *bio;
-
-	bio = kmalloc(sizeof(struct bio), GFP_KERNEL);
-	if (!bio)
-		goto no_bio;
-	bio_init(bio);
-
-	bvl = kcalloc(nr_iovecs, sizeof(struct bio_vec), GFP_KERNEL);
-	if (!bvl)
-		goto no_bvl;
-
-	bio->bi_max_vecs = nr_iovecs;
-	bio->bi_io_vec = bvl;
-	bio->bi_destructor = pkt_bio_destructor;
-
-	return bio;
-
- no_bvl:
-	kfree(bio);
- no_bio:
-	return NULL;
-}
-
 /*
  * Allocate a packet_data struct
  */
@@ -567,10 +537,13 @@ static struct packet_data *pkt_alloc_packet_data(int frames)
 		goto no_pkt;
 
 	pkt->frames = frames;
-	pkt->w_bio = pkt_bio_alloc(frames);
+	pkt->w_bio = bio_kmalloc(GFP_KERNEL, frames);
 	if (!pkt->w_bio)
 		goto no_bio;
 
+	pkt->w_bio->bi_end_io = pkt_end_io_packet_write;
+	pkt->w_bio->bi_private = pkt;
+
 	for (i = 0; i < frames / FRAMES_PER_PAGE; i++) {
 		pkt->pages[i] = alloc_page(GFP_KERNEL|__GFP_ZERO);
 		if (!pkt->pages[i])
@@ -581,9 +554,12 @@ static struct packet_data *pkt_alloc_packet_data(int frames)
 	bio_list_init(&pkt->orig_bios);
 
 	for (i = 0; i < frames; i++) {
-		struct bio *bio = pkt_bio_alloc(1);
+		struct bio *bio = bio_kmalloc(GFP_KERNEL, 1);
 		if (!bio)
 			goto no_rd_bio;
+
+		bio->bi_end_io = pkt_end_io_read;
+		bio->bi_private = pkt;
 		pkt->r_bios[i] = bio;
 	}
 
@@ -1111,21 +1087,15 @@ static void pkt_gather_data(struct pktcdvd_device *pd, struct packet_data *pkt)
 	 * Schedule reads for missing parts of the packet.
 	 */
 	for (f = 0; f < pkt->frames; f++) {
-		struct bio_vec *vec;
-
 		int p, offset;
+
 		if (written[f])
 			continue;
+
 		bio = pkt->r_bios[f];
-		vec = bio->bi_io_vec;
-		bio_init(bio);
-		bio->bi_max_vecs = 1;
-		bio->bi_sector = pkt->sector + f * (CD_FRAMESIZE >> 9);
-		bio->bi_bdev = pd->bdev;
-		bio->bi_end_io = pkt_end_io_read;
-		bio->bi_private = pkt;
-		bio->bi_io_vec = vec;
-		bio->bi_destructor = pkt_bio_destructor;
+		bio_reset(bio);
+		bio->bi_sector	= pkt->sector + f * (CD_FRAMESIZE >> 9);
+		bio->bi_bdev	= pd->bdev;
 
 		p = (f * CD_FRAMESIZE) / PAGE_SIZE;
 		offset = (f * CD_FRAMESIZE) % PAGE_SIZE;
@@ -1418,14 +1388,9 @@ static void pkt_start_write(struct pktcdvd_device *pd, struct packet_data *pkt)
 	}
 
 	/* Start the write request */
-	bio_init(pkt->w_bio);
-	pkt->w_bio->bi_max_vecs = PACKET_MAX_SIZE;
+	bio_reset(pkt->w_bio);
 	pkt->w_bio->bi_sector = pkt->sector;
 	pkt->w_bio->bi_bdev = pd->bdev;
-	pkt->w_bio->bi_end_io = pkt_end_io_packet_write;
-	pkt->w_bio->bi_private = pkt;
-	pkt->w_bio->bi_io_vec = bvec;
-	pkt->w_bio->bi_destructor = pkt_bio_destructor;
 	for (f = 0; f < pkt->frames; f++)
 		if (!bio_add_page(pkt->w_bio, bvec[f].bv_page, CD_FRAMESIZE, bvec[f].bv_offset))
 			BUG();

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 05/12] block: Kill bi_destructor
  2012-07-25 11:39   ` Boaz Harrosh
@ 2012-07-25 23:15     ` Kent Overstreet
  0 siblings, 0 replies; 29+ messages in thread
From: Kent Overstreet @ 2012-07-25 23:15 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: linux-bcache, linux-kernel, dm-devel, tj, axboe, agk, neilb,
	drbd-dev, vgoyal, mpatocka, sage, yehuda

On Wed, Jul 25, 2012 at 02:39:57PM +0300, Boaz Harrosh wrote:
> On 07/24/2012 11:11 PM, Kent Overstreet wrote:
> 
> > Now that we've got generic code for freeing bios allocated from bio
> > pools, this isn't needed anymore.
> > 
> > This also changes the semantics of bio_free() a bit - it now also frees
> > bios allocated by bio_kmalloc(). It's also no longer exported, as
> > without bi_destructor there should be no need for it to be called
> > anywhere else.
> > 
> > Signed-off-by: Kent Overstreet <koverstreet@google.com>
> 
> <snip>
> 
> > diff --git a/drivers/target/target_core_iblock.c b/drivers/target/target_core_iblock.c
> > index be65582..9338d0602 100644
> > --- a/drivers/target/target_core_iblock.c
> > +++ b/drivers/target/target_core_iblock.c
> > @@ -467,15 +467,7 @@ iblock_get_bio(struct se_cmd *cmd, sector_t lba, u32 sg_num)
> >  	}
> >  
> >  	bio->bi_bdev = ib_dev->ibd_bd;
> > -<<<<<<< HEAD
> >  	bio->bi_private = cmd;
> > -	bio->bi_destructor = iblock_bio_destructor;
> > -||||||| merged common ancestors
> > -	bio->bi_private = task;
> > -	bio->bi_destructor = iblock_bio_destructor;
> > -=======
> > -	bio->bi_private = task;
> > ->>>>>>> block: Generalized bio pool freeing
> >  	bio->bi_end_io = &iblock_bio_done;
> >  	bio->bi_sector = lba;
> >  	return bio;
> 
> 
> Merge conflict allmodconfig compilation please?
> 
> > diff --git a/fs/bio.c b/fs/bio.c
> > index 252e253..a301071 100644
> > --- a/fs/bio.c
> > +++ b/fs/bio.c
> > @@ -56,6 +56,9 @@ static struct biovec_slab bvec_slabs[BIOVEC_NR_POOLS] __read_mostly = {
> >   */
> >  struct bio_set *fs_bio_set;
> >  
> > +/* Only used as a sentinal value */
> > +static struct bio_set bio_kmalloc_pool;
> > +
> 
> 
> Id rather if you use a define like:
> #define BIO_KMALLOC_POOL ((void *)~0)
> 
> So any code access actually crashes in the bug case where
> this leaks out. (And there is no actual unused storage allocated)

I kind of prefer having a sentinal value that can't be used for anything
else, but it doesn't really matter if there's only ever going to be one
sentinal value.

> BTW I like this reuse of the bi_pool member as a flag as well.

Yeah me too, this way bi_pool always tells you how to free the bio.

diff --git a/fs/bio.c b/fs/bio.c
index c7f3442..ebc7309 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -56,6 +56,8 @@ static struct biovec_slab bvec_slabs[BIOVEC_NR_POOLS] __read_mostly = {
  */
 struct bio_set *fs_bio_set;
 
+#define BIO_KMALLOC_POOL ((void *) ~0)
+
 /*
  * Our slab pool management
  */
@@ -232,10 +234,21 @@ fallback:
 	return bvl;
 }
 
-void bio_free(struct bio *bio, struct bio_set *bs)
+void bio_free(struct bio *bio)
 {
+	struct bio_set *bs = bio->bi_pool;
 	void *p;
 
+	BUG_ON(!bs);
+
+	if (bs == BIO_KMALLOC_POOL) {
+		/* Bio was allocated by bio_kmalloc() */
+		if (bio_integrity(bio))
+			bio_integrity_free(bio, fs_bio_set);
+		kfree(bio);
+		return;
+	}
+
 	if (bio_has_allocated_vec(bio))
 		bvec_free_bs(bs, bio->bi_io_vec, BIO_POOL_IDX(bio));
 
@@ -346,13 +359,6 @@ struct bio *bio_alloc(gfp_t gfp_mask, unsigned int nr_iovecs)
 }
 EXPORT_SYMBOL(bio_alloc);
 
-static void bio_kmalloc_destructor(struct bio *bio)
-{
-	if (bio_integrity(bio))
-		bio_integrity_free(bio, fs_bio_set);
-	kfree(bio);
-}
-
 /**
  * bio_kmalloc - allocate a bio for I/O using kmalloc()
  * @gfp_mask:   the GFP_ mask given to the slab allocator
@@ -379,7 +385,7 @@ struct bio *bio_kmalloc(gfp_t gfp_mask, unsigned int nr_iovecs)
 	bio->bi_flags |= BIO_POOL_NONE << BIO_POOL_OFFSET;
 	bio->bi_max_vecs = nr_iovecs;
 	bio->bi_io_vec = bio->bi_inline_vecs;
-	bio->bi_destructor = bio_kmalloc_destructor;
+	bio->bi_pool = BIO_KMALLOC_POOL;
 
 	return bio;
 }
@@ -417,12 +423,7 @@ void bio_put(struct bio *bio)
 	 */
 	if (atomic_dec_and_test(&bio->bi_cnt)) {
 		bio_disassociate_task(bio);
-		bio->bi_next = NULL;
-
-		if (bio->bi_pool)
-			bio_free(bio, bio->bi_pool);
-		else
-			bio->bi_destructor(bio);
+		bio_free(bio);
 	}
 }
 EXPORT_SYMBOL(bio_put);
diff --git a/include/linux/bio.h b/include/linux/bio.h
index ba60319..393c87e 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -216,7 +216,7 @@ extern struct bio *bio_alloc(gfp_t, unsigned int);
 extern struct bio *bio_kmalloc(gfp_t, unsigned int);
 extern struct bio *bio_alloc_bioset(gfp_t, int, struct bio_set *);
 extern void bio_put(struct bio *);
-extern void bio_free(struct bio *, struct bio_set *);
+extern void bio_free(struct bio *);
 
 extern void bio_endio(struct bio *, int);
 struct request_queue;
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 769799f..4bd8685 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -84,11 +84,8 @@ struct bio {
 	struct bio_integrity_payload *bi_integrity;  /* data integrity */
 #endif
 
-	/* If bi_pool is non NULL, bi_destructor is not called */
 	struct bio_set		*bi_pool;
 
-	bio_destructor_t	*bi_destructor;	/* destructor */
-
 	/*
 	 * We can inline a number of vecs at the end of the bio, to avoid
 	 * double allocations for a small number of bio_vecs. This member

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 08/12] block: Introduce new bio_split()
  2012-07-25 11:55   ` Boaz Harrosh
@ 2012-07-25 23:26     ` Kent Overstreet
  2012-07-27  0:50       ` [PATCH] A possible deadlock with stacked devices (was: [PATCH v4 08/12] block: Introduce new bio_split()) Mikulas Patocka
  0 siblings, 1 reply; 29+ messages in thread
From: Kent Overstreet @ 2012-07-25 23:26 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: linux-bcache, linux-kernel, dm-devel, tj, axboe, agk, neilb,
	drbd-dev, vgoyal, mpatocka, sage, yehuda

On Wed, Jul 25, 2012 at 02:55:40PM +0300, Boaz Harrosh wrote:
> On 07/24/2012 11:11 PM, Kent Overstreet wrote:
> 
> > The new bio_split() can split arbitrary bios - it's not restricted to
> > single page bios, like the old bio_split() (previously renamed to
> > bio_pair_split()). It also has different semantics - it doesn't allocate
> > a struct bio_pair, leaving it up to the caller to handle completions.
> > 
> > Signed-off-by: Kent Overstreet <koverstreet@google.com>
> > ---
> >  fs/bio.c |   99 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 files changed, 99 insertions(+), 0 deletions(-)
> > 
> > diff --git a/fs/bio.c b/fs/bio.c
> > index 5d02aa5..a15e121 100644
> > --- a/fs/bio.c
> > +++ b/fs/bio.c
> > @@ -1539,6 +1539,105 @@ struct bio_pair *bio_pair_split(struct bio *bi, int first_sectors)
> >  EXPORT_SYMBOL(bio_pair_split);
> >  
> >  /**
> > + * bio_split - split a bio
> > + * @bio:	bio to split
> > + * @sectors:	number of sectors to split from the front of @bio
> > + * @gfp:	gfp mask
> > + * @bs:		bio set to allocate from
> > + *
> > + * Allocates and returns a new bio which represents @sectors from the start of
> > + * @bio, and updates @bio to represent the remaining sectors.
> > + *
> > + * If bio_sectors(@bio) was less than or equal to @sectors, returns @bio
> > + * unchanged.
> > + *
> > + * The newly allocated bio will point to @bio's bi_io_vec, if the split was on a
> > + * bvec boundry; it is the caller's responsibility to ensure that @bio is not
> > + * freed before the split.
> > + *
> > + * If bio_split() is running under generic_make_request(), it's not safe to
> > + * allocate more than one bio from the same bio set. Therefore, if it is running
> > + * under generic_make_request() it masks out __GFP_WAIT when doing the
> > + * allocation. The caller must check for failure if there's any possibility of
> > + * it being called from under generic_make_request(); it is then the caller's
> > + * responsibility to retry from a safe context (by e.g. punting to workqueue).
> > + */
> > +struct bio *bio_split(struct bio *bio, int sectors,
> > +		      gfp_t gfp, struct bio_set *bs)
> > +{
> > +	unsigned idx, vcnt = 0, nbytes = sectors << 9;
> > +	struct bio_vec *bv;
> > +	struct bio *ret = NULL;
> > +
> > +	BUG_ON(sectors <= 0);
> > +
> > +	/*
> > +	 * If we're being called from underneath generic_make_request() and we
> > +	 * already allocated any bios from this bio set, we risk deadlock if we
> > +	 * use the mempool. So instead, we possibly fail and let the caller punt
> > +	 * to workqueue or somesuch and retry in a safe context.
> > +	 */
> > +	if (current->bio_list)
> > +		gfp &= ~__GFP_WAIT;
> 
> 
> NACK!
> 
> If as you said above in the comment:
> 	if there's any possibility of it being called from under generic_make_request();
>         it is then the caller's responsibility to ...
> 
> So all the comment needs to say is: 
> 	... caller's responsibility to not set __GFP_WAIT at gfp.
> 
> And drop this here. It is up to the caller to decide. If the caller wants he can do
> "if (current->bio_list)" by his own.
> 
> This is a general purpose utility you might not know it's context.
> for example with osdblk above will break.

Well I'm highly highly skeptical that using __GFP_WAIT under
generic_make_request() is ever a sane thing to do - it could certainly
be safe in specific circumstances, but it's just such a fragile thing to
rely on, you have to _never_ use the same bio pool more than once. And
even then I bet there's other subtle ways it could break.

But you're not the first to complain about it, and your point about
existing code is compelling.

commit ea124f899af29887e24d07497442066572012e5b
Author: Kent Overstreet <koverstreet@google.com>
Date:   Wed Jul 25 16:25:10 2012 -0700

    block: Introduce new bio_split()
    
    The new bio_split() can split arbitrary bios - it's not restricted to
    single page bios, like the old bio_split() (previously renamed to
    bio_pair_split()). It also has different semantics - it doesn't allocate
    a struct bio_pair, leaving it up to the caller to handle completions.

diff --git a/fs/bio.c b/fs/bio.c
index 0470376..312e5de 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -1537,6 +1537,102 @@ struct bio_pair *bio_pair_split(struct bio *bi, int first_sectors)
 EXPORT_SYMBOL(bio_pair_split);
 
 /**
+ * bio_split - split a bio
+ * @bio:	bio to split
+ * @sectors:	number of sectors to split from the front of @bio
+ * @gfp:	gfp mask
+ * @bs:		bio set to allocate from
+ *
+ * Allocates and returns a new bio which represents @sectors from the start of
+ * @bio, and updates @bio to represent the remaining sectors.
+ *
+ * If bio_sectors(@bio) was less than or equal to @sectors, returns @bio
+ * unchanged.
+ *
+ * The newly allocated bio will point to @bio's bi_io_vec, if the split was on a
+ * bvec boundry; it is the caller's responsibility to ensure that @bio is not
+ * freed before the split.
+ *
+ * BIG FAT WARNING:
+ *
+ * If you're calling this from under generic_make_request() (i.e.
+ * current->bio_list != NULL), you should mask out __GFP_WAIT and punt to
+ * workqueue if the allocation fails. Otherwise, your code will probably
+ * deadlock.
+ *
+ * You can't allocate more than once from the same bio pool without submitting
+ * the previous allocations (so they'll eventually complete and deallocate
+ * themselves), but if you're under generic_make_request() those previous
+ * allocations won't submit until you return . And if you have to split bios,
+ * you should expect that some bios will require multiple splits.
+ */
+struct bio *bio_split(struct bio *bio, int sectors,
+		      gfp_t gfp, struct bio_set *bs)
+{
+	unsigned idx, vcnt = 0, nbytes = sectors << 9;
+	struct bio_vec *bv;
+	struct bio *ret = NULL;
+
+	BUG_ON(sectors <= 0);
+
+	if (sectors >= bio_sectors(bio))
+		return bio;
+
+	trace_block_split(bdev_get_queue(bio->bi_bdev), bio,
+			  bio->bi_sector + sectors);
+
+	bio_for_each_segment(bv, bio, idx) {
+		vcnt = idx - bio->bi_idx;
+
+		if (!nbytes) {
+			ret = bio_alloc_bioset(gfp, 0, bs);
+			if (!ret)
+				return NULL;
+
+			ret->bi_io_vec = bio_iovec(bio);
+			ret->bi_flags |= 1 << BIO_CLONED;
+			break;
+		} else if (nbytes < bv->bv_len) {
+			ret = bio_alloc_bioset(gfp, ++vcnt, bs);
+			if (!ret)
+				return NULL;
+
+			memcpy(ret->bi_io_vec, bio_iovec(bio),
+			       sizeof(struct bio_vec) * vcnt);
+
+			ret->bi_io_vec[vcnt - 1].bv_len = nbytes;
+			bv->bv_offset	+= nbytes;
+			bv->bv_len	-= nbytes;
+			break;
+		}
+
+		nbytes -= bv->bv_len;
+	}
+
+	ret->bi_bdev	= bio->bi_bdev;
+	ret->bi_sector	= bio->bi_sector;
+	ret->bi_size	= sectors << 9;
+	ret->bi_rw	= bio->bi_rw;
+	ret->bi_vcnt	= vcnt;
+	ret->bi_max_vecs = vcnt;
+	ret->bi_end_io	= bio->bi_end_io;
+	ret->bi_private	= bio->bi_private;
+
+	bio->bi_sector	+= sectors;
+	bio->bi_size	-= sectors << 9;
+	bio->bi_idx	 = idx;
+
+	if (bio_integrity(bio)) {
+		bio_integrity_clone(ret, bio, gfp, bs);
+		bio_integrity_trim(ret, 0, bio_sectors(ret));
+		bio_integrity_trim(bio, bio_sectors(ret), bio_sectors(bio));
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(bio_split);
+
+/**
  *      bio_sector_offset - Find hardware sector offset in bio
  *      @bio:           bio to inspect
  *      @index:         bio_vec index

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 01/12] block: Generalized bio pool freeing
  2012-07-25 11:06   ` Boaz Harrosh
@ 2012-07-25 23:38     ` Kent Overstreet
  0 siblings, 0 replies; 29+ messages in thread
From: Kent Overstreet @ 2012-07-25 23:38 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: linux-bcache, linux-kernel, dm-devel, tj, axboe, agk, neilb,
	drbd-dev, vgoyal, mpatocka, sage, yehuda

On Wed, Jul 25, 2012 at 02:06:23PM +0300, Boaz Harrosh wrote:
> On 07/24/2012 11:11 PM, Kent Overstreet wrote:
> 
> > With the old code, when you allocate a bio from a bio pool you have to
> > implement your own destructor that knows how to find the bio pool the
> > bio was originally allocated from.
> > 
> > This adds a new field to struct bio (bi_pool) and changes
> > bio_alloc_bioset() to use it. This makes various bio destructors
> > unnecessary, so they're then deleted.
> > 
> > Signed-off-by: Kent Overstreet <koverstreet@google.com>
> 
> <snip>
> 
> > diff --git a/drivers/target/target_core_iblock.c b/drivers/target/target_core_iblock.c
> > index fd47950..be65582 100644
> > --- a/drivers/target/target_core_iblock.c
> > +++ b/drivers/target/target_core_iblock.c
> > @@ -447,14 +447,6 @@ static void iblock_complete_cmd(struct se_cmd *cmd)
> >  	kfree(ibr);
> >  }
> >  
> > -static void iblock_bio_destructor(struct bio *bio)
> > -{
> > -	struct se_cmd *cmd = bio->bi_private;
> > -	struct iblock_dev *ib_dev = cmd->se_dev->dev_ptr;
> > -
> > -	bio_free(bio, ib_dev->ibd_bio_set);
> > -}
> > -
> >  static struct bio *
> >  iblock_get_bio(struct se_cmd *cmd, sector_t lba, u32 sg_num)
> >  {
> > @@ -475,8 +467,15 @@ iblock_get_bio(struct se_cmd *cmd, sector_t lba, u32 sg_num)
> >  	}
> >  
> >  	bio->bi_bdev = ib_dev->ibd_bd;
> > +<<<<<<< HEAD
> >  	bio->bi_private = cmd;
> >  	bio->bi_destructor = iblock_bio_destructor;
> > +||||||| merged common ancestors
> > +	bio->bi_private = task;
> > +	bio->bi_destructor = iblock_bio_destructor;
> > +=======
> > +	bio->bi_private = task;
> > +>>>>>>> block: Generalized bio pool freeing
> >  	bio->bi_end_io = &iblock_bio_done;
> >  	bio->bi_sector = lba;
> >  	return bio;
> 
> 
> You left out a rebase merge conflict. Did you allmodconfig compile
> these patches?

Argh, clearly not. And I even fixed that rebase merge conflict at one
point, that's distressing.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH] A possible deadlock with stacked devices (was: [PATCH v4 08/12] block: Introduce new bio_split())
  2012-07-25 23:26     ` Kent Overstreet
@ 2012-07-27  0:50       ` Mikulas Patocka
  2012-08-15 23:07         ` Kent Overstreet
  0 siblings, 1 reply; 29+ messages in thread
From: Mikulas Patocka @ 2012-07-27  0:50 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Boaz Harrosh, linux-bcache, linux-kernel, dm-devel, tj, axboe,
	agk, neilb, drbd-dev, vgoyal, sage, yehuda



On Wed, 25 Jul 2012, Kent Overstreet wrote:

> On Wed, Jul 25, 2012 at 02:55:40PM +0300, Boaz Harrosh wrote:
> > On 07/24/2012 11:11 PM, Kent Overstreet wrote:
> > 
> > > The new bio_split() can split arbitrary bios - it's not restricted to
> > > single page bios, like the old bio_split() (previously renamed to
> > > bio_pair_split()). It also has different semantics - it doesn't allocate
> > > a struct bio_pair, leaving it up to the caller to handle completions.
> > > 
> > > Signed-off-by: Kent Overstreet <koverstreet@google.com>
> > > ---
> > >  fs/bio.c |   99 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >  1 files changed, 99 insertions(+), 0 deletions(-)
> > > 
> > > diff --git a/fs/bio.c b/fs/bio.c
> > > index 5d02aa5..a15e121 100644
> > > --- a/fs/bio.c
> > > +++ b/fs/bio.c
> > > @@ -1539,6 +1539,105 @@ struct bio_pair *bio_pair_split(struct bio *bi, int first_sectors)
> > >  EXPORT_SYMBOL(bio_pair_split);
> > >  
> > >  /**
> > > + * bio_split - split a bio
> > > + * @bio:	bio to split
> > > + * @sectors:	number of sectors to split from the front of @bio
> > > + * @gfp:	gfp mask
> > > + * @bs:		bio set to allocate from
> > > + *
> > > + * Allocates and returns a new bio which represents @sectors from the start of
> > > + * @bio, and updates @bio to represent the remaining sectors.
> > > + *
> > > + * If bio_sectors(@bio) was less than or equal to @sectors, returns @bio
> > > + * unchanged.
> > > + *
> > > + * The newly allocated bio will point to @bio's bi_io_vec, if the split was on a
> > > + * bvec boundry; it is the caller's responsibility to ensure that @bio is not
> > > + * freed before the split.
> > > + *
> > > + * If bio_split() is running under generic_make_request(), it's not safe to
> > > + * allocate more than one bio from the same bio set. Therefore, if it is running
> > > + * under generic_make_request() it masks out __GFP_WAIT when doing the
> > > + * allocation. The caller must check for failure if there's any possibility of
> > > + * it being called from under generic_make_request(); it is then the caller's
> > > + * responsibility to retry from a safe context (by e.g. punting to workqueue).
> > > + */
> > > +struct bio *bio_split(struct bio *bio, int sectors,
> > > +		      gfp_t gfp, struct bio_set *bs)
> > > +{
> > > +	unsigned idx, vcnt = 0, nbytes = sectors << 9;
> > > +	struct bio_vec *bv;
> > > +	struct bio *ret = NULL;
> > > +
> > > +	BUG_ON(sectors <= 0);
> > > +
> > > +	/*
> > > +	 * If we're being called from underneath generic_make_request() and we
> > > +	 * already allocated any bios from this bio set, we risk deadlock if we
> > > +	 * use the mempool. So instead, we possibly fail and let the caller punt
> > > +	 * to workqueue or somesuch and retry in a safe context.
> > > +	 */
> > > +	if (current->bio_list)
> > > +		gfp &= ~__GFP_WAIT;
> > 
> > 
> > NACK!
> > 
> > If as you said above in the comment:
> > 	if there's any possibility of it being called from under generic_make_request();
> >         it is then the caller's responsibility to ...
> > 
> > So all the comment needs to say is: 
> > 	... caller's responsibility to not set __GFP_WAIT at gfp.
> > 
> > And drop this here. It is up to the caller to decide. If the caller wants he can do
> > "if (current->bio_list)" by his own.
> > 
> > This is a general purpose utility you might not know it's context.
> > for example with osdblk above will break.
> 
> Well I'm highly highly skeptical that using __GFP_WAIT under
> generic_make_request() is ever a sane thing to do - it could certainly
> be safe in specific circumstances, but it's just such a fragile thing to
> rely on, you have to _never_ use the same bio pool more than once. And
> even then I bet there's other subtle ways it could break.
> 
> But you're not the first to complain about it, and your point about
> existing code is compelling.

Both md and dm use __GFP_WAIT allocations from mempools in 
generic_make_request.

I think you found an interesting bug here. Suppose that we have three 
stacked devices: d1 depends on d2 and d2 depends on d3.

Now, a bio b1 comes to d1. d1 splits it to two bios: b2.1 and b2.2 and 
sends them to the device d2 - these bios end up in current->bio_list. The 
driver for d2 receives bio b2.1 and sends bio b3.1 to d3. Now, 
current->bio_list contains bios b2.2, b3.1. Now, bio b2.2 is popped off 
the bio list and the driver for d2 is called with b2.2 - suppose that for 
some reason mempool in d2 is exhausted and the driver needs to wait until 
b2.1 finishes. b2.1 never finishes, because b2.1 depends on b3.1 and b3.1 
is still in current->bio_list. So it deadlocks.

Turning off __GFP_WAIT fixes nothing - it just turns one bug (a possible 
deadlock) into another bug (a possible bio failure with -ENOMEM).

Increasing mempool sizes doesn't fix it either, the bio would just have to 
be split to more pieces in the above example to make it deadlock.

I think the above possible deadlock scenario could be fixed by reversing 
current->bio_list processing - i.e. when some device's make_request_fn 
adds some bios to current->bio_list, these bios are processed before other 
bios that were on the list before. This way, bio list would contain "b3.1, 
b2.2" instead of "b2.2, b3.1" in the above example and the deadlock would 
not happen.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

---
 block/blk-core.c |   13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

Index: linux-3.5-fast/block/blk-core.c
===================================================================
--- linux-3.5-fast.orig/block/blk-core.c	2012-07-27 02:20:25.000000000 +0200
+++ linux-3.5-fast/block/blk-core.c	2012-07-27 02:34:37.000000000 +0200
@@ -1740,6 +1740,7 @@ end_io:
 void generic_make_request(struct bio *bio)
 {
 	struct bio_list bio_list_on_stack;
+	struct bio_list current_bio_list;
 
 	if (!generic_make_request_checks(bio))
 		return;
@@ -1789,14 +1790,22 @@ void generic_make_request(struct bio *bi
 	 * bio_list, and call into ->make_request() again.
 	 */
 	BUG_ON(bio->bi_next);
-	bio_list_init(&bio_list_on_stack);
+	bio_list_init(&current_bio_list);
 	current->bio_list = &bio_list_on_stack;
 	do {
 		struct request_queue *q = bdev_get_queue(bio->bi_bdev);
 
+		bio_list_init(&bio_list_on_stack);
+
 		q->make_request_fn(q, bio);
 
-		bio = bio_list_pop(current->bio_list);
+		/*
+		 * To avoid a possible deadlock, bios that were added by
+		 * the most recent make_request_fn must be processed first.
+		 */
+		bio_list_merge_head(&current_bio_list, &bio_list_on_stack);
+
+		bio = bio_list_pop(&current_bio_list);
 	} while (bio);
 	current->bio_list = NULL; /* deactivate */
 }

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 12/12] block: Only clone bio vecs that are in use
  2012-07-24 20:11 ` [PATCH v4 12/12] block: Only clone bio vecs that are in use Kent Overstreet
@ 2012-08-07  3:17   ` Muthu Kumar
  0 siblings, 0 replies; 29+ messages in thread
From: Muthu Kumar @ 2012-08-07  3:17 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: linux-bcache, linux-kernel, dm-devel, tj, axboe, agk, neilb,
	drbd-dev, bharrosh, vgoyal, mpatocka, sage, yehuda

Hi,

On Tue, Jul 24, 2012 at 1:11 PM, Kent Overstreet <koverstreet@google.com> wrote:
> bcache creates large bios internally, and then splits them according to
> the device requirements before it sends them down. If a lower level
> device tries to clone the bio, and the original bio had more than
> BIO_MAX_PAGES, the clone will fail unecessarily.
>
> We can fix this by only cloning the bio vecs that are actually in use.
>
> Signed-off-by: Kent Overstreet <koverstreet@google.com>

<snip>

>
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 3f3c26e..193fb19 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -1057,11 +1057,10 @@ static struct bio *clone_bio(struct bio *bio, sector_t sector,
>  {
>         struct bio *clone;
>
> -       clone = bio_alloc_bioset(GFP_NOIO, bio->bi_max_vecs, bs);
> +       clone = bio_alloc_bioset(GFP_NOIO, bv_count, bs);

Number of io_vecs allocated in clone is different. Please see my comment below.

>         __bio_clone(clone, bio);
>         clone->bi_sector = sector;
> -       clone->bi_idx = idx;
> -       clone->bi_vcnt = idx + bv_count;
> +       clone->bi_vcnt = bv_count;
>         clone->bi_size = to_bytes(len);
>         clone->bi_flags &= ~(1 << BIO_SEG_VALID);
>
> diff --git a/fs/bio.c b/fs/bio.c
> index 7a0801d..ec6a357 100644
> --- a/fs/bio.c
> +++ b/fs/bio.c
> @@ -451,8 +451,9 @@ EXPORT_SYMBOL(bio_phys_segments);
>   */
>  void __bio_clone(struct bio *bio, struct bio *bio_src)
>  {
> -       memcpy(bio->bi_io_vec, bio_src->bi_io_vec,
> -               bio_src->bi_max_vecs * sizeof(struct bio_vec));
> +       memcpy(bio->bi_io_vec,
> +              bio_iovec(bio_src),
> +              bio_segments(bio_src) * sizeof(struct bio_vec));
>


You are changing the meaning of __bio_clone() here. In old code, the
number of io_vecs, bi_idx, bi_vcnt are preserved. But in this modified
code, you are mapping bio_src's bi_iovec[bi_idx] to bio_dests
bi_iovec[0] and also restricting the number of allocated io_vecs of
the clone. It may be useful for cases were we would like a identical
copy of the original bio (may not be in current code base, but this
implementation is definitely not what one would expect from the name
"clone").

May be, call this new implementation some thing else (and use it for bcache)?

Regards,
Muthu



>         /*
>          * most users will be overriding ->bi_bdev with a new target,
> @@ -461,10 +462,10 @@ void __bio_clone(struct bio *bio, struct bio *bio_src)
>         bio->bi_sector = bio_src->bi_sector;
>         bio->bi_bdev = bio_src->bi_bdev;
>         bio->bi_flags |= 1 << BIO_CLONED;
> +       bio->bi_flags &= ~(1 << BIO_SEG_VALID);
>         bio->bi_rw = bio_src->bi_rw;
> -       bio->bi_vcnt = bio_src->bi_vcnt;
> +       bio->bi_vcnt = bio_segments(bio_src);
>         bio->bi_size = bio_src->bi_size;
> -       bio->bi_idx = bio_src->bi_idx;
>  }
>  EXPORT_SYMBOL(__bio_clone);
>
> @@ -479,7 +480,7 @@ EXPORT_SYMBOL(__bio_clone);
>  struct bio *bio_clone_bioset(struct bio *bio, gfp_t gfp_mask,
>                              struct bio_set *bs)
>  {
> -       struct bio *b = bio_alloc_bioset(gfp_mask, bio->bi_max_vecs, bs);
> +       struct bio *b = bio_alloc_bioset(gfp_mask, bio_segments(bio), bs);
>
>         if (!b)
>                 return NULL;
> @@ -509,7 +510,7 @@ EXPORT_SYMBOL(bio_clone);
>
>  struct bio *bio_clone_kmalloc(struct bio *bio, gfp_t gfp_mask)
>  {
> -       struct bio *b = bio_kmalloc(gfp_mask, bio->bi_max_vecs);
> +       struct bio *b = bio_kmalloc(gfp_mask, bio_segments(bio));
>
>         if (!b)
>                 return NULL;
> --
> 1.7.7.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] A possible deadlock with stacked devices (was: [PATCH v4 08/12] block: Introduce new bio_split())
  2012-07-27  0:50       ` [PATCH] A possible deadlock with stacked devices (was: [PATCH v4 08/12] block: Introduce new bio_split()) Mikulas Patocka
@ 2012-08-15 23:07         ` Kent Overstreet
  2012-08-29 16:08           ` Mikulas Patocka
  0 siblings, 1 reply; 29+ messages in thread
From: Kent Overstreet @ 2012-08-15 23:07 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Boaz Harrosh, linux-bcache, linux-kernel, dm-devel, tj, axboe,
	agk, neilb, drbd-dev, vgoyal, sage, yehuda

> Both md and dm use __GFP_WAIT allocations from mempools in 
> generic_make_request.
> 
> I think you found an interesting bug here. Suppose that we have three 
> stacked devices: d1 depends on d2 and d2 depends on d3.
> 
> Now, a bio b1 comes to d1. d1 splits it to two bios: b2.1 and b2.2 and 
> sends them to the device d2 - these bios end up in current->bio_list. The 
> driver for d2 receives bio b2.1 and sends bio b3.1 to d3. Now, 
> current->bio_list contains bios b2.2, b3.1. Now, bio b2.2 is popped off 
> the bio list and the driver for d2 is called with b2.2 - suppose that for 
> some reason mempool in d2 is exhausted and the driver needs to wait until 
> b2.1 finishes. b2.1 never finishes, because b2.1 depends on b3.1 and b3.1 
> is still in current->bio_list. So it deadlocks.
> 
> Turning off __GFP_WAIT fixes nothing - it just turns one bug (a possible 
> deadlock) into another bug (a possible bio failure with -ENOMEM).
> 
> Increasing mempool sizes doesn't fix it either, the bio would just have to 
> be split to more pieces in the above example to make it deadlock.
> 
> I think the above possible deadlock scenario could be fixed by reversing 
> current->bio_list processing - i.e. when some device's make_request_fn 
> adds some bios to current->bio_list, these bios are processed before other 
> bios that were on the list before. This way, bio list would contain "b3.1, 
> b2.2" instead of "b2.2, b3.1" in the above example and the deadlock would 
> not happen.

Your patch isn't sufficient in the case where a bio may be split
multiple times (I'm not sure if it's sufficient in the case where bios
are only split once; trying to work it all out makes my head hurt).

You don't need multiple stacked drivers to see this; just the case where
a single driver is running that splits multiple times is sufficient, if
you have enough threads submitting at the same time.

Bcache works around this with the trick I mentioned previously, where it
masks out _GFP_WAIT if current->bio_list != NULL, and punts to workqueue
if the allocation fails.

This works but it'd have to be done in each stacking driver... it's not
a generic solution, and it's a pain in the ass.

I came up with another idea the other day. Conceptually, it inverts my
previous workaround - the punting to workqueue is done in the allocation
code when necessary, for the bios that would be blocked.

It's lightly tested, gonna rig up some kind of fault injection and test
it more thoroughly later.

commit d61bbb074cc8f2e089eb57e2bee8e84500f390a8
Author: Kent Overstreet <koverstreet@google.com>
Date:   Mon Aug 13 18:11:01 2012 -0700

    block: Avoid deadlocks with bio allocation by stacking drivers
    
    Previously, if we ever try to allocate more than once from the same bio
    set while running under generic_make_request(), we risk deadlock.
    
    This would happen if e.g. a bio ever needed to be split more than once,
    and it's difficult to handle correctly in the drivers - so in practice
    it's not.
    
    This patch fixes this issue by allocating a rescuer workqueue for each
    bio_set, and punting queued bios to said rescuer when necessary:

diff --git a/fs/bio.c b/fs/bio.c
index bc4265a..7b4f655 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -281,6 +281,23 @@ void bio_reset(struct bio *bio)
 }
 EXPORT_SYMBOL(bio_reset);
 
+static void bio_alloc_rescue(struct work_struct *work)
+{
+	struct bio_set *bs = container_of(work, struct bio_set, rescue_work);
+	struct bio *bio;
+
+	while (1) {
+		spin_lock(&bs->rescue_lock);
+		bio = bio_list_pop(&bs->rescue_list);
+		spin_unlock(&bs->rescue_lock);
+
+		if (!bio)
+			break;
+
+		generic_make_request(bio);
+	}
+}
+
 /**
  * bio_alloc_bioset - allocate a bio for I/O
  * @gfp_mask:   the GFP_ mask given to the slab allocator
@@ -294,6 +311,7 @@ EXPORT_SYMBOL(bio_reset);
  **/
 struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
 {
+	gfp_t saved_gfp = gfp_mask;
 	unsigned front_pad;
 	unsigned inline_vecs;
 	unsigned long idx = BIO_POOL_NONE;
@@ -308,16 +326,39 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
 		p = kmalloc(sizeof(struct bio) +
 			    nr_iovecs * sizeof(struct bio_vec),
 			    gfp_mask);
+
 		front_pad = 0;
 		inline_vecs = nr_iovecs;
 	} else {
-		p = mempool_alloc(bs->bio_pool, gfp_mask);
+		/*
+		 * If we're running under generic_make_request()
+		 * (current->bio_list != NULL), we risk deadlock if we sleep on
+		 * allocation and there's already bios on current->bio_list that
+		 * were allocated from the same bio_set; they won't be submitted
+		 * (and thus freed) as long as we're blocked here.
+		 *
+		 * To deal with this, we first try the allocation without using
+		 * the mempool; if that fails, we punt all the bios on
+		 * current->bio_list to a different thread and then retry the
+		 * allocation with the original gfp mask.
+		 */
+
+		if (current->bio_list &&
+		    !bio_list_empty(current->bio_list) &&
+		    (gfp_mask & __GFP_WAIT))
+			gfp_mask &= GFP_ATOMIC;
+retry:
+		if (gfp_mask & __GFP_WAIT)
+			p = mempool_alloc(bs->bio_pool, gfp_mask);
+		else
+			p = kmem_cache_alloc(bs->bio_slab, gfp_mask);
+
 		front_pad = bs->front_pad;
 		inline_vecs = BIO_INLINE_VECS;
 	}
 
 	if (unlikely(!p))
-		return NULL;
+		goto err;
 
 	bio = p + front_pad;
 	bio_init(bio);
@@ -338,6 +379,19 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
 
 err_free:
 	mempool_free(p, bs->bio_pool);
+err:
+	if (gfp_mask != saved_gfp) {
+		gfp_mask = saved_gfp;
+
+		spin_lock(&bs->rescue_lock);
+		bio_list_merge(&bs->rescue_list, current->bio_list);
+		bio_list_init(current->bio_list);
+		spin_unlock(&bs->rescue_lock);
+
+		queue_work(bs->rescue_workqueue, &bs->rescue_work);
+		goto retry;
+	}
+
 	return NULL;
 }
 EXPORT_SYMBOL(bio_alloc_bioset);
@@ -1546,6 +1600,9 @@ static void biovec_free_pools(struct bio_set *bs)
 
 void bioset_free(struct bio_set *bs)
 {
+	if (bs->rescue_workqueue)
+		destroy_workqueue(bs->rescue_workqueue);
+
 	if (bs->bio_pool)
 		mempool_destroy(bs->bio_pool);
 
@@ -1581,6 +1638,10 @@ struct bio_set *bioset_create(unsigned int pool_size, unsigned int front_pad)
 
 	bs->front_pad = front_pad;
 
+	spin_lock_init(&bs->rescue_lock);
+	bio_list_init(&bs->rescue_list);
+	INIT_WORK(&bs->rescue_work, bio_alloc_rescue);
+
 	bs->bio_slab = bio_find_or_create_slab(front_pad + back_pad);
 	if (!bs->bio_slab) {
 		kfree(bs);
@@ -1591,9 +1652,14 @@ struct bio_set *bioset_create(unsigned int pool_size, unsigned int front_pad)
 	if (!bs->bio_pool)
 		goto bad;
 
-	if (!biovec_create_pools(bs, pool_size))
-		return bs;
+	if (biovec_create_pools(bs, pool_size))
+		goto bad;
+
+	bs->rescue_workqueue = alloc_workqueue("bioset", WQ_MEM_RECLAIM, 0);
+	if (!bs->rescue_workqueue)
+		goto bad;
 
+	return bs;
 bad:
 	bioset_free(bs);
 	return NULL;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index b22c22b..ba5b52e 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -290,39 +290,6 @@ static inline int bio_associate_current(struct bio *bio) { return -ENOENT; }
 static inline void bio_disassociate_task(struct bio *bio) { }
 #endif	/* CONFIG_BLK_CGROUP */
 
-/*
- * bio_set is used to allow other portions of the IO system to
- * allocate their own private memory pools for bio and iovec structures.
- * These memory pools in turn all allocate from the bio_slab
- * and the bvec_slabs[].
- */
-#define BIO_POOL_SIZE 2
-#define BIOVEC_NR_POOLS 6
-#define BIOVEC_MAX_IDX	(BIOVEC_NR_POOLS - 1)
-
-struct bio_set {
-	struct kmem_cache *bio_slab;
-	unsigned int front_pad;
-
-	mempool_t *bio_pool;
-#if defined(CONFIG_BLK_DEV_INTEGRITY)
-	mempool_t *bio_integrity_pool;
-#endif
-	mempool_t *bvec_pool;
-};
-
-struct biovec_slab {
-	int nr_vecs;
-	char *name;
-	struct kmem_cache *slab;
-};
-
-/*
- * a small number of entries is fine, not going to be performance critical.
- * basically we just need to survive
- */
-#define BIO_SPLIT_ENTRIES 2
-
 #ifdef CONFIG_HIGHMEM
 /*
  * remember never ever reenable interrupts between a bvec_kmap_irq and
@@ -497,6 +464,48 @@ static inline struct bio *bio_list_get(struct bio_list *bl)
 	return bio;
 }
 
+/*
+ * bio_set is used to allow other portions of the IO system to
+ * allocate their own private memory pools for bio and iovec structures.
+ * These memory pools in turn all allocate from the bio_slab
+ * and the bvec_slabs[].
+ */
+#define BIO_POOL_SIZE 2
+#define BIOVEC_NR_POOLS 6
+#define BIOVEC_MAX_IDX	(BIOVEC_NR_POOLS - 1)
+
+struct bio_set {
+	struct kmem_cache *bio_slab;
+	unsigned int front_pad;
+
+	mempool_t *bio_pool;
+#if defined(CONFIG_BLK_DEV_INTEGRITY)
+	mempool_t *bio_integrity_pool;
+#endif
+	mempool_t *bvec_pool;
+
+	/*
+	 * Deadlock avoidance for stacking block drivers: see comments in
+	 * bio_alloc_bioset() for details
+	 */
+	spinlock_t		rescue_lock;
+	struct bio_list		rescue_list;
+	struct work_struct	rescue_work;
+	struct workqueue_struct	*rescue_workqueue;
+};
+
+struct biovec_slab {
+	int nr_vecs;
+	char *name;
+	struct kmem_cache *slab;
+};
+
+/*
+ * a small number of entries is fine, not going to be performance critical.
+ * basically we just need to survive
+ */
+#define BIO_SPLIT_ENTRIES 2
+
 #if defined(CONFIG_BLK_DEV_INTEGRITY)
 
 #define bip_vec_idx(bip, idx)	(&(bip->bip_vec[(idx)]))

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH] A possible deadlock with stacked devices (was: [PATCH v4 08/12] block: Introduce new bio_split())
  2012-08-15 23:07         ` Kent Overstreet
@ 2012-08-29 16:08           ` Mikulas Patocka
  0 siblings, 0 replies; 29+ messages in thread
From: Mikulas Patocka @ 2012-08-29 16:08 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Boaz Harrosh, linux-bcache, linux-kernel, dm-devel, tj, axboe,
	agk, neilb, drbd-dev, vgoyal, sage, yehuda



On Wed, 15 Aug 2012, Kent Overstreet wrote:

> > Both md and dm use __GFP_WAIT allocations from mempools in 
> > generic_make_request.
> > 
> > I think you found an interesting bug here. Suppose that we have three 
> > stacked devices: d1 depends on d2 and d2 depends on d3.
> > 
> > Now, a bio b1 comes to d1. d1 splits it to two bios: b2.1 and b2.2 and 
> > sends them to the device d2 - these bios end up in current->bio_list. The 
> > driver for d2 receives bio b2.1 and sends bio b3.1 to d3. Now, 
> > current->bio_list contains bios b2.2, b3.1. Now, bio b2.2 is popped off 
> > the bio list and the driver for d2 is called with b2.2 - suppose that for 
> > some reason mempool in d2 is exhausted and the driver needs to wait until 
> > b2.1 finishes. b2.1 never finishes, because b2.1 depends on b3.1 and b3.1 
> > is still in current->bio_list. So it deadlocks.
> > 
> > Turning off __GFP_WAIT fixes nothing - it just turns one bug (a possible 
> > deadlock) into another bug (a possible bio failure with -ENOMEM).
> > 
> > Increasing mempool sizes doesn't fix it either, the bio would just have to 
> > be split to more pieces in the above example to make it deadlock.
> > 
> > I think the above possible deadlock scenario could be fixed by reversing 
> > current->bio_list processing - i.e. when some device's make_request_fn 
> > adds some bios to current->bio_list, these bios are processed before other 
> > bios that were on the list before. This way, bio list would contain "b3.1, 
> > b2.2" instead of "b2.2, b3.1" in the above example and the deadlock would 
> > not happen.
> 
> Your patch isn't sufficient in the case where a bio may be split
> multiple times (I'm not sure if it's sufficient in the case where bios
> are only split once; trying to work it all out makes my head hurt).
> 
> You don't need multiple stacked drivers to see this; just the case where
> a single driver is running that splits multiple times is sufficient, if
> you have enough threads submitting at the same time.

That is true. dm splits one bio to multiple bios, so it could still 
deadlock.

Mikulas

> Bcache works around this with the trick I mentioned previously, where it
> masks out _GFP_WAIT if current->bio_list != NULL, and punts to workqueue
> if the allocation fails.
> 
> This works but it'd have to be done in each stacking driver... it's not
> a generic solution, and it's a pain in the ass.
> 
> I came up with another idea the other day. Conceptually, it inverts my
> previous workaround - the punting to workqueue is done in the allocation
> code when necessary, for the bios that would be blocked.
> 
> It's lightly tested, gonna rig up some kind of fault injection and test
> it more thoroughly later.
> 
> commit d61bbb074cc8f2e089eb57e2bee8e84500f390a8
> Author: Kent Overstreet <koverstreet@google.com>
> Date:   Mon Aug 13 18:11:01 2012 -0700
> 
>     block: Avoid deadlocks with bio allocation by stacking drivers
>     
>     Previously, if we ever try to allocate more than once from the same bio
>     set while running under generic_make_request(), we risk deadlock.
>     
>     This would happen if e.g. a bio ever needed to be split more than once,
>     and it's difficult to handle correctly in the drivers - so in practice
>     it's not.
>     
>     This patch fixes this issue by allocating a rescuer workqueue for each
>     bio_set, and punting queued bios to said rescuer when necessary:
> 
> diff --git a/fs/bio.c b/fs/bio.c
> index bc4265a..7b4f655 100644
> --- a/fs/bio.c
> +++ b/fs/bio.c
> @@ -281,6 +281,23 @@ void bio_reset(struct bio *bio)
>  }
>  EXPORT_SYMBOL(bio_reset);
>  
> +static void bio_alloc_rescue(struct work_struct *work)
> +{
> +	struct bio_set *bs = container_of(work, struct bio_set, rescue_work);
> +	struct bio *bio;
> +
> +	while (1) {
> +		spin_lock(&bs->rescue_lock);
> +		bio = bio_list_pop(&bs->rescue_list);
> +		spin_unlock(&bs->rescue_lock);
> +
> +		if (!bio)
> +			break;
> +
> +		generic_make_request(bio);
> +	}
> +}
> +
>  /**
>   * bio_alloc_bioset - allocate a bio for I/O
>   * @gfp_mask:   the GFP_ mask given to the slab allocator
> @@ -294,6 +311,7 @@ EXPORT_SYMBOL(bio_reset);
>   **/
>  struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
>  {
> +	gfp_t saved_gfp = gfp_mask;
>  	unsigned front_pad;
>  	unsigned inline_vecs;
>  	unsigned long idx = BIO_POOL_NONE;
> @@ -308,16 +326,39 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
>  		p = kmalloc(sizeof(struct bio) +
>  			    nr_iovecs * sizeof(struct bio_vec),
>  			    gfp_mask);
> +
>  		front_pad = 0;
>  		inline_vecs = nr_iovecs;
>  	} else {
> -		p = mempool_alloc(bs->bio_pool, gfp_mask);
> +		/*
> +		 * If we're running under generic_make_request()
> +		 * (current->bio_list != NULL), we risk deadlock if we sleep on
> +		 * allocation and there's already bios on current->bio_list that
> +		 * were allocated from the same bio_set; they won't be submitted
> +		 * (and thus freed) as long as we're blocked here.
> +		 *
> +		 * To deal with this, we first try the allocation without using
> +		 * the mempool; if that fails, we punt all the bios on
> +		 * current->bio_list to a different thread and then retry the
> +		 * allocation with the original gfp mask.
> +		 */
> +
> +		if (current->bio_list &&
> +		    !bio_list_empty(current->bio_list) &&
> +		    (gfp_mask & __GFP_WAIT))
> +			gfp_mask &= GFP_ATOMIC;
> +retry:
> +		if (gfp_mask & __GFP_WAIT)
> +			p = mempool_alloc(bs->bio_pool, gfp_mask);
> +		else
> +			p = kmem_cache_alloc(bs->bio_slab, gfp_mask);
> +
>  		front_pad = bs->front_pad;
>  		inline_vecs = BIO_INLINE_VECS;
>  	}
>  
>  	if (unlikely(!p))
> -		return NULL;
> +		goto err;
>  
>  	bio = p + front_pad;
>  	bio_init(bio);
> @@ -338,6 +379,19 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
>  
>  err_free:
>  	mempool_free(p, bs->bio_pool);
> +err:
> +	if (gfp_mask != saved_gfp) {
> +		gfp_mask = saved_gfp;
> +
> +		spin_lock(&bs->rescue_lock);
> +		bio_list_merge(&bs->rescue_list, current->bio_list);
> +		bio_list_init(current->bio_list);
> +		spin_unlock(&bs->rescue_lock);
> +
> +		queue_work(bs->rescue_workqueue, &bs->rescue_work);
> +		goto retry;
> +	}
> +
>  	return NULL;
>  }
>  EXPORT_SYMBOL(bio_alloc_bioset);
> @@ -1546,6 +1600,9 @@ static void biovec_free_pools(struct bio_set *bs)
>  
>  void bioset_free(struct bio_set *bs)
>  {
> +	if (bs->rescue_workqueue)
> +		destroy_workqueue(bs->rescue_workqueue);
> +
>  	if (bs->bio_pool)
>  		mempool_destroy(bs->bio_pool);
>  
> @@ -1581,6 +1638,10 @@ struct bio_set *bioset_create(unsigned int pool_size, unsigned int front_pad)
>  
>  	bs->front_pad = front_pad;
>  
> +	spin_lock_init(&bs->rescue_lock);
> +	bio_list_init(&bs->rescue_list);
> +	INIT_WORK(&bs->rescue_work, bio_alloc_rescue);
> +
>  	bs->bio_slab = bio_find_or_create_slab(front_pad + back_pad);
>  	if (!bs->bio_slab) {
>  		kfree(bs);
> @@ -1591,9 +1652,14 @@ struct bio_set *bioset_create(unsigned int pool_size, unsigned int front_pad)
>  	if (!bs->bio_pool)
>  		goto bad;
>  
> -	if (!biovec_create_pools(bs, pool_size))
> -		return bs;
> +	if (biovec_create_pools(bs, pool_size))
> +		goto bad;
> +
> +	bs->rescue_workqueue = alloc_workqueue("bioset", WQ_MEM_RECLAIM, 0);
> +	if (!bs->rescue_workqueue)
> +		goto bad;
>  
> +	return bs;
>  bad:
>  	bioset_free(bs);
>  	return NULL;
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index b22c22b..ba5b52e 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -290,39 +290,6 @@ static inline int bio_associate_current(struct bio *bio) { return -ENOENT; }
>  static inline void bio_disassociate_task(struct bio *bio) { }
>  #endif	/* CONFIG_BLK_CGROUP */
>  
> -/*
> - * bio_set is used to allow other portions of the IO system to
> - * allocate their own private memory pools for bio and iovec structures.
> - * These memory pools in turn all allocate from the bio_slab
> - * and the bvec_slabs[].
> - */
> -#define BIO_POOL_SIZE 2
> -#define BIOVEC_NR_POOLS 6
> -#define BIOVEC_MAX_IDX	(BIOVEC_NR_POOLS - 1)
> -
> -struct bio_set {
> -	struct kmem_cache *bio_slab;
> -	unsigned int front_pad;
> -
> -	mempool_t *bio_pool;
> -#if defined(CONFIG_BLK_DEV_INTEGRITY)
> -	mempool_t *bio_integrity_pool;
> -#endif
> -	mempool_t *bvec_pool;
> -};
> -
> -struct biovec_slab {
> -	int nr_vecs;
> -	char *name;
> -	struct kmem_cache *slab;
> -};
> -
> -/*
> - * a small number of entries is fine, not going to be performance critical.
> - * basically we just need to survive
> - */
> -#define BIO_SPLIT_ENTRIES 2
> -
>  #ifdef CONFIG_HIGHMEM
>  /*
>   * remember never ever reenable interrupts between a bvec_kmap_irq and
> @@ -497,6 +464,48 @@ static inline struct bio *bio_list_get(struct bio_list *bl)
>  	return bio;
>  }
>  
> +/*
> + * bio_set is used to allow other portions of the IO system to
> + * allocate their own private memory pools for bio and iovec structures.
> + * These memory pools in turn all allocate from the bio_slab
> + * and the bvec_slabs[].
> + */
> +#define BIO_POOL_SIZE 2
> +#define BIOVEC_NR_POOLS 6
> +#define BIOVEC_MAX_IDX	(BIOVEC_NR_POOLS - 1)
> +
> +struct bio_set {
> +	struct kmem_cache *bio_slab;
> +	unsigned int front_pad;
> +
> +	mempool_t *bio_pool;
> +#if defined(CONFIG_BLK_DEV_INTEGRITY)
> +	mempool_t *bio_integrity_pool;
> +#endif
> +	mempool_t *bvec_pool;
> +
> +	/*
> +	 * Deadlock avoidance for stacking block drivers: see comments in
> +	 * bio_alloc_bioset() for details
> +	 */
> +	spinlock_t		rescue_lock;
> +	struct bio_list		rescue_list;
> +	struct work_struct	rescue_work;
> +	struct workqueue_struct	*rescue_workqueue;
> +};
> +
> +struct biovec_slab {
> +	int nr_vecs;
> +	char *name;
> +	struct kmem_cache *slab;
> +};
> +
> +/*
> + * a small number of entries is fine, not going to be performance critical.
> + * basically we just need to survive
> + */
> +#define BIO_SPLIT_ENTRIES 2
> +
>  #if defined(CONFIG_BLK_DEV_INTEGRITY)
>  
>  #define bip_vec_idx(bip, idx)	(&(bip->bip_vec[(idx)]))
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2012-08-29 16:09 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-07-24 20:11 [PATCH v4 00/12] Block cleanups Kent Overstreet
2012-07-24 20:11 ` [PATCH v4 01/12] block: Generalized bio pool freeing Kent Overstreet
2012-07-25 11:06   ` Boaz Harrosh
2012-07-25 23:38     ` Kent Overstreet
2012-07-24 20:11 ` [PATCH v4 02/12] dm: Use bioset's front_pad for dm_rq_clone_bio_info Kent Overstreet
2012-07-24 20:11 ` [PATCH v4 03/12] block: Add bio_reset() Kent Overstreet
2012-07-25 11:19   ` Boaz Harrosh
2012-07-25 22:56     ` Kent Overstreet
2012-07-24 20:11 ` [PATCH v4 04/12] pktcdvd: Switch to bio_kmalloc() Kent Overstreet
2012-07-25 11:29   ` Boaz Harrosh
2012-07-25 23:01     ` Kent Overstreet
2012-07-24 20:11 ` [PATCH v4 05/12] block: Kill bi_destructor Kent Overstreet
2012-07-25 11:39   ` Boaz Harrosh
2012-07-25 23:15     ` Kent Overstreet
2012-07-24 20:11 ` [PATCH v4 06/12] block: Add an explicit bio flag for bios that own their bvec Kent Overstreet
2012-07-24 20:11 ` [PATCH v4 07/12] block: Rename bio_split() -> bio_pair_split() Kent Overstreet
2012-07-24 20:11 ` [PATCH v4 08/12] block: Introduce new bio_split() Kent Overstreet
2012-07-25 11:55   ` Boaz Harrosh
2012-07-25 23:26     ` Kent Overstreet
2012-07-27  0:50       ` [PATCH] A possible deadlock with stacked devices (was: [PATCH v4 08/12] block: Introduce new bio_split()) Mikulas Patocka
2012-08-15 23:07         ` Kent Overstreet
2012-08-29 16:08           ` Mikulas Patocka
2012-07-24 20:11 ` [PATCH v4 09/12] block: Rework bio_pair_split() Kent Overstreet
2012-07-25 12:03   ` Boaz Harrosh
2012-07-24 20:11 ` [PATCH v4 10/12] block: Add bio_clone_kmalloc() Kent Overstreet
2012-07-25 12:05   ` Boaz Harrosh
2012-07-24 20:11 ` [PATCH v4 11/12] block: Add bio_clone_bioset() Kent Overstreet
2012-07-24 20:11 ` [PATCH v4 12/12] block: Only clone bio vecs that are in use Kent Overstreet
2012-08-07  3:17   ` Muthu Kumar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).