linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/13] block: assorted cleanup for bio splitting and cloning.
@ 2017-05-02  3:42 NeilBrown
  2017-05-02  3:42 ` [PATCH 03/13] blk: make the bioset rescue_workqueue optional NeilBrown
                   ` (13 more replies)
  0 siblings, 14 replies; 46+ messages in thread
From: NeilBrown @ 2017-05-02  3:42 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Ming Lei, linux-kernel, Shaohua Li, Javier Gonzalez

This is a revision of my series of patches working
towards removing the bioset work queues.

This set is based on Linus' tree as for today (2nd May) plus
the for-linus branch from Shaohua's md/raid tree.

This series adds a fix for the new lightnvm/pblk-read code
and discards bioset_create_nobvec() in favor of a flag arg to
bioset_create().  There are also minor fixes and a little
code clean-up.

I hope to eventually get rid of the new BIOSET_NEED_RESCUER flag,
but that needs work ing dm and probably bcache first.

Thanks,
NeilBrown


---

NeilBrown (13):
      blk: remove bio_set arg from blk_queue_split()
      blk: replace bioset_create_nobvec() with a flags arg to bioset_create()
      blk: make the bioset rescue_workqueue optional.
      blk: use non-rescuing bioset for q->bio_split.
      block: Improvements to bounce-buffer handling
      rbd: use bio_clone_fast() instead of bio_clone()
      drbd: use bio_clone_fast() instead of bio_clone()
      pktcdvd: use bio_clone_fast() instead of bio_clone()
      lightnvm/pblk-read: use bio_clone_fast()
      xen-blkfront: remove bio splitting.
      bcache: use kmalloc to allocate bio in bch_data_verify()
      block: remove bio_clone() and all references.
      block: don't check for BIO_MAX_PAGES in blk_bio_segment_split()


 Documentation/block/biodoc.txt      |    2 -
 block/bio.c                         |   72 ++++++++++++++++-------------------
 block/blk-core.c                    |    4 +-
 block/blk-merge.c                   |   31 ++-------------
 block/blk-mq.c                      |    2 -
 block/bounce.c                      |   32 +++++++++++++---
 drivers/block/drbd/drbd_int.h       |    3 +
 drivers/block/drbd/drbd_main.c      |   11 +++++
 drivers/block/drbd/drbd_req.c       |    2 -
 drivers/block/drbd/drbd_req.h       |    2 -
 drivers/block/pktcdvd.c             |   14 +++++--
 drivers/block/ps3vram.c             |    2 -
 drivers/block/rbd.c                 |   16 +++++++-
 drivers/block/rsxx/dev.c            |    2 -
 drivers/block/umem.c                |    2 -
 drivers/block/xen-blkfront.c        |   54 +-------------------------
 drivers/block/zram/zram_drv.c       |    2 -
 drivers/lightnvm/pblk-init.c        |   16 ++++++--
 drivers/lightnvm/pblk-read.c        |    2 -
 drivers/lightnvm/pblk.h             |    1 
 drivers/lightnvm/rrpc.c             |    2 -
 drivers/md/bcache/debug.c           |    2 -
 drivers/md/bcache/super.c           |    6 ++-
 drivers/md/dm-crypt.c               |    2 -
 drivers/md/dm-io.c                  |    2 -
 drivers/md/dm.c                     |    5 +-
 drivers/md/md.c                     |    6 +--
 drivers/md/raid1.c                  |    2 -
 drivers/md/raid10.c                 |    2 -
 drivers/md/raid5-cache.c            |    2 -
 drivers/md/raid5-ppl.c              |    2 -
 drivers/md/raid5.c                  |    2 -
 drivers/s390/block/dcssblk.c        |    2 -
 drivers/s390/block/xpram.c          |    2 -
 drivers/target/target_core_iblock.c |    2 -
 fs/block_dev.c                      |    2 -
 fs/btrfs/extent_io.c                |    3 +
 fs/xfs/xfs_super.c                  |    3 +
 include/linux/bio.h                 |   12 ++----
 include/linux/blkdev.h              |    3 -
 40 files changed, 162 insertions(+), 174 deletions(-)

--
Signature

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 01/13] blk: remove bio_set arg from blk_queue_split()
  2017-05-02  3:42 [PATCH 00/13] block: assorted cleanup for bio splitting and cloning NeilBrown
  2017-05-02  3:42 ` [PATCH 03/13] blk: make the bioset rescue_workqueue optional NeilBrown
  2017-05-02  3:42 ` [PATCH 02/13] blk: replace bioset_create_nobvec() with a flags arg to bioset_create() NeilBrown
@ 2017-05-02  3:42 ` NeilBrown
  2017-05-02 10:44   ` javigon
  2017-05-02  3:42 ` [PATCH 05/13] block: Improvements to bounce-buffer handling NeilBrown
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 46+ messages in thread
From: NeilBrown @ 2017-05-02  3:42 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Ming Lei, linux-kernel

blk_queue_split() is always called with the last arg being q->bio_split,
where 'q' is the first arg.

Also blk_queue_split() sometimes uses the passed-in 'bs' and sometimes uses
q->bio_split.

This is inconsistent and unnecessary.  Remove the last arg and always use
q->bio_split inside blk_queue_split()

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Credit-to: Javier González <jg@lightnvm.io> (Noticed that lightnvm was missed)
Signed-off-by: NeilBrown <neilb@suse.com>
---
 block/blk-core.c              |    2 +-
 block/blk-merge.c             |    9 ++++-----
 block/blk-mq.c                |    2 +-
 drivers/block/drbd/drbd_req.c |    2 +-
 drivers/block/pktcdvd.c       |    2 +-
 drivers/block/ps3vram.c       |    2 +-
 drivers/block/rsxx/dev.c      |    2 +-
 drivers/block/umem.c          |    2 +-
 drivers/block/zram/zram_drv.c |    2 +-
 drivers/lightnvm/pblk-init.c  |    4 ++--
 drivers/lightnvm/rrpc.c       |    2 +-
 drivers/md/md.c               |    2 +-
 drivers/s390/block/dcssblk.c  |    2 +-
 drivers/s390/block/xpram.c    |    2 +-
 include/linux/blkdev.h        |    3 +--
 15 files changed, 19 insertions(+), 21 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 24886b69690f..1a05c505964d 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1663,7 +1663,7 @@ static blk_qc_t blk_queue_bio(struct request_queue *q, struct bio *bio)
 	 */
 	blk_queue_bounce(q, &bio);
 
-	blk_queue_split(q, &bio, q->bio_split);
+	blk_queue_split(q, &bio);
 
 	if (bio_integrity_enabled(bio) && bio_integrity_prep(bio)) {
 		bio->bi_error = -EIO;
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 3990ae406341..d59074556703 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -202,8 +202,7 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 	return do_split ? new : NULL;
 }
 
-void blk_queue_split(struct request_queue *q, struct bio **bio,
-		     struct bio_set *bs)
+void blk_queue_split(struct request_queue *q, struct bio **bio)
 {
 	struct bio *split, *res;
 	unsigned nsegs;
@@ -211,13 +210,13 @@ void blk_queue_split(struct request_queue *q, struct bio **bio,
 	switch (bio_op(*bio)) {
 	case REQ_OP_DISCARD:
 	case REQ_OP_SECURE_ERASE:
-		split = blk_bio_discard_split(q, *bio, bs, &nsegs);
+		split = blk_bio_discard_split(q, *bio, q->bio_split, &nsegs);
 		break;
 	case REQ_OP_WRITE_ZEROES:
-		split = blk_bio_write_zeroes_split(q, *bio, bs, &nsegs);
+		split = blk_bio_write_zeroes_split(q, *bio, q->bio_split, &nsegs);
 		break;
 	case REQ_OP_WRITE_SAME:
-		split = blk_bio_write_same_split(q, *bio, bs, &nsegs);
+		split = blk_bio_write_same_split(q, *bio, q->bio_split, &nsegs);
 		break;
 	default:
 		split = blk_bio_segment_split(q, *bio, q->bio_split, &nsegs);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index bf90684a007a..45f084fb2052 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1543,7 +1543,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 		return BLK_QC_T_NONE;
 	}
 
-	blk_queue_split(q, &bio, q->bio_split);
+	blk_queue_split(q, &bio);
 
 	if (!is_flush_fua && !blk_queue_nomerges(q) &&
 	    blk_attempt_plug_merge(q, bio, &request_count, &same_queue_rq))
diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
index b5730e17b455..fa62dd8a4d46 100644
--- a/drivers/block/drbd/drbd_req.c
+++ b/drivers/block/drbd/drbd_req.c
@@ -1557,7 +1557,7 @@ blk_qc_t drbd_make_request(struct request_queue *q, struct bio *bio)
 	struct drbd_device *device = (struct drbd_device *) q->queuedata;
 	unsigned long start_jif;
 
-	blk_queue_split(q, &bio, q->bio_split);
+	blk_queue_split(q, &bio);
 
 	start_jif = jiffies;
 
diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 205b865ebeb9..1457e65c50b1 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -2414,7 +2414,7 @@ static blk_qc_t pkt_make_request(struct request_queue *q, struct bio *bio)
 
 	blk_queue_bounce(q, &bio);
 
-	blk_queue_split(q, &bio, q->bio_split);
+	blk_queue_split(q, &bio);
 
 	pd = q->queuedata;
 	if (!pd) {
diff --git a/drivers/block/ps3vram.c b/drivers/block/ps3vram.c
index 456b4fe21559..48072c0c1010 100644
--- a/drivers/block/ps3vram.c
+++ b/drivers/block/ps3vram.c
@@ -606,7 +606,7 @@ static blk_qc_t ps3vram_make_request(struct request_queue *q, struct bio *bio)
 
 	dev_dbg(&dev->core, "%s\n", __func__);
 
-	blk_queue_split(q, &bio, q->bio_split);
+	blk_queue_split(q, &bio);
 
 	spin_lock_irq(&priv->lock);
 	busy = !bio_list_empty(&priv->list);
diff --git a/drivers/block/rsxx/dev.c b/drivers/block/rsxx/dev.c
index 9c566364ac9c..01624eaefcba 100644
--- a/drivers/block/rsxx/dev.c
+++ b/drivers/block/rsxx/dev.c
@@ -151,7 +151,7 @@ static blk_qc_t rsxx_make_request(struct request_queue *q, struct bio *bio)
 	struct rsxx_bio_meta *bio_meta;
 	int st = -EINVAL;
 
-	blk_queue_split(q, &bio, q->bio_split);
+	blk_queue_split(q, &bio);
 
 	might_sleep();
 
diff --git a/drivers/block/umem.c b/drivers/block/umem.c
index c141cc3be22b..c8d8a2f16f8e 100644
--- a/drivers/block/umem.c
+++ b/drivers/block/umem.c
@@ -529,7 +529,7 @@ static blk_qc_t mm_make_request(struct request_queue *q, struct bio *bio)
 		 (unsigned long long)bio->bi_iter.bi_sector,
 		 bio->bi_iter.bi_size);
 
-	blk_queue_split(q, &bio, q->bio_split);
+	blk_queue_split(q, &bio);
 
 	spin_lock_irq(&card->lock);
 	*card->biotail = bio;
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 6fac5fedd610..4297d19621bf 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -884,7 +884,7 @@ static blk_qc_t zram_make_request(struct request_queue *queue, struct bio *bio)
 {
 	struct zram *zram = queue->queuedata;
 
-	blk_queue_split(queue, &bio, queue->bio_split);
+	blk_queue_split(queue, &bio);
 
 	if (!valid_io_request(zram, bio->bi_iter.bi_sector,
 					bio->bi_iter.bi_size)) {
diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index ae8cd6d5af8b..b3fec8ec55b8 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -33,7 +33,7 @@ static int pblk_rw_io(struct request_queue *q, struct pblk *pblk,
 	 * constraint. Writes can be of arbitrary size.
 	 */
 	if (bio_data_dir(bio) == READ) {
-		blk_queue_split(q, &bio, q->bio_split);
+		blk_queue_split(q, &bio);
 		ret = pblk_submit_read(pblk, bio);
 		if (ret == NVM_IO_DONE && bio_flagged(bio, BIO_CLONED))
 			bio_put(bio);
@@ -46,7 +46,7 @@ static int pblk_rw_io(struct request_queue *q, struct pblk *pblk,
 	 * available for user I/O.
 	 */
 	if (unlikely(pblk_get_secs(bio) >= pblk_rl_sysfs_rate_show(&pblk->rl)))
-		blk_queue_split(q, &bio, q->bio_split);
+		blk_queue_split(q, &bio);
 
 	return pblk_write_to_cache(pblk, bio, PBLK_IOTYPE_USER);
 }
diff --git a/drivers/lightnvm/rrpc.c b/drivers/lightnvm/rrpc.c
index cf0e28a0ff61..8e241056b141 100644
--- a/drivers/lightnvm/rrpc.c
+++ b/drivers/lightnvm/rrpc.c
@@ -994,7 +994,7 @@ static blk_qc_t rrpc_make_rq(struct request_queue *q, struct bio *bio)
 	struct nvm_rq *rqd;
 	int err;
 
-	blk_queue_split(q, &bio, q->bio_split);
+	blk_queue_split(q, &bio);
 
 	if (bio_op(bio) == REQ_OP_DISCARD) {
 		rrpc_discard(rrpc, bio);
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 82f798be964f..de78e74c8e01 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -265,7 +265,7 @@ static blk_qc_t md_make_request(struct request_queue *q, struct bio *bio)
 	unsigned int sectors;
 	int cpu;
 
-	blk_queue_split(q, &bio, q->bio_split);
+	blk_queue_split(q, &bio);
 
 	if (mddev == NULL || mddev->pers == NULL) {
 		bio_io_error(bio);
diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
index 415d10a67b7a..10ece6f3c7eb 100644
--- a/drivers/s390/block/dcssblk.c
+++ b/drivers/s390/block/dcssblk.c
@@ -829,7 +829,7 @@ dcssblk_make_request(struct request_queue *q, struct bio *bio)
 	unsigned long source_addr;
 	unsigned long bytes_done;
 
-	blk_queue_split(q, &bio, q->bio_split);
+	blk_queue_split(q, &bio);
 
 	bytes_done = 0;
 	dev_info = bio->bi_bdev->bd_disk->private_data;
diff --git a/drivers/s390/block/xpram.c b/drivers/s390/block/xpram.c
index b9d7e755c8a3..a48f0d40c1d2 100644
--- a/drivers/s390/block/xpram.c
+++ b/drivers/s390/block/xpram.c
@@ -190,7 +190,7 @@ static blk_qc_t xpram_make_request(struct request_queue *q, struct bio *bio)
 	unsigned long page_addr;
 	unsigned long bytes;
 
-	blk_queue_split(q, &bio, q->bio_split);
+	blk_queue_split(q, &bio);
 
 	if ((bio->bi_iter.bi_sector & 7) != 0 ||
 	    (bio->bi_iter.bi_size & 4095) != 0)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 83d28623645f..f380d8b86e7e 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -941,8 +941,7 @@ extern int blk_insert_cloned_request(struct request_queue *q,
 				     struct request *rq);
 extern int blk_rq_append_bio(struct request *rq, struct bio *bio);
 extern void blk_delay_queue(struct request_queue *, unsigned long);
-extern void blk_queue_split(struct request_queue *, struct bio **,
-			    struct bio_set *);
+extern void blk_queue_split(struct request_queue *, struct bio **);
 extern void blk_recount_segments(struct request_queue *, struct bio *);
 extern int scsi_verify_blk_ioctl(struct block_device *, unsigned int);
 extern int scsi_cmd_blk_ioctl(struct block_device *, fmode_t,

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 02/13] blk: replace bioset_create_nobvec() with a flags arg to bioset_create()
  2017-05-02  3:42 [PATCH 00/13] block: assorted cleanup for bio splitting and cloning NeilBrown
  2017-05-02  3:42 ` [PATCH 03/13] blk: make the bioset rescue_workqueue optional NeilBrown
@ 2017-05-02  3:42 ` NeilBrown
  2017-05-02  8:06   ` Christoph Hellwig
  2017-05-02  9:40   ` Ming Lei
  2017-05-02  3:42 ` [PATCH 01/13] blk: remove bio_set arg from blk_queue_split() NeilBrown
                   ` (11 subsequent siblings)
  13 siblings, 2 replies; 46+ messages in thread
From: NeilBrown @ 2017-05-02  3:42 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Ming Lei, linux-kernel

"flags" arguments are often seen as good API design as they allow
easy extensibility.
bioset_create_nobvec() is implemented internally as a variation in
flags passed to __bioset_create().

To support future extension, make the internal structure part of the
API.
i.e. add a 'flags' argument to bioset_create() and discard
bioset_create_nobvec().

Note that the bio_split allocations in drivers/md/raid* do not need
the bvec mempool - they should have used bioset_create_nobvec().

Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: NeilBrown <neilb@suse.com>
---
 block/bio.c                         |   60 +++++++++++++----------------------
 block/blk-core.c                    |    2 +
 drivers/block/drbd/drbd_main.c      |    2 +
 drivers/md/bcache/super.c           |    4 +-
 drivers/md/dm-crypt.c               |    2 +
 drivers/md/dm-io.c                  |    2 +
 drivers/md/dm.c                     |    2 +
 drivers/md/md.c                     |    2 +
 drivers/md/raid1.c                  |    2 +
 drivers/md/raid10.c                 |    2 +
 drivers/md/raid5-cache.c            |    2 +
 drivers/md/raid5-ppl.c              |    2 +
 drivers/md/raid5.c                  |    2 +
 drivers/target/target_core_iblock.c |    2 +
 fs/block_dev.c                      |    2 +
 fs/btrfs/extent_io.c                |    3 +-
 fs/xfs/xfs_super.c                  |    3 +-
 include/linux/bio.h                 |    6 ++--
 18 files changed, 45 insertions(+), 57 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 888e7801c638..3f1286ed301e 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1921,9 +1921,26 @@ void bioset_free(struct bio_set *bs)
 }
 EXPORT_SYMBOL(bioset_free);
 
-static struct bio_set *__bioset_create(unsigned int pool_size,
-				       unsigned int front_pad,
-				       bool create_bvec_pool)
+/**
+ * bioset_create  - Create a bio_set
+ * @pool_size:	Number of bio and bio_vecs to cache in the mempool
+ * @front_pad:	Number of bytes to allocate in front of the returned bio
+ * @flags:	Flags to modify behavior, currently only %BIOSET_NEED_BVECS
+ *
+ * Description:
+ *    Set up a bio_set to be used with @bio_alloc_bioset. Allows the caller
+ *    to ask for a number of bytes to be allocated in front of the bio.
+ *    Front pad allocation is useful for embedding the bio inside
+ *    another structure, to avoid allocating extra data to go with the bio.
+ *    Note that the bio must be embedded at the END of that structure always,
+ *    or things will break badly.
+ *    If %BIOSET_NEED_BVECS is set in @flags, a separate pool will be allocated
+ *    for allocating iovecs.  This pool is not needed e.g. for bio_clone_fast().
+ *
+ */
+struct bio_set *bioset_create(unsigned int pool_size,
+			      unsigned int front_pad,
+			      int flags)
 {
 	unsigned int back_pad = BIO_INLINE_VECS * sizeof(struct bio_vec);
 	struct bio_set *bs;
@@ -1948,7 +1965,7 @@ static struct bio_set *__bioset_create(unsigned int pool_size,
 	if (!bs->bio_pool)
 		goto bad;
 
-	if (create_bvec_pool) {
+	if (flags & BIOSET_NEED_BVECS) {
 		bs->bvec_pool = biovec_create_pool(pool_size);
 		if (!bs->bvec_pool)
 			goto bad;
@@ -1963,41 +1980,8 @@ static struct bio_set *__bioset_create(unsigned int pool_size,
 	bioset_free(bs);
 	return NULL;
 }
-
-/**
- * bioset_create  - Create a bio_set
- * @pool_size:	Number of bio and bio_vecs to cache in the mempool
- * @front_pad:	Number of bytes to allocate in front of the returned bio
- *
- * Description:
- *    Set up a bio_set to be used with @bio_alloc_bioset. Allows the caller
- *    to ask for a number of bytes to be allocated in front of the bio.
- *    Front pad allocation is useful for embedding the bio inside
- *    another structure, to avoid allocating extra data to go with the bio.
- *    Note that the bio must be embedded at the END of that structure always,
- *    or things will break badly.
- */
-struct bio_set *bioset_create(unsigned int pool_size, unsigned int front_pad)
-{
-	return __bioset_create(pool_size, front_pad, true);
-}
 EXPORT_SYMBOL(bioset_create);
 
-/**
- * bioset_create_nobvec  - Create a bio_set without bio_vec mempool
- * @pool_size:	Number of bio to cache in the mempool
- * @front_pad:	Number of bytes to allocate in front of the returned bio
- *
- * Description:
- *    Same functionality as bioset_create() except that mempool is not
- *    created for bio_vecs. Saving some memory for bio_clone_fast() users.
- */
-struct bio_set *bioset_create_nobvec(unsigned int pool_size, unsigned int front_pad)
-{
-	return __bioset_create(pool_size, front_pad, false);
-}
-EXPORT_SYMBOL(bioset_create_nobvec);
-
 #ifdef CONFIG_BLK_CGROUP
 
 /**
@@ -2112,7 +2096,7 @@ static int __init init_bio(void)
 	bio_integrity_init();
 	biovec_init_slabs();
 
-	fs_bio_set = bioset_create(BIO_POOL_SIZE, 0);
+	fs_bio_set = bioset_create(BIO_POOL_SIZE, 0, BIOSET_NEED_BVECS);
 	if (!fs_bio_set)
 		panic("bio: can't allocate bios\n");
 
diff --git a/block/blk-core.c b/block/blk-core.c
index 1a05c505964d..3797753f4085 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -730,7 +730,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 	if (q->id < 0)
 		goto fail_q;
 
-	q->bio_split = bioset_create(BIO_POOL_SIZE, 0);
+	q->bio_split = bioset_create(BIO_POOL_SIZE, 0, BIOSET_NEED_BVECS);
 	if (!q->bio_split)
 		goto fail_id;
 
diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index 84455c365f57..b395fe391171 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -2165,7 +2165,7 @@ static int drbd_create_mempools(void)
 		goto Enomem;
 
 	/* mempools */
-	drbd_md_io_bio_set = bioset_create(DRBD_MIN_POOL_PAGES, 0);
+	drbd_md_io_bio_set = bioset_create(DRBD_MIN_POOL_PAGES, 0, BIOSET_NEED_BVECS);
 	if (drbd_md_io_bio_set == NULL)
 		goto Enomem;
 
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 85e3f21c2514..8b9dae98cc7e 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -786,7 +786,7 @@ static int bcache_device_init(struct bcache_device *d, unsigned block_size,
 
 	minor *= BCACHE_MINORS;
 
-	if (!(d->bio_split = bioset_create(4, offsetof(struct bbio, bio))) ||
+	if (!(d->bio_split = bioset_create(4, offsetof(struct bbio, bio), BIOSET_NEED_BVECS)) ||
 	    !(d->disk = alloc_disk(BCACHE_MINORS))) {
 		ida_simple_remove(&bcache_minor, minor);
 		return -ENOMEM;
@@ -1520,7 +1520,7 @@ struct cache_set *bch_cache_set_alloc(struct cache_sb *sb)
 				sizeof(struct bbio) + sizeof(struct bio_vec) *
 				bucket_pages(c))) ||
 	    !(c->fill_iter = mempool_create_kmalloc_pool(1, iter_size)) ||
-	    !(c->bio_split = bioset_create(4, offsetof(struct bbio, bio))) ||
+	    !(c->bio_split = bioset_create(4, offsetof(struct bbio, bio), BIOSET_NEED_BVECS)) ||
 	    !(c->uuids = alloc_bucket_pages(GFP_KERNEL, c)) ||
 	    !(c->moving_gc_wq = alloc_workqueue("bcache_gc",
 						WQ_MEM_RECLAIM, 0)) ||
diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index ef1d836bd81b..5a98c1eb87d9 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -1936,7 +1936,7 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 		goto bad;
 	}
 
-	cc->bs = bioset_create(MIN_IOS, 0);
+	cc->bs = bioset_create(MIN_IOS, 0, BIOSET_NEED_BVECS);
 	if (!cc->bs) {
 		ti->error = "Cannot allocate crypt bioset";
 		goto bad;
diff --git a/drivers/md/dm-io.c b/drivers/md/dm-io.c
index 3702e502466d..1d86d1d39d48 100644
--- a/drivers/md/dm-io.c
+++ b/drivers/md/dm-io.c
@@ -58,7 +58,7 @@ struct dm_io_client *dm_io_client_create(void)
 	if (!client->pool)
 		goto bad;
 
-	client->bios = bioset_create(min_ios, 0);
+	client->bios = bioset_create(min_ios, 0, BIOSET_NEED_BVECS);
 	if (!client->bios)
 		goto bad;
 
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 8bf397729bbd..e21a7d58c8ef 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2601,7 +2601,7 @@ struct dm_md_mempools *dm_alloc_md_mempools(struct mapped_device *md, unsigned t
 		BUG();
 	}
 
-	pools->bs = bioset_create_nobvec(pool_size, front_pad);
+	pools->bs = bioset_create(pool_size, front_pad, 0);
 	if (!pools->bs)
 		goto out;
 
diff --git a/drivers/md/md.c b/drivers/md/md.c
index de78e74c8e01..1379fb636de2 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5420,7 +5420,7 @@ int md_run(struct mddev *mddev)
 	}
 
 	if (mddev->bio_set == NULL) {
-		mddev->bio_set = bioset_create(BIO_POOL_SIZE, 0);
+		mddev->bio_set = bioset_create(BIO_POOL_SIZE, 0, BIOSET_NEED_BVECS);
 		if (!mddev->bio_set)
 			return -ENOMEM;
 	}
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 7ed59351fe97..3f951e4a36ea 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2953,7 +2953,7 @@ static struct r1conf *setup_conf(struct mddev *mddev)
 	if (!conf->r1bio_pool)
 		goto abort;
 
-	conf->bio_split = bioset_create(BIO_POOL_SIZE, 0);
+	conf->bio_split = bioset_create(BIO_POOL_SIZE, 0, 0);
 	if (!conf->bio_split)
 		goto abort;
 
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 6b86a0032cf8..9b401a9eae85 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -3553,7 +3553,7 @@ static struct r10conf *setup_conf(struct mddev *mddev)
 	if (!conf->r10bio_pool)
 		goto out;
 
-	conf->bio_split = bioset_create(BIO_POOL_SIZE, 0);
+	conf->bio_split = bioset_create(BIO_POOL_SIZE, 0, 0);
 	if (!conf->bio_split)
 		goto out;
 
diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index b6194e082e48..ea562aa735ac 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -3034,7 +3034,7 @@ int r5l_init_log(struct r5conf *conf, struct md_rdev *rdev)
 	if (!log->io_pool)
 		goto io_pool;
 
-	log->bs = bioset_create(R5L_POOL_SIZE, 0);
+	log->bs = bioset_create(R5L_POOL_SIZE, 0, BIOSET_NEED_BVECS);
 	if (!log->bs)
 		goto io_bs;
 
diff --git a/drivers/md/raid5-ppl.c b/drivers/md/raid5-ppl.c
index 5d25bebf3328..3a31e6ef74d5 100644
--- a/drivers/md/raid5-ppl.c
+++ b/drivers/md/raid5-ppl.c
@@ -1150,7 +1150,7 @@ int ppl_init_log(struct r5conf *conf)
 		goto err;
 	}
 
-	ppl_conf->bs = bioset_create(conf->raid_disks, 0);
+	ppl_conf->bs = bioset_create(conf->raid_disks, 0, 0);
 	if (!ppl_conf->bs) {
 		ret = -ENOMEM;
 		goto err;
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 2e38cfac5b1d..3da078676503 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -6918,7 +6918,7 @@ static struct r5conf *setup_conf(struct mddev *mddev)
 			goto abort;
 	}
 
-	conf->bio_split = bioset_create(BIO_POOL_SIZE, 0);
+	conf->bio_split = bioset_create(BIO_POOL_SIZE, 0, 0);
 	if (!conf->bio_split)
 		goto abort;
 	conf->mddev = mddev;
diff --git a/drivers/target/target_core_iblock.c b/drivers/target/target_core_iblock.c
index d316ed537d59..c7b373dafaf9 100644
--- a/drivers/target/target_core_iblock.c
+++ b/drivers/target/target_core_iblock.c
@@ -93,7 +93,7 @@ static int iblock_configure_device(struct se_device *dev)
 		return -EINVAL;
 	}
 
-	ib_dev->ibd_bio_set = bioset_create(IBLOCK_BIO_POOL_SIZE, 0);
+	ib_dev->ibd_bio_set = bioset_create(IBLOCK_BIO_POOL_SIZE, 0, BIOSET_NEED_BVECS);
 	if (!ib_dev->ibd_bio_set) {
 		pr_err("IBLOCK: Unable to create bioset\n");
 		goto out;
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 9ccabe3bb7de..6e7a4c411cd2 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -436,7 +436,7 @@ blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 
 static __init int blkdev_init(void)
 {
-	blkdev_dio_pool = bioset_create(4, offsetof(struct blkdev_dio, bio));
+	blkdev_dio_pool = bioset_create(4, offsetof(struct blkdev_dio, bio), BIOSET_NEED_BVECS);
 	if (!blkdev_dio_pool)
 		return -ENOMEM;
 	return 0;
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 27fdb250b446..6897c2cd18d3 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -174,7 +174,8 @@ int __init extent_io_init(void)
 		goto free_state_cache;
 
 	btrfs_bioset = bioset_create(BIO_POOL_SIZE,
-				     offsetof(struct btrfs_io_bio, bio));
+				     offsetof(struct btrfs_io_bio, bio),
+				     BIOSET_NEED_BVECS);
 	if (!btrfs_bioset)
 		goto free_buffer_cache;
 
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 685c042a120f..0262063014c6 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1757,7 +1757,8 @@ STATIC int __init
 xfs_init_zones(void)
 {
 	xfs_ioend_bioset = bioset_create(4 * MAX_BUF_PER_PAGE,
-			offsetof(struct xfs_ioend, io_inline_bio));
+			offsetof(struct xfs_ioend, io_inline_bio),
+			BIOSET_NEED_BVECS);
 	if (!xfs_ioend_bioset)
 		goto out;
 
diff --git a/include/linux/bio.h b/include/linux/bio.h
index d1b04b0e99cf..0975da6bebd9 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -373,8 +373,10 @@ static inline struct bio *bio_next_split(struct bio *bio, int sectors,
 	return bio_split(bio, sectors, gfp, bs);
 }
 
-extern struct bio_set *bioset_create(unsigned int, unsigned int);
-extern struct bio_set *bioset_create_nobvec(unsigned int, unsigned int);
+extern struct bio_set *bioset_create(unsigned int, unsigned int, int flags);
+enum {
+	BIOSET_NEED_BVECS = BIT(0),
+};
 extern void bioset_free(struct bio_set *);
 extern mempool_t *biovec_create_pool(int pool_entries);
 

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 03/13] blk: make the bioset rescue_workqueue optional.
  2017-05-02  3:42 [PATCH 00/13] block: assorted cleanup for bio splitting and cloning NeilBrown
@ 2017-05-02  3:42 ` NeilBrown
  2017-05-02  8:14   ` Christoph Hellwig
                     ` (2 more replies)
  2017-05-02  3:42 ` [PATCH 02/13] blk: replace bioset_create_nobvec() with a flags arg to bioset_create() NeilBrown
                   ` (12 subsequent siblings)
  13 siblings, 3 replies; 46+ messages in thread
From: NeilBrown @ 2017-05-02  3:42 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Ming Lei, linux-kernel

This patch converts bioset_create() and
bioset_create_nobvec() to not create a workqueue so
alloctions will never trigger punt_bios_to_rescuer().  It
also introduces bioset_create_rescued() and
bioset_create_nobvec_rescued() which preserve the old
behaviour.

All callers of bioset_create() and bioset_create_nobvec(),
that are inside block device drivers, are converted to the
_rescued() version.

biosets used by filesystems or other top-level users do not
need rescuing as the bio can never be queued behind other
bios.  This includes fs_bio_set, blkdev_dio_pool,
btrfs_bioset, xfs_ioend_bioset, drbd_md_io_bio_set,
and one allocated by target_core_iblock.c.

biosets used by md/raid to not need rescuing as
their usage was recently audited to revised to never
risk deadlock.

It is hoped that most, if not all, of the remaining biosets
can end up being the non-rescued version.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 block/bio.c               |   12 ++++++++++--
 block/blk-core.c          |    2 +-
 drivers/md/bcache/super.c |    6 ++++--
 drivers/md/dm-crypt.c     |    2 +-
 drivers/md/dm-io.c        |    2 +-
 drivers/md/dm.c           |    5 +++--
 include/linux/bio.h       |    1 +
 7 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 3f1286ed301e..257606e742e0 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -363,6 +363,8 @@ static void punt_bios_to_rescuer(struct bio_set *bs)
 	struct bio_list punt, nopunt;
 	struct bio *bio;
 
+	if (!WARN_ON_ONCE(!bs->rescue_workqueue))
+		return;
 	/*
 	 * In order to guarantee forward progress we must punt only bios that
 	 * were allocated from this bio_set; otherwise, if there was a bio on
@@ -474,7 +476,8 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, unsigned int nr_iovecs,
 
 		if (current->bio_list &&
 		    (!bio_list_empty(&current->bio_list[0]) ||
-		     !bio_list_empty(&current->bio_list[1])))
+		     !bio_list_empty(&current->bio_list[1])) &&
+		    bs->rescue_workqueue)
 			gfp_mask &= ~__GFP_DIRECT_RECLAIM;
 
 		p = mempool_alloc(bs->bio_pool, gfp_mask);
@@ -1925,7 +1928,7 @@ EXPORT_SYMBOL(bioset_free);
  * bioset_create  - Create a bio_set
  * @pool_size:	Number of bio and bio_vecs to cache in the mempool
  * @front_pad:	Number of bytes to allocate in front of the returned bio
- * @flags:	Flags to modify behavior, currently only %BIOSET_NEED_BVECS
+ * @flags:	Flags to modify behavior, currently %BIOSET_NEED_BVECS and %BIOSET_NEED_RESCUER
  *
  * Description:
  *    Set up a bio_set to be used with @bio_alloc_bioset. Allows the caller
@@ -1936,6 +1939,8 @@ EXPORT_SYMBOL(bioset_free);
  *    or things will break badly.
  *    If %BIOSET_NEED_BVECS is set in @flags, a separate pool will be allocated
  *    for allocating iovecs.  This pool is not needed e.g. for bio_clone_fast().
+ *    If %BIOSET_NEED_RESCUER is set, workqueue is create which can be used to
+ *    dispatch queued requests when the mempool runs out of space.
  *
  */
 struct bio_set *bioset_create(unsigned int pool_size,
@@ -1971,6 +1976,9 @@ struct bio_set *bioset_create(unsigned int pool_size,
 			goto bad;
 	}
 
+	if (!(flags & BIOSET_NEED_RESCUER))
+		return bs;
+
 	bs->rescue_workqueue = alloc_workqueue("bioset", WQ_MEM_RECLAIM, 0);
 	if (!bs->rescue_workqueue)
 		goto bad;
diff --git a/block/blk-core.c b/block/blk-core.c
index 3797753f4085..ae51f159a2ca 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -730,7 +730,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 	if (q->id < 0)
 		goto fail_q;
 
-	q->bio_split = bioset_create(BIO_POOL_SIZE, 0, BIOSET_NEED_BVECS);
+	q->bio_split = bioset_create(BIO_POOL_SIZE, 0, BIOSET_NEED_BVECS | BIOSET_NEED_RESCUER);
 	if (!q->bio_split)
 		goto fail_id;
 
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 8b9dae98cc7e..bb50991a82d3 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -786,7 +786,8 @@ static int bcache_device_init(struct bcache_device *d, unsigned block_size,
 
 	minor *= BCACHE_MINORS;
 
-	if (!(d->bio_split = bioset_create(4, offsetof(struct bbio, bio), BIOSET_NEED_BVECS)) ||
+	if (!(d->bio_split = bioset_create(4, offsetof(struct bbio, bio),
+					   BIOSET_NEED_BVECS | BIOSET_NEED_RESCUER)) ||
 	    !(d->disk = alloc_disk(BCACHE_MINORS))) {
 		ida_simple_remove(&bcache_minor, minor);
 		return -ENOMEM;
@@ -1520,7 +1521,8 @@ struct cache_set *bch_cache_set_alloc(struct cache_sb *sb)
 				sizeof(struct bbio) + sizeof(struct bio_vec) *
 				bucket_pages(c))) ||
 	    !(c->fill_iter = mempool_create_kmalloc_pool(1, iter_size)) ||
-	    !(c->bio_split = bioset_create(4, offsetof(struct bbio, bio), BIOSET_NEED_BVECS)) ||
+	    !(c->bio_split = bioset_create(4, offsetof(struct bbio, bio),
+					   BIOSET_NEED_BVECS | BIOSET_NEED_RESCUER)) ||
 	    !(c->uuids = alloc_bucket_pages(GFP_KERNEL, c)) ||
 	    !(c->moving_gc_wq = alloc_workqueue("bcache_gc",
 						WQ_MEM_RECLAIM, 0)) ||
diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 5a98c1eb87d9..87155eb7b39c 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -1936,7 +1936,7 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 		goto bad;
 	}
 
-	cc->bs = bioset_create(MIN_IOS, 0, BIOSET_NEED_BVECS);
+	cc->bs = bioset_create(MIN_IOS, 0, BIOSET_NEED_BVECS | BIOSET_NEED_RESCUER);
 	if (!cc->bs) {
 		ti->error = "Cannot allocate crypt bioset";
 		goto bad;
diff --git a/drivers/md/dm-io.c b/drivers/md/dm-io.c
index 1d86d1d39d48..95b9ac98fe0c 100644
--- a/drivers/md/dm-io.c
+++ b/drivers/md/dm-io.c
@@ -58,7 +58,7 @@ struct dm_io_client *dm_io_client_create(void)
 	if (!client->pool)
 		goto bad;
 
-	client->bios = bioset_create(min_ios, 0, BIOSET_NEED_BVECS);
+	client->bios = bioset_create(min_ios, 0, BIOSET_NEED_BVECS | BIOSET_NEED_RESCUER);
 	if (!client->bios)
 		goto bad;
 
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index e21a7d58c8ef..4f96556cb8c5 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1014,7 +1014,8 @@ static void flush_current_bio_list(struct blk_plug_cb *cb, bool from_schedule)
 
 		while ((bio = bio_list_pop(&list))) {
 			struct bio_set *bs = bio->bi_pool;
-			if (unlikely(!bs) || bs == fs_bio_set) {
+			if (unlikely(!bs) || bs == fs_bio_set ||
+			    !bs->rescue_workqueue) {
 				bio_list_add(&current->bio_list[i], bio);
 				continue;
 			}
@@ -2601,7 +2602,7 @@ struct dm_md_mempools *dm_alloc_md_mempools(struct mapped_device *md, unsigned t
 		BUG();
 	}
 
-	pools->bs = bioset_create(pool_size, front_pad, 0);
+	pools->bs = bioset_create(pool_size, front_pad, BIOSET_NEED_RESCUER);
 	if (!pools->bs)
 		goto out;
 
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 0975da6bebd9..40e5d7b62f29 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -376,6 +376,7 @@ static inline struct bio *bio_next_split(struct bio *bio, int sectors,
 extern struct bio_set *bioset_create(unsigned int, unsigned int, int flags);
 enum {
 	BIOSET_NEED_BVECS = BIT(0),
+	BIOSET_NEED_RESCUER = BIT(1),
 };
 extern void bioset_free(struct bio_set *);
 extern mempool_t *biovec_create_pool(int pool_entries);

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 04/13] blk: use non-rescuing bioset for q->bio_split.
  2017-05-02  3:42 [PATCH 00/13] block: assorted cleanup for bio splitting and cloning NeilBrown
                   ` (3 preceding siblings ...)
  2017-05-02  3:42 ` [PATCH 05/13] block: Improvements to bounce-buffer handling NeilBrown
@ 2017-05-02  3:42 ` NeilBrown
  2017-05-02 11:54   ` Ming Lei
  2017-05-02 23:21   ` [PATCH 04/13 V2] " NeilBrown
  2017-05-02  3:42 ` [PATCH 09/13] lightnvm/pblk-read: use bio_clone_fast() NeilBrown
                   ` (8 subsequent siblings)
  13 siblings, 2 replies; 46+ messages in thread
From: NeilBrown @ 2017-05-02  3:42 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Ming Lei, linux-kernel

A rescuing bioset is only useful if there might be bios from
that same bioset on the bio_list_on_stack queue at a time
when bio_alloc_bioset() is called.  This never applies to
q->bio_split.

Allocations from q->bio_split are only ever made from
blk_queue_split() which is only ever called early in each of
various make_request_fn()s.  The original bio (call this A)
is then passed to generic_make_request() and is placed on
the bio_list_on_stack queue, and the bio that was allocated
from q->bio_split (B) is processed.

The processing of this may cause other bios to be passed to
generic_make_request() or may even cause the bio B itself to
be passed, possible after some prefix has been split off
(using some other bioset).

generic_make_request() now guarantees that all of these bios
(B and dependants) will be fully processed before the tail
of the original bio A gets handled.  None of these early bios
can possible trigger an allocation from the original
q->bio_split as they are either too small to require
splitting or (more likely) are destined for a different queue.

The next time that the original q->bio_split might be used
by this thread is when A is processed again, as it might
still be too big to handle directly.  By this time there
cannot be any other bios allocated from q->bio_split in the
generic_make_request() queue.  So no rescuing will ever be
needed.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: NeilBrown <neilb@suse.com>
---
 block/blk-core.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index ae51f159a2ca..3797753f4085 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -730,7 +730,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 	if (q->id < 0)
 		goto fail_q;
 
-	q->bio_split = bioset_create(BIO_POOL_SIZE, 0, BIOSET_NEED_BVECS | BIOSET_NEED_RESCUER);
+	q->bio_split = bioset_create(BIO_POOL_SIZE, 0, BIOSET_NEED_BVECS);
 	if (!q->bio_split)
 		goto fail_id;
 

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 05/13] block: Improvements to bounce-buffer handling
  2017-05-02  3:42 [PATCH 00/13] block: assorted cleanup for bio splitting and cloning NeilBrown
                   ` (2 preceding siblings ...)
  2017-05-02  3:42 ` [PATCH 01/13] blk: remove bio_set arg from blk_queue_split() NeilBrown
@ 2017-05-02  3:42 ` NeilBrown
  2017-05-02  8:13   ` Christoph Hellwig
  2017-05-02 11:56   ` Ming Lei
  2017-05-02  3:42 ` [PATCH 04/13] blk: use non-rescuing bioset for q->bio_split NeilBrown
                   ` (9 subsequent siblings)
  13 siblings, 2 replies; 46+ messages in thread
From: NeilBrown @ 2017-05-02  3:42 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Ming Lei, linux-kernel

Since commit 23688bf4f830 ("block: ensure to split after potentially
bouncing a bio") blk_queue_bounce() is called *before*
blk_queue_split().
This means that:
 1/ the comments blk_queue_split() about bounce buffers are
    irrelevant, and
 2/ a very large bio (more than BIO_MAX_PAGES) will no longer be
    split before it arrives at blk_queue_bounce(), leading to the
    possibility that bio_clone_bioset() will fail and a NULL
    will be dereferenced.

Separately, blk_queue_bounce() shouldn't use fs_bio_set as the bio
being copied could be from the same set, and this could lead to a
deadlock.

So:
 - allocate 2 private biosets for blk_queue_bounce, one for
   splitting enormous bios and one for cloning bios.
 - add code to split a bio that exceeds BIO_MAX_PAGES.
 - Fix up the comments in blk_queue_split()

Credit-to: Ming Lei <tom.leiming@gmail.com> (suggested using single bio_for_each_segment loop)
Signed-off-by: NeilBrown <neilb@suse.com>
---
 block/blk-merge.c |   14 ++++----------
 block/bounce.c    |   32 ++++++++++++++++++++++++++------
 2 files changed, 30 insertions(+), 16 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index d59074556703..51c84540d3bb 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -117,17 +117,11 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 		 * each holds at most BIO_MAX_PAGES bvecs because
 		 * bio_clone() can fail to allocate big bvecs.
 		 *
-		 * It should have been better to apply the limit per
-		 * request queue in which bio_clone() is involved,
-		 * instead of globally. The biggest blocker is the
-		 * bio_clone() in bio bounce.
+		 * Those drivers which will need to use bio_clone()
+		 * should tell us in some way.  For now, impose the
+		 * BIO_MAX_PAGES limit on all queues.
 		 *
-		 * If bio is splitted by this reason, we should have
-		 * allowed to continue bios merging, but don't do
-		 * that now for making the change simple.
-		 *
-		 * TODO: deal with bio bounce's bio_clone() gracefully
-		 * and convert the global limit into per-queue limit.
+		 * TODO: handle users of bio_clone() differently.
 		 */
 		if (bvecs++ >= BIO_MAX_PAGES)
 			goto split;
diff --git a/block/bounce.c b/block/bounce.c
index 1cb5dd3a5da1..087ecc2dc66c 100644
--- a/block/bounce.c
+++ b/block/bounce.c
@@ -26,6 +26,7 @@
 #define POOL_SIZE	64
 #define ISA_POOL_SIZE	16
 
+struct bio_set *bounce_bio_set, *bounce_bio_split;
 static mempool_t *page_pool, *isa_page_pool;
 
 #if defined(CONFIG_HIGHMEM) || defined(CONFIG_NEED_BOUNCE_POOL)
@@ -40,6 +41,14 @@ static __init int init_emergency_pool(void)
 	BUG_ON(!page_pool);
 	pr_info("pool size: %d pages\n", POOL_SIZE);
 
+	bounce_bio_set = bioset_create(BIO_POOL_SIZE, 0, BIOSET_NEED_BVECS);
+	BUG_ON(!bounce_bio_set);
+	if (bioset_integrity_create(bounce_bio_set, BIO_POOL_SIZE))
+		BUG_ON(1);
+
+	bounce_bio_split = bioset_create(BIO_POOL_SIZE, 0, 0);
+	BUG_ON(!bounce_bio_split);
+
 	return 0;
 }
 
@@ -186,15 +195,26 @@ static void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig,
 	int rw = bio_data_dir(*bio_orig);
 	struct bio_vec *to, from;
 	struct bvec_iter iter;
-	unsigned i;
+	unsigned i = 0;
+	bool bounce = false;
+	int sectors = 0;
 
-	bio_for_each_segment(from, *bio_orig, iter)
+	bio_for_each_segment(from, *bio_orig, iter) {
+		if (i++ < BIO_MAX_PAGES)
+			sectors += from.bv_len >> 9;
 		if (page_to_pfn(from.bv_page) > queue_bounce_pfn(q))
-			goto bounce;
+			bounce = true;
+	}
+	if (!bounce)
+		return;
 
-	return;
-bounce:
-	bio = bio_clone_bioset(*bio_orig, GFP_NOIO, fs_bio_set);
+	if (sectors < bio_sectors(*bio_orig)) {
+		bio = bio_split(*bio_orig, sectors, GFP_NOIO, bounce_bio_split);
+		bio_chain(bio, *bio_orig);
+		generic_make_request(*bio_orig);
+		*bio_orig = bio;
+	}
+	bio = bio_clone_bioset(*bio_orig, GFP_NOIO, bounce_bio_set);
 
 	bio_for_each_segment_all(to, bio, i) {
 		struct page *page = to->bv_page;

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 07/13] drbd: use bio_clone_fast() instead of bio_clone()
  2017-05-02  3:42 [PATCH 00/13] block: assorted cleanup for bio splitting and cloning NeilBrown
                   ` (6 preceding siblings ...)
  2017-05-02  3:42 ` [PATCH 10/13] xen-blkfront: remove bio splitting NeilBrown
@ 2017-05-02  3:42 ` NeilBrown
  2017-05-02 23:20   ` [PATCH 07/13 V2] " NeilBrown
  2017-05-02  3:42 ` [PATCH 06/13] rbd: " NeilBrown
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 46+ messages in thread
From: NeilBrown @ 2017-05-02  3:42 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Ming Lei, linux-kernel

drbd does not modify the bi_io_vec of the cloned bio,
so there is no need to clone that part.  So bio_clone_fast()
is the better choice.
For bio_clone_fast() we need to specify a bio_set.
We could use fs_bio_set, which bio_clone() uses, or
drbd_md_io_bio_set, which drbd uses for metadata, but it is
generally best to avoid sharing bio_sets unless you can
be certain that there are no interdependencies.

So create a new bio_set, drbd_io_bio_set, and use bio_clone_fast().

Also remove a "XXX cannot fail ???" comment because it definitely
cannot fail - bio_clone_fast() doesn't fail if the GFP flags allow for
sleeping.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/block/drbd/drbd_int.h  |    3 +++
 drivers/block/drbd/drbd_main.c |    9 +++++++++
 drivers/block/drbd/drbd_req.h  |    2 +-
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/block/drbd/drbd_int.h b/drivers/block/drbd/drbd_int.h
index d5da45bb03a6..f91982515a6b 100644
--- a/drivers/block/drbd/drbd_int.h
+++ b/drivers/block/drbd/drbd_int.h
@@ -1441,6 +1441,9 @@ extern struct bio_set *drbd_md_io_bio_set;
 /* to allocate from that set */
 extern struct bio *bio_alloc_drbd(gfp_t gfp_mask);
 
+/* And a bio_set for cloning */
+extern struct bio_set *drbd_io_bio_set;
+
 extern struct mutex resources_mutex;
 
 extern int conn_lowest_minor(struct drbd_connection *connection);
diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index b395fe391171..df99dc8bcb63 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -128,6 +128,7 @@ mempool_t *drbd_request_mempool;
 mempool_t *drbd_ee_mempool;
 mempool_t *drbd_md_io_page_pool;
 struct bio_set *drbd_md_io_bio_set;
+struct bio_set *drbd_io_bio_set;
 
 /* I do not use a standard mempool, because:
    1) I want to hand out the pre-allocated objects first.
@@ -2098,6 +2099,8 @@ static void drbd_destroy_mempools(void)
 
 	/* D_ASSERT(device, atomic_read(&drbd_pp_vacant)==0); */
 
+	if (drbd_io_bio_set)
+		bioset_free(drbd_io_bio_set);
 	if (drbd_md_io_bio_set)
 		bioset_free(drbd_md_io_bio_set);
 	if (drbd_md_io_page_pool)
@@ -2115,6 +2118,7 @@ static void drbd_destroy_mempools(void)
 	if (drbd_al_ext_cache)
 		kmem_cache_destroy(drbd_al_ext_cache);
 
+	drbd_io_bio_set      = NULL;
 	drbd_md_io_bio_set   = NULL;
 	drbd_md_io_page_pool = NULL;
 	drbd_ee_mempool      = NULL;
@@ -2142,6 +2146,7 @@ static int drbd_create_mempools(void)
 	drbd_pp_pool         = NULL;
 	drbd_md_io_page_pool = NULL;
 	drbd_md_io_bio_set   = NULL;
+	drbd_io_bio_set      = NULL;
 
 	/* caches */
 	drbd_request_cache = kmem_cache_create(
@@ -2165,6 +2170,10 @@ static int drbd_create_mempools(void)
 		goto Enomem;
 
 	/* mempools */
+	drbd_io_bio_set = bioset_create(BIO_POOL_SIZE, 0, 0);
+	if (drbd_io_bio_set == NULL)
+		goto Enomem;
+
 	drbd_md_io_bio_set = bioset_create(DRBD_MIN_POOL_PAGES, 0, BIOSET_NEED_BVECS);
 	if (drbd_md_io_bio_set == NULL)
 		goto Enomem;
diff --git a/drivers/block/drbd/drbd_req.h b/drivers/block/drbd/drbd_req.h
index eb49e7f2da91..9e1866ab238f 100644
--- a/drivers/block/drbd/drbd_req.h
+++ b/drivers/block/drbd/drbd_req.h
@@ -263,7 +263,7 @@ enum drbd_req_state_bits {
 static inline void drbd_req_make_private_bio(struct drbd_request *req, struct bio *bio_src)
 {
 	struct bio *bio;
-	bio = bio_clone(bio_src, GFP_NOIO); /* XXX cannot fail?? */
+	bio = bio_clone_fast(bio_src, GFP_NOIO, drbd_io_bio_set);
 
 	req->private_bio = bio;
 

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 06/13] rbd: use bio_clone_fast() instead of bio_clone()
  2017-05-02  3:42 [PATCH 00/13] block: assorted cleanup for bio splitting and cloning NeilBrown
                   ` (7 preceding siblings ...)
  2017-05-02  3:42 ` [PATCH 07/13] drbd: use bio_clone_fast() instead of bio_clone() NeilBrown
@ 2017-05-02  3:42 ` NeilBrown
  2017-05-02  3:42 ` [PATCH 08/13] pktcdvd: " NeilBrown
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 46+ messages in thread
From: NeilBrown @ 2017-05-02  3:42 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Ming Lei, linux-kernel

bio_clone() makes a copy of the bi_io_vec, but rbd never changes that,
so there is no need for a copy.
bio_clone_fast() can be used instead, which avoids making the copy.

This requires that we provide a bio_set.  bio_clone() uses fs_bio_set,
but it isn't, in general, safe to use the same bio_set at different
levels of the stack, as that can lead to deadlocks.  As filesystems
use fs_bio_set, block devices shouldn't.

As rbd never stacks, it is safe to have a single global bio_set for
all rbd devices to use.  So allocate that when the module is
initialised, and use it with bio_clone_fast().

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/block/rbd.c |   16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 089ac4179919..52120ed9385f 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -441,6 +441,8 @@ static DEFINE_SPINLOCK(rbd_client_list_lock);
 static struct kmem_cache	*rbd_img_request_cache;
 static struct kmem_cache	*rbd_obj_request_cache;
 
+static struct bio_set		*rbd_bio_clone;
+
 static int rbd_major;
 static DEFINE_IDA(rbd_dev_id_ida);
 
@@ -1362,7 +1364,7 @@ static struct bio *bio_clone_range(struct bio *bio_src,
 {
 	struct bio *bio;
 
-	bio = bio_clone(bio_src, gfpmask);
+	bio = bio_clone_fast(bio_src, gfpmask, rbd_bio_clone);
 	if (!bio)
 		return NULL;	/* ENOMEM */
 
@@ -6342,8 +6344,16 @@ static int rbd_slab_init(void)
 	if (!rbd_obj_request_cache)
 		goto out_err;
 
+	rbd_assert(!rbd_bio_clone);
+	rbd_bio_clone = bioset_create(BIO_POOL_SIZE, 0, 0);
+	if (!rbd_bio_clone)
+		goto out_err_clone;
+
 	return 0;
 
+out_err_clone:
+	kmem_cache_destroy(rbd_obj_request_cache);
+	rbd_obj_request_cache = NULL;
 out_err:
 	kmem_cache_destroy(rbd_img_request_cache);
 	rbd_img_request_cache = NULL;
@@ -6359,6 +6369,10 @@ static void rbd_slab_exit(void)
 	rbd_assert(rbd_img_request_cache);
 	kmem_cache_destroy(rbd_img_request_cache);
 	rbd_img_request_cache = NULL;
+
+	rbd_assert(rbd_bio_clone);
+	bioset_free(rbd_bio_clone);
+	rbd_bio_clone = NULL;
 }
 
 static int __init rbd_init(void)

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 08/13] pktcdvd: use bio_clone_fast() instead of bio_clone()
  2017-05-02  3:42 [PATCH 00/13] block: assorted cleanup for bio splitting and cloning NeilBrown
                   ` (8 preceding siblings ...)
  2017-05-02  3:42 ` [PATCH 06/13] rbd: " NeilBrown
@ 2017-05-02  3:42 ` NeilBrown
  2017-05-02  3:42 ` [PATCH 13/13] block: don't check for BIO_MAX_PAGES in blk_bio_segment_split() NeilBrown
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 46+ messages in thread
From: NeilBrown @ 2017-05-02  3:42 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Ming Lei, linux-kernel

pktcdvd doesn't change the bi_io_vec of the clone bio,
so it is more efficient to use bio_clone_fast(), and not clone
the bi_io_vec.
This requires providing a bio_set, and it is safest to
provide a dedicated bio_set rather than sharing
fs_bio_set, which filesytems use.
This new bio_set, pkt_bio_set, can also be use for the bio_split()
call as the two allocations (bio_clone_fast, and bio_split) are
independent, neither can block a bio allocated by the other.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/block/pktcdvd.c |   12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 1457e65c50b1..6f9c31fae335 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -98,6 +98,7 @@ static int write_congestion_on  = PKT_WRITE_CONGESTION_ON;
 static int write_congestion_off = PKT_WRITE_CONGESTION_OFF;
 static struct mutex ctl_mutex;	/* Serialize open/close/setup/teardown */
 static mempool_t *psd_pool;
+static struct bio_set *pkt_bio_set;
 
 static struct class	*class_pktcdvd = NULL;    /* /sys/class/pktcdvd */
 static struct dentry	*pkt_debugfs_root = NULL; /* /sys/kernel/debug/pktcdvd */
@@ -2310,7 +2311,7 @@ static void pkt_end_io_read_cloned(struct bio *bio)
 
 static void pkt_make_request_read(struct pktcdvd_device *pd, struct bio *bio)
 {
-	struct bio *cloned_bio = bio_clone(bio, GFP_NOIO);
+	struct bio *cloned_bio = bio_clone_fast(bio, GFP_NOIO, pkt_bio_set);
 	struct packet_stacked_data *psd = mempool_alloc(psd_pool, GFP_NOIO);
 
 	psd->pd = pd;
@@ -2455,7 +2456,7 @@ static blk_qc_t pkt_make_request(struct request_queue *q, struct bio *bio)
 
 			split = bio_split(bio, last_zone -
 					  bio->bi_iter.bi_sector,
-					  GFP_NOIO, fs_bio_set);
+					  GFP_NOIO, pkt_bio_set);
 			bio_chain(split, bio);
 		} else {
 			split = bio;
@@ -2919,6 +2920,11 @@ static int __init pkt_init(void)
 					sizeof(struct packet_stacked_data));
 	if (!psd_pool)
 		return -ENOMEM;
+	pkt_bio_set = bioset_create(BIO_POOL_SIZE, 0, 0);
+	if (!pkt_bio_set) {
+		mempool_destroy(psd_pool);
+		return -ENOMEM;
+	}
 
 	ret = register_blkdev(pktdev_major, DRIVER_NAME);
 	if (ret < 0) {
@@ -2951,6 +2957,7 @@ static int __init pkt_init(void)
 	unregister_blkdev(pktdev_major, DRIVER_NAME);
 out2:
 	mempool_destroy(psd_pool);
+	bioset_free(pkt_bio_set);
 	return ret;
 }
 
@@ -2964,6 +2971,7 @@ static void __exit pkt_exit(void)
 
 	unregister_blkdev(pktdev_major, DRIVER_NAME);
 	mempool_destroy(psd_pool);
+	bioset_free(pkt_bio_set);
 }
 
 MODULE_DESCRIPTION("Packet writing layer for CD/DVD drives");

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 09/13] lightnvm/pblk-read: use bio_clone_fast()
  2017-05-02  3:42 [PATCH 00/13] block: assorted cleanup for bio splitting and cloning NeilBrown
                   ` (4 preceding siblings ...)
  2017-05-02  3:42 ` [PATCH 04/13] blk: use non-rescuing bioset for q->bio_split NeilBrown
@ 2017-05-02  3:42 ` NeilBrown
  2017-05-02  8:15   ` Christoph Hellwig
  2017-05-02 11:22   ` Javier González
  2017-05-02  3:42 ` [PATCH 10/13] xen-blkfront: remove bio splitting NeilBrown
                   ` (7 subsequent siblings)
  13 siblings, 2 replies; 46+ messages in thread
From: NeilBrown @ 2017-05-02  3:42 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Ming Lei, linux-kernel, Javier Gonzalez

pblk_submit_read() uses bio_clone_bioset() but doesn't change the
io_vec, so bio_clone_fast() is a better choice.

It also uses fs_bio_set which is intended for filesystems.  Using it
in a device driver can deadlock.
So allocate a new bioset, and and use bio_clone_fast().

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/lightnvm/pblk-init.c |   12 +++++++++++-
 drivers/lightnvm/pblk-read.c |    2 +-
 drivers/lightnvm/pblk.h      |    1 +
 3 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index b3fec8ec55b8..aaefbccce30e 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -23,6 +23,7 @@
 static struct kmem_cache *pblk_blk_ws_cache, *pblk_rec_cache, *pblk_r_rq_cache,
 					*pblk_w_rq_cache, *pblk_line_meta_cache;
 static DECLARE_RWSEM(pblk_lock);
+struct bio_set *pblk_bio_set;
 
 static int pblk_rw_io(struct request_queue *q, struct pblk *pblk,
 			  struct bio *bio)
@@ -946,11 +947,20 @@ static struct nvm_tgt_type tt_pblk = {
 
 static int __init pblk_module_init(void)
 {
-	return nvm_register_tgt_type(&tt_pblk);
+	int ret;
+
+	pblk_bio_set = bioset_create(BIO_POOL_SIZE, 0, 0);
+	if (!pblk_bio_set)
+		return -ENOMEM;
+	ret = nvm_register_tgt_type(&tt_pblk);
+	if (ret)
+		bioset_free(pblk_bio_set);
+	return ret;
 }
 
 static void pblk_module_exit(void)
 {
+	bioset_free(pblk_bio_set);
 	nvm_unregister_tgt_type(&tt_pblk);
 }
 
diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
index 4a12f14d78c6..8ee0c50a7a71 100644
--- a/drivers/lightnvm/pblk-read.c
+++ b/drivers/lightnvm/pblk-read.c
@@ -342,7 +342,7 @@ int pblk_submit_read(struct pblk *pblk, struct bio *bio)
 		struct pblk_r_ctx *r_ctx = nvm_rq_to_pdu(rqd);
 
 		/* Clone read bio to deal with read errors internally */
-		int_bio = bio_clone_bioset(bio, GFP_KERNEL, fs_bio_set);
+		int_bio = bio_clone_fast(bio, GFP_KERNEL, pblk_bio_set);
 		if (!int_bio) {
 			pr_err("pblk: could not clone read bio\n");
 			return NVM_IO_ERR;
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index 99f3186b5288..95b665f23925 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -702,6 +702,7 @@ void pblk_write_should_kick(struct pblk *pblk);
 /*
  * pblk read path
  */
+extern struct bio_set *pblk_bio_set;
 int pblk_submit_read(struct pblk *pblk, struct bio *bio);
 int pblk_submit_read_gc(struct pblk *pblk, u64 *lba_list, void *data,
 			unsigned int nr_secs, unsigned int *secs_to_gc,

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 10/13] xen-blkfront: remove bio splitting.
  2017-05-02  3:42 [PATCH 00/13] block: assorted cleanup for bio splitting and cloning NeilBrown
                   ` (5 preceding siblings ...)
  2017-05-02  3:42 ` [PATCH 09/13] lightnvm/pblk-read: use bio_clone_fast() NeilBrown
@ 2017-05-02  3:42 ` NeilBrown
  2017-05-02  8:15   ` Christoph Hellwig
  2017-05-02  3:42 ` [PATCH 07/13] drbd: use bio_clone_fast() instead of bio_clone() NeilBrown
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 46+ messages in thread
From: NeilBrown @ 2017-05-02  3:42 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Ming Lei, linux-kernel

bios that are re-submitted will pass through blk_queue_split() when
blk_queue_bio() is called, and this will split the bio if necessary.
There is no longer any need to do this splitting in xen-blkfront.

Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/block/xen-blkfront.c |   54 ++----------------------------------------
 1 file changed, 3 insertions(+), 51 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 39459631667c..fb963a010d4d 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -110,11 +110,6 @@ struct blk_shadow {
 	unsigned long associated_id;
 };
 
-struct split_bio {
-	struct bio *bio;
-	atomic_t pending;
-};
-
 struct blkif_req {
 	int	error;
 };
@@ -1996,28 +1991,13 @@ static int blkfront_probe(struct xenbus_device *dev,
 	return 0;
 }
 
-static void split_bio_end(struct bio *bio)
-{
-	struct split_bio *split_bio = bio->bi_private;
-
-	if (atomic_dec_and_test(&split_bio->pending)) {
-		split_bio->bio->bi_phys_segments = 0;
-		split_bio->bio->bi_error = bio->bi_error;
-		bio_endio(split_bio->bio);
-		kfree(split_bio);
-	}
-	bio_put(bio);
-}
-
 static int blkif_recover(struct blkfront_info *info)
 {
-	unsigned int i, r_index;
+	unsigned int r_index;
 	struct request *req, *n;
 	int rc;
-	struct bio *bio, *cloned_bio;
-	unsigned int segs, offset;
-	int pending, size;
-	struct split_bio *split_bio;
+	struct bio *bio;
+	unsigned int segs;
 
 	blkfront_gather_backend_features(info);
 	/* Reset limits changed by blk_mq_update_nr_hw_queues(). */
@@ -2056,34 +2036,6 @@ static int blkif_recover(struct blkfront_info *info)
 
 	while ((bio = bio_list_pop(&info->bio_list)) != NULL) {
 		/* Traverse the list of pending bios and re-queue them */
-		if (bio_segments(bio) > segs) {
-			/*
-			 * This bio has more segments than what we can
-			 * handle, we have to split it.
-			 */
-			pending = (bio_segments(bio) + segs - 1) / segs;
-			split_bio = kzalloc(sizeof(*split_bio), GFP_NOIO);
-			BUG_ON(split_bio == NULL);
-			atomic_set(&split_bio->pending, pending);
-			split_bio->bio = bio;
-			for (i = 0; i < pending; i++) {
-				offset = (i * segs * XEN_PAGE_SIZE) >> 9;
-				size = min((unsigned int)(segs * XEN_PAGE_SIZE) >> 9,
-					   (unsigned int)bio_sectors(bio) - offset);
-				cloned_bio = bio_clone(bio, GFP_NOIO);
-				BUG_ON(cloned_bio == NULL);
-				bio_trim(cloned_bio, offset, size);
-				cloned_bio->bi_private = split_bio;
-				cloned_bio->bi_end_io = split_bio_end;
-				submit_bio(cloned_bio);
-			}
-			/*
-			 * Now we have to wait for all those smaller bios to
-			 * end, so we can also end the "parent" bio.
-			 */
-			continue;
-		}
-		/* We don't need to split this bio */
 		submit_bio(bio);
 	}
 

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 12/13] block: remove bio_clone() and all references.
  2017-05-02  3:42 [PATCH 00/13] block: assorted cleanup for bio splitting and cloning NeilBrown
                   ` (11 preceding siblings ...)
  2017-05-02  3:42 ` [PATCH 11/13] bcache: use kmalloc to allocate bio in bch_data_verify() NeilBrown
@ 2017-05-02  3:42 ` NeilBrown
  2017-05-11  0:58 ` [PATCH 00/13] block: assorted cleanup for bio splitting and cloning NeilBrown
  13 siblings, 0 replies; 46+ messages in thread
From: NeilBrown @ 2017-05-02  3:42 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Ming Lei, linux-kernel

bio_clone() is no longer used.
Only bio_clone_bioset() or bio_clone_fast().
This is for the best, as bio_clone() used fs_bio_set,
and filesystems are unlikely to want to use bio_clone().

So remove bio_clone() and all references.
This includes a fix to some incorrect documentation.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.com>
---
 Documentation/block/biodoc.txt |    2 +-
 block/bio.c                    |    2 +-
 block/blk-merge.c              |    6 +++---
 drivers/md/md.c                |    2 +-
 include/linux/bio.h            |    5 -----
 5 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/Documentation/block/biodoc.txt b/Documentation/block/biodoc.txt
index 01ddeaf64b0f..9490f2845f06 100644
--- a/Documentation/block/biodoc.txt
+++ b/Documentation/block/biodoc.txt
@@ -632,7 +632,7 @@ to i/o submission, if the bio fields are likely to be accessed after the
 i/o is issued (since the bio may otherwise get freed in case i/o completion
 happens in the meantime).
 
-The bio_clone() routine may be used to duplicate a bio, where the clone
+The bio_clone_fast() routine may be used to duplicate a bio, where the clone
 shares the bio_vec_list with the original bio (i.e. both point to the
 same bio_vec_list). This would typically be used for splitting i/o requests
 in lvm or md.
diff --git a/block/bio.c b/block/bio.c
index 257606e742e0..d82f6da94421 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -547,7 +547,7 @@ EXPORT_SYMBOL(zero_fill_bio);
  *
  * Description:
  *   Put a reference to a &struct bio, either one you have gotten with
- *   bio_alloc, bio_get or bio_clone. The last put of a bio will free it.
+ *   bio_alloc, bio_get or bio_clone_*. The last put of a bio will free it.
  **/
 void bio_put(struct bio *bio)
 {
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 51c84540d3bb..e7862e9dcc39 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -115,13 +115,13 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 		 * With arbitrary bio size, the incoming bio may be very
 		 * big. We have to split the bio into small bios so that
 		 * each holds at most BIO_MAX_PAGES bvecs because
-		 * bio_clone() can fail to allocate big bvecs.
+		 * bio_clone_bioset() can fail to allocate big bvecs.
 		 *
-		 * Those drivers which will need to use bio_clone()
+		 * Those drivers which will need to use bio_clone_bioset()
 		 * should tell us in some way.  For now, impose the
 		 * BIO_MAX_PAGES limit on all queues.
 		 *
-		 * TODO: handle users of bio_clone() differently.
+		 * TODO: handle users of bio_clone_bioset() differently.
 		 */
 		if (bvecs++ >= BIO_MAX_PAGES)
 			goto split;
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 1379fb636de2..cd2fa40e9af3 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -185,7 +185,7 @@ static int start_readonly;
 static bool create_on_open = true;
 
 /* bio_clone_mddev
- * like bio_clone, but with a local bio set
+ * like bio_clone_bioset, but with a local bio set
  */
 
 struct bio *bio_alloc_mddev(gfp_t gfp_mask, int nr_iovecs,
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 40e5d7b62f29..538c981ac26a 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -395,11 +395,6 @@ static inline struct bio *bio_alloc(gfp_t gfp_mask, unsigned int nr_iovecs)
 	return bio_alloc_bioset(gfp_mask, nr_iovecs, fs_bio_set);
 }
 
-static inline struct bio *bio_clone(struct bio *bio, gfp_t gfp_mask)
-{
-	return bio_clone_bioset(bio, gfp_mask, fs_bio_set);
-}
-
 static inline struct bio *bio_kmalloc(gfp_t gfp_mask, unsigned int nr_iovecs)
 {
 	return bio_alloc_bioset(gfp_mask, nr_iovecs, NULL);

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 13/13] block: don't check for BIO_MAX_PAGES in blk_bio_segment_split()
  2017-05-02  3:42 [PATCH 00/13] block: assorted cleanup for bio splitting and cloning NeilBrown
                   ` (9 preceding siblings ...)
  2017-05-02  3:42 ` [PATCH 08/13] pktcdvd: " NeilBrown
@ 2017-05-02  3:42 ` NeilBrown
  2017-05-02  8:15   ` Christoph Hellwig
  2017-05-02 10:22   ` Ming Lei
  2017-05-02  3:42 ` [PATCH 11/13] bcache: use kmalloc to allocate bio in bch_data_verify() NeilBrown
                   ` (2 subsequent siblings)
  13 siblings, 2 replies; 46+ messages in thread
From: NeilBrown @ 2017-05-02  3:42 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Ming Lei, linux-kernel

blk_bio_segment_split() makes sure bios have no more than
BIO_MAX_PAGES entries in the bi_io_vec.
This was done because bio_clone_bioset() (when given a
mempool bioset) could not handle larger io_vecs.

No driver uses bio_clone_bioset() any more, they all
use bio_clone_fast() if anything, and bio_clone_fast()
doesn't clone the bi_io_vec.

The main user of of bio_clone_bioset() at this level
is bounce.c, and bouncing now happens before blk_bio_segment_split(),
so that is not of concern.

So remove the big helpful comment and the code.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 block/blk-merge.c |   16 ----------------
 1 file changed, 16 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index e7862e9dcc39..cea544ec5d96 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -108,25 +108,9 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 	bool do_split = true;
 	struct bio *new = NULL;
 	const unsigned max_sectors = get_max_io_size(q, bio);
-	unsigned bvecs = 0;
 
 	bio_for_each_segment(bv, bio, iter) {
 		/*
-		 * With arbitrary bio size, the incoming bio may be very
-		 * big. We have to split the bio into small bios so that
-		 * each holds at most BIO_MAX_PAGES bvecs because
-		 * bio_clone_bioset() can fail to allocate big bvecs.
-		 *
-		 * Those drivers which will need to use bio_clone_bioset()
-		 * should tell us in some way.  For now, impose the
-		 * BIO_MAX_PAGES limit on all queues.
-		 *
-		 * TODO: handle users of bio_clone_bioset() differently.
-		 */
-		if (bvecs++ >= BIO_MAX_PAGES)
-			goto split;
-
-		/*
 		 * If the queue doesn't support SG gaps and adding this
 		 * offset would create a gap, disallow it.
 		 */

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 11/13] bcache: use kmalloc to allocate bio in bch_data_verify()
  2017-05-02  3:42 [PATCH 00/13] block: assorted cleanup for bio splitting and cloning NeilBrown
                   ` (10 preceding siblings ...)
  2017-05-02  3:42 ` [PATCH 13/13] block: don't check for BIO_MAX_PAGES in blk_bio_segment_split() NeilBrown
@ 2017-05-02  3:42 ` NeilBrown
  2017-05-02  3:42 ` [PATCH 12/13] block: remove bio_clone() and all references NeilBrown
  2017-05-11  0:58 ` [PATCH 00/13] block: assorted cleanup for bio splitting and cloning NeilBrown
  13 siblings, 0 replies; 46+ messages in thread
From: NeilBrown @ 2017-05-02  3:42 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Ming Lei, linux-kernel

This function allocates a bio, then a collection
of pages.  It copes with failure.

It currently uses a mempool() to allocate the bio,
but alloc_page() to allocate the pages.  These fail
in different ways, so the usage is inconsistent.

Change the bio_clone() to bio_clone_kmalloc()
so that no pool is used either for the bio or the pages.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Kent Overstreet <kent.overstreet@gmail.com>
Reviewed-by : Ming Lei <ming.lei@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/md/bcache/debug.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/bcache/debug.c b/drivers/md/bcache/debug.c
index 06f55056aaae..35a5a7210e51 100644
--- a/drivers/md/bcache/debug.c
+++ b/drivers/md/bcache/debug.c
@@ -110,7 +110,7 @@ void bch_data_verify(struct cached_dev *dc, struct bio *bio)
 	struct bio_vec bv, cbv;
 	struct bvec_iter iter, citer = { 0 };
 
-	check = bio_clone(bio, GFP_NOIO);
+	check = bio_clone_kmalloc(bio, GFP_NOIO);
 	if (!check)
 		return;
 	check->bi_opf = REQ_OP_READ;

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH 02/13] blk: replace bioset_create_nobvec() with a flags arg to bioset_create()
  2017-05-02  3:42 ` [PATCH 02/13] blk: replace bioset_create_nobvec() with a flags arg to bioset_create() NeilBrown
@ 2017-05-02  8:06   ` Christoph Hellwig
  2017-05-02 21:47     ` NeilBrown
  2017-05-02  9:40   ` Ming Lei
  1 sibling, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2017-05-02  8:06 UTC (permalink / raw)
  To: NeilBrown; +Cc: Jens Axboe, linux-block, Ming Lei, linux-kernel

> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index d1b04b0e99cf..0975da6bebd9 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -373,8 +373,10 @@ static inline struct bio *bio_next_split(struct bio *bio, int sectors,
>  	return bio_split(bio, sectors, gfp, bs);
>  }
>  
> -extern struct bio_set *bioset_create(unsigned int, unsigned int);
> -extern struct bio_set *bioset_create_nobvec(unsigned int, unsigned int);
> +extern struct bio_set *bioset_create(unsigned int, unsigned int, int flags);
> +enum {
> +	BIOSET_NEED_BVECS = BIT(0),
> +};

I really hate the BIT macro as it obsfucates what's going on.

Why not just

	BIOSET_NEED_BVECS	= (1 << 0),

which is a lot more intuitive.

Otherwise looks fine to me:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 05/13] block: Improvements to bounce-buffer handling
  2017-05-02  3:42 ` [PATCH 05/13] block: Improvements to bounce-buffer handling NeilBrown
@ 2017-05-02  8:13   ` Christoph Hellwig
  2017-05-02 11:56   ` Ming Lei
  1 sibling, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2017-05-02  8:13 UTC (permalink / raw)
  To: NeilBrown; +Cc: Jens Axboe, linux-block, Ming Lei, linux-kernel

Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>

although I think we really should kill off the block level bouncing
rather sooner than later.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 03/13] blk: make the bioset rescue_workqueue optional.
  2017-05-02  3:42 ` [PATCH 03/13] blk: make the bioset rescue_workqueue optional NeilBrown
@ 2017-05-02  8:14   ` Christoph Hellwig
  2017-05-02 11:00   ` Ming Lei
  2017-05-02 22:34   ` [PATCH 03/13 V2] " NeilBrown
  2 siblings, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2017-05-02  8:14 UTC (permalink / raw)
  To: NeilBrown; +Cc: Jens Axboe, linux-block, Ming Lei, linux-kernel

On Tue, May 02, 2017 at 01:42:24PM +1000, NeilBrown wrote:
> This patch converts bioset_create() and
> bioset_create_nobvec() to not create a workqueue so
> alloctions will never trigger punt_bios_to_rescuer().  It
> also introduces bioset_create_rescued() and
> bioset_create_nobvec_rescued() which preserve the old
> behaviour.
> 
> All callers of bioset_create() and bioset_create_nobvec(),
> that are inside block device drivers, are converted to the
> _rescued() version.
> 
> biosets used by filesystems or other top-level users do not
> need rescuing as the bio can never be queued behind other
> bios.  This includes fs_bio_set, blkdev_dio_pool,
> btrfs_bioset, xfs_ioend_bioset, drbd_md_io_bio_set,
> and one allocated by target_core_iblock.c.
> 
> biosets used by md/raid to not need rescuing as
> their usage was recently audited to revised to never
> risk deadlock.
> 
> It is hoped that most, if not all, of the remaining biosets
> can end up being the non-rescued version.
> 
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>  block/bio.c               |   12 ++++++++++--
>  block/blk-core.c          |    2 +-
>  drivers/md/bcache/super.c |    6 ++++--
>  drivers/md/dm-crypt.c     |    2 +-
>  drivers/md/dm-io.c        |    2 +-
>  drivers/md/dm.c           |    5 +++--
>  include/linux/bio.h       |    1 +
>  7 files changed, 21 insertions(+), 9 deletions(-)
> 
> diff --git a/block/bio.c b/block/bio.c
> index 3f1286ed301e..257606e742e0 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -363,6 +363,8 @@ static void punt_bios_to_rescuer(struct bio_set *bs)
>  	struct bio_list punt, nopunt;
>  	struct bio *bio;
>  
> +	if (!WARN_ON_ONCE(!bs->rescue_workqueue))
> +		return;
>  	/*
>  	 * In order to guarantee forward progress we must punt only bios that
>  	 * were allocated from this bio_set; otherwise, if there was a bio on
> @@ -474,7 +476,8 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, unsigned int nr_iovecs,
>  
>  		if (current->bio_list &&
>  		    (!bio_list_empty(&current->bio_list[0]) ||
> -		     !bio_list_empty(&current->bio_list[1])))
> +		     !bio_list_empty(&current->bio_list[1])) &&
> +		    bs->rescue_workqueue)
>  			gfp_mask &= ~__GFP_DIRECT_RECLAIM;
>  
>  		p = mempool_alloc(bs->bio_pool, gfp_mask);
> @@ -1925,7 +1928,7 @@ EXPORT_SYMBOL(bioset_free);
>   * bioset_create  - Create a bio_set
>   * @pool_size:	Number of bio and bio_vecs to cache in the mempool
>   * @front_pad:	Number of bytes to allocate in front of the returned bio
> - * @flags:	Flags to modify behavior, currently only %BIOSET_NEED_BVECS
> + * @flags:	Flags to modify behavior, currently %BIOSET_NEED_BVECS and %BIOSET_NEED_RESCUER
>   *
>   * Description:
>   *    Set up a bio_set to be used with @bio_alloc_bioset. Allows the caller
> @@ -1936,6 +1939,8 @@ EXPORT_SYMBOL(bioset_free);
>   *    or things will break badly.
>   *    If %BIOSET_NEED_BVECS is set in @flags, a separate pool will be allocated
>   *    for allocating iovecs.  This pool is not needed e.g. for bio_clone_fast().
> + *    If %BIOSET_NEED_RESCUER is set, workqueue is create which can be used to
> + *    dispatch queued requests when the mempool runs out of space.
>   *
>   */
>  struct bio_set *bioset_create(unsigned int pool_size,
> @@ -1971,6 +1976,9 @@ struct bio_set *bioset_create(unsigned int pool_size,
>  			goto bad;
>  	}
>  
> +	if (!(flags & BIOSET_NEED_RESCUER))
> +		return bs;
> +
>  	bs->rescue_workqueue = alloc_workqueue("bioset", WQ_MEM_RECLAIM, 0);
>  	if (!bs->rescue_workqueue)
>  		goto bad;
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 3797753f4085..ae51f159a2ca 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -730,7 +730,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
>  	if (q->id < 0)
>  		goto fail_q;
>  
> -	q->bio_split = bioset_create(BIO_POOL_SIZE, 0, BIOSET_NEED_BVECS);
> +	q->bio_split = bioset_create(BIO_POOL_SIZE, 0, BIOSET_NEED_BVECS | BIOSET_NEED_RESCUER);

Please avoid > 80 char lines.

Otherwise looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 09/13] lightnvm/pblk-read: use bio_clone_fast()
  2017-05-02  3:42 ` [PATCH 09/13] lightnvm/pblk-read: use bio_clone_fast() NeilBrown
@ 2017-05-02  8:15   ` Christoph Hellwig
  2017-05-02 11:22   ` Javier González
  1 sibling, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2017-05-02  8:15 UTC (permalink / raw)
  To: NeilBrown
  Cc: Jens Axboe, linux-block, Ming Lei, linux-kernel, Javier Gonzalez

Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 10/13] xen-blkfront: remove bio splitting.
  2017-05-02  3:42 ` [PATCH 10/13] xen-blkfront: remove bio splitting NeilBrown
@ 2017-05-02  8:15   ` Christoph Hellwig
  0 siblings, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2017-05-02  8:15 UTC (permalink / raw)
  To: NeilBrown; +Cc: Jens Axboe, linux-block, Ming Lei, linux-kernel

Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 13/13] block: don't check for BIO_MAX_PAGES in blk_bio_segment_split()
  2017-05-02  3:42 ` [PATCH 13/13] block: don't check for BIO_MAX_PAGES in blk_bio_segment_split() NeilBrown
@ 2017-05-02  8:15   ` Christoph Hellwig
  2017-05-02 10:22   ` Ming Lei
  1 sibling, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2017-05-02  8:15 UTC (permalink / raw)
  To: NeilBrown; +Cc: Jens Axboe, linux-block, Ming Lei, linux-kernel

Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 02/13] blk: replace bioset_create_nobvec() with a flags arg to bioset_create()
  2017-05-02  3:42 ` [PATCH 02/13] blk: replace bioset_create_nobvec() with a flags arg to bioset_create() NeilBrown
  2017-05-02  8:06   ` Christoph Hellwig
@ 2017-05-02  9:40   ` Ming Lei
  1 sibling, 0 replies; 46+ messages in thread
From: Ming Lei @ 2017-05-02  9:40 UTC (permalink / raw)
  To: NeilBrown; +Cc: Jens Axboe, linux-block, Ming Lei, linux-kernel

On Tue, May 02, 2017 at 01:42:24PM +1000, NeilBrown wrote:
> "flags" arguments are often seen as good API design as they allow
> easy extensibility.
> bioset_create_nobvec() is implemented internally as a variation in
> flags passed to __bioset_create().

>From driver's view, this flag has document benifit too.

> 
> To support future extension, make the internal structure part of the
> API.
> i.e. add a 'flags' argument to bioset_create() and discard
> bioset_create_nobvec().
> 
> Note that the bio_split allocations in drivers/md/raid* do not need
> the bvec mempool - they should have used bioset_create_nobvec().
> 
> Suggested-by: Christoph Hellwig <hch@infradead.org>
> Signed-off-by: NeilBrown <neilb@suse.com>

Reviewed-by: Ming Lei <ming.lei@redhat.com>

Thanks,
Ming

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 13/13] block: don't check for BIO_MAX_PAGES in blk_bio_segment_split()
  2017-05-02  3:42 ` [PATCH 13/13] block: don't check for BIO_MAX_PAGES in blk_bio_segment_split() NeilBrown
  2017-05-02  8:15   ` Christoph Hellwig
@ 2017-05-02 10:22   ` Ming Lei
  2017-05-02 22:54     ` NeilBrown
  1 sibling, 1 reply; 46+ messages in thread
From: Ming Lei @ 2017-05-02 10:22 UTC (permalink / raw)
  To: NeilBrown; +Cc: Jens Axboe, linux-block, Ming Lei, linux-kernel

On Tue, May 02, 2017 at 01:42:26PM +1000, NeilBrown wrote:
> blk_bio_segment_split() makes sure bios have no more than
> BIO_MAX_PAGES entries in the bi_io_vec.
> This was done because bio_clone_bioset() (when given a
> mempool bioset) could not handle larger io_vecs.
> 
> No driver uses bio_clone_bioset() any more, they all
> use bio_clone_fast() if anything, and bio_clone_fast()
> doesn't clone the bi_io_vec.

Maybe in future, some drivers still may try to use 
bio_clone_bioset() again, I suggest to add some comments
on bio_clone_bioset() to make this usage explicitly. Also
better to trigger a warning if a big src bio is passed to
bio_clone_bioset().

Thanks,
Ming

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 01/13] blk: remove bio_set arg from blk_queue_split()
  2017-05-02  3:42 ` [PATCH 01/13] blk: remove bio_set arg from blk_queue_split() NeilBrown
@ 2017-05-02 10:44   ` javigon
  0 siblings, 0 replies; 46+ messages in thread
From: javigon @ 2017-05-02 10:44 UTC (permalink / raw)
  To: NeilBrown; +Cc: Jens Axboe, linux-block, Ming Lei, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2945 bytes --]


> On 2 May 2017, at 05.42, NeilBrown <neilb@suse.com> wrote:
> 
> blk_queue_split() is always called with the last arg being q->bio_split,
> where 'q' is the first arg.
> 
> Also blk_queue_split() sometimes uses the passed-in 'bs' and sometimes uses
> q->bio_split.
> 
> This is inconsistent and unnecessary.  Remove the last arg and always use
> q->bio_split inside blk_queue_split()
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Ming Lei <ming.lei@redhat.com>
> Credit-to: Javier González <jg@lightnvm.io> (Noticed that lightnvm was missed)
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
> block/blk-core.c              |    2 +-
> block/blk-merge.c             |    9 ++++-----
> block/blk-mq.c                |    2 +-
> drivers/block/drbd/drbd_req.c |    2 +-
> drivers/block/pktcdvd.c       |    2 +-
> drivers/block/ps3vram.c       |    2 +-
> drivers/block/rsxx/dev.c      |    2 +-
> drivers/block/umem.c          |    2 +-
> drivers/block/zram/zram_drv.c |    2 +-
> drivers/lightnvm/pblk-init.c  |    4 ++--
> drivers/lightnvm/rrpc.c       |    2 +-
> drivers/md/md.c               |    2 +-
> drivers/s390/block/dcssblk.c  |    2 +-
> drivers/s390/block/xpram.c    |    2 +-
> include/linux/blkdev.h        |    3 +--
> 15 files changed, 19 insertions(+), 21 deletions(-)
> 

> [..]

> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
> index ae8cd6d5af8b..b3fec8ec55b8 100644
> --- a/drivers/lightnvm/pblk-init.c
> +++ b/drivers/lightnvm/pblk-init.c
> @@ -33,7 +33,7 @@ static int pblk_rw_io(struct request_queue *q, struct pblk *pblk,
> 	 * constraint. Writes can be of arbitrary size.
> 	 */
> 	if (bio_data_dir(bio) == READ) {
> -		blk_queue_split(q, &bio, q->bio_split);
> +		blk_queue_split(q, &bio);
> 		ret = pblk_submit_read(pblk, bio);
> 		if (ret == NVM_IO_DONE && bio_flagged(bio, BIO_CLONED))
> 			bio_put(bio);
> @@ -46,7 +46,7 @@ static int pblk_rw_io(struct request_queue *q, struct pblk *pblk,
> 	 * available for user I/O.
> 	 */
> 	if (unlikely(pblk_get_secs(bio) >= pblk_rl_sysfs_rate_show(&pblk->rl)))
> -		blk_queue_split(q, &bio, q->bio_split);
> +		blk_queue_split(q, &bio);
> 
> 	return pblk_write_to_cache(pblk, bio, PBLK_IOTYPE_USER);
> }
> diff --git a/drivers/lightnvm/rrpc.c b/drivers/lightnvm/rrpc.c
> index cf0e28a0ff61..8e241056b141 100644
> --- a/drivers/lightnvm/rrpc.c
> +++ b/drivers/lightnvm/rrpc.c
> @@ -994,7 +994,7 @@ static blk_qc_t rrpc_make_rq(struct request_queue *q, struct bio *bio)
> 	struct nvm_rq *rqd;
> 	int err;
> 
> -	blk_queue_split(q, &bio, q->bio_split);
> +	blk_queue_split(q, &bio);
> 
> 	if (bio_op(bio) == REQ_OP_DISCARD) {
> 		rrpc_discard(rrpc, bio);
> 

Hi Neil,

Thanks for adding the LightNVM changes. It works fine on our test setup.

Reviewed-by: Javier González <javier@cnexlabs.com>
Tested-by: Javier González <javier@cnexlabs.com>


[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 03/13] blk: make the bioset rescue_workqueue optional.
  2017-05-02  3:42 ` [PATCH 03/13] blk: make the bioset rescue_workqueue optional NeilBrown
  2017-05-02  8:14   ` Christoph Hellwig
@ 2017-05-02 11:00   ` Ming Lei
  2017-05-02 22:10     ` NeilBrown
  2017-05-02 22:34   ` [PATCH 03/13 V2] " NeilBrown
  2 siblings, 1 reply; 46+ messages in thread
From: Ming Lei @ 2017-05-02 11:00 UTC (permalink / raw)
  To: NeilBrown; +Cc: Jens Axboe, linux-block, Ming Lei, linux-kernel

On Tue, May 02, 2017 at 01:42:24PM +1000, NeilBrown wrote:
> This patch converts bioset_create() and
> bioset_create_nobvec() to not create a workqueue so
> alloctions will never trigger punt_bios_to_rescuer().  It
> also introduces bioset_create_rescued() and
> bioset_create_nobvec_rescued() which preserve the old
> behaviour.
> 
> All callers of bioset_create() and bioset_create_nobvec(),
> that are inside block device drivers, are converted to the
> _rescued() version.

The above comment about *_nobvec() need to be updated.

> 
> biosets used by filesystems or other top-level users do not
> need rescuing as the bio can never be queued behind other
> bios.  This includes fs_bio_set, blkdev_dio_pool,
> btrfs_bioset, xfs_ioend_bioset, drbd_md_io_bio_set,
> and one allocated by target_core_iblock.c.
> 
> biosets used by md/raid to not need rescuing as
> their usage was recently audited to revised to never
> risk deadlock.
> 
> It is hoped that most, if not all, of the remaining biosets
> can end up being the non-rescued version.
> 
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>  block/bio.c               |   12 ++++++++++--
>  block/blk-core.c          |    2 +-
>  drivers/md/bcache/super.c |    6 ++++--
>  drivers/md/dm-crypt.c     |    2 +-
>  drivers/md/dm-io.c        |    2 +-
>  drivers/md/dm.c           |    5 +++--
>  include/linux/bio.h       |    1 +
>  7 files changed, 21 insertions(+), 9 deletions(-)

DRBD can stack on block device too, so I guess it might need
BIOSET_NEED_RESCUER?

> 
> diff --git a/block/bio.c b/block/bio.c
> index 3f1286ed301e..257606e742e0 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -363,6 +363,8 @@ static void punt_bios_to_rescuer(struct bio_set *bs)
>  	struct bio_list punt, nopunt;
>  	struct bio *bio;
>  
> +	if (!WARN_ON_ONCE(!bs->rescue_workqueue))
> +		return;

I guess the above check should have been the following?

	if (WARN_ON_ONCE(!bs->rescue_workqueue))
		return;

>  	/*
>  	 * In order to guarantee forward progress we must punt only bios that
>  	 * were allocated from this bio_set; otherwise, if there was a bio on
> @@ -474,7 +476,8 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, unsigned int nr_iovecs,
>  
>  		if (current->bio_list &&
>  		    (!bio_list_empty(&current->bio_list[0]) ||
> -		     !bio_list_empty(&current->bio_list[1])))
> +		     !bio_list_empty(&current->bio_list[1])) &&
> +		    bs->rescue_workqueue)
>  			gfp_mask &= ~__GFP_DIRECT_RECLAIM;
>  
>  		p = mempool_alloc(bs->bio_pool, gfp_mask);
> @@ -1925,7 +1928,7 @@ EXPORT_SYMBOL(bioset_free);
>   * bioset_create  - Create a bio_set
>   * @pool_size:	Number of bio and bio_vecs to cache in the mempool
>   * @front_pad:	Number of bytes to allocate in front of the returned bio
> - * @flags:	Flags to modify behavior, currently only %BIOSET_NEED_BVECS
> + * @flags:	Flags to modify behavior, currently %BIOSET_NEED_BVECS and %BIOSET_NEED_RESCUER
>   *
>   * Description:
>   *    Set up a bio_set to be used with @bio_alloc_bioset. Allows the caller
> @@ -1936,6 +1939,8 @@ EXPORT_SYMBOL(bioset_free);
>   *    or things will break badly.
>   *    If %BIOSET_NEED_BVECS is set in @flags, a separate pool will be allocated
>   *    for allocating iovecs.  This pool is not needed e.g. for bio_clone_fast().
> + *    If %BIOSET_NEED_RESCUER is set, workqueue is create which can be used to
> + *    dispatch queued requests when the mempool runs out of space.
>   *
>   */
>  struct bio_set *bioset_create(unsigned int pool_size,
> @@ -1971,6 +1976,9 @@ struct bio_set *bioset_create(unsigned int pool_size,
>  			goto bad;
>  	}
>  
> +	if (!(flags & BIOSET_NEED_RESCUER))
> +		return bs;
> +
>  	bs->rescue_workqueue = alloc_workqueue("bioset", WQ_MEM_RECLAIM, 0);
>  	if (!bs->rescue_workqueue)
>  		goto bad;
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 3797753f4085..ae51f159a2ca 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -730,7 +730,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
>  	if (q->id < 0)
>  		goto fail_q;
>  
> -	q->bio_split = bioset_create(BIO_POOL_SIZE, 0, BIOSET_NEED_BVECS);
> +	q->bio_split = bioset_create(BIO_POOL_SIZE, 0, BIOSET_NEED_BVECS | BIOSET_NEED_RESCUER);
>  	if (!q->bio_split)
>  		goto fail_id;
>  
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index 8b9dae98cc7e..bb50991a82d3 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -786,7 +786,8 @@ static int bcache_device_init(struct bcache_device *d, unsigned block_size,
>  
>  	minor *= BCACHE_MINORS;
>  
> -	if (!(d->bio_split = bioset_create(4, offsetof(struct bbio, bio), BIOSET_NEED_BVECS)) ||
> +	if (!(d->bio_split = bioset_create(4, offsetof(struct bbio, bio),
> +					   BIOSET_NEED_BVECS | BIOSET_NEED_RESCUER)) ||
>  	    !(d->disk = alloc_disk(BCACHE_MINORS))) {
>  		ida_simple_remove(&bcache_minor, minor);
>  		return -ENOMEM;
> @@ -1520,7 +1521,8 @@ struct cache_set *bch_cache_set_alloc(struct cache_sb *sb)
>  				sizeof(struct bbio) + sizeof(struct bio_vec) *
>  				bucket_pages(c))) ||
>  	    !(c->fill_iter = mempool_create_kmalloc_pool(1, iter_size)) ||
> -	    !(c->bio_split = bioset_create(4, offsetof(struct bbio, bio), BIOSET_NEED_BVECS)) ||
> +	    !(c->bio_split = bioset_create(4, offsetof(struct bbio, bio),
> +					   BIOSET_NEED_BVECS | BIOSET_NEED_RESCUER)) ||
>  	    !(c->uuids = alloc_bucket_pages(GFP_KERNEL, c)) ||
>  	    !(c->moving_gc_wq = alloc_workqueue("bcache_gc",
>  						WQ_MEM_RECLAIM, 0)) ||
> diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
> index 5a98c1eb87d9..87155eb7b39c 100644
> --- a/drivers/md/dm-crypt.c
> +++ b/drivers/md/dm-crypt.c
> @@ -1936,7 +1936,7 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
>  		goto bad;
>  	}
>  
> -	cc->bs = bioset_create(MIN_IOS, 0, BIOSET_NEED_BVECS);
> +	cc->bs = bioset_create(MIN_IOS, 0, BIOSET_NEED_BVECS | BIOSET_NEED_RESCUER);
>  	if (!cc->bs) {
>  		ti->error = "Cannot allocate crypt bioset";
>  		goto bad;
> diff --git a/drivers/md/dm-io.c b/drivers/md/dm-io.c
> index 1d86d1d39d48..95b9ac98fe0c 100644
> --- a/drivers/md/dm-io.c
> +++ b/drivers/md/dm-io.c
> @@ -58,7 +58,7 @@ struct dm_io_client *dm_io_client_create(void)
>  	if (!client->pool)
>  		goto bad;
>  
> -	client->bios = bioset_create(min_ios, 0, BIOSET_NEED_BVECS);
> +	client->bios = bioset_create(min_ios, 0, BIOSET_NEED_BVECS | BIOSET_NEED_RESCUER);
>  	if (!client->bios)
>  		goto bad;
>  
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index e21a7d58c8ef..4f96556cb8c5 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -1014,7 +1014,8 @@ static void flush_current_bio_list(struct blk_plug_cb *cb, bool from_schedule)
>  
>  		while ((bio = bio_list_pop(&list))) {
>  			struct bio_set *bs = bio->bi_pool;
> -			if (unlikely(!bs) || bs == fs_bio_set) {
> +			if (unlikely(!bs) || bs == fs_bio_set ||
> +			    !bs->rescue_workqueue) {
>  				bio_list_add(&current->bio_list[i], bio);
>  				continue;
>  			}
> @@ -2601,7 +2602,7 @@ struct dm_md_mempools *dm_alloc_md_mempools(struct mapped_device *md, unsigned t
>  		BUG();
>  	}
>  
> -	pools->bs = bioset_create(pool_size, front_pad, 0);
> +	pools->bs = bioset_create(pool_size, front_pad, BIOSET_NEED_RESCUER);
>  	if (!pools->bs)
>  		goto out;
>  
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index 0975da6bebd9..40e5d7b62f29 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -376,6 +376,7 @@ static inline struct bio *bio_next_split(struct bio *bio, int sectors,
>  extern struct bio_set *bioset_create(unsigned int, unsigned int, int flags);
>  enum {
>  	BIOSET_NEED_BVECS = BIT(0),
> +	BIOSET_NEED_RESCUER = BIT(1),
>  };
>  extern void bioset_free(struct bio_set *);
>  extern mempool_t *biovec_create_pool(int pool_entries);
> 
> 

Thanks,
Ming

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 09/13] lightnvm/pblk-read: use bio_clone_fast()
  2017-05-02  3:42 ` [PATCH 09/13] lightnvm/pblk-read: use bio_clone_fast() NeilBrown
  2017-05-02  8:15   ` Christoph Hellwig
@ 2017-05-02 11:22   ` Javier González
  2017-05-02 21:51     ` NeilBrown
  1 sibling, 1 reply; 46+ messages in thread
From: Javier González @ 2017-05-02 11:22 UTC (permalink / raw)
  To: NeilBrown; +Cc: Jens Axboe, linux-block, Ming Lei, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3005 bytes --]

> On 2 May 2017, at 05.42, NeilBrown <neilb@suse.com> wrote:
> 
> pblk_submit_read() uses bio_clone_bioset() but doesn't change the
> io_vec, so bio_clone_fast() is a better choice.
> 
> It also uses fs_bio_set which is intended for filesystems.  Using it
> in a device driver can deadlock.
> So allocate a new bioset, and and use bio_clone_fast().
> 
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
> drivers/lightnvm/pblk-init.c |   12 +++++++++++-
> drivers/lightnvm/pblk-read.c |    2 +-
> drivers/lightnvm/pblk.h      |    1 +
> 3 files changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
> index b3fec8ec55b8..aaefbccce30e 100644
> --- a/drivers/lightnvm/pblk-init.c
> +++ b/drivers/lightnvm/pblk-init.c
> @@ -23,6 +23,7 @@
> static struct kmem_cache *pblk_blk_ws_cache, *pblk_rec_cache, *pblk_r_rq_cache,
> 					*pblk_w_rq_cache, *pblk_line_meta_cache;
> static DECLARE_RWSEM(pblk_lock);
> +struct bio_set *pblk_bio_set;
> 
> static int pblk_rw_io(struct request_queue *q, struct pblk *pblk,
> 			  struct bio *bio)
> @@ -946,11 +947,20 @@ static struct nvm_tgt_type tt_pblk = {
> 
> static int __init pblk_module_init(void)
> {
> -	return nvm_register_tgt_type(&tt_pblk);
> +	int ret;
> +
> +	pblk_bio_set = bioset_create(BIO_POOL_SIZE, 0, 0);
> +	if (!pblk_bio_set)
> +		return -ENOMEM;
> +	ret = nvm_register_tgt_type(&tt_pblk);
> +	if (ret)
> +		bioset_free(pblk_bio_set);
> +	return ret;
> }
> 
> static void pblk_module_exit(void)
> {
> +	bioset_free(pblk_bio_set);
> 	nvm_unregister_tgt_type(&tt_pblk);
> }
> 
> diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
> index 4a12f14d78c6..8ee0c50a7a71 100644
> --- a/drivers/lightnvm/pblk-read.c
> +++ b/drivers/lightnvm/pblk-read.c
> @@ -342,7 +342,7 @@ int pblk_submit_read(struct pblk *pblk, struct bio *bio)
> 		struct pblk_r_ctx *r_ctx = nvm_rq_to_pdu(rqd);
> 
> 		/* Clone read bio to deal with read errors internally */
> -		int_bio = bio_clone_bioset(bio, GFP_KERNEL, fs_bio_set);
> +		int_bio = bio_clone_fast(bio, GFP_KERNEL, pblk_bio_set);
> 		if (!int_bio) {
> 			pr_err("pblk: could not clone read bio\n");
> 			return NVM_IO_ERR;
> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
> index 99f3186b5288..95b665f23925 100644
> --- a/drivers/lightnvm/pblk.h
> +++ b/drivers/lightnvm/pblk.h
> @@ -702,6 +702,7 @@ void pblk_write_should_kick(struct pblk *pblk);
> /*
>  * pblk read path
>  */
> +extern struct bio_set *pblk_bio_set;
> int pblk_submit_read(struct pblk *pblk, struct bio *bio);
> int pblk_submit_read_gc(struct pblk *pblk, u64 *lba_list, void *data,
> 			unsigned int nr_secs, unsigned int *secs_to_gc,

Hi Neil,

Looks good. Thanks for fixing this. I did not know that bio_clone_bioset
was not supposed to be used on drivers.

Reviewed-by: Javier González <javier@cnexlabs.com>
Tested-by: Javier González <javier@cnexlabs.com>


[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 04/13] blk: use non-rescuing bioset for q->bio_split.
  2017-05-02  3:42 ` [PATCH 04/13] blk: use non-rescuing bioset for q->bio_split NeilBrown
@ 2017-05-02 11:54   ` Ming Lei
  2017-05-02 23:21   ` [PATCH 04/13 V2] " NeilBrown
  1 sibling, 0 replies; 46+ messages in thread
From: Ming Lei @ 2017-05-02 11:54 UTC (permalink / raw)
  To: NeilBrown; +Cc: Jens Axboe, linux-block, Ming Lei, linux-kernel

On Tue, May 02, 2017 at 01:42:24PM +1000, NeilBrown wrote:
> A rescuing bioset is only useful if there might be bios from
> that same bioset on the bio_list_on_stack queue at a time
> when bio_alloc_bioset() is called.  This never applies to
> q->bio_split.
> 
> Allocations from q->bio_split are only ever made from
> blk_queue_split() which is only ever called early in each of
> various make_request_fn()s.  The original bio (call this A)
> is then passed to generic_make_request() and is placed on
> the bio_list_on_stack queue, and the bio that was allocated
> from q->bio_split (B) is processed.
> 
> The processing of this may cause other bios to be passed to
> generic_make_request() or may even cause the bio B itself to
> be passed, possible after some prefix has been split off
> (using some other bioset).
> 
> generic_make_request() now guarantees that all of these bios
> (B and dependants) will be fully processed before the tail
> of the original bio A gets handled.  None of these early bios
> can possible trigger an allocation from the original
> q->bio_split as they are either too small to require
> splitting or (more likely) are destined for a different queue.
> 
> The next time that the original q->bio_split might be used
> by this thread is when A is processed again, as it might
> still be too big to handle directly.  By this time there
> cannot be any other bios allocated from q->bio_split in the
> generic_make_request() queue.  So no rescuing will ever be
> needed.
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>  block/blk-core.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index ae51f159a2ca..3797753f4085 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -730,7 +730,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
>  	if (q->id < 0)
>  		goto fail_q;
>  
> -	q->bio_split = bioset_create(BIO_POOL_SIZE, 0, BIOSET_NEED_BVECS | BIOSET_NEED_RESCUER);
> +	q->bio_split = bioset_create(BIO_POOL_SIZE, 0, BIOSET_NEED_BVECS);
>  	if (!q->bio_split)
>  		goto fail_id;
>  
> 
> 

Reviewed-by: Ming Lei <ming.lei@redhat.com>

Thanks,
Ming

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 05/13] block: Improvements to bounce-buffer handling
  2017-05-02  3:42 ` [PATCH 05/13] block: Improvements to bounce-buffer handling NeilBrown
  2017-05-02  8:13   ` Christoph Hellwig
@ 2017-05-02 11:56   ` Ming Lei
  1 sibling, 0 replies; 46+ messages in thread
From: Ming Lei @ 2017-05-02 11:56 UTC (permalink / raw)
  To: NeilBrown; +Cc: Jens Axboe, linux-block, Ming Lei, linux-kernel

On Tue, May 02, 2017 at 01:42:24PM +1000, NeilBrown wrote:
> Since commit 23688bf4f830 ("block: ensure to split after potentially
> bouncing a bio") blk_queue_bounce() is called *before*
> blk_queue_split().
> This means that:
>  1/ the comments blk_queue_split() about bounce buffers are
>     irrelevant, and
>  2/ a very large bio (more than BIO_MAX_PAGES) will no longer be
>     split before it arrives at blk_queue_bounce(), leading to the
>     possibility that bio_clone_bioset() will fail and a NULL
>     will be dereferenced.
> 
> Separately, blk_queue_bounce() shouldn't use fs_bio_set as the bio
> being copied could be from the same set, and this could lead to a
> deadlock.
> 
> So:
>  - allocate 2 private biosets for blk_queue_bounce, one for
>    splitting enormous bios and one for cloning bios.
>  - add code to split a bio that exceeds BIO_MAX_PAGES.
>  - Fix up the comments in blk_queue_split()
> 
> Credit-to: Ming Lei <tom.leiming@gmail.com> (suggested using single bio_for_each_segment loop)
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>  block/blk-merge.c |   14 ++++----------
>  block/bounce.c    |   32 ++++++++++++++++++++++++++------
>  2 files changed, 30 insertions(+), 16 deletions(-)
> 
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index d59074556703..51c84540d3bb 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -117,17 +117,11 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
>  		 * each holds at most BIO_MAX_PAGES bvecs because
>  		 * bio_clone() can fail to allocate big bvecs.
>  		 *
> -		 * It should have been better to apply the limit per
> -		 * request queue in which bio_clone() is involved,
> -		 * instead of globally. The biggest blocker is the
> -		 * bio_clone() in bio bounce.
> +		 * Those drivers which will need to use bio_clone()
> +		 * should tell us in some way.  For now, impose the
> +		 * BIO_MAX_PAGES limit on all queues.
>  		 *
> -		 * If bio is splitted by this reason, we should have
> -		 * allowed to continue bios merging, but don't do
> -		 * that now for making the change simple.
> -		 *
> -		 * TODO: deal with bio bounce's bio_clone() gracefully
> -		 * and convert the global limit into per-queue limit.
> +		 * TODO: handle users of bio_clone() differently.
>  		 */
>  		if (bvecs++ >= BIO_MAX_PAGES)
>  			goto split;
> diff --git a/block/bounce.c b/block/bounce.c
> index 1cb5dd3a5da1..087ecc2dc66c 100644
> --- a/block/bounce.c
> +++ b/block/bounce.c
> @@ -26,6 +26,7 @@
>  #define POOL_SIZE	64
>  #define ISA_POOL_SIZE	16
>  
> +struct bio_set *bounce_bio_set, *bounce_bio_split;
>  static mempool_t *page_pool, *isa_page_pool;
>  
>  #if defined(CONFIG_HIGHMEM) || defined(CONFIG_NEED_BOUNCE_POOL)
> @@ -40,6 +41,14 @@ static __init int init_emergency_pool(void)
>  	BUG_ON(!page_pool);
>  	pr_info("pool size: %d pages\n", POOL_SIZE);
>  
> +	bounce_bio_set = bioset_create(BIO_POOL_SIZE, 0, BIOSET_NEED_BVECS);
> +	BUG_ON(!bounce_bio_set);
> +	if (bioset_integrity_create(bounce_bio_set, BIO_POOL_SIZE))
> +		BUG_ON(1);
> +
> +	bounce_bio_split = bioset_create(BIO_POOL_SIZE, 0, 0);
> +	BUG_ON(!bounce_bio_split);
> +
>  	return 0;
>  }
>  
> @@ -186,15 +195,26 @@ static void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig,
>  	int rw = bio_data_dir(*bio_orig);
>  	struct bio_vec *to, from;
>  	struct bvec_iter iter;
> -	unsigned i;
> +	unsigned i = 0;
> +	bool bounce = false;
> +	int sectors = 0;
>  
> -	bio_for_each_segment(from, *bio_orig, iter)
> +	bio_for_each_segment(from, *bio_orig, iter) {
> +		if (i++ < BIO_MAX_PAGES)
> +			sectors += from.bv_len >> 9;
>  		if (page_to_pfn(from.bv_page) > queue_bounce_pfn(q))
> -			goto bounce;
> +			bounce = true;
> +	}
> +	if (!bounce)
> +		return;
>  
> -	return;
> -bounce:
> -	bio = bio_clone_bioset(*bio_orig, GFP_NOIO, fs_bio_set);
> +	if (sectors < bio_sectors(*bio_orig)) {
> +		bio = bio_split(*bio_orig, sectors, GFP_NOIO, bounce_bio_split);
> +		bio_chain(bio, *bio_orig);
> +		generic_make_request(*bio_orig);
> +		*bio_orig = bio;
> +	}
> +	bio = bio_clone_bioset(*bio_orig, GFP_NOIO, bounce_bio_set);
>  
>  	bio_for_each_segment_all(to, bio, i) {
>  		struct page *page = to->bv_page;

Reviewed-by: Ming Lei <ming.lei@redhat.com>

Thanks,
Ming

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 02/13] blk: replace bioset_create_nobvec() with a flags arg to bioset_create()
  2017-05-02  8:06   ` Christoph Hellwig
@ 2017-05-02 21:47     ` NeilBrown
  0 siblings, 0 replies; 46+ messages in thread
From: NeilBrown @ 2017-05-02 21:47 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, linux-block, Ming Lei, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1020 bytes --]

On Tue, May 02 2017, Christoph Hellwig wrote:

>> diff --git a/include/linux/bio.h b/include/linux/bio.h
>> index d1b04b0e99cf..0975da6bebd9 100644
>> --- a/include/linux/bio.h
>> +++ b/include/linux/bio.h
>> @@ -373,8 +373,10 @@ static inline struct bio *bio_next_split(struct bio *bio, int sectors,
>>  	return bio_split(bio, sectors, gfp, bs);
>>  }
>>  
>> -extern struct bio_set *bioset_create(unsigned int, unsigned int);
>> -extern struct bio_set *bioset_create_nobvec(unsigned int, unsigned int);
>> +extern struct bio_set *bioset_create(unsigned int, unsigned int, int flags);
>> +enum {
>> +	BIOSET_NEED_BVECS = BIT(0),
>> +};
>
> I really hate the BIT macro as it obsfucates what's going on.

So post a patch to remove it from the tree.
I happen to like it, but I wouldn't fight for it.

>
> Why not just
>
> 	BIOSET_NEED_BVECS	= (1 << 0),
>
> which is a lot more intuitive.
>
> Otherwise looks fine to me:
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 09/13] lightnvm/pblk-read: use bio_clone_fast()
  2017-05-02 11:22   ` Javier González
@ 2017-05-02 21:51     ` NeilBrown
  2017-05-03  6:42       ` Javier González
  0 siblings, 1 reply; 46+ messages in thread
From: NeilBrown @ 2017-05-02 21:51 UTC (permalink / raw)
  To: Javier González; +Cc: Jens Axboe, linux-block, Ming Lei, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3524 bytes --]

On Tue, May 02 2017, Javier González wrote:

>> On 2 May 2017, at 05.42, NeilBrown <neilb@suse.com> wrote:
>> 
>> pblk_submit_read() uses bio_clone_bioset() but doesn't change the
>> io_vec, so bio_clone_fast() is a better choice.
>> 
>> It also uses fs_bio_set which is intended for filesystems.  Using it
>> in a device driver can deadlock.
>> So allocate a new bioset, and and use bio_clone_fast().
>> 
>> Signed-off-by: NeilBrown <neilb@suse.com>
>> ---
>> drivers/lightnvm/pblk-init.c |   12 +++++++++++-
>> drivers/lightnvm/pblk-read.c |    2 +-
>> drivers/lightnvm/pblk.h      |    1 +
>> 3 files changed, 13 insertions(+), 2 deletions(-)
>> 
>> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
>> index b3fec8ec55b8..aaefbccce30e 100644
>> --- a/drivers/lightnvm/pblk-init.c
>> +++ b/drivers/lightnvm/pblk-init.c
>> @@ -23,6 +23,7 @@
>> static struct kmem_cache *pblk_blk_ws_cache, *pblk_rec_cache, *pblk_r_rq_cache,
>> 					*pblk_w_rq_cache, *pblk_line_meta_cache;
>> static DECLARE_RWSEM(pblk_lock);
>> +struct bio_set *pblk_bio_set;
>> 
>> static int pblk_rw_io(struct request_queue *q, struct pblk *pblk,
>> 			  struct bio *bio)
>> @@ -946,11 +947,20 @@ static struct nvm_tgt_type tt_pblk = {
>> 
>> static int __init pblk_module_init(void)
>> {
>> -	return nvm_register_tgt_type(&tt_pblk);
>> +	int ret;
>> +
>> +	pblk_bio_set = bioset_create(BIO_POOL_SIZE, 0, 0);
>> +	if (!pblk_bio_set)
>> +		return -ENOMEM;
>> +	ret = nvm_register_tgt_type(&tt_pblk);
>> +	if (ret)
>> +		bioset_free(pblk_bio_set);
>> +	return ret;
>> }
>> 
>> static void pblk_module_exit(void)
>> {
>> +	bioset_free(pblk_bio_set);
>> 	nvm_unregister_tgt_type(&tt_pblk);
>> }
>> 
>> diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
>> index 4a12f14d78c6..8ee0c50a7a71 100644
>> --- a/drivers/lightnvm/pblk-read.c
>> +++ b/drivers/lightnvm/pblk-read.c
>> @@ -342,7 +342,7 @@ int pblk_submit_read(struct pblk *pblk, struct bio *bio)
>> 		struct pblk_r_ctx *r_ctx = nvm_rq_to_pdu(rqd);
>> 
>> 		/* Clone read bio to deal with read errors internally */
>> -		int_bio = bio_clone_bioset(bio, GFP_KERNEL, fs_bio_set);
>> +		int_bio = bio_clone_fast(bio, GFP_KERNEL, pblk_bio_set);
>> 		if (!int_bio) {
>> 			pr_err("pblk: could not clone read bio\n");
>> 			return NVM_IO_ERR;
>> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
>> index 99f3186b5288..95b665f23925 100644
>> --- a/drivers/lightnvm/pblk.h
>> +++ b/drivers/lightnvm/pblk.h
>> @@ -702,6 +702,7 @@ void pblk_write_should_kick(struct pblk *pblk);
>> /*
>>  * pblk read path
>>  */
>> +extern struct bio_set *pblk_bio_set;
>> int pblk_submit_read(struct pblk *pblk, struct bio *bio);
>> int pblk_submit_read_gc(struct pblk *pblk, u64 *lba_list, void *data,
>> 			unsigned int nr_secs, unsigned int *secs_to_gc,
>
> Hi Neil,
>
> Looks good. Thanks for fixing this. I did not know that bio_clone_bioset
> was not supposed to be used on drivers.

Prior to my patchset, using bio_clone_bioset() wasn't wrong in drivers,
though it was a waste when bio_clone_fast() would to just as well as is
more efficient.
After my patchset, using it can be problematic.  I'm wondering what I
should do to encourage those problems to be more visible so that if
people to us it, they'll get a warning or something.

Thanks,
NeilBrown


>
> Reviewed-by: Javier González <javier@cnexlabs.com>
> Tested-by: Javier González <javier@cnexlabs.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 03/13] blk: make the bioset rescue_workqueue optional.
  2017-05-02 11:00   ` Ming Lei
@ 2017-05-02 22:10     ` NeilBrown
  0 siblings, 0 replies; 46+ messages in thread
From: NeilBrown @ 2017-05-02 22:10 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, linux-block, Ming Lei, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2139 bytes --]

On Tue, May 02 2017, Ming Lei wrote:

> On Tue, May 02, 2017 at 01:42:24PM +1000, NeilBrown wrote:
>> This patch converts bioset_create() and
>> bioset_create_nobvec() to not create a workqueue so
>> alloctions will never trigger punt_bios_to_rescuer().  It
>> also introduces bioset_create_rescued() and
>> bioset_create_nobvec_rescued() which preserve the old
>> behaviour.
>> 
>> All callers of bioset_create() and bioset_create_nobvec(),
>> that are inside block device drivers, are converted to the
>> _rescued() version.
>
> The above comment about *_nobvec() need to be updated.

Yes - thanks.  I'll post an updated version.


>
>> 
>> biosets used by filesystems or other top-level users do not
>> need rescuing as the bio can never be queued behind other
>> bios.  This includes fs_bio_set, blkdev_dio_pool,
>> btrfs_bioset, xfs_ioend_bioset, drbd_md_io_bio_set,
>> and one allocated by target_core_iblock.c.
>> 
>> biosets used by md/raid to not need rescuing as
>> their usage was recently audited to revised to never
>> risk deadlock.
>> 
>> It is hoped that most, if not all, of the remaining biosets
>> can end up being the non-rescued version.
>> 
>> Signed-off-by: NeilBrown <neilb@suse.com>
>> ---
>>  block/bio.c               |   12 ++++++++++--
>>  block/blk-core.c          |    2 +-
>>  drivers/md/bcache/super.c |    6 ++++--
>>  drivers/md/dm-crypt.c     |    2 +-
>>  drivers/md/dm-io.c        |    2 +-
>>  drivers/md/dm.c           |    5 +++--
>>  include/linux/bio.h       |    1 +
>>  7 files changed, 21 insertions(+), 9 deletions(-)
>
> DRBD can stack on block device too, so I guess it might need
> BIOSET_NEED_RESCUER?

A bioset only needs a rescuer is an allocation is made from the bioset
inside a make_request_fn.
The bioset in DRBD is drbd_md_io_bio_set is used for metadata IO, and
I think it is only used in a worker thread, where rescuing cannot help.
But that is probably subtle enough that I should leave the removal
of the rescuer to a subsequent patch.
So I'll give DRBD the flag when I update the changelog comment.

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 03/13 V2] blk: make the bioset rescue_workqueue optional.
  2017-05-02  3:42 ` [PATCH 03/13] blk: make the bioset rescue_workqueue optional NeilBrown
  2017-05-02  8:14   ` Christoph Hellwig
  2017-05-02 11:00   ` Ming Lei
@ 2017-05-02 22:34   ` NeilBrown
  2017-05-03  3:24     ` Ming Lei
  2 siblings, 1 reply; 46+ messages in thread
From: NeilBrown @ 2017-05-02 22:34 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Ming Lei, linux-kernel, Christoph Hellwig

[-- Attachment #1: Type: text/plain, Size: 8474 bytes --]

From 09017acf74ec4df674b78ca66f0924187f10d8a4 Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.com>
Date: Fri, 10 Mar 2017 13:59:50 +1100
Subject: [PATCH] blk: make the bioset rescue_workqueue optional.

This patch converts bioset_create() to not create a workqueue by
default, so alloctions will never trigger punt_bios_to_rescuer().  It
also introduces a new flag BIOSET_NEED_RESCUER() which tells
bioset_create() to preserve the old behavior.

All callers of bioset_create() that are inside block device drivers,
are given the BIOSET_NEED_RESCUER().

biosets used by filesystems or other top-level users do not
need rescuing as the bio can never be queued behind other
bios.  This includes fs_bio_set, blkdev_dio_pool,
btrfs_bioset, xfs_ioend_bioset, and one allocated by
target_core_iblock.c.

biosets used by md/raid do not need rescuing as
their usage was recently audited and revised to never
risk deadlock.

It is hoped that most, if not all, of the remaining biosets
can end up being the non-rescued version.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Credit-to: Ming Lei <ming.lei@redhat.com> (minor fixes)
Signed-off-by: NeilBrown <neilb@suse.com>
---

This version fixes the WARN_ON_ONCE() typo, updates the above comment,
wraps long lines, and preserves the rescuer behavior for DRBD.

Thanks for the review.

NeilBrown


 block/bio.c                    | 13 +++++++++++--
 block/blk-core.c               |  3 ++-
 drivers/block/drbd/drbd_main.c |  4 +++-
 drivers/md/bcache/super.c      |  8 ++++++--
 drivers/md/dm-crypt.c          |  3 ++-
 drivers/md/dm-io.c             |  3 ++-
 drivers/md/dm.c                |  5 +++--
 include/linux/bio.h            |  1 +
 8 files changed, 30 insertions(+), 10 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 3f1286ed301e..f3a6ef4f5c42 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -363,6 +363,8 @@ static void punt_bios_to_rescuer(struct bio_set *bs)
 	struct bio_list punt, nopunt;
 	struct bio *bio;
 
+	if (WARN_ON_ONCE(!bs->rescue_workqueue))
+		return;
 	/*
 	 * In order to guarantee forward progress we must punt only bios that
 	 * were allocated from this bio_set; otherwise, if there was a bio on
@@ -474,7 +476,8 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, unsigned int nr_iovecs,
 
 		if (current->bio_list &&
 		    (!bio_list_empty(&current->bio_list[0]) ||
-		     !bio_list_empty(&current->bio_list[1])))
+		     !bio_list_empty(&current->bio_list[1])) &&
+		    bs->rescue_workqueue)
 			gfp_mask &= ~__GFP_DIRECT_RECLAIM;
 
 		p = mempool_alloc(bs->bio_pool, gfp_mask);
@@ -1925,7 +1928,8 @@ EXPORT_SYMBOL(bioset_free);
  * bioset_create  - Create a bio_set
  * @pool_size:	Number of bio and bio_vecs to cache in the mempool
  * @front_pad:	Number of bytes to allocate in front of the returned bio
- * @flags:	Flags to modify behavior, currently only %BIOSET_NEED_BVECS
+ * @flags:	Flags to modify behavior, currently %BIOSET_NEED_BVECS
+ *              and %BIOSET_NEED_RESCUER
  *
  * Description:
  *    Set up a bio_set to be used with @bio_alloc_bioset. Allows the caller
@@ -1936,6 +1940,8 @@ EXPORT_SYMBOL(bioset_free);
  *    or things will break badly.
  *    If %BIOSET_NEED_BVECS is set in @flags, a separate pool will be allocated
  *    for allocating iovecs.  This pool is not needed e.g. for bio_clone_fast().
+ *    If %BIOSET_NEED_RESCUER is set, workqueue is create which can be used to
+ *    dispatch queued requests when the mempool runs out of space.
  *
  */
 struct bio_set *bioset_create(unsigned int pool_size,
@@ -1971,6 +1977,9 @@ struct bio_set *bioset_create(unsigned int pool_size,
 			goto bad;
 	}
 
+	if (!(flags & BIOSET_NEED_RESCUER))
+		return bs;
+
 	bs->rescue_workqueue = alloc_workqueue("bioset", WQ_MEM_RECLAIM, 0);
 	if (!bs->rescue_workqueue)
 		goto bad;
diff --git a/block/blk-core.c b/block/blk-core.c
index 3797753f4085..bc66cd595bef 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -730,7 +730,8 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 	if (q->id < 0)
 		goto fail_q;
 
-	q->bio_split = bioset_create(BIO_POOL_SIZE, 0, BIOSET_NEED_BVECS);
+	q->bio_split = bioset_create(BIO_POOL_SIZE, 0, (BIOSET_NEED_BVECS |
+							BIOSET_NEED_RESCUER));
 	if (!q->bio_split)
 		goto fail_id;
 
diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index b395fe391171..bdf51b6977cf 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -2165,7 +2165,9 @@ static int drbd_create_mempools(void)
 		goto Enomem;
 
 	/* mempools */
-	drbd_md_io_bio_set = bioset_create(DRBD_MIN_POOL_PAGES, 0, BIOSET_NEED_BVECS);
+	drbd_md_io_bio_set = bioset_create(DRBD_MIN_POOL_PAGES, 0,
+					   BIOSET_NEED_BVECS |
+					   BIOSET_NEED_RESCUER);
 	if (drbd_md_io_bio_set == NULL)
 		goto Enomem;
 
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 8b9dae98cc7e..f8d4c9543c45 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -786,7 +786,9 @@ static int bcache_device_init(struct bcache_device *d, unsigned block_size,
 
 	minor *= BCACHE_MINORS;
 
-	if (!(d->bio_split = bioset_create(4, offsetof(struct bbio, bio), BIOSET_NEED_BVECS)) ||
+	if (!(d->bio_split = bioset_create(4, offsetof(struct bbio, bio),
+					   BIOSET_NEED_BVECS |
+					   BIOSET_NEED_RESCUER)) ||
 	    !(d->disk = alloc_disk(BCACHE_MINORS))) {
 		ida_simple_remove(&bcache_minor, minor);
 		return -ENOMEM;
@@ -1520,7 +1522,9 @@ struct cache_set *bch_cache_set_alloc(struct cache_sb *sb)
 				sizeof(struct bbio) + sizeof(struct bio_vec) *
 				bucket_pages(c))) ||
 	    !(c->fill_iter = mempool_create_kmalloc_pool(1, iter_size)) ||
-	    !(c->bio_split = bioset_create(4, offsetof(struct bbio, bio), BIOSET_NEED_BVECS)) ||
+	    !(c->bio_split = bioset_create(4, offsetof(struct bbio, bio),
+					   BIOSET_NEED_BVECS |
+					   BIOSET_NEED_RESCUER)) ||
 	    !(c->uuids = alloc_bucket_pages(GFP_KERNEL, c)) ||
 	    !(c->moving_gc_wq = alloc_workqueue("bcache_gc",
 						WQ_MEM_RECLAIM, 0)) ||
diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 5a98c1eb87d9..1ddb2b160e14 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -1936,7 +1936,8 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 		goto bad;
 	}
 
-	cc->bs = bioset_create(MIN_IOS, 0, BIOSET_NEED_BVECS);
+	cc->bs = bioset_create(MIN_IOS, 0, (BIOSET_NEED_BVECS |
+					    BIOSET_NEED_RESCUER));
 	if (!cc->bs) {
 		ti->error = "Cannot allocate crypt bioset";
 		goto bad;
diff --git a/drivers/md/dm-io.c b/drivers/md/dm-io.c
index 1d86d1d39d48..aee5aae1c992 100644
--- a/drivers/md/dm-io.c
+++ b/drivers/md/dm-io.c
@@ -58,7 +58,8 @@ struct dm_io_client *dm_io_client_create(void)
 	if (!client->pool)
 		goto bad;
 
-	client->bios = bioset_create(min_ios, 0, BIOSET_NEED_BVECS);
+	client->bios = bioset_create(min_ios, 0, (BIOSET_NEED_BVECS |
+						  BIOSET_NEED_RESCUER));
 	if (!client->bios)
 		goto bad;
 
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index e21a7d58c8ef..4f96556cb8c5 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1014,7 +1014,8 @@ static void flush_current_bio_list(struct blk_plug_cb *cb, bool from_schedule)
 
 		while ((bio = bio_list_pop(&list))) {
 			struct bio_set *bs = bio->bi_pool;
-			if (unlikely(!bs) || bs == fs_bio_set) {
+			if (unlikely(!bs) || bs == fs_bio_set ||
+			    !bs->rescue_workqueue) {
 				bio_list_add(&current->bio_list[i], bio);
 				continue;
 			}
@@ -2601,7 +2602,7 @@ struct dm_md_mempools *dm_alloc_md_mempools(struct mapped_device *md, unsigned t
 		BUG();
 	}
 
-	pools->bs = bioset_create(pool_size, front_pad, 0);
+	pools->bs = bioset_create(pool_size, front_pad, BIOSET_NEED_RESCUER);
 	if (!pools->bs)
 		goto out;
 
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 0975da6bebd9..40e5d7b62f29 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -376,6 +376,7 @@ static inline struct bio *bio_next_split(struct bio *bio, int sectors,
 extern struct bio_set *bioset_create(unsigned int, unsigned int, int flags);
 enum {
 	BIOSET_NEED_BVECS = BIT(0),
+	BIOSET_NEED_RESCUER = BIT(1),
 };
 extern void bioset_free(struct bio_set *);
 extern mempool_t *biovec_create_pool(int pool_entries);
-- 
2.12.2


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH 13/13] block: don't check for BIO_MAX_PAGES in blk_bio_segment_split()
  2017-05-02 10:22   ` Ming Lei
@ 2017-05-02 22:54     ` NeilBrown
  2017-05-02 23:50       ` Ming Lei
  0 siblings, 1 reply; 46+ messages in thread
From: NeilBrown @ 2017-05-02 22:54 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, linux-block, Ming Lei, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1196 bytes --]

On Tue, May 02 2017, Ming Lei wrote:

> On Tue, May 02, 2017 at 01:42:26PM +1000, NeilBrown wrote:
>> blk_bio_segment_split() makes sure bios have no more than
>> BIO_MAX_PAGES entries in the bi_io_vec.
>> This was done because bio_clone_bioset() (when given a
>> mempool bioset) could not handle larger io_vecs.
>> 
>> No driver uses bio_clone_bioset() any more, they all
>> use bio_clone_fast() if anything, and bio_clone_fast()
>> doesn't clone the bi_io_vec.
>
> Maybe in future, some drivers still may try to use 
> bio_clone_bioset() again, I suggest to add some comments
> on bio_clone_bioset() to make this usage explicitly. Also
> better to trigger a warning if a big src bio is passed to
> bio_clone_bioset().

There are now just two users for bio_clone_bioset(): bounce.c and btrfs.

Christoph wants to get rid of bounce.c, which would leave one.

I'd have to drill into the btrfs code to be sure, but it might be that
btrfs only needs bio_clone_fast().  That would leave zero users.
Then we wouldn't need a warning at all.

So I agree that we need to guard against future incorrect usage.  I'm not
yet sure what the best approach is.

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 07/13 V2] drbd: use bio_clone_fast() instead of bio_clone()
  2017-05-02  3:42 ` [PATCH 07/13] drbd: use bio_clone_fast() instead of bio_clone() NeilBrown
@ 2017-05-02 23:20   ` NeilBrown
  0 siblings, 0 replies; 46+ messages in thread
From: NeilBrown @ 2017-05-02 23:20 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Ming Lei, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3838 bytes --]


drbd does not modify the bi_io_vec of the cloned bio,
so there is no need to clone that part.  So bio_clone_fast()
is the better choice.
For bio_clone_fast() we need to specify a bio_set.
We could use fs_bio_set, which bio_clone() uses, or
drbd_md_io_bio_set, which drbd uses for metadata, but it is
generally best to avoid sharing bio_sets unless you can
be certain that there are no interdependencies.

So create a new bio_set, drbd_io_bio_set, and use bio_clone_fast().

Also remove a "XXX cannot fail ???" comment because it definitely
cannot fail - bio_clone_fast() doesn't fail if the GFP flags allow for
sleeping.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: NeilBrown <neilb@suse.com>
---

This patch needed to be refreshed after long lines were
wrapped in an earlier patch.
Also added the BIOSET_NEED_RESCUER flag for new bioset as
it is not trivially obvious that this isn't needed.

NeilBrown


 drivers/block/drbd/drbd_int.h  | 3 +++
 drivers/block/drbd/drbd_main.c | 9 +++++++++
 drivers/block/drbd/drbd_req.h  | 2 +-
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/block/drbd/drbd_int.h b/drivers/block/drbd/drbd_int.h
index d5da45bb03a6..f91982515a6b 100644
--- a/drivers/block/drbd/drbd_int.h
+++ b/drivers/block/drbd/drbd_int.h
@@ -1441,6 +1441,9 @@ extern struct bio_set *drbd_md_io_bio_set;
 /* to allocate from that set */
 extern struct bio *bio_alloc_drbd(gfp_t gfp_mask);
 
+/* And a bio_set for cloning */
+extern struct bio_set *drbd_io_bio_set;
+
 extern struct mutex resources_mutex;
 
 extern int conn_lowest_minor(struct drbd_connection *connection);
diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index bdf51b6977cf..90680034ef57 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -128,6 +128,7 @@ mempool_t *drbd_request_mempool;
 mempool_t *drbd_ee_mempool;
 mempool_t *drbd_md_io_page_pool;
 struct bio_set *drbd_md_io_bio_set;
+struct bio_set *drbd_io_bio_set;
 
 /* I do not use a standard mempool, because:
    1) I want to hand out the pre-allocated objects first.
@@ -2098,6 +2099,8 @@ static void drbd_destroy_mempools(void)
 
 	/* D_ASSERT(device, atomic_read(&drbd_pp_vacant)==0); */
 
+	if (drbd_io_bio_set)
+		bioset_free(drbd_io_bio_set);
 	if (drbd_md_io_bio_set)
 		bioset_free(drbd_md_io_bio_set);
 	if (drbd_md_io_page_pool)
@@ -2115,6 +2118,7 @@ static void drbd_destroy_mempools(void)
 	if (drbd_al_ext_cache)
 		kmem_cache_destroy(drbd_al_ext_cache);
 
+	drbd_io_bio_set      = NULL;
 	drbd_md_io_bio_set   = NULL;
 	drbd_md_io_page_pool = NULL;
 	drbd_ee_mempool      = NULL;
@@ -2142,6 +2146,7 @@ static int drbd_create_mempools(void)
 	drbd_pp_pool         = NULL;
 	drbd_md_io_page_pool = NULL;
 	drbd_md_io_bio_set   = NULL;
+	drbd_io_bio_set      = NULL;
 
 	/* caches */
 	drbd_request_cache = kmem_cache_create(
@@ -2165,6 +2170,10 @@ static int drbd_create_mempools(void)
 		goto Enomem;
 
 	/* mempools */
+	drbd_io_bio_set = bioset_create(BIO_POOL_SIZE, 0, BIOSET_NEED_RESCUER);
+	if (drbd_io_bio_set == NULL)
+		goto Enomem;
+
 	drbd_md_io_bio_set = bioset_create(DRBD_MIN_POOL_PAGES, 0,
 					   BIOSET_NEED_BVECS |
 					   BIOSET_NEED_RESCUER);
diff --git a/drivers/block/drbd/drbd_req.h b/drivers/block/drbd/drbd_req.h
index eb49e7f2da91..9e1866ab238f 100644
--- a/drivers/block/drbd/drbd_req.h
+++ b/drivers/block/drbd/drbd_req.h
@@ -263,7 +263,7 @@ enum drbd_req_state_bits {
 static inline void drbd_req_make_private_bio(struct drbd_request *req, struct bio *bio_src)
 {
 	struct bio *bio;
-	bio = bio_clone(bio_src, GFP_NOIO); /* XXX cannot fail?? */
+	bio = bio_clone_fast(bio_src, GFP_NOIO, drbd_io_bio_set);
 
 	req->private_bio = bio;
 
-- 
2.12.2


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 04/13 V2] blk: use non-rescuing bioset for q->bio_split.
  2017-05-02  3:42 ` [PATCH 04/13] blk: use non-rescuing bioset for q->bio_split NeilBrown
  2017-05-02 11:54   ` Ming Lei
@ 2017-05-02 23:21   ` NeilBrown
  1 sibling, 0 replies; 46+ messages in thread
From: NeilBrown @ 2017-05-02 23:21 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Ming Lei, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2274 bytes --]


A rescuing bioset is only useful if there might be bios from
that same bioset on the bio_list_on_stack queue at a time
when bio_alloc_bioset() is called.  This never applies to
q->bio_split.

Allocations from q->bio_split are only ever made from
blk_queue_split() which is only ever called early in each of
various make_request_fn()s.  The original bio (call this A)
is then passed to generic_make_request() and is placed on
the bio_list_on_stack queue, and the bio that was allocated
from q->bio_split (B) is processed.

The processing of this may cause other bios to be passed to
generic_make_request() or may even cause the bio B itself to
be passed, possible after some prefix has been split off
(using some other bioset).

generic_make_request() now guarantees that all of these bios
(B and dependants) will be fully processed before the tail
of the original bio A gets handled.  None of these early bios
can possible trigger an allocation from the original
q->bio_split as they are either too small to require
splitting or (more likely) are destined for a different queue.

The next time that the original q->bio_split might be used
by this thread is when A is processed again, as it might
still be too big to handle directly.  By this time there
cannot be any other bios allocated from q->bio_split in the
generic_make_request() queue.  So no rescuing will ever be
needed.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.com>
---

This patch needed to be refreshed after previous patch
had long lines wrapped.
Also added Ming's Reviewed-by.

Thanks,
NeilBrown



 block/blk-core.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index bc66cd595bef..3797753f4085 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -730,8 +730,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 	if (q->id < 0)
 		goto fail_q;
 
-	q->bio_split = bioset_create(BIO_POOL_SIZE, 0, (BIOSET_NEED_BVECS |
-							BIOSET_NEED_RESCUER));
+	q->bio_split = bioset_create(BIO_POOL_SIZE, 0, BIOSET_NEED_BVECS);
 	if (!q->bio_split)
 		goto fail_id;
 
-- 
2.12.2


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH 13/13] block: don't check for BIO_MAX_PAGES in blk_bio_segment_split()
  2017-05-02 22:54     ` NeilBrown
@ 2017-05-02 23:50       ` Ming Lei
  0 siblings, 0 replies; 46+ messages in thread
From: Ming Lei @ 2017-05-02 23:50 UTC (permalink / raw)
  To: NeilBrown; +Cc: Jens Axboe, linux-block, Ming Lei, linux-kernel

On Wed, May 03, 2017 at 08:54:55AM +1000, NeilBrown wrote:
> On Tue, May 02 2017, Ming Lei wrote:
> 
> > On Tue, May 02, 2017 at 01:42:26PM +1000, NeilBrown wrote:
> >> blk_bio_segment_split() makes sure bios have no more than
> >> BIO_MAX_PAGES entries in the bi_io_vec.
> >> This was done because bio_clone_bioset() (when given a
> >> mempool bioset) could not handle larger io_vecs.
> >> 
> >> No driver uses bio_clone_bioset() any more, they all
> >> use bio_clone_fast() if anything, and bio_clone_fast()
> >> doesn't clone the bi_io_vec.
> >
> > Maybe in future, some drivers still may try to use 
> > bio_clone_bioset() again, I suggest to add some comments
> > on bio_clone_bioset() to make this usage explicitly. Also
> > better to trigger a warning if a big src bio is passed to
> > bio_clone_bioset().
> 
> There are now just two users for bio_clone_bioset(): bounce.c and btrfs.
> 
> Christoph wants to get rid of bounce.c, which would leave one.
> 
> I'd have to drill into the btrfs code to be sure, but it might be that
> btrfs only needs bio_clone_fast().  That would leave zero users.
> Then we wouldn't need a warning at all.
> 
> So I agree that we need to guard against future incorrect usage.  I'm not
> yet sure what the best approach is.

I think it is helpful to simply comment this function as obsolete.


Thanks,
Ming

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 03/13 V2] blk: make the bioset rescue_workqueue optional.
  2017-05-02 22:34   ` [PATCH 03/13 V2] " NeilBrown
@ 2017-05-03  3:24     ` Ming Lei
  0 siblings, 0 replies; 46+ messages in thread
From: Ming Lei @ 2017-05-03  3:24 UTC (permalink / raw)
  To: NeilBrown
  Cc: Jens Axboe, linux-block, Linux Kernel Mailing List, Christoph Hellwig

On Wed, May 3, 2017 at 6:34 AM, NeilBrown <neilb@suse.com> wrote:
> From 09017acf74ec4df674b78ca66f0924187f10d8a4 Mon Sep 17 00:00:00 2001
> From: NeilBrown <neilb@suse.com>
> Date: Fri, 10 Mar 2017 13:59:50 +1100
> Subject: [PATCH] blk: make the bioset rescue_workqueue optional.
>
> This patch converts bioset_create() to not create a workqueue by
> default, so alloctions will never trigger punt_bios_to_rescuer().  It
> also introduces a new flag BIOSET_NEED_RESCUER() which tells
> bioset_create() to preserve the old behavior.
>
> All callers of bioset_create() that are inside block device drivers,
> are given the BIOSET_NEED_RESCUER().
>
> biosets used by filesystems or other top-level users do not
> need rescuing as the bio can never be queued behind other
> bios.  This includes fs_bio_set, blkdev_dio_pool,
> btrfs_bioset, xfs_ioend_bioset, and one allocated by
> target_core_iblock.c.
>
> biosets used by md/raid do not need rescuing as
> their usage was recently audited and revised to never
> risk deadlock.
>
> It is hoped that most, if not all, of the remaining biosets
> can end up being the non-rescued version.
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Credit-to: Ming Lei <ming.lei@redhat.com> (minor fixes)
> Signed-off-by: NeilBrown <neilb@suse.com>

Reviewed-by: Ming Lei <ming.lei@redhat.com>


Thanks,
Ming

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 09/13] lightnvm/pblk-read: use bio_clone_fast()
  2017-05-02 21:51     ` NeilBrown
@ 2017-05-03  6:42       ` Javier González
  0 siblings, 0 replies; 46+ messages in thread
From: Javier González @ 2017-05-03  6:42 UTC (permalink / raw)
  To: NeilBrown; +Cc: Jens Axboe, linux-block, Ming Lei, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 765 bytes --]

> On 2 May 2017, at 23.51, NeilBrown <neilb@suse.com> wrote:
> 
>> 
>> Hi Neil,
>> 
>> Looks good. Thanks for fixing this. I did not know that bio_clone_bioset
>> was not supposed to be used on drivers.
> 
> Prior to my patchset, using bio_clone_bioset() wasn't wrong in drivers,
> though it was a waste when bio_clone_fast() would to just as well as is
> more efficient.
> After my patchset, using it can be problematic.  I'm wondering what I
> should do to encourage those problems to be more visible so that if
> people to us it, they'll get a warning or something.
> 
> Thanks,
> NeilBrown
> 

Thanks for the explanation Neil. In my opinion a comment on top of
bio_clone_bioset() would be hellful, as Ming suggested. But a warning
might make sense too.

Javier

[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 00/13] block: assorted cleanup for bio splitting and cloning.
  2017-05-02  3:42 [PATCH 00/13] block: assorted cleanup for bio splitting and cloning NeilBrown
                   ` (12 preceding siblings ...)
  2017-05-02  3:42 ` [PATCH 12/13] block: remove bio_clone() and all references NeilBrown
@ 2017-05-11  0:58 ` NeilBrown
  2017-06-16  5:54   ` NeilBrown
  13 siblings, 1 reply; 46+ messages in thread
From: NeilBrown @ 2017-05-11  0:58 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Ming Lei, linux-kernel, Shaohua Li, Javier Gonzalez

[-- Attachment #1: Type: text/plain, Size: 3986 bytes --]

On Tue, May 02 2017, NeilBrown wrote:

> This is a revision of my series of patches working
> towards removing the bioset work queues.

Hi Jens,
 could I get some feed-back about your thoughts on this series?
 Will you apply it?  When?  Do I need to resend anything?
 Would you like a git-pull request?  If so, what should I base it on?
 There is a minor conflict with drivers/block/zram/zram_drv.c
 as it dropped the call to blk_queue_split() recently, but otherwise it
 still applies.

Thanks,
NeilBrown


>
> This set is based on Linus' tree as for today (2nd May) plus
> the for-linus branch from Shaohua's md/raid tree.
>
> This series adds a fix for the new lightnvm/pblk-read code
> and discards bioset_create_nobvec() in favor of a flag arg to
> bioset_create().  There are also minor fixes and a little
> code clean-up.
>
> I hope to eventually get rid of the new BIOSET_NEED_RESCUER flag,
> but that needs work ing dm and probably bcache first.
>
> Thanks,
> NeilBrown
>
>
> ---
>
> NeilBrown (13):
>       blk: remove bio_set arg from blk_queue_split()
>       blk: replace bioset_create_nobvec() with a flags arg to bioset_create()
>       blk: make the bioset rescue_workqueue optional.
>       blk: use non-rescuing bioset for q->bio_split.
>       block: Improvements to bounce-buffer handling
>       rbd: use bio_clone_fast() instead of bio_clone()
>       drbd: use bio_clone_fast() instead of bio_clone()
>       pktcdvd: use bio_clone_fast() instead of bio_clone()
>       lightnvm/pblk-read: use bio_clone_fast()
>       xen-blkfront: remove bio splitting.
>       bcache: use kmalloc to allocate bio in bch_data_verify()
>       block: remove bio_clone() and all references.
>       block: don't check for BIO_MAX_PAGES in blk_bio_segment_split()
>
>
>  Documentation/block/biodoc.txt      |    2 -
>  block/bio.c                         |   72 ++++++++++++++++-------------------
>  block/blk-core.c                    |    4 +-
>  block/blk-merge.c                   |   31 ++-------------
>  block/blk-mq.c                      |    2 -
>  block/bounce.c                      |   32 +++++++++++++---
>  drivers/block/drbd/drbd_int.h       |    3 +
>  drivers/block/drbd/drbd_main.c      |   11 +++++
>  drivers/block/drbd/drbd_req.c       |    2 -
>  drivers/block/drbd/drbd_req.h       |    2 -
>  drivers/block/pktcdvd.c             |   14 +++++--
>  drivers/block/ps3vram.c             |    2 -
>  drivers/block/rbd.c                 |   16 +++++++-
>  drivers/block/rsxx/dev.c            |    2 -
>  drivers/block/umem.c                |    2 -
>  drivers/block/xen-blkfront.c        |   54 +-------------------------
>  drivers/block/zram/zram_drv.c       |    2 -
>  drivers/lightnvm/pblk-init.c        |   16 ++++++--
>  drivers/lightnvm/pblk-read.c        |    2 -
>  drivers/lightnvm/pblk.h             |    1 
>  drivers/lightnvm/rrpc.c             |    2 -
>  drivers/md/bcache/debug.c           |    2 -
>  drivers/md/bcache/super.c           |    6 ++-
>  drivers/md/dm-crypt.c               |    2 -
>  drivers/md/dm-io.c                  |    2 -
>  drivers/md/dm.c                     |    5 +-
>  drivers/md/md.c                     |    6 +--
>  drivers/md/raid1.c                  |    2 -
>  drivers/md/raid10.c                 |    2 -
>  drivers/md/raid5-cache.c            |    2 -
>  drivers/md/raid5-ppl.c              |    2 -
>  drivers/md/raid5.c                  |    2 -
>  drivers/s390/block/dcssblk.c        |    2 -
>  drivers/s390/block/xpram.c          |    2 -
>  drivers/target/target_core_iblock.c |    2 -
>  fs/block_dev.c                      |    2 -
>  fs/btrfs/extent_io.c                |    3 +
>  fs/xfs/xfs_super.c                  |    3 +
>  include/linux/bio.h                 |   12 ++----
>  include/linux/blkdev.h              |    3 -
>  40 files changed, 162 insertions(+), 174 deletions(-)
>
> --
> Signature

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 00/13] block: assorted cleanup for bio splitting and cloning.
  2017-05-11  0:58 ` [PATCH 00/13] block: assorted cleanup for bio splitting and cloning NeilBrown
@ 2017-06-16  5:54   ` NeilBrown
  2017-06-16  6:42     ` Christoph Hellwig
  0 siblings, 1 reply; 46+ messages in thread
From: NeilBrown @ 2017-06-16  5:54 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Ming Lei, linux-kernel, Shaohua Li, Javier Gonzalez

[-- Attachment #1: Type: text/plain, Size: 4396 bytes --]

On Thu, May 11 2017, NeilBrown wrote:

> On Tue, May 02 2017, NeilBrown wrote:
>
>> This is a revision of my series of patches working
>> towards removing the bioset work queues.
>
> Hi Jens,
>  could I get some feed-back about your thoughts on this series?
>  Will you apply it?  When?  Do I need to resend anything?
>  Would you like a git-pull request?  If so, what should I base it on?
>  There is a minor conflict with drivers/block/zram/zram_drv.c
>  as it dropped the call to blk_queue_split() recently, but otherwise it
>  still applies.

Hi Jens,
 I didn't hear back ... have you had a chance to look?
 In case it helps, you can pull the full set, based on a recent Linus tree,
 from
     git://neil.brown.name/linux bioset

 or I can resend the patches if you like.

Thanks,
NeilBrown


>
> Thanks,
> NeilBrown
>
>
>>
>> This set is based on Linus' tree as for today (2nd May) plus
>> the for-linus branch from Shaohua's md/raid tree.
>>
>> This series adds a fix for the new lightnvm/pblk-read code
>> and discards bioset_create_nobvec() in favor of a flag arg to
>> bioset_create().  There are also minor fixes and a little
>> code clean-up.
>>
>> I hope to eventually get rid of the new BIOSET_NEED_RESCUER flag,
>> but that needs work ing dm and probably bcache first.
>>
>> Thanks,
>> NeilBrown
>>
>>
>> ---
>>
>> NeilBrown (13):
>>       blk: remove bio_set arg from blk_queue_split()
>>       blk: replace bioset_create_nobvec() with a flags arg to bioset_create()
>>       blk: make the bioset rescue_workqueue optional.
>>       blk: use non-rescuing bioset for q->bio_split.
>>       block: Improvements to bounce-buffer handling
>>       rbd: use bio_clone_fast() instead of bio_clone()
>>       drbd: use bio_clone_fast() instead of bio_clone()
>>       pktcdvd: use bio_clone_fast() instead of bio_clone()
>>       lightnvm/pblk-read: use bio_clone_fast()
>>       xen-blkfront: remove bio splitting.
>>       bcache: use kmalloc to allocate bio in bch_data_verify()
>>       block: remove bio_clone() and all references.
>>       block: don't check for BIO_MAX_PAGES in blk_bio_segment_split()
>>
>>
>>  Documentation/block/biodoc.txt      |    2 -
>>  block/bio.c                         |   72 ++++++++++++++++-------------------
>>  block/blk-core.c                    |    4 +-
>>  block/blk-merge.c                   |   31 ++-------------
>>  block/blk-mq.c                      |    2 -
>>  block/bounce.c                      |   32 +++++++++++++---
>>  drivers/block/drbd/drbd_int.h       |    3 +
>>  drivers/block/drbd/drbd_main.c      |   11 +++++
>>  drivers/block/drbd/drbd_req.c       |    2 -
>>  drivers/block/drbd/drbd_req.h       |    2 -
>>  drivers/block/pktcdvd.c             |   14 +++++--
>>  drivers/block/ps3vram.c             |    2 -
>>  drivers/block/rbd.c                 |   16 +++++++-
>>  drivers/block/rsxx/dev.c            |    2 -
>>  drivers/block/umem.c                |    2 -
>>  drivers/block/xen-blkfront.c        |   54 +-------------------------
>>  drivers/block/zram/zram_drv.c       |    2 -
>>  drivers/lightnvm/pblk-init.c        |   16 ++++++--
>>  drivers/lightnvm/pblk-read.c        |    2 -
>>  drivers/lightnvm/pblk.h             |    1 
>>  drivers/lightnvm/rrpc.c             |    2 -
>>  drivers/md/bcache/debug.c           |    2 -
>>  drivers/md/bcache/super.c           |    6 ++-
>>  drivers/md/dm-crypt.c               |    2 -
>>  drivers/md/dm-io.c                  |    2 -
>>  drivers/md/dm.c                     |    5 +-
>>  drivers/md/md.c                     |    6 +--
>>  drivers/md/raid1.c                  |    2 -
>>  drivers/md/raid10.c                 |    2 -
>>  drivers/md/raid5-cache.c            |    2 -
>>  drivers/md/raid5-ppl.c              |    2 -
>>  drivers/md/raid5.c                  |    2 -
>>  drivers/s390/block/dcssblk.c        |    2 -
>>  drivers/s390/block/xpram.c          |    2 -
>>  drivers/target/target_core_iblock.c |    2 -
>>  fs/block_dev.c                      |    2 -
>>  fs/btrfs/extent_io.c                |    3 +
>>  fs/xfs/xfs_super.c                  |    3 +
>>  include/linux/bio.h                 |   12 ++----
>>  include/linux/blkdev.h              |    3 -
>>  40 files changed, 162 insertions(+), 174 deletions(-)
>>
>> --
>> Signature

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 00/13] block: assorted cleanup for bio splitting and cloning.
  2017-06-16  5:54   ` NeilBrown
@ 2017-06-16  6:42     ` Christoph Hellwig
  2017-06-16  7:30       ` NeilBrown
  0 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2017-06-16  6:42 UTC (permalink / raw)
  To: NeilBrown
  Cc: Jens Axboe, linux-block, Ming Lei, linux-kernel, Shaohua Li,
	Javier Gonzalez

On Fri, Jun 16, 2017 at 03:54:30PM +1000, NeilBrown wrote:
> Hi Jens,
>  I didn't hear back ... have you had a chance to look?
>  In case it helps, you can pull the full set, based on a recent Linus tree,
>  from
>      git://neil.brown.name/linux bioset
> 
>  or I can resend the patches if you like.

Can yo uresend it on top of the latest for-4.13/block tree?  It has
a lot of changes that could potentially conflict.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 00/13] block: assorted cleanup for bio splitting and cloning.
  2017-06-16  6:42     ` Christoph Hellwig
@ 2017-06-16  7:30       ` NeilBrown
  2017-06-16  7:34         ` Christoph Hellwig
  0 siblings, 1 reply; 46+ messages in thread
From: NeilBrown @ 2017-06-16  7:30 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, linux-block, Ming Lei, linux-kernel, Shaohua Li,
	Javier Gonzalez

[-- Attachment #1: Type: text/plain, Size: 708 bytes --]

On Thu, Jun 15 2017, Christoph Hellwig wrote:

> On Fri, Jun 16, 2017 at 03:54:30PM +1000, NeilBrown wrote:
>> Hi Jens,
>>  I didn't hear back ... have you had a chance to look?
>>  In case it helps, you can pull the full set, based on a recent Linus tree,
>>  from
>>      git://neil.brown.name/linux bioset
>> 
>>  or I can resend the patches if you like.
>
> Can yo uresend it on top of the latest for-4.13/block tree?  It has
> a lot of changes that could potentially conflict.

One trivial-to-fix conflict in xen-blkfront (a function I remove was
changed slightly).

I've pushed the new version to the same place.  Do you actually want
me to re-post all the patches?

NeilBrown



[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 00/13] block: assorted cleanup for bio splitting and cloning.
  2017-06-16  7:30       ` NeilBrown
@ 2017-06-16  7:34         ` Christoph Hellwig
  2017-06-16 20:45           ` Jens Axboe
  0 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2017-06-16  7:34 UTC (permalink / raw)
  To: NeilBrown
  Cc: Christoph Hellwig, Jens Axboe, linux-block, Ming Lei,
	linux-kernel, Shaohua Li, Javier Gonzalez

On Fri, Jun 16, 2017 at 05:30:50PM +1000, NeilBrown wrote:
> I've pushed the new version to the same place.  Do you actually want
> me to re-post all the patches?

I personally prefer to always have patches on the list, but I can't
speak for Jens of course.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 00/13] block: assorted cleanup for bio splitting and cloning.
  2017-06-16  7:34         ` Christoph Hellwig
@ 2017-06-16 20:45           ` Jens Axboe
  2017-06-18  4:40             ` NeilBrown
  2017-06-18  4:40             ` NeilBrown
  0 siblings, 2 replies; 46+ messages in thread
From: Jens Axboe @ 2017-06-16 20:45 UTC (permalink / raw)
  To: Christoph Hellwig, NeilBrown
  Cc: linux-block, Ming Lei, linux-kernel, Shaohua Li, Javier Gonzalez

On 06/16/2017 01:34 AM, Christoph Hellwig wrote:
> On Fri, Jun 16, 2017 at 05:30:50PM +1000, NeilBrown wrote:
>> I've pushed the new version to the same place.  Do you actually want
>> me to re-post all the patches?
> 
> I personally prefer to always have patches on the list, but I can't
> speak for Jens of course.

Yes please, I'd prefer them posted again as well.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 00/13] block: assorted cleanup for bio splitting and cloning.
  2017-06-16 20:45           ` Jens Axboe
@ 2017-06-18  4:40             ` NeilBrown
  2017-06-18  4:40             ` NeilBrown
  1 sibling, 0 replies; 46+ messages in thread
From: NeilBrown @ 2017-06-18  4:40 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig
  Cc: linux-block, Ming Lei, linux-kernel, Shaohua Li, Javier Gonzalez

[-- Attachment #1: Type: text/plain, Size: 459 bytes --]

On Fri, Jun 16 2017, Jens Axboe wrote:

> On 06/16/2017 01:34 AM, Christoph Hellwig wrote:
>> On Fri, Jun 16, 2017 at 05:30:50PM +1000, NeilBrown wrote:
>>> I've pushed the new version to the same place.  Do you actually want
>>> me to re-post all the patches?
>> 
>> I personally prefer to always have patches on the list, but I can't
>> speak for Jens of course.
>
> Yes please, I'd prefer them posted again as well.

Done.
Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 00/13] block: assorted cleanup for bio splitting and cloning.
  2017-06-16 20:45           ` Jens Axboe
  2017-06-18  4:40             ` NeilBrown
@ 2017-06-18  4:40             ` NeilBrown
  1 sibling, 0 replies; 46+ messages in thread
From: NeilBrown @ 2017-06-18  4:40 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig
  Cc: linux-block, Ming Lei, linux-kernel, Shaohua Li, Javier Gonzalez

[-- Attachment #1: Type: text/plain, Size: 487 bytes --]

On Fri, Jun 16 2017, Jens Axboe wrote:

> On 06/16/2017 01:34 AM, Christoph Hellwig wrote:
>> On Fri, Jun 16, 2017 at 05:30:50PM +1000, NeilBrown wrote:
>>> I've pushed the new version to the same place.  Do you actually want
>>> me to re-post all the patches?
>> 
>> I personally prefer to always have patches on the list, but I can't
>> speak for Jens of course.
>
> Yes please, I'd prefer them posted again as well.

Done - to lkml and linux-block.

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 10/13] xen-blkfront: remove bio splitting.
  2017-06-18  4:38 NeilBrown
@ 2017-06-18  4:38 ` NeilBrown
  0 siblings, 0 replies; 46+ messages in thread
From: NeilBrown @ 2017-06-18  4:38 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-kernel

bios that are re-submitted will pass through blk_queue_split() when
blk_queue_bio() is called, and this will split the bio if necessary.
There is no longer any need to do this splitting in xen-blkfront.

Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/block/xen-blkfront.c |   54 ++----------------------------------------
 1 file changed, 3 insertions(+), 51 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index e3be666c2776..ac90093fcb25 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -110,11 +110,6 @@ struct blk_shadow {
 	unsigned long associated_id;
 };
 
-struct split_bio {
-	struct bio *bio;
-	atomic_t pending;
-};
-
 struct blkif_req {
 	int	error;
 };
@@ -2000,28 +1995,13 @@ static int blkfront_probe(struct xenbus_device *dev,
 	return 0;
 }
 
-static void split_bio_end(struct bio *bio)
-{
-	struct split_bio *split_bio = bio->bi_private;
-
-	if (atomic_dec_and_test(&split_bio->pending)) {
-		split_bio->bio->bi_phys_segments = 0;
-		split_bio->bio->bi_status = bio->bi_status;
-		bio_endio(split_bio->bio);
-		kfree(split_bio);
-	}
-	bio_put(bio);
-}
-
 static int blkif_recover(struct blkfront_info *info)
 {
-	unsigned int i, r_index;
+	unsigned int r_index;
 	struct request *req, *n;
 	int rc;
-	struct bio *bio, *cloned_bio;
-	unsigned int segs, offset;
-	int pending, size;
-	struct split_bio *split_bio;
+	struct bio *bio;
+	unsigned int segs;
 
 	blkfront_gather_backend_features(info);
 	/* Reset limits changed by blk_mq_update_nr_hw_queues(). */
@@ -2060,34 +2040,6 @@ static int blkif_recover(struct blkfront_info *info)
 
 	while ((bio = bio_list_pop(&info->bio_list)) != NULL) {
 		/* Traverse the list of pending bios and re-queue them */
-		if (bio_segments(bio) > segs) {
-			/*
-			 * This bio has more segments than what we can
-			 * handle, we have to split it.
-			 */
-			pending = (bio_segments(bio) + segs - 1) / segs;
-			split_bio = kzalloc(sizeof(*split_bio), GFP_NOIO);
-			BUG_ON(split_bio == NULL);
-			atomic_set(&split_bio->pending, pending);
-			split_bio->bio = bio;
-			for (i = 0; i < pending; i++) {
-				offset = (i * segs * XEN_PAGE_SIZE) >> 9;
-				size = min((unsigned int)(segs * XEN_PAGE_SIZE) >> 9,
-					   (unsigned int)bio_sectors(bio) - offset);
-				cloned_bio = bio_clone(bio, GFP_NOIO);
-				BUG_ON(cloned_bio == NULL);
-				bio_trim(cloned_bio, offset, size);
-				cloned_bio->bi_private = split_bio;
-				cloned_bio->bi_end_io = split_bio_end;
-				submit_bio(cloned_bio);
-			}
-			/*
-			 * Now we have to wait for all those smaller bios to
-			 * end, so we can also end the "parent" bio.
-			 */
-			continue;
-		}
-		/* We don't need to split this bio */
 		submit_bio(bio);
 	}
 

^ permalink raw reply related	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2017-06-18  4:41 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-02  3:42 [PATCH 00/13] block: assorted cleanup for bio splitting and cloning NeilBrown
2017-05-02  3:42 ` [PATCH 03/13] blk: make the bioset rescue_workqueue optional NeilBrown
2017-05-02  8:14   ` Christoph Hellwig
2017-05-02 11:00   ` Ming Lei
2017-05-02 22:10     ` NeilBrown
2017-05-02 22:34   ` [PATCH 03/13 V2] " NeilBrown
2017-05-03  3:24     ` Ming Lei
2017-05-02  3:42 ` [PATCH 02/13] blk: replace bioset_create_nobvec() with a flags arg to bioset_create() NeilBrown
2017-05-02  8:06   ` Christoph Hellwig
2017-05-02 21:47     ` NeilBrown
2017-05-02  9:40   ` Ming Lei
2017-05-02  3:42 ` [PATCH 01/13] blk: remove bio_set arg from blk_queue_split() NeilBrown
2017-05-02 10:44   ` javigon
2017-05-02  3:42 ` [PATCH 05/13] block: Improvements to bounce-buffer handling NeilBrown
2017-05-02  8:13   ` Christoph Hellwig
2017-05-02 11:56   ` Ming Lei
2017-05-02  3:42 ` [PATCH 04/13] blk: use non-rescuing bioset for q->bio_split NeilBrown
2017-05-02 11:54   ` Ming Lei
2017-05-02 23:21   ` [PATCH 04/13 V2] " NeilBrown
2017-05-02  3:42 ` [PATCH 09/13] lightnvm/pblk-read: use bio_clone_fast() NeilBrown
2017-05-02  8:15   ` Christoph Hellwig
2017-05-02 11:22   ` Javier González
2017-05-02 21:51     ` NeilBrown
2017-05-03  6:42       ` Javier González
2017-05-02  3:42 ` [PATCH 10/13] xen-blkfront: remove bio splitting NeilBrown
2017-05-02  8:15   ` Christoph Hellwig
2017-05-02  3:42 ` [PATCH 07/13] drbd: use bio_clone_fast() instead of bio_clone() NeilBrown
2017-05-02 23:20   ` [PATCH 07/13 V2] " NeilBrown
2017-05-02  3:42 ` [PATCH 06/13] rbd: " NeilBrown
2017-05-02  3:42 ` [PATCH 08/13] pktcdvd: " NeilBrown
2017-05-02  3:42 ` [PATCH 13/13] block: don't check for BIO_MAX_PAGES in blk_bio_segment_split() NeilBrown
2017-05-02  8:15   ` Christoph Hellwig
2017-05-02 10:22   ` Ming Lei
2017-05-02 22:54     ` NeilBrown
2017-05-02 23:50       ` Ming Lei
2017-05-02  3:42 ` [PATCH 11/13] bcache: use kmalloc to allocate bio in bch_data_verify() NeilBrown
2017-05-02  3:42 ` [PATCH 12/13] block: remove bio_clone() and all references NeilBrown
2017-05-11  0:58 ` [PATCH 00/13] block: assorted cleanup for bio splitting and cloning NeilBrown
2017-06-16  5:54   ` NeilBrown
2017-06-16  6:42     ` Christoph Hellwig
2017-06-16  7:30       ` NeilBrown
2017-06-16  7:34         ` Christoph Hellwig
2017-06-16 20:45           ` Jens Axboe
2017-06-18  4:40             ` NeilBrown
2017-06-18  4:40             ` NeilBrown
2017-06-18  4:38 NeilBrown
2017-06-18  4:38 ` [PATCH 10/13] xen-blkfront: remove bio splitting NeilBrown

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).