linux-kernel.vger.kernel.org archive mirror
* [PATCH v1 01/54] block: drbd: comment on direct access bvec table
       [not found] <1482854250-13481-1-git-send-email-tom.leiming@gmail.com>
@ 2016-12-27 15:55 ` Ming Lei
  2016-12-27 15:55 ` [PATCH v1 02/54] block: loop: comment on direct access to " Ming Lei
                   ` (30 subsequent siblings)
  31 siblings, 0 replies; 43+ messages in thread
From: Ming Lei @ 2016-12-27 15:55 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, Christoph Hellwig, Ming Lei, Philipp Reisner,
	Lars Ellenberg, open list:DRBD DRIVER

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/block/drbd/drbd_bitmap.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/block/drbd/drbd_bitmap.c b/drivers/block/drbd/drbd_bitmap.c
index ab62b81c2ca7..ce9506da30ad 100644
--- a/drivers/block/drbd/drbd_bitmap.c
+++ b/drivers/block/drbd/drbd_bitmap.c
@@ -953,6 +953,7 @@ static void drbd_bm_endio(struct bio *bio)
 	struct drbd_bm_aio_ctx *ctx = bio->bi_private;
 	struct drbd_device *device = ctx->device;
 	struct drbd_bitmap *b = device->bitmap;
+	/* single page bio, safe for multipage bvec */
 	unsigned int idx = bm_page_to_idx(bio->bi_io_vec[0].bv_page);
 
 	if ((ctx->flags & BM_AIO_COPY_PAGES) == 0 &&
-- 
2.7.4

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH v1 02/54] block: loop: comment on direct access to bvec table
       [not found] <1482854250-13481-1-git-send-email-tom.leiming@gmail.com>
  2016-12-27 15:55 ` [PATCH v1 01/54] block: drbd: comment on direct access bvec table Ming Lei
@ 2016-12-27 15:55 ` Ming Lei
  2016-12-27 15:55 ` [PATCH v1 03/54] kernel/power/swap.c: " Ming Lei
                   ` (29 subsequent siblings)
  31 siblings, 0 replies; 43+ messages in thread
From: Ming Lei @ 2016-12-27 15:55 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, Christoph Hellwig, Ming Lei, Hannes Reinecke,
	Mike Christie, Jeff Moyer, Minfei Huang, Omar Sandoval,
	Petr Mladek

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/block/loop.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index f347285c67ec..be1cc51815ac 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -485,6 +485,11 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
 	/* nomerge for loop request queue */
 	WARN_ON(cmd->rq->bio != cmd->rq->biotail);
 
+	/*
+	 * For multipage bvec support, it is safe to pass the bvec
+	 * table to the iov iterator, because the iov iterator still
+	 * uses the bvec iter helpers to traverse the bvecs.
+	 */
 	bvec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
 	iov_iter_bvec(&iter, ITER_BVEC | rw, bvec,
 		      bio_segments(bio), blk_rq_bytes(cmd->rq));
-- 
2.7.4
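
The safety argument above rests on the iterator walking the table through its own (index, bytes-done) state rather than assuming one page per slot. A minimal userspace sketch of that advance pattern (names like `sim_bvec`/`sim_advance` are illustrative, not the kernel's):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of a bvec table plus iterator state; the
 * fields mirror the kernel's struct bvec_iter, but this is not kernel code. */
struct sim_bvec { size_t len; };
struct sim_iter { size_t idx; size_t done; size_t size; };

/* Advance the iterator by `bytes`, stepping across bvecs as needed,
 * which is the pattern bvec_iter_advance() follows regardless of how
 * many pages each bvec happens to cover. */
static void sim_advance(const struct sim_bvec *bv, struct sim_iter *it,
			size_t bytes)
{
	it->size -= bytes;
	while (bytes) {
		size_t step = bv[it->idx].len - it->done;

		if (step > bytes)
			step = bytes;
		it->done += step;
		bytes -= step;
		if (it->done == bv[it->idx].len) {
			it->idx++;
			it->done = 0;
		}
	}
}
```

Because only this state machine interprets the table, a multipage bvec (larger `len`) simply takes more advances to consume, with no change needed in the caller.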


* [PATCH v1 03/54] kernel/power/swap.c: comment on direct access to bvec table
       [not found] <1482854250-13481-1-git-send-email-tom.leiming@gmail.com>
  2016-12-27 15:55 ` [PATCH v1 01/54] block: drbd: comment on direct access bvec table Ming Lei
  2016-12-27 15:55 ` [PATCH v1 02/54] block: loop: comment on direct access to " Ming Lei
@ 2016-12-27 15:55 ` Ming Lei
  2016-12-27 15:55 ` [PATCH v1 04/54] mm: page_io.c: " Ming Lei
                   ` (28 subsequent siblings)
  31 siblings, 0 replies; 43+ messages in thread
From: Ming Lei @ 2016-12-27 15:55 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, Christoph Hellwig, Ming Lei, Rafael J. Wysocki,
	Len Brown, Pavel Machek, open list:SUSPEND TO RAM

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 kernel/power/swap.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/power/swap.c b/kernel/power/swap.c
index 32e0c232efba..031e709c9fc4 100644
--- a/kernel/power/swap.c
+++ b/kernel/power/swap.c
@@ -238,6 +238,8 @@ static void hib_init_batch(struct hib_bio_batch *hb)
 static void hib_end_io(struct bio *bio)
 {
 	struct hib_bio_batch *hb = bio->bi_private;
+
+	/* single page bio, safe for multipage bvec */
 	struct page *page = bio->bi_io_vec[0].bv_page;
 
 	if (bio->bi_error) {
-- 
2.7.4


* [PATCH v1 04/54] mm: page_io.c: comment on direct access to bvec table
       [not found] <1482854250-13481-1-git-send-email-tom.leiming@gmail.com>
                   ` (2 preceding siblings ...)
  2016-12-27 15:55 ` [PATCH v1 03/54] kernel/power/swap.c: " Ming Lei
@ 2016-12-27 15:55 ` Ming Lei
  2016-12-27 15:55 ` [PATCH v1 05/54] fs/buffer: " Ming Lei
                   ` (27 subsequent siblings)
  31 siblings, 0 replies; 43+ messages in thread
From: Ming Lei @ 2016-12-27 15:55 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, Christoph Hellwig, Ming Lei, Andrew Morton,
	Mike Christie, Michal Hocko, Minchan Kim, Joe Perches,
	Kirill A. Shutemov, open list:MEMORY MANAGEMENT

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 mm/page_io.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/page_io.c b/mm/page_io.c
index 23f6d0d3470f..368a16aa810c 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -43,6 +43,7 @@ static struct bio *get_swap_bio(gfp_t gfp_flags,
 
 void end_swap_bio_write(struct bio *bio)
 {
+	/* single page bio, safe for multipage bvec */
 	struct page *page = bio->bi_io_vec[0].bv_page;
 
 	if (bio->bi_error) {
@@ -116,6 +117,7 @@ static void swap_slot_free_notify(struct page *page)
 
 static void end_swap_bio_read(struct bio *bio)
 {
+	/* single page bio, safe for multipage bvec */
 	struct page *page = bio->bi_io_vec[0].bv_page;
 
 	if (bio->bi_error) {
-- 
2.7.4


* [PATCH v1 05/54] fs/buffer: comment on direct access to bvec table
       [not found] <1482854250-13481-1-git-send-email-tom.leiming@gmail.com>
                   ` (3 preceding siblings ...)
  2016-12-27 15:55 ` [PATCH v1 04/54] mm: page_io.c: " Ming Lei
@ 2016-12-27 15:55 ` Ming Lei
  2016-12-27 15:55 ` [PATCH v1 06/54] f2fs: f2fs_read_end_io: " Ming Lei
                   ` (26 subsequent siblings)
  31 siblings, 0 replies; 43+ messages in thread
From: Ming Lei @ 2016-12-27 15:55 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, Christoph Hellwig, Ming Lei, Alexander Viro,
	open list:FILESYSTEMS (VFS and infrastructure)

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 fs/buffer.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index d21771fcf7d3..63d2f40c21fd 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -3060,8 +3060,13 @@ static void end_bio_bh_io_sync(struct bio *bio)
 void guard_bio_eod(int op, struct bio *bio)
 {
 	sector_t maxsector;
-	struct bio_vec *bvec = &bio->bi_io_vec[bio->bi_vcnt - 1];
 	unsigned truncated_bytes;
+	/*
+	 * It is safe to truncate the last bvec in the following way
+	 * even after multipage bvec is supported, though the
+	 * parameters passed to zero_user() will need to be fixed.
+	 */
+	struct bio_vec *bvec = &bio->bi_io_vec[bio->bi_vcnt - 1];
 
 	maxsector = i_size_read(bio->bi_bdev->bd_inode) >> 9;
 	if (!maxsector)
-- 
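
The truncation guarded by the comment above is plain arithmetic: if the bio runs past the device end, the excess (in 512-byte sectors) is trimmed off the last bvec. A hedged userspace sketch of that computation (`trim_eod` and the `sim_` names are illustrative only):

```c
#include <assert.h>

/* Hypothetical model of the guard_bio_eod() truncation arithmetic.
 * Sector size is 512 bytes (hence the << 9 / >> 9 shifts), as in the
 * kernel; everything else here is a simplified stand-in. */
struct sim_bvec { unsigned int bv_len; unsigned int bv_offset; };

static unsigned int trim_eod(unsigned long long maxsector,
			     unsigned long long bio_sector,
			     unsigned int *bio_size,
			     struct sim_bvec *last_bv)
{
	unsigned long long end = bio_sector + (*bio_size >> 9);
	unsigned int truncated_bytes;

	if (end <= maxsector)
		return 0;	/* bio fits entirely on the device */

	truncated_bytes = (unsigned int)((end - maxsector) << 9);
	*bio_size -= truncated_bytes;
	/* the excess is assumed to fall entirely within the last bvec */
	last_bv->bv_len -= truncated_bytes;
	return truncated_bytes;
}
```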
2.7.4


* [PATCH v1 06/54] f2fs: f2fs_read_end_io: comment on direct access to bvec table
       [not found] <1482854250-13481-1-git-send-email-tom.leiming@gmail.com>
                   ` (4 preceding siblings ...)
  2016-12-27 15:55 ` [PATCH v1 05/54] fs/buffer: " Ming Lei
@ 2016-12-27 15:55 ` Ming Lei
  2016-12-27 15:55 ` [PATCH v1 07/54] bcache: " Ming Lei
                   ` (25 subsequent siblings)
  31 siblings, 0 replies; 43+ messages in thread
From: Ming Lei @ 2016-12-27 15:55 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, Christoph Hellwig, Ming Lei, Jaegeuk Kim, Chao Yu,
	open list:F2FS FILE SYSTEM

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 fs/f2fs/data.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 9ac262564fa6..5d1a192e1c3c 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -55,6 +55,10 @@ static void f2fs_read_end_io(struct bio *bio)
 	int i;
 
 #ifdef CONFIG_F2FS_FAULT_INJECTION
+	/*
+	 * It is still safe to retrieve the 1st page of the bio
+	 * in this way after supporting multipage bvec.
+	 */
 	if (time_to_inject(F2FS_P_SB(bio->bi_io_vec->bv_page), FAULT_IO))
 		bio->bi_error = -EIO;
 #endif
-- 
2.7.4


* [PATCH v1 07/54] bcache: comment on direct access to bvec table
       [not found] <1482854250-13481-1-git-send-email-tom.leiming@gmail.com>
                   ` (5 preceding siblings ...)
  2016-12-27 15:55 ` [PATCH v1 06/54] f2fs: f2fs_read_end_io: " Ming Lei
@ 2016-12-27 15:55 ` Ming Lei
  2016-12-30 16:56   ` Coly Li
  2016-12-27 15:55 ` [PATCH v1 08/54] block: comment on bio_alloc_pages() Ming Lei
                   ` (24 subsequent siblings)
  31 siblings, 1 reply; 43+ messages in thread
From: Ming Lei @ 2016-12-27 15:55 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, Christoph Hellwig, Ming Lei, Kent Overstreet,
	Shaohua Li, Guoqing Jiang, Zheng Liu, Mike Christie, Jiri Kosina,
	Eric Wheeler, Yijing Wang, Al Viro,
	open list:BCACHE (BLOCK LAYER CACHE),
	open list:SOFTWARE RAID (Multiple Disks) SUPPORT

All of these look safe after multipage bvec is supported.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/md/bcache/btree.c | 1 +
 drivers/md/bcache/super.c | 6 ++++++
 drivers/md/bcache/util.c  | 7 +++++++
 3 files changed, 14 insertions(+)

diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index a43eedd5804d..fc35cfb4d0f1 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -428,6 +428,7 @@ static void do_btree_node_write(struct btree *b)
 
 		continue_at(cl, btree_node_write_done, NULL);
 	} else {
+		/* No harm for multipage bvec since the bio is newly allocated */
 		b->bio->bi_vcnt = 0;
 		bch_bio_map(b->bio, i);
 
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 3a19cbc8b230..607b022259dc 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -208,6 +208,7 @@ static void write_bdev_super_endio(struct bio *bio)
 
 static void __write_super(struct cache_sb *sb, struct bio *bio)
 {
+	/* single page bio, safe for multipage bvec */
 	struct cache_sb *out = page_address(bio->bi_io_vec[0].bv_page);
 	unsigned i;
 
@@ -1156,6 +1157,8 @@ static void register_bdev(struct cache_sb *sb, struct page *sb_page,
 	dc->bdev->bd_holder = dc;
 
 	bio_init(&dc->sb_bio, dc->sb_bio.bi_inline_vecs, 1);
+
+	/* single page bio, safe for multipage bvec */
 	dc->sb_bio.bi_io_vec[0].bv_page = sb_page;
 	get_page(sb_page);
 
@@ -1799,6 +1802,7 @@ void bch_cache_release(struct kobject *kobj)
 	for (i = 0; i < RESERVE_NR; i++)
 		free_fifo(&ca->free[i]);
 
+	/* single page bio, safe for multipage bvec */
 	if (ca->sb_bio.bi_inline_vecs[0].bv_page)
 		put_page(ca->sb_bio.bi_io_vec[0].bv_page);
 
@@ -1854,6 +1858,8 @@ static int register_cache(struct cache_sb *sb, struct page *sb_page,
 	ca->bdev->bd_holder = ca;
 
 	bio_init(&ca->sb_bio, ca->sb_bio.bi_inline_vecs, 1);
+
+	/* single page bio, safe for multipage bvec */
 	ca->sb_bio.bi_io_vec[0].bv_page = sb_page;
 	get_page(sb_page);
 
diff --git a/drivers/md/bcache/util.c b/drivers/md/bcache/util.c
index dde6172f3f10..5cc0b49a65fb 100644
--- a/drivers/md/bcache/util.c
+++ b/drivers/md/bcache/util.c
@@ -222,6 +222,13 @@ uint64_t bch_next_delay(struct bch_ratelimit *d, uint64_t done)
 		: 0;
 }
 
+/*
+ * Generally it isn't good to access .bi_io_vec and .bi_vcnt
+ * directly; the preferred way is bio_add_page(). But in
+ * this case, bch_bio_map() assumes that the bvec table
+ * is empty, so it is safe to access .bi_vcnt & .bi_io_vec
+ * in this way even after multipage bvec is supported.
+ */
 void bch_bio_map(struct bio *bio, void *base)
 {
 	size_t size = bio->bi_iter.bi_size;
-- 
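
What makes the direct `.bi_vcnt`/`.bi_io_vec` writes safe here is that bch_bio_map() only ever fills an empty table, one page-bounded chunk per bvec. A userspace sketch of that mapping loop, assuming a 4096-byte page (the `sim_` names are illustrative, not bcache's):

```c
#include <assert.h>
#include <stdint.h>

#define SIM_PAGE_SIZE 4096UL

/* Hypothetical model of bch_bio_map(): split a contiguous buffer into
 * bvec-like entries so that no entry crosses a page boundary. This is
 * safe only because the table starts empty. */
struct sim_bvec { unsigned long off; unsigned long len; };

static unsigned int sim_bio_map(struct sim_bvec *bv, unsigned int max_vecs,
				uintptr_t base, unsigned long size)
{
	unsigned int vcnt = 0;

	while (size && vcnt < max_vecs) {
		unsigned long off = base % SIM_PAGE_SIZE;
		unsigned long len = SIM_PAGE_SIZE - off;

		if (len > size)
			len = size;
		bv[vcnt].off = off;	/* offset within the page */
		bv[vcnt].len = len;	/* never crosses a page boundary */
		vcnt++;
		base += len;
		size -= len;
	}
	return vcnt;
}
```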
2.7.4


* [PATCH v1 08/54] block: comment on bio_alloc_pages()
       [not found] <1482854250-13481-1-git-send-email-tom.leiming@gmail.com>
                   ` (6 preceding siblings ...)
  2016-12-27 15:55 ` [PATCH v1 07/54] bcache: " Ming Lei
@ 2016-12-27 15:55 ` Ming Lei
  2016-12-30 10:40   ` Coly Li
  2016-12-30 11:06   ` Coly Li
  2016-12-27 15:55 ` [PATCH v1 09/54] block: comment on bio_iov_iter_get_pages() Ming Lei
                   ` (23 subsequent siblings)
  31 siblings, 2 replies; 43+ messages in thread
From: Ming Lei @ 2016-12-27 15:55 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, Christoph Hellwig, Ming Lei, Jens Axboe,
	Kent Overstreet, Shaohua Li, Mike Christie, Guoqing Jiang,
	Hannes Reinecke, open list:BCACHE (BLOCK LAYER CACHE),
	open list:SOFTWARE RAID (Multiple Disks) SUPPORT

This patch adds a comment on the usage of bio_alloc_pages(),
and also comments on one special case in bch_data_verify().

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 block/bio.c               | 4 +++-
 drivers/md/bcache/debug.c | 6 ++++++
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/block/bio.c b/block/bio.c
index 2b375020fc49..d4a1e0b63ea0 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -961,7 +961,9 @@ EXPORT_SYMBOL(bio_advance);
  * @bio: bio to allocate pages for
  * @gfp_mask: flags for allocation
  *
- * Allocates pages up to @bio->bi_vcnt.
+ * Allocates pages up to @bio->bi_vcnt. This function should only
+ * be called on a newly initialized bio, that is, before any pages
+ * have been added to the bio via bio_add_page().
  *
  * Returns 0 on success, -ENOMEM on failure. On failure, any allocated pages are
  * freed.
diff --git a/drivers/md/bcache/debug.c b/drivers/md/bcache/debug.c
index 06f55056aaae..48d03e8b3385 100644
--- a/drivers/md/bcache/debug.c
+++ b/drivers/md/bcache/debug.c
@@ -110,6 +110,12 @@ void bch_data_verify(struct cached_dev *dc, struct bio *bio)
 	struct bio_vec bv, cbv;
 	struct bvec_iter iter, citer = { 0 };
 
+	/*
+	 * Once multipage bvec is supported, bio_clone()
+	 * has to make sure the page count in this bio can be
+	 * held in the cloned bio, because each single page needs
+	 * to be assigned to its own bvec in the new bio.
+	 */
 	check = bio_clone(bio, GFP_NOIO);
 	if (!check)
 		return;
-- 
2.7.4


* [PATCH v1 09/54] block: comment on bio_iov_iter_get_pages()
       [not found] <1482854250-13481-1-git-send-email-tom.leiming@gmail.com>
                   ` (7 preceding siblings ...)
  2016-12-27 15:55 ` [PATCH v1 08/54] block: comment on bio_alloc_pages() Ming Lei
@ 2016-12-27 15:55 ` Ming Lei
  2016-12-27 15:55 ` [PATCH v1 10/54] block: introduce flag QUEUE_FLAG_NO_MP Ming Lei
                   ` (22 subsequent siblings)
  31 siblings, 0 replies; 43+ messages in thread
From: Ming Lei @ 2016-12-27 15:55 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, Christoph Hellwig, Ming Lei, Jens Axboe

bio_iov_iter_get_pages() temporarily uses the unused space in
the bvec table for storing the page pointer array, and this
patch comments on this usage with respect to multipage bvec
support.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 block/bio.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/block/bio.c b/block/bio.c
index d4a1e0b63ea0..10398969353b 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -859,6 +859,10 @@ EXPORT_SYMBOL(bio_add_page);
  *
  * Pins as many pages from *iter and appends them to @bio's bvec array. The
  * pages will have to be released using put_page() when done.
+ *
+ * The hack of using the bvec table as a page pointer array is safe
+ * even after multipage bvec is introduced, because bio_add_page()
+ * treats that space as unused.
  */
 int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 {
-- 
2.7.4
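
The hack documented above works because the page pointers are stashed in the *tail* of the fixed-size table while bvecs are built at the front, and the front write index never overtakes the tail read index. A userspace sketch of that pattern (`SLOTS`, `sim_bvec`, and the page ids are all illustrative assumptions, not the kernel's layout):

```c
#include <assert.h>

#define SLOTS 8

/* Hypothetical model: the unused tail of a fixed-size bvec table doubles
 * as temporary page-id storage, then entries are consumed front-to-back. */
struct sim_bvec { int page; int len; };

static void fill_from_tail(struct sim_bvec *tbl, const int *pages, int n)
{
	int i;

	/* stash page ids in the unused tail slots */
	for (i = 0; i < n; i++)
		tbl[SLOTS - n + i].page = pages[i];

	/* build real bvecs at the front, consuming the stash; the read
	 * index (SLOTS - n + i) is always >= the write index (i), so no
	 * stashed entry is clobbered before it is read */
	for (i = 0; i < n; i++) {
		int pg = tbl[SLOTS - n + i].page;

		tbl[i].page = pg;
		tbl[i].len = 4096;
	}
}
```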


* [PATCH v1 10/54] block: introduce flag QUEUE_FLAG_NO_MP
       [not found] <1482854250-13481-1-git-send-email-tom.leiming@gmail.com>
                   ` (8 preceding siblings ...)
  2016-12-27 15:55 ` [PATCH v1 09/54] block: comment on bio_iov_iter_get_pages() Ming Lei
@ 2016-12-27 15:55 ` Ming Lei
  2016-12-27 15:56 ` [PATCH v1 11/54] md: set NO_MP for request queue of md Ming Lei
                   ` (21 subsequent siblings)
  31 siblings, 0 replies; 43+ messages in thread
From: Ming Lei @ 2016-12-27 15:55 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, Christoph Hellwig, Ming Lei, Hannes Reinecke,
	Mike Christie, Dan Williams, Toshi Kani, Damien Le Moal

MD (especially raid1 and raid10) is a bit difficult to convert
to multipage bvec, so introduce this flag to disable multipage
bvec on a queue. With it set, MD still sees singlepage bvecs
only, and once the direct accesses to the bvec table in MD are
cleaned up, the flag can be removed.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 include/linux/blkdev.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 83695641bd5e..0c02d9778965 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -600,6 +600,7 @@ struct request_queue {
 #define QUEUE_FLAG_FLUSH_NQ    25	/* flush not queueuable */
 #define QUEUE_FLAG_DAX         26	/* device supports DAX */
 #define QUEUE_FLAG_STATS       27	/* track rq completion times */
+#define QUEUE_FLAG_NO_MP       28	/* multipage bvecs isn't ready */
 
 #define QUEUE_FLAG_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
 				 (1 << QUEUE_FLAG_STACKABLE)	|	\
@@ -690,6 +691,7 @@ static inline void queue_flag_clear(unsigned int flag, struct request_queue *q)
 #define blk_queue_secure_erase(q) \
 	(test_bit(QUEUE_FLAG_SECERASE, &(q)->queue_flags))
 #define blk_queue_dax(q)	test_bit(QUEUE_FLAG_DAX, &(q)->queue_flags)
+#define blk_queue_no_mp(q)	test_bit(QUEUE_FLAG_NO_MP, &(q)->queue_flags)
 
 #define blk_noretry_request(rq) \
 	((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \
-- 
2.7.4
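
The new flag follows the usual queue-flag pattern: one bit in `queue_flags` set with set_bit() and read through a test_bit() wrapper macro. A minimal userspace model of that bit arithmetic (the `sim_` helpers are stand-ins for the kernel's atomic bitops):

```c
#include <assert.h>

/* Userspace model of how a queue flag such as QUEUE_FLAG_NO_MP (bit 28)
 * is set and tested; the kernel's set_bit()/test_bit() operate on an
 * unsigned long in the same way, but atomically. */
#define SIM_QUEUE_FLAG_NO_MP 28

static void sim_set_bit(unsigned int nr, unsigned long *addr)
{
	*addr |= 1UL << nr;
}

static int sim_test_bit(unsigned int nr, const unsigned long *addr)
{
	return (*addr >> nr) & 1UL;
}
```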


* [PATCH v1 11/54] md: set NO_MP for request queue of md
       [not found] <1482854250-13481-1-git-send-email-tom.leiming@gmail.com>
                   ` (9 preceding siblings ...)
  2016-12-27 15:55 ` [PATCH v1 10/54] block: introduce flag QUEUE_FLAG_NO_MP Ming Lei
@ 2016-12-27 15:56 ` Ming Lei
  2016-12-27 15:56 ` [PATCH v1 12/54] dm: limit the max bio size as BIO_MAX_PAGES * PAGE_SIZE Ming Lei
                   ` (20 subsequent siblings)
  31 siblings, 0 replies; 43+ messages in thread
From: Ming Lei @ 2016-12-27 15:56 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, Christoph Hellwig, Ming Lei, Shaohua Li,
	open list:SOFTWARE RAID (Multiple Disks) SUPPORT

MD isn't ready for multipage bvecs, so mark its request
queue as NO_MP.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/md/md.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 82821ee0d57f..63c6326bafde 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5162,6 +5162,16 @@ static void md_safemode_timeout(unsigned long data)
 
 static int start_dirty_degraded;
 
+/*
+ * MD isn't ready for multipage bvecs yet, so set the flag
+ * to ensure MD still sees singlepage-bvec bios only.
+ */
+static inline void md_set_no_mp(struct mddev *mddev)
+{
+	if (mddev->queue)
+		set_bit(QUEUE_FLAG_NO_MP, &mddev->queue->queue_flags);
+}
+
 int md_run(struct mddev *mddev)
 {
 	int err;
@@ -5381,6 +5391,8 @@ int md_run(struct mddev *mddev)
 	if (mddev->sb_flags)
 		md_update_sb(mddev, 0);
 
+	md_set_no_mp(mddev);
+
 	md_new_event(mddev);
 	sysfs_notify_dirent_safe(mddev->sysfs_state);
 	sysfs_notify_dirent_safe(mddev->sysfs_action);
-- 
2.7.4


* [PATCH v1 12/54] dm: limit the max bio size as BIO_MAX_PAGES * PAGE_SIZE
       [not found] <1482854250-13481-1-git-send-email-tom.leiming@gmail.com>
                   ` (10 preceding siblings ...)
  2016-12-27 15:56 ` [PATCH v1 11/54] md: set NO_MP for request queue of md Ming Lei
@ 2016-12-27 15:56 ` Ming Lei
  2017-01-03 16:43   ` Mike Snitzer
  2016-12-27 15:56 ` [PATCH v1 13/54] block: comments on bio_for_each_segment[_all] Ming Lei
                   ` (19 subsequent siblings)
  31 siblings, 1 reply; 43+ messages in thread
From: Ming Lei @ 2016-12-27 15:56 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, Christoph Hellwig, Ming Lei, Alasdair Kergon,
	Mike Snitzer, maintainer:DEVICE-MAPPER (LVM),
	Shaohua Li, open list:SOFTWARE RAID (Multiple Disks) SUPPORT

For BIO based DM, some targets, such as the crypt target,
aren't ready to deal with incoming bios bigger than 1Mbyte.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/md/dm.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 3086da5664f3..6139bf7623f7 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -899,7 +899,16 @@ int dm_set_target_max_io_len(struct dm_target *ti, sector_t len)
 		return -EINVAL;
 	}
 
-	ti->max_io_len = (uint32_t) len;
+	/*
+	 * A BIO based queue uses its own splitting. When multipage
+	 * bvecs are switched on, the incoming bio may be too big to
+	 * be handled by some targets, such as crypt.
+	 *
+	 * When these targets are ready for the big bio, we can remove
+	 * the limit.
+	 */
+	ti->max_io_len = min_t(uint32_t, len,
+			       (BIO_MAX_PAGES * PAGE_SIZE));
 
 	return 0;
 }
-- 
2.7.4
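
With the usual values of BIO_MAX_PAGES = 256 and PAGE_SIZE = 4096 (both assumptions here, matching common x86-64 configs of that era), the cap in this patch works out to exactly 1 MiB. A small sketch of the clamp:

```c
#include <assert.h>

/* Sketch of the ti->max_io_len clamp; the SIM_ constants are assumed
 * values, not taken from any particular kernel config. */
#define SIM_BIO_MAX_PAGES 256UL
#define SIM_PAGE_SIZE     4096UL

static unsigned int clamp_max_io_len(unsigned long long len)
{
	unsigned long long cap = SIM_BIO_MAX_PAGES * SIM_PAGE_SIZE;

	/* same effect as min_t(uint32_t, len, BIO_MAX_PAGES * PAGE_SIZE) */
	return (unsigned int)(len < cap ? len : cap);
}
```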


* [PATCH v1 13/54] block: comments on bio_for_each_segment[_all]
       [not found] <1482854250-13481-1-git-send-email-tom.leiming@gmail.com>
                   ` (11 preceding siblings ...)
  2016-12-27 15:56 ` [PATCH v1 12/54] dm: limit the max bio size as BIO_MAX_PAGES * PAGE_SIZE Ming Lei
@ 2016-12-27 15:56 ` Ming Lei
  2016-12-27 15:56 ` [PATCH v1 14/54] block: introduce multipage/single page bvec helpers Ming Lei
                   ` (18 subsequent siblings)
  31 siblings, 0 replies; 43+ messages in thread
From: Ming Lei @ 2016-12-27 15:56 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, Christoph Hellwig, Ming Lei, Mike Christie,
	Hannes Reinecke, Kent Overstreet, Chaitanya Kulkarni

This patch clarifies the fact that even though both
bio_for_each_segment() and bio_for_each_segment_all()
are named _segment/_segment_all, they still return
one page at a time instead of a real segment (multipage bvec).

With the coming multipage bvec support, both helpers
could return real segments (multipage bvecs), but their
callers (users) may not be able to handle multipage bvecs
or real segments, so we keep the interfaces of the helpers
unchanged. New helpers for returning multipage bvecs
should be introduced instead.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 include/linux/bio.h | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/include/linux/bio.h b/include/linux/bio.h
index 7cf8a6c70a3f..714fbf495af7 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -156,7 +156,10 @@ static inline void *bio_data(struct bio *bio)
 
 /*
  * drivers should _never_ use the all version - the bio may have been split
- * before it got to the driver and the driver won't own all of it
+ * before it got to the driver and the driver won't own all of it.
+ *
+ * Even though the helper is named _segment_all, it still returns
+ * pages one by one instead of real (multipage) segments.
  */
 #define bio_for_each_segment_all(bvl, bio, i)				\
 	for (i = 0, bvl = (bio)->bi_io_vec; i < (bio)->bi_vcnt; i++, bvl++)
@@ -178,6 +181,10 @@ static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
 		((bvl = bio_iter_iovec((bio), (iter))), 1);		\
 	     bio_advance_iter((bio), &(iter), (bvl).bv_len))
 
+/*
+ * Even though the helper is named _segment, it still returns
+ * pages one by one instead of real (multipage) segments.
+ */
 #define bio_for_each_segment(bvl, bio, iter)				\
 	__bio_for_each_segment(bvl, bio, iter, (bio)->bi_iter)
 
-- 
2.7.4


* [PATCH v1 14/54] block: introduce multipage/single page bvec helpers
       [not found] <1482854250-13481-1-git-send-email-tom.leiming@gmail.com>
                   ` (12 preceding siblings ...)
  2016-12-27 15:56 ` [PATCH v1 13/54] block: comments on bio_for_each_segment[_all] Ming Lei
@ 2016-12-27 15:56 ` Ming Lei
  2016-12-27 15:56 ` [PATCH v1 15/54] block: implement sp version of bvec iterator helpers Ming Lei
                   ` (17 subsequent siblings)
  31 siblings, 0 replies; 43+ messages in thread
From: Ming Lei @ 2016-12-27 15:56 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, Christoph Hellwig, Ming Lei, Johannes Berg

This patch introduces helpers suffixed with _mp and _sp
for multipage bvec/segment support.

The helpers with the _mp suffix treat one bvec/segment as
a real multipage one; for example, .bv_len is the total
length of the multipage segment.

The helpers with the _sp suffix support the current bvec
iterator, which is assumed to be singlepage only by drivers,
filesystems, dm and so on. These _sp helpers are introduced
to build singlepage bvecs in flight, so users of the bio/bvec
iterator keep working and need no change even though we store
multipage bvecs in the table.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 include/linux/bvec.h | 56 +++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 53 insertions(+), 3 deletions(-)

diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index 89b65b82d98f..307a387eb29c 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -24,6 +24,42 @@
 #include <linux/bug.h>
 
 /*
+ * What are multipage bvecs (segments)?
+ *
+ * - a bvec stored in bio->bi_io_vec is always a multipage style vector
+ *
+ * - a bvec (struct bio_vec) represents one physically contiguous I/O
+ *   buffer; now the buffer may include more than one page since
+ *   multipage (mp) bvecs are supported, and all the pages represented
+ *   by one bvec are physically contiguous. Before mp support, at most
+ *   one page could be included in one bvec; we call that a
+ *   singlepage (sp) bvec.
+ *
+ * - .bv_page of the bvec represents the 1st page in the mp segment
+ *
+ * - .bv_offset of the bvec represents the offset of the buffer in the bvec
+ *
+ * The effect on the current drivers/filesystems/dm/bcache/...:
+ *
+ * - almost everyone assumes that one bvec only includes one single
+ *   page, so we keep the sp interfaces unchanged; for example,
+ *   bio_for_each_segment() still returns bvecs with a single page
+ *
+ * - bio_for_each_segment_all() will be changed to return singlepage
+ *   bvecs too
+ *
+ * - during iterating, the iterator variable (struct bvec_iter) is always
+ *   updated in multipage bvec style, which means bvec_iter_advance()
+ *   is kept unchanged
+ *
+ * - the returned (copied) singlepage bvec is generated in flight by the
+ *   bvec helpers from the stored mp bvec
+ *
+ * - in case some components (such as iov_iter) need to support mp
+ *   segments, we introduce new helpers (suffixed with _mp) for them
+ */
+
+/*
  * was unsigned short, but we might as well be ready for > 64kB I/O pages
  */
 struct bio_vec {
@@ -49,16 +85,30 @@ struct bvec_iter {
  */
 #define __bvec_iter_bvec(bvec, iter)	(&(bvec)[(iter).bi_idx])
 
-#define bvec_iter_page(bvec, iter)				\
+#define bvec_iter_page_mp(bvec, iter)				\
 	(__bvec_iter_bvec((bvec), (iter))->bv_page)
 
-#define bvec_iter_len(bvec, iter)				\
+#define bvec_iter_len_mp(bvec, iter)				\
 	min((iter).bi_size,					\
 	    __bvec_iter_bvec((bvec), (iter))->bv_len - (iter).bi_bvec_done)
 
-#define bvec_iter_offset(bvec, iter)				\
+#define bvec_iter_offset_mp(bvec, iter)				\
 	(__bvec_iter_bvec((bvec), (iter))->bv_offset + (iter).bi_bvec_done)
 
+/*
+ * <page, offset, length> of a singlepage (sp) segment.
+ *
+ * These helpers will be implemented for building sp bvecs in flight.
+ */
+#define bvec_iter_offset_sp(bvec, iter)	bvec_iter_offset_mp((bvec), (iter))
+#define bvec_iter_len_sp(bvec, iter)	bvec_iter_len_mp((bvec), (iter))
+#define bvec_iter_page_sp(bvec, iter)	bvec_iter_page_mp((bvec), (iter))
+
+/* current interfaces support sp style at default */
+#define bvec_iter_page(bvec, iter)	bvec_iter_page_sp((bvec), (iter))
+#define bvec_iter_len(bvec, iter)	bvec_iter_len_sp((bvec), (iter))
+#define bvec_iter_offset(bvec, iter)	bvec_iter_offset_sp((bvec), (iter))
+
 #define bvec_iter_bvec(bvec, iter)				\
 ((struct bio_vec) {						\
 	.bv_page	= bvec_iter_page((bvec), (iter)),	\
-- 
2.7.4
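
The _mp helper definitions above are plain arithmetic over the iterator state: the multipage length is what remains of the current bvec, capped by the bytes left in the iterator, and the multipage offset is the bvec's offset advanced by `bi_bvec_done`. A userspace restatement (the `sim_`/`iter_*` names are illustrative stand-ins for the macros):

```c
#include <assert.h>

/* Userspace model of bvec_iter_len_mp() / bvec_iter_offset_mp() from
 * this patch; field names mirror the kernel structs. */
struct sim_bvec { unsigned int bv_len; unsigned int bv_offset; };
struct sim_iter {
	unsigned int bi_idx;
	unsigned int bi_size;
	unsigned int bi_bvec_done;
};

static unsigned int iter_len_mp(const struct sim_bvec *bv,
				const struct sim_iter *it)
{
	unsigned int rem = bv[it->bi_idx].bv_len - it->bi_bvec_done;

	/* min(bytes left in iterator, bytes left in current bvec) */
	return it->bi_size < rem ? it->bi_size : rem;
}

static unsigned int iter_offset_mp(const struct sim_bvec *bv,
				   const struct sim_iter *it)
{
	return bv[it->bi_idx].bv_offset + it->bi_bvec_done;
}
```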


* [PATCH v1 15/54] block: implement sp version of bvec iterator helpers
       [not found] <1482854250-13481-1-git-send-email-tom.leiming@gmail.com>
                   ` (13 preceding siblings ...)
  2016-12-27 15:56 ` [PATCH v1 14/54] block: introduce multipage/single page bvec helpers Ming Lei
@ 2016-12-27 15:56 ` Ming Lei
  2016-12-27 15:56 ` [PATCH v1 16/54] block: introduce bio_for_each_segment_mp() Ming Lei
                   ` (16 subsequent siblings)
  31 siblings, 0 replies; 43+ messages in thread
From: Ming Lei @ 2016-12-27 15:56 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, Christoph Hellwig, Ming Lei, Johannes Berg

This patch implements the singlepage version of the following
three helpers:
	- bvec_iter_offset_sp()
	- bvec_iter_len_sp()
	- bvec_iter_page_sp()

With these, one multipage bvec can be split into singlepage
bvecs, keeping users of the current bvec iterator happy.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 include/linux/bvec.h | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index 307a387eb29c..d77c3cabce8c 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -22,6 +22,7 @@
 
 #include <linux/kernel.h>
 #include <linux/bug.h>
+#include <linux/mm.h>
 
 /*
  * What is multipage bvecs(segment)?
@@ -95,14 +96,25 @@ struct bvec_iter {
 #define bvec_iter_offset_mp(bvec, iter)				\
 	(__bvec_iter_bvec((bvec), (iter))->bv_offset + (iter).bi_bvec_done)
 
+#define bvec_iter_page_idx_mp(bvec, iter)			\
+	(bvec_iter_offset_mp((bvec), (iter)) / PAGE_SIZE)
+
+
 /*
  * <page, offset, length> of a singlepage (sp) segment.
  *
  * These helpers will be implemented for building sp bvecs in flight.
  */
-#define bvec_iter_offset_sp(bvec, iter)	bvec_iter_offset_mp((bvec), (iter))
-#define bvec_iter_len_sp(bvec, iter)	bvec_iter_len_mp((bvec), (iter))
-#define bvec_iter_page_sp(bvec, iter)	bvec_iter_page_mp((bvec), (iter))
+#define bvec_iter_offset_sp(bvec, iter)					\
+	(bvec_iter_offset_mp((bvec), (iter)) % PAGE_SIZE)
+
+#define bvec_iter_len_sp(bvec, iter)					\
+	min_t(unsigned, bvec_iter_len_mp((bvec), (iter)),		\
+	    (PAGE_SIZE - (bvec_iter_offset_sp((bvec), (iter)))))
+
+#define bvec_iter_page_sp(bvec, iter)					\
+	nth_page(bvec_iter_page_mp((bvec), (iter)),			\
+		 bvec_iter_page_idx_mp((bvec), (iter)))
 
 /* current interfaces support sp style at default */
 #define bvec_iter_page(bvec, iter)	bvec_iter_page_sp((bvec), (iter))
-- 
2.7.4
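
The sp splitting introduced here is modular arithmetic on the mp values, assuming a 4096-byte page: the sp offset wraps within one page, the sp length never crosses a page boundary, and the page index picks the n-th page of the multipage segment for nth_page(). A sketch of that math (names are illustrative):

```c
#include <assert.h>

#define SIM_PAGE_SIZE 4096U

/* Userspace restatement of bvec_iter_offset_sp(), bvec_iter_len_sp()
 * and bvec_iter_page_idx_mp() from this patch. */
static unsigned int offset_sp(unsigned int offset_mp)
{
	return offset_mp % SIM_PAGE_SIZE;
}

static unsigned int len_sp(unsigned int len_mp, unsigned int offset_mp)
{
	unsigned int room = SIM_PAGE_SIZE - offset_sp(offset_mp);

	/* an sp chunk stops at the next page boundary */
	return len_mp < room ? len_mp : room;
}

static unsigned int page_idx_mp(unsigned int offset_mp)
{
	return offset_mp / SIM_PAGE_SIZE;
}
```

So an mp segment at offset 4608 is seen by sp users as page 1 of the segment, offset 512, with at most 3584 bytes before the iterator advances to the next in-flight sp bvec.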


* [PATCH v1 16/54] block: introduce bio_for_each_segment_mp()
       [not found] <1482854250-13481-1-git-send-email-tom.leiming@gmail.com>
                   ` (14 preceding siblings ...)
  2016-12-27 15:56 ` [PATCH v1 15/54] block: implement sp version of bvec iterator helpers Ming Lei
@ 2016-12-27 15:56 ` Ming Lei
  2016-12-27 15:56 ` [PATCH v1 17/54] block: introduce bio_clone_sp() Ming Lei
                   ` (15 subsequent siblings)
  31 siblings, 0 replies; 43+ messages in thread
From: Ming Lei @ 2016-12-27 15:56 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, Christoph Hellwig, Ming Lei, Mike Christie,
	Hannes Reinecke, Kent Overstreet, Chaitanya Kulkarni,
	Johannes Berg

This helper is used to iterate over multipage bvecs, and it
is required by bio_clone().

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 include/linux/bio.h  | 39 ++++++++++++++++++++++++++++++++++-----
 include/linux/bvec.h | 37 ++++++++++++++++++++++++++++++++-----
 2 files changed, 66 insertions(+), 10 deletions(-)

diff --git a/include/linux/bio.h b/include/linux/bio.h
index 714fbf495af7..2bd4e6f2087a 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -68,6 +68,9 @@
 #define bio_data_dir(bio) \
 	(op_is_write(bio_op(bio)) ? WRITE : READ)
 
+#define bio_iter_iovec_mp(bio, iter)				\
+	bvec_iter_bvec_mp((bio)->bi_io_vec, (iter))
+
 /*
  * Check whether this bio carries any data or not. A NULL bio is allowed.
  */
@@ -164,15 +167,31 @@ static inline void *bio_data(struct bio *bio)
 #define bio_for_each_segment_all(bvl, bio, i)				\
 	for (i = 0, bvl = (bio)->bi_io_vec; i < (bio)->bi_vcnt; i++, bvl++)
 
-static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
-				    unsigned bytes)
+static inline void __bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
+				      unsigned bytes, bool mp)
 {
 	iter->bi_sector += bytes >> 9;
 
-	if (bio_no_advance_iter(bio))
+	if (bio_no_advance_iter(bio)) {
 		iter->bi_size -= bytes;
-	else
-		bvec_iter_advance(bio->bi_io_vec, iter, bytes);
+	} else {
+		if (!mp)
+			bvec_iter_advance(bio->bi_io_vec, iter, bytes);
+		else
+			bvec_iter_advance_mp(bio->bi_io_vec, iter, bytes);
+	}
+}
+
+static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
+				    unsigned bytes)
+{
+	__bio_advance_iter(bio, iter, bytes, false);
+}
+
+static inline void bio_advance_iter_mp(struct bio *bio, struct bvec_iter *iter,
+				       unsigned bytes)
+{
+	__bio_advance_iter(bio, iter, bytes, true);
 }
 
 #define __bio_for_each_segment(bvl, bio, iter, start)			\
@@ -188,6 +207,16 @@ static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
 #define bio_for_each_segment(bvl, bio, iter)				\
 	__bio_for_each_segment(bvl, bio, iter, (bio)->bi_iter)
 
+#define __bio_for_each_segment_mp(bvl, bio, iter, start)		\
+	for (iter = (start);						\
+	     (iter).bi_size &&						\
+		((bvl = bio_iter_iovec_mp((bio), (iter))), 1);		\
+	     bio_advance_iter_mp((bio), &(iter), (bvl).bv_len))
+
+/* returns one real segment(multipage bvec) each time */
+#define bio_for_each_segment_mp(bvl, bio, iter)				\
+	__bio_for_each_segment_mp(bvl, bio, iter, (bio)->bi_iter)
+
 #define bio_iter_last(bvec, iter) ((iter).bi_size == (bvec).bv_len)
 
 static inline unsigned bio_segments(struct bio *bio)
diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index d77c3cabce8c..f8a8b293cd32 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -128,16 +128,29 @@ struct bvec_iter {
 	.bv_offset	= bvec_iter_offset((bvec), (iter)),	\
 })
 
-static inline void bvec_iter_advance(const struct bio_vec *bv,
-				     struct bvec_iter *iter,
-				     unsigned bytes)
+#define bvec_iter_bvec_mp(bvec, iter)				\
+((struct bio_vec) {						\
+	.bv_page	= bvec_iter_page_mp((bvec), (iter)),	\
+	.bv_len		= bvec_iter_len_mp((bvec), (iter)),	\
+	.bv_offset	= bvec_iter_offset_mp((bvec), (iter)),	\
+})
+
+static inline void __bvec_iter_advance(const struct bio_vec *bv,
+				       struct bvec_iter *iter,
+				       unsigned bytes, bool mp)
 {
 	WARN_ONCE(bytes > iter->bi_size,
 		  "Attempted to advance past end of bvec iter\n");
 
 	while (bytes) {
-		unsigned iter_len = bvec_iter_len(bv, *iter);
-		unsigned len = min(bytes, iter_len);
+		unsigned len;
+
+		if (mp)
+			len = bvec_iter_len_mp(bv, *iter);
+		else
+			len = bvec_iter_len_sp(bv, *iter);
+
+		len = min(bytes, len);
 
 		bytes -= len;
 		iter->bi_size -= len;
@@ -150,6 +163,20 @@ static inline void bvec_iter_advance(const struct bio_vec *bv,
 	}
 }
 
+static inline void bvec_iter_advance(const struct bio_vec *bv,
+				     struct bvec_iter *iter,
+				     unsigned bytes)
+{
+	__bvec_iter_advance(bv, iter, bytes, false);
+}
+
+static inline void bvec_iter_advance_mp(const struct bio_vec *bv,
+					struct bvec_iter *iter,
+					unsigned bytes)
+{
+	__bvec_iter_advance(bv, iter, bytes, true);
+}
+
 #define for_each_bvec(bvl, bio_vec, iter, start)			\
 	for (iter = (start);						\
 	     (iter).bi_size &&						\
-- 
2.7.4


* [PATCH v1 17/54] block: introduce bio_clone_sp()
       [not found] <1482854250-13481-1-git-send-email-tom.leiming@gmail.com>
                   ` (15 preceding siblings ...)
  2016-12-27 15:56 ` [PATCH v1 16/54] block: introduce bio_for_each_segment_mp() Ming Lei
@ 2016-12-27 15:56 ` Ming Lei
  2016-12-27 15:56 ` [PATCH v1 18/54] bvec_iter: introduce BVEC_ITER_ALL_INIT Ming Lei
                   ` (14 subsequent siblings)
  31 siblings, 0 replies; 43+ messages in thread
From: Ming Lei @ 2016-12-27 15:56 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, Christoph Hellwig, Ming Lei, Jens Axboe,
	Mike Christie, Hannes Reinecke, Kent Overstreet,
	Chaitanya Kulkarni

Firstly bio_clone() and bio_clone_bioset() are changed
to clone multipage bvecs, because our iterator helpers are
capable of splitting multipage bvecs into singlepage bvecs.

But sometimes we still need a cloned bio with singlepage
bvecs, for example in bio bounce and bcache (bch_data_verify),
where the bvecs of the cloned bio need to be updated.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 block/bio.c         | 27 +++++++++++++++++++++------
 include/linux/bio.h | 42 ++++++++++++++++++++++++++++++++++++++----
 2 files changed, 59 insertions(+), 10 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 10398969353b..a76ed8a780de 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -630,16 +630,22 @@ EXPORT_SYMBOL(bio_clone_fast);
  * 	@bio_src: bio to clone
  *	@gfp_mask: allocation priority
  *	@bs: bio_set to allocate from
+ *	@sp_bvecs: if true, clone into singlepage bvecs.
  *
  *	Clone bio. Caller will own the returned bio, but not the actual data it
  *	points to. Reference count of returned bio will be one.
+ *
+ *	If @sp_bvecs is true, the caller must make sure number of singlepage
+ *	bvecs is less than maximum bvec count.
+ *
  */
-struct bio *bio_clone_bioset(struct bio *bio_src, gfp_t gfp_mask,
-			     struct bio_set *bs)
+struct bio *__bio_clone_bioset(struct bio *bio_src, gfp_t gfp_mask,
+			       struct bio_set *bs, bool sp_bvecs)
 {
 	struct bvec_iter iter;
 	struct bio_vec bv;
 	struct bio *bio;
+	unsigned segs;
 
 	/*
 	 * Pre immutable biovecs, __bio_clone() used to just do a memcpy from
@@ -663,7 +669,12 @@ struct bio *bio_clone_bioset(struct bio *bio_src, gfp_t gfp_mask,
 	 *    __bio_clone_fast() anyways.
 	 */
 
-	bio = bio_alloc_bioset(gfp_mask, bio_segments(bio_src), bs);
+	if (sp_bvecs)
+		segs = bio_segments(bio_src);
+	else
+		segs = bio_segments_mp(bio_src);
+
+	bio = bio_alloc_bioset(gfp_mask, segs, bs);
 	if (!bio)
 		return NULL;
 	bio->bi_bdev		= bio_src->bi_bdev;
@@ -680,8 +691,12 @@ struct bio *bio_clone_bioset(struct bio *bio_src, gfp_t gfp_mask,
 		bio->bi_io_vec[bio->bi_vcnt++] = bio_src->bi_io_vec[0];
 		break;
 	default:
-		bio_for_each_segment(bv, bio_src, iter)
-			bio->bi_io_vec[bio->bi_vcnt++] = bv;
+		if (sp_bvecs)
+			bio_for_each_segment(bv, bio_src, iter)
+				bio->bi_io_vec[bio->bi_vcnt++] = bv;
+		else
+			bio_for_each_segment_mp(bv, bio_src, iter)
+				bio->bi_io_vec[bio->bi_vcnt++] = bv;
 		break;
 	}
 
@@ -699,7 +714,7 @@ struct bio *bio_clone_bioset(struct bio *bio_src, gfp_t gfp_mask,
 
 	return bio;
 }
-EXPORT_SYMBOL(bio_clone_bioset);
+EXPORT_SYMBOL(__bio_clone_bioset);
 
 /**
  *	bio_add_pc_page	-	attempt to add page to bio
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 2bd4e6f2087a..0f2859f96468 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -219,7 +219,7 @@ static inline void bio_advance_iter_mp(struct bio *bio, struct bvec_iter *iter,
 
 #define bio_iter_last(bvec, iter) ((iter).bi_size == (bvec).bv_len)
 
-static inline unsigned bio_segments(struct bio *bio)
+static inline unsigned __bio_segments(struct bio *bio, bool mp)
 {
 	unsigned segs = 0;
 	struct bio_vec bv;
@@ -241,12 +241,26 @@ static inline unsigned bio_segments(struct bio *bio)
 		break;
 	}
 
-	bio_for_each_segment(bv, bio, iter)
-		segs++;
+	if (!mp)
+		bio_for_each_segment(bv, bio, iter)
+			segs++;
+	else
+		bio_for_each_segment_mp(bv, bio, iter)
+			segs++;
 
 	return segs;
 }
 
+static inline unsigned bio_segments(struct bio *bio)
+{
+	return __bio_segments(bio, false);
+}
+
+static inline unsigned bio_segments_mp(struct bio *bio)
+{
+	return __bio_segments(bio, true);
+}
+
 /*
  * get a reference to a bio, so it won't disappear. the intended use is
  * something like:
@@ -419,10 +433,24 @@ extern void bio_put(struct bio *);
 
 extern void __bio_clone_fast(struct bio *, struct bio *);
 extern struct bio *bio_clone_fast(struct bio *, gfp_t, struct bio_set *);
-extern struct bio *bio_clone_bioset(struct bio *, gfp_t, struct bio_set *bs);
+extern struct bio *__bio_clone_bioset(struct bio *, gfp_t,
+				      struct bio_set *bs, bool);
 
 extern struct bio_set *fs_bio_set;
 
+/* at default we clone bio with multipage bvecs */
+static inline struct bio *bio_clone_bioset(struct bio *bio, gfp_t gfp,
+					   struct bio_set *bs)
+{
+	return __bio_clone_bioset(bio, gfp, bs, false);
+}
+
+static inline struct bio *bio_clone_bioset_sp(struct bio *bio, gfp_t gfp,
+					      struct bio_set *bs)
+{
+	return __bio_clone_bioset(bio, gfp, bs, true);
+}
+
 static inline struct bio *bio_alloc(gfp_t gfp_mask, unsigned int nr_iovecs)
 {
 	return bio_alloc_bioset(gfp_mask, nr_iovecs, fs_bio_set);
@@ -433,6 +461,12 @@ static inline struct bio *bio_clone(struct bio *bio, gfp_t gfp_mask)
 	return bio_clone_bioset(bio, gfp_mask, fs_bio_set);
 }
 
+/* Sometimes we have to clone one bio with singlepage bvec */
+static inline struct bio *bio_clone_sp(struct bio *bio, gfp_t gfp_mask)
+{
+	return __bio_clone_bioset(bio, gfp_mask, fs_bio_set, true);
+}
+
 static inline struct bio *bio_kmalloc(gfp_t gfp_mask, unsigned int nr_iovecs)
 {
 	return bio_alloc_bioset(gfp_mask, nr_iovecs, NULL);
-- 
2.7.4


* [PATCH v1 18/54] bvec_iter: introduce BVEC_ITER_ALL_INIT
       [not found] <1482854250-13481-1-git-send-email-tom.leiming@gmail.com>
                   ` (16 preceding siblings ...)
  2016-12-27 15:56 ` [PATCH v1 17/54] block: introduce bio_clone_sp() Ming Lei
@ 2016-12-27 15:56 ` Ming Lei
  2016-12-27 15:56 ` [PATCH v1 19/54] block: bounce: avoid direct access to bvec table Ming Lei
                   ` (13 subsequent siblings)
  31 siblings, 0 replies; 43+ messages in thread
From: Ming Lei @ 2016-12-27 15:56 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, Christoph Hellwig, Ming Lei, Johannes Berg

Introduce BVEC_ITER_ALL_INIT for iterating one bio
from start to end.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 include/linux/bvec.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index f8a8b293cd32..5c51c58fe202 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -183,4 +183,13 @@ static inline void bvec_iter_advance_mp(const struct bio_vec *bv,
 		((bvl = bvec_iter_bvec((bio_vec), (iter))), 1);	\
 	     bvec_iter_advance((bio_vec), &(iter), (bvl).bv_len))
 
+/* for iterating one bio from start to end */
+#define BVEC_ITER_ALL_INIT (struct bvec_iter)				\
+{									\
+	.bi_sector	= 0,						\
+	.bi_size	= UINT_MAX,					\
+	.bi_idx		= 0,						\
+	.bi_bvec_done	= 0,						\
+}
+
 #endif /* __LINUX_BVEC_ITER_H */
-- 
2.7.4


* [PATCH v1 19/54] block: bounce: avoid direct access to bvec table
       [not found] <1482854250-13481-1-git-send-email-tom.leiming@gmail.com>
                   ` (17 preceding siblings ...)
  2016-12-27 15:56 ` [PATCH v1 18/54] bvec_iter: introduce BVEC_ITER_ALL_INIT Ming Lei
@ 2016-12-27 15:56 ` Ming Lei
  2016-12-27 15:56 ` [PATCH v1 20/54] block: bounce: don't access bio->bi_io_vec in copy_to_high_bio_irq Ming Lei
                   ` (12 subsequent siblings)
  31 siblings, 0 replies; 43+ messages in thread
From: Ming Lei @ 2016-12-27 15:56 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, Christoph Hellwig, Ming Lei, Jens Axboe

We will support multipage bvecs in the future, so switch to
the iterator for getting the bv_page of each bvec from the
original bio.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 block/bounce.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/block/bounce.c b/block/bounce.c
index 1cb5dd3a5da1..babd3f224ca0 100644
--- a/block/bounce.c
+++ b/block/bounce.c
@@ -126,21 +126,23 @@ static void copy_to_high_bio_irq(struct bio *to, struct bio *from)
 static void bounce_end_io(struct bio *bio, mempool_t *pool)
 {
 	struct bio *bio_orig = bio->bi_private;
-	struct bio_vec *bvec, *org_vec;
+	struct bio_vec *bvec, orig_vec;
 	int i;
-	int start = bio_orig->bi_iter.bi_idx;
+	struct bvec_iter orig_iter = bio_orig->bi_iter;
 
 	/*
 	 * free up bounce indirect pages used
 	 */
 	bio_for_each_segment_all(bvec, bio, i) {
-		org_vec = bio_orig->bi_io_vec + i + start;
 
-		if (bvec->bv_page == org_vec->bv_page)
-			continue;
+		orig_vec = bio_iter_iovec(bio_orig, orig_iter);
+		if (bvec->bv_page == orig_vec.bv_page)
+			goto next;
 
 		dec_zone_page_state(bvec->bv_page, NR_BOUNCE);
 		mempool_free(bvec->bv_page, pool);
+ next:
+		bio_advance_iter(bio_orig, &orig_iter, orig_vec.bv_len);
 	}
 
 	bio_orig->bi_error = bio->bi_error;
-- 
2.7.4


* [PATCH v1 20/54] block: bounce: don't access bio->bi_io_vec in copy_to_high_bio_irq
       [not found] <1482854250-13481-1-git-send-email-tom.leiming@gmail.com>
                   ` (18 preceding siblings ...)
  2016-12-27 15:56 ` [PATCH v1 19/54] block: bounce: avoid direct access to bvec table Ming Lei
@ 2016-12-27 15:56 ` Ming Lei
  2016-12-27 15:56 ` [PATCH v1 21/54] block: introduce bio_can_convert_to_sp() Ming Lei
                   ` (11 subsequent siblings)
  31 siblings, 0 replies; 43+ messages in thread
From: Ming Lei @ 2016-12-27 15:56 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, Christoph Hellwig, Ming Lei, Jens Axboe

As we need to support multipage bvecs, don't access
bio->bi_io_vec in copy_to_high_bio_irq(); just use the
standard iterator to do that.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 block/bounce.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/block/bounce.c b/block/bounce.c
index babd3f224ca0..a42f7b98b7e6 100644
--- a/block/bounce.c
+++ b/block/bounce.c
@@ -102,24 +102,30 @@ int init_emergency_isa_pool(void)
 static void copy_to_high_bio_irq(struct bio *to, struct bio *from)
 {
 	unsigned char *vfrom;
-	struct bio_vec tovec, *fromvec = from->bi_io_vec;
+	struct bio_vec tovec, fromvec;
 	struct bvec_iter iter;
+	/*
+	 * @from bio is created by bounce, so we can iterate from
+	 * start and can't trust @from->bi_iter because it might be
+	 * changed by splitting.
+	 */
+	struct bvec_iter from_iter = BVEC_ITER_ALL_INIT;
 
 	bio_for_each_segment(tovec, to, iter) {
-		if (tovec.bv_page != fromvec->bv_page) {
+		fromvec = bio_iter_iovec(from, from_iter);
+		if (tovec.bv_page != fromvec.bv_page) {
 			/*
 			 * fromvec->bv_offset and fromvec->bv_len might have
 			 * been modified by the block layer, so use the original
 			 * copy, bounce_copy_vec already uses tovec->bv_len
 			 */
-			vfrom = page_address(fromvec->bv_page) +
+			vfrom = page_address(fromvec.bv_page) +
 				tovec.bv_offset;
 
 			bounce_copy_vec(&tovec, vfrom);
 			flush_dcache_page(tovec.bv_page);
 		}
-
-		fromvec++;
+		bio_advance_iter(from, &from_iter, tovec.bv_len);
 	}
 }
 
-- 
2.7.4


* [PATCH v1 21/54] block: introduce bio_can_convert_to_sp()
       [not found] <1482854250-13481-1-git-send-email-tom.leiming@gmail.com>
                   ` (19 preceding siblings ...)
  2016-12-27 15:56 ` [PATCH v1 20/54] block: bounce: don't access bio->bi_io_vec in copy_to_high_bio_irq Ming Lei
@ 2016-12-27 15:56 ` Ming Lei
  2016-12-27 15:56 ` [PATCH v1 22/54] block: bounce: convert multipage bvecs into singlepage Ming Lei
                   ` (10 subsequent siblings)
  31 siblings, 0 replies; 43+ messages in thread
From: Ming Lei @ 2016-12-27 15:56 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, Christoph Hellwig, Ming Lei, Mike Christie,
	Hannes Reinecke, Kent Overstreet, Chaitanya Kulkarni

This patch introduces bio_can_convert_to_sp() for checking
whether one multipage bio can be converted into a singlepage
bio. If not, it returns how many sectors need to be split off
so that the split-off bio can be converted into a singlepage
bio.

In the following patches, block bounce and bcache will use
the helper.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 include/linux/bio.h | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/include/linux/bio.h b/include/linux/bio.h
index 0f2859f96468..79079bc5a1be 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -262,6 +262,35 @@ static inline unsigned bio_segments_mp(struct bio *bio)
 }
 
 /*
+ * This helper checks @bio and see if we can convert it into one
+ * singlepage bvec based bio. Return true if yes, otherwise false
+ * is returned.
+ *
+ * @sectors returns how many sectors we need to split for converting
+ * the splitted one into a singlepage bio if the whole bio can't be
+ * converted into a singlepage bio.
+ */
+static inline bool bio_can_convert_to_sp(struct bio *bio, unsigned *sectors)
+{
+	struct bio_vec bv;
+	struct bvec_iter iter;
+	unsigned len = 0;
+	bool ret = true;
+	unsigned segs = 0;
+
+	bio_for_each_segment(bv, bio, iter) {
+		if (++segs > BIO_MAX_PAGES) {
+			ret = false;
+			*sectors = len >> 9;
+			break;
+		}
+		len += bv.bv_len;
+	}
+
+	return ret;
+}
+
+/*
  * get a reference to a bio, so it won't disappear. the intended use is
  * something like:
  *
-- 
2.7.4


* [PATCH v1 22/54] block: bounce: convert multipage bvecs into singlepage
       [not found] <1482854250-13481-1-git-send-email-tom.leiming@gmail.com>
                   ` (20 preceding siblings ...)
  2016-12-27 15:56 ` [PATCH v1 21/54] block: introduce bio_can_convert_to_sp() Ming Lei
@ 2016-12-27 15:56 ` Ming Lei
  2016-12-27 15:56 ` [PATCH v1 23/54] bcache: handle bio_clone() & bvec updating for multipage bvecs Ming Lei
                   ` (9 subsequent siblings)
  31 siblings, 0 replies; 43+ messages in thread
From: Ming Lei @ 2016-12-27 15:56 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, Christoph Hellwig, Ming Lei, Jens Axboe

This patch tries to split the incoming multipage-bvec bio so
that each split bio can be held in one singlepage-bvec bio.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 block/bounce.c | 41 +++++++++++++++++++++++++++++++++--------
 1 file changed, 33 insertions(+), 8 deletions(-)

diff --git a/block/bounce.c b/block/bounce.c
index a42f7b98b7e6..08841ed4cdae 100644
--- a/block/bounce.c
+++ b/block/bounce.c
@@ -187,22 +187,33 @@ static void bounce_end_io_read_isa(struct bio *bio)
 	__bounce_end_io_read(bio, isa_page_pool);
 }
 
+static inline bool need_bounce(struct request_queue *q, struct bio *bio)
+{
+	struct bvec_iter iter;
+	struct bio_vec bv;
+
+	bio_for_each_segment_mp(bv, bio, iter) {
+		unsigned nr = (bv.bv_offset + bv.bv_len - 1) >>
+			PAGE_SHIFT;
+
+		if (page_to_pfn(bv.bv_page) + nr > queue_bounce_pfn(q))
+			return true;
+	}
+	return false;
+}
+
 static void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig,
 			       mempool_t *pool)
 {
 	struct bio *bio;
 	int rw = bio_data_dir(*bio_orig);
-	struct bio_vec *to, from;
-	struct bvec_iter iter;
+	struct bio_vec *to;
 	unsigned i;
 
-	bio_for_each_segment(from, *bio_orig, iter)
-		if (page_to_pfn(from.bv_page) > queue_bounce_pfn(q))
-			goto bounce;
+	if (!need_bounce(q, *bio_orig))
+		return;
 
-	return;
-bounce:
-	bio = bio_clone_bioset(*bio_orig, GFP_NOIO, fs_bio_set);
+	bio = bio_clone_bioset_sp(*bio_orig, GFP_NOIO, fs_bio_set);
 
 	bio_for_each_segment_all(to, bio, i) {
 		struct page *page = to->bv_page;
@@ -246,6 +257,7 @@ static void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig,
 void blk_queue_bounce(struct request_queue *q, struct bio **bio_orig)
 {
 	mempool_t *pool;
+	unsigned sectors;
 
 	/*
 	 * Data-less bio, nothing to bounce
@@ -267,9 +279,22 @@ void blk_queue_bounce(struct request_queue *q, struct bio **bio_orig)
 		pool = isa_page_pool;
 	}
 
+	if (!need_bounce(q, *bio_orig))
+		return;
+
 	/*
 	 * slow path
+	 *
+	 * REQ_PC bio won't reach splitting because multipage bvec
+	 * isn't enabled for REQ_PC.
 	 */
+	if (!bio_can_convert_to_sp(*bio_orig, &sectors)) {
+		struct bio *split = bio_split(*bio_orig, sectors,
+					      GFP_NOIO, q->bio_split);
+		bio_chain(split, *bio_orig);
+		generic_make_request(*bio_orig);
+		*bio_orig = split;
+	}
 	__blk_queue_bounce(q, bio_orig, pool);
 }
 
-- 
2.7.4


* [PATCH v1 23/54] bcache: handle bio_clone() & bvec updating for multipage bvecs
       [not found] <1482854250-13481-1-git-send-email-tom.leiming@gmail.com>
                   ` (21 preceding siblings ...)
  2016-12-27 15:56 ` [PATCH v1 22/54] block: bounce: convert multipage bvecs into singlepage Ming Lei
@ 2016-12-27 15:56 ` Ming Lei
  2016-12-30 11:01   ` Coly Li
  2016-12-27 15:56 ` [PATCH v1 24/54] blk-merge: compute bio->bi_seg_front_size efficiently Ming Lei
                   ` (8 subsequent siblings)
  31 siblings, 1 reply; 43+ messages in thread
From: Ming Lei @ 2016-12-27 15:56 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, Christoph Hellwig, Ming Lei, Kent Overstreet,
	Shaohua Li, Mike Christie, Guoqing Jiang,
	open list:BCACHE (BLOCK LAYER CACHE),
	open list:SOFTWARE RAID (Multiple Disks) SUPPORT

The incoming bio may be too big to be cloned into one
singlepage-bvec bio, so split the bio and check each split
part one by one.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/md/bcache/debug.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/md/bcache/debug.c b/drivers/md/bcache/debug.c
index 48d03e8b3385..18b2d2d138e3 100644
--- a/drivers/md/bcache/debug.c
+++ b/drivers/md/bcache/debug.c
@@ -103,7 +103,7 @@ void bch_btree_verify(struct btree *b)
 	up(&b->io_mutex);
 }
 
-void bch_data_verify(struct cached_dev *dc, struct bio *bio)
+static void __bch_data_verify(struct cached_dev *dc, struct bio *bio)
 {
 	char name[BDEVNAME_SIZE];
 	struct bio *check;
@@ -116,7 +116,7 @@ void bch_data_verify(struct cached_dev *dc, struct bio *bio)
 	 * in the new cloned bio because each single page need
 	 * to assign to each bvec of the new bio.
 	 */
-	check = bio_clone(bio, GFP_NOIO);
+	check = bio_clone_sp(bio, GFP_NOIO);
 	if (!check)
 		return;
 	check->bi_opf = REQ_OP_READ;
@@ -151,6 +151,26 @@ void bch_data_verify(struct cached_dev *dc, struct bio *bio)
 	bio_put(check);
 }
 
+void bch_data_verify(struct cached_dev *dc, struct bio *bio)
+{
+	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
+	struct bio *clone = bio_clone_fast(bio, GFP_NOIO, q->bio_split);
+	unsigned sectors;
+
+	while (!bio_can_convert_to_sp(clone, &sectors)) {
+		struct bio *split = bio_split(clone, sectors,
+					      GFP_NOIO, q->bio_split);
+
+		__bch_data_verify(dc, split);
+		bio_put(split);
+	}
+
+	if (bio_sectors(clone))
+		__bch_data_verify(dc, clone);
+
+	bio_put(clone);
+}
+
 #endif
 
 #ifdef CONFIG_DEBUG_FS
-- 
2.7.4


* [PATCH v1 24/54] blk-merge: compute bio->bi_seg_front_size efficiently
       [not found] <1482854250-13481-1-git-send-email-tom.leiming@gmail.com>
                   ` (22 preceding siblings ...)
  2016-12-27 15:56 ` [PATCH v1 23/54] bcache: handle bio_clone() & bvec updating for multipage bvecs Ming Lei
@ 2016-12-27 15:56 ` Ming Lei
  2016-12-27 15:56 ` [PATCH v1 25/54] block: blk-merge: try to make front segments in full size Ming Lei
                   ` (7 subsequent siblings)
  31 siblings, 0 replies; 43+ messages in thread
From: Ming Lei @ 2016-12-27 15:56 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, Christoph Hellwig, Ming Lei, Jens Axboe

It is enough to check and compute bio->bi_seg_front_size just
after the 1st segment is found, but the current code checks that
for each bvec, which is inefficient.

This patch follows the approach in __blk_recalc_rq_segments()
for computing bio->bi_seg_front_size, which is more efficient
and makes the code more readable.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 block/blk-merge.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 182398cb1524..e3abc835e4b7 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -153,22 +153,21 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 			bvprvp = &bvprv;
 			sectors += bv.bv_len >> 9;
 
-			if (nsegs == 1 && seg_size > front_seg_size)
-				front_seg_size = seg_size;
 			continue;
 		}
 new_segment:
 		if (nsegs == queue_max_segments(q))
 			goto split;
 
+		if (nsegs == 1 && seg_size > front_seg_size)
+			front_seg_size = seg_size;
+
 		nsegs++;
 		bvprv = bv;
 		bvprvp = &bvprv;
 		seg_size = bv.bv_len;
 		sectors += bv.bv_len >> 9;
 
-		if (nsegs == 1 && seg_size > front_seg_size)
-			front_seg_size = seg_size;
 	}
 
 	do_split = false;
@@ -181,6 +180,8 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 			bio = new;
 	}
 
+	if (nsegs == 1 && seg_size > front_seg_size)
+		front_seg_size = seg_size;
 	bio->bi_seg_front_size = front_seg_size;
 	if (seg_size > bio->bi_seg_back_size)
 		bio->bi_seg_back_size = seg_size;
-- 
2.7.4


* [PATCH v1 25/54] block: blk-merge: try to make front segments in full size
       [not found] <1482854250-13481-1-git-send-email-tom.leiming@gmail.com>
                   ` (23 preceding siblings ...)
  2016-12-27 15:56 ` [PATCH v1 24/54] blk-merge: compute bio->bi_seg_front_size efficiently Ming Lei
@ 2016-12-27 15:56 ` Ming Lei
  2016-12-27 15:56 ` [PATCH v1 26/54] block: blk-merge: remove unnecessary check Ming Lei
                   ` (6 subsequent siblings)
  31 siblings, 0 replies; 43+ messages in thread
From: Ming Lei @ 2016-12-27 15:56 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, Christoph Hellwig, Ming Lei, Jens Axboe

When merging one bvec into a segment, if the bvec is too big
to merge, the current policy is to move the whole bvec into a
new segment.

This patch changes the policy to try to maximize the size of
front segments: in the situation above, part of the bvec is
merged into the current segment, and the remainder is put into
the next segment.

This prepares for multipage bvec support, because this case can
become quite common and we should try to make front segments
reach full size.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 block/blk-merge.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 49 insertions(+), 5 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index e3abc835e4b7..a801f62a104b 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -95,6 +95,7 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 	struct bio *new = NULL;
 	const unsigned max_sectors = get_max_io_size(q, bio);
 	unsigned bvecs = 0;
+	unsigned advance = 0;
 
 	bio_for_each_segment(bv, bio, iter) {
 		/*
@@ -141,12 +142,32 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 		}
 
 		if (bvprvp && blk_queue_cluster(q)) {
-			if (seg_size + bv.bv_len > queue_max_segment_size(q))
-				goto new_segment;
 			if (!BIOVEC_PHYS_MERGEABLE(bvprvp, &bv))
 				goto new_segment;
 			if (!BIOVEC_SEG_BOUNDARY(q, bvprvp, &bv))
 				goto new_segment;
+			if (seg_size + bv.bv_len > queue_max_segment_size(q)) {
+				/*
+				 * One assumption is that the initial value of
+				 * @seg_size (equal to bv.bv_len) won't be
+				 * bigger than the max segment size, but this
+				 * becomes false once multipage bvecs arrive.
+				 */
+				advance = queue_max_segment_size(q) - seg_size;
+
+				if (advance > 0) {
+					seg_size += advance;
+					sectors += advance >> 9;
+					bv.bv_len -= advance;
+					bv.bv_offset += advance;
+				}
+
+				/*
+				 * Still need to put remainder of current
+				 * bvec into a new segment.
+				 */
+				goto new_segment;
+			}
 
 			seg_size += bv.bv_len;
 			bvprv = bv;
@@ -168,6 +189,12 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 		seg_size = bv.bv_len;
 		sectors += bv.bv_len >> 9;
 
+		/* restore the bvec for iterator */
+		if (advance) {
+			bv.bv_len += advance;
+			bv.bv_offset -= advance;
+			advance = 0;
+		}
 	}
 
 	do_split = false;
@@ -370,16 +397,29 @@ __blk_segment_map_sg(struct request_queue *q, struct bio_vec *bvec,
 {
 
 	int nbytes = bvec->bv_len;
+	unsigned advance = 0;
 
 	if (*sg && *cluster) {
-		if ((*sg)->length + nbytes > queue_max_segment_size(q))
-			goto new_segment;
-
 		if (!BIOVEC_PHYS_MERGEABLE(bvprv, bvec))
 			goto new_segment;
 		if (!BIOVEC_SEG_BOUNDARY(q, bvprv, bvec))
 			goto new_segment;
 
+		/*
+		 * try best to merge part of the bvec into previous
+		 * segment and follow same policy with
+		 * blk_bio_segment_split()
+		 */
+		if ((*sg)->length + nbytes > queue_max_segment_size(q)) {
+			advance = queue_max_segment_size(q) - (*sg)->length;
+			if (advance) {
+				(*sg)->length += advance;
+				bvec->bv_offset += advance;
+				bvec->bv_len -= advance;
+			}
+			goto new_segment;
+		}
+
 		(*sg)->length += nbytes;
 	} else {
 new_segment:
@@ -402,6 +442,10 @@ __blk_segment_map_sg(struct request_queue *q, struct bio_vec *bvec,
 
 		sg_set_page(*sg, bvec->bv_page, nbytes, bvec->bv_offset);
 		(*nsegs)++;
+
+		/* for making iterator happy */
+		bvec->bv_offset -= advance;
+		bvec->bv_len += advance;
 	}
 	*bvprv = *bvec;
 }
-- 
2.7.4


* [PATCH v1 26/54] block: blk-merge: remove unnecessary check
       [not found] <1482854250-13481-1-git-send-email-tom.leiming@gmail.com>
                   ` (24 preceding siblings ...)
  2016-12-27 15:56 ` [PATCH v1 25/54] block: blk-merge: try to make front segments in full size Ming Lei
@ 2016-12-27 15:56 ` Ming Lei
  2016-12-27 15:56 ` [PATCH v1 27/54] block: use bio_for_each_segment_mp() to compute segments count Ming Lei
                   ` (5 subsequent siblings)
  31 siblings, 0 replies; 43+ messages in thread
From: Ming Lei @ 2016-12-27 15:56 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, Christoph Hellwig, Ming Lei, Jens Axboe

In this case, 'sectors' can never be zero, so remove the check.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 block/blk-merge.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index a801f62a104b..05b6a3ef63f6 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -136,9 +136,7 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 				nsegs++;
 				sectors = max_sectors;
 			}
-			if (sectors)
-				goto split;
-			/* Make this single bvec as the 1st segment */
+			goto split;
 		}
 
 		if (bvprvp && blk_queue_cluster(q)) {
-- 
2.7.4


* [PATCH v1 27/54] block: use bio_for_each_segment_mp() to compute segments count
       [not found] <1482854250-13481-1-git-send-email-tom.leiming@gmail.com>
                   ` (25 preceding siblings ...)
  2016-12-27 15:56 ` [PATCH v1 26/54] block: blk-merge: remove unnecessary check Ming Lei
@ 2016-12-27 15:56 ` Ming Lei
  2016-12-27 15:56 ` [PATCH v1 28/54] block: use bio_for_each_segment_mp() to map sg Ming Lei
                   ` (4 subsequent siblings)
  31 siblings, 0 replies; 43+ messages in thread
From: Ming Lei @ 2016-12-27 15:56 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, Christoph Hellwig, Ming Lei, Jens Axboe

First, it is more efficient to use bio_for_each_segment_mp()
in both blk_bio_segment_split() and __blk_recalc_rq_segments()
to compute how many segments there are in the bio.

Second, once bio_for_each_segment_mp() is used, a bvec may need
to be split because its length can be very long and exceed the
max segment size, so one bvec may have to be split into several
segments.

Third, while splitting a multipage bvec into segments, the max
segment number may be reached, and then the bio needs to be split.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 block/blk-merge.c | 98 +++++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 80 insertions(+), 18 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 05b6a3ef63f6..a0e97959db7b 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -82,6 +82,63 @@ static inline unsigned get_max_io_size(struct request_queue *q,
 	return sectors;
 }
 
+/*
+ * Split the bvec @bv into segments, and update the related
+ * counters (segment count, segment sizes and sectors).
+ */
+static bool bvec_split_segs(struct request_queue *q, struct bio_vec *bv,
+		unsigned *nsegs, unsigned *last_seg_size,
+		unsigned *front_seg_size, unsigned *sectors)
+{
+	bool need_split = false;
+	unsigned len = bv->bv_len;
+	unsigned total_len = 0;
+	unsigned new_nsegs = 0, seg_size = 0;
+	int idx;
+
+	if ((*nsegs >= queue_max_segments(q)) || !len)
+		return need_split;
+
+	/*
+	 * Multipage bvec may be too big to hold in one segment,
+	 * so the current bvec has to be split into multiple
+	 * segments.
+	 */
+	while (new_nsegs + *nsegs < queue_max_segments(q)) {
+		seg_size = min(queue_max_segment_size(q), len);
+
+		new_nsegs++;
+		total_len += seg_size;
+		len -= seg_size;
+
+		if ((queue_virt_boundary(q) && ((bv->bv_offset +
+		    total_len) & queue_virt_boundary(q))) || !len)
+			break;
+	}
+
+	/* split in the middle of the bvec */
+	if (len)
+		need_split = true;
+
+	/* update front segment size */
+	if (!*nsegs) {
+		unsigned first_seg_size = seg_size;
+
+		if (new_nsegs > 1)
+			first_seg_size = queue_max_segment_size(q);
+		if (*front_seg_size < first_seg_size)
+			*front_seg_size = first_seg_size;
+	}
+
+	/* update other variables */
+	*last_seg_size = seg_size;
+	*nsegs += new_nsegs;
+	if (sectors)
+		*sectors += total_len >> 9;
+
+	return need_split;
+}
+
 static struct bio *blk_bio_segment_split(struct request_queue *q,
 					 struct bio *bio,
 					 struct bio_set *bs,
@@ -97,7 +154,7 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 	unsigned bvecs = 0;
 	unsigned advance = 0;
 
-	bio_for_each_segment(bv, bio, iter) {
+	bio_for_each_segment_mp(bv, bio, iter) {
 		/*
 		 * With arbitrary bio size, the incoming bio may be very
 		 * big. We have to split the bio into small bios so that
@@ -133,8 +190,12 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 			 */
 			if (nsegs < queue_max_segments(q) &&
 			    sectors < max_sectors) {
-				nsegs++;
-				sectors = max_sectors;
+				/* split in the middle of bvec */
+				bv.bv_len = (max_sectors - sectors) << 9;
+				bvec_split_segs(q, &bv, &nsegs,
+						&seg_size,
+						&front_seg_size,
+						&sectors);
 			}
 			goto split;
 		}
@@ -146,10 +207,9 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 				goto new_segment;
 			if (seg_size + bv.bv_len > queue_max_segment_size(q)) {
 				/*
-				 * On assumption is that initial value of
-				 * @seg_size(equals to bv.bv_len) won't be
-				 * bigger than max segment size, but will
-				 * becomes false after multipage bvec comes.
+				 * The initial value of @seg_size won't be
+				 * bigger than max segment size, because we
+				 * split the bvec via bvec_split_segs().
 				 */
 				advance = queue_max_segment_size(q) - seg_size;
 
@@ -181,11 +241,12 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 		if (nsegs == 1 && seg_size > front_seg_size)
 			front_seg_size = seg_size;
 
-		nsegs++;
 		bvprv = bv;
 		bvprvp = &bvprv;
-		seg_size = bv.bv_len;
-		sectors += bv.bv_len >> 9;
+
+		if (bvec_split_segs(q, &bv, &nsegs, &seg_size,
+					&front_seg_size, &sectors))
+			goto split;
 
 		/* restore the bvec for iterator */
 		if (advance) {
@@ -261,6 +322,7 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
 	struct bio_vec bv, bvprv = { NULL };
 	int cluster, prev = 0;
 	unsigned int seg_size, nr_phys_segs;
+	unsigned front_seg_size = bio->bi_seg_front_size;
 	struct bio *fbio, *bbio;
 	struct bvec_iter iter;
 
@@ -281,7 +343,7 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
 	seg_size = 0;
 	nr_phys_segs = 0;
 	for_each_bio(bio) {
-		bio_for_each_segment(bv, bio, iter) {
+		bio_for_each_segment_mp(bv, bio, iter) {
 			/*
 			 * If SG merging is disabled, each bio vector is
 			 * a segment
@@ -303,20 +365,20 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
 				continue;
 			}
 new_segment:
-			if (nr_phys_segs == 1 && seg_size >
-			    fbio->bi_seg_front_size)
-				fbio->bi_seg_front_size = seg_size;
+			if (nr_phys_segs == 1 && seg_size > front_seg_size)
+				front_seg_size = seg_size;
 
-			nr_phys_segs++;
 			bvprv = bv;
 			prev = 1;
-			seg_size = bv.bv_len;
+			bvec_split_segs(q, &bv, &nr_phys_segs, &seg_size,
+					&front_seg_size, NULL);
 		}
 		bbio = bio;
 	}
 
-	if (nr_phys_segs == 1 && seg_size > fbio->bi_seg_front_size)
-		fbio->bi_seg_front_size = seg_size;
+	if (nr_phys_segs == 1 && seg_size > front_seg_size)
+		front_seg_size = seg_size;
+	fbio->bi_seg_front_size = front_seg_size;
 	if (seg_size > bbio->bi_seg_back_size)
 		bbio->bi_seg_back_size = seg_size;
 
-- 
2.7.4

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH v1 28/54] block: use bio_for_each_segment_mp() to map sg
       [not found] <1482854250-13481-1-git-send-email-tom.leiming@gmail.com>
                   ` (26 preceding siblings ...)
  2016-12-27 15:56 ` [PATCH v1 27/54] block: use bio_for_each_segment_mp() to compute segments count Ming Lei
@ 2016-12-27 15:56 ` Ming Lei
  2016-12-27 15:56 ` [PATCH v1 29/54] block: introduce bvec_for_each_sp_bvec() Ming Lei
                   ` (3 subsequent siblings)
  31 siblings, 0 replies; 43+ messages in thread
From: Ming Lei @ 2016-12-27 15:56 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, Christoph Hellwig, Ming Lei, Jens Axboe

It is more efficient to use bio_for_each_segment_mp()
for mapping sg; meanwhile we have to consider splitting
the multipage bvec as done in blk_bio_segment_split().

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 block/blk-merge.c | 72 +++++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 52 insertions(+), 20 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index a0e97959db7b..55c5866ea77a 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -450,6 +450,56 @@ static int blk_phys_contig_segment(struct request_queue *q, struct bio *bio,
 	return 0;
 }
 
+static inline struct scatterlist *blk_next_sg(struct scatterlist **sg,
+		struct scatterlist *sglist)
+{
+	if (!*sg)
+		return sglist;
+	else {
+		/*
+		 * If the driver previously mapped a shorter
+		 * list, we could see a termination bit
+		 * prematurely unless it fully inits the sg
+		 * table on each mapping. We KNOW that there
+		 * must be more entries here or the driver
+		 * would be buggy, so force clear the
+		 * termination bit to avoid doing a full
+		 * sg_init_table() in drivers for each command.
+		 */
+		sg_unmark_end(*sg);
+		return sg_next(*sg);
+	}
+}
+
+static inline unsigned
+blk_bvec_map_sg(struct request_queue *q, struct bio_vec *bvec,
+		struct scatterlist *sglist, struct scatterlist **sg)
+{
+	unsigned nbytes = bvec->bv_len;
+	unsigned nsegs = 0, total = 0;
+
+	while (nbytes > 0) {
+		unsigned seg_size;
+		struct page *pg;
+		unsigned offset, idx;
+
+		*sg = blk_next_sg(sg, sglist);
+
+		seg_size = min(nbytes, queue_max_segment_size(q));
+		offset = (total + bvec->bv_offset) % PAGE_SIZE;
+		idx = (total + bvec->bv_offset) / PAGE_SIZE;
+		pg = nth_page(bvec->bv_page, idx);
+
+		sg_set_page(*sg, pg, seg_size, offset);
+
+		total += seg_size;
+		nbytes -= seg_size;
+		nsegs++;
+	}
+
+	return nsegs;
+}
+
 static inline void
 __blk_segment_map_sg(struct request_queue *q, struct bio_vec *bvec,
 		     struct scatterlist *sglist, struct bio_vec *bvprv,
@@ -483,25 +533,7 @@ __blk_segment_map_sg(struct request_queue *q, struct bio_vec *bvec,
 		(*sg)->length += nbytes;
 	} else {
 new_segment:
-		if (!*sg)
-			*sg = sglist;
-		else {
-			/*
-			 * If the driver previously mapped a shorter
-			 * list, we could see a termination bit
-			 * prematurely unless it fully inits the sg
-			 * table on each mapping. We KNOW that there
-			 * must be more entries here or the driver
-			 * would be buggy, so force clear the
-			 * termination bit to avoid doing a full
-			 * sg_init_table() in drivers for each command.
-			 */
-			sg_unmark_end(*sg);
-			*sg = sg_next(*sg);
-		}
-
-		sg_set_page(*sg, bvec->bv_page, nbytes, bvec->bv_offset);
-		(*nsegs)++;
+		(*nsegs) += blk_bvec_map_sg(q, bvec, sglist, sg);
 
 		/* for making iterator happy */
 		bvec->bv_offset -= advance;
@@ -527,7 +559,7 @@ static int __blk_bios_map_sg(struct request_queue *q, struct bio *bio,
 	int cluster = blk_queue_cluster(q), nsegs = 0;
 
 	for_each_bio(bio)
-		bio_for_each_segment(bvec, bio, iter)
+		bio_for_each_segment_mp(bvec, bio, iter)
 			__blk_segment_map_sg(q, &bvec, sglist, &bvprv, sg,
 					     &nsegs, &cluster);
 
-- 
2.7.4

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH v1 29/54] block: introduce bvec_for_each_sp_bvec()
       [not found] <1482854250-13481-1-git-send-email-tom.leiming@gmail.com>
                   ` (27 preceding siblings ...)
  2016-12-27 15:56 ` [PATCH v1 28/54] block: use bio_for_each_segment_mp() to map sg Ming Lei
@ 2016-12-27 15:56 ` Ming Lei
  2016-12-27 15:56 ` [PATCH v1 30/54] block: bio: introduce single/multi page version of bio_for_each_segment_all() Ming Lei
                   ` (2 subsequent siblings)
  31 siblings, 0 replies; 43+ messages in thread
From: Ming Lei @ 2016-12-27 15:56 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, Christoph Hellwig, Ming Lei, Johannes Berg

This helper can be used to iterate over each singlepage bvec
of one multipage bvec.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 include/linux/bvec.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index 5c51c58fe202..baf379d56106 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -192,4 +192,18 @@ static inline void bvec_iter_advance_mp(const struct bio_vec *bv,
 	.bi_bvec_done	= 0,						\
 }
 
+/*
+ * This helper iterates over the multipage bvec @mp_bvec and
+ * returns each singlepage bvec via @sp_bvl.
+ */
+#define __bvec_for_each_sp_bvec(sp_bvl, mp_bvec, iter, start)		\
+	for (iter = start,						\
+	     (iter).bi_size = (mp_bvec)->bv_len;			\
+	     (iter).bi_size &&						\
+		((sp_bvl = bvec_iter_bvec((mp_bvec), (iter))), 1);	\
+	     bvec_iter_advance((mp_bvec), &(iter), (sp_bvl).bv_len))
+
+#define bvec_for_each_sp_bvec(sp_bvl, mp_bvec, iter)			\
+	__bvec_for_each_sp_bvec(sp_bvl, mp_bvec, iter, BVEC_ITER_ALL_INIT)
+
 #endif /* __LINUX_BVEC_ITER_H */
-- 
2.7.4

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH v1 30/54] block: bio: introduce single/multi page version of bio_for_each_segment_all()
       [not found] <1482854250-13481-1-git-send-email-tom.leiming@gmail.com>
                   ` (28 preceding siblings ...)
  2016-12-27 15:56 ` [PATCH v1 29/54] block: introduce bvec_for_each_sp_bvec() Ming Lei
@ 2016-12-27 15:56 ` Ming Lei
  2016-12-27 15:56 ` [PATCH v1 31/54] block: introduce bio_segments_all() Ming Lei
  2017-01-16  3:19 ` [PATCH v1 00/54] block: support multipage bvec Ming Lei
  31 siblings, 0 replies; 43+ messages in thread
From: Ming Lei @ 2016-12-27 15:56 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, Christoph Hellwig, Ming Lei, Mike Christie,
	Hannes Reinecke, Kent Overstreet, Chaitanya Kulkarni,
	Shaun Tancheff, Johannes Thumshirn, Bart Van Assche

This patch introduces bio_for_each_segment_all_sp() and
bio_for_each_segment_all_mp().

bio_for_each_segment_all_sp() replaces bio_for_each_segment_all()
in cases where the returned bvec has to be a singlepage bvec.

bio_for_each_segment_all_mp() replaces bio_for_each_segment_all()
in cases where the user wants to update the returned bvec via the
pointer.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 include/linux/bio.h       | 24 ++++++++++++++++++++++++
 include/linux/blk_types.h |  6 ++++++
 2 files changed, 30 insertions(+)

diff --git a/include/linux/bio.h b/include/linux/bio.h
index 79079bc5a1be..efa0b3627735 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -217,6 +217,30 @@ static inline void bio_advance_iter_mp(struct bio *bio, struct bvec_iter *iter,
 #define bio_for_each_segment_mp(bvl, bio, iter)				\
 	__bio_for_each_segment_mp(bvl, bio, iter, (bio)->bi_iter)
 
+/*
+ * This helper returns each bvec stored in the bvec table directly,
+ * so the returned bvec points to one multipage bvec in the table
+ * and the caller can update the bvec via the returned pointer.
+ */
+#define bio_for_each_segment_all_mp(bvl, bio, i)                       \
+	bio_for_each_segment_all((bvl), (bio), (i))
+
+/*
+ * This helper returns a singlepage bvec to the caller; the sp bvec
+ * is generated on the fly from the multipage bvec stored in the bvec
+ * table. So we can _not_ change the bvec stored in bio->bi_io_vec[]
+ * via this helper.
+ *
+ * If someone needs to update a bvec in the table, please use
+ * bio_for_each_segment_all_mp() and make sure it is used correctly,
+ * since the bvec points to one multipage bvec.
+ */
+#define bio_for_each_segment_all_sp(bvl, bio, i, bi)			\
+	for ((bi).iter = BVEC_ITER_ALL_INIT, i = 0, bvl = &(bi).bv;	\
+	     (bi).iter.bi_idx < (bio)->bi_vcnt &&			\
+		(((bi).bv = bio_iter_iovec((bio), (bi).iter)), 1);	\
+	     bio_advance_iter((bio), &(bi).iter, (bi).bv.bv_len), i++)
+
 #define bio_iter_last(bvec, iter) ((iter).bi_size == (bvec).bv_len)
 
 static inline unsigned __bio_segments(struct bio *bio, bool mp)
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 519ea2c9df61..ef8e30abb099 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -89,6 +89,12 @@ struct bio {
 
 #define BIO_RESET_BYTES		offsetof(struct bio, bi_max_vecs)
 
+/* this iter is only for implementing bio_for_each_segment_all_sp() */
+struct bvec_iter_all {
+	struct bvec_iter	iter;
+	struct bio_vec		bv;      /* in-flight singlepage bvec */
+};
+
 /*
  * bio flags
  */
-- 
2.7.4

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH v1 31/54] block: introduce bio_segments_all()
       [not found] <1482854250-13481-1-git-send-email-tom.leiming@gmail.com>
                   ` (29 preceding siblings ...)
  2016-12-27 15:56 ` [PATCH v1 30/54] block: bio: introduce single/multi page version of bio_for_each_segment_all() Ming Lei
@ 2016-12-27 15:56 ` Ming Lei
  2017-01-16  3:19 ` [PATCH v1 00/54] block: support multipage bvec Ming Lei
  31 siblings, 0 replies; 43+ messages in thread
From: Ming Lei @ 2016-12-27 15:56 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, Christoph Hellwig, Ming Lei, Mike Christie,
	Kent Overstreet, Hannes Reinecke, Chaitanya Kulkarni

There are still several direct accesses to .bi_vcnt, so
introduce this helper to replace those usages in support of
multipage bvecs.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 include/linux/bio.h | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/include/linux/bio.h b/include/linux/bio.h
index efa0b3627735..b0929cf8c7fe 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -391,6 +391,27 @@ static inline void bio_get_last_bvec(struct bio *bio, struct bio_vec *bv)
 		bv->bv_len = iter.bi_bvec_done;
 }
 
+/*
+ * Return how many singlepage bvecs are included in this bio.
+ * This helper is only used by some filesystems to replace
+ * bio->bi_vcnt.
+ */
+static inline unsigned bio_segments_all(struct bio *bio)
+{
+	unsigned segs = 0;
+	int i;
+	struct bio_vec *bv;
+	struct bvec_iter_all bia;
+
+	WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED));
+
+	bio_for_each_segment_all_sp(bv, bio, i, bia)
+		segs++;
+
+	return segs;
+}
+
+
 enum bip_flags {
 	BIP_BLOCK_INTEGRITY	= 1 << 0, /* block layer owns integrity data */
 	BIP_MAPPED_INTEGRITY	= 1 << 1, /* ref tag has been remapped */
-- 
2.7.4

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v1 08/54] block: comment on bio_alloc_pages()
  2016-12-27 15:55 ` [PATCH v1 08/54] block: comment on bio_alloc_pages() Ming Lei
@ 2016-12-30 10:40   ` Coly Li
  2016-12-30 11:06   ` Coly Li
  1 sibling, 0 replies; 43+ messages in thread
From: Coly Li @ 2016-12-30 10:40 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, linux-kernel
  Cc: linux-block, Christoph Hellwig, Jens Axboe, Kent Overstreet,
	Shaohua Li, Mike Christie, Guoqing Jiang, Hannes Reinecke,
	open list:BCACHE (BLOCK LAYER CACHE),
	open list:SOFTWARE RAID (Multiple Disks) SUPPORT

On 2016/12/27 11:55 PM, Ming Lei wrote:
> This patch adds comment on usage of bio_alloc_pages(),
> also comments on one special case of bch_data_verify().
> 
> Signed-off-by: Ming Lei <tom.leiming@gmail.com>
> ---
>  block/bio.c               | 4 +++-
>  drivers/md/bcache/debug.c | 6 ++++++
>  2 files changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/block/bio.c b/block/bio.c
> index 2b375020fc49..d4a1e0b63ea0 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -961,7 +961,9 @@ EXPORT_SYMBOL(bio_advance);
>   * @bio: bio to allocate pages for
>   * @gfp_mask: flags for allocation
>   *
> - * Allocates pages up to @bio->bi_vcnt.
> + * Allocates pages up to @bio->bi_vcnt, and this function should only
> + * be called on a new initialized bio, which means all pages aren't added
> + * to the bio via bio_add_page() yet.
>   *
>   * Returns 0 on success, -ENOMEM on failure. On failure, any allocated pages are
>   * freed.
> diff --git a/drivers/md/bcache/debug.c b/drivers/md/bcache/debug.c
> index 06f55056aaae..48d03e8b3385 100644
> --- a/drivers/md/bcache/debug.c
> +++ b/drivers/md/bcache/debug.c
> @@ -110,6 +110,12 @@ void bch_data_verify(struct cached_dev *dc, struct bio *bio)
>  	struct bio_vec bv, cbv;
>  	struct bvec_iter iter, citer = { 0 };
>  
> +	/*
> +	 * Once multipage bvec is supported, the bio_clone()
> +	 * has to make sure page count in this bio can be held
> +	 * in the new cloned bio because each single page need
> +	 * to assign to each bvec of the new bio.
> +	 */
>  	check = bio_clone(bio, GFP_NOIO);
>  	if (!check)
>  		return;
> 
Acked-by: Coly Li <colyli@suse.de>

-- 
Coly Li

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v1 23/54] bcache: handle bio_clone() & bvec updating for multipage bvecs
  2016-12-27 15:56 ` [PATCH v1 23/54] bcache: handle bio_clone() & bvec updating for multipage bvecs Ming Lei
@ 2016-12-30 11:01   ` Coly Li
  2016-12-31 10:29     ` Ming Lei
  0 siblings, 1 reply; 43+ messages in thread
From: Coly Li @ 2016-12-30 11:01 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-kernel, linux-block, Christoph Hellwig,
	Kent Overstreet, Shaohua Li, Mike Christie, Guoqing Jiang,
	open list:BCACHE (BLOCK LAYER CACHE),
	open list:SOFTWARE RAID (Multiple Disks) SUPPORT

On 2016/12/27 11:56 PM, Ming Lei wrote:
> The incoming bio may be too big to be cloned into
> one singlepage bvecs bio, so split the bio and
> check the splitted bio one by one.
> 
> Signed-off-by: Ming Lei <tom.leiming@gmail.com>
> ---
>  drivers/md/bcache/debug.c | 24 ++++++++++++++++++++++--
>  1 file changed, 22 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/md/bcache/debug.c b/drivers/md/bcache/debug.c
> index 48d03e8b3385..18b2d2d138e3 100644
> --- a/drivers/md/bcache/debug.c
> +++ b/drivers/md/bcache/debug.c
> @@ -103,7 +103,7 @@ void bch_btree_verify(struct btree *b)
>  	up(&b->io_mutex);
>  }
>  
> -void bch_data_verify(struct cached_dev *dc, struct bio *bio)
> +static void __bch_data_verify(struct cached_dev *dc, struct bio *bio)
>  {
>  	char name[BDEVNAME_SIZE];
>  	struct bio *check;
> @@ -116,7 +116,7 @@ void bch_data_verify(struct cached_dev *dc, struct bio *bio)
>  	 * in the new cloned bio because each single page need
>  	 * to assign to each bvec of the new bio.
>  	 */
> -	check = bio_clone(bio, GFP_NOIO);
> +	check = bio_clone_sp(bio, GFP_NOIO);
>  	if (!check)
>  		return;
>  	check->bi_opf = REQ_OP_READ;
> @@ -151,6 +151,26 @@ void bch_data_verify(struct cached_dev *dc, struct bio *bio)
>  	bio_put(check);
>  }
>  
> +void bch_data_verify(struct cached_dev *dc, struct bio *bio)
> +{
> +	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
> +	struct bio *clone = bio_clone_fast(bio, GFP_NOIO, q->bio_split);
> +	unsigned sectors;
> +
> +	while (!bio_can_convert_to_sp(clone, &sectors)) {
> +		struct bio *split = bio_split(clone, sectors,
> +					      GFP_NOIO, q->bio_split);
> +
> +		__bch_data_verify(dc, split);
> +		bio_put(split);
> +	}
> +
> +	if (bio_sectors(clone))
> +		__bch_data_verify(dc, clone);
> +
> +	bio_put(clone);
> +}
> +

Hi Lei,

The above patch is good IMHO. Just wondering why not use the classical
style? Something like:


do {
	if (!bio_can_convert_to_sp(clone, &sectors))
		split = bio_split(clone, sectors,
				  GFP_NOIO, q->bio_split);
	else
		split = clone;

	__bch_data_verify(dc, split);
	bio_put(split);
} while (split != clone);


I guess the above style may generate less binary code.


-- 
Coly Li

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v1 08/54] block: comment on bio_alloc_pages()
  2016-12-27 15:55 ` [PATCH v1 08/54] block: comment on bio_alloc_pages() Ming Lei
  2016-12-30 10:40   ` Coly Li
@ 2016-12-30 11:06   ` Coly Li
  1 sibling, 0 replies; 43+ messages in thread
From: Coly Li @ 2016-12-30 11:06 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, linux-kernel
  Cc: linux-block, Christoph Hellwig, Jens Axboe, Kent Overstreet,
	Shaohua Li, Mike Christie, Guoqing Jiang, Hannes Reinecke,
	open list:BCACHE (BLOCK LAYER CACHE),
	open list:SOFTWARE RAID (Multiple Disks) SUPPORT

On 2016/12/27 11:55 PM, Ming Lei wrote:
> This patch adds comment on usage of bio_alloc_pages(),
> also comments on one special case of bch_data_verify().
> 
> Signed-off-by: Ming Lei <tom.leiming@gmail.com>
> ---
>  block/bio.c               | 4 +++-
>  drivers/md/bcache/debug.c | 6 ++++++
>  2 files changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/block/bio.c b/block/bio.c
> index 2b375020fc49..d4a1e0b63ea0 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -961,7 +961,9 @@ EXPORT_SYMBOL(bio_advance);
>   * @bio: bio to allocate pages for
>   * @gfp_mask: flags for allocation
>   *
> - * Allocates pages up to @bio->bi_vcnt.
> + * Allocates pages up to @bio->bi_vcnt, and this function should only
> + * be called on a new initialized bio, which means all pages aren't added
> + * to the bio via bio_add_page() yet.
>   *
>   * Returns 0 on success, -ENOMEM on failure. On failure, any allocated pages are
>   * freed.
> diff --git a/drivers/md/bcache/debug.c b/drivers/md/bcache/debug.c
> index 06f55056aaae..48d03e8b3385 100644
> --- a/drivers/md/bcache/debug.c
> +++ b/drivers/md/bcache/debug.c
> @@ -110,6 +110,12 @@ void bch_data_verify(struct cached_dev *dc, struct bio *bio)
>  	struct bio_vec bv, cbv;
>  	struct bvec_iter iter, citer = { 0 };
>  
> +	/*
> +	 * Once multipage bvec is supported, the bio_clone()
> +	 * has to make sure page count in this bio can be held
> +	 * in the new cloned bio because each single page need
> +	 * to assign to each bvec of the new bio.
> +	 */
>  	check = bio_clone(bio, GFP_NOIO);
>  	if (!check)
>  		return;
> 
Acked-by: Coly Li <colyli@suse.de>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v1 07/54] bcache: comment on direct access to bvec table
  2016-12-27 15:55 ` [PATCH v1 07/54] bcache: " Ming Lei
@ 2016-12-30 16:56   ` Coly Li
  0 siblings, 0 replies; 43+ messages in thread
From: Coly Li @ 2016-12-30 16:56 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-kernel, linux-block, Christoph Hellwig,
	Kent Overstreet, Shaohua Li, Guoqing Jiang, Zheng Liu,
	Mike Christie, Jiri Kosina, Eric Wheeler, Yijing Wang, Al Viro,
	open list:BCACHE (BLOCK LAYER CACHE),
	open list:SOFTWARE RAID (Multiple Disks) SUPPORT

On 2016/12/27 11:55 PM, Ming Lei wrote:
> Looks all are safe after multipage bvec is supported.
> 
> Signed-off-by: Ming Lei <tom.leiming@gmail.com>
> ---
>  drivers/md/bcache/btree.c | 1 +
>  drivers/md/bcache/super.c | 6 ++++++
>  drivers/md/bcache/util.c  | 7 +++++++
>  3 files changed, 14 insertions(+)
> 
> diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
> index a43eedd5804d..fc35cfb4d0f1 100644
> --- a/drivers/md/bcache/btree.c
> +++ b/drivers/md/bcache/btree.c
> @@ -428,6 +428,7 @@ static void do_btree_node_write(struct btree *b)
>  
>  		continue_at(cl, btree_node_write_done, NULL);
>  	} else {
> +		/* No harm for multipage bvec since the new is just allocated */
>  		b->bio->bi_vcnt = 0;
>  		bch_bio_map(b->bio, i);
>  
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index 3a19cbc8b230..607b022259dc 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -208,6 +208,7 @@ static void write_bdev_super_endio(struct bio *bio)
>  
>  static void __write_super(struct cache_sb *sb, struct bio *bio)
>  {
> +	/* single page bio, safe for multipage bvec */
>  	struct cache_sb *out = page_address(bio->bi_io_vec[0].bv_page);
>  	unsigned i;
>  
> @@ -1156,6 +1157,8 @@ static void register_bdev(struct cache_sb *sb, struct page *sb_page,
>  	dc->bdev->bd_holder = dc;
>  
>  	bio_init(&dc->sb_bio, dc->sb_bio.bi_inline_vecs, 1);
> +
> +	/* single page bio, safe for multipage bvec */
>  	dc->sb_bio.bi_io_vec[0].bv_page = sb_page;
>  	get_page(sb_page);
>  
> @@ -1799,6 +1802,7 @@ void bch_cache_release(struct kobject *kobj)
>  	for (i = 0; i < RESERVE_NR; i++)
>  		free_fifo(&ca->free[i]);
>  
> +	/* single page bio, safe for multipage bvec */
>  	if (ca->sb_bio.bi_inline_vecs[0].bv_page)
>  		put_page(ca->sb_bio.bi_io_vec[0].bv_page);
>  
> @@ -1854,6 +1858,8 @@ static int register_cache(struct cache_sb *sb, struct page *sb_page,
>  	ca->bdev->bd_holder = ca;
>  
>  	bio_init(&ca->sb_bio, ca->sb_bio.bi_inline_vecs, 1);
> +
> +	/* single page bio, safe for multipage bvec */
>  	ca->sb_bio.bi_io_vec[0].bv_page = sb_page;
>  	get_page(sb_page);
>  
> diff --git a/drivers/md/bcache/util.c b/drivers/md/bcache/util.c
> index dde6172f3f10..5cc0b49a65fb 100644
> --- a/drivers/md/bcache/util.c
> +++ b/drivers/md/bcache/util.c
> @@ -222,6 +222,13 @@ uint64_t bch_next_delay(struct bch_ratelimit *d, uint64_t done)
>  		: 0;
>  }
>  
> +/*
> + * Generally it isn't good to access .bi_io_vec and .bi_vcnt
> + * directly, the preferred way is bio_add_page, but in
> + * this case, bch_bio_map() supposes that the bvec table
> + * is empty, so it is safe to access .bi_vcnt & .bi_io_vec
> + * in this way even after multipage bvec is supported.
> + */
>  void bch_bio_map(struct bio *bio, void *base)
>  {
>  	size_t size = bio->bi_iter.bi_size;
> 

Acked-by: Coly Li <colyli@suse.de>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v1 23/54] bcache: handle bio_clone() & bvec updating for multipage bvecs
  2016-12-30 11:01   ` Coly Li
@ 2016-12-31 10:29     ` Ming Lei
  0 siblings, 0 replies; 43+ messages in thread
From: Ming Lei @ 2016-12-31 10:29 UTC (permalink / raw)
  To: Coly Li
  Cc: Jens Axboe, Linux Kernel Mailing List, linux-block,
	Christoph Hellwig, Kent Overstreet, Shaohua Li, Mike Christie,
	Guoqing Jiang, open list:BCACHE (BLOCK LAYER CACHE),
	open list:SOFTWARE RAID (Multiple Disks) SUPPORT

Hi Coly,

On Fri, Dec 30, 2016 at 7:01 PM, Coly Li <i@coly.li> wrote:
> On 2016/12/27 11:56 PM, Ming Lei wrote:
>> The incoming bio may be too big to be cloned into
>> one singlepage bvecs bio, so split the bio and
>> check the splitted bio one by one.
>>
>> Signed-off-by: Ming Lei <tom.leiming@gmail.com>
>> ---
>>  drivers/md/bcache/debug.c | 24 ++++++++++++++++++++++--
>>  1 file changed, 22 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/md/bcache/debug.c b/drivers/md/bcache/debug.c
>> index 48d03e8b3385..18b2d2d138e3 100644
>> --- a/drivers/md/bcache/debug.c
>> +++ b/drivers/md/bcache/debug.c
>> @@ -103,7 +103,7 @@ void bch_btree_verify(struct btree *b)
>>       up(&b->io_mutex);
>>  }
>>
>> -void bch_data_verify(struct cached_dev *dc, struct bio *bio)
>> +static void __bch_data_verify(struct cached_dev *dc, struct bio *bio)
>>  {
>>       char name[BDEVNAME_SIZE];
>>       struct bio *check;
>> @@ -116,7 +116,7 @@ void bch_data_verify(struct cached_dev *dc, struct bio *bio)
>>        * in the new cloned bio because each single page need
>>        * to assign to each bvec of the new bio.
>>        */
>> -     check = bio_clone(bio, GFP_NOIO);
>> +     check = bio_clone_sp(bio, GFP_NOIO);
>>       if (!check)
>>               return;
>>       check->bi_opf = REQ_OP_READ;
>> @@ -151,6 +151,26 @@ void bch_data_verify(struct cached_dev *dc, struct bio *bio)
>>       bio_put(check);
>>  }
>>
>> +void bch_data_verify(struct cached_dev *dc, struct bio *bio)
>> +{
>> +     struct request_queue *q = bdev_get_queue(bio->bi_bdev);
>> +     struct bio *clone = bio_clone_fast(bio, GFP_NOIO, q->bio_split);
>> +     unsigned sectors;
>> +
>> +     while (!bio_can_convert_to_sp(clone, &sectors)) {
>> +             struct bio *split = bio_split(clone, sectors,
>> +                                           GFP_NOIO, q->bio_split);
>> +
>> +             __bch_data_verify(dc, split);
>> +             bio_put(split);
>> +     }
>> +
>> +     if (bio_sectors(clone))
>> +             __bch_data_verify(dc, clone);
>> +
>> +     bio_put(clone);
>> +}
>> +
>
> Hi Lei,
>
> The above patch is good IMHO. Just wondering why not use the classical
> style ? something like,

I didn't know there was a classical style, :-)

>
>
> do {
>         if (!bio_can_convert_to_sp(clone, &sectors))
>                 split = bio_split(clone, sectors,
>                                   GFP_NOIO, q->bio_split);
>         else
>                 split = clone;
>
>         __bch_data_verify(dc, split);
>         bio_put(split);
> } while (split != clone);
>
>
> I guess maybe the above style generates less binary code.

Maybe, will take this style in V2.

Thanks for the review!

-- 
Ming Lei

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v1 12/54] dm: limit the max bio size as BIO_MAX_PAGES * PAGE_SIZE
  2016-12-27 15:56 ` [PATCH v1 12/54] dm: limit the max bio size as BIO_MAX_PAGES * PAGE_SIZE Ming Lei
@ 2017-01-03 16:43   ` Mike Snitzer
  2017-01-06  3:30     ` Ming Lei
  0 siblings, 1 reply; 43+ messages in thread
From: Mike Snitzer @ 2017-01-03 16:43 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-kernel, Christoph Hellwig, linux-block,
	maintainer:DEVICE-MAPPER LVM, Shaohua Li <shli@kernel.org>,
	linux-raid@vger.kernel.org open list:SOFTWARE RAID Multiple
	Disks SUPPORT, Alasdair Kergon

On Tue, Dec 27 2016 at 10:56am -0500,
Ming Lei <tom.leiming@gmail.com> wrote:

> For BIO-based DM, some targets aren't ready to deal with an
> incoming bio bigger than 1 Mbyte, such as the crypt target.
> 
> Signed-off-by: Ming Lei <tom.leiming@gmail.com>
> ---
>  drivers/md/dm.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 3086da5664f3..6139bf7623f7 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -899,7 +899,16 @@ int dm_set_target_max_io_len(struct dm_target *ti, sector_t len)
>  		return -EINVAL;
>  	}
>  
> -	ti->max_io_len = (uint32_t) len;
> +	/*
> +	 * BIO based queue uses its own splitting. When multipage bvecs
> +	 * is switched on, size of the incoming bio may be too big to
> +	 * be handled in some targets, such as crypt.
> +	 *
> +	 * When these targets are ready for the big bio, we can remove
> +	 * the limit.
> +	 */
> +	ti->max_io_len = min_t(uint32_t, len,
> +			       (BIO_MAX_PAGES * PAGE_SIZE));
>  
>  	return 0;
>  }
> -- 
> 2.7.4

dm_set_target_max_io_len() is already meant to be called by the .ctr
hook for each DM target.  So why not just have the dm-crypt target (and
other targets if needed) pass your reduced $len?

That way only targets that need to be fixed (e.g. dm-crypt) impose this
limit.
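The suggestion amounts to moving the clamp out of the generic helper and into each affected target's constructor. A minimal model of that split of responsibility (the function names here are illustrative, not the real DM API; the actual helper is dm_set_target_max_io_len() and works on a struct dm_target):

```c
#include <stdint.h>

#define PAGE_SIZE     4096u
#define BIO_MAX_PAGES 256u		/* 256 * 4K = 1 Mbyte */

/* Generic helper keeps the caller's value untouched. */
static uint32_t dm_set_max_io_len(uint32_t len)
{
	return len;
}

/* Only the target that needs the limit applies it, in its .ctr hook. */
static uint32_t crypt_ctr_max_io_len(uint32_t len)
{
	uint32_t cap = BIO_MAX_PAGES * PAGE_SIZE;	/* 1048576 */

	return dm_set_max_io_len(len < cap ? len : cap);
}
```

This way a target with no 1-Mbyte limitation keeps whatever max_io_len it asks for, and only dm-crypt (and friends) pay for the restriction.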

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v1 12/54] dm: limit the max bio size as BIO_MAX_PAGES * PAGE_SIZE
  2017-01-03 16:43   ` Mike Snitzer
@ 2017-01-06  3:30     ` Ming Lei
  0 siblings, 0 replies; 43+ messages in thread
From: Ming Lei @ 2017-01-06  3:30 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Jens Axboe, Linux Kernel Mailing List, Christoph Hellwig,
	linux-block, maintainer:DEVICE-MAPPER LVM,
	Shaohua Li <shli@kernel.org>,
	linux-raid@vger.kernel.org open list:SOFTWARE RAID Multiple
	DisksSUPPORT, Alasdair Kergon

On Wed, Jan 4, 2017 at 12:43 AM, Mike Snitzer <snitzer@redhat.com> wrote:
> On Tue, Dec 27 2016 at 10:56am -0500,
> Ming Lei <tom.leiming@gmail.com> wrote:
>
>> For BIO-based DM, some targets aren't ready to deal with an
>> incoming bio bigger than 1 Mbyte, such as the crypt target.
>>
>> Signed-off-by: Ming Lei <tom.leiming@gmail.com>
>> ---
>>  drivers/md/dm.c | 11 ++++++++++-
>>  1 file changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
>> index 3086da5664f3..6139bf7623f7 100644
>> --- a/drivers/md/dm.c
>> +++ b/drivers/md/dm.c
>> @@ -899,7 +899,16 @@ int dm_set_target_max_io_len(struct dm_target *ti, sector_t len)
>>               return -EINVAL;
>>       }
>>
>> -     ti->max_io_len = (uint32_t) len;
>> +     /*
>> +      * BIO based queue uses its own splitting. When multipage bvecs
>> +      * is switched on, size of the incoming bio may be too big to
>> +      * be handled in some targets, such as crypt.
>> +      *
>> +      * When these targets are ready for the big bio, we can remove
>> +      * the limit.
>> +      */
>> +     ti->max_io_len = min_t(uint32_t, len,
>> +                            (BIO_MAX_PAGES * PAGE_SIZE));
>>
>>       return 0;
>>  }
>> --
>> 2.7.4
>
> dm_set_target_max_io_len() is already meant to be called by the .ctr
> hook for each DM target.  So why not just have the dm-crypt target (and
> other targets if needed) pass your reduced $len?
>
> That way only targets that need to be fixed (e.g. dm-crypt) impose this
> limit.

Looks like a better way; I will do it in V2.


Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v1 00/54] block: support multipage bvec
       [not found] <1482854250-13481-1-git-send-email-tom.leiming@gmail.com>
                   ` (30 preceding siblings ...)
  2016-12-27 15:56 ` [PATCH v1 31/54] block: introduce bio_segments_all() Ming Lei
@ 2017-01-16  3:19 ` Ming Lei
  2017-01-16 15:18   ` Christoph Hellwig
  31 siblings, 1 reply; 43+ messages in thread
From: Ming Lei @ 2017-01-16  3:19 UTC (permalink / raw)
  To: Jens Axboe, Linux Kernel Mailing List, linux-block
  Cc: Christoph Hellwig, Al Viro, Andrew Morton, Kent Overstreet

Hi Guys,

On Tue, Dec 27, 2016 at 11:55 PM, Ming Lei <tom.leiming@gmail.com> wrote:
> Hi,
>
> This patchset brings multipage bvecs into the block layer. Basic
> xfstests (-a auto) over virtio-blk/virtio-scsi have been run
> and no regression was found, so it should be good enough
> to show the approach now. Any comments are welcome!
>
> 1) what is multipage bvec?
>
> A multipage bvec means that one 'struct bio_vec' can hold
> multiple physically contiguous pages, instead of the single
> page the Linux kernel has used for a long time.
>
> 2) why is multipage bvec introduced?
>
> Kent proposed the idea[1] first.
>
> As system RAM becomes much bigger than before, and at the
> same time huge pages, transparent huge pages and memory
> compaction are widely used, it is now quite common to see
> physically contiguous pages coming from filesystems in I/O.
> On the other hand, from the block layer's view, it isn't
> necessary to store the intermediate pages in the bvec; it
> is enough to just store the physically contiguous 'segment'.
>
> Also, huge pages are being brought to filesystems[2], and we
> can do I/O one hugepage at a time[3], which requires that one
> bio can transfer at least one huge page at a time. It turns
> out it isn't flexible to simply change BIO_MAX_PAGES[3].
> Multipage bvecs fit this case very well.
>
> With multipage bvec:
>
> - bio size can be increased, which should improve some
> high-bandwidth I/O cases in theory[4].
>
> - Inside the block layer, both bio splitting and sg mapping
> can become more efficient than before by traversing the
> physically contiguous 'segment' instead of each page.
>
> - there is the possibility, in the future, of improving the
> memory footprint of bvec usage.
>
> 3) how is multipage bvec implemented in this patchset?
>
> The first 9 patches comment on some special cases. As we saw,
> most cases turn out to be safe for multipage bvecs; only
> fs/buffer, MD and btrfs need to be dealt with. Both fs/buffer
> and btrfs are handled in the following patches, based on some
> new block APIs for multipage bvecs.
>
> Given that a bit more work is involved in cleaning up MD, this
> patchset introduces QUEUE_FLAG_NO_MP for it, so that component
> can still see/use singlepage bvecs. In the future, once the
> cleanup is done, the flag can be killed.
>
> The 2nd part(23 ~ 54) implements multipage bvec in block:
>
> - put all tricks into the bvec/bio/rq iterators; as long as
> drivers and filesystems use these standard iterators, they
> are happy with multipage bvecs
>
> - bio_for_each_segment_all() changes
> this helper passes a pointer to each bvec directly to the user,
> so it has to be changed. Two new helpers (bio_for_each_segment_all_sp()
> and bio_for_each_segment_all_mp()) are introduced.
>
> Current uses of bio_for_each_segment_all() are also converted
> to the above two.
>
> - bio_clone() changes
> By default, bio_clone() still clones a new bio in the multipage
> bvec way. A single page version of bio_clone() is also introduced
> for some special cases where only single page bvecs are used for
> the newly cloned bio (bio bounce, ...)
>
> - btrfs cleanup
> just three patches for avoiding direct access to bvec table.
>
> These patches can be found in the following git tree:
>
>         https://github.com/ming1/linux/commits/mp-bvec-0.6-v4.10-rc
>
> Thanks to Christoph for looking at the early version and providing
> very good suggestions, such as introducing bio_init_with_vec_table(),
> removing other unnecessary helpers, and so on.
>
> TODO:
>         - cleanup direct access to bvec table for MD
>
> V1:
>         - against v4.10-rc1 and some cleanup in V0 are in -linus already
>         - handle queue_virt_boundary() in mp bvec change and make NVMe happy
>         - further BTRFS cleanup
>         - remove QUEUE_FLAG_SPLIT_MP
>         - rename for two new helpers of bio_for_each_segment_all()
>         - fix bounce conversion
>         - address comments in V0

Any comments on this version?

BTW, with one fix in the following link:

https://github.com/ming1/linux/commit/e52897a21b4b4c1500cc3686b8392757ebc5bd19

xfstests (ext4, xfs and btrfs) were run and no regression was observed.

Also one new patch is introduced to cover dio over block device:

https://github.com/ming1/linux/commit/58a0f7a7f6afa74cc29d453f9b5d79304c90aa09

Thanks,
Ming

>
> [1], http://marc.info/?l=linux-kernel&m=141680246629547&w=2
> [2], https://patchwork.kernel.org/patch/9451523/
> [3], http://marc.info/?t=147735447100001&r=1&w=2
> [4], http://marc.info/?l=linux-mm&m=147745525801433&w=2
>
>
> Ming Lei (54):
>   block: drbd: comment on direct access bvec table
>   block: loop: comment on direct access to bvec table
>   kernel/power/swap.c: comment on direct access to bvec table
>   mm: page_io.c: comment on direct access to bvec table
>   fs/buffer: comment on direct access to bvec table
>   f2fs: f2fs_read_end_io: comment on direct access to bvec table
>   bcache: comment on direct access to bvec table
>   block: comment on bio_alloc_pages()
>   block: comment on bio_iov_iter_get_pages()
>   block: introduce flag QUEUE_FLAG_NO_MP
>   md: set NO_MP for request queue of md
>   dm: limit the max bio size as BIO_MAX_PAGES * PAGE_SIZE
>   block: comments on bio_for_each_segment[_all]
>   block: introduce multipage/single page bvec helpers
>   block: implement sp version of bvec iterator helpers
>   block: introduce bio_for_each_segment_mp()
>   block: introduce bio_clone_sp()
>   bvec_iter: introduce BVEC_ITER_ALL_INIT
>   block: bounce: avoid direct access to bvec table
>   block: bounce: don't access bio->bi_io_vec in copy_to_high_bio_irq
>   block: introduce bio_can_convert_to_sp()
>   block: bounce: convert multipage bvecs into singlepage
>   bcache: handle bio_clone() & bvec updating for multipage bvecs
>   blk-merge: compute bio->bi_seg_front_size efficiently
>   block: blk-merge: try to make front segments in full size
>   block: blk-merge: remove unnecessary check
>   block: use bio_for_each_segment_mp() to compute segments count
>   block: use bio_for_each_segment_mp() to map sg
>   block: introduce bvec_for_each_sp_bvec()
>   block: bio: introduce single/multi page version of
>     bio_for_each_segment_all()
>   block: introduce bio_segments_all()
>   block: introduce bvec_get_last_sp()
>   block: deal with dirtying pages for multipage bvec
>   block: convert to singe/multi page version of
>     bio_for_each_segment_all()
>   bcache: convert to bio_for_each_segment_all_sp()
>   dm-crypt: don't clear bvec->bv_page in crypt_free_buffer_pages()
>   dm-crypt: convert to bio_for_each_segment_all_sp()
>   md/raid1.c: convert to bio_for_each_segment_all_sp()
>   fs/mpage: convert to bio_for_each_segment_all_sp()
>   fs/direct-io: convert to bio_for_each_segment_all_sp()
>   ext4: convert to bio_for_each_segment_all_sp()
>   xfs: convert to bio_for_each_segment_all_sp()
>   gfs2: convert to bio_for_each_segment_all_sp()
>   f2fs: convert to bio_for_each_segment_all_sp()
>   exofs: convert to bio_for_each_segment_all_sp()
>   fs: crypto: convert to bio_for_each_segment_all_sp()
>   fs/btrfs: convert to bio_for_each_segment_all_sp()
>   fs/block_dev.c: convert to bio_for_each_segment_all_sp()
>   fs/iomap.c: convert to bio_for_each_segment_all_sp()
>   fs/buffer.c: use bvec iterator to truncate the bio
>   btrfs: avoid access to .bi_vcnt directly
>   btrfs: use bvec_get_last_sp to get the last singlepage bvec
>   btrfs: comment on direct access bvec table
>   block: enable multipage bvecs
>
>  block/bio.c                      | 110 +++++++++++++++----
>  block/blk-merge.c                | 227 +++++++++++++++++++++++++++++++--------
>  block/blk-zoned.c                |   5 +-
>  block/bounce.c                   |  75 +++++++++----
>  drivers/block/drbd/drbd_bitmap.c |   1 +
>  drivers/block/loop.c             |   5 +
>  drivers/md/bcache/btree.c        |   4 +-
>  drivers/md/bcache/debug.c        |  30 +++++-
>  drivers/md/bcache/super.c        |   6 ++
>  drivers/md/bcache/util.c         |   7 ++
>  drivers/md/dm-crypt.c            |   4 +-
>  drivers/md/dm.c                  |  11 +-
>  drivers/md/md.c                  |  12 +++
>  drivers/md/raid1.c               |   3 +-
>  fs/block_dev.c                   |   6 +-
>  fs/btrfs/check-integrity.c       |  12 ++-
>  fs/btrfs/compression.c           |  12 ++-
>  fs/btrfs/disk-io.c               |   3 +-
>  fs/btrfs/extent_io.c             |  26 +++--
>  fs/btrfs/extent_io.h             |   1 +
>  fs/btrfs/file-item.c             |   6 +-
>  fs/btrfs/inode.c                 |  34 ++++--
>  fs/btrfs/raid56.c                |   6 +-
>  fs/buffer.c                      |  24 +++--
>  fs/crypto/crypto.c               |   3 +-
>  fs/direct-io.c                   |   4 +-
>  fs/exofs/ore.c                   |   3 +-
>  fs/exofs/ore_raid.c              |   3 +-
>  fs/ext4/page-io.c                |   3 +-
>  fs/ext4/readpage.c               |   3 +-
>  fs/f2fs/data.c                   |  13 ++-
>  fs/gfs2/lops.c                   |   3 +-
>  fs/gfs2/meta_io.c                |   3 +-
>  fs/iomap.c                       |   3 +-
>  fs/mpage.c                       |   3 +-
>  fs/xfs/xfs_aops.c                |   3 +-
>  include/linux/bio.h              | 164 ++++++++++++++++++++++++++--
>  include/linux/blk_types.h        |   6 ++
>  include/linux/blkdev.h           |   2 +
>  include/linux/bvec.h             | 138 ++++++++++++++++++++++--
>  kernel/power/swap.c              |   2 +
>  mm/page_io.c                     |   2 +
>  42 files changed, 829 insertions(+), 162 deletions(-)
>
> --
> 2.7.4
>



-- 
Ming Lei

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v1 00/54] block: support multipage bvec
  2017-01-16  3:19 ` [PATCH v1 00/54] block: support multipage bvec Ming Lei
@ 2017-01-16 15:18   ` Christoph Hellwig
  2017-01-17  2:40     ` Ming Lei
  0 siblings, 1 reply; 43+ messages in thread
From: Christoph Hellwig @ 2017-01-16 15:18 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Linux Kernel Mailing List, linux-block,
	Christoph Hellwig, Al Viro, Andrew Morton, Kent Overstreet

On Mon, Jan 16, 2017 at 11:19:19AM +0800, Ming Lei wrote:
> Any comments on this version?

We'll need to make sure all drivers can handle multi-page bvecs
before continuing any other work.  Without that the series is a no-go.
Note that, in general, making a driver capable of handling multipage
bvecs will clean it up by using the new helpers, and is worthwhile on
its own.
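In the same spirit, the cleanup usually means replacing open-coded per-page walks with segment-based iteration. A self-contained model of the core idea, coalescing physically contiguous pages into multipage segments (the structures and names below are simplified stand-ins, not the kernel's bvec types):

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096ul

/* A simplified bvec: a starting physical address and a length. */
struct seg { unsigned long addr; unsigned long len; };

/*
 * Coalesce an array of page addresses into multipage segments,
 * merging pages that are physically contiguous. Returns the
 * number of segments written to 'out' (at most n).
 */
static size_t pages_to_segments(const unsigned long *pages, size_t n,
				struct seg *out)
{
	size_t nseg = 0;

	for (size_t i = 0; i < n; i++) {
		if (nseg && out[nseg - 1].addr + out[nseg - 1].len == pages[i]) {
			out[nseg - 1].len += PAGE_SIZE;	/* extend segment */
		} else {
			out[nseg].addr = pages[i];
			out[nseg].len = PAGE_SIZE;
			nseg++;
		}
	}
	return nseg;
}
```

Five pages at 0x1000-0x3000 and 0x8000-0x9000 collapse into two segments, which is why splitting and sg mapping get cheaper: the inner loops run per segment, not per page.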

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v1 00/54] block: support multipage bvec
  2017-01-16 15:18   ` Christoph Hellwig
@ 2017-01-17  2:40     ` Ming Lei
  2017-01-17  7:50       ` Christoph Hellwig
  0 siblings, 1 reply; 43+ messages in thread
From: Ming Lei @ 2017-01-17  2:40 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Linux Kernel Mailing List, linux-block, Al Viro,
	Andrew Morton, Kent Overstreet

On Mon, Jan 16, 2017 at 11:18 PM, Christoph Hellwig <hch@infradead.org> wrote:
> On Mon, Jan 16, 2017 at 11:19:19AM +0800, Ming Lei wrote:
>> Any comments on this version?
>
> We'll need to make sure all drivers can handle multi-page bvecs
> before continuing any other work.  Without that the series is a no-go.
> Note that in general making a drivers capable of handling multipage
> bvecs will clean it up by using new helpers and be worthwhile on it's
> own.

IMO, the only one left is raid(1/5/10), which can be dealt with by the
"NO_MP" flag. This can be observed from the result of running 'git grep':

          $git grep -n -E "bi_vcnt|bi_io_vec" ./

Also, this patchset adds comments in the cases of direct access to the
bvec table; those cases have been minimized too, and most of them are
single-bvec based.


Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v1 00/54] block: support multipage bvec
  2017-01-17  2:40     ` Ming Lei
@ 2017-01-17  7:50       ` Christoph Hellwig
  2017-01-17  8:13         ` Ming Lei
  0 siblings, 1 reply; 43+ messages in thread
From: Christoph Hellwig @ 2017-01-17  7:50 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Jens Axboe, Linux Kernel Mailing List,
	linux-block, Al Viro, Andrew Morton, Kent Overstreet

On Tue, Jan 17, 2017 at 10:40:36AM +0800, Ming Lei wrote:
> IMO, the only one left is raid(1/5/10) which can be dealt with by the
> "NO_MP" flag.

No, they can't in the long run.  They need to handle the bvecs just
like everyone else.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v1 00/54] block: support multipage bvec
  2017-01-17  7:50       ` Christoph Hellwig
@ 2017-01-17  8:13         ` Ming Lei
  0 siblings, 0 replies; 43+ messages in thread
From: Ming Lei @ 2017-01-17  8:13 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Linux Kernel Mailing List, linux-block, Al Viro,
	Andrew Morton, Kent Overstreet

On Tue, Jan 17, 2017 at 3:50 PM, Christoph Hellwig <hch@infradead.org> wrote:
> On Tue, Jan 17, 2017 at 10:40:36AM +0800, Ming Lei wrote:
>> IMO, the only one left is raid(1/5/10) which can be dealt with by the
>> "NO_MP" flag.
>
> No, they can't in the long run.  They need to handle the bvecs just
> like everyone else.

OK, it looks like raid(1/5/10) is the blocker for multipage bvecs now.
I will study it a bit, and may need help from the raid guys.


Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2017-01-17  8:14 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1482854250-13481-1-git-send-email-tom.leiming@gmail.com>
2016-12-27 15:55 ` [PATCH v1 01/54] block: drbd: comment on direct access bvec table Ming Lei
2016-12-27 15:55 ` [PATCH v1 02/54] block: loop: comment on direct access to " Ming Lei
2016-12-27 15:55 ` [PATCH v1 03/54] kernel/power/swap.c: " Ming Lei
2016-12-27 15:55 ` [PATCH v1 04/54] mm: page_io.c: " Ming Lei
2016-12-27 15:55 ` [PATCH v1 05/54] fs/buffer: " Ming Lei
2016-12-27 15:55 ` [PATCH v1 06/54] f2fs: f2fs_read_end_io: " Ming Lei
2016-12-27 15:55 ` [PATCH v1 07/54] bcache: " Ming Lei
2016-12-30 16:56   ` Coly Li
2016-12-27 15:55 ` [PATCH v1 08/54] block: comment on bio_alloc_pages() Ming Lei
2016-12-30 10:40   ` Coly Li
2016-12-30 11:06   ` Coly Li
2016-12-27 15:55 ` [PATCH v1 09/54] block: comment on bio_iov_iter_get_pages() Ming Lei
2016-12-27 15:55 ` [PATCH v1 10/54] block: introduce flag QUEUE_FLAG_NO_MP Ming Lei
2016-12-27 15:56 ` [PATCH v1 11/54] md: set NO_MP for request queue of md Ming Lei
2016-12-27 15:56 ` [PATCH v1 12/54] dm: limit the max bio size as BIO_MAX_PAGES * PAGE_SIZE Ming Lei
2017-01-03 16:43   ` Mike Snitzer
2017-01-06  3:30     ` Ming Lei
2016-12-27 15:56 ` [PATCH v1 13/54] block: comments on bio_for_each_segment[_all] Ming Lei
2016-12-27 15:56 ` [PATCH v1 14/54] block: introduce multipage/single page bvec helpers Ming Lei
2016-12-27 15:56 ` [PATCH v1 15/54] block: implement sp version of bvec iterator helpers Ming Lei
2016-12-27 15:56 ` [PATCH v1 16/54] block: introduce bio_for_each_segment_mp() Ming Lei
2016-12-27 15:56 ` [PATCH v1 17/54] block: introduce bio_clone_sp() Ming Lei
2016-12-27 15:56 ` [PATCH v1 18/54] bvec_iter: introduce BVEC_ITER_ALL_INIT Ming Lei
2016-12-27 15:56 ` [PATCH v1 19/54] block: bounce: avoid direct access to bvec table Ming Lei
2016-12-27 15:56 ` [PATCH v1 20/54] block: bounce: don't access bio->bi_io_vec in copy_to_high_bio_irq Ming Lei
2016-12-27 15:56 ` [PATCH v1 21/54] block: introduce bio_can_convert_to_sp() Ming Lei
2016-12-27 15:56 ` [PATCH v1 22/54] block: bounce: convert multipage bvecs into singlepage Ming Lei
2016-12-27 15:56 ` [PATCH v1 23/54] bcache: handle bio_clone() & bvec updating for multipage bvecs Ming Lei
2016-12-30 11:01   ` Coly Li
2016-12-31 10:29     ` Ming Lei
2016-12-27 15:56 ` [PATCH v1 24/54] blk-merge: compute bio->bi_seg_front_size efficiently Ming Lei
2016-12-27 15:56 ` [PATCH v1 25/54] block: blk-merge: try to make front segments in full size Ming Lei
2016-12-27 15:56 ` [PATCH v1 26/54] block: blk-merge: remove unnecessary check Ming Lei
2016-12-27 15:56 ` [PATCH v1 27/54] block: use bio_for_each_segment_mp() to compute segments count Ming Lei
2016-12-27 15:56 ` [PATCH v1 28/54] block: use bio_for_each_segment_mp() to map sg Ming Lei
2016-12-27 15:56 ` [PATCH v1 29/54] block: introduce bvec_for_each_sp_bvec() Ming Lei
2016-12-27 15:56 ` [PATCH v1 30/54] block: bio: introduce single/multi page version of bio_for_each_segment_all() Ming Lei
2016-12-27 15:56 ` [PATCH v1 31/54] block: introduce bio_segments_all() Ming Lei
2017-01-16  3:19 ` [PATCH v1 00/54] block: support multipage bvec Ming Lei
2017-01-16 15:18   ` Christoph Hellwig
2017-01-17  2:40     ` Ming Lei
2017-01-17  7:50       ` Christoph Hellwig
2017-01-17  8:13         ` Ming Lei

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).