linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/27] block: cleanup direct access on .bi_vcnt & .bi_io_vec
@ 2016-04-05 11:56 Ming Lei
  2016-04-05 11:56 ` [PATCH 01/27] block: bio: introduce 4 helpers for cleanup Ming Lei
                   ` (13 more replies)
  0 siblings, 14 replies; 35+ messages in thread
From: Ming Lei @ 2016-04-05 11:56 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, Christoph Hellwig, Boaz Harrosh, Ming Lei, Al Viro,
	Andreas Dilger, Andrew Morton, open list:STAGING SUBSYSTEM,
	open list:DEVICE-MAPPER (LVM),
	open list:DRBD DRIVER, Frank Zago, Greg Kroah-Hartman,
	Hannes Reinecke, James Simmons, Jan Kara, Jarod Wilson,
	Jiri Kosina, Joe Perches, John L. Hammond, Julia Lawall,
	Keith Busch, Kent Overstreet,
	open list:BCACHE (BLOCK LAYER CACHE),
	open list:MEMORY MANAGEMENT, open list:SUSPEND TO RAM,
	open list:SOFTWARE RAID (Multiple Disks) SUPPORT,
	open list:TARGET SUBSYSTEM, open list:LogFS,
	moderated list:STAGING - LUSTRE PARALLEL FILESYSTEM,
	Mike Rapoport, Mike Snitzer, Miklos Szeredi, Minchan Kim,
	Ming Lin, NeilBrown, NeilBrown, Oleg Drokin, Omar Sandoval,
	Rasmus Villemoes, open list:TARGET SUBSYSTEM, Tejun Heo

Hi Guys,

It is always not a good practice to access bio->bi_vcnt and
bio->bi_io_vec from drivers directly. Also this kind of direct
access will cause trouble when converting to multipage bvecs.

The 1st patch introduces the following 4 bio helpers which can be
used inside drivers for avoiding direct access to .bi_vcnt and .bi_io_vec.

	bio_pages()
	bio_is_full()
	bio_get_base_vec()
	bio_set_vec_table()

Both bio_pages() and bio_is_full() can be easy to convert to
multipage bvecs.

For bio_get_base_vec() and bio_set_vec_table(), they are often used
during initializing a new bio or in case of single bvec bio. With the
two new helpers, it becomes quite easy to audit access to .bi_io_vec
and .bi_vcnt.

Most of the other patches use the 4 helpers to clean up most of direct
access to .bi_vcnt and .bi_io_vec from drivers, except for MD and btrfs,
which two subsystems will be done in the future. 

Also bio_add_page() is used in floppy, dm-crypt and fs/logfs to
avoiding direct access to .bi_vcnt & .bi_io_vec.

Thanks,
Ming

Ming Lei (27):
  block: bio: introduce 4 helpers for cleanup
  block: drbd: use bio_get_base_vec() to retrieve the 1st bvec
  block: drbd: remove impossible failure handling
  block: loop: use bio_get_base_vec() to retrive bvec table
  block: pktcdvd: use bio_get_base_vec() to retrive bvec table
  block: floppy: use bio_set_vec_table()
  block: floppy: use bio_add_page()
  staging: lustre: avoid to use bio->bi_vcnt directly
  target: use bio_is_full()
  bcache: debug: avoid to access .bi_io_vec directly
  bcache: io.c: use bio_set_vec_table
  bcache: journal.c: use bio_set_vec_table()
  bcache: movinggc: use bio_set_vec_table()
  bcache: writeback: use bio_set_vec_table()
  bcache: super: use bio_set_vec_table()
  bcache: super: use bio_get_base_vec
  dm: crypt: use bio_add_page()
  dm: dm-io.c: use bio_get_base_vec()
  dm: dm.c: replace 'bio->bi_vcnt == 1' with !bio_multiple_segments
  dm: dm-bufio.c: use bio_set_vec_table()
  fs: logfs: use bio_set_vec_table()
  fs: logfs: convert to bio_add_page() in sync_request()
  fs: logfs: use bio_add_page() in __bdev_writeseg()
  fs: logfs: use bio_add_page() in do_erase()
  fs: logfs: remove unnecesary check
  kernel/power/swap.c: use bio_get_base_vec()
  mm: page_io.c: use bio_get_base_vec()

 drivers/block/drbd/drbd_bitmap.c            |   4 +-
 drivers/block/drbd/drbd_receiver.c          |  14 +---
 drivers/block/floppy.c                      |   9 +--
 drivers/block/loop.c                        |   5 +-
 drivers/block/pktcdvd.c                     |   3 +-
 drivers/md/bcache/debug.c                   |  11 ++-
 drivers/md/bcache/io.c                      |   3 +-
 drivers/md/bcache/journal.c                 |   3 +-
 drivers/md/bcache/movinggc.c                |   6 +-
 drivers/md/bcache/super.c                   |  28 +++++---
 drivers/md/bcache/writeback.c               |   4 +-
 drivers/md/dm-bufio.c                       |   3 +-
 drivers/md/dm-crypt.c                       |   8 +--
 drivers/md/dm-io.c                          |   7 +-
 drivers/md/dm.c                             |   3 +-
 drivers/staging/lustre/lustre/llite/lloop.c |   9 +--
 drivers/target/target_core_pscsi.c          |   2 +-
 fs/logfs/dev_bdev.c                         | 107 +++++++++++-----------------
 include/linux/bio.h                         |  28 ++++++++
 kernel/power/swap.c                         |  10 ++-
 mm/page_io.c                                |  18 ++++-
 21 files changed, 156 insertions(+), 129 deletions(-)

-- 
1.9.1

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 01/27] block: bio: introduce 4 helpers for cleanup
  2016-04-05 11:56 [PATCH 00/27] block: cleanup direct access on .bi_vcnt & .bi_io_vec Ming Lei
@ 2016-04-05 11:56 ` Ming Lei
  2016-04-06  0:18   ` Kent Overstreet
  2016-04-05 11:56 ` [PATCH 02/27] block: drbd: use bio_get_base_vec() to retrieve the 1st bvec Ming Lei
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 35+ messages in thread
From: Ming Lei @ 2016-04-05 11:56 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, Christoph Hellwig, Boaz Harrosh, Ming Lei, Jan Kara,
	Kent Overstreet, Keith Busch, Tejun Heo, Mike Snitzer

Some drivers access bio->bi_vcnt and bio->bi_io_vec directly,
firstly it isn't a good practice, secondly it may cause trouble
for converting to multipage bvecs.

So this patches introduces 4 helpers for cleaning up this kind
of usage.

Both bio_pages() and bio_is_full() can be convertd to support
multipage bvecs easily.

For bio_get_base_vec() and bio_set_vec_table(), they are often
used during initializing a new bio or in case of single bvec
bio. With the two new helpers, it becomes easy to audit access
of .bi_io_vec and .bi_vcnt.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 include/linux/bio.h | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/include/linux/bio.h b/include/linux/bio.h
index 88bc64f..2179bc4 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -310,6 +310,34 @@ static inline void bio_clear_flag(struct bio *bio, unsigned int bit)
 	bio->bi_flags &= ~(1U << bit);
 }
 
+static inline bool bio_is_full(struct bio *bio)
+{
+	WARN_ONCE(bio_flagged(bio, BIO_CLONED), "cloned bio");
+
+	return bio->bi_vcnt >= bio->bi_max_vecs;
+}
+
+static inline struct bio_vec *bio_get_base_vec(struct bio *bio)
+{
+	return __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
+}
+
+/* This helper should be used for setting bvec table on a new bio */
+static inline void bio_set_vec_table(struct bio *bio, struct bio_vec *table,
+			      unsigned max_vecs)
+{
+	bio->bi_io_vec = table;
+	bio->bi_max_vecs = max_vecs;
+}
+
+/* For singlepage bvecs, one segment includes one page */
+static inline unsigned bio_pages(struct bio *bio)
+{
+	if (!bio_flagged(bio, BIO_CLONED))
+		return bio->bi_vcnt;
+	return bio_segments(bio);
+}
+
 static inline void bio_get_first_bvec(struct bio *bio, struct bio_vec *bv)
 {
 	*bv = bio_iovec(bio);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 02/27] block: drbd: use bio_get_base_vec() to retrieve the 1st bvec
  2016-04-05 11:56 [PATCH 00/27] block: cleanup direct access on .bi_vcnt & .bi_io_vec Ming Lei
  2016-04-05 11:56 ` [PATCH 01/27] block: bio: introduce 4 helpers for cleanup Ming Lei
@ 2016-04-05 11:56 ` Ming Lei
  2016-04-05 11:56 ` [PATCH 03/27] block: drbd: remove impossible failure handling Ming Lei
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 35+ messages in thread
From: Ming Lei @ 2016-04-05 11:56 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, Christoph Hellwig, Boaz Harrosh, Ming Lei,
	Philipp Reisner, Lars Ellenberg, open list:DRBD DRIVER

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/block/drbd/drbd_bitmap.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/block/drbd/drbd_bitmap.c b/drivers/block/drbd/drbd_bitmap.c
index 92d6fc0..ccbd1e0 100644
--- a/drivers/block/drbd/drbd_bitmap.c
+++ b/drivers/block/drbd/drbd_bitmap.c
@@ -938,7 +938,9 @@ static void drbd_bm_endio(struct bio *bio)
 	struct drbd_bm_aio_ctx *ctx = bio->bi_private;
 	struct drbd_device *device = ctx->device;
 	struct drbd_bitmap *b = device->bitmap;
-	unsigned int idx = bm_page_to_idx(bio->bi_io_vec[0].bv_page);
+	/* single bvec bio */
+	const struct bio_vec *bvec = bio_get_base_vec(bio);
+	unsigned int idx = bm_page_to_idx(bvec->bv_page);
 
 	if ((ctx->flags & BM_AIO_COPY_PAGES) == 0 &&
 	    !bm_test_page_unchanged(b->bm_pages[idx]))
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 03/27] block: drbd: remove impossible failure handling
  2016-04-05 11:56 [PATCH 00/27] block: cleanup direct access on .bi_vcnt & .bi_io_vec Ming Lei
  2016-04-05 11:56 ` [PATCH 01/27] block: bio: introduce 4 helpers for cleanup Ming Lei
  2016-04-05 11:56 ` [PATCH 02/27] block: drbd: use bio_get_base_vec() to retrieve the 1st bvec Ming Lei
@ 2016-04-05 11:56 ` Ming Lei
  2016-04-05 12:42   ` Lars Ellenberg
  2016-04-05 11:56 ` [PATCH 04/27] block: loop: use bio_get_base_vec() to retrive bvec table Ming Lei
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 35+ messages in thread
From: Ming Lei @ 2016-04-05 11:56 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, Christoph Hellwig, Boaz Harrosh, Ming Lei,
	Philipp Reisner, Lars Ellenberg, open list:DRBD DRIVER

For a non-cloned bio, bio_add_page() only returns failure when
the io vec table is full, but in that case, bio->bi_vcnt can't
be zero at all.

So remove the impossible failure handling.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/block/drbd/drbd_receiver.c | 14 +-------------
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c
index 050aaa1..1b0ed15 100644
--- a/drivers/block/drbd/drbd_receiver.c
+++ b/drivers/block/drbd/drbd_receiver.c
@@ -1465,20 +1465,8 @@ next_bio:
 
 	page_chain_for_each(page) {
 		unsigned len = min_t(unsigned, data_size, PAGE_SIZE);
-		if (!bio_add_page(bio, page, len, 0)) {
-			/* A single page must always be possible!
-			 * But in case it fails anyways,
-			 * we deal with it, and complain (below). */
-			if (bio->bi_vcnt == 0) {
-				drbd_err(device,
-					"bio_add_page failed for len=%u, "
-					"bi_vcnt=0 (bi_sector=%llu)\n",
-					len, (uint64_t)bio->bi_iter.bi_sector);
-				err = -ENOSPC;
-				goto fail;
-			}
+		if (!bio_add_page(bio, page, len, 0))
 			goto next_bio;
-		}
 		data_size -= len;
 		sector += len >> 9;
 		--nr_pages;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 04/27] block: loop: use bio_get_base_vec() to retrive bvec table
  2016-04-05 11:56 [PATCH 00/27] block: cleanup direct access on .bi_vcnt & .bi_io_vec Ming Lei
                   ` (2 preceding siblings ...)
  2016-04-05 11:56 ` [PATCH 03/27] block: drbd: remove impossible failure handling Ming Lei
@ 2016-04-05 11:56 ` Ming Lei
  2016-04-05 11:56 ` [PATCH 05/27] block: pktcdvd: " Ming Lei
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 35+ messages in thread
From: Ming Lei @ 2016-04-05 11:56 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, Christoph Hellwig, Boaz Harrosh, Ming Lei, Al Viro,
	Jarod Wilson, Tejun Heo, Miklos Szeredi, NeilBrown

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/block/loop.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 423f4ca..2a94d3bb 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -477,7 +477,7 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
 		     loff_t pos, bool rw)
 {
 	struct iov_iter iter;
-	struct bio_vec *bvec;
+	const struct bio_vec *bvec;
 	struct bio *bio = cmd->rq->bio;
 	struct file *file = lo->lo_backing_file;
 	int ret;
@@ -485,7 +485,8 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
 	/* nomerge for loop request queue */
 	WARN_ON(cmd->rq->bio != cmd->rq->biotail);
 
-	bvec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
+	/* passed to iterate_bvec() */
+	bvec = bio_get_base_vec(bio);
 	iov_iter_bvec(&iter, ITER_BVEC | rw, bvec,
 		      bio_segments(bio), blk_rq_bytes(cmd->rq));
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 05/27] block: pktcdvd: use bio_get_base_vec() to retrive bvec table
  2016-04-05 11:56 [PATCH 00/27] block: cleanup direct access on .bi_vcnt & .bi_io_vec Ming Lei
                   ` (3 preceding siblings ...)
  2016-04-05 11:56 ` [PATCH 04/27] block: loop: use bio_get_base_vec() to retrive bvec table Ming Lei
@ 2016-04-05 11:56 ` Ming Lei
  2016-04-05 11:56 ` [PATCH 06/27] block: floppy: use bio_set_vec_table() Ming Lei
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 35+ messages in thread
From: Ming Lei @ 2016-04-05 11:56 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, Christoph Hellwig, Boaz Harrosh, Ming Lei, Jiri Kosina

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/block/pktcdvd.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index d06c62e..8f37435 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -1298,7 +1298,8 @@ try_next_bio:
 static void pkt_start_write(struct pktcdvd_device *pd, struct packet_data *pkt)
 {
 	int f;
-	struct bio_vec *bvec = pkt->w_bio->bi_io_vec;
+	/* need to fix this usage after multipage bvecs */
+	struct bio_vec *bvec = bio_get_base_vec(pkt->w_bio);
 
 	bio_reset(pkt->w_bio);
 	pkt->w_bio->bi_iter.bi_sector = pkt->sector;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 06/27] block: floppy: use bio_set_vec_table()
  2016-04-05 11:56 [PATCH 00/27] block: cleanup direct access on .bi_vcnt & .bi_io_vec Ming Lei
                   ` (4 preceding siblings ...)
  2016-04-05 11:56 ` [PATCH 05/27] block: pktcdvd: " Ming Lei
@ 2016-04-05 11:56 ` Ming Lei
  2016-04-05 13:00   ` Christoph Hellwig
  2016-04-05 11:56 ` [PATCH 07/27] block: floppy: use bio_add_page() Ming Lei
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 35+ messages in thread
From: Ming Lei @ 2016-04-05 11:56 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, Christoph Hellwig, Boaz Harrosh, Ming Lei,
	Jiri Kosina, Hannes Reinecke, NeilBrown, Rasmus Villemoes

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/block/floppy.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
index 84708a5..b5b0e68 100644
--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -3811,7 +3811,7 @@ static int __floppy_read_block_0(struct block_device *bdev, int drive)
 	cbdata.drive = drive;
 
 	bio_init(&bio);
-	bio.bi_io_vec = &bio_vec;
+	bio_set_vec_table(&bio, &bio_vec, 1);
 	bio_vec.bv_page = page;
 	bio_vec.bv_len = size;
 	bio_vec.bv_offset = 0;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 07/27] block: floppy: use bio_add_page()
  2016-04-05 11:56 [PATCH 00/27] block: cleanup direct access on .bi_vcnt & .bi_io_vec Ming Lei
                   ` (5 preceding siblings ...)
  2016-04-05 11:56 ` [PATCH 06/27] block: floppy: use bio_set_vec_table() Ming Lei
@ 2016-04-05 11:56 ` Ming Lei
  2016-04-05 11:56 ` [PATCH 08/27] staging: lustre: avoid to use bio->bi_vcnt directly Ming Lei
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 35+ messages in thread
From: Ming Lei @ 2016-04-05 11:56 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, Christoph Hellwig, Boaz Harrosh, Ming Lei,
	Jiri Kosina, Rasmus Villemoes, NeilBrown, Hannes Reinecke

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/block/floppy.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
index b5b0e68..a093de0 100644
--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -3812,17 +3812,14 @@ static int __floppy_read_block_0(struct block_device *bdev, int drive)
 
 	bio_init(&bio);
 	bio_set_vec_table(&bio, &bio_vec, 1);
-	bio_vec.bv_page = page;
-	bio_vec.bv_len = size;
-	bio_vec.bv_offset = 0;
-	bio.bi_vcnt = 1;
-	bio.bi_iter.bi_size = size;
 	bio.bi_bdev = bdev;
 	bio.bi_iter.bi_sector = 0;
 	bio.bi_flags |= (1 << BIO_QUIET);
 	bio.bi_private = &cbdata;
 	bio.bi_end_io = floppy_rb0_cb;
 
+	bio_add_page(&bio, page, size, 0);
+
 	submit_bio(READ, &bio);
 	process_fd_request();
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 08/27] staging: lustre: avoid to use bio->bi_vcnt directly
  2016-04-05 11:56 [PATCH 00/27] block: cleanup direct access on .bi_vcnt & .bi_io_vec Ming Lei
                   ` (6 preceding siblings ...)
  2016-04-05 11:56 ` [PATCH 07/27] block: floppy: use bio_add_page() Ming Lei
@ 2016-04-05 11:56 ` Ming Lei
  2016-04-05 12:59   ` Greg Kroah-Hartman
  2016-04-05 13:01   ` Christoph Hellwig
  2016-04-05 11:56 ` [PATCH 09/27] target: use bio_is_full() Ming Lei
                   ` (5 subsequent siblings)
  13 siblings, 2 replies; 35+ messages in thread
From: Ming Lei @ 2016-04-05 11:56 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, Christoph Hellwig, Boaz Harrosh, Ming Lei,
	Oleg Drokin, Andreas Dilger, Greg Kroah-Hartman, John L. Hammond,
	Frank Zago, Mike Rapoport, James Simmons, Kent Overstreet,
	Julia Lawall, Al Viro,
	moderated list:STAGING - LUSTRE PARALLEL FILESYSTEM,
	open list:STAGING SUBSYSTEM

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/staging/lustre/lustre/llite/lloop.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/lloop.c b/drivers/staging/lustre/lustre/llite/lloop.c
index b725fc1..67323db 100644
--- a/drivers/staging/lustre/lustre/llite/lloop.c
+++ b/drivers/staging/lustre/lustre/llite/lloop.c
@@ -302,19 +302,20 @@ static unsigned int loop_get_bio(struct lloop_device *lo, struct bio **req)
 	}
 
 	/* TODO: need to split the bio, too bad. */
-	LASSERT(first->bi_vcnt <= LLOOP_MAX_SEGMENTS);
+	LASSERT(bio_pages(first) <= LLOOP_MAX_SEGMENTS);
 
 	rw = first->bi_rw;
 	bio = &lo->lo_bio;
 	while (*bio && (*bio)->bi_rw == rw) {
+		unsigned curr_cnt = bio_pages(*bio);
 		CDEBUG(D_INFO, "bio sector %llu size %u count %u vcnt%u\n",
 		       (unsigned long long)(*bio)->bi_iter.bi_sector,
 		       (*bio)->bi_iter.bi_size,
-		       page_count, (*bio)->bi_vcnt);
-		if (page_count + (*bio)->bi_vcnt > LLOOP_MAX_SEGMENTS)
+		       page_count, curr_cnt);
+		if (page_count + curr_cnt > LLOOP_MAX_SEGMENTS)
 			break;
 
-		page_count += (*bio)->bi_vcnt;
+		page_count += curr_cnt;
 		count++;
 		bio = &(*bio)->bi_next;
 	}
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 09/27] target: use bio_is_full()
  2016-04-05 11:56 [PATCH 00/27] block: cleanup direct access on .bi_vcnt & .bi_io_vec Ming Lei
                   ` (7 preceding siblings ...)
  2016-04-05 11:56 ` [PATCH 08/27] staging: lustre: avoid to use bio->bi_vcnt directly Ming Lei
@ 2016-04-05 11:56 ` Ming Lei
  2016-04-05 13:02   ` Christoph Hellwig
  2016-04-05 11:56 ` [PATCH 10/27] bcache: debug: avoid to access .bi_io_vec directly Ming Lei
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 35+ messages in thread
From: Ming Lei @ 2016-04-05 11:56 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, Christoph Hellwig, Boaz Harrosh, Ming Lei,
	Nicholas A. Bellinger, open list:TARGET SUBSYSTEM,
	open list:TARGET SUBSYSTEM

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/target/target_core_pscsi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/target/target_core_pscsi.c b/drivers/target/target_core_pscsi.c
index de18790..24906c3 100644
--- a/drivers/target/target_core_pscsi.c
+++ b/drivers/target/target_core_pscsi.c
@@ -951,7 +951,7 @@ pscsi_map_sg(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
 			pr_debug("PSCSI: bio->bi_vcnt: %d nr_vecs: %d\n",
 				bio->bi_vcnt, nr_vecs);
 
-			if (bio->bi_vcnt > nr_vecs) {
+			if (bio_is_full(bio)) {
 				pr_debug("PSCSI: Reached bio->bi_vcnt max:"
 					" %d i: %d bio: %p, allocating another"
 					" bio\n", bio->bi_vcnt, i, bio);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 10/27] bcache: debug: avoid to access .bi_io_vec directly
  2016-04-05 11:56 [PATCH 00/27] block: cleanup direct access on .bi_vcnt & .bi_io_vec Ming Lei
                   ` (8 preceding siblings ...)
  2016-04-05 11:56 ` [PATCH 09/27] target: use bio_is_full() Ming Lei
@ 2016-04-05 11:56 ` Ming Lei
  2016-04-05 11:56 ` [PATCH 11/27] bcache: io.c: use bio_set_vec_table Ming Lei
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 35+ messages in thread
From: Ming Lei @ 2016-04-05 11:56 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, Christoph Hellwig, Boaz Harrosh, Ming Lei,
	Kent Overstreet, Shaohua Li, open list:BCACHE (BLOCK LAYER CACHE),
	open list:SOFTWARE RAID (Multiple Disks) SUPPORT

Instead we use standard iterator way to do that.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/md/bcache/debug.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/md/bcache/debug.c b/drivers/md/bcache/debug.c
index 8b1f1d5..d1ad49d 100644
--- a/drivers/md/bcache/debug.c
+++ b/drivers/md/bcache/debug.c
@@ -106,8 +106,8 @@ void bch_data_verify(struct cached_dev *dc, struct bio *bio)
 {
 	char name[BDEVNAME_SIZE];
 	struct bio *check;
-	struct bio_vec bv, *bv2;
-	struct bvec_iter iter;
+	struct bio_vec bv, cbv, *bv2;
+	struct bvec_iter iter, citer = { 0 };
 	int i;
 
 	check = bio_clone(bio, GFP_NOIO);
@@ -119,9 +119,13 @@ void bch_data_verify(struct cached_dev *dc, struct bio *bio)
 
 	submit_bio_wait(READ_SYNC, check);
 
+	citer.bi_size = UINT_MAX;
 	bio_for_each_segment(bv, bio, iter) {
 		void *p1 = kmap_atomic(bv.bv_page);
-		void *p2 = page_address(check->bi_io_vec[iter.bi_idx].bv_page);
+		void *p2;
+
+		cbv = bio_iter_iovec(check, citer);
+		p2 = page_address(cbv.bv_page);
 
 		cache_set_err_on(memcmp(p1 + bv.bv_offset,
 					p2 + bv.bv_offset,
@@ -132,6 +136,7 @@ void bch_data_verify(struct cached_dev *dc, struct bio *bio)
 				 (uint64_t) bio->bi_iter.bi_sector);
 
 		kunmap_atomic(p1);
+		bio_advance_iter(check, &citer, bv.bv_len);
 	}
 
 	bio_for_each_segment_all(bv2, check, i)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 11/27] bcache: io.c: use bio_set_vec_table
  2016-04-05 11:56 [PATCH 00/27] block: cleanup direct access on .bi_vcnt & .bi_io_vec Ming Lei
                   ` (9 preceding siblings ...)
  2016-04-05 11:56 ` [PATCH 10/27] bcache: debug: avoid to access .bi_io_vec directly Ming Lei
@ 2016-04-05 11:56 ` Ming Lei
  2016-04-05 12:49   ` Christoph Hellwig
  2016-04-05 11:56 ` [PATCH 12/27] bcache: journal.c: use bio_set_vec_table() Ming Lei
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 35+ messages in thread
From: Ming Lei @ 2016-04-05 11:56 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, Christoph Hellwig, Boaz Harrosh, Ming Lei,
	Kent Overstreet, Shaohua Li, open list:BCACHE (BLOCK LAYER CACHE),
	open list:SOFTWARE RAID (Multiple Disks) SUPPORT

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/md/bcache/io.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/md/bcache/io.c b/drivers/md/bcache/io.c
index 86a0bb8..1c48462 100644
--- a/drivers/md/bcache/io.c
+++ b/drivers/md/bcache/io.c
@@ -26,8 +26,7 @@ struct bio *bch_bbio_alloc(struct cache_set *c)
 
 	bio_init(bio);
 	bio->bi_flags		|= BIO_POOL_NONE << BIO_POOL_OFFSET;
-	bio->bi_max_vecs	 = bucket_pages(c);
-	bio->bi_io_vec		 = bio->bi_inline_vecs;
+	bio_set_vec_table(bio, bio->bi_inline_vecs, bucket_pages(c));
 
 	return bio;
 }
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 12/27] bcache: journal.c: use bio_set_vec_table()
  2016-04-05 11:56 [PATCH 00/27] block: cleanup direct access on .bi_vcnt & .bi_io_vec Ming Lei
                   ` (10 preceding siblings ...)
  2016-04-05 11:56 ` [PATCH 11/27] bcache: io.c: use bio_set_vec_table Ming Lei
@ 2016-04-05 11:56 ` Ming Lei
  2016-04-05 11:56 ` [PATCH 13/27] bcache: movinggc: " Ming Lei
  2016-04-05 11:56 ` [PATCH 14/27] bcache: writeback: " Ming Lei
  13 siblings, 0 replies; 35+ messages in thread
From: Ming Lei @ 2016-04-05 11:56 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, Christoph Hellwig, Boaz Harrosh, Ming Lei,
	Kent Overstreet, Shaohua Li, open list:BCACHE (BLOCK LAYER CACHE),
	open list:SOFTWARE RAID (Multiple Disks) SUPPORT

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/md/bcache/journal.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index 29eba72..bf8924f 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -453,8 +453,7 @@ static void do_journal_discard(struct cache *ca)
 						ca->sb.d[ja->discard_idx]);
 		bio->bi_bdev		= ca->bdev;
 		bio->bi_rw		= REQ_WRITE|REQ_DISCARD;
-		bio->bi_max_vecs	= 1;
-		bio->bi_io_vec		= bio->bi_inline_vecs;
+		bio_set_vec_table(bio, bio->bi_inline_vecs, 1);
 		bio->bi_iter.bi_size	= bucket_bytes(ca);
 		bio->bi_end_io		= journal_discard_endio;
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 13/27] bcache: movinggc: use bio_set_vec_table()
  2016-04-05 11:56 [PATCH 00/27] block: cleanup direct access on .bi_vcnt & .bi_io_vec Ming Lei
                   ` (11 preceding siblings ...)
  2016-04-05 11:56 ` [PATCH 12/27] bcache: journal.c: use bio_set_vec_table() Ming Lei
@ 2016-04-05 11:56 ` Ming Lei
  2016-04-05 11:56 ` [PATCH 14/27] bcache: writeback: " Ming Lei
  13 siblings, 0 replies; 35+ messages in thread
From: Ming Lei @ 2016-04-05 11:56 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, Christoph Hellwig, Boaz Harrosh, Ming Lei,
	Kent Overstreet, Shaohua Li, open list:BCACHE (BLOCK LAYER CACHE),
	open list:SOFTWARE RAID (Multiple Disks) SUPPORT

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/md/bcache/movinggc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/md/bcache/movinggc.c b/drivers/md/bcache/movinggc.c
index b929fc9..dbe5af2 100644
--- a/drivers/md/bcache/movinggc.c
+++ b/drivers/md/bcache/movinggc.c
@@ -85,10 +85,10 @@ static void moving_init(struct moving_io *io)
 	bio_set_prio(bio, IOPRIO_PRIO_VALUE(IOPRIO_CLASS_IDLE, 0));
 
 	bio->bi_iter.bi_size	= KEY_SIZE(&io->w->key) << 9;
-	bio->bi_max_vecs	= DIV_ROUND_UP(KEY_SIZE(&io->w->key),
-					       PAGE_SECTORS);
 	bio->bi_private		= &io->cl;
-	bio->bi_io_vec		= bio->bi_inline_vecs;
+	bio_set_vec_table(bio, bio->bi_inline_vecs,
+			  DIV_ROUND_UP(KEY_SIZE(&io->w->key),
+			  PAGE_SECTORS));
 	bch_bio_map(bio, NULL);
 }
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 14/27] bcache: writeback: use bio_set_vec_table()
  2016-04-05 11:56 [PATCH 00/27] block: cleanup direct access on .bi_vcnt & .bi_io_vec Ming Lei
                   ` (12 preceding siblings ...)
  2016-04-05 11:56 ` [PATCH 13/27] bcache: movinggc: " Ming Lei
@ 2016-04-05 11:56 ` Ming Lei
  13 siblings, 0 replies; 35+ messages in thread
From: Ming Lei @ 2016-04-05 11:56 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, Christoph Hellwig, Boaz Harrosh, Ming Lei,
	Kent Overstreet, Shaohua Li, open list:BCACHE (BLOCK LAYER CACHE),
	open list:SOFTWARE RAID (Multiple Disks) SUPPORT

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/md/bcache/writeback.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index b9346cd..49a8f8a 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -112,9 +112,9 @@ static void dirty_init(struct keybuf_key *w)
 		bio_set_prio(bio, IOPRIO_PRIO_VALUE(IOPRIO_CLASS_IDLE, 0));
 
 	bio->bi_iter.bi_size	= KEY_SIZE(&w->key) << 9;
-	bio->bi_max_vecs	= DIV_ROUND_UP(KEY_SIZE(&w->key), PAGE_SECTORS);
 	bio->bi_private		= w;
-	bio->bi_io_vec		= bio->bi_inline_vecs;
+	bio_set_vec_table(bio, bio->bi_inline_vecs,
+			  DIV_ROUND_UP(KEY_SIZE(&w->key), PAGE_SECTORS));
 	bch_bio_map(bio, NULL);
 }
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH 03/27] block: drbd: remove impossible failure handling
  2016-04-05 11:56 ` [PATCH 03/27] block: drbd: remove impossible failure handling Ming Lei
@ 2016-04-05 12:42   ` Lars Ellenberg
  0 siblings, 0 replies; 35+ messages in thread
From: Lars Ellenberg @ 2016-04-05 12:42 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-kernel, linux-block, Christoph Hellwig,
	Boaz Harrosh, Philipp Reisner, drbd-dev

On Tue, Apr 05, 2016 at 07:56:48PM +0800, Ming Lei wrote:
> For a non-cloned bio, bio_add_page() only returns failure when
> the io vec table is full, but in that case, bio->bi_vcnt can't
> be zero at all.
> 
> So remove the impossible failure handling.

Before the immutable bvecs,
we did in fact see this trigger in the wild.
On "strange" deployments.

But for the current implementation of bio_add_page(),
you are correct, this is impossible now.

Ack.

Thanks,

    Lars

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 11/27] bcache: io.c: use bio_set_vec_table
  2016-04-05 11:56 ` [PATCH 11/27] bcache: io.c: use bio_set_vec_table Ming Lei
@ 2016-04-05 12:49   ` Christoph Hellwig
  2016-04-05 15:24     ` Ming Lei
  2016-04-06  0:35     ` Kent Overstreet
  0 siblings, 2 replies; 35+ messages in thread
From: Christoph Hellwig @ 2016-04-05 12:49 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-kernel, linux-block, Christoph Hellwig,
	Boaz Harrosh, Kent Overstreet, Shaohua Li,
	open list:BCACHE (BLOCK LAYER CACHE),
	open list:SOFTWARE RAID (Multiple Disks) SUPPORT

On Tue, Apr 05, 2016 at 07:56:56PM +0800, Ming Lei wrote:
> diff --git a/drivers/md/bcache/io.c b/drivers/md/bcache/io.c
> index 86a0bb8..1c48462 100644
> --- a/drivers/md/bcache/io.c
> +++ b/drivers/md/bcache/io.c
> @@ -26,8 +26,7 @@ struct bio *bch_bbio_alloc(struct cache_set *c)
>  
>  	bio_init(bio);
>  	bio->bi_flags		|= BIO_POOL_NONE << BIO_POOL_OFFSET;
> -	bio->bi_max_vecs	 = bucket_pages(c);
> -	bio->bi_io_vec		 = bio->bi_inline_vecs;
> +	bio_set_vec_table(bio, bio->bi_inline_vecs, bucket_pages(c));

All this bcache code needs to move away from bio_init on a bio
embedded in a driver private structure toward properly using
bio_alloc / bio_alloc_bioset.  That will also fix the crash
with bcache over md that Shaohua reported, so I'd suggest to fast
track this part of the series.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 08/27] staging: lustre: avoid to use bio->bi_vcnt directly
  2016-04-05 11:56 ` [PATCH 08/27] staging: lustre: avoid to use bio->bi_vcnt directly Ming Lei
@ 2016-04-05 12:59   ` Greg Kroah-Hartman
  2016-04-05 13:01   ` Christoph Hellwig
  1 sibling, 0 replies; 35+ messages in thread
From: Greg Kroah-Hartman @ 2016-04-05 12:59 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-kernel, open list:STAGING SUBSYSTEM,
	Christoph Hellwig, Julia Lawall, Andreas Dilger, Boaz Harrosh,
	Oleg Drokin, linux-block, Al Viro, James Simmons,
	John L. Hammond, Frank Zago, Kent Overstreet,
	moderated list:STAGING - LUSTRE PARALLEL FILESYSTEM

On Tue, Apr 05, 2016 at 07:56:53PM +0800, Ming Lei wrote:
> Signed-off-by: Ming Lei <tom.leiming@gmail.com>

A bit more of a commit message is always nice :)

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 06/27] block: floppy: use bio_set_vec_table()
  2016-04-05 11:56 ` [PATCH 06/27] block: floppy: use bio_set_vec_table() Ming Lei
@ 2016-04-05 13:00   ` Christoph Hellwig
  0 siblings, 0 replies; 35+ messages in thread
From: Christoph Hellwig @ 2016-04-05 13:00 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-kernel, linux-block, Christoph Hellwig,
	Boaz Harrosh, Jiri Kosina, Hannes Reinecke, NeilBrown,
	Rasmus Villemoes

Should be switch to bio_alloc.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 08/27] staging: lustre: avoid to use bio->bi_vcnt directly
  2016-04-05 11:56 ` [PATCH 08/27] staging: lustre: avoid to use bio->bi_vcnt directly Ming Lei
  2016-04-05 12:59   ` Greg Kroah-Hartman
@ 2016-04-05 13:01   ` Christoph Hellwig
  2016-04-10 14:37     ` James Simmons
  1 sibling, 1 reply; 35+ messages in thread
From: Christoph Hellwig @ 2016-04-05 13:01 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-kernel, linux-block, Christoph Hellwig,
	Boaz Harrosh, Oleg Drokin, Andreas Dilger, Greg Kroah-Hartman,
	John L. Hammond, Frank Zago, Mike Rapoport, James Simmons,
	Kent Overstreet, Julia Lawall, Al Viro,
	moderated list:STAGING - LUSTRE PARALLEL FILESYSTEM,
	open list:STAGING SUBSYSTEM

The lloop driver should be removed entirely - use the loop driver
instead.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 09/27] target: use bio_is_full()
  2016-04-05 11:56 ` [PATCH 09/27] target: use bio_is_full() Ming Lei
@ 2016-04-05 13:02   ` Christoph Hellwig
  2016-04-07  4:07     ` Ming Lei
  0 siblings, 1 reply; 35+ messages in thread
From: Christoph Hellwig @ 2016-04-05 13:02 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-kernel, linux-block, Christoph Hellwig,
	Boaz Harrosh, Nicholas A. Bellinger, open list:TARGET SUBSYSTEM,
	open list:TARGET SUBSYSTEM

On Tue, Apr 05, 2016 at 07:56:54PM +0800, Ming Lei wrote:
> +++ b/drivers/target/target_core_pscsi.c
> @@ -951,7 +951,7 @@ pscsi_map_sg(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
>  			pr_debug("PSCSI: bio->bi_vcnt: %d nr_vecs: %d\n",
>  				bio->bi_vcnt, nr_vecs);
>  
> -			if (bio->bi_vcnt > nr_vecs) {
> +			if (bio_is_full(bio)) {
>  				pr_debug("PSCSI: Reached bio->bi_vcnt max:"
>  					" %d i: %d bio: %p, allocating another"
>  					" bio\n", bio->bi_vcnt, i, bio);

This check should be removed entirely - bio_add_pc_page takes care of
it.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 11/27] bcache: io.c: use bio_set_vec_table
  2016-04-05 12:49   ` Christoph Hellwig
@ 2016-04-05 15:24     ` Ming Lei
  2016-04-05 17:31       ` Christoph Hellwig
  2016-04-06  0:35     ` Kent Overstreet
  1 sibling, 1 reply; 35+ messages in thread
From: Ming Lei @ 2016-04-05 15:24 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Linux Kernel Mailing List, linux-block, Boaz Harrosh,
	Kent Overstreet, Shaohua Li, open list:BCACHE (BLOCK LAYER CACHE),
	open list:SOFTWARE RAID (Multiple Disks) SUPPORT

On Tue, Apr 5, 2016 at 8:49 PM, Christoph Hellwig <hch@infradead.org> wrote:
> On Tue, Apr 05, 2016 at 07:56:56PM +0800, Ming Lei wrote:
>> diff --git a/drivers/md/bcache/io.c b/drivers/md/bcache/io.c
>> index 86a0bb8..1c48462 100644
>> --- a/drivers/md/bcache/io.c
>> +++ b/drivers/md/bcache/io.c
>> @@ -26,8 +26,7 @@ struct bio *bch_bbio_alloc(struct cache_set *c)
>>
>>       bio_init(bio);
>>       bio->bi_flags           |= BIO_POOL_NONE << BIO_POOL_OFFSET;
>> -     bio->bi_max_vecs         = bucket_pages(c);
>> -     bio->bi_io_vec           = bio->bi_inline_vecs;
>> +     bio_set_vec_table(bio, bio->bi_inline_vecs, bucket_pages(c));
>
> All this bcache code needs to move away from bio_init on a bio
> embedded in a driver private structure toward properly using
> bio_alloc / bio_alloc_bioset.  That will also fix the crash
> with bcache over md that Shaohua reported, so I'd suggest to fast
> track this part of the series.

I suggest to keep this usage for the following reasons:

- bio can be embedded into one biger instance, which is often allocated
dynamically, so one extra allocation for bio can be avoided.

- we should support arbitrary bio size by this way, at least bio_add_page()
supports this usage.  Also code gets lots of simplication with arbitrary bio
size support, such as prio_io(): bcache

BTW, the root cause for bcache crash still isn't clear now because
blk_bio_segment_split() should split big bio into proper size with
all queue's limits. Maybe the max segment limit isn't figured out correctly.

Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 11/27] bcache: io.c: use bio_set_vec_table
  2016-04-05 15:24     ` Ming Lei
@ 2016-04-05 17:31       ` Christoph Hellwig
  0 siblings, 0 replies; 35+ messages in thread
From: Christoph Hellwig @ 2016-04-05 17:31 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Jens Axboe, Linux Kernel Mailing List,
	linux-block, Boaz Harrosh, Kent Overstreet, Shaohua Li,
	open list:BCACHE (BLOCK LAYER CACHE),
	open list:SOFTWARE RAID (Multiple Disks) SUPPORT

On Tue, Apr 05, 2016 at 11:24:30PM +0800, Ming Lei wrote:
> - bio can be embedded into one biger instance, which is often allocated
> dynamically, so one extra allocation for bio can be avoided.

We can also do this the other way around with the bios front_pad,
which avoid the caller poking into bio details.

> - we should support arbitrary bio size by this way, at least bio_add_page()
> supports this usage.  Also code gets lots of simplication with arbitrary bio
> size support, such as prio_io(): bcache

There is no reason for not supporting huge bios in the core bio code,
in fact using bio_kmalloc you can already allocate huges bios
dynamically right now.  Except that you can't really use it, because the
layers below don't expect that.  Bios based drivers expect to be able to
call bio_clone and friends called on bios passed to them, and might
also make assumptions about the max number of bios segments for now.

> BTW, the root cause for bcache crash still isn't clear now because
> blk_bio_segment_split() should split big bio into proper size with
> all queue's limits. Maybe the max segment limit isn't figured out correctly.

The root cause is pretty simple:  The queue limits matter for request
based drivers, which are the only ones getting bios > BIO_MAX_PAGES
except for the buggy bcache use case.  You'll need to either adjust the
limit for all bio based drivers to or get rid of that one magic caller
not playing by the rules.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 01/27] block: bio: introduce 4 helpers for cleanup
  2016-04-05 11:56 ` [PATCH 01/27] block: bio: introduce 4 helpers for cleanup Ming Lei
@ 2016-04-06  0:18   ` Kent Overstreet
  2016-04-06  1:34     ` Ming Lei
  0 siblings, 1 reply; 35+ messages in thread
From: Kent Overstreet @ 2016-04-06  0:18 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-kernel, linux-block, Christoph Hellwig,
	Boaz Harrosh, Jan Kara, Keith Busch, Tejun Heo, Mike Snitzer

On Tue, Apr 05, 2016 at 07:56:46PM +0800, Ming Lei wrote:
> Some drivers access bio->bi_vcnt and bio->bi_io_vec directly,
> firstly it isn't a good practice, secondly it may cause trouble
> for converting to multipage bvecs.

"not good practice" is OO bullshit snake oil without more justification. We
don't plaster accessors everywhere without an actual reason.

How would it cause trouble with multipage bvecs?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 11/27] bcache: io.c: use bio_set_vec_table
  2016-04-05 12:49   ` Christoph Hellwig
  2016-04-05 15:24     ` Ming Lei
@ 2016-04-06  0:35     ` Kent Overstreet
  1 sibling, 0 replies; 35+ messages in thread
From: Kent Overstreet @ 2016-04-06  0:35 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Ming Lei, Jens Axboe, linux-kernel, linux-block, Boaz Harrosh,
	Shaohua Li, open list:BCACHE (BLOCK LAYER CACHE),
	open list:SOFTWARE RAID (Multiple Disks) SUPPORT

On Tue, Apr 05, 2016 at 05:49:02AM -0700, Christoph Hellwig wrote:
> On Tue, Apr 05, 2016 at 07:56:56PM +0800, Ming Lei wrote:
> > diff --git a/drivers/md/bcache/io.c b/drivers/md/bcache/io.c
> > index 86a0bb8..1c48462 100644
> > --- a/drivers/md/bcache/io.c
> > +++ b/drivers/md/bcache/io.c
> > @@ -26,8 +26,7 @@ struct bio *bch_bbio_alloc(struct cache_set *c)
> >  
> >  	bio_init(bio);
> >  	bio->bi_flags		|= BIO_POOL_NONE << BIO_POOL_OFFSET;
> > -	bio->bi_max_vecs	 = bucket_pages(c);
> > -	bio->bi_io_vec		 = bio->bi_inline_vecs;
> > +	bio_set_vec_table(bio, bio->bi_inline_vecs, bucket_pages(c));
> 
> All this bcache code needs to move away from bio_init on a bio
> embedded in a driver private structure toward properly using
> bio_alloc / bio_alloc_bioset.  That will also fix the crash
> with bcache over md that Shaohua reported, so I'd suggest to fast
> track this part of the series.

Why?

bio_init() is a publicly exported function, it's always been one and bcache is
ot the only driver to use it directly.

bios with > BIO_MAX_PAGES bvecs is a separate issue; I would argue that the bug
is in md's queue_limits; it uses blk_set_stacking_limits() which sets
max_segments = USHRT_MAX, which is wrong if it's going to clone the biovec.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 01/27] block: bio: introduce 4 helpers for cleanup
  2016-04-06  0:18   ` Kent Overstreet
@ 2016-04-06  1:34     ` Ming Lei
  2016-04-06  1:46       ` Kent Overstreet
  0 siblings, 1 reply; 35+ messages in thread
From: Ming Lei @ 2016-04-06  1:34 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Jens Axboe, Linux Kernel Mailing List, linux-block,
	Christoph Hellwig, Boaz Harrosh, Jan Kara, Keith Busch,
	Tejun Heo, Mike Snitzer

On Wed, Apr 6, 2016 at 8:18 AM, Kent Overstreet
<kent.overstreet@gmail.com> wrote:
> On Tue, Apr 05, 2016 at 07:56:46PM +0800, Ming Lei wrote:
>> Some drivers access bio->bi_vcnt and bio->bi_io_vec directly,
>> firstly it isn't a good practice, secondly it may cause trouble
>> for converting to multipage bvecs.
>
> "not good practice" is OO bullshit snake oil without more justification. We
> don't plaster accessors everywhere without an actual reason.
>
> How would it cause trouble with multipage bvecs?

Simply speaking, the current drivers may depend on .bi_vcnt for
computing how many page there are in one bio. After multipage bvecs,
it is not true any more. Isn't it a actual reason?


Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 01/27] block: bio: introduce 4 helpers for cleanup
  2016-04-06  1:34     ` Ming Lei
@ 2016-04-06  1:46       ` Kent Overstreet
  2016-04-06  2:11         ` Ming Lei
  0 siblings, 1 reply; 35+ messages in thread
From: Kent Overstreet @ 2016-04-06  1:46 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Linux Kernel Mailing List, linux-block,
	Christoph Hellwig, Boaz Harrosh, Jan Kara, Keith Busch,
	Tejun Heo, Mike Snitzer

On Wed, Apr 06, 2016 at 09:34:34AM +0800, Ming Lei wrote:
> On Wed, Apr 6, 2016 at 8:18 AM, Kent Overstreet
> <kent.overstreet@gmail.com> wrote:
> > On Tue, Apr 05, 2016 at 07:56:46PM +0800, Ming Lei wrote:
> >> Some drivers access bio->bi_vcnt and bio->bi_io_vec directly,
> >> firstly it isn't a good practice, secondly it may cause trouble
> >> for converting to multipage bvecs.
> >
> > "not good practice" is OO bullshit snake oil without more justification. We
> > don't plaster accessors everywhere without an actual reason.
> >
> > How would it cause trouble with multipage bvecs?
> 
> Simply speaking, the current drivers may depend on .bi_vcnt for
> computing how many page there are in one bio. After multipage bvecs,
> it is not true any more. Isn't it a actual reason?

But it's completely valid to use bi_vcnt for segments, which is what it's always
_really_ meant anyways.

Sometimes you have cases where the meaning of a member changes significantly
enough that you really don't want code using it accidentally anymore - like with
Jens' patches that changed how bi_remaining and bi_cnt work, but after those
patches it really wasn't correct to use those members directly anymore so he
renamed them to prevent that.

I don't buy that that's the case for multipage bvecs - the meaning of bi_vcnt
itself isn't changing (it's just the number of entries in the array!) and it'll
still be possible for code to correctly use it directly.

Same with bio->bi_io_vec, it's still an array of biovecs, that's not changing.
Your helpers are at the wrong level of abstraction.

Also, there isn't a huge number of bi_vcnt references in the kernel anyways -
the immutable biovec work required removing most of them.

Instead of adding these low level accessors, it'd be better to convert code to
higher level helpers (especially bio_add_page()) where applicable.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 01/27] block: bio: introduce 4 helpers for cleanup
  2016-04-06  1:46       ` Kent Overstreet
@ 2016-04-06  2:11         ` Ming Lei
  2016-04-06  2:21           ` Kent Overstreet
  0 siblings, 1 reply; 35+ messages in thread
From: Ming Lei @ 2016-04-06  2:11 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Jens Axboe, Linux Kernel Mailing List, linux-block,
	Christoph Hellwig, Boaz Harrosh, Jan Kara, Keith Busch,
	Tejun Heo, Mike Snitzer

On Wed, Apr 6, 2016 at 9:46 AM, Kent Overstreet
<kent.overstreet@gmail.com> wrote:
> On Wed, Apr 06, 2016 at 09:34:34AM +0800, Ming Lei wrote:
>> On Wed, Apr 6, 2016 at 8:18 AM, Kent Overstreet
>> <kent.overstreet@gmail.com> wrote:
>> > On Tue, Apr 05, 2016 at 07:56:46PM +0800, Ming Lei wrote:
>> >> Some drivers access bio->bi_vcnt and bio->bi_io_vec directly,
>> >> firstly it isn't a good practice, secondly it may cause trouble
>> >> for converting to multipage bvecs.
>> >
>> > "not good practice" is OO bullshit snake oil without more justification. We
>> > don't plaster accessors everywhere without an actual reason.
>> >
>> > How would it cause trouble with multipage bvecs?
>>
>> Simply speaking, the current drivers may depend on .bi_vcnt for
>> computing how many page there are in one bio. After multipage bvecs,
>> it is not true any more. Isn't it a actual reason?
>
> But it's completely valid to use bi_vcnt for segments, which is what it's always
> _really_ meant anyways.

Previously drivers may be confused with segment and page, so they just thought
segment is same with page. The situation will change after multipage bvecs
is introduced.

Drivers may loop over .bi_io_vec and .bi_vcnt for accessing each pages.
(pktcdvd, staging: lustre, raid,...)

It isn't practical to fix all these drivers before introducing multipage bvecs.
Meantime we can't cause regressions with multipage bvecs. But we can
disable multipage bvecs for some insane drivers if they insist on their
misusing.

With these helpers, it is easy to audit drivers about their access to
.bi_vcnt & .bi_io_vec.

>
> Sometimes you have cases where the meaning of a member changes significantly
> enough that you really don't want code using it accidentally anymore - like with
> Jens' patches that changed how bi_remaining and bi_cnt work, but after those
> patches it really wasn't correct to use those members directly anymore so he
> renamed them to prevent that.
>
> I don't buy that that's the case for multipage bvecs - the meaning of bi_vcnt
> itself isn't changing (it's just the number of entries in the array!) and it'll

It depends on view, from driver's view, they have changed significantly enough.

> still be possible for code to correctly use it directly.
>
> Same with bio->bi_io_vec, it's still an array of biovecs, that's not changing.
> Your helpers are at the wrong level of abstraction.
>
> Also, there isn't a huge number of bi_vcnt references in the kernel anyways -
> the immutable biovec work required removing most of them.

After this ptach is applied, only btrfs and md are left with these references.

For btrfs, we still need to audit each usage and try to clean them up.
For md, we can't enable multipage bvecs for them until all these usage
are cleaned up or audited.

>
> Instead of adding these low level accessors, it'd be better to convert code to
> higher level helpers (especially bio_add_page()) where applicable.

That is always the better way to use bio_add_page(). but sometimes
both .bi_vcnt and .bi_io_vec is used not for adding page to bio.



Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 01/27] block: bio: introduce 4 helpers for cleanup
  2016-04-06  2:11         ` Ming Lei
@ 2016-04-06  2:21           ` Kent Overstreet
  2016-04-06  4:11             ` Ming Lei
  0 siblings, 1 reply; 35+ messages in thread
From: Kent Overstreet @ 2016-04-06  2:21 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Linux Kernel Mailing List, linux-block,
	Christoph Hellwig, Boaz Harrosh, Jan Kara, Keith Busch,
	Tejun Heo, Mike Snitzer

On Wed, Apr 06, 2016 at 10:11:27AM +0800, Ming Lei wrote:
> On Wed, Apr 6, 2016 at 9:46 AM, Kent Overstreet
> <kent.overstreet@gmail.com> wrote:
> > On Wed, Apr 06, 2016 at 09:34:34AM +0800, Ming Lei wrote:
> >> On Wed, Apr 6, 2016 at 8:18 AM, Kent Overstreet
> >> <kent.overstreet@gmail.com> wrote:
> >> > On Tue, Apr 05, 2016 at 07:56:46PM +0800, Ming Lei wrote:
> >> >> Some drivers access bio->bi_vcnt and bio->bi_io_vec directly,
> >> >> firstly it isn't a good practice, secondly it may cause trouble
> >> >> for converting to multipage bvecs.
> >> >
> >> > "not good practice" is OO bullshit snake oil without more justification. We
> >> > don't plaster accessors everywhere without an actual reason.
> >> >
> >> > How would it cause trouble with multipage bvecs?
> >>
> >> Simply speaking, the current drivers may depend on .bi_vcnt for
> >> computing how many page there are in one bio. After multipage bvecs,
> >> it is not true any more. Isn't it a actual reason?
> >
> > But it's completely valid to use bi_vcnt for segments, which is what it's always
> > _really_ meant anyways.
> 
> Previously drivers may be confused with segment and page, so they just thought
> segment is same with page. The situation will change after multipage bvecs
> is introduced.
> 
> Drivers may loop over .bi_io_vec and .bi_vcnt for accessing each pages.
> (pktcdvd, staging: lustre, raid,...)
> 
> It isn't practical to fix all these drivers before introducing multipage bvecs.
> Meantime we can't cause regressions with multipage bvecs. But we can
> disable multipage bvecs for some insane drivers if they insist on their
> misusing.

No - it is both practical and IMO _required_ to convert those drivers to
bio_for_each_segment() or bio_for_each_page() as appropriate, before multipage
bvecs.

Especially code that needs pages and segments _has_ to be converted before
multipage bvecs.

If you'll recall looking at my various patch series from way back, especially
around immutable biovecs - most of the work was in converting drivers, not the
actual implementation (and I got rid of a more bi_io_vec/bi_vcnt uses than you
have left, so honestly there's no excuse for not doing it right).

> With these helpers, it is easy to audit drivers about their access to
> .bi_vcnt & .bi_io_vec.

It's easy to grep for those uses now!

> After this ptach is applied, only btrfs and md are left with these references.
> 
> For btrfs, we still need to audit each usage and try to clean them up.
> For md, we can't enable multipage bvecs for them until all these usage
> are cleaned up or audited.

Cleaning up those should be your focus now, not adding these helpers. You don't
need these patches to go in to tell you what needs to be cleaned up, we already
know wha thas to be done.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 01/27] block: bio: introduce 4 helpers for cleanup
  2016-04-06  2:21           ` Kent Overstreet
@ 2016-04-06  4:11             ` Ming Lei
  0 siblings, 0 replies; 35+ messages in thread
From: Ming Lei @ 2016-04-06  4:11 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Jens Axboe, Linux Kernel Mailing List, linux-block,
	Christoph Hellwig, Boaz Harrosh, Jan Kara, Keith Busch,
	Tejun Heo, Mike Snitzer

On Wed, Apr 6, 2016 at 10:21 AM, Kent Overstreet
<kent.overstreet@gmail.com> wrote:
> On Wed, Apr 06, 2016 at 10:11:27AM +0800, Ming Lei wrote:
>> On Wed, Apr 6, 2016 at 9:46 AM, Kent Overstreet
>> <kent.overstreet@gmail.com> wrote:
>> > On Wed, Apr 06, 2016 at 09:34:34AM +0800, Ming Lei wrote:
>> >> On Wed, Apr 6, 2016 at 8:18 AM, Kent Overstreet
>> >> <kent.overstreet@gmail.com> wrote:
>> >> > On Tue, Apr 05, 2016 at 07:56:46PM +0800, Ming Lei wrote:
>> >> >> Some drivers access bio->bi_vcnt and bio->bi_io_vec directly,
>> >> >> firstly it isn't a good practice, secondly it may cause trouble
>> >> >> for converting to multipage bvecs.
>> >> >
>> >> > "not good practice" is OO bullshit snake oil without more justification. We
>> >> > don't plaster accessors everywhere without an actual reason.
>> >> >
>> >> > How would it cause trouble with multipage bvecs?
>> >>
>> >> Simply speaking, the current drivers may depend on .bi_vcnt for
>> >> computing how many page there are in one bio. After multipage bvecs,
>> >> it is not true any more. Isn't it a actual reason?
>> >
>> > But it's completely valid to use bi_vcnt for segments, which is what it's always
>> > _really_ meant anyways.
>>
>> Previously drivers may be confused with segment and page, so they just thought
>> segment is same with page. The situation will change after multipage bvecs
>> is introduced.
>>
>> Drivers may loop over .bi_io_vec and .bi_vcnt for accessing each pages.
>> (pktcdvd, staging: lustre, raid,...)
>>
>> It isn't practical to fix all these drivers before introducing multipage bvecs.
>> Meantime we can't cause regressions with multipage bvecs. But we can
>> disable multipage bvecs for some insane drivers if they insist on their
>> misusing.
>
> No - it is both practical and IMO _required_ to convert those drivers to
> bio_for_each_segment() or bio_for_each_page() as appropriate, before multipage
> bvecs.
>
> Especially code that needs pages and segments _has_ to be converted before
> multipage bvecs.
>
> If you'll recall looking at my various patch series from way back, especially
> around immutable biovecs - most of the work was in converting drivers, not the
> actual implementation (and I got rid of a more bi_io_vec/bi_vcnt uses than you
> have left, so honestly there's no excuse for not doing it right).

Looks your style for new featue is the following way:
- convert all drivers to new interface
- convert core code to new feature and enable it

My style is:
- if driver is easy to convert, then take new interface; othewise just leave it
alone without using new feature
- convert core code to new feature and enable it

I don't want to discuss which way is better.

But my way just introduces change to driver as few as possible, and
I try to avoid regression becasue I don't want to change code hugely
without detailed test.

That is why you can see the change to driver in this patchset is just
a little.

Thanks,

>
>> With these helpers, it is easy to audit drivers about their access to
>> .bi_vcnt & .bi_io_vec.
>
> It's easy to grep for those uses now!
>
>> After this ptach is applied, only btrfs and md are left with these references.
>>
>> For btrfs, we still need to audit each usage and try to clean them up.
>> For md, we can't enable multipage bvecs for them until all these usage
>> are cleaned up or audited.
>
> Cleaning up those should be your focus now, not adding these helpers. You don't
> need these patches to go in to tell you what needs to be cleaned up, we already
> know wha thas to be done.




-- 
Ming Lei

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 09/27] target: use bio_is_full()
  2016-04-05 13:02   ` Christoph Hellwig
@ 2016-04-07  4:07     ` Ming Lei
  0 siblings, 0 replies; 35+ messages in thread
From: Ming Lei @ 2016-04-07  4:07 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Linux Kernel Mailing List, linux-block, Boaz Harrosh,
	Nicholas A. Bellinger, open list:TARGET SUBSYSTEM,
	open list:TARGET SUBSYSTEM

On Tue, Apr 5, 2016 at 9:02 PM, Christoph Hellwig <hch@infradead.org> wrote:
> On Tue, Apr 05, 2016 at 07:56:54PM +0800, Ming Lei wrote:
>> +++ b/drivers/target/target_core_pscsi.c
>> @@ -951,7 +951,7 @@ pscsi_map_sg(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
>>                       pr_debug("PSCSI: bio->bi_vcnt: %d nr_vecs: %d\n",
>>                               bio->bi_vcnt, nr_vecs);
>>
>> -                     if (bio->bi_vcnt > nr_vecs) {
>> +                     if (bio_is_full(bio)) {
>>                               pr_debug("PSCSI: Reached bio->bi_vcnt max:"
>>                                       " %d i: %d bio: %p, allocating another"
>>                                       " bio\n", bio->bi_vcnt, i, bio);
>
> This check should be removed entirely - bio_add_pc_page takes care of
> it.

OK.


-- 
Ming Lei

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 08/27] staging: lustre: avoid to use bio->bi_vcnt directly
  2016-04-05 13:01   ` Christoph Hellwig
@ 2016-04-10 14:37     ` James Simmons
  2016-04-10 14:41       ` Christoph Hellwig
  0 siblings, 1 reply; 35+ messages in thread
From: James Simmons @ 2016-04-10 14:37 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Ming Lei, Jens Axboe, linux-kernel, linux-block, Boaz Harrosh,
	Oleg Drokin, Andreas Dilger, Greg Kroah-Hartman, John L. Hammond,
	Frank Zago, Mike Rapoport, Kent Overstreet, Julia Lawall,
	Al Viro, moderated list:STAGING - LUSTRE PARALLEL FILESYSTEM,
	open list:STAGING SUBSYSTEM


> The lloop driver should be removed entirely - use the loop driver
> instead.

I talked with Andreas last week at our annual Lustre users group meeting 
about this. The reason I was told for existance is that some users were
using files on a Lustre file system with the loop back device. The 
performance was really bad at the time so a lloop was developed to 
overcome those limitations. Its been a long time so perhaps its time
to look at the default loop driver again to see if can perform now. If
it doesn't we will go the route of reworking the lloop driver in the
spirit of the cryptoloop device.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 08/27] staging: lustre: avoid to use bio->bi_vcnt directly
  2016-04-10 14:37     ` James Simmons
@ 2016-04-10 14:41       ` Christoph Hellwig
  2016-04-10 16:02         ` James Simmons
  0 siblings, 1 reply; 35+ messages in thread
From: Christoph Hellwig @ 2016-04-10 14:41 UTC (permalink / raw)
  To: James Simmons
  Cc: Christoph Hellwig, Ming Lei, Jens Axboe, linux-kernel,
	linux-block, Boaz Harrosh, Oleg Drokin, Andreas Dilger,
	Greg Kroah-Hartman, John L. Hammond, Frank Zago, Mike Rapoport,
	Kent Overstreet, Julia Lawall, Al Viro,
	moderated list:STAGING - LUSTRE PARALLEL FILESYSTEM,
	open list:STAGING SUBSYSTEM

On Sun, Apr 10, 2016 at 03:37:42PM +0100, James Simmons wrote:
> 
> > The lloop driver should be removed entirely - use the loop driver
> > instead.
> 
> I talked with Andreas last week at our annual Lustre users group meeting 
> about this. The reason I was told for existance is that some users were
> using files on a Lustre file system with the loop back device. The 
> performance was really bad at the time so a lloop was developed to 
> overcome those limitations. Its been a long time so perhaps its time
> to look at the default loop driver again to see if can perform now. If
> it doesn't we will go the route of reworking the lloop driver in the
> spirit of the cryptoloop device.

The loop driver now supports using AIO/DIO on any file systems that
implements ->read_iter and ->write_iter. If lustre doesn't support
those or doesn't have proper performance using them it should be
addressed in the file system.

Note that the dio mode in the loop device is not the default and you
need to manually enabled it, keep that in mind when testing.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 08/27] staging: lustre: avoid to use bio->bi_vcnt directly
  2016-04-10 14:41       ` Christoph Hellwig
@ 2016-04-10 16:02         ` James Simmons
  2016-04-11  3:30           ` Ming Lei
  0 siblings, 1 reply; 35+ messages in thread
From: James Simmons @ 2016-04-10 16:02 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Ming Lei, Jens Axboe, linux-kernel, linux-block, Boaz Harrosh,
	Oleg Drokin, Andreas Dilger, Greg Kroah-Hartman, John L. Hammond,
	Frank Zago, Mike Rapoport, Kent Overstreet, Julia Lawall,
	Al Viro, moderated list:STAGING - LUSTRE PARALLEL FILESYSTEM,
	open list:STAGING SUBSYSTEM


> On Sun, Apr 10, 2016 at 03:37:42PM +0100, James Simmons wrote:
> > 
> > > The lloop driver should be removed entirely - use the loop driver
> > > instead.
> > 
> > I talked with Andreas last week at our annual Lustre users group meeting 
> > about this. The reason I was told for existance is that some users were
> > using files on a Lustre file system with the loop back device. The 
> > performance was really bad at the time so a lloop was developed to 
> > overcome those limitations. Its been a long time so perhaps its time
> > to look at the default loop driver again to see if can perform now. If
> > it doesn't we will go the route of reworking the lloop driver in the
> > spirit of the cryptoloop device.
> 
> The loop driver now supports using AIO/DIO on any file systems that
> implements ->read_iter and ->write_iter. If lustre doesn't support
> those or doesn't have proper performance using them it should be
> addressed in the file system.
> 
> Note that the dio mode in the loop device is not the default and you
> need to manually enabled it, keep that in mind when testing.

This is excellent news. The only sad thing is that most lustre users
are running distros that use kernels before the AIO/DIO enhancements
were landed :-( We will have to keep a copy around for those guys. But
first I need to test the performance of the loop back driver this
week before this can be dropped.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 08/27] staging: lustre: avoid to use bio->bi_vcnt directly
  2016-04-10 16:02         ` James Simmons
@ 2016-04-11  3:30           ` Ming Lei
  0 siblings, 0 replies; 35+ messages in thread
From: Ming Lei @ 2016-04-11  3:30 UTC (permalink / raw)
  To: James Simmons
  Cc: Christoph Hellwig, Jens Axboe, Linux Kernel Mailing List,
	linux-block, Boaz Harrosh, Oleg Drokin, Andreas Dilger,
	Greg Kroah-Hartman, John L. Hammond, Frank Zago, Mike Rapoport,
	Kent Overstreet, Julia Lawall, Al Viro,
	moderated list:STAGING - LUSTRE PARALLEL FILESYSTEM,
	open list:STAGING SUBSYSTEM

On Mon, Apr 11, 2016 at 12:02 AM, James Simmons <jsimmons@infradead.org> wrote:
>
>> On Sun, Apr 10, 2016 at 03:37:42PM +0100, James Simmons wrote:
>> >
>> > > The lloop driver should be removed entirely - use the loop driver
>> > > instead.
>> >
>> > I talked with Andreas last week at our annual Lustre users group meeting
>> > about this. The reason I was told for existance is that some users were
>> > using files on a Lustre file system with the loop back device. The
>> > performance was really bad at the time so a lloop was developed to
>> > overcome those limitations. Its been a long time so perhaps its time
>> > to look at the default loop driver again to see if can perform now. If
>> > it doesn't we will go the route of reworking the lloop driver in the
>> > spirit of the cryptoloop device.
>>
>> The loop driver now supports using AIO/DIO on any file systems that
>> implements ->read_iter and ->write_iter. If lustre doesn't support
>> those or doesn't have proper performance using them it should be
>> addressed in the file system.
>>
>> Note that the dio mode in the loop device is not the default and you
>> need to manually enabled it, keep that in mind when testing.
>
> This is excellent news. The only sad thing is that most lustre users
> are running distros that use kernels before the AIO/DIO enhancements
> were landed :-( We will have to keep a copy around for those guys. But
> first I need to test the performance of the loop back driver this
> week before this can be dropped.

Considered that this cleanup patch for lustre loop is quite simple and
straightforward, I suggest to keep this cleanup patch as so and do the
dropping in another patchset. Christoph, are you OK with that?

Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2016-04-11  3:30 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-05 11:56 [PATCH 00/27] block: cleanup direct access on .bi_vcnt & .bi_io_vec Ming Lei
2016-04-05 11:56 ` [PATCH 01/27] block: bio: introduce 4 helpers for cleanup Ming Lei
2016-04-06  0:18   ` Kent Overstreet
2016-04-06  1:34     ` Ming Lei
2016-04-06  1:46       ` Kent Overstreet
2016-04-06  2:11         ` Ming Lei
2016-04-06  2:21           ` Kent Overstreet
2016-04-06  4:11             ` Ming Lei
2016-04-05 11:56 ` [PATCH 02/27] block: drbd: use bio_get_base_vec() to retrieve the 1st bvec Ming Lei
2016-04-05 11:56 ` [PATCH 03/27] block: drbd: remove impossible failure handling Ming Lei
2016-04-05 12:42   ` Lars Ellenberg
2016-04-05 11:56 ` [PATCH 04/27] block: loop: use bio_get_base_vec() to retrive bvec table Ming Lei
2016-04-05 11:56 ` [PATCH 05/27] block: pktcdvd: " Ming Lei
2016-04-05 11:56 ` [PATCH 06/27] block: floppy: use bio_set_vec_table() Ming Lei
2016-04-05 13:00   ` Christoph Hellwig
2016-04-05 11:56 ` [PATCH 07/27] block: floppy: use bio_add_page() Ming Lei
2016-04-05 11:56 ` [PATCH 08/27] staging: lustre: avoid to use bio->bi_vcnt directly Ming Lei
2016-04-05 12:59   ` Greg Kroah-Hartman
2016-04-05 13:01   ` Christoph Hellwig
2016-04-10 14:37     ` James Simmons
2016-04-10 14:41       ` Christoph Hellwig
2016-04-10 16:02         ` James Simmons
2016-04-11  3:30           ` Ming Lei
2016-04-05 11:56 ` [PATCH 09/27] target: use bio_is_full() Ming Lei
2016-04-05 13:02   ` Christoph Hellwig
2016-04-07  4:07     ` Ming Lei
2016-04-05 11:56 ` [PATCH 10/27] bcache: debug: avoid to access .bi_io_vec directly Ming Lei
2016-04-05 11:56 ` [PATCH 11/27] bcache: io.c: use bio_set_vec_table Ming Lei
2016-04-05 12:49   ` Christoph Hellwig
2016-04-05 15:24     ` Ming Lei
2016-04-05 17:31       ` Christoph Hellwig
2016-04-06  0:35     ` Kent Overstreet
2016-04-05 11:56 ` [PATCH 12/27] bcache: journal.c: use bio_set_vec_table() Ming Lei
2016-04-05 11:56 ` [PATCH 13/27] bcache: movinggc: " Ming Lei
2016-04-05 11:56 ` [PATCH 14/27] bcache: writeback: " Ming Lei

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).