* [PATCH v2 00/14] md: cleanup on direct access to bvec table
@ 2017-02-28 15:41 Ming Lei
  2017-02-28 15:41 ` [PATCH v2 01/13] block: introduce bio_segments_all() Ming Lei
                   ` (12 more replies)
  0 siblings, 13 replies; 39+ messages in thread
From: Ming Lei @ 2017-02-28 15:41 UTC (permalink / raw)
  To: Shaohua Li, Jens Axboe, linux-raid, linux-block, Christoph Hellwig
  Cc: Ming Lei

In MD's resync I/O path, there are many direct accesses to the bio's
bvec table. This patchset removes almost all of them, and the
conversion is quite straightforward. The root cause of these direct
accesses is that resync I/O uses the bio's bvec table to manage pages.
In V1, as suggested by Shaohua, a new approach was introduced to manage
these pages for resync I/O, which turns out to make the code cleaner
and more readable.

Once direct access to the bvec table in MD is cleaned up, the
multipage bvec work can move forward.
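
For reference, the core of the new approach is the small structure below
(introduced in patch 04); each resync bio's .bi_private points at one of
these, so the owning r1_bio/r10_bio is still reachable via ->raid_bio:

	/* for managing resync I/O pages */
	struct resync_pages {
		unsigned	idx;	/* for get/put page from the pool */
		void		*raid_bio;
		struct page	*pages[RESYNC_PAGES];
	};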

V2:
	- remove the patch for introducing/applying bio_remove_last_page()

V1:
	- allocate page array to manage resync pages

Thanks,
Ming

Ming Lei (13):
  block: introduce bio_segments_all()
  md: raid1/raid10: don't handle failure of bio_add_page()
  md: move two macros into md.h
  md: prepare for managing resync I/O pages in clean way
  md: raid1: simplify r1buf_pool_free()
  md: raid1: don't use bio's vec table to manage resync pages
  md: raid1: retrieve page from pre-allocated resync page array
  md: raid1: use bio helper in process_checks()
  md: raid1: use bio_segments_all()
  md: raid10: refactor code of read reshape's .bi_end_io
  md: raid10: don't use bio's vec table to manage resync pages
  md: raid10: retrieve page from preallocated resync page array
  md: raid10: avoid direct access to bvec table in
    handle_reshape_read_error

 drivers/md/md.h     |  59 ++++++++++++++
 drivers/md/raid1.c  | 140 ++++++++++++++++++---------------
 drivers/md/raid10.c | 220 ++++++++++++++++++++++++++++------------------------
 include/linux/bio.h |   7 ++
 4 files changed, 263 insertions(+), 163 deletions(-)

-- 
2.7.4


* [PATCH v2 01/13] block: introduce bio_segments_all()
  2017-02-28 15:41 [PATCH v2 00/14] md: cleanup on direct access to bvec table Ming Lei
@ 2017-02-28 15:41 ` Ming Lei
  2017-02-28 15:41 ` [PATCH v2 02/13] md: raid1/raid10: don't handle failure of bio_add_page() Ming Lei
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 39+ messages in thread
From: Ming Lei @ 2017-02-28 15:41 UTC (permalink / raw)
  To: Shaohua Li, Jens Axboe, linux-raid, linux-block, Christoph Hellwig
  Cc: Ming Lei

Introduce bio_segments_all() so that we can replace direct access to .bi_vcnt.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 include/linux/bio.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/include/linux/bio.h b/include/linux/bio.h
index 8e521194f6fc..3364b3ed90e7 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -293,6 +293,13 @@ static inline void bio_get_last_bvec(struct bio *bio, struct bio_vec *bv)
 		bv->bv_len = iter.bi_bvec_done;
 }
 
+static inline unsigned bio_segments_all(struct bio *bio)
+{
+	WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED));
+
+	return bio->bi_vcnt;
+}
+
 enum bip_flags {
 	BIP_BLOCK_INTEGRITY	= 1 << 0, /* block layer owns integrity data */
 	BIP_MAPPED_INTEGRITY	= 1 << 1, /* ref tag has been remapped */
-- 
2.7.4


* [PATCH v2 02/13] md: raid1/raid10: don't handle failure of bio_add_page()
  2017-02-28 15:41 [PATCH v2 00/14] md: cleanup on direct access to bvec table Ming Lei
  2017-02-28 15:41 ` [PATCH v2 01/13] block: introduce bio_segments_all() Ming Lei
@ 2017-02-28 15:41 ` Ming Lei
  2017-02-28 15:41 ` [PATCH v2 03/13] md: move two macros into md.h Ming Lei
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 39+ messages in thread
From: Ming Lei @ 2017-02-28 15:41 UTC (permalink / raw)
  To: Shaohua Li, Jens Axboe, linux-raid, linux-block, Christoph Hellwig
  Cc: Ming Lei

Every bio_add_page() call here adds one page into a resync bio, which
is big enough to hold RESYNC_PAGES pages, and the current
bio_add_page() doesn't check queue limits any more, so it won't fail
at all.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/md/raid1.c  | 21 ++++++---------------
 drivers/md/raid10.c | 41 ++++++++++-------------------------------
 2 files changed, 16 insertions(+), 46 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 0628c07dd16d..b3021355c7e2 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2903,21 +2903,12 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr,
 			bio = r1_bio->bios[i];
 			if (bio->bi_end_io) {
 				page = bio->bi_io_vec[bio->bi_vcnt].bv_page;
-				if (bio_add_page(bio, page, len, 0) == 0) {
-					/* stop here */
-					bio->bi_io_vec[bio->bi_vcnt].bv_page = page;
-					while (i > 0) {
-						i--;
-						bio = r1_bio->bios[i];
-						if (bio->bi_end_io==NULL)
-							continue;
-						/* remove last page from this bio */
-						bio->bi_vcnt--;
-						bio->bi_iter.bi_size -= len;
-						bio_clear_flag(bio, BIO_SEG_VALID);
-					}
-					goto bio_full;
-				}
+
+				/*
+				 * won't fail because the vec table is big
+				 * enough to hold all these pages
+				 */
+				bio_add_page(bio, page, len, 0);
 			}
 		}
 		nr_sectors += len>>9;
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 33f6a535dc1f..ceb3acc793cf 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -3435,27 +3435,16 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 		if (len == 0)
 			break;
 		for (bio= biolist ; bio ; bio=bio->bi_next) {
-			struct bio *bio2;
 			page = bio->bi_io_vec[bio->bi_vcnt].bv_page;
-			if (bio_add_page(bio, page, len, 0))
-				continue;
-
-			/* stop here */
-			bio->bi_io_vec[bio->bi_vcnt].bv_page = page;
-			for (bio2 = biolist;
-			     bio2 && bio2 != bio;
-			     bio2 = bio2->bi_next) {
-				/* remove last page from this bio */
-				bio2->bi_vcnt--;
-				bio2->bi_iter.bi_size -= len;
-				bio_clear_flag(bio2, BIO_SEG_VALID);
-			}
-			goto bio_full;
+			/*
+			 * won't fail because the vec table is big enough
+			 * to hold all these pages
+			 */
+			bio_add_page(bio, page, len, 0);
 		}
 		nr_sectors += len>>9;
 		sector_nr += len>>9;
 	} while (biolist->bi_vcnt < RESYNC_PAGES);
- bio_full:
 	r10_bio->sectors = nr_sectors;
 
 	while (biolist) {
@@ -4528,25 +4517,15 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr,
 		if (len > PAGE_SIZE)
 			len = PAGE_SIZE;
 		for (bio = blist; bio ; bio = bio->bi_next) {
-			struct bio *bio2;
-			if (bio_add_page(bio, page, len, 0))
-				continue;
-
-			/* Didn't fit, must stop */
-			for (bio2 = blist;
-			     bio2 && bio2 != bio;
-			     bio2 = bio2->bi_next) {
-				/* Remove last page from this bio */
-				bio2->bi_vcnt--;
-				bio2->bi_iter.bi_size -= len;
-				bio_clear_flag(bio2, BIO_SEG_VALID);
-			}
-			goto bio_full;
+			/*
+			 * won't fail because the vec table is big enough
+			 * to hold all these pages
+			 */
+			bio_add_page(bio, page, len, 0);
 		}
 		sector_nr += len >> 9;
 		nr_sectors += len >> 9;
 	}
-bio_full:
 	rcu_read_unlock();
 	r10_bio->sectors = nr_sectors;
 
-- 
2.7.4


* [PATCH v2 03/13] md: move two macros into md.h
  2017-02-28 15:41 [PATCH v2 00/14] md: cleanup on direct access to bvec table Ming Lei
  2017-02-28 15:41 ` [PATCH v2 01/13] block: introduce bio_segments_all() Ming Lei
  2017-02-28 15:41 ` [PATCH v2 02/13] md: raid1/raid10: don't handle failure of bio_add_page() Ming Lei
@ 2017-02-28 15:41 ` Ming Lei
  2017-02-28 15:41 ` [PATCH v2 04/13] md: prepare for managing resync I/O pages in clean way Ming Lei
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 39+ messages in thread
From: Ming Lei @ 2017-02-28 15:41 UTC (permalink / raw)
  To: Shaohua Li, Jens Axboe, linux-raid, linux-block, Christoph Hellwig
  Cc: Ming Lei

raid1 and raid10 share the same resync block size and
page count, so move the two macros into md.h.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/md/md.h     | 5 +++++
 drivers/md/raid1.c  | 2 --
 drivers/md/raid10.c | 3 ---
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/md/md.h b/drivers/md/md.h
index b8859cbf84b6..1d63239a1be4 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -715,4 +715,9 @@ static inline void mddev_check_writesame(struct mddev *mddev, struct bio *bio)
 	    !bdev_get_queue(bio->bi_bdev)->limits.max_write_same_sectors)
 		mddev->queue->limits.max_write_same_sectors = 0;
 }
+
+/* Maximum size of each resync request */
+#define RESYNC_BLOCK_SIZE (64*1024)
+#define RESYNC_PAGES ((RESYNC_BLOCK_SIZE + PAGE_SIZE-1) / PAGE_SIZE)
+
 #endif /* _MD_MD_H */
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index b3021355c7e2..25c9172db639 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -91,10 +91,8 @@ static void r1bio_pool_free(void *r1_bio, void *data)
 	kfree(r1_bio);
 }
 
-#define RESYNC_BLOCK_SIZE (64*1024)
 #define RESYNC_DEPTH 32
 #define RESYNC_SECTORS (RESYNC_BLOCK_SIZE >> 9)
-#define RESYNC_PAGES ((RESYNC_BLOCK_SIZE + PAGE_SIZE-1) / PAGE_SIZE)
 #define RESYNC_WINDOW (RESYNC_BLOCK_SIZE * RESYNC_DEPTH)
 #define RESYNC_WINDOW_SECTORS (RESYNC_WINDOW >> 9)
 #define CLUSTER_RESYNC_WINDOW (16 * RESYNC_WINDOW)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index ceb3acc793cf..c5f1a117494b 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -125,9 +125,6 @@ static void r10bio_pool_free(void *r10_bio, void *data)
 	kfree(r10_bio);
 }
 
-/* Maximum size of each resync request */
-#define RESYNC_BLOCK_SIZE (64*1024)
-#define RESYNC_PAGES ((RESYNC_BLOCK_SIZE + PAGE_SIZE-1) / PAGE_SIZE)
 /* amount of memory to reserve for resync requests */
 #define RESYNC_WINDOW (1024*1024)
 /* maximum number of concurrent requests, memory permitting */
-- 
2.7.4


* [PATCH v2 04/13] md: prepare for managing resync I/O pages in clean way
  2017-02-28 15:41 [PATCH v2 00/14] md: cleanup on direct access to bvec table Ming Lei
                   ` (2 preceding siblings ...)
  2017-02-28 15:41 ` [PATCH v2 03/13] md: move two macros into md.h Ming Lei
@ 2017-02-28 15:41 ` Ming Lei
  2017-02-28 23:30   ` Shaohua Li
  2017-02-28 15:41 ` [PATCH v2 05/13] md: raid1: simplify r1buf_pool_free() Ming Lei
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 39+ messages in thread
From: Ming Lei @ 2017-02-28 15:41 UTC (permalink / raw)
  To: Shaohua Li, Jens Axboe, linux-raid, linux-block, Christoph Hellwig
  Cc: Ming Lei

Currently resync I/O uses the bio's bvec table to manage pages.
This is very hacky and may not work any more once multipage
bvec is introduced.

So introduce helpers and a new data structure for managing
resync I/O pages more cleanly.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/md/md.h | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/drivers/md/md.h b/drivers/md/md.h
index 1d63239a1be4..b5a638d85cb4 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -720,4 +720,58 @@ static inline void mddev_check_writesame(struct mddev *mddev, struct bio *bio)
 #define RESYNC_BLOCK_SIZE (64*1024)
 #define RESYNC_PAGES ((RESYNC_BLOCK_SIZE + PAGE_SIZE-1) / PAGE_SIZE)
 
+/* for managing resync I/O pages */
+struct resync_pages {
+	unsigned	idx;	/* for get/put page from the pool */
+	void		*raid_bio;
+	struct page	*pages[RESYNC_PAGES];
+};
+
+static inline int resync_alloc_pages(struct resync_pages *rp,
+				     gfp_t gfp_flags)
+{
+	int i;
+
+	for (i = 0; i < RESYNC_PAGES; i++) {
+		rp->pages[i] = alloc_page(gfp_flags);
+		if (!rp->pages[i])
+			goto out_free;
+	}
+
+	return 0;
+
+ out_free:
+	while (--i >= 0)
+		__free_page(rp->pages[i]);
+	return -ENOMEM;
+}
+
+static inline void resync_free_pages(struct resync_pages *rp)
+{
+	int i;
+
+	for (i = 0; i < RESYNC_PAGES; i++)
+		__free_page(rp->pages[i]);
+}
+
+static inline void resync_get_all_pages(struct resync_pages *rp)
+{
+	int i;
+
+	for (i = 0; i < RESYNC_PAGES; i++)
+		get_page(rp->pages[i]);
+}
+
+static inline struct page *resync_fetch_page(struct resync_pages *rp)
+{
+	if (WARN_ON_ONCE(rp->idx >= RESYNC_PAGES))
+		return NULL;
+	return rp->pages[rp->idx++];
+}
+
+static inline bool resync_page_available(struct resync_pages *rp)
+{
+	return rp->idx < RESYNC_PAGES;
+}
+
 #endif /* _MD_MD_H */
-- 
2.7.4


* [PATCH v2 05/13] md: raid1: simplify r1buf_pool_free()
  2017-02-28 15:41 [PATCH v2 00/14] md: cleanup on direct access to bvec table Ming Lei
                   ` (3 preceding siblings ...)
  2017-02-28 15:41 ` [PATCH v2 04/13] md: prepare for managing resync I/O pages in clean way Ming Lei
@ 2017-02-28 15:41 ` Ming Lei
  2017-02-28 23:31   ` Shaohua Li
  2017-02-28 15:41 ` [PATCH v2 06/13] md: raid1: don't use bio's vec table to manage resync pages Ming Lei
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 39+ messages in thread
From: Ming Lei @ 2017-02-28 15:41 UTC (permalink / raw)
  To: Shaohua Li, Jens Axboe, linux-raid, linux-block, Christoph Hellwig
  Cc: Ming Lei

This patch takes a reference on each page of each resync bio, so
r1buf_pool_free() gets simplified a lot.

The same policy is already used in raid10's buf pool
allocation/free.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/md/raid1.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 25c9172db639..c442b4657e2f 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -139,9 +139,12 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
 	/* If not user-requests, copy the page pointers to all bios */
 	if (!test_bit(MD_RECOVERY_REQUESTED, &pi->mddev->recovery)) {
 		for (i=0; i<RESYNC_PAGES ; i++)
-			for (j=1; j<pi->raid_disks; j++)
-				r1_bio->bios[j]->bi_io_vec[i].bv_page =
+			for (j=1; j<pi->raid_disks; j++) {
+				struct page *page =
 					r1_bio->bios[0]->bi_io_vec[i].bv_page;
+				get_page(page);
+				r1_bio->bios[j]->bi_io_vec[i].bv_page = page;
+			}
 	}
 
 	r1_bio->master_bio = NULL;
@@ -166,12 +169,8 @@ static void r1buf_pool_free(void *__r1_bio, void *data)
 	struct r1bio *r1bio = __r1_bio;
 
 	for (i = 0; i < RESYNC_PAGES; i++)
-		for (j = pi->raid_disks; j-- ;) {
-			if (j == 0 ||
-			    r1bio->bios[j]->bi_io_vec[i].bv_page !=
-			    r1bio->bios[0]->bi_io_vec[i].bv_page)
-				safe_put_page(r1bio->bios[j]->bi_io_vec[i].bv_page);
-		}
+		for (j = pi->raid_disks; j-- ;)
+			safe_put_page(r1bio->bios[j]->bi_io_vec[i].bv_page);
 	for (i=0 ; i < pi->raid_disks; i++)
 		bio_put(r1bio->bios[i]);
 
-- 
2.7.4


* [PATCH v2 06/13] md: raid1: don't use bio's vec table to manage resync pages
  2017-02-28 15:41 [PATCH v2 00/14] md: cleanup on direct access to bvec table Ming Lei
                   ` (4 preceding siblings ...)
  2017-02-28 15:41 ` [PATCH v2 05/13] md: raid1: simplify r1buf_pool_free() Ming Lei
@ 2017-02-28 15:41 ` Ming Lei
  2017-02-28 23:37   ` Shaohua Li
  2017-02-28 15:41 ` [PATCH v2 07/13] md: raid1: retrieve page from pre-allocated resync page array Ming Lei
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 39+ messages in thread
From: Ming Lei @ 2017-02-28 15:41 UTC (permalink / raw)
  To: Shaohua Li, Jens Axboe, linux-raid, linux-block, Christoph Hellwig
  Cc: Ming Lei

Now we allocate one page array for managing resync pages, instead
of using the bio's bvec table to do that; the old way is very hacky
and won't work any more once multipage bvec is enabled.

The introduced cost is that we need to allocate (128 + 16) * raid_disks
bytes per r1_bio, which is fine because the number of in-flight r1_bios
for resync shouldn't be large, as pointed out by Shaohua.

Also the bio_reset() in raid1_sync_request() is removed because all
bios are freshly allocated now and no longer need to be reset.

This patch can be viewed as a cleanup too.

Suggested-by: Shaohua Li <shli@kernel.org>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/md/raid1.c | 83 ++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 53 insertions(+), 30 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index c442b4657e2f..900144f39630 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -77,6 +77,16 @@ static void lower_barrier(struct r1conf *conf, sector_t sector_nr);
 #define raid1_log(md, fmt, args...)				\
 	do { if ((md)->queue) blk_add_trace_msg((md)->queue, "raid1 " fmt, ##args); } while (0)
 
+static inline struct resync_pages *get_resync_pages(struct bio *bio)
+{
+	return bio->bi_private;
+}
+
+static inline struct r1bio *get_resync_r1bio(struct bio *bio)
+{
+	return get_resync_pages(bio)->raid_bio;
+}
+
 static void * r1bio_pool_alloc(gfp_t gfp_flags, void *data)
 {
 	struct pool_info *pi = data;
@@ -104,12 +114,18 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
 	struct r1bio *r1_bio;
 	struct bio *bio;
 	int need_pages;
-	int i, j;
+	int j;
+	struct resync_pages *rps;
 
 	r1_bio = r1bio_pool_alloc(gfp_flags, pi);
 	if (!r1_bio)
 		return NULL;
 
+	rps = kmalloc(sizeof(struct resync_pages) * pi->raid_disks,
+		      gfp_flags);
+	if (!rps)
+		goto out_free_r1bio;
+
 	/*
 	 * Allocate bios : 1 for reading, n-1 for writing
 	 */
@@ -129,22 +145,22 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
 		need_pages = pi->raid_disks;
 	else
 		need_pages = 1;
-	for (j = 0; j < need_pages; j++) {
+	for (j = 0; j < pi->raid_disks; j++) {
+		struct resync_pages *rp = &rps[j];
+
 		bio = r1_bio->bios[j];
-		bio->bi_vcnt = RESYNC_PAGES;
-
-		if (bio_alloc_pages(bio, gfp_flags))
-			goto out_free_pages;
-	}
-	/* If not user-requests, copy the page pointers to all bios */
-	if (!test_bit(MD_RECOVERY_REQUESTED, &pi->mddev->recovery)) {
-		for (i=0; i<RESYNC_PAGES ; i++)
-			for (j=1; j<pi->raid_disks; j++) {
-				struct page *page =
-					r1_bio->bios[0]->bi_io_vec[i].bv_page;
-				get_page(page);
-				r1_bio->bios[j]->bi_io_vec[i].bv_page = page;
-			}
+
+		if (j < need_pages) {
+			if (resync_alloc_pages(rp, gfp_flags))
+				goto out_free_pages;
+		} else {
+			memcpy(rp, &rps[0], sizeof(*rp));
+			resync_get_all_pages(rp);
+		}
+
+		rp->idx = 0;
+		rp->raid_bio = r1_bio;
+		bio->bi_private = rp;
 	}
 
 	r1_bio->master_bio = NULL;
@@ -153,11 +169,14 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
 
 out_free_pages:
 	while (--j >= 0)
-		bio_free_pages(r1_bio->bios[j]);
+		resync_free_pages(&rps[j]);
 
 out_free_bio:
 	while (++j < pi->raid_disks)
 		bio_put(r1_bio->bios[j]);
+	kfree(rps);
+
+out_free_r1bio:
 	r1bio_pool_free(r1_bio, data);
 	return NULL;
 }
@@ -165,14 +184,18 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
 static void r1buf_pool_free(void *__r1_bio, void *data)
 {
 	struct pool_info *pi = data;
-	int i,j;
+	int i;
 	struct r1bio *r1bio = __r1_bio;
+	struct resync_pages *rp = NULL;
 
-	for (i = 0; i < RESYNC_PAGES; i++)
-		for (j = pi->raid_disks; j-- ;)
-			safe_put_page(r1bio->bios[j]->bi_io_vec[i].bv_page);
-	for (i=0 ; i < pi->raid_disks; i++)
+	for (i = pi->raid_disks; i--; ) {
+		rp = get_resync_pages(r1bio->bios[i]);
+		resync_free_pages(rp);
 		bio_put(r1bio->bios[i]);
+	}
+
+	/* resync pages array stored in the 1st bio's .bi_private */
+	kfree(rp);
 
 	r1bio_pool_free(r1bio, data);
 }
@@ -1849,7 +1872,7 @@ static int raid1_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
 
 static void end_sync_read(struct bio *bio)
 {
-	struct r1bio *r1_bio = bio->bi_private;
+	struct r1bio *r1_bio = get_resync_r1bio(bio);
 
 	update_head_pos(r1_bio->read_disk, r1_bio);
 
@@ -1868,7 +1891,7 @@ static void end_sync_read(struct bio *bio)
 static void end_sync_write(struct bio *bio)
 {
 	int uptodate = !bio->bi_error;
-	struct r1bio *r1_bio = bio->bi_private;
+	struct r1bio *r1_bio = get_resync_r1bio(bio);
 	struct mddev *mddev = r1_bio->mddev;
 	struct r1conf *conf = mddev->private;
 	sector_t first_bad;
@@ -2085,6 +2108,7 @@ static void process_checks(struct r1bio *r1_bio)
 		int size;
 		int error;
 		struct bio *b = r1_bio->bios[i];
+		struct resync_pages *rp = get_resync_pages(b);
 		if (b->bi_end_io != end_sync_read)
 			continue;
 		/* fixup the bio for reuse, but preserve errno */
@@ -2097,7 +2121,8 @@ static void process_checks(struct r1bio *r1_bio)
 			conf->mirrors[i].rdev->data_offset;
 		b->bi_bdev = conf->mirrors[i].rdev->bdev;
 		b->bi_end_io = end_sync_read;
-		b->bi_private = r1_bio;
+		rp->raid_bio = r1_bio;
+		b->bi_private = rp;
 
 		size = b->bi_iter.bi_size;
 		for (j = 0; j < vcnt ; j++) {
@@ -2755,7 +2780,6 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr,
 	for (i = 0; i < conf->raid_disks * 2; i++) {
 		struct md_rdev *rdev;
 		bio = r1_bio->bios[i];
-		bio_reset(bio);
 
 		rdev = rcu_dereference(conf->mirrors[i].rdev);
 		if (rdev == NULL ||
@@ -2811,7 +2835,6 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr,
 			atomic_inc(&rdev->nr_pending);
 			bio->bi_iter.bi_sector = sector_nr + rdev->data_offset;
 			bio->bi_bdev = rdev->bdev;
-			bio->bi_private = r1_bio;
 			if (test_bit(FailFast, &rdev->flags))
 				bio->bi_opf |= MD_FAILFAST;
 		}
@@ -2899,7 +2922,7 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr,
 		for (i = 0 ; i < conf->raid_disks * 2; i++) {
 			bio = r1_bio->bios[i];
 			if (bio->bi_end_io) {
-				page = bio->bi_io_vec[bio->bi_vcnt].bv_page;
+				page = resync_fetch_page(get_resync_pages(bio));
 
 				/*
 				 * won't fail because the vec table is big
@@ -2911,8 +2934,8 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr,
 		nr_sectors += len>>9;
 		sector_nr += len>>9;
 		sync_blocks -= (len>>9);
-	} while (r1_bio->bios[disk]->bi_vcnt < RESYNC_PAGES);
- bio_full:
+	} while (resync_page_available(r1_bio->bios[disk]->bi_private));
+
 	r1_bio->sectors = nr_sectors;
 
 	if (mddev_is_clustered(mddev) &&
-- 
2.7.4


* [PATCH v2 07/13] md: raid1: retrieve page from pre-allocated resync page array
  2017-02-28 15:41 [PATCH v2 00/14] md: cleanup on direct access to bvec table Ming Lei
                   ` (5 preceding siblings ...)
  2017-02-28 15:41 ` [PATCH v2 06/13] md: raid1: don't use bio's vec table to manage resync pages Ming Lei
@ 2017-02-28 15:41 ` Ming Lei
  2017-02-28 15:41 ` [PATCH v2 08/13] md: raid1: use bio helper in process_checks() Ming Lei
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 39+ messages in thread
From: Ming Lei @ 2017-02-28 15:41 UTC (permalink / raw)
  To: Shaohua Li, Jens Axboe, linux-raid, linux-block, Christoph Hellwig
  Cc: Ming Lei

Now that one page array is allocated for each resync bio, we can
retrieve pages from this array directly.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/md/raid1.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 900144f39630..d0cb5c026506 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1970,6 +1970,7 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
 	struct mddev *mddev = r1_bio->mddev;
 	struct r1conf *conf = mddev->private;
 	struct bio *bio = r1_bio->bios[r1_bio->read_disk];
+	struct page **pages = get_resync_pages(bio)->pages;
 	sector_t sect = r1_bio->sector;
 	int sectors = r1_bio->sectors;
 	int idx = 0;
@@ -2003,7 +2004,7 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
 				 */
 				rdev = conf->mirrors[d].rdev;
 				if (sync_page_io(rdev, sect, s<<9,
-						 bio->bi_io_vec[idx].bv_page,
+						 pages[idx],
 						 REQ_OP_READ, 0, false)) {
 					success = 1;
 					break;
@@ -2058,7 +2059,7 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
 				continue;
 			rdev = conf->mirrors[d].rdev;
 			if (r1_sync_page_io(rdev, sect, s,
-					    bio->bi_io_vec[idx].bv_page,
+					    pages[idx],
 					    WRITE) == 0) {
 				r1_bio->bios[d]->bi_end_io = NULL;
 				rdev_dec_pending(rdev, mddev);
@@ -2073,7 +2074,7 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
 				continue;
 			rdev = conf->mirrors[d].rdev;
 			if (r1_sync_page_io(rdev, sect, s,
-					    bio->bi_io_vec[idx].bv_page,
+					    pages[idx],
 					    READ) != 0)
 				atomic_add(s, &rdev->corrected_errors);
 		}
@@ -2149,6 +2150,8 @@ static void process_checks(struct r1bio *r1_bio)
 		struct bio *pbio = r1_bio->bios[primary];
 		struct bio *sbio = r1_bio->bios[i];
 		int error = sbio->bi_error;
+		struct page **ppages = get_resync_pages(pbio)->pages;
+		struct page **spages = get_resync_pages(sbio)->pages;
 
 		if (sbio->bi_end_io != end_sync_read)
 			continue;
@@ -2157,11 +2160,8 @@ static void process_checks(struct r1bio *r1_bio)
 
 		if (!error) {
 			for (j = vcnt; j-- ; ) {
-				struct page *p, *s;
-				p = pbio->bi_io_vec[j].bv_page;
-				s = sbio->bi_io_vec[j].bv_page;
-				if (memcmp(page_address(p),
-					   page_address(s),
+				if (memcmp(page_address(ppages[j]),
+					   page_address(spages[j]),
 					   sbio->bi_io_vec[j].bv_len))
 					break;
 			}
-- 
2.7.4


* [PATCH v2 08/13] md: raid1: use bio helper in process_checks()
  2017-02-28 15:41 [PATCH v2 00/14] md: cleanup on direct access to bvec table Ming Lei
                   ` (6 preceding siblings ...)
  2017-02-28 15:41 ` [PATCH v2 07/13] md: raid1: retrieve page from pre-allocated resync page array Ming Lei
@ 2017-02-28 15:41 ` Ming Lei
  2017-02-28 23:39   ` Shaohua Li
  2017-02-28 15:41 ` [PATCH v2 09/13] md: raid1: use bio_segments_all() Ming Lei
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 39+ messages in thread
From: Ming Lei @ 2017-02-28 15:41 UTC (permalink / raw)
  To: Shaohua Li, Jens Axboe, linux-raid, linux-block, Christoph Hellwig
  Cc: Ming Lei

Avoid direct access to the bvec table.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/md/raid1.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index d0cb5c026506..316bd6dd6cc1 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2108,6 +2108,7 @@ static void process_checks(struct r1bio *r1_bio)
 		int j;
 		int size;
 		int error;
+		struct bio_vec *bi;
 		struct bio *b = r1_bio->bios[i];
 		struct resync_pages *rp = get_resync_pages(b);
 		if (b->bi_end_io != end_sync_read)
@@ -2126,9 +2127,7 @@ static void process_checks(struct r1bio *r1_bio)
 		b->bi_private = rp;
 
 		size = b->bi_iter.bi_size;
-		for (j = 0; j < vcnt ; j++) {
-			struct bio_vec *bi;
-			bi = &b->bi_io_vec[j];
+		bio_for_each_segment_all(bi, b, j) {
 			bi->bv_offset = 0;
 			if (size > PAGE_SIZE)
 				bi->bv_len = PAGE_SIZE;
@@ -2152,17 +2151,22 @@ static void process_checks(struct r1bio *r1_bio)
 		int error = sbio->bi_error;
 		struct page **ppages = get_resync_pages(pbio)->pages;
 		struct page **spages = get_resync_pages(sbio)->pages;
+		struct bio_vec *bi;
+		int page_len[RESYNC_PAGES];
 
 		if (sbio->bi_end_io != end_sync_read)
 			continue;
 		/* Now we can 'fixup' the error value */
 		sbio->bi_error = 0;
 
+		bio_for_each_segment_all(bi, sbio, j)
+			page_len[j] = bi->bv_len;
+
 		if (!error) {
 			for (j = vcnt; j-- ; ) {
 				if (memcmp(page_address(ppages[j]),
 					   page_address(spages[j]),
-					   sbio->bi_io_vec[j].bv_len))
+					   page_len[j]))
 					break;
 			}
 		} else
-- 
2.7.4


* [PATCH v2 09/13] md: raid1: use bio_segments_all()
  2017-02-28 15:41 [PATCH v2 00/14] md: cleanup on direct access to bvec table Ming Lei
                   ` (7 preceding siblings ...)
  2017-02-28 15:41 ` [PATCH v2 08/13] md: raid1: use bio helper in process_checks() Ming Lei
@ 2017-02-28 15:41 ` Ming Lei
  2017-02-28 23:42   ` Shaohua Li
  2017-02-28 15:41 ` [PATCH v2 10/13] md: raid10: refactor code of read reshape's .bi_end_io Ming Lei
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 39+ messages in thread
From: Ming Lei @ 2017-02-28 15:41 UTC (permalink / raw)
  To: Shaohua Li, Jens Axboe, linux-raid, linux-block, Christoph Hellwig
  Cc: Ming Lei

Use this helper instead of accessing .bi_vcnt directly.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/md/raid1.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 316bd6dd6cc1..7396c99ff7b1 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1091,7 +1091,8 @@ static void alloc_behind_pages(struct bio *bio, struct r1bio *r1_bio)
 {
 	int i;
 	struct bio_vec *bvec;
-	struct bio_vec *bvecs = kzalloc(bio->bi_vcnt * sizeof(struct bio_vec),
+	unsigned vcnt = bio_segments_all(bio);
+	struct bio_vec *bvecs = kzalloc(vcnt * sizeof(struct bio_vec),
 					GFP_NOIO);
 	if (unlikely(!bvecs))
 		return;
@@ -1107,12 +1108,12 @@ static void alloc_behind_pages(struct bio *bio, struct r1bio *r1_bio)
 		kunmap(bvec->bv_page);
 	}
 	r1_bio->behind_bvecs = bvecs;
-	r1_bio->behind_page_count = bio->bi_vcnt;
+	r1_bio->behind_page_count = vcnt;
 	set_bit(R1BIO_BehindIO, &r1_bio->state);
 	return;
 
 do_sync_io:
-	for (i = 0; i < bio->bi_vcnt; i++)
+	for (i = 0; i < vcnt; i++)
 		if (bvecs[i].bv_page)
 			put_page(bvecs[i].bv_page);
 	kfree(bvecs);
-- 
2.7.4


* [PATCH v2 10/13] md: raid10: refactor code of read reshape's .bi_end_io
  2017-02-28 15:41 [PATCH v2 00/14] md: cleanup on direct access to bvec table Ming Lei
                   ` (8 preceding siblings ...)
  2017-02-28 15:41 ` [PATCH v2 09/13] md: raid1: use bio_segments_all() Ming Lei
@ 2017-02-28 15:41 ` Ming Lei
  2017-02-28 15:41 ` [PATCH v2 11/13] md: raid10: don't use bio's vec table to manage resync pages Ming Lei
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 39+ messages in thread
From: Ming Lei @ 2017-02-28 15:41 UTC (permalink / raw)
  To: Shaohua Li, Jens Axboe, linux-raid, linux-block, Christoph Hellwig
  Cc: Ming Lei

A reshape read request is a bit special and requires one extra
bio which isn't allocated from r10buf_pool.

Refactor the .bi_end_io for the reshape read, so that raid10's
resync page management approach can be reused easily in the
following patches.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/md/raid10.c | 28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index c5f1a117494b..a9ddd4f14008 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1907,17 +1907,9 @@ static int raid10_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
 	return err;
 }
 
-static void end_sync_read(struct bio *bio)
+static void __end_sync_read(struct r10bio *r10_bio, struct bio *bio, int d)
 {
-	struct r10bio *r10_bio = bio->bi_private;
 	struct r10conf *conf = r10_bio->mddev->private;
-	int d;
-
-	if (bio == r10_bio->master_bio) {
-		/* this is a reshape read */
-		d = r10_bio->read_slot; /* really the read dev */
-	} else
-		d = find_bio_disk(conf, r10_bio, bio, NULL, NULL);
 
 	if (!bio->bi_error)
 		set_bit(R10BIO_Uptodate, &r10_bio->state);
@@ -1941,6 +1933,22 @@ static void end_sync_read(struct bio *bio)
 	}
 }
 
+static void end_sync_read(struct bio *bio)
+{
+	struct r10bio *r10_bio = bio->bi_private;
+	struct r10conf *conf = r10_bio->mddev->private;
+	int d = find_bio_disk(conf, r10_bio, bio, NULL, NULL);
+
+	__end_sync_read(r10_bio, bio, d);
+}
+
+static void end_reshape_read(struct bio *bio)
+{
+	struct r10bio *r10_bio = bio->bi_private;
+
+	__end_sync_read(r10_bio, bio, r10_bio->read_slot);
+}
+
 static void end_sync_request(struct r10bio *r10_bio)
 {
 	struct mddev *mddev = r10_bio->mddev;
@@ -4464,7 +4472,7 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr,
 	read_bio->bi_iter.bi_sector = (r10_bio->devs[r10_bio->read_slot].addr
 			       + rdev->data_offset);
 	read_bio->bi_private = r10_bio;
-	read_bio->bi_end_io = end_sync_read;
+	read_bio->bi_end_io = end_reshape_read;
 	bio_set_op_attrs(read_bio, REQ_OP_READ, 0);
 	read_bio->bi_flags &= (~0UL << BIO_RESET_BITS);
 	read_bio->bi_error = 0;
-- 
2.7.4


* [PATCH v2 11/13] md: raid10: don't use bio's vec table to manage resync pages
  2017-02-28 15:41 [PATCH v2 00/14] md: cleanup on direct access to bvec table Ming Lei
                   ` (9 preceding siblings ...)
  2017-02-28 15:41 ` [PATCH v2 10/13] md: raid10: refactor code of read reshape's .bi_end_io Ming Lei
@ 2017-02-28 15:41 ` Ming Lei
  2017-02-28 23:43   ` Shaohua Li
  2017-02-28 15:41 ` [PATCH v2 12/13] md: raid10: retrieve page from preallocated resync page array Ming Lei
  2017-02-28 15:41 ` [PATCH v2 13/13] md: raid10: avoid direct access to bvec table in handle_reshape_read_error Ming Lei
  12 siblings, 1 reply; 39+ messages in thread
From: Ming Lei @ 2017-02-28 15:41 UTC (permalink / raw)
  To: Shaohua Li, Jens Axboe, linux-raid, linux-block, Christoph Hellwig
  Cc: Ming Lei

Now we allocate one page array for managing resync pages, instead
of using the bio's bvec table to do that; the old way is very hacky
and won't work any more once multipage bvec is enabled.

The introduced cost is that we need to allocate (128 + 16) * copies
bytes per r10_bio, which is fine because the number of in-flight
r10_bios for resync shouldn't be large, as pointed out by Shaohua.

Also the bio_reset() calls in raid10_sync_request() and reshape_request()
are removed because all bios in these functions are freshly allocated
now and no longer need to be reset.

This patch can be viewed as a cleanup too.

Suggested-by: Shaohua Li <shli@kernel.org>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/md/raid10.c | 125 ++++++++++++++++++++++++++++++----------------------
 1 file changed, 73 insertions(+), 52 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index a9ddd4f14008..f887b21332e7 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -110,6 +110,16 @@ static void end_reshape(struct r10conf *conf);
 #define raid10_log(md, fmt, args...)				\
 	do { if ((md)->queue) blk_add_trace_msg((md)->queue, "raid10 " fmt, ##args); } while (0)
 
+static inline struct resync_pages *get_resync_pages(struct bio *bio)
+{
+	return bio->bi_private;
+}
+
+static inline struct r10bio *get_resync_r10bio(struct bio *bio)
+{
+	return get_resync_pages(bio)->raid_bio;
+}
+
 static void * r10bio_pool_alloc(gfp_t gfp_flags, void *data)
 {
 	struct r10conf *conf = data;
@@ -140,11 +150,11 @@ static void r10bio_pool_free(void *r10_bio, void *data)
 static void * r10buf_pool_alloc(gfp_t gfp_flags, void *data)
 {
 	struct r10conf *conf = data;
-	struct page *page;
 	struct r10bio *r10_bio;
 	struct bio *bio;
-	int i, j;
-	int nalloc;
+	int j;
+	int nalloc, nalloc_rp;
+	struct resync_pages *rps;
 
 	r10_bio = r10bio_pool_alloc(gfp_flags, conf);
 	if (!r10_bio)
@@ -156,6 +166,15 @@ static void * r10buf_pool_alloc(gfp_t gfp_flags, void *data)
 	else
 		nalloc = 2; /* recovery */
 
+	/* allocate once for all bios */
+	if (!conf->have_replacement)
+		nalloc_rp = nalloc;
+	else
+		nalloc_rp = nalloc * 2;
+	rps = kmalloc(sizeof(struct resync_pages) * nalloc_rp, gfp_flags);
+	if (!rps)
+		goto out_free_r10bio;
+
 	/*
 	 * Allocate bios.
 	 */
@@ -175,36 +194,40 @@ static void * r10buf_pool_alloc(gfp_t gfp_flags, void *data)
 	 * Allocate RESYNC_PAGES data pages and attach them
 	 * where needed.
 	 */
-	for (j = 0 ; j < nalloc; j++) {
+	for (j = 0; j < nalloc; j++) {
 		struct bio *rbio = r10_bio->devs[j].repl_bio;
+		struct resync_pages *rp, *rp_repl;
+
+		rp = &rps[j];
+		if (rbio)
+			rp_repl = &rps[nalloc + j];
+
 		bio = r10_bio->devs[j].bio;
-		for (i = 0; i < RESYNC_PAGES; i++) {
-			if (j > 0 && !test_bit(MD_RECOVERY_SYNC,
-					       &conf->mddev->recovery)) {
-				/* we can share bv_page's during recovery
-				 * and reshape */
-				struct bio *rbio = r10_bio->devs[0].bio;
-				page = rbio->bi_io_vec[i].bv_page;
-				get_page(page);
-			} else
-				page = alloc_page(gfp_flags);
-			if (unlikely(!page))
+
+		if (!j || test_bit(MD_RECOVERY_SYNC,
+				   &conf->mddev->recovery)) {
+			if (resync_alloc_pages(rp, gfp_flags))
 				goto out_free_pages;
+		} else {
+			memcpy(rp, &rps[0], sizeof(*rp));
+			resync_get_all_pages(rp);
+		}
 
-			bio->bi_io_vec[i].bv_page = page;
-			if (rbio)
-				rbio->bi_io_vec[i].bv_page = page;
+		rp->idx = 0;
+		rp->raid_bio = r10_bio;
+		bio->bi_private = rp;
+		if (rbio) {
+			memcpy(rp_repl, rp, sizeof(*rp));
+			rbio->bi_private = rp_repl;
 		}
 	}
 
 	return r10_bio;
 
 out_free_pages:
-	for ( ; i > 0 ; i--)
-		safe_put_page(bio->bi_io_vec[i-1].bv_page);
-	while (j--)
-		for (i = 0; i < RESYNC_PAGES ; i++)
-			safe_put_page(r10_bio->devs[j].bio->bi_io_vec[i].bv_page);
+	while (--j >= 0)
+		resync_free_pages(&rps[j * 2]);
+
 	j = 0;
 out_free_bio:
 	for ( ; j < nalloc; j++) {
@@ -213,30 +236,34 @@ static void * r10buf_pool_alloc(gfp_t gfp_flags, void *data)
 		if (r10_bio->devs[j].repl_bio)
 			bio_put(r10_bio->devs[j].repl_bio);
 	}
+	kfree(rps);
+out_free_r10bio:
 	r10bio_pool_free(r10_bio, conf);
 	return NULL;
 }
 
 static void r10buf_pool_free(void *__r10_bio, void *data)
 {
-	int i;
 	struct r10conf *conf = data;
 	struct r10bio *r10bio = __r10_bio;
 	int j;
+	struct resync_pages *rp = NULL;
 
-	for (j=0; j < conf->copies; j++) {
+	for (j = conf->copies; j--; ) {
 		struct bio *bio = r10bio->devs[j].bio;
-		if (bio) {
-			for (i = 0; i < RESYNC_PAGES; i++) {
-				safe_put_page(bio->bi_io_vec[i].bv_page);
-				bio->bi_io_vec[i].bv_page = NULL;
-			}
-			bio_put(bio);
-		}
+
+		rp = get_resync_pages(bio);
+		resync_free_pages(rp);
+		bio_put(bio);
+
 		bio = r10bio->devs[j].repl_bio;
 		if (bio)
 			bio_put(bio);
 	}
+
+	/* resync pages array stored in the 1st bio's .bi_private */
+	kfree(rp);
+
 	r10bio_pool_free(r10bio, conf);
 }
 
@@ -1935,7 +1962,7 @@ static void __end_sync_read(struct r10bio *r10_bio, struct bio *bio, int d)
 
 static void end_sync_read(struct bio *bio)
 {
-	struct r10bio *r10_bio = bio->bi_private;
+	struct r10bio *r10_bio = get_resync_r10bio(bio);
 	struct r10conf *conf = r10_bio->mddev->private;
 	int d = find_bio_disk(conf, r10_bio, bio, NULL, NULL);
 
@@ -1944,6 +1971,7 @@ static void end_sync_read(struct bio *bio)
 
 static void end_reshape_read(struct bio *bio)
 {
+	/* reshape read bio isn't allocated from r10buf_pool */
 	struct r10bio *r10_bio = bio->bi_private;
 
 	__end_sync_read(r10_bio, bio, r10_bio->read_slot);
@@ -1978,7 +2006,7 @@ static void end_sync_request(struct r10bio *r10_bio)
 
 static void end_sync_write(struct bio *bio)
 {
-	struct r10bio *r10_bio = bio->bi_private;
+	struct r10bio *r10_bio = get_resync_r10bio(bio);
 	struct mddev *mddev = r10_bio->mddev;
 	struct r10conf *conf = mddev->private;
 	int d;
@@ -2058,6 +2086,7 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
 	for (i=0 ; i < conf->copies ; i++) {
 		int  j, d;
 		struct md_rdev *rdev;
+		struct resync_pages *rp;
 
 		tbio = r10_bio->devs[i].bio;
 
@@ -2099,11 +2128,13 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
 		 * First we need to fixup bv_offset, bv_len and
 		 * bi_vecs, as the read request might have corrupted these
 		 */
+		rp = get_resync_pages(tbio);
 		bio_reset(tbio);
 
 		tbio->bi_vcnt = vcnt;
 		tbio->bi_iter.bi_size = fbio->bi_iter.bi_size;
-		tbio->bi_private = r10_bio;
+		rp->raid_bio = r10_bio;
+		tbio->bi_private = rp;
 		tbio->bi_iter.bi_sector = r10_bio->devs[i].addr;
 		tbio->bi_end_io = end_sync_write;
 		bio_set_op_attrs(tbio, REQ_OP_WRITE, 0);
@@ -3171,10 +3202,8 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 					}
 				}
 				bio = r10_bio->devs[0].bio;
-				bio_reset(bio);
 				bio->bi_next = biolist;
 				biolist = bio;
-				bio->bi_private = r10_bio;
 				bio->bi_end_io = end_sync_read;
 				bio_set_op_attrs(bio, REQ_OP_READ, 0);
 				if (test_bit(FailFast, &rdev->flags))
@@ -3198,10 +3227,8 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 
 				if (!test_bit(In_sync, &mrdev->flags)) {
 					bio = r10_bio->devs[1].bio;
-					bio_reset(bio);
 					bio->bi_next = biolist;
 					biolist = bio;
-					bio->bi_private = r10_bio;
 					bio->bi_end_io = end_sync_write;
 					bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
 					bio->bi_iter.bi_sector = to_addr
@@ -3226,10 +3253,8 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 				if (mreplace == NULL || bio == NULL ||
 				    test_bit(Faulty, &mreplace->flags))
 					break;
-				bio_reset(bio);
 				bio->bi_next = biolist;
 				biolist = bio;
-				bio->bi_private = r10_bio;
 				bio->bi_end_io = end_sync_write;
 				bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
 				bio->bi_iter.bi_sector = to_addr +
@@ -3351,7 +3376,6 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 				r10_bio->devs[i].repl_bio->bi_end_io = NULL;
 
 			bio = r10_bio->devs[i].bio;
-			bio_reset(bio);
 			bio->bi_error = -EIO;
 			rcu_read_lock();
 			rdev = rcu_dereference(conf->mirrors[d].rdev);
@@ -3376,7 +3400,6 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 			atomic_inc(&r10_bio->remaining);
 			bio->bi_next = biolist;
 			biolist = bio;
-			bio->bi_private = r10_bio;
 			bio->bi_end_io = end_sync_read;
 			bio_set_op_attrs(bio, REQ_OP_READ, 0);
 			if (test_bit(FailFast, &conf->mirrors[d].rdev->flags))
@@ -3395,13 +3418,11 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 
 			/* Need to set up for writing to the replacement */
 			bio = r10_bio->devs[i].repl_bio;
-			bio_reset(bio);
 			bio->bi_error = -EIO;
 
 			sector = r10_bio->devs[i].addr;
 			bio->bi_next = biolist;
 			biolist = bio;
-			bio->bi_private = r10_bio;
 			bio->bi_end_io = end_sync_write;
 			bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
 			if (test_bit(FailFast, &conf->mirrors[d].rdev->flags))
@@ -3440,7 +3461,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 		if (len == 0)
 			break;
 		for (bio= biolist ; bio ; bio=bio->bi_next) {
-			page = bio->bi_io_vec[bio->bi_vcnt].bv_page;
+			page = resync_fetch_page(get_resync_pages(bio));
 			/*
 			 * won't fail because the vec table is big enough
 			 * to hold all these pages
@@ -3449,7 +3470,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 		}
 		nr_sectors += len>>9;
 		sector_nr += len>>9;
-	} while (biolist->bi_vcnt < RESYNC_PAGES);
+	} while (resync_page_available(get_resync_pages(biolist)));
 	r10_bio->sectors = nr_sectors;
 
 	while (biolist) {
@@ -3457,7 +3478,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 		biolist = biolist->bi_next;
 
 		bio->bi_next = NULL;
-		r10_bio = bio->bi_private;
+		r10_bio = get_resync_r10bio(bio);
 		r10_bio->sectors = nr_sectors;
 
 		if (bio->bi_end_io == end_sync_read) {
@@ -4352,6 +4373,7 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr,
 	struct bio *blist;
 	struct bio *bio, *read_bio;
 	int sectors_done = 0;
+	struct page **pages;
 
 	if (sector_nr == 0) {
 		/* If restarting in the middle, skip the initial sectors */
@@ -4502,11 +4524,9 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr,
 		if (!rdev2 || test_bit(Faulty, &rdev2->flags))
 			continue;
 
-		bio_reset(b);
 		b->bi_bdev = rdev2->bdev;
 		b->bi_iter.bi_sector = r10_bio->devs[s/2].addr +
 			rdev2->new_data_offset;
-		b->bi_private = r10_bio;
 		b->bi_end_io = end_reshape_write;
 		bio_set_op_attrs(b, REQ_OP_WRITE, 0);
 		b->bi_next = blist;
@@ -4516,8 +4536,9 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr,
 	/* Now add as many pages as possible to all of these bios. */
 
 	nr_sectors = 0;
+	pages = get_resync_pages(r10_bio->devs[0].bio)->pages;
 	for (s = 0 ; s < max_sectors; s += PAGE_SIZE >> 9) {
-		struct page *page = r10_bio->devs[0].bio->bi_io_vec[s/(PAGE_SIZE>>9)].bv_page;
+		struct page *page = pages[s / (PAGE_SIZE >> 9)];
 		int len = (max_sectors - s) << 9;
 		if (len > PAGE_SIZE)
 			len = PAGE_SIZE;
@@ -4701,7 +4722,7 @@ static int handle_reshape_read_error(struct mddev *mddev,
 
 static void end_reshape_write(struct bio *bio)
 {
-	struct r10bio *r10_bio = bio->bi_private;
+	struct r10bio *r10_bio = get_resync_r10bio(bio);
 	struct mddev *mddev = r10_bio->mddev;
 	struct r10conf *conf = mddev->private;
 	int d;
-- 
2.7.4


* [PATCH v2 12/13] md: raid10: retrieve page from preallocated resync page array
  2017-02-28 15:41 [PATCH v2 00/14] md: cleanup on direct access to bvec table Ming Lei
                   ` (10 preceding siblings ...)
  2017-02-28 15:41 ` [PATCH v2 11/13] md: raid10: don't use bio's vec table to manage resync pages Ming Lei
@ 2017-02-28 15:41 ` Ming Lei
  2017-02-28 15:41 ` [PATCH v2 13/13] md: raid10: avoid direct access to bvec table in handle_reshape_read_error Ming Lei
  12 siblings, 0 replies; 39+ messages in thread
From: Ming Lei @ 2017-02-28 15:41 UTC (permalink / raw)
  To: Shaohua Li, Jens Axboe, linux-raid, linux-block, Christoph Hellwig
  Cc: Ming Lei

Now that one page array is allocated for each resync bio, we can
retrieve pages from this array directly.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/md/raid10.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index f887b21332e7..0b97631e3905 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2065,6 +2065,7 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
 	int i, first;
 	struct bio *tbio, *fbio;
 	int vcnt;
+	struct page **tpages, **fpages;
 
 	atomic_set(&r10_bio->remaining, 1);
 
@@ -2080,6 +2081,7 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
 	fbio = r10_bio->devs[i].bio;
 	fbio->bi_iter.bi_size = r10_bio->sectors << 9;
 	fbio->bi_iter.bi_idx = 0;
+	fpages = get_resync_pages(fbio)->pages;
 
 	vcnt = (r10_bio->sectors + (PAGE_SIZE >> 9) - 1) >> (PAGE_SHIFT - 9);
 	/* now find blocks with errors */
@@ -2094,6 +2096,8 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
 			continue;
 		if (i == first)
 			continue;
+
+		tpages = get_resync_pages(tbio)->pages;
 		d = r10_bio->devs[i].devnum;
 		rdev = conf->mirrors[d].rdev;
 		if (!r10_bio->devs[i].bio->bi_error) {
@@ -2106,8 +2110,8 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
 				int len = PAGE_SIZE;
 				if (sectors < (len / 512))
 					len = sectors * 512;
-				if (memcmp(page_address(fbio->bi_io_vec[j].bv_page),
-					   page_address(tbio->bi_io_vec[j].bv_page),
+				if (memcmp(page_address(fpages[j]),
+					   page_address(tpages[j]),
 					   len))
 					break;
 				sectors -= len/512;
@@ -2205,6 +2209,7 @@ static void fix_recovery_read_error(struct r10bio *r10_bio)
 	int idx = 0;
 	int dr = r10_bio->devs[0].devnum;
 	int dw = r10_bio->devs[1].devnum;
+	struct page **pages = get_resync_pages(bio)->pages;
 
 	while (sectors) {
 		int s = sectors;
@@ -2220,7 +2225,7 @@ static void fix_recovery_read_error(struct r10bio *r10_bio)
 		ok = sync_page_io(rdev,
 				  addr,
 				  s << 9,
-				  bio->bi_io_vec[idx].bv_page,
+				  pages[idx],
 				  REQ_OP_READ, 0, false);
 		if (ok) {
 			rdev = conf->mirrors[dw].rdev;
@@ -2228,7 +2233,7 @@ static void fix_recovery_read_error(struct r10bio *r10_bio)
 			ok = sync_page_io(rdev,
 					  addr,
 					  s << 9,
-					  bio->bi_io_vec[idx].bv_page,
+					  pages[idx],
 					  REQ_OP_WRITE, 0, false);
 			if (!ok) {
 				set_bit(WriteErrorSeen, &rdev->flags);
-- 
2.7.4


* [PATCH v2 13/13] md: raid10: avoid direct access to bvec table in handle_reshape_read_error
  2017-02-28 15:41 [PATCH v2 00/14] md: cleanup on direct access to bvec table Ming Lei
                   ` (11 preceding siblings ...)
  2017-02-28 15:41 ` [PATCH v2 12/13] md: raid10: retrieve page from preallocated resync page array Ming Lei
@ 2017-02-28 15:41 ` Ming Lei
  2017-02-28 23:46   ` Shaohua Li
  12 siblings, 1 reply; 39+ messages in thread
From: Ming Lei @ 2017-02-28 15:41 UTC (permalink / raw)
  To: Shaohua Li, Jens Axboe, linux-raid, linux-block, Christoph Hellwig
  Cc: Ming Lei

The cost is 128 bytes (8 * 16) of stack space in kernel thread context;
we just use the bio helper to retrieve the pages from the bio.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/md/raid10.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 0b97631e3905..6ffb64ab45f8 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -4670,7 +4670,15 @@ static int handle_reshape_read_error(struct mddev *mddev,
 	struct r10bio *r10b = &on_stack.r10_bio;
 	int slot = 0;
 	int idx = 0;
-	struct bio_vec *bvec = r10_bio->master_bio->bi_io_vec;
+	struct bio_vec *bvl;
+	struct page *pages[RESYNC_PAGES];
+
+	/*
+	 * This bio is allocated in reshape_request(), and size
+	 * is still RESYNC_PAGES
+	 */
+	bio_for_each_segment_all(bvl, r10_bio->master_bio, idx)
+		pages[idx] = bvl->bv_page;
 
 	r10b->sector = r10_bio->sector;
 	__raid10_find_phys(&conf->prev, r10b);
@@ -4699,7 +4707,7 @@ static int handle_reshape_read_error(struct mddev *mddev,
 			success = sync_page_io(rdev,
 					       addr,
 					       s << 9,
-					       bvec[idx].bv_page,
+					       pages[idx],
 					       REQ_OP_READ, 0, false);
 			rdev_dec_pending(rdev, mddev);
 			rcu_read_lock();
-- 
2.7.4


* Re: [PATCH v2 04/13] md: prepare for managing resync I/O pages in clean way
  2017-02-28 15:41 ` [PATCH v2 04/13] md: prepare for managing resync I/O pages in clean way Ming Lei
@ 2017-02-28 23:30   ` Shaohua Li
  2017-03-02  2:09     ` Ming Lei
  0 siblings, 1 reply; 39+ messages in thread
From: Shaohua Li @ 2017-02-28 23:30 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, linux-raid, linux-block, Christoph Hellwig

On Tue, Feb 28, 2017 at 11:41:34PM +0800, Ming Lei wrote:
> Currently resync I/O uses the bio's bvec table to manage pages.
> This is very hacky and may not work any more once multipage
> bvec is introduced.
> 
> So introduce helpers and a new data structure for managing
> resync I/O pages more cleanly.
> 
> Signed-off-by: Ming Lei <tom.leiming@gmail.com>
> ---
>  drivers/md/md.h | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 54 insertions(+)
> 
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index 1d63239a1be4..b5a638d85cb4 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -720,4 +720,58 @@ static inline void mddev_check_writesame(struct mddev *mddev, struct bio *bio)
>  #define RESYNC_BLOCK_SIZE (64*1024)
>  #define RESYNC_PAGES ((RESYNC_BLOCK_SIZE + PAGE_SIZE-1) / PAGE_SIZE)
>  
> +/* for managing resync I/O pages */
> +struct resync_pages {
> +	unsigned	idx;	/* for get/put page from the pool */
> +	void		*raid_bio;
> +	struct page	*pages[RESYNC_PAGES];
> +};

I'd like this to be embedded into r1bio directly. Not sure if we really need a
structure.
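
Something along these lines is what I have in mind (untested sketch;
'sync_pages' is a made-up field name, not existing code):

	/* in struct r1bio, instead of a separate resync_pages structure */
	struct page	**sync_pages;	/* raid_disks * RESYNC_PAGES pages */

	/* per-disk page lookup then becomes something like */
	static inline struct page *r1bio_sync_page(struct r1bio *r1_bio,
						   int disk, int idx)
	{
		return r1_bio->sync_pages[disk * RESYNC_PAGES + idx];
	}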

> +
> +static inline int resync_alloc_pages(struct resync_pages *rp,
> +				     gfp_t gfp_flags)
> +{
> +	int i;
> +
> +	for (i = 0; i < RESYNC_PAGES; i++) {
> +		rp->pages[i] = alloc_page(gfp_flags);
> +		if (!rp->pages[i])
> +			goto out_free;
> +	}
> +
> +	return 0;
> +
> + out_free:
> +	while (--i >= 0)
> +		__free_page(rp->pages[i]);
> +	return -ENOMEM;
> +}
> +
> +static inline void resync_free_pages(struct resync_pages *rp)
> +{
> +	int i;
> +
> +	for (i = 0; i < RESYNC_PAGES; i++)
> +		__free_page(rp->pages[i]);

Since we will use get_page, shouldn't this be put_page?
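
i.e. something like the following, so that shared pages only go away
when the last reference is dropped:

	static inline void resync_free_pages(struct resync_pages *rp)
	{
		int i;

		for (i = 0; i < RESYNC_PAGES; i++)
			put_page(rp->pages[i]);
	}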

> +}
> +
> +static inline void resync_get_all_pages(struct resync_pages *rp)
> +{
> +	int i;
> +
> +	for (i = 0; i < RESYNC_PAGES; i++)
> +		get_page(rp->pages[i]);
> +}
> +
> +static inline struct page *resync_fetch_page(struct resync_pages *rp)
> +{
> +	if (WARN_ON_ONCE(rp->idx >= RESYNC_PAGES))
> +		return NULL;
> +	return rp->pages[rp->idx++];

I'd like the caller to explicitly specify the index instead of tracking it
with a global variable, which will make the code more understandable and
less error prone.
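
For example (sketch only, the signature is just a suggestion):

	static inline struct page *resync_fetch_page(struct resync_pages *rp,
						     unsigned idx)
	{
		if (WARN_ON_ONCE(idx >= RESYNC_PAGES))
			return NULL;
		return rp->pages[idx];
	}

with the caller passing the index it already uses to walk the resync pages.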

> +}
> +
> +static inline bool resync_page_available(struct resync_pages *rp)
> +{
> +	return rp->idx < RESYNC_PAGES;
> +}

Then we don't need this obscure API.

Thanks,
Shaohua


* Re: [PATCH v2 05/13] md: raid1: simplify r1buf_pool_free()
  2017-02-28 15:41 ` [PATCH v2 05/13] md: raid1: simplify r1buf_pool_free() Ming Lei
@ 2017-02-28 23:31   ` Shaohua Li
  2017-03-02  2:11     ` Ming Lei
  0 siblings, 1 reply; 39+ messages in thread
From: Shaohua Li @ 2017-02-28 23:31 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, linux-raid, linux-block, Christoph Hellwig

On Tue, Feb 28, 2017 at 11:41:35PM +0800, Ming Lei wrote:
> This patch takes a reference on each page of each resync bio, so
> r1buf_pool_free() gets simplified a lot.
> 
> The same policy is already used in raid10's buf pool
> allocation/free.

We are going to delete this code anyway, so this simplification isn't really required.

> 
> Signed-off-by: Ming Lei <tom.leiming@gmail.com>
> ---
>  drivers/md/raid1.c | 15 +++++++--------
>  1 file changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 25c9172db639..c442b4657e2f 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -139,9 +139,12 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
>  	/* If not user-requests, copy the page pointers to all bios */
>  	if (!test_bit(MD_RECOVERY_REQUESTED, &pi->mddev->recovery)) {
>  		for (i=0; i<RESYNC_PAGES ; i++)
> -			for (j=1; j<pi->raid_disks; j++)
> -				r1_bio->bios[j]->bi_io_vec[i].bv_page =
> +			for (j=1; j<pi->raid_disks; j++) {
> +				struct page *page =
>  					r1_bio->bios[0]->bi_io_vec[i].bv_page;
> +				get_page(page);
> +				r1_bio->bios[j]->bi_io_vec[i].bv_page = page;
> +			}
>  	}
>  
>  	r1_bio->master_bio = NULL;
> @@ -166,12 +169,8 @@ static void r1buf_pool_free(void *__r1_bio, void *data)
>  	struct r1bio *r1bio = __r1_bio;
>  
>  	for (i = 0; i < RESYNC_PAGES; i++)
> -		for (j = pi->raid_disks; j-- ;) {
> -			if (j == 0 ||
> -			    r1bio->bios[j]->bi_io_vec[i].bv_page !=
> -			    r1bio->bios[0]->bi_io_vec[i].bv_page)
> -				safe_put_page(r1bio->bios[j]->bi_io_vec[i].bv_page);
> -		}
> +		for (j = pi->raid_disks; j-- ;)
> +			safe_put_page(r1bio->bios[j]->bi_io_vec[i].bv_page);
>  	for (i=0 ; i < pi->raid_disks; i++)
>  		bio_put(r1bio->bios[i]);
>  
> -- 
> 2.7.4
> 


* Re: [PATCH v2 06/13] md: raid1: don't use bio's vec table to manage resync pages
  2017-02-28 15:41 ` [PATCH v2 06/13] md: raid1: don't use bio's vec table to manage resync pages Ming Lei
@ 2017-02-28 23:37   ` Shaohua Li
  2017-03-02  2:25     ` Ming Lei
  0 siblings, 1 reply; 39+ messages in thread
From: Shaohua Li @ 2017-02-28 23:37 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, linux-raid, linux-block, Christoph Hellwig

On Tue, Feb 28, 2017 at 11:41:36PM +0800, Ming Lei wrote:
> Now we allocate one page array for managing resync pages, instead
> of using the bio's bvec table to do that; the old way is very hacky
> and won't work any more once multipage bvec is enabled.
> 
> The introduced cost is that we need to allocate (128 + 16) * raid_disks
> bytes per r1_bio, which is fine because the number of in-flight r1_bios
> for resync shouldn't be large, as pointed out by Shaohua.
> 
> Also the bio_reset() in raid1_sync_request() is removed because all
> bios are freshly allocated now and no longer need to be reset.
> 
> This patch can be viewed as a cleanup too.
> 
> Suggested-by: Shaohua Li <shli@kernel.org>
> Signed-off-by: Ming Lei <tom.leiming@gmail.com>
> ---
>  drivers/md/raid1.c | 83 ++++++++++++++++++++++++++++++++++--------------------
>  1 file changed, 53 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index c442b4657e2f..900144f39630 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -77,6 +77,16 @@ static void lower_barrier(struct r1conf *conf, sector_t sector_nr);
>  #define raid1_log(md, fmt, args...)				\
>  	do { if ((md)->queue) blk_add_trace_msg((md)->queue, "raid1 " fmt, ##args); } while (0)
>  
> +static inline struct resync_pages *get_resync_pages(struct bio *bio)
> +{
> +	return bio->bi_private;
> +}
> +
> +static inline struct r1bio *get_resync_r1bio(struct bio *bio)
> +{
> +	return get_resync_pages(bio)->raid_bio;
> +}

This is a weird relationship between bio, r1bio and the resync_pages. I'd like
the pages to be embedded in r1bio, or maybe have r1bio hold a pointer to the
pages. It's cleaner and more straightforward.
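
To make the suggestion concrete, a rough sketch (the 'rps' field is
hypothetical, not from the posted patches):

	struct r1bio {
		...
		/* one resync_pages per resync bio, allocated together in
		 * r1buf_pool_alloc(); bio->bi_private keeps pointing at r1bio */
		struct resync_pages *rps;
		...
	};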

I think patches 6, 7 and 8 should be folded into a single patch, otherwise bisect will be broken.

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 08/13] md: raid1: use bio helper in process_checks()
  2017-02-28 15:41 ` [PATCH v2 08/13] md: raid1: use bio helper in process_checks() Ming Lei
@ 2017-02-28 23:39   ` Shaohua Li
  0 siblings, 0 replies; 39+ messages in thread
From: Shaohua Li @ 2017-02-28 23:39 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, linux-raid, linux-block, Christoph Hellwig

On Tue, Feb 28, 2017 at 11:41:38PM +0800, Ming Lei wrote:
> Avoid to direct access to bvec table.
> 
> Signed-off-by: Ming Lei <tom.leiming@gmail.com>
> ---
>  drivers/md/raid1.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index d0cb5c026506..316bd6dd6cc1 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -2108,6 +2108,7 @@ static void process_checks(struct r1bio *r1_bio)
>  		int j;
>  		int size;
>  		int error;
> +		struct bio_vec *bi;
>  		struct bio *b = r1_bio->bios[i];
>  		struct resync_pages *rp = get_resync_pages(b);
>  		if (b->bi_end_io != end_sync_read)
> @@ -2126,9 +2127,7 @@ static void process_checks(struct r1bio *r1_bio)
>  		b->bi_private = rp;
>  
>  		size = b->bi_iter.bi_size;
> -		for (j = 0; j < vcnt ; j++) {
> -			struct bio_vec *bi;
> -			bi = &b->bi_io_vec[j];
> +		bio_for_each_segment_all(bi, b, j) {
>  			bi->bv_offset = 0;
>  			if (size > PAGE_SIZE)
>  				bi->bv_len = PAGE_SIZE;
> @@ -2152,17 +2151,22 @@ static void process_checks(struct r1bio *r1_bio)
>  		int error = sbio->bi_error;
>  		struct page **ppages = get_resync_pages(pbio)->pages;
>  		struct page **spages = get_resync_pages(sbio)->pages;
> +		struct bio_vec *bi;
> +		int page_len[RESYNC_PAGES];
>  
>  		if (sbio->bi_end_io != end_sync_read)
>  			continue;
>  		/* Now we can 'fixup' the error value */
>  		sbio->bi_error = 0;
>  
> +		bio_for_each_segment_all(bi, sbio, j)
> +			page_len[j] = bi->bv_len;
> +

If we record the length of each page in resync_pages (we don't need the offset,
because it should be 0), we don't need to access the bio_vec at all.
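
Roughly (the 'page_len' field is hypothetical, not in the posted patches):

	struct resync_pages {
		unsigned	idx;
		void		*raid_bio;
		struct page	*pages[RESYNC_PAGES];
		/* per-page data length; bv_offset is always 0 */
		unsigned	page_len[RESYNC_PAGES];
	};

Then process_checks() could read page_len[] instead of walking sbio's bvec table.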

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 09/13] md: raid1: use bio_segments_all()
  2017-02-28 15:41 ` [PATCH v2 09/13] md: raid1: use bio_segments_all() Ming Lei
@ 2017-02-28 23:42   ` Shaohua Li
  2017-03-02  2:34     ` Ming Lei
  0 siblings, 1 reply; 39+ messages in thread
From: Shaohua Li @ 2017-02-28 23:42 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, linux-raid, linux-block, Christoph Hellwig

On Tue, Feb 28, 2017 at 11:41:39PM +0800, Ming Lei wrote:
> Use this helper, instead of direct access to .bi_vcnt.

What we really need to do for the behind IO is:
- allocate memory and copy the bio data into it
- let the behind bio do IO against that memory

The behind bio doesn't need to have exactly the same bio_vec setting. If we
just track the new memory, we don't need bio_segments_all() and don't need to
access the bio_vec either.

Thanks,
Shaohua
 
> Signed-off-by: Ming Lei <tom.leiming@gmail.com>
> ---
>  drivers/md/raid1.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 316bd6dd6cc1..7396c99ff7b1 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -1091,7 +1091,8 @@ static void alloc_behind_pages(struct bio *bio, struct r1bio *r1_bio)
>  {
>  	int i;
>  	struct bio_vec *bvec;
> -	struct bio_vec *bvecs = kzalloc(bio->bi_vcnt * sizeof(struct bio_vec),
> +	unsigned vcnt = bio_segments_all(bio);
> +	struct bio_vec *bvecs = kzalloc(vcnt * sizeof(struct bio_vec),
>  					GFP_NOIO);
>  	if (unlikely(!bvecs))
>  		return;
> @@ -1107,12 +1108,12 @@ static void alloc_behind_pages(struct bio *bio, struct r1bio *r1_bio)
>  		kunmap(bvec->bv_page);
>  	}
>  	r1_bio->behind_bvecs = bvecs;
> -	r1_bio->behind_page_count = bio->bi_vcnt;
> +	r1_bio->behind_page_count = vcnt;
>  	set_bit(R1BIO_BehindIO, &r1_bio->state);
>  	return;
>  
>  do_sync_io:
> -	for (i = 0; i < bio->bi_vcnt; i++)
> +	for (i = 0; i < vcnt; i++)
>  		if (bvecs[i].bv_page)
>  			put_page(bvecs[i].bv_page);
>  	kfree(bvecs);
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 11/13] md: raid10: don't use bio's vec table to manage resync pages
  2017-02-28 15:41 ` [PATCH v2 11/13] md: raid10: don't use bio's vec table to manage resync pages Ming Lei
@ 2017-02-28 23:43   ` Shaohua Li
  0 siblings, 0 replies; 39+ messages in thread
From: Shaohua Li @ 2017-02-28 23:43 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, linux-raid, linux-block, Christoph Hellwig

On Tue, Feb 28, 2017 at 11:41:41PM +0800, Ming Lei wrote:
> Now we allocate one page array for managing resync pages, instead
> of using bio's vec table to do that, and the old way is very hacky
> and won't work any more if multipage bvec is enabled.
> 
> The introduced cost is that we need to allocate (128 + 16) * copies
> bytes per r10_bio, and it is fine because the inflight r10_bio for
> resync shouldn't be much, as pointed by Shaohua.
> 
> Also bio_reset() in raid10_sync_request() and reshape_request()
> are removed because all bios are freshly new now in these functions
> and not necessary to reset any more.
> 
> This patch can be thought as cleanup too.
> 
> Suggested-by: Shaohua Li <shli@kernel.org>
> Signed-off-by: Ming Lei <tom.leiming@gmail.com>
> ---
>  drivers/md/raid10.c | 125 ++++++++++++++++++++++++++++++----------------------
>  1 file changed, 73 insertions(+), 52 deletions(-)
> 
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index a9ddd4f14008..f887b21332e7 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -110,6 +110,16 @@ static void end_reshape(struct r10conf *conf);
>  #define raid10_log(md, fmt, args...)				\
>  	do { if ((md)->queue) blk_add_trace_msg((md)->queue, "raid10 " fmt, ##args); } while (0)
>  
> +static inline struct resync_pages *get_resync_pages(struct bio *bio)
> +{
> +	return bio->bi_private;
> +}
> +
> +static inline struct r10bio *get_resync_r10bio(struct bio *bio)
> +{
> +	return get_resync_pages(bio)->raid_bio;
> +}

The same applies to raid10 too: embed the resync_pages into r10bio, possibly
via a pointer. Merge patches 11, 12 and 13 into a single patch.

Thanks,
Shaohua

> +
>  static void * r10bio_pool_alloc(gfp_t gfp_flags, void *data)
>  {
>  	struct r10conf *conf = data;
> @@ -140,11 +150,11 @@ static void r10bio_pool_free(void *r10_bio, void *data)
>  static void * r10buf_pool_alloc(gfp_t gfp_flags, void *data)
>  {
>  	struct r10conf *conf = data;
> -	struct page *page;
>  	struct r10bio *r10_bio;
>  	struct bio *bio;
> -	int i, j;
> -	int nalloc;
> +	int j;
> +	int nalloc, nalloc_rp;
> +	struct resync_pages *rps;
>  
>  	r10_bio = r10bio_pool_alloc(gfp_flags, conf);
>  	if (!r10_bio)
> @@ -156,6 +166,15 @@ static void * r10buf_pool_alloc(gfp_t gfp_flags, void *data)
>  	else
>  		nalloc = 2; /* recovery */
>  
> +	/* allocate once for all bios */
> +	if (!conf->have_replacement)
> +		nalloc_rp = nalloc;
> +	else
> +		nalloc_rp = nalloc * 2;
> +	rps = kmalloc(sizeof(struct resync_pages) * nalloc_rp, gfp_flags);
> +	if (!rps)
> +		goto out_free_r10bio;
> +
>  	/*
>  	 * Allocate bios.
>  	 */
> @@ -175,36 +194,40 @@ static void * r10buf_pool_alloc(gfp_t gfp_flags, void *data)
>  	 * Allocate RESYNC_PAGES data pages and attach them
>  	 * where needed.
>  	 */
> -	for (j = 0 ; j < nalloc; j++) {
> +	for (j = 0; j < nalloc; j++) {
>  		struct bio *rbio = r10_bio->devs[j].repl_bio;
> +		struct resync_pages *rp, *rp_repl;
> +
> +		rp = &rps[j];
> +		if (rbio)
> +			rp_repl = &rps[nalloc + j];
> +
>  		bio = r10_bio->devs[j].bio;
> -		for (i = 0; i < RESYNC_PAGES; i++) {
> -			if (j > 0 && !test_bit(MD_RECOVERY_SYNC,
> -					       &conf->mddev->recovery)) {
> -				/* we can share bv_page's during recovery
> -				 * and reshape */
> -				struct bio *rbio = r10_bio->devs[0].bio;
> -				page = rbio->bi_io_vec[i].bv_page;
> -				get_page(page);
> -			} else
> -				page = alloc_page(gfp_flags);
> -			if (unlikely(!page))
> +
> +		if (!j || test_bit(MD_RECOVERY_SYNC,
> +				   &conf->mddev->recovery)) {
> +			if (resync_alloc_pages(rp, gfp_flags))
>  				goto out_free_pages;
> +		} else {
> +			memcpy(rp, &rps[0], sizeof(*rp));
> +			resync_get_all_pages(rp);
> +		}
>  
> -			bio->bi_io_vec[i].bv_page = page;
> -			if (rbio)
> -				rbio->bi_io_vec[i].bv_page = page;
> +		rp->idx = 0;
> +		rp->raid_bio = r10_bio;
> +		bio->bi_private = rp;
> +		if (rbio) {
> +			memcpy(rp_repl, rp, sizeof(*rp));
> +			rbio->bi_private = rp_repl;
>  		}
>  	}
>  
>  	return r10_bio;
>  
>  out_free_pages:
> -	for ( ; i > 0 ; i--)
> -		safe_put_page(bio->bi_io_vec[i-1].bv_page);
> -	while (j--)
> -		for (i = 0; i < RESYNC_PAGES ; i++)
> -			safe_put_page(r10_bio->devs[j].bio->bi_io_vec[i].bv_page);
> +	while (--j >= 0)
> +		resync_free_pages(&rps[j * 2]);
> +
>  	j = 0;
>  out_free_bio:
>  	for ( ; j < nalloc; j++) {
> @@ -213,30 +236,34 @@ static void * r10buf_pool_alloc(gfp_t gfp_flags, void *data)
>  		if (r10_bio->devs[j].repl_bio)
>  			bio_put(r10_bio->devs[j].repl_bio);
>  	}
> +	kfree(rps);
> +out_free_r10bio:
>  	r10bio_pool_free(r10_bio, conf);
>  	return NULL;
>  }
>  
>  static void r10buf_pool_free(void *__r10_bio, void *data)
>  {
> -	int i;
>  	struct r10conf *conf = data;
>  	struct r10bio *r10bio = __r10_bio;
>  	int j;
> +	struct resync_pages *rp = NULL;
>  
> -	for (j=0; j < conf->copies; j++) {
> +	for (j = conf->copies; j--; ) {
>  		struct bio *bio = r10bio->devs[j].bio;
> -		if (bio) {
> -			for (i = 0; i < RESYNC_PAGES; i++) {
> -				safe_put_page(bio->bi_io_vec[i].bv_page);
> -				bio->bi_io_vec[i].bv_page = NULL;
> -			}
> -			bio_put(bio);
> -		}
> +
> +		rp = get_resync_pages(bio);
> +		resync_free_pages(rp);
> +		bio_put(bio);
> +
>  		bio = r10bio->devs[j].repl_bio;
>  		if (bio)
>  			bio_put(bio);
>  	}
> +
> +	/* resync pages array stored in the 1st bio's .bi_private */
> +	kfree(rp);
> +
>  	r10bio_pool_free(r10bio, conf);
>  }
>  
> @@ -1935,7 +1962,7 @@ static void __end_sync_read(struct r10bio *r10_bio, struct bio *bio, int d)
>  
>  static void end_sync_read(struct bio *bio)
>  {
> -	struct r10bio *r10_bio = bio->bi_private;
> +	struct r10bio *r10_bio = get_resync_r10bio(bio);
>  	struct r10conf *conf = r10_bio->mddev->private;
>  	int d = find_bio_disk(conf, r10_bio, bio, NULL, NULL);
>  
> @@ -1944,6 +1971,7 @@ static void end_sync_read(struct bio *bio)
>  
>  static void end_reshape_read(struct bio *bio)
>  {
> +	/* reshape read bio isn't allocated from r10buf_pool */
>  	struct r10bio *r10_bio = bio->bi_private;
>  
>  	__end_sync_read(r10_bio, bio, r10_bio->read_slot);
> @@ -1978,7 +2006,7 @@ static void end_sync_request(struct r10bio *r10_bio)
>  
>  static void end_sync_write(struct bio *bio)
>  {
> -	struct r10bio *r10_bio = bio->bi_private;
> +	struct r10bio *r10_bio = get_resync_r10bio(bio);
>  	struct mddev *mddev = r10_bio->mddev;
>  	struct r10conf *conf = mddev->private;
>  	int d;
> @@ -2058,6 +2086,7 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
>  	for (i=0 ; i < conf->copies ; i++) {
>  		int  j, d;
>  		struct md_rdev *rdev;
> +		struct resync_pages *rp;
>  
>  		tbio = r10_bio->devs[i].bio;
>  
> @@ -2099,11 +2128,13 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
>  		 * First we need to fixup bv_offset, bv_len and
>  		 * bi_vecs, as the read request might have corrupted these
>  		 */
> +		rp = get_resync_pages(tbio);
>  		bio_reset(tbio);
>  
>  		tbio->bi_vcnt = vcnt;
>  		tbio->bi_iter.bi_size = fbio->bi_iter.bi_size;
> -		tbio->bi_private = r10_bio;
> +		rp->raid_bio = r10_bio;
> +		tbio->bi_private = rp;
>  		tbio->bi_iter.bi_sector = r10_bio->devs[i].addr;
>  		tbio->bi_end_io = end_sync_write;
>  		bio_set_op_attrs(tbio, REQ_OP_WRITE, 0);
> @@ -3171,10 +3202,8 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>  					}
>  				}
>  				bio = r10_bio->devs[0].bio;
> -				bio_reset(bio);
>  				bio->bi_next = biolist;
>  				biolist = bio;
> -				bio->bi_private = r10_bio;
>  				bio->bi_end_io = end_sync_read;
>  				bio_set_op_attrs(bio, REQ_OP_READ, 0);
>  				if (test_bit(FailFast, &rdev->flags))
> @@ -3198,10 +3227,8 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>  
>  				if (!test_bit(In_sync, &mrdev->flags)) {
>  					bio = r10_bio->devs[1].bio;
> -					bio_reset(bio);
>  					bio->bi_next = biolist;
>  					biolist = bio;
> -					bio->bi_private = r10_bio;
>  					bio->bi_end_io = end_sync_write;
>  					bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
>  					bio->bi_iter.bi_sector = to_addr
> @@ -3226,10 +3253,8 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>  				if (mreplace == NULL || bio == NULL ||
>  				    test_bit(Faulty, &mreplace->flags))
>  					break;
> -				bio_reset(bio);
>  				bio->bi_next = biolist;
>  				biolist = bio;
> -				bio->bi_private = r10_bio;
>  				bio->bi_end_io = end_sync_write;
>  				bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
>  				bio->bi_iter.bi_sector = to_addr +
> @@ -3351,7 +3376,6 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>  				r10_bio->devs[i].repl_bio->bi_end_io = NULL;
>  
>  			bio = r10_bio->devs[i].bio;
> -			bio_reset(bio);
>  			bio->bi_error = -EIO;
>  			rcu_read_lock();
>  			rdev = rcu_dereference(conf->mirrors[d].rdev);
> @@ -3376,7 +3400,6 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>  			atomic_inc(&r10_bio->remaining);
>  			bio->bi_next = biolist;
>  			biolist = bio;
> -			bio->bi_private = r10_bio;
>  			bio->bi_end_io = end_sync_read;
>  			bio_set_op_attrs(bio, REQ_OP_READ, 0);
>  			if (test_bit(FailFast, &conf->mirrors[d].rdev->flags))
> @@ -3395,13 +3418,11 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>  
>  			/* Need to set up for writing to the replacement */
>  			bio = r10_bio->devs[i].repl_bio;
> -			bio_reset(bio);
>  			bio->bi_error = -EIO;
>  
>  			sector = r10_bio->devs[i].addr;
>  			bio->bi_next = biolist;
>  			biolist = bio;
> -			bio->bi_private = r10_bio;
>  			bio->bi_end_io = end_sync_write;
>  			bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
>  			if (test_bit(FailFast, &conf->mirrors[d].rdev->flags))
> @@ -3440,7 +3461,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>  		if (len == 0)
>  			break;
>  		for (bio= biolist ; bio ; bio=bio->bi_next) {
> -			page = bio->bi_io_vec[bio->bi_vcnt].bv_page;
> +			page = resync_fetch_page(get_resync_pages(bio));
>  			/*
>  			 * won't fail because the vec table is big enough
>  			 * to hold all these pages
> @@ -3449,7 +3470,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>  		}
>  		nr_sectors += len>>9;
>  		sector_nr += len>>9;
> -	} while (biolist->bi_vcnt < RESYNC_PAGES);
> +	} while (resync_page_available(get_resync_pages(biolist)));
>  	r10_bio->sectors = nr_sectors;
>  
>  	while (biolist) {
> @@ -3457,7 +3478,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>  		biolist = biolist->bi_next;
>  
>  		bio->bi_next = NULL;
> -		r10_bio = bio->bi_private;
> +		r10_bio = get_resync_r10bio(bio);
>  		r10_bio->sectors = nr_sectors;
>  
>  		if (bio->bi_end_io == end_sync_read) {
> @@ -4352,6 +4373,7 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr,
>  	struct bio *blist;
>  	struct bio *bio, *read_bio;
>  	int sectors_done = 0;
> +	struct page **pages;
>  
>  	if (sector_nr == 0) {
>  		/* If restarting in the middle, skip the initial sectors */
> @@ -4502,11 +4524,9 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr,
>  		if (!rdev2 || test_bit(Faulty, &rdev2->flags))
>  			continue;
>  
> -		bio_reset(b);
>  		b->bi_bdev = rdev2->bdev;
>  		b->bi_iter.bi_sector = r10_bio->devs[s/2].addr +
>  			rdev2->new_data_offset;
> -		b->bi_private = r10_bio;
>  		b->bi_end_io = end_reshape_write;
>  		bio_set_op_attrs(b, REQ_OP_WRITE, 0);
>  		b->bi_next = blist;
> @@ -4516,8 +4536,9 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr,
>  	/* Now add as many pages as possible to all of these bios. */
>  
>  	nr_sectors = 0;
> +	pages = get_resync_pages(r10_bio->devs[0].bio)->pages;
>  	for (s = 0 ; s < max_sectors; s += PAGE_SIZE >> 9) {
> -		struct page *page = r10_bio->devs[0].bio->bi_io_vec[s/(PAGE_SIZE>>9)].bv_page;
> +		struct page *page = pages[s / (PAGE_SIZE >> 9)];
>  		int len = (max_sectors - s) << 9;
>  		if (len > PAGE_SIZE)
>  			len = PAGE_SIZE;
> @@ -4701,7 +4722,7 @@ static int handle_reshape_read_error(struct mddev *mddev,
>  
>  static void end_reshape_write(struct bio *bio)
>  {
> -	struct r10bio *r10_bio = bio->bi_private;
> +	struct r10bio *r10_bio = get_resync_r10bio(bio);
>  	struct mddev *mddev = r10_bio->mddev;
>  	struct r10conf *conf = mddev->private;
>  	int d;
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 13/13] md: raid10: avoid direct access to bvec table in handle_reshape_read_error
  2017-02-28 15:41 ` [PATCH v2 13/13] md: raid10: avoid direct access to bvec table in handle_reshape_read_error Ming Lei
@ 2017-02-28 23:46   ` Shaohua Li
  2017-03-02  2:37     ` Ming Lei
  0 siblings, 1 reply; 39+ messages in thread
From: Shaohua Li @ 2017-02-28 23:46 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, linux-raid, linux-block, Christoph Hellwig

On Tue, Feb 28, 2017 at 11:41:43PM +0800, Ming Lei wrote:
> The cost is 128bytes(8*16) stack space in kernel thread context, and
> just use the bio helper to retrieve pages from bio.
> 
> Signed-off-by: Ming Lei <tom.leiming@gmail.com>
> ---
>  drivers/md/raid10.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 0b97631e3905..6ffb64ab45f8 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -4670,7 +4670,15 @@ static int handle_reshape_read_error(struct mddev *mddev,
>  	struct r10bio *r10b = &on_stack.r10_bio;
>  	int slot = 0;
>  	int idx = 0;
> -	struct bio_vec *bvec = r10_bio->master_bio->bi_io_vec;
> +	struct bio_vec *bvl;
> +	struct page *pages[RESYNC_PAGES];
> +
> +	/*
> +	 * This bio is allocated in reshape_request(), and size
> +	 * is still RESYNC_PAGES
> +	 */
> +	bio_for_each_segment_all(bvl, r10_bio->master_bio, idx)
> +		pages[idx] = bvl->bv_page;

The reshape bio is doing IO against the memory we allocated for r10_bio, so I'm
wondering why we can't get the pages from r10_bio. That way, we don't need to
access the bio_vec any more.

Thanks,
Shaohua
 
>  	r10b->sector = r10_bio->sector;
>  	__raid10_find_phys(&conf->prev, r10b);
> @@ -4699,7 +4707,7 @@ static int handle_reshape_read_error(struct mddev *mddev,
>  			success = sync_page_io(rdev,
>  					       addr,
>  					       s << 9,
> -					       bvec[idx].bv_page,
> +					       pages[idx],
>  					       REQ_OP_READ, 0, false);
>  			rdev_dec_pending(rdev, mddev);
>  			rcu_read_lock();
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 04/13] md: prepare for managing resync I/O pages in clean way
  2017-02-28 23:30   ` Shaohua Li
@ 2017-03-02  2:09     ` Ming Lei
  2017-03-02 17:55       ` Shaohua Li
  0 siblings, 1 reply; 39+ messages in thread
From: Ming Lei @ 2017-03-02  2:09 UTC (permalink / raw)
  To: Shaohua Li
  Cc: Jens Axboe, open list:SOFTWARE RAID (Multiple Disks) SUPPORT,
	linux-block, Christoph Hellwig

Hi Shaohua,

On Wed, Mar 1, 2017 at 7:30 AM, Shaohua Li <shli@kernel.org> wrote:
> On Tue, Feb 28, 2017 at 11:41:34PM +0800, Ming Lei wrote:
>> Now resync I/O use bio's bec table to manage pages,
>> this way is very hacky, and may not work any more
>> once multipage bvec is introduced.
>>
>> So introduce helpers and new data structure for
>> managing resync I/O pages more cleanly.
>>
>> Signed-off-by: Ming Lei <tom.leiming@gmail.com>
>> ---
>>  drivers/md/md.h | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 54 insertions(+)
>>
>> diff --git a/drivers/md/md.h b/drivers/md/md.h
>> index 1d63239a1be4..b5a638d85cb4 100644
>> --- a/drivers/md/md.h
>> +++ b/drivers/md/md.h
>> @@ -720,4 +720,58 @@ static inline void mddev_check_writesame(struct mddev *mddev, struct bio *bio)
>>  #define RESYNC_BLOCK_SIZE (64*1024)
>>  #define RESYNC_PAGES ((RESYNC_BLOCK_SIZE + PAGE_SIZE-1) / PAGE_SIZE)
>>
>> +/* for managing resync I/O pages */
>> +struct resync_pages {
>> +     unsigned        idx;    /* for get/put page from the pool */
>> +     void            *raid_bio;
>> +     struct page     *pages[RESYNC_PAGES];
>> +};
>
> I'd like this to be embedded into r1bio directly. Not sure if we really need a
> structure.

There are two reasons we can't put this into r1bio:
- r1bio is used in both normal and resync I/O, so it isn't fair to make normal
I/O pay for the extra allocation; that is why this patch doesn't touch r1bio or
r10bio

- the number of 'struct resync_pages' instances depends on raid_disks (raid1)
or copies (raid10), neither of which is known at compile time.
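
Just to make the sizing point concrete (the raid1 line is a sketch of the
unposted hunk; the raid10 one matches the quoted patch):

	/* raid1: one resync_pages per resync bio */
	rps = kmalloc(sizeof(struct resync_pages) * pi->raid_disks, gfp_flags);

	/* raid10: nalloc_rp is conf->copies (or 2 for recovery), doubled
	 * when replacement bios exist */
	rps = kmalloc(sizeof(struct resync_pages) * nalloc_rp, gfp_flags);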

>
>> +
>> +static inline int resync_alloc_pages(struct resync_pages *rp,
>> +                                  gfp_t gfp_flags)
>> +{
>> +     int i;
>> +
>> +     for (i = 0; i < RESYNC_PAGES; i++) {
>> +             rp->pages[i] = alloc_page(gfp_flags);
>> +             if (!rp->pages[i])
>> +                     goto out_free;
>> +     }
>> +
>> +     return 0;
>> +
>> + out_free:
>> +     while (--i >= 0)
>> +             __free_page(rp->pages[i]);
>> +     return -ENOMEM;
>> +}
>> +
>> +static inline void resync_free_pages(struct resync_pages *rp)
>> +{
>> +     int i;
>> +
>> +     for (i = 0; i < RESYNC_PAGES; i++)
>> +             __free_page(rp->pages[i]);
>
> Since we will use get_page, shouldn't this be put_page?

You are right, will fix in v3.

>
>> +}
>> +
>> +static inline void resync_get_all_pages(struct resync_pages *rp)
>> +{
>> +     int i;
>> +
>> +     for (i = 0; i < RESYNC_PAGES; i++)
>> +             get_page(rp->pages[i]);
>> +}
>> +
>> +static inline struct page *resync_fetch_page(struct resync_pages *rp)
>> +{
>> +     if (WARN_ON_ONCE(rp->idx >= RESYNC_PAGES))
>> +             return NULL;
>> +     return rp->pages[rp->idx++];
>
> I'd like the caller explicitly specify the index instead of a global variable
> to track it, which will make the code more understandable and less error prone.

That is fine, but the index has to be per bio, and in the end it has to be
stored in 'struct resync_pages', so every user would have to call it in the
following way:

          resync_fetch_page(rp, rp->idx);

so there looks to be no benefit in passing the index explicitly.

>
>> +}
>> +
>> +static inline bool resync_page_available(struct resync_pages *rp)
>> +{
>> +     return rp->idx < RESYNC_PAGES;
>> +}
>
> Then we don't need this obscure API.

That is fine.


Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 05/13] md: raid1: simplify r1buf_pool_free()
  2017-02-28 23:31   ` Shaohua Li
@ 2017-03-02  2:11     ` Ming Lei
  2017-03-02 17:49       ` Shaohua Li
  0 siblings, 1 reply; 39+ messages in thread
From: Ming Lei @ 2017-03-02  2:11 UTC (permalink / raw)
  To: Shaohua Li
  Cc: Jens Axboe, open list:SOFTWARE RAID (Multiple Disks) SUPPORT,
	linux-block, Christoph Hellwig

Hi Shaohua,

On Wed, Mar 1, 2017 at 7:31 AM, Shaohua Li <shli@kernel.org> wrote:
> On Tue, Feb 28, 2017 at 11:41:35PM +0800, Ming Lei wrote:
>> This patch gets each page's reference of each bio for resync,
>> then r1buf_pool_free() gets simplified a lot.
>>
>> The same policy has been taken in raid10's buf pool allocation/free
>> too.
>
> We are going to delete the code, this simplify isn't really required.

Could you explain that a bit? Or I can fold such a patch into this patchset if
you can provide one.

Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 06/13] md: raid1: don't use bio's vec table to manage resync pages
  2017-02-28 23:37   ` Shaohua Li
@ 2017-03-02  2:25     ` Ming Lei
  2017-03-02 17:48       ` Shaohua Li
  0 siblings, 1 reply; 39+ messages in thread
From: Ming Lei @ 2017-03-02  2:25 UTC (permalink / raw)
  To: Shaohua Li
  Cc: Jens Axboe, open list:SOFTWARE RAID (Multiple Disks) SUPPORT,
	linux-block, Christoph Hellwig

Hi Shaohua,

On Wed, Mar 1, 2017 at 7:37 AM, Shaohua Li <shli@kernel.org> wrote:
> On Tue, Feb 28, 2017 at 11:41:36PM +0800, Ming Lei wrote:
>> Now we allocate one page array for managing resync pages, instead
>> of using bio's vec table to do that, and the old way is very hacky
>> and won't work any more if multipage bvec is enabled.
>>
>> The introduced cost is that we need to allocate (128 + 16) * raid_disks
>> bytes per r1_bio, and it is fine because the inflight r1_bio for
>> resync shouldn't be much, as pointed by Shaohua.
>>
>> Also the bio_reset() in raid1_sync_request() is removed because
>> all bios are freshly new now and not necessary to reset any more.
>>
>> This patch can be thought as a cleanup too
>>
>> Suggested-by: Shaohua Li <shli@kernel.org>
>> Signed-off-by: Ming Lei <tom.leiming@gmail.com>
>> ---
>>  drivers/md/raid1.c | 83 ++++++++++++++++++++++++++++++++++--------------------
>>  1 file changed, 53 insertions(+), 30 deletions(-)
>>
>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>> index c442b4657e2f..900144f39630 100644
>> --- a/drivers/md/raid1.c
>> +++ b/drivers/md/raid1.c
>> @@ -77,6 +77,16 @@ static void lower_barrier(struct r1conf *conf, sector_t sector_nr);
>>  #define raid1_log(md, fmt, args...)                          \
>>       do { if ((md)->queue) blk_add_trace_msg((md)->queue, "raid1 " fmt, ##args); } while (0)
>>
>> +static inline struct resync_pages *get_resync_pages(struct bio *bio)
>> +{
>> +     return bio->bi_private;
>> +}
>> +
>> +static inline struct r1bio *get_resync_r1bio(struct bio *bio)
>> +{
>> +     return get_resync_pages(bio)->raid_bio;
>> +}
>
> This is a weird between bio, r1bio and the resync_pages. I'd like the pages are

It is only a bit weird inside allocating and freeing r1bio; once everything is
allocated, you can see it is clean and simple:

    - r1bio includes lots of bios,
    - and each bio has one resync_pages attached via .bi_private

> embedded in r1bio. Maybe a pointer of r1bio to the pages. It's cleaner and more
> straightforward.

In theory, the cleanest way is to allocate one resync_pages for each resync bio,
but that isn't efficient. That is why this patch allocates all resync_pages
together in r1buf_pool_alloc(), and splits them across the bios.

BTW, the only trick is that the whole page array is stored in the .bi_private
of the 1st bio, so we can avoid adding one pointer to r1bio.
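
On the free side that means something like (a sketch for raid1; the quoted
raid10 patch does the equivalent):

	/* r1buf_pool_free(): recover the array from the first bio, since
	 * rps[0] was stored in bios[0]->bi_private */
	struct resync_pages *rps = get_resync_pages(r1bio->bios[0]);

	/* ... put pages and bios ... */
	kfree(rps);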

>
> I think the patch 6, 7 8 should be in a same patch. Otherwise bisect will be broken.
>

No, it won't. Patches 7 and 8 just replace reading from the bvec table with
reading from the pre-allocated page array. Either works, but the latter is
cleaner and simpler to do.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 09/13] md: raid1: use bio_segments_all()
  2017-02-28 23:42   ` Shaohua Li
@ 2017-03-02  2:34     ` Ming Lei
  2017-03-02  7:52       ` Shaohua Li
  0 siblings, 1 reply; 39+ messages in thread
From: Ming Lei @ 2017-03-02  2:34 UTC (permalink / raw)
  To: Shaohua Li
  Cc: Jens Axboe, open list:SOFTWARE RAID (Multiple Disks) SUPPORT,
	linux-block, Christoph Hellwig

Hi Shaohua,

On Wed, Mar 1, 2017 at 7:42 AM, Shaohua Li <shli@kernel.org> wrote:
> On Tue, Feb 28, 2017 at 11:41:39PM +0800, Ming Lei wrote:
>> Use this helper, instead of direct access to .bi_vcnt.
>
> what We really need to do for the behind IO is:
> - allocate memory and copy bio data to the memory
> - let behind bio do IO against the memory
>
> The behind bio doesn't need to have the exactly same bio_vec setting. If we
> just track the new memory, we don't need use the bio_segments_all and access
> bio_vec too.

But we need to figure out how many vecs (each vec stores one page) to allocate
for the cloned/behind bio, and that is the only use of bio_segments_all() here.
Or do you have an idea for avoiding that?

Thanks,
Ming

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 13/13] md: raid10: avoid direct access to bvec table in handle_reshape_read_error
  2017-02-28 23:46   ` Shaohua Li
@ 2017-03-02  2:37     ` Ming Lei
  2017-03-02  7:47       ` Shaohua Li
  0 siblings, 1 reply; 39+ messages in thread
From: Ming Lei @ 2017-03-02  2:37 UTC (permalink / raw)
  To: Shaohua Li
  Cc: Jens Axboe, open list:SOFTWARE RAID (Multiple Disks) SUPPORT,
	linux-block, Christoph Hellwig

Hi Shaohua,

On Wed, Mar 1, 2017 at 7:46 AM, Shaohua Li <shli@kernel.org> wrote:
> On Tue, Feb 28, 2017 at 11:41:43PM +0800, Ming Lei wrote:
>> The cost is 128bytes(8*16) stack space in kernel thread context, and
>> just use the bio helper to retrieve pages from bio.
>>
>> Signed-off-by: Ming Lei <tom.leiming@gmail.com>
>> ---
>>  drivers/md/raid10.c | 12 ++++++++++--
>>  1 file changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
>> index 0b97631e3905..6ffb64ab45f8 100644
>> --- a/drivers/md/raid10.c
>> +++ b/drivers/md/raid10.c
>> @@ -4670,7 +4670,15 @@ static int handle_reshape_read_error(struct mddev *mddev,
>>       struct r10bio *r10b = &on_stack.r10_bio;
>>       int slot = 0;
>>       int idx = 0;
>> -     struct bio_vec *bvec = r10_bio->master_bio->bi_io_vec;
>> +     struct bio_vec *bvl;
>> +     struct page *pages[RESYNC_PAGES];
>> +
>> +     /*
>> +      * This bio is allocated in reshape_request(), and size
>> +      * is still RESYNC_PAGES
>> +      */
>> +     bio_for_each_segment_all(bvl, r10_bio->master_bio, idx)
>> +             pages[idx] = bvl->bv_page;
>
> The reshape bio is doing IO against the memory we allocated for r10_bio, I'm
> wondering why we can't get the pages from r10_bio. In this way, we don't need
> access the bio_vec any more.

Reshape read is special: the bio (r10_bio->master_bio) isn't allocated from
.r10buf_pool; please see reshape_request().

Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 13/13] md: raid10: avoid direct access to bvec table in handle_reshape_read_error
  2017-03-02  2:37     ` Ming Lei
@ 2017-03-02  7:47       ` Shaohua Li
  2017-03-03  2:30         ` Ming Lei
  0 siblings, 1 reply; 39+ messages in thread
From: Shaohua Li @ 2017-03-02  7:47 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, open list:SOFTWARE RAID (Multiple Disks) SUPPORT,
	linux-block, Christoph Hellwig

On Thu, Mar 02, 2017 at 10:37:49AM +0800, Ming Lei wrote:
> Hi Shaohua,
> 
> On Wed, Mar 1, 2017 at 7:46 AM, Shaohua Li <shli@kernel.org> wrote:
> > On Tue, Feb 28, 2017 at 11:41:43PM +0800, Ming Lei wrote:
> >> The cost is 128bytes(8*16) stack space in kernel thread context, and
> >> just use the bio helper to retrieve pages from bio.
> >>
> >> Signed-off-by: Ming Lei <tom.leiming@gmail.com>
> >> ---
> >>  drivers/md/raid10.c | 12 ++++++++++--
> >>  1 file changed, 10 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> >> index 0b97631e3905..6ffb64ab45f8 100644
> >> --- a/drivers/md/raid10.c
> >> +++ b/drivers/md/raid10.c
> >> @@ -4670,7 +4670,15 @@ static int handle_reshape_read_error(struct mddev *mddev,
> >>       struct r10bio *r10b = &on_stack.r10_bio;
> >>       int slot = 0;
> >>       int idx = 0;
> >> -     struct bio_vec *bvec = r10_bio->master_bio->bi_io_vec;
> >> +     struct bio_vec *bvl;
> >> +     struct page *pages[RESYNC_PAGES];
> >> +
> >> +     /*
> >> +      * This bio is allocated in reshape_request(), and size
> >> +      * is still RESYNC_PAGES
> >> +      */
> >> +     bio_for_each_segment_all(bvl, r10_bio->master_bio, idx)
> >> +             pages[idx] = bvl->bv_page;
> >
> > The reshape bio is doing IO against the memory we allocated for r10_bio, I'm
> > wondering why we can't get the pages from r10_bio. In this way, we don't need
> > access the bio_vec any more.
> 
> Reshap read is special and the bio(r10_bio->master_bio) isn't allocated from
> .r10buf_pool, please see reshape_request().

Why does this matter? The bio is still doing IO to the memory allocated in
r10_bio: bi_private -> r10_bio -> the pages.

My guess why you think the reshape read is special is that its bio->bi_private
doesn't point to a resync_pages. And since you let resync_pages point to
r10_bio and not vice versa, we can't find r10_bio, correct? I think this is
another reason why r10_bio should embed resync_pages (I mean a pointer to
resync_pages). With the approach I suggested, the reshape read isn't special at
all. If we allocate memory for r10_bio/r1_bio, we shouldn't need to access any
bio bvec.

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 09/13] md: raid1: use bio_segments_all()
  2017-03-02  2:34     ` Ming Lei
@ 2017-03-02  7:52       ` Shaohua Li
  2017-03-03  2:20         ` Ming Lei
  0 siblings, 1 reply; 39+ messages in thread
From: Shaohua Li @ 2017-03-02  7:52 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, open list:SOFTWARE RAID (Multiple Disks) SUPPORT,
	linux-block, Christoph Hellwig

On Thu, Mar 02, 2017 at 10:34:25AM +0800, Ming Lei wrote:
> Hi Shaohua,
> 
> On Wed, Mar 1, 2017 at 7:42 AM, Shaohua Li <shli@kernel.org> wrote:
> > On Tue, Feb 28, 2017 at 11:41:39PM +0800, Ming Lei wrote:
> >> Use this helper, instead of direct access to .bi_vcnt.
> >
> > what We really need to do for the behind IO is:
> > - allocate memory and copy bio data to the memory
> > - let behind bio do IO against the memory
> >
> > The behind bio doesn't need to have the exactly same bio_vec setting. If we
> > just track the new memory, we don't need use the bio_segments_all and access
> > bio_vec too.
> 
> But we need to figure out how many vecs(each vec store one page) to be
> allocated for the cloned/behind bio, and that is the only value of
> bio_segments_all() here. Or you have idea to avoid that?

As I said, the behind bio doesn't need to have exactly the same bio_vec
setting. We just allocate memory, copy the original bio's data into it, and
then do IO against the new memory. The behind bio's
segments == (bio->bi_iter.bi_size + PAGE_SIZE - 1) >> PAGE_SHIFT
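
That is, roughly (a sketch, not code from this series):

	unsigned size = bio->bi_iter.bi_size;
	int nr_pages = DIV_ROUND_UP(size, PAGE_SIZE);
	struct page **pages;
	int i;

	/* page count comes purely from bi_size, not from bi_vcnt */
	pages = kzalloc(nr_pages * sizeof(struct page *), GFP_NOIO);
	if (!pages)
		return;
	for (i = 0; i < nr_pages; i++) {
		pages[i] = alloc_page(GFP_NOIO);
		if (!pages[i])
			goto out_free;
	}
	/* copy the original bio's data into 'pages' (bio_copy_data() style),
	 * then build the behind bio on top of these pages only */
	...
	return;

 out_free:
	while (--i >= 0)
		__free_page(pages[i]);
	kfree(pages);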

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 06/13] md: raid1: don't use bio's vec table to manage resync pages
  2017-03-02  2:25     ` Ming Lei
@ 2017-03-02 17:48       ` Shaohua Li
  2017-03-03  2:11         ` Ming Lei
  0 siblings, 1 reply; 39+ messages in thread
From: Shaohua Li @ 2017-03-02 17:48 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, open list:SOFTWARE RAID (Multiple Disks) SUPPORT,
	linux-block, Christoph Hellwig

On Thu, Mar 02, 2017 at 10:25:10AM +0800, Ming Lei wrote:
> Hi Shaohua,
> 
> On Wed, Mar 1, 2017 at 7:37 AM, Shaohua Li <shli@kernel.org> wrote:
> > On Tue, Feb 28, 2017 at 11:41:36PM +0800, Ming Lei wrote:
> >> Now we allocate one page array for managing resync pages, instead
> >> of using bio's vec table to do that, and the old way is very hacky
> >> and won't work any more if multipage bvec is enabled.
> >>
> >> The introduced cost is that we need to allocate (128 + 16) * raid_disks
> >> bytes per r1_bio, and it is fine because the inflight r1_bio for
> >> resync shouldn't be much, as pointed by Shaohua.
> >>
> >> Also the bio_reset() in raid1_sync_request() is removed because
> >> all bios are freshly new now and not necessary to reset any more.
> >>
> >> This patch can be thought as a cleanup too
> >>
> >> Suggested-by: Shaohua Li <shli@kernel.org>
> >> Signed-off-by: Ming Lei <tom.leiming@gmail.com>
> >> ---
> >>  drivers/md/raid1.c | 83 ++++++++++++++++++++++++++++++++++--------------------
> >>  1 file changed, 53 insertions(+), 30 deletions(-)
> >>
> >> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> >> index c442b4657e2f..900144f39630 100644
> >> --- a/drivers/md/raid1.c
> >> +++ b/drivers/md/raid1.c
> >> @@ -77,6 +77,16 @@ static void lower_barrier(struct r1conf *conf, sector_t sector_nr);
> >>  #define raid1_log(md, fmt, args...)                          \
> >>       do { if ((md)->queue) blk_add_trace_msg((md)->queue, "raid1 " fmt, ##args); } while (0)
> >>
> >> +static inline struct resync_pages *get_resync_pages(struct bio *bio)
> >> +{
> >> +     return bio->bi_private;
> >> +}
> >> +
> >> +static inline struct r1bio *get_resync_r1bio(struct bio *bio)
> >> +{
> >> +     return get_resync_pages(bio)->raid_bio;
> >> +}
> >
> > This is a weird between bio, r1bio and the resync_pages. I'd like the pages are
> 
> It is only a bit weird inside allocating and freeing r1bio, once all
> are allocated, you
> can see everthing is clean and simple:
> 
>     - r1bio includes lots of bioes,
>     - and one bio is attached by one resync_pages via .bi_private

I don't know how complex it is to let r1bio point to the pages, but that's the
natural way. r1bio owns the pages, not the pages own r1bio, so we should let
r1bio point to the pages. The bio's .bi_private still points to r1bio.

> > embedded in r1bio. Maybe a pointer of r1bio to the pages. It's cleaner and more
> > straightforward.
> 
> In theory, the cleanest way is to allocate one resync_pages for each resync bio,
> but that isn't efficient. That is why this patch allocates all
> resync_pages together
> in r1buf_pool_alloc(), and split them into bio.
> 
> BTW, the only trick is just that the whole page array is stored in .bi_private
> of the 1st bio, then we can avoid to add one pointer into r1bio.

You will need to add a pointer in resync_pages which points to r1bio.
 
> >
> > I think the patch 6, 7 8 should be in a same patch. Otherwise bisect will be broken.
> >
> 
> No, it won't. Both patch 7 and patch 8 just replacing reading from bvec table
> with reading from the pre-allocated page array.  Both should be fine, but the
> latter is cleaner and simpler to do.

Ok, sounds good, though I doubt it's really worth it.

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 05/13] md: raid1: simplify r1buf_pool_free()
  2017-03-02  2:11     ` Ming Lei
@ 2017-03-02 17:49       ` Shaohua Li
  2017-03-03  1:57         ` Ming Lei
  0 siblings, 1 reply; 39+ messages in thread
From: Shaohua Li @ 2017-03-02 17:49 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, open list:SOFTWARE RAID (Multiple Disks) SUPPORT,
	linux-block, Christoph Hellwig

On Thu, Mar 02, 2017 at 10:11:54AM +0800, Ming Lei wrote:
> Hi Shaohua,
> 
> On Wed, Mar 1, 2017 at 7:31 AM, Shaohua Li <shli@kernel.org> wrote:
> > On Tue, Feb 28, 2017 at 11:41:35PM +0800, Ming Lei wrote:
> >> This patch gets each page's reference of each bio for resync,
> >> then r1buf_pool_free() gets simplified a lot.
> >>
> >> The same policy has been taken in raid10's buf pool allocation/free
> >> too.
> >
> > We are going to delete the code, this simplify isn't really required.
> 
> Could you explain it a bit? Or I can put this patch into this patchset if you
> can provide one?

I mean you will replace the code soon with the resync_pages stuff, so I
thought this isn't really required.

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 04/13] md: prepare for managing resync I/O pages in clean way
  2017-03-02  2:09     ` Ming Lei
@ 2017-03-02 17:55       ` Shaohua Li
  2017-03-03  1:54         ` Ming Lei
  0 siblings, 1 reply; 39+ messages in thread
From: Shaohua Li @ 2017-03-02 17:55 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, open list:SOFTWARE RAID (Multiple Disks) SUPPORT,
	linux-block, Christoph Hellwig

On Thu, Mar 02, 2017 at 10:09:14AM +0800, Ming Lei wrote:
> Hi Shaohua,
> 
> On Wed, Mar 1, 2017 at 7:30 AM, Shaohua Li <shli@kernel.org> wrote:
> > On Tue, Feb 28, 2017 at 11:41:34PM +0800, Ming Lei wrote:
> >> Now resync I/O use bio's bec table to manage pages,
> >> this way is very hacky, and may not work any more
> >> once multipage bvec is introduced.
> >>
> >> So introduce helpers and new data structure for
> >> managing resync I/O pages more cleanly.
> >>
> >> Signed-off-by: Ming Lei <tom.leiming@gmail.com>
> >> ---
> >>  drivers/md/md.h | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>  1 file changed, 54 insertions(+)
> >>
> >> diff --git a/drivers/md/md.h b/drivers/md/md.h
> >> index 1d63239a1be4..b5a638d85cb4 100644
> >> --- a/drivers/md/md.h
> >> +++ b/drivers/md/md.h
> >> @@ -720,4 +720,58 @@ static inline void mddev_check_writesame(struct mddev *mddev, struct bio *bio)
> >>  #define RESYNC_BLOCK_SIZE (64*1024)
> >>  #define RESYNC_PAGES ((RESYNC_BLOCK_SIZE + PAGE_SIZE-1) / PAGE_SIZE)
> >>
> >> +/* for managing resync I/O pages */
> >> +struct resync_pages {
> >> +     unsigned        idx;    /* for get/put page from the pool */
> >> +     void            *raid_bio;
> >> +     struct page     *pages[RESYNC_PAGES];
> >> +};
> >
> > I'd like this to be embedded into r1bio directly. Not sure if we really need a
> > structure.
> 
> There are two reasons we can't put this into r1bio:
> - r1bio is used in both normal and resync I/O, not fair to allocate more
> in normal I/O, and that is why this patch wouldn't like to touch r1bio or r10bio
> 
> - the count of 'struct resync_pages' instance depends on raid_disks(raid1)
> or copies(raid10), both can't be decided during compiling.

Yes, as I said in the other emails, it should be a pointer in r1bio pointing to the pages.
 
> >
> >> +}
> >> +
> >> +static inline bool resync_page_available(struct resync_pages *rp)
> >> +{
> >> +     return rp->idx < RESYNC_PAGES;
> >> +}
> >
> > Then we don't need this obscure API.
> 
> That is fine.
> 
> 
> Thanks,
> Ming Lei
 
> >
> >> +
> >> +static inline int resync_alloc_pages(struct resync_pages *rp,
> >> +                                  gfp_t gfp_flags)
> >> +{
> >> +     int i;
> >> +
> >> +     for (i = 0; i < RESYNC_PAGES; i++) {
> >> +             rp->pages[i] = alloc_page(gfp_flags);
> >> +             if (!rp->pages[i])
> >> +                     goto out_free;
> >> +     }
> >> +
> >> +     return 0;
> >> +
> >> + out_free:
> >> +     while (--i >= 0)
> >> +             __free_page(rp->pages[i]);
> >> +     return -ENOMEM;
> >> +}
> >> +
> >> +static inline void resync_free_pages(struct resync_pages *rp)
> >> +{
> >> +     int i;
> >> +
> >> +     for (i = 0; i < RESYNC_PAGES; i++)
> >> +             __free_page(rp->pages[i]);
> >
> > Since we will use get_page, shouldn't this be put_page?
> 
> You are right, will fix in v3.
> 
> >
> >> +}
> >> +
> >> +static inline void resync_get_all_pages(struct resync_pages *rp)
> >> +{
> >> +     int i;
> >> +
> >> +     for (i = 0; i < RESYNC_PAGES; i++)
> >> +             get_page(rp->pages[i]);
> >> +}
> >> +
> >> +static inline struct page *resync_fetch_page(struct resync_pages *rp)
> >> +{
> >> +     if (WARN_ON_ONCE(rp->idx >= RESYNC_PAGES))
> >> +             return NULL;
> >> +     return rp->pages[rp->idx++];
> >
> > I'd like the caller explicitly specify the index instead of a global variable
> > to track it, which will make the code more understandable and less error prone.
> 
> That is fine, but the index has to be per bio, and finally the index
> has to be stored
> in 'struct resync_pages', so every user has to call it in the following way:
> 
>           resync_fetch_page(rp, rp->idx);
> 
> then looks no benefit to pass index explicitly.

I suppose the callers always iterate through pages 0 to RESYNC_PAGES - 1, so
the callers can keep an index by themselves. That will clearly show which page
the callers are using. The resync_fetch_page() wrapper is a black box we always
have to refer back to the definition of.
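
That is, something like (the two-argument resync_fetch_page() here is
hypothetical):

	int idx;

	for (idx = 0; idx < RESYNC_PAGES; idx++) {
		struct page *page = resync_fetch_page(rp, idx);

		/* use 'page'; the loop index makes the position explicit */
	}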

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 04/13] md: prepare for managing resync I/O pages in clean way
  2017-03-02 17:55       ` Shaohua Li
@ 2017-03-03  1:54         ` Ming Lei
  0 siblings, 0 replies; 39+ messages in thread
From: Ming Lei @ 2017-03-03  1:54 UTC (permalink / raw)
  To: Shaohua Li
  Cc: Jens Axboe, open list:SOFTWARE RAID (Multiple Disks) SUPPORT,
	linux-block, Christoph Hellwig

On Fri, Mar 3, 2017 at 1:55 AM, Shaohua Li <shli@kernel.org> wrote:
> On Thu, Mar 02, 2017 at 10:09:14AM +0800, Ming Lei wrote:
>> Hi Shaohua,
>>
>> On Wed, Mar 1, 2017 at 7:30 AM, Shaohua Li <shli@kernel.org> wrote:
>> > On Tue, Feb 28, 2017 at 11:41:34PM +0800, Ming Lei wrote:
>> >> Now resync I/O use bio's bec table to manage pages,
>> >> this way is very hacky, and may not work any more
>> >> once multipage bvec is introduced.
>> >>
>> >> So introduce helpers and new data structure for
>> >> managing resync I/O pages more cleanly.
>> >>
>> >> Signed-off-by: Ming Lei <tom.leiming@gmail.com>
>> >> ---
>> >>  drivers/md/md.h | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> >>  1 file changed, 54 insertions(+)
>> >>
>> >> diff --git a/drivers/md/md.h b/drivers/md/md.h
>> >> index 1d63239a1be4..b5a638d85cb4 100644
>> >> --- a/drivers/md/md.h
>> >> +++ b/drivers/md/md.h
>> >> @@ -720,4 +720,58 @@ static inline void mddev_check_writesame(struct mddev *mddev, struct bio *bio)
>> >>  #define RESYNC_BLOCK_SIZE (64*1024)
>> >>  #define RESYNC_PAGES ((RESYNC_BLOCK_SIZE + PAGE_SIZE-1) / PAGE_SIZE)
>> >>
>> >> +/* for managing resync I/O pages */
>> >> +struct resync_pages {
>> >> +     unsigned        idx;    /* for get/put page from the pool */
>> >> +     void            *raid_bio;
>> >> +     struct page     *pages[RESYNC_PAGES];
>> >> +};
>> >
>> > I'd like this to be embedded into r1bio directly. Not sure if we really need a
>> > structure.
>>
>> There are two reasons we can't put this into r1bio:
>> - r1bio is used in both normal and resync I/O, not fair to allocate more
>> in normal I/O, and that is why this patch wouldn't like to touch r1bio or r10bio
>>
>> - the count of 'struct resync_pages' instance depends on raid_disks(raid1)
>> or copies(raid10), both can't be decided during compiling.
>
> Yes, I said it should be a pointer of r1bio pointing to the pages in other emails.

OK, got it, :-)

Actually I did that in my local tree, and finally figured out it isn't
necessary to introduce this pointer, whose only usage is for freeing, since we
can get the pointer from the .bi_private of the 1st bio. And this can be
commented clearly.

>
>> >
>> >> +}
>> >> +
>> >> +static inline bool resync_page_available(struct resync_pages *rp)
>> >> +{
>> >> +     return rp->idx < RESYNC_PAGES;
>> >> +}
>> >
>> > Then we don't need this obscure API.
>>
>> That is fine.
>>
>>
>> Thanks,
>> Ming Lei
>
>> >
>> >> +
>> >> +static inline int resync_alloc_pages(struct resync_pages *rp,
>> >> +                                  gfp_t gfp_flags)
>> >> +{
>> >> +     int i;
>> >> +
>> >> +     for (i = 0; i < RESYNC_PAGES; i++) {
>> >> +             rp->pages[i] = alloc_page(gfp_flags);
>> >> +             if (!rp->pages[i])
>> >> +                     goto out_free;
>> >> +     }
>> >> +
>> >> +     return 0;
>> >> +
>> >> + out_free:
>> >> +     while (--i >= 0)
>> >> +             __free_page(rp->pages[i]);
>> >> +     return -ENOMEM;
>> >> +}
>> >> +
>> >> +static inline void resync_free_pages(struct resync_pages *rp)
>> >> +{
>> >> +     int i;
>> >> +
>> >> +     for (i = 0; i < RESYNC_PAGES; i++)
>> >> +             __free_page(rp->pages[i]);
>> >
>> > Since we will use get_page, shouldn't this be put_page?
>>
>> You are right, will fix in v3.
>>
>> >
>> >> +}
>> >> +
>> >> +static inline void resync_get_all_pages(struct resync_pages *rp)
>> >> +{
>> >> +     int i;
>> >> +
>> >> +     for (i = 0; i < RESYNC_PAGES; i++)
>> >> +             get_page(rp->pages[i]);
>> >> +}
>> >> +
>> >> +static inline struct page *resync_fetch_page(struct resync_pages *rp)
>> >> +{
>> >> +     if (WARN_ON_ONCE(rp->idx >= RESYNC_PAGES))
>> >> +             return NULL;
>> >> +     return rp->pages[rp->idx++];
>> >
>> > I'd like the caller explicitly specify the index instead of a global variable
>> > to track it, which will make the code more understandable and less error prone.
>>
>> That is fine, but the index has to be per bio, and finally the index
>> has to be stored
>> in 'struct resync_pages', so every user has to call it in the following way:
>>
>>           resync_fetch_page(rp, rp->idx);
>>
>> then looks no benefit to pass index explicitly.
>
> I suppose the callers always iterate though page 0 to RESYNC_PAGES - 1. Then
> the callers can use an index by themselves. That will clearly show which page
> the callers are using. The resync_fetch_page() wrap is a blackbox we always
> need to refer to the definition.

OK, understood.


Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 05/13] md: raid1: simplify r1buf_pool_free()
  2017-03-02 17:49       ` Shaohua Li
@ 2017-03-03  1:57         ` Ming Lei
  0 siblings, 0 replies; 39+ messages in thread
From: Ming Lei @ 2017-03-03  1:57 UTC (permalink / raw)
  To: Shaohua Li
  Cc: Jens Axboe, open list:SOFTWARE RAID (Multiple Disks) SUPPORT,
	linux-block, Christoph Hellwig

On Fri, Mar 3, 2017 at 1:49 AM, Shaohua Li <shli@kernel.org> wrote:
> On Thu, Mar 02, 2017 at 10:11:54AM +0800, Ming Lei wrote:
>> Hi Shaohua,
>>
>> On Wed, Mar 1, 2017 at 7:31 AM, Shaohua Li <shli@kernel.org> wrote:
>> > On Tue, Feb 28, 2017 at 11:41:35PM +0800, Ming Lei wrote:
>> >> This patch gets each page's reference of each bio for resync,
>> >> then r1buf_pool_free() gets simplified a lot.
>> >>
>> >> The same policy has been taken in raid10's buf pool allocation/free
>> >> too.
>> >
>> > We are going to delete the code, this simplify isn't really required.
>>
>> Could you explain it a bit? Or I can put this patch into this patchset if you
>> can provide one?
>
> I mean you will replace the code soon, with the resync_pages stuff. So I
> thought this isn't really required.

OK, one benefit of this patch is to make the following one easier to review,
but it's not a big deal; I may merge this into the following patch.


Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 06/13] md: raid1: don't use bio's vec table to manage resync pages
  2017-03-02 17:48       ` Shaohua Li
@ 2017-03-03  2:11         ` Ming Lei
  2017-03-03 17:38           ` Shaohua Li
  0 siblings, 1 reply; 39+ messages in thread
From: Ming Lei @ 2017-03-03  2:11 UTC (permalink / raw)
  To: Shaohua Li
  Cc: Jens Axboe, open list:SOFTWARE RAID (Multiple Disks) SUPPORT,
	linux-block, Christoph Hellwig

On Fri, Mar 3, 2017 at 1:48 AM, Shaohua Li <shli@kernel.org> wrote:
> On Thu, Mar 02, 2017 at 10:25:10AM +0800, Ming Lei wrote:
>> Hi Shaohua,
>>
>> On Wed, Mar 1, 2017 at 7:37 AM, Shaohua Li <shli@kernel.org> wrote:
>> > On Tue, Feb 28, 2017 at 11:41:36PM +0800, Ming Lei wrote:
>> >> Now we allocate one page array for managing resync pages, instead
>> >> of using bio's vec table to do that, and the old way is very hacky
>> >> and won't work any more if multipage bvec is enabled.
>> >>
>> >> The introduced cost is that we need to allocate (128 + 16) * raid_disks
>> >> bytes per r1_bio, and it is fine because the inflight r1_bio for
>> >> resync shouldn't be much, as pointed by Shaohua.
>> >>
>> >> Also the bio_reset() in raid1_sync_request() is removed because
>> >> all bios are freshly new now and not necessary to reset any more.
>> >>
>> >> This patch can be thought as a cleanup too
>> >>
>> >> Suggested-by: Shaohua Li <shli@kernel.org>
>> >> Signed-off-by: Ming Lei <tom.leiming@gmail.com>
>> >> ---
>> >>  drivers/md/raid1.c | 83 ++++++++++++++++++++++++++++++++++--------------------
>> >>  1 file changed, 53 insertions(+), 30 deletions(-)
>> >>
>> >> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>> >> index c442b4657e2f..900144f39630 100644
>> >> --- a/drivers/md/raid1.c
>> >> +++ b/drivers/md/raid1.c
>> >> @@ -77,6 +77,16 @@ static void lower_barrier(struct r1conf *conf, sector_t sector_nr);
>> >>  #define raid1_log(md, fmt, args...)                          \
>> >>       do { if ((md)->queue) blk_add_trace_msg((md)->queue, "raid1 " fmt, ##args); } while (0)
>> >>
>> >> +static inline struct resync_pages *get_resync_pages(struct bio *bio)
>> >> +{
>> >> +     return bio->bi_private;
>> >> +}
>> >> +
>> >> +static inline struct r1bio *get_resync_r1bio(struct bio *bio)
>> >> +{
>> >> +     return get_resync_pages(bio)->raid_bio;
>> >> +}
>> >
>> > This is a weird between bio, r1bio and the resync_pages. I'd like the pages are
>>
>> It is only a bit weird inside allocating and freeing r1bio, once all
>> are allocated, you
>> can see everthing is clean and simple:
>>
>>     - r1bio includes lots of bioes,
>>     - and one bio is attached by one resync_pages via .bi_private
>
> I don't how complex to let r1bio pointer to the pages, but that's the nartual
> way. r1bio owns the pages, not the pages own r1bio, so we should let r1bio
> points to the pages. The bio.bi_private still points to r1bio.

Actually it is the bio which owns the pages for doing its own I/O; the only
thing related to r10bio is that the bios may share these pages, and using the
page refcount trick makes that relation quite implicit.

The only reason to allocate all resync_pages together is efficiency, simply to
avoid allocating one resync_pages at a time for each bio.

We have to make .bi_private point to a per-bio resync_pages, otherwise we can't
fetch pages into a bio at all: think about where to store the index for each
bio's pre-allocated pages; it has to be per bio.
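
With the helpers in this series the per-bio lookup in the resync path then
looks like this (a sketch following the quoted raid10 hunk):

	struct resync_pages *rp = get_resync_pages(bio);
	struct page *page = resync_fetch_page(rp);

	/* won't fail because the vec table is big enough to hold these pages */
	bio_add_page(bio, page, len, 0);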

>
>> > embedded in r1bio. Maybe a pointer of r1bio to the pages. It's cleaner and more
>> > straightforward.
>>
>> In theory, the cleanest way is to allocate one resync_pages for each resync bio,
>> but that isn't efficient. That is why this patch allocates all
>> resync_pages together
>> in r1buf_pool_alloc(), and split them into bio.
>>
>> BTW, the only trick is just that the whole page array is stored in .bi_private
>> of the 1st bio, then we can avoid to add one pointer into r1bio.
>
> You will need to add pointer in resync_pages which points to r1bio

Yeah, that is just what this patchset is doing.

>
>> >
>> > I think the patch 6, 7 8 should be in a same patch. Otherwise bisect will be broken.
>> >
>>
>> No, it won't. Both patch 7 and patch 8 just replacing reading from bvec table
>> with reading from the pre-allocated page array.  Both should be fine, but the
>> latter is cleaner and simpler to do.
>
> Ok, sounds good, though I doubt it's really worthy.

A small patch with one purpose is always easier to review: do one thing, do it
well,
:-)


Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 09/13] md: raid1: use bio_segments_all()
  2017-03-02  7:52       ` Shaohua Li
@ 2017-03-03  2:20         ` Ming Lei
  2017-03-03  6:22           ` Ming Lei
  0 siblings, 1 reply; 39+ messages in thread
From: Ming Lei @ 2017-03-03  2:20 UTC (permalink / raw)
  To: Shaohua Li
  Cc: Jens Axboe, open list:SOFTWARE RAID (Multiple Disks) SUPPORT,
	linux-block, Christoph Hellwig

On Thu, Mar 2, 2017 at 3:52 PM, Shaohua Li <shli@kernel.org> wrote:
> On Thu, Mar 02, 2017 at 10:34:25AM +0800, Ming Lei wrote:
>> Hi Shaohua,
>>
>> On Wed, Mar 1, 2017 at 7:42 AM, Shaohua Li <shli@kernel.org> wrote:
>> > On Tue, Feb 28, 2017 at 11:41:39PM +0800, Ming Lei wrote:
>> >> Use this helper, instead of direct access to .bi_vcnt.
>> >
>> > what We really need to do for the behind IO is:
>> > - allocate memory and copy bio data to the memory
>> > - let behind bio do IO against the memory
>> >
>> > The behind bio doesn't need to have the exactly same bio_vec setting. If we
>> > just track the new memory, we don't need use the bio_segments_all and access
>> > bio_vec too.
>>
>> But we need to figure out how many vecs(each vec store one page) to be
>> allocated for the cloned/behind bio, and that is the only value of
>> bio_segments_all() here. Or you have idea to avoid that?
>
> As I said, the behind bio doesn't need to have exactly the same bio_vec
> setting. We just allocate memory, copy the original bio's data into it,
> then do IO against the new memory. The behind bio's
> segments == (bio->bi_iter.bi_size + PAGE_SIZE - 1) >> PAGE_SHIFT

The equation isn't always correct, especially when a bvec covers just part
of a page; that is quite common in the mkfs case, where one bvec often
holds a 512-byte buffer.


Thanks,
Ming Lei
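
A small worked example of the mismatch being pointed out above; the numbers
are illustrative only, assuming 4KB pages:

	/* a 4KB write submitted as eight 512-byte bvecs, as mkfs often does */
	unsigned bvec_cnt = 8;			/* bio->bi_vcnt */
	unsigned bytes = 8 * 512;		/* bio->bi_iter.bi_size = 4096 */
	unsigned pages = (bytes + PAGE_SIZE - 1) >> PAGE_SHIFT;	/* = 1 */

Sizing the behind bio from bi_size gives one page, while reproducing the
original bvec layout would need eight vecs; the two counts differ.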

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 13/13] md: raid10: avoid direct access to bvec table in handle_reshape_read_error
  2017-03-02  7:47       ` Shaohua Li
@ 2017-03-03  2:30         ` Ming Lei
  0 siblings, 0 replies; 39+ messages in thread
From: Ming Lei @ 2017-03-03  2:30 UTC (permalink / raw)
  To: Shaohua Li
  Cc: Jens Axboe, open list:SOFTWARE RAID (Multiple Disks) SUPPORT,
	linux-block, Christoph Hellwig

On Thu, Mar 2, 2017 at 3:47 PM, Shaohua Li <shli@kernel.org> wrote:
> On Thu, Mar 02, 2017 at 10:37:49AM +0800, Ming Lei wrote:
>> Hi Shaohua,
>>
>> On Wed, Mar 1, 2017 at 7:46 AM, Shaohua Li <shli@kernel.org> wrote:
>> > On Tue, Feb 28, 2017 at 11:41:43PM +0800, Ming Lei wrote:
>> >> The cost is 128 bytes (8*16) of stack space in kernel thread context, and
>> >> we just use the bio helper to retrieve pages from the bio.
>> >>
>> >> Signed-off-by: Ming Lei <tom.leiming@gmail.com>
>> >> ---
>> >>  drivers/md/raid10.c | 12 ++++++++++--
>> >>  1 file changed, 10 insertions(+), 2 deletions(-)
>> >>
>> >> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
>> >> index 0b97631e3905..6ffb64ab45f8 100644
>> >> --- a/drivers/md/raid10.c
>> >> +++ b/drivers/md/raid10.c
>> >> @@ -4670,7 +4670,15 @@ static int handle_reshape_read_error(struct mddev *mddev,
>> >>       struct r10bio *r10b = &on_stack.r10_bio;
>> >>       int slot = 0;
>> >>       int idx = 0;
>> >> -     struct bio_vec *bvec = r10_bio->master_bio->bi_io_vec;
>> >> +     struct bio_vec *bvl;
>> >> +     struct page *pages[RESYNC_PAGES];
>> >> +
>> >> +     /*
>> >> +      * This bio is allocated in reshape_request(), and size
>> >> +      * is still RESYNC_PAGES
>> >> +      */
>> >> +     bio_for_each_segment_all(bvl, r10_bio->master_bio, idx)
>> >> +             pages[idx] = bvl->bv_page;
>> >
>> > The reshape bio is doing IO against the memory we allocated for r10_bio, so
>> > I'm wondering why we can't get the pages from r10_bio. In this way, we
>> > don't need to access the bio_vec any more.
>>
>> Reshape read is special and the bio (r10_bio->master_bio) isn't allocated
>> from .r10buf_pool; please see reshape_request().
>
> Why does this matter? The bio is still doing IO to the memory allocated in
> r10_bio. bi_private->r10_bio->the pages.

Good question, :-)

>
> My guess why you think the reshape read is special is that bio->bi_private
> doesn't point to resync_pages. And since you let resync_pages point to
> r10_bio and not vice versa, we can't find r10_bio, correct? I think this is
> another reason why r10_bio should embed resync_pages (I mean a pointer to
> resync_pages). With the way I suggested, the reshape read isn't special at

We can still get the resync_pages for r10_bio->master_bio via
r10_bio->devs[0].bio in handle_reshape_read_error(); I will do that in v3.

> all. If we allocate memory for r10_bio/r1_bio, we shouldn't need to access
> any bio bvec.

Right.
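
A sketch of the v3 idea mentioned above, reusing the get_resync_pages()
helper introduced earlier in the series (an assumption about v3, not the
posted patch):

	/* the reshape read bio shares pages with r10_bio->devs[0].bio, whose
	 * .bi_private carries the pre-allocated resync_pages, so the pages
	 * can be reached without touching any bvec table
	 */
	struct resync_pages *rp = get_resync_pages(r10_bio->devs[0].bio);
	struct page **pages = rp->pages;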



Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 09/13] md: raid1: use bio_segments_all()
  2017-03-03  2:20         ` Ming Lei
@ 2017-03-03  6:22           ` Ming Lei
  2017-03-03 17:12             ` Shaohua Li
  0 siblings, 1 reply; 39+ messages in thread
From: Ming Lei @ 2017-03-03  6:22 UTC (permalink / raw)
  To: Shaohua Li
  Cc: Jens Axboe, open list:SOFTWARE RAID (Multiple Disks) SUPPORT,
	linux-block, Christoph Hellwig

On Fri, Mar 3, 2017 at 10:20 AM, Ming Lei <tom.leiming@gmail.com> wrote:
> On Thu, Mar 2, 2017 at 3:52 PM, Shaohua Li <shli@kernel.org> wrote:
>> On Thu, Mar 02, 2017 at 10:34:25AM +0800, Ming Lei wrote:
>>> Hi Shaohua,
>>>
>>> On Wed, Mar 1, 2017 at 7:42 AM, Shaohua Li <shli@kernel.org> wrote:
>>> > On Tue, Feb 28, 2017 at 11:41:39PM +0800, Ming Lei wrote:
>>> >> Use this helper, instead of direct access to .bi_vcnt.
>>> >
>>> > What we really need to do for the behind IO is:
>>> > - allocate memory and copy the bio data into the memory
>>> > - let the behind bio do IO against the memory
>>> >
>>> > The behind bio doesn't need to have exactly the same bio_vec setting. If we
>>> > just track the new memory, we don't need to use bio_segments_all() or access
>>> > the bio_vec at all.
>>>
>>> But we need to figure out how many vecs (each vec stores one page) to
>>> allocate for the cloned/behind bio, and that is the only value of
>>> bio_segments_all() here. Or do you have an idea to avoid that?
>>
>> As I said, the behind bio doesn't need to have exactly the same bio_vec
>> setting. We just allocate memory, copy the original bio's data into it,
>> then do IO against the new memory. The behind bio's
>> segments == (bio->bi_iter.bi_size + PAGE_SIZE - 1) >> PAGE_SHIFT
>
> The equation isn't always correct, especially when a bvec covers just part
> of a page; that is quite common in the mkfs case, where one bvec often
> holds a 512-byte buffer.

Thinking about it further, your idea could be workable and cleaner, but the
change can be a bit big; it looks like we need to switch write-behind
handling to the following way:

1) replace bio_clone_bioset_partial() with bio_allocate(nr_vecs), where
'nr_vecs' is computed with your equation;

2) allocate 'nr_vecs' pages once and share them among all the bios created
in 1)

3) for each created bio, add each page into the bio via bio_add_page()

4) only for the 1st created bio, call bio_copy_data() to copy the data from
the master bio.

Let me know if you are OK with the above implementation.


Thanks,
Ming Lei
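
A rough sketch of the four-step flow proposed above; the function name,
variable names and error handling are assumptions, not code from the tree:

	static struct bio *alloc_behind_bio(struct bio *master, gfp_t gfp)
	{
		int i, nr_vecs = DIV_ROUND_UP(master->bi_iter.bi_size, PAGE_SIZE);
		unsigned int remaining = master->bi_iter.bi_size;
		struct bio *behind = bio_alloc(gfp, nr_vecs);		/* step 1 */

		if (!behind)
			return NULL;

		for (i = 0; i < nr_vecs; i++) {				/* steps 2 + 3 */
			unsigned int len = min_t(unsigned int, remaining, PAGE_SIZE);
			struct page *page = alloc_page(gfp);

			if (!page)
				goto free_pages;
			if (!bio_add_page(behind, page, len, 0)) {
				__free_page(page);
				goto free_pages;
			}
			remaining -= len;
		}

		bio_copy_data(behind, master);				/* step 4 */
		return behind;

	free_pages:
		bio_free_pages(behind);
		bio_put(behind);
		return NULL;
	}

This sketch allocates pages per behind bio; as step 2 suggests, the same pages
could instead be allocated once and shared among all the behind bios, with
only the first one filled via bio_copy_data().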

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 09/13] md: raid1: use bio_segments_all()
  2017-03-03  6:22           ` Ming Lei
@ 2017-03-03 17:12             ` Shaohua Li
  0 siblings, 0 replies; 39+ messages in thread
From: Shaohua Li @ 2017-03-03 17:12 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, open list:SOFTWARE RAID (Multiple Disks) SUPPORT,
	linux-block, Christoph Hellwig

On Fri, Mar 03, 2017 at 02:22:30PM +0800, Ming Lei wrote:
> On Fri, Mar 3, 2017 at 10:20 AM, Ming Lei <tom.leiming@gmail.com> wrote:
> > On Thu, Mar 2, 2017 at 3:52 PM, Shaohua Li <shli@kernel.org> wrote:
> >> On Thu, Mar 02, 2017 at 10:34:25AM +0800, Ming Lei wrote:
> >>> Hi Shaohua,
> >>>
> >>> On Wed, Mar 1, 2017 at 7:42 AM, Shaohua Li <shli@kernel.org> wrote:
> >>> > On Tue, Feb 28, 2017 at 11:41:39PM +0800, Ming Lei wrote:
> >>> >> Use this helper, instead of direct access to .bi_vcnt.
> >>> >
> >>> > What we really need to do for the behind IO is:
> >>> > - allocate memory and copy the bio data into the memory
> >>> > - let the behind bio do IO against the memory
> >>> >
> >>> > The behind bio doesn't need to have exactly the same bio_vec setting. If we
> >>> > just track the new memory, we don't need to use bio_segments_all() or access
> >>> > the bio_vec at all.
> >>>
> >>> But we need to figure out how many vecs (each vec stores one page) to
> >>> allocate for the cloned/behind bio, and that is the only value of
> >>> bio_segments_all() here. Or do you have an idea to avoid that?
> >>
> >> As I said, the behind bio doesn't need to have exactly the same bio_vec
> >> setting. We just allocate memory, copy the original bio's data into it,
> >> then do IO against the new memory. The behind bio's
> >> segments == (bio->bi_iter.bi_size + PAGE_SIZE - 1) >> PAGE_SHIFT
> >
> > The equation isn't always correct, especially when a bvec covers just part
> > of a page; that is quite common in the mkfs case, where one bvec often
> > holds a 512-byte buffer.
> 
> Thinking about it further, your idea could be workable and cleaner, but the
> change can be a bit big; it looks like we need to switch write-behind
> handling to the following way:
>
> 1) replace bio_clone_bioset_partial() with bio_allocate(nr_vecs), where
> 'nr_vecs' is computed with your equation;
>
> 2) allocate 'nr_vecs' pages once and share them among all the bios created
> in 1)
>
> 3) for each created bio, add each page into the bio via bio_add_page()
>
> 4) only for the 1st created bio, call bio_copy_data() to copy the data from
> the master bio.
>
> Let me know if you are OK with the above implementation.

Right, this is exactly what I'd like to do. This way we don't need to touch
the bvec, and it should be much cleaner.

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 06/13] md: raid1: don't use bio's vec table to manage resync pages
  2017-03-03  2:11         ` Ming Lei
@ 2017-03-03 17:38           ` Shaohua Li
  0 siblings, 0 replies; 39+ messages in thread
From: Shaohua Li @ 2017-03-03 17:38 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, open list:SOFTWARE RAID (Multiple Disks) SUPPORT,
	linux-block, Christoph Hellwig

On Fri, Mar 03, 2017 at 10:11:31AM +0800, Ming Lei wrote:
> On Fri, Mar 3, 2017 at 1:48 AM, Shaohua Li <shli@kernel.org> wrote:
> > On Thu, Mar 02, 2017 at 10:25:10AM +0800, Ming Lei wrote:
> >> Hi Shaohua,
> >>
> >> On Wed, Mar 1, 2017 at 7:37 AM, Shaohua Li <shli@kernel.org> wrote:
> >> > On Tue, Feb 28, 2017 at 11:41:36PM +0800, Ming Lei wrote:
> >> >> Now we allocate one page array for managing resync pages, instead
> >> >> of using the bio's vec table to do that; the old way is very hacky
> >> >> and won't work any more if multipage bvec is enabled.
> >> >>
> >> >> The introduced cost is that we need to allocate (128 + 16) * raid_disks
> >> >> bytes per r1_bio, which is fine because the number of inflight r1_bios
> >> >> for resync shouldn't be large, as pointed out by Shaohua.
> >> >>
> >> >> Also the bio_reset() in raid1_sync_request() is removed because
> >> >> all bios are freshly allocated now and don't need to be reset any more.
> >> >>
> >> >> This patch can be thought of as a cleanup too.
> >> >>
> >> >> Suggested-by: Shaohua Li <shli@kernel.org>
> >> >> Signed-off-by: Ming Lei <tom.leiming@gmail.com>
> >> >> ---
> >> >>  drivers/md/raid1.c | 83 ++++++++++++++++++++++++++++++++++--------------------
> >> >>  1 file changed, 53 insertions(+), 30 deletions(-)
> >> >>
> >> >> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> >> >> index c442b4657e2f..900144f39630 100644
> >> >> --- a/drivers/md/raid1.c
> >> >> +++ b/drivers/md/raid1.c
> >> >> @@ -77,6 +77,16 @@ static void lower_barrier(struct r1conf *conf, sector_t sector_nr);
> >> >>  #define raid1_log(md, fmt, args...)                          \
> >> >>       do { if ((md)->queue) blk_add_trace_msg((md)->queue, "raid1 " fmt, ##args); } while (0)
> >> >>
> >> >> +static inline struct resync_pages *get_resync_pages(struct bio *bio)
> >> >> +{
> >> >> +     return bio->bi_private;
> >> >> +}
> >> >> +
> >> >> +static inline struct r1bio *get_resync_r1bio(struct bio *bio)
> >> >> +{
> >> >> +     return get_resync_pages(bio)->raid_bio;
> >> >> +}
> >> >
> >> > This is a weird relationship between bio, r1bio and the resync_pages. I'd
> >> > like the pages to be
> >>
> >> It is only a bit weird inside allocating and freeing r1bio; once all
> >> are allocated, you can see everything is clean and simple:
> >>
> >>     - r1bio includes lots of bios,
> >>     - and one bio has one resync_pages attached via .bi_private
> >
> > I don't know how complex it is to let r1bio point to the pages, but that's
> > the natural way. r1bio owns the pages, the pages don't own r1bio, so we
> > should let r1bio point to the pages. The bio.bi_private still points to
> > r1bio.
> 
> Actually it is the bio which owns the pages for doing its own I/O; the only
> thing related to r10bio is that bios may share these pages, but using the
> page refcount trick will make the relation quite implicit.
>
> The only reason to allocate all resync_pages together is efficiency: it
> avoids allocating one resync_pages at a time for each bio.
>
> We have to make .bi_private point to resync_pages (per bio), otherwise we
> can't fetch pages into one bio at all: think about where to store the index
> into each bio's pre-allocated pages, and it has to be per bio.

So the reason is that we can't find the corresponding pages of the bio if
bi_private points to r1bio, right? Got it. We don't have many choices in that
case. Ok, I don't insist. Please add some comments in get_resync_r1bio() to
describe how the data structure is organized.

Thanks,
Shaohua
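
One possible shape of the requested comment, attached to the helpers from the
patch quoted above; the wording is a suggestion only, not the actual v3 change:

	/*
	 * Each resync bio owns one struct resync_pages, hung off
	 * bio->bi_private; the resync_pages bundle in turn records the r1bio
	 * it was allocated for, so both the pre-allocated pages and the
	 * owning r1bio can be reached from any resync bio.
	 */
	static inline struct resync_pages *get_resync_pages(struct bio *bio)
	{
		return bio->bi_private;
	}

	static inline struct r1bio *get_resync_r1bio(struct bio *bio)
	{
		return get_resync_pages(bio)->raid_bio;
	}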

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2017-03-03 19:14 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-28 15:41 [PATCH v2 00/14] md: cleanup on direct access to bvec table Ming Lei
2017-02-28 15:41 ` [PATCH v2 01/13] block: introduce bio_segments_all() Ming Lei
2017-02-28 15:41 ` [PATCH v2 02/13] md: raid1/raid10: don't handle failure of bio_add_page() Ming Lei
2017-02-28 15:41 ` [PATCH v2 03/13] md: move two macros into md.h Ming Lei
2017-02-28 15:41 ` [PATCH v2 04/13] md: prepare for managing resync I/O pages in clean way Ming Lei
2017-02-28 23:30   ` Shaohua Li
2017-03-02  2:09     ` Ming Lei
2017-03-02 17:55       ` Shaohua Li
2017-03-03  1:54         ` Ming Lei
2017-02-28 15:41 ` [PATCH v2 05/13] md: raid1: simplify r1buf_pool_free() Ming Lei
2017-02-28 23:31   ` Shaohua Li
2017-03-02  2:11     ` Ming Lei
2017-03-02 17:49       ` Shaohua Li
2017-03-03  1:57         ` Ming Lei
2017-02-28 15:41 ` [PATCH v2 06/13] md: raid1: don't use bio's vec table to manage resync pages Ming Lei
2017-02-28 23:37   ` Shaohua Li
2017-03-02  2:25     ` Ming Lei
2017-03-02 17:48       ` Shaohua Li
2017-03-03  2:11         ` Ming Lei
2017-03-03 17:38           ` Shaohua Li
2017-02-28 15:41 ` [PATCH v2 07/13] md: raid1: retrieve page from pre-allocated resync page array Ming Lei
2017-02-28 15:41 ` [PATCH v2 08/13] md: raid1: use bio helper in process_checks() Ming Lei
2017-02-28 23:39   ` Shaohua Li
2017-02-28 15:41 ` [PATCH v2 09/13] md: raid1: use bio_segments_all() Ming Lei
2017-02-28 23:42   ` Shaohua Li
2017-03-02  2:34     ` Ming Lei
2017-03-02  7:52       ` Shaohua Li
2017-03-03  2:20         ` Ming Lei
2017-03-03  6:22           ` Ming Lei
2017-03-03 17:12             ` Shaohua Li
2017-02-28 15:41 ` [PATCH v2 10/13] md: raid10: refactor code of read reshape's .bi_end_io Ming Lei
2017-02-28 15:41 ` [PATCH v2 11/13] md: raid10: don't use bio's vec table to manage resync pages Ming Lei
2017-02-28 23:43   ` Shaohua Li
2017-02-28 15:41 ` [PATCH v2 12/13] md: raid10: retrieve page from preallocated resync page array Ming Lei
2017-02-28 15:41 ` [PATCH v2 13/13] md: raid10: avoid direct access to bvec table in handle_reshape_read_error Ming Lei
2017-02-28 23:46   ` Shaohua Li
2017-03-02  2:37     ` Ming Lei
2017-03-02  7:47       ` Shaohua Li
2017-03-03  2:30         ` Ming Lei

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).