* [PATCH v3 00/14]  md: cleanup on direct access to bvec table
@ 2017-03-16 16:12 Ming Lei
  2017-03-16 16:12 ` [PATCH v3 01/14] md: raid1/raid10: don't handle failure of bio_add_page() Ming Lei
                   ` (13 more replies)
  0 siblings, 14 replies; 44+ messages in thread
From: Ming Lei @ 2017-03-16 16:12 UTC (permalink / raw)
  To: Shaohua Li, Jens Axboe, linux-raid, linux-block, Christoph Hellwig
  Cc: Ming Lei

In MD's resync I/O path, there are lots of direct accesses to the bio's
bvec table. This patchset kills almost all of them, and the conversion
is quite straightforward. One root cause of the direct access is that
resync I/O uses the bio's bvec table to manage its pages.
Since V1, as suggested by Shaohua, a new approach has been used to
manage these pages for resync I/O, and it turns out the code becomes
cleaner and more readable.

Once the direct access to the bvec table in MD is cleaned up, we can
move the multipage bvec work forward.

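For a quick before/after picture (condensed from the patches), the page
lookup in the resync path changes roughly like this:

	/* old: the page is stashed in the bio's bvec table */
	page = bio->bi_io_vec[bio->bi_vcnt].bv_page;

	/* new: the page comes from a per-bio 'struct resync_pages' */
	struct resync_pages *rp = get_resync_pages(bio);  /* bio->bi_private */
	page = resync_fetch_page(rp, rp->idx++);
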
V3:
	- improve page allocation & bio clone for write behind (10/14)
	- retrieve pages from the preallocated page array, and avoid using
	bio helpers (14/14)
	- changes to the helpers for managing resync I/O pages (03/14)

V2:
	- remove the patch for introducing/applying bio_remove_last_page()

V1:
	- allocate page array to manage resync pages

Thanks,
Ming

Ming Lei (14):
  md: raid1/raid10: don't handle failure of bio_add_page()
  md: move two macros into md.h
  md: prepare for managing resync I/O pages in clean way
  md: raid1: simplify r1buf_pool_free()
  md: raid1: don't use bio's vec table to manage resync pages
  md: raid1: retrieve page from pre-allocated resync page array
  md: raid1: use bio helper in process_checks()
  block: introduce bio_copy_data_partial
  md: raid1: move 'offset' out of loop
  md: raid1: improve write behind
  md: raid10: refactor code of read reshape's .bi_end_io
  md: raid10: don't use bio's vec table to manage resync pages
  md: raid10: retrieve page from preallocated resync page array
  md: raid10: avoid direct access to bvec table in
    handle_reshape_read_error

 block/bio.c         |  60 +++++++++---
 drivers/md/md.h     |  55 +++++++++++
 drivers/md/raid1.c  | 267 ++++++++++++++++++++++++++++------------------------
 drivers/md/raid1.h  |  10 +-
 drivers/md/raid10.c | 224 +++++++++++++++++++++++--------------------
 include/linux/bio.h |   2 +
 6 files changed, 376 insertions(+), 242 deletions(-)

-- 
2.9.3



* [PATCH v3 01/14] md: raid1/raid10: don't handle failure of bio_add_page()
  2017-03-16 16:12 [PATCH v3 00/14] md: cleanup on direct access to bvec table Ming Lei
@ 2017-03-16 16:12 ` Ming Lei
  2017-03-27  9:14   ` Christoph Hellwig
  2017-03-16 16:12 ` [PATCH v3 02/14] md: move two macros into md.h Ming Lei
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Ming Lei @ 2017-03-16 16:12 UTC (permalink / raw)
  To: Shaohua Li, Jens Axboe, linux-raid, linux-block, Christoph Hellwig
  Cc: Ming Lei

Every bio_add_page() here adds one page into a resync bio, which is
big enough to hold RESYNC_PAGES pages, and the current bio_add_page()
doesn't check the queue limit any more, so it won't fail at all.

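For context, a condensed sketch of why the call can't fail (illustrative,
not verbatim from the driver):

	/* resync bios are allocated with room for all RESYNC_PAGES vecs */
	bio = bio_kmalloc(gfp_flags, RESYNC_PAGES);
	...
	/*
	 * at most RESYNC_PAGES pages are ever added, so bi_vcnt can never
	 * exceed bi_max_vecs and bio_add_page() always succeeds
	 */
	bio_add_page(bio, page, len, 0);
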
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/md/raid1.c  | 21 ++++++---------------
 drivers/md/raid10.c | 41 ++++++++++-------------------------------
 2 files changed, 16 insertions(+), 46 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index fbc2d7851b49..4a0b2ad5025e 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2917,21 +2917,12 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr,
 			bio = r1_bio->bios[i];
 			if (bio->bi_end_io) {
 				page = bio->bi_io_vec[bio->bi_vcnt].bv_page;
-				if (bio_add_page(bio, page, len, 0) == 0) {
-					/* stop here */
-					bio->bi_io_vec[bio->bi_vcnt].bv_page = page;
-					while (i > 0) {
-						i--;
-						bio = r1_bio->bios[i];
-						if (bio->bi_end_io==NULL)
-							continue;
-						/* remove last page from this bio */
-						bio->bi_vcnt--;
-						bio->bi_iter.bi_size -= len;
-						bio_clear_flag(bio, BIO_SEG_VALID);
-					}
-					goto bio_full;
-				}
+
+				/*
+				 * won't fail because the vec table is big
+				 * enough to hold all these pages
+				 */
+				bio_add_page(bio, page, len, 0);
 			}
 		}
 		nr_sectors += len>>9;
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 0536658c9d40..b4a56a488668 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -3437,27 +3437,16 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 		if (len == 0)
 			break;
 		for (bio= biolist ; bio ; bio=bio->bi_next) {
-			struct bio *bio2;
 			page = bio->bi_io_vec[bio->bi_vcnt].bv_page;
-			if (bio_add_page(bio, page, len, 0))
-				continue;
-
-			/* stop here */
-			bio->bi_io_vec[bio->bi_vcnt].bv_page = page;
-			for (bio2 = biolist;
-			     bio2 && bio2 != bio;
-			     bio2 = bio2->bi_next) {
-				/* remove last page from this bio */
-				bio2->bi_vcnt--;
-				bio2->bi_iter.bi_size -= len;
-				bio_clear_flag(bio2, BIO_SEG_VALID);
-			}
-			goto bio_full;
+			/*
+			 * won't fail because the vec table is big enough
+			 * to hold all these pages
+			 */
+			bio_add_page(bio, page, len, 0);
 		}
 		nr_sectors += len>>9;
 		sector_nr += len>>9;
 	} while (biolist->bi_vcnt < RESYNC_PAGES);
- bio_full:
 	r10_bio->sectors = nr_sectors;
 
 	while (biolist) {
@@ -4530,25 +4519,15 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr,
 		if (len > PAGE_SIZE)
 			len = PAGE_SIZE;
 		for (bio = blist; bio ; bio = bio->bi_next) {
-			struct bio *bio2;
-			if (bio_add_page(bio, page, len, 0))
-				continue;
-
-			/* Didn't fit, must stop */
-			for (bio2 = blist;
-			     bio2 && bio2 != bio;
-			     bio2 = bio2->bi_next) {
-				/* Remove last page from this bio */
-				bio2->bi_vcnt--;
-				bio2->bi_iter.bi_size -= len;
-				bio_clear_flag(bio2, BIO_SEG_VALID);
-			}
-			goto bio_full;
+			/*
+			 * won't fail because the vec table is big enough
+			 * to hold all these pages
+			 */
+			bio_add_page(bio, page, len, 0);
 		}
 		sector_nr += len >> 9;
 		nr_sectors += len >> 9;
 	}
-bio_full:
 	rcu_read_unlock();
 	r10_bio->sectors = nr_sectors;
 
-- 
2.9.3


* [PATCH v3 02/14] md: move two macros into md.h
  2017-03-16 16:12 [PATCH v3 00/14] md: cleanup on direct access to bvec table Ming Lei
  2017-03-16 16:12 ` [PATCH v3 01/14] md: raid1/raid10: don't handle failure of bio_add_page() Ming Lei
@ 2017-03-16 16:12 ` Ming Lei
  2017-03-24  5:57     ` NeilBrown
  2017-03-16 16:12 ` [PATCH v3 03/14] md: prepare for managing resync I/O pages in clean way Ming Lei
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Ming Lei @ 2017-03-16 16:12 UTC (permalink / raw)
  To: Shaohua Li, Jens Axboe, linux-raid, linux-block, Christoph Hellwig
  Cc: Ming Lei

Both raid1 and raid10 share the same resync block size and page count
macros, so move them into md.h.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/md/md.h     | 5 +++++
 drivers/md/raid1.c  | 2 --
 drivers/md/raid10.c | 3 ---
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/md/md.h b/drivers/md/md.h
index b8859cbf84b6..1d63239a1be4 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -715,4 +715,9 @@ static inline void mddev_check_writesame(struct mddev *mddev, struct bio *bio)
 	    !bdev_get_queue(bio->bi_bdev)->limits.max_write_same_sectors)
 		mddev->queue->limits.max_write_same_sectors = 0;
 }
+
+/* Maximum size of each resync request */
+#define RESYNC_BLOCK_SIZE (64*1024)
+#define RESYNC_PAGES ((RESYNC_BLOCK_SIZE + PAGE_SIZE-1) / PAGE_SIZE)
+
 #endif /* _MD_MD_H */
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 4a0b2ad5025e..908e2caeb704 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -94,10 +94,8 @@ static void r1bio_pool_free(void *r1_bio, void *data)
 	kfree(r1_bio);
 }
 
-#define RESYNC_BLOCK_SIZE (64*1024)
 #define RESYNC_DEPTH 32
 #define RESYNC_SECTORS (RESYNC_BLOCK_SIZE >> 9)
-#define RESYNC_PAGES ((RESYNC_BLOCK_SIZE + PAGE_SIZE-1) / PAGE_SIZE)
 #define RESYNC_WINDOW (RESYNC_BLOCK_SIZE * RESYNC_DEPTH)
 #define RESYNC_WINDOW_SECTORS (RESYNC_WINDOW >> 9)
 #define CLUSTER_RESYNC_WINDOW (16 * RESYNC_WINDOW)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index b4a56a488668..2b40420299e3 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -125,9 +125,6 @@ static void r10bio_pool_free(void *r10_bio, void *data)
 	kfree(r10_bio);
 }
 
-/* Maximum size of each resync request */
-#define RESYNC_BLOCK_SIZE (64*1024)
-#define RESYNC_PAGES ((RESYNC_BLOCK_SIZE + PAGE_SIZE-1) / PAGE_SIZE)
 /* amount of memory to reserve for resync requests */
 #define RESYNC_WINDOW (1024*1024)
 /* maximum number of concurrent requests, memory permitting */
-- 
2.9.3


* [PATCH v3 03/14] md: prepare for managing resync I/O pages in clean way
  2017-03-16 16:12 [PATCH v3 00/14] md: cleanup on direct access to bvec table Ming Lei
  2017-03-16 16:12 ` [PATCH v3 01/14] md: raid1/raid10: don't handle failure of bio_add_page() Ming Lei
  2017-03-16 16:12 ` [PATCH v3 02/14] md: move two macros into md.h Ming Lei
@ 2017-03-16 16:12 ` Ming Lei
  2017-03-24  6:00     ` NeilBrown
  2017-03-16 16:12 ` [PATCH v3 04/14] md: raid1: simplify r1buf_pool_free() Ming Lei
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Ming Lei @ 2017-03-16 16:12 UTC (permalink / raw)
  To: Shaohua Li, Jens Axboe, linux-raid, linux-block, Christoph Hellwig
  Cc: Ming Lei

Now resync I/O uses the bio's bvec table to manage pages,
which is very hacky and may not work any more
once multipage bvec is introduced.

So introduce helpers and a new data structure for
managing resync I/O pages more cleanly.

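As an illustration, a hypothetical caller of these helpers would look
roughly like this (the helper names are from this patch, the surrounding
code is made up):

	struct resync_pages rp;
	struct page *page;

	if (resync_alloc_pages(&rp, GFP_NOIO))
		return -ENOMEM;			/* nothing was allocated */

	rp.idx = 0;
	page = resync_fetch_page(&rp, rp.idx++);	/* walk the pages */
	/* ... issue resync I/O against 'page' ... */

	resync_free_pages(&rp);			/* drop one reference per page */
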
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/md/md.h | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/drivers/md/md.h b/drivers/md/md.h
index 1d63239a1be4..20c48032493b 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -720,4 +720,54 @@ static inline void mddev_check_writesame(struct mddev *mddev, struct bio *bio)
 #define RESYNC_BLOCK_SIZE (64*1024)
 #define RESYNC_PAGES ((RESYNC_BLOCK_SIZE + PAGE_SIZE-1) / PAGE_SIZE)
 
+/* for managing resync I/O pages */
+struct resync_pages {
+	unsigned	idx;	/* for get/put page from the pool */
+	void		*raid_bio;
+	struct page	*pages[RESYNC_PAGES];
+};
+
+static inline int resync_alloc_pages(struct resync_pages *rp,
+				     gfp_t gfp_flags)
+{
+	int i;
+
+	for (i = 0; i < RESYNC_PAGES; i++) {
+		rp->pages[i] = alloc_page(gfp_flags);
+		if (!rp->pages[i])
+			goto out_free;
+	}
+
+	return 0;
+
+ out_free:
+	while (--i >= 0)
+		put_page(rp->pages[i]);
+	return -ENOMEM;
+}
+
+static inline void resync_free_pages(struct resync_pages *rp)
+{
+	int i;
+
+	for (i = 0; i < RESYNC_PAGES; i++)
+		put_page(rp->pages[i]);
+}
+
+static inline void resync_get_all_pages(struct resync_pages *rp)
+{
+	int i;
+
+	for (i = 0; i < RESYNC_PAGES; i++)
+		get_page(rp->pages[i]);
+}
+
+static inline struct page *resync_fetch_page(struct resync_pages *rp,
+					     unsigned idx)
+{
+	if (WARN_ON_ONCE(idx >= RESYNC_PAGES))
+		return NULL;
+	return rp->pages[idx];
+}
+
 #endif /* _MD_MD_H */
-- 
2.9.3



* [PATCH v3 04/14] md: raid1: simplify r1buf_pool_free()
  2017-03-16 16:12 [PATCH v3 00/14] md: cleanup on direct access to bvec table Ming Lei
                   ` (2 preceding siblings ...)
  2017-03-16 16:12 ` [PATCH v3 03/14] md: prepare for managing resync I/O pages in clean way Ming Lei
@ 2017-03-16 16:12 ` Ming Lei
  2017-03-16 16:12 ` [PATCH v3 05/14] md: raid1: don't use bio's vec table to manage resync pages Ming Lei
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 44+ messages in thread
From: Ming Lei @ 2017-03-16 16:12 UTC (permalink / raw)
  To: Shaohua Li, Jens Axboe, linux-raid, linux-block, Christoph Hellwig
  Cc: Ming Lei

This patch takes a reference on each page of each resync bio, so that
r1buf_pool_free() gets simplified a lot.

The same policy is already used in raid10's buf pool allocation/free
path.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/md/raid1.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 908e2caeb704..e30d89690109 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -142,9 +142,12 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
 	/* If not user-requests, copy the page pointers to all bios */
 	if (!test_bit(MD_RECOVERY_REQUESTED, &pi->mddev->recovery)) {
 		for (i=0; i<RESYNC_PAGES ; i++)
-			for (j=1; j<pi->raid_disks; j++)
-				r1_bio->bios[j]->bi_io_vec[i].bv_page =
+			for (j=1; j<pi->raid_disks; j++) {
+				struct page *page =
 					r1_bio->bios[0]->bi_io_vec[i].bv_page;
+				get_page(page);
+				r1_bio->bios[j]->bi_io_vec[i].bv_page = page;
+			}
 	}
 
 	r1_bio->master_bio = NULL;
@@ -169,12 +172,8 @@ static void r1buf_pool_free(void *__r1_bio, void *data)
 	struct r1bio *r1bio = __r1_bio;
 
 	for (i = 0; i < RESYNC_PAGES; i++)
-		for (j = pi->raid_disks; j-- ;) {
-			if (j == 0 ||
-			    r1bio->bios[j]->bi_io_vec[i].bv_page !=
-			    r1bio->bios[0]->bi_io_vec[i].bv_page)
-				safe_put_page(r1bio->bios[j]->bi_io_vec[i].bv_page);
-		}
+		for (j = pi->raid_disks; j-- ;)
+			safe_put_page(r1bio->bios[j]->bi_io_vec[i].bv_page);
 	for (i=0 ; i < pi->raid_disks; i++)
 		bio_put(r1bio->bios[i]);
 
-- 
2.9.3


* [PATCH v3 05/14] md: raid1: don't use bio's vec table to manage resync pages
  2017-03-16 16:12 [PATCH v3 00/14] md: cleanup on direct access to bvec table Ming Lei
                   ` (3 preceding siblings ...)
  2017-03-16 16:12 ` [PATCH v3 04/14] md: raid1: simplify r1buf_pool_free() Ming Lei
@ 2017-03-16 16:12 ` Ming Lei
  2017-07-09 23:09     ` NeilBrown
  2017-03-16 16:12 ` [PATCH v3 06/14] md: raid1: retrieve page from pre-allocated resync page array Ming Lei
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Ming Lei @ 2017-03-16 16:12 UTC (permalink / raw)
  To: Shaohua Li, Jens Axboe, linux-raid, linux-block, Christoph Hellwig
  Cc: Ming Lei

Now we allocate one page array for managing resync pages, instead of
using the bio's vec table to do that; the old way is very hacky and
won't work any more once multipage bvec is enabled.

The introduced cost is that we need to allocate (128 + 16) * raid_disks
bytes per r1_bio, which is fine because the number of inflight r1_bios
for resync shouldn't be large, as pointed out by Shaohua.

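For reference, the '(128 + 16)' figure matches sizeof(struct resync_pages)
on a 64-bit build with 4K pages (a back-of-the-envelope breakdown, not part
of the patch itself):

	RESYNC_PAGES               = 64K / 4K     = 16
	pages[RESYNC_PAGES]        = 16 * 8 bytes = 128 bytes
	idx (+ padding) + raid_bio =  4 + 4 + 8   =  16 bytes
	sizeof(struct resync_pages)               = 144 bytes per disk
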
Also the bio_reset() in raid1_sync_request() is removed because all
bios are freshly allocated now and there is no need to reset them.

This patch can be thought of as a cleanup too.

Suggested-by: Shaohua Li <shli@kernel.org>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/md/raid1.c | 94 +++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 64 insertions(+), 30 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index e30d89690109..0e64beb60e4d 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -80,6 +80,24 @@ static void lower_barrier(struct r1conf *conf, sector_t sector_nr);
 #define raid1_log(md, fmt, args...)				\
 	do { if ((md)->queue) blk_add_trace_msg((md)->queue, "raid1 " fmt, ##args); } while (0)
 
+/*
+ * 'struct resync_pages' stores actual pages used for doing the resync
+ *  IO, and it is per-bio, so make .bi_private point to it.
+ */
+static inline struct resync_pages *get_resync_pages(struct bio *bio)
+{
+	return bio->bi_private;
+}
+
+/*
+ * for resync bio, r1bio pointer can be retrieved from the per-bio
+ * 'struct resync_pages'.
+ */
+static inline struct r1bio *get_resync_r1bio(struct bio *bio)
+{
+	return get_resync_pages(bio)->raid_bio;
+}
+
 static void * r1bio_pool_alloc(gfp_t gfp_flags, void *data)
 {
 	struct pool_info *pi = data;
@@ -107,12 +125,18 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
 	struct r1bio *r1_bio;
 	struct bio *bio;
 	int need_pages;
-	int i, j;
+	int j;
+	struct resync_pages *rps;
 
 	r1_bio = r1bio_pool_alloc(gfp_flags, pi);
 	if (!r1_bio)
 		return NULL;
 
+	rps = kmalloc(sizeof(struct resync_pages) * pi->raid_disks,
+		      gfp_flags);
+	if (!rps)
+		goto out_free_r1bio;
+
 	/*
 	 * Allocate bios : 1 for reading, n-1 for writing
 	 */
@@ -132,22 +156,22 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
 		need_pages = pi->raid_disks;
 	else
 		need_pages = 1;
-	for (j = 0; j < need_pages; j++) {
+	for (j = 0; j < pi->raid_disks; j++) {
+		struct resync_pages *rp = &rps[j];
+
 		bio = r1_bio->bios[j];
-		bio->bi_vcnt = RESYNC_PAGES;
-
-		if (bio_alloc_pages(bio, gfp_flags))
-			goto out_free_pages;
-	}
-	/* If not user-requests, copy the page pointers to all bios */
-	if (!test_bit(MD_RECOVERY_REQUESTED, &pi->mddev->recovery)) {
-		for (i=0; i<RESYNC_PAGES ; i++)
-			for (j=1; j<pi->raid_disks; j++) {
-				struct page *page =
-					r1_bio->bios[0]->bi_io_vec[i].bv_page;
-				get_page(page);
-				r1_bio->bios[j]->bi_io_vec[i].bv_page = page;
-			}
+
+		if (j < need_pages) {
+			if (resync_alloc_pages(rp, gfp_flags))
+				goto out_free_pages;
+		} else {
+			memcpy(rp, &rps[0], sizeof(*rp));
+			resync_get_all_pages(rp);
+		}
+
+		rp->idx = 0;
+		rp->raid_bio = r1_bio;
+		bio->bi_private = rp;
 	}
 
 	r1_bio->master_bio = NULL;
@@ -156,11 +180,14 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
 
 out_free_pages:
 	while (--j >= 0)
-		bio_free_pages(r1_bio->bios[j]);
+		resync_free_pages(&rps[j]);
 
 out_free_bio:
 	while (++j < pi->raid_disks)
 		bio_put(r1_bio->bios[j]);
+	kfree(rps);
+
+out_free_r1bio:
 	r1bio_pool_free(r1_bio, data);
 	return NULL;
 }
@@ -168,14 +195,18 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
 static void r1buf_pool_free(void *__r1_bio, void *data)
 {
 	struct pool_info *pi = data;
-	int i,j;
+	int i;
 	struct r1bio *r1bio = __r1_bio;
+	struct resync_pages *rp = NULL;
 
-	for (i = 0; i < RESYNC_PAGES; i++)
-		for (j = pi->raid_disks; j-- ;)
-			safe_put_page(r1bio->bios[j]->bi_io_vec[i].bv_page);
-	for (i=0 ; i < pi->raid_disks; i++)
+	for (i = pi->raid_disks; i--; ) {
+		rp = get_resync_pages(r1bio->bios[i]);
+		resync_free_pages(rp);
 		bio_put(r1bio->bios[i]);
+	}
+
+	/* resync pages array stored in the 1st bio's .bi_private */
+	kfree(rp);
 
 	r1bio_pool_free(r1bio, data);
 }
@@ -1863,7 +1894,7 @@ static int raid1_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
 
 static void end_sync_read(struct bio *bio)
 {
-	struct r1bio *r1_bio = bio->bi_private;
+	struct r1bio *r1_bio = get_resync_r1bio(bio);
 
 	update_head_pos(r1_bio->read_disk, r1_bio);
 
@@ -1882,7 +1913,7 @@ static void end_sync_read(struct bio *bio)
 static void end_sync_write(struct bio *bio)
 {
 	int uptodate = !bio->bi_error;
-	struct r1bio *r1_bio = bio->bi_private;
+	struct r1bio *r1_bio = get_resync_r1bio(bio);
 	struct mddev *mddev = r1_bio->mddev;
 	struct r1conf *conf = mddev->private;
 	sector_t first_bad;
@@ -2099,6 +2130,7 @@ static void process_checks(struct r1bio *r1_bio)
 		int size;
 		int error;
 		struct bio *b = r1_bio->bios[i];
+		struct resync_pages *rp = get_resync_pages(b);
 		if (b->bi_end_io != end_sync_read)
 			continue;
 		/* fixup the bio for reuse, but preserve errno */
@@ -2111,7 +2143,8 @@ static void process_checks(struct r1bio *r1_bio)
 			conf->mirrors[i].rdev->data_offset;
 		b->bi_bdev = conf->mirrors[i].rdev->bdev;
 		b->bi_end_io = end_sync_read;
-		b->bi_private = r1_bio;
+		rp->raid_bio = r1_bio;
+		b->bi_private = rp;
 
 		size = b->bi_iter.bi_size;
 		for (j = 0; j < vcnt ; j++) {
@@ -2769,7 +2802,6 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr,
 	for (i = 0; i < conf->raid_disks * 2; i++) {
 		struct md_rdev *rdev;
 		bio = r1_bio->bios[i];
-		bio_reset(bio);
 
 		rdev = rcu_dereference(conf->mirrors[i].rdev);
 		if (rdev == NULL ||
@@ -2825,7 +2857,6 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr,
 			atomic_inc(&rdev->nr_pending);
 			bio->bi_iter.bi_sector = sector_nr + rdev->data_offset;
 			bio->bi_bdev = rdev->bdev;
-			bio->bi_private = r1_bio;
 			if (test_bit(FailFast, &rdev->flags))
 				bio->bi_opf |= MD_FAILFAST;
 		}
@@ -2911,9 +2942,12 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr,
 		}
 
 		for (i = 0 ; i < conf->raid_disks * 2; i++) {
+			struct resync_pages *rp;
+
 			bio = r1_bio->bios[i];
+			rp = get_resync_pages(bio);
 			if (bio->bi_end_io) {
-				page = bio->bi_io_vec[bio->bi_vcnt].bv_page;
+				page = resync_fetch_page(rp, rp->idx++);
 
 				/*
 				 * won't fail because the vec table is big
@@ -2925,8 +2959,8 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr,
 		nr_sectors += len>>9;
 		sector_nr += len>>9;
 		sync_blocks -= (len>>9);
-	} while (r1_bio->bios[disk]->bi_vcnt < RESYNC_PAGES);
- bio_full:
+	} while (get_resync_pages(r1_bio->bios[disk]->bi_private)->idx < RESYNC_PAGES);
+
 	r1_bio->sectors = nr_sectors;
 
 	if (mddev_is_clustered(mddev) &&
-- 
2.9.3



* [PATCH v3 06/14] md: raid1: retrieve page from pre-allocated resync page array
  2017-03-16 16:12 [PATCH v3 00/14] md: cleanup on direct access to bvec table Ming Lei
                   ` (4 preceding siblings ...)
  2017-03-16 16:12 ` [PATCH v3 05/14] md: raid1: don't use bio's vec table to manage resync pages Ming Lei
@ 2017-03-16 16:12 ` Ming Lei
  2017-03-16 16:12 ` [PATCH v3 07/14] md: raid1: use bio helper in process_checks() Ming Lei
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 44+ messages in thread
From: Ming Lei @ 2017-03-16 16:12 UTC (permalink / raw)
  To: Shaohua Li, Jens Axboe, linux-raid, linux-block, Christoph Hellwig
  Cc: Ming Lei

Now one page array is allocated for each resync bio, and we can
retrieve pages from this array directly.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/md/raid1.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 0e64beb60e4d..b1345aa4efd8 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1992,6 +1992,7 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
 	struct mddev *mddev = r1_bio->mddev;
 	struct r1conf *conf = mddev->private;
 	struct bio *bio = r1_bio->bios[r1_bio->read_disk];
+	struct page **pages = get_resync_pages(bio)->pages;
 	sector_t sect = r1_bio->sector;
 	int sectors = r1_bio->sectors;
 	int idx = 0;
@@ -2025,7 +2026,7 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
 				 */
 				rdev = conf->mirrors[d].rdev;
 				if (sync_page_io(rdev, sect, s<<9,
-						 bio->bi_io_vec[idx].bv_page,
+						 pages[idx],
 						 REQ_OP_READ, 0, false)) {
 					success = 1;
 					break;
@@ -2080,7 +2081,7 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
 				continue;
 			rdev = conf->mirrors[d].rdev;
 			if (r1_sync_page_io(rdev, sect, s,
-					    bio->bi_io_vec[idx].bv_page,
+					    pages[idx],
 					    WRITE) == 0) {
 				r1_bio->bios[d]->bi_end_io = NULL;
 				rdev_dec_pending(rdev, mddev);
@@ -2095,7 +2096,7 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
 				continue;
 			rdev = conf->mirrors[d].rdev;
 			if (r1_sync_page_io(rdev, sect, s,
-					    bio->bi_io_vec[idx].bv_page,
+					    pages[idx],
 					    READ) != 0)
 				atomic_add(s, &rdev->corrected_errors);
 		}
@@ -2171,6 +2172,8 @@ static void process_checks(struct r1bio *r1_bio)
 		struct bio *pbio = r1_bio->bios[primary];
 		struct bio *sbio = r1_bio->bios[i];
 		int error = sbio->bi_error;
+		struct page **ppages = get_resync_pages(pbio)->pages;
+		struct page **spages = get_resync_pages(sbio)->pages;
 
 		if (sbio->bi_end_io != end_sync_read)
 			continue;
@@ -2179,11 +2182,8 @@ static void process_checks(struct r1bio *r1_bio)
 
 		if (!error) {
 			for (j = vcnt; j-- ; ) {
-				struct page *p, *s;
-				p = pbio->bi_io_vec[j].bv_page;
-				s = sbio->bi_io_vec[j].bv_page;
-				if (memcmp(page_address(p),
-					   page_address(s),
+				if (memcmp(page_address(ppages[j]),
+					   page_address(spages[j]),
 					   sbio->bi_io_vec[j].bv_len))
 					break;
 			}
-- 
2.9.3



* [PATCH v3 07/14] md: raid1: use bio helper in process_checks()
  2017-03-16 16:12 [PATCH v3 00/14] md: cleanup on direct access to bvec table Ming Lei
                   ` (5 preceding siblings ...)
  2017-03-16 16:12 ` [PATCH v3 06/14] md: raid1: retrieve page from pre-allocated resync page array Ming Lei
@ 2017-03-16 16:12 ` Ming Lei
  2017-03-16 16:12 ` [PATCH v3 08/14] block: introduce bio_copy_data_partial Ming Lei
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 44+ messages in thread
From: Ming Lei @ 2017-03-16 16:12 UTC (permalink / raw)
  To: Shaohua Li, Jens Axboe, linux-raid, linux-block, Christoph Hellwig
  Cc: Ming Lei

Avoid direct access to the bvec table.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/md/raid1.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index b1345aa4efd8..4034a2963da8 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2130,6 +2130,7 @@ static void process_checks(struct r1bio *r1_bio)
 		int j;
 		int size;
 		int error;
+		struct bio_vec *bi;
 		struct bio *b = r1_bio->bios[i];
 		struct resync_pages *rp = get_resync_pages(b);
 		if (b->bi_end_io != end_sync_read)
@@ -2148,9 +2149,7 @@ static void process_checks(struct r1bio *r1_bio)
 		b->bi_private = rp;
 
 		size = b->bi_iter.bi_size;
-		for (j = 0; j < vcnt ; j++) {
-			struct bio_vec *bi;
-			bi = &b->bi_io_vec[j];
+		bio_for_each_segment_all(bi, b, j) {
 			bi->bv_offset = 0;
 			if (size > PAGE_SIZE)
 				bi->bv_len = PAGE_SIZE;
@@ -2174,17 +2173,22 @@ static void process_checks(struct r1bio *r1_bio)
 		int error = sbio->bi_error;
 		struct page **ppages = get_resync_pages(pbio)->pages;
 		struct page **spages = get_resync_pages(sbio)->pages;
+		struct bio_vec *bi;
+		int page_len[RESYNC_PAGES];
 
 		if (sbio->bi_end_io != end_sync_read)
 			continue;
 		/* Now we can 'fixup' the error value */
 		sbio->bi_error = 0;
 
+		bio_for_each_segment_all(bi, sbio, j)
+			page_len[j] = bi->bv_len;
+
 		if (!error) {
 			for (j = vcnt; j-- ; ) {
 				if (memcmp(page_address(ppages[j]),
 					   page_address(spages[j]),
-					   sbio->bi_io_vec[j].bv_len))
+					   page_len[j]))
 					break;
 			}
 		} else
-- 
2.9.3


* [PATCH v3 08/14] block: introduce bio_copy_data_partial
  2017-03-16 16:12 [PATCH v3 00/14] md: cleanup on direct access to bvec table Ming Lei
                   ` (6 preceding siblings ...)
  2017-03-16 16:12 ` [PATCH v3 07/14] md: raid1: use bio helper in process_checks() Ming Lei
@ 2017-03-16 16:12 ` Ming Lei
  2017-03-24  5:34     ` Shaohua Li
  2017-03-24 16:41     ` Jens Axboe
  2017-03-16 16:12 ` [PATCH v3 09/14] md: raid1: move 'offset' out of loop Ming Lei
                   ` (5 subsequent siblings)
  13 siblings, 2 replies; 44+ messages in thread
From: Ming Lei @ 2017-03-16 16:12 UTC (permalink / raw)
  To: Shaohua Li, Jens Axboe, linux-raid, linux-block, Christoph Hellwig
  Cc: Ming Lei

Turns out we can use bio_copy_data() in raid1's write behind, and we
can make alloc_behind_pages() cleaner and more efficient, but we need
a partial version of bio_copy_data() for that.

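As an illustration, a hypothetical caller (mirroring what the later
write-behind patch does) copies a byte range of the source bio into a
destination bio that already has its pages attached:

	/* copy 'size' bytes starting 'offset' bytes into @src */
	bio_copy_data_partial(dst, src, offset, size);

	/* the full copy keeps its old behaviour */
	bio_copy_data(dst, src); /* == bio_copy_data_partial(dst, src, 0, src->bi_iter.bi_size) */
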
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 block/bio.c         | 60 +++++++++++++++++++++++++++++++++++++++++------------
 include/linux/bio.h |  2 ++
 2 files changed, 49 insertions(+), 13 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index e75878f8b14a..1ccff0dace89 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1025,19 +1025,8 @@ int bio_alloc_pages(struct bio *bio, gfp_t gfp_mask)
 }
 EXPORT_SYMBOL(bio_alloc_pages);
 
-/**
- * bio_copy_data - copy contents of data buffers from one chain of bios to
- * another
- * @src: source bio list
- * @dst: destination bio list
- *
- * If @src and @dst are single bios, bi_next must be NULL - otherwise, treats
- * @src and @dst as linked lists of bios.
- *
- * Stops when it reaches the end of either @src or @dst - that is, copies
- * min(src->bi_size, dst->bi_size) bytes (or the equivalent for lists of bios).
- */
-void bio_copy_data(struct bio *dst, struct bio *src)
+static void __bio_copy_data(struct bio *dst, struct bio *src,
+			    int offset, int size)
 {
 	struct bvec_iter src_iter, dst_iter;
 	struct bio_vec src_bv, dst_bv;
@@ -1047,6 +1036,12 @@ void bio_copy_data(struct bio *dst, struct bio *src)
 	src_iter = src->bi_iter;
 	dst_iter = dst->bi_iter;
 
+	/* for supporting partial copy */
+	if (offset || size != src->bi_iter.bi_size) {
+		bio_advance_iter(src, &src_iter, offset);
+		src_iter.bi_size = size;
+	}
+
 	while (1) {
 		if (!src_iter.bi_size) {
 			src = src->bi_next;
@@ -1083,8 +1078,47 @@ void bio_copy_data(struct bio *dst, struct bio *src)
 		bio_advance_iter(dst, &dst_iter, bytes);
 	}
 }
+
+/**
+ * bio_copy_data - copy contents of data buffers from one chain of bios to
+ * another
+ * @src: source bio list
+ * @dst: destination bio list
+ *
+ * If @src and @dst are single bios, bi_next must be NULL - otherwise, treats
+ * @src and @dst as linked lists of bios.
+ *
+ * Stops when it reaches the end of either @src or @dst - that is, copies
+ * min(src->bi_size, dst->bi_size) bytes (or the equivalent for lists of bios).
+ */
+void bio_copy_data(struct bio *dst, struct bio *src)
+{
+	__bio_copy_data(dst, src, 0, src->bi_iter.bi_size);
+}
 EXPORT_SYMBOL(bio_copy_data);
 
+/**
+ * bio_copy_data_partial - copy partial contents of data buffers from one
+ * chain of bios to another
+ * @dst: destination bio list
+ * @src: source bio list
+ * @offset: starting copy from the offset
+ * @size: how many bytes to copy
+ *
+ * If @src and @dst are single bios, bi_next must be NULL - otherwise, treats
+ * @src and @dst as linked lists of bios.
+ *
+ * Stops at the end of the copied range of @src or the end of @dst - that
+ * is, copies min(@size, dst->bi_size) bytes (or the equivalent for lists).
+ */
+void bio_copy_data_partial(struct bio *dst, struct bio *src,
+			   int offset, int size)
+{
+	__bio_copy_data(dst, src, offset, size);
+
+}
+EXPORT_SYMBOL(bio_copy_data_partial);
+
 struct bio_map_data {
 	int is_our_pages;
 	struct iov_iter iter;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 8e521194f6fc..42b62a0288b0 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -468,6 +468,8 @@ static inline void bio_flush_dcache_pages(struct bio *bi)
 #endif
 
 extern void bio_copy_data(struct bio *dst, struct bio *src);
+extern void bio_copy_data_partial(struct bio *dst, struct bio *src,
+				  int offset, int size);
 extern int bio_alloc_pages(struct bio *bio, gfp_t gfp);
 extern void bio_free_pages(struct bio *bio);
 
-- 
2.9.3


* [PATCH v3 09/14] md: raid1: move 'offset' out of loop
  2017-03-16 16:12 [PATCH v3 00/14] md: cleanup on direct access to bvec table Ming Lei
                   ` (7 preceding siblings ...)
  2017-03-16 16:12 ` [PATCH v3 08/14] block: introduce bio_copy_data_partial Ming Lei
@ 2017-03-16 16:12 ` Ming Lei
  2017-03-16 16:12 ` [PATCH v3 10/14] md: raid1: improve write behind Ming Lei
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 44+ messages in thread
From: Ming Lei @ 2017-03-16 16:12 UTC (permalink / raw)
  To: Shaohua Li, Jens Axboe, linux-raid, linux-block, Christoph Hellwig
  Cc: Ming Lei

The 'offset' local variable never changes inside the loop, so move its
computation out of the loop.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/md/raid1.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 4034a2963da8..2f3622c695ce 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1317,6 +1317,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio)
 	int first_clone;
 	int sectors_handled;
 	int max_sectors;
+	sector_t offset;
 
 	/*
 	 * Register the new request and wait if the reconstruction
@@ -1481,13 +1482,13 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio)
 	atomic_set(&r1_bio->behind_remaining, 0);
 
 	first_clone = 1;
+
+	offset = r1_bio->sector - bio->bi_iter.bi_sector;
 	for (i = 0; i < disks; i++) {
 		struct bio *mbio = NULL;
-		sector_t offset;
 		if (!r1_bio->bios[i])
 			continue;
 
-		offset = r1_bio->sector - bio->bi_iter.bi_sector;
 
 		if (first_clone) {
 			/* do behind I/O ?
-- 
2.9.3



* [PATCH v3 10/14] md: raid1: improve write behind
  2017-03-16 16:12 [PATCH v3 00/14] md: cleanup on direct access to bvec table Ming Lei
                   ` (8 preceding siblings ...)
  2017-03-16 16:12 ` [PATCH v3 09/14] md: raid1: move 'offset' out of loop Ming Lei
@ 2017-03-16 16:12 ` Ming Lei
  2017-03-16 16:12 ` [PATCH v3 11/14] md: raid10: refactor code of read reshape's .bi_end_io Ming Lei
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 44+ messages in thread
From: Ming Lei @ 2017-03-16 16:12 UTC (permalink / raw)
  To: Shaohua Li, Jens Axboe, linux-raid, linux-block, Christoph Hellwig
  Cc: Ming Lei

This patch improves the handling of write behind in the following ways
(roughly as sketched below):

- introduce a behind master bio to hold all write-behind pages
- fast-clone bios from the behind master bio
- avoid changing the bvec table directly
- use bio_copy_data() and make the code cleaner

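In outline (condensed from the diff below), the write path becomes:

	/* allocate the behind master bio and copy the data into its pages */
	mbio = alloc_behind_master_bio(r1_bio, bio, offset << 9,
				       max_sectors << 9);

	/* each write-behind device bio is then a fast clone of it */
	mbio = bio_clone_fast(r1_bio->behind_master_bio, GFP_NOIO,
			      mddev->bio_set);

	/* and the pages are released once all the writes have completed */
	bio_free_pages(r1_bio->behind_master_bio);
	bio_put(r1_bio->behind_master_bio);
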
Suggested-by: Shaohua Li <shli@fb.com>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/md/raid1.c | 118 ++++++++++++++++++++++++-----------------------------
 drivers/md/raid1.h |  10 +++--
 2 files changed, 61 insertions(+), 67 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 2f3622c695ce..3c13286190c1 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -405,12 +405,9 @@ static void close_write(struct r1bio *r1_bio)
 {
 	/* it really is the end of this request */
 	if (test_bit(R1BIO_BehindIO, &r1_bio->state)) {
-		/* free extra copy of the data pages */
-		int i = r1_bio->behind_page_count;
-		while (i--)
-			safe_put_page(r1_bio->behind_bvecs[i].bv_page);
-		kfree(r1_bio->behind_bvecs);
-		r1_bio->behind_bvecs = NULL;
+		bio_free_pages(r1_bio->behind_master_bio);
+		bio_put(r1_bio->behind_master_bio);
+		r1_bio->behind_master_bio = NULL;
 	}
 	/* clear the bitmap if all writes complete successfully */
 	bitmap_endwrite(r1_bio->mddev->bitmap, r1_bio->sector,
@@ -512,6 +509,10 @@ static void raid1_end_write_request(struct bio *bio)
 	}
 
 	if (behind) {
+		/* we release the behind master bio when all writes are done */
+		if (r1_bio->behind_master_bio == bio)
+			to_put = NULL;
+
 		if (test_bit(WriteMostly, &rdev->flags))
 			atomic_dec(&r1_bio->behind_remaining);
 
@@ -1096,39 +1097,46 @@ static void unfreeze_array(struct r1conf *conf)
 	wake_up(&conf->wait_barrier);
 }
 
-/* duplicate the data pages for behind I/O
- */
-static void alloc_behind_pages(struct bio *bio, struct r1bio *r1_bio)
+static struct bio *alloc_behind_master_bio(struct r1bio *r1_bio,
+					   struct bio *bio,
+					   int offset, int size)
 {
-	int i;
-	struct bio_vec *bvec;
-	struct bio_vec *bvecs = kzalloc(bio->bi_vcnt * sizeof(struct bio_vec),
-					GFP_NOIO);
-	if (unlikely(!bvecs))
-		return;
+	unsigned vcnt = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	int i = 0;
+	struct bio *behind_bio = NULL;
+
+	behind_bio = bio_alloc_mddev(GFP_NOIO, vcnt, r1_bio->mddev);
+	if (!behind_bio)
+		goto fail;
+
+	while (i < vcnt && size) {
+		struct page *page;
+		int len = min_t(int, PAGE_SIZE, size);
+
+		page = alloc_page(GFP_NOIO);
+		if (unlikely(!page))
+			goto free_pages;
+
+		bio_add_page(behind_bio, page, len, 0);
+
+		size -= len;
+		i++;
+	}
 
-	bio_for_each_segment_all(bvec, bio, i) {
-		bvecs[i] = *bvec;
-		bvecs[i].bv_page = alloc_page(GFP_NOIO);
-		if (unlikely(!bvecs[i].bv_page))
-			goto do_sync_io;
-		memcpy(kmap(bvecs[i].bv_page) + bvec->bv_offset,
-		       kmap(bvec->bv_page) + bvec->bv_offset, bvec->bv_len);
-		kunmap(bvecs[i].bv_page);
-		kunmap(bvec->bv_page);
-	}
-	r1_bio->behind_bvecs = bvecs;
-	r1_bio->behind_page_count = bio->bi_vcnt;
+	bio_copy_data_partial(behind_bio, bio, offset,
+			      behind_bio->bi_iter.bi_size);
+
+	r1_bio->behind_master_bio = behind_bio;
 	set_bit(R1BIO_BehindIO, &r1_bio->state);
-	return;
 
-do_sync_io:
-	for (i = 0; i < bio->bi_vcnt; i++)
-		if (bvecs[i].bv_page)
-			put_page(bvecs[i].bv_page);
-	kfree(bvecs);
+	return behind_bio;
+
+ free_pages:
 	pr_debug("%dB behind alloc failed, doing sync I/O\n",
 		 bio->bi_iter.bi_size);
+	bio_free_pages(behind_bio);
+ fail:
+	return behind_bio;
 }
 
 struct raid1_plug_cb {
@@ -1499,11 +1507,9 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio)
 			    (atomic_read(&bitmap->behind_writes)
 			     < mddev->bitmap_info.max_write_behind) &&
 			    !waitqueue_active(&bitmap->behind_wait)) {
-				mbio = bio_clone_bioset_partial(bio, GFP_NOIO,
-								mddev->bio_set,
-								offset << 9,
-								max_sectors << 9);
-				alloc_behind_pages(mbio, r1_bio);
+				mbio = alloc_behind_master_bio(r1_bio, bio,
+							       offset << 9,
+							       max_sectors << 9);
 			}
 
 			bitmap_startwrite(bitmap, r1_bio->sector,
@@ -1514,26 +1520,17 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio)
 		}
 
 		if (!mbio) {
-			if (r1_bio->behind_bvecs)
-				mbio = bio_clone_bioset_partial(bio, GFP_NOIO,
-								mddev->bio_set,
-								offset << 9,
-								max_sectors << 9);
+			if (r1_bio->behind_master_bio)
+				mbio = bio_clone_fast(r1_bio->behind_master_bio,
+						      GFP_NOIO,
+						      mddev->bio_set);
 			else {
 				mbio = bio_clone_fast(bio, GFP_NOIO, mddev->bio_set);
 				bio_trim(mbio, offset, max_sectors);
 			}
 		}
 
-		if (r1_bio->behind_bvecs) {
-			struct bio_vec *bvec;
-			int j;
-
-			/*
-			 * We trimmed the bio, so _all is legit
-			 */
-			bio_for_each_segment_all(bvec, mbio, j)
-				bvec->bv_page = r1_bio->behind_bvecs[j].bv_page;
+		if (r1_bio->behind_master_bio) {
 			if (test_bit(WriteMostly, &conf->mirrors[i].rdev->flags))
 				atomic_inc(&r1_bio->behind_remaining);
 		}
@@ -2405,18 +2402,11 @@ static int narrow_write_error(struct r1bio *r1_bio, int i)
 		/* Write at 'sector' for 'sectors'*/
 
 		if (test_bit(R1BIO_BehindIO, &r1_bio->state)) {
-			unsigned vcnt = r1_bio->behind_page_count;
-			struct bio_vec *vec = r1_bio->behind_bvecs;
-
-			while (!vec->bv_page) {
-				vec++;
-				vcnt--;
-			}
-
-			wbio = bio_alloc_mddev(GFP_NOIO, vcnt, mddev);
-			memcpy(wbio->bi_io_vec, vec, vcnt * sizeof(struct bio_vec));
-
-			wbio->bi_vcnt = vcnt;
+			wbio = bio_clone_fast(r1_bio->behind_master_bio,
+					      GFP_NOIO,
+					      mddev->bio_set);
+			/* We really need a _all clone */
+			wbio->bi_iter = (struct bvec_iter){ 0 };
 		} else {
 			wbio = bio_clone_fast(r1_bio->master_bio, GFP_NOIO,
 					      mddev->bio_set);
diff --git a/drivers/md/raid1.h b/drivers/md/raid1.h
index dd22a37d0d83..4271cd7ac2de 100644
--- a/drivers/md/raid1.h
+++ b/drivers/md/raid1.h
@@ -153,9 +153,13 @@ struct r1bio {
 	int			read_disk;
 
 	struct list_head	retry_list;
-	/* Next two are only valid when R1BIO_BehindIO is set */
-	struct bio_vec		*behind_bvecs;
-	int			behind_page_count;
+
+	/*
+	 * When R1BIO_BehindIO is set, we store pages for write behind
+	 * in behind_master_bio.
+	 */
+	struct bio		*behind_master_bio;
+
 	/*
 	 * if the IO is in WRITE direction, then multiple bios are used.
 	 * We choose the number when they are allocated.
-- 
2.9.3



* [PATCH v3 11/14] md: raid10: refactor code of read reshape's .bi_end_io
  2017-03-16 16:12 [PATCH v3 00/14] md: cleanup on direct access to bvec table Ming Lei
                   ` (9 preceding siblings ...)
  2017-03-16 16:12 ` [PATCH v3 10/14] md: raid1: improve write behind Ming Lei
@ 2017-03-16 16:12 ` Ming Lei
  2017-03-16 16:12 ` [PATCH v3 12/14] md: raid10: don't use bio's vec table to manage resync pages Ming Lei
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 44+ messages in thread
From: Ming Lei @ 2017-03-16 16:12 UTC (permalink / raw)
  To: Shaohua Li, Jens Axboe, linux-raid, linux-block, Christoph Hellwig
  Cc: Ming Lei

The reshape read request is a bit special and requires one extra
bio which isn't allocated from r10buf_pool.

Refactor the .bi_end_io handling of the reshape read, so that we can
use raid10's resync page management approach easily in the following
patches.

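Concretely (condensed from the diff below), the run-time check in
end_sync_read() is replaced by a dedicated completion handler for the
reshape read:

	/* before: one handler, disambiguated at run time */
	if (bio == r10_bio->master_bio)
		d = r10_bio->read_slot;	/* this is a reshape read */
	else
		d = find_bio_disk(conf, r10_bio, bio, NULL, NULL);

	/* after: the reshape read gets its own .bi_end_io */
	read_bio->bi_end_io = end_reshape_read;
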
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/md/raid10.c | 28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 2b40420299e3..98e53f39b66e 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1909,17 +1909,9 @@ static int raid10_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
 	return err;
 }
 
-static void end_sync_read(struct bio *bio)
+static void __end_sync_read(struct r10bio *r10_bio, struct bio *bio, int d)
 {
-	struct r10bio *r10_bio = bio->bi_private;
 	struct r10conf *conf = r10_bio->mddev->private;
-	int d;
-
-	if (bio == r10_bio->master_bio) {
-		/* this is a reshape read */
-		d = r10_bio->read_slot; /* really the read dev */
-	} else
-		d = find_bio_disk(conf, r10_bio, bio, NULL, NULL);
 
 	if (!bio->bi_error)
 		set_bit(R10BIO_Uptodate, &r10_bio->state);
@@ -1943,6 +1935,22 @@ static void end_sync_read(struct bio *bio)
 	}
 }
 
+static void end_sync_read(struct bio *bio)
+{
+	struct r10bio *r10_bio = bio->bi_private;
+	struct r10conf *conf = r10_bio->mddev->private;
+	int d = find_bio_disk(conf, r10_bio, bio, NULL, NULL);
+
+	__end_sync_read(r10_bio, bio, d);
+}
+
+static void end_reshape_read(struct bio *bio)
+{
+	struct r10bio *r10_bio = bio->bi_private;
+
+	__end_sync_read(r10_bio, bio, r10_bio->read_slot);
+}
+
 static void end_sync_request(struct r10bio *r10_bio)
 {
 	struct mddev *mddev = r10_bio->mddev;
@@ -4466,7 +4474,7 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr,
 	read_bio->bi_iter.bi_sector = (r10_bio->devs[r10_bio->read_slot].addr
 			       + rdev->data_offset);
 	read_bio->bi_private = r10_bio;
-	read_bio->bi_end_io = end_sync_read;
+	read_bio->bi_end_io = end_reshape_read;
 	bio_set_op_attrs(read_bio, REQ_OP_READ, 0);
 	read_bio->bi_flags &= (~0UL << BIO_RESET_BITS);
 	read_bio->bi_error = 0;
-- 
2.9.3



* [PATCH v3 12/14] md: raid10: don't use bio's vec table to manage resync pages
  2017-03-16 16:12 [PATCH v3 00/14] md: cleanup on direct access to bvec table Ming Lei
                   ` (10 preceding siblings ...)
  2017-03-16 16:12 ` [PATCH v3 11/14] md: raid10: refactor code of read reshape's .bi_end_io Ming Lei
@ 2017-03-16 16:12 ` Ming Lei
  2017-03-16 16:12 ` [PATCH v3 13/14] md: raid10: retrieve page from preallocated resync page array Ming Lei
  2017-03-16 16:12 ` [PATCH v3 14/14] md: raid10: avoid direct access to bvec table in handle_reshape_read_error Ming Lei
  13 siblings, 0 replies; 44+ messages in thread
From: Ming Lei @ 2017-03-16 16:12 UTC (permalink / raw)
  To: Shaohua Li, Jens Axboe, linux-raid, linux-block, Christoph Hellwig
  Cc: Ming Lei

Now we allocate one page array for managing resync pages, instead of
using the bio's vec table to do that; the old way is very hacky and
won't work any more once multipage bvec is enabled.

The introduced cost is that we need to allocate (128 + 16) * copies
bytes per r10_bio, which is fine because the number of inflight r10_bios
for resync shouldn't be large, as pointed out by Shaohua.

Also the bio_reset() calls in raid10_sync_request() and reshape_request()
are removed because all bios are freshly allocated now in these functions
and there is no need to reset them any more.

This patch can be thought of as a cleanup too.

Suggested-by: Shaohua Li <shli@kernel.org>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/md/raid10.c | 134 ++++++++++++++++++++++++++++++++--------------------
 1 file changed, 82 insertions(+), 52 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 98e53f39b66e..d2cb68971486 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -110,6 +110,24 @@ static void end_reshape(struct r10conf *conf);
 #define raid10_log(md, fmt, args...)				\
 	do { if ((md)->queue) blk_add_trace_msg((md)->queue, "raid10 " fmt, ##args); } while (0)
 
+/*
+ * 'struct resync_pages' stores actual pages used for doing the resync
+ *  IO, and it is per-bio, so make .bi_private point to it.
+ */
+static inline struct resync_pages *get_resync_pages(struct bio *bio)
+{
+	return bio->bi_private;
+}
+
+/*
+ * for resync bio, r10bio pointer can be retrieved from the per-bio
+ * 'struct resync_pages'.
+ */
+static inline struct r10bio *get_resync_r10bio(struct bio *bio)
+{
+	return get_resync_pages(bio)->raid_bio;
+}
+
 static void * r10bio_pool_alloc(gfp_t gfp_flags, void *data)
 {
 	struct r10conf *conf = data;
@@ -140,11 +158,11 @@ static void r10bio_pool_free(void *r10_bio, void *data)
 static void * r10buf_pool_alloc(gfp_t gfp_flags, void *data)
 {
 	struct r10conf *conf = data;
-	struct page *page;
 	struct r10bio *r10_bio;
 	struct bio *bio;
-	int i, j;
-	int nalloc;
+	int j;
+	int nalloc, nalloc_rp;
+	struct resync_pages *rps;
 
 	r10_bio = r10bio_pool_alloc(gfp_flags, conf);
 	if (!r10_bio)
@@ -156,6 +174,15 @@ static void * r10buf_pool_alloc(gfp_t gfp_flags, void *data)
 	else
 		nalloc = 2; /* recovery */
 
+	/* allocate once for all bios */
+	if (!conf->have_replacement)
+		nalloc_rp = nalloc;
+	else
+		nalloc_rp = nalloc * 2;
+	rps = kmalloc(sizeof(struct resync_pages) * nalloc_rp, gfp_flags);
+	if (!rps)
+		goto out_free_r10bio;
+
 	/*
 	 * Allocate bios.
 	 */
@@ -175,36 +202,40 @@ static void * r10buf_pool_alloc(gfp_t gfp_flags, void *data)
 	 * Allocate RESYNC_PAGES data pages and attach them
 	 * where needed.
 	 */
-	for (j = 0 ; j < nalloc; j++) {
+	for (j = 0; j < nalloc; j++) {
 		struct bio *rbio = r10_bio->devs[j].repl_bio;
+		struct resync_pages *rp, *rp_repl;
+
+		rp = &rps[j];
+		if (rbio)
+			rp_repl = &rps[nalloc + j];
+
 		bio = r10_bio->devs[j].bio;
-		for (i = 0; i < RESYNC_PAGES; i++) {
-			if (j > 0 && !test_bit(MD_RECOVERY_SYNC,
-					       &conf->mddev->recovery)) {
-				/* we can share bv_page's during recovery
-				 * and reshape */
-				struct bio *rbio = r10_bio->devs[0].bio;
-				page = rbio->bi_io_vec[i].bv_page;
-				get_page(page);
-			} else
-				page = alloc_page(gfp_flags);
-			if (unlikely(!page))
+
+		if (!j || test_bit(MD_RECOVERY_SYNC,
+				   &conf->mddev->recovery)) {
+			if (resync_alloc_pages(rp, gfp_flags))
 				goto out_free_pages;
+		} else {
+			memcpy(rp, &rps[0], sizeof(*rp));
+			resync_get_all_pages(rp);
+		}
 
-			bio->bi_io_vec[i].bv_page = page;
-			if (rbio)
-				rbio->bi_io_vec[i].bv_page = page;
+		rp->idx = 0;
+		rp->raid_bio = r10_bio;
+		bio->bi_private = rp;
+		if (rbio) {
+			memcpy(rp_repl, rp, sizeof(*rp));
+			rbio->bi_private = rp_repl;
 		}
 	}
 
 	return r10_bio;
 
 out_free_pages:
-	for ( ; i > 0 ; i--)
-		safe_put_page(bio->bi_io_vec[i-1].bv_page);
-	while (j--)
-		for (i = 0; i < RESYNC_PAGES ; i++)
-			safe_put_page(r10_bio->devs[j].bio->bi_io_vec[i].bv_page);
+	while (--j >= 0)
+		resync_free_pages(&rps[j * 2]);
+
 	j = 0;
 out_free_bio:
 	for ( ; j < nalloc; j++) {
@@ -213,30 +244,34 @@ static void * r10buf_pool_alloc(gfp_t gfp_flags, void *data)
 		if (r10_bio->devs[j].repl_bio)
 			bio_put(r10_bio->devs[j].repl_bio);
 	}
+	kfree(rps);
+out_free_r10bio:
 	r10bio_pool_free(r10_bio, conf);
 	return NULL;
 }
 
 static void r10buf_pool_free(void *__r10_bio, void *data)
 {
-	int i;
 	struct r10conf *conf = data;
 	struct r10bio *r10bio = __r10_bio;
 	int j;
+	struct resync_pages *rp = NULL;
 
-	for (j=0; j < conf->copies; j++) {
+	for (j = conf->copies; j--; ) {
 		struct bio *bio = r10bio->devs[j].bio;
-		if (bio) {
-			for (i = 0; i < RESYNC_PAGES; i++) {
-				safe_put_page(bio->bi_io_vec[i].bv_page);
-				bio->bi_io_vec[i].bv_page = NULL;
-			}
-			bio_put(bio);
-		}
+
+		rp = get_resync_pages(bio);
+		resync_free_pages(rp);
+		bio_put(bio);
+
 		bio = r10bio->devs[j].repl_bio;
 		if (bio)
 			bio_put(bio);
 	}
+
+	/* resync pages array stored in the 1st bio's .bi_private */
+	kfree(rp);
+
 	r10bio_pool_free(r10bio, conf);
 }
 
@@ -1937,7 +1972,7 @@ static void __end_sync_read(struct r10bio *r10_bio, struct bio *bio, int d)
 
 static void end_sync_read(struct bio *bio)
 {
-	struct r10bio *r10_bio = bio->bi_private;
+	struct r10bio *r10_bio = get_resync_r10bio(bio);
 	struct r10conf *conf = r10_bio->mddev->private;
 	int d = find_bio_disk(conf, r10_bio, bio, NULL, NULL);
 
@@ -1946,6 +1981,7 @@ static void end_sync_read(struct bio *bio)
 
 static void end_reshape_read(struct bio *bio)
 {
+	/* reshape read bio isn't allocated from r10buf_pool */
 	struct r10bio *r10_bio = bio->bi_private;
 
 	__end_sync_read(r10_bio, bio, r10_bio->read_slot);
@@ -1980,7 +2016,7 @@ static void end_sync_request(struct r10bio *r10_bio)
 
 static void end_sync_write(struct bio *bio)
 {
-	struct r10bio *r10_bio = bio->bi_private;
+	struct r10bio *r10_bio = get_resync_r10bio(bio);
 	struct mddev *mddev = r10_bio->mddev;
 	struct r10conf *conf = mddev->private;
 	int d;
@@ -2060,6 +2096,7 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
 	for (i=0 ; i < conf->copies ; i++) {
 		int  j, d;
 		struct md_rdev *rdev;
+		struct resync_pages *rp;
 
 		tbio = r10_bio->devs[i].bio;
 
@@ -2101,11 +2138,13 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
 		 * First we need to fixup bv_offset, bv_len and
 		 * bi_vecs, as the read request might have corrupted these
 		 */
+		rp = get_resync_pages(tbio);
 		bio_reset(tbio);
 
 		tbio->bi_vcnt = vcnt;
 		tbio->bi_iter.bi_size = fbio->bi_iter.bi_size;
-		tbio->bi_private = r10_bio;
+		rp->raid_bio = r10_bio;
+		tbio->bi_private = rp;
 		tbio->bi_iter.bi_sector = r10_bio->devs[i].addr;
 		tbio->bi_end_io = end_sync_write;
 		bio_set_op_attrs(tbio, REQ_OP_WRITE, 0);
@@ -3173,10 +3212,8 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 					}
 				}
 				bio = r10_bio->devs[0].bio;
-				bio_reset(bio);
 				bio->bi_next = biolist;
 				biolist = bio;
-				bio->bi_private = r10_bio;
 				bio->bi_end_io = end_sync_read;
 				bio_set_op_attrs(bio, REQ_OP_READ, 0);
 				if (test_bit(FailFast, &rdev->flags))
@@ -3200,10 +3237,8 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 
 				if (!test_bit(In_sync, &mrdev->flags)) {
 					bio = r10_bio->devs[1].bio;
-					bio_reset(bio);
 					bio->bi_next = biolist;
 					biolist = bio;
-					bio->bi_private = r10_bio;
 					bio->bi_end_io = end_sync_write;
 					bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
 					bio->bi_iter.bi_sector = to_addr
@@ -3228,10 +3263,8 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 				if (mreplace == NULL || bio == NULL ||
 				    test_bit(Faulty, &mreplace->flags))
 					break;
-				bio_reset(bio);
 				bio->bi_next = biolist;
 				biolist = bio;
-				bio->bi_private = r10_bio;
 				bio->bi_end_io = end_sync_write;
 				bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
 				bio->bi_iter.bi_sector = to_addr +
@@ -3353,7 +3386,6 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 				r10_bio->devs[i].repl_bio->bi_end_io = NULL;
 
 			bio = r10_bio->devs[i].bio;
-			bio_reset(bio);
 			bio->bi_error = -EIO;
 			rcu_read_lock();
 			rdev = rcu_dereference(conf->mirrors[d].rdev);
@@ -3378,7 +3410,6 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 			atomic_inc(&r10_bio->remaining);
 			bio->bi_next = biolist;
 			biolist = bio;
-			bio->bi_private = r10_bio;
 			bio->bi_end_io = end_sync_read;
 			bio_set_op_attrs(bio, REQ_OP_READ, 0);
 			if (test_bit(FailFast, &conf->mirrors[d].rdev->flags))
@@ -3397,13 +3428,11 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 
 			/* Need to set up for writing to the replacement */
 			bio = r10_bio->devs[i].repl_bio;
-			bio_reset(bio);
 			bio->bi_error = -EIO;
 
 			sector = r10_bio->devs[i].addr;
 			bio->bi_next = biolist;
 			biolist = bio;
-			bio->bi_private = r10_bio;
 			bio->bi_end_io = end_sync_write;
 			bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
 			if (test_bit(FailFast, &conf->mirrors[d].rdev->flags))
@@ -3442,7 +3471,8 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 		if (len == 0)
 			break;
 		for (bio= biolist ; bio ; bio=bio->bi_next) {
-			page = bio->bi_io_vec[bio->bi_vcnt].bv_page;
+			struct resync_pages *rp = get_resync_pages(bio);
+			page = resync_fetch_page(rp, rp->idx++);
 			/*
 			 * won't fail because the vec table is big enough
 			 * to hold all these pages
@@ -3451,7 +3481,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 		}
 		nr_sectors += len>>9;
 		sector_nr += len>>9;
-	} while (biolist->bi_vcnt < RESYNC_PAGES);
+	} while (get_resync_pages(biolist)->idx < RESYNC_PAGES);
 	r10_bio->sectors = nr_sectors;
 
 	while (biolist) {
@@ -3459,7 +3489,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 		biolist = biolist->bi_next;
 
 		bio->bi_next = NULL;
-		r10_bio = bio->bi_private;
+		r10_bio = get_resync_r10bio(bio);
 		r10_bio->sectors = nr_sectors;
 
 		if (bio->bi_end_io == end_sync_read) {
@@ -4354,6 +4384,7 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr,
 	struct bio *blist;
 	struct bio *bio, *read_bio;
 	int sectors_done = 0;
+	struct page **pages;
 
 	if (sector_nr == 0) {
 		/* If restarting in the middle, skip the initial sectors */
@@ -4504,11 +4535,9 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr,
 		if (!rdev2 || test_bit(Faulty, &rdev2->flags))
 			continue;
 
-		bio_reset(b);
 		b->bi_bdev = rdev2->bdev;
 		b->bi_iter.bi_sector = r10_bio->devs[s/2].addr +
 			rdev2->new_data_offset;
-		b->bi_private = r10_bio;
 		b->bi_end_io = end_reshape_write;
 		bio_set_op_attrs(b, REQ_OP_WRITE, 0);
 		b->bi_next = blist;
@@ -4518,8 +4547,9 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr,
 	/* Now add as many pages as possible to all of these bios. */
 
 	nr_sectors = 0;
+	pages = get_resync_pages(r10_bio->devs[0].bio)->pages;
 	for (s = 0 ; s < max_sectors; s += PAGE_SIZE >> 9) {
-		struct page *page = r10_bio->devs[0].bio->bi_io_vec[s/(PAGE_SIZE>>9)].bv_page;
+		struct page *page = pages[s / (PAGE_SIZE >> 9)];
 		int len = (max_sectors - s) << 9;
 		if (len > PAGE_SIZE)
 			len = PAGE_SIZE;
@@ -4703,7 +4733,7 @@ static int handle_reshape_read_error(struct mddev *mddev,
 
 static void end_reshape_write(struct bio *bio)
 {
-	struct r10bio *r10_bio = bio->bi_private;
+	struct r10bio *r10_bio = get_resync_r10bio(bio);
 	struct mddev *mddev = r10_bio->mddev;
 	struct r10conf *conf = mddev->private;
 	int d;
-- 
2.9.3


* [PATCH v3 13/14] md: raid10: retrieve page from preallocated resync page array
  2017-03-16 16:12 [PATCH v3 00/14] md: cleanup on direct access to bvec table Ming Lei
                   ` (11 preceding siblings ...)
  2017-03-16 16:12 ` [PATCH v3 12/14] md: raid10: don't use bio's vec table to manage resync pages Ming Lei
@ 2017-03-16 16:12 ` Ming Lei
  2017-03-16 16:12 ` [PATCH v3 14/14] md: raid10: avoid direct access to bvec table in handle_reshape_read_error Ming Lei
  13 siblings, 0 replies; 44+ messages in thread
From: Ming Lei @ 2017-03-16 16:12 UTC (permalink / raw)
  To: Shaohua Li, Jens Axboe, linux-raid, linux-block, Christoph Hellwig
  Cc: Ming Lei

Now one page array is allocated for each resync bio, and we can
retrieve pages from this array directly.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/md/raid10.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index d2cb68971486..8db3b0869be4 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2075,6 +2075,7 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
 	int i, first;
 	struct bio *tbio, *fbio;
 	int vcnt;
+	struct page **tpages, **fpages;
 
 	atomic_set(&r10_bio->remaining, 1);
 
@@ -2090,6 +2091,7 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
 	fbio = r10_bio->devs[i].bio;
 	fbio->bi_iter.bi_size = r10_bio->sectors << 9;
 	fbio->bi_iter.bi_idx = 0;
+	fpages = get_resync_pages(fbio)->pages;
 
 	vcnt = (r10_bio->sectors + (PAGE_SIZE >> 9) - 1) >> (PAGE_SHIFT - 9);
 	/* now find blocks with errors */
@@ -2104,6 +2106,8 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
 			continue;
 		if (i == first)
 			continue;
+
+		tpages = get_resync_pages(tbio)->pages;
 		d = r10_bio->devs[i].devnum;
 		rdev = conf->mirrors[d].rdev;
 		if (!r10_bio->devs[i].bio->bi_error) {
@@ -2116,8 +2120,8 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
 				int len = PAGE_SIZE;
 				if (sectors < (len / 512))
 					len = sectors * 512;
-				if (memcmp(page_address(fbio->bi_io_vec[j].bv_page),
-					   page_address(tbio->bi_io_vec[j].bv_page),
+				if (memcmp(page_address(fpages[j]),
+					   page_address(tpages[j]),
 					   len))
 					break;
 				sectors -= len/512;
@@ -2215,6 +2219,7 @@ static void fix_recovery_read_error(struct r10bio *r10_bio)
 	int idx = 0;
 	int dr = r10_bio->devs[0].devnum;
 	int dw = r10_bio->devs[1].devnum;
+	struct page **pages = get_resync_pages(bio)->pages;
 
 	while (sectors) {
 		int s = sectors;
@@ -2230,7 +2235,7 @@ static void fix_recovery_read_error(struct r10bio *r10_bio)
 		ok = sync_page_io(rdev,
 				  addr,
 				  s << 9,
-				  bio->bi_io_vec[idx].bv_page,
+				  pages[idx],
 				  REQ_OP_READ, 0, false);
 		if (ok) {
 			rdev = conf->mirrors[dw].rdev;
@@ -2238,7 +2243,7 @@ static void fix_recovery_read_error(struct r10bio *r10_bio)
 			ok = sync_page_io(rdev,
 					  addr,
 					  s << 9,
-					  bio->bi_io_vec[idx].bv_page,
+					  pages[idx],
 					  REQ_OP_WRITE, 0, false);
 			if (!ok) {
 				set_bit(WriteErrorSeen, &rdev->flags);
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 44+ messages in thread
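
To make the change concrete: the block-comparison step now works purely on the
two preallocated page arrays.  The helper below is an untested sketch with the
same logic as the hunk above; the function name and return convention are made
up for illustration only.

/* Return the index of the first differing page, or -1 if all match. */
static int first_differing_page(struct page **fpages, struct page **tpages,
				int vcnt, int sectors)
{
	int j;

	for (j = 0; j < vcnt; j++) {
		int len = PAGE_SIZE;

		if (sectors < (len / 512))
			len = sectors * 512;
		if (memcmp(page_address(fpages[j]),
			   page_address(tpages[j]), len))
			return j;
		sectors -= len / 512;
	}
	return -1;
}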

* [PATCH v3 14/14] md: raid10: avoid direct access to bvec table in handle_reshape_read_error
  2017-03-16 16:12 [PATCH v3 00/14] md: cleanup on direct access to bvec table Ming Lei
                   ` (12 preceding siblings ...)
  2017-03-16 16:12 ` [PATCH v3 13/14] md: raid10: retrieve page from preallocated resync page array Ming Lei
@ 2017-03-16 16:12 ` Ming Lei
  13 siblings, 0 replies; 44+ messages in thread
From: Ming Lei @ 2017-03-16 16:12 UTC (permalink / raw)
  To: Shaohua Li, Jens Axboe, linux-raid, linux-block, Christoph Hellwig
  Cc: Ming Lei

All reshape I/O shares pages from the first copy's bio (.devs[0].bio), so just
use those pages to avoid direct access to the bvec table in
handle_reshape_read_error.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/md/raid10.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 8db3b0869be4..67cf09c0c649 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -4681,7 +4681,10 @@ static int handle_reshape_read_error(struct mddev *mddev,
 	struct r10bio *r10b = &on_stack.r10_bio;
 	int slot = 0;
 	int idx = 0;
-	struct bio_vec *bvec = r10_bio->master_bio->bi_io_vec;
+	struct page **pages;
+
+	/* reshape IOs share pages from .devs[0].bio */
+	pages = get_resync_pages(r10_bio->devs[0].bio)->pages;
 
 	r10b->sector = r10_bio->sector;
 	__raid10_find_phys(&conf->prev, r10b);
@@ -4710,7 +4713,7 @@ static int handle_reshape_read_error(struct mddev *mddev,
 			success = sync_page_io(rdev,
 					       addr,
 					       s << 9,
-					       bvec[idx].bv_page,
+					       pages[idx],
 					       REQ_OP_READ, 0, false);
 			rdev_dec_pending(rdev, mddev);
 			rcu_read_lock();
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 08/14] block: introduce bio_copy_data_partial
  2017-03-16 16:12 ` [PATCH v3 08/14] block: introduce bio_copy_data_partial Ming Lei
@ 2017-03-24  5:34     ` Shaohua Li
  2017-03-24 16:41     ` Jens Axboe
  1 sibling, 0 replies; 44+ messages in thread
From: Shaohua Li @ 2017-03-24  5:34 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, linux-raid, linux-block, Christoph Hellwig

Jens,
can you look at this patch? If it's ok, I'd like to route it through the md tree.

Thanks,
Shaohua

On Fri, Mar 17, 2017 at 12:12:29AM +0800, Ming Lei wrote:
> Turns out we can use bio_copy_data() in raid1's write behind,
> and we can make alloc_behind_pages() cleaner and more efficient,
> but we need a partial version of bio_copy_data().
> 
> Signed-off-by: Ming Lei <tom.leiming@gmail.com>
> ---
>  block/bio.c         | 60 +++++++++++++++++++++++++++++++++++++++++------------
>  include/linux/bio.h |  2 ++
>  2 files changed, 49 insertions(+), 13 deletions(-)
> 
> diff --git a/block/bio.c b/block/bio.c
> index e75878f8b14a..1ccff0dace89 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -1025,19 +1025,8 @@ int bio_alloc_pages(struct bio *bio, gfp_t gfp_mask)
>  }
>  EXPORT_SYMBOL(bio_alloc_pages);
>  
> -/**
> - * bio_copy_data - copy contents of data buffers from one chain of bios to
> - * another
> - * @src: source bio list
> - * @dst: destination bio list
> - *
> - * If @src and @dst are single bios, bi_next must be NULL - otherwise, treats
> - * @src and @dst as linked lists of bios.
> - *
> - * Stops when it reaches the end of either @src or @dst - that is, copies
> - * min(src->bi_size, dst->bi_size) bytes (or the equivalent for lists of bios).
> - */
> -void bio_copy_data(struct bio *dst, struct bio *src)
> +static void __bio_copy_data(struct bio *dst, struct bio *src,
> +			    int offset, int size)
>  {
>  	struct bvec_iter src_iter, dst_iter;
>  	struct bio_vec src_bv, dst_bv;
> @@ -1047,6 +1036,12 @@ void bio_copy_data(struct bio *dst, struct bio *src)
>  	src_iter = src->bi_iter;
>  	dst_iter = dst->bi_iter;
>  
> +	/* for supporting partial copy */
> +	if (offset || size != src->bi_iter.bi_size) {
> +		bio_advance_iter(src, &src_iter, offset);
> +		src_iter.bi_size = size;
> +	}
> +
>  	while (1) {
>  		if (!src_iter.bi_size) {
>  			src = src->bi_next;
> @@ -1083,8 +1078,47 @@ void bio_copy_data(struct bio *dst, struct bio *src)
>  		bio_advance_iter(dst, &dst_iter, bytes);
>  	}
>  }
> +
> +/**
> + * bio_copy_data - copy contents of data buffers from one chain of bios to
> + * another
> + * @src: source bio list
> + * @dst: destination bio list
> + *
> + * If @src and @dst are single bios, bi_next must be NULL - otherwise, treats
> + * @src and @dst as linked lists of bios.
> + *
> + * Stops when it reaches the end of either @src or @dst - that is, copies
> + * min(src->bi_size, dst->bi_size) bytes (or the equivalent for lists of bios).
> + */
> +void bio_copy_data(struct bio *dst, struct bio *src)
> +{
> +	__bio_copy_data(dst, src, 0, src->bi_iter.bi_size);
> +}
>  EXPORT_SYMBOL(bio_copy_data);
>  
> +/**
> + * bio_copy_data_partial - copy partial contents of data buffers from one
> + * chain of bios to another
> + * @dst: destination bio list
> + * @src: source bio list
> + * @offset: starting copy from the offset
> + * @size: how many bytes to copy
> + *
> + * If @src and @dst are single bios, bi_next must be NULL - otherwise, treats
> + * @src and @dst as linked lists of bios.
> + *
> + * Stops when it reaches the end of either @src or @dst - that is, copies
> + * min(src->bi_size, dst->bi_size) bytes (or the equivalent for lists of bios).
> + */
> +void bio_copy_data_partial(struct bio *dst, struct bio *src,
> +			   int offset, int size)
> +{
> +	__bio_copy_data(dst, src, offset, size);
> +
> +}
> +EXPORT_SYMBOL(bio_copy_data_partial);
> +
>  struct bio_map_data {
>  	int is_our_pages;
>  	struct iov_iter iter;
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index 8e521194f6fc..42b62a0288b0 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -468,6 +468,8 @@ static inline void bio_flush_dcache_pages(struct bio *bi)
>  #endif
>  
>  extern void bio_copy_data(struct bio *dst, struct bio *src);
> +extern void bio_copy_data_partial(struct bio *dst, struct bio *src,
> +				  int offset, int size);
>  extern int bio_alloc_pages(struct bio *bio, gfp_t gfp);
>  extern void bio_free_pages(struct bio *bio);
>  
> -- 
> 2.9.3
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread
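
For anyone unfamiliar with the new helper, its relationship to bio_copy_data()
boils down to the untested illustration below; the byte offsets are arbitrary
example values and the two bios are assumed to be set up by the caller.

/* Illustration only: full copy vs. windowed copy with the new helper. */
static void copy_examples(struct bio *dst, struct bio *src)
{
	/* copy the whole source, exactly as bio_copy_data() always did */
	bio_copy_data(dst, src);

	/* copy only bytes [4096, 4096 + 8192) of the source into dst */
	bio_copy_data_partial(dst, src, 4096, 8192);
}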

* Re: [PATCH v3 02/14] md: move two macros into md.h
  2017-03-16 16:12 ` [PATCH v3 02/14] md: move two macros into md.h Ming Lei
@ 2017-03-24  5:57     ` NeilBrown
  0 siblings, 0 replies; 44+ messages in thread
From: NeilBrown @ 2017-03-24  5:57 UTC (permalink / raw)
  To: Shaohua Li, Jens Axboe, linux-raid, linux-block, Christoph Hellwig
  Cc: Ming Lei

[-- Attachment #1: Type: text/plain, Size: 2750 bytes --]

On Fri, Mar 17 2017, Ming Lei wrote:

> Both raid1 and raid10 share common resync
> block size and page count, so move them into md.h.

I don't think this is necessary.
These are just "magic" numbers.  They don't have any real
meaning and so don't belong in md.h, or any .h file.

Possibly we should find more meaningful numbers, or make them auto-size
or something.  I'm also happy for them to stay as they are for now.
But I don't think we should pretend that they are meaningful.

Thanks,
NeilBrown


>
> Signed-off-by: Ming Lei <tom.leiming@gmail.com>
> ---
>  drivers/md/md.h     | 5 +++++
>  drivers/md/raid1.c  | 2 --
>  drivers/md/raid10.c | 3 ---
>  3 files changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index b8859cbf84b6..1d63239a1be4 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -715,4 +715,9 @@ static inline void mddev_check_writesame(struct mddev *mddev, struct bio *bio)
>  	    !bdev_get_queue(bio->bi_bdev)->limits.max_write_same_sectors)
>  		mddev->queue->limits.max_write_same_sectors = 0;
>  }
> +
> +/* Maximum size of each resync request */
> +#define RESYNC_BLOCK_SIZE (64*1024)
> +#define RESYNC_PAGES ((RESYNC_BLOCK_SIZE + PAGE_SIZE-1) / PAGE_SIZE)
> +
>  #endif /* _MD_MD_H */
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 4a0b2ad5025e..908e2caeb704 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -94,10 +94,8 @@ static void r1bio_pool_free(void *r1_bio, void *data)
>  	kfree(r1_bio);
>  }
>  
> -#define RESYNC_BLOCK_SIZE (64*1024)
>  #define RESYNC_DEPTH 32
>  #define RESYNC_SECTORS (RESYNC_BLOCK_SIZE >> 9)
> -#define RESYNC_PAGES ((RESYNC_BLOCK_SIZE + PAGE_SIZE-1) / PAGE_SIZE)
>  #define RESYNC_WINDOW (RESYNC_BLOCK_SIZE * RESYNC_DEPTH)
>  #define RESYNC_WINDOW_SECTORS (RESYNC_WINDOW >> 9)
>  #define CLUSTER_RESYNC_WINDOW (16 * RESYNC_WINDOW)
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index b4a56a488668..2b40420299e3 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -125,9 +125,6 @@ static void r10bio_pool_free(void *r10_bio, void *data)
>  	kfree(r10_bio);
>  }
>  
> -/* Maximum size of each resync request */
> -#define RESYNC_BLOCK_SIZE (64*1024)
> -#define RESYNC_PAGES ((RESYNC_BLOCK_SIZE + PAGE_SIZE-1) / PAGE_SIZE)
>  /* amount of memory to reserve for resync requests */
>  #define RESYNC_WINDOW (1024*1024)
>  /* maximum number of concurrent requests, memory permitting */
> -- 
> 2.9.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread
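
For concreteness, with 4 KiB pages the two macros under discussion work out as
follows; the size arithmetic is an illustrative aside, not part of the patch.

#define RESYNC_BLOCK_SIZE (64*1024)	/* 65536 bytes per resync request */
#define RESYNC_PAGES ((RESYNC_BLOCK_SIZE + PAGE_SIZE-1) / PAGE_SIZE)
					/* = 16 when PAGE_SIZE == 4096 */
/*
 * So the 'struct resync_pages' introduced in patch 03/14 carries 16 page
 * pointers (128 bytes on 64-bit) plus 'idx' and 'raid_bio' (16 bytes with
 * padding) -- the "(128 + 16) * raid_disks" cost quoted in patch 05/14.
 */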

* Re: [PATCH v3 03/14] md: prepare for managing resync I/O pages in clean way
  2017-03-16 16:12 ` [PATCH v3 03/14] md: prepare for managing resync I/O pages in clean way Ming Lei
@ 2017-03-24  6:00     ` NeilBrown
  0 siblings, 0 replies; 44+ messages in thread
From: NeilBrown @ 2017-03-24  6:00 UTC (permalink / raw)
  To: Shaohua Li, Jens Axboe, linux-raid, linux-block, Christoph Hellwig
  Cc: Ming Lei

[-- Attachment #1: Type: text/plain, Size: 2326 bytes --]

On Fri, Mar 17 2017, Ming Lei wrote:

> Now resync I/O uses bio's bvec table to manage pages;
> this way is very hacky and may not work any more
> once multipage bvec is introduced.
>
> So introduce helpers and a new data structure for
> managing resync I/O pages more cleanly.
>
> Signed-off-by: Ming Lei <tom.leiming@gmail.com>
> ---
>  drivers/md/md.h | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 50 insertions(+)

I don't think this should go in md.h

Maybe create a "raid1-10.h" or similar if you really want to.

NeilBrown

>
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index 1d63239a1be4..20c48032493b 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -720,4 +720,54 @@ static inline void mddev_check_writesame(struct mddev *mddev, struct bio *bio)
>  #define RESYNC_BLOCK_SIZE (64*1024)
>  #define RESYNC_PAGES ((RESYNC_BLOCK_SIZE + PAGE_SIZE-1) / PAGE_SIZE)
>  
> +/* for managing resync I/O pages */
> +struct resync_pages {
> +	unsigned	idx;	/* for get/put page from the pool */
> +	void		*raid_bio;
> +	struct page	*pages[RESYNC_PAGES];
> +};
> +
> +static inline int resync_alloc_pages(struct resync_pages *rp,
> +				     gfp_t gfp_flags)
> +{
> +	int i;
> +
> +	for (i = 0; i < RESYNC_PAGES; i++) {
> +		rp->pages[i] = alloc_page(gfp_flags);
> +		if (!rp->pages[i])
> +			goto out_free;
> +	}
> +
> +	return 0;
> +
> + out_free:
> +	while (--i >= 0)
> +		put_page(rp->pages[i]);
> +	return -ENOMEM;
> +}
> +
> +static inline void resync_free_pages(struct resync_pages *rp)
> +{
> +	int i;
> +
> +	for (i = 0; i < RESYNC_PAGES; i++)
> +		put_page(rp->pages[i]);
> +}
> +
> +static inline void resync_get_all_pages(struct resync_pages *rp)
> +{
> +	int i;
> +
> +	for (i = 0; i < RESYNC_PAGES; i++)
> +		get_page(rp->pages[i]);
> +}
> +
> +static inline struct page *resync_fetch_page(struct resync_pages *rp,
> +					     unsigned idx)
> +{
> +	if (WARN_ON_ONCE(idx >= RESYNC_PAGES))
> +		return NULL;
> +	return rp->pages[idx];
> +}
> +
>  #endif /* _MD_MD_H */
> -- 
> 2.9.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread
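
To show the intended calling pattern of the quoted helpers, here is an
untested sketch; alloc_one_rp()/free_one_rp() are made-up wrappers standing in
for the real r1buf/r10buf pool callbacks.

/* Illustration: allocate one resync page set, then release it. */
static struct resync_pages *alloc_one_rp(void *raid_bio, gfp_t gfp)
{
	struct resync_pages *rp = kmalloc(sizeof(*rp), gfp);

	if (!rp)
		return NULL;
	if (resync_alloc_pages(rp, gfp)) {
		kfree(rp);
		return NULL;
	}
	rp->idx = 0;
	rp->raid_bio = raid_bio;	/* lets the end_io path find the r1bio/r10bio */
	return rp;
}

static void free_one_rp(struct resync_pages *rp)
{
	resync_free_pages(rp);		/* drops one reference per page */
	kfree(rp);
}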

* Re: [PATCH v3 02/14] md: move two macros into md.h
  2017-03-24  5:57     ` NeilBrown
  (?)
@ 2017-03-24  6:30     ` Ming Lei
  -1 siblings, 0 replies; 44+ messages in thread
From: Ming Lei @ 2017-03-24  6:30 UTC (permalink / raw)
  To: NeilBrown
  Cc: Shaohua Li, Jens Axboe,
	open list:SOFTWARE RAID (Multiple Disks) SUPPORT, linux-block,
	Christoph Hellwig

On Fri, Mar 24, 2017 at 1:57 PM, NeilBrown <neilb@suse.com> wrote:
> On Fri, Mar 17 2017, Ming Lei wrote:
>
>> Both raid1 and raid10 share common resync
>> block size and page count, so move them into md.h.
>
> I don't think this is necessary.
> These are just "magic" numbers.  They don't have any real
> meaning and so don't belong in md.h, or any .h file.

The thing is that RESYNC_PAGES is needed in the following patch 3:

     [PATCH v3 03/14] md: prepare for managing resync I/O pages in clean way

so how about moving the macros into raid1-10.h, since you suggested
creating that header to hold the helpers introduced in patch 3?

Thanks,
Ming

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 08/14] block: introduce bio_copy_data_partial
  2017-03-16 16:12 ` [PATCH v3 08/14] block: introduce bio_copy_data_partial Ming Lei
@ 2017-03-24 16:41     ` Jens Axboe
  2017-03-24 16:41     ` Jens Axboe
  1 sibling, 0 replies; 44+ messages in thread
From: Jens Axboe @ 2017-03-24 16:41 UTC (permalink / raw)
  To: Ming Lei, Shaohua Li, linux-raid, linux-block, Christoph Hellwig

On 03/16/2017 10:12 AM, Ming Lei wrote:
> Turns out we can use bio_copy_data() in raid1's write behind,
> and we can make alloc_behind_pages() cleaner and more efficient,
> but we need a partial version of bio_copy_data().
> 
> Signed-off-by: Ming Lei <tom.leiming@gmail.com>

Reviewed-by: Jens Axboe <axboe@fb.com>

Shaohua, feel free to pull this through the md tree; that will be much
easier.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 02/14] md: move two macros into md.h
  2017-03-24  5:57     ` NeilBrown
  (?)
  (?)
@ 2017-03-24 16:53     ` Shaohua Li
  2017-03-27  9:15       ` Christoph Hellwig
  -1 siblings, 1 reply; 44+ messages in thread
From: Shaohua Li @ 2017-03-24 16:53 UTC (permalink / raw)
  To: NeilBrown
  Cc: Ming Lei, Jens Axboe, linux-raid, linux-block, Christoph Hellwig

On Fri, Mar 24, 2017 at 04:57:37PM +1100, Neil Brown wrote:
> On Fri, Mar 17 2017, Ming Lei wrote:
> 
> > Both raid1 and raid10 share common resync
> > block size and page count, so move them into md.h.
> 
> I don't think this is necessary.
> These are just "magic" numbers.  They don't have any real
> meaning and so don't belong in md.h, or any .h file.
> 
> Possibly we should find more meaningful numbers, or make them auto-size
> or something.  I'm also happy for them to stay as they are for now.
> But I don't think we should pretend that they are meaningful.

I had the same concern when I first looked at this patch. The numbers for
raid1/10 don't need to be the same. But if we don't move them to a
generic header, the third patch will become a little more complicated, so I
eventually ignored this issue. If we really need different numbers for raid1/10,
let's do it at that time.

I think your suggestion of moving the numbers to raid1-10.h makes sense,
and we can add a comment noting the numbers aren't required to be the same
for raid1/10.

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 44+ messages in thread
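
A minimal raid1-10.h along the lines being discussed might look like the
sketch below; this is hypothetical, since the series as posted still keeps the
macros in md.h.

/* drivers/md/raid1-10.h -- hypothetical sketch, per the review discussion */
#ifndef _RAID1_10_H
#define _RAID1_10_H

/*
 * Maximum size of each resync request.  raid1 and raid10 happen to use the
 * same values today, but nothing requires them to stay equal.
 */
#define RESYNC_BLOCK_SIZE (64*1024)
#define RESYNC_PAGES ((RESYNC_BLOCK_SIZE + PAGE_SIZE-1) / PAGE_SIZE)

#endif /* _RAID1_10_H */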

* Re: [PATCH v3 01/14] md: raid1/raid10: don't handle failure of bio_add_page()
  2017-03-16 16:12 ` [PATCH v3 01/14] md: raid1/raid10: don't handle failure of bio_add_page() Ming Lei
@ 2017-03-27  9:14   ` Christoph Hellwig
  0 siblings, 0 replies; 44+ messages in thread
From: Christoph Hellwig @ 2017-03-27  9:14 UTC (permalink / raw)
  To: Ming Lei
  Cc: Shaohua Li, Jens Axboe, linux-raid, linux-block, Christoph Hellwig

Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 02/14] md: move two macros into md.h
  2017-03-24 16:53     ` Shaohua Li
@ 2017-03-27  9:15       ` Christoph Hellwig
  2017-03-27  9:52           ` NeilBrown
  0 siblings, 1 reply; 44+ messages in thread
From: Christoph Hellwig @ 2017-03-27  9:15 UTC (permalink / raw)
  To: Shaohua Li
  Cc: NeilBrown, Ming Lei, Jens Axboe, linux-raid, linux-block,
	Christoph Hellwig

On Fri, Mar 24, 2017 at 09:53:25AM -0700, Shaohua Li wrote:
> 
> I had the same concern when I first looked at this patch. The numbers for
> raid1/10 don't need to be the same. But if we don't move them to a
> generic header, the third patch will become a little more complicated, so I
> eventually ignored this issue. If we really need different numbers for raid1/10,
> let's do it at that time.

Which brings up my usual question:  is it really that beneficial for
us to keep the raid1.c code around instead of making it a special
shortcut case in raid10.c?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 02/14] md: move two macros into md.h
  2017-03-27  9:15       ` Christoph Hellwig
@ 2017-03-27  9:52           ` NeilBrown
  0 siblings, 0 replies; 44+ messages in thread
From: NeilBrown @ 2017-03-27  9:52 UTC (permalink / raw)
  To: Shaohua Li
  Cc: Ming Lei, Jens Axboe, linux-raid, linux-block, Christoph Hellwig

[-- Attachment #1: Type: text/plain, Size: 900 bytes --]

On Mon, Mar 27 2017, Christoph Hellwig wrote:

> On Fri, Mar 24, 2017 at 09:53:25AM -0700, Shaohua Li wrote:
>> 
>> I had the same concern when I first looked at this patch. The numbers for
>> raid1/10 don't need to be the same. But if we don't move them to a
>> generic header, the third patch will become a little more complicated, so I
>> eventually ignored this issue. If we really need different numbers for raid1/10,
>> let's do it at that time.
>
> Which brings up my usual question:  is it really that beneficial for
> us to keep the raid1.c code around instead of making it a special
> shortcut case in raid10.c?

Patches welcome.

They would need to handle write-mostly and write-behind.  They would
also need to avoid the assumption of a chunk size for RAID1.
Undoubtedly do-able.  Hard to say how beneficial it would be, or how much
it would cost.

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 05/14] md: raid1: don't use bio's vec table to manage resync pages
  2017-03-16 16:12 ` [PATCH v3 05/14] md: raid1: don't use bio's vec table to manage resync pages Ming Lei
@ 2017-07-09 23:09     ` NeilBrown
  0 siblings, 0 replies; 44+ messages in thread
From: NeilBrown @ 2017-07-09 23:09 UTC (permalink / raw)
  To: Shaohua Li, Jens Axboe, linux-raid, linux-block, Christoph Hellwig
  Cc: Ming Lei

[-- Attachment #1: Type: text/plain, Size: 3988 bytes --]

On Fri, Mar 17 2017, Ming Lei wrote:

> Now we allocate one page array for managing resync pages, instead
> of using bio's vec table to do that; the old way is very hacky
> and won't work any more if multipage bvec is enabled.
>
> The introduced cost is that we need to allocate (128 + 16) * raid_disks
> bytes per r1_bio, and that is fine because the number of in-flight r1_bios
> for resync shouldn't be large, as pointed out by Shaohua.
>
> Also the bio_reset() in raid1_sync_request() is removed because
> all bios are freshly allocated now and no longer need to be reset.
>
> This patch can be thought of as a cleanup too.
>
> Suggested-by: Shaohua Li <shli@kernel.org>
> Signed-off-by: Ming Lei <tom.leiming@gmail.com>
> ---
>  drivers/md/raid1.c | 94 +++++++++++++++++++++++++++++++++++++-----------------
>  1 file changed, 64 insertions(+), 30 deletions(-)
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index e30d89690109..0e64beb60e4d 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -80,6 +80,24 @@ static void lower_barrier(struct r1conf *conf, sector_t sector_nr);
>  #define raid1_log(md, fmt, args...)				\
>  	do { if ((md)->queue) blk_add_trace_msg((md)->queue, "raid1 " fmt, ##args); } while (0)
>  
> +/*
> + * 'struct resync_pages' stores actual pages used for doing the resync
> + *  IO, and it is per-bio, so make .bi_private point to it.
> + */
> +static inline struct resync_pages *get_resync_pages(struct bio *bio)
> +{
> +	return bio->bi_private;
> +}
> +
> +/*
> + * for resync bio, r1bio pointer can be retrieved from the per-bio
> + * 'struct resync_pages'.
> + */
> +static inline struct r1bio *get_resync_r1bio(struct bio *bio)
> +{
> +	return get_resync_pages(bio)->raid_bio;
> +}
> +
>  static void * r1bio_pool_alloc(gfp_t gfp_flags, void *data)
>  {
>  	struct pool_info *pi = data;
> @@ -107,12 +125,18 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
>  	struct r1bio *r1_bio;
>  	struct bio *bio;
>  	int need_pages;
> -	int i, j;
> +	int j;
> +	struct resync_pages *rps;
>  
>  	r1_bio = r1bio_pool_alloc(gfp_flags, pi);
>  	if (!r1_bio)
>  		return NULL;
>  
> +	rps = kmalloc(sizeof(struct resync_pages) * pi->raid_disks,
> +		      gfp_flags);
> +	if (!rps)
> +		goto out_free_r1bio;
> +
>  	/*
>  	 * Allocate bios : 1 for reading, n-1 for writing
>  	 */
> @@ -132,22 +156,22 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
>  		need_pages = pi->raid_disks;
>  	else
>  		need_pages = 1;
> -	for (j = 0; j < need_pages; j++) {
> +	for (j = 0; j < pi->raid_disks; j++) {
> +		struct resync_pages *rp = &rps[j];
> +
>  		bio = r1_bio->bios[j];
> -		bio->bi_vcnt = RESYNC_PAGES;
> -
> -		if (bio_alloc_pages(bio, gfp_flags))
> -			goto out_free_pages;
> -	}
> -	/* If not user-requests, copy the page pointers to all bios */
> -	if (!test_bit(MD_RECOVERY_REQUESTED, &pi->mddev->recovery)) {
> -		for (i=0; i<RESYNC_PAGES ; i++)
> -			for (j=1; j<pi->raid_disks; j++) {
> -				struct page *page =
> -					r1_bio->bios[0]->bi_io_vec[i].bv_page;
> -				get_page(page);
> -				r1_bio->bios[j]->bi_io_vec[i].bv_page = page;
> -			}
> +
> +		if (j < need_pages) {
> +			if (resync_alloc_pages(rp, gfp_flags))
> +				goto out_free_pages;
> +		} else {
> +			memcpy(rp, &rps[0], sizeof(*rp));
> +			resync_get_all_pages(rp);
> +		}
> +
> +		rp->idx = 0;

This is the only place the ->idx is initialized, in r1buf_pool_alloc().
The mempool alloc function is supposed to allocate memory, not initialize
it.

If the mempool_alloc() call cannot allocate memory it will use memory
from the pool.  If this memory has already been used, then it will no
longer have the initialized value.

In short: you need to initialise memory *after* calling
mempool_alloc(), unless you ensure it is reset to the init values before
calling mempool_free().

https://bugzilla.kernel.org/show_bug.cgi?id=196307

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread
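
The hazard described above can be compressed into the untested sketch below;
r1buf_pool_alloc() only runs when mempool_alloc() gets a fresh allocation, so
a recycled element arrives with whatever ->idx value it had when it was freed.

/* Illustration of the failure mode, not real code. */
static void resync_alloc_example(struct r1conf *conf)
{
	struct r1bio *r1_bio = mempool_alloc(conf->r1buf_pool, GFP_NOIO);
	struct resync_pages *rp = get_resync_pages(r1_bio->bios[0]);

	/*
	 * Fresh element: r1buf_pool_alloc() ran and set rp->idx = 0.
	 * Recycled element (taken from the pool's reserve under memory
	 * pressure): no alloc callback runs, so rp->idx is stale.
	 */
	WARN_ON_ONCE(rp->idx != 0);	/* can trip for a recycled buf */
}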

* Re: [PATCH v3 05/14] md: raid1: don't use bio's vec table to manage resync pages
  2017-07-09 23:09     ` NeilBrown
  (?)
@ 2017-07-10  3:35     ` Ming Lei
  2017-07-10  4:13       ` Ming Lei
  -1 siblings, 1 reply; 44+ messages in thread
From: Ming Lei @ 2017-07-10  3:35 UTC (permalink / raw)
  To: NeilBrown
  Cc: Shaohua Li, Jens Axboe,
	open list:SOFTWARE RAID (Multiple Disks) SUPPORT, linux-block,
	Christoph Hellwig

On Mon, Jul 10, 2017 at 7:09 AM, NeilBrown <neilb@suse.com> wrote:
> On Fri, Mar 17 2017, Ming Lei wrote:
>
>> Now we allocate one page array for managing resync pages, instead
>> of using bio's vec table to do that; the old way is very hacky
>> and won't work any more if multipage bvec is enabled.
>>
>> The introduced cost is that we need to allocate (128 + 16) * raid_disks
>> bytes per r1_bio, and that is fine because the number of in-flight r1_bios
>> for resync shouldn't be large, as pointed out by Shaohua.
>>
>> Also the bio_reset() in raid1_sync_request() is removed because
>> all bios are freshly allocated now and no longer need to be reset.
>>
>> This patch can be thought of as a cleanup too.
>>
>> Suggested-by: Shaohua Li <shli@kernel.org>
>> Signed-off-by: Ming Lei <tom.leiming@gmail.com>
>> ---
>>  drivers/md/raid1.c | 94 +++++++++++++++++++++++++++++++++++++-----------------
>>  1 file changed, 64 insertions(+), 30 deletions(-)
>>
>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>> index e30d89690109..0e64beb60e4d 100644
>> --- a/drivers/md/raid1.c
>> +++ b/drivers/md/raid1.c
>> @@ -80,6 +80,24 @@ static void lower_barrier(struct r1conf *conf, sector_t sector_nr);
>>  #define raid1_log(md, fmt, args...)                          \
>>       do { if ((md)->queue) blk_add_trace_msg((md)->queue, "raid1 " fmt, ##args); } while (0)
>>
>> +/*
>> + * 'struct resync_pages' stores actual pages used for doing the resync
>> + *  IO, and it is per-bio, so make .bi_private point to it.
>> + */
>> +static inline struct resync_pages *get_resync_pages(struct bio *bio)
>> +{
>> +     return bio->bi_private;
>> +}
>> +
>> +/*
>> + * for resync bio, r1bio pointer can be retrieved from the per-bio
>> + * 'struct resync_pages'.
>> + */
>> +static inline struct r1bio *get_resync_r1bio(struct bio *bio)
>> +{
>> +     return get_resync_pages(bio)->raid_bio;
>> +}
>> +
>>  static void * r1bio_pool_alloc(gfp_t gfp_flags, void *data)
>>  {
>>       struct pool_info *pi = data;
>> @@ -107,12 +125,18 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
>>       struct r1bio *r1_bio;
>>       struct bio *bio;
>>       int need_pages;
>> -     int i, j;
>> +     int j;
>> +     struct resync_pages *rps;
>>
>>       r1_bio = r1bio_pool_alloc(gfp_flags, pi);
>>       if (!r1_bio)
>>               return NULL;
>>
>> +     rps = kmalloc(sizeof(struct resync_pages) * pi->raid_disks,
>> +                   gfp_flags);
>> +     if (!rps)
>> +             goto out_free_r1bio;
>> +
>>       /*
>>        * Allocate bios : 1 for reading, n-1 for writing
>>        */
>> @@ -132,22 +156,22 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
>>               need_pages = pi->raid_disks;
>>       else
>>               need_pages = 1;
>> -     for (j = 0; j < need_pages; j++) {
>> +     for (j = 0; j < pi->raid_disks; j++) {
>> +             struct resync_pages *rp = &rps[j];
>> +
>>               bio = r1_bio->bios[j];
>> -             bio->bi_vcnt = RESYNC_PAGES;
>> -
>> -             if (bio_alloc_pages(bio, gfp_flags))
>> -                     goto out_free_pages;
>> -     }
>> -     /* If not user-requests, copy the page pointers to all bios */
>> -     if (!test_bit(MD_RECOVERY_REQUESTED, &pi->mddev->recovery)) {
>> -             for (i=0; i<RESYNC_PAGES ; i++)
>> -                     for (j=1; j<pi->raid_disks; j++) {
>> -                             struct page *page =
>> -                                     r1_bio->bios[0]->bi_io_vec[i].bv_page;
>> -                             get_page(page);
>> -                             r1_bio->bios[j]->bi_io_vec[i].bv_page = page;
>> -                     }
>> +
>> +             if (j < need_pages) {
>> +                     if (resync_alloc_pages(rp, gfp_flags))
>> +                             goto out_free_pages;
>> +             } else {
>> +                     memcpy(rp, &rps[0], sizeof(*rp));
>> +                     resync_get_all_pages(rp);
>> +             }
>> +
>> +             rp->idx = 0;
>
> This is the only place the ->idx is initialized, in r1buf_pool_alloc().
> The mempool alloc function is supposed to allocate memory, not initialize
> it.
>
> If the mempool_alloc() call cannot allocate memory it will use memory
> from the pool.  If this memory has already been used, then it will no
> longer have the initialized value.
>
> In short: you need to initialise memory *after* calling
> mempool_alloc(), unless you ensure it is reset to the init values before
> calling mempool_free().
>
> https://bugzilla.kernel.org/show_bug.cgi?id=196307

OK, thanks for posting it.

Another fix might be to reinitialize the variable (rp->idx = 0) in
r1buf_pool_free(), or just set it to zero every time it is used.

But I don't understand why mempool_free() calls pool->free() at the end of
this function, which may cause pool->free() to run on a newly allocated buf;
seems like a bug in mempool?


--
Ming Lei

^ permalink raw reply	[flat|nested] 44+ messages in thread
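
For reference, the control flow being puzzled over is roughly the following; a
condensed, untested paraphrase of mm/mempool.c from that era (locking and the
NULL check are omitted, and details differ between kernel versions).

/* Condensed sketch of mempool_free(): the early return is the key. */
void mempool_free(void *element, mempool_t *pool)
{
	if (pool->curr_nr < pool->min_nr) {
		/* reserve is depleted: park the element back in the pool */
		add_element(pool, element);
		return;			/* pool->free() is never reached */
	}
	/* reserve is full: really free it via the pool's free callback */
	pool->free(element, pool->pool_data);
}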

* Re: [PATCH v3 05/14] md: raid1: don't use bio's vec table to manage resync pages
  2017-07-10  3:35     ` Ming Lei
@ 2017-07-10  4:13       ` Ming Lei
  2017-07-10  4:38           ` NeilBrown
  0 siblings, 1 reply; 44+ messages in thread
From: Ming Lei @ 2017-07-10  4:13 UTC (permalink / raw)
  To: Ming Lei
  Cc: NeilBrown, Shaohua Li, Jens Axboe,
	open list:SOFTWARE RAID (Multiple Disks) SUPPORT, linux-block,
	Christoph Hellwig

On Mon, Jul 10, 2017 at 11:35:12AM +0800, Ming Lei wrote:
> On Mon, Jul 10, 2017 at 7:09 AM, NeilBrown <neilb@suse.com> wrote:
> > On Fri, Mar 17 2017, Ming Lei wrote:
> >
> >> Now we allocate one page array for managing resync pages, instead
> >> of using bio's vec table to do that; the old way is very hacky
> >> and won't work any more if multipage bvec is enabled.
> >>
> >> The introduced cost is that we need to allocate (128 + 16) * raid_disks
> >> bytes per r1_bio, and that is fine because the number of in-flight r1_bios
> >> for resync shouldn't be large, as pointed out by Shaohua.
> >>
> >> Also the bio_reset() in raid1_sync_request() is removed because
> >> all bios are freshly allocated now and no longer need to be reset.
> >>
> >> This patch can be thought of as a cleanup too.
> >>
> >> Suggested-by: Shaohua Li <shli@kernel.org>
> >> Signed-off-by: Ming Lei <tom.leiming@gmail.com>
> >> ---
> >>  drivers/md/raid1.c | 94 +++++++++++++++++++++++++++++++++++++-----------------
> >>  1 file changed, 64 insertions(+), 30 deletions(-)
> >>
> >> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> >> index e30d89690109..0e64beb60e4d 100644
> >> --- a/drivers/md/raid1.c
> >> +++ b/drivers/md/raid1.c
> >> @@ -80,6 +80,24 @@ static void lower_barrier(struct r1conf *conf, sector_t sector_nr);
> >>  #define raid1_log(md, fmt, args...)                          \
> >>       do { if ((md)->queue) blk_add_trace_msg((md)->queue, "raid1 " fmt, ##args); } while (0)
> >>
> >> +/*
> >> + * 'struct resync_pages' stores actual pages used for doing the resync
> >> + *  IO, and it is per-bio, so make .bi_private point to it.
> >> + */
> >> +static inline struct resync_pages *get_resync_pages(struct bio *bio)
> >> +{
> >> +     return bio->bi_private;
> >> +}
> >> +
> >> +/*
> >> + * for resync bio, r1bio pointer can be retrieved from the per-bio
> >> + * 'struct resync_pages'.
> >> + */
> >> +static inline struct r1bio *get_resync_r1bio(struct bio *bio)
> >> +{
> >> +     return get_resync_pages(bio)->raid_bio;
> >> +}
> >> +
> >>  static void * r1bio_pool_alloc(gfp_t gfp_flags, void *data)
> >>  {
> >>       struct pool_info *pi = data;
> >> @@ -107,12 +125,18 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
> >>       struct r1bio *r1_bio;
> >>       struct bio *bio;
> >>       int need_pages;
> >> -     int i, j;
> >> +     int j;
> >> +     struct resync_pages *rps;
> >>
> >>       r1_bio = r1bio_pool_alloc(gfp_flags, pi);
> >>       if (!r1_bio)
> >>               return NULL;
> >>
> >> +     rps = kmalloc(sizeof(struct resync_pages) * pi->raid_disks,
> >> +                   gfp_flags);
> >> +     if (!rps)
> >> +             goto out_free_r1bio;
> >> +
> >>       /*
> >>        * Allocate bios : 1 for reading, n-1 for writing
> >>        */
> >> @@ -132,22 +156,22 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
> >>               need_pages = pi->raid_disks;
> >>       else
> >>               need_pages = 1;
> >> -     for (j = 0; j < need_pages; j++) {
> >> +     for (j = 0; j < pi->raid_disks; j++) {
> >> +             struct resync_pages *rp = &rps[j];
> >> +
> >>               bio = r1_bio->bios[j];
> >> -             bio->bi_vcnt = RESYNC_PAGES;
> >> -
> >> -             if (bio_alloc_pages(bio, gfp_flags))
> >> -                     goto out_free_pages;
> >> -     }
> >> -     /* If not user-requests, copy the page pointers to all bios */
> >> -     if (!test_bit(MD_RECOVERY_REQUESTED, &pi->mddev->recovery)) {
> >> -             for (i=0; i<RESYNC_PAGES ; i++)
> >> -                     for (j=1; j<pi->raid_disks; j++) {
> >> -                             struct page *page =
> >> -                                     r1_bio->bios[0]->bi_io_vec[i].bv_page;
> >> -                             get_page(page);
> >> -                             r1_bio->bios[j]->bi_io_vec[i].bv_page = page;
> >> -                     }
> >> +
> >> +             if (j < need_pages) {
> >> +                     if (resync_alloc_pages(rp, gfp_flags))
> >> +                             goto out_free_pages;
> >> +             } else {
> >> +                     memcpy(rp, &rps[0], sizeof(*rp));
> >> +                     resync_get_all_pages(rp);
> >> +             }
> >> +
> >> +             rp->idx = 0;
> >
> > This is the only place the ->idx is initialized, in r1buf_pool_alloc().
> > The mempool alloc function is supposed to allocate memory, not initialize
> > it.
> >
> > If the mempool_alloc() call cannot allocate memory it will use memory
> > from the pool.  If this memory has already been used, then it will no
> > longer have the initialized value.
> >
> > In short: you need to initialise memory *after* calling
> > mempool_alloc(), unless you ensure it is reset to the init values before
> > calling mempool_free().
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=196307
> 
> OK, thanks for posting it.
>
> Another fix might be to reinitialize the variable (rp->idx = 0) in
> r1buf_pool_free(), or just set it to zero every time it is used.
>
> But I don't understand why mempool_free() calls pool->free() at the end of
> this function, which may cause pool->free() to run on a newly allocated buf;
> seems like a bug in mempool?

Looks like I missed the 'return' in mempool_free(), so it is fine.

How about the following fix?

---
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index e1a7e3d4c5e4..d31b06da3e3d 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -242,6 +242,7 @@ static void put_buf(struct r1bio *r1_bio)
 		struct bio *bio = r1_bio->bios[i];
 		if (bio->bi_end_io)
 			rdev_dec_pending(conf->mirrors[i].rdev, r1_bio->mddev);
+		get_resync_pages(bio)->idx = 0;
 	}
 
 	mempool_free(r1_bio, conf->r1buf_pool);
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 797ed60abd5e..c61523768745 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -299,12 +299,21 @@ static void free_r10bio(struct r10bio *r10_bio)
 	mempool_free(r10_bio, conf->r10bio_pool);
 }
 
-static void put_buf(struct r10bio *r10_bio)
+static void free_r10bio_buf(struct r10bio *r10_bio, struct r10conf *conf)
 {
-	struct r10conf *conf = r10_bio->mddev->private;
+	int j;
+
+	for (j = conf->copies; j--; )
+		get_resync_pages(r10_bio->devs[j].bio)->idx = 0;
 
 	mempool_free(r10_bio, conf->r10buf_pool);
+}
+
+static void put_buf(struct r10bio *r10_bio)
+{
+	struct r10conf *conf = r10_bio->mddev->private;
 
+	free_r10bio_buf(r10_bio, conf);
 	lower_barrier(conf);
 }
 
@@ -4383,7 +4392,7 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr,
 		 * on all the target devices.
 		 */
 		// FIXME
-		mempool_free(r10_bio, conf->r10buf_pool);
+		free_r10bio_buf(r10_bio, conf);
 		set_bit(MD_RECOVERY_INTR, &mddev->recovery);
 		return sectors_done;
 	}

-- 
Ming

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 05/14] md: raid1: don't use bio's vec table to manage resync pages
  2017-07-10  4:13       ` Ming Lei
@ 2017-07-10  4:38           ` NeilBrown
  0 siblings, 0 replies; 44+ messages in thread
From: NeilBrown @ 2017-07-10  4:38 UTC (permalink / raw)
  To: Ming Lei, Ming Lei
  Cc: Shaohua Li, Jens Axboe,
	open list:SOFTWARE RAID (Multiple Disks) SUPPORT, linux-block,
	Christoph Hellwig

[-- Attachment #1: Type: text/plain, Size: 1618 bytes --]

On Mon, Jul 10 2017, Ming Lei wrote:

> On Mon, Jul 10, 2017 at 11:35:12AM +0800, Ming Lei wrote:
>> On Mon, Jul 10, 2017 at 7:09 AM, NeilBrown <neilb@suse.com> wrote:
...
>> >> +
>> >> +             rp->idx = 0;
>> >
>> > This is the only place the ->idx is initialized, in r1buf_pool_alloc().
>> > The mempool alloc function is supposed to allocate memory, not initialize
>> > it.
>> >
>> > If the mempool_alloc() call cannot allocate memory it will use memory
>> > from the pool.  If this memory has already been used, then it will no
>> > longer have the initialized value.
>> >
>> > In short: you need to initialise memory *after* calling
>> > mempool_alloc(), unless you ensure it is reset to the init values before
>> > calling mempool_free().
>> >
>> > https://bugzilla.kernel.org/show_bug.cgi?id=196307
>> 
>> OK, thanks for posting it.
>>
>> Another fix might be to reinitialize the variable (rp->idx = 0) in
>> r1buf_pool_free(), or just set it to zero every time it is used.
>>
>> But I don't understand why mempool_free() calls pool->free() at the end of
>> this function, which may cause pool->free() to run on a newly allocated buf;
>> seems like a bug in mempool?
>
> Looks like I missed the 'return' in mempool_free(), so it is fine.
>
> How about the following fix?

It looks like it would probably work, but it is rather unusual to
initialise something just before freeing it.

Couldn't you just move the initialization to shortly after the
mempool_alloc() call.  There looks like a good place that already loops
over all the bios....
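
To make the failure mode concrete, here is a minimal sketch (illustrative
names, not the md code): if the per-use state is only set in the mempool
alloc callback, an element that is recycled through the pool comes back
with whatever value it had when it was freed.

	struct buf {
		int idx;	/* per-use cursor */
	};

	static void *buf_alloc(gfp_t gfp, void *data)
	{
		struct buf *b = kmalloc(sizeof(*b), gfp);

		if (b)
			b->idx = 0;	/* only runs for freshly kmalloc()ed memory */
		return b;
	}

When buf_alloc() cannot get memory, mempool_alloc() falls back to an
element from the reserve pool, so b->idx still holds whatever value it had
when that element was mempool_free()d.  Hence: reset b->idx after every
mempool_alloc(), or reset it before every mempool_free().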

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 05/14] md: raid1: don't use bio's vec table to manage resync pages
  2017-07-10  4:38           ` NeilBrown
@ 2017-07-10  7:25             ` Ming Lei
  -1 siblings, 0 replies; 44+ messages in thread
From: Ming Lei @ 2017-07-10  7:25 UTC (permalink / raw)
  To: NeilBrown
  Cc: Ming Lei, Shaohua Li, Jens Axboe,
	open list:SOFTWARE RAID (Multiple Disks) SUPPORT, linux-block,
	Christoph Hellwig

On Mon, Jul 10, 2017 at 02:38:19PM +1000, NeilBrown wrote:
> On Mon, Jul 10 2017, Ming Lei wrote:
> 
> > On Mon, Jul 10, 2017 at 11:35:12AM +0800, Ming Lei wrote:
> >> On Mon, Jul 10, 2017 at 7:09 AM, NeilBrown <neilb@suse.com> wrote:
> ...
> >> >> +
> >> >> +             rp->idx = 0;
> >> >
> >> > This is the only place the ->idx is initialized, in r1buf_pool_alloc().
> >> > The mempool alloc function is suppose to allocate memory, not initialize
> >> > it.
> >> >
> >> > If the mempool_alloc() call cannot allocate memory it will use memory
> >> > from the pool.  If this memory has already been used, then it will no
> >> > longer have the initialized value.
> >> >
> >> > In short: you need to initialise memory *after* calling
> >> > mempool_alloc(), unless you ensure it is reset to the init values before
> >> > calling mempool_free().
> >> >
> >> > https://bugzilla.kernel.org/show_bug.cgi?id=196307
> >> 
> >> OK, thanks for posting it out.
> >> 
> >> Another fix might be to reinitialize the variable(rp->idx = 0) in
> >> r1buf_pool_free().
> >> Or just set it as zero every time when it is used.
> >> 
> >> But I don't understand why mempool_free() calls pool->free() at the end of
> >> this function, which may cause to run pool->free() on a new allocated buf,
> >> seems a bug in mempool?
> >
> > Looks I missed the 'return' in mempool_free(), so it is fine.
> >
> > How about the following fix?
> 
> It looks like it would probably work, but it is rather unusual to
> initialise something just before freeing it.
> 
> Couldn't you just move the initialization to shortly after the
> mempool_alloc() call.  There looks like a good place that already loops
> over all the bios....

OK, the revised patch follows, according to your suggestion.
---

From 68f9936635b3dda13c87a6b6125ac543145bb940 Mon Sep 17 00:00:00 2001
From: Ming Lei <ming.lei@redhat.com>
Date: Mon, 10 Jul 2017 15:16:16 +0800
Subject: [PATCH] MD: move initialization of resync pages' index out of mempool
 allocator

mempool_alloc() is only responsible for allocation, not for initialization,
so we need to move the initialization of the resync pages' index out of the
allocator function.

Reported-by: NeilBrown <neilb@suse.com>
Fixes: f0250618361d(md: raid10: don't use bio's vec table to manage resync pages)
Fixes: 98d30c5812c3(md: raid1: don't use bio's vec table to manage resync pages)
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/md/raid1.c  | 4 +++-
 drivers/md/raid10.c | 6 +++++-
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index e1a7e3d4c5e4..26f5efba0504 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -170,7 +170,6 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
 			resync_get_all_pages(rp);
 		}
 
-		rp->idx = 0;
 		rp->raid_bio = r1_bio;
 		bio->bi_private = rp;
 	}
@@ -2698,6 +2697,9 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr,
 		struct md_rdev *rdev;
 		bio = r1_bio->bios[i];
 
+		/* This initialization should follow mempool_alloc() */
+		get_resync_pages(bio)->idx = 0;
+
 		rdev = rcu_dereference(conf->mirrors[i].rdev);
 		if (rdev == NULL ||
 		    test_bit(Faulty, &rdev->flags)) {
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 797ed60abd5e..5ebcb7487284 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -221,7 +221,6 @@ static void * r10buf_pool_alloc(gfp_t gfp_flags, void *data)
 			resync_get_all_pages(rp);
 		}
 
-		rp->idx = 0;
 		rp->raid_bio = r10_bio;
 		bio->bi_private = rp;
 		if (rbio) {
@@ -3095,6 +3094,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 				bio = r10_bio->devs[0].bio;
 				bio->bi_next = biolist;
 				biolist = bio;
+				get_resync_pages(bio)->idx = 0;
 				bio->bi_end_io = end_sync_read;
 				bio_set_op_attrs(bio, REQ_OP_READ, 0);
 				if (test_bit(FailFast, &rdev->flags))
@@ -3120,6 +3120,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 					bio = r10_bio->devs[1].bio;
 					bio->bi_next = biolist;
 					biolist = bio;
+					get_resync_pages(bio)->idx = 0;
 					bio->bi_end_io = end_sync_write;
 					bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
 					bio->bi_iter.bi_sector = to_addr
@@ -3146,6 +3147,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 					break;
 				bio->bi_next = biolist;
 				biolist = bio;
+				get_resync_pages(bio)->idx = 0;
 				bio->bi_end_io = end_sync_write;
 				bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
 				bio->bi_iter.bi_sector = to_addr +
@@ -3291,6 +3293,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 			atomic_inc(&r10_bio->remaining);
 			bio->bi_next = biolist;
 			biolist = bio;
+			get_resync_pages(bio)->idx = 0;
 			bio->bi_end_io = end_sync_read;
 			bio_set_op_attrs(bio, REQ_OP_READ, 0);
 			if (test_bit(FailFast, &conf->mirrors[d].rdev->flags))
@@ -3314,6 +3317,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 			sector = r10_bio->devs[i].addr;
 			bio->bi_next = biolist;
 			biolist = bio;
+			get_resync_pages(bio)->idx = 0;
 			bio->bi_end_io = end_sync_write;
 			bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
 			if (test_bit(FailFast, &conf->mirrors[d].rdev->flags))
-- 
2.9.4



-- 
Ming

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 05/14] md: raid1: don't use bio's vec table to manage resync pages
  2017-07-10  7:25             ` Ming Lei
  (?)
@ 2017-07-10 19:05             ` Shaohua Li
  2017-07-10 22:54               ` Ming Lei
  2017-07-10 23:14                 ` NeilBrown
  -1 siblings, 2 replies; 44+ messages in thread
From: Shaohua Li @ 2017-07-10 19:05 UTC (permalink / raw)
  To: Ming Lei
  Cc: NeilBrown, Ming Lei, Jens Axboe,
	open list:SOFTWARE RAID (Multiple Disks) SUPPORT, linux-block,
	Christoph Hellwig

On Mon, Jul 10, 2017 at 03:25:41PM +0800, Ming Lei wrote:
> On Mon, Jul 10, 2017 at 02:38:19PM +1000, NeilBrown wrote:
> > On Mon, Jul 10 2017, Ming Lei wrote:
> > 
> > > On Mon, Jul 10, 2017 at 11:35:12AM +0800, Ming Lei wrote:
> > >> On Mon, Jul 10, 2017 at 7:09 AM, NeilBrown <neilb@suse.com> wrote:
> > ...
> > >> >> +
> > >> >> +             rp->idx = 0;
> > >> >
> > >> > This is the only place the ->idx is initialized, in r1buf_pool_alloc().
> > >> > The mempool alloc function is suppose to allocate memory, not initialize
> > >> > it.
> > >> >
> > >> > If the mempool_alloc() call cannot allocate memory it will use memory
> > >> > from the pool.  If this memory has already been used, then it will no
> > >> > longer have the initialized value.
> > >> >
> > >> > In short: you need to initialise memory *after* calling
> > >> > mempool_alloc(), unless you ensure it is reset to the init values before
> > >> > calling mempool_free().
> > >> >
> > >> > https://bugzilla.kernel.org/show_bug.cgi?id=196307
> > >> 
> > >> OK, thanks for posting it out.
> > >> 
> > >> Another fix might be to reinitialize the variable(rp->idx = 0) in
> > >> r1buf_pool_free().
> > >> Or just set it as zero every time when it is used.
> > >> 
> > >> But I don't understand why mempool_free() calls pool->free() at the end of
> > >> this function, which may cause to run pool->free() on a new allocated buf,
> > >> seems a bug in mempool?
> > >
> > > Looks I missed the 'return' in mempool_free(), so it is fine.
> > >
> > > How about the following fix?
> > 
> > It looks like it would probably work, but it is rather unusual to
> > initialise something just before freeing it.
> > 
> > Couldn't you just move the initialization to shortly after the
> > mempool_alloc() call.  There looks like a good place that already loops
> > over all the bios....
> 
> OK, follows the revised patch according to your suggestion.
> ---
> 
> From 68f9936635b3dda13c87a6b6125ac543145bb940 Mon Sep 17 00:00:00 2001
> From: Ming Lei <ming.lei@redhat.com>
> Date: Mon, 10 Jul 2017 15:16:16 +0800
> Subject: [PATCH] MD: move initialization of resync pages' index out of mempool
>  allocator
> 
> mempool_alloc() is only responsible for allocation, not for initialization,
> so we need to move the initialization of resync pages's index out of the
> allocator function.
> 
> Reported-by: NeilBrown <neilb@suse.com>
> Fixes: f0250618361d(md: raid10: don't use bio's vec table to manage resync pages)
> Fixes: 98d30c5812c3(md: raid1: don't use bio's vec table to manage resync pages)
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  drivers/md/raid1.c  | 4 +++-
>  drivers/md/raid10.c | 6 +++++-
>  2 files changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index e1a7e3d4c5e4..26f5efba0504 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -170,7 +170,6 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
>  			resync_get_all_pages(rp);
>  		}
>  
> -		rp->idx = 0;
>  		rp->raid_bio = r1_bio;
>  		bio->bi_private = rp;
>  	}
> @@ -2698,6 +2697,9 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr,
>  		struct md_rdev *rdev;
>  		bio = r1_bio->bios[i];
>  
> +		/* This initialization should follow mempool_alloc() */
> +		get_resync_pages(bio)->idx = 0;
> +

This is fragile and hard to maintain. Can we add a wrap for the
allocation/init?
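
Something like the following hypothetical helper is what I have in mind
(the name, and the loop bound over r1_bio->bios[], are guesses based on the
code quoted in this thread); it keeps callers from forgetting the
re-initialization:

	static struct r1bio *alloc_r1buf(struct r1conf *conf)
	{
		struct r1bio *r1_bio = mempool_alloc(conf->r1buf_pool, GFP_NOIO);
		int i;

		/* reset the per-bio page index for every resync bio in the buffer */
		for (i = 0; i < conf->raid_disks * 2; i++)
			get_resync_pages(r1_bio->bios[i])->idx = 0;
		return r1_bio;
	}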

Thanks,
Shaohua

>  		rdev = rcu_dereference(conf->mirrors[i].rdev);
>  		if (rdev == NULL ||
>  		    test_bit(Faulty, &rdev->flags)) {
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 797ed60abd5e..5ebcb7487284 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -221,7 +221,6 @@ static void * r10buf_pool_alloc(gfp_t gfp_flags, void *data)
>  			resync_get_all_pages(rp);
>  		}
>  
> -		rp->idx = 0;
>  		rp->raid_bio = r10_bio;
>  		bio->bi_private = rp;
>  		if (rbio) {
> @@ -3095,6 +3094,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>  				bio = r10_bio->devs[0].bio;
>  				bio->bi_next = biolist;
>  				biolist = bio;
> +				get_resync_pages(bio)->idx = 0;
>  				bio->bi_end_io = end_sync_read;
>  				bio_set_op_attrs(bio, REQ_OP_READ, 0);
>  				if (test_bit(FailFast, &rdev->flags))
> @@ -3120,6 +3120,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>  					bio = r10_bio->devs[1].bio;
>  					bio->bi_next = biolist;
>  					biolist = bio;
> +					get_resync_pages(bio)->idx = 0;
>  					bio->bi_end_io = end_sync_write;
>  					bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
>  					bio->bi_iter.bi_sector = to_addr
> @@ -3146,6 +3147,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>  					break;
>  				bio->bi_next = biolist;
>  				biolist = bio;
> +				get_resync_pages(bio)->idx = 0;
>  				bio->bi_end_io = end_sync_write;
>  				bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
>  				bio->bi_iter.bi_sector = to_addr +
> @@ -3291,6 +3293,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>  			atomic_inc(&r10_bio->remaining);
>  			bio->bi_next = biolist;
>  			biolist = bio;
> +			get_resync_pages(bio)->idx = 0;
>  			bio->bi_end_io = end_sync_read;
>  			bio_set_op_attrs(bio, REQ_OP_READ, 0);
>  			if (test_bit(FailFast, &conf->mirrors[d].rdev->flags))
> @@ -3314,6 +3317,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>  			sector = r10_bio->devs[i].addr;
>  			bio->bi_next = biolist;
>  			biolist = bio;
> +			get_resync_pages(bio)->idx = 0;
>  			bio->bi_end_io = end_sync_write;
>  			bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
>  			if (test_bit(FailFast, &conf->mirrors[d].rdev->flags))
> -- 
> 2.9.4
> 
> 
> 
> -- 
> Ming

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 05/14] md: raid1: don't use bio's vec table to manage resync pages
  2017-07-10 19:05             ` Shaohua Li
@ 2017-07-10 22:54               ` Ming Lei
  2017-07-10 23:14                 ` NeilBrown
  1 sibling, 0 replies; 44+ messages in thread
From: Ming Lei @ 2017-07-10 22:54 UTC (permalink / raw)
  To: Shaohua Li
  Cc: NeilBrown, Ming Lei, Jens Axboe,
	open list:SOFTWARE RAID (Multiple Disks) SUPPORT, linux-block,
	Christoph Hellwig

On Mon, Jul 10, 2017 at 12:05:49PM -0700, Shaohua Li wrote:
> On Mon, Jul 10, 2017 at 03:25:41PM +0800, Ming Lei wrote:
> > On Mon, Jul 10, 2017 at 02:38:19PM +1000, NeilBrown wrote:
> > > On Mon, Jul 10 2017, Ming Lei wrote:
> > > 
> > > > On Mon, Jul 10, 2017 at 11:35:12AM +0800, Ming Lei wrote:
> > > >> On Mon, Jul 10, 2017 at 7:09 AM, NeilBrown <neilb@suse.com> wrote:
> > > ...
> > > >> >> +
> > > >> >> +             rp->idx = 0;
> > > >> >
> > > >> > This is the only place the ->idx is initialized, in r1buf_pool_alloc().
> > > >> > The mempool alloc function is suppose to allocate memory, not initialize
> > > >> > it.
> > > >> >
> > > >> > If the mempool_alloc() call cannot allocate memory it will use memory
> > > >> > from the pool.  If this memory has already been used, then it will no
> > > >> > longer have the initialized value.
> > > >> >
> > > >> > In short: you need to initialise memory *after* calling
> > > >> > mempool_alloc(), unless you ensure it is reset to the init values before
> > > >> > calling mempool_free().
> > > >> >
> > > >> > https://bugzilla.kernel.org/show_bug.cgi?id=196307
> > > >> 
> > > >> OK, thanks for posting it out.
> > > >> 
> > > >> Another fix might be to reinitialize the variable(rp->idx = 0) in
> > > >> r1buf_pool_free().
> > > >> Or just set it as zero every time when it is used.
> > > >> 
> > > >> But I don't understand why mempool_free() calls pool->free() at the end of
> > > >> this function, which may cause to run pool->free() on a new allocated buf,
> > > >> seems a bug in mempool?
> > > >
> > > > Looks I missed the 'return' in mempool_free(), so it is fine.
> > > >
> > > > How about the following fix?
> > > 
> > > It looks like it would probably work, but it is rather unusual to
> > > initialise something just before freeing it.
> > > 
> > > Couldn't you just move the initialization to shortly after the
> > > mempool_alloc() call.  There looks like a good place that already loops
> > > over all the bios....
> > 
> > OK, follows the revised patch according to your suggestion.
> > ---
> > 
> > From 68f9936635b3dda13c87a6b6125ac543145bb940 Mon Sep 17 00:00:00 2001
> > From: Ming Lei <ming.lei@redhat.com>
> > Date: Mon, 10 Jul 2017 15:16:16 +0800
> > Subject: [PATCH] MD: move initialization of resync pages' index out of mempool
> >  allocator
> > 
> > mempool_alloc() is only responsible for allocation, not for initialization,
> > so we need to move the initialization of resync pages's index out of the
> > allocator function.
> > 
> > Reported-by: NeilBrown <neilb@suse.com>
> > Fixes: f0250618361d(md: raid10: don't use bio's vec table to manage resync pages)
> > Fixes: 98d30c5812c3(md: raid1: don't use bio's vec table to manage resync pages)
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >  drivers/md/raid1.c  | 4 +++-
> >  drivers/md/raid10.c | 6 +++++-
> >  2 files changed, 8 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> > index e1a7e3d4c5e4..26f5efba0504 100644
> > --- a/drivers/md/raid1.c
> > +++ b/drivers/md/raid1.c
> > @@ -170,7 +170,6 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
> >  			resync_get_all_pages(rp);
> >  		}
> >  
> > -		rp->idx = 0;
> >  		rp->raid_bio = r1_bio;
> >  		bio->bi_private = rp;
> >  	}
> > @@ -2698,6 +2697,9 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr,
> >  		struct md_rdev *rdev;
> >  		bio = r1_bio->bios[i];
> >  
> > +		/* This initialization should follow mempool_alloc() */
> > +		get_resync_pages(bio)->idx = 0;
> > +
> 
> This is fragile and hard to maintain. Can we add a wrap for the
> allocation/init?

The resync pages' index init is done in the following loop
after mempool_alloc(); actually the whole loop is still a sort of init,
so I think it isn't fragile, and it should be fine.

-- 
Ming

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 05/14] md: raid1: don't use bio's vec table to manage resync pages
  2017-07-10 19:05             ` Shaohua Li
@ 2017-07-10 23:14                 ` NeilBrown
  2017-07-10 23:14                 ` NeilBrown
  1 sibling, 0 replies; 44+ messages in thread
From: NeilBrown @ 2017-07-10 23:14 UTC (permalink / raw)
  To: Shaohua Li, Ming Lei
  Cc: Ming Lei, Jens Axboe,
	open list:SOFTWARE RAID (Multiple Disks) SUPPORT, linux-block,
	Christoph Hellwig

[-- Attachment #1: Type: text/plain, Size: 3622 bytes --]

On Mon, Jul 10 2017, Shaohua Li wrote:

> On Mon, Jul 10, 2017 at 03:25:41PM +0800, Ming Lei wrote:
>> On Mon, Jul 10, 2017 at 02:38:19PM +1000, NeilBrown wrote:
>> > On Mon, Jul 10 2017, Ming Lei wrote:
>> > 
>> > > On Mon, Jul 10, 2017 at 11:35:12AM +0800, Ming Lei wrote:
>> > >> On Mon, Jul 10, 2017 at 7:09 AM, NeilBrown <neilb@suse.com> wrote:
>> > ...
>> > >> >> +
>> > >> >> +             rp->idx = 0;
>> > >> >
>> > >> > This is the only place the ->idx is initialized, in r1buf_pool_alloc().
>> > >> > The mempool alloc function is suppose to allocate memory, not initialize
>> > >> > it.
>> > >> >
>> > >> > If the mempool_alloc() call cannot allocate memory it will use memory
>> > >> > from the pool.  If this memory has already been used, then it will no
>> > >> > longer have the initialized value.
>> > >> >
>> > >> > In short: you need to initialise memory *after* calling
>> > >> > mempool_alloc(), unless you ensure it is reset to the init values before
>> > >> > calling mempool_free().
>> > >> >
>> > >> > https://bugzilla.kernel.org/show_bug.cgi?id=196307
>> > >> 
>> > >> OK, thanks for posting it out.
>> > >> 
>> > >> Another fix might be to reinitialize the variable(rp->idx = 0) in
>> > >> r1buf_pool_free().
>> > >> Or just set it as zero every time when it is used.
>> > >> 
>> > >> But I don't understand why mempool_free() calls pool->free() at the end of
>> > >> this function, which may cause to run pool->free() on a new allocated buf,
>> > >> seems a bug in mempool?
>> > >
>> > > Looks I missed the 'return' in mempool_free(), so it is fine.
>> > >
>> > > How about the following fix?
>> > 
>> > It looks like it would probably work, but it is rather unusual to
>> > initialise something just before freeing it.
>> > 
>> > Couldn't you just move the initialization to shortly after the
>> > mempool_alloc() call.  There looks like a good place that already loops
>> > over all the bios....
>> 
>> OK, follows the revised patch according to your suggestion.

Thanks.

That isn't as tidy as I hoped.  So I went deeper into the code to try to
understand why...

I think that maybe we should just discard the ->idx field completely.
It is only used in this code:

	do {
		struct page *page;
		int len = PAGE_SIZE;
		if (sector_nr + (len>>9) > max_sector)
			len = (max_sector - sector_nr) << 9;
		if (len == 0)
			break;
		for (bio= biolist ; bio ; bio=bio->bi_next) {
			struct resync_pages *rp = get_resync_pages(bio);
			page = resync_fetch_page(rp, rp->idx++);
			/*
			 * won't fail because the vec table is big enough
			 * to hold all these pages
			 */
			bio_add_page(bio, page, len, 0);
		}
		nr_sectors += len>>9;
		sector_nr += len>>9;
	} while (get_resync_pages(biolist)->idx < RESYNC_PAGES);

and all of the different 'rp' always have the same value for 'idx'.
This code is more complex than it needs to be.  This is because it used
to be possible for bio_add_page() to fail.  That cannot happen any more.
So we can make the code something like:

  for (idx = 0; idx < RESYNC_PAGES; idx++) {
     struct page *page;
     int len = PAGE_SIZE;
     if (sector_nr + (len >> 9) > max_sector)
         len = (max_sector - sector_nr) << 9;
     if (len == 0)
         break;
     for (bio = biolist; bio; bio = bio->bi_next) {
        struct resync_pages *rp = get_resync_pages(bio);
        page = resync_fetch_page(rp, idx);
        bio_add_page(bio, page, len, 0);
     }
     nr_sectors += len >> 9;
     sector_nr += len >> 9;
  }

Or did I miss something?

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 05/14] md: raid1: don't use bio's vec table to manage resync pages
  2017-07-10 23:14                 ` NeilBrown
  (?)
@ 2017-07-12  1:40                 ` Ming Lei
  2017-07-12 16:30                   ` Shaohua Li
  -1 siblings, 1 reply; 44+ messages in thread
From: Ming Lei @ 2017-07-12  1:40 UTC (permalink / raw)
  To: NeilBrown
  Cc: Shaohua Li, Ming Lei, Jens Axboe,
	open list:SOFTWARE RAID (Multiple Disks) SUPPORT, linux-block,
	Christoph Hellwig

On Tue, Jul 11, 2017 at 7:14 AM, NeilBrown <neilb@suse.com> wrote:
> On Mon, Jul 10 2017, Shaohua Li wrote:
>
>> On Mon, Jul 10, 2017 at 03:25:41PM +0800, Ming Lei wrote:
>>> On Mon, Jul 10, 2017 at 02:38:19PM +1000, NeilBrown wrote:
>>> > On Mon, Jul 10 2017, Ming Lei wrote:
>>> >
>>> > > On Mon, Jul 10, 2017 at 11:35:12AM +0800, Ming Lei wrote:
>>> > >> On Mon, Jul 10, 2017 at 7:09 AM, NeilBrown <neilb@suse.com> wrote:
>>> > ...
>>> > >> >> +
>>> > >> >> +             rp->idx = 0;
>>> > >> >
>>> > >> > This is the only place the ->idx is initialized, in r1buf_pool_alloc().
>>> > >> > The mempool alloc function is suppose to allocate memory, not initialize
>>> > >> > it.
>>> > >> >
>>> > >> > If the mempool_alloc() call cannot allocate memory it will use memory
>>> > >> > from the pool.  If this memory has already been used, then it will no
>>> > >> > longer have the initialized value.
>>> > >> >
>>> > >> > In short: you need to initialise memory *after* calling
>>> > >> > mempool_alloc(), unless you ensure it is reset to the init values before
>>> > >> > calling mempool_free().
>>> > >> >
>>> > >> > https://bugzilla.kernel.org/show_bug.cgi?id=196307
>>> > >>
>>> > >> OK, thanks for posting it out.
>>> > >>
>>> > >> Another fix might be to reinitialize the variable(rp->idx = 0) in
>>> > >> r1buf_pool_free().
>>> > >> Or just set it as zero every time when it is used.
>>> > >>
>>> > >> But I don't understand why mempool_free() calls pool->free() at the end of
>>> > >> this function, which may cause to run pool->free() on a new allocated buf,
>>> > >> seems a bug in mempool?
>>> > >
>>> > > Looks I missed the 'return' in mempool_free(), so it is fine.
>>> > >
>>> > > How about the following fix?
>>> >
>>> > It looks like it would probably work, but it is rather unusual to
>>> > initialise something just before freeing it.
>>> >
>>> > Couldn't you just move the initialization to shortly after the
>>> > mempool_alloc() call.  There looks like a good place that already loops
>>> > over all the bios....
>>>
>>> OK, follows the revised patch according to your suggestion.
>
> Thanks.
>
> That isn't as tidy as I hoped.  So I went deeper into the code to try to
> understand why...
>
> I think that maybe we should just discard the ->idx field completely.
> It is only used in this code:
>
>         do {
>                 struct page *page;
>                 int len = PAGE_SIZE;
>                 if (sector_nr + (len>>9) > max_sector)
>                         len = (max_sector - sector_nr) << 9;
>                 if (len == 0)
>                         break;
>                 for (bio= biolist ; bio ; bio=bio->bi_next) {
>                         struct resync_pages *rp = get_resync_pages(bio);
>                         page = resync_fetch_page(rp, rp->idx++);
>                         /*
>                          * won't fail because the vec table is big enough
>                          * to hold all these pages
>                          */
>                         bio_add_page(bio, page, len, 0);
>                 }
>                 nr_sectors += len>>9;
>                 sector_nr += len>>9;
>         } while (get_resync_pages(biolist)->idx < RESYNC_PAGES);
>
> and all of the different 'rp' always have the same value for 'idx'.
> This code is more complex than it needs to be.  This is because it used
> to be possible for bio_add_page() to fail.  That cannot happen any more.
> So we can make the code something like:
>
>   for (idx = 0; idx < RESYNC_PAGES; idx++) {
>      struct page *page;
>      int len = PAGE_SIZE;
>      if (sector_nr + (len >> 9) > max_sector)
>          len = (max_sector - sector_nr) << 9
>      if (len == 0)
>          break;
>      for (bio = biolist; bio; bio = bio->bi_next) {
>         struct resync_pages *rp = get_resync_pages(bio);
>         page = resync_fetch_page(rp, idx);
>         bio_add_page(bio, page, len, 0);
>      }
>      nr_sectors += len >> 9;
>      sector_nr += len >> 9;
>   }
>
> Or did I miss something?

I think this approach is much cleaner.


--
Ming Lei

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 05/14] md: raid1: don't use bio's vec table to manage resync pages
  2017-07-12  1:40                 ` Ming Lei
@ 2017-07-12 16:30                   ` Shaohua Li
  2017-07-13  1:22                     ` Ming Lei
  0 siblings, 1 reply; 44+ messages in thread
From: Shaohua Li @ 2017-07-12 16:30 UTC (permalink / raw)
  To: Ming Lei
  Cc: NeilBrown, Ming Lei, Jens Axboe,
	open list:SOFTWARE RAID (Multiple Disks) SUPPORT, linux-block,
	Christoph Hellwig

On Wed, Jul 12, 2017 at 09:40:10AM +0800, Ming Lei wrote:
> On Tue, Jul 11, 2017 at 7:14 AM, NeilBrown <neilb@suse.com> wrote:
> > On Mon, Jul 10 2017, Shaohua Li wrote:
> >
> >> On Mon, Jul 10, 2017 at 03:25:41PM +0800, Ming Lei wrote:
> >>> On Mon, Jul 10, 2017 at 02:38:19PM +1000, NeilBrown wrote:
> >>> > On Mon, Jul 10 2017, Ming Lei wrote:
> >>> >
> >>> > > On Mon, Jul 10, 2017 at 11:35:12AM +0800, Ming Lei wrote:
> >>> > >> On Mon, Jul 10, 2017 at 7:09 AM, NeilBrown <neilb@suse.com> wrote:
> >>> > ...
> >>> > >> >> +
> >>> > >> >> +             rp->idx = 0;
> >>> > >> >
> >>> > >> > This is the only place the ->idx is initialized, in r1buf_pool_alloc().
> >>> > >> > The mempool alloc function is suppose to allocate memory, not initialize
> >>> > >> > it.
> >>> > >> >
> >>> > >> > If the mempool_alloc() call cannot allocate memory it will use memory
> >>> > >> > from the pool.  If this memory has already been used, then it will no
> >>> > >> > longer have the initialized value.
> >>> > >> >
> >>> > >> > In short: you need to initialise memory *after* calling
> >>> > >> > mempool_alloc(), unless you ensure it is reset to the init values before
> >>> > >> > calling mempool_free().
> >>> > >> >
> >>> > >> > https://bugzilla.kernel.org/show_bug.cgi?id=196307
> >>> > >>
> >>> > >> OK, thanks for posting it out.
> >>> > >>
> >>> > >> Another fix might be to reinitialize the variable(rp->idx = 0) in
> >>> > >> r1buf_pool_free().
> >>> > >> Or just set it as zero every time when it is used.
> >>> > >>
> >>> > >> But I don't understand why mempool_free() calls pool->free() at the end of
> >>> > >> this function, which may cause to run pool->free() on a new allocated buf,
> >>> > >> seems a bug in mempool?
> >>> > >
> >>> > > Looks I missed the 'return' in mempool_free(), so it is fine.
> >>> > >
> >>> > > How about the following fix?
> >>> >
> >>> > It looks like it would probably work, but it is rather unusual to
> >>> > initialise something just before freeing it.
> >>> >
> >>> > Couldn't you just move the initialization to shortly after the
> >>> > mempool_alloc() call.  There looks like a good place that already loops
> >>> > over all the bios....
> >>>
> >>> OK, follows the revised patch according to your suggestion.
> >
> > Thanks.
> >
> > That isn't as tidy as I hoped.  So I went deeper into the code to try to
> > understand why...
> >
> > I think that maybe we should just discard the ->idx field completely.
> > It is only used in this code:
> >
> >         do {
> >                 struct page *page;
> >                 int len = PAGE_SIZE;
> >                 if (sector_nr + (len>>9) > max_sector)
> >                         len = (max_sector - sector_nr) << 9;
> >                 if (len == 0)
> >                         break;
> >                 for (bio= biolist ; bio ; bio=bio->bi_next) {
> >                         struct resync_pages *rp = get_resync_pages(bio);
> >                         page = resync_fetch_page(rp, rp->idx++);
> >                         /*
> >                          * won't fail because the vec table is big enough
> >                          * to hold all these pages
> >                          */
> >                         bio_add_page(bio, page, len, 0);
> >                 }
> >                 nr_sectors += len>>9;
> >                 sector_nr += len>>9;
> >         } while (get_resync_pages(biolist)->idx < RESYNC_PAGES);
> >
> > and all of the different 'rp' always have the same value for 'idx'.
> > This code is more complex than it needs to be.  This is because it used
> > to be possible for bio_add_page() to fail.  That cannot happen any more.
> > So we can make the code something like:
> >
> >   for (idx = 0; idx < RESYNC_PAGES; idx++) {
> >      struct page *page;
> >      int len = PAGE_SIZE;
> >      if (sector_nr + (len >> 9) > max_sector)
> >          len = (max_sector - sector_nr) << 9
> >      if (len == 0)
> >          break;
> >      for (bio = biolist; bio; bio = bio->bi_next) {
> >         struct resync_pages *rp = get_resync_pages(bio);
> >         page = resync_fetch_page(rp, idx);
> >         bio_add_page(bio, page, len, 0);
> >      }
> >      nr_sectors += len >> 9;
> >      sector_nr += len >> 9;
> >   }
> >
> > Or did I miss something?
> 
> I think this approach is much clean.

I thought I suggested not using the 'idx' in your previous post, but you said
there was a reason (not because of bio_add_page) not to do it. Has that changed?
I can't remember the details; I need to dig through the mail archives.

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 05/14] md: raid1: don't use bio's vec table to manage resync pages
  2017-07-12 16:30                   ` Shaohua Li
@ 2017-07-13  1:22                     ` Ming Lei
  0 siblings, 0 replies; 44+ messages in thread
From: Ming Lei @ 2017-07-13  1:22 UTC (permalink / raw)
  To: Shaohua Li
  Cc: Ming Lei, NeilBrown, Jens Axboe,
	open list:SOFTWARE RAID (Multiple Disks) SUPPORT, linux-block,
	Christoph Hellwig

On Wed, Jul 12, 2017 at 09:30:50AM -0700, Shaohua Li wrote:
> On Wed, Jul 12, 2017 at 09:40:10AM +0800, Ming Lei wrote:
> > On Tue, Jul 11, 2017 at 7:14 AM, NeilBrown <neilb@suse.com> wrote:
> > > On Mon, Jul 10 2017, Shaohua Li wrote:
> > >
> > >> On Mon, Jul 10, 2017 at 03:25:41PM +0800, Ming Lei wrote:
> > >>> On Mon, Jul 10, 2017 at 02:38:19PM +1000, NeilBrown wrote:
> > >>> > On Mon, Jul 10 2017, Ming Lei wrote:
> > >>> >
> > >>> > > On Mon, Jul 10, 2017 at 11:35:12AM +0800, Ming Lei wrote:
> > >>> > >> On Mon, Jul 10, 2017 at 7:09 AM, NeilBrown <neilb@suse.com> wrote:
> > >>> > ...
> > >>> > >> >> +
> > >>> > >> >> +             rp->idx = 0;
> > >>> > >> >
> > >>> > >> > This is the only place the ->idx is initialized, in r1buf_pool_alloc().
> > >>> > >> > The mempool alloc function is suppose to allocate memory, not initialize
> > >>> > >> > it.
> > >>> > >> >
> > >>> > >> > If the mempool_alloc() call cannot allocate memory it will use memory
> > >>> > >> > from the pool.  If this memory has already been used, then it will no
> > >>> > >> > longer have the initialized value.
> > >>> > >> >
> > >>> > >> > In short: you need to initialise memory *after* calling
> > >>> > >> > mempool_alloc(), unless you ensure it is reset to the init values before
> > >>> > >> > calling mempool_free().
> > >>> > >> >
> > >>> > >> > https://bugzilla.kernel.org/show_bug.cgi?id=196307
> > >>> > >>
> > >>> > >> OK, thanks for posting it out.
> > >>> > >>
> > >>> > >> Another fix might be to reinitialize the variable(rp->idx = 0) in
> > >>> > >> r1buf_pool_free().
> > >>> > >> Or just set it as zero every time when it is used.
> > >>> > >>
> > >>> > >> But I don't understand why mempool_free() calls pool->free() at the end of
> > >>> > >> this function, which may cause to run pool->free() on a new allocated buf,
> > >>> > >> seems a bug in mempool?
> > >>> > >
> > >>> > > Looks I missed the 'return' in mempool_free(), so it is fine.
> > >>> > >
> > >>> > > How about the following fix?
> > >>> >
> > >>> > It looks like it would probably work, but it is rather unusual to
> > >>> > initialise something just before freeing it.
> > >>> >
> > >>> > Couldn't you just move the initialization to shortly after the
> > >>> > mempool_alloc() call.  There looks like a good place that already loops
> > >>> > over all the bios....
> > >>>
> > >>> OK, follows the revised patch according to your suggestion.
> > >
> > > Thanks.
> > >
> > > That isn't as tidy as I hoped.  So I went deeper into the code to try to
> > > understand why...
> > >
> > > I think that maybe we should just discard the ->idx field completely.
> > > It is only used in this code:
> > >
> > >         do {
> > >                 struct page *page;
> > >                 int len = PAGE_SIZE;
> > >                 if (sector_nr + (len>>9) > max_sector)
> > >                         len = (max_sector - sector_nr) << 9;
> > >                 if (len == 0)
> > >                         break;
> > >                 for (bio= biolist ; bio ; bio=bio->bi_next) {
> > >                         struct resync_pages *rp = get_resync_pages(bio);
> > >                         page = resync_fetch_page(rp, rp->idx++);
> > >                         /*
> > >                          * won't fail because the vec table is big enough
> > >                          * to hold all these pages
> > >                          */
> > >                         bio_add_page(bio, page, len, 0);
> > >                 }
> > >                 nr_sectors += len>>9;
> > >                 sector_nr += len>>9;
> > >         } while (get_resync_pages(biolist)->idx < RESYNC_PAGES);
> > >
> > > and all of the different 'rp' always have the same value for 'idx'.
> > > This code is more complex than it needs to be.  This is because it used
> > > to be possible for bio_add_page() to fail.  That cannot happen any more.
> > > So we can make the code something like:
> > >
> > >   for (idx = 0; idx < RESYNC_PAGES; idx++) {
> > >      struct page *page;
> > >      int len = PAGE_SIZE;
> > >      if (sector_nr + (len >> 9) > max_sector)
> > >          len = (max_sector - sector_nr) << 9
> > >      if (len == 0)
> > >          break;
> > >      for (bio = biolist; bio; bio = bio->bi_next) {
> > >         struct resync_pages *rp = get_resync_pages(bio);
> > >         page = resync_fetch_page(rp, idx);
> > >         bio_add_page(bio, page, len, 0);
> > >      }
> > >      nr_sectors += len >> 9;
> > >      sector_nr += len >> 9;
> > >   }
> > >
> > > Or did I miss something?
> > 
> > I think this approach is much clean.
> 
> Thought I suggested not using the 'idx' in your previous post, but you said
> there is reason (not because of bio_add_page) not to do it. Is that changed?
> can't remember the details, I need to dig the mail archives. 

I found it:

	http://marc.info/?l=linux-raid&m=148847751302825&w=2

Not sure why I didn't switch to this approach in v3, but the idea is correct.
Maybe I misunderstood it that time.

-- 
Ming

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2017-07-13  1:22 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-16 16:12 [PATCH v3 00/14] md: cleanup on direct access to bvec table Ming Lei
2017-03-16 16:12 ` [PATCH v3 01/14] md: raid1/raid10: don't handle failure of bio_add_page() Ming Lei
2017-03-27  9:14   ` Christoph Hellwig
2017-03-16 16:12 ` [PATCH v3 02/14] md: move two macros into md.h Ming Lei
2017-03-24  5:57   ` NeilBrown
2017-03-24  5:57     ` NeilBrown
2017-03-24  6:30     ` Ming Lei
2017-03-24 16:53     ` Shaohua Li
2017-03-27  9:15       ` Christoph Hellwig
2017-03-27  9:52         ` NeilBrown
2017-03-27  9:52           ` NeilBrown
2017-03-16 16:12 ` [PATCH v3 03/14] md: prepare for managing resync I/O pages in clean way Ming Lei
2017-03-24  6:00   ` NeilBrown
2017-03-24  6:00     ` NeilBrown
2017-03-16 16:12 ` [PATCH v3 04/14] md: raid1: simplify r1buf_pool_free() Ming Lei
2017-03-16 16:12 ` [PATCH v3 05/14] md: raid1: don't use bio's vec table to manage resync pages Ming Lei
2017-07-09 23:09   ` NeilBrown
2017-07-09 23:09     ` NeilBrown
2017-07-10  3:35     ` Ming Lei
2017-07-10  4:13       ` Ming Lei
2017-07-10  4:38         ` NeilBrown
2017-07-10  4:38           ` NeilBrown
2017-07-10  7:25           ` Ming Lei
2017-07-10  7:25             ` Ming Lei
2017-07-10 19:05             ` Shaohua Li
2017-07-10 22:54               ` Ming Lei
2017-07-10 23:14               ` NeilBrown
2017-07-10 23:14                 ` NeilBrown
2017-07-12  1:40                 ` Ming Lei
2017-07-12 16:30                   ` Shaohua Li
2017-07-13  1:22                     ` Ming Lei
2017-03-16 16:12 ` [PATCH v3 06/14] md: raid1: retrieve page from pre-allocated resync page array Ming Lei
2017-03-16 16:12 ` [PATCH v3 07/14] md: raid1: use bio helper in process_checks() Ming Lei
2017-03-16 16:12 ` [PATCH v3 08/14] block: introduce bio_copy_data_partial Ming Lei
2017-03-24  5:34   ` Shaohua Li
2017-03-24  5:34     ` Shaohua Li
2017-03-24 16:41   ` Jens Axboe
2017-03-24 16:41     ` Jens Axboe
2017-03-16 16:12 ` [PATCH v3 09/14] md: raid1: move 'offset' out of loop Ming Lei
2017-03-16 16:12 ` [PATCH v3 10/14] md: raid1: improve write behind Ming Lei
2017-03-16 16:12 ` [PATCH v3 11/14] md: raid10: refactor code of read reshape's .bi_end_io Ming Lei
2017-03-16 16:12 ` [PATCH v3 12/14] md: raid10: don't use bio's vec table to manage resync pages Ming Lei
2017-03-16 16:12 ` [PATCH v3 13/14] md: raid10: retrieve page from preallocated resync page array Ming Lei
2017-03-16 16:12 ` [PATCH v3 14/14] md: raid10: avoid direct access to bvec table in handle_reshape_read_error Ming Lei
