* [PATCH v2 0/5] md: use bio_clone_fast()
@ 2017-02-14 15:28 Ming Lei
2017-02-14 15:28 ` [PATCH v2 1/5] block: introduce bio_clone_bioset_partial() Ming Lei
` (6 more replies)
0 siblings, 7 replies; 13+ messages in thread
From: Ming Lei @ 2017-02-14 15:28 UTC (permalink / raw)
To: Shaohua Li, Jens Axboe, linux-kernel, linux-raid, linux-block,
Christoph Hellwig, NeilBrown
Cc: Ming Lei
Hi,
This patches replaces bio_clone() with bio_fast_clone() in
bio_clone_mddev() because:
1) bio_clone_mddev() is used in raid normal I/O and isn't in
resync I/O path, and all the direct access to bvec table in
raid happens on resync I/O only except for write behind of raid1.
Write behind is treated specially, so the replacement is safe.
2) for write behind, bio_clone() is kept, but this patchset
introduces bio_clone_bioset_partial() to just clone one specific
bvecs range instead of whole table. Then write behind is improved
too.
V2:
1) move 3rd patch into 2nd one
2) kill bio_clone_mddev() and use bio_clone_fast() directly
3) define local variable 'offset' as sector_t in raid1_write_request()
V1:
1) don't introduce bio_clone_slow_mddev_partial()
2) return failure if mddev->bio_set can't be created
3) remove check in bio_clone_mddev() as suggested by
Christoph Hellwig.
4) rename bio_clone_mddev() as bio_clone_fast_mddev()
Ming Lei (5):
block: introduce bio_clone_bioset_partial()
md: fail if mddev->bio_set can't be created
md/raid1: use bio_clone_bioset_partial() in case of write behind
md: remove unnecessary check on mddev
md: fast clone bio in bio_clone_mddev()
block/bio.c | 61 +++++++++++++++++++++++++++++++++++++++++------------
drivers/md/faulty.c | 2 +-
drivers/md/md.c | 15 ++++---------
drivers/md/md.h | 2 --
drivers/md/raid1.c | 28 +++++++++++++++++-------
drivers/md/raid10.c | 11 +++++-----
drivers/md/raid5.c | 4 ++--
include/linux/bio.h | 11 ++++++++--
8 files changed, 89 insertions(+), 45 deletions(-)
--
2.7.4
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH v2 1/5] block: introduce bio_clone_bioset_partial()
2017-02-14 15:28 [PATCH v2 0/5] md: use bio_clone_fast() Ming Lei
@ 2017-02-14 15:28 ` Ming Lei
2017-02-14 15:29 ` [PATCH v2 2/5] md: fail if mddev->bio_set can't be created Ming Lei
` (5 subsequent siblings)
6 siblings, 0 replies; 13+ messages in thread
From: Ming Lei @ 2017-02-14 15:28 UTC (permalink / raw)
To: Shaohua Li, Jens Axboe, linux-kernel, linux-raid, linux-block,
Christoph Hellwig, NeilBrown
Cc: Ming Lei
md still need bio clone(not the fast version) for behind write,
and it is more efficient to use bio_clone_bioset_partial().
The idea is simple and just copy the bvecs range specified from
parameters.
Reviewed-by: Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
block/bio.c | 61 +++++++++++++++++++++++++++++++++++++++++------------
include/linux/bio.h | 11 ++++++++--
2 files changed, 57 insertions(+), 15 deletions(-)
diff --git a/block/bio.c b/block/bio.c
index 4b564d0c3e29..5eec5e08417f 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -625,21 +625,20 @@ struct bio *bio_clone_fast(struct bio *bio, gfp_t gfp_mask, struct bio_set *bs)
}
EXPORT_SYMBOL(bio_clone_fast);
-/**
- * bio_clone_bioset - clone a bio
- * @bio_src: bio to clone
- * @gfp_mask: allocation priority
- * @bs: bio_set to allocate from
- *
- * Clone bio. Caller will own the returned bio, but not the actual data it
- * points to. Reference count of returned bio will be one.
- */
-struct bio *bio_clone_bioset(struct bio *bio_src, gfp_t gfp_mask,
- struct bio_set *bs)
+static struct bio *__bio_clone_bioset(struct bio *bio_src, gfp_t gfp_mask,
+ struct bio_set *bs, int offset,
+ int size)
{
struct bvec_iter iter;
struct bio_vec bv;
struct bio *bio;
+ struct bvec_iter iter_src = bio_src->bi_iter;
+
+ /* for supporting partial clone */
+ if (offset || size != bio_src->bi_iter.bi_size) {
+ bio_advance_iter(bio_src, &iter_src, offset);
+ iter_src.bi_size = size;
+ }
/*
* Pre immutable biovecs, __bio_clone() used to just do a memcpy from
@@ -663,7 +662,8 @@ struct bio *bio_clone_bioset(struct bio *bio_src, gfp_t gfp_mask,
* __bio_clone_fast() anyways.
*/
- bio = bio_alloc_bioset(gfp_mask, bio_segments(bio_src), bs);
+ bio = bio_alloc_bioset(gfp_mask, __bio_segments(bio_src,
+ &iter_src), bs);
if (!bio)
return NULL;
bio->bi_bdev = bio_src->bi_bdev;
@@ -680,7 +680,7 @@ struct bio *bio_clone_bioset(struct bio *bio_src, gfp_t gfp_mask,
bio->bi_io_vec[bio->bi_vcnt++] = bio_src->bi_io_vec[0];
break;
default:
- bio_for_each_segment(bv, bio_src, iter)
+ __bio_for_each_segment(bv, bio_src, iter, iter_src)
bio->bi_io_vec[bio->bi_vcnt++] = bv;
break;
}
@@ -699,9 +699,44 @@ struct bio *bio_clone_bioset(struct bio *bio_src, gfp_t gfp_mask,
return bio;
}
+
+/**
+ * bio_clone_bioset - clone a bio
+ * @bio_src: bio to clone
+ * @gfp_mask: allocation priority
+ * @bs: bio_set to allocate from
+ *
+ * Clone bio. Caller will own the returned bio, but not the actual data it
+ * points to. Reference count of returned bio will be one.
+ */
+struct bio *bio_clone_bioset(struct bio *bio_src, gfp_t gfp_mask,
+ struct bio_set *bs)
+{
+ return __bio_clone_bioset(bio_src, gfp_mask, bs, 0,
+ bio_src->bi_iter.bi_size);
+}
EXPORT_SYMBOL(bio_clone_bioset);
/**
+ * bio_clone_bioset_partial - clone a partial bio
+ * @bio_src: bio to clone
+ * @gfp_mask: allocation priority
+ * @bs: bio_set to allocate from
+ * @offset: cloned starting from the offset
+ * @size: size for the cloned bio
+ *
+ * Clone bio. Caller will own the returned bio, but not the actual data it
+ * points to. Reference count of returned bio will be one.
+ */
+struct bio *bio_clone_bioset_partial(struct bio *bio_src, gfp_t gfp_mask,
+ struct bio_set *bs, int offset,
+ int size)
+{
+ return __bio_clone_bioset(bio_src, gfp_mask, bs, offset, size);
+}
+EXPORT_SYMBOL(bio_clone_bioset_partial);
+
+/**
* bio_add_pc_page - attempt to add page to bio
* @q: the target queue
* @bio: destination bio
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 7cf8a6c70a3f..8e521194f6fc 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -183,7 +183,7 @@ static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
#define bio_iter_last(bvec, iter) ((iter).bi_size == (bvec).bv_len)
-static inline unsigned bio_segments(struct bio *bio)
+static inline unsigned __bio_segments(struct bio *bio, struct bvec_iter *bvec)
{
unsigned segs = 0;
struct bio_vec bv;
@@ -205,12 +205,17 @@ static inline unsigned bio_segments(struct bio *bio)
break;
}
- bio_for_each_segment(bv, bio, iter)
+ __bio_for_each_segment(bv, bio, iter, *bvec)
segs++;
return segs;
}
+static inline unsigned bio_segments(struct bio *bio)
+{
+ return __bio_segments(bio, &bio->bi_iter);
+}
+
/*
* get a reference to a bio, so it won't disappear. the intended use is
* something like:
@@ -384,6 +389,8 @@ extern void bio_put(struct bio *);
extern void __bio_clone_fast(struct bio *, struct bio *);
extern struct bio *bio_clone_fast(struct bio *, gfp_t, struct bio_set *);
extern struct bio *bio_clone_bioset(struct bio *, gfp_t, struct bio_set *bs);
+extern struct bio *bio_clone_bioset_partial(struct bio *, gfp_t,
+ struct bio_set *, int, int);
extern struct bio_set *fs_bio_set;
--
2.7.4
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH v2 2/5] md: fail if mddev->bio_set can't be created
2017-02-14 15:28 [PATCH v2 0/5] md: use bio_clone_fast() Ming Lei
2017-02-14 15:28 ` [PATCH v2 1/5] block: introduce bio_clone_bioset_partial() Ming Lei
@ 2017-02-14 15:29 ` Ming Lei
2017-02-14 15:29 ` [PATCH v2 3/5] md/raid1: use bio_clone_bioset_partial() in case of write behind Ming Lei
` (4 subsequent siblings)
6 siblings, 0 replies; 13+ messages in thread
From: Ming Lei @ 2017-02-14 15:29 UTC (permalink / raw)
To: Shaohua Li, Jens Axboe, linux-kernel, linux-raid, linux-block,
Christoph Hellwig, NeilBrown
Cc: Ming Lei
The current behaviour is to fall back to allocate
bio from 'fs_bio_set', that isn't a correct way
because it might cause deadlock.
So this patch simply return failure if mddev->bio_set
can't be created.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
drivers/md/md.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 7e9a495e4160..b5e2adf3493b 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5228,8 +5228,11 @@ int md_run(struct mddev *mddev)
sysfs_notify_dirent_safe(rdev->sysfs_state);
}
- if (mddev->bio_set == NULL)
+ if (mddev->bio_set == NULL) {
mddev->bio_set = bioset_create(BIO_POOL_SIZE, 0);
+ if (!mddev->bio_set)
+ return -ENOMEM;
+ }
spin_lock(&pers_lock);
pers = find_pers(mddev->level, mddev->clevel);
--
2.7.4
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH v2 3/5] md/raid1: use bio_clone_bioset_partial() in case of write behind
2017-02-14 15:28 [PATCH v2 0/5] md: use bio_clone_fast() Ming Lei
2017-02-14 15:28 ` [PATCH v2 1/5] block: introduce bio_clone_bioset_partial() Ming Lei
2017-02-14 15:29 ` [PATCH v2 2/5] md: fail if mddev->bio_set can't be created Ming Lei
@ 2017-02-14 15:29 ` Ming Lei
2017-02-14 15:29 ` [PATCH v2 4/5] md: remove unnecessary check on mddev Ming Lei
` (3 subsequent siblings)
6 siblings, 0 replies; 13+ messages in thread
From: Ming Lei @ 2017-02-14 15:29 UTC (permalink / raw)
To: Shaohua Li, Jens Axboe, linux-kernel, linux-raid, linux-block,
Christoph Hellwig, NeilBrown
Cc: Ming Lei
Write behind need to replace pages in bio's bvecs, and we have
to clone a fresh bio with new bvec table, so use the introduced
bio_clone_bioset_partial() for it.
For other bio_clone_mddev() cases, we will use fast clone since
they don't need to touch bvec table.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
drivers/md/raid1.c | 20 +++++++++++++++-----
1 file changed, 15 insertions(+), 5 deletions(-)
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 830ff2b20346..b7548e0a9eca 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1341,13 +1341,12 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
first_clone = 1;
for (i = 0; i < disks; i++) {
- struct bio *mbio;
+ struct bio *mbio = NULL;
+ sector_t offset;
if (!r1_bio->bios[i])
continue;
- mbio = bio_clone_mddev(bio, GFP_NOIO, mddev);
- bio_trim(mbio, r1_bio->sector - bio->bi_iter.bi_sector,
- max_sectors);
+ offset = r1_bio->sector - bio->bi_iter.bi_sector;
if (first_clone) {
/* do behind I/O ?
@@ -1357,8 +1356,13 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
if (bitmap &&
(atomic_read(&bitmap->behind_writes)
< mddev->bitmap_info.max_write_behind) &&
- !waitqueue_active(&bitmap->behind_wait))
+ !waitqueue_active(&bitmap->behind_wait)) {
+ mbio = bio_clone_bioset_partial(bio, GFP_NOIO,
+ mddev->bio_set,
+ offset,
+ max_sectors);
alloc_behind_pages(mbio, r1_bio);
+ }
bitmap_startwrite(bitmap, r1_bio->sector,
r1_bio->sectors,
@@ -1366,6 +1370,12 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
&r1_bio->state));
first_clone = 0;
}
+
+ if (!mbio) {
+ mbio = bio_clone_mddev(bio, GFP_NOIO, mddev);
+ bio_trim(mbio, offset, max_sectors);
+ }
+
if (r1_bio->behind_bvecs) {
struct bio_vec *bvec;
int j;
--
2.7.4
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH v2 4/5] md: remove unnecessary check on mddev
2017-02-14 15:28 [PATCH v2 0/5] md: use bio_clone_fast() Ming Lei
` (2 preceding siblings ...)
2017-02-14 15:29 ` [PATCH v2 3/5] md/raid1: use bio_clone_bioset_partial() in case of write behind Ming Lei
@ 2017-02-14 15:29 ` Ming Lei
2017-02-14 15:29 ` [PATCH v2 5/5] md: fast clone bio in bio_clone_mddev() Ming Lei
` (2 subsequent siblings)
6 siblings, 0 replies; 13+ messages in thread
From: Ming Lei @ 2017-02-14 15:29 UTC (permalink / raw)
To: Shaohua Li, Jens Axboe, linux-kernel, linux-raid, linux-block,
Christoph Hellwig, NeilBrown
Cc: Ming Lei
mddev is never NULL and neither is ->bio_set, so
remove the check.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
drivers/md/md.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index b5e2adf3493b..6cd96fde2a67 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -193,9 +193,6 @@ EXPORT_SYMBOL_GPL(bio_alloc_mddev);
struct bio *bio_clone_mddev(struct bio *bio, gfp_t gfp_mask,
struct mddev *mddev)
{
- if (!mddev || !mddev->bio_set)
- return bio_clone(bio, gfp_mask);
-
return bio_clone_bioset(bio, gfp_mask, mddev->bio_set);
}
EXPORT_SYMBOL_GPL(bio_clone_mddev);
--
2.7.4
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH v2 5/5] md: fast clone bio in bio_clone_mddev()
2017-02-14 15:28 [PATCH v2 0/5] md: use bio_clone_fast() Ming Lei
` (3 preceding siblings ...)
2017-02-14 15:29 ` [PATCH v2 4/5] md: remove unnecessary check on mddev Ming Lei
@ 2017-02-14 15:29 ` Ming Lei
2017-02-14 16:01 ` Christoph Hellwig
2017-02-14 15:31 ` Jens Axboe
2017-02-15 19:19 ` Shaohua Li
6 siblings, 1 reply; 13+ messages in thread
From: Ming Lei @ 2017-02-14 15:29 UTC (permalink / raw)
To: Shaohua Li, Jens Axboe, linux-kernel, linux-raid, linux-block,
Christoph Hellwig, NeilBrown
Cc: Ming Lei
Firstly bio_clone_mddev() is used in raid normal I/O and isn't
in resync I/O path.
Secondly all the direct access to bvec table in raid happens on
resync I/O except for write behind of raid1, in which we still
use bio_clone() for allocating new bvec table.
So this patch replaces bio_clone() with bio_clone_fast()
in bio_clone_mddev().
Also kill bio_clone_mddev() and call bio_clone_fast() directly, as
suggested by Christoph Hellwig.
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
drivers/md/faulty.c | 2 +-
drivers/md/md.c | 7 -------
drivers/md/md.h | 2 --
drivers/md/raid1.c | 10 ++++++----
drivers/md/raid10.c | 11 +++++------
drivers/md/raid5.c | 4 ++--
6 files changed, 14 insertions(+), 22 deletions(-)
diff --git a/drivers/md/faulty.c b/drivers/md/faulty.c
index 685aa2d77e25..b0536cfd8e17 100644
--- a/drivers/md/faulty.c
+++ b/drivers/md/faulty.c
@@ -214,7 +214,7 @@ static void faulty_make_request(struct mddev *mddev, struct bio *bio)
}
}
if (failit) {
- struct bio *b = bio_clone_mddev(bio, GFP_NOIO, mddev);
+ struct bio *b = bio_clone_fast(bio, GFP_NOIO, mddev->bio_set);
b->bi_bdev = conf->rdev->bdev;
b->bi_private = bio;
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 6cd96fde2a67..985374f20e2e 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -190,13 +190,6 @@ struct bio *bio_alloc_mddev(gfp_t gfp_mask, int nr_iovecs,
}
EXPORT_SYMBOL_GPL(bio_alloc_mddev);
-struct bio *bio_clone_mddev(struct bio *bio, gfp_t gfp_mask,
- struct mddev *mddev)
-{
- return bio_clone_bioset(bio, gfp_mask, mddev->bio_set);
-}
-EXPORT_SYMBOL_GPL(bio_clone_mddev);
-
/*
* We have a system wide 'event count' that is incremented
* on any 'interesting' event, and readers of /proc/mdstat
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 2a514036a83d..a86ad62079de 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -673,8 +673,6 @@ extern void md_rdev_clear(struct md_rdev *rdev);
extern void mddev_suspend(struct mddev *mddev);
extern void mddev_resume(struct mddev *mddev);
-extern struct bio *bio_clone_mddev(struct bio *bio, gfp_t gfp_mask,
- struct mddev *mddev);
extern struct bio *bio_alloc_mddev(gfp_t gfp_mask, int nr_iovecs,
struct mddev *mddev);
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index b7548e0a9eca..85f309836fd7 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1108,7 +1108,7 @@ static void raid1_read_request(struct mddev *mddev, struct bio *bio,
r1_bio->read_disk = rdisk;
r1_bio->start_next_window = 0;
- read_bio = bio_clone_mddev(bio, GFP_NOIO, mddev);
+ read_bio = bio_clone_fast(bio, GFP_NOIO, mddev->bio_set);
bio_trim(read_bio, r1_bio->sector - bio->bi_iter.bi_sector,
max_sectors);
@@ -1372,7 +1372,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
}
if (!mbio) {
- mbio = bio_clone_mddev(bio, GFP_NOIO, mddev);
+ mbio = bio_clone_fast(bio, GFP_NOIO, mddev->bio_set);
bio_trim(mbio, offset, max_sectors);
}
@@ -2283,7 +2283,8 @@ static int narrow_write_error(struct r1bio *r1_bio, int i)
wbio->bi_vcnt = vcnt;
} else {
- wbio = bio_clone_mddev(r1_bio->master_bio, GFP_NOIO, mddev);
+ wbio = bio_clone_fast(r1_bio->master_bio, GFP_NOIO,
+ mddev->bio_set);
}
bio_set_op_attrs(wbio, REQ_OP_WRITE, 0);
@@ -2421,7 +2422,8 @@ static void handle_read_error(struct r1conf *conf, struct r1bio *r1_bio)
const unsigned long do_sync
= r1_bio->master_bio->bi_opf & REQ_SYNC;
r1_bio->read_disk = disk;
- bio = bio_clone_mddev(r1_bio->master_bio, GFP_NOIO, mddev);
+ bio = bio_clone_fast(r1_bio->master_bio, GFP_NOIO,
+ mddev->bio_set);
bio_trim(bio, r1_bio->sector - bio->bi_iter.bi_sector,
max_sectors);
r1_bio->bios[r1_bio->read_disk] = bio;
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 6bc5c2a85160..063c43d83b72 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1132,7 +1132,7 @@ static void raid10_read_request(struct mddev *mddev, struct bio *bio,
}
slot = r10_bio->read_slot;
- read_bio = bio_clone_mddev(bio, GFP_NOIO, mddev);
+ read_bio = bio_clone_fast(bio, GFP_NOIO, mddev->bio_set);
bio_trim(read_bio, r10_bio->sector - bio->bi_iter.bi_sector,
max_sectors);
@@ -1406,7 +1406,7 @@ static void raid10_write_request(struct mddev *mddev, struct bio *bio,
int d = r10_bio->devs[i].devnum;
if (r10_bio->devs[i].bio) {
struct md_rdev *rdev = conf->mirrors[d].rdev;
- mbio = bio_clone_mddev(bio, GFP_NOIO, mddev);
+ mbio = bio_clone_fast(bio, GFP_NOIO, mddev->bio_set);
bio_trim(mbio, r10_bio->sector - bio->bi_iter.bi_sector,
max_sectors);
r10_bio->devs[i].bio = mbio;
@@ -1457,7 +1457,7 @@ static void raid10_write_request(struct mddev *mddev, struct bio *bio,
smp_mb();
rdev = conf->mirrors[d].rdev;
}
- mbio = bio_clone_mddev(bio, GFP_NOIO, mddev);
+ mbio = bio_clone_fast(bio, GFP_NOIO, mddev->bio_set);
bio_trim(mbio, r10_bio->sector - bio->bi_iter.bi_sector,
max_sectors);
r10_bio->devs[i].repl_bio = mbio;
@@ -2565,7 +2565,7 @@ static int narrow_write_error(struct r10bio *r10_bio, int i)
if (sectors > sect_to_write)
sectors = sect_to_write;
/* Write at 'sector' for 'sectors' */
- wbio = bio_clone_mddev(bio, GFP_NOIO, mddev);
+ wbio = bio_clone_fast(bio, GFP_NOIO, mddev->bio_set);
bio_trim(wbio, sector - bio->bi_iter.bi_sector, sectors);
wsector = r10_bio->devs[i].addr + (sector - r10_bio->sector);
wbio->bi_iter.bi_sector = wsector +
@@ -2641,8 +2641,7 @@ static void handle_read_error(struct mddev *mddev, struct r10bio *r10_bio)
mdname(mddev),
bdevname(rdev->bdev, b),
(unsigned long long)r10_bio->sector);
- bio = bio_clone_mddev(r10_bio->master_bio,
- GFP_NOIO, mddev);
+ bio = bio_clone_fast(r10_bio->master_bio, GFP_NOIO, mddev->bio_set);
bio_trim(bio, r10_bio->sector - bio->bi_iter.bi_sector, max_sectors);
r10_bio->devs[slot].bio = bio;
r10_bio->devs[slot].rdev = rdev;
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index a1b41184e850..c16d09614cf6 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -5056,9 +5056,9 @@ static int raid5_read_one_chunk(struct mddev *mddev, struct bio *raid_bio)
return 0;
}
/*
- * use bio_clone_mddev to make a copy of the bio
+ * use bio_clone_fast to make a copy of the bio
*/
- align_bi = bio_clone_mddev(raid_bio, GFP_NOIO, mddev);
+ align_bi = bio_clone_fast(raid_bio, GFP_NOIO, mddev->bio_set);
if (!align_bi)
return 0;
/*
--
2.7.4
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH v2 0/5] md: use bio_clone_fast()
2017-02-14 15:28 [PATCH v2 0/5] md: use bio_clone_fast() Ming Lei
@ 2017-02-14 15:31 ` Jens Axboe
2017-02-14 15:29 ` [PATCH v2 2/5] md: fail if mddev->bio_set can't be created Ming Lei
` (5 subsequent siblings)
6 siblings, 0 replies; 13+ messages in thread
From: Jens Axboe @ 2017-02-14 15:31 UTC (permalink / raw)
To: Ming Lei, Shaohua Li, linux-kernel, linux-raid, linux-block,
Christoph Hellwig, NeilBrown
On 02/14/2017 08:28 AM, Ming Lei wrote:
> Hi,
>
> This patches replaces bio_clone() with bio_fast_clone() in
> bio_clone_mddev() because:
>
> 1) bio_clone_mddev() is used in raid normal I/O and isn't in
> resync I/O path, and all the direct access to bvec table in
> raid happens on resync I/O only except for write behind of raid1.
> Write behind is treated specially, so the replacement is safe.
>
> 2) for write behind, bio_clone() is kept, but this patchset
> introduces bio_clone_bioset_partial() to just clone one specific
> bvecs range instead of whole table. Then write behind is improved
> too.
You can add my reviewed-by to the first patch. Shaohua, I'm fine
with you carrying this in the md tree, that would be the easiest
way forward.
--
Jens Axboe
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 0/5] md: use bio_clone_fast()
@ 2017-02-14 15:31 ` Jens Axboe
0 siblings, 0 replies; 13+ messages in thread
From: Jens Axboe @ 2017-02-14 15:31 UTC (permalink / raw)
To: Ming Lei, Shaohua Li, linux-kernel, linux-raid, linux-block,
Christoph Hellwig, NeilBrown
On 02/14/2017 08:28 AM, Ming Lei wrote:
> Hi,
>
> This patches replaces bio_clone() with bio_fast_clone() in
> bio_clone_mddev() because:
>
> 1) bio_clone_mddev() is used in raid normal I/O and isn't in
> resync I/O path, and all the direct access to bvec table in
> raid happens on resync I/O only except for write behind of raid1.
> Write behind is treated specially, so the replacement is safe.
>
> 2) for write behind, bio_clone() is kept, but this patchset
> introduces bio_clone_bioset_partial() to just clone one specific
> bvecs range instead of whole table. Then write behind is improved
> too.
You can add my reviewed-by to the first patch. Shaohua, I'm fine
with you carrying this in the md tree, that would be the easiest
way forward.
--
Jens Axboe
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 5/5] md: fast clone bio in bio_clone_mddev()
2017-02-14 15:29 ` [PATCH v2 5/5] md: fast clone bio in bio_clone_mddev() Ming Lei
@ 2017-02-14 16:01 ` Christoph Hellwig
2017-02-15 19:20 ` Shaohua Li
0 siblings, 1 reply; 13+ messages in thread
From: Christoph Hellwig @ 2017-02-14 16:01 UTC (permalink / raw)
To: Ming Lei
Cc: Shaohua Li, Jens Axboe, linux-kernel, linux-raid, linux-block,
Christoph Hellwig, NeilBrown
On Tue, Feb 14, 2017 at 11:29:03PM +0800, Ming Lei wrote:
> Firstly bio_clone_mddev() is used in raid normal I/O and isn't
> in resync I/O path.
>
> Secondly all the direct access to bvec table in raid happens on
> resync I/O except for write behind of raid1, in which we still
> use bio_clone() for allocating new bvec table.
>
> So this patch replaces bio_clone() with bio_clone_fast()
> in bio_clone_mddev().
>
> Also kill bio_clone_mddev() and call bio_clone_fast() directly, as
> suggested by Christoph Hellwig.
>
> Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Looks fine,
Reviewed-by: Christoph Hellwig <hch@lst.de>
Btw, can you also tack on another patch to kill bio_alloc_mddev
to be consistent?
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 0/5] md: use bio_clone_fast()
2017-02-14 15:28 [PATCH v2 0/5] md: use bio_clone_fast() Ming Lei
` (5 preceding siblings ...)
2017-02-14 15:31 ` Jens Axboe
@ 2017-02-15 19:19 ` Shaohua Li
6 siblings, 0 replies; 13+ messages in thread
From: Shaohua Li @ 2017-02-15 19:19 UTC (permalink / raw)
To: Ming Lei
Cc: Jens Axboe, linux-kernel, linux-raid, linux-block,
Christoph Hellwig, NeilBrown
On Tue, Feb 14, 2017 at 11:28:58PM +0800, Ming Lei wrote:
> Hi,
>
> This patches replaces bio_clone() with bio_fast_clone() in
> bio_clone_mddev() because:
>
> 1) bio_clone_mddev() is used in raid normal I/O and isn't in
> resync I/O path, and all the direct access to bvec table in
> raid happens on resync I/O only except for write behind of raid1.
> Write behind is treated specially, so the replacement is safe.
>
> 2) for write behind, bio_clone() is kept, but this patchset
> introduces bio_clone_bioset_partial() to just clone one specific
> bvecs range instead of whole table. Then write behind is improved
> too.
>
> V2:
> 1) move 3rd patch into 2nd one
> 2) kill bio_clone_mddev() and use bio_clone_fast() directly
> 3) define local variable 'offset' as sector_t in raid1_write_request()
>
> V1:
> 1) don't introduce bio_clone_slow_mddev_partial()
> 2) return failure if mddev->bio_set can't be created
> 3) remove check in bio_clone_mddev() as suggested by
> Christoph Hellwig.
> 4) rename bio_clone_mddev() as bio_clone_fast_mddev()
Applied, thanks!
> Ming Lei (5):
> block: introduce bio_clone_bioset_partial()
> md: fail if mddev->bio_set can't be created
> md/raid1: use bio_clone_bioset_partial() in case of write behind
> md: remove unnecessary check on mddev
> md: fast clone bio in bio_clone_mddev()
>
> block/bio.c | 61 +++++++++++++++++++++++++++++++++++++++++------------
> drivers/md/faulty.c | 2 +-
> drivers/md/md.c | 15 ++++---------
> drivers/md/md.h | 2 --
> drivers/md/raid1.c | 28 +++++++++++++++++-------
> drivers/md/raid10.c | 11 +++++-----
> drivers/md/raid5.c | 4 ++--
> include/linux/bio.h | 11 ++++++++--
> 8 files changed, 89 insertions(+), 45 deletions(-)
>
> --
> 2.7.4
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 5/5] md: fast clone bio in bio_clone_mddev()
2017-02-14 16:01 ` Christoph Hellwig
@ 2017-02-15 19:20 ` Shaohua Li
2017-02-15 23:19 ` Shaohua Li
0 siblings, 1 reply; 13+ messages in thread
From: Shaohua Li @ 2017-02-15 19:20 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Ming Lei, Jens Axboe, linux-kernel, linux-raid, linux-block, NeilBrown
On Tue, Feb 14, 2017 at 08:01:09AM -0800, Christoph Hellwig wrote:
> On Tue, Feb 14, 2017 at 11:29:03PM +0800, Ming Lei wrote:
> > Firstly bio_clone_mddev() is used in raid normal I/O and isn't
> > in resync I/O path.
> >
> > Secondly all the direct access to bvec table in raid happens on
> > resync I/O except for write behind of raid1, in which we still
> > use bio_clone() for allocating new bvec table.
> >
> > So this patch replaces bio_clone() with bio_clone_fast()
> > in bio_clone_mddev().
> >
> > Also kill bio_clone_mddev() and call bio_clone_fast() directly, as
> > suggested by Christoph Hellwig.
> >
> > Signed-off-by: Ming Lei <tom.leiming@gmail.com>
>
> Looks fine,
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
>
> Btw, can you also tack on another patch to kill bio_alloc_mddev
> to be consistent?
I'll take care of this
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 5/5] md: fast clone bio in bio_clone_mddev()
2017-02-15 19:20 ` Shaohua Li
@ 2017-02-15 23:19 ` Shaohua Li
0 siblings, 0 replies; 13+ messages in thread
From: Shaohua Li @ 2017-02-15 23:19 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Ming Lei, Jens Axboe, linux-kernel, linux-raid, linux-block, NeilBrown
On Wed, Feb 15, 2017 at 11:20:25AM -0800, Shaohua Li wrote:
> On Tue, Feb 14, 2017 at 08:01:09AM -0800, Christoph Hellwig wrote:
> > On Tue, Feb 14, 2017 at 11:29:03PM +0800, Ming Lei wrote:
> > > Firstly bio_clone_mddev() is used in raid normal I/O and isn't
> > > in resync I/O path.
> > >
> > > Secondly all the direct access to bvec table in raid happens on
> > > resync I/O except for write behind of raid1, in which we still
> > > use bio_clone() for allocating new bvec table.
> > >
> > > So this patch replaces bio_clone() with bio_clone_fast()
> > > in bio_clone_mddev().
> > >
> > > Also kill bio_clone_mddev() and call bio_clone_fast() directly, as
> > > suggested by Christoph Hellwig.
> > >
> > > Signed-off-by: Ming Lei <tom.leiming@gmail.com>
> >
> > Looks fine,
> >
> > Reviewed-by: Christoph Hellwig <hch@lst.de>
> >
> > Btw, can you also tack on another patch to kill bio_alloc_mddev
> > to be consistent?
>
> I'll take care of this
Hmm, bio_alloc_mddev is a little different, it could be called without the
bio_set. We can only guarantee the bio_set is valid for bio for md personality
not for metadata. Fully fixing the bio_set issue will need rework mddev struct
allocation, don't think it's worthy doing. So below is one removing the export
of the wrap.
From eaf70c32e5cca68110a0cf7a0311ef0ac97f4d42 Mon Sep 17 00:00:00 2001
Message-Id: <eaf70c32e5cca68110a0cf7a0311ef0ac97f4d42.1487200103.git.shli@fb.com>
From: Shaohua Li <shli@fb.com>
Date: Wed, 15 Feb 2017 11:41:21 -0800
Subject: [PATCH] md: don't export bio_alloc_mddev
When mddev is running, it has a valid bio_set, so the bio_alloc_mddev
wrap is unncessary. The problem is we run some IO (eg, load superblock)
before mddev is fully initialized. At that time bio_set isn't
initialized. Moving bio_set creation to md_alloc doesn't help either,
because dm-raid doesn't use it.
Signed-off-by: Shaohua Li <shli@fb.com>
---
drivers/md/md.c | 12 ++++++------
drivers/md/md.h | 2 --
drivers/md/raid1.c | 2 +-
drivers/md/raid10.c | 2 +-
4 files changed, 8 insertions(+), 10 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 55e7e7a..bef1863 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -171,11 +171,12 @@ static const struct block_device_operations md_fops;
static int start_readonly;
-/* bio_clone_mddev
- * like bio_clone, but with a local bio set
+/*
+ * This is only required for meta data related bio, where bio_set might not be
+ * initialized yet. dm-raid doesn't use md_alloc, so we can't alloc bio_set
+ * there.
*/
-
-struct bio *bio_alloc_mddev(gfp_t gfp_mask, int nr_iovecs,
+static struct bio *bio_alloc_mddev(gfp_t gfp_mask, int nr_iovecs,
struct mddev *mddev)
{
struct bio *b;
@@ -188,7 +189,6 @@ struct bio *bio_alloc_mddev(gfp_t gfp_mask, int nr_iovecs,
return NULL;
return b;
}
-EXPORT_SYMBOL_GPL(bio_alloc_mddev);
/*
* We have a system wide 'event count' that is incremented
@@ -393,7 +393,7 @@ static void submit_flushes(struct work_struct *ws)
atomic_inc(&rdev->nr_pending);
atomic_inc(&rdev->nr_pending);
rcu_read_unlock();
- bi = bio_alloc_mddev(GFP_NOIO, 0, mddev);
+ bi = bio_alloc_bioset(GFP_NOIO, 0, mddev->bio_set);
bi->bi_end_io = md_end_flush;
bi->bi_private = rdev;
bi->bi_bdev = rdev->bdev;
diff --git a/drivers/md/md.h b/drivers/md/md.h
index b8859cb..44fe227 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -673,8 +673,6 @@ extern void md_rdev_clear(struct md_rdev *rdev);
extern void mddev_suspend(struct mddev *mddev);
extern void mddev_resume(struct mddev *mddev);
-extern struct bio *bio_alloc_mddev(gfp_t gfp_mask, int nr_iovecs,
- struct mddev *mddev);
extern void md_unplug(struct blk_plug_cb *cb, bool from_schedule);
extern void md_reload_sb(struct mddev *mddev, int raid_disk);
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index ad5c948..2180a9a 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2281,7 +2281,7 @@ static int narrow_write_error(struct r1bio *r1_bio, int i)
vcnt--;
}
- wbio = bio_alloc_mddev(GFP_NOIO, vcnt, mddev);
+ wbio = bio_alloc_bioset(GFP_NOIO, vcnt, mddev->bio_set);
memcpy(wbio->bi_io_vec, vec, vcnt * sizeof(struct bio_vec));
wbio->bi_vcnt = vcnt;
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index ade7d69..a1f8e98 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -4473,7 +4473,7 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr,
return sectors_done;
}
- read_bio = bio_alloc_mddev(GFP_KERNEL, RESYNC_PAGES, mddev);
+ read_bio = bio_alloc_bioset(GFP_KERNEL, RESYNC_PAGES, mddev->bio_set);
read_bio->bi_bdev = rdev->bdev;
read_bio->bi_iter.bi_sector = (r10_bio->devs[r10_bio->read_slot].addr
--
2.9.3
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH v2 5/5] md: fast clone bio in bio_clone_mddev()
@ 2017-02-15 23:19 ` Shaohua Li
0 siblings, 0 replies; 13+ messages in thread
From: Shaohua Li @ 2017-02-15 23:19 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Ming Lei, Jens Axboe, linux-kernel, linux-raid, linux-block, NeilBrown
On Wed, Feb 15, 2017 at 11:20:25AM -0800, Shaohua Li wrote:
> On Tue, Feb 14, 2017 at 08:01:09AM -0800, Christoph Hellwig wrote:
> > On Tue, Feb 14, 2017 at 11:29:03PM +0800, Ming Lei wrote:
> > > Firstly bio_clone_mddev() is used in raid normal I/O and isn't
> > > in resync I/O path.
> > >
> > > Secondly all the direct access to bvec table in raid happens on
> > > resync I/O except for write behind of raid1, in which we still
> > > use bio_clone() for allocating new bvec table.
> > >
> > > So this patch replaces bio_clone() with bio_clone_fast()
> > > in bio_clone_mddev().
> > >
> > > Also kill bio_clone_mddev() and call bio_clone_fast() directly, as
> > > suggested by Christoph Hellwig.
> > >
> > > Signed-off-by: Ming Lei <tom.leiming@gmail.com>
> >
> > Looks fine,
> >
> > Reviewed-by: Christoph Hellwig <hch@lst.de>
> >
> > Btw, can you also tack on another patch to kill bio_alloc_mddev
> > to be consistent?
>
> I'll take care of this
Hmm, bio_alloc_mddev is a little different, it could be called without the
bio_set. We can only guarantee the bio_set is valid for bio for md personality
not for metadata. Fully fixing the bio_set issue will need rework mddev struct
allocation, don't think it's worthy doing. So below is one removing the export
of the wrap.
>From eaf70c32e5cca68110a0cf7a0311ef0ac97f4d42 Mon Sep 17 00:00:00 2001
Message-Id: <eaf70c32e5cca68110a0cf7a0311ef0ac97f4d42.1487200103.git.shli@fb.com>
From: Shaohua Li <shli@fb.com>
Date: Wed, 15 Feb 2017 11:41:21 -0800
Subject: [PATCH] md: don't export bio_alloc_mddev
When mddev is running, it has a valid bio_set, so the bio_alloc_mddev
wrap is unncessary. The problem is we run some IO (eg, load superblock)
before mddev is fully initialized. At that time bio_set isn't
initialized. Moving bio_set creation to md_alloc doesn't help either,
because dm-raid doesn't use it.
Signed-off-by: Shaohua Li <shli@fb.com>
---
drivers/md/md.c | 12 ++++++------
drivers/md/md.h | 2 --
drivers/md/raid1.c | 2 +-
drivers/md/raid10.c | 2 +-
4 files changed, 8 insertions(+), 10 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 55e7e7a..bef1863 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -171,11 +171,12 @@ static const struct block_device_operations md_fops;
static int start_readonly;
-/* bio_clone_mddev
- * like bio_clone, but with a local bio set
+/*
+ * This is only required for meta data related bio, where bio_set might not be
+ * initialized yet. dm-raid doesn't use md_alloc, so we can't alloc bio_set
+ * there.
*/
-
-struct bio *bio_alloc_mddev(gfp_t gfp_mask, int nr_iovecs,
+static struct bio *bio_alloc_mddev(gfp_t gfp_mask, int nr_iovecs,
struct mddev *mddev)
{
struct bio *b;
@@ -188,7 +189,6 @@ struct bio *bio_alloc_mddev(gfp_t gfp_mask, int nr_iovecs,
return NULL;
return b;
}
-EXPORT_SYMBOL_GPL(bio_alloc_mddev);
/*
* We have a system wide 'event count' that is incremented
@@ -393,7 +393,7 @@ static void submit_flushes(struct work_struct *ws)
atomic_inc(&rdev->nr_pending);
atomic_inc(&rdev->nr_pending);
rcu_read_unlock();
- bi = bio_alloc_mddev(GFP_NOIO, 0, mddev);
+ bi = bio_alloc_bioset(GFP_NOIO, 0, mddev->bio_set);
bi->bi_end_io = md_end_flush;
bi->bi_private = rdev;
bi->bi_bdev = rdev->bdev;
diff --git a/drivers/md/md.h b/drivers/md/md.h
index b8859cb..44fe227 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -673,8 +673,6 @@ extern void md_rdev_clear(struct md_rdev *rdev);
extern void mddev_suspend(struct mddev *mddev);
extern void mddev_resume(struct mddev *mddev);
-extern struct bio *bio_alloc_mddev(gfp_t gfp_mask, int nr_iovecs,
- struct mddev *mddev);
extern void md_unplug(struct blk_plug_cb *cb, bool from_schedule);
extern void md_reload_sb(struct mddev *mddev, int raid_disk);
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index ad5c948..2180a9a 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2281,7 +2281,7 @@ static int narrow_write_error(struct r1bio *r1_bio, int i)
vcnt--;
}
- wbio = bio_alloc_mddev(GFP_NOIO, vcnt, mddev);
+ wbio = bio_alloc_bioset(GFP_NOIO, vcnt, mddev->bio_set);
memcpy(wbio->bi_io_vec, vec, vcnt * sizeof(struct bio_vec));
wbio->bi_vcnt = vcnt;
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index ade7d69..a1f8e98 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -4473,7 +4473,7 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr,
return sectors_done;
}
- read_bio = bio_alloc_mddev(GFP_KERNEL, RESYNC_PAGES, mddev);
+ read_bio = bio_alloc_bioset(GFP_KERNEL, RESYNC_PAGES, mddev->bio_set);
read_bio->bi_bdev = rdev->bdev;
read_bio->bi_iter.bi_sector = (r10_bio->devs[r10_bio->read_slot].addr
--
2.9.3
^ permalink raw reply related [flat|nested] 13+ messages in thread
end of thread, other threads:[~2017-02-15 23:19 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-14 15:28 [PATCH v2 0/5] md: use bio_clone_fast() Ming Lei
2017-02-14 15:28 ` [PATCH v2 1/5] block: introduce bio_clone_bioset_partial() Ming Lei
2017-02-14 15:29 ` [PATCH v2 2/5] md: fail if mddev->bio_set can't be created Ming Lei
2017-02-14 15:29 ` [PATCH v2 3/5] md/raid1: use bio_clone_bioset_partial() in case of write behind Ming Lei
2017-02-14 15:29 ` [PATCH v2 4/5] md: remove unnecessary check on mddev Ming Lei
2017-02-14 15:29 ` [PATCH v2 5/5] md: fast clone bio in bio_clone_mddev() Ming Lei
2017-02-14 16:01 ` Christoph Hellwig
2017-02-15 19:20 ` Shaohua Li
2017-02-15 23:19 ` Shaohua Li
2017-02-15 23:19 ` Shaohua Li
2017-02-14 15:31 ` [PATCH v2 0/5] md: use bio_clone_fast() Jens Axboe
2017-02-14 15:31 ` Jens Axboe
2017-02-15 19:19 ` Shaohua Li
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.