* [PATCH 1/5] md: revert io stats accounting
2021-05-18 5:32 [PATCH 0/5] md: io stats accounting Guoqing Jiang
@ 2021-05-18 5:32 ` Guoqing Jiang
2021-05-18 5:32 ` [PATCH 2/5] md: the latest try for improve " Guoqing Jiang
` (4 subsequent siblings)
5 siblings, 0 replies; 20+ messages in thread
From: Guoqing Jiang @ 2021-05-18 5:32 UTC (permalink / raw)
To: song; +Cc: linux-raid, artur.paszkiewicz, Guoqing Jiang
The commit 41d2d848e5c0 ("md: improve io stats accounting") could cause
double fault problem per the report [1], and also it is not correct to
change ->bi_end_io if md don't own it, so let's revert it.
And io stats accounting will be replemented in later commits.
[1]. https://lore.kernel.org/linux-raid/3bf04253-3fad-434a-63a7-20214e38cf26@gmail.com/T/#t
Fixes: 41d2d848e5c0 ("md: improve io stats accounting")
Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn>
---
drivers/md/md.c | 45 ---------------------------------------------
drivers/md/md.h | 1 -
2 files changed, 46 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 49f897fbb89b..7ba00e4c862d 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -441,30 +441,6 @@ void md_handle_request(struct mddev *mddev, struct bio *bio)
}
EXPORT_SYMBOL(md_handle_request);
-struct md_io {
- struct mddev *mddev;
- bio_end_io_t *orig_bi_end_io;
- void *orig_bi_private;
- struct block_device *orig_bi_bdev;
- unsigned long start_time;
-};
-
-static void md_end_io(struct bio *bio)
-{
- struct md_io *md_io = bio->bi_private;
- struct mddev *mddev = md_io->mddev;
-
- bio_end_io_acct_remapped(bio, md_io->start_time, md_io->orig_bi_bdev);
-
- bio->bi_end_io = md_io->orig_bi_end_io;
- bio->bi_private = md_io->orig_bi_private;
-
- mempool_free(md_io, &mddev->md_io_pool);
-
- if (bio->bi_end_io)
- bio->bi_end_io(bio);
-}
-
static blk_qc_t md_submit_bio(struct bio *bio)
{
const int rw = bio_data_dir(bio);
@@ -489,21 +465,6 @@ static blk_qc_t md_submit_bio(struct bio *bio)
return BLK_QC_T_NONE;
}
- if (bio->bi_end_io != md_end_io) {
- struct md_io *md_io;
-
- md_io = mempool_alloc(&mddev->md_io_pool, GFP_NOIO);
- md_io->mddev = mddev;
- md_io->orig_bi_end_io = bio->bi_end_io;
- md_io->orig_bi_private = bio->bi_private;
- md_io->orig_bi_bdev = bio->bi_bdev;
-
- bio->bi_end_io = md_end_io;
- bio->bi_private = md_io;
-
- md_io->start_time = bio_start_io_acct(bio);
- }
-
/* bio could be mergeable after passing to underlayer */
bio->bi_opf &= ~REQ_NOMERGE;
@@ -5608,7 +5569,6 @@ static void md_free(struct kobject *ko)
bioset_exit(&mddev->bio_set);
bioset_exit(&mddev->sync_set);
- mempool_exit(&mddev->md_io_pool);
kfree(mddev);
}
@@ -5705,11 +5665,6 @@ static int md_alloc(dev_t dev, char *name)
*/
mddev->hold_active = UNTIL_STOP;
- error = mempool_init_kmalloc_pool(&mddev->md_io_pool, BIO_POOL_SIZE,
- sizeof(struct md_io));
- if (error)
- goto abort;
-
error = -ENOMEM;
mddev->queue = blk_alloc_queue(NUMA_NO_NODE);
if (!mddev->queue)
diff --git a/drivers/md/md.h b/drivers/md/md.h
index fb7eab58cfd5..4da240ffe2c5 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -487,7 +487,6 @@ struct mddev {
struct bio_set sync_set; /* for sync operations like
* metadata and bitmap writes
*/
- mempool_t md_io_pool;
/* Generic flush handling.
* The last to finish preflush schedules a worker to submit
--
2.25.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH 2/5] md: the latest try for improve io stats accounting
2021-05-18 5:32 [PATCH 0/5] md: io stats accounting Guoqing Jiang
2021-05-18 5:32 ` [PATCH 1/5] md: revert " Guoqing Jiang
@ 2021-05-18 5:32 ` Guoqing Jiang
2021-05-18 10:12 ` Artur Paszkiewicz
2021-05-18 13:35 ` Christoph Hellwig
2021-05-18 5:32 ` [PATCH 3/5] md-multipath: enable io accounting Guoqing Jiang
` (3 subsequent siblings)
5 siblings, 2 replies; 20+ messages in thread
From: Guoqing Jiang @ 2021-05-18 5:32 UTC (permalink / raw)
To: song; +Cc: linux-raid, artur.paszkiewicz, Guoqing Jiang
From: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Use generic io accounting functions to manage io stats. There was an
attempt to do this earlier in commit 18c0b223cf990172 ("md: use generic
io stats accounting functions to simplify io stat accounting"), but it
did not include a call to generic_end_io_acct() and caused issues with
tracking in-flight IOs, so it was later removed in commit 74672d069b29
("md: fix md io stats accounting broken").
This patch attempts to fix this by using both generic_start_io_acct()
and generic_end_io_acct(). To make it possible, in md_make_request() a
bio is cloned with additional data - struct md_io, which includes the io
start_time. A new bioset is introduced for this purpose. We call
generic_start_io_acct() and pass the clone instead of the original to
md_handle_request(). When it completes, we call generic_end_io_acct()
and complete the original bio.
This adds correct statistics about in-flight IOs and IO processing time,
interpreted e.g. in iostat as await, svctm, aqu-sz and %util.
It also fixes a situation where too many IOs where reported if a bio was
re-submitted to the mddev, because io accounting is now performed only
on newly arriving bios.
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
[Guoqing: rebase and make generic accounting applies to personalities
which don't have clone infrastructure]
Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn>
---
drivers/md/md.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++-
drivers/md/md.h | 1 +
2 files changed, 54 insertions(+), 1 deletion(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 7ba00e4c862d..35355c187377 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -441,6 +441,25 @@ void md_handle_request(struct mddev *mddev, struct bio *bio)
}
EXPORT_SYMBOL(md_handle_request);
+struct md_io {
+ struct mddev *mddev;
+ struct bio *orig_bio;
+ unsigned long start_time;
+ struct bio orig_bio_clone;
+};
+
+static void md_end_io(struct bio *bio)
+{
+ struct md_io *md_io = bio->bi_private;
+ struct bio *orig_bio = md_io->orig_bio;
+
+ orig_bio->bi_status = bio->bi_status;
+
+ bio_end_io_acct_remapped(orig_bio, md_io->start_time, orig_bio->bi_bdev);
+ bio_put(bio);
+ bio_endio(orig_bio);
+}
+
static blk_qc_t md_submit_bio(struct bio *bio)
{
const int rw = bio_data_dir(bio);
@@ -465,6 +484,29 @@ static blk_qc_t md_submit_bio(struct bio *bio)
return BLK_QC_T_NONE;
}
+ /*
+ * We don't clone bio for multipath, raid1 and raid10 since we can reuse
+ * their clone infrastructure.
+ */
+ if (blk_queue_io_stat(bio->bi_bdev->bd_disk->queue) &&
+ (bio->bi_pool != &mddev->md_io_bs) &&
+ (mddev->level != 1) && (mddev->level != 10) &&
+ (mddev->level != LEVEL_MULTIPATH)) {
+ struct md_io *md_io;
+ struct bio *clone;
+
+ clone = bio_clone_fast(bio, GFP_NOIO, &mddev->md_io_bs);
+
+ md_io = container_of(clone, struct md_io, orig_bio_clone);
+ md_io->mddev = mddev;
+ md_io->orig_bio = bio;
+ md_io->start_time = bio_start_io_acct(bio);
+
+ clone->bi_end_io = md_end_io;
+ clone->bi_private = md_io;
+ bio = clone;
+ }
+
/* bio could be mergeable after passing to underlayer */
bio->bi_opf &= ~REQ_NOMERGE;
@@ -2340,7 +2382,8 @@ int md_integrity_register(struct mddev *mddev)
bdev_get_integrity(reference->bdev));
pr_debug("md: data integrity enabled on %s\n", mdname(mddev));
- if (bioset_integrity_create(&mddev->bio_set, BIO_POOL_SIZE)) {
+ if (bioset_integrity_create(&mddev->bio_set, BIO_POOL_SIZE) ||
+ bioset_integrity_create(&mddev->md_io_bs, BIO_POOL_SIZE)) {
pr_err("md: failed to create integrity pool for %s\n",
mdname(mddev));
return -EINVAL;
@@ -5569,6 +5612,7 @@ static void md_free(struct kobject *ko)
bioset_exit(&mddev->bio_set);
bioset_exit(&mddev->sync_set);
+ bioset_exit(&mddev->md_io_bs);
kfree(mddev);
}
@@ -5864,6 +5908,12 @@ int md_run(struct mddev *mddev)
if (err)
return err;
}
+ if (!bioset_initialized(&mddev->md_io_bs)) {
+ err = bioset_init(&mddev->md_io_bs, BIO_POOL_SIZE,
+ offsetof(struct md_io, orig_bio_clone), 0);
+ if (err)
+ return err;
+ }
spin_lock(&pers_lock);
pers = find_pers(mddev->level, mddev->clevel);
@@ -6041,6 +6091,7 @@ int md_run(struct mddev *mddev)
abort:
bioset_exit(&mddev->bio_set);
bioset_exit(&mddev->sync_set);
+ bioset_exit(&mddev->md_io_bs);
return err;
}
EXPORT_SYMBOL_GPL(md_run);
@@ -6264,6 +6315,7 @@ void md_stop(struct mddev *mddev)
__md_stop(mddev);
bioset_exit(&mddev->bio_set);
bioset_exit(&mddev->sync_set);
+ bioset_exit(&mddev->md_io_bs);
}
EXPORT_SYMBOL_GPL(md_stop);
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 4da240ffe2c5..1dece6b73c9e 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -487,6 +487,7 @@ struct mddev {
struct bio_set sync_set; /* for sync operations like
* metadata and bitmap writes
*/
+ struct bio_set md_io_bs; /* for io accounting */
/* Generic flush handling.
* The last to finish preflush schedules a worker to submit
--
2.25.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH 2/5] md: the latest try for improve io stats accounting
2021-05-18 5:32 ` [PATCH 2/5] md: the latest try for improve " Guoqing Jiang
@ 2021-05-18 10:12 ` Artur Paszkiewicz
2021-05-19 1:30 ` Guoqing Jiang
2021-05-18 13:35 ` Christoph Hellwig
1 sibling, 1 reply; 20+ messages in thread
From: Artur Paszkiewicz @ 2021-05-18 10:12 UTC (permalink / raw)
To: Guoqing Jiang, song; +Cc: linux-raid, Guoqing Jiang
On 18.05.2021 07:32, Guoqing Jiang wrote:
> + /*
> + * We don't clone bio for multipath, raid1 and raid10 since we can reuse
> + * their clone infrastructure.
> + */
> + if (blk_queue_io_stat(bio->bi_bdev->bd_disk->queue) &&
> + (bio->bi_pool != &mddev->md_io_bs) &&
> + (mddev->level != 1) && (mddev->level != 10) &&
> + (mddev->level != LEVEL_MULTIPATH)) {
Maybe add a flag to struct md_personality and check it here? Something
that will be set only for the personalities which clone the bio
themselves.
Doesn't this need to check the bio->bi_pool also against mddev->bio_set
to skip the bios split by md? Similarly to the check against
bio_chain_endio which you did before.
Thanks,
Artur
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/5] md: the latest try for improve io stats accounting
2021-05-18 10:12 ` Artur Paszkiewicz
@ 2021-05-19 1:30 ` Guoqing Jiang
2021-05-19 7:26 ` Artur Paszkiewicz
0 siblings, 1 reply; 20+ messages in thread
From: Guoqing Jiang @ 2021-05-19 1:30 UTC (permalink / raw)
To: Artur Paszkiewicz, song; +Cc: linux-raid, Guoqing Jiang
On 5/18/21 6:12 PM, Artur Paszkiewicz wrote:
> On 18.05.2021 07:32, Guoqing Jiang wrote:
>> + /*
>> + * We don't clone bio for multipath, raid1 and raid10 since we can reuse
>> + * their clone infrastructure.
>> + */
>> + if (blk_queue_io_stat(bio->bi_bdev->bd_disk->queue) &&
>> + (bio->bi_pool != &mddev->md_io_bs) &&
>> + (mddev->level != 1) && (mddev->level != 10) &&
>> + (mddev->level != LEVEL_MULTIPATH)) {
> Maybe add a flag to struct md_personality and check it here? Something
> that will be set only for the personalities which clone the bio
> themselves.
Good point.
> Doesn't this need to check the bio->bi_pool also against mddev->bio_set
> to skip the bios split by md? Similarly to the check against
> bio_chain_endio which you did before.
Hmm, raid0 allocates split bio from mddev->bio_set, but raid5 is
different, it splits bio from r5conf->bio_split. So either let raid5 also
splits bio from mddev->bio_set, or add an additional checking for
raid5. Thoughts?
Thanks,
Guoqing
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/5] md: the latest try for improve io stats accounting
2021-05-19 1:30 ` Guoqing Jiang
@ 2021-05-19 7:26 ` Artur Paszkiewicz
2021-05-19 7:46 ` Christoph Hellwig
2021-05-19 8:09 ` Guoqing Jiang
0 siblings, 2 replies; 20+ messages in thread
From: Artur Paszkiewicz @ 2021-05-19 7:26 UTC (permalink / raw)
To: Guoqing Jiang, song; +Cc: linux-raid, Guoqing Jiang
On 19.05.2021 03:30, Guoqing Jiang wrote:
> Hmm, raid0 allocates split bio from mddev->bio_set, but raid5 is
> different, it splits bio from r5conf->bio_split. So either let raid5 also
> splits bio from mddev->bio_set, or add an additional checking for
> raid5. Thoughts?
It looks like raid5 has a different bio set for that because it uses
mddev->bio_set for something else - allocating a bio for rdev. So I
think it can be changed to split from mddev->bio_set and have a private
bio set for the rdev bio allocation.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/5] md: the latest try for improve io stats accounting
2021-05-19 7:26 ` Artur Paszkiewicz
@ 2021-05-19 7:46 ` Christoph Hellwig
2021-05-19 8:09 ` Guoqing Jiang
1 sibling, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2021-05-19 7:46 UTC (permalink / raw)
To: Artur Paszkiewicz; +Cc: Guoqing Jiang, song, linux-raid, Guoqing Jiang
On Wed, May 19, 2021 at 09:26:21AM +0200, Artur Paszkiewicz wrote:
> On 19.05.2021 03:30, Guoqing Jiang wrote:
> > Hmm, raid0 allocates split bio from mddev->bio_set, but raid5 is
> > different, it splits bio from r5conf->bio_split. So either let raid5 also
> > splits bio from mddev->bio_set, or add an additional checking for
> > raid5. Thoughts?
>
> It looks like raid5 has a different bio set for that because it uses
> mddev->bio_set for something else - allocating a bio for rdev. So I
> think it can be changed to split from mddev->bio_set and have a private
> bio set for the rdev bio allocation.
Just wondering: what about moving the allocation of the clone into the
personalities entirely?
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/5] md: the latest try for improve io stats accounting
2021-05-19 7:26 ` Artur Paszkiewicz
2021-05-19 7:46 ` Christoph Hellwig
@ 2021-05-19 8:09 ` Guoqing Jiang
2021-05-20 0:45 ` Song Liu
1 sibling, 1 reply; 20+ messages in thread
From: Guoqing Jiang @ 2021-05-19 8:09 UTC (permalink / raw)
To: Artur Paszkiewicz, song; +Cc: linux-raid, Guoqing Jiang
On 5/19/21 3:26 PM, Artur Paszkiewicz wrote:
> On 19.05.2021 03:30, Guoqing Jiang wrote:
>> Hmm, raid0 allocates split bio from mddev->bio_set, but raid5 is
>> different, it splits bio from r5conf->bio_split. So either let raid5 also
>> splits bio from mddev->bio_set, or add an additional checking for
>> raid5. Thoughts?
> It looks like raid5 has a different bio set for that because it uses
> mddev->bio_set for something else - allocating a bio for rdev. So I
> think it can be changed to split from mddev->bio_set and have a private
> bio set for the rdev bio allocation.
Instead of add a flag, I tried this way:
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index e5d7411cba9b..d309b639b5d9 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -748,6 +748,19 @@ static void raid0_quiesce(struct mddev *mddev, int
quiesce)
{
}
+/*
+ * Don't account the bio if it was split from mddev->bio_set.
+ */
+static bool raid0_accounting_bio(struct mddev *mddev, struct bio *bio)
+{
+ bool ret = true;
+
+ if (bio->bi_pool == &mddev->bio_set)
+ ret = false;
+
+ return ret;
+}
+
static struct md_personality raid0_personality=
{
.name = "raid0",
@@ -760,6 +773,7 @@ static struct md_personality raid0_personality=
.size = raid0_size,
.takeover = raid0_takeover,
.quiesce = raid0_quiesce,
+ .accounting_bio = raid0_accounting_bio,
};
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 841e1c1aa5e6..070ccf55c534 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -8596,6 +8596,20 @@ static void *raid6_takeover(struct mddev *mddev)
return setup_conf(mddev);
}
+/*
+ * Don't account the bio if it was split from r5conf->bio_split.
+ */
+static bool raid5_accounting_bio(struct mddev *mddev, struct bio *bio)
+{
+ bool ret = true;
+ struct r5conf *conf = mddev->private;
+
+ if (bio->bi_pool == &conf->bio_split)
+ ret = false;
+
+ return ret;
+}
+
static int raid5_change_consistency_policy(struct mddev *mddev, const
char *buf)
{
struct r5conf *conf;
@@ -8687,6 +8701,7 @@ static struct md_personality raid6_personality =
.finish_reshape = raid5_finish_reshape,
.quiesce = raid5_quiesce,
.takeover = raid6_takeover,
+ .accounting_bio = raid5_accounting_bio,
Thanks,
Guoqing
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH 2/5] md: the latest try for improve io stats accounting
2021-05-19 8:09 ` Guoqing Jiang
@ 2021-05-20 0:45 ` Song Liu
0 siblings, 0 replies; 20+ messages in thread
From: Song Liu @ 2021-05-20 0:45 UTC (permalink / raw)
To: Guoqing Jiang; +Cc: Artur Paszkiewicz, linux-raid, Guoqing Jiang
On Wed, May 19, 2021 at 1:09 AM Guoqing Jiang <jgq516@gmail.com> wrote:
>
>
>
> On 5/19/21 3:26 PM, Artur Paszkiewicz wrote:
> > On 19.05.2021 03:30, Guoqing Jiang wrote:
> >> Hmm, raid0 allocates split bio from mddev->bio_set, but raid5 is
> >> different, it splits bio from r5conf->bio_split. So either let raid5 also
> >> splits bio from mddev->bio_set, or add an additional checking for
> >> raid5. Thoughts?
> > It looks like raid5 has a different bio set for that because it uses
> > mddev->bio_set for something else - allocating a bio for rdev. So I
> > think it can be changed to split from mddev->bio_set and have a private
> > bio set for the rdev bio allocation.
>
>
> Instead of add a flag, I tried this way:
>
> diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
> index e5d7411cba9b..d309b639b5d9 100644
> --- a/drivers/md/raid0.c
> +++ b/drivers/md/raid0.c
> @@ -748,6 +748,19 @@ static void raid0_quiesce(struct mddev *mddev, int
> quiesce)
> {
> }
>
> +/*
> + * Don't account the bio if it was split from mddev->bio_set.
> + */
> +static bool raid0_accounting_bio(struct mddev *mddev, struct bio *bio)
> +{
> + bool ret = true;
> +
> + if (bio->bi_pool == &mddev->bio_set)
> + ret = false;
> +
> + return ret;
> +}
> +
> static struct md_personality raid0_personality=
> {
> .name = "raid0",
> @@ -760,6 +773,7 @@ static struct md_personality raid0_personality=
> .size = raid0_size,
> .takeover = raid0_takeover,
> .quiesce = raid0_quiesce,
> + .accounting_bio = raid0_accounting_bio,
> };
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 841e1c1aa5e6..070ccf55c534 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -8596,6 +8596,20 @@ static void *raid6_takeover(struct mddev *mddev)
> return setup_conf(mddev);
> }
>
> +/*
> + * Don't account the bio if it was split from r5conf->bio_split.
> + */
> +static bool raid5_accounting_bio(struct mddev *mddev, struct bio *bio)
> +{
> + bool ret = true;
> + struct r5conf *conf = mddev->private;
> +
> + if (bio->bi_pool == &conf->bio_split)
> + ret = false;
> +
> + return ret;
> +}
> +
> static int raid5_change_consistency_policy(struct mddev *mddev, const
> char *buf)
> {
> struct r5conf *conf;
> @@ -8687,6 +8701,7 @@ static struct md_personality raid6_personality =
> .finish_reshape = raid5_finish_reshape,
> .quiesce = raid5_quiesce,
> .takeover = raid6_takeover,
> + .accounting_bio = raid5_accounting_bio,
I like the accounting_bio idea. Let's go with it.
Thanks,
Song
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/5] md: the latest try for improve io stats accounting
2021-05-18 5:32 ` [PATCH 2/5] md: the latest try for improve " Guoqing Jiang
2021-05-18 10:12 ` Artur Paszkiewicz
@ 2021-05-18 13:35 ` Christoph Hellwig
2021-05-19 1:40 ` Guoqing Jiang
1 sibling, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2021-05-18 13:35 UTC (permalink / raw)
To: Guoqing Jiang; +Cc: song, linux-raid, artur.paszkiewicz, Guoqing Jiang
> + struct bio orig_bio_clone;
Maybe call this bio_clone? or just bio?
> +static void md_end_io(struct bio *bio)
> +{
> + struct md_io *md_io = bio->bi_private;
> + struct bio *orig_bio = md_io->orig_bio;
> +
> + orig_bio->bi_status = bio->bi_status;
> +
> + bio_end_io_acct_remapped(orig_bio, md_io->start_time, orig_bio->bi_bdev);
Overly long line. And trivially fixed by just using bio_end_io_acct.
> + /*
> + * We don't clone bio for multipath, raid1 and raid10 since we can reuse
> + * their clone infrastructure.
> + */
> + if (blk_queue_io_stat(bio->bi_bdev->bd_disk->queue) &&
> + (bio->bi_pool != &mddev->md_io_bs) &&
> + (mddev->level != 1) && (mddev->level != 10) &&
> + (mddev->level != LEVEL_MULTIPATH)) {
This should use a flag in struct md_personality instead.
> - if (bioset_integrity_create(&mddev->bio_set, BIO_POOL_SIZE)) {
> + if (bioset_integrity_create(&mddev->bio_set, BIO_POOL_SIZE) ||
> + bioset_integrity_create(&mddev->md_io_bs, BIO_POOL_SIZE)) {
If the md_io_bs creation fails bio_set needs to be cleaned up.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/5] md: the latest try for improve io stats accounting
2021-05-18 13:35 ` Christoph Hellwig
@ 2021-05-19 1:40 ` Guoqing Jiang
0 siblings, 0 replies; 20+ messages in thread
From: Guoqing Jiang @ 2021-05-19 1:40 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: song, linux-raid, artur.paszkiewicz, Guoqing Jiang
On 5/18/21 9:35 PM, Christoph Hellwig wrote:
>> + struct bio orig_bio_clone;
> Maybe call this bio_clone? or just bio?
Ok, will rename it.
>> +static void md_end_io(struct bio *bio)
>> +{
>> + struct md_io *md_io = bio->bi_private;
>> + struct bio *orig_bio = md_io->orig_bio;
>> +
>> + orig_bio->bi_status = bio->bi_status;
>> +
>> + bio_end_io_acct_remapped(orig_bio, md_io->start_time, orig_bio->bi_bdev);
> Overly long line. And trivially fixed by just using bio_end_io_acct.
Thanks for reminder.
>> + /*
>> + * We don't clone bio for multipath, raid1 and raid10 since we can reuse
>> + * their clone infrastructure.
>> + */
>> + if (blk_queue_io_stat(bio->bi_bdev->bd_disk->queue) &&
>> + (bio->bi_pool != &mddev->md_io_bs) &&
>> + (mddev->level != 1) && (mddev->level != 10) &&
>> + (mddev->level != LEVEL_MULTIPATH)) {
> This should use a flag in struct md_personality instead.
Yes.
>> - if (bioset_integrity_create(&mddev->bio_set, BIO_POOL_SIZE)) {
>> + if (bioset_integrity_create(&mddev->bio_set, BIO_POOL_SIZE) ||
>> + bioset_integrity_create(&mddev->md_io_bs, BIO_POOL_SIZE)) {
> If the md_io_bs creation fails bio_set needs to be cleaned up.
Good catch, and seems we need to call bioset_exit(&mddev->bio_set) as well.
Thanks,
Guoqing
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 3/5] md-multipath: enable io accounting
2021-05-18 5:32 [PATCH 0/5] md: io stats accounting Guoqing Jiang
2021-05-18 5:32 ` [PATCH 1/5] md: revert " Guoqing Jiang
2021-05-18 5:32 ` [PATCH 2/5] md: the latest try for improve " Guoqing Jiang
@ 2021-05-18 5:32 ` Guoqing Jiang
2021-05-18 13:37 ` Christoph Hellwig
2021-05-18 5:32 ` [PATCH 4/5] md/raid1: " Guoqing Jiang
` (2 subsequent siblings)
5 siblings, 1 reply; 20+ messages in thread
From: Guoqing Jiang @ 2021-05-18 5:32 UTC (permalink / raw)
To: song; +Cc: linux-raid, artur.paszkiewicz, Guoqing Jiang
We can do the accounting between make_request and io is finished, also
introduce start_time accordingly.
Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn>
---
drivers/md/md-multipath.c | 5 +++++
drivers/md/md-multipath.h | 1 +
2 files changed, 6 insertions(+)
diff --git a/drivers/md/md-multipath.c b/drivers/md/md-multipath.c
index 776bbe542db5..17cf35f4acdb 100644
--- a/drivers/md/md-multipath.c
+++ b/drivers/md/md-multipath.c
@@ -71,6 +71,8 @@ static void multipath_end_bh_io(struct multipath_bh *mp_bh, blk_status_t status)
struct mpconf *conf = mp_bh->mddev->private;
bio->bi_status = status;
+ if (blk_queue_io_stat(bio->bi_bdev->bd_disk->queue))
+ bio_end_io_acct_remapped(bio, mp_bh->start_time, bio->bi_bdev);
bio_endio(bio);
mempool_free(mp_bh, &conf->pool);
}
@@ -124,6 +126,9 @@ static bool multipath_make_request(struct mddev *mddev, struct bio * bio)
bio_init(&mp_bh->bio, NULL, 0);
__bio_clone_fast(&mp_bh->bio, bio);
+ if (blk_queue_io_stat(bio->bi_bdev->bd_disk->queue))
+ mp_bh->start_time = bio_start_io_acct(bio);
+
mp_bh->bio.bi_iter.bi_sector += multipath->rdev->data_offset;
bio_set_dev(&mp_bh->bio, multipath->rdev->bdev);
mp_bh->bio.bi_opf |= REQ_FAILFAST_TRANSPORT;
diff --git a/drivers/md/md-multipath.h b/drivers/md/md-multipath.h
index b3099e5fc4d7..71376d95eac0 100644
--- a/drivers/md/md-multipath.h
+++ b/drivers/md/md-multipath.h
@@ -28,5 +28,6 @@ struct multipath_bh {
struct bio bio;
int path;
struct list_head retry_list;
+ unsigned long start_time;
};
#endif
--
2.25.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH 3/5] md-multipath: enable io accounting
2021-05-18 5:32 ` [PATCH 3/5] md-multipath: enable io accounting Guoqing Jiang
@ 2021-05-18 13:37 ` Christoph Hellwig
2021-05-19 1:46 ` Guoqing Jiang
0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2021-05-18 13:37 UTC (permalink / raw)
To: Guoqing Jiang; +Cc: song, linux-raid, artur.paszkiewicz, Guoqing Jiang
Is anyone actually still using MD multipathing? I wonder if we just
need to deprecate and eventually remove this code..
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 3/5] md-multipath: enable io accounting
2021-05-18 13:37 ` Christoph Hellwig
@ 2021-05-19 1:46 ` Guoqing Jiang
0 siblings, 0 replies; 20+ messages in thread
From: Guoqing Jiang @ 2021-05-19 1:46 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: song, linux-raid, artur.paszkiewicz, Guoqing Jiang
On 5/18/21 9:37 PM, Christoph Hellwig wrote:
> Is anyone actually still using MD multipathing? I wonder if we just
> need to deprecate and eventually remove this code..
Not sure about it, only linear is marked with deprecated.
drivers/md/md-linear.c:MODULE_ALIAS("md-personality-1"); /* LINEAR -
deprecated*/
Perhaps better to mention they are deprecated in Kconfig.
Thanks,
Guoqing
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 4/5] md/raid1: enable io accounting
2021-05-18 5:32 [PATCH 0/5] md: io stats accounting Guoqing Jiang
` (2 preceding siblings ...)
2021-05-18 5:32 ` [PATCH 3/5] md-multipath: enable io accounting Guoqing Jiang
@ 2021-05-18 5:32 ` Guoqing Jiang
2021-05-18 13:40 ` Christoph Hellwig
2021-05-18 5:32 ` [PATCH 5/5] md/raid10: " Guoqing Jiang
2021-05-19 0:14 ` [PATCH 0/5] md: io stats accounting Song Liu
5 siblings, 1 reply; 20+ messages in thread
From: Guoqing Jiang @ 2021-05-18 5:32 UTC (permalink / raw)
To: song; +Cc: linux-raid, artur.paszkiewicz, Guoqing Jiang
For raid1, we record the start time between split bio and clone bio,
and finish the accounting in the final endio.
Also introduce start_time in r1bio accordingly.
Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn>
---
drivers/md/raid1.c | 11 +++++++++++
drivers/md/raid1.h | 1 +
2 files changed, 12 insertions(+)
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index ced076ba560e..b08a47523dbb 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -300,6 +300,8 @@ static void call_bio_endio(struct r1bio *r1_bio)
if (!test_bit(R1BIO_Uptodate, &r1_bio->state))
bio->bi_status = BLK_STS_IOERR;
+ if (blk_queue_io_stat(bio->bi_bdev->bd_disk->queue))
+ bio_end_io_acct_remapped(bio, r1_bio->start_time, bio->bi_bdev);
bio_endio(bio);
}
@@ -1292,6 +1294,13 @@ static void raid1_read_request(struct mddev *mddev, struct bio *bio,
r1_bio->read_disk = rdisk;
+ /*
+ * Reuse print_msg, if it is false then a fresh r1_bio is just
+ * allocated before.
+ */
+ if (!print_msg && blk_queue_io_stat(bio->bi_bdev->bd_disk->queue))
+ r1_bio->start_time = bio_start_io_acct(bio);
+
read_bio = bio_clone_fast(bio, gfp, &mddev->bio_set);
r1_bio->bios[rdisk] = read_bio;
@@ -1461,6 +1470,8 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
r1_bio->sectors = max_sectors;
}
+ if (blk_queue_io_stat(bio->bi_bdev->bd_disk->queue))
+ r1_bio->start_time = bio_start_io_acct(bio);
atomic_set(&r1_bio->remaining, 1);
atomic_set(&r1_bio->behind_remaining, 0);
diff --git a/drivers/md/raid1.h b/drivers/md/raid1.h
index b7eb09e8c025..ccf10e59b116 100644
--- a/drivers/md/raid1.h
+++ b/drivers/md/raid1.h
@@ -158,6 +158,7 @@ struct r1bio {
sector_t sector;
int sectors;
unsigned long state;
+ unsigned long start_time;
struct mddev *mddev;
/*
* original bio going to /dev/mdx
--
2.25.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH 4/5] md/raid1: enable io accounting
2021-05-18 5:32 ` [PATCH 4/5] md/raid1: " Guoqing Jiang
@ 2021-05-18 13:40 ` Christoph Hellwig
2021-05-19 1:47 ` Guoqing Jiang
0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2021-05-18 13:40 UTC (permalink / raw)
To: Guoqing Jiang; +Cc: song, linux-raid, artur.paszkiewicz, Guoqing Jiang
On Tue, May 18, 2021 at 01:32:24PM +0800, Guoqing Jiang wrote:
> For raid1, we record the start time between split bio and clone bio,
> and finish the accounting in the final endio.
>
> Also introduce start_time in r1bio accordingly.
>
> Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn>
> ---
> drivers/md/raid1.c | 11 +++++++++++
> drivers/md/raid1.h | 1 +
> 2 files changed, 12 insertions(+)
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index ced076ba560e..b08a47523dbb 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -300,6 +300,8 @@ static void call_bio_endio(struct r1bio *r1_bio)
> if (!test_bit(R1BIO_Uptodate, &r1_bio->state))
> bio->bi_status = BLK_STS_IOERR;
>
> + if (blk_queue_io_stat(bio->bi_bdev->bd_disk->queue))
> + bio_end_io_acct_remapped(bio, r1_bio->start_time, bio->bi_bdev);
This can use bio_end_io_acct.
> + /*
> + * Reuse print_msg, if it is false then a fresh r1_bio is just
> + * allocated before.
> + */
> + if (!print_msg && blk_queue_io_stat(bio->bi_bdev->bd_disk->queue))
> + r1_bio->start_time = bio_start_io_acct(bio);
> +
Please rename the print_msg vaiable to something sensible and drop the
comment.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 4/5] md/raid1: enable io accounting
2021-05-18 13:40 ` Christoph Hellwig
@ 2021-05-19 1:47 ` Guoqing Jiang
0 siblings, 0 replies; 20+ messages in thread
From: Guoqing Jiang @ 2021-05-19 1:47 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: song, linux-raid, artur.paszkiewicz, Guoqing Jiang
On 5/18/21 9:40 PM, Christoph Hellwig wrote:
> On Tue, May 18, 2021 at 01:32:24PM +0800, Guoqing Jiang wrote:
>> For raid1, we record the start time between split bio and clone bio,
>> and finish the accounting in the final endio.
>>
>> Also introduce start_time in r1bio accordingly.
>>
>> Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn>
>> ---
>> drivers/md/raid1.c | 11 +++++++++++
>> drivers/md/raid1.h | 1 +
>> 2 files changed, 12 insertions(+)
>>
>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>> index ced076ba560e..b08a47523dbb 100644
>> --- a/drivers/md/raid1.c
>> +++ b/drivers/md/raid1.c
>> @@ -300,6 +300,8 @@ static void call_bio_endio(struct r1bio *r1_bio)
>> if (!test_bit(R1BIO_Uptodate, &r1_bio->state))
>> bio->bi_status = BLK_STS_IOERR;
>>
>> + if (blk_queue_io_stat(bio->bi_bdev->bd_disk->queue))
>> + bio_end_io_acct_remapped(bio, r1_bio->start_time, bio->bi_bdev);
> This can use bio_end_io_acct.
Sure.
>> + /*
>> + * Reuse print_msg, if it is false then a fresh r1_bio is just
>> + * allocated before.
>> + */
>> + if (!print_msg && blk_queue_io_stat(bio->bi_bdev->bd_disk->queue))
>> + r1_bio->start_time = bio_start_io_acct(bio);
>> +
> Please rename the print_msg vaiable to something sensible and drop the
> comment.
Ok.
Thanks,
Guoqing
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 5/5] md/raid10: enable io accounting
2021-05-18 5:32 [PATCH 0/5] md: io stats accounting Guoqing Jiang
` (3 preceding siblings ...)
2021-05-18 5:32 ` [PATCH 4/5] md/raid1: " Guoqing Jiang
@ 2021-05-18 5:32 ` Guoqing Jiang
2021-05-19 0:14 ` [PATCH 0/5] md: io stats accounting Song Liu
5 siblings, 0 replies; 20+ messages in thread
From: Guoqing Jiang @ 2021-05-18 5:32 UTC (permalink / raw)
To: song; +Cc: linux-raid, artur.paszkiewicz, Guoqing Jiang
For raid10, we record the start time between split bio and clone bio,
and finish the accounting in the final endio.
Also introduce start_time in r10bio accordingly.
Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn>
---
drivers/md/raid10.c | 7 +++++++
drivers/md/raid10.h | 1 +
2 files changed, 8 insertions(+)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 13f5e6b2a73d..38795c237830 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -297,6 +297,9 @@ static void raid_end_bio_io(struct r10bio *r10_bio)
if (!test_bit(R10BIO_Uptodate, &r10_bio->state))
bio->bi_status = BLK_STS_IOERR;
+ if (blk_queue_io_stat(bio->bi_bdev->bd_disk->queue))
+ bio_end_io_acct_remapped(bio, r10_bio->start_time,
+ bio->bi_bdev);
bio_endio(bio);
/*
* Wake up any possible resync thread that waits for the device
@@ -1184,6 +1187,8 @@ static void raid10_read_request(struct mddev *mddev, struct bio *bio,
}
slot = r10_bio->read_slot;
+ if (blk_queue_io_stat(bio->bi_bdev->bd_disk->queue))
+ r10_bio->start_time = bio_start_io_acct(bio);
read_bio = bio_clone_fast(bio, gfp, &mddev->bio_set);
r10_bio->devs[slot].bio = read_bio;
@@ -1483,6 +1488,8 @@ static void raid10_write_request(struct mddev *mddev, struct bio *bio,
r10_bio->master_bio = bio;
}
+ if (blk_queue_io_stat(bio->bi_bdev->bd_disk->queue))
+ r10_bio->start_time = bio_start_io_acct(bio);
atomic_set(&r10_bio->remaining, 1);
md_bitmap_startwrite(mddev->bitmap, r10_bio->sector, r10_bio->sectors, 0);
diff --git a/drivers/md/raid10.h b/drivers/md/raid10.h
index 1461fd55311b..c34bb196790e 100644
--- a/drivers/md/raid10.h
+++ b/drivers/md/raid10.h
@@ -124,6 +124,7 @@ struct r10bio {
sector_t sector; /* virtual sector number */
int sectors;
unsigned long state;
+ unsigned long start_time;
struct mddev *mddev;
/*
* original bio going to /dev/mdx
--
2.25.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH 0/5] md: io stats accounting
2021-05-18 5:32 [PATCH 0/5] md: io stats accounting Guoqing Jiang
` (4 preceding siblings ...)
2021-05-18 5:32 ` [PATCH 5/5] md/raid10: " Guoqing Jiang
@ 2021-05-19 0:14 ` Song Liu
2021-05-19 1:50 ` Guoqing Jiang
5 siblings, 1 reply; 20+ messages in thread
From: Song Liu @ 2021-05-19 0:14 UTC (permalink / raw)
To: Guoqing Jiang; +Cc: linux-raid, Artur Paszkiewicz, Guoqing Jiang
On Mon, May 17, 2021 at 10:33 PM Guoqing Jiang <jgq516@gmail.com> wrote:
>
> Hi Song,
>
> Based on previous discussion, this set reverts current mechanism, then
> switches back to the v1 version from Artur.
>
> Also reuses the current clone infrastructer for mpath, raid1 and raid10.
>
> Thanks,
> Guoqing
Thanks Guoqing! Please address Christoph's feedback and send v2.
Song
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 0/5] md: io stats accounting
2021-05-19 0:14 ` [PATCH 0/5] md: io stats accounting Song Liu
@ 2021-05-19 1:50 ` Guoqing Jiang
0 siblings, 0 replies; 20+ messages in thread
From: Guoqing Jiang @ 2021-05-19 1:50 UTC (permalink / raw)
To: Song Liu; +Cc: linux-raid, Artur Paszkiewicz, Guoqing Jiang
On 5/19/21 8:14 AM, Song Liu wrote:
> On Mon, May 17, 2021 at 10:33 PM Guoqing Jiang <jgq516@gmail.com> wrote:
>> Hi Song,
>>
>> Based on previous discussion, this set reverts current mechanism, then
>> switches back to the v1 version from Artur.
>>
>> Also reuses the current clone infrastructer for mpath, raid1 and raid10.
>>
>> Thanks,
>> Guoqing
> Thanks Guoqing! Please address Christoph's feedback and send v2.
Sure, please comment about the checking about skip the split by md.
Thanks,
Guoqing
^ permalink raw reply [flat|nested] 20+ messages in thread