* [PATCH v4] md: improve io stats accounting
@ 2020-07-03  9:13 Artur Paszkiewicz
  2020-07-03  9:26 ` Guoqing Jiang
  0 siblings, 1 reply; 7+ messages in thread
From: Artur Paszkiewicz @ 2020-07-03  9:13 UTC (permalink / raw)
  To: song; +Cc: linux-raid, guoqing.jiang, Artur Paszkiewicz

Use generic io accounting functions to manage io stats. There was an
attempt to do this earlier in commit 18c0b223cf99 ("md: use generic io
stats accounting functions to simplify io stat accounting"), but it did
not include a call to generic_end_io_acct() and caused issues with
tracking in-flight IOs, so it was later removed in commit 74672d069b29
("md: fix md io stats accounting broken").

This patch attempts to fix this by using both disk_start_io_acct() and
disk_end_io_acct(). To make it possible, a struct md_io is allocated for
every new md bio, which includes the io start_time. A new mempool is
introduced for this purpose. We override bio->bi_end_io with our own
callback and call disk_start_io_acct() before passing the bio to
md_handle_request(). When it completes, we call disk_end_io_acct() and
the original bi_end_io callback.

This adds correct statistics about in-flight IOs and IO processing time,
interpreted e.g. in iostat as await, svctm, aqu-sz and %util.

It also fixes a situation where too many IOs were reported if a bio was
re-submitted to the mddev, because io accounting is now performed only
on newly arriving bios.

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
---
v4:
- Use disk_{start,end}_io_acct() instead of bio_{start,end}_io_acct() to
  pass mddev->gendisk directly, not bio->bi_disk which gets modified by
  some personalities.
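
  As an illustration of the difference (not part of the patch;
  md_acct_start()/md_acct_end() below are hypothetical helpers, while the
  disk_*/bio_* calls are the v5.8 block-layer accounting API):

	/*
	 * Illustration only: pin the accounting to the array's gendisk.
	 * bio_start_io_acct(bio) / bio_end_io_acct(bio, start) would use
	 * bio->bi_disk instead, which some personalities rewrite before
	 * the bio completes, so the end accounting could land on a
	 * member device rather than on the md array.
	 */
	static unsigned long md_acct_start(struct mddev *mddev, struct bio *bio)
	{
		return disk_start_io_acct(mddev->gendisk, bio_sectors(bio),
					  bio_op(bio));
	}

	static void md_acct_end(struct mddev *mddev, struct bio *bio,
				unsigned long start_time)
	{
		disk_end_io_acct(mddev->gendisk, bio_op(bio), start_time);
	}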

v3:
- Use bio_start_io_acct() return value for md_io->start_time (thanks
  Guoqing!)

v2:
- Just override the bi_end_io without having to clone the original bio.
- Rebased onto latest md-next.

 drivers/md/md.c | 57 ++++++++++++++++++++++++++++++++++++++-----------
 drivers/md/md.h |  1 +
 2 files changed, 46 insertions(+), 12 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 8bb69c61afe0..63aeebd9266b 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -463,12 +463,33 @@ void md_handle_request(struct mddev *mddev, struct bio *bio)
 }
 EXPORT_SYMBOL(md_handle_request);
 
+struct md_io {
+	struct mddev *mddev;
+	bio_end_io_t *orig_bi_end_io;
+	void *orig_bi_private;
+	unsigned long start_time;
+};
+
+static void md_end_io(struct bio *bio)
+{
+	struct md_io *md_io = bio->bi_private;
+	struct mddev *mddev = md_io->mddev;
+
+	disk_end_io_acct(mddev->gendisk, bio_op(bio), md_io->start_time);
+
+	bio->bi_end_io = md_io->orig_bi_end_io;
+	bio->bi_private = md_io->orig_bi_private;
+
+	mempool_free(md_io, &mddev->md_io_pool);
+
+	if (bio->bi_end_io)
+		bio->bi_end_io(bio);
+}
+
 static blk_qc_t md_submit_bio(struct bio *bio)
 {
 	const int rw = bio_data_dir(bio);
-	const int sgrp = op_stat_group(bio_op(bio));
 	struct mddev *mddev = bio->bi_disk->private_data;
-	unsigned int sectors;
 
 	if (unlikely(test_bit(MD_BROKEN, &mddev->flags)) && (rw == WRITE)) {
 		bio_io_error(bio);
@@ -488,21 +509,27 @@ static blk_qc_t md_submit_bio(struct bio *bio)
 		return BLK_QC_T_NONE;
 	}
 
-	/*
-	 * save the sectors now since our bio can
-	 * go away inside make_request
-	 */
-	sectors = bio_sectors(bio);
+	if (bio->bi_end_io != md_end_io) {
+		struct md_io *md_io;
+
+		md_io = mempool_alloc(&mddev->md_io_pool, GFP_NOIO);
+		md_io->mddev = mddev;
+		md_io->orig_bi_end_io = bio->bi_end_io;
+		md_io->orig_bi_private = bio->bi_private;
+
+		bio->bi_end_io = md_end_io;
+		bio->bi_private = md_io;
+
+		md_io->start_time = disk_start_io_acct(mddev->gendisk,
+						       bio_sectors(bio),
+						       bio_op(bio));
+	}
+
 	/* bio could be mergeable after passing to underlayer */
 	bio->bi_opf &= ~REQ_NOMERGE;
 
 	md_handle_request(mddev, bio);
 
-	part_stat_lock();
-	part_stat_inc(&mddev->gendisk->part0, ios[sgrp]);
-	part_stat_add(&mddev->gendisk->part0, sectors[sgrp], sectors);
-	part_stat_unlock();
-
 	return BLK_QC_T_NONE;
 }
 
@@ -5545,6 +5572,7 @@ static void md_free(struct kobject *ko)
 
 	bioset_exit(&mddev->bio_set);
 	bioset_exit(&mddev->sync_set);
+	mempool_exit(&mddev->md_io_pool);
 	kfree(mddev);
 }
 
@@ -5640,6 +5668,11 @@ static int md_alloc(dev_t dev, char *name)
 		 */
 		mddev->hold_active = UNTIL_STOP;
 
+	error = mempool_init_kmalloc_pool(&mddev->md_io_pool, BIO_POOL_SIZE,
+					  sizeof(struct md_io));
+	if (error)
+		goto abort;
+
 	error = -ENOMEM;
 	mddev->queue = blk_alloc_queue(NUMA_NO_NODE);
 	if (!mddev->queue)
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 612814d07d35..c26fa8bd41e7 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -481,6 +481,7 @@ struct mddev {
 	struct bio_set			sync_set; /* for sync operations like
 						   * metadata and bitmap writes
 						   */
+	mempool_t			md_io_pool;
 
 	/* Generic flush handling.
 	 * The last to finish preflush schedules a worker to submit
-- 
2.26.0


* Re: [PATCH v4] md: improve io stats accounting
  2020-07-03  9:13 [PATCH v4] md: improve io stats accounting Artur Paszkiewicz
@ 2020-07-03  9:26 ` Guoqing Jiang
  2020-07-04  0:32   ` Song Liu
  0 siblings, 1 reply; 7+ messages in thread
From: Guoqing Jiang @ 2020-07-03  9:26 UTC (permalink / raw)
  To: Artur Paszkiewicz, song; +Cc: linux-raid

Looks good, Acked-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>

Thanks,
Guoqing

On 7/3/20 11:13 AM, Artur Paszkiewicz wrote:
> Use generic io accounting functions to manage io stats. There was an
> attempt to do this earlier in commit 18c0b223cf99 ("md: use generic io
> stats accounting functions to simplify io stat accounting"), but it did
> not include a call to generic_end_io_acct() and caused issues with
> tracking in-flight IOs, so it was later removed in commit 74672d069b29
> ("md: fix md io stats accounting broken").
>
> This patch attempts to fix this by using both disk_start_io_acct() and
> disk_end_io_acct(). To make it possible, a struct md_io is allocated for
> every new md bio, which includes the io start_time. A new mempool is
> introduced for this purpose. We override bio->bi_end_io with our own
> callback and call disk_start_io_acct() before passing the bio to
> md_handle_request(). When it completes, we call disk_end_io_acct() and
> the original bi_end_io callback.
>
> This adds correct statistics about in-flight IOs and IO processing time,
> interpreted e.g. in iostat as await, svctm, aqu-sz and %util.
>
> It also fixes a situation where too many IOs were reported if a bio was
> re-submitted to the mddev, because io accounting is now performed only
> on newly arriving bios.
>
> Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
> ---
> v4:
> - Use disk_{start,end}_io_acct() instead of bio_{start,end}_io_acct() to
>    pass mddev->gendisk directly, not bio->bi_disk which gets modified by
>    some personalities.
>
> v3:
> - Use bio_start_io_acct() return value for md_io->start_time (thanks
>    Guoqing!)
>
> v2:
> - Just override the bi_end_io without having to clone the original bio.
> - Rebased onto latest md-next.
>
>   drivers/md/md.c | 57 ++++++++++++++++++++++++++++++++++++++-----------
>   drivers/md/md.h |  1 +
>   2 files changed, 46 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 8bb69c61afe0..63aeebd9266b 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -463,12 +463,33 @@ void md_handle_request(struct mddev *mddev, struct bio *bio)
>   }
>   EXPORT_SYMBOL(md_handle_request);
>   
> +struct md_io {
> +	struct mddev *mddev;
> +	bio_end_io_t *orig_bi_end_io;
> +	void *orig_bi_private;
> +	unsigned long start_time;
> +};
> +
> +static void md_end_io(struct bio *bio)
> +{
> +	struct md_io *md_io = bio->bi_private;
> +	struct mddev *mddev = md_io->mddev;
> +
> +	disk_end_io_acct(mddev->gendisk, bio_op(bio), md_io->start_time);
> +
> +	bio->bi_end_io = md_io->orig_bi_end_io;
> +	bio->bi_private = md_io->orig_bi_private;
> +
> +	mempool_free(md_io, &mddev->md_io_pool);
> +
> +	if (bio->bi_end_io)
> +		bio->bi_end_io(bio);
> +}
> +
>   static blk_qc_t md_submit_bio(struct bio *bio)
>   {
>   	const int rw = bio_data_dir(bio);
> -	const int sgrp = op_stat_group(bio_op(bio));
>   	struct mddev *mddev = bio->bi_disk->private_data;
> -	unsigned int sectors;
>   
>   	if (unlikely(test_bit(MD_BROKEN, &mddev->flags)) && (rw == WRITE)) {
>   		bio_io_error(bio);
> @@ -488,21 +509,27 @@ static blk_qc_t md_submit_bio(struct bio *bio)
>   		return BLK_QC_T_NONE;
>   	}
>   
> -	/*
> -	 * save the sectors now since our bio can
> -	 * go away inside make_request
> -	 */
> -	sectors = bio_sectors(bio);
> +	if (bio->bi_end_io != md_end_io) {
> +		struct md_io *md_io;
> +
> +		md_io = mempool_alloc(&mddev->md_io_pool, GFP_NOIO);
> +		md_io->mddev = mddev;
> +		md_io->orig_bi_end_io = bio->bi_end_io;
> +		md_io->orig_bi_private = bio->bi_private;
> +
> +		bio->bi_end_io = md_end_io;
> +		bio->bi_private = md_io;
> +
> +		md_io->start_time = disk_start_io_acct(mddev->gendisk,
> +						       bio_sectors(bio),
> +						       bio_op(bio));
> +	}
> +
>   	/* bio could be mergeable after passing to underlayer */
>   	bio->bi_opf &= ~REQ_NOMERGE;
>   
>   	md_handle_request(mddev, bio);
>   
> -	part_stat_lock();
> -	part_stat_inc(&mddev->gendisk->part0, ios[sgrp]);
> -	part_stat_add(&mddev->gendisk->part0, sectors[sgrp], sectors);
> -	part_stat_unlock();
> -
>   	return BLK_QC_T_NONE;
>   }
>   
> @@ -5545,6 +5572,7 @@ static void md_free(struct kobject *ko)
>   
>   	bioset_exit(&mddev->bio_set);
>   	bioset_exit(&mddev->sync_set);
> +	mempool_exit(&mddev->md_io_pool);
>   	kfree(mddev);
>   }
>   
> @@ -5640,6 +5668,11 @@ static int md_alloc(dev_t dev, char *name)
>   		 */
>   		mddev->hold_active = UNTIL_STOP;
>   
> +	error = mempool_init_kmalloc_pool(&mddev->md_io_pool, BIO_POOL_SIZE,
> +					  sizeof(struct md_io));
> +	if (error)
> +		goto abort;
> +
>   	error = -ENOMEM;
>   	mddev->queue = blk_alloc_queue(NUMA_NO_NODE);
>   	if (!mddev->queue)
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index 612814d07d35..c26fa8bd41e7 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -481,6 +481,7 @@ struct mddev {
>   	struct bio_set			sync_set; /* for sync operations like
>   						   * metadata and bitmap writes
>   						   */
> +	mempool_t			md_io_pool;
>   
>   	/* Generic flush handling.
>   	 * The last to finish preflush schedules a worker to submit


* Re: [PATCH v4] md: improve io stats accounting
  2020-07-03  9:26 ` Guoqing Jiang
@ 2020-07-04  0:32   ` Song Liu
  2020-07-16 17:29     ` Song Liu
  0 siblings, 1 reply; 7+ messages in thread
From: Song Liu @ 2020-07-04  0:32 UTC (permalink / raw)
  To: Guoqing Jiang; +Cc: Artur Paszkiewicz, linux-raid

On Fri, Jul 3, 2020 at 2:27 AM Guoqing Jiang
<guoqing.jiang@cloud.ionos.com> wrote:
>
> Looks good, Acked-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
>
> Thanks,
> Guoqing
>
> On 7/3/20 11:13 AM, Artur Paszkiewicz wrote:
> > Use generic io accounting functions to manage io stats. There was an
> > attempt to do this earlier in commit 18c0b223cf99 ("md: use generic io
> > stats accounting functions to simplify io stat accounting"), but it did
> > not include a call to generic_end_io_acct() and caused issues with
> > tracking in-flight IOs, so it was later removed in commit 74672d069b29
> > ("md: fix md io stats accounting broken").
> >
> > This patch attempts to fix this by using both disk_start_io_acct() and
> > disk_end_io_acct(). To make it possible, a struct md_io is allocated for
> > every new md bio, which includes the io start_time. A new mempool is
> > introduced for this purpose. We override bio->bi_end_io with our own
> > callback and call disk_start_io_acct() before passing the bio to
> > md_handle_request(). When it completes, we call disk_end_io_acct() and
> > the original bi_end_io callback.
> >
> > This adds correct statistics about in-flight IOs and IO processing time,
> > interpreted e.g. in iostat as await, svctm, aqu-sz and %util.
> >
> > It also fixes a situation where too many IOs were reported if a bio was
> > re-submitted to the mddev, because io accounting is now performed only
> > on newly arriving bios.
> >
> > Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>

Applied to md-next. Thanks!


* Re: [PATCH v4] md: improve io stats accounting
  2020-07-04  0:32   ` Song Liu
@ 2020-07-16 17:29     ` Song Liu
  2020-07-17 10:44       ` Artur Paszkiewicz
  0 siblings, 1 reply; 7+ messages in thread
From: Song Liu @ 2020-07-16 17:29 UTC (permalink / raw)
  To: Guoqing Jiang; +Cc: Artur Paszkiewicz, linux-raid

On Fri, Jul 3, 2020 at 5:32 PM Song Liu <song@kernel.org> wrote:
>
> On Fri, Jul 3, 2020 at 2:27 AM Guoqing Jiang
> <guoqing.jiang@cloud.ionos.com> wrote:
> >
> > Looks good, Acked-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
> >
> > Thanks,
> > Guoqing
> >
> > On 7/3/20 11:13 AM, Artur Paszkiewicz wrote:
> > > Use generic io accounting functions to manage io stats. There was an
> > > attempt to do this earlier in commit 18c0b223cf99 ("md: use generic io
> > > stats accounting functions to simplify io stat accounting"), but it did
> > > not include a call to generic_end_io_acct() and caused issues with
> > > tracking in-flight IOs, so it was later removed in commit 74672d069b29
> > > ("md: fix md io stats accounting broken").
> > >
> > > This patch attempts to fix this by using both disk_start_io_acct() and
> > > disk_end_io_acct(). To make it possible, a struct md_io is allocated for
> > > every new md bio, which includes the io start_time. A new mempool is
> > > introduced for this purpose. We override bio->bi_end_io with our own
> > > callback and call disk_start_io_acct() before passing the bio to
> > > md_handle_request(). When it completes, we call disk_end_io_acct() and
> > > the original bi_end_io callback.
> > >
> > > This adds correct statistics about in-flight IOs and IO processing time,
> > > interpreted e.g. in iostat as await, svctm, aqu-sz and %util.
> > >
> > > It also fixes a situation where too many IOs were reported if a bio was
> > > re-submitted to the mddev, because io accounting is now performed only
> > > on newly arriving bios.
> > >
> > > Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
>
> Applied to md-next. Thanks!

I just noticed another issue with this work on raid456, as iostat
shows something
like:

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
nvme0n1        6306.50 18248.00  636.00 1280.00    45.11    76.19   129.65     3.03    1.23    0.67    1.51   0.76 145.50
nvme1n1       11441.50 13234.00 1069.50  961.00    71.87    55.39   128.35     3.32    1.30    0.90    1.75   0.72 146.50
nvme2n1        8280.50 16352.50  971.50 1231.00    65.53    68.65   124.77     3.20    1.17    0.69    1.54   0.64 142.00
nvme3n1        6158.50 18199.50  567.00 1453.50    39.81    76.74   118.13     3.50    1.40    0.88    1.60   0.73 146.50
md0               0.00     0.00 1436.00 1411.00    89.75    88.19   128.00    22.98    8.07    0.16   16.12   0.52 147.00

md0 here is a RAID-6 array with 4 devices. %util of > 100% is clearly
wrong here.
In my tests, RAID-0 and RAID-1 are the only levels where this does not happen.

Artur, could you please take a look at this?

Thanks,
Song


* Re: [PATCH v4] md: improve io stats accounting
  2020-07-16 17:29     ` Song Liu
@ 2020-07-17 10:44       ` Artur Paszkiewicz
  2020-07-17 12:45         ` Guoqing Jiang
  2020-07-17 23:49         ` Song Liu
  0 siblings, 2 replies; 7+ messages in thread
From: Artur Paszkiewicz @ 2020-07-17 10:44 UTC (permalink / raw)
  To: Song Liu, Guoqing Jiang; +Cc: linux-raid

On 7/16/20 7:29 PM, Song Liu wrote:
> I just noticed another issue with this work on raid456, as iostat
> shows something
> like:
> 
> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> nvme0n1        6306.50 18248.00  636.00 1280.00    45.11    76.19   129.65     3.03    1.23    0.67    1.51   0.76 145.50
> nvme1n1       11441.50 13234.00 1069.50  961.00    71.87    55.39   128.35     3.32    1.30    0.90    1.75   0.72 146.50
> nvme2n1        8280.50 16352.50  971.50 1231.00    65.53    68.65   124.77     3.20    1.17    0.69    1.54   0.64 142.00
> nvme3n1        6158.50 18199.50  567.00 1453.50    39.81    76.74   118.13     3.50    1.40    0.88    1.60   0.73 146.50
> md0               0.00     0.00 1436.00 1411.00    89.75    88.19   128.00    22.98    8.07    0.16   16.12   0.52 147.00
> 
> md0 here is a RAID-6 array with 4 devices. %util of > 100% is clearly
> wrong here.
> In my tests, RAID-0 and RAID-1 are the only levels where this does not happen.
> 
> Artur, could you please take a look at this?

Hi Song,

I think it's not caused by this patch, because %util of the member
drives is affected as well. I reverted the patch and it's still
happening:

Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
md0             20.00      2.50     0.00   0.00    0.00   128.00   21.00      2.62     0.00   0.00    0.00   128.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
nvme0n1         13.00      1.62   279.00  95.55    0.77   128.00    4.00      0.50   372.00  98.94 1289.00   128.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    5.17 146.70
nvme1n1         15.00      1.88   310.00  95.38    0.53   128.00   21.00      2.62   341.00  94.20 1180.29   128.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00   24.80 146.90
nvme2n1         16.00      2.00   310.00  95.09    0.69   128.00   19.00      2.38   341.00  94.72  832.89   128.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00   15.84 146.80
nvme3n1         18.00      2.25   403.00  95.72    0.72   128.00   16.00      2.00   248.00  93.94  765.69   128.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00   12.26 114.30

I was only able to reproduce it on a VM; it doesn't occur on real
hardware (for me). What was your test configuration?

Thanks,
Artur


* Re: [PATCH v4] md: improve io stats accounting
  2020-07-17 10:44       ` Artur Paszkiewicz
@ 2020-07-17 12:45         ` Guoqing Jiang
  2020-07-17 23:49         ` Song Liu
  1 sibling, 0 replies; 7+ messages in thread
From: Guoqing Jiang @ 2020-07-17 12:45 UTC (permalink / raw)
  To: Artur Paszkiewicz, Song Liu; +Cc: linux-raid

On 7/17/20 12:44 PM, Artur Paszkiewicz wrote:
> On 7/16/20 7:29 PM, Song Liu wrote:
>> I just noticed another issue with this work on raid456, as iostat
>> shows something
>> like:
>>
>> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>> nvme0n1        6306.50 18248.00  636.00 1280.00    45.11    76.19   129.65     3.03    1.23    0.67    1.51   0.76 145.50
>> nvme1n1       11441.50 13234.00 1069.50  961.00    71.87    55.39   128.35     3.32    1.30    0.90    1.75   0.72 146.50
>> nvme2n1        8280.50 16352.50  971.50 1231.00    65.53    68.65   124.77     3.20    1.17    0.69    1.54   0.64 142.00
>> nvme3n1        6158.50 18199.50  567.00 1453.50    39.81    76.74   118.13     3.50    1.40    0.88    1.60   0.73 146.50
>> md0               0.00     0.00 1436.00 1411.00    89.75    88.19   128.00    22.98    8.07    0.16   16.12   0.52 147.00
>>
>> md0 here is a RAID-6 array with 4 devices. %util of > 100% is clearly
>> wrong here.
>> In my tests, RAID-0 and RAID-1 are the only levels where this does not happen.
>>
>> Artur, could you please take a look at this?
> Hi Song,
>
> I think it's not caused by this patch, because %util of the member
> drives is affected as well. I reverted the patch and it's still
> happening:
>
> Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
> md0             20.00      2.50     0.00   0.00    0.00   128.00   21.00      2.62     0.00   0.00    0.00   128.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
> nvme0n1         13.00      1.62   279.00  95.55    0.77   128.00    4.00      0.50   372.00  98.94 1289.00   128.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    5.17 146.70
> nvme1n1         15.00      1.88   310.00  95.38    0.53   128.00   21.00      2.62   341.00  94.20 1180.29   128.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00   24.80 146.90
> nvme2n1         16.00      2.00   310.00  95.09    0.69   128.00   19.00      2.38   341.00  94.72  832.89   128.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00   15.84 146.80
> nvme3n1         18.00      2.25   403.00  95.72    0.72   128.00   16.00      2.00   248.00  93.94  765.69   128.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00   12.26 114.30
>
> I was only able to reproduce it on a VM; it doesn't occur on real
> hardware (for me). What was your test configuration?

Just FYI, I suspect it could be related to commit 2b8bd423614c595
("block/diskstats: more accurate approximation of io_ticks for slow disks").

Thanks,
Guoqing


* Re: [PATCH v4] md: improve io stats accounting
  2020-07-17 10:44       ` Artur Paszkiewicz
  2020-07-17 12:45         ` Guoqing Jiang
@ 2020-07-17 23:49         ` Song Liu
  1 sibling, 0 replies; 7+ messages in thread
From: Song Liu @ 2020-07-17 23:49 UTC (permalink / raw)
  To: Artur Paszkiewicz; +Cc: Guoqing Jiang, linux-raid

On Fri, Jul 17, 2020 at 3:44 AM Artur Paszkiewicz
<artur.paszkiewicz@intel.com> wrote:
>
> On 7/16/20 7:29 PM, Song Liu wrote:
> > I just noticed another issue with this work on raid456, as iostat
> > shows something
> > like:
> >
> > Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> > nvme0n1        6306.50 18248.00  636.00 1280.00    45.11    76.19   129.65     3.03    1.23    0.67    1.51   0.76 145.50
> > nvme1n1       11441.50 13234.00 1069.50  961.00    71.87    55.39   128.35     3.32    1.30    0.90    1.75   0.72 146.50
> > nvme2n1        8280.50 16352.50  971.50 1231.00    65.53    68.65   124.77     3.20    1.17    0.69    1.54   0.64 142.00
> > nvme3n1        6158.50 18199.50  567.00 1453.50    39.81    76.74   118.13     3.50    1.40    0.88    1.60   0.73 146.50
> > md0               0.00     0.00 1436.00 1411.00    89.75    88.19   128.00    22.98    8.07    0.16   16.12   0.52 147.00
> >
> > md0 here is a RAID-6 array with 4 devices. %util of > 100% is clearly
> > wrong here.
> > In my tests, RAID-0 and RAID-1 are the only levels where this does not happen.
> >
> > Artur, could you please take a look at this?
>
> Hi Song,
>
> I think it's not caused by this patch, because %util of the member
> drives is affected as well. I reverted the patch and it's still
> happening:
>
> Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
> md0             20.00      2.50     0.00   0.00    0.00   128.00   21.00      2.62     0.00   0.00    0.00   128.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
> nvme0n1         13.00      1.62   279.00  95.55    0.77   128.00    4.00      0.50   372.00  98.94 1289.00   128.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    5.17 146.70
> nvme1n1         15.00      1.88   310.00  95.38    0.53   128.00   21.00      2.62   341.00  94.20 1180.29   128.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00   24.80 146.90
> nvme2n1         16.00      2.00   310.00  95.09    0.69   128.00   19.00      2.38   341.00  94.72  832.89   128.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00   15.84 146.80
> nvme3n1         18.00      2.25   403.00  95.72    0.72   128.00   16.00      2.00   248.00  93.94  765.69   128.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00   12.26 114.30
>
> I was only able to reproduce it on a VM; it doesn't occur on real
> hardware (for me). What was your test configuration?

I was testing on a VM, but I didn't see this issue after reverting the
patch. Let me test more.

Thanks,
Song


> Thanks,
> Artur

