linux-block.vger.kernel.org archive mirror
* [PATCH RFC] block: fix inaccurate io_ticks
@ 2020-10-23  6:46 Weiping Zhang
  2020-10-23  8:46 ` Ming Lei
  0 siblings, 1 reply; 5+ messages in thread
From: Weiping Zhang @ 2020-10-23  6:46 UTC (permalink / raw)
  To: axboe, ming.lei, snitzer, mpatocka; +Cc: linux-block

Do not add io_ticks if there is no in-flight IO when starting a new IO;
otherwise an extra 1 jiffy will be added for that IO.

I ran the following command on a host, with different kernel versions.

fio -name=test -ioengine=sync -bs=4K -rw=write
-filename=/home/test.fio.log -size=100M -time_based=1 -direct=1
-runtime=300 -rate=2m,2m

If we run fio in sync direct IO mode, IOs will be processed one by one,
and you can see that 512 IOs complete in one second.

kernel: 4.19.0

Device: rrqm/s wrqm/s  r/s    w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
vda       0.00   0.00 0.00 512.00  0.00  2.00     8.00     0.21  0.40    0.00    0.40  0.40 20.60

The average IO latency is 0.4 ms, so the disk time spent in one second
should be 0.4 * 512 = 204.8 ms; that means %util should be about 20%.

Because update_io_ticks adds an extra 1 jiffy (1 ms) for every IO, the
accounted IO latency becomes 1 + 0.4 = 1.4 ms, and
1.4 * 512 = 716.8 ms, so %util shows about 72%.

Device  r/s    w/s rMB/s wMB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm  %util
vda    0.00 512.00  0.00  2.00   0.00   0.00  0.00  0.00    0.00    0.40   0.20     0.00     4.00  1.41  72.10

After this patch:
Device  r/s    w/s rMB/s wMB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm  %util
vda    0.00 512.00  0.00  2.00   0.00   0.00  0.00  0.00    0.00    0.40   0.20     0.00     4.00  0.39  20.00

Fixes: 5b18b5a73760 ("block: delete part_round_stats and switch to less precise counting")
Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
---
 block/blk-core.c | 19 ++++++++++++++-----
 block/blk.h      |  1 +
 block/genhd.c    |  2 +-
 3 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index ac00d2fa4eb4..789a5c40b6a6 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1256,14 +1256,14 @@ unsigned int blk_rq_err_bytes(const struct request *rq)
 }
 EXPORT_SYMBOL_GPL(blk_rq_err_bytes);
 
-static void update_io_ticks(struct hd_struct *part, unsigned long now, bool end)
+static void update_io_ticks(struct hd_struct *part, unsigned long now, bool inflight)
 {
 	unsigned long stamp;
 again:
 	stamp = READ_ONCE(part->stamp);
 	if (unlikely(stamp != now)) {
-		if (likely(cmpxchg(&part->stamp, stamp, now) == stamp))
-			__part_stat_add(part, io_ticks, end ? now - stamp : 1);
+		if (likely(cmpxchg(&part->stamp, stamp, now) == stamp) && inflight)
+			__part_stat_add(part, io_ticks, now - stamp);
 	}
 	if (part->partno) {
 		part = &part_to_disk(part)->part0;
@@ -1310,13 +1310,20 @@ void blk_account_io_done(struct request *req, u64 now)
 
 void blk_account_io_start(struct request *rq)
 {
+	struct hd_struct *part;
+	struct request_queue *q;
+	int inflight;
+
 	if (!blk_do_io_stat(rq))
 		return;
 
 	rq->part = disk_map_sector_rcu(rq->rq_disk, blk_rq_pos(rq));
 
 	part_stat_lock();
-	update_io_ticks(rq->part, jiffies, false);
+	part = rq->part;
+	q = part_to_disk(part)->queue;
+	inflight = blk_mq_in_flight(q, part);
+	update_io_ticks(part, jiffies, inflight > 0 ? true : false);
 	part_stat_unlock();
 }
 
@@ -1325,12 +1332,14 @@ static unsigned long __part_start_io_acct(struct hd_struct *part,
 {
 	const int sgrp = op_stat_group(op);
 	unsigned long now = READ_ONCE(jiffies);
+	int inflight;
 
 	part_stat_lock();
-	update_io_ticks(part, now, false);
 	part_stat_inc(part, ios[sgrp]);
 	part_stat_add(part, sectors[sgrp], sectors);
 	part_stat_local_inc(part, in_flight[op_is_write(op)]);
+	inflight = part_in_flight(part);
+	update_io_ticks(part, now, inflight > 0 ? true : false);
 	part_stat_unlock();
 
 	return now;
diff --git a/block/blk.h b/block/blk.h
index dfab98465db9..0b9d1c13ceee 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -443,5 +443,6 @@ static inline void part_nr_sects_write(struct hd_struct *part, sector_t size)
 int bio_add_hw_page(struct request_queue *q, struct bio *bio,
 		struct page *page, unsigned int len, unsigned int offset,
 		unsigned int max_sectors, bool *same_page);
+unsigned int part_in_flight(struct hd_struct *part);
 
 #endif /* BLK_INTERNAL_H */
diff --git a/block/genhd.c b/block/genhd.c
index 0a273211fec2..89ef5539577a 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -109,7 +109,7 @@ static void part_stat_read_all(struct hd_struct *part, struct disk_stats *stat)
 	}
 }
 
-static unsigned int part_in_flight(struct hd_struct *part)
+unsigned int part_in_flight(struct hd_struct *part)
 {
 	unsigned int inflight = 0;
 	int cpu;
-- 
2.18.4


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH RFC] block: fix inaccurate io_ticks
  2020-10-23  6:46 [PATCH RFC] block: fix inaccurate io_ticks Weiping Zhang
@ 2020-10-23  8:46 ` Ming Lei
  2020-10-23  8:56   ` Weiping Zhang
  0 siblings, 1 reply; 5+ messages in thread
From: Ming Lei @ 2020-10-23  8:46 UTC (permalink / raw)
  To: axboe, snitzer, mpatocka, linux-block

On Fri, Oct 23, 2020 at 02:46:32PM +0800, Weiping Zhang wrote:
> Do not add io_ticks if there is no infligh io when start a new IO,
> otherwise an extra 1 jiffy will be add to this IO.
> 
> I run the following command on a host, with different kernel version.
> 
> fio -name=test -ioengine=sync -bs=4K -rw=write
> -filename=/home/test.fio.log -size=100M -time_based=1 -direct=1
> -runtime=300 -rate=2m,2m
> 
> If we run fio in a sync direct io mode, IO will be proccessed one by one,
> you can see that there are 512 IOs completed in one second.
> 
> kernel: 4.19.0
> 
> Device: rrqm/s wrqm/s  r/s    w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
> vda       0.00   0.00 0.00 512.00  0.00  2.00     8.00     0.21  0.40    0.00    0.40  0.40 20.60
> 
> The averate io.latency is 0.4ms, so the disk time cost in one second
> should be 0.4 * 512 = 204.8 ms, that means, %util should be 20%.
> 
> Becase update_io_ticks will add a extra 1 jiffy(1ms) for every IO, the
> io.latency io.latency will be 1 + 0.4 = 1.4ms,
> 1.4 * 512 = 716.8ms, so the %util show it about 72%.
> 
> Device  r/s    w/s rMB/s wMB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm  %util
> vda    0.00 512.00  0.00  2.00   0.00   0.00  0.00  0.00    0.00    0.40   0.20     0.00     4.00  1.41  72.10
> 
> After this patch:
> Device  r/s    w/s rMB/s wMB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm  %util
> vda    0.00 512.00  0.00  2.00   0.00   0.00  0.00  0.00    0.00    0.40   0.20     0.00     4.00  0.39  20.00
> 
> Fixes: 5b18b5a73760 ("block: delete part_round_stats and switch to less precise counting")
> Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
> ---
>  block/blk-core.c | 19 ++++++++++++++-----
>  block/blk.h      |  1 +
>  block/genhd.c    |  2 +-
>  3 files changed, 16 insertions(+), 6 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index ac00d2fa4eb4..789a5c40b6a6 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1256,14 +1256,14 @@ unsigned int blk_rq_err_bytes(const struct request *rq)
>  }
>  EXPORT_SYMBOL_GPL(blk_rq_err_bytes);
>  
> -static void update_io_ticks(struct hd_struct *part, unsigned long now, bool end)
> +static void update_io_ticks(struct hd_struct *part, unsigned long now, bool inflight)
>  {
>  	unsigned long stamp;
>  again:
>  	stamp = READ_ONCE(part->stamp);
>  	if (unlikely(stamp != now)) {
> -		if (likely(cmpxchg(&part->stamp, stamp, now) == stamp))
> -			__part_stat_add(part, io_ticks, end ? now - stamp : 1);
> +		if (likely(cmpxchg(&part->stamp, stamp, now) == stamp) && inflight)
> +			__part_stat_add(part, io_ticks, now - stamp);
>  	}
>  	if (part->partno) {
>  		part = &part_to_disk(part)->part0;
> @@ -1310,13 +1310,20 @@ void blk_account_io_done(struct request *req, u64 now)
>  
>  void blk_account_io_start(struct request *rq)
>  {
> +	struct hd_struct *part;
> +	struct request_queue *q;
> +	int inflight;
> +
>  	if (!blk_do_io_stat(rq))
>  		return;
>  
>  	rq->part = disk_map_sector_rcu(rq->rq_disk, blk_rq_pos(rq));
>  
>  	part_stat_lock();
> -	update_io_ticks(rq->part, jiffies, false);
> +	part = rq->part;
> +	q = part_to_disk(part)->queue;
> +	inflight = blk_mq_in_flight(q, part);
> +	update_io_ticks(part, jiffies, inflight > 0 ? true : false);

Yeah, this accounting issue can be fixed by applying such 'inflight' info.
However, blk_mq_in_flight() isn't cheap enough; I did get a soft lockup
report because of blk_mq_in_flight() being called in the I/O path.

BTW, this way is just like reverting 5b18b5a73760 ("block: delete
part_round_stats and switch to less precise counting").



Thanks, 
Ming



* Re: [PATCH RFC] block: fix inaccurate io_ticks
  2020-10-23  8:46 ` Ming Lei
@ 2020-10-23  8:56   ` Weiping Zhang
  2020-10-23  9:11     ` Ming Lei
  0 siblings, 1 reply; 5+ messages in thread
From: Weiping Zhang @ 2020-10-23  8:56 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, Mike Snitzer, mpatocka, linux-block

On Fri, Oct 23, 2020 at 4:49 PM Ming Lei <ming.lei@redhat.com> wrote:
>
> On Fri, Oct 23, 2020 at 02:46:32PM +0800, Weiping Zhang wrote:
> > Do not add io_ticks if there is no infligh io when start a new IO,
> > otherwise an extra 1 jiffy will be add to this IO.
> >
> > I run the following command on a host, with different kernel version.
> >
> > fio -name=test -ioengine=sync -bs=4K -rw=write
> > -filename=/home/test.fio.log -size=100M -time_based=1 -direct=1
> > -runtime=300 -rate=2m,2m
> >
> > If we run fio in a sync direct io mode, IO will be proccessed one by one,
> > you can see that there are 512 IOs completed in one second.
> >
> > kernel: 4.19.0
> >
> > Device: rrqm/s wrqm/s  r/s    w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
> > vda       0.00   0.00 0.00 512.00  0.00  2.00     8.00     0.21  0.40    0.00    0.40  0.40 20.60
> >
> > The averate io.latency is 0.4ms, so the disk time cost in one second
> > should be 0.4 * 512 = 204.8 ms, that means, %util should be 20%.
> >
> > Becase update_io_ticks will add a extra 1 jiffy(1ms) for every IO, the
> > io.latency io.latency will be 1 + 0.4 = 1.4ms,
> > 1.4 * 512 = 716.8ms, so the %util show it about 72%.
> >
> > Device  r/s    w/s rMB/s wMB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm  %util
> > vda    0.00 512.00  0.00  2.00   0.00   0.00  0.00  0.00    0.00    0.40   0.20     0.00     4.00  1.41  72.10
> >
> > After this patch:
> > Device  r/s    w/s rMB/s wMB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm  %util
> > vda    0.00 512.00  0.00  2.00   0.00   0.00  0.00  0.00    0.00    0.40   0.20     0.00     4.00  0.39  20.00
> >
> > Fixes: 5b18b5a73760 ("block: delete part_round_stats and switch to less precise counting")
> > Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
> > ---
> >  block/blk-core.c | 19 ++++++++++++++-----
> >  block/blk.h      |  1 +
> >  block/genhd.c    |  2 +-
> >  3 files changed, 16 insertions(+), 6 deletions(-)
> >
> > diff --git a/block/blk-core.c b/block/blk-core.c
> > index ac00d2fa4eb4..789a5c40b6a6 100644
> > --- a/block/blk-core.c
> > +++ b/block/blk-core.c
> > @@ -1256,14 +1256,14 @@ unsigned int blk_rq_err_bytes(const struct request *rq)
> >  }
> >  EXPORT_SYMBOL_GPL(blk_rq_err_bytes);
> >
> > -static void update_io_ticks(struct hd_struct *part, unsigned long now, bool end)
> > +static void update_io_ticks(struct hd_struct *part, unsigned long now, bool inflight)
> >  {
> >       unsigned long stamp;
> >  again:
> >       stamp = READ_ONCE(part->stamp);
> >       if (unlikely(stamp != now)) {
> > -             if (likely(cmpxchg(&part->stamp, stamp, now) == stamp))
> > -                     __part_stat_add(part, io_ticks, end ? now - stamp : 1);
> > +             if (likely(cmpxchg(&part->stamp, stamp, now) == stamp) && inflight)
> > +                     __part_stat_add(part, io_ticks, now - stamp);
> >       }
> >       if (part->partno) {
> >               part = &part_to_disk(part)->part0;
> > @@ -1310,13 +1310,20 @@ void blk_account_io_done(struct request *req, u64 now)
> >
> >  void blk_account_io_start(struct request *rq)
> >  {
> > +     struct hd_struct *part;
> > +     struct request_queue *q;
> > +     int inflight;
> > +
> >       if (!blk_do_io_stat(rq))
> >               return;
> >
> >       rq->part = disk_map_sector_rcu(rq->rq_disk, blk_rq_pos(rq));
> >
> >       part_stat_lock();
> > -     update_io_ticks(rq->part, jiffies, false);
> > +     part = rq->part;
> > +     q = part_to_disk(part)->queue;
> > +     inflight = blk_mq_in_flight(q, part);
> > +     update_io_ticks(part, jiffies, inflight > 0 ? true : false);
>
> Yeah, this account issue can be fixed by applying such 'inflight' info.
> However, blk_mq_in_flight() isn't cheap enough, I did get soft lockup
> report because of blk_mq_in_flight() called in I/O path.
>
> BTW, this way is just like reverting 5b18b5a73760 ("block: delete
> part_round_stats and switch to less precise counting").
>
>
Hello Ming,

Shall we switch it to an atomic mode, i.e. update an inflight count on
start/done for every IO?
Or is there any other cheaper way?

>
> Thanks,
> Ming
>


* Re: [PATCH RFC] block: fix inaccurate io_ticks
  2020-10-23  8:56   ` Weiping Zhang
@ 2020-10-23  9:11     ` Ming Lei
  2020-10-25 11:34       ` Weiping Zhang
  0 siblings, 1 reply; 5+ messages in thread
From: Ming Lei @ 2020-10-23  9:11 UTC (permalink / raw)
  To: Weiping Zhang; +Cc: Jens Axboe, Mike Snitzer, mpatocka, linux-block

On Fri, Oct 23, 2020 at 04:56:08PM +0800, Weiping Zhang wrote:
> On Fri, Oct 23, 2020 at 4:49 PM Ming Lei <ming.lei@redhat.com> wrote:
> >
> > On Fri, Oct 23, 2020 at 02:46:32PM +0800, Weiping Zhang wrote:
> > > Do not add io_ticks if there is no infligh io when start a new IO,
> > > otherwise an extra 1 jiffy will be add to this IO.
> > >
> > > I run the following command on a host, with different kernel version.
> > >
> > > fio -name=test -ioengine=sync -bs=4K -rw=write
> > > -filename=/home/test.fio.log -size=100M -time_based=1 -direct=1
> > > -runtime=300 -rate=2m,2m
> > >
> > > If we run fio in a sync direct io mode, IO will be proccessed one by one,
> > > you can see that there are 512 IOs completed in one second.
> > >
> > > kernel: 4.19.0
> > >
> > > Device: rrqm/s wrqm/s  r/s    w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
> > > vda       0.00   0.00 0.00 512.00  0.00  2.00     8.00     0.21  0.40    0.00    0.40  0.40 20.60
> > >
> > > The averate io.latency is 0.4ms, so the disk time cost in one second
> > > should be 0.4 * 512 = 204.8 ms, that means, %util should be 20%.
> > >
> > > Becase update_io_ticks will add a extra 1 jiffy(1ms) for every IO, the
> > > io.latency io.latency will be 1 + 0.4 = 1.4ms,
> > > 1.4 * 512 = 716.8ms, so the %util show it about 72%.
> > >
> > > Device  r/s    w/s rMB/s wMB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm  %util
> > > vda    0.00 512.00  0.00  2.00   0.00   0.00  0.00  0.00    0.00    0.40   0.20     0.00     4.00  1.41  72.10
> > >
> > > After this patch:
> > > Device  r/s    w/s rMB/s wMB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm  %util
> > > vda    0.00 512.00  0.00  2.00   0.00   0.00  0.00  0.00    0.00    0.40   0.20     0.00     4.00  0.39  20.00
> > >
> > > Fixes: 5b18b5a73760 ("block: delete part_round_stats and switch to less precise counting")
> > > Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
> > > ---
> > >  block/blk-core.c | 19 ++++++++++++++-----
> > >  block/blk.h      |  1 +
> > >  block/genhd.c    |  2 +-
> > >  3 files changed, 16 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/block/blk-core.c b/block/blk-core.c
> > > index ac00d2fa4eb4..789a5c40b6a6 100644
> > > --- a/block/blk-core.c
> > > +++ b/block/blk-core.c
> > > @@ -1256,14 +1256,14 @@ unsigned int blk_rq_err_bytes(const struct request *rq)
> > >  }
> > >  EXPORT_SYMBOL_GPL(blk_rq_err_bytes);
> > >
> > > -static void update_io_ticks(struct hd_struct *part, unsigned long now, bool end)
> > > +static void update_io_ticks(struct hd_struct *part, unsigned long now, bool inflight)
> > >  {
> > >       unsigned long stamp;
> > >  again:
> > >       stamp = READ_ONCE(part->stamp);
> > >       if (unlikely(stamp != now)) {
> > > -             if (likely(cmpxchg(&part->stamp, stamp, now) == stamp))
> > > -                     __part_stat_add(part, io_ticks, end ? now - stamp : 1);
> > > +             if (likely(cmpxchg(&part->stamp, stamp, now) == stamp) && inflight)
> > > +                     __part_stat_add(part, io_ticks, now - stamp);
> > >       }
> > >       if (part->partno) {
> > >               part = &part_to_disk(part)->part0;
> > > @@ -1310,13 +1310,20 @@ void blk_account_io_done(struct request *req, u64 now)
> > >
> > >  void blk_account_io_start(struct request *rq)
> > >  {
> > > +     struct hd_struct *part;
> > > +     struct request_queue *q;
> > > +     int inflight;
> > > +
> > >       if (!blk_do_io_stat(rq))
> > >               return;
> > >
> > >       rq->part = disk_map_sector_rcu(rq->rq_disk, blk_rq_pos(rq));
> > >
> > >       part_stat_lock();
> > > -     update_io_ticks(rq->part, jiffies, false);
> > > +     part = rq->part;
> > > +     q = part_to_disk(part)->queue;
> > > +     inflight = blk_mq_in_flight(q, part);
> > > +     update_io_ticks(part, jiffies, inflight > 0 ? true : false);
> >
> > Yeah, this account issue can be fixed by applying such 'inflight' info.
> > However, blk_mq_in_flight() isn't cheap enough, I did get soft lockup
> > report because of blk_mq_in_flight() called in I/O path.
> >
> > BTW, this way is just like reverting 5b18b5a73760 ("block: delete
> > part_round_stats and switch to less precise counting").
> >
> >
> Hello Ming,
> 
> Shall we switch it to atomic mode ? update inflight_count when
> start/done for every IO.

That is more expensive than blk_mq_in_flight().

> Or any other cheaper way.

I guess it is hard to find a cheaper way to obtain the in-flight IO
count, especially in the case of multiple CPU cores and
millions of IOPS.

Thanks,
Ming



* Re: [PATCH RFC] block: fix inaccurate io_ticks
  2020-10-23  9:11     ` Ming Lei
@ 2020-10-25 11:34       ` Weiping Zhang
  0 siblings, 0 replies; 5+ messages in thread
From: Weiping Zhang @ 2020-10-25 11:34 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, Mike Snitzer, mpatocka, linux-block

On Fri, Oct 23, 2020 at 5:11 PM Ming Lei <ming.lei@redhat.com> wrote:
>
> On Fri, Oct 23, 2020 at 04:56:08PM +0800, Weiping Zhang wrote:
> > On Fri, Oct 23, 2020 at 4:49 PM Ming Lei <ming.lei@redhat.com> wrote:
> > >
> > > On Fri, Oct 23, 2020 at 02:46:32PM +0800, Weiping Zhang wrote:
> > > > Do not add io_ticks if there is no infligh io when start a new IO,
> > > > otherwise an extra 1 jiffy will be add to this IO.
> > > >
> > > > I run the following command on a host, with different kernel version.
> > > >
> > > > fio -name=test -ioengine=sync -bs=4K -rw=write
> > > > -filename=/home/test.fio.log -size=100M -time_based=1 -direct=1
> > > > -runtime=300 -rate=2m,2m
> > > >
> > > > If we run fio in a sync direct io mode, IO will be proccessed one by one,
> > > > you can see that there are 512 IOs completed in one second.
> > > >
> > > > kernel: 4.19.0
> > > >
> > > > Device: rrqm/s wrqm/s  r/s    w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
> > > > vda       0.00   0.00 0.00 512.00  0.00  2.00     8.00     0.21  0.40    0.00    0.40  0.40 20.60
> > > >
> > > > The averate io.latency is 0.4ms, so the disk time cost in one second
> > > > should be 0.4 * 512 = 204.8 ms, that means, %util should be 20%.
> > > >
> > > > Becase update_io_ticks will add a extra 1 jiffy(1ms) for every IO, the
> > > > io.latency io.latency will be 1 + 0.4 = 1.4ms,
> > > > 1.4 * 512 = 716.8ms, so the %util show it about 72%.
> > > >
> > > > Device  r/s    w/s rMB/s wMB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm  %util
> > > > vda    0.00 512.00  0.00  2.00   0.00   0.00  0.00  0.00    0.00    0.40   0.20     0.00     4.00  1.41  72.10
> > > >
> > > > After this patch:
> > > > Device  r/s    w/s rMB/s wMB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm  %util
> > > > vda    0.00 512.00  0.00  2.00   0.00   0.00  0.00  0.00    0.00    0.40   0.20     0.00     4.00  0.39  20.00
> > > >
> > > > Fixes: 5b18b5a73760 ("block: delete part_round_stats and switch to less precise counting")
> > > > Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
> > > > ---
> > > >  block/blk-core.c | 19 ++++++++++++++-----
> > > >  block/blk.h      |  1 +
> > > >  block/genhd.c    |  2 +-
> > > >  3 files changed, 16 insertions(+), 6 deletions(-)
> > > >
> > > > diff --git a/block/blk-core.c b/block/blk-core.c
> > > > index ac00d2fa4eb4..789a5c40b6a6 100644
> > > > --- a/block/blk-core.c
> > > > +++ b/block/blk-core.c
> > > > @@ -1256,14 +1256,14 @@ unsigned int blk_rq_err_bytes(const struct request *rq)
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(blk_rq_err_bytes);
> > > >
> > > > -static void update_io_ticks(struct hd_struct *part, unsigned long now, bool end)
> > > > +static void update_io_ticks(struct hd_struct *part, unsigned long now, bool inflight)
> > > >  {
> > > >       unsigned long stamp;
> > > >  again:
> > > >       stamp = READ_ONCE(part->stamp);
> > > >       if (unlikely(stamp != now)) {
> > > > -             if (likely(cmpxchg(&part->stamp, stamp, now) == stamp))
> > > > -                     __part_stat_add(part, io_ticks, end ? now - stamp : 1);
> > > > +             if (likely(cmpxchg(&part->stamp, stamp, now) == stamp) && inflight)
> > > > +                     __part_stat_add(part, io_ticks, now - stamp);
> > > >       }
> > > >       if (part->partno) {
> > > >               part = &part_to_disk(part)->part0;
> > > > @@ -1310,13 +1310,20 @@ void blk_account_io_done(struct request *req, u64 now)
> > > >
> > > >  void blk_account_io_start(struct request *rq)
> > > >  {
> > > > +     struct hd_struct *part;
> > > > +     struct request_queue *q;
> > > > +     int inflight;
> > > > +
> > > >       if (!blk_do_io_stat(rq))
> > > >               return;
> > > >
> > > >       rq->part = disk_map_sector_rcu(rq->rq_disk, blk_rq_pos(rq));
> > > >
> > > >       part_stat_lock();
> > > > -     update_io_ticks(rq->part, jiffies, false);
> > > > +     part = rq->part;
> > > > +     q = part_to_disk(part)->queue;
> > > > +     inflight = blk_mq_in_flight(q, part);
> > > > +     update_io_ticks(part, jiffies, inflight > 0 ? true : false);
> > >
> > > Yeah, this account issue can be fixed by applying such 'inflight' info.
> > > However, blk_mq_in_flight() isn't cheap enough, I did get soft lockup
> > > report because of blk_mq_in_flight() called in I/O path.
Hello Ming,

Can you share your test script?
> > > BTW, this way is just like reverting 5b18b5a73760 ("block: delete
> > > part_round_stats and switch to less precise counting").
> > >
> > >
> > Hello Ming,
> >
> > Shall we switch it to atomic mode ? update inflight_count when
> > start/done for every IO.
>
> That is more expensive than blk_mq_in_flight().
>
> > Or any other cheaper way.
>
> I guess it is hard to figure out one cheaper way to figure out
> IO in-flight count especially in case of multiple CPU cores and
> Millions of IOPS.
>

For io_ticks, we only need to know whether there is any in-flight IO;
we do not care how many in-flight IOs there are. So an optimization here
is to test for any set bit in the tagset and return true directly, which
will save some CPU cycles in most cases. I'll send a v2 later.

Thanks
Weiping


end of thread, other threads:[~2020-10-25 11:34 UTC | newest]

Thread overview: 5+ messages
2020-10-23  6:46 [PATCH RFC] block: fix inaccurate io_ticks Weiping Zhang
2020-10-23  8:46 ` Ming Lei
2020-10-23  8:56   ` Weiping Zhang
2020-10-23  9:11     ` Ming Lei
2020-10-25 11:34       ` Weiping Zhang
