* testing io.low limit for blk-throttle
@ 2018-04-22  9:23 Paolo Valente
  2018-04-22 13:29 ` jianchao.wang
  2018-04-23  6:05 ` Joseph Qi
  0 siblings, 2 replies; 23+ messages in thread
From: Paolo Valente @ 2018-04-22  9:23 UTC (permalink / raw)
  To: linux-block, Jens Axboe, Shaohua Li, Mark Brown, Linus Walleij,
	Ulf Hansson

Hi Shaohua, all,
at last, I started testing your io.low limit for blk-throttle.  One of
the things I'm interested in is how good throttling is in achieving a
high throughput in the presence of realistic, variable workloads.

However, I seem to have bumped into a totally different problem.  The
io.low parameter doesn't seem to guarantee what I understand it is meant
to guarantee: minimum per-group bandwidths.  For example, with
- one group, the interfered, containing one process that does sequential
  reads with fio
- io.low set to 100MB/s for the interfered
- six other groups, the interferers, with each interferer containing one
  process doing sequential read with fio
- io.low set to 10MB/s for each interferer
- the workload executed on an SSD, with a 500MB/s of overall throughput
the interfered gets only 75MB/s.

In particular, the throughput of the interfered becomes lower and
lower as the number of interferers is increased.  So you can make it
become even much lower than the 75MB/s in the example above.  There
seems to be no control on bandwidth.

Am I doing something wrong?  Or did I simply misunderstand the goal of
io.low, and the only parameter for guaranteeing the desired bandwidth to
a group is io.max (to be used indirectly, by limiting the bandwidth of
the interferers)?

If useful for you, you can reproduce the above test very quickly, by
using the S suite [1] and typing:

cd thr-lat-with-interference
sudo ./thr-lat-with-interference.sh -b t -w 100000000 -W "10000000 10000000 10000000 10000000 10000000 10000000" -n 6 -T "read read read read read read" -R "0 0 0 0 0 0"
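
In case you prefer to read the test without digging into the script:
the setup it builds is essentially of the following shape when done by
hand (device number, cgroup names, mount point and file names below
are only illustrative, not the ones actually used by the script):

# one cgroup for the interfered, six for the interferers
mkdir /sys/fs/cgroup/interfered /sys/fs/cgroup/interferer1  # ... up to interferer6
# low limits: 100MB/s for the interfered, 10MB/s for each interferer
echo "8:0 rbps=100000000" > /sys/fs/cgroup/interfered/io.low
echo "8:0 rbps=10000000" > /sys/fs/cgroup/interferer1/io.low
# in each group, one fio process doing sequential reads
echo $$ > /sys/fs/cgroup/interfered/cgroup.procs
fio --name=seqread --rw=read --bs=4k --size=5G --filename=/mnt/testfile1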

Looking forward to your feedback,
Paolo

[1] https://github.com/Algodev-github/S

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: testing io.low limit for blk-throttle
  2018-04-22  9:23 testing io.low limit for blk-throttle Paolo Valente
@ 2018-04-22 13:29 ` jianchao.wang
  2018-04-22 15:53   ` Paolo Valente
  2018-04-23  6:05 ` Joseph Qi
  1 sibling, 1 reply; 23+ messages in thread
From: jianchao.wang @ 2018-04-22 13:29 UTC (permalink / raw)
  To: Paolo Valente, linux-block, Jens Axboe, Shaohua Li, Mark Brown,
	Linus Walleij, Ulf Hansson

Hi Paolo

I used to meet similar issue on io.low.
Can you try the following patch to see whether the issue could be fixed.
https://marc.info/?l=linux-block&m=152325456307423&w=2
https://marc.info/?l=linux-block&m=152325457607425&w=2

Thanks
Jianchao

On 04/22/2018 05:23 PM, Paolo Valente wrote:
> Hi Shaohua, all,
> at last, I started testing your io.low limit for blk-throttle.  One of
> the things I'm interested in is how good throttling is in achieving a
> high throughput in the presence of realistic, variable workloads.
> 
> However, I seem to have bumped into a totally different problem.  The
> io.low parameter doesn't seem to guarantee what I understand it is meant
> to guarantee: minimum per-group bandwidths.  For example, with
> - one group, the interfered, containing one process that does sequential
>   reads with fio
> - io.low set to 100MB/s for the interfered
> - six other groups, the interferers, with each interferer containing one
>   process doing sequential read with fio
> - io.low set to 10MB/s for each interferer
> - the workload executed on an SSD, with a 500MB/s of overall throughput
> the interfered gets only 75MB/s.
> 
> In particular, the throughput of the interfered becomes lower and
> lower as the number of interferers is increased.  So you can make it
> become even much lower than the 75MB/s in the example above.  There
> seems to be no control on bandwidth.
> 
> Am I doing something wrong?  Or did I simply misunderstand the goal of
> io.low, and the only parameter for guaranteeing the desired bandwidth to
> a group is io.max (to be used indirectly, by limiting the bandwidth of
> the interferers)?
> 
> If useful for you, you can reproduce the above test very quickly, by
> using the S suite [1] and typing:
> 
> cd thr-lat-with-interference
> sudo ./thr-lat-with-interference.sh -b t -w 100000000 -W "10000000 10000000 10000000 10000000 10000000 10000000" -n 6 -T "read read read read read read" -R "0 0 0 0 0 0"
> 
> Looking forward to your feedback,
> Paolo
> 
> [1] 
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: testing io.low limit for blk-throttle
  2018-04-22 13:29 ` jianchao.wang
@ 2018-04-22 15:53   ` Paolo Valente
  2018-04-23  2:19     ` jianchao.wang
  0 siblings, 1 reply; 23+ messages in thread
From: Paolo Valente @ 2018-04-22 15:53 UTC (permalink / raw)
  To: jianchao.wang
  Cc: linux-block, Jens Axboe, Shaohua Li, Mark Brown, Linus Walleij,
	Ulf Hansson



> Il giorno 22 apr 2018, alle ore 15:29, jianchao.wang <jianchao.w.wang@oracle.com> ha scritto:
> 
> Hi Paolo
> 
> I used to meet similar issue on io.low.
> Can you try the following patch to see whether the issue could be fixed.
> https://marc.info/?l=linux-block&m=152325456307423&w=2
> https://marc.info/?l=linux-block&m=152325457607425&w=2
> 

Just tried. Unfortunately, nothing seems to change :(

Thanks,
Paolo

> Thanks
> Jianchao
> 
> On 04/22/2018 05:23 PM, Paolo Valente wrote:
>> Hi Shaohua, all,
>> at last, I started testing your io.low limit for blk-throttle.  One of
>> the things I'm interested in is how good throttling is in achieving a
>> high throughput in the presence of realistic, variable workloads.
>> 
>> However, I seem to have bumped into a totally different problem.  The
>> io.low parameter doesn't seem to guarantee what I understand it is meant
>> to guarantee: minimum per-group bandwidths.  For example, with
>> - one group, the interfered, containing one process that does sequential
>>  reads with fio
>> - io.low set to 100MB/s for the interfered
>> - six other groups, the interferers, with each interferer containing one
>>  process doing sequential read with fio
>> - io.low set to 10MB/s for each interferer
>> - the workload executed on an SSD, with a 500MB/s of overall throughput
>> the interfered gets only 75MB/s.
>> 
>> In particular, the throughput of the interfered becomes lower and
>> lower as the number of interferers is increased.  So you can make it
>> become even much lower than the 75MB/s in the example above.  There
>> seems to be no control on bandwidth.
>> 
>> Am I doing something wrong?  Or did I simply misunderstand the goal of
>> io.low, and the only parameter for guaranteeing the desired bandwidth to
>> a group is io.max (to be used indirectly, by limiting the bandwidth of
>> the interferers)?
>> 
>> If useful for you, you can reproduce the above test very quickly, by
>> using the S suite [1] and typing:
>> 
>> cd thr-lat-with-interference
>> sudo ./thr-lat-with-interference.sh -b t -w 100000000 -W "10000000 10000000 10000000 10000000 10000000 10000000" -n 6 -T "read read read read read read" -R "0 0 0 0 0 0"
>> 
>> Looking forward to your feedback,
>> Paolo
>> 
>> [1] 
>> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: testing io.low limit for blk-throttle
  2018-04-22 15:53   ` Paolo Valente
@ 2018-04-23  2:19     ` jianchao.wang
  2018-04-23  5:32       ` Paolo Valente
  0 siblings, 1 reply; 23+ messages in thread
From: jianchao.wang @ 2018-04-23  2:19 UTC (permalink / raw)
  To: Paolo Valente
  Cc: linux-block, Jens Axboe, Shaohua Li, Mark Brown, Linus Walleij,
	Ulf Hansson

Hi Paolo

As I said, I used to meet a similar scenario.
After some debugging, I found out 3 issues.

Here is my setup command:
mkdir test0 test1
echo "259:0 riops=150000" > test0/io.max 
echo "259:0 riops=150000" > test1/io.max 
echo "259:0 riops=150000" > test2/io.max 
echo "259:0 riops=50000 wiops=50000 rbps=209715200 wbps=209715200 idle=200 latency=10" > test0/io.low 
echo "259:0 riops=50000 wiops=50000 rbps=209715200 wbps=209715200 idle=200 latency=10" > test1/io.low 

My NVMe card's max bps is ~600M, and max iops is ~160k.
Each cgroup's io.low is 200M bps and 50k iops; io.max is 150k iops.

1. I set up 2 cgroups, test0 and test1, one process per cgroup.
Even if only the process in test0 does IO, its iops is just 50k.
This is fixed by the following patch:
https://marc.info/?l=linux-block&m=152325457607425&w=2 

2. Let the processes in test0 and test1 both do IO.
Sometimes the iops of both cgroups are 50k; looking at the log, blk-throl's upgrade always fails.
This is fixed by the following patch:
https://marc.info/?l=linux-block&m=152325456307423&w=2

3. After applying patches 1 and 2, I still see that one cgroup's iops falls down to 30k ~ 40k but
blk-throl doesn't downgrade. This happens because, even though the iops has been lower than the io.low limit for some time,
the cgroup is considered idle, so the downgrade fails. More precisely, it is due to this code segment in throtl_tg_is_idle:

            (tg->latency_target && tg->bio_cnt &&
		tg->bad_bio_cnt * 5 < tg->bio_cnt)

I fixed it with the following patch.
But I'm not sure about this patch, so I didn't submit it.
Please also try it. :)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index b5ba845..c9a43a4 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -1819,7 +1819,7 @@ static unsigned long tg_last_low_overflow_time(struct throtl_grp *tg)
        return ret;
 }
 
-static bool throtl_tg_is_idle(struct throtl_grp *tg)
+static bool throtl_tg_is_idle(struct throtl_grp *tg, bool latency)
 {
        /*
         * cgroup is idle if:
@@ -1836,7 +1836,7 @@ static bool throtl_tg_is_idle(struct throtl_grp *tg)
              tg->idletime_threshold == DFL_IDLE_THRESHOLD ||
              (ktime_get_ns() >> 10) - tg->last_finish_time > time ||
              tg->avg_idletime > tg->idletime_threshold ||
-             (tg->latency_target && tg->bio_cnt &&
+             (tg->latency_target && tg->bio_cnt && latency &&
                tg->bad_bio_cnt * 5 < tg->bio_cnt);
        throtl_log(&tg->service_queue,
                "avg_idle=%ld, idle_threshold=%ld, bad_bio=%d, total_bio=%d, is_idle=%d, scale=%d",
@@ -1867,7 +1867,7 @@ static bool throtl_tg_can_upgrade(struct throtl_grp *tg)
 
        if (time_after_eq(jiffies,
                tg_last_low_overflow_time(tg) + tg->td->throtl_slice) &&
-           throtl_tg_is_idle(tg))
+           throtl_tg_is_idle(tg, true))
                return true;
        return false;
 }
@@ -1983,7 +1983,7 @@ static bool throtl_tg_can_downgrade(struct throtl_grp *tg)
        if (time_after_eq(now, td->low_upgrade_time + td->throtl_slice) &&
            time_after_eq(now, tg_last_low_overflow_time(tg) +
                                        td->throtl_slice) &&
-           (!throtl_tg_is_idle(tg) ||
+           (!throtl_tg_is_idle(tg, false) ||
             !list_empty(&tg_to_blkg(tg)->blkcg->css.children)))
                return true;
        return false;



On 04/22/2018 11:53 PM, Paolo Valente wrote:
> 
> 
>> Il giorno 22 apr 2018, alle ore 15:29, jianchao.wang <jianchao.w.wang@oracle.com> ha scritto:
>>
>> Hi Paolo
>>
>> I used to meet similar issue on io.low.
>> Can you try the following patch to see whether the issue could be fixed.
>> https://marc.info/?l=linux-block&m=152325456307423&w=2
>> https://marc.info/?l=linux-block&m=152325457607425&w=2
>>
> 
> Just tried. Unfortunately, nothing seems to change :(
> 
> Thanks,
> Paolo
> 
>> Thanks
>> Jianchao
>>
>> On 04/22/2018 05:23 PM, Paolo Valente wrote:
>>> Hi Shaohua, all,
>>> at last, I started testing your io.low limit for blk-throttle.  One of
>>> the things I'm interested in is how good throttling is in achieving a
>>> high throughput in the presence of realistic, variable workloads.
>>>
>>> However, I seem to have bumped into a totally different problem.  The
>>> io.low parameter doesn't seem to guarantee what I understand it is meant
>>> to guarantee: minimum per-group bandwidths.  For example, with
>>> - one group, the interfered, containing one process that does sequential
>>>  reads with fio
>>> - io.low set to 100MB/s for the interfered
>>> - six other groups, the interferers, with each interferer containing one
>>>  process doing sequential read with fio
>>> - io.low set to 10MB/s for each interferer
>>> - the workload executed on an SSD, with a 500MB/s of overall throughput
>>> the interfered gets only 75MB/s.
>>>
>>> In particular, the throughput of the interfered becomes lower and
>>> lower as the number of interferers is increased.  So you can make it
>>> become even much lower than the 75MB/s in the example above.  There
>>> seems to be no control on bandwidth.
>>>
>>> Am I doing something wrong?  Or did I simply misunderstand the goal of
>>> io.low, and the only parameter for guaranteeing the desired bandwidth to
>>> a group is io.max (to be used indirectly, by limiting the bandwidth of
>>> the interferers)?
>>>
>>> If useful for you, you can reproduce the above test very quickly, by
>>> using the S suite [1] and typing:
>>>
>>> cd thr-lat-with-interference
>>> sudo ./thr-lat-with-interference.sh -b t -w 100000000 -W "10000000 10000000 10000000 10000000 10000000 10000000" -n 6 -T "read read read read read read" -R "0 0 0 0 0 0"
>>>
>>> Looking forward to your feedback,
>>> Paolo
>>>
>>> [1] 
>>>
> 
> 

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: testing io.low limit for blk-throttle
  2018-04-23  2:19     ` jianchao.wang
@ 2018-04-23  5:32       ` Paolo Valente
  2018-04-23  6:35         ` jianchao.wang
  0 siblings, 1 reply; 23+ messages in thread
From: Paolo Valente @ 2018-04-23  5:32 UTC (permalink / raw)
  To: jianchao.wang
  Cc: linux-block, Jens Axboe, Shaohua Li, Mark Brown, Linus Walleij,
	Ulf Hansson



> Il giorno 23 apr 2018, alle ore 04:19, jianchao.wang <jianchao.w.wang@oracle.com> ha scritto:
> 
> Hi Paolo
> 
> As I said, I used to meet a similar scenario.
> After some debugging, I found out 3 issues.
> 
> Here is my setup command:
> mkdir test0 test1
> echo "259:0 riops=150000" > test0/io.max
> echo "259:0 riops=150000" > test1/io.max
> echo "259:0 riops=150000" > test2/io.max
> echo "259:0 riops=50000 wiops=50000 rbps=209715200 wbps=209715200 idle=200 latency=10" > test0/io.low
> echo "259:0 riops=50000 wiops=50000 rbps=209715200 wbps=209715200 idle=200 latency=10" > test1/io.low
> 
> My NVMe card's max bps is ~600M, and max iops is ~160k.
> Each cgroup's io.low is 200M bps and 50k iops; io.max is 150k iops.
> 
> 1. I set up 2 cgroups, test0 and test1, one process per cgroup.
> Even if only the process in test0 does IO, its iops is just 50k.
> This is fixed by the following patch:
> https://marc.info/?l=linux-block&m=152325457607425&w=2
> 
> 2. Let the processes in test0 and test1 both do IO.
> Sometimes the iops of both cgroups are 50k; looking at the log, blk-throl's upgrade always fails.
> This is fixed by the following patch:
> https://marc.info/?l=linux-block&m=152325456307423&w=2
> 
> 3. After applying patches 1 and 2, I still see that one cgroup's iops falls down to 30k ~ 40k but
> blk-throl doesn't downgrade. This happens because, even though the iops has been lower than the io.low limit for some time,
> the cgroup is considered idle, so the downgrade fails. More precisely, it is due to this code segment in throtl_tg_is_idle:
> 
>            (tg->latency_target && tg->bio_cnt &&
> 		tg->bad_bio_cnt * 5 < tg->bio_cnt)
> 
> I fixed it with the following patch.
> But I'm not sure about this patch, so I didn't submit it.
> Please also try it. :)
> 

Thanks for sharing this fix.  I tried it too, but nothing changes in
my test :(

At this point, my doubt is still: am I getting io.low limit right?  I
understand that an I/O-bound group should be guaranteed a rbps at
least equal to the rbps set with io.low for that group (of course,
provided that the sum of io.low limits is lower than the rate at which
the device serves all the I/O generated by the groups).  Is this
really what io.low shall guarantee?

Thanks,
Paolo


> diff --git a/block/blk-throttle.c b/block/blk-throttle.c
> index b5ba845..c9a43a4 100644
> --- a/block/blk-throttle.c
> +++ b/block/blk-throttle.c
> @@ -1819,7 +1819,7 @@ static unsigned long tg_last_low_overflow_time(struct throtl_grp *tg)
>        return ret;
> }
> 
> -static bool throtl_tg_is_idle(struct throtl_grp *tg)
> +static bool throtl_tg_is_idle(struct throtl_grp *tg, bool latency)
> {
>        /*
>         * cgroup is idle if:
> @@ -1836,7 +1836,7 @@ static bool throtl_tg_is_idle(struct throtl_grp *tg)
>              tg->idletime_threshold == DFL_IDLE_THRESHOLD ||
>              (ktime_get_ns() >> 10) - tg->last_finish_time > time ||
>              tg->avg_idletime > tg->idletime_threshold ||
> -             (tg->latency_target && tg->bio_cnt &&
> +             (tg->latency_target && tg->bio_cnt && latency &&
>                tg->bad_bio_cnt * 5 < tg->bio_cnt);
>        throtl_log(&tg->service_queue,
>                "avg_idle=%ld, idle_threshold=%ld, bad_bio=%d, total_bio=%d, is_idle=%d, scale=%d",
> @@ -1867,7 +1867,7 @@ static bool throtl_tg_can_upgrade(struct throtl_grp *tg)
> 
>        if (time_after_eq(jiffies,
>                tg_last_low_overflow_time(tg) + tg->td->throtl_slice) &&
> -           throtl_tg_is_idle(tg))
> +           throtl_tg_is_idle(tg, true))
>                return true;
>        return false;
> }
> @@ -1983,7 +1983,7 @@ static bool throtl_tg_can_downgrade(struct throtl_grp *tg)
>        if (time_after_eq(now, td->low_upgrade_time + td->throtl_slice) &&
>            time_after_eq(now, tg_last_low_overflow_time(tg) +
>                                        td->throtl_slice) &&
> -           (!throtl_tg_is_idle(tg) ||
> +           (!throtl_tg_is_idle(tg, false) ||
>             !list_empty(&tg_to_blkg(tg)->blkcg->css.children)))
>                return true;
>        return false;
> 
> 
> 
> On 04/22/2018 11:53 PM, Paolo Valente wrote:
>> 
>> 
>>> Il giorno 22 apr 2018, alle ore 15:29, jianchao.wang <jianchao.w.wang@oracle.com> ha scritto:
>>> 
>>> Hi Paolo
>>> 
>>> I used to meet similar issue on io.low.
>>> Can you try the following patch to see whether the issue could be fixed.
>>> https://marc.info/?l=linux-block&m=152325456307423&w=2
>>> https://marc.info/?l=linux-block&m=152325457607425&w=2
>>> 
>> 
>> Just tried. Unfortunately, nothing seems to change :(
>> 
>> Thanks,
>> Paolo
>> 
>>> Thanks
>>> Jianchao
>>> 
>>> On 04/22/2018 05:23 PM, Paolo Valente wrote:
>>>> Hi Shaohua, all,
>>>> at last, I started testing your io.low limit for blk-throttle.  One of
>>>> the things I'm interested in is how good throttling is in achieving a
>>>> high throughput in the presence of realistic, variable workloads.
>>>> 
>>>> However, I seem to have bumped into a totally different problem.  The
>>>> io.low parameter doesn't seem to guarantee what I understand it is meant
>>>> to guarantee: minimum per-group bandwidths.  For example, with
>>>> - one group, the interfered, containing one process that does sequential
>>>> reads with fio
>>>> - io.low set to 100MB/s for the interfered
>>>> - six other groups, the interferers, with each interferer containing one
>>>> process doing sequential read with fio
>>>> - io.low set to 10MB/s for each interferer
>>>> - the workload executed on an SSD, with a 500MB/s of overall throughput
>>>> the interfered gets only 75MB/s.
>>>> 
>>>> In particular, the throughput of the interfered becomes lower and
>>>> lower as the number of interferers is increased.  So you can make it
>>>> become even much lower than the 75MB/s in the example above.  There
>>>> seems to be no control on bandwidth.
>>>> 
>>>> Am I doing something wrong?  Or did I simply misunderstand the goal of
>>>> io.low, and the only parameter for guaranteeing the desired bandwidth to
>>>> a group is io.max (to be used indirectly, by limiting the bandwidth of
>>>> the interferers)?
>>>> 
>>>> If useful for you, you can reproduce the above test very quickly, by
>>>> using the S suite [1] and typing:
>>>> 
>>>> cd thr-lat-with-interference
>>>> sudo ./thr-lat-with-interference.sh -b t -w 100000000 -W "10000000 10000000 10000000 10000000 10000000 10000000" -n 6 -T "read read read read read read" -R "0 0 0 0 0 0"
>>>> 
>>>> Looking forward to your feedback,
>>>> Paolo
>>>> 
>>>> [1] 
>>>> 
>> 
>> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: testing io.low limit for blk-throttle
  2018-04-22  9:23 testing io.low limit for blk-throttle Paolo Valente
  2018-04-22 13:29 ` jianchao.wang
@ 2018-04-23  6:05 ` Joseph Qi
  2018-04-23  7:35   ` Paolo Valente
  1 sibling, 1 reply; 23+ messages in thread
From: Joseph Qi @ 2018-04-23  6:05 UTC (permalink / raw)
  To: Paolo Valente, linux-block, Jens Axboe, Shaohua Li, Mark Brown,
	Linus Walleij, Ulf Hansson

Hi Paolo,
What's your idle and latency config?
IMO, io.low will allow others run more bandwidth if cgroup's average
idle time is high or latency is low. In such cases, low limit won't get
guaranteed.

Thanks,
Joseph

On 18/4/22 17:23, Paolo Valente wrote:
> Hi Shaohua, all,
> at last, I started testing your io.low limit for blk-throttle.  One of
> the things I'm interested in is how good throttling is in achieving a
> high throughput in the presence of realistic, variable workloads.
> 
> However, I seem to have bumped into a totally different problem.  The
> io.low parameter doesn't seem to guarantee what I understand it is meant
> to guarantee: minimum per-group bandwidths.  For example, with
> - one group, the interfered, containing one process that does sequential
>   reads with fio
> - io.low set to 100MB/s for the interfered
> - six other groups, the interferers, with each interferer containing one
>   process doing sequential read with fio
> - io.low set to 10MB/s for each interferer
> - the workload executed on an SSD, with a 500MB/s of overall throughput
> the interfered gets only 75MB/s.
> 
> In particular, the throughput of the interfered becomes lower and
> lower as the number of interferers is increased.  So you can make it
> become even much lower than the 75MB/s in the example above.  There
> seems to be no control on bandwidth.
> 
> Am I doing something wrong?  Or did I simply misunderstand the goal of
> io.low, and the only parameter for guaranteeing the desired bandwidth to
> a group is io.max (to be used indirectly, by limiting the bandwidth of
> the interferers)?
> 
> If useful for you, you can reproduce the above test very quickly, by
> using the S suite [1] and typing:
> 
> cd thr-lat-with-interference
> sudo ./thr-lat-with-interference.sh -b t -w 100000000 -W "10000000 10000000 10000000 10000000 10000000 10000000" -n 6 -T "read read read read read read" -R "0 0 0 0 0 0"
> 
> Looking forward to your feedback,
> Paolo
> 
> [1] 
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: testing io.low limit for blk-throttle
  2018-04-23  5:32       ` Paolo Valente
@ 2018-04-23  6:35         ` jianchao.wang
  2018-04-23  7:37           ` Paolo Valente
  0 siblings, 1 reply; 23+ messages in thread
From: jianchao.wang @ 2018-04-23  6:35 UTC (permalink / raw)
  To: Paolo Valente
  Cc: linux-block, Jens Axboe, Shaohua Li, Mark Brown, Linus Walleij,
	Ulf Hansson

Hi Paolo

On 04/23/2018 01:32 PM, Paolo Valente wrote:
> Thanks for sharing this fix.  I tried it too, but nothing changes in
> my test :(> 

That's really sad.

> At this point, my doubt is still: am I getting io.low limit right?  I
> understand that an I/O-bound group should be guaranteed a rbps at
> least equal to the rbps set with io.low for that group (of course,
> provided that the sum of io.low limits is lower than the rate at which
> the device serves all the I/O generated by the groups).  Is this
> really what io.low shall guarantee?

I agree with your point about this even if I'm not qualified to judge it.

On the other hand, could you share your test case and blk-throl config here ?

Thanks
Jianchao

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: testing io.low limit for blk-throttle
  2018-04-23  6:05 ` Joseph Qi
@ 2018-04-23  7:35   ` Paolo Valente
  2018-04-23  9:01     ` Joseph Qi
  0 siblings, 1 reply; 23+ messages in thread
From: Paolo Valente @ 2018-04-23  7:35 UTC (permalink / raw)
  To: Joseph Qi
  Cc: linux-block, Jens Axboe, Shaohua Li, Mark Brown, Linus Walleij,
	Ulf Hansson



> Il giorno 23 apr 2018, alle ore 08:05, Joseph Qi <jiangqi903@gmail.com> ha scritto:
> 
> Hi Paolo,

Hi Joseph,
thanks for chiming in.

> What's your idle and latency config?

I didn't set them at all, as the only (explicit) requirement in my
basic test is that one of the group is guaranteed a minimum bps.


> IMO, io.low will allow others run more bandwidth if cgroup's average
> idle time is high or latency is low.

What you say here makes me think that I simply misunderstood the
purpose of io.low.  So, here is my problem/question: "I only need to
guarantee at least a minimum bandwidth, in bps, to a group.  Is the
io.low limit the way to go?"

I know that I can use just io.max (unless I misunderstood the goal of
io.max too :( ), but my extra purpose would be to not waste bandwidth
when some group is idle.  Yet, as for now, io.low is not working even
for the first, simpler goal, i.e., guaranteeing a minimum bandwidth to
one group when all groups are active.

Am I getting something wrong?

Otherwise, if there are some special values for idle and latency
parameters that would make throttle work for my test, I'll be of
course happy to try them.

Thanks,
Paolo

> In such cases, low limit won't get
> guaranteed.
> 
> Thanks,
> Joseph
> 
> On 18/4/22 17:23, Paolo Valente wrote:
>> Hi Shaohua, all,
>> at last, I started testing your io.low limit for blk-throttle.  One of
>> the things I'm interested in is how good throttling is in achieving a
>> high throughput in the presence of realistic, variable workloads.
>> 
>> However, I seem to have bumped into a totally different problem.  The
>> io.low parameter doesn't seem to guarantee what I understand it is meant
>> to guarantee: minimum per-group bandwidths.  For example, with
>> - one group, the interfered, containing one process that does sequential
>>  reads with fio
>> - io.low set to 100MB/s for the interfered
>> - six other groups, the interferers, with each interferer containing one
>>  process doing sequential read with fio
>> - io.low set to 10MB/s for each interferer
>> - the workload executed on an SSD, with a 500MB/s of overall throughput
>> the interfered gets only 75MB/s.
>> 
>> In particular, the throughput of the interfered becomes lower and
>> lower as the number of interferers is increased.  So you can make it
>> become even much lower than the 75MB/s in the example above.  There
>> seems to be no control on bandwidth.
>> 
>> Am I doing something wrong?  Or did I simply misunderstand the goal of
>> io.low, and the only parameter for guaranteeing the desired bandwidth to
>> a group is io.max (to be used indirectly, by limiting the bandwidth of
>> the interferers)?
>> 
>> If useful for you, you can reproduce the above test very quickly, by
>> using the S suite [1] and typing:
>> 
>> cd thr-lat-with-interference
>> sudo ./thr-lat-with-interference.sh -b t -w 100000000 -W "10000000 10000000 10000000 10000000 10000000 10000000" -n 6 -T "read read read read read read" -R "0 0 0 0 0 0"
>> 
>> Looking forward to your feedback,
>> Paolo
>> 
>> [1] 
>> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: testing io.low limit for blk-throttle
  2018-04-23  6:35         ` jianchao.wang
@ 2018-04-23  7:37           ` Paolo Valente
  2018-04-23  8:26             ` jianchao.wang
  0 siblings, 1 reply; 23+ messages in thread
From: Paolo Valente @ 2018-04-23  7:37 UTC (permalink / raw)
  To: jianchao.wang
  Cc: linux-block, Jens Axboe, Shaohua Li, Mark Brown, Linus Walleij,
	Ulf Hansson



> Il giorno 23 apr 2018, alle ore 08:35, jianchao.wang <jianchao.w.wang@oracle.com> ha scritto:
> 
> Hi Paolo
> 
> On 04/23/2018 01:32 PM, Paolo Valente wrote:
>> Thanks for sharing this fix.  I tried it too, but nothing changes in
>> my test :(> 
> 
> That's really sad.
> 
>> At this point, my doubt is still: am I getting io.low limit right?  I
>> understand that an I/O-bound group should be guaranteed a rbps at
>> least equal to the rbps set with io.low for that group (of course,
>> provided that the sum of io.low limits is lower than the rate at which
>> the device serves all the I/O generated by the groups).  Is this
>> really what io.low shall guarantee?
> 
> I agree with your point about this even if I'm not qualified to judge it.
> 

ok, thanks for your feedback.

> On the other hand, could you share your test case and blk-throl config here ?
> 

I wrote the description of the test, and the way I made it (and so the
way you can easily reproduce it exactly) in my first email. I'm
repeating it here for your convenience.

With
- one group, the interfered, containing one process that does sequential
 reads with fio
- io.low set to 100MB/s for the interfered
- six other groups, the interferers, with each interferer containing one
 process doing sequential read with fio
- io.low set to 10MB/s for each interferer
- the workload executed on an SSD, with a 500MB/s of overall throughput
the interfered gets only 75MB/s.

In particular, the throughput of the interfered becomes lower and
lower as the number of interferers is increased.  So you can make it
become even much lower than the 75MB/s in the example above.  There
seems to be no control on bandwidth.

Am I doing something wrong?  Or did I simply misunderstand the goal of
io.low, and the only parameter for guaranteeing the desired bandwidth to
a group is io.max (to be used indirectly, by limiting the bandwidth of
the interferers)?

If useful for you, you can reproduce the above test very quickly, by
using the S suite [1] and typing:

cd thr-lat-with-interference
sudo ./thr-lat-with-interference.sh -b t -w 100000000 -W "10000000 10000000 10000000 10000000 10000000 10000000" -n 6 -T "read read read read read read" -R "0 0 0 0 0 0"

[1] https://github.com/Algodev-github/S

> Thanks
> Jianchao

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: testing io.low limit for blk-throttle
  2018-04-23  7:37           ` Paolo Valente
@ 2018-04-23  8:26             ` jianchao.wang
  0 siblings, 0 replies; 23+ messages in thread
From: jianchao.wang @ 2018-04-23  8:26 UTC (permalink / raw)
  To: Paolo Valente
  Cc: linux-block, Jens Axboe, Shaohua Li, Mark Brown, Linus Walleij,
	Ulf Hansson

Hi Paolo

When I ran the script to test it, I got this:
8:0 rbps=10000000 wbps=0 riops=0 wiops=0 idle=0 latency=max

The idle is 0.
I'm afraid the io.low would not work.
Please refer to the following code in tg_set_limit

	/* force user to configure all settings for low limit  */
	if (!(tg->bps[READ][LIMIT_LOW] || tg->iops[READ][LIMIT_LOW] ||
	      tg->bps[WRITE][LIMIT_LOW] || tg->iops[WRITE][LIMIT_LOW]) ||
	    tg->idletime_threshold_conf == DFL_IDLE_THRESHOLD ||    //-----> HERE
	    tg->latency_target_conf == DFL_LATENCY_TARGET) {
		tg->bps[READ][LIMIT_LOW] = 0;
		tg->bps[WRITE][LIMIT_LOW] = 0;
		tg->iops[READ][LIMIT_LOW] = 0;
		tg->iops[WRITE][LIMIT_LOW] = 0;
		tg->idletime_threshold = DFL_IDLE_THRESHOLD;
		tg->latency_target = DFL_LATENCY_TARGET;
	} else if (index == LIMIT_LOW) {
		tg->idletime_threshold = tg->idletime_threshold_conf;
		tg->latency_target = tg->latency_target_conf;
	}

	blk_throtl_update_limit_valid(tg->td);
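
That is, if either the idle time or the latency target is left at its
default value, tg_set_limit silently drops the low limit that was just
written.  A setting along these lines (device number and values are
only an example) should keep io.low in effect:

echo "8:0 rbps=100000000 idle=1000 latency=100" > <cgroup>/io.low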


Thanks
Jianchao

On 04/23/2018 03:37 PM, Paolo Valente wrote:
> cd thr-lat-with-interference
> sudo ./thr-lat-with-interference.sh -b t -w 100000000 -W "10000000 10000000 10000000 10000000 10000000 10000000" -n 6 -T "read read read read read read" -R "0 0 0 0 0 0"

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: testing io.low limit for blk-throttle
  2018-04-23  7:35   ` Paolo Valente
@ 2018-04-23  9:01     ` Joseph Qi
  2018-04-24 12:12         ` Paolo Valente
  0 siblings, 1 reply; 23+ messages in thread
From: Joseph Qi @ 2018-04-23  9:01 UTC (permalink / raw)
  To: Paolo Valente
  Cc: linux-block, Jens Axboe, Shaohua Li, Mark Brown, Linus Walleij,
	Ulf Hansson



On 18/4/23 15:35, Paolo Valente wrote:
> 
> 
>> Il giorno 23 apr 2018, alle ore 08:05, Joseph Qi <jiangqi903@gmail.com> ha scritto:
>>
>> Hi Paolo,
> 
> Hi Joseph,
> thanks for chiming in.
> 
>> What's your idle and latency config?
> 
> I didn't set them at all, as the only (explicit) requirement in my
> basic test is that one of the group is guaranteed a minimum bps.
> 
> 
>> IMO, io.low will allow others run more bandwidth if cgroup's average
>> idle time is high or latency is low.
> 
> What you say here makes me think that I simply misunderstood the
> purpose of io.low.  So, here is my problem/question: "I only need to
> guarantee at least a minimum bandwidth, in bps, to a group.  Is the
> io.low limit the way to go?"
> 
> I know that I can use just io.max (unless I misunderstood the goal of
> io.max too :( ), but my extra purpose would be to not waste bandwidth
> when some group is idle.  Yet, as for now, io.low is not working even
> for the first, simpler goal, i.e., guaranteeing a minimum bandwidth to
> one group when all groups are active.
> 
> Am I getting something wrong?
> 
> Otherwise, if there are some special values for idle and latency
> parameters that would make throttle work for my test, I'll be of
> course happy to try them.
> 
I think you can try idle time with 1000us for all cgroups, and latency
target 100us for cgroup with low limit 100MB/s and 2000us for cgroups
with low limit 10MB/s. That means cgroup with low latency target will
be preferred.
BTW, from my experience the parameters are not easy to set because
they are strongly correlated to the cgroup IO behavior.
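
Concretely, with the io.low syntax used earlier in this thread, that
suggestion would translate to something like the following (device
number, byte values and cgroup paths are just placeholders for your
setup):

# interfered: low limit 100MB/s, idle 1000us, tight latency target
echo "8:0 rbps=100000000 idle=1000 latency=100" > interfered/io.low
# each interferer: low limit 10MB/s, idle 1000us, relaxed latency target
echo "8:0 rbps=10000000 idle=1000 latency=2000" > interferer1/io.low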

Thanks,
Joseph

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: testing io.low limit for blk-throttle
  2018-04-23  9:01     ` Joseph Qi
@ 2018-04-24 12:12         ` Paolo Valente
  0 siblings, 0 replies; 23+ messages in thread
From: Paolo Valente @ 2018-04-24 12:12 UTC (permalink / raw)
  To: Joseph Qi
  Cc: linux-block, Jens Axboe, Shaohua Li, Mark Brown, Linus Walleij,
	Ulf Hansson, LKML, Tejun Heo



> Il giorno 23 apr 2018, alle ore 11:01, Joseph Qi <jiangqi903@gmail.com> ha scritto:
> 
> 
> 
> On 18/4/23 15:35, Paolo Valente wrote:
>> 
>> 
>>> Il giorno 23 apr 2018, alle ore 08:05, Joseph Qi <jiangqi903@gmail.com> ha scritto:
>>> 
>>> Hi Paolo,
>> 
>> Hi Joseph,
>> thanks for chiming in.
>> 
>>> What's your idle and latency config?
>> 
>> I didn't set them at all, as the only (explicit) requirement in my
>> basic test is that one of the group is guaranteed a minimum bps.
>> 
>> 
>>> IMO, io.low will allow others run more bandwidth if cgroup's average
>>> idle time is high or latency is low.
>> 
>> What you say here makes me think that I simply misunderstood the
>> purpose of io.low.  So, here is my problem/question: "I only need to
>> guarantee at least a minimum bandwidth, in bps, to a group.  Is the
>> io.low limit the way to go?"
>> 
>> I know that I can use just io.max (unless I misunderstood the goal of
>> io.max too :( ), but my extra purpose would be to not waste bandwidth
>> when some group is idle.  Yet, as for now, io.low is not working even
>> for the first, simpler goal, i.e., guaranteeing a minimum bandwidth to
>> one group when all groups are active.
>> 
>> Am I getting something wrong?
>> 
>> Otherwise, if there are some special values for idle and latency
>> parameters that would make throttle work for my test, I'll be of
>> course happy to try them.
>> 
> I think you can try idle time with 1000us for all cgroups, and latency
> target 100us for cgroup with low limit 100MB/s and 2000us for cgroups
> with low limit 10MB/s. That means cgroup with low latency target will
> be preferred.
> BTW, from my experience the parameters are not easy to set because
> they are strongly correlated to the cgroup IO behavior.
> 

+Tejun (I guess he might be interested in the results below)

Hi Joseph,
thanks for chiming in. Your suggestion did work!

At first, I thought I had also understood the use of latency from the
outcome of your suggestion: "want low limit really guaranteed for a
group?  set target latency to a low value for it." But then, as a
crosscheck, I repeated the same exact test, but reversing target
latencies: I gave 2000 to the interfered (the group with 100MB/s
limit) and 100 to the interferers.  And the interfered still got more
than 100MB/s!  So I exaggerated: 20000 to the interfered.
Same outcome :(

I tried really many other combinations, to try to figure this out, but
results seemed more or less random w.r.t. latency values.  I
didn't even start to test different values for idle.

So, the only sound lesson that I seem to have learned is: if I want
low limits to be enforced, I have to set target latency and idle
explicitly.  The actual values of latencies matter little, or not at
all. At least this holds for my simple tests.

At any rate, thanks to your help, Joseph, I could move to the most
interesting part for me: how effective is blk-throttle with low
limits?  I could well be wrong again, but my results do not seem that
good.  With the simplest type of non-toy example I considered, I
recorded throughput losses, apparently caused mainly by blk-throttle,
and ranging from 64% to 75%.

Here is a worst-case example.  For each step, I'm reporting below the
command by which you can reproduce that step with the
thr-lat-with-interference benchmark of the S suite [1].  I just split
bandwidth equally among five groups, on my SSD.  The device showed a
peak rate of ~515MB/s in this test, so I set rbps to 100MB/s for each
group (and tried various values, and combinations of values, for the
target latency, without any effect on the results).  To begin, I made
every group do sequential reads.  Everything worked perfectly fine.
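
For reference, each of the five groups thus gets an io.low line of
roughly this shape (the device number and the idle/latency values are
illustrative, as discussed above):

echo "8:0 rbps=100000000 idle=1000 latency=2000" > group1/io.low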

But then I made one group do random I/O [2], and troubles began.  Even
if the group doing random I/O was given a target latency of 100usec
(or lower), while the other had a target latency of 2000usec, the poor
random-I/O group got only 4.7 MB/s!  (A single process doing 4k sync
random I/O reaches 25MB/s on my SSD.)

I guess things broke because low limits did not comply any longer with
the lower speed that device reached with the new, mixed workload: the
device reached 376MB/s, while the sum of the low limits was 500MB/s.
BTW the 'fault' for this loss of throughput was not only of the device
and the workload: if I switched throttling off, then the device still
reached its peak rate, although granting only 1.3MB/s to the
random-I/O group.

So, to comply with the 376MB/s, I lowered the low limits to 74MB/s per
group (to avoid a too tight 75MB/s) [3].  A little better: the
random-I/O group got 7.2 MB/s.  But the total throughput went down
further, to 289MB/s, and became again lower than the sum of the low
limits.  Most certainly, this time the throughput went down mainly
because blk-throttling was serving the random I/O more than before.

To make a long story short, I arrived at setting just 12MB/s as low
limit for each group [4].  The random-I/O group was finally happy,
with a revitalizing 12.77MB/s.  But the total throughput dropped down
to 127MB/s, i.e., ~25% of the peak rate of the device.  Now the
'fault' for the throughput loss seemed undoubtedly of blk-throttle.
The latter was evidently over-throttling some group.

To sum up, for my device, 12MB/s seems to be the highest value for
which low limits can be guaranteed.  But setting these limits entails
a high cost: if just one group really does random I/O, then 75% of the
throughput is lost.

There would be other issues too.  For example, 12MB/s might be too
little for the needs of some group in some time period.  This fact would
make it extremely difficult, if ever possible, to set low limits that
comply with the needs of more dynamic (and probably more
realistic) workloads than the above one.

I think this is all, sorry for the long mail, I tried to shrink it as
much as possible.  Looking forward to some feedback.

Thanks,
Paolo

[1] https://github.com/Algodev-github/S
[2] sudo ./thr-lat-with-interference.sh -b t -n 4 -w 100M -W 100M -t randread -L 2000
[3] sudo ./thr-lat-with-interference.sh -b t -n 4 -w 74M -W 74M -t randread -L 2000
[4] sudo ./thr-lat-with-interference.sh -b t -n 4 -w 12M -W 12M -t randread -L 2000

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: testing io.low limit for blk-throttle
  2018-04-24 12:12         ` Paolo Valente
  (?)
@ 2018-04-25 12:13         ` Joseph Qi
  2018-04-26 17:27           ` Paolo Valente
  -1 siblings, 1 reply; 23+ messages in thread
From: Joseph Qi @ 2018-04-25 12:13 UTC (permalink / raw)
  To: Paolo Valente
  Cc: linux-block, Jens Axboe, Shaohua Li, Mark Brown, Linus Walleij,
	Ulf Hansson, LKML, Tejun Heo

Hi Paolo,

On 18/4/24 20:12, Paolo Valente wrote:
> 
> 
>> Il giorno 23 apr 2018, alle ore 11:01, Joseph Qi <jiangqi903@gmail.com> ha scritto:
>>
>>
>>
>> On 18/4/23 15:35, Paolo Valente wrote:
>>>
>>>
>>>> Il giorno 23 apr 2018, alle ore 08:05, Joseph Qi <jiangqi903@gmail.com> ha scritto:
>>>>
>>>> Hi Paolo,
>>>
>>> Hi Joseph,
>>> thanks for chiming in.
>>>
>>>> What's your idle and latency config?
>>>
>>> I didn't set them at all, as the only (explicit) requirement in my
>>> basic test is that one of the group is guaranteed a minimum bps.
>>>
>>>
>>>> IMO, io.low will allow others run more bandwidth if cgroup's average
>>>> idle time is high or latency is low.
>>>
>>> What you say here makes me think that I simply misunderstood the
>>> purpose of io.low.  So, here is my problem/question: "I only need to
>>> guarantee at least a minimum bandwidth, in bps, to a group.  Is the
>>> io.low limit the way to go?"
>>>
>>> I know that I can use just io.max (unless I misunderstood the goal of
>>> io.max too :( ), but my extra purpose would be to not waste bandwidth
>>> when some group is idle.  Yet, as for now, io.low is not working even
>>> for the first, simpler goal, i.e., guaranteeing a minimum bandwidth to
>>> one group when all groups are active.
>>>
>>> Am I getting something wrong?
>>>
>>> Otherwise, if there are some special values for idle and latency
>>> parameters that would make throttle work for my test, I'll be of
>>> course happy to try them.
>>>
>> I think you can try idle time with 1000us for all cgroups, and latency
>> target 100us for cgroup with low limit 100MB/s and 2000us for cgroups
>> with low limit 10MB/s. That means cgroup with low latency target will
>> be preferred.
>> BTW, from my experience the parameters are not easy to set because
>> they are strongly correlated to the cgroup IO behavior.
>>
> 
> +Tejun (I guess he might be interested in the results below)
> 
> Hi Joseph,
> thanks for chiming in. Your suggestion did work!
> 
> At first, I thought I had also understood the use of latency from the
> outcome of your suggestion: "want low limit really guaranteed for a
> group?  set target latency to a low value for it." But then, as a
> crosscheck, I repeated the same exact test, but reversing target
> latencies: I gave 2000 to the interfered (the group with 100MB/s
> limit) and 100 to the interferers.  And the interfered still got more
> than 100MB/s!  So I exaggerated: 20000 to the interfered.
> Same outcome :(
> 
> I tried really many other combinations, to try to figure this out, but
> results seemed more or less random w.r.t. latency values.  I
> didn't even start to test different values for idle.
> 
> So, the only sound lesson that I seem to have learned is: if I want
> low limits to be enforced, I have to set target latency and idle
> explicitly.  The actual values of latencies matter little, or not at
> all. At least this holds for my simple tests.
> 
> At any rate, thanks to your help, Joseph, I could move to the most
> interesting part for me: how effective is blk-throttle with low
> limits?  I could well be wrong again, but my results do not seem that
> good.  With the simplest type of non-toy example I considered, I
> recorded throughput losses, apparently caused mainly by blk-throttle,
> and ranging from 64% to 75%.
> 
> Here is a worst-case example.  For each step, I'm reporting below the
> command by which you can reproduce that step with the
> thr-lat-with-interference benchmark of the S suite [1].  I just split
> bandwidth equally among five groups, on my SSD.  The device showed a
> peak rate of ~515MB/s in this test, so I set rbps to 100MB/s for each
> group (and tried various values, and combinations of values, for the
> target latency, without any effect on the results).  To begin, I made
> every group do sequential reads.  Everything worked perfectly fine.
> 
> But then I made one group do random I/O [2], and troubles began.  Even
> if the group doing random I/O was given a target latency of 100usec
> (or lower), while the other had a target latency of 2000usec, the poor
> random-I/O group got only 4.7 MB/s!  (A single process doing 4k sync
> random I/O reaches 25MB/s on my SSD.)
> 
> I guess things broke because low limits did not comply any longer with
> the lower speed that device reached with the new, mixed workload: the
> device reached 376MB/s, while the sum of the low limits was 500MB/s.
> BTW the 'fault' for this loss of throughput was not only of the device
> and the workload: if I switched throttling off, then the device still
> reached its peak rate, although granting only 1.3MB/s to the
> random-I/O group.
> 
> So, to comply with the 376MB/s, I lowered the low limits to 74MB/s per
> group (to avoid a too tight 75MB/s) [3].  A little better: the
> random-I/O group got 7.2 MB/s.  But the total throughput went down
> further, to 289MB/s, and became again lower than the sum of the low
> limits.  Most certainly, this time the throughput went down mainly
> because blk-throttling was serving the random I/O more than before.
> 
> To make a long story short, I arrived at setting just 12MB/s as low
> limit for each group [4].  The random-I/O group was finally happy,
> with a revitalizing 12.77MB/s.  But the total throughput dropped down
> to 127MB/s, i.e., ~25% of the peak rate of the device.  Now the
> 'fault' for the throughput loss seemed undoubtedly of blk-throttle.
> The latter was evidently over-throttling some group.
> 
> To sum up, for my device, 12MB/s seems to be the highest value for
> which low limits can be guaranteed.  But setting these limits entails
> a high cost: if just one group really does random I/O, then 75% of the
> throughput is lost.
> 
> There would be other issues too.  For example, 12MB/s might be too
> little for the needs of some group in some time period.  This fact would
> make it extremely difficult, if ever possible, to set low limits that
> comply with the needs of more dynamic (and probably more
> realistic) workloads than the above one.
> 
Could you run blktrace as well when testing your case? There are several
throtl traces that help analyze whether it is caused by frequent
upgrades/downgrades.
If all cgroups are just running under low, I'm afraid the case you
tested has something to do with how the SSD handles mixed workload IOs.
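
One way to capture those traces (the device path is just an example)
is to run something like the following while the test is in progress,
and then look for the throtl messages about upgrading/downgrading the
limits:

blktrace -d /dev/sda -o - | blkparse -i - | grep throtl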

Thanks,
Joseph

> I think this is all, sorry for the long mail, I tried to shrink it as
> much as possible.  Looking forward to some feedback.
> 
> Thanks,
> Paolo
> 
> [1] https://github.com/Algodev-github/S
> [2] sudo ./thr-lat-with-interference.sh -b t -n 4 -w 100M -W 100M -t randread -L 2000
> [3] sudo ./thr-lat-with-interference.sh -b t -n 4 -w 74M -W 74M -t randread -L 2000
> [4] sudo ./thr-lat-with-interference.sh -b t -n 4 -w 12M -W 12M -t randread -L 2000
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: testing io.low limit for blk-throttle
  2018-04-25 12:13         ` Joseph Qi
@ 2018-04-26 17:27           ` Paolo Valente
  2018-04-27  3:27             ` Joseph Qi
  0 siblings, 1 reply; 23+ messages in thread
From: Paolo Valente @ 2018-04-26 17:27 UTC (permalink / raw)
  To: Joseph Qi
  Cc: linux-block, Jens Axboe, Shaohua Li, Mark Brown, Linus Walleij,
	Ulf Hansson, LKML, Tejun Heo,
	'Paolo Valente' via bfq-iosched

[-- Attachment #1: Type: text/plain, Size: 1923 bytes --]



> Il giorno 25 apr 2018, alle ore 14:13, Joseph Qi <jiangqi903@gmail.com> ha scritto:
> 
> Hi Paolo,
> 

Hi Joseph

> ...
> Could you run blktrace as well when testing your case? There are several
> throtl traces that help analyze whether it is caused by frequent
> upgrades/downgrades.

Certainly.  You can find a trace attached.  Unfortunately, I'm not
familiar with the internals of blk-throttle and low limit, so, if you
want me to analyze the trace, give me some hints on what I have to
look for.  Otherwise, I'll be happy to learn from your analysis.

> If all cgroups are just running under low, I'm afraid the case you
> tested has something to do with how the SSD handles mixed workload I/Os.
> 

That's a rather important point.  To investigate it, I repeated the
same test with both bfq-mq, the development version of bfq, which
also contains improvements not yet in mainline, and bfq, the version
you can find in mainline.  I set strict_guarantees to 1 for both
bfq-mq and bfq (namely, the value for which bfq-mq/bfq reaches the
lowest total throughput), and I gave the random-I/O group twice the
weight of the other groups [1] (just as tentative values).
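
For completeness, this setup amounts to roughly the following commands
(a sketch, assuming the test disk is /dev/sdb; the group weights are then
set by the S-suite script through its -w/-W options [1]):

echo bfq-mq > /sys/block/sdb/queue/scheduler             # or 'bfq' for the mainline version
echo 1 > /sys/block/sdb/queue/iosched/strict_guarantees  # trades throughput for service guarantees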

Results are rather different now.  With bfq-mq, the random-I/O group
enjoys about the same throughput as with blk-throttle, precisely
11.16MB/s, but the total throughput is now 2.2 times as high:
284.127MB/s.  The performance of bfq is expectedly poorer, as some
important improvements have not yet been ported from bfq-mq to bfq:
9.16MB/s for the random-I/O group, and 190MB/s for the total
throughput.

From this, I guess we can deduce that the cause of the low throughput
with blk-throttle is blk-throttle itself, and not the drive's capabilities.  As
you already pointed out, the attached trace can tell us what went
wrong.

Thanks,
Paolo

[1] sudo ./thr-lat-with-interference.sh -b p -t randread -n 5 -w 100 -W 50


[-- Attachment #2: trace.zip --]
[-- Type: application/zip, Size: 2445237 bytes --]

[-- Attachment #3: Type: text/plain, Size: 506 bytes --]




> Thanks,
> Joseph
> 
>> I think this is all, sorry for the long mail, I tried to shrink it as
>> much as possible.  Looking forward to some feedback.
>> 
>> Thanks,
>> Paolo
>> 
>> [1] https://github.com/Algodev-github/S
>> [2] sudo ./thr-lat-with-interference.sh -b t -n 4 -w 100M -W 100M -t randread -L 2000
>> [3] sudo ./thr-lat-with-interference.sh -b t -n 4 -w 74M -W 74M -t randread -L 2000
>> [4] sudo ./thr-lat-with-interference.sh -b t -n 4 -w 12M -W 12M -t randread -L 2000


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: testing io.low limit for blk-throttle
  2018-04-24 12:12         ` Paolo Valente
  (?)
  (?)
@ 2018-04-26 18:32         ` Tejun Heo
  2018-04-27  2:09           ` jianchao.wang
  2018-05-03 16:35             ` Paolo Valente
  -1 siblings, 2 replies; 23+ messages in thread
From: Tejun Heo @ 2018-04-26 18:32 UTC (permalink / raw)
  To: Paolo Valente
  Cc: Joseph Qi, linux-block, Jens Axboe, Shaohua Li, Mark Brown,
	Linus Walleij, Ulf Hansson, LKML

Hello,

On Tue, Apr 24, 2018 at 02:12:51PM +0200, Paolo Valente wrote:
> +Tejun (I guess he might be interested in the results below)

Our experiments didn't work out too well either.  At this point, it
isn't clear whether io.low will ever leave experimental state.  We're
trying to find a working solution.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: testing io.low limit for blk-throttle
  2018-04-26 18:32         ` Tejun Heo
@ 2018-04-27  2:09           ` jianchao.wang
  2018-04-27  2:40             ` Joseph Qi
  2018-05-03 16:35             ` Paolo Valente
  1 sibling, 1 reply; 23+ messages in thread
From: jianchao.wang @ 2018-04-27  2:09 UTC (permalink / raw)
  To: Tejun Heo, Joseph Qi
  Cc: Paolo Valente, linux-block, Jens Axboe, Shaohua Li, Mark Brown,
	Linus Walleij, Ulf Hansson, LKML

Hi Tejun and Joseph

On 04/27/2018 02:32 AM, Tejun Heo wrote:
> Hello,
> 
> On Tue, Apr 24, 2018 at 02:12:51PM +0200, Paolo Valente wrote:
>> +Tejun (I guess he might be interested in the results below)
> 
> Our experiments didn't work out too well either.  At this point, it
> isn't clear whether io.low will ever leave experimental state.  We're
> trying to find a working solution.

Would you please take a look at the following two patches?

https://marc.info/?l=linux-block&m=152325456307423&w=2
https://marc.info/?l=linux-block&m=152325457607425&w=2

In addition, when I tested blk-throtl io.low on an NVMe card, I always
found that even if the iops had been lower than the io.low limit for a
while, the downgrade always failed because of this part of the group
idle check:

       tg->latency_target && tg->bio_cnt &&
       tg->bad_bio_cnt * 5 < tg->bio_cnt

The latency always looks fine even when the sum of the two groups' iops
has reached the top.  So I disabled this check in my test; with that,
plus the 2 patches above, io.low basically works.

My NVMe card's max bps is ~600M, and max iops is ~160k.
Here is my config
io.low riops=50000 wiops=50000 rbps=209715200 wbps=209715200 idle=200 latency=10
io.max riops=150000
There are two cgroups in my test, and both of them have the same config.
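
With the cgroup-v2 interface, such a config would be written roughly as
follows (a sketch only: the device number 259:0 and the cgroup paths are
assumptions, and io.low requires CONFIG_BLK_DEV_THROTTLING_LOW plus the
io controller enabled in the parent's cgroup.subtree_control):

for g in group0 group1; do
    # per-device low limits, idle threshold (us) and target latency (us)
    echo "259:0 rbps=209715200 wbps=209715200 riops=50000 wiops=50000 idle=200 latency=10" \
        > /sys/fs/cgroup/$g/io.low
    # per-device hard cap on read iops
    echo "259:0 riops=150000" > /sys/fs/cgroup/$g/io.max
done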

In addition, saying "basically work" is due to the iops of the two cgroup will jump up and down.
such as, I launched one fio test per cgroup, the iops will wave as following:

group0   30k  50k   70k   60k  40k
group1   120k 100k  80k   90k  110k

However, if I launched two fio tests in only one cgroup, the iops of the
two tests stayed at about 70k~80k.
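
(For reference, a sketch of how one fio job per cgroup can be launched;
the device, cgroup paths and fio options here are assumptions, not the
exact commands used in the test above:)

for g in group0 group1; do
    ( echo $BASHPID > /sys/fs/cgroup/$g/cgroup.procs   # move this subshell into the cgroup
      fio --name=$g --filename=/dev/nvme0n1 --direct=1 --rw=randread \
          --bs=4k --ioengine=libaio --iodepth=32 --runtime=60 --time_based ) &
done
wait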

Could you help explain this scenario?

Thanks in advance
Jianchao

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: testing io.low limit for blk-throttle
  2018-04-27  2:09           ` jianchao.wang
@ 2018-04-27  2:40             ` Joseph Qi
  0 siblings, 0 replies; 23+ messages in thread
From: Joseph Qi @ 2018-04-27  2:40 UTC (permalink / raw)
  To: jianchao.wang, Tejun Heo
  Cc: Paolo Valente, linux-block, Jens Axboe, Shaohua Li, Mark Brown,
	Linus Walleij, Ulf Hansson, LKML

Hi Jianchao,

On 18/4/27 10:09, jianchao.wang wrote:
> Hi Tejun and Joseph
> 
> On 04/27/2018 02:32 AM, Tejun Heo wrote:
>> Hello,
>>
>> On Tue, Apr 24, 2018 at 02:12:51PM +0200, Paolo Valente wrote:
>>> +Tejun (I guess he might be interested in the results below)
>>
>> Our experiments didn't work out too well either.  At this point, it
>> isn't clear whether io.low will ever leave experimental state.  We're
>> trying to find a working solution.
> 
> Would you please take a look at the following two patches?
> 
> https://marc.info/?l=linux-block&m=152325456307423&w=2
> https://marc.info/?l=linux-block&m=152325457607425&w=2
> 
> In addition, when I tested blk-throtl io.low on an NVMe card, I always
> found that even if the iops had been lower than the io.low limit for a
> while, the downgrade always failed because of this part of the group
> idle check:
> 
>        tg->latency_target && tg->bio_cnt &&
>        tg->bad_bio_cnt * 5 < tg->bio_cnt
> 

I'm afraid the latency check is a must for io.low, because from my tests
the idle time check only applies to simple scenarios.

Yes, in some cases last_low_overflow_time does have problems.
For the issue of not downgrading properly, I've also posted two patches
before, which are waiting for Shaohua's review. You can give them a try
as well.

https://patchwork.kernel.org/patch/10177185/
https://patchwork.kernel.org/patch/10177187/

Thanks,
Joseph

> The latency always looks fine even when the sum of the two groups' iops
> has reached the top.  So I disabled this check in my test; with that,
> plus the 2 patches above, io.low basically works.
> 
> My NVMe card's max bps is ~600M, and max iops is ~160k.
> Here is my config
> io.low riops=50000 wiops=50000 rbps=209715200 wbps=209715200 idle=200 latency=10
> io.max riops=150000
> There are two cgroups in my test, and both of them have the same config.
> 
> In addition, I say "basically works" because the iops of the two cgroups
> jump up and down.  For example, when I launched one fio test per cgroup,
> the iops fluctuated as follows:
> 
> group0   30k  50k   70k   60k  40k
> group1   120k 100k  80k   90k  110k
> 
> However, if I launched two fio tests in only one cgroup, the iops of the
> two tests stayed at about 70k~80k.
> 
> Could you help explain this scenario?
> 
> Thanks in advance
> Jianchao
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: testing io.low limit for blk-throttle
  2018-04-26 17:27           ` Paolo Valente
@ 2018-04-27  3:27             ` Joseph Qi
  2018-04-27  5:14                 ` Paolo Valente
  0 siblings, 1 reply; 23+ messages in thread
From: Joseph Qi @ 2018-04-27  3:27 UTC (permalink / raw)
  To: Paolo Valente
  Cc: linux-block, Jens Axboe, Shaohua Li, Mark Brown, Linus Walleij,
	Ulf Hansson, LKML, Tejun Heo,
	'Paolo Valente' via bfq-iosched

Hi Paolo,

On 18/4/27 01:27, Paolo Valente wrote:
> 
> 
>> On 25 Apr 2018, at 14:13, Joseph Qi <jiangqi903@gmail.com> wrote:
>>
>> Hi Paolo,
>>
> 
> Hi Joseph
> 
>> ...
>> Could you run blktrace as well when testing your case? There are several
>>> throtl traces to help analyze whether it is caused by frequent
>>> upgrades/downgrades.
> 
> Certainly.  You can find a trace attached.  Unfortunately, I'm not
> familiar with the internals of blk-throttle and low limit, so, if you
> want me to analyze the trace, give me some hints on what I have to
> look for.  Otherwise, I'll be happy to learn from your analysis.
> 

I've taken a glance at the blktrace you attached. It only upgrades at first
and then frequently downgrades (just adjusting the limit, not falling back
to LIMIT_LOW).  But I don't know why it always considers the throttle group
not idle.

For example:
fio-2336  [004] d...   428.458249:   8,16   m   N throtl avg_idle=90, idle_threshold=1000, bad_bio=10, total_bio=84, is_idle=0, scale=9
fio-2336  [004] d...   428.458251:   8,16   m   N throtl downgrade, scale 4

In throtl_tg_is_idle():
is_idle = ... ||
	(tg->latency_target && tg->bio_cnt &&
	 tg->bad_bio_cnt * 5 < tg->bio_cnt);

It should be judged idle and allowed to use more bandwidth. But here the
result shows not idle (is_idle=0). I have to do more investigation to
figure out why.

You can also filter these logs using:
grep throtl trace | grep -E 'upgrade|downgrade|is_idle'

Thanks,
Joseph

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: testing io.low limit for blk-throttle
  2018-04-27  3:27             ` Joseph Qi
@ 2018-04-27  5:14                 ` Paolo Valente
  0 siblings, 0 replies; 23+ messages in thread
From: Paolo Valente @ 2018-04-27  5:14 UTC (permalink / raw)
  To: Joseph Qi
  Cc: linux-block, Jens Axboe, Shaohua Li, Mark Brown, Linus Walleij,
	Ulf Hansson, LKML, Tejun Heo,
	'Paolo Valente' via bfq-iosched



> On 27 Apr 2018, at 05:27, Joseph Qi <jiangqi903@gmail.com> wrote:
> 
> Hi Paolo,
> 
> On 18/4/27 01:27, Paolo Valente wrote:
>> 
>> 
>>> On 25 Apr 2018, at 14:13, Joseph Qi <jiangqi903@gmail.com> wrote:
>>> 
>>> Hi Paolo,
>>> 
>> 
>> Hi Joseph
>> 
>>> ...
>>> Could you run blktrace as well when testing your case? There are several
>>> throtl traces to help analyze whether it is caused by frequent
>>> upgrades/downgrades.
>> 
>> Certainly.  You can find a trace attached.  Unfortunately, I'm not
>> familiar with the internals of blk-throttle and low limit, so, if you
>> want me to analyze the trace, give me some hints on what I have to
>> look for.  Otherwise, I'll be happy to learn from your analysis.
>> 
> 
> I've taken a glance at the blktrace you attached. It only upgrades at first
> and then frequently downgrades (just adjusting the limit, not falling back
> to LIMIT_LOW).  But I don't know why it always considers the throttle group
> not idle.
> 
> For example:
> fio-2336  [004] d...   428.458249:   8,16   m   N throtl avg_idle=90, idle_threshold=1000, bad_bio=10, total_bio=84, is_idle=0, scale=9
> fio-2336  [004] d...   428.458251:   8,16   m   N throtl downgrade, scale 4
> 
> In throtl_tg_is_idle():
> is_idle = ... ||
> 	(tg->latency_target && tg->bio_cnt &&
> 	 tg->bad_bio_cnt * 5 < tg->bio_cnt);
> 
> It should be judged idle and allowed to use more bandwidth. But here the
> result shows not idle (is_idle=0). I have to do more investigation to
> figure out why.
> 

Hi Joseph,
actually this doesn't surprise me much: for this scenario I expected
exactly that blk-throttle would consider the random-I/O group, for
most of the time,
1) non-idle,
2) above the 100usec target latency, and
3) below the low limit.

In fact,
1) The group can evidently issue I/O at a much higher rate than the
rate it receives, so, immediately after its last pending I/O has been
served, the group issues new I/O; in the end, it is non-idle most of
the time.
2) To try to enforce the 10MB/s limit, blk-throttle necessarily makes
the group oscillate around 10MB/s, which means that the group is
frequently below the limit (this would not hold only if the group had
actually received much more than 10MB/s, which is not the case).
3) For each of the 4k random I/Os of the group, the time needed by the
drive to serve that I/O is already around 40-50usec.  So, since the
group is of course not constantly in service, it is very easy for the
latency of most of the group's I/Os to go beyond 100usec because of
throttling.

But, as is often the case for me, I might have simply misunderstood
the blk-throttle parameters, and I might just be wrong here.

Thanks,
Paolo

> You can also filter these logs using:
> grep throtl trace | grep -E 'upgrade|downgrade|is_idle'
> 
> Thanks,
> Joseph

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: testing io.low limit for blk-throttle
@ 2018-04-27  5:14                 ` Paolo Valente
  0 siblings, 0 replies; 23+ messages in thread
From: Paolo Valente @ 2018-04-27  5:14 UTC (permalink / raw)
  To: Joseph Qi
  Cc: linux-block, Jens Axboe, Shaohua Li, Mark Brown, Linus Walleij,
	Ulf Hansson, LKML, Tejun Heo,
	'Paolo Valente' via bfq-iosched



> On 27 Apr 2018, at 05:27, Joseph Qi <jiangqi903@gmail.com> wrote:
> 
> Hi Paolo,
> 
> On 18/4/27 01:27, Paolo Valente wrote:
>> 
>> 
>>> On 25 Apr 2018, at 14:13, Joseph Qi <jiangqi903@gmail.com> wrote:
>>> 
>>> Hi Paolo,
>>> 
>> 
>> Hi Joseph
>> 
>>> ...
>>> Could you run blktrace as well when testing your case? There are several
>>> throtl traces to help analyze whether it is caused by frequent
>>> upgrades/downgrades.
>> 
>> Certainly.  You can find a trace attached.  Unfortunately, I'm not
>> familiar with the internals of blk-throttle and low limit, so, if you
>> want me to analyze the trace, give me some hints on what I have to
>> look for.  Otherwise, I'll be happy to learn from your analysis.
>> 
> 
> I've taken a glance at the blktrace you attached. It only upgrades at first
> and then frequently downgrades (just adjusting the limit, not falling back
> to LIMIT_LOW).  But I don't know why it always considers the throttle group
> not idle.
> 
> For example:
> fio-2336  [004] d...   428.458249:   8,16   m   N throtl avg_idle=90, idle_threshold=1000, bad_bio=10, total_bio=84, is_idle=0, scale=9
> fio-2336  [004] d...   428.458251:   8,16   m   N throtl downgrade, scale 4
> 
> In throtl_tg_is_idle():
> is_idle = ... ||
> 	(tg->latency_target && tg->bio_cnt &&
> 	 tg->bad_bio_cnt * 5 < tg->bio_cnt);
> 
> It should be judged idle and allowed to use more bandwidth. But here the
> result shows not idle (is_idle=0). I have to do more investigation to
> figure out why.
> 

Hi Joseph,
actually this doesn't surprise me much: for this scenario I expected
exactly that blk-throttle would consider the random-I/O group, for
most of the time,
1) non-idle,
2) above the 100usec target latency, and
3) below the low limit.

In fact,
1) The group can evidently issue I/O at a much higher rate than the
rate it receives, so, immediately after its last pending I/O has been
served, the group issues new I/O; in the end, it is non-idle most of
the time.
2) To try to enforce the 10MB/s limit, blk-throttle necessarily makes
the group oscillate around 10MB/s, which means that the group is
frequently below the limit (this would not hold only if the group had
actually received much more than 10MB/s, which is not the case).
3) For each of the 4k random I/Os of the group, the time needed by the
drive to serve that I/O is already around 40-50usec.  So, since the
group is of course not constantly in service, it is very easy for the
latency of most of the group's I/Os to go beyond 100usec because of
throttling.

But, as is often the case for me, I might have simply misunderstood
the blk-throttle parameters, and I might just be wrong here.

Thanks,
Paolo

> You can also filter these logs using:
> grep throtl trace | grep -E 'upgrade|downgrade|is_idle'
> 
> Thanks,
> Joseph

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: testing io.low limit for blk-throttle
  2018-04-26 18:32         ` Tejun Heo
@ 2018-05-03 16:35             ` Paolo Valente
  2018-05-03 16:35             ` Paolo Valente
  1 sibling, 0 replies; 23+ messages in thread
From: Paolo Valente @ 2018-05-03 16:35 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Joseph Qi, linux-block, Jens Axboe, Shaohua Li, Mark Brown,
	Linus Walleij, Ulf Hansson, LKML



> On 26 Apr 2018, at 20:32, Tejun Heo <tj@kernel.org> wrote:
> 
> Hello,
> 
> On Tue, Apr 24, 2018 at 02:12:51PM +0200, Paolo Valente wrote:
>> +Tejun (I guess he might be interested in the results below)
> 
> Our experiments didn't work out too well either.  At this point, it
> isn't clear whether io.low will ever leave experimental state.  We're
> trying to find a working solution.
> 

Thanks for this update, Tejun.  I'm still working (very slowly) on a
survey of the current state of affairs in terms of bandwidth and
latency guarantees in the block layer.  The synthesis of the results
I've collected so far is, more or less:

"The problem of reaching a high throughput and, at the same time,
guaranteeing bandwidth and latency is still unsolved, apart from
simple cases, such as homogeneous, constant workloads"

I'm saying this up front because I don't want to risk underestimating
anybody's work.  So, if anyone has examples of how, e.g., to
distribute I/O bandwidth as desired among heterogeneous workloads (for
instance, random vs sequential workloads) that might fluctuate over
time, without losing total throughput, please tell me, and I'll test
them.

Thanks,
Paolo

> Thanks.
> 
> -- 
> tejun

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: testing io.low limit for blk-throttle
@ 2018-05-03 16:35             ` Paolo Valente
  0 siblings, 0 replies; 23+ messages in thread
From: Paolo Valente @ 2018-05-03 16:35 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Joseph Qi, linux-block, Jens Axboe, Shaohua Li, Mark Brown,
	Linus Walleij, Ulf Hansson, LKML



> On 26 Apr 2018, at 20:32, Tejun Heo <tj@kernel.org> wrote:
> 
> Hello,
> 
> On Tue, Apr 24, 2018 at 02:12:51PM +0200, Paolo Valente wrote:
>> +Tejun (I guess he might be interested in the results below)
> 
> Our experiments didn't work out too well either.  At this point, it
> isn't clear whether io.low will ever leave experimental state.  We're
> trying to find a working solution.
> 

Thanks for this update, Tejun.  I'm still working (very slowly) on a
survey of the current state of affairs in terms of bandwidth and
latency guarantees in the block layer.  The synthesis of the results
I've collected so far is, more or less:

"The problem of reaching a high throughput and, at the same time,
guaranteeing bandwidth and latency is still unsolved, apart from
simple cases, such as homogeneous, constant workloads"

I'm saying this up front because I don't want to risk underestimating
anybody's work.  So, if anyone has examples of how, e.g., to
distribute I/O bandwidth as desired among heterogeneous workloads (for
instance, random vs sequential workloads) that might fluctuate over
time, without losing total throughput, please tell me, and I'll test
them.

Thanks,
Paolo

> Thanks.
> 
> -- 
> tejun

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2018-05-03 16:35 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-22  9:23 testing io.low limit for blk-throttle Paolo Valente
2018-04-22 13:29 ` jianchao.wang
2018-04-22 15:53   ` Paolo Valente
2018-04-23  2:19     ` jianchao.wang
2018-04-23  5:32       ` Paolo Valente
2018-04-23  6:35         ` jianchao.wang
2018-04-23  7:37           ` Paolo Valente
2018-04-23  8:26             ` jianchao.wang
2018-04-23  6:05 ` Joseph Qi
2018-04-23  7:35   ` Paolo Valente
2018-04-23  9:01     ` Joseph Qi
2018-04-24 12:12       ` Paolo Valente
2018-04-24 12:12         ` Paolo Valente
2018-04-25 12:13         ` Joseph Qi
2018-04-26 17:27           ` Paolo Valente
2018-04-27  3:27             ` Joseph Qi
2018-04-27  5:14               ` Paolo Valente
2018-04-27  5:14                 ` Paolo Valente
2018-04-26 18:32         ` Tejun Heo
2018-04-27  2:09           ` jianchao.wang
2018-04-27  2:40             ` Joseph Qi
2018-05-03 16:35           ` Paolo Valente
2018-05-03 16:35             ` Paolo Valente

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.