* high overhead of functions blkg_*stats_* in bfq
@ 2017-10-17 10:11 Paolo Valente
  2017-10-17 12:45 ` Paolo Valente
  2017-10-18 13:19 ` Tejun Heo
  0 siblings, 2 replies; 35+ messages in thread
From: Paolo Valente @ 2017-10-17 10:11 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-block, Luca Miccio

Hi Tejun, all,
in our work for reducing bfq overhead, we bumped into an unexpected
fact: the functions blkg_*stats_*, invoked in bfq to update cgroups
statistics as in cfq, take about 40% of the total execution time of
bfq.  This causes an additional serious slowdown on any multicore cpu,
as most bfq functions, from which blkg_*stats_* get invoked, are
protected by a per-device scheduler lock.  To give you an idea, on an
Intel i7-4850HQ, and with 8 threads doing random I/O in parallel on
null_blk (configured with 0 latency), if the update of groups stats is
removed, then the throughput grows from 260 to 404 KIOPS.  This and
all the other results we might share in this thread can be reproduced
very easily with a (useful) script made by Luca Miccio [1].

We tried to understand the reason for this high overhead, and, in
particular, to find out whether there was some issue that we
could address on our own.  But the causes seem somehow substantial:
one of the most time-consuming operations needed by some blkg_*stats_*
functions is, e.g., find_next_bit, for which we don't see any trivial
replacement.

So, as a first attempt to reduce this severe slowdown, we have made a
patch that moves the invocation of blkg_*stats_* functions outside the
critical sections protected by the bfq lock.  Still, these functions
apparently need to be protected with the request_queue lock, because
the group they are invoked on may otherwise disappear before or while
these functions are executed.  Fortunately, tests run without even
this lock have shown that the serialization caused by this lock has
little impact (a 5% throughput reduction).  As for results, moving
these functions outside the bfq lock does improve throughput: it
grows, e.g., from 260 to 316 KIOPS in the above test case.  But we are
still rather far from the optimum.

Do you have any clue about possible solutions to reduce the overhead
of these functions?  If no relatively quick solution is available, we
are planning to prepare, in addition to the above patch to increase
parallelism, a further patch to give the user the possibility to
disable stats update, so as to gain a full throughput boost of up to
55% (according to the tests we have run so far on a few different
systems).

Thanks,
Paolo

[1] https://github.com/Algodev-github/IOSpeed

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: high overhead of functions blkg_*stats_* in bfq
  2017-10-17 10:11 high overhead of functions blkg_*stats_* in bfq Paolo Valente
@ 2017-10-17 12:45 ` Paolo Valente
  2017-10-17 16:45   ` Linus Walleij
  2017-10-18 13:19 ` Tejun Heo
  1 sibling, 1 reply; 35+ messages in thread
From: Paolo Valente @ 2017-10-17 12:45 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-block, Luca Miccio, Mark Brown, Ulf Hansson, Linus Walleij

+Ulf Hansson, Mark Brown, Linus Walleij

> On 17 Oct 2017, at 12:11, Paolo Valente <paolo.valente@linaro.org> wrote:
>
> Hi Tejun, all,
> in our work for reducing bfq overhead, we bumped into an unexpected
> fact: the functions blkg_*stats_*, invoked in bfq to update cgroups
> statistics as in cfq, take about 40% of the total execution time of
> bfq.  This causes an additional serious slowdown on any multicore cpu,
> as most bfq functions, from which blkg_*stats_* get invoked, are
> protected by a per-device scheduler lock.  To give you an idea, on an
> Intel i7-4850HQ, and with 8 threads doing random I/O in parallel on
> null_blk (configured with 0 latency), if the update of groups stats is
> removed, then the throughput grows from 260 to 404 KIOPS.  This and
> all the other results we might share in this thread can be reproduced
> very easily with a (useful) script made by Luca Miccio [1].
>
> We tried to understand the reason for this high overhead, and, in
> particular, to find out whether there was some issue that we
> could address on our own.  But the causes seem somehow substantial:
> one of the most time-consuming operations needed by some blkg_*stats_*
> functions is, e.g., find_next_bit, for which we don't see any trivial
> replacement.
>
> So, as a first attempt to reduce this severe slowdown, we have made a
> patch that moves the invocation of blkg_*stats_* functions outside the
> critical sections protected by the bfq lock.  Still, these functions
> apparently need to be protected with the request_queue lock, because
> the group they are invoked on may otherwise disappear before or while
> these functions are executed.  Fortunately, tests run without even
> this lock have shown that the serialization caused by this lock has
> little impact (a 5% throughput reduction).  As for results, moving
> these functions outside the bfq lock does improve throughput: it
> grows, e.g., from 260 to 316 KIOPS in the above test case.  But we are
> still rather far from the optimum.
>
> Do you have any clue about possible solutions to reduce the overhead
> of these functions?  If no relatively quick solution is available, we
> are planning to prepare, in addition to the above patch to increase
> parallelism, a further patch to give the user the possibility to
> disable stats update, so as to gain a full throughput boost of up to
> 55% (according to the tests we have run so far on a few different
> systems).
>
> Thanks,
> Paolo
>
> [1] https://github.com/Algodev-github/IOSpeed


* Re: high overhead of functions blkg_*stats_* in bfq
  2017-10-17 12:45 ` Paolo Valente
@ 2017-10-17 16:45   ` Linus Walleij
  2017-10-17 16:49     ` Jens Axboe
  0 siblings, 1 reply; 35+ messages in thread
From: Linus Walleij @ 2017-10-17 16:45 UTC (permalink / raw)
  To: Paolo Valente, H. Peter Anvin, Akinobu Mita, David Howells
  Cc: Tejun Heo, linux-block, Luca Miccio, Mark Brown, Ulf Hansson

On Tue, Oct 17, 2017 at 2:45 PM, Paolo Valente <paolo.valente@linaro.org> wrote:

> one of the most time-consuming operations needed by some blkg_*stats_*
> functions is, e.g., find_next_bit, for which we don't see any trivial
> replacement.

So this is one of the things that often falls down to a per-arch
assembly optimization, c.f. arch/arm/include/asm/bitops.h

On x86 I can't see any assembly optimization of this, so the
generic routines in lib/find_bit.c are used AFAICT.

This might be a silly question, but if you are testing this on x86,
do you think it would help if someone stepped in and slapped in
some optimized assembly for those functions?

(I guess that is like saying, "instead of a trivial replacement,
what about a really complicated one"?)

A simple git log arch/x86/include/asm/bitops.h doesn't show
any traces of anyone trying to optimize those for x86.

I paged in the x86 assembly people; they definitely know whether
that is a good idea or if it sucks. (And if it was done in the past.)

Yours,
Linus Walleij


* Re: high overhead of functions blkg_*stats_* in bfq
  2017-10-17 16:45   ` Linus Walleij
@ 2017-10-17 16:49     ` Jens Axboe
  0 siblings, 0 replies; 35+ messages in thread
From: Jens Axboe @ 2017-10-17 16:49 UTC (permalink / raw)
  To: Linus Walleij, Paolo Valente, H. Peter Anvin, Akinobu Mita,
	David Howells
  Cc: Tejun Heo, linux-block, Luca Miccio, Mark Brown, Ulf Hansson

On 10/17/2017 10:45 AM, Linus Walleij wrote:
> On Tue, Oct 17, 2017 at 2:45 PM, Paolo Valente <paolo.valente@linaro.org> wrote:
> 
>> one of the most time-consuming operations needed by some blkg_*stats_*
>> functions is, e.g., find_next_bit, for which we don't see any trivial
>> replacement.
> 
> So this is one of the things that often falls down to a per-arch
> assembly optimization, c.f. arch/arm/include/asm/bitops.h
> 
> On x86 I can't see any assembly optimization of this, so the
> generic routines in lib/find_bit.c are used AFAICT.
> 
> This might be a silly question, but if you are testing this on x86,
> do you think it would help if someone stepped in and slapped in
> some optimized assembly for those functions?
> 
> (I guess that is like saying, "instead of a trivial replacement,
> what about a really complicated one"?)
> 
> A simple git log arch/x86/include/asm/bitops.h doesn't show
> any traces of anyone trying to optimize those for x86.
> 
> I paged in the x86 assembly people; they definitely know whether
> that is a good idea or if it sucks. (And if it was done in the past.)

If the problem is as big as described, I don't think an optimized
version will matter at all. Maybe it'll make things 10% faster, but
that's not solving the issue.

It's probably more likely that a better data structure should be
used, if we're spending a lot of time in find_bit. Maybe this
happens when the space is mostly full? A bitmap of bitmaps
might help for that.

But I'm just guessing here, as I haven't looked into the problem.

-- 
Jens Axboe


* Re: high overhead of functions blkg_*stats_* in bfq
  2017-10-17 10:11 high overhead of functions blkg_*stats_* in bfq Paolo Valente
  2017-10-17 12:45 ` Paolo Valente
@ 2017-10-18 13:19 ` Tejun Heo
  2017-10-18 14:45   ` Jens Axboe
                     ` (2 more replies)
  1 sibling, 3 replies; 35+ messages in thread
From: Tejun Heo @ 2017-10-18 13:19 UTC (permalink / raw)
  To: Paolo Valente; +Cc: linux-block, Luca Miccio

Hello, Paolo.

On Tue, Oct 17, 2017 at 12:11:01PM +0200, Paolo Valente wrote:
...
> protected by a per-device scheduler lock.  To give you an idea, on an
> Intel i7-4850HQ, and with 8 threads doing random I/O in parallel on
> null_blk (configured with 0 latency), if the update of groups stats is
> removed, then the throughput grows from 260 to 404 KIOPS.  This and
> all the other results we might share in this thread can be reproduced
> very easily with a (useful) script made by Luca Miccio [1].

I don't think the old request_queue is ever built for multiple CPUs
hitting on a mem-backed device.

> We tried to understand the reason for this high overhead, and, in
> particular, to find out whether there was some issue that we
> could address on our own.  But the causes seem somehow substantial:
> one of the most time-consuming operations needed by some blkg_*stats_*
> functions is, e.g., find_next_bit, for which we don't see any trivial
> replacement.

Can you point to the specific ones?  I can't find find_next_bit usages
in generic blkg code.

> So, as a first attempt to reduce this severe slowdown, we have made a
> patch that moves the invocation of blkg_*stats_* functions outside the
> critical sections protected by the bfq lock.  Still, these functions
> apparently need to be protected with the request_queue lock, because

blkgs are already protected with RCU, so RCU protection should be
enough.

Thanks.

-- 
tejun


* Re: high overhead of functions blkg_*stats_* in bfq
  2017-10-18 13:19 ` Tejun Heo
@ 2017-10-18 14:45   ` Jens Axboe
  2017-10-18 15:05     ` Paolo Valente
       [not found]   ` <2B52CB68-213C-470F-945C-0ADFF9AA7A66@linaro.org>
  2017-11-05  7:39   ` Paolo Valente
  2 siblings, 1 reply; 35+ messages in thread
From: Jens Axboe @ 2017-10-18 14:45 UTC (permalink / raw)
  To: Tejun Heo, Paolo Valente; +Cc: linux-block, Luca Miccio

On 10/18/2017 07:19 AM, Tejun Heo wrote:
>> We tried to understand the reason for this high overhead, and, in
>> particular, to find out whether there was some issue that we
>> could address on our own.  But the causes seem somehow substantial:
>> one of the most time-consuming operations needed by some blkg_*stats_*
>> functions is, e.g., find_next_bit, for which we don't see any trivial
>> replacement.
> 
> Can you point to the specific ones?  I can't find find_next_bit usages
> in generic blkg code.

Yeah, in general a report like this is pretty much useless without
any sort of call traces or perf output. The best way to get help
is to post exactly what to run to reproduce the performance issue,
and profile output that shows/highlights the issues.

-- 
Jens Axboe


* Re: high overhead of functions blkg_*stats_* in bfq
  2017-10-18 14:45   ` Jens Axboe
@ 2017-10-18 15:05     ` Paolo Valente
  2017-10-18 15:44       ` Jens Axboe
  0 siblings, 1 reply; 35+ messages in thread
From: Paolo Valente @ 2017-10-18 15:05 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Tejun Heo, linux-block, Luca Miccio


> On 18 Oct 2017, at 16:45, Jens Axboe <axboe@kernel.dk> wrote:
>
> On 10/18/2017 07:19 AM, Tejun Heo wrote:
>>> We tried to understand the reason for this high overhead, and, in
>>> particular, to find out whether there was some issue that we
>>> could address on our own.  But the causes seem somehow substantial:
>>> one of the most time-consuming operations needed by some blkg_*stats_*
>>> functions is, e.g., find_next_bit, for which we don't see any trivial
>>> replacement.
>>
>> Can you point to the specific ones?  I can't find find_next_bit usages
>> in generic blkg code.
>
> Yeah, in general a report like this is pretty much useless without
> any sort of call traces or perf output. The best way to get help
> is to post exactly what to run to reproduce the performance issue,
> and profile output that shows/highlights the issues.
>

Yes, sorry.  To be very brief, I just provided a link to the script
with which one can immediately reproduce the issue.

I hope the information I have now provided in my reply to Tejun is
enough.

Thanks,
Paolo

> -- 
> Jens Axboe
>


* Re: high overhead of functions blkg_*stats_* in bfq
       [not found]   ` <2B52CB68-213C-470F-945C-0ADFF9AA7A66@linaro.org>
@ 2017-10-18 15:08     ` Paolo Valente
  2017-10-18 15:40       ` Paolo Valente
       [not found]       ` <D6586934-DF02-4102-8839-8912DFA86BB0@linaro.org>
  0 siblings, 2 replies; 35+ messages in thread
From: Paolo Valente @ 2017-10-18 15:08 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-block, Luca Miccio, hpa, akinobu.mita, dhowells,
	Mark Brown, Ulf Hansson, Linus Walleij

Adding again Ulf, Linus and all the others, because Tejun replied to my
initial email, which did not include them yet as recipients.

> On 18 Oct 2017, at 17:02, Paolo Valente <paolo.valente@linaro.org> wrote:
>
>>
>> On 18 Oct 2017, at 15:19, Tejun Heo <tj@kernel.org> wrote:
>>
>> Hello, Paolo.
>>
>> On Tue, Oct 17, 2017 at 12:11:01PM +0200, Paolo Valente wrote:
>> ...
>>> protected by a per-device scheduler lock.  To give you an idea, on an
>>> Intel i7-4850HQ, and with 8 threads doing random I/O in parallel on
>>> null_blk (configured with 0 latency), if the update of groups stats is
>>> removed, then the throughput grows from 260 to 404 KIOPS.  This and
>>> all the other results we might share in this thread can be reproduced
>>> very easily with a (useful) script made by Luca Miccio [1].
>>
>> I don't think the old request_queue is ever built for multiple CPUs
>> hitting on a mem-backed device.
>>
>>> We tried to understand the reason for this high overhead, and, in
>>> particular, to find out whether there was some issue that we
>>> could address on our own.  But the causes seem somehow substantial:
>>> one of the most time-consuming operations needed by some blkg_*stats_*
>>> functions is, e.g., find_next_bit, for which we don't see any trivial
>>> replacement.
>>
>> Can you point to the specific ones?  I can't find find_next_bit usages
>> in generic blkg code.
>>
>
> Yes, sorry for being too generic in the first place (fear of writing too
> much).
>
> I have attached a flame graph (made by Luca), showing all involved
> functions.  Look, e.g., for the blkg_*stat_* functions invoked
> indirectly by bfq_dispatch_request, inside any of the worker
> processes.  As I already wrote, find_next_bit seems to be only part of
> the cost of these functions (although an important part).
>
> You can obtain/reproduce the information in the flame graph (on an
> 8-logical-core CPU), by invoking
>
> perf record -g -a --call-graph dwarf -F 999
>
> and, in parallel,
>
> sudo ./IO_sched-speedtest.sh 20 8 bfq randread
>
> where IO_sched-speedtest.sh is the script I mentioned in my previous
> email [1]
>
> [1] https://github.com/Algodev-github/IOSpeed
>
>>> So, as a first attempt to reduce this severe slowdown, we have made a
>>> patch that moves the invocation of blkg_*stats_* functions outside the
>>> critical sections protected by the bfq lock.  Still, these functions
>>> apparently need to be protected with the request_queue lock, because
>>
>> blkgs are already protected with RCU, so RCU protection should be
>> enough.
>>
>
> blkgs are, but the blkg_stat objects passed to the blkg_*stat_*
> functions by bfq are not.  In particular, these objects are contained
> in bfq_group objects.  Anyway, as I wrote, the cost of using the
> request_queue lock seems to be a loss of about 5% of the throughput.
> So, I guess that replacing this lock with RCU protection would
> probably reduce this loss to only 2% or 3%.  I wonder whether such a
> gain would be worth the additional conceptual complexity of RCU; at
> least until the major problem, i.e., the apparently high cost of stat
> update, is solved somehow.
>
> Thanks,
> Paolo
>
>> Thanks.
>>
>> -- 
>> tejun
>
> <bfq-tracing-cgroup.svg>


* Re: high overhead of functions blkg_*stats_* in bfq
  2017-10-18 15:08     ` Paolo Valente
@ 2017-10-18 15:40       ` Paolo Valente
       [not found]       ` <D6586934-DF02-4102-8839-8912DFA86BB0@linaro.org>
  1 sibling, 0 replies; 35+ messages in thread
From: Paolo Valente @ 2017-10-18 15:40 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-block, Luca Miccio, hpa, akinobu.mita, dhowells,
	Mark Brown, Ulf Hansson, Linus Walleij


> On 18 Oct 2017, at 17:08, Paolo Valente <paolo.valente@linaro.org> wrote:
>
> Adding again Ulf, Linus and all the others, because Tejun replied to my
> initial email, which did not include them yet as recipients.
>
>> On 18 Oct 2017, at 17:02, Paolo Valente <paolo.valente@linaro.org> wrote:
>>
>>>
>>> On 18 Oct 2017, at 15:19, Tejun Heo <tj@kernel.org> wrote:
>>>
>>> Hello, Paolo.
>>>
>>> On Tue, Oct 17, 2017 at 12:11:01PM +0200, Paolo Valente wrote:
>>> ...
>>>> protected by a per-device scheduler lock.  To give you an idea, on an
>>>> Intel i7-4850HQ, and with 8 threads doing random I/O in parallel on
>>>> null_blk (configured with 0 latency), if the update of groups stats is
>>>> removed, then the throughput grows from 260 to 404 KIOPS.  This and
>>>> all the other results we might share in this thread can be reproduced
>>>> very easily with a (useful) script made by Luca Miccio [1].
>>>
>>> I don't think the old request_queue is ever built for multiple CPUs
>>> hitting on a mem-backed device.
>>>
>>>> We tried to understand the reason for this high overhead, and, in
>>>> particular, to find out whether there was some issue that we
>>>> could address on our own.  But the causes seem somehow substantial:
>>>> one of the most time-consuming operations needed by some blkg_*stats_*
>>>> functions is, e.g., find_next_bit, for which we don't see any trivial
>>>> replacement.
>>>
>>> Can you point to the specific ones?  I can't find find_next_bit usages
>>> in generic blkg code.
>>>
>>
>> Yes, sorry for being too generic in the first place (fear of writing too
>> much).
>>
>> I have attached a flame graph (made by Luca), showing all involved
>> functions.  Look, e.g., for the blkg_*stat_* functions invoked
>> indirectly by bfq_dispatch_request, inside any of the worker
>> processes.  As I already wrote, find_next_bit seems to be only part of
>> the cost of these functions (although an important part).
>>
>> You can obtain/reproduce the information in the flame graph (on an
>> 8-logical-core CPU),

Sorry, that flame graph was actually made from a test run on an AMD
A8-3850 with 4 physical/4 logical cores, and passing 4, to the script,
for the number of fio threads.  Anyway, we got the same results on all
the different CPUs we used for our tests.

Thanks,
Paolo

>> by invoking
>>
>> perf record -g -a --call-graph dwarf -F 999
>>
>> and, in parallel,
>>
>> sudo ./IO_sched-speedtest.sh 20 8 bfq randread
>>
>> where IO_sched-speedtest.sh is the script I mentioned in my previous
>> email [1]
>>
>> [1] https://github.com/Algodev-github/IOSpeed
>>
>>>> So, as a first attempt to reduce this severe slowdown, we have made a
>>>> patch that moves the invocation of blkg_*stats_* functions outside the
>>>> critical sections protected by the bfq lock.  Still, these functions
>>>> apparently need to be protected with the request_queue lock, because
>>>
>>> blkgs are already protected with RCU, so RCU protection should be
>>> enough.
>>>
>>
>> blkgs are, but the blkg_stat objects passed to the blkg_*stat_*
>> functions by bfq are not.  In particular, these objects are contained
>> in bfq_group objects.  Anyway, as I wrote, the cost of using the
>> request_queue lock seems to be a loss of about 5% of the throughput.
>> So, I guess that replacing this lock with RCU protection would
>> probably reduce this loss to only 2% or 3%.  I wonder whether such a
>> gain would be worth the additional conceptual complexity of RCU; at
>> least until the major problem, i.e., the apparently high cost of stat
>> update, is solved somehow.
>>
>> Thanks,
>> Paolo
>>
>>> Thanks.
>>>
>>> -- 
>>> tejun
>>
>> <bfq-tracing-cgroup.svg>
>


* Re: high overhead of functions blkg_*stats_* in bfq
  2017-10-18 15:05     ` Paolo Valente
@ 2017-10-18 15:44       ` Jens Axboe
  0 siblings, 0 replies; 35+ messages in thread
From: Jens Axboe @ 2017-10-18 15:44 UTC (permalink / raw)
  To: Paolo Valente; +Cc: Tejun Heo, linux-block, Luca Miccio

On 10/18/2017 09:05 AM, Paolo Valente wrote:
> 
>> On 18 Oct 2017, at 16:45, Jens Axboe <axboe@kernel.dk> wrote:
>>
>> On 10/18/2017 07:19 AM, Tejun Heo wrote:
>>>> We tried to understand the reason for this high overhead, and, in
>>>> particular, to find out whether there was some issue that we
>>>> could address on our own.  But the causes seem somehow substantial:
>>>> one of the most time-consuming operations needed by some blkg_*stats_*
>>>> functions is, e.g., find_next_bit, for which we don't see any trivial
>>>> replacement.
>>>
>>> Can you point to the specific ones?  I can't find find_next_bit usages
>>> in generic blkg code.
>>
>> Yeah, in general a report like this is pretty much useless without
>> any sort of call traces or perf output. The best way to get help
>> is to post exactly what to run to reproduce the performance issue,
>> and profile output that shows/highlights the issues.
>>
> 
> Yes, sorry.  To be very brief, I just provided a link to the script
> with which one can immediately reproduce the issue.

Brief is about the number of words; you can never include too much
actual information or data ;-)

> I hope the information I have now provided in my reply to Tejun is
> enough.

The picture is no longer attached. What list was this on?

-- 
Jens Axboe


* Re: high overhead of functions blkg_*stats_* in bfq
       [not found]       ` <D6586934-DF02-4102-8839-8912DFA86BB0@linaro.org>
@ 2017-10-19  6:50         ` Paolo Valente
  2017-10-21 16:13           ` Tejun Heo
  2017-10-30  9:49           ` David Howells
  0 siblings, 2 replies; 35+ messages in thread
From: Paolo Valente @ 2017-10-19  6:50 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-block, Luca Miccio, hpa, akinobu.mita, dhowells,
	Mark Brown, Ulf Hansson, Linus Walleij


> On 19 Oct 2017, at 08:46, Paolo Valente <paolo.valente@linaro.org> wrote:
> ...
>>>
>>> blkgs are, but the blkg_stat objects passed to the blkg_*stat_*
>>> functions by bfq are not.  In particular, these objects are contained
>>> in bfq_group objects.

I talked partial nonsense here.  bfqg objects are in bfq, but a bfqg
may die only after the corresponding blkg has died.  So, if RCU
protection guarantees that the blkg is alive when stat-update
functions are invoked, then the RCU guarantees that bfqg is alive too.
Still, I'm scratching my head over your claim that there is such a
protection.

The blkg obtained through a blkg_lookup, in a rcu_read section, is
protected.  But, outside that section, a pointer to that blkg is not
guaranteed to be valid any longer.  Stat-update functions seem safe in
cfq and bfq, just because they are executed within locks that happen
to be taken also before destroying the blkg.  They are the
request_queue lock for cfq and the scheduler lock for bfq.  Thus, at
least the request_queue lock apparently needs to be taken around
stat-update functions in bfq, if they are moved outside the section
protected by the scheduler lock.

Thanks,
Paolo

>>>   Anyway, as I wrote, the cost of using the
>>> request_queue lock seems to be a loss of about 5% of the throughput.
>>> So, I guess that replacing this lock with RCU protection would
>>> probably reduce this loss to only 2% or 3%.  I wonder whether such a
>>> gain would be worth the additional conceptual complexity of RCU; at
>>> least until the major problem, i.e., the apparently high cost of stat
>>> update, is solved somehow.
>>>
>>> Thanks,
>>> Paolo
>>>
>>>> Thanks.
>>>>
>>>> -- 
>>>> tejun
>>>
>>> <bfq-tracing-cgroup.svg>
>>
>


* Re: high overhead of functions blkg_*stats_* in bfq
  2017-10-19  6:50         ` Paolo Valente
@ 2017-10-21 16:13           ` Tejun Heo
  2017-10-22  8:25             ` Paolo Valente
  2017-10-30  9:49           ` David Howells
  1 sibling, 1 reply; 35+ messages in thread
From: Tejun Heo @ 2017-10-21 16:13 UTC (permalink / raw)
  To: Paolo Valente
  Cc: linux-block, Luca Miccio, hpa, akinobu.mita, dhowells,
	Mark Brown, Ulf Hansson, Linus Walleij

Hello, Paolo.

On Thu, Oct 19, 2017 at 08:50:17AM +0200, Paolo Valente wrote:
> The blkg obtained through a blkg_lookup, in a rcu_read section, is
> protected.  But, outside that section, a pointer to that blkg is not
> guaranteed to be valid any longer.  Stat-update functions seem safe in

blkg's destruction is rcu delayed.  If you have access to a blkg under
rcu, it won't get freed until the rcu read lock is released.

> cfq and bfq, just because they are executed within locks that happen
> to be taken also before destroying the blkg.  They are the
> request_queue lock for cfq and the scheduler lock for bfq.  Thus, at
> least the request_queue lock apparently needs to be taken around
> stat-update functions in bfq, if they are moved outside the section
> protected by the scheduler lock.

So, a blkg stays alive if the queue lock is held, or if the cgroup and
request_queue stay alive, and won't be freed (which is different from
being alive) as long as the RCU read lock is held.

Thanks.

-- 
tejun


* Re: high overhead of functions blkg_*stats_* in bfq
  2017-10-21 16:13           ` Tejun Heo
@ 2017-10-22  8:25             ` Paolo Valente
  0 siblings, 0 replies; 35+ messages in thread
From: Paolo Valente @ 2017-10-22  8:25 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-block, Luca Miccio, hpa, akinobu.mita, dhowells,
	Mark Brown, Ulf Hansson, Linus Walleij


> On 21 Oct 2017, at 18:13, Tejun Heo <tj@kernel.org> wrote:
>
> Hello, Paolo.
>
> On Thu, Oct 19, 2017 at 08:50:17AM +0200, Paolo Valente wrote:
>> The blkg obtained through a blkg_lookup, in a rcu_read section, is
>> protected.  But, outside that section, a pointer to that blkg is not
>> guaranteed to be valid any longer.  Stat-update functions seem safe in
>
> blkg's destruction is rcu delayed.  If you have access to a blkg under
> rcu, it won't get freed until the rcu read lock is released.
>
>> cfq and bfq, just because they are executed within locks that happen
>> to be taken also before destroying the blkg.  They are the
>> request_queue lock for cfq and the scheduler lock for bfq.  Thus, at
>> least the request_queue lock apparently needs to be taken around
>> stat-update functions in bfq, if they are moved outside the section
>> protected by the scheduler lock.
>
> So, a blkg stays alive if the queue lock is held, or if the cgroup and
> request_queue stay alive, and won't be freed (which is different from
> being alive) as long as the RCU read lock is held.
>

Ok, then I have to either protect these stats-update functions with
the queue lock, or somehow extend RCU protection to the bfq groups
referred to by these functions.  According to our tests with different
processors, the gain of the latter solution is probably around a 2 or
3% IOPS boost with respect to using just the queue lock.  That would
be a small gain, set against the current rather high overhead of the
functions to protect, and it would cost additional code complexity.
So I'm oriented towards using just the queue lock.

Thank you very much for helping me with these doubts,
Paolo

> Thanks.
>=20
> --=20
> tejun


* Re: high overhead of functions blkg_*stats_* in bfq
  2017-10-19  6:50         ` Paolo Valente
  2017-10-21 16:13           ` Tejun Heo
@ 2017-10-30  9:49           ` David Howells
  1 sibling, 0 replies; 35+ messages in thread
From: David Howells @ 2017-10-30  9:49 UTC (permalink / raw)
  To: Tejun Heo
  Cc: dhowells, Paolo Valente, linux-block, Luca Miccio, hpa,
	akinobu.mita, Mark Brown, Ulf Hansson, Linus Walleij

Tejun Heo <tj@kernel.org> wrote:

> > The blkg obtained through a blkg_lookup, in a rcu_read section, is
> > protected.  But, outside that section, a pointer to that blkg is not
> > guaranteed to be valid any longer.  Stat-update functions seem safe in
> 
> blkg's destruction is rcu delayed.  If you have access to a blkg under
> rcu, it won't get freed until the rcu read lock is released.

You also have to remember that if you take the RCU read lock just for this,
there is potentially a cost to be paid later as you may cause extra grace
period handling.

David


* Re: high overhead of functions blkg_*stats_* in bfq
  2017-10-18 13:19 ` Tejun Heo
  2017-10-18 14:45   ` Jens Axboe
       [not found]   ` <2B52CB68-213C-470F-945C-0ADFF9AA7A66@linaro.org>
@ 2017-11-05  7:39   ` Paolo Valente
  2017-11-06  2:21     ` Jens Axboe
  2 siblings, 1 reply; 35+ messages in thread
From: Paolo Valente @ 2017-11-05  7:39 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-block, Luca Miccio, Ulf Hansson, Linus Walleij, Mark Brown


> On 18 Oct 2017, at 15:19, Tejun Heo <tj@kernel.org> wrote:
>
> Hello, Paolo.
>
> On Tue, Oct 17, 2017 at 12:11:01PM +0200, Paolo Valente wrote:
> ...
>> protected by a per-device scheduler lock.  To give you an idea, on an
>> Intel i7-4850HQ, and with 8 threads doing random I/O in parallel on
>> null_blk (configured with 0 latency), if the update of groups stats is
>> removed, then the throughput grows from 260 to 404 KIOPS.  This and
>> all the other results we might share in this thread can be reproduced
>> very easily with a (useful) script made by Luca Miccio [1].
>
> I don't think the old request_queue is ever built for multiple CPUs
> hitting on a mem-backed device.
>

Hi,
from our measurements, the code and the comments received so far in
this thread, I guess that reducing the execution time of blkg_*stats_*
functions is not an easy task, and is unlikely to be accomplished in
the short term.  In this respect, we have unfortunately found out that
executing these functions causes a very high reduction of the
sustainable throughput on some CPUs.  For example, -70% on an ARM
Cortex-A53 octa-core.

Thus, to deal with such a considerable slowdown, until the overhead of
these functions gets reduced, it may make more sense to switch the
update of these statistics off, in all cases where these statistics
are not used, while higher performance (or lower power consumption) is
welcome/needed.

We wondered, however, how hazardous it might be to switch the update
of these statistics off.  To answer this question, we investigated the
extent to which these statistics are used by applications and
services.  Mainly, we tried to survey relevant people or
forums/mailing lists for involved communities: Linux distributions,
systemd, containers and other minor communities.  Nobody reported any
application or service using these statistics (either the variant
updated by bfq, or that updated by cfq).

So, one of the patches we are working on gives the user the
possibility to disable the update of these statistics online.

Thanks,
Paolo

>> We tried to understand the reason for this high overhead, and, in
>> particular, to find out whether there was some issue that we
>> could address on our own.  But the causes seem somehow substantial:
>> one of the most time-consuming operations needed by some blkg_*stats_*
>> functions is, e.g., find_next_bit, for which we don't see any trivial
>> replacement.
>
> Can you point to the specific ones?  I can't find find_next_bit usages
> in generic blkg code.
>
>> So, as a first attempt to reduce this severe slowdown, we have made a
>> patch that moves the invocation of blkg_*stats_* functions outside the
>> critical sections protected by the bfq lock.  Still, these functions
>> apparently need to be protected with the request_queue lock, because
>
> blkgs are already protected with RCU, so RCU protection should be
> enough.
>
> Thanks.
>
> --
> tejun


* Re: high overhead of functions blkg_*stats_* in bfq
  2017-11-05  7:39   ` Paolo Valente
@ 2017-11-06  2:21     ` Jens Axboe
  2017-11-06  9:22       ` Ulf Hansson
                         ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Jens Axboe @ 2017-11-06  2:21 UTC (permalink / raw)
  To: Paolo Valente, Tejun Heo
  Cc: linux-block, Luca Miccio, Ulf Hansson, Linus Walleij, Mark Brown

On 11/05/2017 01:39 AM, Paolo Valente wrote:
> 
>> On 18 Oct 2017, at 15:19, Tejun Heo <tj@kernel.org> wrote:
>>
>> Hello, Paolo.
>>
>> On Tue, Oct 17, 2017 at 12:11:01PM +0200, Paolo Valente wrote:
>> ...
>>> protected by a per-device scheduler lock.  To give you an idea, on an
>>> Intel i7-4850HQ, and with 8 threads doing random I/O in parallel on
>>> null_blk (configured with 0 latency), if the update of groups stats is
>>> removed, then the throughput grows from 260 to 404 KIOPS.  This and
>>> all the other results we might share in this thread can be reproduced
>>> very easily with a (useful) script made by Luca Miccio [1].
>>
>> I don't think the old request_queue is ever built for multiple CPUs
>> hitting on a mem-backed device.
>>
> 
> Hi,
> from our measurements, the code and the comments received so far in
> this thread, I guess that reducing the execution time of blkg_*stats_*
> functions is not an easy task, and is unlikely to be accomplished in
> the short term.  In this respect, we have unfortunately found out that
> executing these functions causes a very high reduction of the
> sustainable throughput on some CPUs.  For example, -70% on an ARM
> Cortex-A53 octa-core.
> 
> Thus, to deal with such a considerable slowdown, until the overhead of
> these functions gets reduced, it may make more sense to switch the
> update of these statistics off, in all cases where these statistics
> are not used, while higher performance (or lower power consumption) is
> welcome/needed.
> 
> We wondered, however, how hazardous it might be to switch the update
> of these statistics off.  To answer this question, we investigated the
> extent to which these statistics are used by applications and
> services.  Mainly, we tried to survey relevant people or
> forums/mailing lists for involved communities: Linux distributions,
> systemd, containers and other minor communities.  Nobody reported any
> application or service using these statistics (either the variant
> updated by bfq, or that updated by cfq).
> 
> So, one of the patches we are working on gives the user the
> possibility to disable the update of these statistics online.

If you want help with this, provide an easy way to reproduce this,
and/or some decent profiling output. There was one flamegraph posted,
but that was basically useless. Just do:

perf record -g -- whatever test
perf report -g --no-children

and post the top 10 entries from the perf report.

It's pointless to give up on this so soon, when no effort has apparently
been dedicated to figuring out what the actual issue is yet. So no, no
patch that will just disable the stats is going to be accepted.

That said, I have no idea who uses these stats. Surely someone can
answer that question. Tejun?

-- 
Jens Axboe


* Re: high overhead of functions blkg_*stats_* in bfq
  2017-11-06  2:21     ` Jens Axboe
@ 2017-11-06  9:22       ` Ulf Hansson
  2017-11-06  9:49         ` Paolo Valente
  2017-11-06 15:00       ` Tejun Heo
  2017-11-06 18:46       ` Paolo Valente
  2 siblings, 1 reply; 35+ messages in thread
From: Ulf Hansson @ 2017-11-06  9:22 UTC (permalink / raw)
  To: Jens Axboe, Tejun Heo
  Cc: Paolo Valente, linux-block, Luca Miccio, Linus Walleij, Mark Brown

On 6 November 2017 at 03:21, Jens Axboe <axboe@kernel.dk> wrote:
> On 11/05/2017 01:39 AM, Paolo Valente wrote:
>>
>>> On 18 Oct 2017, at 15:19, Tejun Heo <tj@kernel.org> wrote:
>>>
>>> Hello, Paolo.
>>>
>>> On Tue, Oct 17, 2017 at 12:11:01PM +0200, Paolo Valente wrote:
>>> ...
>>>> protected by a per-device scheduler lock.  To give you an idea, on an
>>>> Intel i7-4850HQ, and with 8 threads doing random I/O in parallel on
>>>> null_blk (configured with 0 latency), if the update of groups stats is
>>>> removed, then the throughput grows from 260 to 404 KIOPS.  This and
>>>> all the other results we might share in this thread can be reproduced
>>>> very easily with a (useful) script made by Luca Miccio [1].
>>>
>>> I don't think the old request_queue is ever built for multiple CPUs
>>> hitting on a mem-backed device.
>>>
>>
>> Hi,
>> from our measurements, the code and the comments received so far in
>> this thread, I guess that reducing the execution time of blkg_*stats_*
>> functions is not an easy task, and is unlikely to be accomplished in
>> the short term.  In this respect, we have unfortunately found out that
>> executing these functions causes a very high reduction of the
>> sustainable throughput on some CPUs.  For example, -70% on an ARM
>> Cortex-A53 octa-core.
>>
>> Thus, to deal with such a considerable slowdown, until the overhead of
>> these functions gets reduced, it may make more sense to switch the
>> update of these statistics off, in all cases where these statistics
>> are not used, while higher performance (or lower power consumption) is
>> welcome/needed.
>>
>> We wondered, however, how hazardous it might be to switch the update
>> of these statistics off.  To answer this question, we investigated the
>> extent to which these statistics are used by applications and
>> services.  Mainly, we tried to survey relevant people or
>> forums/mailing lists for involved communities: Linux distributions,
>> systemd, containers and other minor communities.  Nobody reported any
>> application or service using these statistics (either the variant
>> updated by bfq, or that updated by cfq).
>>
>> So, one of the patches we are working on gives the user the
>> possibility to disable the update of these statistics online.
>
> If you want help with this, provide an easy way to reproduce this,
> and/or some decent profiling output. There was one flamegraph posted,
> but that was basically useless. Just do:
>
> perf record -g -- whatever test
> perf report -g --no-children
>
> and post the top 10 entries from the perf report.
>
> It's pointless to give up on this so soon, when no effort has apparently
> been dedicated to figuring out what the actual issue is yet. So no, no
> patch that will just disable the stats is going to be accepted.
>
> That said, I have no idea who uses these stats. Surely someone can
> answer that question. Tejun?

Jens, Tejun, apologies for side-tracking the discussion.

It sounds to me that these stats should have been put into debugfs,
rather than sysfs from the beginning.

Perhaps we could consider moving them to debugfs for the
mq-schedulers, as those are still rather new?

Of course that doesn't solve the high overhead of stat computation,
which seems very reasonable to investigate further, no matter what.

Kind regards
Uffe


* Re: high overhead of functions blkg_*stats_* in bfq
  2017-11-06  9:22       ` Ulf Hansson
@ 2017-11-06  9:49         ` Paolo Valente
  2017-11-06 10:48           ` Ulf Hansson
  0 siblings, 1 reply; 35+ messages in thread
From: Paolo Valente @ 2017-11-06  9:49 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Jens Axboe, Tejun Heo, linux-block, Luca Miccio, Linus Walleij,
	Mark Brown, akinobu.mita, hpa, dhowells


> On 6 Nov 2017, at 10:22, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> On 6 November 2017 at 03:21, Jens Axboe <axboe@kernel.dk> wrote:
>> On 11/05/2017 01:39 AM, Paolo Valente wrote:
>>>
>>>> On 18 Oct 2017, at 15:19, Tejun Heo <tj@kernel.org> wrote:
>>>>
>>>> Hello, Paolo.
>>>>
>>>> On Tue, Oct 17, 2017 at 12:11:01PM +0200, Paolo Valente wrote:
>>>> ...
>>>>> protected by a per-device scheduler lock.  To give you an idea, on an
>>>>> Intel i7-4850HQ, and with 8 threads doing random I/O in parallel on
>>>>> null_blk (configured with 0 latency), if the update of groups stats is
>>>>> removed, then the throughput grows from 260 to 404 KIOPS.  This and
>>>>> all the other results we might share in this thread can be reproduced
>>>>> very easily with a (useful) script made by Luca Miccio [1].
>>>>
>>>> I don't think the old request_queue is ever built for multiple CPUs
>>>> hitting on a mem-backed device.
>>>>
>>>
>>> Hi,
>>> from our measurements, the code and the comments received so far in
>>> this thread, I guess that reducing the execution time of blkg_*stats_*
>>> functions is not an easy task, and is unlikely to be accomplished in
>>> the short term.  In this respect, we have unfortunately found out that
>>> executing these functions causes a very high reduction of the
>>> sustainable throughput on some CPUs.  For example, -70% on an ARM
>>> Cortex-A53 octa-core.
>>>
>>> Thus, to deal with such a considerable slowdown, until the overhead of
>>> these functions gets reduced, it may make more sense to switch the
>>> update of these statistics off, in all cases where these statistics
>>> are not used, while higher performance (or lower power consumption) is
>>> welcome/needed.
>>>
>>> We wondered, however, how hazardous it might be to switch the update
>>> of these statistics off.  To answer this question, we investigated the
>>> extent to which these statistics are used by applications and
>>> services.  Mainly, we tried to survey relevant people or
>>> forums/mailing lists for involved communities: Linux distributions,
>>> systemd, containers and other minor communities.  Nobody reported any
>>> application or service using these statistics (either the variant
>>> updated by bfq, or that updated by cfq).
>>>
>>> So, one of the patches we are working on gives the user the
>>> possibility to disable the update of these statistics online.
>>
>> If you want help with this, provide an easy way to reproduce this,
>> and/or some decent profiling output. There was one flamegraph posted,
>> but that was basically useless. Just do:
>>
>> perf record -g -- whatever test
>> perf report -g --no-children
>>
>> and post the top 10 entries from the perf report.
>>
>> It's pointless to give up on this so soon, when no effort has apparently
>> been dedicated to figuring out what the actual issue is yet. So no, no
>> patch that will just disable the stats is going to be accepted.
>>
>> That said, I have no idea who uses these stats. Surely someone can
>> answer that question. Tejun?
>
> Jens, Tejun, apologies for side-tracking the discussion.
>
> It sounds to me that these stats should have been put into debugfs,
> rather than sysfs from the beginning.
>

Ulf,
let me just add a bit of info, if useful: four of those stat files are
explicitly meant for debugging (as per the documentation), and created
if CONFIG_DEBUG_BLK_CGROUP=y.

Paolo


> Perhaps we could consider moving them to debugfs for the
> mq-schedulers, as those are still rather new?
>
> Of course that doesn't solve the high overhead of stat computation,
> which seems very reasonable to investigate further, no matter what.
>
> Kind regards
> Uffe


* Re: high overhead of functions blkg_*stats_* in bfq
  2017-11-06  9:49         ` Paolo Valente
@ 2017-11-06 10:48           ` Ulf Hansson
  2017-11-06 11:20             ` Paolo Valente
  0 siblings, 1 reply; 35+ messages in thread
From: Ulf Hansson @ 2017-11-06 10:48 UTC (permalink / raw)
  To: Paolo Valente
  Cc: Jens Axboe, Tejun Heo, linux-block, Luca Miccio, Linus Walleij,
	Mark Brown, Akinobu Mita, hpa, David Howells

On 6 November 2017 at 10:49, Paolo Valente <paolo.valente@linaro.org> wrote:
>
>> On 6 Nov 2017, at 10:22, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>>
>> On 6 November 2017 at 03:21, Jens Axboe <axboe@kernel.dk> wrote:
>>> On 11/05/2017 01:39 AM, Paolo Valente wrote:
>>>>
>>>>> On 18 Oct 2017, at 15:19, Tejun Heo <tj@kernel.org> wrote:
>>>>>
>>>>> Hello, Paolo.
>>>>>
>>>>> On Tue, Oct 17, 2017 at 12:11:01PM +0200, Paolo Valente wrote:
>>>>> ...
>>>>>> protected by a per-device scheduler lock.  To give you an idea, on an
>>>>>> Intel i7-4850HQ, and with 8 threads doing random I/O in parallel on
>>>>>> null_blk (configured with 0 latency), if the update of groups stats is
>>>>>> removed, then the throughput grows from 260 to 404 KIOPS.  This and
>>>>>> all the other results we might share in this thread can be reproduced
>>>>>> very easily with a (useful) script made by Luca Miccio [1].
>>>>>
>>>>> I don't think the old request_queue is ever built for multiple CPUs
>>>>> hitting on a mem-backed device.
>>>>>
>>>>
>>>> Hi,
>>>> from our measurements, the code and the comments received so far in
>>>> this thread, I guess that reducing the execution time of blkg_*stats_*
>>>> functions is not an easy task, and is unlikely to be accomplished in
>>>> the short term.  In this respect, we have unfortunately found out that
>>>> executing these functions causes a very high reduction of the
>>>> sustainable throughput on some CPUs.  For example, -70% on an ARM
>>>> Cortex-A53 octa-core.
>>>>
>>>> Thus, to deal with such a considerable slowdown, until the overhead of
>>>> these functions gets reduced, it may make more sense to switch the
>>>> update of these statistics off, in all cases where these statistics
>>>> are not used, while higher performance (or lower power consumption) is
>>>> welcome/needed.
>>>>
>>>> We wondered, however, how hazardous it might be to switch the update
>>>> of these statistics off.  To answer this question, we investigated the
>>>> extent to which these statistics are used by applications and
>>>> services.  Mainly, we tried to survey relevant people or
>>>> forums/mailing lists for involved communities: Linux distributions,
>>>> systemd, containers and other minor communities.  Nobody reported any
>>>> application or service using these statistics (either the variant
>>>> updated by bfq, or that updated by cfq).
>>>>
>>>> So, one of the patches we are working on gives the user the
>>>> possibility to disable the update of these statistics online.
>>>
>>> If you want help with this, provide an easy way to reproduce this,
>>> and/or some decent profiling output. There was one flamegraph posted,
>>> but that was basically useless. Just do:
>>>
>>> perf record -g -- whatever test
>>> perf report -g --no-children
>>>
>>> and post the top 10 entries from the perf report.
>>>
>>> It's pointless to give up on this so soon, when no effort has apparently
>>> been dedicated to figuring out what the actual issue is yet. So no, no
>>> patch that will just disable the stats is going to be accepted.
>>>
>>> That said, I have no idea who uses these stats. Surely someone can
>>> answer that question. Tejun?
>>
>> Jens, Tejun, apologies for side-tracking the discussion.
>>
>> It sounds to me that these stats should have been put into debugfs,
>> rather than sysfs from the beginning.
>>
>
> Ulf,
> let me just add a bit of info, if useful: four of those stat files are
> explicitly meant for debugging (as per the documentation), and created
> if CONFIG_DEBUG_BLK_CGROUP=y.
>
> Paolo

Right, so it's a mixture of debugfs/sysfs then.

In the BFQ case, it seems like CONFIG_DEBUG_BLK_CGROUP isn't checked.
I assume that should be changed, which would remove at least some of
the computation overhead when this Kconfig is unset.

Perhaps one may even consider moving all stats for BFQ within that
Kconfig (and for other mq-schedulers if those ever intend to
implement support for the stats).

Kind regards
Uffe


* Re: high overhead of functions blkg_*stats_* in bfq
  2017-11-06 10:48           ` Ulf Hansson
@ 2017-11-06 11:20             ` Paolo Valente
  0 siblings, 0 replies; 35+ messages in thread
From: Paolo Valente @ 2017-11-06 11:20 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Jens Axboe, Tejun Heo, linux-block, Luca Miccio, Linus Walleij,
	Mark Brown, Akinobu Mita, hpa, David Howells


> On 6 Nov 2017, at 11:48, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> On 6 November 2017 at 10:49, Paolo Valente <paolo.valente@linaro.org> wrote:
>>
>>> On 6 Nov 2017, at 10:22, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>>>
>>> On 6 November 2017 at 03:21, Jens Axboe <axboe@kernel.dk> wrote:
>>>> On 11/05/2017 01:39 AM, Paolo Valente wrote:
>>>>>
>>>>>> On 18 Oct 2017, at 15:19, Tejun Heo <tj@kernel.org> wrote:
>>>>>>
>>>>>> Hello, Paolo.
>>>>>>
>>>>>> On Tue, Oct 17, 2017 at 12:11:01PM +0200, Paolo Valente wrote:
>>>>>> ...
>>>>>>> protected by a per-device scheduler lock.  To give you an idea, on an
>>>>>>> Intel i7-4850HQ, and with 8 threads doing random I/O in parallel on
>>>>>>> null_blk (configured with 0 latency), if the update of groups stats is
>>>>>>> removed, then the throughput grows from 260 to 404 KIOPS.  This and
>>>>>>> all the other results we might share in this thread can be reproduced
>>>>>>> very easily with a (useful) script made by Luca Miccio [1].
>>>>>>
>>>>>> I don't think the old request_queue is ever built for multiple CPUs
>>>>>> hitting on a mem-backed device.
>>>>>>
>>>>>
>>>>> Hi,
>>>>> from our measurements, the code and the comments received so far in
>>>>> this thread, I guess that reducing the execution time of blkg_*stats_*
>>>>> functions is not an easy task, and is unlikely to be accomplished in
>>>>> the short term.  In this respect, we have unfortunately found out that
>>>>> executing these functions causes a very high reduction of the
>>>>> sustainable throughput on some CPUs.  For example, -70% on an ARM
>>>>> Cortex-A53 octa-core.
>>>>>
>>>>> Thus, to deal with such a considerable slowdown, until the overhead of
>>>>> these functions gets reduced, it may make more sense to switch the
>>>>> update of these statistics off, in all cases where these statistics
>>>>> are not used, while higher performance (or lower power consumption) is
>>>>> welcome/needed.
>>>>>
>>>>> We wondered, however, how hazardous it might be to switch the update
>>>>> of these statistics off.  To answer this question, we investigated the
>>>>> extent to which these statistics are used by applications and
>>>>> services.  Mainly, we tried to survey relevant people or
>>>>> forums/mailing lists for involved communities: Linux distributions,
>>>>> systemd, containers and other minor communities.  Nobody reported any
>>>>> application or service using these statistics (either the variant
>>>>> updated by bfq, or that updated by cfq).
>>>>>
>>>>> So, one of the patches we are working on gives the user the
>>>>> possibility to disable the update of these statistics online.
>>>>
>>>> If you want help with this, provide an easy way to reproduce this,
>>>> and/or some decent profiling output. There was one flamegraph posted,
>>>> but that was basically useless. Just do:
>>>>
>>>> perf record -g -- whatever test
>>>> perf report -g --no-children
>>>>
>>>> and post the top 10 entries from the perf report.
>>>>
>>>> It's pointless to give up on this so soon, when no effort has apparently
>>>> been dedicated to figuring out what the actual issue is yet. So no, no
>>>> patch that will just disable the stats is going to be accepted.
>>>>
>>>> That said, I have no idea who uses these stats. Surely someone can
>>>> answer that question. Tejun?
>>>
>>> Jens, Tejun, apologies for side-tracking the discussion.
>>>
>>> It sounds to me that these stats should have been put into debugfs,
>>> rather than sysfs from the beginning.
>>>
>>
>> Ulf,
>> let me just add a bit of info, if useful: four of those stat files are
>> explicitly meant for debugging (as per the documentation), and created
>> if CONFIG_DEBUG_BLK_CGROUP=y.
>>
>> Paolo
>
> Right, so it's a mixture of debugfs/sysfs then.
>
> In the BFQ case, it seems like CONFIG_DEBUG_BLK_CGROUP isn't checked.
> I assume that should be changed,

Yes, it's in the patch series that we would like to propose.

> which would remove at least some of
> the computation overhead when this Kconfig is unset.
>

Yes. My concern is the following.

If, as our one-month survey apparently confirms, there is no code
using these bfq stats, in particular the DEBUG_BLK_CGROUP ones, then
it sounds unfair to make a bfq user suffer from up to a 70% loss of
sustainable throughput just because, for reasons that the user may not
even know about, the 'wrong' options are set for his/her system (for
example, because DEBUG_BLK_CGROUP stats happen or happened to be used
for cfq in that system, and in legacy blk one can't even tell the
difference between switching the update of those stats on or off).

Of course, this is just a point of view, and holds only until those
update functions possibly get optimized.

Thanks,
Paolo

> Perhaps one may even consider moving all stats for BFQ within that
> Kconfig (and for other mq-schedulers if those ever intend to
> implement support for the stats).
>
> Kind regards
> Uffe


* Re: high overhead of functions blkg_*stats_* in bfq
  2017-11-06  2:21     ` Jens Axboe
  2017-11-06  9:22       ` Ulf Hansson
@ 2017-11-06 15:00       ` Tejun Heo
  2017-11-06 15:47         ` Paolo Valente
  2017-11-06 16:03         ` Jens Axboe
  2017-11-06 18:46       ` Paolo Valente
  2 siblings, 2 replies; 35+ messages in thread
From: Tejun Heo @ 2017-11-06 15:00 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Paolo Valente, linux-block, Luca Miccio, Ulf Hansson,
	Linus Walleij, Mark Brown

Hello,

On Sun, Nov 05, 2017 at 07:21:01PM -0700, Jens Axboe wrote:
> It's pointless to give up on this so soon, when no effort has apparently
> been dedicated to figuring out what the actual issue is yet. So no, no
> patch that will just disable the stats is going to be accepted.
> 
> That said, I have no idea who uses these stats. Surely someone can
> answer that question. Tejun?

Except for the basic bytes / ios counts, it's all debug fluff, which
should have been hidden behind a debug boot param or go under debugfs.
I'm not sure we can get rid of them at this point for cfq but I don't
see why we'd have them for bfq.

Thanks.

-- 
tejun


* Re: high overhead of functions blkg_*stats_* in bfq
  2017-11-06 15:00       ` Tejun Heo
@ 2017-11-06 15:47         ` Paolo Valente
  2017-11-06 16:11           ` Paolo Valente
  2017-11-06 16:03         ` Jens Axboe
  1 sibling, 1 reply; 35+ messages in thread
From: Paolo Valente @ 2017-11-06 15:47 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jens Axboe, linux-block, Luca Miccio, Ulf Hansson, Linus Walleij,
	Mark Brown, dhowells, hpa, akinobu.mita


> On 6 Nov 2017, at 16:00, Tejun Heo <tj@kernel.org> wrote:
>
> Hello,
>
> On Sun, Nov 05, 2017 at 07:21:01PM -0700, Jens Axboe wrote:
>> It's pointless to give up on this so soon, when no effort has apparently
>> been dedicated to figuring out what the actual issue is yet. So no, no
>> patch that will just disable the stats is going to be accepted.
>>
>> That said, I have no idea who uses these stats. Surely someone can
>> answer that question. Tejun?
>
> Except for the basic bytes / ios counts, it's all debug fluff, which
> should have been hidden behind a debug boot param or go under debugfs.
> I'm not sure we can get rid of them at this point for cfq but I don't
> see why we'd have them for bfq.
>

Ok, then I think this is the right time to ask you what I can throw
away for bfq.  According to your documentation, basic non-debug stats
should be:

blkio.time
blkio.time_recursive
blkio.sectors
blkio.io_service_bytes
blkio.io_service_bytes_recursive
blkio.io_serviced
blkio.io_serviced_recursive
blkio.io_service_time
blkio.io_service_time_recursive
blkio.io_wait_time
blkio.io_wait_time_recursive
blkio.io_merged
blkio.io_merged_recursive
blkio.io_queued
blkio.io_queued_recursive

So, I have to keep them in bfq, right?  Or is there something I can
remove for bfq, in your opinion?  Of course, I would be very happy to
remove stuff!

On the other hand, here are the debugging stats I have borrowed
from cfq:

blkio.avg_queue_size
blkio.dequeue
blkio.empty_time
blkio.group_wait_time
blkio.idle_time
blkio.io_wait_time
blkio.io_wait_time_recursive

These will be hidden behind the CONFIG_DEBUG_BLK_CGROUP option by one
of the patches we are about to submit.  Or do you want me to remove
these debugging stats completely?  That is ok for me; just, if these
parameters are removed, then I guess the documentation of cgroups-v1
will have to be somehow updated once the interface of bfq and cfq is
unified.

Thanks,
Paolo

> Thanks.
>
> --
> tejun


* Re: high overhead of functions blkg_*stats_* in bfq
  2017-11-06 15:00       ` Tejun Heo
  2017-11-06 15:47         ` Paolo Valente
@ 2017-11-06 16:03         ` Jens Axboe
  2017-11-06 16:10           ` Paolo Valente
  1 sibling, 1 reply; 35+ messages in thread
From: Jens Axboe @ 2017-11-06 16:03 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Paolo Valente, linux-block, Luca Miccio, Ulf Hansson,
	Linus Walleij, Mark Brown

On 11/06/2017 08:00 AM, Tejun Heo wrote:
> Hello,
> 
> On Sun, Nov 05, 2017 at 07:21:01PM -0700, Jens Axboe wrote:
>> It's pointless to give up on this so soon, when no effort has apparently
>> been dedicated to figuring out what the actual issue is yet. So no, no
>> patch that will just disable the stats is going to be accepted.
>>
>> That said, I have no idea who uses these stats. Surely someone can
>> answer that question. Tejun?
> 
> Except for the basic bytes / ios counts, it's all debug fluff, which
> should have been hidden behind a debug boot param or go under debugfs.
> I'm not sure we can get rid of them at this point for cfq but I don't
> see why we'd have them for bfq.

If that's the case, how about having some way to then turn it on and
leave it off by default? The metrics could still be useful, and
there's value in having them the same. But if the overhead is high
AND there's no use case for it outside of debugging, it should not
be on by default for either cfq or bfq.

-- 
Jens Axboe


* Re: high overhead of functions blkg_*stats_* in bfq
  2017-11-06 16:03         ` Jens Axboe
@ 2017-11-06 16:10           ` Paolo Valente
  0 siblings, 0 replies; 35+ messages in thread
From: Paolo Valente @ 2017-11-06 16:10 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Tejun Heo, linux-block, Luca Miccio, Ulf Hansson, Linus Walleij,
	Mark Brown


> On 6 Nov 2017, at 17:03, Jens Axboe <axboe@kernel.dk> wrote:
>
> On 11/06/2017 08:00 AM, Tejun Heo wrote:
>> Hello,
>>
>> On Sun, Nov 05, 2017 at 07:21:01PM -0700, Jens Axboe wrote:
>>> It's pointless to give up on this so soon, when no effort has apparently
>>> been dedicated to figuring out what the actual issue is yet. So no, no
>>> patch that will just disable the stats is going to be accepted.
>>>
>>> That said, I have no idea who uses these stats. Surely someone can
>>> answer that question. Tejun?
>>
>> Except for the basic bytes / ios counts, it's all debug fluff, which
>> should have been hidden behind a debug boot param or go under debugfs.
>> I'm not sure we can get rid of them at this point for cfq but I don't
>> see why we'd have them for bfq.
>
> If that's the case, how about having some way to then turn it on and
> leave it off by default? The metrics could still be useful, and
> there's value in having them the same. But if the overhead is high
> AND there's no use case for it outside of debugging, it should not
> be on by default for either cfq or bfq.
>

Yes, these were exactly my motivations and plan for bfq!  However, at
this point, I think that Tejun's answer to my detailed question about
what to keep and what not to keep may greatly help in two respects:
first, to clarify the actual use of these files; second, to understand
the actual impact on performance.  In fact, as I wrote in some of my
previous emails in this thread, some functions are much heavier than
others.  Those that update debug stats seem to be the heaviest and,
fortunately, would already be behind the CONFIG_DEBUG_BLK_CGROUP
option.

Thanks,
Paolo

> --
> Jens Axboe
>


* Re: high overhead of functions blkg_*stats_* in bfq
  2017-11-06 15:47         ` Paolo Valente
@ 2017-11-06 16:11           ` Paolo Valente
  2017-11-06 16:13             ` Jens Axboe
  0 siblings, 1 reply; 35+ messages in thread
From: Paolo Valente @ 2017-11-06 16:11 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jens Axboe, linux-block, Luca Miccio, Ulf Hansson, Linus Walleij,
	Mark Brown, dhowells, hpa, akinobu.mita


> On 06 Nov 2017, at 16:47, Paolo Valente <paolo.valente@linaro.org> wrote:
>
>>
>> On 06 Nov 2017, at 16:00, Tejun Heo <tj@kernel.org> wrote:
>>
>> Hello,
>>
>> On Sun, Nov 05, 2017 at 07:21:01PM -0700, Jens Axboe wrote:
>>> It's pointless to give up on this so soon, when no effort has apparently
>>> been dedicated to figuring out what the actual issue is yet. So no, no
>>> patch that will just disable the stats is going to be accepted.
>>>
>>> That said, I have no idea who uses these stats. Surely someone can
>>> answer that question. Tejun?
>>
>> Except for the basic bytes / ios counts, it's all debug fluff, which
>> should have been hidden behind a debug boot param or go under debugfs.
>> I'm not sure we can get rid of them at this point for cfq but I don't
>> see why we'd have them for bfq.
>>
>
> Ok, then I think this is the right time to ask you what I can throw
> away for bfq.  According to your documentation, the basic non-debug stats
> should be:
>
> blkio.time
> blkio.time_recursive
> blkio.sectors
> blkio.io_service_bytes
> blkio.io_service_bytes_recursive
> blkio.io_serviced
> blkio.io_serviced_recursive
> blkio.io_service_time
> blkio.io_service_time_recursive
> blkio.io_wait_time
> blkio.io_wait_time_recursive
> blkio.io_merged
> blkio.io_merged_recursive
> blkio.io_queued
> blkio.io_queued_recursive
>
> So, I have to keep them in bfq, right?  Or is there something I can
> remove for bfq, in your opinion?  Of course, I would be very happy to
> remove stuff!
>

Or, more mildly, could some of these stats be moved behind
CONFIG_DEBUG_BLK_CGROUP?

Thanks,
Paolo

> On the other hand, here are the debugging stats I have borrowed
> from cfq:
>
> blkio.avg_queue_size
> blkio.dequeue
> blkio.empty_time
> blkio.group_wait_time
> blkio.idle_time
> blkio.io_wait_time
> blkio.io_wait_time_recursive
>
> These will be hidden behind the CONFIG_DEBUG_BLK_CGROUP option by one
> of the patches we are about to submit.  Or do you want me to remove
> these debugging stats completely?  That is fine with me; just note
> that, if these parameters are removed, the cgroups-v1 documentation
> will have to be updated accordingly once the interfaces of bfq and
> cfq are unified.
>
>=20
> Thanks,
> Paolo
>
>> Thanks.
>>
>> --
>> tejun

* Re: high overhead of functions blkg_*stats_* in bfq
  2017-11-06 16:11           ` Paolo Valente
@ 2017-11-06 16:13             ` Jens Axboe
  2017-11-06 16:21               ` Paolo Valente
  0 siblings, 1 reply; 35+ messages in thread
From: Jens Axboe @ 2017-11-06 16:13 UTC (permalink / raw)
  To: Paolo Valente, Tejun Heo
  Cc: linux-block, Luca Miccio, Ulf Hansson, Linus Walleij, Mark Brown,
	dhowells, hpa, akinobu.mita

On 11/06/2017 09:11 AM, Paolo Valente wrote:
> 
>> Il giorno 06 nov 2017, alle ore 16:47, Paolo Valente <paolo.valente@linaro.org> ha scritto:
>>
>>>
>>> Il giorno 06 nov 2017, alle ore 16:00, Tejun Heo <tj@kernel.org> ha scritto:
>>>
>>> Hello,
>>>
>>> On Sun, Nov 05, 2017 at 07:21:01PM -0700, Jens Axboe wrote:
>>>> It's pointless to give up on this so soon, when no effort has apparently
>>>> been dedicated to figuring out what the actual issue is yet. So no, no
>>>> patch that will just disable the stats is going to be accepted.
>>>>
>>>> That said, I have no idea who uses these stats. Surely someone can
>>>> answer that question. Tejun?
>>>
>>> Except for the basic bytes / ios counts, it's all debug fluff, which
>>> should have been hidden behind a debug boot param or go under debugfs.
>>> I'm not sure we can get rid of them at this point for cfq but I don't
>>> see why we'd have them for bfq.
>>>
>>
>> Ok, then I think this is the right time to ask you what I can throw
>> away for bfq.  According to your documentation, the basic non-debug stats
>> should be:
>>
>> blkio.time
>> blkio.time_recursive
>> blkio.sectors
>> blkio.io_service_bytes
>> blkio.io_service_bytes_recursive
>> blkio.io_serviced
>> blkio.io_serviced_recursive
>> blkio.io_service_time
>> blkio.io_service_time_recursive
>> blkio.io_wait_time
>> blkio.io_wait_time_recursive
>> blkio.io_merged
>> blkio.io_merged_recursive
>> blkio.io_queued
>> blkio.io_queued_recursive
>>
>> So, I have to keep them in bfq, right?  Or is there something I can
>> remove for bfq, in your opinion?  Of course, I would be very happy to remove stuff!
>>
> 
> Or, more mildly, could some of these stats be moved behind
> CONFIG_DEBUG_BLK_CGROUP?

It'd be nice to have them be runtime enabled, but I don't think
that's a hard requirement at all. So if you can just move them
under DEBUG_BLK_CGROUP, that should be good enough.

-- 
Jens Axboe

* Re: high overhead of functions blkg_*stats_* in bfq
  2017-11-06 16:13             ` Jens Axboe
@ 2017-11-06 16:21               ` Paolo Valente
  2017-11-06 16:22                 ` Jens Axboe
  0 siblings, 1 reply; 35+ messages in thread
From: Paolo Valente @ 2017-11-06 16:21 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Tejun Heo, linux-block, Luca Miccio, Ulf Hansson, Linus Walleij,
	Mark Brown, dhowells, hpa, akinobu.mita


> On 06 Nov 2017, at 17:13, Jens Axboe <axboe@kernel.dk> wrote:
>
> On 11/06/2017 09:11 AM, Paolo Valente wrote:
>>
>>> On 06 Nov 2017, at 16:47, Paolo Valente <paolo.valente@linaro.org> wrote:
>>>
>>>>
>>>> On 06 Nov 2017, at 16:00, Tejun Heo <tj@kernel.org> wrote:
>>>>
>>>> Hello,
>>>>
>>>> On Sun, Nov 05, 2017 at 07:21:01PM -0700, Jens Axboe wrote:
>>>>> It's pointless to give up on this so soon, when no effort has apparently
>>>>> been dedicated to figuring out what the actual issue is yet. So no, no
>>>>> patch that will just disable the stats is going to be accepted.
>>>>>
>>>>> That said, I have no idea who uses these stats. Surely someone can
>>>>> answer that question. Tejun?
>>>>
>>>> Except for the basic bytes / ios counts, it's all debug fluff, which
>>>> should have been hidden behind a debug boot param or go under debugfs.
>>>> I'm not sure we can get rid of them at this point for cfq but I don't
>>>> see why we'd have them for bfq.
>>>>
>>>
>>> Ok, then I think this is the right time to ask you what I can throw
>>> away for bfq.  According to your documentation, the basic non-debug stats
>>> should be:
>>>
>>> blkio.time
>>> blkio.time_recursive
>>> blkio.sectors
>>> blkio.io_service_bytes
>>> blkio.io_service_bytes_recursive
>>> blkio.io_serviced
>>> blkio.io_serviced_recursive
>>> blkio.io_service_time
>>> blkio.io_service_time_recursive
>>> blkio.io_wait_time
>>> blkio.io_wait_time_recursive
>>> blkio.io_merged
>>> blkio.io_merged_recursive
>>> blkio.io_queued
>>> blkio.io_queued_recursive
>>>
>>> So, I have to keep them in bfq, right?  Or is there something I can
>>> remove for bfq, in your opinion?  Of course, I would be very happy
>>> to remove stuff!
>>>
>>
>> Or, more mildly, could some of these stats be moved behind
>> CONFIG_DEBUG_BLK_CGROUP?
>
> It'd be nice to have them be runtime enabled, but I don't think
> that's a hard requirement at all.

So, you mean *all* stats dynamically enabled/disabled, and disabled by
default, plus some of them also statically enabled/disabled through
CONFIG_DEBUG_BLK_CGROUP.  Is my understanding correct?  If it is, we
will be happy to do it.

Thanks,
Paolo

> So if you can just move them
> under DEBUG_BLK_CGROUP, that should be good enough.
>
> --
> Jens Axboe
>

* Re: high overhead of functions blkg_*stats_* in bfq
  2017-11-06 16:21               ` Paolo Valente
@ 2017-11-06 16:22                 ` Jens Axboe
  2017-11-06 16:26                   ` Paolo Valente
  0 siblings, 1 reply; 35+ messages in thread
From: Jens Axboe @ 2017-11-06 16:22 UTC (permalink / raw)
  To: Paolo Valente
  Cc: Tejun Heo, linux-block, Luca Miccio, Ulf Hansson, Linus Walleij,
	Mark Brown, dhowells, hpa, akinobu.mita

On 11/06/2017 09:21 AM, Paolo Valente wrote:
> 
>> Il giorno 06 nov 2017, alle ore 17:13, Jens Axboe <axboe@kernel.dk> ha scritto:
>>
>> On 11/06/2017 09:11 AM, Paolo Valente wrote:
>>>
>>>> Il giorno 06 nov 2017, alle ore 16:47, Paolo Valente <paolo.valente@linaro.org> ha scritto:
>>>>
>>>>>
>>>>> Il giorno 06 nov 2017, alle ore 16:00, Tejun Heo <tj@kernel.org> ha scritto:
>>>>>
>>>>> Hello,
>>>>>
>>>>> On Sun, Nov 05, 2017 at 07:21:01PM -0700, Jens Axboe wrote:
>>>>>> It's pointless to give up on this so soon, when no effort has apparently
>>>>>> been dedicated to figuring out what the actual issue is yet. So no, no
>>>>>> patch that will just disable the stats is going to be accepted.
>>>>>>
>>>>>> That said, I have no idea who uses these stats. Surely someone can
>>>>>> answer that question. Tejun?
>>>>>
>>>>> Except for the basic bytes / ios counts, it's all debug fluff, which
>>>>> should have been hidden behind a debug boot param or go under debugfs.
>>>>> I'm not sure we can get rid of them at this point for cfq but I don't
>>>>> see why we'd have them for bfq.
>>>>>
>>>>
>>>> Ok, then I think this is the right time to ask you what I can throw
>>>> away for bfq.  According to your documentation, the basic non-debug stats
>>>> should be:
>>>>
>>>> blkio.time
>>>> blkio.time_recursive
>>>> blkio.sectors
>>>> blkio.io_service_bytes
>>>> blkio.io_service_bytes_recursive
>>>> blkio.io_serviced
>>>> blkio.io_serviced_recursive
>>>> blkio.io_service_time
>>>> blkio.io_service_time_recursive
>>>> blkio.io_wait_time
>>>> blkio.io_wait_time_recursive
>>>> blkio.io_merged
>>>> blkio.io_merged_recursive
>>>> blkio.io_queued
>>>> blkio.io_queued_recursive
>>>>
>>>> So, I have to keep them in bfq, right?  Or is there something I can
>>>> remove for bfq, in your opinion?  Of course, I would be very happy to remove stuff!
>>>>
>>>
>>> Or, more mildly, could some of these stats be moved behind
>>> CONFIG_DEBUG_BLK_CGROUP?
>>
>> It'd be nice to have them be runtime enabled, but I don't think
>> that's a hard requirement at all.
> 
> So, you mean *all* stats dynamically enabled/disabled, and disabled by
> default, plus some of them also statically enabled/disabled through
> CONFIG_DEBUG_BLK_CGROUP.  Is my understanding correct?  If it is, we
> will be happy to do it.

If you go the enable/disable route, don't make them depend on
DEBUG_BLK_CGROUP at all. Just have them be default off, with
some way to switch them on.

-- 
Jens Axboe

* Re: high overhead of functions blkg_*stats_* in bfq
  2017-11-06 16:22                 ` Jens Axboe
@ 2017-11-06 16:26                   ` Paolo Valente
  2017-11-06 16:30                     ` Tejun Heo
  0 siblings, 1 reply; 35+ messages in thread
From: Paolo Valente @ 2017-11-06 16:26 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Tejun Heo, linux-block, Luca Miccio, Ulf Hansson, Linus Walleij,
	Mark Brown, dhowells, hpa, akinobu.mita


> On 06 Nov 2017, at 17:22, Jens Axboe <axboe@kernel.dk> wrote:
>
> On 11/06/2017 09:21 AM, Paolo Valente wrote:
>>
>>> On 06 Nov 2017, at 17:13, Jens Axboe <axboe@kernel.dk> wrote:
>>>
>>> On 11/06/2017 09:11 AM, Paolo Valente wrote:
>>>>
>>>>> On 06 Nov 2017, at 16:47, Paolo Valente <paolo.valente@linaro.org> wrote:
>>>>>
>>>>>>
>>>>>> On 06 Nov 2017, at 16:00, Tejun Heo <tj@kernel.org> wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> On Sun, Nov 05, 2017 at 07:21:01PM -0700, Jens Axboe wrote:
>>>>>>> It's pointless to give up on this so soon, when no effort has apparently
>>>>>>> been dedicated to figuring out what the actual issue is yet. So no, no
>>>>>>> patch that will just disable the stats is going to be accepted.
>>>>>>>
>>>>>>> That said, I have no idea who uses these stats. Surely someone can
>>>>>>> answer that question. Tejun?
>>>>>>
>>>>>> Except for the basic bytes / ios counts, it's all debug fluff, which
>>>>>> should have been hidden behind a debug boot param or go under debugfs.
>>>>>> I'm not sure we can get rid of them at this point for cfq but I don't
>>>>>> see why we'd have them for bfq.
>>>>>>
>>>>>
>>>>> Ok, then I think this is the right time to ask you what I can throw
>>>>> away for bfq.  According to your documentation, the basic non-debug stats
>>>>> should be:
>>>>>
>>>>> blkio.time
>>>>> blkio.time_recursive
>>>>> blkio.sectors
>>>>> blkio.io_service_bytes
>>>>> blkio.io_service_bytes_recursive
>>>>> blkio.io_serviced
>>>>> blkio.io_serviced_recursive
>>>>> blkio.io_service_time
>>>>> blkio.io_service_time_recursive
>>>>> blkio.io_wait_time
>>>>> blkio.io_wait_time_recursive
>>>>> blkio.io_merged
>>>>> blkio.io_merged_recursive
>>>>> blkio.io_queued
>>>>> blkio.io_queued_recursive
>>>>>
>>>>> So, I have to keep them in bfq, right?  Or is there something I can
>>>>> remove for bfq, in your opinion?  Of course, I would be very happy
>>>>> to remove stuff!
>>>>>
>>>>
>>>> Or, more mildly, could some of these stats be moved behind
>>>> CONFIG_DEBUG_BLK_CGROUP?
>>>
>>> It'd be nice to have them be runtime enabled, but I don't think
>>> that's a hard requirement at all.
>>
>> So, you mean *all* stats dynamically enabled/disabled, and disabled by
>> default, plus some of them also statically enabled/disabled through
>> CONFIG_DEBUG_BLK_CGROUP.  Is my understanding correct?  If it is, we
>> will be happy to do it.
>
> If you go the enable/disable route, don't make them depend on
> DEBUG_BLK_CGROUP at all. Just have them be default off, with
> some way to switch them on.
>

Yes.  DEBUG_BLK_CGROUP will just serve to keep the for-debugging
subset switched off even when the dynamic parameter is on.

We'll wait for possible counter-arguments from Tejun, and then start
to work on what you propose.

Thanks,
Paolo

> --
> Jens Axboe

* Re: high overhead of functions blkg_*stats_* in bfq
  2017-11-06 16:26                   ` Paolo Valente
@ 2017-11-06 16:30                     ` Tejun Heo
  2017-11-06 16:33                       ` Paolo Valente
  0 siblings, 1 reply; 35+ messages in thread
From: Tejun Heo @ 2017-11-06 16:30 UTC (permalink / raw)
  To: Paolo Valente
  Cc: Jens Axboe, linux-block, Luca Miccio, Ulf Hansson, Linus Walleij,
	Mark Brown, dhowells, hpa, akinobu.mita

Hello,

On Mon, Nov 06, 2017 at 05:26:38PM +0100, Paolo Valente wrote:
> Yes.  DEBUG_BLK_CGROUP will just serve to keep the for-debugging
> subset switched off even when the dynamic parameter is on.
> 
> We'll wait for possible counter-arguments from Tejun, and then start
> to work on what you propose.

So, I think the only *really* required ones are io_serviced and the
corresponding byte counts, which are tracked by blk-cgroup core
anyway.  Everything else can be hidden behind an obviously debug boot
param, say, bfq.__DEBUG__extra_stats or whatever.

Thanks.

-- 
tejun

* Re: high overhead of functions blkg_*stats_* in bfq
  2017-11-06 16:30                     ` Tejun Heo
@ 2017-11-06 16:33                       ` Paolo Valente
  2017-11-06 16:37                         ` Tejun Heo
  0 siblings, 1 reply; 35+ messages in thread
From: Paolo Valente @ 2017-11-06 16:33 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jens Axboe, linux-block, Luca Miccio, Ulf Hansson, Linus Walleij,
	Mark Brown, dhowells, hpa, akinobu.mita


> On 06 Nov 2017, at 17:30, Tejun Heo <tj@kernel.org> wrote:
>
> Hello,
>
> On Mon, Nov 06, 2017 at 05:26:38PM +0100, Paolo Valente wrote:
>> Yes.  DEBUG_BLK_CGROUP will just serve to keep the for-debugging
>> subset switched off even when the dynamic parameter is on.
>>
>> We'll wait for possible counter-arguments from Tejun, and then start
>> to work on what you propose.
>
> So, I think the only *really* required ones are io_serviced and the
> corresponding byte counts, which are tracked by blk-cgroup core
> anyway.  Everything else can be hidden behind an obviously debug boot
> param, say, bfq.__DEBUG__extra_stats or whatever.
>

ok, can we change it, or do you want us to change it, for cfq too?

Thanks,
Paolo

> Thanks.
>
> --
> tejun

* Re: high overhead of functions blkg_*stats_* in bfq
  2017-11-06 16:33                       ` Paolo Valente
@ 2017-11-06 16:37                         ` Tejun Heo
  2017-11-06 16:39                           ` Jens Axboe
  0 siblings, 1 reply; 35+ messages in thread
From: Tejun Heo @ 2017-11-06 16:37 UTC (permalink / raw)
  To: Paolo Valente
  Cc: Jens Axboe, linux-block, Luca Miccio, Ulf Hansson, Linus Walleij,
	Mark Brown, dhowells, hpa, akinobu.mita

On Mon, Nov 06, 2017 at 05:33:45PM +0100, Paolo Valente wrote:
> 
> > Il giorno 06 nov 2017, alle ore 17:30, Tejun Heo <tj@kernel.org> ha scritto:
> > 
> > Hello,
> > 
> > On Mon, Nov 06, 2017 at 05:26:38PM +0100, Paolo Valente wrote:
> >> Yes.  DEBUG_BLK_CGROUP will just serve to keep the for-debugging
> >> subset switched off even when the dynamic parameter is on.
> >> 
> >> We'll wait for possible counter-arguments from Tejun, and then start
> >> to work on what you propose.
> > 
> > So, I think the only *really* required ones are io_serviced and the
> > corresponding byte counts, which are tracked by blk-cgroup core
> > anyway.  Everything else can be hidden behind an obviously debug boot
> > param, say, bfq.__DEBUG__extra_stats or whatever.
> > 
> 
> ok, can we change it, or do you want us to change it, for cfq too?

I don't have a strong opinion there, but it could be more hassle than
it's worth given how long the stats have been out there.  idk.

Please also note that you can still allow the extra stats to be
enabled at run time through /sys/kernel/module moduleparams and gate
them with a static_branch.  No idea whether making it that way is
worthwhile tho.  Creating / removing the files dynamically is
supported, but given that some stats need to be tracked all the time
to be correct (like # in flight), it seems kinda silly.

Thanks.

-- 
tejun

* Re: high overhead of functions blkg_*stats_* in bfq
  2017-11-06 16:37                         ` Tejun Heo
@ 2017-11-06 16:39                           ` Jens Axboe
  2017-11-06 17:05                             ` Paolo Valente
  0 siblings, 1 reply; 35+ messages in thread
From: Jens Axboe @ 2017-11-06 16:39 UTC (permalink / raw)
  To: Tejun Heo, Paolo Valente
  Cc: linux-block, Luca Miccio, Ulf Hansson, Linus Walleij, Mark Brown,
	dhowells, hpa, akinobu.mita

On 11/06/2017 09:37 AM, Tejun Heo wrote:
> On Mon, Nov 06, 2017 at 05:33:45PM +0100, Paolo Valente wrote:
>>
>>> Il giorno 06 nov 2017, alle ore 17:30, Tejun Heo <tj@kernel.org> ha scritto:
>>>
>>> Hello,
>>>
>>> On Mon, Nov 06, 2017 at 05:26:38PM +0100, Paolo Valente wrote:
>>>> Yes.  DEBUG_BLK_CGROUP will just serve to keep the for-debugging
>>>> subset switched off even when the dynamic parameter is on.
>>>>
>>>> We'll wait for possible counter-arguments from Tejun, and then start
>>>> to work on what you propose.
>>>
>>> So, I think the only *really* required ones are io_serviced and the
>>> corresponding byte counts, which are tracked by blk-cgroup core
>>> anyway.  Everything else can be hidden behind an obviously debug boot
>>> param, say, bfq.__DEBUG__extra_stats or whatever.
>>>
>>
>> ok, can we change it, or do you want us to change it, for cfq too?
> 
> I don't have a strong opinion there, but it could be more hassle than
> it's worth given how long the stats have been out there.  idk.
> 
> Please also note that you can still allow the extra stats to be
> enabled at run time through /sys/kernel/module moduleparams and gate them
> with a static_branch.  No idea whether making it that way is
> worthwhile tho.  Creating / removing the files dynamically is
> supported, but given that some stats need to be tracked all the time
> to be correct (like # in flight), it seems kinda silly.

Yeah, I'd keep them enable-only through some boot/module parameter.
Folks using this can just reboot to turn them back off.

Bonus points for doing the same for CFQ.

-- 
Jens Axboe

* Re: high overhead of functions blkg_*stats_* in bfq
  2017-11-06 16:39                           ` Jens Axboe
@ 2017-11-06 17:05                             ` Paolo Valente
  0 siblings, 0 replies; 35+ messages in thread
From: Paolo Valente @ 2017-11-06 17:05 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Tejun Heo, linux-block, Luca Miccio, Ulf Hansson, Linus Walleij,
	Mark Brown, dhowells, hpa, akinobu.mita


> On 06 Nov 2017, at 17:39, Jens Axboe <axboe@kernel.dk> wrote:
>
> On 11/06/2017 09:37 AM, Tejun Heo wrote:
>> On Mon, Nov 06, 2017 at 05:33:45PM +0100, Paolo Valente wrote:
>>>
>>>> On 06 Nov 2017, at 17:30, Tejun Heo <tj@kernel.org> wrote:
>>>>
>>>> Hello,
>>>>
>>>> On Mon, Nov 06, 2017 at 05:26:38PM +0100, Paolo Valente wrote:
>>>>> Yes.  DEBUG_BLK_CGROUP will just serve to keep the for-debugging
>>>>> subset switched off even when the dynamic parameter is on.
>>>>>
>>>>> We'll wait for possible counter-arguments from Tejun, and then
>>>>> start to work on what you propose.
>>>>
>>>> So, I think the only *really* required ones are io_serviced and the
>>>> corresponding byte counts, which are tracked by blk-cgroup core
>>>> anyway.  Everything else can be hidden behind an obviously debug
>>>> boot param, say, bfq.__DEBUG__extra_stats or whatever.
>>>>
>>>
>>> ok, can we change it, or do you want us to change it, for cfq too?
>>
>> I don't have a strong opinion there, but it could be more hassle than
>> it's worth given how long the stats have been out there.  idk.
>>
>> Please also note that you can still allow the extra stats to be
>> enabled at run time through /sys/kernel/module moduleparams and gate
>> them with a static_branch.  No idea whether making it that way is
>> worthwhile tho.  Creating / removing the files dynamically is
>> supported, but given that some stats need to be tracked all the time
>> to be correct (like # in flight), it seems kinda silly.
>
> Yeah, I'd keep them enable-only through some boot/module parameter.
> Folks using this can just reboot to turn them back off.
>

Ok to change our solution that way.  One question though: if this
switch must be system-wide (as I seem to understand), and not only
bfq-specific, is a per-scheduler-module parameter ok?

> Bonus points for doing the same for CFQ.
>

Prepare the points ... :)

I'm unsure we can finish all this by the next merge window, so, to
avoid missing that window for the other patches we have already almost
finished, in the next few days we'll submit only patches to:
1) Hide behind DEBUG_BLK_CGROUP all the parameters that Tejun said can
be hidden behind it (only for bfq, as agreed).
2) Move the execution of the stat-update functions outside the
scheduler lock, which does increase parallelism, and provides a boost
of about 30% in sustainable throughput when stats need to be updated.
Because of the patch in item 1), this boost will provide most of its
benefits when DEBUG_BLK_CGROUP is set and, once we have implemented it
too, when the boot/module parameter is set.

Thanks,
Paolo

> --
> Jens Axboe

* Re: high overhead of functions blkg_*stats_* in bfq
  2017-11-06  2:21     ` Jens Axboe
  2017-11-06  9:22       ` Ulf Hansson
  2017-11-06 15:00       ` Tejun Heo
@ 2017-11-06 18:46       ` Paolo Valente
  2 siblings, 0 replies; 35+ messages in thread
From: Paolo Valente @ 2017-11-06 18:46 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Tejun Heo, linux-block, Luca Miccio, Ulf Hansson, Linus Walleij,
	Mark Brown, David Howells, akinobu.mita, hpa


> On 06 Nov 2017, at 03:21, Jens Axboe <axboe@kernel.dk> wrote:
>
> On 11/05/2017 01:39 AM, Paolo Valente wrote:
>>
>>> On 18 Oct 2017, at 15:19, Tejun Heo <tj@kernel.org> wrote:
>>>
>>> Hello, Paolo.
>>>
>>> On Tue, Oct 17, 2017 at 12:11:01PM +0200, Paolo Valente wrote:
>>> ...
>>>> protected by a per-device scheduler lock.  To give you an idea, on an
>>>> Intel i7-4850HQ, and with 8 threads doing random I/O in parallel on
>>>> null_blk (configured with 0 latency), if the update of groups stats is
>>>> removed, then the throughput grows from 260 to 404 KIOPS.  This and
>>>> all the other results we might share in this thread can be reproduced
>>>> very easily with a (useful) script made by Luca Miccio [1].
>>>
>>> I don't think the old request_queue is ever built for multiple CPUs
>>> hitting on a mem-backed device.
>>>
>>
>> Hi,
>> from our measurements, the code and the comments received so far in
>> this thread, I guess that reducing the execution time of blkg_*stats_*
>> functions is not an easy task, and is unlikely to be accomplished in
>> the short term.  In this respect, we have unfortunately found out that
>> executing these functions causes a very high reduction of the
>> sustainable throughput on some CPUs.  For example, -70% on an ARM
>> Cortex-A53 Octa-core.
>>
>> Thus, to deal with such a considerable slowdown, until the overhead of
>> these functions gets reduced, it may make more sense to switch the
>> update of these statistics off, in all cases where these statistics
>> are not used, while higher performance (or lower power consumption) is
>> welcome/needed.
>>
>> We wondered, however, how hazardous it might be to switch the update
>> of these statistics off.  To answer this question, we investigated the
>> extent to which these statistics are used by applications and
>> services.  Mainly, we tried to survey relevant people or
>> forums/mailing lists for the involved communities: Linux distributions,
>> systemd, containers and other minor communities.  Nobody reported any
>> application or service using these statistics (either the variant
>> updated by bfq, or the one updated by cfq).
>>
>> So, one of the patches we are working on gives the user the
>> possibility to disable the update of these statistics online.
>
> If you want help with this, provide an easy way to reproduce this,
> and/or some decent profiling output. There was one flamegraph posted,
> but that was basically useless. Just do:
>
> perf record -g -- whatever test
> perf report -g --no-children
>
> and post the top 10 entries from the perf report.
>

Probably not very meaningful any longer, but just for completeness ...

We have already gone through what you propose, and unfortunately it
was useless for us.  In fact, on laptop or desktop processors,
- bfq's total per-request overhead is about 2/3 of the CPU cycles, and
is distributed across about four functions;
- the portion of those 2/3 of the CPU cycles consumed by the stat-update
functions is about 40%, and is distributed across five further functions,
which are in their turn scattered among the bfq functions;
so a list of the ten top-consuming functions overall would tell you
very little.

That's why we ended up using a flamegraph.  In this respect, in
addition to one of our flamegraphs, in this thread we provided an
indication of which functions to look for, plus the steps to reproduce
the data, after your request.

Anyway, if you still want the top 10 entries, we'll provide them for
you.

Thanks,
Paolo

> It's pointless to give up on this so soon, when no effort has apparently
> been dedicated to figuring out what the actual issue is yet. So no, no
> patch that will just disable the stats is going to be accepted.
>
> That said, I have no idea who uses these stats. Surely someone can
> answer that question. Tejun?
>
> --
> Jens Axboe

end of thread, other threads:[~2017-11-06 18:46 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
2017-10-17 10:11 high overhead of functions blkg_*stats_* in bfq Paolo Valente
2017-10-17 12:45 ` Paolo Valente
2017-10-17 16:45   ` Linus Walleij
2017-10-17 16:49     ` Jens Axboe
2017-10-18 13:19 ` Tejun Heo
2017-10-18 14:45   ` Jens Axboe
2017-10-18 15:05     ` Paolo Valente
2017-10-18 15:44       ` Jens Axboe
     [not found]   ` <2B52CB68-213C-470F-945C-0ADFF9AA7A66@linaro.org>
2017-10-18 15:08     ` Paolo Valente
2017-10-18 15:40       ` Paolo Valente
     [not found]       ` <D6586934-DF02-4102-8839-8912DFA86BB0@linaro.org>
2017-10-19  6:50         ` Paolo Valente
2017-10-21 16:13           ` Tejun Heo
2017-10-22  8:25             ` Paolo Valente
2017-10-30  9:49           ` David Howells
2017-11-05  7:39   ` Paolo Valente
2017-11-06  2:21     ` Jens Axboe
2017-11-06  9:22       ` Ulf Hansson
2017-11-06  9:49         ` Paolo Valente
2017-11-06 10:48           ` Ulf Hansson
2017-11-06 11:20             ` Paolo Valente
2017-11-06 15:00       ` Tejun Heo
2017-11-06 15:47         ` Paolo Valente
2017-11-06 16:11           ` Paolo Valente
2017-11-06 16:13             ` Jens Axboe
2017-11-06 16:21               ` Paolo Valente
2017-11-06 16:22                 ` Jens Axboe
2017-11-06 16:26                   ` Paolo Valente
2017-11-06 16:30                     ` Tejun Heo
2017-11-06 16:33                       ` Paolo Valente
2017-11-06 16:37                         ` Tejun Heo
2017-11-06 16:39                           ` Jens Axboe
2017-11-06 17:05                             ` Paolo Valente
2017-11-06 16:03         ` Jens Axboe
2017-11-06 16:10           ` Paolo Valente
2017-11-06 18:46       ` Paolo Valente
