All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vincent Guittot <vincent.guittot@linaro.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Preeti U Murthy <preeti@linux.vnet.ibm.com>,
	Russell King - ARM Linux <linux@arm.linux.org.uk>,
	LAK <linux-arm-kernel@lists.infradead.org>,
	Rik van Riel <riel@redhat.com>,
	Morten Rasmussen <Morten.Rasmussen@arm.com>,
	Mike Galbraith <efault@gmx.de>,
	Nicolas Pitre <nicolas.pitre@linaro.org>,
	"linaro-kernel@lists.linaro.org" <linaro-kernel@lists.linaro.org>,
	Daniel Lezcano <daniel.lezcano@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>
Subject: Re: [PATCH v5 11/12] sched: replace capacity_factor by utilization
Date: Wed, 17 Sep 2014 18:32:23 -0700	[thread overview]
Message-ID: <CAKfTPtA_=MvAVya6JR3M+qs-kLapZgGknBqo=87MVbihCCXF9A@mail.gmail.com> (raw)
In-Reply-To: <20140917222553.GD2848@worktop.localdomain>

On 17 September 2014 15:25, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, Sep 16, 2014 at 12:14:54AM +0200, Vincent Guittot wrote:
>> On 15 September 2014 13:42, Peter Zijlstra <peterz@infradead.org> wrote:
>
>> > OK, I've reconsidered _again_, I still don't get it.
>> >
>> > So fundamentally I think its wrong to scale with the capacity; it just
>> > doesn't make any sense. Consider big.little stuff, their CPUs are
>> > inherently asymmetric in capacity, but that doesn't matter one whit for
>> > utilization numbers. If a core is fully consumed its fully consumed, no
>> > matter how much work it can or can not do.
>> >
>> >
>> > So the only thing that needs correcting is the fact that these
>> > statistics are based on clock_task and some of that time can end up in
>> > other scheduling classes, at which point we'll never get 100% even
>> > though we're 'saturated'. But correcting for that using capacity doesn't
>> > 'work'.
>>
>> I'm not sure to catch your last point because the capacity is the only
>> figures that take into account the "time" consumed by other classes.
>> Have you got in mind another way to take into account the other
>> classes ?
>
> So that was the entire point of stuffing capacity in? Note that that
> point was not at all clear.
>
> This is very much like 'all we have is a hammer, and therefore
> everything is a nail'. The rt fraction is a 'small' part of what the
> capacity is.
>
>> So we have cpu_capacity that is the capacity that can be currently
>> used by cfs class
>> We have cfs.usage_load_avg that is the sum of running time of cfs
>> tasks on the CPU and reflect the % of usage of this CPU by CFS tasks
>> We have to use the same metrics to compare available capacity for CFS
>> and current cfs usage
>
> -ENOPARSE
>
>> Now we have to use the same unit so we can either weight the
>> cpu_capacity_orig with the cfs.usage_load_avg and compare it with
>> cpu_capacity
>> or with divide cpu_capacity by cpu_capacity_orig and scale it into the
>> SCHED_LOAD_SCALE range. Is It what you are proposing ?
>
> I'm so not getting it; orig vs capacity still includes
> arch_scale_freq_capacity(), so that is not enough to isolate the rt
> fraction.

This patch does not try to solve any scale invariance issue. This
patch removes capacity_factor because it rarely works correctly.
capacity_factor tries to compute how many tasks a group of CPUs can
handle at the time we are doing the load balance. The capacity_factor
is hardly working for SMT system: it sometimes works for big cores and
but fails to do the right thing for little cores.

Below are two examples to illustrate the problem that this patch solves:

capacity_factor makes the assumption that max capacity of a CPU is
SCHED_CAPACITY_SCALE and the load of a thread is always is
SCHED_LOAD_SCALE. It compares the output of these figures with the sum
of nr_running to decide if a group is overloaded or not.

But if the default capacity of a CPU is less than SCHED_CAPACITY_SCALE
(640 as an example), a group of 3 CPUS will have a max capacity_factor
of 2 ( div_round_closest(3x640/1024) = 2) which means that it will be
seen as overloaded if we have only one task per CPU.

Then, if the default capacity of a CPU is greater than
SCHED_CAPACITY_SCALE (1512 as an example), a group of 4 CPUs will have
a capacity_factor of 4 (at max and thanks to the fix[0] for SMT system
that prevent the apparition of ghost CPUs) but if one CPU is fully
used by a rt task (and its capacity is reduced to nearly nothing), the
capacity factor of the group will still be 4
(div_round_closest(3*1512/1024) = 5).

So, this patch tries to solve this issue by removing capacity_factor
and replacing it with the 2 following metrics :
-the available CPU capacity for CFS tasks which is the one currently
used by load_balance
-the capacity that are effectively used by CFS tasks on the CPU. For
that, i have re-introduced the usage_avg_contrib which is in the range
[0..SCHED_CPU_LOAD] whatever the capacity of the CPU on which the task
is running, is. This usage_avg_contrib doesn't solve the scaling
in-variance problem, so i have to scale the usage with original
capacity in get_cpu_utilization (that will become get_cpu_usage in the
next version) in order to compare it with available capacity.

Once the scaling invariance will have been added in usage_avg_contrib,
we can remove the scale by cpu_capacity_orig in get_cpu_utilization.
But the scaling invariance will come in another patchset.

Hope that this explanation makes the goal of this patchset clearer.
And I can add this explanation in the commit log if you found it clear
enough

Vincent

[0] https://lkml.org/lkml/2013/8/28/194

WARNING: multiple messages have this Message-ID (diff)
From: vincent.guittot@linaro.org (Vincent Guittot)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH v5 11/12] sched: replace capacity_factor by utilization
Date: Wed, 17 Sep 2014 18:32:23 -0700	[thread overview]
Message-ID: <CAKfTPtA_=MvAVya6JR3M+qs-kLapZgGknBqo=87MVbihCCXF9A@mail.gmail.com> (raw)
In-Reply-To: <20140917222553.GD2848@worktop.localdomain>

On 17 September 2014 15:25, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, Sep 16, 2014 at 12:14:54AM +0200, Vincent Guittot wrote:
>> On 15 September 2014 13:42, Peter Zijlstra <peterz@infradead.org> wrote:
>
>> > OK, I've reconsidered _again_, I still don't get it.
>> >
>> > So fundamentally I think its wrong to scale with the capacity; it just
>> > doesn't make any sense. Consider big.little stuff, their CPUs are
>> > inherently asymmetric in capacity, but that doesn't matter one whit for
>> > utilization numbers. If a core is fully consumed its fully consumed, no
>> > matter how much work it can or can not do.
>> >
>> >
>> > So the only thing that needs correcting is the fact that these
>> > statistics are based on clock_task and some of that time can end up in
>> > other scheduling classes, at which point we'll never get 100% even
>> > though we're 'saturated'. But correcting for that using capacity doesn't
>> > 'work'.
>>
>> I'm not sure to catch your last point because the capacity is the only
>> figures that take into account the "time" consumed by other classes.
>> Have you got in mind another way to take into account the other
>> classes ?
>
> So that was the entire point of stuffing capacity in? Note that that
> point was not at all clear.
>
> This is very much like 'all we have is a hammer, and therefore
> everything is a nail'. The rt fraction is a 'small' part of what the
> capacity is.
>
>> So we have cpu_capacity that is the capacity that can be currently
>> used by cfs class
>> We have cfs.usage_load_avg that is the sum of running time of cfs
>> tasks on the CPU and reflect the % of usage of this CPU by CFS tasks
>> We have to use the same metrics to compare available capacity for CFS
>> and current cfs usage
>
> -ENOPARSE
>
>> Now we have to use the same unit so we can either weight the
>> cpu_capacity_orig with the cfs.usage_load_avg and compare it with
>> cpu_capacity
>> or with divide cpu_capacity by cpu_capacity_orig and scale it into the
>> SCHED_LOAD_SCALE range. Is It what you are proposing ?
>
> I'm so not getting it; orig vs capacity still includes
> arch_scale_freq_capacity(), so that is not enough to isolate the rt
> fraction.

This patch does not try to solve any scale invariance issue. This
patch removes capacity_factor because it rarely works correctly.
capacity_factor tries to compute how many tasks a group of CPUs can
handle at the time we are doing the load balance. The capacity_factor
is hardly working for SMT system: it sometimes works for big cores and
but fails to do the right thing for little cores.

Below are two examples to illustrate the problem that this patch solves:

capacity_factor makes the assumption that max capacity of a CPU is
SCHED_CAPACITY_SCALE and the load of a thread is always is
SCHED_LOAD_SCALE. It compares the output of these figures with the sum
of nr_running to decide if a group is overloaded or not.

But if the default capacity of a CPU is less than SCHED_CAPACITY_SCALE
(640 as an example), a group of 3 CPUS will have a max capacity_factor
of 2 ( div_round_closest(3x640/1024) = 2) which means that it will be
seen as overloaded if we have only one task per CPU.

Then, if the default capacity of a CPU is greater than
SCHED_CAPACITY_SCALE (1512 as an example), a group of 4 CPUs will have
a capacity_factor of 4 (at max and thanks to the fix[0] for SMT system
that prevent the apparition of ghost CPUs) but if one CPU is fully
used by a rt task (and its capacity is reduced to nearly nothing), the
capacity factor of the group will still be 4
(div_round_closest(3*1512/1024) = 5).

So, this patch tries to solve this issue by removing capacity_factor
and replacing it with the 2 following metrics :
-the available CPU capacity for CFS tasks which is the one currently
used by load_balance
-the capacity that are effectively used by CFS tasks on the CPU. For
that, i have re-introduced the usage_avg_contrib which is in the range
[0..SCHED_CPU_LOAD] whatever the capacity of the CPU on which the task
is running, is. This usage_avg_contrib doesn't solve the scaling
in-variance problem, so i have to scale the usage with original
capacity in get_cpu_utilization (that will become get_cpu_usage in the
next version) in order to compare it with available capacity.

Once the scaling invariance will have been added in usage_avg_contrib,
we can remove the scale by cpu_capacity_orig in get_cpu_utilization.
But the scaling invariance will come in another patchset.

Hope that this explanation makes the goal of this patchset clearer.
And I can add this explanation in the commit log if you found it clear
enough

Vincent

[0] https://lkml.org/lkml/2013/8/28/194

  reply	other threads:[~2014-09-18  1:32 UTC|newest]

Thread overview: 164+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-08-26 11:06 [PATCH v5 00/12] sched: consolidation of cpu_capacity Vincent Guittot
2014-08-26 11:06 ` Vincent Guittot
2014-08-26 11:06 ` [PATCH v5 01/12] sched: fix imbalance flag reset Vincent Guittot
2014-08-26 11:06   ` Vincent Guittot
2014-09-19 11:47   ` [tip:sched/core] sched: Fix " tip-bot for Vincent Guittot
2014-08-26 11:06 ` [PATCH v5 02/12] sched: remove a wake_affine condition Vincent Guittot
2014-08-26 11:06   ` Vincent Guittot
2014-09-19 11:47   ` [tip:sched/core] sched: Remove a wake_affine() condition tip-bot for Vincent Guittot
2014-08-26 11:06 ` [PATCH v5 03/12] sched: fix avg_load computation Vincent Guittot
2014-08-26 11:06   ` Vincent Guittot
2014-08-30 12:00   ` Preeti U Murthy
2014-08-30 12:00     ` Preeti U Murthy
2014-09-03 11:09     ` Vincent Guittot
2014-09-03 11:09       ` Vincent Guittot
2014-09-03 23:43       ` Tim Chen
2014-09-03 23:43         ` Tim Chen
2014-09-04  7:17         ` Vincent Guittot
2014-09-04  7:17           ` Vincent Guittot
2014-09-04 16:26           ` Tim Chen
2014-09-04 16:26             ` Tim Chen
2014-09-05 11:10   ` Preeti U Murthy
2014-09-05 11:10     ` Preeti U Murthy
2014-09-19 11:47   ` [tip:sched/core] sched: Fix " tip-bot for Vincent Guittot
2014-08-26 11:06 ` [PATCH v5 04/12] sched: Allow all archs to set the capacity_orig Vincent Guittot
2014-08-26 11:06   ` Vincent Guittot
2014-08-27 13:12   ` Kamalesh Babulal
2014-08-27 13:12     ` Kamalesh Babulal
2014-08-30 17:07   ` Preeti U Murthy
2014-08-30 17:07     ` Preeti U Murthy
2014-09-01  8:05     ` Vincent Guittot
2014-09-01  8:05       ` Vincent Guittot
2014-09-03  8:41       ` Preeti U Murthy
2014-09-03  8:41         ` Preeti U Murthy
2014-09-10 13:50     ` Peter Zijlstra
2014-09-10 13:50       ` Peter Zijlstra
2014-09-10 14:22       ` Vincent Guittot
2014-09-10 14:22         ` Vincent Guittot
2014-09-11 10:36       ` Preeti U Murthy
2014-09-11 10:36         ` Preeti U Murthy
2014-09-19 11:47   ` [tip:sched/core] sched: Allow all architectures to set ' capacity_orig' tip-bot for Vincent Guittot
2014-08-26 11:06 ` [PATCH v5 05/12] ARM: topology: use new cpu_capacity interface Vincent Guittot
2014-08-26 11:06   ` Vincent Guittot
2014-09-11 18:52   ` Nicolas Pitre
2014-09-11 18:52     ` Nicolas Pitre
2014-09-19 11:48   ` [tip:sched/core] ARM: topology: Use the " tip-bot for Vincent Guittot
2014-08-26 11:06 ` [PATCH v5 06/12] sched: add per rq cpu_capacity_orig Vincent Guittot
2014-08-26 11:06   ` Vincent Guittot
2014-08-27 13:32   ` Kamalesh Babulal
2014-08-27 13:32     ` Kamalesh Babulal
2014-08-28  7:34     ` Vincent Guittot
2014-08-28  7:34       ` Vincent Guittot
2014-09-10 13:53   ` Peter Zijlstra
2014-09-10 13:53     ` Peter Zijlstra
2014-09-10 14:19     ` Vincent Guittot
2014-09-10 14:19       ` Vincent Guittot
2014-09-11 19:02   ` Nicolas Pitre
2014-09-11 19:02     ` Nicolas Pitre
2014-09-15 21:22     ` Vincent Guittot
2014-09-15 21:22       ` Vincent Guittot
2014-08-26 11:06 ` [PATCH v5 07/12] sched: test the cpu's capacity in wake affine Vincent Guittot
2014-08-26 11:06   ` Vincent Guittot
2014-09-10 14:19   ` Peter Zijlstra
2014-09-10 14:19     ` Peter Zijlstra
2014-09-19 11:48   ` [tip:sched/core] sched: Test the CPU's capacity in wake_affine() tip-bot for Vincent Guittot
2014-08-26 11:06 ` [PATCH v5 08/12] sched: move cfs task on a CPU with higher capacity Vincent Guittot
2014-08-26 11:06   ` Vincent Guittot
2014-08-30 17:50   ` Preeti U Murthy
2014-08-30 17:50     ` Preeti U Murthy
2014-09-01  8:45     ` Vincent Guittot
2014-09-01  8:45       ` Vincent Guittot
2014-09-03  9:11       ` Preeti U Murthy
2014-09-03  9:11         ` Preeti U Murthy
2014-09-03 11:44         ` Vincent Guittot
2014-09-03 11:44           ` Vincent Guittot
2014-09-03 12:26           ` Preeti U Murthy
2014-09-03 12:26             ` Preeti U Murthy
2014-09-03 12:49             ` Vincent Guittot
2014-09-03 12:49               ` Vincent Guittot
2014-09-11  9:27             ` Peter Zijlstra
2014-09-11  9:27               ` Peter Zijlstra
2014-09-05 12:06   ` Preeti U Murthy
2014-09-05 12:06     ` Preeti U Murthy
2014-09-05 12:24     ` Vincent Guittot
2014-09-05 12:24       ` Vincent Guittot
2014-09-11 10:07   ` Peter Zijlstra
2014-09-11 10:07     ` Peter Zijlstra
2014-09-11 11:20     ` Vincent Guittot
2014-09-11 11:20       ` Vincent Guittot
2014-09-11 10:13   ` Peter Zijlstra
2014-09-11 10:13     ` Peter Zijlstra
2014-09-11 12:14     ` Vincent Guittot
2014-09-11 12:14       ` Vincent Guittot
2014-09-11 11:54   ` Peter Zijlstra
2014-09-11 11:54     ` Peter Zijlstra
2014-08-26 11:06 ` [PATCH v5 09/12] sched: add usage_load_avg Vincent Guittot
2014-08-26 11:06   ` Vincent Guittot
2014-09-04  7:34   ` [PATCH v5 09/11] " Vincent Guittot
2014-09-04  7:34     ` Vincent Guittot
2014-09-11 11:17     ` Peter Zijlstra
2014-09-11 11:17       ` Peter Zijlstra
2014-09-11 11:17   ` [PATCH v5 09/12] " Peter Zijlstra
2014-09-11 11:17     ` Peter Zijlstra
2014-09-11 12:18     ` Vincent Guittot
2014-09-11 12:18       ` Vincent Guittot
2014-09-11 12:20     ` Vincent Guittot
2014-09-11 12:20       ` Vincent Guittot
2014-09-15 19:15   ` Morten Rasmussen
2014-09-15 19:15     ` Morten Rasmussen
2014-09-15 22:33     ` Vincent Guittot
2014-09-15 22:33       ` Vincent Guittot
2014-08-26 11:06 ` [PATCH v5 10/12] sched: get CPU's utilization statistic Vincent Guittot
2014-08-26 11:06   ` Vincent Guittot
2014-09-11 12:34   ` Peter Zijlstra
2014-09-11 12:34     ` Peter Zijlstra
2014-09-11 13:07     ` Vincent Guittot
2014-09-11 13:07       ` Vincent Guittot
2014-09-11 14:04       ` Peter Zijlstra
2014-09-11 14:04         ` Peter Zijlstra
2014-09-11 19:17         ` Nicolas Pitre
2014-09-11 19:17           ` Nicolas Pitre
2014-09-12  7:41           ` Vincent Guittot
2014-09-12  7:41             ` Vincent Guittot
2014-09-15 19:45         ` Morten Rasmussen
2014-09-15 19:45           ` Morten Rasmussen
2014-09-16 22:43           ` Vincent Guittot
2014-09-16 22:43             ` Vincent Guittot
2014-09-15 19:28     ` Morten Rasmussen
2014-09-15 19:28       ` Morten Rasmussen
2014-08-26 11:06 ` [PATCH v5 11/12] sched: replace capacity_factor by utilization Vincent Guittot
2014-08-26 11:06   ` Vincent Guittot
2014-09-11 15:39   ` Peter Zijlstra
2014-09-11 15:39     ` Peter Zijlstra
2014-09-11 16:15   ` Peter Zijlstra
2014-09-11 16:15     ` Peter Zijlstra
2014-09-11 17:26     ` Vincent Guittot
2014-09-11 17:26       ` Vincent Guittot
2014-09-14 19:41       ` Peter Zijlstra
2014-09-14 19:41         ` Peter Zijlstra
2014-09-14 19:51         ` Peter Zijlstra
2014-09-14 19:51           ` Peter Zijlstra
2014-09-15 11:42         ` Peter Zijlstra
2014-09-15 11:42           ` Peter Zijlstra
2014-09-15 19:07           ` Nicolas Pitre
2014-09-15 19:07             ` Nicolas Pitre
2014-09-15 20:01             ` Peter Zijlstra
2014-09-15 20:01               ` Peter Zijlstra
2014-09-17 18:45               ` Morten Rasmussen
2014-09-17 18:45                 ` Morten Rasmussen
2014-09-17 18:58                 ` Morten Rasmussen
2014-09-17 18:58                   ` Morten Rasmussen
2014-09-17 23:03                 ` Peter Zijlstra
2014-09-17 23:03                   ` Peter Zijlstra
2014-09-15 22:14           ` Vincent Guittot
2014-09-15 22:14             ` Vincent Guittot
2014-09-15 22:18             ` Vincent Guittot
2014-09-15 22:18               ` Vincent Guittot
2014-09-17 22:25             ` Peter Zijlstra
2014-09-17 22:25               ` Peter Zijlstra
2014-09-18  1:32               ` Vincent Guittot [this message]
2014-09-18  1:32                 ` Vincent Guittot
2014-09-16 17:00         ` Dietmar Eggemann
2014-09-16 17:00           ` Dietmar Eggemann
2014-08-26 11:06 ` [PATCH v5 12/12] sched: add SD_PREFER_SIBLING for SMT level Vincent Guittot
2014-08-26 11:06   ` Vincent Guittot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAKfTPtA_=MvAVya6JR3M+qs-kLapZgGknBqo=87MVbihCCXF9A@mail.gmail.com' \
    --to=vincent.guittot@linaro.org \
    --cc=Morten.Rasmussen@arm.com \
    --cc=daniel.lezcano@linaro.org \
    --cc=dietmar.eggemann@arm.com \
    --cc=efault@gmx.de \
    --cc=linaro-kernel@lists.linaro.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@arm.linux.org.uk \
    --cc=mingo@kernel.org \
    --cc=nicolas.pitre@linaro.org \
    --cc=peterz@infradead.org \
    --cc=preeti@linux.vnet.ibm.com \
    --cc=riel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.