* [RFC] sched/core: Fix up load metric exposed to cpuidle
From: Sai Gurrappadi @ 2016-09-23 21:49 UTC
  To: rafael.j.wysocki
  Cc: Peter Boonstoppel, Peter Zijlstra, Colin Cross, Arjan van de Ven,
	linux-pm

While triaging a ~5% performance degradation on some use cases between
our k3.18 and k4.4 trees, we found that the menu cpuidle governor on
k4.4 was far more aggressive about requesting deeper idle states. It
would often get it wrong, though, resulting in a performance loss.

The menu governor biases its choice toward shallower idle states based
on the historical load on the CPU: the busier the CPU, the shallower
the idle state.

However, after commit "3289bdb sched: Move the loadavg code to a more
obvious location", the load metric it looks at is rq->load.weight, which
is the instantaneous sum of se->load.weight over the top-level entities
on the rq. On idle entry this is always 0 (in the common case at least)
because there is nothing on the cfs rq.

The previous metric the menu governor used was rq->cpu_load[0], which is
a snapshot of weighted_cpuload() at the previous load update point, so
it isn't always 0 on idle entry.

Unfortunately, it isn't straightforward to switch the metric to
rq->cfs.load_avg or rq->cfs.util_avg because they overestimate the load
far more than rq->cpu_load[0] does (they include blocked-task
contributions). That would potentially require redoing the magic
constants in the menu governor's performance_multiplier, so for now,
use rq->cpu_load[0] instead to preserve the old behaviour.

Reported-by: Juha Lainema <jlainema@nvidia.com>
Signed-off-by: Sai Gurrappadi <sgurrappadi@nvidia.com>
---

* I realize this might not be the best thing to do, hence the RFC tag.
Thoughts?
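
For context, the governor consumes the value returned by
get_iowait_load() roughly as follows. This is a simplified sketch of
menu's performance_multiplier idea, not the exact kernel code (the real
constants live in drivers/cpuidle/governors/menu.c):

	/*
	 * Simplified sketch: the menu governor turns the load reported
	 * by the scheduler into a multiplier and divides the predicted
	 * idle duration by it, shrinking the exit latency it will
	 * tolerate -- busier CPU, shallower idle state.
	 */
	static int performance_multiplier_sketch(unsigned long nr_iowaiters,
						 unsigned long load)
	{
		int mult = 1;

		/* More loaded CPUs are more reluctant to idle deeply. */
		mult += 2 * load;

		/* Tasks waiting on IO make deep idle costlier still. */
		mult += 10 * nr_iowaiters;

		return mult;
	}

	/* Used as: interactivity_req = predicted_us / mult; states whose
	 * exit latency exceeds that bound are rejected. */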

 kernel/sched/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 44817c6..d1aea12 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2955,7 +2955,7 @@ void get_iowait_load(unsigned long *nr_waiters, unsigned long *load)
 {
 	struct rq *rq = this_rq();
 	*nr_waiters = atomic_read(&rq->nr_iowait);
-	*load = rq->load.weight;
+	*load = rq->cpu_load[0];
 }
 
 #ifdef CONFIG_SMP
-- 
2.1.4

* Re: [RFC] sched/core: Fix up load metric exposed to cpuidle
From: Peter Zijlstra @ 2016-09-23 22:06 UTC
  To: Sai Gurrappadi
  Cc: rafael.j.wysocki, Peter Boonstoppel, Colin Cross,
	Arjan van de Ven, linux-pm

On Fri, Sep 23, 2016 at 02:49:47PM -0700, Sai Gurrappadi wrote:
> While triaging a ~5% performance degradation on some use cases between
> our k3.18 and k4.4 trees, we found that the menu cpuidle governor on
> k4.4 was far more aggressive about requesting deeper idle states. It
> would often get it wrong, though, resulting in a performance loss.
> 
> The menu governor biases its choice toward shallower idle states based
> on the historical load on the CPU: the busier the CPU, the shallower
> the idle state.
> 
> However, after commit "3289bdb sched: Move the loadavg code to a more

The normal quoting style is:

  3289bdb42988 ("sched: Move the loadavg code to a more obvious location")

Use at least 12 SHA1 characters, collisions on 7-8 char abbrevs have
already happened.

Also, that was more than a year ago, nobody noticed?

> obvious location", the load metric it looks at is rq->load.weight, which
> is the instantaneous sum of se->load.weight over the top-level entities
> on the rq. On idle entry this is always 0 (in the common case at least)
> because there is nothing on the cfs rq.
> 
> The previous metric the menu governor used was rq->cpu_load[0], which is
> a snapshot of weighted_cpuload() at the previous load update point, so
> it isn't always 0 on idle entry.

Right, basically a 'random' number :)

> Unfortunately, it isn't straightforward to switch the metric to
> rq->cfs.load_avg or rq->cfs.util_avg because they overestimate the load
> far more than rq->cpu_load[0] does (they include blocked-task
> contributions). That would potentially require redoing the magic
> constants in the menu governor's performance_multiplier, so for now,
> use rq->cpu_load[0] instead to preserve the old behaviour.

Works for me, I suppose, but I would indeed encourage people to
investigate better options.

* Re: [RFC] sched/core: Fix up load metric exposed to cpuidle
From: Sai Gurrappadi @ 2016-09-23 22:44 UTC
  To: Peter Zijlstra
  Cc: rafael.j.wysocki, Peter Boonstoppel, Colin Cross,
	Arjan van de Ven, linux-pm


On 09/23/2016 03:06 PM, Peter Zijlstra wrote:
> On Fri, Sep 23, 2016 at 02:49:47PM -0700, Sai Gurrappadi wrote:
>> While triaging a ~5% performance degradation on some use cases between
>> our k3.18 and k4.4 trees, we found that the menu cpuidle governor on
>> k4.4 was far more aggressive about requesting deeper idle states. It
>> would often get it wrong, though, resulting in a performance loss.
>>
>> The menu governor biases its choice toward shallower idle states based
>> on the historical load on the CPU: the busier the CPU, the shallower
>> the idle state.
>>
>> However, after commit "3289bdb sched: Move the loadavg code to a more
> 
> The normal quoting style is:
> 
>   3289bdb42988 ("sched: Move the loadavg code to a more obvious location")
> 
> Use at least 12 SHA1 characters, collisions on 7-8 char abbrevs have
> already happened.
> 
> Also, that was more than a year ago, nobody noticed?

Ah, sorry. Will do next time.

> 
>> obvious location", the load metric it looks at is rq->load.weight, which
>> is the instantaneous sum of se->load.weight over the top-level entities
>> on the rq. On idle entry this is always 0 (in the common case at least)
>> because there is nothing on the cfs rq.
>>
>> The previous metric the menu governor used was rq->cpu_load[0], which is
>> a snapshot of weighted_cpuload() at the previous load update point, so
>> it isn't always 0 on idle entry.
> 
> Right, basically a 'random' number :)

Indeed. I never really understood how things worked with the cpu_load stuff
given how random it seemed.

> 
>> Unfortunately, it isn't straightforward to switch the metric to
>> rq->cfs.load_avg or rq->cfs.util_avg because they overestimate the load
>> far more than rq->cpu_load[0] does (they include blocked-task
>> contributions). That would potentially require redoing the magic
>> constants in the menu governor's performance_multiplier, so for now,
>> use rq->cpu_load[0] instead to preserve the old behaviour.
> 
> Works for me, I suppose, but I would indeed encourage people to
> investigate better options.

Thanks for the review :)

-Sai

* Re: [RFC] sched/core: Fix up load metric exposed to cpuidle
From: Rafael J. Wysocki @ 2016-09-29 13:22 UTC
  To: Sai Gurrappadi
  Cc: Peter Zijlstra, rafael.j.wysocki, Peter Boonstoppel, Colin Cross,
	Arjan van de Ven, linux-pm

On Friday, September 23, 2016 03:44:23 PM Sai Gurrappadi wrote:
> 
> On 09/23/2016 03:06 PM, Peter Zijlstra wrote:
> > On Fri, Sep 23, 2016 at 02:49:47PM -0700, Sai Gurrappadi wrote:
> >> While triaging a ~5% performance degradation on some use cases between
> >> our k3.18 and k4.4 trees, we found that the menu cpuidle governor on
> >> k4.4 was far more aggressive about requesting deeper idle states. It
> >> would often get it wrong, though, resulting in a performance loss.
> >>
> >> The menu governor biases its choice toward shallower idle states based
> >> on the historical load on the CPU: the busier the CPU, the shallower
> >> the idle state.
> >>
> >> However, after commit "3289bdb sched: Move the loadavg code to a more
> > 
> > The normal quoting style is:
> > 
> >   3289bdb42988 ("sched: Move the loadavg code to a more obvious location")
> > 
> > Use at least 12 SHA1 characters, collisions on 7-8 char abbrevs have
> > already happened.
> > 
> > Also, that was more than a year ago, nobody noticed?
> 
> Ah, sorry. Will do next time.
> 
> > 
> >> obvious location", the load metric it looks at is rq->load.weight, which
> >> is the instantaneous sum of se->load.weight over the top-level entities
> >> on the rq. On idle entry this is always 0 (in the common case at least)
> >> because there is nothing on the cfs rq.
> >>
> >> The previous metric the menu governor used was rq->cpu_load[0], which is
> >> a snapshot of weighted_cpuload() at the previous load update point, so
> >> it isn't always 0 on idle entry.
> > 
> > Right, basically a 'random' number :)
> 
> Indeed. I never really understood how things worked with the cpu_load stuff
> given how random it seemed.

Well, the choice seems to be between "better performance, but we don't really
know how we get that" and "more idle and we understand how it works".

Honestly, I prefer to understand how it works in the first place.

Of course, the fact that the metric used currently is (almost) always 0 is a
problem, but it doesn't seem like going back to the old one would be a huge
improvement either.

Thanks,
Rafael


* Re: [RFC] sched/core: Fix up load metric exposed to cpuidle
From: Sai Gurrappadi @ 2016-09-29 18:22 UTC
  To: Rafael J. Wysocki
  Cc: Peter Zijlstra, rafael.j.wysocki, Peter Boonstoppel, Colin Cross,
	Arjan van de Ven, linux-pm


On 09/29/2016 06:22 AM, Rafael J. Wysocki wrote:
> On Friday, September 23, 2016 03:44:23 PM Sai Gurrappadi wrote:
>>
>> On 09/23/2016 03:06 PM, Peter Zijlstra wrote:
>>> On Fri, Sep 23, 2016 at 02:49:47PM -0700, Sai Gurrappadi wrote:

...

>>>> The previous metric the menu governor used was rq->cpu_load[0], which is
>>>> a snapshot of weighted_cpuload() at the previous load update point, so
>>>> it isn't always 0 on idle entry.
>>>
>>> Right, basically a 'random' number :)
>>
>> Indeed. I never really understood how things worked with the cpu_load stuff
>> given how random it seemed.
> 
> Well, the choice seems to be between "better performance, but we don't really
> know how we get that" and "more idle and we understand how it works".
> 
> Honestly, I prefer to understand how it works in the first place.

A busier CPU wants to enter shallower idle states because the cost of
entering deeper idle states (exit-latency-wise) is higher. The
performance_multiplier logic in the menu governor tries to map a CPU
load metric to an exit latency tolerance by tweaking a fudge factor.
The idea at least makes sense.

The problem is that the CPU load metric it used (rq->cpu_load[0]) isn't
the best thing for this purpose because it is highly dependent on how
the scheduler tick aligns with the workload.
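
For reference, the per-cpu load array is only refreshed at scheduler
tick time, roughly like this (a simplified sketch of the
__update_cpu_load() idea; the real code also handles tickless periods):

	/*
	 * Simplified sketch: at each tick, cpu_load[0] snapshots the
	 * instantaneous weighted rq load and the longer-horizon entries
	 * decay toward it.  Anything that runs entirely between two
	 * ticks never shows up, hence the tick-alignment dependence.
	 */
	static void update_cpu_load_sketch(struct rq *rq,
					   unsigned long this_load)
	{
		int i;

		rq->cpu_load[0] = this_load;	/* snapshot at this tick */

		for (i = 1; i < CPU_LOAD_IDX_MAX; i++) {
			unsigned long old = rq->cpu_load[i];

			/* new = (old * (2^i - 1) + this_load) / 2^i */
			rq->cpu_load[i] =
				(old * ((1 << i) - 1) + this_load) >> i;
		}
	}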

We found that not considering CPU load in the menu governor did result
in worse idle-state prediction, leading to performance loss due to the
higher exit latency of the deeper idle states.

> 
> Of course, the fact that the metric used currently is (almost) always 0 is a
> problem, but it doesn't seem like going back to the old one would be a huge
> improvement either.

Yup, I agree. Ideally we would redo the performance_multiplier logic in
the menu governor to use a better metric (rq->cfs.avg.load_avg?), or
maybe address why the predictor fails in the first place some other way.

Do note that this logic was tuned when rq->cpu_load was still computed
from rq->load.weight. That input was switched once the PELT metric was
added, but the fudge factors in performance_multiplier haven't changed,
so that in itself might be a good enough reason to redo it.
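
An untested sketch of what switching get_iowait_load() to the PELT
metric could look like, just to show the shape of the change (the fudge
factors in performance_multiplier would need re-deriving alongside it):

	void get_iowait_load(unsigned long *nr_waiters, unsigned long *load)
	{
		struct rq *rq = this_rq();

		*nr_waiters = atomic_read(&rq->nr_iowait);
		/* PELT load: decays over time and includes blocked-task
		 * contributions, unlike the old rq->cpu_load[0] snapshot. */
		*load = rq->cfs.avg.load_avg;
	}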

Thanks,
-Sai
