linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Vincent Guittot <vincent.guittot@linaro.org>
To: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Mike Galbraith <efault@gmx.de>, Paul Turner <pjt@google.com>,
	Chris Mason <clm@fb.com>,
	kernel-team@fb.com
Subject: Re: [PATCH 2/2] sched/fair: Always propagate runnable_load_avg
Date: Thu, 27 Apr 2017 10:28:01 +0200	[thread overview]
Message-ID: <CAKfTPtAx_aoW+UQN_KP9F2-zYnwjT6KJ6hZJ7uH2nvq1iGJL=Q@mail.gmail.com> (raw)
In-Reply-To: <20170427003020.GD11348@wtj.duckdns.org>

On 27 April 2017 at 02:30, Tejun Heo <tj@kernel.org> wrote:
> Hello, Vincent.
>
> On Wed, Apr 26, 2017 at 12:21:52PM +0200, Vincent Guittot wrote:
>> > This is from the follow-up patch.  I was confused.  Because we don't
>> > propagate decays, we still should decay the runnable_load_avg;
>> > otherwise, we end up accumulating errors in the counter.  I'll drop
>> > the last patch.
>>
>> Ok, the runnable_load_avg goes back to 0 when I drop patch 3. But i
>> see  runnable_load_avg sometimes significantly higher than load_avg
>> which is normally not possible as load_avg = runnable_load_avg +
>> sleeping task's load_avg
>
> So, while load_avg would eventually converge on runnable_load_avg +
> blocked load_avg given stable enough workload for long enough,
> runnable_load_avg jumping above load avg temporarily is expected,

No, it's not. Look at load_avg/runnable_avg at root domain when only
task are involved, runnable_load_avg will never be higher than
load_avg because
    load_avg = /sum load_avg of tasks attached to the cfs_rq
    runnable_load_avg = /sum load_avg of tasks attached and enqueued
to the cfs_rq
load_avg = runnable_load_avg + blocked tasks and as a result
runnable_load_avg is always lower than load_avg.
And with the propagate load/util_avg patchset, we can even reflect
task migration directly at root domain whereas we had to wait for the
util/load_avg and runnable_load_avg to converge to the new value
before

Just to confirm one of my assumption, the latency regression was
already there in previous kernel versions and is not a result of
propagate load/util_avg patchset, isn't it ?

> AFAICS.  That's the whole point of it, a sum closely tracking what's
> currently on the cpu so that we can pick the cpu which has the most on
> it now.  It doesn't make sense to try to pick threads off of a cpu
> which is generally loaded but doesn't have much going on right now,
> after all.

The only interest of runnable_load_avg is to be null when a cfs_rq is
idle whereas load_avg is not  but not to be higher than load_avg. The
root cause is that load_balance only looks at "load" but not number of
task currently running and that's probably the main problem:
runnable_load_avg has been added because load_balance fails to filter
idle group and idle rq. We should better add a new type in
group_classify to tag group that are  idle and the same in
find_busiest_queue more.

>
>> Then, I just have the opposite behavior on my platform. I see a
>> increase of latency at p99 with your patches.
>> My platform is a hikey : 2x4 cores ARM and I have used schbench -m 2
>> -t 4 -s 10000 -c 15000 -r 30 so I have 1 worker thread per CPU which
>> is similar to what you are doing on your platform
>>
>> With v4.11-rc8. I have run 10 times the test and get consistent results
> ...
>> *99.0000th: 539
> ...
>> With your patches i see an increase of the latency for p99. I run 10
>> *99.0000th: 2034
>
> I see.  This is surprising given that at least the purpose of the
> patch is restoring cgroup behavior to match !cgroup one.  I could have
> totally messed it up tho.  Hmm... there are several ways forward I
> guess.
>
> * Can you please double check that the higher latencies w/ the patch
>   is reliably reproducible?  The test machines that I use have
>   variable management load.  They never dominate the machine but are
>   enough to disturb the results so that to drawing out a reliable
>   pattern takes a lot of repeated runs.  I'd really appreciate if you
>   could double check that the pattern is reliable with different run
>   patterns (ie. instead of 10 consecutive runs after another,
>   interleaved).

I always let time between 2 consecutive run and the 10 consecutive
runs take around 2min to execute

Then I have run several time these 10 consecutive test and results stay the same

>
> * Is the board something easily obtainable?  It'd be the eaisest for
>   me to set up the same environment and reproduce the problem.  I
>   looked up hikey boards on amazon but couldn't easily find 2x4 core

It is often named hikey octo cores but I use 2x4 cores just to point
out that there are 2 clusters which is important for scheduler
topology and behavior

>   ones.  If there's something I can easily buy, please point me to it.
>   If there's something I can loan, that'd be great too.

It looks like most of web site are currently out of stock

>
> * If not, I'll try to clean up the debug patches I have and send them
>   your way to get more visiblity but given these things tend to be
>   very iterative, it might take quite a few back and forth.

Yes, that could be usefull. Even a trace of regression could be useful

I can also push on my git tree the debug patch that i use for tracking
load metrics if you want. It's ugly but it does the job

Thanks

>
> Thanks!
>
> --
> tejun

  reply	other threads:[~2017-04-27  8:28 UTC|newest]

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-24 20:13 [RFC PATCHSET] sched/fair: fix load balancer behavior when cgroup is in use Tejun Heo
2017-04-24 20:14 ` [PATCH 1/2] sched/fair: Fix how load gets propagated from cfs_rq to its sched_entity Tejun Heo
2017-04-24 21:33   ` [PATCH v2 " Tejun Heo
2017-05-03 18:00     ` Peter Zijlstra
2017-05-03 21:45       ` Tejun Heo
2017-05-04  5:51         ` Peter Zijlstra
2017-05-04  6:21           ` Peter Zijlstra
2017-05-04  9:49             ` Dietmar Eggemann
2017-05-04 10:57               ` Peter Zijlstra
2017-05-04 17:39               ` Tejun Heo
2017-05-05 10:36                 ` Dietmar Eggemann
2017-05-04 10:26       ` Vincent Guittot
2017-04-25  8:35   ` [PATCH " Vincent Guittot
2017-04-25 18:12     ` Tejun Heo
2017-04-26 16:51       ` Vincent Guittot
2017-04-26 22:40         ` Tejun Heo
2017-04-27  7:00           ` Vincent Guittot
2017-05-01 14:17         ` Peter Zijlstra
2017-05-01 14:52           ` Peter Zijlstra
2017-05-01 21:56           ` Tejun Heo
2017-05-02  8:19             ` Peter Zijlstra
2017-05-02  8:30               ` Peter Zijlstra
2017-05-02 20:00                 ` Tejun Heo
2017-05-03  9:10                   ` Peter Zijlstra
2017-04-26 16:14   ` Vincent Guittot
2017-04-26 22:27     ` Tejun Heo
2017-04-27  8:59       ` Vincent Guittot
2017-04-28 17:46         ` Tejun Heo
2017-05-02  7:20           ` Vincent Guittot
2017-04-24 20:14 ` [PATCH 2/2] sched/fair: Always propagate runnable_load_avg Tejun Heo
2017-04-25  8:46   ` Vincent Guittot
2017-04-25  9:05     ` Vincent Guittot
2017-04-25 12:59       ` Vincent Guittot
2017-04-25 18:49         ` Tejun Heo
2017-04-25 20:49           ` Tejun Heo
2017-04-25 21:15             ` Chris Mason
2017-04-25 21:08           ` Tejun Heo
2017-04-26 10:21             ` Vincent Guittot
2017-04-27  0:30               ` Tejun Heo
2017-04-27  8:28                 ` Vincent Guittot [this message]
2017-04-28 16:14                   ` Tejun Heo
2017-05-02  6:56                     ` Vincent Guittot
2017-05-02 20:56                       ` Tejun Heo
2017-05-03  7:25                         ` Vincent Guittot
2017-05-03  7:54                           ` Vincent Guittot
2017-04-26 18:12   ` Vincent Guittot
2017-04-26 22:52     ` Tejun Heo
2017-04-27  8:29       ` Vincent Guittot
2017-04-28 20:33         ` Tejun Heo
2017-04-28 20:38           ` Tejun Heo
2017-05-01 15:56           ` Peter Zijlstra
2017-05-02 22:01             ` Tejun Heo
2017-05-02  7:18           ` Vincent Guittot
2017-05-02 13:26             ` Vincent Guittot
2017-05-02 22:37               ` Tejun Heo
2017-05-02 21:50             ` Tejun Heo
2017-05-03  7:34               ` Vincent Guittot
2017-05-03  9:37                 ` Peter Zijlstra
2017-05-03 10:37                   ` Vincent Guittot
2017-05-03 13:09                     ` Peter Zijlstra
2017-05-03 21:49                       ` Tejun Heo
2017-05-04  8:19                         ` Vincent Guittot
2017-05-04 17:43                           ` Tejun Heo
2017-05-04 19:02                             ` Vincent Guittot
2017-05-04 19:04                               ` Tejun Heo
2017-04-24 21:35 ` [PATCH 3/2] sched/fair: Skip __update_load_avg() on cfs_rq sched_entities Tejun Heo
2017-04-24 21:48   ` Peter Zijlstra
2017-04-24 22:54     ` Tejun Heo
2017-04-25 21:09   ` Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAKfTPtAx_aoW+UQN_KP9F2-zYnwjT6KJ6hZJ7uH2nvq1iGJL=Q@mail.gmail.com' \
    --to=vincent.guittot@linaro.org \
    --cc=clm@fb.com \
    --cc=efault@gmx.de \
    --cc=kernel-team@fb.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=tj@kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).