Re: [PATCH 2/2] sched/fair: Always propagate runnable_load_avg

From: Tejun Heo <tj@kernel.org>
To: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Mike Galbraith <efault@gmx.de>, Paul Turner <pjt@google.com>,
	Chris Mason <clm@fb.com>,
	kernel-team@fb.com
Subject: Re: [PATCH 2/2] sched/fair: Always propagate runnable_load_avg
Date: Wed, 26 Apr 2017 17:30:20 -0700
Message-ID: <20170427003020.GD11348@wtj.duckdns.org>
In-Reply-To: <CAKfTPtC92nVXCH3QX-Qqf5R5gD58pk2=S_OpwiTao5y16g84Xw@mail.gmail.com>

Hello, Vincent.

On Wed, Apr 26, 2017 at 12:21:52PM +0200, Vincent Guittot wrote:
> > This is from the follow-up patch.  I was confused.  Because we don't
> > propagate decays, we still should decay the runnable_load_avg;
> > otherwise, we end up accumulating errors in the counter.  I'll drop
> > the last patch.
> 
> Ok, the runnable_load_avg goes back to 0 when I drop patch 3. But I
> see runnable_load_avg sometimes significantly higher than load_avg,
> which is normally not possible as load_avg = runnable_load_avg +
> sleeping tasks' load_avg

So, while load_avg would eventually converge on runnable_load_avg +
blocked load_avg given stable enough workload for long enough,
runnable_load_avg jumping above load_avg temporarily is expected,
AFAICS.  That's the whole point of it: a sum closely tracking what's
currently on the cpu so that we can pick the cpu which has the most on
it now.  It doesn't make sense to try to pick threads off of a cpu
which is generally loaded but doesn't have much going on right now,
after all.

> Then, I just have the opposite behavior on my platform. I see an
> increase of latency at p99 with your patches.
> My platform is a hikey: 2x4 cores ARM and I have used schbench -m 2
> -t 4 -s 10000 -c 15000 -r 30 so I have 1 worker thread per CPU, which
> is similar to what you are doing on your platform
>
> With v4.11-rc8, I have run the test 10 times and get consistent results
...
> *99.0000th: 539
...
> With your patches I see an increase of the latency for p99. I run 10
> *99.0000th: 2034

I see.  This is surprising given that the purpose of the patch is, at
least, to restore cgroup behavior to match the !cgroup one.  I could
have totally messed it up, though.  Hmm... there are several ways forward I
guess.

* Can you please double check that the higher latencies w/ the patch
  are reliably reproducible?  The test machines that I use have
  variable management load.  It never dominates the machine but is
  enough to disturb the results, so drawing out a reliable pattern
  takes a lot of repeated runs.  I'd really appreciate it if you
  could double check that the pattern is reliable with different run
  patterns (i.e. interleaved runs instead of 10 consecutive runs of
  one kernel followed by 10 of the other).

* Is the board something easily obtainable?  It'd be easiest for
  me to set up the same environment and reproduce the problem.  I
  looked up hikey boards on amazon but couldn't easily find 2x4 core
  ones.  If there's something I can easily buy, please point me to it.
  If there's something I can borrow, that'd be great too.

* If not, I'll try to clean up the debug patches I have and send them
  your way to get more visibility, but given these things tend to be
  very iterative, it might take quite a few rounds of back and forth.

Thanks!

-- 
tejun