From: Vincent Guittot <vincent.guittot@linaro.org>
To: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
linux-kernel <linux-kernel@vger.kernel.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Mike Galbraith <efault@gmx.de>, Paul Turner <pjt@google.com>,
Chris Mason <clm@fb.com>,
kernel-team@fb.com
Subject: Re: [PATCH 2/2] sched/fair: Always propagate runnable_load_avg
Date: Wed, 3 May 2017 09:34:51 +0200 [thread overview]
Message-ID: <CAKfTPtB9kns1X7zCGYrd98M1ubHq5RfAqiopcCw1-4Q6Y=nYdw@mail.gmail.com> (raw)
In-Reply-To: <20170502215054.GC5335@htj.duckdns.org>
Hi Tejun,
On 2 May 2017 at 23:50, Tejun Heo <tj@kernel.org> wrote:
> Hello,
>
> On Tue, May 02, 2017 at 09:18:53AM +0200, Vincent Guittot wrote:
>> > dbg_odd: odd: dst=28 idle=2 brk=32 lbtgt=0-31 type=2
>> > dbg_odd_dump: A: grp=1,17 w=2 avg=7.247 grp=8.337 sum=8.337 pertask=2.779
>> > dbg_odd_dump: A: gcap=1.150 gutil=1.095 run=3 idle=0 gwt=2 type=2 nocap=1
>> > dbg_odd_dump: A: CPU001: run=1 schb=1
>> > dbg_odd_dump: A: Q001-asdf: w=1.000,l=0.525,u=0.513,r=0.527 run=1 hrun=1 tgs=100.000 tgw=17.266
>> > dbg_odd_dump: A: Q001-asdf: schbench(153757C):w=1.000,l=0.527,u=0.514
>> > dbg_odd_dump: A: Q001-/: w=5.744,l=2.522,u=0.520,r=3.067 run=1 hrun=1 tgs=1.000 tgw=0.000
>> > dbg_odd_dump: A: Q001-/: asdf(C):w=5.744,l=3.017,u=0.521
>> > dbg_odd_dump: A: CPU017: run=2 schb=2
>> > dbg_odd_dump: A: Q017-asdf: w=2.000,l=0.989,u=0.966,r=0.988 run=2 hrun=2 tgs=100.000 tgw=17.266
>> > dbg_odd_dump: A: Q017-asdf: schbench(153737C):w=1.000,l=0.493,u=0.482 schbench(153739):w=1.000,l=0.494,u=0.483
>> > dbg_odd_dump: A: Q017-/: w=10.653,l=7.888,u=0.973,r=5.270 run=1 hrun=2 tgs=1.000 tgw=0.000
>> > dbg_odd_dump: A: Q017-/: asdf(C):w=10.653,l=5.269,u=0.966
>> > dbg_odd_dump: B: grp=14,30 w=2 avg=7.666 grp=8.819 sum=8.819 pertask=4.409
>> > dbg_odd_dump: B: gcap=1.150 gutil=1.116 run=2 idle=0 gwt=2 type=2 nocap=1
>> > dbg_odd_dump: B: CPU014: run=1 schb=1
>> > dbg_odd_dump: B: Q014-asdf: w=1.000,l=1.004,u=0.970,r=0.492 run=1 hrun=1 tgs=100.000 tgw=17.266
>> > dbg_odd_dump: B: Q014-asdf: schbench(153760C):w=1.000,l=0.491,u=0.476
>> > dbg_odd_dump: B: Q014-/: w=5.605,l=11.146,u=0.970,r=5.774 run=1 hrun=1 tgs=1.000 tgw=0.000
>> > dbg_odd_dump: B: Q014-/: asdf(C):w=5.605,l=5.766,u=0.970
>> > dbg_odd_dump: B: CPU030: run=1 schb=1
>> > dbg_odd_dump: B: Q030-asdf: w=1.000,l=0.538,u=0.518,r=0.558 run=1 hrun=1 tgs=100.000 tgw=17.266
>> > dbg_odd_dump: B: Q030-asdf: schbench(153747C):w=1.000,l=0.537,u=0.516
>> > dbg_odd_dump: B: Q030-/: w=5.758,l=3.186,u=0.541,r=3.044 run=1 hrun=1 tgs=1.000 tgw=0.000
>> > dbg_odd_dump: B: Q030-/: asdf(C):w=5.758,l=3.092,u=0.516
>> >
>> > You can notice that B's pertask weight is 4.409 which is way higher
>> > than A's 2.779, and this is from Q014-asdf's contribution to Q014-/ is
>> > twice as high as it should be. The root queue's runnable avg should
>>
>> Are you sure that this is because of blocked load in group A ? it can
>> be that Q014-asdf has already have to wait before running and its load
>> still increase while runnable but not running .
>
> This is with propagation enabled, so the only thing contributing to
> the root queue's runnable_load_avg is the load being propagated from
> Q014-asdf, which has twice high load avg than runnable. The past
> history doesn't matter for load balancing and without cgroup this
> blocked load wouldn't have contributed to root's runnable load avg. I
> don't think it can get much clearer.
>
>> IIUC your trace, group A has 2 running tasks and group B only one but
>> load_balance selects B because of its sgs->avg_load being higher. But
>> this can also happen even if runnable_load_avg of child cfs_rq was
>> propagated correctly in group entity because we can have situation
>> where a group A has only 1 task with higher load than 2 tasks on
>> groupB and even if blocked load is not taken into account, and
>> load_balance will select A.
>
> Yes, it can happen with tasks w/ different weights. That's clearly
> not what's happening here. The load balancer is picking the wrong CPU
> far more frequently because the root queue's runnable load avg
> incorrectly includes blocked load avgs from nested cfs_rqs.
>
>> IMHO, we should better improve load balance selection. I'm going to
>> add smarter group selection in load_balance. that's something we
>> should have already done but it was difficult without load/util_avg
>> propagation. it should be doable now
>
> That's all well and great but let's fix a bug first; otherwise, we'd
> be papering over an existing issue with a new mechanism which is a bad
> idea for any code base which has to last.
runnable_load_avg is already a kind of fix and breaking load_avg seems
worse than fixing load_balance
>
>> > We can argue whether overriding a cfs_rq se's load_avg to the scaled
>> > runnable_load_avg of the cfs_rq is the right way to go or we should
>> > introduce a separate channel to propagate runnable_load_avg; however,
>> > it's clear that we need to fix runnable_load_avg propagation one way
>> > or another.
>>
>> The minimum would be to not break load_avg
>
> Oh yeah, this I can understand. The proposed change is icky in that
> it forces group se->load_avg.avg to be runnable_load_avg of the
> corresponding group cfs_rq. We *can* introduce a separate channel,
> say, se->group_runnable_load_avg which is used to propagate
> runnable_load_avg; however, the thing is that we don't really use
> group se->load_avg.avg anywhere, so we might as well just override it.
We use load_avg for calculating a stable share and we want to use it
more and more.
So breaking it because it's easier doesn't seems to be the right way to do IMHO
>
> I have a preliminary patch to introduce a separate field but it looks
> sad too because we end up calculating the load_avg and
> runnable_load_avg to propagate separately without actually using the
> former value anywhere.
>
>> > The thing with cfs_rq se's load_avg is that, it isn't really used
>> > anywhere else AFAICS, so overriding it to the cfs_rq's
>> > runnable_load_avg isn't prettiest but doesn't really change anything.
>>
>> load_avg is used for defining the share of each cfs_rq.
>
> Each cfs_rq calculates its load_avg independently from the weight sum.
> The queued se's load_avgs don't affect cfs_rq's load_avg in any direct
> way. The only time the value is used is for propagation during
> migration; however, group se themselves never get migrated themselves
> and during propagation only deltas matter so the difference between
> load_avg and runnable_load_avg isn't gonna matter that much. In
> short, we don't really use group se's load_avg in any way significant.
>
> Thanks.
>
> --
> tejun
next prev parent reply other threads:[~2017-05-03 7:35 UTC|newest]
Thread overview: 69+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-04-24 20:13 [RFC PATCHSET] sched/fair: fix load balancer behavior when cgroup is in use Tejun Heo
2017-04-24 20:14 ` [PATCH 1/2] sched/fair: Fix how load gets propagated from cfs_rq to its sched_entity Tejun Heo
2017-04-24 21:33 ` [PATCH v2 " Tejun Heo
2017-05-03 18:00 ` Peter Zijlstra
2017-05-03 21:45 ` Tejun Heo
2017-05-04 5:51 ` Peter Zijlstra
2017-05-04 6:21 ` Peter Zijlstra
2017-05-04 9:49 ` Dietmar Eggemann
2017-05-04 10:57 ` Peter Zijlstra
2017-05-04 17:39 ` Tejun Heo
2017-05-05 10:36 ` Dietmar Eggemann
2017-05-04 10:26 ` Vincent Guittot
2017-04-25 8:35 ` [PATCH " Vincent Guittot
2017-04-25 18:12 ` Tejun Heo
2017-04-26 16:51 ` Vincent Guittot
2017-04-26 22:40 ` Tejun Heo
2017-04-27 7:00 ` Vincent Guittot
2017-05-01 14:17 ` Peter Zijlstra
2017-05-01 14:52 ` Peter Zijlstra
2017-05-01 21:56 ` Tejun Heo
2017-05-02 8:19 ` Peter Zijlstra
2017-05-02 8:30 ` Peter Zijlstra
2017-05-02 20:00 ` Tejun Heo
2017-05-03 9:10 ` Peter Zijlstra
2017-04-26 16:14 ` Vincent Guittot
2017-04-26 22:27 ` Tejun Heo
2017-04-27 8:59 ` Vincent Guittot
2017-04-28 17:46 ` Tejun Heo
2017-05-02 7:20 ` Vincent Guittot
2017-04-24 20:14 ` [PATCH 2/2] sched/fair: Always propagate runnable_load_avg Tejun Heo
2017-04-25 8:46 ` Vincent Guittot
2017-04-25 9:05 ` Vincent Guittot
2017-04-25 12:59 ` Vincent Guittot
2017-04-25 18:49 ` Tejun Heo
2017-04-25 20:49 ` Tejun Heo
2017-04-25 21:15 ` Chris Mason
2017-04-25 21:08 ` Tejun Heo
2017-04-26 10:21 ` Vincent Guittot
2017-04-27 0:30 ` Tejun Heo
2017-04-27 8:28 ` Vincent Guittot
2017-04-28 16:14 ` Tejun Heo
2017-05-02 6:56 ` Vincent Guittot
2017-05-02 20:56 ` Tejun Heo
2017-05-03 7:25 ` Vincent Guittot
2017-05-03 7:54 ` Vincent Guittot
2017-04-26 18:12 ` Vincent Guittot
2017-04-26 22:52 ` Tejun Heo
2017-04-27 8:29 ` Vincent Guittot
2017-04-28 20:33 ` Tejun Heo
2017-04-28 20:38 ` Tejun Heo
2017-05-01 15:56 ` Peter Zijlstra
2017-05-02 22:01 ` Tejun Heo
2017-05-02 7:18 ` Vincent Guittot
2017-05-02 13:26 ` Vincent Guittot
2017-05-02 22:37 ` Tejun Heo
2017-05-02 21:50 ` Tejun Heo
2017-05-03 7:34 ` Vincent Guittot [this message]
2017-05-03 9:37 ` Peter Zijlstra
2017-05-03 10:37 ` Vincent Guittot
2017-05-03 13:09 ` Peter Zijlstra
2017-05-03 21:49 ` Tejun Heo
2017-05-04 8:19 ` Vincent Guittot
2017-05-04 17:43 ` Tejun Heo
2017-05-04 19:02 ` Vincent Guittot
2017-05-04 19:04 ` Tejun Heo
2017-04-24 21:35 ` [PATCH 3/2] sched/fair: Skip __update_load_avg() on cfs_rq sched_entities Tejun Heo
2017-04-24 21:48 ` Peter Zijlstra
2017-04-24 22:54 ` Tejun Heo
2017-04-25 21:09 ` Tejun Heo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAKfTPtB9kns1X7zCGYrd98M1ubHq5RfAqiopcCw1-4Q6Y=nYdw@mail.gmail.com' \
--to=vincent.guittot@linaro.org \
--cc=clm@fb.com \
--cc=efault@gmx.de \
--cc=kernel-team@fb.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=pjt@google.com \
--cc=tj@kernel.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).