From: Vincent Guittot <vincent.guittot@linaro.org>
To: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joseph Salisbury <joseph.salisbury@canonical.com>,
	Ingo Molnar <mingo@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	LKML <linux-kernel@vger.kernel.org>,
	Mike Galbraith <efault@gmx.de>
Subject: Re: [v4.8-rc1 Regression] sched/fair: Apply more PELT fixes
Date: Fri, 14 Oct 2016 10:24:09 +0200
Message-ID: <CAKfTPtADw=JChCccEQ+HohaYwbOCh3QuWOFrGJdGudOHCD8-OA@mail.gmail.com>
In-Reply-To: <CAKfTPtAo=E62XK8kTrrCp1NbxQbhJeYFEoxfTEHvdKTyVZODbQ@mail.gmail.com>

On 13 October 2016 at 23:34, Vincent Guittot <vincent.guittot@linaro.org> wrote:
> On 13 October 2016 at 20:49, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
>> On 13/10/16 17:48, Vincent Guittot wrote:
>>> On 13 October 2016 at 17:52, Joseph Salisbury
>>> <joseph.salisbury@canonical.com> wrote:
>>>> On 10/13/2016 06:58 AM, Vincent Guittot wrote:
>>>>> Hi,
>>>>>
>>>>> On 12 October 2016 at 18:21, Joseph Salisbury
>>>>> <joseph.salisbury@canonical.com> wrote:
>>>>>> On 10/12/2016 08:20 AM, Vincent Guittot wrote:
>>>>>>> On 8 October 2016 at 13:49, Mike Galbraith <efault@gmx.de> wrote:
>>>>>>>> On Sat, 2016-10-08 at 13:37 +0200, Vincent Guittot wrote:
>>>>>>>>> On 8 October 2016 at 10:39, Ingo Molnar <mingo@kernel.org> wrote:
>>>>>>>>>> * Peter Zijlstra <peterz@infradead.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> On Fri, Oct 07, 2016 at 03:38:23PM -0400, Joseph Salisbury wrote:
>>>>>>>>>>>> Hello Peter,
>>>>>>>>>>>>
>>>>>>>>>>>> A kernel bug report was opened against Ubuntu [0].  After a
>>>>>>>>>>>> kernel
>>>>>>>>>>>> bisect, it was found that reverting the following commit
>>>>>>>>>>>> resolved this bug:
>>>>>>>>>>>>
>>>>>>>>>>>> commit 3d30544f02120b884bba2a9466c87dba980e3be5
>>>>>>>>>>>> Author: Peter Zijlstra <peterz@infradead.org>
>>>>>>>>>>>> Date:   Tue Jun 21 14:27:50 2016 +0200
>>>>>>>>>>>>
>>>>>>>>>>>>     sched/fair: Apply more PELT fixes
>>>>>>>>> This patch only speeds up the update of the task group load in order to
>>>>>>>>> reflect the new load balance, but it should not change the final value
>>>>>>>>> and, as a result, the final behavior. I will try to reproduce it on my
>>>>>>>>> target later today.
>>>>>>>> FWIW, I tried and failed w/wo autogroup on 4.8 and master.
>>>>>>> Me too
>>>>>>>
>>>>>>> Is it possible to get a dump of /proc/sched_debug while the problem occurs?
>>>>>>>
>>>>>>> Vincent
>>>>>>>
>>>>>>>>         -Mike
>>>>>> The output from /proc/sched_debug can be seen here:
>>>>>> http://paste.ubuntu.com/23312351/
>>>>> I have looked at the dump and there is something very odd for the
>>>>> system.slice task group, where the display manager is running.
>>>>> system.slice->tg_load_avg is around 381697, but tg_load_avg is
>>>>> normally equal to the sum of system.slice[cpu]->tg_load_avg_contrib,
>>>>> whereas the sum of system.slice[cpu]->tg_load_avg_contrib = 1013 in
>>>>> our case. Some difference is expected because the dump of
>>>>> /proc/sched_debug is not atomic and values can change while it is
>>>>> taken, but nothing like this difference.
>>>>>
>>>>> The main effect of this very high value is that the weight/prio of
>>>>> the sched_entity that represents system.slice in the root cfs_rq is very
>>>>> low (lower than a task with the smallest nice prio), so the system.slice
>>>>> task group rarely gets the CPU compared to the user.slice
>>>>> task group: less than 1% for system.slice, where lightDM and Xorg
>>>>> are running, compared to 99% for user.slice, where the stress tasks are
>>>>> running. This is confirmed by the se->avg.util_avg value of the task
>>>>> groups, which reflects how much time each task group is effectively
>>>>> running on a CPU:
>>>>> system.slice[CPU3].se->avg.util_avg = 8 whereas
>>>>> user.slice[CPU3].se->avg.util_avg = 991
>>>>>
>>>>> This difference in weight/priority explains why the system becomes
>>>>> unresponsive. What I can't explain for now is why
>>>>> system.slice->tg_load_avg = 381697 whereas it should be around 1013,
>>>>> and how the patch can generate this situation.
>>>>>
>>>>> Is it possible to have a dump of /proc/sched_debug before starting the
>>>>> stress command? That would show whether the problem is there from the
>>>>> beginning but simply not visible because the system is not overloaded,
>>>>> or whether it only appears once the user starts to load the system.
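
To make the expected relation concrete, here is a minimal user-space
sketch of the accounting (the names are simplified stand-ins for
cfs_rq->avg.load_avg, cfs_rq->tg_load_avg_contrib and tg->load_avg; this
is not the kernel code, just a model of the idea):

#include <stdio.h>

#define NR_CPUS 4

struct cfs_rq {
	long avg_load_avg;        /* stands in for cfs_rq->avg.load_avg */
	long tg_load_avg_contrib; /* last value published to tg->load_avg */
};

struct task_group {
	long load_avg;            /* should track the sum of the contribs */
	struct cfs_rq cfs_rq[NR_CPUS];
};

/* Publish the per-cpu change as a delta against the cached contrib. */
static void publish_load(struct task_group *tg, int cpu)
{
	struct cfs_rq *cfs_rq = &tg->cfs_rq[cpu];
	long delta = cfs_rq->avg_load_avg - cfs_rq->tg_load_avg_contrib;

	tg->load_avg += delta;
	cfs_rq->tg_load_avg_contrib = cfs_rq->avg_load_avg;
}

int main(void)
{
	struct task_group tg = { 0 };
	long sum = 0;
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		tg.cfs_rq[cpu].avg_load_avg = 250 + cpu;  /* arbitrary loads */
		publish_load(&tg, cpu);
	}

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		sum += tg.cfs_rq[cpu].tg_load_avg_contrib;

	/* Invariant: tg->load_avg should match the sum of the contribs. */
	printf("tg->load_avg=%ld sum(contrib)=%ld\n", tg.load_avg, sum);
	return 0;
}
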
>>>> Here is the dump before stress is started:
>>>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1627108/+attachment/4760437/+files/dump_nonbuggy
>>>
>>> This one is OK.
>>> The dump indicates Sched Debug Version: v0.11, 4.8.0-11-generic
>>> #12~lp1627108Commit3d30544Reverted,
>>> so this is without the culprit commit.
>>>
>>>>
>>>> Here it is after:
>>>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1627108/+attachment/4760436/+files/dump_buggy
>>>>
>>>
>>> This one has exactly the same odd values for system.slice->tg_load_avg
>>> as the 1st dump that you sent yesterday.
>>> The dump indicates Sched Debug Version: v0.11, 4.8.0-22-generic #24-Ubuntu,
>>> so this dump was taken with a different kernel than the dump above.
>>> As I can't find any stress task in the dump, I tend to believe that
>>> the dump was taken before starting the stress tasks and not after
>>> starting them. Can you confirm?
>>>
>>> If I'm right, it means that the problem was already there before
>>> starting the stress tasks.
>>
>> Could it be a problem I'm also seeing on my ARM64 Juno (6 logical cpus) w/o systemd
>> and w/o autogroup (tip/sched/core 447976ef4fd0)?
>>
>> When I create a tg_root/tg_x/tg_y_1 and a tg_root/tg_x/tg_y_2 group, tg_x->load_avg
>> becomes > 6*1024 before any task has run in it.
>
> This is normal, as se->avg.load_avg is initialized to
> scale_load_down(se->load.weight), and this se->avg.load_avg is then
> added to tg_x[cpu]->cfs_rq->avg.load_avg when the se is attached to the
> cfs_rq.
>
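
As a back-of-the-envelope check (assuming scale_load_down(NICE_0_LOAD)
resolves to 1024 on this config), each new child group attaches one such
freshly initialized se per cpu, so the parent's group-wide sum can
legitimately reach 6*1024 on a 6-cpu system before any task has run:

/* Rough arithmetic only, not kernel code. */
#include <stdio.h>

int main(void)
{
	const long init_load = 1024;	/* assumed scale_load_down(NICE_0_LOAD) */
	const int nr_cpus = 6;		/* Juno: 6 logical cpus */
	long tg_x_load_avg = 0;
	int cpu;

	/* One freshly initialized group se attached per cpu. */
	for (cpu = 0; cpu < nr_cpus; cpu++)
		tg_x_load_avg += init_load;

	printf("tg_x->load_avg can reach %ld\n", tg_x_load_avg); /* 6144 */
	return 0;
}
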
>>
>> tg_x   : 0xffff800975800d80
>> tg_y_1 : 0xffff800975800c00
>> tg_y_2 : 0xffff80097543d200
>>
>>  mkdir-2177 [002] 117.235241: bprint: sched_online_group: tg=0xffff800975800d80 tg->parent=0xffff000008fd0300
>>  mkdir-2177 [002] 117.235244: bprint: online_fair_sched_group: tg=0xffff800975800d80 cpu=0
>>  mkdir-2177 [002] 117.235247: bprint: online_fair_sched_group: tg=0xffff800975800d80 cpu=1
>>  mkdir-2177 [002] 117.235249: bprint: online_fair_sched_group: tg=0xffff800975800d80 cpu=2
>>  mkdir-2177 [002] 117.235251: bprint: online_fair_sched_group: tg=0xffff800975800d80 cpu=3
>>  mkdir-2177 [002] 117.235253: bprint: online_fair_sched_group: tg=0xffff800975800d80 cpu=4
>>  mkdir-2177 [002] 117.235255: bprint: online_fair_sched_group: tg=0xffff800975800d80 cpu=5
>>
>>  mkdir-2181 [002] 117.353155: bprint: sched_online_group: tg=0xffff800975800c00 tg->parent=0xffff800975800d80
>>  mkdir-2181 [002] 117.353158: bprint: online_fair_sched_group: tg=0xffff800975800c00 cpu=0
>>  mkdir-2181 [002] 117.353162: bprint: post_init_entity_util_avg: cpu=0 tg=0xffff800975800d80 tg_css_id=2 cfs_rq->tg->load_avg=0 cfs_rq->avg.load_avg=1024 cfs_rq->tg_load_avg_contrib=0 delta=1024
>>  mkdir-2181 [002] 117.353164: bprint: online_fair_sched_group: tg=0xffff800975800c00 cpu=1
>>  mkdir-2181 [002] 117.353167: bprint: post_init_entity_util_avg: cpu=1 tg=0xffff800975800d80 tg_css_id=2 cfs_rq->tg->load_avg=1024 cfs_rq->avg.load_avg=1024 cfs_rq->tg_load_avg_contrib=0 delta=1024
>>  mkdir-2181 [002] 117.353168: bprint: online_fair_sched_group: tg=0xffff800975800c00 cpu=2
>>  mkdir-2181 [002] 117.353171: bprint: post_init_entity_util_avg: cpu=2 tg=0xffff800975800d80 tg_css_id=2 cfs_rq->tg->load_avg=2048 cfs_rq->avg.load_avg=1024 cfs_rq->tg_load_avg_contrib=0 delta=1024
>>  mkdir-2181 [002] 117.353173: bprint: online_fair_sched_group: tg=0xffff800975800c00 cpu=3
>>  mkdir-2181 [002] 117.353175: bprint: post_init_entity_util_avg: cpu=3 tg=0xffff800975800d80 tg_css_id=2 cfs_rq->tg->load_avg=3072 cfs_rq->avg.load_avg=1024 cfs_rq->tg_load_avg_contrib=0 delta=1024
>>  mkdir-2181 [002] 117.353177: bprint: online_fair_sched_group: tg=0xffff800975800c00 cpu=4
>>  mkdir-2181 [002] 117.353179: bprint: post_init_entity_util_avg: cpu=4 tg=0xffff800975800d80 tg_css_id=2 cfs_rq->tg->load_avg=4096 cfs_rq->avg.load_avg=1024 cfs_rq->tg_load_avg_contrib=0 delta=1024
>>  mkdir-2181 [002] 117.353180: bprint: online_fair_sched_group: tg=0xffff800975800c00 cpu=5
>>  mkdir-2181 [002] 117.353183: bprint: post_init_entity_util_avg: cpu=5 tg=0xffff800975800d80 tg_css_id=2 cfs_rq->tg->load_avg=5120 cfs_rq->avg.load_avg=1024 cfs_rq->tg_load_avg_contrib=0 delta=1024
>>
>
> This looks coherent, as tg_x->load_avg = 5120 == the sum of
> cfs_rq[cpu]->load_avg, and cfs_rq->tg_load_avg_contrib will be set to
> cfs_rq[cpu]->load_avg at the end of the function call.
> In the buggy case, tg->load_avg = 381697 but the sum of
> cfs_rq[cpu]->tg_load_avg_contrib = 1013 = the sum of
> cfs_rq[cpu]->avg.load_avg, so there is no way to remove the 381697.
>
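
To spell out why the stale value cannot disappear: the group-wide sum is
only ever adjusted by per-cpu deltas, so once every cfs_rq's
tg_load_avg_contrib already matches its avg.load_avg, every future delta
is 0 and the bogus excess stays. A toy illustration with the numbers from
the dump (user space, not kernel code):

#include <stdio.h>

int main(void)
{
	long tg_load_avg = 381697;  /* corrupted group-wide sum from the dump */
	long sum_load_avg = 1013;   /* sum of cfs_rq[cpu]->avg.load_avg */
	long sum_contrib = 1013;    /* sum of cfs_rq[cpu]->tg_load_avg_contrib */

	/* The only kind of write tg->load_avg ever sees is a delta: */
	long delta = sum_load_avg - sum_contrib;  /* 0 */

	tg_load_avg += delta;
	printf("delta=%ld tg_load_avg=%ld\n", delta, tg_load_avg);  /* unchanged */
	return 0;
}
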
>>  mkdir-2185 [001] 117.502980: bprint: sched_online_group: tg=0xffff80097543d200 tg->parent=0xffff800975800d80
>>  mkdir-2185 [001] 117.502982: bprint: online_fair_sched_group: tg=0xffff80097543d200 cpu=0
>>  mkdir-2185 [001] 117.502987: bprint: post_init_entity_util_avg: cpu=0 tg=0xffff800975800d80 tg_css_id=2 cfs_rq->tg->load_avg=6144 cfs_rq->avg.load_avg=1068 cfs_rq->tg_load_avg_contrib=1024 delta=44
>>  mkdir-2185 [001] 117.502988: bprint: online_fair_sched_group: tg=0xffff80097543d200 cpu=1
>>  mkdir-2185 [001] 117.502992: bprint: post_init_entity_util_avg: cpu=1 tg=0xffff800975800d80 tg_css_id=2 cfs_rq->tg->load_avg=6188 cfs_rq->avg.load_avg=1058 cfs_rq->tg_load_avg_contrib=1024 delta=34
>>  mkdir-2185 [001] 117.502993: bprint: online_fair_sched_group: tg=0xffff80097543d200 cpu=2
>>  mkdir-2185 [001] 117.502996: bprint: post_init_entity_util_avg: cpu=2 tg=0xffff800975800d80 tg_css_id=2 cfs_rq->tg->load_avg=6222 cfs_rq->avg.load_avg=1092 cfs_rq->tg_load_avg_contrib=1024 delta=68
>>  mkdir-2185 [001] 117.502998: bprint: online_fair_sched_group: tg=0xffff80097543d200 cpu=3
>>  mkdir-2185 [001] 117.503001: bprint: post_init_entity_util_avg: cpu=3 tg=0xffff800975800d80 tg_css_id=2 cfs_rq->tg->load_avg=6290 cfs_rq->avg.load_avg=1069 cfs_rq->tg_load_avg_contrib=1024 delta=45
>>  mkdir-2185 [001] 117.503002: bprint: online_fair_sched_group: tg=0xffff80097543d200 cpu=4
>>  mkdir-2185 [001] 117.503005: bprint: post_init_entity_util_avg: cpu=4 tg=0xffff800975800d80 tg_css_id=2 cfs_rq->tg->load_avg=6335 cfs_rq->avg.load_avg=1064 cfs_rq->tg_load_avg_contrib=1024 delta=40
>>  mkdir-2185 [001] 117.503006: bprint: online_fair_sched_group: tg=0xffff80097543d200 cpu=5
>>  mkdir-2185 [001] 117.503009: bprint: post_init_entity_util_avg: cpu=5 tg=0xffff800975800d80 tg_css_id=2 cfs_rq->tg->load_avg=*6375* cfs_rq->avg.load_avg=1061 cfs_rq->tg_load_avg_contrib=1024 delta=37
>>
>> We add delta=1024 to tg_x->load_avg during tg_root/tg_x/tg_y_1 initialization but only
>> delta=~40 for tg_root/tg_x/tg_y_2.
>
> IMO, this comes from the decay of tg_y_2->se[cpu]->avg.load_avg before
> it is attached to tg_x.
>
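
For scale, assuming the usual PELT geometric decay with a 32 ms half-life
(y^32 = 1/2), a freshly initialized load of 1024 drops into the tens after
roughly 150 ms of pure decay, which is the order of magnitude of the ~40
deltas above; a quick check (needs -lm when building):

#include <math.h>
#include <stdio.h>

int main(void)
{
	double y = pow(0.5, 1.0 / 32.0);  /* per-millisecond decay factor */
	double load = 1024.0;             /* freshly initialized se->avg.load_avg */
	int ms;

	for (ms = 0; ms <= 160; ms += 32)
		printf("after %3d ms: %.0f\n", ms, load * pow(y, ms));
	/* prints ~1024, 512, 256, 128, 64, 32 */
	return 0;
}
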
> But I don't think that it's the same issue, as tg_x->load_avg is
> still coherent with the sum of cfs_rq->avg.load_avg and the sum of
> cfs_rq->tg_load_avg_contrib.
>
>>
>> system.slice->tg_load_avg = 381697 is still much higher, and AFAICS it's an
>> i5-5300U CPU, so only 4 logical cpus.
>>
>> How many system.slice/foo's actually exist in this system? I only see user.slice
>
> There are several dozen children in system.slice but all

but all of the children have a null load_avg and load_avg_contrib.

>
>> related cfs_rq[x]:/autogroups-xxx on my Ubuntu-16.04 Desktop system.
>>
>> [...]

Thread overview: 46+ messages
2016-10-07 19:38 [v4.8-rc1 Regression] sched/fair: Apply more PELT fixes Joseph Salisbury
2016-10-07 19:57 ` Linus Torvalds
2016-10-07 20:22   ` Joseph Salisbury
2016-10-07 20:37     ` Linus Torvalds
2016-10-08  8:00 ` Peter Zijlstra
2016-10-08  8:39   ` Ingo Molnar
2016-10-08 11:37     ` Vincent Guittot
2016-10-08 11:49       ` Mike Galbraith
2016-10-12 12:20         ` Vincent Guittot
2016-10-12 15:35           ` Joseph Salisbury
2016-10-12 16:21           ` Joseph Salisbury
2016-10-13 10:58             ` Vincent Guittot
2016-10-13 15:52               ` Joseph Salisbury
2016-10-13 16:48                 ` Vincent Guittot
2016-10-13 18:49                   ` Dietmar Eggemann
2016-10-13 21:34                     ` Vincent Guittot
2016-10-14  8:24                       ` Vincent Guittot [this message]
2016-10-14 13:10                         ` Dietmar Eggemann
2016-10-14 15:18                           ` Vincent Guittot
2016-10-14 16:04                             ` Joseph Salisbury
2016-10-17  9:09                               ` Vincent Guittot
2016-10-17 11:49                                 ` Dietmar Eggemann
2016-10-17 13:19                                   ` Peter Zijlstra
2016-10-17 13:54                                     ` Vincent Guittot
2016-10-17 22:52                                       ` Dietmar Eggemann
2016-10-18  8:43                                         ` Vincent Guittot
2016-10-18  9:07                                         ` Peter Zijlstra
2016-10-18  9:45                                           ` Vincent Guittot
2016-10-18 10:34                                             ` Peter Zijlstra
2016-10-18 11:56                                               ` Vincent Guittot
2016-10-18 21:58                                                 ` Joonwoo Park
2016-10-19  6:42                                                   ` Vincent Guittot
2016-10-19  9:46                                                 ` Dietmar Eggemann
2016-10-19 11:25                                                   ` Vincent Guittot
2016-10-19 15:33                                                     ` Dietmar Eggemann
2016-10-19 17:33                                                       ` Joonwoo Park
2016-10-19 17:50                                                       ` Vincent Guittot
2016-10-19 11:33                                                 ` Peter Zijlstra
2016-10-19 11:50                                                   ` Vincent Guittot
2016-10-19 13:30                                                 ` Morten Rasmussen
2016-10-19 17:41                                                   ` Vincent Guittot
2016-10-20  7:56                                                     ` Morten Rasmussen
2016-10-19 14:49                                                 ` Joseph Salisbury
2016-10-19 14:53                                                   ` Vincent Guittot
2016-10-18 11:15                                           ` Dietmar Eggemann
2016-10-18 12:07                                             ` Peter Zijlstra
