From: Vincent Guittot <vincent.guittot@linaro.org>
To: Joseph Salisbury <joseph.salisbury@canonical.com>
Cc: Ingo Molnar <mingo@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	LKML <linux-kernel@vger.kernel.org>,
	Mike Galbraith <efault@gmx.de>
Subject: Re: [v4.8-rc1 Regression] sched/fair: Apply more PELT fixes
Date: Thu, 13 Oct 2016 18:48:12 +0200
Message-ID: <CAKfTPtDU+DmF4iLwHWF2jEnZCjPUXBOu9c6Zqh4+kQPMio39nQ@mail.gmail.com>
In-Reply-To: <57FFADC8.2020602@canonical.com>

On 13 October 2016 at 17:52, Joseph Salisbury
<joseph.salisbury@canonical.com> wrote:
> On 10/13/2016 06:58 AM, Vincent Guittot wrote:
>> Hi,
>>
>> On 12 October 2016 at 18:21, Joseph Salisbury
>> <joseph.salisbury@canonical.com> wrote:
>>> On 10/12/2016 08:20 AM, Vincent Guittot wrote:
>>>> On 8 October 2016 at 13:49, Mike Galbraith <efault@gmx.de> wrote:
>>>>> On Sat, 2016-10-08 at 13:37 +0200, Vincent Guittot wrote:
>>>>>> On 8 October 2016 at 10:39, Ingo Molnar <mingo@kernel.org> wrote:
>>>>>>> * Peter Zijlstra <peterz@infradead.org> wrote:
>>>>>>>
>>>>>>>> On Fri, Oct 07, 2016 at 03:38:23PM -0400, Joseph Salisbury wrote:
>>>>>>>>> Hello Peter,
>>>>>>>>>
>>>>>>>>> A kernel bug report was opened against Ubuntu [0].  After a kernel
>>>>>>>>> bisect, it was found that reverting the following commit
>>>>>>>>> resolved this bug:
>>>>>>>>>
>>>>>>>>> commit 3d30544f02120b884bba2a9466c87dba980e3be5
>>>>>>>>> Author: Peter Zijlstra <peterz@infradead.org>
>>>>>>>>> Date:   Tue Jun 21 14:27:50 2016 +0200
>>>>>>>>>
>>>>>>>>>     sched/fair: Apply more PELT fixes
>>>>>> This patch only speeds up the update of task group load in order to
>>>>>> reflect the new load balance, but it should not change the final value
>>>>>> and, as a result, the final behavior. I will try to reproduce it on my
>>>>>> target later today.
>>>>> FWIW, I tried and failed w/wo autogroup on 4.8 and master.
>>>> Me too
>>>>
>>>> Is it possible to get a dump of /proc/sched_debug while the problem occurs?
>>>>
>>>> Vincent
>>>>
>>>>>         -Mike
>>> The output from /proc/sched_debug can be seen here:
>>> http://paste.ubuntu.com/23312351/
>> I have looked at the dump and there is something very odd for the
>> system.slice task group, where the display manager is running.
>> system.slice->tg_load_avg is around 381697, but tg_load_avg is
>> normally equal to the sum of system.slice[cpu]->tg_load_avg_contrib,
>> and that sum is only 1013 in our case. We can have some differences
>> because the dump of /proc/sched_debug is not atomic and some changes
>> can happen while it is taken, but nothing like this difference.
>>
>> The main effect of this very high value is that the weight/prio of
>> the sched_entity that represents system.slice in the root cfs_rq is
>> very low (lower than a task with the lowest nice priority), so the
>> system.slice task group will rarely get the CPU compared to the
>> user.slice task group: less than 1% for system.slice, where lightDM
>> and xorg are running, compared to 99% for user.slice, where the stress
>> tasks are running. This is confirmed by the se->avg.util_avg value of
>> the task groups, which reflects how much time each task group is
>> effectively running on a CPU:
>> system.slice[CPU3].se->avg.util_avg = 8 whereas
>> user.slice[CPU3].se->avg.util_avg = 991
>>
>> This difference of weight/priority explains why the system becomes
>> unresponsive. What I can't explain for now is why
>> system.slice->tg_load_avg = 381697 whereas it should be around 1013,
>> and how the patch can generate this situation.
>>
>> Is it possible to have a dump of /proc/sched_debug before starting
>> the stress command, to check whether the problem is there from the
>> beginning but not seen because the system is not overloaded, or
>> whether the problem appears when the user starts to load the system?
> Here is the dump before stress is started:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1627108/+attachment/4760437/+files/dump_nonbuggy

This one is OK.
The dump indicates Sched Debug Version: v0.11, 4.8.0-11-generic
#12~lp1627108Commit3d30544Reverted,
so it was taken without the culprit commit.

>
> Here it is after:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1627108/+attachment/4760436/+files/dump_buggy
>

This one has exactly the same odd values for system.slice->tg_load_avg
as the first dump that you sent yesterday.
The dump indicates Sched Debug Version: v0.11, 4.8.0-22-generic #24-Ubuntu,
so this dump was taken with a different kernel than the dump above.
As I can't find any stress task in the dump, I tend to believe that
the dump was taken before starting the stress tasks and not after
starting them. Can you confirm?

If I'm right, it means that the problem was already there before
starting the stress tasks.
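
To make the numbers concrete, here is a rough Python sketch of the
arithmetic involved. It is only a simplified model loosely following
calc_cfs_shares(), not kernel code: the function names, the default
group shares of 1024, the lower clamp of 2, the per-CPU split of the
1013 total and the 25% tolerance are all assumptions made for
illustration, and the load-resolution scaling of 64-bit kernels is
ignored. Only 381697 and 1013 come from the dump.

```python
MIN_SHARES = 2      # assumed lower clamp for a group entity weight
TG_SHARES = 1024    # assumed default shares of the task group


def tg_load_is_consistent(tg_load_avg, contribs, tolerance=0.25):
    """Check that tg_load_avg is close to the sum of per-CPU contributions.

    The tolerance is arbitrary; it only allows for a non-atomic dump.
    """
    total = sum(contribs)
    return abs(tg_load_avg - total) <= tolerance * max(total, 1)


def group_entity_weight(tg_shares, cfs_rq_load, tg_load_avg, contrib):
    """Approximate the weight of the group's sched_entity on one CPU."""
    # Replace this CPU's stale contribution with its current load.
    tg_weight = tg_load_avg - contrib + cfs_rq_load
    shares = tg_shares * cfs_rq_load
    if tg_weight > 0:
        shares //= tg_weight
    # Clamp between the minimum and the group's full shares.
    return max(MIN_SHARES, min(shares, tg_shares))


# Hypothetical per-CPU contributions that add up to the 1013 seen in the dump.
contribs = [250, 253, 255, 255]

healthy = group_entity_weight(TG_SHARES, 255, sum(contribs), 255)
broken = group_entity_weight(TG_SHARES, 255, 381697, 255)

print("consistent with 1013:  ", tg_load_is_consistent(1013, contribs))
print("consistent with 381697:", tg_load_is_consistent(381697, contribs))
print("weight with sane tg_load_avg: ", healthy)
print("weight with stale tg_load_avg:", broken)
```

Under these assumptions the group entity gets a weight of 257 with the
expected tg_load_avg but is clamped to 2 with the value from the dump,
which is below the weight of a nice 19 task (15).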


>
>>
>> Thanks,
>>
>>> Ingo, the latest scheduler bits also still exhibit the bug:
>>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
>>>
>>>
>

Thread overview: 46+ messages
2016-10-07 19:38 [v4.8-rc1 Regression] sched/fair: Apply more PELT fixes Joseph Salisbury
2016-10-07 19:57 ` Linus Torvalds
2016-10-07 20:22   ` Joseph Salisbury
2016-10-07 20:37     ` Linus Torvalds
2016-10-08  8:00 ` Peter Zijlstra
2016-10-08  8:39   ` Ingo Molnar
2016-10-08 11:37     ` Vincent Guittot
2016-10-08 11:49       ` Mike Galbraith
2016-10-12 12:20         ` Vincent Guittot
2016-10-12 15:35           ` Joseph Salisbury
2016-10-12 16:21           ` Joseph Salisbury
2016-10-13 10:58             ` Vincent Guittot
2016-10-13 15:52               ` Joseph Salisbury
2016-10-13 16:48                 ` Vincent Guittot [this message]
2016-10-13 18:49                   ` Dietmar Eggemann
2016-10-13 21:34                     ` Vincent Guittot
2016-10-14  8:24                       ` Vincent Guittot
2016-10-14 13:10                         ` Dietmar Eggemann
2016-10-14 15:18                           ` Vincent Guittot
2016-10-14 16:04                             ` Joseph Salisbury
2016-10-17  9:09                               ` Vincent Guittot
2016-10-17 11:49                                 ` Dietmar Eggemann
2016-10-17 13:19                                   ` Peter Zijlstra
2016-10-17 13:54                                     ` Vincent Guittot
2016-10-17 22:52                                       ` Dietmar Eggemann
2016-10-18  8:43                                         ` Vincent Guittot
2016-10-18  9:07                                         ` Peter Zijlstra
2016-10-18  9:45                                           ` Vincent Guittot
2016-10-18 10:34                                             ` Peter Zijlstra
2016-10-18 11:56                                               ` Vincent Guittot
2016-10-18 21:58                                                 ` Joonwoo Park
2016-10-19  6:42                                                   ` Vincent Guittot
2016-10-19  9:46                                                 ` Dietmar Eggemann
2016-10-19 11:25                                                   ` Vincent Guittot
2016-10-19 15:33                                                     ` Dietmar Eggemann
2016-10-19 17:33                                                       ` Joonwoo Park
2016-10-19 17:50                                                       ` Vincent Guittot
2016-10-19 11:33                                                 ` Peter Zijlstra
2016-10-19 11:50                                                   ` Vincent Guittot
2016-10-19 13:30                                                 ` Morten Rasmussen
2016-10-19 17:41                                                   ` Vincent Guittot
2016-10-20  7:56                                                     ` Morten Rasmussen
2016-10-19 14:49                                                 ` Joseph Salisbury
2016-10-19 14:53                                                   ` Vincent Guittot
2016-10-18 11:15                                           ` Dietmar Eggemann
2016-10-18 12:07                                             ` Peter Zijlstra
