From: Vincent Guittot
Date: Fri, 14 Oct 2016 10:24:09 +0200
Subject: Re: [v4.8-rc1 Regression] sched/fair: Apply more PELT fixes
To: Dietmar Eggemann
Cc: Joseph Salisbury, Ingo Molnar, Peter Zijlstra, Linus Torvalds, Thomas Gleixner, LKML, Mike Galbraith
References: <57F7F9AF.2010609@canonical.com> <20161008080019.GE3142@twins.programming.kicks-ass.net> <20161008083936.GA13658@gmail.com> <1475927349.5573.1.camel@gmx.de> <57FE6302.6060103@canonical.com> <57FFADC8.2020602@canonical.com> <43c59cba-2044-1de2-0f78-8f346bd1e3cb@arm.com>

On 13 October 2016 at 23:34, Vincent Guittot wrote:
> On 13 October 2016 at 20:49, Dietmar Eggemann wrote:
>> On 13/10/16 17:48, Vincent Guittot wrote:
>>> On 13 October 2016 at 17:52, Joseph Salisbury wrote:
>>>> On 10/13/2016 06:58 AM, Vincent Guittot wrote:
>>>>> Hi,
>>>>>
>>>>> On 12 October 2016 at 18:21, Joseph Salisbury wrote:
>>>>>> On 10/12/2016 08:20 AM, Vincent Guittot wrote:
>>>>>>> On 8 October 2016 at 13:49, Mike Galbraith wrote:
>>>>>>>> On Sat, 2016-10-08 at 13:37 +0200, Vincent Guittot wrote:
>>>>>>>>> On 8 October 2016 at 10:39, Ingo Molnar wrote:
>>>>>>>>>> * Peter Zijlstra wrote:
>>>>>>>>>>
>>>>>>>>>>> On Fri, Oct 07, 2016 at 03:38:23PM -0400, Joseph Salisbury wrote:
>>>>>>>>>>>> Hello Peter,
>>>>>>>>>>>>
>>>>>>>>>>>> A kernel bug report was opened against Ubuntu [0]. After a kernel
>>>>>>>>>>>> bisect, it was found that reverting the following commit
>>>>>>>>>>>> resolved this bug:
>>>>>>>>>>>>
>>>>>>>>>>>> commit 3d30544f02120b884bba2a9466c87dba980e3be5
>>>>>>>>>>>> Author: Peter Zijlstra
>>>>>>>>>>>> Date:   Tue Jun 21 14:27:50 2016 +0200
>>>>>>>>>>>>
>>>>>>>>>>>>     sched/fair: Apply more PELT fixes
>>>>>>>>>
>>>>>>>>> This patch only speeds up the update of the task group load in order
>>>>>>>>> to reflect the new load balance, but it should not change the final
>>>>>>>>> value and, as a result, the final behavior. I will try to reproduce
>>>>>>>>> it on my target later today.
>>>>>>>>
>>>>>>>> FWIW, I tried and failed w/wo autogroup on 4.8 and master.
>>>>>>>
>>>>>>> Me too
>>>>>>>
>>>>>>> Is it possible to get some dump of /proc/sched_debug while the problem occurs?
>>>>>>>
>>>>>>> Vincent
>>>>>>>
>>>>>>>> -Mike
>>>>>>
>>>>>> The output from /proc/sched_debug can be seen here:
>>>>>> http://paste.ubuntu.com/23312351/
>>>>>
>>>>> I have looked at the dump and there is something very odd for the
>>>>> system.slice task group where the display manager is running.
>>>>> system.slice->tg_load_avg is around 381697, but tg_load_avg is
>>>>> normally equal to the sum of system.slice[cpu]->tg_load_avg_contrib,
>>>>> whereas the sum of system.slice[cpu]->tg_load_avg_contrib = 1013 in
>>>>> our case. We can have some differences because the dump of
>>>>> /proc/sched_debug is not atomic and some changes can happen in the
>>>>> meantime, but nothing like this difference.
>>>>>
>>>>> The main effect of this quite high value is that the weight/prio of
>>>>> the sched_entity that represents system.slice in the root cfs_rq is
>>>>> very low (lower than a task with the smallest nice prio), so the
>>>>> system.slice task group will not get the CPU very often compared to
>>>>> the user.slice task group: less than 1% for the system.slice where
>>>>> lightDM and Xorg are running, compared to 99% for the user.slice where
>>>>> the stress tasks are running. This is confirmed by the
>>>>> se->avg.util_avg value of the task groups, which reflects how much
>>>>> time each task group is effectively running on a CPU:
>>>>> system.slice[CPU3].se->avg.util_avg = 8 whereas
>>>>> user.slice[CPU3].se->avg.util_avg = 991
>>>>>
>>>>> This difference of weight/priority explains why the system becomes
>>>>> unresponsive. What I can't explain for now is why
>>>>> system.slice->tg_load_avg = 381697 whereas it should be around 1013,
>>>>> and how the patch can generate this situation.
>>>>>
>>>>> Is it possible to have a dump of /proc/sched_debug before starting the
>>>>> stress command? That would tell us whether the problem is there from
>>>>> the beginning but not seen because the system is not overloaded, or
>>>>> whether it only appears when the user starts to load the system.
>>>> Here is the dump before stress is started:
>>>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1627108/+attachment/4760437/+files/dump_nonbuggy
>>>
>>> This one is ok.
>>> The dump indicates Sched Debug Version: v0.11, 4.8.0-11-generic
>>> #12~lp1627108Commit3d30544Reverted
>>> so this is without the culprit commit.
>>>
>>>>
>>>> Here it is after:
>>>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1627108/+attachment/4760436/+files/dump_buggy
>>>>
>>>
>>> This one has the exact same odd values for system.slice->tg_load_avg
>>> as the 1st dump that you sent yesterday.
>>> The dump indicates Sched Debug Version: v0.11, 4.8.0-22-generic #24-Ubuntu,
>>> so this dump has been done with a different kernel than the dump above.
>>> As I can't find any stress task in the dump, I tend to believe that
>>> the dump has been done before starting the stress tasks and not after
>>> starting them. Can you confirm?
>>>
>>> If I'm right, it means that the problem was already there before
>>> starting the stress tasks.
>>
>> Could it be a problem I'm also seeing on my ARM64 Juno (6 logical cpus)
>> w/o systemd and w/o autogroup (tip/sched/core 447976ef4fd0):
>>
>> When I create a tg_root/tg_x/tg_y_1 and a tg_root/tg_x/tg_y_2 group,
>> the tg_x->load_avg becomes > 6*1024 before any tasks ran in it.
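The arithmetic behind that "> 6*1024" is spelled out in the reply just below; as a rough user-space sketch (names, types and constants are invented here for illustration, this is not the actual kernel code), the per-cpu attach amounts to:

#include <stdio.h>

#define NR_CPUS     6      /* the Juno board has 6 logical cpus */
#define NICE_0_LOAD 1024   /* scale_load_down(se->load.weight) of a fresh group entity */

static unsigned long tg_x_cfs_rq_load_avg[NR_CPUS]; /* tg_x[cpu]->cfs_rq->avg.load_avg */
static unsigned long tg_x_load_avg;                 /* tg_x->load_avg                  */

/* Creating a child group attaches one group entity per cpu to tg_x's
 * cfs_rq; each entity starts at NICE_0_LOAD and the resulting delta is
 * propagated to tg_x->load_avg (the delta=1024 lines in the trace below). */
static void online_child_group(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		tg_x_cfs_rq_load_avg[cpu] += NICE_0_LOAD;
		tg_x_load_avg += NICE_0_LOAD;
	}
}

int main(void)
{
	/* tg_y_1: tg_x->load_avg goes from 0 to 6144; for tg_y_2 the entities
	 * have already partly decayed before being attached, so the real
	 * per-cpu delta is only ~40 instead of 1024. */
	online_child_group();
	printf("tg_x->load_avg = %lu\n", tg_x_load_avg);
	return 0;
}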
>
> This is normal, as se->avg.load_avg is initialized to
> scale_load_down(se->load.weight) and this se->avg.load_avg will be
> added to tg_x[cpu]->cfs_rq->avg.load_avg when attached to the cfs_rq.
>
>>
>> tg_x   : 0xffff800975800d80
>> tg_y_1 : 0xffff800975800c00
>> tg_y_2 : 0xffff80097543d200
>>
>> mkdir-2177 [002] 117.235241: bprint: sched_online_group: tg=0xffff800975800d80 tg->parent=0xffff000008fd0300
>> mkdir-2177 [002] 117.235244: bprint: online_fair_sched_group: tg=0xffff800975800d80 cpu=0
>> mkdir-2177 [002] 117.235247: bprint: online_fair_sched_group: tg=0xffff800975800d80 cpu=1
>> mkdir-2177 [002] 117.235249: bprint: online_fair_sched_group: tg=0xffff800975800d80 cpu=2
>> mkdir-2177 [002] 117.235251: bprint: online_fair_sched_group: tg=0xffff800975800d80 cpu=3
>> mkdir-2177 [002] 117.235253: bprint: online_fair_sched_group: tg=0xffff800975800d80 cpu=4
>> mkdir-2177 [002] 117.235255: bprint: online_fair_sched_group: tg=0xffff800975800d80 cpu=5
>>
>> mkdir-2181 [002] 117.353155: bprint: sched_online_group: tg=0xffff800975800c00 tg->parent=0xffff800975800d80
>> mkdir-2181 [002] 117.353158: bprint: online_fair_sched_group: tg=0xffff800975800c00 cpu=0
>> mkdir-2181 [002] 117.353162: bprint: post_init_entity_util_avg: cpu=0 tg=0xffff800975800d80 tg_css_id=2 cfs_rq->tg->load_avg=0 cfs_rq->avg.load_avg=1024 cfs_rq->tg_load_avg_contrib=0 delta=1024
>> mkdir-2181 [002] 117.353164: bprint: online_fair_sched_group: tg=0xffff800975800c00 cpu=1
>> mkdir-2181 [002] 117.353167: bprint: post_init_entity_util_avg: cpu=1 tg=0xffff800975800d80 tg_css_id=2 cfs_rq->tg->load_avg=1024 cfs_rq->avg.load_avg=1024 cfs_rq->tg_load_avg_contrib=0 delta=1024
>> mkdir-2181 [002] 117.353168: bprint: online_fair_sched_group: tg=0xffff800975800c00 cpu=2
>> mkdir-2181 [002] 117.353171: bprint: post_init_entity_util_avg: cpu=2 tg=0xffff800975800d80 tg_css_id=2 cfs_rq->tg->load_avg=2048 cfs_rq->avg.load_avg=1024 cfs_rq->tg_load_avg_contrib=0 delta=1024
>> mkdir-2181 [002] 117.353173: bprint: online_fair_sched_group: tg=0xffff800975800c00 cpu=3
>> mkdir-2181 [002] 117.353175: bprint: post_init_entity_util_avg: cpu=3 tg=0xffff800975800d80 tg_css_id=2 cfs_rq->tg->load_avg=3072 cfs_rq->avg.load_avg=1024 cfs_rq->tg_load_avg_contrib=0 delta=1024
>> mkdir-2181 [002] 117.353177: bprint: online_fair_sched_group: tg=0xffff800975800c00 cpu=4
>> mkdir-2181 [002] 117.353179: bprint: post_init_entity_util_avg: cpu=4 tg=0xffff800975800d80 tg_css_id=2 cfs_rq->tg->load_avg=4096 cfs_rq->avg.load_avg=1024 cfs_rq->tg_load_avg_contrib=0 delta=1024
>> mkdir-2181 [002] 117.353180: bprint: online_fair_sched_group: tg=0xffff800975800c00 cpu=5
>> mkdir-2181 [002] 117.353183: bprint: post_init_entity_util_avg: cpu=5 tg=0xffff800975800d80 tg_css_id=2 cfs_rq->tg->load_avg=5120 cfs_rq->avg.load_avg=1024 cfs_rq->tg_load_avg_contrib=0 delta=1024
>>
>
> This looks coherent, as tg_x->load_avg = 5120 == sum of
> cfs_rq[cpu]->load_avg, and cfs_rq->tg_load_avg_contrib will be set to
> cfs_rq[cpu]->load_avg at the end of the function call.
> In the buggy case, tg->load_avg = 381697 but the sum of
> cfs_rq[cpu]->tg_load_avg_contrib = 1013 = cfs_rq[cpu]->avg.load_avg, so
> there is no way to remove the 381697.
>
>> mkdir-2185 [001] 117.502980: bprint: sched_online_group: tg=0xffff80097543d200 tg->parent=0xffff800975800d80
>> mkdir-2185 [001] 117.502982: bprint: online_fair_sched_group: tg=0xffff80097543d200 cpu=0
>> mkdir-2185 [001] 117.502987: bprint: post_init_entity_util_avg: cpu=0 tg=0xffff800975800d80 tg_css_id=2 cfs_rq->tg->load_avg=6144 cfs_rq->avg.load_avg=1068 cfs_rq->tg_load_avg_contrib=1024 delta=44
>> mkdir-2185 [001] 117.502988: bprint: online_fair_sched_group: tg=0xffff80097543d200 cpu=1
>> mkdir-2185 [001] 117.502992: bprint: post_init_entity_util_avg: cpu=1 tg=0xffff800975800d80 tg_css_id=2 cfs_rq->tg->load_avg=6188 cfs_rq->avg.load_avg=1058 cfs_rq->tg_load_avg_contrib=1024 delta=34
>> mkdir-2185 [001] 117.502993: bprint: online_fair_sched_group: tg=0xffff80097543d200 cpu=2
>> mkdir-2185 [001] 117.502996: bprint: post_init_entity_util_avg: cpu=2 tg=0xffff800975800d80 tg_css_id=2 cfs_rq->tg->load_avg=6222 cfs_rq->avg.load_avg=1092 cfs_rq->tg_load_avg_contrib=1024 delta=68
>> mkdir-2185 [001] 117.502998: bprint: online_fair_sched_group: tg=0xffff80097543d200 cpu=3
>> mkdir-2185 [001] 117.503001: bprint: post_init_entity_util_avg: cpu=3 tg=0xffff800975800d80 tg_css_id=2 cfs_rq->tg->load_avg=6290 cfs_rq->avg.load_avg=1069 cfs_rq->tg_load_avg_contrib=1024 delta=45
>> mkdir-2185 [001] 117.503002: bprint: online_fair_sched_group: tg=0xffff80097543d200 cpu=4
>> mkdir-2185 [001] 117.503005: bprint: post_init_entity_util_avg: cpu=4 tg=0xffff800975800d80 tg_css_id=2 cfs_rq->tg->load_avg=6335 cfs_rq->avg.load_avg=1064 cfs_rq->tg_load_avg_contrib=1024 delta=40
>> mkdir-2185 [001] 117.503006: bprint: online_fair_sched_group: tg=0xffff80097543d200 cpu=5
>> mkdir-2185 [001] 117.503009: bprint: post_init_entity_util_avg: cpu=5 tg=0xffff800975800d80 tg_css_id=2 cfs_rq->tg->load_avg=*6375* cfs_rq->avg.load_avg=1061 cfs_rq->tg_load_avg_contrib=1024 delta=37
>>
>> We add delta=1024 to tg_x->load_avg during tg_root/tg_x/tg_y_1
>> initialization but only delta=~40 for tg_root/tg_x/tg_y_2.
>
> IMO, this comes from the decay of tg_y_2->se[cpu]->avg.load_avg before
> being attached to tg_x.
>
> But I don't think that it's the same issue, as tg_x->load_avg is
> still coherent with the sum of cfs_rq->avg.load_avg and the sum of
> cfs_rq->tg_load_avg_contrib.
>
>>
>> system.slice->tg_load_avg = 381697 is still much higher and, AFAICS,
>> it's an i5-5300U CPU so only 4 logical cpus.
>>
>> How many system.slice/foo's actually exist in this system? I only see user.slice
>
> There are several dozen children in system.slice, but they all have a
> null load_avg and load_avg_contrib.
>
>> related cfs_rq[x]:/autogroups-xxx on my Ubuntu-16.04 Desktop system.
>>
>> [...]
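For completeness, the consistency check applied to the dumps throughout this thread boils down to the sketch below. Only the two totals (381697 and 1013) come from the buggy dump; the per-cpu split of the 1013 and all names are invented for illustration, and this is not the kernel implementation itself.

#include <stdio.h>

#define NR_CPUS 4  /* the affected machine is an i5-5300U with 4 logical cpus */

/*
 * tg->load_avg is only ever updated with deltas of the per-cpu
 * cfs_rq->tg_load_avg_contrib values, so apart from the non-atomic dump
 * it should stay close to the sum of those contributions.
 */
static long sum_contribs(const long contrib[NR_CPUS])
{
	long sum = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		sum += contrib[cpu];
	return sum;
}

int main(void)
{
	/* per-cpu split made up; the dump only reports the total of 1013 */
	long contrib[NR_CPUS] = { 253, 253, 253, 254 };
	long dumped = 381697;  /* system.slice->tg_load_avg in the buggy dump */

	printf("expected ~%ld, dumped %ld -> stale load that cannot be removed\n",
	       sum_contribs(contrib), dumped);
	return 0;
}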