From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753097AbeDIRdq (ORCPT );
	Mon, 9 Apr 2018 13:33:46 -0400
Received: from mail-wm0-f41.google.com ([74.125.82.41]:55942 "EHLO
	mail-wm0-f41.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751778AbeDIRdo (ORCPT );
	Mon, 9 Apr 2018 13:33:44 -0400
X-Google-Smtp-Source: AIpwx48FKgxt9x05asNDUWh2TaPime2TjPvXxM2AbugSFCutkuqSw6oJ3ee5UlhsFiSsKSDveYLIsA==
Subject: Re: Problem with commit 31e77c93e432 "sched/fair: Update blocked load when newly idle"
To: Vincent Guittot
Cc: Dietmar Eggemann, "Peter Zijlstra (Intel)", Ingo Molnar,
	Linux Kernel Mailing List, "Rafael J. Wysocki"
References: <20180324064627.GA10884@linaro.org>
	<6b151c56-9ead-7bbe-d1b7-b0e6d69c0d7f@arm.com>
	<796bbd7b-8512-7370-28c9-0f082dc3f287@gmail.com>
	<95fe50be-cc8b-4345-7333-cdf656fad2a7@gmail.com>
From: Heiner Kallweit
Message-ID:
Date: Mon, 9 Apr 2018 19:33:32 +0200
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.7.0
MIME-Version: 1.0
In-Reply-To:
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 06.04.2018 at 18:03, Vincent Guittot wrote:
> Hi Heiner,
>
> On 30 March 2018 at 10:37, Heiner Kallweit wrote:
>> On 30.03.2018 at 08:50, Vincent Guittot wrote:
>>> On 29 March 2018 at 19:40, Heiner Kallweit wrote:
>>>> On 29.03.2018 at 09:41, Vincent Guittot wrote:
>>>
>>>>>
>>>>> I'm finally not so sure that I have the right set up to reproduce the
>>>>> problem as I haven't been able to reproduce it since.
>>>>>
>>>>> Heiner,
>>>>>
>>>>> How fast the problem happens on your board ?
>>>>> Are you doing anything specific on the console that trigger the problem ?
>>>>>
>>>> Hi Vincent,
>>>>
>>>> the lag when working on the console is constantly there, the "rcu_preempt
>>>> detected stalls" happens after several hours (so far always within 24h)
>>>> w/o any triggering event I would be aware of. It occurred also when the
>>>> system was idle at that point in time.
>>>
>>> Ok, so I don't have the problem on my hikey as the console never lag
>>> on my setup.
>>>
>>> Can you send me the config of your kernel ? I'd like to check if you
>>> have enable something that could trigger such problem
>>>
>> Sure, here we go. I also add a system log.
>
> Thanks for the config. I have used it for my setup but I can't
> reproduce your regression. My platforms stay stable so I probably
> missing something. Are you facing similar problem with other platforms
> or only this celeron based platform ?
>
> I have reviewed the code but don't see any obvious place in the patch
> that can generate the problem. Nevertheless, would you mind to try the
> patch below ? It's a blind test to try to narrow the problem.
>
> Thanks
>
Hi Vincent,

I tried again with today's linux-next and it's much better.
The lag isn't completely gone, but it's much less annoying. Every ~30 secs
the console hangs for about half a second, which is much less frequent than
before.

I saw that some patches from Rafael have been merged in the last few days.
Maybe they improved the situation.

Regards, Heiner

> ---
>  kernel/sched/fair.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 0951d1c..e9835f2 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9794,9 +9794,9 @@ static int idle_balance(struct rq *this_rq, struct rq_flags *rf)
>  		sd = rcu_dereference_check_sched_domain(this_rq->sd);
>  		if (sd)
>  			update_next_balance(sd, &next_balance);
> -		rcu_read_unlock();
>
>  		nohz_newidle_balance(this_rq);
> +		rcu_read_unlock();
>
>  		goto out;
>  	}
>