From: Vincent Guittot
Date: Wed, 11 Apr 2018 18:10:47 +0200
Subject: Re: [PATCH] sched/fair: schedutil: update only with all info available
To: Peter Zijlstra
Cc: Patrick Bellasi, linux-kernel, "open list:THERMAL", Ingo Molnar,
 "Rafael J. Wysocki", Viresh Kumar, Juri Lelli, Joel Fernandes,
 Steve Muckle, Dietmar Eggemann, Morten Rasmussen
In-Reply-To: <20180411160000.GO4082@hirez.programming.kicks-ass.net>
References: <20180406172835.20078-1-patrick.bellasi@arm.com>
 <20180410110412.GG14248@e110439-lin>
 <20180411151450.GK4043@hirez.programming.kicks-ass.net>
 <20180411153710.GN4082@hirez.programming.kicks-ass.net>
 <20180411160000.GO4082@hirez.programming.kicks-ass.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On 11 April 2018 at 18:00, Peter Zijlstra wrote:
> On Wed, Apr 11, 2018 at 05:41:24PM +0200, Vincent Guittot wrote:
>> Yes, and to be honest I don't have any clue about the root cause :-(
>> Heiner mentioned that it's much better in latest linux-next but I
>> haven't seen any changes related to the code of those patches
>
> Yeah, it's a bit of a puzzle. Now you touch nohz, and the patches in
> next that are most likely to have affected this are rjw's
> cpuidle-vs-nohz patches. The common denominator being nohz.
>
> Now I think rjw's patches will ensure we enter nohz _less_, they avoid
> stopping the tick when we expect to go idle for a short period only.
>
> So if your patch makes nohz go wobbly, going nohz less will make that
> better.
>
> Of course, I've no actual clue as to what that patch (it's the last one
> in the series, right?:
>
>   31e77c93e432 ("sched/fair: Update blocked load when newly idle")
>
> ) does that is so offensive to that one machine. You never did manage to
> reproduce, right?

Yes, I never managed to reproduce it.

>
> Could it be that for some reason the nohz balancer now takes a very long
> time to run?

Heiner mentioned that it was a relatively slow Celeron and that he uses
the ondemand governor. So I was about to ask him to use the performance
governor, to see whether the cause is that the CPU runs slowly and takes
too much time to enter idle.

>
> Could something like the following happen (and this is really flaky
> thinking here):
>
> last CPU goes idle, we enter idle_balance(), that kicks ilb, ilb runs,
> which somehow again triggers idle_balance() and around we go?
>
> I'm not immediately seeing how that could happen, but if we do something
> daft like that we can tie up the CPU for a while, mostly with IRQs
> disabled, and that would be visible as that latency he sees.
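
For anyone who wants to try the governor experiment mentioned above, here
is a minimal Python sketch. It assumes the standard cpufreq sysfs files
(cpuN/cpufreq/scaling_governor and scaling_available_governors under
/sys/devices/system/cpu). The helper names are made up for illustration and
are not part of the patch series. Writing the governor needs root.

```python
from pathlib import Path

# Standard cpufreq sysfs location on Linux. It is a parameter everywhere so
# the helpers can also be pointed at a scratch directory.
SYSFS_CPU_ROOT = Path("/sys/devices/system/cpu")


def policy_dirs(root=SYSFS_CPU_ROOT):
    """Return the per-CPU cpufreq directories (cpu0/cpufreq, cpu1/cpufreq, ...)."""
    return sorted(p for p in Path(root).glob("cpu[0-9]*/cpufreq") if p.is_dir())


def current_governors(root=SYSFS_CPU_ROOT):
    """Map each CPU name (e.g. 'cpu0') to its active governor."""
    return {
        d.parent.name: (d / "scaling_governor").read_text().strip()
        for d in policy_dirs(root)
    }


def set_governor(name, root=SYSFS_CPU_ROOT):
    """Write `name` to scaling_governor for every CPU that offers it.

    Returns the names of the CPUs that were switched.
    """
    changed = []
    for d in policy_dirs(root):
        available = (d / "scaling_available_governors").read_text().split()
        if name not in available:
            continue  # this CPU's driver does not provide that governor
        (d / "scaling_governor").write_text(name)
        changed.append(d.parent.name)
    return changed


if __name__ == "__main__":
    # Read-only by default: just report what each CPU is using right now.
    print(current_governors())
```

Calling set_governor("performance") as root and then repeating the latency
test would show whether the slow CPU frequency is part of the problem;
set_governor("ondemand") restores the earlier setup.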