From: Valentin Schneider
To: Vincent Guittot, Qais Yousef
Cc: "Peter Zijlstra (Intel)", Dietmar Eggemann, linux-kernel, Morten Rasmussen
Subject: Re: [PATCH] sched/eas: Don't update misfit status if the task is pinned
References: <20210119120755.2425264-1-qais.yousef@arm.com>
Date: Tue, 19 Jan 2021 13:54:30 +0000
X-Mailing-List: linux-kernel@vger.kernel.org

On 19/01/21 14:34, Vincent Guittot wrote:
> On Tue, 19 Jan 2021 at 13:08, Qais Yousef wrote:
>>
>> If the task is pinned to a cpu, setting the misfit status means that
>> we'll unnecessarily continuously attempt to migrate the task but fail.
>>
>> This continuous failure will cause the balance_interval to increase to
>> a high value, and eventually cause unnecessary significant delays in
>> balancing the system when real imbalance happens.
>>
>> Caught while testing uclamp where rt-app calibration loop was pinned to
>> cpu 0, shortly after which we spawn another task with high util_clamp
>> value. The task was failing to migrate after over 40ms of runtime due to
>> balance_interval unnecessary expanded to a very high value from the
>> calibration loop.
>>
>> Not done here, but it could be useful to extend the check for pinning to
>> verify that the affinity of the task has a cpu that fits. We could end
>> up in a similar situation otherwise.
>>
>> Fixes: 3b1baa6496e6 ("sched/fair: Add 'group_misfit_task' load-balance type")
>> Signed-off-by: Qais Yousef
>> ---
>>  kernel/sched/fair.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 197a51473e0c..9379a481dd8c 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -4060,7 +4060,7 @@ static inline void update_misfit_status(struct task_struct *p, struct rq *rq)
>>         if (!static_branch_unlikely(&sched_asym_cpucapacity))
>>                 return;
>>
>> -       if (!p) {
>> +       if (!p || p->nr_cpus_allowed == 1) {
>
> Side question: What happens if there is 2 misfit tasks and the current
> one is pinned but not the other waiting one
>

update_misfit_status() is called either on the current task (at tick) or on
the task picked by pick_next_task_fair() - i.e. CFS current or
about-to-be-current.

So if you have 2 CPU hogs enqueued on a single LITTLE, and one of them is
pinned, the other one will be moved away either via regular load balance,
or via misfit balance sometime after it's picked as the next task to run.
Admittedly that second case suffers from unfortunate timing mostly related
to the load balance interval. There was an old patch in the Android stack
that would reduce the balance interval upon detecting a misfit task to
"accelerate" its upmigration; this might need to be revisited...

>>                 rq->misfit_task_load = 0;
>>                 return;
>>         }
>> --
>> 2.25.1
>>
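
To make the two-hog scenario above concrete, here is a rough userspace
sketch of the patched check. It is not the kernel code: struct task, struct
rq, task_fits() and the load field are simplified, hypothetical stand-ins
(the real function works on task_struct and rq and uses the scheduler's own
fit and load helpers), and the 20% headroom in task_fits() is an assumption
of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel's task_struct and rq. */
struct task {
	int nr_cpus_allowed;	/* 1 means the task is pinned to a single CPU */
	unsigned long util;	/* estimated utilization, 0..1024 scale */
	unsigned long load;	/* stand-in for the task's hierarchical load */
};

struct rq {
	unsigned long cpu_capacity;	/* capacity of this CPU, 0..1024 scale */
	unsigned long misfit_task_load;	/* non-zero flags a misfit on this rq */
};

/* Simplified fit test with roughly 20% headroom (assumption of this sketch). */
bool task_fits(const struct task *p, unsigned long capacity)
{
	return p->util * 1280 < capacity * 1024;
}

/*
 * Model of the patched logic: only the task passed in (current, or the one
 * just picked to run next) decides the rq's misfit status.
 */
void update_misfit_status(const struct task *p, struct rq *rq)
{
	/* No task, or a pinned task that cannot be migrated anyway. */
	if (!p || p->nr_cpus_allowed == 1) {
		rq->misfit_task_load = 0;
		return;
	}

	if (task_fits(p, rq->cpu_capacity)) {
		rq->misfit_task_load = 0;
		return;
	}

	/* Keep the value non-zero even if the load happens to be 0. */
	rq->misfit_task_load = p->load ? p->load : 1;
}
```

With a LITTLE rq of capacity 446 (an arbitrary illustrative value) and two
hogs of util 900, calling update_misfit_status() on the pinned hog leaves
misfit_task_load at 0, while calling it on the unpinned hog sets it to that
task's load. In this model the rq is only flagged as misfit while the
unpinned hog is current or about to be current, which is where the timing
dependency on the balance interval comes from.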