From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.1 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 808F0C433E0 for ; Wed, 1 Jul 2020 23:28:52 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4EA582077D for ; Wed, 1 Jul 2020 23:28:52 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=joelfernandes.org header.i=@joelfernandes.org header.b="T/EQ68wX" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726631AbgGAX2u (ORCPT ); Wed, 1 Jul 2020 19:28:50 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49864 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726274AbgGAX2u (ORCPT ); Wed, 1 Jul 2020 19:28:50 -0400 Received: from mail-qv1-xf43.google.com (mail-qv1-xf43.google.com [IPv6:2607:f8b0:4864:20::f43]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 030C6C08C5C1 for ; Wed, 1 Jul 2020 16:28:49 -0700 (PDT) Received: by mail-qv1-xf43.google.com with SMTP id m8so7634177qvk.7 for ; Wed, 01 Jul 2020 16:28:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=joelfernandes.org; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=w/6vsbQ9eQ9SYZyjzJjo7q2++dtmL7y8qUp3o/IY01o=; b=T/EQ68wXN+j3TU28ZuKOoUhjEFIYaXWHzb4njqW0NqGdPOBA5aEudroRUdiJ8FrnLm sY65BzCM/yuanryZ58uxsKIUzrwSqwtjuf2VsEFWNQSv/AmVYIZooAHnTVRGBhP4I1Na ElhVC000WZYrhABgmHLOgVP6TzoVIFGNCtZBk= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=w/6vsbQ9eQ9SYZyjzJjo7q2++dtmL7y8qUp3o/IY01o=; b=IlvNAipApPa4QrLXYIFVXb7gNqN8HR5EnY2/WcrUP29kWYSz+A7uZPvlx4mHPLPCS6 8nXpqS40wq21EsNADwBPkWMel0hPyv81SzZqH9W3H/7PuSZgsgFCOR4JoAiNMtIrWfov nCXt92gXY6o8ZBp3NDzF2esPrOmj0Mv6q/+jHQ+nn+HBFlVfvjYeTb/sMz3BY5cK0SzO F/Vm01cbDgLpRl0dDhvG/kvGkRofUTb0+i28x05eTdEbjhCs/m5XLai/hYId7oNFXW5h hxExl057Hm6WPaCt39j/V+du/sx772bCyy2wKpZArInLg62EU7971E7a/Kuq1JuIBF0k EHuw== X-Gm-Message-State: AOAM532At4ps3Ex9+qmAIu+1zvYuLJx5Temv0GoWtugK9EHvWRdcT4hq /FipQNdyFim16EeKfzNcbyCHZQ== X-Google-Smtp-Source: ABdhPJySc8UKTCwLa3Lxp62I5q63tLwLgEuWArRfWcNItmojl1J1bW93G7TvXwlgr0hD/mXteORW/w== X-Received: by 2002:a0c:a992:: with SMTP id a18mr18028116qvb.211.1593646129011; Wed, 01 Jul 2020 16:28:49 -0700 (PDT) Received: from localhost ([2620:15c:6:12:cad3:ffff:feb3:bd59]) by smtp.gmail.com with ESMTPSA id c7sm7780636qta.95.2020.07.01.16.28.48 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 01 Jul 2020 16:28:48 -0700 (PDT) Date: Wed, 1 Jul 2020 19:28:47 -0400 From: Joel Fernandes To: Vineeth Remanan Pillai , Peter Zijlstra , Tim Chen Cc: Nishanth Aravamudan , Julien Desfossez , tglx@linutronix.de, pjt@google.com, torvalds@linux-foundation.org, linux-kernel@vger.kernel.org, subhra.mazumdar@oracle.com, mingo@kernel.org, fweisbec@gmail.com, keescook@chromium.org, kerrnel@google.com, Phil Auld , Aaron Lu , Aubrey Li , Valentin Schneider , Mel Gorman , Pawan Gupta , Paolo Bonzini , vineethrp@gmail.com, Chen Yu , Christian Brauner , Aaron Lu , paulmck@kernel.org Subject: Re: [RFC PATCH 06/16] sched: Add core wide task selection and scheduling. Message-ID: <20200701232847.GA439212@google.com> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Jun 30, 2020 at 09:32:27PM +0000, Vineeth Remanan Pillai wrote: > From: Peter Zijlstra > > Instead of only selecting a local task, select a task for all SMT > siblings for every reschedule on the core (irrespective which logical > CPU does the reschedule). > > There could be races in core scheduler where a CPU is trying to pick > a task for its sibling in core scheduler, when that CPU has just been > offlined. We should not schedule any tasks on the CPU in this case. > Return an idle task in pick_next_task for this situation. > > NOTE: there is still potential for siblings rivalry. > NOTE: this is far too complicated; but thus far I've failed to > simplify it further. > > Signed-off-by: Peter Zijlstra (Intel) > Signed-off-by: Julien Desfossez > Signed-off-by: Vineeth Remanan Pillai > Signed-off-by: Aaron Lu > Signed-off-by: Tim Chen Hi Peter, Tim, all, the below patch fixes the hotplug issue described in the below patch's Link tag. Patch description below describes the issues fixed and it applies on top of this patch. ------8<---------- From: "Joel Fernandes (Google)" Subject: [PATCH] sched: Fix CPU hotplug causing crashes in task selection logic The selection logic does not run correctly if the current CPU is not in the cpu_smt_mask (which it is not because the CPU is offlined when the stopper finishes running and needs to switch to idle). There are also other issues fixed by the patch I think such as: if some other sibling set core_pick to something, however the selection logic on current cpu resets it before selecting. In this case, we need to run the task selection logic again to make sure it picks something if there is something to run. It might end up picking the wrong task. Yet another issue was, if the stopper thread is an unconstrained pick, then rq->core_pick is set. The next time task selection logic runs when stopper needs to switch to idle, the current CPU is not in the smt_mask. This causes the previous ->core_pick to be picked again which happens to be the unconstrained task! so the stopper keeps getting selected forever. That and there are a few more safe guards and checks around checking/setting rq->core_pick. To test it, I ran rcutorture and made it tag all torture threads. Then ran it in hotplug mode (hotplugging every 200ms) and it hit the issue. Now it runs for an hour or so without issue. (Torture testing debug changes: https://bit.ly/38htfqK ). Various fixes were tried causing varying degrees of crashes. Finally I found that it is easiest to just add current CPU to the smt_mask's copy always. This is so that task selection logic always runs on the current CPU which called schedule(). Link: lore.kernel.org/r/20200414133559.GT20730@hirez.programming.kicks-ass.net Reported-by: Tim Chen Reported-by: Vineeth Pillai Signed-off-by: Joel Fernandes (Google) --- kernel/sched/core.c | 35 +++++++++++++++++++++++++++++------ 1 file changed, 29 insertions(+), 6 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index ede86fb37b4e8..a5604aa292e66 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -4307,7 +4307,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf) { struct task_struct *next, *max = NULL; const struct sched_class *class; - const struct cpumask *smt_mask; + struct cpumask select_mask; int i, j, cpu, occ = 0; bool need_sync; @@ -4334,7 +4334,14 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf) finish_prev_task(rq, prev, rf); cpu = cpu_of(rq); - smt_mask = cpu_smt_mask(cpu); + /* Make a copy of cpu_smt_mask as we should not set that. */ + cpumask_copy(&select_mask, cpu_smt_mask(cpu)); + + /* + * Always make sure current CPU is added to smt_mask so that below + * selection logic runs on it. + */ + cpumask_set_cpu(cpu, &select_mask); /* * core->core_task_seq, core->core_pick_seq, rq->core_sched_seq @@ -4351,7 +4358,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf) /* reset state */ rq->core->core_cookie = 0UL; - for_each_cpu(i, smt_mask) { + for_each_cpu(i, &select_mask) { struct rq *rq_i = cpu_rq(i); rq_i->core_pick = NULL; @@ -4371,7 +4378,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf) */ for_each_class(class) { again: - for_each_cpu_wrap(i, smt_mask, cpu) { + for_each_cpu_wrap(i, &select_mask, cpu) { struct rq *rq_i = cpu_rq(i); struct task_struct *p; @@ -4402,6 +4409,8 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf) */ if (i == cpu && !need_sync && !p->core_cookie) { next = p; + rq_i->core_pick = next; + rq_i->core_sched_seq = rq_i->core->core_pick_seq; goto done; } @@ -4427,7 +4436,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf) max = p; if (old_max) { - for_each_cpu(j, smt_mask) { + for_each_cpu(j, &select_mask) { if (j == i) continue; @@ -4452,6 +4461,10 @@ next_class:; rq->core->core_pick_seq = rq->core->core_task_seq; next = rq->core_pick; + + /* Something should have been selected for current CPU */ + WARN_ON_ONCE(!next); + rq->core_sched_seq = rq->core->core_pick_seq; /* @@ -4462,7 +4475,7 @@ next_class:; * their task. This ensures there is no inter-sibling overlap between * non-matching user state. */ - for_each_cpu(i, smt_mask) { + for_each_cpu(i, &select_mask) { struct rq *rq_i = cpu_rq(i); WARN_ON_ONCE(!rq_i->core_pick); @@ -4483,6 +4496,16 @@ next_class:; } done: + /* + * If we reset a sibling's core_pick, make sure that we picked a task + * for it, this is because we might have reset it though it was set to + * something by another selector. In this case we cannot leave it as + * NULL and should have found something for it. + */ + for_each_cpu(i, &select_mask) { + WARN_ON_ONCE(!cpu_rq(i)->core_pick); + } + set_next_task(rq, next); return next; } -- 2.27.0.212.ge8ba1cc988-goog