From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9A499C43334 for ; Wed, 8 Jun 2022 23:34:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233450AbiFHXel (ORCPT ); Wed, 8 Jun 2022 19:34:41 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42098 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232971AbiFHXe2 (ORCPT ); Wed, 8 Jun 2022 19:34:28 -0400 Received: from out30-130.freemail.mail.aliyun.com (out30-130.freemail.mail.aliyun.com [115.124.30.130]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C0C7312295F for ; Wed, 8 Jun 2022 16:34:23 -0700 (PDT) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R131e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=ay29a033018046059;MF=dtcccc@linux.alibaba.com;NM=1;PH=DS;RN=11;SR=0;TI=SMTPD_---0VFoWqnQ_1654731259; Received: from localhost.localdomain(mailfrom:dtcccc@linux.alibaba.com fp:SMTPD_---0VFoWqnQ_1654731259) by smtp.aliyun-inc.com; Thu, 09 Jun 2022 07:34:20 +0800 From: Tianchen Ding To: Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Daniel Bristot de Oliveira , Valentin Schneider Cc: linux-kernel@vger.kernel.org Subject: [PATCH RESEND v4 1/2] sched: Fix the check of nr_running at queue wakelist Date: Thu, 9 Jun 2022 07:34:11 +0800 Message-Id: <20220608233412.327341-2-dtcccc@linux.alibaba.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20220608233412.327341-1-dtcccc@linux.alibaba.com> References: <20220608233412.327341-1-dtcccc@linux.alibaba.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org The commit 2ebb17717550 ("sched/core: Offload wakee task activation if it the wakee is descheduling") checked rq->nr_running <= 1 to avoid task stacking when WF_ON_CPU. Per the ordering of writes to p->on_rq and p->on_cpu, observing p->on_cpu (WF_ON_CPU) in ttwu_queue_cond() implies !p->on_rq, IOW p has gone through the deactivate_task() in __schedule(), thus p has been accounted out of rq->nr_running. As such, the task being the only runnable task on the rq implies reading rq->nr_running == 0 at that point. The benchmark result is in [1]. [1] https://lore.kernel.org/all/e34de686-4e85-bde1-9f3c-9bbc86b38627@linux.alibaba.com/ Suggested-by: Valentin Schneider Signed-off-by: Tianchen Ding Reviewed-by: Valentin Schneider --- kernel/sched/core.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 21db6816a7bd..a4bdb2b95976 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -3829,8 +3829,12 @@ static inline bool ttwu_queue_cond(int cpu, int wake_flags) * CPU then use the wakelist to offload the task activation to * the soon-to-be-idle CPU as the current CPU is likely busy. * nr_running is checked to avoid unnecessary task stacking. + * + * Note that we can only get here with (wakee) p->on_rq=0, + * p->on_cpu can be whatever, we've done the dequeue, so + * the wakee has been accounted out of ->nr_running. */ - if ((wake_flags & WF_ON_CPU) && cpu_rq(cpu)->nr_running <= 1) + if ((wake_flags & WF_ON_CPU) && !cpu_rq(cpu)->nr_running) return true; return false; -- 2.27.0