From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 21E34C4338F for ; Mon, 2 Aug 2021 10:52:07 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 05E9260F9E for ; Mon, 2 Aug 2021 10:52:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233303AbhHBKwP convert rfc822-to-8bit (ORCPT ); Mon, 2 Aug 2021 06:52:15 -0400 Received: from szxga02-in.huawei.com ([45.249.212.188]:7915 "EHLO szxga02-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233176AbhHBKwO (ORCPT ); Mon, 2 Aug 2021 06:52:14 -0400 Received: from dggemv703-chm.china.huawei.com (unknown [172.30.72.56]) by szxga02-in.huawei.com (SkyGuard) with ESMTP id 4GdZV40Yttz82jX; Mon, 2 Aug 2021 18:48:12 +0800 (CST) Received: from dggema721-chm.china.huawei.com (10.3.20.85) by dggemv703-chm.china.huawei.com (10.3.19.46) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256) id 15.1.2176.2; Mon, 2 Aug 2021 18:52:01 +0800 Received: from dggemi761-chm.china.huawei.com (10.1.198.147) by dggema721-chm.china.huawei.com (10.3.20.85) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256) id 15.1.2176.2; Mon, 2 Aug 2021 18:52:01 +0800 Received: from dggemi761-chm.china.huawei.com ([10.9.49.202]) by dggemi761-chm.china.huawei.com ([10.9.49.202]) with mapi id 15.01.2176.012; Mon, 2 Aug 2021 18:52:01 +0800 From: "Song Bao Hua (Barry Song)" To: Mel Gorman , LKML CC: Ingo Molnar , Peter Zijlstra , Vincent Guittot , Valentin Schneider , Aubrey Li , yangyicong Subject: RE: [PATCH 7/9] sched/fair: Enforce proportional scan limits when scanning for an idle core Thread-Topic: [PATCH 7/9] sched/fair: Enforce proportional scan limits when scanning for an idle core Thread-Index: AQHXgghk3nTW1qcKWUGNOvwzGAgDfKtgFAig Date: Mon, 2 Aug 2021 10:52:01 +0000 Message-ID: <58167022b9074ed9951b09ab6ba1983e@hisilicon.com> References: <20210726102247.21437-1-mgorman@techsingularity.net> <20210726102247.21437-8-mgorman@techsingularity.net> In-Reply-To: <20210726102247.21437-8-mgorman@techsingularity.net> Accept-Language: en-GB, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.126.201.55] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 8BIT MIME-Version: 1.0 X-CFilter-Loop: Reflected Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > -----Original Message----- > From: Mel Gorman [mailto:mgorman@techsingularity.net] > Sent: Monday, July 26, 2021 10:23 PM > To: LKML > Cc: Ingo Molnar ; Peter Zijlstra ; > Vincent Guittot ; Valentin Schneider > ; Aubrey Li ; Mel > Gorman > Subject: [PATCH 7/9] sched/fair: Enforce proportional scan limits when scanning > for an idle core > > When scanning for a single CPU, the scan is limited based on the estimated > average idle time for a domain to reduce the risk that more time is spent > scanning for idle CPUs than we are idle for. > > With SMT, if an idle core is expected to exist there is no scan depth > limits so the scan depth may or may not be related to average idle time. > Unfortunately has_idle_cores can be very inaccurate when workloads are > rapidly entering/exiting idle (e.g. hackbench). > > As the scan depth is now proportional to cores and not CPUs, enforce > SIS_PROP for idle core scans. > > The performance impact of this is variable and is neither a universal > gain nor loss. In some cases, has_idle_cores will be cleared prematurely > because the whole domain was not scanned but has_idle_cores is already > known to be an inaccurate heuristic. There is also additional cost because > time calculations are made even for an idle core scan and the delta is > calculated for both scan successes and failures. Finally, SMT siblings > may be used prematurely due to scan depth limitations. > > On the flip side, scan depth is now consistent for both core and smt > scans. The reduction in scan depth improves performance in some cases > and wakeup latency is reduced in some cases. > > There were few changes identified in the SIS statistics but notably, > "SIS Core Hit" was slightly reduced in tbench as thread counts increased, > presumably due to the core search depth being throttled. > > Signed-off-by: Mel Gorman > --- > kernel/sched/fair.c | 33 +++++++++++++++++++-------------- > 1 file changed, 19 insertions(+), 14 deletions(-) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index 20b9255ebf97..b180205e6b25 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -6232,7 +6232,7 @@ static int select_idle_cpu(struct task_struct *p, struct > sched_domain *sd, bool > > cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr); > > - if (sched_feat(SIS_PROP) && !has_idle_core) { > + if (sched_feat(SIS_PROP)) { > u64 avg_cost, avg_idle, span_avg; > unsigned long now = jiffies; > > @@ -6265,30 +6265,35 @@ static int select_idle_cpu(struct task_struct *p, struct > sched_domain *sd, bool > if (has_idle_core) { > i = select_idle_core(p, cpu, cpus, &idle_cpu); > if ((unsigned int)i < nr_cpumask_bits) > - return i; > + break; > > + nr -= sched_smt_weight; > } else { > - if (!--nr) > - return -1; > idle_cpu = __select_idle_cpu(cpu, p); > if ((unsigned int)idle_cpu < nr_cpumask_bits) > break; > + nr--; > } > + > + if (nr < 0) > + break; > } > > - if (has_idle_core) > - set_idle_cores(target, false); > + if ((unsigned int)idle_cpu < nr_cpumask_bits) { > + if (has_idle_core) > + set_idle_cores(target, false); > For example, if we have 16 cpus(8 SMT2 cores). In case core7 is idle, we only have scanned core0+core1(cpu0-cpu3) and if these two cores are not idle, but here we set has_idle_cores to false while core7 is idle. It seems incorrect. > - if (sched_feat(SIS_PROP) && !has_idle_core) { > - time = cpu_clock(this) - time; > + if (sched_feat(SIS_PROP)) { > + time = cpu_clock(this) - time; > > - /* > - * Account for the scan cost of wakeups against the average > - * idle time. > - */ > - this_rq->wake_avg_idle -= min(this_rq->wake_avg_idle, time); > + /* > + * Account for the scan cost of wakeups against the average > + * idle time. > + */ > + this_rq->wake_avg_idle -= min(this_rq->wake_avg_idle, time); > > - update_avg(&this_sd->avg_scan_cost, time); > + update_avg(&this_sd->avg_scan_cost, time); > + } > } > > return idle_cpu; > -- > 2.26.2 Thanks Barry