From: "Song Bao Hua (Barry Song)" <song.bao.hua@hisilicon.com>
To: Dietmar Eggemann, Vincent Guittot
Cc: tim.c.chen@linux.intel.com, catalin.marinas@arm.com, will@kernel.org,
 rjw@rjwysocki.net, bp@alien8.de, tglx@linutronix.de, mingo@redhat.com,
 lenb@kernel.org, peterz@infradead.org, rostedt@goodmis.org,
 bsegall@google.com, mgorman@suse.de, msys.mizuma@gmail.com,
 valentin.schneider@arm.com, gregkh@linuxfoundation.org,
 Jonathan Cameron, juri.lelli@redhat.com, mark.rutland@arm.com,
 sudeep.holla@arm.com, aubrey.li@linux.intel.com,
 linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
 linux-acpi@vger.kernel.org, x86@kernel.org, xuwei (O), Zengtao (B),
 guodong.xu@linaro.org, yangyicong, Liguozhu (Kenneth),
 linuxarm@openeuler.org, hpa@zytor.com
Subject: RE: [RFC PATCH v6 3/4] scheduler: scan idle cpu in cluster for tasks within one LLC
Date: Mon, 3 May 2021 11:35:18 +0000
Message-ID: <4d1f063504b1420c9f836d1f1a7f8e77@hisilicon.com>
References: <20210420001844.9116-1-song.bao.hua@hisilicon.com>
 <20210420001844.9116-4-song.bao.hua@hisilicon.com>
 <80f489f9-8c88-95d8-8241-f0cfd2c2ac66@arm.com>
 <8b5277d9-e367-566d-6bd1-44ac78d21d3f@arm.com>
 <185746c4d02a485ca8f3509439328b26@hisilicon.com>

> -----Original Message-----
> From: Song Bao Hua (Barry Song)
> Sent: Monday, May 3, 2021 6:12 PM
> To: 'Dietmar Eggemann'; Vincent Guittot
> Cc: [...]
> Subject: RE: [RFC PATCH v6 3/4] scheduler: scan idle cpu in cluster
> for tasks within one LLC
>
> > -----Original Message-----
> > From: Dietmar Eggemann [mailto:dietmar.eggemann@arm.com]
> > Sent: Friday, April 30, 2021 10:43 PM
> > To: Song Bao Hua (Barry Song); Vincent Guittot
> > Cc: [...]
> > Subject: Re: [RFC PATCH v6 3/4] scheduler: scan idle cpu in cluster
> > for tasks within one LLC
> >
> > On 29/04/2021 00:41, Song Bao Hua (Barry Song) wrote:
> > >
> > >> -----Original Message-----
> > >> From: Dietmar Eggemann [mailto:dietmar.eggemann@arm.com]
> >
> > [...]
> >
> > >>>>> From: Dietmar Eggemann [mailto:dietmar.eggemann@arm.com]
> > >>
> > >> [...]
> > >>
> > >>>>> On 20/04/2021 02:18, Barry Song wrote:
> >
> > [...]
> >
> > > Though we will never go to the slow path, wake_wide() will affect
> > > want_affine, so it eventually affects the "new_cpu"?
> >
> > yes.
> >
> > > for_each_domain(cpu, tmp) {
> > >         /*
> > >          * If both 'cpu' and 'prev_cpu' are part of this domain,
> > >          * cpu is a valid SD_WAKE_AFFINE target.
> > >          */
> > >         if (want_affine && (tmp->flags & SD_WAKE_AFFINE) &&
> > >             cpumask_test_cpu(prev_cpu, sched_domain_span(tmp))) {
> > >                 if (cpu != prev_cpu)
> > >                         new_cpu = wake_affine(tmp, p, cpu, prev_cpu, sync);
> > >
> > >                 sd = NULL; /* Prefer wake_affine over balance flags */
> > >                 break;
> > >         }
> > >
> > >         if (tmp->flags & sd_flag)
> > >                 sd = tmp;
> > >         else if (!want_affine)
> > >                 break;
> > > }
> > >
> > > If want_affine is false, the wake-affine branch above won't execute,
> > > so new_cpu (the target) will always be "prev_cpu"? So when task size >
> > > cluster size in wake_wide(), this means we won't pull the wakee to the
> > > cluster of the waker? It seems sensible.
> >
> > What is `task size` here?
> >
> > The criterion is `!(slave < factor || master < slave * factor)` or
> > `slave >= factor && master >= slave * factor` to wake wide.
>
> Yes. By "task size" I actually mean a bundle of waker-wakee tasks
> which can make "slave >= factor && master >= slave * factor" either
> true or false, and thereby change the target cpu we start scanning
> from.
> Now that I have moved to the cluster level when tasks are within one
> LLC, it seems it would be more sensible to use "cluster_size" as the
> factor?
>
> > I see that since you effectively change the sched domain size from LLC
> > to CLUSTER (e.g. 24->6) for wakeups with cpu and prev_cpu sharing LLC
> > (hence the `numactl -N 0` in your workload), wake_wide() has to take
> > CLUSTER size into consideration.
> >
> > I was wondering if you saw wake_wide() returning 1 with your use cases:
> >
> > numactl -N 0 /usr/lib/lmbench/bin/stream -P [6,12] -M 1024M -N 5
>
> I couldn't make wake_wide() return 1 with the above stream command,
> and I can't reproduce it with a 1:1 (monogamous) hackbench "-f 1".
>
> But I am able to reproduce the issue with an M:N hackbench, for example:
>
> numactl -N 0 hackbench -p -T -f 10 -l 20000 -g 1
>
> hackbench will create 10 senders which will send messages to 10
> receivers. (Each sender can send messages to all 10 receivers.)
>
> I've often seen flips like:
>
>   waker   wakee
>    1501      39
>    1509      17
>      11    1320
>      13    2016
>
> 11, 13 and 17 are smaller than the LLC size but larger than the cluster
> size. So wake_wide() using the cluster factor will return 1; on the
> other hand, if we always use llc_size as the factor, it will return 0.
>
> However, it seems the change in wake_wide() could have some negative
> influence on the M:N relationship (-f 10), according to tests made
> today with:
>
> numactl -N 0 hackbench -p -T -f 10 -l 20000 -g $1
>
> g            =      1      2      3      4
> cluster_size    0.5768 0.6578 0.8117 1.0119
> LLC_size        0.5479 0.6162 0.6922 0.7754
>
> Always using llc_size as the factor in wake_wide() still shows better
> results in the 10:10 polygamous hackbench.
>
> So it seems `slave >= factor && master >= slave * factor` isn't
> a suitable criterion for the cluster size?
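For reference, here is the heuristic in question: a minimal sketch of
mainline wake_wide() from kernel/sched/fair.c with the cluster-size
factor swapped in. `sd_cluster_size` is an illustrative per-CPU variable
analogous to the existing `sd_llc_size`; it is not necessarily what this
patch set actually adds.

/* assumed: DEFINE_PER_CPU(int, sd_cluster_size), like sd_llc_size */
static int wake_wide(struct task_struct *p)
{
	unsigned int master = current->wakee_flips;	/* waker flip count */
	unsigned int slave = p->wakee_flips;		/* wakee flip count */
	/* mainline reads the LLC span here: __this_cpu_read(sd_llc_size) */
	int factor = __this_cpu_read(sd_cluster_size);

	if (master < slave)
		swap(master, slave);
	/* wake wide only if both sides flip often enough for the domain */
	if (slave < factor || master < slave * factor)
		return 0;
	return 1;
}

With factor = 6 (cluster) rather than 24 (LLC), flip counts such as 11,
13 and 17 above satisfy `slave >= factor`, so the M:N hackbench takes
the wake-wide path; with factor = 24 they don't.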
On the other hand, according to "sched: Implement smarter wake-affine
logic",
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=62470419
a proper factor in wake_wide() is mainly beneficial for 1:N tasks like
postgresql/pgbench. So using the smaller cluster size as the factor
might help make wake_affine false and thus improve pgbench.

From the commit log, the commit made its biggest improvement when
clients = 2 * cpus. In my case, that should be clients=48 for a machine
whose LLC size is 24.

In Linux, I created a 240MB database and ran "pgbench -c 48 -S -T 20
pgbench" under two different scenarios:

1. page cache always hit, so no real I/O for database reads
2. echo 3 > /proc/sys/vm/drop_caches

For case 1, using cluster_size and using llc_size result in a similar
tps of ~108000, with all 24 cpus at 100% utilization.

For case 2, using llc_size still shows better performance.

tps for each test round (cluster size as factor in wake_wide):
1398.450887 1275.020401 1632.542437 1412.241627 1611.095692 1381.354294
1539.877146
avg tps = 1464

tps for each test round (llc size as factor in wake_wide):
1718.402983 1443.169823 1502.353823 1607.415861 1597.396924 1745.651814
1876.802168
avg tps = 1641 (+12%)

So it seems using cluster_size as the factor in "slave >= factor &&
master >= slave * factor" isn't a good choice for my machine, at least.

Thanks
Barry