From: "Song Bao Hua (Barry Song)" <song.bao.hua@hisilicon.com>
To: Dietmar Eggemann <dietmar.eggemann@arm.com>, Vincent Guittot
 <vincent.guittot@linaro.org>
CC: "tim.c.chen@linux.intel.com" <tim.c.chen@linux.intel.com>,
 "catalin.marinas@arm.com" <catalin.marinas@arm.com>, "will@kernel.org"
 <will@kernel.org>, "rjw@rjwysocki.net" <rjw@rjwysocki.net>, "bp@alien8.de"
 <bp@alien8.de>, "tglx@linutronix.de" <tglx@linutronix.de>, "mingo@redhat.com"
 <mingo@redhat.com>, "lenb@kernel.org" <lenb@kernel.org>,
 "peterz@infradead.org" <peterz@infradead.org>, "rostedt@goodmis.org"
 <rostedt@goodmis.org>, "bsegall@google.com" <bsegall@google.com>,
 "mgorman@suse.de" <mgorman@suse.de>, "msys.mizuma@gmail.com"
 <msys.mizuma@gmail.com>, "valentin.schneider@arm.com"
 <valentin.schneider@arm.com>, "gregkh@linuxfoundation.org"
 <gregkh@linuxfoundation.org>, Jonathan Cameron <jonathan.cameron@huawei.com>, 
 "juri.lelli@redhat.com" <juri.lelli@redhat.com>, "mark.rutland@arm.com"
 <mark.rutland@arm.com>, "sudeep.holla@arm.com" <sudeep.holla@arm.com>,
 "aubrey.li@linux.intel.com" <aubrey.li@linux.intel.com>,
 "linux-arm-kernel@lists.infradead.org"
 <linux-arm-kernel@lists.infradead.org>, "linux-kernel@vger.kernel.org"
 <linux-kernel@vger.kernel.org>, "linux-acpi@vger.kernel.org"
 <linux-acpi@vger.kernel.org>, "x86@kernel.org" <x86@kernel.org>, "xuwei (O)"
 <xuwei5@huawei.com>, "Zengtao (B)" <prime.zeng@hisilicon.com>,
 "guodong.xu@linaro.org" <guodong.xu@linaro.org>, yangyicong
 <yangyicong@huawei.com>, "Liguozhu (Kenneth)" <liguozhu@hisilicon.com>,
 "linuxarm@openeuler.org" <linuxarm@openeuler.org>, "hpa@zytor.com"
 <hpa@zytor.com>
Subject: RE: [RFC PATCH v6 3/4] scheduler: scan idle cpu in cluster for tasks
 within one LLC
Date: Wed, 26 May 2021 09:54:28 +0000
Message-ID: <bbc339cef87e4009b6d56ee37e202daf@hisilicon.com>
References: <20210420001844.9116-1-song.bao.hua@hisilicon.com>
 <20210420001844.9116-4-song.bao.hua@hisilicon.com>
 <80f489f9-8c88-95d8-8241-f0cfd2c2ac66@arm.com>
 <b42c762a287b4360bfa3179a5c7c3e8c@hisilicon.com>
 <CAKfTPtC51eO2mAuW6mHQ-SdznAtfDL3D4UOs4HmnXaPOOCN_cA@mail.gmail.com>
 <8b5277d9-e367-566d-6bd1-44ac78d21d3f@arm.com>
 <185746c4d02a485ca8f3509439328b26@hisilicon.com>
 <d31a65af-d1d5-5fd1-276c-d2318cdba078@arm.com>
 <4d1f063504b1420c9f836d1f1a7f8e77@hisilicon.com>
 <142c7192-cde8-6dbe-bb9d-f0fce21ec959@arm.com>
 <aee3fd353a3a4bfca65aa1b78386f9b5@hisilicon.com>
 <45cce983-79ca-392a-f590-9168da7aefab@arm.com> 


> -----Original Message-----
> From: Song Bao Hua (Barry Song)
> Sent: Tuesday, May 25, 2021 8:07 PM
> To: 'Dietmar Eggemann' <dietmar.eggemann@arm.com>; Vincent Guittot
> <vincent.guittot@linaro.org>
> Subject: RE: [RFC PATCH v6 3/4] scheduler: scan idle cpu in cluster for tasks
> within one LLC
> 
> 
> 
> > -----Original Message-----
> > From: Dietmar Eggemann [mailto:dietmar.eggemann@arm.com]
> > Sent: Friday, May 14, 2021 12:32 AM
> > To: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com>; Vincent Guittot
> > <vincent.guittot@linaro.org>
> > Subject: Re: [RFC PATCH v6 3/4] scheduler: scan idle cpu in cluster for tasks
> > within one LLC
> >
> > On 07/05/2021 15:07, Song Bao Hua (Barry Song) wrote:
> > >
> > >
> > >> -----Original Message-----
> > >> From: Dietmar Eggemann [mailto:dietmar.eggemann@arm.com]
> >
> > [...]
> >
> > >> On 03/05/2021 13:35, Song Bao Hua (Barry Song) wrote:
> > >>
> > >> [...]
> > >>
> > >>>> From: Song Bao Hua (Barry Song)
> > >>
> > >> [...]
> > >>
> > >>>>> From: Dietmar Eggemann [mailto:dietmar.eggemann@arm.com]
> > >>
> > >> [...]
> > >>
> > >>>>> On 29/04/2021 00:41, Song Bao Hua (Barry Song) wrote:
> > >>>>>>
> > >>>>>>
> > >>>>>>> -----Original Message-----
> > >>>>>>> From: Dietmar Eggemann [mailto:dietmar.eggemann@arm.com]
> > >>>>>
> > >>>>> [...]
> > >>>>>
> > >>>>>>>>>> From: Dietmar Eggemann [mailto:dietmar.eggemann@arm.com]
> > >>>>>>>
> > >>>>>>> [...]
> > >>>>>>>
> > >>>>>>>>>> On 20/04/2021 02:18, Barry Song wrote:
> > >>
> > >> [...]
> > >>
> > >>>
> > >>> On the other hand, according to "sched: Implement smarter wake-affine logic"
> > >>>
> > >>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=62470419
> > >>>
> > >>> A proper factor in wake_wide() mainly benefits 1:n tasks like postgresql/pgbench.
> > >>> So using the smaller cluster size as the factor might help make wake_affine
> > >>> false and so improve pgbench.
> > >>>
> > >>> According to the commit log, the commit made the biggest improvement when
> > >>> clients = 2*cpus. In my case, that would be clients=48 for a machine whose
> > >>> LLC size is 24.
> > >>>
> > >>> On Linux, I created a 240MB database and ran "pgbench -c 48 -S -T 20 pgbench"
> > >>> under two different scenarios:
> > >>> 1. page cache always hits, so there is no real I/O for database reads
> > >>> 2. page cache dropped first with "echo 3 > /proc/sys/vm/drop_caches"
> > >>>
> > >>> For case 1, using cluster_size and using llc_size result in similar
> > >>> tps (~108000), with all 24 CPUs at 100% utilization.
> > >>>
> > >>> For case 2, using llc_size still shows better performance.
> > >>>
> > >>> tps for each test round (cluster size as factor in wake_wide):
> > >>> 1398.450887 1275.020401 1632.542437 1412.241627 1611.095692 1381.354294 1539.877146
> > >>> avg tps = 1464
> > >>>
> > >>> tps for each test round (llc size as factor in wake_wide):
> > >>> 1718.402983 1443.169823 1502.353823 1607.415861 1597.396924 1745.651814 1876.802168
> > >>> avg tps = 1641  (+12%)
> > >>>
> > >>> So it seems that using cluster_size as the factor in
> > >>> "slave >= factor && master >= slave * factor" isn't a good choice, at
> > >>> least for my machine.
> > >>
> > >> So SD size = 4 (instead of 24) seems to be too small for `-c 48`.
> > >>
> > >> Just curious, have you seen any benefit from using wake_wide() with SD
> > >> size = 24 (LLC) compared to not using it at all?
> > >
> > > At least in the benchmark I ran today, I have not seen any benefit from
> > > using llc_size. Always returning 0 in wake_wide() seems to be much better.
> > >
> > > postgres@ubuntu:$ pgbench -i pgbench
> > > postgres@pgbench:$ pgbench -T 120 -c 48 pgbench
> > >
> > > Using llc_size, it reached 123 tps.
> > > Always returning 0 in wake_wide(), it reached 158 tps.
> > >
> > > Actually, I couldn't reproduce the performance improvement that
> > > the commit "sched: Implement smarter wake-affine logic" mentioned.
> > > On the other hand, the commit log didn't give the pgbench command
> > > parameters used. I guess the benchmark result depends heavily on
> > > the command parameters and disk I/O speed.
> >
> > I see. And it was a much smaller machine (12 CPUs) back then.
> >
> > You could run pgbench via mmtests https://github.com/gormanm/mmtests.
> >
> > I.e. the `timed-ro-medium` test.
> >
> > mmtests# ./run-mmtests.sh --config ./configs/config-db-pgbench-timed-ro-medium test_tag
> >
> > /shellpacks/shellpack-bench-pgbench contains all the individual test
> > steps. You could also use it as a template for your standalone pgbench
> > tests.
> >
> > I ran this test on an Intel Xeon E5-2690 v2 with 40 CPUs and 64GB of
> > memory on v5.12 vanilla and w/o wakewide.
> > The test uses `scale_factor = 2570` on this machine. I guess this
> > relates to ~41GB? At least this was the size of the:
> 
> Thanks, Dietmar, and sorry for the slow response. I was on sick leave for
> the whole of last week.
> 
> I feel it makes much more sense to use mmtests, which sets scale_factor
> according to the total memory size and thus takes the impact of the page
> cache into account. It also warms up the database for 30 minutes.
> 
> I will get more data and compare three cases:
> 1. use cluster as wake_wide factor
> 2. use llc as wake_wide factor
> 3. always return 0 from wake_wide().
> 
> and post the results afterwards.

I used only one NUMA node with 24 CPUs and 60GB of memory
(scale factor: 2392) to run the test. As mentioned before,
the CPUs in each NUMA node share one LLC, so the waker and
wakee are in the same LLC domain.

Basically, for 1/48, 8/48, 12/48, 24/48 and 32/48 threads, the
difference between using the cluster size and using the LLC size
as the factor in wake_wide() is just noise. But for 48/48 threads
(system busy), using the LLC size as the factor shows a 4%+
pgbench improvement.

      #clients  cluster_as_factor      llc_as_factor
Hmean     1     10779.67 (   0.00%)    10869.27 *   0.83%*
Hmean     8     19595.09 (   0.00%)    19580.59 *  -0.07%*
Hmean     12    29553.06 (   0.00%)    29643.56 *   0.31%*
Hmean     24    43368.55 (   0.00%)    43194.47 *  -0.40%*
Hmean     32    40258.08 (   0.00%)    40163.23 *  -0.24%*
Hmean     48    40450.42 (   0.00%)    42249.29 *   4.45%*

I see a further 14%+ improvement in the 48/48 threads case if I
don't depend on wake_wide() at all, i.e. as if wake_wide() always
returned 0.

      #clients  llc_as_factor          don't_use_wake_wide
Hmean     1     10869.27 (   0.00%)    10723.08 *  -1.34%*
Hmean     8     19580.59 (   0.00%)    19469.34 *  -0.57%*
Hmean     12    29643.56 (   0.00%)    29520.16 *  -0.42%*
Hmean     24    43194.47 (   0.00%)    43774.78 *   1.34%*
Hmean     32    40163.23 (   0.00%)    40742.93 *   1.44%*
Hmean     48    42249.29 (   0.00%)    48329.00 *  14.39%*

I am beginning to believe that wake_wide() is useless when the waker
and wakee are already in the same LLC, so I have sent another patch to
address this generic issue:
[PATCH] sched: fair: don't depend on wake_wide if waker and wakee are already in same LLC
https://lore.kernel.org/lkml/20210526091057.1800-1-song.bao.hua@hisilicon.com/

> 
> >
> > #mmtests/work/testdisk/data/pgdata directory when the test started.
> >
> >
> > mmtests/work/log# ../../compare-kernels.sh --baseline base --compare
> > wo_wakewide | grep ^Hmean
> >
> >
> >       #clients  v5.12 vanilla          v5.12 w/o wakewide
> >
> > Hmean     1     10903.88 (   0.00%)    10792.59 *  -1.02%*
> > Hmean     6     28480.60 (   0.00%)    27954.97 *  -1.85%*
> > Hmean     12    49197.55 (   0.00%)    47758.16 *  -2.93%*
> > Hmean     22    72902.37 (   0.00%)    71314.01 *  -2.18%*
> > Hmean     30    75468.16 (   0.00%)    75929.17 *   0.61%*
> > Hmean     48    60155.58 (   0.00%)    60471.91 *   0.53%*
> > Hmean     80    62202.38 (   0.00%)    60814.76 *  -2.23%*
> >
> >
> > So there are some improvements w/ wakewide but nothing of the scale
> > shown in the original wakewide patch.
> >
> > I'm not an expert on how to set up these pgbench tests though. So maybe
> > other pgbench related mmtests configs or some more fine-grained tuning
> > can produce bigger diffs?
> 
> Thanks
> Barry

Thanks
Barry

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel