From: Chen Yu <yu.c.chen@intel.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>,
<linux-kernel@vger.kernel.org>,
<linux-tip-commits@vger.kernel.org>, Tejun Heo <tj@kernel.org>,
<x86@kernel.org>, Gautham Shenoy <gautham.shenoy@amd.com>,
Tim Chen <tim.c.chen@intel.com>
Subject: Re: [tip: sched/core] sched/fair: Multi-LLC select_idle_sibling()
Date: Wed, 21 Jun 2023 15:16:15 +0800 [thread overview]
Message-ID: <ZJKjvx/NxooM5z1Y@chenyu5-mobl2.ccr.corp.intel.com> (raw)
In-Reply-To: <20230614151348.GM1639749@hirez.programming.kicks-ass.net>
On 2023-06-14 at 17:13:48 +0200, Peter Zijlstra wrote:
> On Wed, Jun 14, 2023 at 10:58:20PM +0800, Chen Yu wrote:
> > On 2023-06-14 at 10:17:57 +0200, Peter Zijlstra wrote:
> > > On Tue, Jun 13, 2023 at 04:00:39PM +0530, K Prateek Nayak wrote:
> > >
> > > > >> - SIS_NODE_TOPOEXT - tip:sched/core + this patch
> > > > >> + new sched domain (Multi-Multi-Core or MMC)
> > > > >> (https://lore.kernel.org/all/20230601153522.GB559993@hirez.programming.kicks-ass.net/)
> > > > >> MMC domain groups 2 nearby CCX.
> > > > >
> > > > > OK, so you managed to get the NPS4 topology in NPS1 mode?
> > > >
> > > > Yup! But it is a hack. I'll leave the patch at the end.
> > >
> > > Chen Yu, could we do the reverse? Instead of building a bigger LLC
> > > domain, can we split our LLC based on SNC (sub-numa-cluster) topologies?
> > >
> > Hi Peter,
> > Do you mean with SNC enabled, if the LLC domain gets smaller?
> > According to the test, the answer seems to be yes.
>
> No, I mean to build smaller LLC domains even with SNC disabled, as-if
> SNC were active.
>
>
Per lstopo, the topology on Sapphire Rapids has 4 memory controllers within
1 package, and with SNC disabled the LLC slices can have slightly different
distances to those 4 memory controllers. Unfortunately there is no interface
for the OS to query this partitioning. I used a hack to split the LLC into 4
smaller ones with SNC disabled, following the topology seen in SNC4 mode.
Then I tested on this platform with/without this LLC split, both with
SIS_NODE enabled and with this issue fixed[1], by skipping the target's own
group when iterating the groups in select_idle_node():
	if (cpumask_test_cpu(target, sched_group_span(sg)))
		continue;
SIS_NODE should have no impact on the non-LLC-split version on
Sapphire Rapids, so the baseline is vanilla+SIS_NODE.
In summary, netperf showed a huge improvement, but hackbench/schbench
regressed when the system is under load. I'll collect some schedstats to
check the scan depth in the problematic cases.
With SNC disabled and the hack llc-split patch applied, a new DIE domain
is generated and the LLC is divided into 4 sub-LLC groups:
grep . domain*/{name,flags}
domain0/name:SMT
domain1/name:MC
domain2/name:DIE
domain3/name:NUMA
domain0/flags:SD_BALANCE_NEWIDLE SD_BALANCE_EXEC SD_BALANCE_FORK SD_WAKE_AFFINE SD_SHARE_CPUCAPACITY SD_SHARE_PKG_RESOURCES SD_PREFER_SIBLING
domain1/flags:SD_BALANCE_NEWIDLE SD_BALANCE_EXEC SD_BALANCE_FORK SD_WAKE_AFFINE SD_SHARE_PKG_RESOURCES SD_PREFER_SIBLING
domain2/flags:SD_BALANCE_NEWIDLE SD_BALANCE_EXEC SD_BALANCE_FORK SD_WAKE_AFFINE SD_PREFER_SIBLING
domain3/flags:SD_BALANCE_NEWIDLE SD_BALANCE_EXEC SD_BALANCE_FORK SD_WAKE_AFFINE SD_SERIALIZE SD_OVERLAP SD_NUMA
cat /proc/schedstat | grep cpu0 -A 4
cpu0 0 0 0 0 0 0 15968391465 3630455022 18084
domain0 00000000,00000000,00000000,00010000,00000000,00000000,00000001 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain1 00000000,00000000,00000000,3fff0000,00000000,00000000,00003fff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain2 00000000,000000ff,ffffffff,ffff0000,00000000,00ffffff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain3 ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
hackbench
=========
case load baseline(std%) compare%( std%)
process-pipe 1-groups 1.00 ( 3.81) -100.18 ( 0.19)
process-pipe 2-groups 1.00 ( 10.74) -59.21 ( 0.91)
process-pipe 4-groups 1.00 ( 5.37) -56.37 ( 0.56)
process-pipe 8-groups 1.00 ( 0.36) +17.11 ( 0.82)
process-sockets 1-groups 1.00 ( 0.09) -26.53 ( 1.45)
process-sockets 2-groups 1.00 ( 0.82) -26.45 ( 0.40)
process-sockets 4-groups 1.00 ( 0.21) -4.09 ( 0.19)
process-sockets 8-groups 1.00 ( 0.13) -5.31 ( 0.36)
threads-pipe 1-groups 1.00 ( 2.14) -62.87 ( 1.11)
threads-pipe 2-groups 1.00 ( 3.18) -55.82 ( 1.14)
threads-pipe 4-groups 1.00 ( 4.68) -54.92 ( 0.34)
threads-pipe 8-groups 1.00 ( 5.08) +15.81 ( 3.08)
threads-sockets 1-groups 1.00 ( 2.60) -18.28 ( 6.03)
threads-sockets 2-groups 1.00 ( 0.83) -30.17 ( 0.60)
threads-sockets 4-groups 1.00 ( 0.16) -4.15 ( 0.27)
threads-sockets 8-groups 1.00 ( 0.36) -5.92 ( 0.94)
The 1-group, 2-group and 4-group cases suffered.
netperf
=======
case load baseline(std%) compare%( std%)
TCP_RR 56-threads 1.00 ( 2.75) +10.49 ( 10.88)
TCP_RR 112-threads 1.00 ( 2.39) -1.88 ( 2.82)
TCP_RR 168-threads 1.00 ( 2.05) +8.31 ( 9.73)
TCP_RR 224-threads 1.00 ( 2.32) +788.25 ( 1.94)
TCP_RR 280-threads 1.00 ( 59.77) +83.07 ( 12.38)
TCP_RR 336-threads 1.00 ( 21.61) -0.22 ( 28.72)
TCP_RR 392-threads 1.00 ( 31.26) -0.13 ( 36.11)
TCP_RR 448-threads 1.00 ( 39.93) -0.14 ( 45.71)
UDP_RR 56-threads 1.00 ( 5.57) +2.38 ( 7.41)
UDP_RR 112-threads 1.00 ( 24.53) +1.51 ( 8.43)
UDP_RR 168-threads 1.00 ( 11.83) +7.34 ( 20.20)
UDP_RR 224-threads 1.00 ( 10.55) +163.81 ( 20.64)
UDP_RR 280-threads 1.00 ( 11.32) +176.04 ( 21.83)
UDP_RR 336-threads 1.00 ( 31.79) +12.87 ( 37.23)
UDP_RR 392-threads 1.00 ( 34.06) +15.64 ( 44.62)
UDP_RR 448-threads 1.00 ( 59.09) +14.00 ( 52.93)
The 224-thread and 280-thread cases show good improvement.
tbench
======
case load baseline(std%) compare%( std%)
loopback 56-threads 1.00 ( 0.83) +1.38 ( 1.56)
loopback 112-threads 1.00 ( 0.19) -4.25 ( 0.90)
loopback 168-threads 1.00 ( 56.43) -31.12 ( 0.37)
loopback 224-threads 1.00 ( 0.28) -2.50 ( 0.44)
loopback 280-threads 1.00 ( 0.10) -1.64 ( 0.81)
loopback 336-threads 1.00 ( 0.19) -2.10 ( 0.10)
loopback 392-threads 1.00 ( 0.13) -2.15 ( 0.39)
loopback 448-threads 1.00 ( 0.45) -2.14 ( 0.43)
Likely no impact on tbench (the 168-thread result is unstable and can
be ignored).
schbench
========
case load baseline(std%) compare%( std%)
normal 1-mthreads 1.00 ( 0.42) -0.59 ( 0.72)
normal 2-mthreads 1.00 ( 2.72) +1.76 ( 0.42)
normal 4-mthreads 1.00 ( 0.75) -1.22 ( 1.86)
normal 8-mthreads 1.00 ( 6.44) -14.56 ( 5.64)
The 8-mthreads case is not good for schbench.
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 352f0ce1ece4..ffc44639447e 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -511,6 +511,30 @@ static const struct x86_cpu_id intel_cod_cpu[] = {
{}
};
+static unsigned int sub_llc_nr;
+
+static int __init parse_sub_llc(char *str)
+{
+ get_option(&str, &sub_llc_nr);
+
+ return 0;
+}
+early_param("sub_llc_nr", parse_sub_llc);
+
+static bool
+topology_same_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
+{
+ int idx1, idx2;
+
+ if (!sub_llc_nr)
+ return true;
+
+ idx1 = c->apicid / sub_llc_nr;
+ idx2 = o->apicid / sub_llc_nr;
+
+ return idx1 == idx2;
+}
+
static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
{
const struct x86_cpu_id *id = x86_match_cpu(intel_cod_cpu);
@@ -530,7 +554,7 @@ static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
* means 'c' does not share the LLC of 'o'. This will be
* reflected to userspace.
*/
- if (match_pkg(c, o) && !topology_same_node(c, o) && intel_snc)
+ if (match_pkg(c, o) && (!topology_same_node(c, o) || !topology_same_llc(c, o)) && intel_snc)
return false;
return topology_sane(c, o, "llc");
--
2.25.1
[1] https://lore.kernel.org/lkml/5903fc0a-787e-9471-0256-77ff66f0bdef@bytedance.com/