linux-kernel.vger.kernel.org archive mirror
From: "Song Bao Hua (Barry Song)" <song.bao.hua@hisilicon.com>
To: Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Valentin Schneider <valentin.schneider@arm.com>,
	Meelis Roos <mroos@linux.ee>, LKML <linux-kernel@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Mel Gorman <mgorman@suse.de>
Subject: RE: 5.11-rc4+git: Shortest NUMA path spans too many nodes
Date: Thu, 21 Jan 2021 21:17:10 +0000
Message-ID: <353d255769b6463c862993e2329a9a8d@hisilicon.com>
In-Reply-To: <f0818204-66d1-bf01-062e-0aeec9ce806d@arm.com>



> -----Original Message-----
> From: Dietmar Eggemann [mailto:dietmar.eggemann@arm.com]
> Sent: Friday, January 22, 2021 7:54 AM
> To: Valentin Schneider <valentin.schneider@arm.com>; Meelis Roos
> <mroos@linux.ee>; LKML <linux-kernel@vger.kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>; Vincent Guittot
> <vincent.guittot@linaro.org>; Song Bao Hua (Barry Song)
> <song.bao.hua@hisilicon.com>; Mel Gorman <mgorman@suse.de>
> Subject: Re: 5.11-rc4+git: Shortest NUMA path spans too many nodes
> 
> On 21/01/2021 19:21, Valentin Schneider wrote:
> > On 21/01/21 19:39, Meelis Roos wrote:
> >>> Could you paste the output of the below?
> >>>
> >>>    $ cat /sys/devices/system/node/node*/distance
> >>
> >> 10 12 12 14 14 14 14 16
> >> 12 10 14 12 14 14 12 14
> >> 12 14 10 14 12 12 14 14
> >> 14 12 14 10 12 12 14 14
> >> 14 14 12 12 10 14 12 14
> >> 14 14 12 12 14 10 14 12
> >> 14 12 14 14 12 14 10 12
> >> 16 14 14 14 14 12 12 10
> >>
> >
> > Thanks!
> >
> >>
> >>> Additionally, booting your system with CONFIG_SCHED_DEBUG=y and
> >>> appending 'sched_debug' to your cmdline should yield some extra data.
> >>
> >> [    0.000000] Linux version 5.11.0-rc4-00015-g45dfb8a5659a (mroos@x4600m2)
> (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.1)
> #55 SMP Thu Jan 21 19:23:10 EET 2021
> >> [    0.000000] Command line:
> BOOT_IMAGE=/boot/vmlinuz-5.11.0-rc4-00015-g45dfb8a5659a root=/dev/sda1 ro
> quiet
> >
> > This is missing 'sched_debug' to get the extra topology debug prints (yes
> > it needs an extra cmdline argument on top of having CONFIG_SCHED_DEBUG=y),
> > but I should be able to generate those locally by feeding QEMU the above
> > distance table.
> 
> Can be recreated with (simplified with only 1 CPU per node):
> 
> $ qemu-system-aarch64 -kernel /opt/git/kernel_org/arch/arm64/boot/Image -hda
> /opt/git/tools/qemu-imgs-manipulator/images/qemu-image-aarch64.img -append
> 'root=/dev/vda console=ttyAMA0 loglevel=8 sched_debug' -nographic -machine
> virt,gic-version=max -smp cores=8 -m 512 -cpu cortex-a57 -numa
> node,cpus=0,nodeid=0 -numa node,cpus=1,nodeid=1, -numa node,cpus=2,nodeid=2,
> -numa node,cpus=3,nodeid=3, -numa node,cpus=4,nodeid=4, -numa
> node,cpus=5,nodeid=5, -numa node,cpus=6,nodeid=6, -numa node,cpus=7,nodeid=7,
> -numa dist,src=0,dst=1,val=12, -numa dist,src=0,dst=2,val=12, -numa
> dist,src=0,dst=3,val=14, -numa dist,src=0,dst=4,val=14, -numa
> dist,src=0,dst=5,val=14, -numa dist,src=0,dst=6,val=14, -numa
> dist,src=0,dst=7,val=16, -numa dist,src=1,dst=2,val=14, -numa
> dist,src=1,dst=3,val=12, -numa dist,src=1,dst=4,val=14, -numa
> dist,src=1,dst=5,val=14, -numa dist,src=1,dst=6,val=12, -numa
> dist,src=1,dst=7,val=14, -numa dist,src=2,dst=3,val=14, -numa
> dist,src=2,dst=4,val=12, -numa dist,src=2,dst=5,val=12, -numa
> dist,src=2,dst=6,val=14, -numa dist,src=2,dst=7,val=14, -numa
> dist,src=3,dst=4,val=12, -numa dist,src=3,dst=5,val=12, -numa
> dist,src=3,dst=6,val=14, -numa dist,src=3,dst=7,val=14, -numa
> dist,src=4,dst=5,val=14, -numa dist,src=4,dst=6,val=12, -numa
> dist,src=4,dst=7,val=14, -numa dist,src=5,dst=6,val=14, -numa
> dist,src=5,dst=7,val=12, -numa dist,src=6,dst=7,val=12
> 
> [    0.206628] ------------[ cut here ]------------
> [    0.206698] Shortest NUMA path spans too many nodes
> [    0.207119] WARNING: CPU: 0 PID: 1 at kernel/sched/topology.c:753
> cpu_attach_domain+0x42c/0x87c
> [    0.207176] Modules linked in:
> [    0.207373] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
> 5.11.0-rc2-00010-g65bcf072e20e-dirty #81
> [    0.207458] Hardware name: linux,dummy-virt (DT)
> [    0.207584] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
> [    0.207618] pc : cpu_attach_domain+0x42c/0x87c
> [    0.207646] lr : cpu_attach_domain+0x42c/0x87c
> [    0.207665] sp : ffff800011fcbbf0
> [    0.207679] x29: ffff800011fcbbf0 x28: ffff0000024d8200
> [    0.207735] x27: 0000000000001fef x26: 0000000000001917
> [    0.207755] x25: ffff0000024d8000 x24: 0000000000001917
> [    0.207772] x23: 0000000000000000 x22: ffff800011b69a40
> [    0.207789] x21: ffff0000024d8320 x20: ffff8000116fda80
> [    0.207806] x19: ffff0000024d8000 x18: 0000000000000000
> [    0.207822] x17: 0000000000000000 x16: 00000000bd30d762
> [    0.207838] x15: 0000000000000030 x14: ffffffffffffffff
> [    0.207855] x13: ffff800011b82e08 x12: 00000000000001b9
> [    0.207871] x11: 0000000000000093 x10: ffff800011bdae08
> [    0.207887] x9 : 00000000fffff000 x8 : ffff800011b82e08
> [    0.207922] x7 : ffff800011bdae08 x6 : 0000000000000000
> [    0.207939] x5 : 0000000000000000 x4 : 0000000000000000
> [    0.207955] x3 : 00000000ffffffff x2 : 0000000000000000
> [    0.207972] x1 : 0000000000000000 x0 : ffff000018020000
> [    0.208125] Call trace:
> [    0.208230]  cpu_attach_domain+0x42c/0x87c
> [    0.208256]  build_sched_domains+0x1238/0x12f4
> [    0.208271]  sched_init_domains+0x80/0xb0
> [    0.208283]  sched_init_smp+0x30/0x80
> [    0.208299]  kernel_init_freeable+0xf4/0x238
> [    0.208313]  kernel_init+0x14/0x118
> [    0.208328]  ret_from_fork+0x10/0x34
> [    0.208507] ---[ end trace 75cafa7c7d1a3d7e ]---
> [    0.208706] CPU0 attaching sched-domain(s):
> [    0.208756]  domain-0: span=0-2 level=NUMA
> [    0.209001]   groups: 0:{ span=0 cap=1017 }, 1:{ span=1 cap=1016 }, 2:{ span=2
> cap=1015 }
> [    0.209247]   domain-1: span=0-6 level=NUMA
> [    0.209280]    groups: 0:{ span=0-2 mask=0 cap=3048 }, 3:{ span=1,3-5 mask=3
> cap=4073 }, 6:{ span=1,4,6-7 mask=6 cap=4084 }
> [    0.209693] ERROR: groups don't span domain->span
> [    0.209703]    domain-2: span=0-7 level=NUMA
> [    0.209722]     groups: 0:{ span=0-6 mask=0 cap=7114 }, 7:{ span=1-7 mask=7
> cap=7163 }
> [    0.210361] CPU1 attaching sched-domain(s):
> [    0.210376]  domain-0: span=0-1,3,6 level=NUMA
> [    0.210411]   groups: 1:{ span=1 cap=1016 }, 3:{ span=3 cap=1018 }, 6:{ span=6
> cap=1017 }, 0:{ span=0 cap=1017 }
> [    0.210493]   domain-1: span=0-7 level=NUMA
> [    0.210511]    groups: 1:{ span=0-1,3,6 mask=1 cap=4075 }, 2:{ span=0,2,4-5
> mask=2 cap=4070 }, 7:{ span=5-7 mask=7 cap=3067 }
> [    0.210641] CPU2 attaching sched-domain(s):
> [    0.210653]  domain-0: span=0,2,4-5 level=NUMA
> [    0.210672]   groups: 2:{ span=2 cap=1015 }, 4:{ span=4 cap=1016 }, 5:{ span=5
> cap=1015 }, 0:{ span=0 cap=1017 }
> [    0.210752]   domain-1: span=0-7 level=NUMA
> [    0.210769]    groups: 2:{ span=0,2,4-5 mask=2 cap=4070 }, 3:{ span=1,3-5
> mask=3 cap=4073 }, 6:{ span=1,4,6-7 mask=6 cap=4084 }
> [    0.210860] CPU3 attaching sched-domain(s):
> [    0.210870]  domain-0: span=1,3-5 level=NUMA
> [    0.210887]   groups: 3:{ span=3 cap=1018 }, 4:{ span=4 cap=1016 }, 5:{ span=5
> cap=1015 }, 1:{ span=1 cap=1016 }
> [    0.210965]   domain-1: span=0-7 level=NUMA
> [    0.210981]    groups: 3:{ span=1,3-5 mask=3 cap=4073 }, 6:{ span=1,4,6-7
> mask=6 cap=4084 }, 0:{ span=0-2 mask=0 cap=3048 }
> [    0.211109] CPU4 attaching sched-domain(s):
> [    0.211134]  domain-0: span=2-4,6 level=NUMA
> [    0.211151]   groups: 4:{ span=4 cap=1016 }, 6:{ span=6 cap=1017 }, 2:{ span=2
> cap=1015 }, 3:{ span=3 cap=1018 }
> [    0.211229]   domain-1: span=0-7 level=NUMA
> [    0.211245]    groups: 4:{ span=2-4,6 mask=4 cap=4081 }, 5:{ span=2-3,5,7
> mask=5 cap=4082 }, 0:{ span=0-2 mask=0 cap=3048 }
> [    0.211383] CPU5 attaching sched-domain(s):
> [    0.211393]  domain-0: span=2-3,5,7 level=NUMA
> [    0.211425]   groups: 5:{ span=5 cap=1015 }, 7:{ span=7 cap=1019 }, 2:{ span=2
> cap=1015 }, 3:{ span=3 cap=1018 }
> [    0.211506]   domain-1: span=0-7 level=NUMA
> [    0.211524]    groups: 5:{ span=2-3,5,7 mask=5 cap=4082 }, 6:{ span=1,4,6-7
> mask=6 cap=4084 }, 0:{ span=0-2 mask=0 cap=3048 }
> [    0.211618] CPU6 attaching sched-domain(s):
> [    0.211628]  domain-0: span=1,4,6-7 level=NUMA
> [    0.211645]   groups: 6:{ span=6 cap=1017 }, 7:{ span=7 cap=1019 }, 1:{ span=1
> cap=1016 }, 4:{ span=4 cap=1016 }
> [    0.211728]   domain-1: span=0-7 level=NUMA
> [    0.211745]    groups: 6:{ span=1,4,6-7 mask=6 cap=4084 }, 0:{ span=0-2 mask=0
> cap=3048 }, 3:{ span=1,3-5 mask=3 cap=4073 }
> [    0.211855] CPU7 attaching sched-domain(s):
> [    0.211866]  domain-0: span=5-7 level=NUMA
> [    0.211884]   groups: 7:{ span=7 cap=1019 }, 5:{ span=5 cap=1015 }, 6:{ span=6
> cap=1017 }
> [    0.211949]   domain-1: span=1-7 level=NUMA
> [    0.211966]    groups: 7:{ span=5-7 mask=7 cap=3067 }, 1:{ span=0-1,3,6 mask=1
> cap=4075 }, 2:{ span=0,2,4-5 mask=2 cap=4070 }
> [    0.212047] ERROR: groups don't span domain->span
> [    0.212055]    domain-2: span=0-7 level=NUMA
> [    0.212072]     groups: 7:{ span=1-7 mask=7 cap=7163 }, 0:{ span=0-6 mask=0
> cap=7114 }
> 
> # cat /sys/devices/system/node/node*/distance
> 10 12 12 14 14 14 14 16
> 12 10 14 12 14 14 12 14
> 12 14 10 14 12 12 14 14
> 14 12 14 10 12 12 14 14
> 14 14 12 12 10 14 12 14
> 14 14 12 12 14 10 14 12
> 14 12 14 14 12 14 10 12
> 16 14 14 14 14 12 12 10
> 
> The '16' seems to be the culprit. How does such a topo look like?

The issue can be reproduced with any topology in which the nodes are
chained in a line, like this:


         +------+         +------+        +-------+       +------+
         | node |         |node  |        | node  |       |node  |
         |      +---------+      +--------+       +-------+      |
         +------+         +------+        +-------+       +------+

For example, with the numa_distance table below, every CPU reports
"groups don't span domain->span":
node   0   1   2   3
  0:  10  12  20  22
  1:  12  10  22  24
  2:  20  22  10  12
  3:  22  24  12  10
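
To make it easier to see why this table is a problem, here is a rough
Python sketch. It is a simplified model, not the kernel's actual group
construction: it assumes that a node's span at a given level is the set of
nodes within that distance, and that each group at a level is built from a
sibling node's span at the level below. The function names are made up for
this illustration.

```python
# Simplified model of NUMA spans per distance level (not the kernel code).
DIST = [
    [10, 12, 20, 22],
    [12, 10, 22, 24],
    [20, 22, 10, 12],
    [22, 24, 12, 10],
]


def spans_per_level(dist):
    """Return (levels, spans); spans[i][n] is the set of nodes within levels[i] of node n."""
    levels = sorted({d for row in dist for d in row})
    spans = [
        [{m for m, d in enumerate(row) if d <= level} for row in dist]
        for level in levels
    ]
    return levels, spans


def find_escaping_groups(dist):
    """List (level, node, sibling) where the sibling's lower-level span
    contains nodes that are outside the node's span at this level."""
    levels, spans = spans_per_level(dist)
    problems = []
    for i in range(1, len(levels)):
        for node, span in enumerate(spans[i]):
            for sibling in sorted(span):
                child = spans[i - 1][sibling]
                if not child <= span:  # child span leaks outside the domain
                    problems.append((levels[i], node, sibling))
    return problems


if __name__ == "__main__":
    for level, node, sibling in find_escaping_groups(DIST):
        print(f"distance <= {level}: node {node}: group built from node "
              f"{sibling} reaches outside the span")
```

Under these assumptions, every one of the four nodes has one level at which
a group reaches outside its domain span.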

QEMU:
qemu-system-aarch64 -M virt -nographic \
 -smp cpus=8 \
 -numa node,cpus=0-1,nodeid=0 \
 -numa node,cpus=2-3,nodeid=1 \
 -numa node,cpus=4-5,nodeid=2 \
 -numa node,cpus=6-7,nodeid=3 \
 -numa dist,src=0,dst=1,val=12 \
 -numa dist,src=0,dst=2,val=20 \
 -numa dist,src=0,dst=3,val=22 \
 -numa dist,src=1,dst=2,val=22 \
 -numa dist,src=2,dst=3,val=12 \
 -numa dist,src=1,dst=3,val=24
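
Writing the "-numa dist" options by hand gets tedious for bigger tables.
A small helper along these lines could generate them from a distance
matrix. This is only a sketch: numa_dist_args is a hypothetical name, and
it emits one direction per node pair, the same form used in the command
above, so it assumes a symmetric table.

```python
def numa_dist_args(dist):
    """Build '-numa dist,...' options for the upper triangle of a
    symmetric distance matrix (one direction per node pair)."""
    args = []
    n = len(dist)
    for src in range(n):
        for dst in range(src + 1, n):
            if dist[src][dst] != dist[dst][src]:
                raise ValueError(f"asymmetric distance between {src} and {dst}")
            args.append(f"-numa dist,src={src},dst={dst},val={dist[src][dst]}")
    return args


if __name__ == "__main__":
    table = [
        [10, 12, 20, 22],
        [12, 10, 22, 24],
        [20, 22, 10, 12],
        [22, 24, 12, 10],
    ]
    # Print in a form that can be pasted into a multi-line shell command.
    print(" \\\n ".join(numa_dist_args(table)))
```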

Boot log:
[    0.834496] CPU0 attaching sched-domain(s):
[    0.834546]  domain-0: span=0-1 level=MC
[    0.834754]   groups: 0:{ span=0 cap=1011 }, 1:{ span=1 cap=970 }
[    0.835018]   domain-1: span=0-3 level=NUMA
[    0.835052]    groups: 0:{ span=0-1 cap=1981 }, 2:{ span=2-3 cap=1997 }
[    0.835128]    domain-2: span=0-5 level=NUMA
[    0.835144]     groups: 0:{ span=0-3 cap=3978 }, 4:{ span=4-7 cap=3864 }
[    0.835195] ERROR: groups don't span domain->span
[    0.835206]     domain-3: span=0-7 level=NUMA
[    0.835222]      groups: 0:{ span=0-5 mask=0-1 cap=5933 }, 6:{
span=4-7 mask=6-7 cap=3957 }
[    0.835959] CPU1 attaching sched-domain(s):
[    0.835974]  domain-0: span=0-1 level=MC
[    0.835996]   groups: 1:{ span=1 cap=970 }, 0:{ span=0 cap=1011 }
[    0.836049]   domain-1: span=0-3 level=NUMA
[    0.836065]    groups: 0:{ span=0-1 cap=1981 }, 2:{ span=2-3 cap=1997 }
[    0.836114]    domain-2: span=0-5 level=NUMA
[    0.836130]     groups: 0:{ span=0-3 cap=3978 }, 4:{ span=4-7 cap=3864 }
[    0.836178] ERROR: groups don't span domain->span
[    0.836188]     domain-3: span=0-7 level=NUMA
[    0.836204]      groups: 0:{ span=0-5 mask=0-1 cap=5933 }, 6:{
span=4-7 mask=6-7 cap=3957 }
[    0.836290] CPU2 attaching sched-domain(s):
[    0.836299]  domain-0: span=2-3 level=MC
[    0.836316]   groups: 2:{ span=2 cap=983 }, 3:{ span=3 cap=1014 }
[    0.836364]   domain-1: span=0-3 level=NUMA
[    0.836379]    groups: 2:{ span=2-3 cap=1997 }, 0:{ span=0-1 cap=1981 }
[    0.836427]    domain-2: span=0-5 level=NUMA
[    0.836442]     groups: 2:{ span=0-3 mask=2-3 cap=4045 }, 4:{
span=0-1,4-7 mask=4-5 cap=5912 }
[    0.836538] ERROR: groups don't span domain->span
[    0.836549]     domain-3: span=0-7 level=NUMA
[    0.836580]      groups: 2:{ span=0-5 mask=2-3 cap=6000 }, 6:{
span=0-1,4-7 mask=6-7 cap=6005 }
[    0.836667] CPU3 attaching sched-domain(s):
[    0.836675]  domain-0: span=2-3 level=MC
[    0.836690]   groups: 3:{ span=3 cap=1014 }, 2:{ span=2 cap=983 }
[    0.836734]   domain-1: span=0-3 level=NUMA
[    0.836749]    groups: 2:{ span=2-3 cap=1997 }, 0:{ span=0-1 cap=1981 }
[    0.836793]    domain-2: span=0-5 level=NUMA
[    0.836822]     groups: 2:{ span=0-3 mask=2-3 cap=4045 }, 4:{
span=0-1,4-7 mask=4-5 cap=5912 }
[    0.836879] ERROR: groups don't span domain->span
[    0.836888]     domain-3: span=0-7 level=NUMA
[    0.836903]      groups: 2:{ span=0-5 mask=2-3 cap=6000 }, 6:{
span=0-1,4-7 mask=6-7 cap=6005 }
[    0.836975] CPU4 attaching sched-domain(s):
[    0.836982]  domain-0: span=4-5 level=MC
[    0.836997]   groups: 4:{ span=4 cap=945 }, 5:{ span=5 cap=1010 }
[    0.837041]   domain-1: span=4-7 level=NUMA
[    0.837057]    groups: 4:{ span=4-5 cap=1955 }, 6:{ span=6-7 cap=1909 }
[    0.837102]    domain-2: span=0-1,4-7 level=NUMA
[    0.837117]     groups: 4:{ span=4-7 cap=3864 }, 0:{ span=0-3 cap=3978 }
[    0.837161] ERROR: groups don't span domain->span
[    0.837170]     domain-3: span=0-7 level=NUMA
[    0.837185]      groups: 4:{ span=0-1,4-7 mask=4-5 cap=5912 }, 2:{
span=0-3 mask=2-3 cap=4045 }
[    0.837252] CPU5 attaching sched-domain(s):
[    0.837260]  domain-0: span=4-5 level=MC
[    0.837275]   groups: 5:{ span=5 cap=1010 }, 4:{ span=4 cap=945 }
[    0.837320]   domain-1: span=4-7 level=NUMA
[    0.837334]    groups: 4:{ span=4-5 cap=1955 }, 6:{ span=6-7 cap=1909 }
[    0.837378]    domain-2: span=0-1,4-7 level=NUMA
[    0.837393]     groups: 4:{ span=4-7 cap=3864 }, 0:{ span=0-3 cap=3978 }
[    0.837437] ERROR: groups don't span domain->span
[    0.837445]     domain-3: span=0-7 level=NUMA
[    0.837460]      groups: 4:{ span=0-1,4-7 mask=4-5 cap=5912 }, 2:{
span=0-3 mask=2-3 cap=4045 }
[    0.837552] CPU6 attaching sched-domain(s):
[    0.837560]  domain-0: span=6-7 level=MC
[    0.837576]   groups: 6:{ span=6 cap=1002 }, 7:{ span=7 cap=907 }
[    0.837621]   domain-1: span=4-7 level=NUMA
[    0.837635]    groups: 6:{ span=6-7 cap=1909 }, 4:{ span=4-5 cap=1955 }
[    0.837679]    domain-2: span=0-1,4-7 level=NUMA
[    0.837695]     groups: 6:{ span=4-7 mask=6-7 cap=3957 }, 0:{
span=0-5 mask=0-1 cap=5933 }
[    0.837749] ERROR: groups don't span domain->span
[    0.837758]     domain-3: span=0-7 level=NUMA
[    0.837774]      groups: 6:{ span=0-1,4-7 mask=6-7 cap=6005 }, 2:{
span=0-5 mask=2-3 cap=6000 }
[    0.838055] CPU7 attaching sched-domain(s):
[    0.838066]  domain-0: span=6-7 level=MC
[    0.838086]   groups: 7:{ span=7 cap=907 }, 6:{ span=6 cap=1002 }
[    0.838135]   domain-1: span=4-7 level=NUMA
[    0.838151]    groups: 6:{ span=6-7 cap=1909 }, 4:{ span=4-5 cap=1955 }
[    0.838198]    domain-2: span=0-1,4-7 level=NUMA
[    0.838214]     groups: 6:{ span=4-7 mask=6-7 cap=3957 }, 0:{
span=0-5 mask=0-1 cap=5933 }
[    0.838272] ERROR: groups don't span domain->span
[    0.838282]     domain-3: span=0-7 level=NUMA
[    0.838298]      groups: 6:{ span=0-1,4-7 mask=6-7 cap=6005 }, 2:{
span=0-5 mask=2-3 cap=6000 }
[    0.838414] root domain span: 0-7 (max cpu_capacity = 1024)
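
To cross-check a log like the one above, a short script can compare each
domain span with the union of its group spans. This is a sketch that
assumes the error message means exactly that mismatch, which is consistent
with the lines shown here. The helper names are hypothetical, and a wrapped
"groups:" line has to be joined into one string before it is passed in.

```python
import re


def parse_cpulist(text):
    """Turn a CPU list such as '0-1,4-7' into a set of CPU numbers."""
    cpus = set()
    for part in text.split(","):
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def groups_cover_domain(domain_span, groups_line):
    """Return True if the union of all group spans equals the domain span."""
    covered = set()
    for span in re.findall(r"span=([0-9,\-]+)", groups_line):
        covered |= parse_cpulist(span)
    return covered == parse_cpulist(domain_span)


if __name__ == "__main__":
    # CPU0, domain-1 and domain-2, copied from the boot log above.
    ok = groups_cover_domain(
        "0-3", "groups: 0:{ span=0-1 cap=1981 }, 2:{ span=2-3 cap=1997 }")
    bad = groups_cover_domain(
        "0-5", "groups: 0:{ span=0-3 cap=3978 }, 4:{ span=4-7 cap=3864 }")
    print(ok, bad)
```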

Thanks
Barry


Thread overview: 9+ messages
2021-01-21 13:41 5.11-rc4+git: Shortest NUMA path spans too many nodes Meelis Roos
2021-01-21 15:05 ` Valentin Schneider
2021-01-21 17:39   ` Meelis Roos
2021-01-21 18:21     ` Valentin Schneider
2021-01-21 18:53       ` Dietmar Eggemann
2021-01-21 21:17         ` Song Bao Hua (Barry Song) [this message]
2021-01-22 10:05           ` Dietmar Eggemann
2021-01-22 11:09             ` Song Bao Hua (Barry Song)
2021-01-22 11:16               ` Valentin Schneider
