linuxppc-dev.lists.ozlabs.org archive mirror
* [PATCH 0/1] powerpc/numa: do not skip node 0 in lookup table
@ 2020-08-14 20:34 Daniel Henrique Barboza
  2020-08-14 20:34 ` [PATCH 1/1] powerpc/numa: do not skip node 0 when init " Daniel Henrique Barboza
  2020-09-04 20:06 ` [PATCH 0/1] powerpc/numa: do not skip node 0 in " Daniel Henrique Barboza
  0 siblings, 2 replies; 3+ messages in thread
From: Daniel Henrique Barboza @ 2020-08-14 20:34 UTC
  To: linuxppc-dev; +Cc: Daniel Henrique Barboza

Hi,

This is a simple fix that I made while testing NUMA changes
I'm making in QEMU [1]. Setting any non-zero value in the
associativity of NUMA node 0 has no impact on the output
of 'numactl' because the distance_lookup_table is never
initialized for node 0.

Looking through the LOPAPR spec and the git history I found no
technical reason to skip node 0, which makes me believe this is
a bug that went under the radar until now because no one
attempted to set node 0 associativity like I'm doing now.

Anyone wishing to give it a spin can use the QEMU build
in [1] and experiment with NUMA distances, such as:

sudo ./qemu-system-ppc64 \
-machine pseries-5.2,accel=kvm,usb=off,dump-guest-core=off \
-m 65536 -overcommit mem-lock=off \
-smp 4,sockets=4,cores=1,threads=1 \
-rtc base=utc -display none -vga none -nographic -boot menu=on \
-device spapr-pci-host-bridge,index=1,id=pci.1 \
-device spapr-pci-host-bridge,index=2,id=pci.2 \
-device spapr-pci-host-bridge,index=3,id=pci.3 \
-device spapr-pci-host-bridge,index=4,id=pci.4 \
-device qemu-xhci,id=usb,bus=pci.0,addr=0x2 \
-drive file=/home/danielhb/f32.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 \
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
-device usb-kbd,id=input0,bus=usb.0,port=1 \
-device usb-mouse,id=input1,bus=usb.0,port=2 \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 \
-msg timestamp=on \
-numa node,nodeid=0,cpus=0 -numa node,nodeid=1,cpus=1 \
-numa node,nodeid=2,cpus=2 -numa node,nodeid=3,cpus=3 \
-numa dist,src=0,dst=1,val=80 -numa dist,src=0,dst=2,val=80 \
-numa dist,src=0,dst=3,val=80 -numa dist,src=1,dst=2,val=80 \
-numa dist,src=1,dst=3,val=80 -numa dist,src=2,dst=3,val=80

The current kernel code will ignore the associativity of
node 0, and numactl will output this:

node distances:
node   0   1   2   3 
  0:  10  160  160  160 
  1:  160  10  80  80 
  2:  160  80  10  80 
  3:  160  80  80  10 
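
The 160 and 80 values above can be reproduced with a small user-space
model. This is only a sketch, assuming the kernel starts from a local
distance of 10 and doubles it for every reference level at which two
nodes' lookup table entries differ. The names model_lookup_table and
model_node_distance, the depth of 4, and the domain IDs in rows 1-3 are
hypothetical, chosen so that nodes 1-3 only share a domain at the last
level; they are not taken from what QEMU actually emits.

```c
#include <assert.h>

#define LOCAL_DISTANCE 10
#define MODEL_DEPTH 4	/* assumed number of reference points */
#define MODEL_NODES 4

/*
 * Hypothetical per-node domain IDs, one column per reference level.
 * Row 0 stays all zero because the table is never filled for node 0.
 */
static const int model_lookup_table[MODEL_NODES][MODEL_DEPTH] = {
	{ 0, 0, 0, 0 },
	{ 1, 1, 1, 9 },
	{ 2, 2, 2, 9 },
	{ 3, 3, 3, 9 },
};

/* Double the distance for each level where the two nodes differ. */
int model_node_distance(int a, int b)
{
	int distance = LOCAL_DISTANCE;
	int i;

	assert(a >= 0 && a < MODEL_NODES && b >= 0 && b < MODEL_NODES);

	for (i = 0; i < MODEL_DEPTH; i++) {
		if (model_lookup_table[a][i] == model_lookup_table[b][i])
			break;
		distance *= 2;
	}

	return distance;
}
```

In this model node 0 never matches any other node, so its distance is
doubled at all four levels (10 -> 160), while nodes 1-3 match each other
at the last level and stop after three doublings (10 -> 80).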

With this patch:

node distances:
node   0   1   2   3 
  0:  10  160  160  160 
  1:  160  10  80  40 
  2:  160  80  10  20 
  3:  160  40  20  10 


In case anyone is wondering, this patch does not conflict with the
proposed NUMA changes in [2] because Aneesh isn't changing this line.


[1] https://github.com/danielhb/qemu/tree/spapr_numa_v1
[2] https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20200731111916.243569-1-aneesh.kumar@linux.ibm.com/


Daniel Henrique Barboza (1):
  powerpc/numa: do not skip node 0 when init lookup table

 arch/powerpc/mm/numa.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-- 
2.26.2



* [PATCH 1/1] powerpc/numa: do not skip node 0 when init lookup table
  2020-08-14 20:34 [PATCH 0/1] powerpc/numa: do not skip node 0 in lookup table Daniel Henrique Barboza
@ 2020-08-14 20:34 ` Daniel Henrique Barboza
  2020-09-04 20:06 ` [PATCH 0/1] powerpc/numa: do not skip node 0 in " Daniel Henrique Barboza
  1 sibling, 0 replies; 3+ messages in thread
From: Daniel Henrique Barboza @ 2020-08-14 20:34 UTC
  To: linuxppc-dev; +Cc: Daniel Henrique Barboza

associativity_to_nid() is skipping node 0 when initializing
the distance lookup table. This has no practical effect when
the associativity of node 0 is always zero, which seems to
have been the case for a long time. As such, this line was
introduced in commit 41eab6f88f24 from 2010 and never revisited.

However, QEMU is making an effort to allow user input to configure
NUMA topologies, and this behavior got exposed when testing
that work. With the existing code, this is what happens with a
4-node NUMA guest where every node is at distance 80 from the others:

$ numactl -H
(...)
node distances:
node   0   1   2   3
  0:  10  160  160  160
  1:  160  10  80  80
  2:  160  80  10  80
  3:  160  80  80  10

With this patch, this is the result:

$ numactl -H
(...)
node distances:
node   0   1   2   3
  0:  10  80  80  80
  1:  80  10  80  80
  2:  80  80  10  80
  3:  80  80  80  10

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---
 arch/powerpc/mm/numa.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 1f61fa2148b5..c11aabad1090 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -239,7 +239,7 @@ static int associativity_to_nid(const __be32 *associativity)
 	if (nid == 0xffff || nid >= nr_node_ids)
 		nid = NUMA_NO_NODE;
 
-	if (nid > 0 &&
+	if (nid >= 0 &&
 		of_read_number(associativity, 1) >= distance_ref_points_depth) {
 		/*
 		 * Skip the length field and send start of associativity array
-- 
2.26.2



* Re: [PATCH 0/1] powerpc/numa: do not skip node 0 in lookup table
  2020-08-14 20:34 [PATCH 0/1] powerpc/numa: do not skip node 0 in lookup table Daniel Henrique Barboza
  2020-08-14 20:34 ` [PATCH 1/1] powerpc/numa: do not skip node 0 when init " Daniel Henrique Barboza
@ 2020-09-04 20:06 ` Daniel Henrique Barboza
  1 sibling, 0 replies; 3+ messages in thread
From: Daniel Henrique Barboza @ 2020-09-04 20:06 UTC
  To: linuxppc-dev

I discussed this a bit with Aneesh Kumar on IBM's internal Slack a few
weeks ago, and he informed me that this patch does not make sense with the
design used by the kernel. The kernel assumes that, for node 0, all
associativity domains must also be zero. This is why node 0 is skipped
when creating the distance table.
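
A minimal sketch of that design, assuming the lookup table is a
zero-initialized static array and that the guard behaves like the
existing `nid > 0` check: skipping node 0 is harmless as long as its
domains really are all zero, and silently drops them otherwise. The
names sketch_lookup_table, sketch_init_lookup_row and
sketch_node0_row_is_zero, and the depth of 4, are hypothetical and
exist only for this illustration.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_DEPTH 4	/* assumed number of reference points */
#define SKETCH_NODES 4

/* Zero-initialized by the C runtime, like a static array in the kernel. */
int sketch_lookup_table[SKETCH_NODES][SKETCH_DEPTH];

/*
 * Copy a node's domain IDs into its table row, but leave node 0
 * untouched (this models the `nid > 0` guard).
 */
void sketch_init_lookup_row(int nid, const int *domains)
{
	assert(nid >= 0 && nid < SKETCH_NODES);

	if (nid > 0)
		memcpy(sketch_lookup_table[nid], domains,
		       sizeof(sketch_lookup_table[nid]));
}

/* Return 1 if the row for node 0 is still all zero, 0 otherwise. */
int sketch_node0_row_is_zero(void)
{
	int i;

	for (i = 0; i < SKETCH_DEPTH; i++)
		if (sketch_lookup_table[0][i] != 0)
			return 0;

	return 1;
}
```

Passing non-zero domains for node 0 leaves its row at zero, which is why
a hypervisor that gives node 0 a non-zero associativity sees no change in
the reported distances.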

This of course has consequences for QEMU, so based on that, I've adapted
the QEMU implementation to not touch node 0.



Daniel


