All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: QEMU NUMA and U-Boot
       [not found] <CAHFG_=WTxz24QAPgfSfS_uz7X6wZMnjqwjTkbi0eO1Vk_TGkiQ@mail.gmail.com>
@ 2021-07-06 18:10 ` Heinrich Schuchardt
  2021-07-07  1:44   ` AKASHI Takahiro
  0 siblings, 1 reply; 13+ messages in thread
From: Heinrich Schuchardt @ 2021-07-06 18:10 UTC (permalink / raw)
  To: François Ozog
  Cc: Ilias Apalodimas, Tuomas Tynkkynen, U-Boot Mailing List

On 7/6/21 6:13 PM, François Ozog wrote:
> Hi Heinrich, U-Boot 2021-07rc5 does not take into account memory
> description when using Qemu 5.2 NUMA configuration to adapt memory map
> (kernel_addr_r...):
>
>         -smp 4 \
>           -m 8G,slots=2,maxmem=16G \
>          -object memory-backend-ram,size=4G,id=m0 \
>          -object memory-backend-ram,size=4G,id=m1 \
>          -numa node,cpus=0-1,nodeid=0,memdev=m0 \
>          -numa node,cpus=2-3,nodeid=1,memdev=m1
>
> kernel_addr_r is still 0x4040000 and thus you can't use it to bootefi.
>
> fdt addr 0x13ede6de0; fdt print
>
> Displays fdt while I think it should not.
>
> If I load the kernel at dram.start, the load works but not boot
>
> U-Boot 2021.07 (Jul 06 2021 - 13:26:43 +0000)
>
>
> DRAM:4 GiB
>
> Flash: 64 MiB
>
> Loading Environment from Flash... OK
>
> In:pl011@9000000
>
> Out: pl011@9000000
>
> Err: pl011@9000000
>
> Net: eth0: virtio-net#32
>
> Hit any key to stop autoboot:0
>
> =>
>
> => bdinfo
>
> boot_params = 0x0000000000000000
>
> DRAM bank = 0x0000000000000000
>
> -> start= 0x0000000140000000
>
> -> size = 0x0000000100000000
>
> flashstart= 0x0000000000000000
>
> flashsize = 0x0000000004000000
>
> flashoffset = 0x00000000000bc990
>
> baudrate= 115200 bps
>
> relocaddr = 0x000000013ff27000
>
> reloc off = 0x000000013ff27000
>
> Build = 64-bit
>
> current eth = virtio-net#32
>
> ethaddr = 52:52:52:52:52:52
>
> IP addr = <NULL>
>
> fdt_blob= 0x000000013ede6de0
>
> new_fdt = 0x000000013ede6de0
>
> fdt_size= 0x0000000000100000
>
> lmb_dump_all:
>
> memory.cnt= 0x1
>
> memory.reg[0x0].base = 0x140000000
>
> .size = 0x100000000
>
>
> reserved.cnt= 0x0
>
> arch_number = 0x0000000000000000
>
> TLB addr= 0x000000013fff0000
>
> irq_sp= 0x000000013ede6dd0
>
> sp start= 0x000000013ede6dd0
>
> Early malloc usage: 3a8 / 2000
>
> => load virtio 0:1 0x140000000 /oskit.efi
>
> 853424 bytes read in 1 ms (813.9 MiB/s)
>
> => bootefi0x140000000 0x13ede6dd0
>
> ERROR: Failed to register WaitForKey event
>
> Setting OsIndications failed
>
> Error: Cannot initialize UEFI sub-system, r = 9
>
>
> I think there is a need to calculate memory map based on previous
> firmware (TFA, QEMU can be considered as previous frimware) information
> (DT or blob_list).
>
> What do you think ?
>
> Cheers
>
> FF
>
> --
>
> François-Frédéric Ozog | /Director Business Development/
> T: +33.67221.6485
> francois.ozog@linaro.org <mailto:francois.ozog@linaro.org> | Skype: ffozog
>
>

The kernel load address is hard coded here:
include/configs/qemu-arm.h:41:  "kernel_addr_r=0x40400000\0" \

bdinfo shows:
DRAM start = 0x140000000
DRAM size  = 0x100000000

fdt addr $fdt_addr
fdt printf

shows two memory areas. One at 40000000, one at 140000000.

Your use case is well beyond the typical U-Boot usage. So I guess it
will be up to Linaro to provide the necessary patches:

* determine the active CPU
* determine the RAM assigned to the active CPU according
   to the numa-node-id in the device-tree
* make sure that U-Boot only uses the memory of the active CPU
   internally
* make sure that the UEFI memory map contains a compliant description
* possibly, dynamically set up the environment variables

+CC Tuomas Tynkkynen (maintainer for qemu_arm64_defconfig)

Best regards

Heinrich

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: QEMU NUMA and U-Boot
  2021-07-06 18:10 ` QEMU NUMA and U-Boot Heinrich Schuchardt
@ 2021-07-07  1:44   ` AKASHI Takahiro
  2021-07-07  3:18     ` Heinrich Schuchardt
  0 siblings, 1 reply; 13+ messages in thread
From: AKASHI Takahiro @ 2021-07-07  1:44 UTC (permalink / raw)
  To: Heinrich Schuchardt
  Cc: Fran??ois Ozog, Ilias Apalodimas, Tuomas Tynkkynen, U-Boot Mailing List

François,

On Tue, Jul 06, 2021 at 08:10:08PM +0200, Heinrich Schuchardt wrote:
> On 7/6/21 6:13 PM, François Ozog wrote:
> > Hi Heinrich, U-Boot 2021-07rc5 does not take into account memory
> > description when using Qemu 5.2 NUMA configuration to adapt memory map
> > (kernel_addr_r...):
> > 
> >         -smp 4 \
> >           -m 8G,slots=2,maxmem=16G \
> >          -object memory-backend-ram,size=4G,id=m0 \
> >          -object memory-backend-ram,size=4G,id=m1 \
> >          -numa node,cpus=0-1,nodeid=0,memdev=m0 \
> >          -numa node,cpus=2-3,nodeid=1,memdev=m1
> > 
> > kernel_addr_r is still 0x4040000 and thus you can't use it to bootefi.
> > 
> > fdt addr 0x13ede6de0; fdt print
> > 
> > Displays fdt while I think it should not.
> > 
> > If I load the kernel at dram.start, the load works but not boot
> > 
> > U-Boot 2021.07 (Jul 06 2021 - 13:26:43 +0000)
> > 
> > 
> > DRAM:4 GiB
> > 
> > Flash: 64 MiB
> > 
> > Loading Environment from Flash... OK
> > 
> > In:pl011@9000000
> > 
> > Out: pl011@9000000
> > 
> > Err: pl011@9000000
> > 
> > Net: eth0: virtio-net#32
> > 
> > Hit any key to stop autoboot:0
> > 
> > =>
> > 
> > => bdinfo
> > 
> > boot_params = 0x0000000000000000
> > 
> > DRAM bank = 0x0000000000000000
> > 
> > -> start= 0x0000000140000000
> > 
> > -> size = 0x0000000100000000
> > 
> > flashstart= 0x0000000000000000
> > 
> > flashsize = 0x0000000004000000
> > 
> > flashoffset = 0x00000000000bc990
> > 
> > baudrate= 115200 bps
> > 
> > relocaddr = 0x000000013ff27000
> > 
> > reloc off = 0x000000013ff27000
> > 
> > Build = 64-bit
> > 
> > current eth = virtio-net#32
> > 
> > ethaddr = 52:52:52:52:52:52
> > 
> > IP addr = <NULL>
> > 
> > fdt_blob= 0x000000013ede6de0
> > 
> > new_fdt = 0x000000013ede6de0
> > 
> > fdt_size= 0x0000000000100000
> > 
> > lmb_dump_all:
> > 
> > memory.cnt= 0x1
> > 
> > memory.reg[0x0].base = 0x140000000
> > 
> > .size = 0x100000000
> > 
> > 
> > reserved.cnt= 0x0
> > 
> > arch_number = 0x0000000000000000
> > 
> > TLB addr= 0x000000013fff0000
> > 
> > irq_sp= 0x000000013ede6dd0
> > 
> > sp start= 0x000000013ede6dd0
> > 
> > Early malloc usage: 3a8 / 2000
> > 
> > => load virtio 0:1 0x140000000 /oskit.efi
> > 
> > 853424 bytes read in 1 ms (813.9 MiB/s)
> > 
> > => bootefi0x140000000 0x13ede6dd0
> > 
> > ERROR: Failed to register WaitForKey event
> > 
> > Setting OsIndications failed
> > 
> > Error: Cannot initialize UEFI sub-system, r = 9
> > 
> > 
> > I think there is a need to calculate memory map based on previous
> > firmware (TFA, QEMU can be considered as previous frimware) information
> > (DT or blob_list).
> > 
> > What do you think ?
> > 
> > Cheers
> > 
> > FF
> > 
> > --
> > 
> > François-Frédéric Ozog | /Director Business Development/
> > T: +33.67221.6485
> > francois.ozog@linaro.org <mailto:francois.ozog@linaro.org> | Skype: ffozog
> > 
> > 
> 
> The kernel load address is hard coded here:
> include/configs/qemu-arm.h:41:  "kernel_addr_r=0x40400000\0" \
> 
> bdinfo shows:
> DRAM start = 0x140000000
> DRAM size  = 0x100000000
> 
> fdt addr $fdt_addr
> fdt printf
> 
> shows two memory areas. One at 40000000, one at 140000000.

(This shows that U-Boot receives a correct memory map via dtb.)

Is this a NUMA machine, isn't it? Why should we care of which
memory region be used here? Please note that this is a virtual machine,
there is no practical difference between two regions.

The root problem is that U-Boot did not recognize there were two
memory regions. We can fix this issue in either way:

1)
diff --git a/configs/qemu_arm64_defconfig b/configs/qemu_arm64_defconfig
index f6e586627a8e..b70ffae8bf6e 100644
--- a/configs/qemu_arm64_defconfig
+++ b/configs/qemu_arm64_defconfig
@@ -1,7 +1,7 @@
 CONFIG_ARM=y
 CONFIG_POSITION_INDEPENDENT=y
 CONFIG_ARCH_QEMU=y
-CONFIG_NR_DRAM_BANKS=1
+CONFIG_NR_DRAM_BANKS=2
 CONFIG_ENV_SIZE=0x40000
 CONFIG_ENV_SECT_SIZE=0x40000
 CONFIG_AHCI=y

2)
diff --git a/lib/fdtdec.c b/lib/fdtdec.c
index 4b097fb588ed..4067ea2dead6 100644
--- a/lib/fdtdec.c
+++ b/lib/fdtdec.c
@@ -1111,7 +1111,7 @@ int fdtdec_setup_memory_banksize(void)
                return -EINVAL;
        }
 
-       for (bank = 0; bank < CONFIG_NR_DRAM_BANKS; bank++) {
+       for (bank = 0; ; bank++) {
                ret = ofnode_read_resource(mem, reg++, &res);
                if (ret < 0) {
                        reg = 0;

   (fdtdec_setup_memory_banksize() is called in dram_init_banksize().)


(2) seems much better, but I don't know why we had to use
CONFIG_NR_DRAM_BANKS here.

In this case, other occurrences of CONFIG_NR_DRAM_BANKS in this file
should be replaced with a variable for it. 

> Your use case is well beyond the typical U-Boot usage. So I guess it
> will be up to Linaro to provide the necessary patches:
> 
> * determine the active CPU
> * determine the RAM assigned to the active CPU according
>   to the numa-node-id in the device-tree
> * make sure that U-Boot only uses the memory of the active CPU
>   internally
> * make sure that the UEFI memory map contains a compliant description
> * possibly, dynamically set up the environment variables
> 
> +CC Tuomas Tynkkynen (maintainer for qemu_arm64_defconfig)

For (1), we'd better have a different config, or increase
the value of CONFIG_NR_DRAM_BANKS to a bigger number?

-Takahiro Akashi


> Best regards
> 
> Heinrich

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: QEMU NUMA and U-Boot
  2021-07-07  1:44   ` AKASHI Takahiro
@ 2021-07-07  3:18     ` Heinrich Schuchardt
  2021-07-07  3:58       ` Heinrich Schuchardt
  2021-07-07  3:59       ` Heinrich Schuchardt
  0 siblings, 2 replies; 13+ messages in thread
From: Heinrich Schuchardt @ 2021-07-07  3:18 UTC (permalink / raw)
  To: AKASHI Takahiro
  Cc: Fran??ois Ozog, Ilias Apalodimas, Tuomas Tynkkynen, U-Boot Mailing List

Am 7. Juli 2021 03:44:35 MESZ schrieb AKASHI Takahiro <takahiro.akashi@linaro.org>:
>François,
>
>On Tue, Jul 06, 2021 at 08:10:08PM +0200, Heinrich Schuchardt wrote:
>> On 7/6/21 6:13 PM, François Ozog wrote:
>> > Hi Heinrich, U-Boot 2021-07rc5 does not take into account memory
>> > description when using Qemu 5.2 NUMA configuration to adapt memory
>map
>> > (kernel_addr_r...):
>> > 
>> >         -smp 4 \
>> >           -m 8G,slots=2,maxmem=16G \
>> >          -object memory-backend-ram,size=4G,id=m0 \
>> >          -object memory-backend-ram,size=4G,id=m1 \
>> >          -numa node,cpus=0-1,nodeid=0,memdev=m0 \
>> >          -numa node,cpus=2-3,nodeid=1,memdev=m1
>> > 
>> > kernel_addr_r is still 0x4040000 and thus you can't use it to
>bootefi.
>> > 
>> > fdt addr 0x13ede6de0; fdt print
>> > 
>> > Displays fdt while I think it should not.
>> > 
>> > If I load the kernel at dram.start, the load works but not boot
>> > 
>> > U-Boot 2021.07 (Jul 06 2021 - 13:26:43 +0000)
>> > 
>> > 
>> > DRAM:4 GiB
>> > 
>> > Flash: 64 MiB
>> > 
>> > Loading Environment from Flash... OK
>> > 
>> > In:pl011@9000000
>> > 
>> > Out: pl011@9000000
>> > 
>> > Err: pl011@9000000
>> > 
>> > Net: eth0: virtio-net#32
>> > 
>> > Hit any key to stop autoboot:0
>> > 
>> > =>
>> > 
>> > => bdinfo
>> > 
>> > boot_params = 0x0000000000000000
>> > 
>> > DRAM bank = 0x0000000000000000
>> > 
>> > -> start= 0x0000000140000000
>> > 
>> > -> size = 0x0000000100000000
>> > 
>> > flashstart= 0x0000000000000000
>> > 
>> > flashsize = 0x0000000004000000
>> > 
>> > flashoffset = 0x00000000000bc990
>> > 
>> > baudrate= 115200 bps
>> > 
>> > relocaddr = 0x000000013ff27000
>> > 
>> > reloc off = 0x000000013ff27000
>> > 
>> > Build = 64-bit
>> > 
>> > current eth = virtio-net#32
>> > 
>> > ethaddr = 52:52:52:52:52:52
>> > 
>> > IP addr = <NULL>
>> > 
>> > fdt_blob= 0x000000013ede6de0
>> > 
>> > new_fdt = 0x000000013ede6de0
>> > 
>> > fdt_size= 0x0000000000100000
>> > 
>> > lmb_dump_all:
>> > 
>> > memory.cnt= 0x1
>> > 
>> > memory.reg[0x0].base = 0x140000000
>> > 
>> > .size = 0x100000000
>> > 
>> > 
>> > reserved.cnt= 0x0
>> > 
>> > arch_number = 0x0000000000000000
>> > 
>> > TLB addr= 0x000000013fff0000
>> > 
>> > irq_sp= 0x000000013ede6dd0
>> > 
>> > sp start= 0x000000013ede6dd0
>> > 
>> > Early malloc usage: 3a8 / 2000
>> > 
>> > => load virtio 0:1 0x140000000 /oskit.efi
>> > 
>> > 853424 bytes read in 1 ms (813.9 MiB/s)
>> > 
>> > => bootefi0x140000000 0x13ede6dd0
>> > 
>> > ERROR: Failed to register WaitForKey event
>> > 
>> > Setting OsIndications failed
>> > 
>> > Error: Cannot initialize UEFI sub-system, r = 9
>> > 
>> > 
>> > I think there is a need to calculate memory map based on previous
>> > firmware (TFA, QEMU can be considered as previous frimware)
>information
>> > (DT or blob_list).
>> > 
>> > What do you think ?
>> > 
>> > Cheers
>> > 
>> > FF
>> > 
>> > --
>> > 
>> > François-Frédéric Ozog | /Director Business Development/
>> > T: +33.67221.6485
>> > francois.ozog@linaro.org <mailto:francois.ozog@linaro.org>
>| Skype: ffozog
>> > 
>> > 
>> 
>> The kernel load address is hard coded here:
>> include/configs/qemu-arm.h:41:  "kernel_addr_r=0x40400000\0" \
>> 
>> bdinfo shows:
>> DRAM start = 0x140000000
>> DRAM size  = 0x100000000
>> 
>> fdt addr $fdt_addr
>> fdt printf
>> 
>> shows two memory areas. One at 40000000, one at 140000000.
>
>(This shows that U-Boot receives a correct memory map via dtb.)
>
>Is this a NUMA machine, isn't it? Why should we care of which
>memory region be used here? Please note that this is a virtual machine,
>there is no practical difference between two regions.
>
>The root problem is that U-Boot did not recognize there were two
>memory regions. We can fix this issue in either way:
>
>1)
>diff --git a/configs/qemu_arm64_defconfig
>b/configs/qemu_arm64_defconfig
>index f6e586627a8e..b70ffae8bf6e 100644
>--- a/configs/qemu_arm64_defconfig
>+++ b/configs/qemu_arm64_defconfig
>@@ -1,7 +1,7 @@
> CONFIG_ARM=y
> CONFIG_POSITION_INDEPENDENT=y
> CONFIG_ARCH_QEMU=y
>-CONFIG_NR_DRAM_BANKS=1
>+CONFIG_NR_DRAM_BANKS=2
> CONFIG_ENV_SIZE=0x40000
> CONFIG_ENV_SECT_SIZE=0x40000
> CONFIG_AHCI=y
>
>2)
>diff --git a/lib/fdtdec.c b/lib/fdtdec.c
>index 4b097fb588ed..4067ea2dead6 100644
>--- a/lib/fdtdec.c
>+++ b/lib/fdtdec.c
>@@ -1111,7 +1111,7 @@ int fdtdec_setup_memory_banksize(void)
>                return -EINVAL;
>        }
> 
>-       for (bank = 0; bank < CONFIG_NR_DRAM_BANKS; bank++) {
>+       for (bank = 0; ; bank++) {
>                ret = ofnode_read_resource(mem, reg++, &res);
>                if (ret < 0) {
>                        reg = 0;
>
>   (fdtdec_setup_memory_banksize() is called in dram_init_banksize().)
>
>
>(2) seems much better, but I don't know why we had to use
>CONFIG_NR_DRAM_BANKS here.
>
>In this case, other occurrences of CONFIG_NR_DRAM_BANKS in this file
>should be replaced with a variable for it. 
>
>> Your use case is well beyond the typical U-Boot usage. So I guess it
>> will be up to Linaro to provide the necessary patches:
>> 
>> * determine the active CPU
>> * determine the RAM assigned to the active CPU according
>>   to the numa-node-id in the device-tree
>> * make sure that U-Boot only uses the memory of the active CPU
>>   internally
>> * make sure that the UEFI memory map contains a compliant description
>> * possibly, dynamically set up the environment variables
>> 
>> +CC Tuomas Tynkkynen (maintainer for qemu_arm64_defconfig)
>
>For (1), we'd better have a different config, or increase
>the value of CONFIG_NR_DRAM_BANKS to a bigger number?

Is the system configured such that each CPU can access the others CPU's RAM when entering U-Boot?

Best regards

Heinrich

>
>-Takahiro Akashi
>
>
>> Best regards
>> 
>> Heinrich


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: QEMU NUMA and U-Boot
  2021-07-07  3:18     ` Heinrich Schuchardt
@ 2021-07-07  3:58       ` Heinrich Schuchardt
  2021-07-07  3:59       ` Heinrich Schuchardt
  1 sibling, 0 replies; 13+ messages in thread
From: Heinrich Schuchardt @ 2021-07-07  3:58 UTC (permalink / raw)
  To: AKASHI Takahiro
  Cc: Fran??ois Ozog, Ilias Apalodimas, Tuomas Tynkkynen, U-Boot Mailing List

Am 7. Juli 2021 05:18:20 MESZ schrieb Heinrich Schuchardt <xypron.glpk@gmx.de>:
>Am 7. Juli 2021 03:44:35 MESZ schrieb AKASHI Takahiro
><takahiro.akashi@linaro.org>:
>>François,
>>
>>On Tue, Jul 06, 2021 at 08:10:08PM +0200, Heinrich Schuchardt wrote:
>>> On 7/6/21 6:13 PM, François Ozog wrote:
>>> > Hi Heinrich, U-Boot 2021-07rc5 does not take into account memory
>>> > description when using Qemu 5.2 NUMA configuration to adapt memory
>>map
>>> > (kernel_addr_r...):
>>> > 
>>> >         -smp 4 \
>>> >           -m 8G,slots=2,maxmem=16G \
>>> >          -object memory-backend-ram,size=4G,id=m0 \
>>> >          -object memory-backend-ram,size=4G,id=m1 \
>>> >          -numa node,cpus=0-1,nodeid=0,memdev=m0 \
>>> >          -numa node,cpus=2-3,nodeid=1,memdev=m1
>>> > 
>>> > kernel_addr_r is still 0x4040000 and thus you can't use it to
>>bootefi.
>>> > 
>>> > fdt addr 0x13ede6de0; fdt print
>>> > 
>>> > Displays fdt while I think it should not.
>>> > 
>>> > If I load the kernel at dram.start, the load works but not boot
>>> > 
>>> > U-Boot 2021.07 (Jul 06 2021 - 13:26:43 +0000)
>>> > 
>>> > 
>>> > DRAM:4 GiB
>>> > 
>>> > Flash: 64 MiB
>>> > 
>>> > Loading Environment from Flash... OK
>>> > 
>>> > In:pl011@9000000
>>> > 
>>> > Out: pl011@9000000
>>> > 
>>> > Err: pl011@9000000
>>> > 
>>> > Net: eth0: virtio-net#32
>>> > 
>>> > Hit any key to stop autoboot:0
>>> > 
>>> > =>
>>> > 
>>> > => bdinfo
>>> > 
>>> > boot_params = 0x0000000000000000
>>> > 
>>> > DRAM bank = 0x0000000000000000
>>> > 
>>> > -> start= 0x0000000140000000
>>> > 
>>> > -> size = 0x0000000100000000
>>> > 
>>> > flashstart= 0x0000000000000000
>>> > 
>>> > flashsize = 0x0000000004000000
>>> > 
>>> > flashoffset = 0x00000000000bc990
>>> > 
>>> > baudrate= 115200 bps
>>> > 
>>> > relocaddr = 0x000000013ff27000
>>> > 
>>> > reloc off = 0x000000013ff27000
>>> > 
>>> > Build = 64-bit
>>> > 
>>> > current eth = virtio-net#32
>>> > 
>>> > ethaddr = 52:52:52:52:52:52
>>> > 
>>> > IP addr = <NULL>
>>> > 
>>> > fdt_blob= 0x000000013ede6de0
>>> > 
>>> > new_fdt = 0x000000013ede6de0
>>> > 
>>> > fdt_size= 0x0000000000100000
>>> > 
>>> > lmb_dump_all:
>>> > 
>>> > memory.cnt= 0x1
>>> > 
>>> > memory.reg[0x0].base = 0x140000000
>>> > 
>>> > .size = 0x100000000
>>> > 
>>> > 
>>> > reserved.cnt= 0x0
>>> > 
>>> > arch_number = 0x0000000000000000
>>> > 
>>> > TLB addr= 0x000000013fff0000
>>> > 
>>> > irq_sp= 0x000000013ede6dd0
>>> > 
>>> > sp start= 0x000000013ede6dd0
>>> > 
>>> > Early malloc usage: 3a8 / 2000
>>> > 
>>> > => load virtio 0:1 0x140000000 /oskit.efi
>>> > 
>>> > 853424 bytes read in 1 ms (813.9 MiB/s)
>>> > 
>>> > => bootefi0x140000000 0x13ede6dd0
>>> > 
>>> > ERROR: Failed to register WaitForKey event
>>> > 
>>> > Setting OsIndications failed
>>> > 
>>> > Error: Cannot initialize UEFI sub-system, r = 9
>>> > 
>>> > 
>>> > I think there is a need to calculate memory map based on previous
>>> > firmware (TFA, QEMU can be considered as previous frimware)
>>information
>>> > (DT or blob_list).
>>> > 
>>> > What do you think ?
>>> > 
>>> > Cheers
>>> > 
>>> > FF
>>> > 
>>> > --
>>> > 
>>> > François-Frédéric Ozog | /Director Business Development/
>>> > T: +33.67221.6485
>>> > francois.ozog@linaro.org <mailto:francois.ozog@linaro.org>
>>| Skype: ffozog
>>> > 
>>> > 
>>> 
>>> The kernel load address is hard coded here:
>>> include/configs/qemu-arm.h:41:  "kernel_addr_r=0x40400000\0" \
>>> 
>>> bdinfo shows:
>>> DRAM start = 0x140000000
>>> DRAM size  = 0x100000000
>>> 
>>> fdt addr $fdt_addr
>>> fdt printf
>>> 
>>> shows two memory areas. One at 40000000, one at 140000000.
>>
>>(This shows that U-Boot receives a correct memory map via dtb.)
>>
>>Is this a NUMA machine, isn't it? Why should we care of which
>>memory region be used here? Please note that this is a virtual
>machine,
>>there is no practical difference between two regions.
>>
>>The root problem is that U-Boot did not recognize there were two
>>memory regions. We can fix this issue in either way:
>>
>>1)
>>diff --git a/configs/qemu_arm64_defconfig
>>b/configs/qemu_arm64_defconfig
>>index f6e586627a8e..b70ffae8bf6e 100644
>>--- a/configs/qemu_arm64_defconfig
>>+++ b/configs/qemu_arm64_defconfig
>>@@ -1,7 +1,7 @@
>> CONFIG_ARM=y
>> CONFIG_POSITION_INDEPENDENT=y
>> CONFIG_ARCH_QEMU=y
>>-CONFIG_NR_DRAM_BANKS=1
>>+CONFIG_NR_DRAM_BANKS=2
>> CONFIG_ENV_SIZE=0x40000
>> CONFIG_ENV_SECT_SIZE=0x40000
>> CONFIG_AHCI=y
>>
>>2)
>>diff --git a/lib/fdtdec.c b/lib/fdtdec.c
>>index 4b097fb588ed..4067ea2dead6 100644
>>--- a/lib/fdtdec.c
>>+++ b/lib/fdtdec.c
>>@@ -1111,7 +1111,7 @@ int fdtdec_setup_memory_banksize(void)
>>                return -EINVAL;
>>        }
>> 
>>-       for (bank = 0; bank < CONFIG_NR_DRAM_BANKS; bank++) {
>>+       for (bank = 0; ; bank++) {
>>                ret = ofnode_read_resource(mem, reg++, &res);
>>                if (ret < 0) {
>>                        reg = 0;
>>
>>   (fdtdec_setup_memory_banksize() is called in dram_init_banksize().)
>>
>>
>>(2) seems much better, but I don't know why we had to use
>>CONFIG_NR_DRAM_BANKS here.
>>
>>In this case, other occurrences of CONFIG_NR_DRAM_BANKS in this file
>>should be replaced with a variable for it. 
>>
>>> Your use case is well beyond the typical U-Boot usage. So I guess it
>>> will be up to Linaro to provide the necessary patches:
>>> 
>>> * determine the active CPU
>>> * determine the RAM assigned to the active CPU according
>>>   to the numa-node-id in the device-tree
>>> * make sure that U-Boot only uses the memory of the active CPU
>>>   internally
>>> * make sure that the UEFI memory map contains a compliant
>description
>>> * possibly, dynamically set up the environment variables
>>> 
>>> +CC Tuomas Tynkkynen (maintainer for qemu_arm64_defconfig)
>>
>>For (1), we'd better have a different config, or increase
>>the value of CONFIG_NR_DRAM_BANKS to a bigger number?
>
>Is the system configured such that each CPU can access the others CPU's
>RAM when entering U-Boot?
>
>Best regards
>
>Heinrich
>

At least the comments for this patch sound as if on a physical system cross NUMA node memory access is only available after full SMP initialization:

https://patchwork.kernel.org/project/linux-acpi/patch/20180625130552.5636-1-lorenzo.pieralisi@arm.com/

QEMU may be less restrictive.

QEMU allows the node distance to be 255 indicating that cross node access is infeasible.

Best regards

Heinrich

>>
>>-Takahiro Akashi
>>
>>
>>> Best regards
>>> 
>>> Heinrich


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: QEMU NUMA and U-Boot
  2021-07-07  3:18     ` Heinrich Schuchardt
  2021-07-07  3:58       ` Heinrich Schuchardt
@ 2021-07-07  3:59       ` Heinrich Schuchardt
  2021-07-07  7:40         ` François Ozog
  1 sibling, 1 reply; 13+ messages in thread
From: Heinrich Schuchardt @ 2021-07-07  3:59 UTC (permalink / raw)
  To: AKASHI Takahiro
  Cc: Fran??ois Ozog, Ilias Apalodimas, Tuomas Tynkkynen, U-Boot Mailing List

Am 7. Juli 2021 05:18:20 MESZ schrieb Heinrich Schuchardt <xypron.glpk@gmx.de>:
>Am 7. Juli 2021 03:44:35 MESZ schrieb AKASHI Takahiro
><takahiro.akashi@linaro.org>:
>>François,
>>
>>On Tue, Jul 06, 2021 at 08:10:08PM +0200, Heinrich Schuchardt wrote:
>>> On 7/6/21 6:13 PM, François Ozog wrote:
>>> > Hi Heinrich, U-Boot 2021-07rc5 does not take into account memory
>>> > description when using Qemu 5.2 NUMA configuration to adapt memory
>>map
>>> > (kernel_addr_r...):
>>> > 
>>> >         -smp 4 \
>>> >           -m 8G,slots=2,maxmem=16G \
>>> >          -object memory-backend-ram,size=4G,id=m0 \
>>> >          -object memory-backend-ram,size=4G,id=m1 \
>>> >          -numa node,cpus=0-1,nodeid=0,memdev=m0 \
>>> >          -numa node,cpus=2-3,nodeid=1,memdev=m1
>>> > 
>>> > kernel_addr_r is still 0x4040000 and thus you can't use it to
>>bootefi.
>>> > 
>>> > fdt addr 0x13ede6de0; fdt print
>>> > 
>>> > Displays fdt while I think it should not.
>>> > 
>>> > If I load the kernel at dram.start, the load works but not boot
>>> > 
>>> > U-Boot 2021.07 (Jul 06 2021 - 13:26:43 +0000)
>>> > 
>>> > 
>>> > DRAM:4 GiB
>>> > 
>>> > Flash: 64 MiB
>>> > 
>>> > Loading Environment from Flash... OK
>>> > 
>>> > In:pl011@9000000
>>> > 
>>> > Out: pl011@9000000
>>> > 
>>> > Err: pl011@9000000
>>> > 
>>> > Net: eth0: virtio-net#32
>>> > 
>>> > Hit any key to stop autoboot:0
>>> > 
>>> > =>
>>> > 
>>> > => bdinfo
>>> > 
>>> > boot_params = 0x0000000000000000
>>> > 
>>> > DRAM bank = 0x0000000000000000
>>> > 
>>> > -> start= 0x0000000140000000
>>> > 
>>> > -> size = 0x0000000100000000
>>> > 
>>> > flashstart= 0x0000000000000000
>>> > 
>>> > flashsize = 0x0000000004000000
>>> > 
>>> > flashoffset = 0x00000000000bc990
>>> > 
>>> > baudrate= 115200 bps
>>> > 
>>> > relocaddr = 0x000000013ff27000
>>> > 
>>> > reloc off = 0x000000013ff27000
>>> > 
>>> > Build = 64-bit
>>> > 
>>> > current eth = virtio-net#32
>>> > 
>>> > ethaddr = 52:52:52:52:52:52
>>> > 
>>> > IP addr = <NULL>
>>> > 
>>> > fdt_blob= 0x000000013ede6de0
>>> > 
>>> > new_fdt = 0x000000013ede6de0
>>> > 
>>> > fdt_size= 0x0000000000100000
>>> > 
>>> > lmb_dump_all:
>>> > 
>>> > memory.cnt= 0x1
>>> > 
>>> > memory.reg[0x0].base = 0x140000000
>>> > 
>>> > .size = 0x100000000
>>> > 
>>> > 
>>> > reserved.cnt= 0x0
>>> > 
>>> > arch_number = 0x0000000000000000
>>> > 
>>> > TLB addr= 0x000000013fff0000
>>> > 
>>> > irq_sp= 0x000000013ede6dd0
>>> > 
>>> > sp start= 0x000000013ede6dd0
>>> > 
>>> > Early malloc usage: 3a8 / 2000
>>> > 
>>> > => load virtio 0:1 0x140000000 /oskit.efi
>>> > 
>>> > 853424 bytes read in 1 ms (813.9 MiB/s)
>>> > 
>>> > => bootefi0x140000000 0x13ede6dd0
>>> > 
>>> > ERROR: Failed to register WaitForKey event
>>> > 
>>> > Setting OsIndications failed
>>> > 
>>> > Error: Cannot initialize UEFI sub-system, r = 9
>>> > 
>>> > 
>>> > I think there is a need to calculate memory map based on previous
>>> > firmware (TFA, QEMU can be considered as previous frimware)
>>information
>>> > (DT or blob_list).
>>> > 
>>> > What do you think ?
>>> > 
>>> > Cheers
>>> > 
>>> > FF
>>> > 
>>> > --
>>> > 
>>> > François-Frédéric Ozog | /Director Business Development/
>>> > T: +33.67221.6485
>>> > francois.ozog@linaro.org <mailto:francois.ozog@linaro.org>
>>| Skype: ffozog
>>> > 
>>> > 
>>> 
>>> The kernel load address is hard coded here:
>>> include/configs/qemu-arm.h:41:  "kernel_addr_r=0x40400000\0" \
>>> 
>>> bdinfo shows:
>>> DRAM start = 0x140000000
>>> DRAM size  = 0x100000000
>>> 
>>> fdt addr $fdt_addr
>>> fdt printf
>>> 
>>> shows two memory areas. One at 40000000, one at 140000000.
>>
>>(This shows that U-Boot receives a correct memory map via dtb.)
>>
>>Is this a NUMA machine, isn't it? Why should we care of which
>>memory region be used here? Please note that this is a virtual
>machine,
>>there is no practical difference between two regions.
>>
>>The root problem is that U-Boot did not recognize there were two
>>memory regions. We can fix this issue in either way:
>>
>>1)
>>diff --git a/configs/qemu_arm64_defconfig
>>b/configs/qemu_arm64_defconfig
>>index f6e586627a8e..b70ffae8bf6e 100644
>>--- a/configs/qemu_arm64_defconfig
>>+++ b/configs/qemu_arm64_defconfig
>>@@ -1,7 +1,7 @@
>> CONFIG_ARM=y
>> CONFIG_POSITION_INDEPENDENT=y
>> CONFIG_ARCH_QEMU=y
>>-CONFIG_NR_DRAM_BANKS=1
>>+CONFIG_NR_DRAM_BANKS=2
>> CONFIG_ENV_SIZE=0x40000
>> CONFIG_ENV_SECT_SIZE=0x40000
>> CONFIG_AHCI=y
>>
>>2)
>>diff --git a/lib/fdtdec.c b/lib/fdtdec.c
>>index 4b097fb588ed..4067ea2dead6 100644
>>--- a/lib/fdtdec.c
>>+++ b/lib/fdtdec.c
>>@@ -1111,7 +1111,7 @@ int fdtdec_setup_memory_banksize(void)
>>                return -EINVAL;
>>        }
>> 
>>-       for (bank = 0; bank < CONFIG_NR_DRAM_BANKS; bank++) {
>>+       for (bank = 0; ; bank++) {
>>                ret = ofnode_read_resource(mem, reg++, &res);
>>                if (ret < 0) {
>>                        reg = 0;
>>
>>   (fdtdec_setup_memory_banksize() is called in dram_init_banksize().)
>>
>>
>>(2) seems much better, but I don't know why we had to use
>>CONFIG_NR_DRAM_BANKS here.
>>
>>In this case, other occurrences of CONFIG_NR_DRAM_BANKS in this file
>>should be replaced with a variable for it. 
>>
>>> Your use case is well beyond the typical U-Boot usage. So I guess it
>>> will be up to Linaro to provide the necessary patches:
>>> 
>>> * determine the active CPU
>>> * determine the RAM assigned to the active CPU according
>>>   to the numa-node-id in the device-tree
>>> * make sure that U-Boot only uses the memory of the active CPU
>>>   internally
>>> * make sure that the UEFI memory map contains a compliant
>description
>>> * possibly, dynamically set up the environment variables
>>> 
>>> +CC Tuomas Tynkkynen (maintainer for qemu_arm64_defconfig)
>>
>>For (1), we'd better have a different config, or increase
>>the value of CONFIG_NR_DRAM_BANKS to a bigger number?
>
>Is the system configured such that each CPU can access the others CPU's
>RAM when entering U-Boot?
>
>Best regards
>
>Heinrich
>

At least the comments for this patch sound as if on a physical system cross NUMA node memory access is only available after full SMP initialization:

https://patchwork.kernel.org/project/linux-acpi/patch/20180625130552.5636-1-lorenzo.pieralisi@arm.com/

QEMU may be less restrictive.

QEMU allows the node distance to be 255 indicating that cross node access is infeasible.

Best regards

Heinrich

>>
>>-Takahiro Akashi
>>
>>
>>> Best regards
>>> 
>>> Heinrich


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: QEMU NUMA and U-Boot
  2021-07-07  3:59       ` Heinrich Schuchardt
@ 2021-07-07  7:40         ` François Ozog
  2021-07-07  9:37           ` François Ozog
  0 siblings, 1 reply; 13+ messages in thread
From: François Ozog @ 2021-07-07  7:40 UTC (permalink / raw)
  To: Heinrich Schuchardt
  Cc: AKASHI Takahiro, Ilias Apalodimas, Tuomas Tynkkynen, U-Boot Mailing List

On Wed, 7 Jul 2021 at 05:59, Heinrich Schuchardt <xypron.glpk@gmx.de> wrote:
>
> Am 7. Juli 2021 05:18:20 MESZ schrieb Heinrich Schuchardt <xypron.glpk@gmx.de>:
> >Am 7. Juli 2021 03:44:35 MESZ schrieb AKASHI Takahiro
> ><takahiro.akashi@linaro.org>:
> >>François,
> >>
> >>On Tue, Jul 06, 2021 at 08:10:08PM +0200, Heinrich Schuchardt wrote:
> >>> On 7/6/21 6:13 PM, François Ozog wrote:
> >>> > Hi Heinrich, U-Boot 2021-07rc5 does not take into account memory
> >>> > description when using Qemu 5.2 NUMA configuration to adapt memory
> >>map
> >>> > (kernel_addr_r...):
> >>> >
> >>> >         -smp 4 \
> >>> >           -m 8G,slots=2,maxmem=16G \
> >>> >          -object memory-backend-ram,size=4G,id=m0 \
> >>> >          -object memory-backend-ram,size=4G,id=m1 \
> >>> >          -numa node,cpus=0-1,nodeid=0,memdev=m0 \
> >>> >          -numa node,cpus=2-3,nodeid=1,memdev=m1
> >>> >
> >>> > kernel_addr_r is still 0x4040000 and thus you can't use it to
> >>bootefi.
> >>> >
> >>> > fdt addr 0x13ede6de0; fdt print
> >>> >
> >>> > Displays fdt while I think it should not.
> >>> >
> >>> > If I load the kernel at dram.start, the load works but not boot
> >>> >
> >>> > U-Boot 2021.07 (Jul 06 2021 - 13:26:43 +0000)
> >>> >
> >>> >
> >>> > DRAM:4 GiB
> >>> >
> >>> > Flash: 64 MiB
> >>> >
> >>> > Loading Environment from Flash... OK
> >>> >
> >>> > In:pl011@9000000
> >>> >
> >>> > Out: pl011@9000000
> >>> >
> >>> > Err: pl011@9000000
> >>> >
> >>> > Net: eth0: virtio-net#32
> >>> >
> >>> > Hit any key to stop autoboot:0
> >>> >
> >>> > =>
> >>> >
> >>> > => bdinfo
> >>> >
> >>> > boot_params = 0x0000000000000000
> >>> >
> >>> > DRAM bank = 0x0000000000000000
> >>> >
> >>> > -> start= 0x0000000140000000
> >>> >
> >>> > -> size = 0x0000000100000000
> >>> >
> >>> > flashstart= 0x0000000000000000
> >>> >
> >>> > flashsize = 0x0000000004000000
> >>> >
> >>> > flashoffset = 0x00000000000bc990
> >>> >
> >>> > baudrate= 115200 bps
> >>> >
> >>> > relocaddr = 0x000000013ff27000
> >>> >
> >>> > reloc off = 0x000000013ff27000
> >>> >
> >>> > Build = 64-bit
> >>> >
> >>> > current eth = virtio-net#32
> >>> >
> >>> > ethaddr = 52:52:52:52:52:52
> >>> >
> >>> > IP addr = <NULL>
> >>> >
> >>> > fdt_blob= 0x000000013ede6de0
> >>> >
> >>> > new_fdt = 0x000000013ede6de0
> >>> >
> >>> > fdt_size= 0x0000000000100000
> >>> >
> >>> > lmb_dump_all:
> >>> >
> >>> > memory.cnt= 0x1
> >>> >
> >>> > memory.reg[0x0].base = 0x140000000
> >>> >
> >>> > .size = 0x100000000
> >>> >
> >>> >
> >>> > reserved.cnt= 0x0
> >>> >
> >>> > arch_number = 0x0000000000000000
> >>> >
> >>> > TLB addr= 0x000000013fff0000
> >>> >
> >>> > irq_sp= 0x000000013ede6dd0
> >>> >
> >>> > sp start= 0x000000013ede6dd0
> >>> >
> >>> > Early malloc usage: 3a8 / 2000
> >>> >
> >>> > => load virtio 0:1 0x140000000 /oskit.efi
> >>> >
> >>> > 853424 bytes read in 1 ms (813.9 MiB/s)
> >>> >
> >>> > => bootefi0x140000000 0x13ede6dd0
> >>> >
> >>> > ERROR: Failed to register WaitForKey event
> >>> >
> >>> > Setting OsIndications failed
> >>> >
> >>> > Error: Cannot initialize UEFI sub-system, r = 9
> >>> >
> >>> >
> >>> > I think there is a need to calculate memory map based on previous
> >>> > firmware (TFA, QEMU can be considered as previous frimware)
> >>information
> >>> > (DT or blob_list).
> >>> >
> >>> > What do you think ?
> >>> >
> >>> > Cheers
> >>> >
> >>> > FF
> >>> >
> >>> > --
> >>> >
> >>> > François-Frédéric Ozog | /Director Business Development/
> >>> > T: +33.67221.6485
> >>> > francois.ozog@linaro.org <mailto:francois.ozog@linaro.org>
> >>| Skype: ffozog
> >>> >
> >>> >
> >>>
> >>> The kernel load address is hard coded here:
> >>> include/configs/qemu-arm.h:41:  "kernel_addr_r=0x40400000\0" \
> >>>
> >>> bdinfo shows:
> >>> DRAM start = 0x140000000
> >>> DRAM size  = 0x100000000
> >>>
> >>> fdt addr $fdt_addr
> >>> fdt printf
> >>>
> >>> shows two memory areas. One at 40000000, one at 140000000.
> >>
> >>(This shows that U-Boot receives a correct memory map via dtb.)
> >>
> >>Is this a NUMA machine, isn't it? Why should we care of which
> >>memory region be used here? Please note that this is a virtual
> >machine,
> >>there is no practical difference between two regions.
> >>
> >>The root problem is that U-Boot did not recognize there were two
> >>memory regions. We can fix this issue in either way:
> >>
> >>1)
> >>diff --git a/configs/qemu_arm64_defconfig
> >>b/configs/qemu_arm64_defconfig
> >>index f6e586627a8e..b70ffae8bf6e 100644
> >>--- a/configs/qemu_arm64_defconfig
> >>+++ b/configs/qemu_arm64_defconfig
> >>@@ -1,7 +1,7 @@
> >> CONFIG_ARM=y
> >> CONFIG_POSITION_INDEPENDENT=y
> >> CONFIG_ARCH_QEMU=y
> >>-CONFIG_NR_DRAM_BANKS=1
> >>+CONFIG_NR_DRAM_BANKS=2
> >> CONFIG_ENV_SIZE=0x40000
> >> CONFIG_ENV_SECT_SIZE=0x40000
> >> CONFIG_AHCI=y
> >>
> >>2)
> >>diff --git a/lib/fdtdec.c b/lib/fdtdec.c
> >>index 4b097fb588ed..4067ea2dead6 100644
> >>--- a/lib/fdtdec.c
> >>+++ b/lib/fdtdec.c
> >>@@ -1111,7 +1111,7 @@ int fdtdec_setup_memory_banksize(void)
> >>                return -EINVAL;
> >>        }
> >>
> >>-       for (bank = 0; bank < CONFIG_NR_DRAM_BANKS; bank++) {
> >>+       for (bank = 0; ; bank++) {
> >>                ret = ofnode_read_resource(mem, reg++, &res);
> >>                if (ret < 0) {
> >>                        reg = 0;
> >>
> >>   (fdtdec_setup_memory_banksize() is called in dram_init_banksize().)
> >>
> >>
> >>(2) seems much better, but I don't know why we had to use
> >>CONFIG_NR_DRAM_BANKS here.
> >>

2) alone does not work as other places in the code refer to
CONFIG_NR_DRAM_BANKS. Setting ...BANKS to 32 makes my code work and
bdinfo seems now correct:

=> bdinfo
boot_params = 0x0000000000000000
DRAM bank   = 0x0000000000000000
-> start    = 0x0000000140000000
-> size     = 0x0000000100000000
DRAM bank   = 0x0000000000000001
-> start    = 0x0000000040000000
-> size     = 0x0000000100000000
flashstart  = 0x0000000000000000
flashsize   = 0x0000000004000000
flashoffset = 0x00000000000bcb88
baudrate    = 115200 bps
relocaddr   = 0x000000013ff27000
reloc off   = 0x000000013ff27000
Build       = 64-bit
current eth = virtio-net#32
ethaddr     = 52:52:52:52:52:52
IP addr     = <NULL>
fdt_blob    = 0x000000013ede6cf0
new_fdt     = 0x000000013ede6cf0
fdt_size    = 0x0000000000100000
lmb_dump_all:
    memory.cnt   = 0x1
    memory.reg[0x0].base   = 0x40000000
  .size   = 0x200000000
    reserved.cnt   = 0x1
    reserved.reg[0x0].base = 0x13ede58f0
    .size = 0x121a710
arch_number = 0x0000000000000000
TLB addr    = 0x000000013fff0000
irq_sp      = 0x000000013ede6ce0
sp start    = 0x000000013ede6ce0
Early malloc usage: 3a8 / 2000

May I suggest you propose a combined patch Akashi-san? If we assume
NUMA systems to be tested up to 8 nodes to mimic real existing
enterprise hardware and up to 4 memory slots (say for memory hot
plugging tests) what about a default value of 32? Alternatively, we
could set this value to a much higher one if the costs are negligible.


> >>In this case, other occurrences of CONFIG_NR_DRAM_BANKS in this file
> >>should be replaced with a variable for it.
> >>
> >>> Your use case is well beyond the typical U-Boot usage. So I guess it
> >>> will be up to Linaro to provide the necessary patches:
> >>>
> >>> * determine the active CPU
> >>> * determine the RAM assigned to the active CPU according
> >>>   to the numa-node-id in the device-tree
> >>> * make sure that U-Boot only uses the memory of the active CPU
> >>>   internally
> >>> * make sure that the UEFI memory map contains a compliant
> >description
> >>> * possibly, dynamically set up the environment variables
> >>>
> >>> +CC Tuomas Tynkkynen (maintainer for qemu_arm64_defconfig)
> >>
> >>For (1), we'd better have a different config, or increase
> >>the value of CONFIG_NR_DRAM_BANKS to a bigger number?
> >
> >Is the system configured such that each CPU can access the others CPU's
> >RAM when entering U-Boot?
> >
> >Best regards
> >
> >Heinrich
> >
>
> At least the comments for this patch sound as if on a physical system cross NUMA node memory access is only available after full SMP initialization:
>
> https://patchwork.kernel.org/project/linux-acpi/patch/20180625130552.5636-1-lorenzo.pieralisi@arm.com/
>
> QEMU may be less restrictive.
>
> QEMU allows the node distance to be 255 indicating that cross node access is infeasible.
>
> Best regards
>
> Heinrich
>
> >>
> >>-Takahiro Akashi
> >>
> >>
> >>> Best regards
> >>>
> >>> Heinrich
>


-- 
François-Frédéric Ozog | Director Business Development
T: +33.67221.6485
francois.ozog@linaro.org | Skype: ffozog

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: QEMU NUMA and U-Boot
  2021-07-07  7:40         ` François Ozog
@ 2021-07-07  9:37           ` François Ozog
  2021-07-07 10:16             ` AKASHI Takahiro
  0 siblings, 1 reply; 13+ messages in thread
From: François Ozog @ 2021-07-07  9:37 UTC (permalink / raw)
  To: Heinrich Schuchardt
  Cc: AKASHI Takahiro, Ilias Apalodimas, Tuomas Tynkkynen, U-Boot Mailing List

On Wed, 7 Jul 2021 at 09:40, François Ozog <francois.ozog@linaro.org> wrote:

> On Wed, 7 Jul 2021 at 05:59, Heinrich Schuchardt <xypron.glpk@gmx.de>
> wrote:
> >
> > Am 7. Juli 2021 05:18:20 MESZ schrieb Heinrich Schuchardt <
> xypron.glpk@gmx.de>:
> > >Am 7. Juli 2021 03:44:35 MESZ schrieb AKASHI Takahiro
> > ><takahiro.akashi@linaro.org>:
> > >>François,
> > >>
> > >>On Tue, Jul 06, 2021 at 08:10:08PM +0200, Heinrich Schuchardt wrote:
> > >>> On 7/6/21 6:13 PM, François Ozog wrote:
> > >>> > Hi Heinrich, U-Boot 2021-07rc5 does not take into account memory
> > >>> > description when using Qemu 5.2 NUMA configuration to adapt memory
> > >>map
> > >>> > (kernel_addr_r...):
> > >>> >
> > >>> >         -smp 4 \
> > >>> >           -m 8G,slots=2,maxmem=16G \
> > >>> >          -object memory-backend-ram,size=4G,id=m0 \
> > >>> >          -object memory-backend-ram,size=4G,id=m1 \
> > >>> >          -numa node,cpus=0-1,nodeid=0,memdev=m0 \
> > >>> >          -numa node,cpus=2-3,nodeid=1,memdev=m1
> > >>> >
> > >>> > kernel_addr_r is still 0x4040000 and thus you can't use it to
> > >>bootefi.
> > >>> >
> > >>> > fdt addr 0x13ede6de0; fdt print
> > >>> >
> > >>> > Displays fdt while I think it should not.
> > >>> >
> > >>> > If I load the kernel at dram.start, the load works but not boot
> > >>> >
> > >>> > U-Boot 2021.07 (Jul 06 2021 - 13:26:43 +0000)
> > >>> >
> > >>> >
> > >>> > DRAM:4 GiB
> > >>> >
> > >>> > Flash: 64 MiB
> > >>> >
> > >>> > Loading Environment from Flash... OK
> > >>> >
> > >>> > In:pl011@9000000
> > >>> >
> > >>> > Out: pl011@9000000
> > >>> >
> > >>> > Err: pl011@9000000
> > >>> >
> > >>> > Net: eth0: virtio-net#32
> > >>> >
> > >>> > Hit any key to stop autoboot:0
> > >>> >
> > >>> > =>
> > >>> >
> > >>> > => bdinfo
> > >>> >
> > >>> > boot_params = 0x0000000000000000
> > >>> >
> > >>> > DRAM bank = 0x0000000000000000
> > >>> >
> > >>> > -> start= 0x0000000140000000
> > >>> >
> > >>> > -> size = 0x0000000100000000
> > >>> >
> > >>> > flashstart= 0x0000000000000000
> > >>> >
> > >>> > flashsize = 0x0000000004000000
> > >>> >
> > >>> > flashoffset = 0x00000000000bc990
> > >>> >
> > >>> > baudrate= 115200 bps
> > >>> >
> > >>> > relocaddr = 0x000000013ff27000
> > >>> >
> > >>> > reloc off = 0x000000013ff27000
> > >>> >
> > >>> > Build = 64-bit
> > >>> >
> > >>> > current eth = virtio-net#32
> > >>> >
> > >>> > ethaddr = 52:52:52:52:52:52
> > >>> >
> > >>> > IP addr = <NULL>
> > >>> >
> > >>> > fdt_blob= 0x000000013ede6de0
> > >>> >
> > >>> > new_fdt = 0x000000013ede6de0
> > >>> >
> > >>> > fdt_size= 0x0000000000100000
> > >>> >
> > >>> > lmb_dump_all:
> > >>> >
> > >>> > memory.cnt= 0x1
> > >>> >
> > >>> > memory.reg[0x0].base = 0x140000000
> > >>> >
> > >>> > .size = 0x100000000
> > >>> >
> > >>> >
> > >>> > reserved.cnt= 0x0
> > >>> >
> > >>> > arch_number = 0x0000000000000000
> > >>> >
> > >>> > TLB addr= 0x000000013fff0000
> > >>> >
> > >>> > irq_sp= 0x000000013ede6dd0
> > >>> >
> > >>> > sp start= 0x000000013ede6dd0
> > >>> >
> > >>> > Early malloc usage: 3a8 / 2000
> > >>> >
> > >>> > => load virtio 0:1 0x140000000 /oskit.efi
> > >>> >
> > >>> > 853424 bytes read in 1 ms (813.9 MiB/s)
> > >>> >
> > >>> > => bootefi0x140000000 0x13ede6dd0
> > >>> >
> > >>> > ERROR: Failed to register WaitForKey event
> > >>> >
> > >>> > Setting OsIndications failed
> > >>> >
> > >>> > Error: Cannot initialize UEFI sub-system, r = 9
> > >>> >
> > >>> >
> > >>> > I think there is a need to calculate memory map based on previous
> > >>> > firmware (TFA, QEMU can be considered as previous frimware)
> > >>information
> > >>> > (DT or blob_list).
> > >>> >
> > >>> > What do you think ?
> > >>> >
> > >>> > Cheers
> > >>> >
> > >>> > FF
> > >>> >
> > >>> > --
> > >>> >
> > >>> > François-Frédéric Ozog | /Director Business Development/
> > >>> > T: +33.67221.6485
> > >>> > francois.ozog@linaro.org <mailto:francois.ozog@linaro.org>
> > >>| Skype: ffozog
> > >>> >
> > >>> >
> > >>>
> > >>> The kernel load address is hard coded here:
> > >>> include/configs/qemu-arm.h:41:  "kernel_addr_r=0x40400000\0" \
> > >>>
> > >>> bdinfo shows:
> > >>> DRAM start = 0x140000000
> > >>> DRAM size  = 0x100000000
> > >>>
> > >>> fdt addr $fdt_addr
> > >>> fdt printf
> > >>>
> > >>> shows two memory areas. One at 40000000, one at 140000000.
> > >>
> > >>(This shows that U-Boot receives a correct memory map via dtb.)
> > >>
> > >>Is this a NUMA machine, isn't it? Why should we care of which
> > >>memory region be used here? Please note that this is a virtual
> > >machine,
> > >>there is no practical difference between two regions.
> > >>
> > >>The root problem is that U-Boot did not recognize there were two
> > >>memory regions. We can fix this issue in either way:
> > >>
> > >>1)
> > >>diff --git a/configs/qemu_arm64_defconfig
> > >>b/configs/qemu_arm64_defconfig
> > >>index f6e586627a8e..b70ffae8bf6e 100644
> > >>--- a/configs/qemu_arm64_defconfig
> > >>+++ b/configs/qemu_arm64_defconfig
> > >>@@ -1,7 +1,7 @@
> > >> CONFIG_ARM=y
> > >> CONFIG_POSITION_INDEPENDENT=y
> > >> CONFIG_ARCH_QEMU=y
> > >>-CONFIG_NR_DRAM_BANKS=1
> > >>+CONFIG_NR_DRAM_BANKS=2
> > >> CONFIG_ENV_SIZE=0x40000
> > >> CONFIG_ENV_SECT_SIZE=0x40000
> > >> CONFIG_AHCI=y
> > >>
> > >>2)
> > >>diff --git a/lib/fdtdec.c b/lib/fdtdec.c
> > >>index 4b097fb588ed..4067ea2dead6 100644
> > >>--- a/lib/fdtdec.c
> > >>+++ b/lib/fdtdec.c
> > >>@@ -1111,7 +1111,7 @@ int fdtdec_setup_memory_banksize(void)
> > >>                return -EINVAL;
> > >>        }
> > >>
> > >>-       for (bank = 0; bank < CONFIG_NR_DRAM_BANKS; bank++) {
> > >>+       for (bank = 0; ; bank++) {
> > >>                ret = ofnode_read_resource(mem, reg++, &res);
> > >>                if (ret < 0) {
> > >>                        reg = 0;
> > >>
> > >>   (fdtdec_setup_memory_banksize() is called in dram_init_banksize().)
> > >>
> > >>
> > >>(2) seems much better, but I don't know why we had to use
> > >>CONFIG_NR_DRAM_BANKS here.
> > >>
>
> 2) alone does not work as other places in the code refer to
> CONFIG_NR_DRAM_BANKS. Setting ...BANKS to 32 makes my code work and
> bdinfo seems now correct:
>
=> bdinfo
> boot_params = 0x0000000000000000
> DRAM bank   = 0x0000000000000000
> -> start    = 0x0000000140000000
> -> size     = 0x0000000100000000
> DRAM bank   = 0x0000000000000001
> -> start    = 0x0000000040000000
> -> size     = 0x0000000100000000
> flashstart  = 0x0000000000000000
> flashsize   = 0x0000000004000000
> flashoffset = 0x00000000000bcb88
> baudrate    = 115200 bps
> relocaddr   = 0x000000013ff27000
> reloc off   = 0x000000013ff27000
> Build       = 64-bit
> current eth = virtio-net#32
> ethaddr     = 52:52:52:52:52:52
> IP addr     = <NULL>
> fdt_blob    = 0x000000013ede6cf0
> new_fdt     = 0x000000013ede6cf0
> fdt_size    = 0x0000000000100000
> lmb_dump_all:
>     memory.cnt   = 0x1
>     memory.reg[0x0].base   = 0x40000000
>   .size   = 0x200000000
>     reserved.cnt   = 0x1
>     reserved.reg[0x0].base = 0x13ede58f0
>     .size = 0x121a710
> arch_number = 0x0000000000000000
> TLB addr    = 0x000000013fff0000
> irq_sp      = 0x000000013ede6ce0
> sp start    = 0x000000013ede6ce0
> Early malloc usage: 3a8 / 2000
>
> May I suggest you propose a combined patch Akashi-san? If we assume
> NUMA systems to be tested up to 8 nodes to mimic real existing
> enterprise hardware and up to 4 memory slots (say for memory hot
> plugging tests) what about a default value of 32? Alternatively, we
> could set this value to a much higher one if the costs are negligible.
>
>
> Well, lets not rush as there are other twists:

the 4G bank in node 1 is marked BootServicesData in the UEFI GetMemoryMap
which I assume is not the case. EDK2 reports it as ConventionalMemory.

The root cause seem to be gd->ramtop not being setup properly.

Further analysis shows that the DT passed to the booted EFI payload does
not seem to be correct:

DT fragment passed to U-Boot

memory@140000000 {
numa-node-id = <0x00000001>;
reg = <0x00000001 0x40000000 0x00000001 0x00000000>;
device_type = "memory";
};
memory@40000000 {
numa-node-id = <0x00000000>;
reg = <0x00000000 0x40000000 0x00000001 0x00000000>;
device_type = "memory";
};

DT passed to payload (as per my debug code):

memory@140000000: memory

    numa-node-id 1

    reg (len= 32)

         140000000 100000000

         40000000 100000000

memory@40000000: memory

    numa-node-id 0

    reg (len= 16)

         40000000 100000000

I am investigating this further...

> >>In this case, other occurrences of CONFIG_NR_DRAM_BANKS in this file
> > >>should be replaced with a variable for it.
> > >>
> > >>> Your use case is well beyond the typical U-Boot usage. So I guess it
> > >>> will be up to Linaro to provide the necessary patches:
> > >>>
> > >>> * determine the active CPU
> > >>> * determine the RAM assigned to the active CPU according
> > >>>   to the numa-node-id in the device-tree
> > >>> * make sure that U-Boot only uses the memory of the active CPU
> > >>>   internally
> > >>> * make sure that the UEFI memory map contains a compliant
> > >description
> > >>> * possibly, dynamically set up the environment variables
> > >>>
> > >>> +CC Tuomas Tynkkynen (maintainer for qemu_arm64_defconfig)
> > >>
> > >>For (1), we'd better have a different config, or increase
> > >>the value of CONFIG_NR_DRAM_BANKS to a bigger number?
> > >
> > >Is the system configured such that each CPU can access the others CPU's
> > >RAM when entering U-Boot?
> > >
> > >Best regards
> > >
> > >Heinrich
> > >
> >
> > At least the comments for this patch sound as if on a physical system
> cross NUMA node memory access is only available after full SMP
> initialization:
> >
> >
> https://patchwork.kernel.org/project/linux-acpi/patch/20180625130552.5636-1-lorenzo.pieralisi@arm.com/
> >
> > QEMU may be less restrictive.
> >
> > QEMU allows the node distance to be 255 indicating that cross node
> access is infeasible.
> >
> > Best regards
> >
> > Heinrich
> >
> > >>
> > >>-Takahiro Akashi
> > >>
> > >>
> > >>> Best regards
> > >>>
> > >>> Heinrich
> >
>
>
> --
> François-Frédéric Ozog | Director Business Development
> T: +33.67221.6485
> francois.ozog@linaro.org | Skype: ffozog
>


-- 
François-Frédéric Ozog | *Director Business Development*
T: +33.67221.6485
francois.ozog@linaro.org | Skype: ffozog

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: QEMU NUMA and U-Boot
  2021-07-07  9:37           ` François Ozog
@ 2021-07-07 10:16             ` AKASHI Takahiro
  2021-07-07 11:00               ` François Ozog
  0 siblings, 1 reply; 13+ messages in thread
From: AKASHI Takahiro @ 2021-07-07 10:16 UTC (permalink / raw)
  To: Fran??ois Ozog
  Cc: Heinrich Schuchardt, Ilias Apalodimas, Tuomas Tynkkynen,
	U-Boot Mailing List

On Wed, Jul 07, 2021 at 11:37:19AM +0200, Fran??ois Ozog wrote:
> On Wed, 7 Jul 2021 at 09:40, François Ozog <francois.ozog@linaro.org> wrote:
> 
> > On Wed, 7 Jul 2021 at 05:59, Heinrich Schuchardt <xypron.glpk@gmx.de>
> > wrote:
> > >
> > > Am 7. Juli 2021 05:18:20 MESZ schrieb Heinrich Schuchardt <
> > xypron.glpk@gmx.de>:
> > > >Am 7. Juli 2021 03:44:35 MESZ schrieb AKASHI Takahiro
> > > ><takahiro.akashi@linaro.org>:
> > > >>François,
> > > >>
> > > >>On Tue, Jul 06, 2021 at 08:10:08PM +0200, Heinrich Schuchardt wrote:
> > > >>> On 7/6/21 6:13 PM, François Ozog wrote:
> > > >>> > Hi Heinrich, U-Boot 2021-07rc5 does not take into account memory
> > > >>> > description when using Qemu 5.2 NUMA configuration to adapt memory
> > > >>map
> > > >>> > (kernel_addr_r...):
> > > >>> >
> > > >>> >         -smp 4 \
> > > >>> >           -m 8G,slots=2,maxmem=16G \
> > > >>> >          -object memory-backend-ram,size=4G,id=m0 \
> > > >>> >          -object memory-backend-ram,size=4G,id=m1 \
> > > >>> >          -numa node,cpus=0-1,nodeid=0,memdev=m0 \
> > > >>> >          -numa node,cpus=2-3,nodeid=1,memdev=m1
> > > >>> >
> > > >>> > kernel_addr_r is still 0x4040000 and thus you can't use it to
> > > >>bootefi.
> > > >>> >
> > > >>> > fdt addr 0x13ede6de0; fdt print
> > > >>> >
> > > >>> > Displays fdt while I think it should not.
> > > >>> >
> > > >>> > If I load the kernel at dram.start, the load works but not boot
> > > >>> >
> > > >>> > U-Boot 2021.07 (Jul 06 2021 - 13:26:43 +0000)
> > > >>> >
> > > >>> >
> > > >>> > DRAM:4 GiB
> > > >>> >
> > > >>> > Flash: 64 MiB
> > > >>> >
> > > >>> > Loading Environment from Flash... OK
> > > >>> >
> > > >>> > In:pl011@9000000
> > > >>> >
> > > >>> > Out: pl011@9000000
> > > >>> >
> > > >>> > Err: pl011@9000000
> > > >>> >
> > > >>> > Net: eth0: virtio-net#32
> > > >>> >
> > > >>> > Hit any key to stop autoboot:0
> > > >>> >
> > > >>> > =>
> > > >>> >
> > > >>> > => bdinfo
> > > >>> >
> > > >>> > boot_params = 0x0000000000000000
> > > >>> >
> > > >>> > DRAM bank = 0x0000000000000000
> > > >>> >
> > > >>> > -> start= 0x0000000140000000
> > > >>> >
> > > >>> > -> size = 0x0000000100000000
> > > >>> >
> > > >>> > flashstart= 0x0000000000000000
> > > >>> >
> > > >>> > flashsize = 0x0000000004000000
> > > >>> >
> > > >>> > flashoffset = 0x00000000000bc990
> > > >>> >
> > > >>> > baudrate= 115200 bps
> > > >>> >
> > > >>> > relocaddr = 0x000000013ff27000
> > > >>> >
> > > >>> > reloc off = 0x000000013ff27000
> > > >>> >
> > > >>> > Build = 64-bit
> > > >>> >
> > > >>> > current eth = virtio-net#32
> > > >>> >
> > > >>> > ethaddr = 52:52:52:52:52:52
> > > >>> >
> > > >>> > IP addr = <NULL>
> > > >>> >
> > > >>> > fdt_blob= 0x000000013ede6de0
> > > >>> >
> > > >>> > new_fdt = 0x000000013ede6de0
> > > >>> >
> > > >>> > fdt_size= 0x0000000000100000
> > > >>> >
> > > >>> > lmb_dump_all:
> > > >>> >
> > > >>> > memory.cnt= 0x1
> > > >>> >
> > > >>> > memory.reg[0x0].base = 0x140000000
> > > >>> >
> > > >>> > .size = 0x100000000
> > > >>> >
> > > >>> >
> > > >>> > reserved.cnt= 0x0
> > > >>> >
> > > >>> > arch_number = 0x0000000000000000
> > > >>> >
> > > >>> > TLB addr= 0x000000013fff0000
> > > >>> >
> > > >>> > irq_sp= 0x000000013ede6dd0
> > > >>> >
> > > >>> > sp start= 0x000000013ede6dd0
> > > >>> >
> > > >>> > Early malloc usage: 3a8 / 2000
> > > >>> >
> > > >>> > => load virtio 0:1 0x140000000 /oskit.efi
> > > >>> >
> > > >>> > 853424 bytes read in 1 ms (813.9 MiB/s)
> > > >>> >
> > > >>> > => bootefi0x140000000 0x13ede6dd0
> > > >>> >
> > > >>> > ERROR: Failed to register WaitForKey event
> > > >>> >
> > > >>> > Setting OsIndications failed
> > > >>> >
> > > >>> > Error: Cannot initialize UEFI sub-system, r = 9
> > > >>> >
> > > >>> >
> > > >>> > I think there is a need to calculate memory map based on previous
> > > >>> > firmware (TFA, QEMU can be considered as previous frimware)
> > > >>information
> > > >>> > (DT or blob_list).
> > > >>> >
> > > >>> > What do you think ?
> > > >>> >
> > > >>> > Cheers
> > > >>> >
> > > >>> > FF
> > > >>> >
> > > >>> > --
> > > >>> >
> > > >>> > François-Frédéric Ozog | /Director Business Development/
> > > >>> > T: +33.67221.6485
> > > >>> > francois.ozog@linaro.org <mailto:francois.ozog@linaro.org>
> > > >>| Skype: ffozog
> > > >>> >
> > > >>> >
> > > >>>
> > > >>> The kernel load address is hard coded here:
> > > >>> include/configs/qemu-arm.h:41:  "kernel_addr_r=0x40400000\0" \
> > > >>>
> > > >>> bdinfo shows:
> > > >>> DRAM start = 0x140000000
> > > >>> DRAM size  = 0x100000000
> > > >>>
> > > >>> fdt addr $fdt_addr
> > > >>> fdt printf
> > > >>>
> > > >>> shows two memory areas. One at 40000000, one at 140000000.
> > > >>
> > > >>(This shows that U-Boot receives a correct memory map via dtb.)
> > > >>
> > > >>Is this a NUMA machine, isn't it? Why should we care of which
> > > >>memory region be used here? Please note that this is a virtual
> > > >machine,
> > > >>there is no practical difference between two regions.
> > > >>
> > > >>The root problem is that U-Boot did not recognize there were two
> > > >>memory regions. We can fix this issue in either way:
> > > >>
> > > >>1)
> > > >>diff --git a/configs/qemu_arm64_defconfig
> > > >>b/configs/qemu_arm64_defconfig
> > > >>index f6e586627a8e..b70ffae8bf6e 100644
> > > >>--- a/configs/qemu_arm64_defconfig
> > > >>+++ b/configs/qemu_arm64_defconfig
> > > >>@@ -1,7 +1,7 @@
> > > >> CONFIG_ARM=y
> > > >> CONFIG_POSITION_INDEPENDENT=y
> > > >> CONFIG_ARCH_QEMU=y
> > > >>-CONFIG_NR_DRAM_BANKS=1
> > > >>+CONFIG_NR_DRAM_BANKS=2
> > > >> CONFIG_ENV_SIZE=0x40000
> > > >> CONFIG_ENV_SECT_SIZE=0x40000
> > > >> CONFIG_AHCI=y
> > > >>
> > > >>2)
> > > >>diff --git a/lib/fdtdec.c b/lib/fdtdec.c
> > > >>index 4b097fb588ed..4067ea2dead6 100644
> > > >>--- a/lib/fdtdec.c
> > > >>+++ b/lib/fdtdec.c
> > > >>@@ -1111,7 +1111,7 @@ int fdtdec_setup_memory_banksize(void)
> > > >>                return -EINVAL;
> > > >>        }
> > > >>
> > > >>-       for (bank = 0; bank < CONFIG_NR_DRAM_BANKS; bank++) {
> > > >>+       for (bank = 0; ; bank++) {
> > > >>                ret = ofnode_read_resource(mem, reg++, &res);
> > > >>                if (ret < 0) {
> > > >>                        reg = 0;
> > > >>
> > > >>   (fdtdec_setup_memory_banksize() is called in dram_init_banksize().)
> > > >>
> > > >>
> > > >>(2) seems much better, but I don't know why we had to use
> > > >>CONFIG_NR_DRAM_BANKS here.
> > > >>
> >
> > 2) alone does not work as other places in the code refer to
> > CONFIG_NR_DRAM_BANKS. Setting ...BANKS to 32 makes my code work and
> > bdinfo seems now correct:
> >
> => bdinfo
> > boot_params = 0x0000000000000000
> > DRAM bank   = 0x0000000000000000
> > -> start    = 0x0000000140000000
> > -> size     = 0x0000000100000000
> > DRAM bank   = 0x0000000000000001
> > -> start    = 0x0000000040000000
> > -> size     = 0x0000000100000000
> > flashstart  = 0x0000000000000000
> > flashsize   = 0x0000000004000000
> > flashoffset = 0x00000000000bcb88
> > baudrate    = 115200 bps
> > relocaddr   = 0x000000013ff27000
> > reloc off   = 0x000000013ff27000
> > Build       = 64-bit
> > current eth = virtio-net#32
> > ethaddr     = 52:52:52:52:52:52
> > IP addr     = <NULL>
> > fdt_blob    = 0x000000013ede6cf0
> > new_fdt     = 0x000000013ede6cf0
> > fdt_size    = 0x0000000000100000
> > lmb_dump_all:
> >     memory.cnt   = 0x1
> >     memory.reg[0x0].base   = 0x40000000
> >   .size   = 0x200000000
> >     reserved.cnt   = 0x1
> >     reserved.reg[0x0].base = 0x13ede58f0
> >     .size = 0x121a710
> > arch_number = 0x0000000000000000
> > TLB addr    = 0x000000013fff0000
> > irq_sp      = 0x000000013ede6ce0
> > sp start    = 0x000000013ede6ce0
> > Early malloc usage: 3a8 / 2000
> >
> > May I suggest you propose a combined patch Akashi-san? If we assume
> > NUMA systems to be tested up to 8 nodes to mimic real existing
> > enterprise hardware and up to 4 memory slots (say for memory hot
> > plugging tests) what about a default value of 32? Alternatively, we
> > could set this value to a much higher one if the costs are negligible.
> >
> >
> > Well, lets not rush as there are other twists:
> 
> the 4G bank in node 1 is marked BootServicesData in the UEFI GetMemoryMap
> which I assume is not the case. EDK2 reports it as ConventionalMemory.
> 
> The root cause seem to be gd->ramtop not being setup properly.
> 
> Further analysis shows that the DT passed to the booted EFI payload does
> not seem to be correct:
> 
> DT fragment passed to U-Boot
> 
> memory@140000000 {
> numa-node-id = <0x00000001>;
> reg = <0x00000001 0x40000000 0x00000001 0x00000000>;
> device_type = "memory";
> };
> memory@40000000 {
> numa-node-id = <0x00000000>;
> reg = <0x00000000 0x40000000 0x00000001 0x00000000>;
> device_type = "memory";
> };
> 
> DT passed to payload (as per my debug code):
> 
> memory@140000000: memory
> 
>     numa-node-id 1
> 
>     reg (len= 32)
> 
>          140000000 100000000
> 
>          40000000 100000000
> 
> memory@40000000: memory
> 
>     numa-node-id 0
> 
>     reg (len= 16)
> 
>          40000000 100000000
> 
> I am investigating this further...

You should check the logic of fdt_fixup_memory_banks()
which is called this way:
  efi_dt_fixup()
    image_setup_libfdt()
      arch_fixup_fdt()
        fdt_fixup_memory_banks()

What it does is to put *all* the memory regions unconditionally as
a single "reg" array into the *first-detected* "memory" node, which is
"memory@140000000" in this case.
It means that this function doesn't respect NUMA configuration.

-Takahiro Akashi


> > >>In this case, other occurrences of CONFIG_NR_DRAM_BANKS in this file
> > > >>should be replaced with a variable for it.
> > > >>
> > > >>> Your use case is well beyond the typical U-Boot usage. So I guess it
> > > >>> will be up to Linaro to provide the necessary patches:
> > > >>>
> > > >>> * determine the active CPU
> > > >>> * determine the RAM assigned to the active CPU according
> > > >>>   to the numa-node-id in the device-tree
> > > >>> * make sure that U-Boot only uses the memory of the active CPU
> > > >>>   internally
> > > >>> * make sure that the UEFI memory map contains a compliant
> > > >description
> > > >>> * possibly, dynamically set up the environment variables
> > > >>>
> > > >>> +CC Tuomas Tynkkynen (maintainer for qemu_arm64_defconfig)
> > > >>
> > > >>For (1), we'd better have a different config, or increase
> > > >>the value of CONFIG_NR_DRAM_BANKS to a bigger number?
> > > >
> > > >Is the system configured such that each CPU can access the others CPU's
> > > >RAM when entering U-Boot?
> > > >
> > > >Best regards
> > > >
> > > >Heinrich
> > > >
> > >
> > > At least the comments for this patch sound as if on a physical system
> > cross NUMA node memory access is only available after full SMP
> > initialization:
> > >
> > >
> > https://patchwork.kernel.org/project/linux-acpi/patch/20180625130552.5636-1-lorenzo.pieralisi@arm.com/
> > >
> > > QEMU may be less restrictive.
> > >
> > > QEMU allows the node distance to be 255 indicating that cross node
> > access is infeasible.
> > >
> > > Best regards
> > >
> > > Heinrich
> > >
> > > >>
> > > >>-Takahiro Akashi
> > > >>
> > > >>
> > > >>> Best regards
> > > >>>
> > > >>> Heinrich
> > >
> >
> >
> > --
> > François-Frédéric Ozog | Director Business Development
> > T: +33.67221.6485
> > francois.ozog@linaro.org | Skype: ffozog
> >
> 
> 
> -- 
> François-Frédéric Ozog | *Director Business Development*
> T: +33.67221.6485
> francois.ozog@linaro.org | Skype: ffozog

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: QEMU NUMA and U-Boot
  2021-07-07 10:16             ` AKASHI Takahiro
@ 2021-07-07 11:00               ` François Ozog
  2021-07-07 15:15                 ` François Ozog
  0 siblings, 1 reply; 13+ messages in thread
From: François Ozog @ 2021-07-07 11:00 UTC (permalink / raw)
  To: AKASHI Takahiro, Fran??ois Ozog, Heinrich Schuchardt,
	Ilias Apalodimas, Tuomas Tynkkynen, U-Boot Mailing List

On Wed, 7 Jul 2021 at 12:16, AKASHI Takahiro <takahiro.akashi@linaro.org>
wrote:

> On Wed, Jul 07, 2021 at 11:37:19AM +0200, Fran??ois Ozog wrote:
> > On Wed, 7 Jul 2021 at 09:40, François Ozog <francois.ozog@linaro.org>
> wrote:
> >
> > > On Wed, 7 Jul 2021 at 05:59, Heinrich Schuchardt <xypron.glpk@gmx.de>
> > > wrote:
> > > >
> > > > Am 7. Juli 2021 05:18:20 MESZ schrieb Heinrich Schuchardt <
> > > xypron.glpk@gmx.de>:
> > > > >Am 7. Juli 2021 03:44:35 MESZ schrieb AKASHI Takahiro
> > > > ><takahiro.akashi@linaro.org>:
> > > > >>François,
> > > > >>
> > > > >>On Tue, Jul 06, 2021 at 08:10:08PM +0200, Heinrich Schuchardt
> wrote:
> > > > >>> On 7/6/21 6:13 PM, François Ozog wrote:
> > > > >>> > Hi Heinrich, U-Boot 2021-07rc5 does not take into account
> memory
> > > > >>> > description when using Qemu 5.2 NUMA configuration to adapt
> memory
> > > > >>map
> > > > >>> > (kernel_addr_r...):
> > > > >>> >
> > > > >>> >         -smp 4 \
> > > > >>> >           -m 8G,slots=2,maxmem=16G \
> > > > >>> >          -object memory-backend-ram,size=4G,id=m0 \
> > > > >>> >          -object memory-backend-ram,size=4G,id=m1 \
> > > > >>> >          -numa node,cpus=0-1,nodeid=0,memdev=m0 \
> > > > >>> >          -numa node,cpus=2-3,nodeid=1,memdev=m1
> > > > >>> >
> > > > >>> > kernel_addr_r is still 0x4040000 and thus you can't use it to
> > > > >>bootefi.
> > > > >>> >
> > > > >>> > fdt addr 0x13ede6de0; fdt print
> > > > >>> >
> > > > >>> > Displays fdt while I think it should not.
> > > > >>> >
> > > > >>> > If I load the kernel at dram.start, the load works but not boot
> > > > >>> >
> > > > >>> > U-Boot 2021.07 (Jul 06 2021 - 13:26:43 +0000)
> > > > >>> >
> > > > >>> >
> > > > >>> > DRAM:4 GiB
> > > > >>> >
> > > > >>> > Flash: 64 MiB
> > > > >>> >
> > > > >>> > Loading Environment from Flash... OK
> > > > >>> >
> > > > >>> > In:pl011@9000000
> > > > >>> >
> > > > >>> > Out: pl011@9000000
> > > > >>> >
> > > > >>> > Err: pl011@9000000
> > > > >>> >
> > > > >>> > Net: eth0: virtio-net#32
> > > > >>> >
> > > > >>> > Hit any key to stop autoboot:0
> > > > >>> >
> > > > >>> > =>
> > > > >>> >
> > > > >>> > => bdinfo
> > > > >>> >
> > > > >>> > boot_params = 0x0000000000000000
> > > > >>> >
> > > > >>> > DRAM bank = 0x0000000000000000
> > > > >>> >
> > > > >>> > -> start= 0x0000000140000000
> > > > >>> >
> > > > >>> > -> size = 0x0000000100000000
> > > > >>> >
> > > > >>> > flashstart= 0x0000000000000000
> > > > >>> >
> > > > >>> > flashsize = 0x0000000004000000
> > > > >>> >
> > > > >>> > flashoffset = 0x00000000000bc990
> > > > >>> >
> > > > >>> > baudrate= 115200 bps
> > > > >>> >
> > > > >>> > relocaddr = 0x000000013ff27000
> > > > >>> >
> > > > >>> > reloc off = 0x000000013ff27000
> > > > >>> >
> > > > >>> > Build = 64-bit
> > > > >>> >
> > > > >>> > current eth = virtio-net#32
> > > > >>> >
> > > > >>> > ethaddr = 52:52:52:52:52:52
> > > > >>> >
> > > > >>> > IP addr = <NULL>
> > > > >>> >
> > > > >>> > fdt_blob= 0x000000013ede6de0
> > > > >>> >
> > > > >>> > new_fdt = 0x000000013ede6de0
> > > > >>> >
> > > > >>> > fdt_size= 0x0000000000100000
> > > > >>> >
> > > > >>> > lmb_dump_all:
> > > > >>> >
> > > > >>> > memory.cnt= 0x1
> > > > >>> >
> > > > >>> > memory.reg[0x0].base = 0x140000000
> > > > >>> >
> > > > >>> > .size = 0x100000000
> > > > >>> >
> > > > >>> >
> > > > >>> > reserved.cnt= 0x0
> > > > >>> >
> > > > >>> > arch_number = 0x0000000000000000
> > > > >>> >
> > > > >>> > TLB addr= 0x000000013fff0000
> > > > >>> >
> > > > >>> > irq_sp= 0x000000013ede6dd0
> > > > >>> >
> > > > >>> > sp start= 0x000000013ede6dd0
> > > > >>> >
> > > > >>> > Early malloc usage: 3a8 / 2000
> > > > >>> >
> > > > >>> > => load virtio 0:1 0x140000000 /oskit.efi
> > > > >>> >
> > > > >>> > 853424 bytes read in 1 ms (813.9 MiB/s)
> > > > >>> >
> > > > >>> > => bootefi0x140000000 0x13ede6dd0
> > > > >>> >
> > > > >>> > ERROR: Failed to register WaitForKey event
> > > > >>> >
> > > > >>> > Setting OsIndications failed
> > > > >>> >
> > > > >>> > Error: Cannot initialize UEFI sub-system, r = 9
> > > > >>> >
> > > > >>> >
> > > > >>> > I think there is a need to calculate memory map based on
> previous
> > > > >>> > firmware (TFA, QEMU can be considered as previous frimware)
> > > > >>information
> > > > >>> > (DT or blob_list).
> > > > >>> >
> > > > >>> > What do you think ?
> > > > >>> >
> > > > >>> > Cheers
> > > > >>> >
> > > > >>> > FF
> > > > >>> >
> > > > >>> > --
> > > > >>> >
> > > > >>> > François-Frédéric Ozog | /Director Business Development/
> > > > >>> > T: +33.67221.6485
> > > > >>> > francois.ozog@linaro.org <mailto:francois.ozog@linaro.org>
> > > > >>| Skype: ffozog
> > > > >>> >
> > > > >>> >
> > > > >>>
> > > > >>> The kernel load address is hard coded here:
> > > > >>> include/configs/qemu-arm.h:41:  "kernel_addr_r=0x40400000\0" \
> > > > >>>
> > > > >>> bdinfo shows:
> > > > >>> DRAM start = 0x140000000
> > > > >>> DRAM size  = 0x100000000
> > > > >>>
> > > > >>> fdt addr $fdt_addr
> > > > >>> fdt printf
> > > > >>>
> > > > >>> shows two memory areas. One at 40000000, one at 140000000.
> > > > >>
> > > > >>(This shows that U-Boot receives a correct memory map via dtb.)
> > > > >>
> > > > >>Is this a NUMA machine, isn't it? Why should we care of which
> > > > >>memory region be used here? Please note that this is a virtual
> > > > >machine,
> > > > >>there is no practical difference between two regions.
> > > > >>
> > > > >>The root problem is that U-Boot did not recognize there were two
> > > > >>memory regions. We can fix this issue in either way:
> > > > >>
> > > > >>1)
> > > > >>diff --git a/configs/qemu_arm64_defconfig
> > > > >>b/configs/qemu_arm64_defconfig
> > > > >>index f6e586627a8e..b70ffae8bf6e 100644
> > > > >>--- a/configs/qemu_arm64_defconfig
> > > > >>+++ b/configs/qemu_arm64_defconfig
> > > > >>@@ -1,7 +1,7 @@
> > > > >> CONFIG_ARM=y
> > > > >> CONFIG_POSITION_INDEPENDENT=y
> > > > >> CONFIG_ARCH_QEMU=y
> > > > >>-CONFIG_NR_DRAM_BANKS=1
> > > > >>+CONFIG_NR_DRAM_BANKS=2
> > > > >> CONFIG_ENV_SIZE=0x40000
> > > > >> CONFIG_ENV_SECT_SIZE=0x40000
> > > > >> CONFIG_AHCI=y
> > > > >>
> > > > >>2)
> > > > >>diff --git a/lib/fdtdec.c b/lib/fdtdec.c
> > > > >>index 4b097fb588ed..4067ea2dead6 100644
> > > > >>--- a/lib/fdtdec.c
> > > > >>+++ b/lib/fdtdec.c
> > > > >>@@ -1111,7 +1111,7 @@ int fdtdec_setup_memory_banksize(void)
> > > > >>                return -EINVAL;
> > > > >>        }
> > > > >>
> > > > >>-       for (bank = 0; bank < CONFIG_NR_DRAM_BANKS; bank++) {
> > > > >>+       for (bank = 0; ; bank++) {
> > > > >>                ret = ofnode_read_resource(mem, reg++, &res);
> > > > >>                if (ret < 0) {
> > > > >>                        reg = 0;
> > > > >>
> > > > >>   (fdtdec_setup_memory_banksize() is called in
> dram_init_banksize().)
> > > > >>
> > > > >>
> > > > >>(2) seems much better, but I don't know why we had to use
> > > > >>CONFIG_NR_DRAM_BANKS here.
> > > > >>
> > >
> > > 2) alone does not work as other places in the code refer to
> > > CONFIG_NR_DRAM_BANKS. Setting ...BANKS to 32 makes my code work and
> > > bdinfo seems now correct:
> > >
> > => bdinfo
> > > boot_params = 0x0000000000000000
> > > DRAM bank   = 0x0000000000000000
> > > -> start    = 0x0000000140000000
> > > -> size     = 0x0000000100000000
> > > DRAM bank   = 0x0000000000000001
> > > -> start    = 0x0000000040000000
> > > -> size     = 0x0000000100000000
> > > flashstart  = 0x0000000000000000
> > > flashsize   = 0x0000000004000000
> > > flashoffset = 0x00000000000bcb88
> > > baudrate    = 115200 bps
> > > relocaddr   = 0x000000013ff27000
> > > reloc off   = 0x000000013ff27000
> > > Build       = 64-bit
> > > current eth = virtio-net#32
> > > ethaddr     = 52:52:52:52:52:52
> > > IP addr     = <NULL>
> > > fdt_blob    = 0x000000013ede6cf0
> > > new_fdt     = 0x000000013ede6cf0
> > > fdt_size    = 0x0000000000100000
> > > lmb_dump_all:
> > >     memory.cnt   = 0x1
> > >     memory.reg[0x0].base   = 0x40000000
> > >   .size   = 0x200000000
> > >     reserved.cnt   = 0x1
> > >     reserved.reg[0x0].base = 0x13ede58f0
> > >     .size = 0x121a710
> > > arch_number = 0x0000000000000000
> > > TLB addr    = 0x000000013fff0000
> > > irq_sp      = 0x000000013ede6ce0
> > > sp start    = 0x000000013ede6ce0
> > > Early malloc usage: 3a8 / 2000
> > >
> > > May I suggest you propose a combined patch Akashi-san? If we assume
> > > NUMA systems to be tested up to 8 nodes to mimic real existing
> > > enterprise hardware and up to 4 memory slots (say for memory hot
> > > plugging tests) what about a default value of 32? Alternatively, we
> > > could set this value to a much higher one if the costs are negligible.
> > >
> > >
> > > Well, lets not rush as there are other twists:
> >
> > the 4G bank in node 1 is marked BootServicesData in the UEFI GetMemoryMap
> > which I assume is not the case. EDK2 reports it as ConventionalMemory.
> >
> > The root cause seem to be gd->ramtop not being setup properly.
> >
> > Further analysis shows that the DT passed to the booted EFI payload does
> > not seem to be correct:
> >
> > DT fragment passed to U-Boot
> >
> > memory@140000000 {
> > numa-node-id = <0x00000001>;
> > reg = <0x00000001 0x40000000 0x00000001 0x00000000>;
> > device_type = "memory";
> > };
> > memory@40000000 {
> > numa-node-id = <0x00000000>;
> > reg = <0x00000000 0x40000000 0x00000001 0x00000000>;
> > device_type = "memory";
> > };
> >
> > DT passed to payload (as per my debug code):
> >
> > memory@140000000: memory
> >
> >     numa-node-id 1
> >
> >     reg (len= 32)
> >
> >          140000000 100000000
> >
> >          40000000 100000000
> >
> > memory@40000000: memory
> >
> >     numa-node-id 0
> >
> >     reg (len= 16)
> >
> >          40000000 100000000
> >
> > I am investigating this further...
>
> You should check the logic of fdt_fixup_memory_banks()
> which is called this way:
>   efi_dt_fixup()
>     image_setup_libfdt()
>       arch_fixup_fdt()
>         fdt_fixup_memory_banks()
>
> What it does is to put *all* the memory regions unconditionally as
> a single "reg" array into the *first-detected* "memory" node, which is
> "memory@140000000" in this case.
> It means that this function doesn't respect NUMA configuration.
>
> Thanks.

tweaking ram_top to be the correct in
https://elixir.bootlin.com/u-boot/latest/source/lib/efi_loader/efi_memory.c#L732
results in memory on node 1 to be considered as ConventionalMemory but
U-Boot places all code and data at the end of node 1 while it should
position it on the current node. That said it is an acceptable work around
for my test case.

Bottom line, we need to introduce NUMA node management in memory management
all over the place.
It is unclear if there is a business case for that. I'll ask LEDGE
members...


> -Takahiro Akashi
>
>
> > > >>In this case, other occurrences of CONFIG_NR_DRAM_BANKS in this file
> > > > >>should be replaced with a variable for it.
> > > > >>
> > > > >>> Your use case is well beyond the typical U-Boot usage. So I
> guess it
> > > > >>> will be up to Linaro to provide the necessary patches:
> > > > >>>
> > > > >>> * determine the active CPU
> > > > >>> * determine the RAM assigned to the active CPU according
> > > > >>>   to the numa-node-id in the device-tree
> > > > >>> * make sure that U-Boot only uses the memory of the active CPU
> > > > >>>   internally
> > > > >>> * make sure that the UEFI memory map contains a compliant
> > > > >description
> > > > >>> * possibly, dynamically set up the environment variables
> > > > >>>
> > > > >>> +CC Tuomas Tynkkynen (maintainer for qemu_arm64_defconfig)
> > > > >>
> > > > >>For (1), we'd better have a different config, or increase
> > > > >>the value of CONFIG_NR_DRAM_BANKS to a bigger number?
> > > > >
> > > > >Is the system configured such that each CPU can access the others
> CPU's
> > > > >RAM when entering U-Boot?
> > > > >
> > > > >Best regards
> > > > >
> > > > >Heinrich
> > > > >
> > > >
> > > > At least the comments for this patch sound as if on a physical system
> > > cross NUMA node memory access is only available after full SMP
> > > initialization:
> > > >
> > > >
> > >
> https://patchwork.kernel.org/project/linux-acpi/patch/20180625130552.5636-1-lorenzo.pieralisi@arm.com/
> > > >
> > > > QEMU may be less restrictive.
> > > >
> > > > QEMU allows the node distance to be 255 indicating that cross node
> > > access is infeasible.
> > > >
> > > > Best regards
> > > >
> > > > Heinrich
> > > >
> > > > >>
> > > > >>-Takahiro Akashi
> > > > >>
> > > > >>
> > > > >>> Best regards
> > > > >>>
> > > > >>> Heinrich
> > > >
> > >
> > >
> > > --
> > > François-Frédéric Ozog | Director Business Development
> > > T: +33.67221.6485
> > > francois.ozog@linaro.org | Skype: ffozog
> > >
> >
> >
> > --
> > François-Frédéric Ozog | *Director Business Development*
> > T: +33.67221.6485
> > francois.ozog@linaro.org | Skype: ffozog
>


-- 
François-Frédéric Ozog | *Director Business Development*
T: +33.67221.6485
francois.ozog@linaro.org | Skype: ffozog

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: QEMU NUMA and U-Boot
  2021-07-07 11:00               ` François Ozog
@ 2021-07-07 15:15                 ` François Ozog
  2021-07-07 17:39                   ` Heinrich Schuchardt
  0 siblings, 1 reply; 13+ messages in thread
From: François Ozog @ 2021-07-07 15:15 UTC (permalink / raw)
  To: AKASHI Takahiro, Fran??ois Ozog, Heinrich Schuchardt,
	Ilias Apalodimas, Tuomas Tynkkynen, U-Boot Mailing List

top posting what now works for me:
- changed calculation of memory size to loop through different memory nodes
- added numa_node to banks
- filter out banks that do not match the target mixup node (do not want to
change ABI to add node information)

That's not satisfying overall but at least my code works with NUMA on Qemu.


*diff --git a/Kconfig b/Kconfig*

*index f8c1a77bed..4d3ab8cb49 100644*

*--- a/Kconfig*

*+++ b/Kconfig*

@@ -192,7 +192,9 @@ config NR_DRAM_BANKS

        default 1 if ARCH_SUNXI || ARCH_OWL

        default 4

        help

-         This defines the number of DRAM banks.

+         This defines the number of DRAM banks. For qemu with NUMA, you

+         may want to set this value to <slots> * <possible memdev>.

+         for instance, for a 2 slot with 4 memdevs set NR_DRAM_BANKS to 8.



 config SYS_BOOT_GET_CMDLINE

        bool "Enable kernel command line setup"

*diff --git a/arch/arm/lib/bootm-fdt.c b/arch/arm/lib/bootm-fdt.c*

*index 29020bd1c6..2b28ab8108 100644*

*--- a/arch/arm/lib/bootm-fdt.c*

*+++ b/arch/arm/lib/bootm-fdt.c*

@@ -42,12 +42,17 @@ int arch_fixup_fdt(void *blob)

        u64 size[CONFIG_NR_DRAM_BANKS];



        for (bank = 0; bank < CONFIG_NR_DRAM_BANKS; bank++) {

+               unsigned char node = bd->bi_dram[bank].numa_node;

                start[bank] = bd->bi_dram[bank].start;

                size[bank] = bd->bi_dram[bank].size;

 #ifdef CONFIG_ARMV7_NONSEC

                ret = armv7_apply_memory_carveout(&start[bank],
&size[bank]);

                if (ret)

                        return ret;

+#endif

+#ifdef CONFIG_OF_LIBFDT

+               /* add node info for the fdt_fixup_memory below */

+               start[bank] = (((phys_addr_t)node) << 56) |
bd->bi_dram[bank].start;

 #endif

        }



*diff --git a/common/fdt_support.c b/common/fdt_support.c*

*index a9a32df1e7..3bca2ba888 100644*

*--- a/common/fdt_support.c*

*+++ b/common/fdt_support.c*

@@ -415,16 +415,29 @@ static int fdt_pack_reg(const void *fdt, void *buf,
u64 *address, u64 *size,

        return p - (char *)buf;

 }



+static inline uint32_t fdt32_ld(const fdt32_t *p)

+{

+    const uint8_t *bp = (const uint8_t *)p;

+

+    return ((uint32_t)bp[0] << 24)

+           | ((uint32_t)bp[1] << 16)

+           | ((uint32_t)bp[2] << 8)

+           | bp[3];

+}

 #if CONFIG_NR_DRAM_BANKS > 4

 #define MEMORY_BANKS_MAX CONFIG_NR_DRAM_BANKS

 #else

 #define MEMORY_BANKS_MAX 4

 #endif

+/* NUMA has yet to be properly handled

+ * This code appends memory to the first memory node that matches the NUMA
node.

+ */

 int fdt_fixup_memory_banks(void *blob, u64 start[], u64 size[], int banks)

 {

        int err, nodeoffset;

        int len, i;

        u8 tmp[MEMORY_BANKS_MAX * 16]; /* Up to 64-bit address + 64-bit
size */

+       unsigned int numa_node;



        if (banks > MEMORY_BANKS_MAX) {

                printf("%s: num banks %d exceeds hardcoded limit %d."

@@ -444,6 +457,12 @@ int fdt_fixup_memory_banks(void *blob, u64 start[],
u64 size[], int banks)

        if (nodeoffset < 0)

                        return nodeoffset;



+       const __be32* numa_node_prop = fdt_getprop(blob, nodeoffset,
"numa-node-id", &len);

+       if (numa_node_prop != NULL && len == sizeof(__be32)) {

+               numa_node = fdt32_ld(numa_node_prop);

+       }

+       else numa_node = 0;

+

        err = fdt_setprop(blob, nodeoffset, "device_type", "memory",

                        sizeof("memory"));

        if (err < 0) {

@@ -453,8 +472,27 @@ int fdt_fixup_memory_banks(void *blob, u64 start[],
u64 size[], int banks)

        }



        for (i = 0; i < banks; i++) {

-               if (start[i] == 0 && size[i] == 0)

+               /* clear node information */

+               unsigned int node;

+recheck:

+               if (start[i]  == 0 && size[i] == 0)

                        break;

+               node = (start[i] >> 56) & 0xFF;

+               start[i] = start[i] & 0x00FFFFFFFFFFFFFF;

+               /* for the moment, just ignore the banks that are not in

+                * memory NUMA node */

+               if (node != numa_node) {

+                       /* remove the bank from the list */

+                       int j;

+                       for (j=i; j < banks-1; j++) {

+                               start[j] = start[j+1];

+                               size[j] = size[j+1];

+                       }

+                       start[j]=0;

+                       size[j]=0;

+                       banks--;

+                       goto recheck;

+               }

        }



        banks = i;

@@ -470,6 +508,7 @@ int fdt_fixup_memory_banks(void *blob, u64 start[], u64
size[], int banks)

                                "reg", fdt_strerror(err));

                return err;

        }

+

        return 0;

 }



*diff --git a/configs/qemu_arm64_defconfig b/configs/qemu_arm64_defconfig*

*index f6e586627a..0fdc22d71d 100644*

*--- a/configs/qemu_arm64_defconfig*

*+++ b/configs/qemu_arm64_defconfig*

@@ -1,7 +1,7 @@

 CONFIG_ARM=y

 CONFIG_POSITION_INDEPENDENT=y

 CONFIG_ARCH_QEMU=y

-CONFIG_NR_DRAM_BANKS=1

+CONFIG_NR_DRAM_BANKS=32

 CONFIG_ENV_SIZE=0x40000

 CONFIG_ENV_SECT_SIZE=0x40000

 CONFIG_AHCI=y

*diff --git a/include/asm-generic/u-boot.h b/include/asm-generic/u-boot.h*

*index 637de0c455..3cf45124ec 100644*

*--- a/include/asm-generic/u-boot.h*

*+++ b/include/asm-generic/u-boot.h*

@@ -71,6 +71,8 @@ struct bd_info {

        struct {                        /* RAM configuration */

                phys_addr_t start;

                phys_size_t size;

+               unsigned numa_node;

+               unsigned pad; /* just to make sure we do not cause
alignment */

        } bi_dram[CONFIG_NR_DRAM_BANKS];

 };



*diff --git a/lib/fdtdec.c b/lib/fdtdec.c*

*index 4b097fb588..b1934edc8d 100644*

*--- a/lib/fdtdec.c*

*+++ b/lib/fdtdec.c*

@@ -1064,77 +1064,101 @@ int fdtdec_decode_display_timing(const void *blob,
int parent, int index,

        return ret;

 }



+ofnode get_next_memory_node(ofnode mem)

+{

+       do {

+               mem = ofnode_by_prop_value(mem, "device_type", "memory", 7);

+       } while (!ofnode_is_available(mem));

+

+       return mem;

+}

+

 int fdtdec_setup_mem_size_base(void)

 {

        int ret;

+       int reg;

        ofnode mem;

        struct resource res;

+       phys_addr_t base = ~0;

+       phys_size_t size = 0;;



-       mem = ofnode_path("/memory");

-       if (!ofnode_valid(mem)) {

-               debug("%s: Missing /memory node\n", __func__);

-               return -EINVAL;

-       }

+       for (mem = get_next_memory_node(mem); ofnode_valid(mem); mem =
get_next_memory_node(mem)) {



-       ret = ofnode_read_resource(mem, 0, &res);

-       if (ret != 0) {

-               debug("%s: Unable to decode first memory bank\n", __func__);

-               return -EINVAL;

+               for(reg = 0, ret = 0; ret == 0 ; reg++) {

+                       ret = ofnode_read_resource(mem, reg, &res);

+                       if (ret != 0)

+                               break;

+                       if ((phys_addr_t)res.start < base)

+                               base = (phys_addr_t)res.start;

+                       size += (phys_size_t)(res.end - res.start + 1);

+               }

        }

+

+       gd->ram_base = (unsigned long)base;

+       gd->ram_size = (phys_size_t)size;



-       gd->ram_size = (phys_size_t)(res.end - res.start + 1);

-       gd->ram_base = (unsigned long)res.start;

+       debug("%s: Initial DRAM base %llx\n", __func__,

+             (unsigned long long)gd->ram_base);

        debug("%s: Initial DRAM size %llx\n", __func__,

              (unsigned long long)gd->ram_size);



        return 0;

 }



-ofnode get_next_memory_node(ofnode mem)

-{

-       do {

-               mem = ofnode_by_prop_value(mem, "device_type", "memory", 7);

-       } while (!ofnode_is_available(mem));

-

-       return mem;

-}

-

 int fdtdec_setup_memory_banksize(void)

 {

        int bank, ret, reg = 0;

        struct resource res;

        ofnode mem = ofnode_null();

+       const __be32* numa_node_prop = NULL;

+       int len;

+       int numa_node = -1;

+       int count = 0;

+

+       for (mem = get_next_memory_node(mem); ofnode_valid(mem); mem =
get_next_memory_node(mem)) {



-       mem = get_next_memory_node(mem);

-       if (!ofnode_valid(mem)) {

-               debug("%s: Missing /memory node\n", __func__);

-               return -EINVAL;

+       count++;

+

+       numa_node_prop = ofnode_get_property(mem, "numa-node-id", &len);

+       if (numa_node_prop != NULL && len == sizeof(__be32)) {

+               numa_node = of_read_number(numa_node_prop, 1);

        }

+       else numa_node = 0;



-       for (bank = 0; bank < CONFIG_NR_DRAM_BANKS; bank++) {

-               ret = ofnode_read_resource(mem, reg++, &res);

-               if (ret < 0) {

-                       reg = 0;

-                       mem = get_next_memory_node(mem);

-                       if (!ofnode_valid(mem))

-                               break;

+       debug("Found memory for node %d\n", numa_node);



-                       ret = ofnode_read_resource(mem, reg++, &res);

-                       if (ret < 0)

-                               break;

-               }

+       ret = 0;

+       for(reg = 0; ret == 0 && bank < CONFIG_NR_DRAM_BANKS; reg++) {

+               ret = ofnode_read_resource(mem, reg, &res);



                if (ret != 0)

-                       return -EINVAL;

+                       break;



                gd->bd->bi_dram[bank].start = (phys_addr_t)res.start;

                gd->bd->bi_dram[bank].size =

                        (phys_size_t)(res.end - res.start + 1);

+               gd->bd->bi_dram[bank].numa_node = numa_node;



-               debug("%s: DRAM Bank #%d: start = 0x%llx, size = 0x%llx\n",

+               debug("%s: DRAM Bank #%d: start = 0x%llx, size = 0x%llx"

+                    " name_node = %d\n",

                      __func__, bank,

                      (unsigned long long)gd->bd->bi_dram[bank].start,

-                     (unsigned long long)gd->bd->bi_dram[bank].size);

+                     (unsigned long long)gd->bd->bi_dram[bank].size,

+                     gd->bd->bi_dram[bank].numa_node);

+

+               bank++;

+       }

+

+

+       }

+

+       if (count == 0) {

+               debug("%s: Missing /memory node\n", __func__);

+               return -EINVAL;

+       }

+       if (bank >= CONFIG_NR_DRAM_BANKS) {

+               printf("Too many DT memory nodes for
CONFIG_NR_DRAM_BANKS=%d\n",

+                               CONFIG_NR_DRAM_BANKS);

        }



        return 0;

On Wed, 7 Jul 2021 at 13:00, François Ozog <francois.ozog@linaro.org> wrote:

>
>
> On Wed, 7 Jul 2021 at 12:16, AKASHI Takahiro <takahiro.akashi@linaro.org>
> wrote:
>
>> On Wed, Jul 07, 2021 at 11:37:19AM +0200, Fran??ois Ozog wrote:
>> > On Wed, 7 Jul 2021 at 09:40, François Ozog <francois.ozog@linaro.org>
>> wrote:
>> >
>> > > On Wed, 7 Jul 2021 at 05:59, Heinrich Schuchardt <xypron.glpk@gmx.de>
>> > > wrote:
>> > > >
>> > > > Am 7. Juli 2021 05:18:20 MESZ schrieb Heinrich Schuchardt <
>> > > xypron.glpk@gmx.de>:
>> > > > >Am 7. Juli 2021 03:44:35 MESZ schrieb AKASHI Takahiro
>> > > > ><takahiro.akashi@linaro.org>:
>> > > > >>François,
>> > > > >>
>> > > > >>On Tue, Jul 06, 2021 at 08:10:08PM +0200, Heinrich Schuchardt
>> wrote:
>> > > > >>> On 7/6/21 6:13 PM, François Ozog wrote:
>> > > > >>> > Hi Heinrich, U-Boot 2021-07rc5 does not take into account
>> memory
>> > > > >>> > description when using Qemu 5.2 NUMA configuration to adapt
>> memory
>> > > > >>map
>> > > > >>> > (kernel_addr_r...):
>> > > > >>> >
>> > > > >>> >         -smp 4 \
>> > > > >>> >           -m 8G,slots=2,maxmem=16G \
>> > > > >>> >          -object memory-backend-ram,size=4G,id=m0 \
>> > > > >>> >          -object memory-backend-ram,size=4G,id=m1 \
>> > > > >>> >          -numa node,cpus=0-1,nodeid=0,memdev=m0 \
>> > > > >>> >          -numa node,cpus=2-3,nodeid=1,memdev=m1
>> > > > >>> >
>> > > > >>> > kernel_addr_r is still 0x4040000 and thus you can't use it to
>> > > > >>bootefi.
>> > > > >>> >
>> > > > >>> > fdt addr 0x13ede6de0; fdt print
>> > > > >>> >
>> > > > >>> > Displays fdt while I think it should not.
>> > > > >>> >
>> > > > >>> > If I load the kernel at dram.start, the load works but not
>> boot
>> > > > >>> >
>> > > > >>> > U-Boot 2021.07 (Jul 06 2021 - 13:26:43 +0000)
>> > > > >>> >
>> > > > >>> >
>> > > > >>> > DRAM:4 GiB
>> > > > >>> >
>> > > > >>> > Flash: 64 MiB
>> > > > >>> >
>> > > > >>> > Loading Environment from Flash... OK
>> > > > >>> >
>> > > > >>> > In:pl011@9000000
>> > > > >>> >
>> > > > >>> > Out: pl011@9000000
>> > > > >>> >
>> > > > >>> > Err: pl011@9000000
>> > > > >>> >
>> > > > >>> > Net: eth0: virtio-net#32
>> > > > >>> >
>> > > > >>> > Hit any key to stop autoboot:0
>> > > > >>> >
>> > > > >>> > =>
>> > > > >>> >
>> > > > >>> > => bdinfo
>> > > > >>> >
>> > > > >>> > boot_params = 0x0000000000000000
>> > > > >>> >
>> > > > >>> > DRAM bank = 0x0000000000000000
>> > > > >>> >
>> > > > >>> > -> start= 0x0000000140000000
>> > > > >>> >
>> > > > >>> > -> size = 0x0000000100000000
>> > > > >>> >
>> > > > >>> > flashstart= 0x0000000000000000
>> > > > >>> >
>> > > > >>> > flashsize = 0x0000000004000000
>> > > > >>> >
>> > > > >>> > flashoffset = 0x00000000000bc990
>> > > > >>> >
>> > > > >>> > baudrate= 115200 bps
>> > > > >>> >
>> > > > >>> > relocaddr = 0x000000013ff27000
>> > > > >>> >
>> > > > >>> > reloc off = 0x000000013ff27000
>> > > > >>> >
>> > > > >>> > Build = 64-bit
>> > > > >>> >
>> > > > >>> > current eth = virtio-net#32
>> > > > >>> >
>> > > > >>> > ethaddr = 52:52:52:52:52:52
>> > > > >>> >
>> > > > >>> > IP addr = <NULL>
>> > > > >>> >
>> > > > >>> > fdt_blob= 0x000000013ede6de0
>> > > > >>> >
>> > > > >>> > new_fdt = 0x000000013ede6de0
>> > > > >>> >
>> > > > >>> > fdt_size= 0x0000000000100000
>> > > > >>> >
>> > > > >>> > lmb_dump_all:
>> > > > >>> >
>> > > > >>> > memory.cnt= 0x1
>> > > > >>> >
>> > > > >>> > memory.reg[0x0].base = 0x140000000
>> > > > >>> >
>> > > > >>> > .size = 0x100000000
>> > > > >>> >
>> > > > >>> >
>> > > > >>> > reserved.cnt= 0x0
>> > > > >>> >
>> > > > >>> > arch_number = 0x0000000000000000
>> > > > >>> >
>> > > > >>> > TLB addr= 0x000000013fff0000
>> > > > >>> >
>> > > > >>> > irq_sp= 0x000000013ede6dd0
>> > > > >>> >
>> > > > >>> > sp start= 0x000000013ede6dd0
>> > > > >>> >
>> > > > >>> > Early malloc usage: 3a8 / 2000
>> > > > >>> >
>> > > > >>> > => load virtio 0:1 0x140000000 /oskit.efi
>> > > > >>> >
>> > > > >>> > 853424 bytes read in 1 ms (813.9 MiB/s)
>> > > > >>> >
>> > > > >>> > => bootefi0x140000000 0x13ede6dd0
>> > > > >>> >
>> > > > >>> > ERROR: Failed to register WaitForKey event
>> > > > >>> >
>> > > > >>> > Setting OsIndications failed
>> > > > >>> >
>> > > > >>> > Error: Cannot initialize UEFI sub-system, r = 9
>> > > > >>> >
>> > > > >>> >
>> > > > >>> > I think there is a need to calculate memory map based on
>> previous
>> > > > >>> > firmware (TFA, QEMU can be considered as previous frimware)
>> > > > >>information
>> > > > >>> > (DT or blob_list).
>> > > > >>> >
>> > > > >>> > What do you think ?
>> > > > >>> >
>> > > > >>> > Cheers
>> > > > >>> >
>> > > > >>> > FF
>> > > > >>> >
>> > > > >>> > --
>> > > > >>> >
>> > > > >>> > François-Frédéric Ozog | /Director Business Development/
>> > > > >>> > T: +33.67221.6485
>> > > > >>> > francois.ozog@linaro.org <mailto:francois.ozog@linaro.org>
>> > > > >>| Skype: ffozog
>> > > > >>> >
>> > > > >>> >
>> > > > >>>
>> > > > >>> The kernel load address is hard coded here:
>> > > > >>> include/configs/qemu-arm.h:41:  "kernel_addr_r=0x40400000\0" \
>> > > > >>>
>> > > > >>> bdinfo shows:
>> > > > >>> DRAM start = 0x140000000
>> > > > >>> DRAM size  = 0x100000000
>> > > > >>>
>> > > > >>> fdt addr $fdt_addr
>> > > > >>> fdt printf
>> > > > >>>
>> > > > >>> shows two memory areas. One at 40000000, one at 140000000.
>> > > > >>
>> > > > >>(This shows that U-Boot receives a correct memory map via dtb.)
>> > > > >>
>> > > > >>Is this a NUMA machine, isn't it? Why should we care of which
>> > > > >>memory region be used here? Please note that this is a virtual
>> > > > >machine,
>> > > > >>there is no practical difference between two regions.
>> > > > >>
>> > > > >>The root problem is that U-Boot did not recognize there were two
>> > > > >>memory regions. We can fix this issue in either way:
>> > > > >>
>> > > > >>1)
>> > > > >>diff --git a/configs/qemu_arm64_defconfig
>> > > > >>b/configs/qemu_arm64_defconfig
>> > > > >>index f6e586627a8e..b70ffae8bf6e 100644
>> > > > >>--- a/configs/qemu_arm64_defconfig
>> > > > >>+++ b/configs/qemu_arm64_defconfig
>> > > > >>@@ -1,7 +1,7 @@
>> > > > >> CONFIG_ARM=y
>> > > > >> CONFIG_POSITION_INDEPENDENT=y
>> > > > >> CONFIG_ARCH_QEMU=y
>> > > > >>-CONFIG_NR_DRAM_BANKS=1
>> > > > >>+CONFIG_NR_DRAM_BANKS=2
>> > > > >> CONFIG_ENV_SIZE=0x40000
>> > > > >> CONFIG_ENV_SECT_SIZE=0x40000
>> > > > >> CONFIG_AHCI=y
>> > > > >>
>> > > > >>2)
>> > > > >>diff --git a/lib/fdtdec.c b/lib/fdtdec.c
>> > > > >>index 4b097fb588ed..4067ea2dead6 100644
>> > > > >>--- a/lib/fdtdec.c
>> > > > >>+++ b/lib/fdtdec.c
>> > > > >>@@ -1111,7 +1111,7 @@ int fdtdec_setup_memory_banksize(void)
>> > > > >>                return -EINVAL;
>> > > > >>        }
>> > > > >>
>> > > > >>-       for (bank = 0; bank < CONFIG_NR_DRAM_BANKS; bank++) {
>> > > > >>+       for (bank = 0; ; bank++) {
>> > > > >>                ret = ofnode_read_resource(mem, reg++, &res);
>> > > > >>                if (ret < 0) {
>> > > > >>                        reg = 0;
>> > > > >>
>> > > > >>   (fdtdec_setup_memory_banksize() is called in
>> dram_init_banksize().)
>> > > > >>
>> > > > >>
>> > > > >>(2) seems much better, but I don't know why we had to use
>> > > > >>CONFIG_NR_DRAM_BANKS here.
>> > > > >>
>> > >
>> > > 2) alone does not work as other places in the code refer to
>> > > CONFIG_NR_DRAM_BANKS. Setting ...BANKS to 32 makes my code work and
>> > > bdinfo seems now correct:
>> > >
>> > => bdinfo
>> > > boot_params = 0x0000000000000000
>> > > DRAM bank   = 0x0000000000000000
>> > > -> start    = 0x0000000140000000
>> > > -> size     = 0x0000000100000000
>> > > DRAM bank   = 0x0000000000000001
>> > > -> start    = 0x0000000040000000
>> > > -> size     = 0x0000000100000000
>> > > flashstart  = 0x0000000000000000
>> > > flashsize   = 0x0000000004000000
>> > > flashoffset = 0x00000000000bcb88
>> > > baudrate    = 115200 bps
>> > > relocaddr   = 0x000000013ff27000
>> > > reloc off   = 0x000000013ff27000
>> > > Build       = 64-bit
>> > > current eth = virtio-net#32
>> > > ethaddr     = 52:52:52:52:52:52
>> > > IP addr     = <NULL>
>> > > fdt_blob    = 0x000000013ede6cf0
>> > > new_fdt     = 0x000000013ede6cf0
>> > > fdt_size    = 0x0000000000100000
>> > > lmb_dump_all:
>> > >     memory.cnt   = 0x1
>> > >     memory.reg[0x0].base   = 0x40000000
>> > >   .size   = 0x200000000
>> > >     reserved.cnt   = 0x1
>> > >     reserved.reg[0x0].base = 0x13ede58f0
>> > >     .size = 0x121a710
>> > > arch_number = 0x0000000000000000
>> > > TLB addr    = 0x000000013fff0000
>> > > irq_sp      = 0x000000013ede6ce0
>> > > sp start    = 0x000000013ede6ce0
>> > > Early malloc usage: 3a8 / 2000
>> > >
>> > > May I suggest you propose a combined patch Akashi-san? If we assume
>> > > NUMA systems to be tested up to 8 nodes to mimic real existing
>> > > enterprise hardware and up to 4 memory slots (say for memory hot
>> > > plugging tests) what about a default value of 32? Alternatively, we
>> > > could set this value to a much higher one if the costs are negligible.
>> > >
>> > >
>> > > Well, lets not rush as there are other twists:
>> >
>> > the 4G bank in node 1 is marked BootServicesData in the UEFI
>> GetMemoryMap
>> > which I assume is not the case. EDK2 reports it as ConventionalMemory.
>> >
>> > The root cause seem to be gd->ramtop not being setup properly.
>> >
>> > Further analysis shows that the DT passed to the booted EFI payload does
>> > not seem to be correct:
>> >
>> > DT fragment passed to U-Boot
>> >
>> > memory@140000000 {
>> > numa-node-id = <0x00000001>;
>> > reg = <0x00000001 0x40000000 0x00000001 0x00000000>;
>> > device_type = "memory";
>> > };
>> > memory@40000000 {
>> > numa-node-id = <0x00000000>;
>> > reg = <0x00000000 0x40000000 0x00000001 0x00000000>;
>> > device_type = "memory";
>> > };
>> >
>> > DT passed to payload (as per my debug code):
>> >
>> > memory@140000000: memory
>> >
>> >     numa-node-id 1
>> >
>> >     reg (len= 32)
>> >
>> >          140000000 100000000
>> >
>> >          40000000 100000000
>> >
>> > memory@40000000: memory
>> >
>> >     numa-node-id 0
>> >
>> >     reg (len= 16)
>> >
>> >          40000000 100000000
>> >
>> > I am investigating this further...
>>
>> You should check the logic of fdt_fixup_memory_banks()
>> which is called this way:
>>   efi_dt_fixup()
>>     image_setup_libfdt()
>>       arch_fixup_fdt()
>>         fdt_fixup_memory_banks()
>>
>> What it does is to put *all* the memory regions unconditionally as
>> a single "reg" array into the *first-detected* "memory" node, which is
>> "memory@140000000" in this case.
>> It means that this function doesn't respect NUMA configuration.
>>
>> Thanks.
>
> tweaking ram_top to be the correct in
> https://elixir.bootlin.com/u-boot/latest/source/lib/efi_loader/efi_memory.c#L732
> results in memory on node 1 to be considered as ConventionalMemory but
> U-Boot places all code and data at the end of node 1 while it should
> position it on the current node. That said it is an acceptable work around
> for my test case.
>
> Bottom line, we need to introduce NUMA node management in memory
> management all over the place.
> It is unclear if there is a business case for that. I'll ask LEDGE
> members...
>
>
>> -Takahiro Akashi
>>
>>
>> > > >>In this case, other occurrences of CONFIG_NR_DRAM_BANKS in this file
>> > > > >>should be replaced with a variable for it.
>> > > > >>
>> > > > >>> Your use case is well beyond the typical U-Boot usage. So I
>> guess it
>> > > > >>> will be up to Linaro to provide the necessary patches:
>> > > > >>>
>> > > > >>> * determine the active CPU
>> > > > >>> * determine the RAM assigned to the active CPU according
>> > > > >>>   to the numa-node-id in the device-tree
>> > > > >>> * make sure that U-Boot only uses the memory of the active CPU
>> > > > >>>   internally
>> > > > >>> * make sure that the UEFI memory map contains a compliant
>> > > > >description
>> > > > >>> * possibly, dynamically set up the environment variables
>> > > > >>>
>> > > > >>> +CC Tuomas Tynkkynen (maintainer for qemu_arm64_defconfig)
>> > > > >>
>> > > > >>For (1), we'd better have a different config, or increase
>> > > > >>the value of CONFIG_NR_DRAM_BANKS to a bigger number?
>> > > > >
>> > > > >Is the system configured such that each CPU can access the others
>> CPU's
>> > > > >RAM when entering U-Boot?
>> > > > >
>> > > > >Best regards
>> > > > >
>> > > > >Heinrich
>> > > > >
>> > > >
>> > > > At least the comments for this patch sound as if on a physical
>> system
>> > > cross NUMA node memory access is only available after full SMP
>> > > initialization:
>> > > >
>> > > >
>> > >
>> https://patchwork.kernel.org/project/linux-acpi/patch/20180625130552.5636-1-lorenzo.pieralisi@arm.com/
>> > > >
>> > > > QEMU may be less restrictive.
>> > > >
>> > > > QEMU allows the node distance to be 255 indicating that cross node
>> > > access is infeasible.
>> > > >
>> > > > Best regards
>> > > >
>> > > > Heinrich
>> > > >
>> > > > >>
>> > > > >>-Takahiro Akashi
>> > > > >>
>> > > > >>
>> > > > >>> Best regards
>> > > > >>>
>> > > > >>> Heinrich
>> > > >
>> > >
>> > >
>> > > --
>> > > François-Frédéric Ozog | Director Business Development
>> > > T: +33.67221.6485
>> > > francois.ozog@linaro.org | Skype: ffozog
>> > >
>> >
>> >
>> > --
>> > François-Frédéric Ozog | *Director Business Development*
>> > T: +33.67221.6485
>> > francois.ozog@linaro.org | Skype: ffozog
>>
>
>
> --
> François-Frédéric Ozog | *Director Business Development*
> T: +33.67221.6485
> francois.ozog@linaro.org | Skype: ffozog
>
>

-- 
François-Frédéric Ozog | *Director Business Development*
T: +33.67221.6485
francois.ozog@linaro.org | Skype: ffozog

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: QEMU NUMA and U-Boot
  2021-07-07 15:15                 ` François Ozog
@ 2021-07-07 17:39                   ` Heinrich Schuchardt
  2022-03-23  7:29                     ` François Ozog
  0 siblings, 1 reply; 13+ messages in thread
From: Heinrich Schuchardt @ 2021-07-07 17:39 UTC (permalink / raw)
  To: François Ozog, AKASHI Takahiro, Ilias Apalodimas,
	Tuomas Tynkkynen, U-Boot Mailing List



On 7/7/21 5:15 PM, François Ozog wrote:
> top posting what now works for me:
> - changed calculation of memory size to loop through different memory nodes
> - added numa_node to banks
> - filter out banks that do not match the target mixup node (do not want
> to change ABI to add node information)
>
> That's not satisfying overall but at least my code works with NUMA on Qemu.

Do we expect real hardware with NUMA to be using U-Boot?

If you have real ARM hardware in NUMA configuration, can the boot CPU
access memory of dormant CPUs when U-Boot is entered?

Best regards

Heinrich

>
>
> *diff --git a/Kconfig b/Kconfig*
>
> *index f8c1a77bed..4d3ab8cb49 100644*
>
> *--- a/Kconfig*
>
> *+++ b/Kconfig*
>
> @@ -192,7 +192,9 @@config NR_DRAM_BANKS
>
> default 1 if ARCH_SUNXI || ARCH_OWL
>
> default 4
>
> help
>
> - This defines the number of DRAM banks.
>
> +This defines the number of DRAM banks. For qemu with NUMA, you
>
> +may want to set this value to <slots> * <possible memdev>.
>
> +for instance, for a 2 slot with 4 memdevs set NR_DRAM_BANKS to 8.
>
> config SYS_BOOT_GET_CMDLINE
>
> bool "Enable kernel command line setup"
>
> *diff --git a/arch/arm/lib/bootm-fdt.c b/arch/arm/lib/bootm-fdt.c*
>
> *index 29020bd1c6..2b28ab8108 100644*
>
> *--- a/arch/arm/lib/bootm-fdt.c*
>
> *+++ b/arch/arm/lib/bootm-fdt.c*
>
> @@ -42,12 +42,17 @@int arch_fixup_fdt(void *blob)
>
> u64 size[CONFIG_NR_DRAM_BANKS];
>
> for (bank = 0; bank < CONFIG_NR_DRAM_BANKS; bank++) {
>
> +unsigned char node = bd->bi_dram[bank].numa_node;
>
> start[bank] = bd->bi_dram[bank].start;
>
> size[bank] = bd->bi_dram[bank].size;
>
> #ifdef CONFIG_ARMV7_NONSEC
>
> ret = armv7_apply_memory_carveout(&start[bank], &size[bank]);
>
> if (ret)
>
> return ret;
>
> +#endif
>
> +#ifdef CONFIG_OF_LIBFDT
>
> +/* add node info for the fdt_fixup_memory below */
>
> +start[bank] = (((phys_addr_t)node) << 56) | bd->bi_dram[bank].start;
>
> #endif
>
> }
>
> *diff --git a/common/fdt_support.c b/common/fdt_support.c*
>
> *index a9a32df1e7..3bca2ba888 100644*
>
> *--- a/common/fdt_support.c*
>
> *+++ b/common/fdt_support.c*
>
> @@ -415,16 +415,29 @@static int fdt_pack_reg(const void *fdt, void *buf,
> u64 *address, u64 *size,
>
> return p - (char *)buf;
>
> }
>
> +static inline uint32_t fdt32_ld(const fdt32_t *p)
>
> +{
>
> +const uint8_t *bp = (const uint8_t *)p;
>
> +
>
> +return ((uint32_t)bp[0] << 24)
>
> + | ((uint32_t)bp[1] << 16)
>
> + | ((uint32_t)bp[2] << 8)
>
> + | bp[3];
>
> +}
>
> #if CONFIG_NR_DRAM_BANKS > 4
>
> #define MEMORY_BANKS_MAX CONFIG_NR_DRAM_BANKS
>
> #else
>
> #define MEMORY_BANKS_MAX 4
>
> #endif
>
> +/* NUMA has yet to be properly handled
>
> + * This code appends memory to the first memory node that matches the
> NUMA node.
>
> + */
>
> int fdt_fixup_memory_banks(void *blob, u64 start[], u64 size[], int banks)
>
> {
>
> int err, nodeoffset;
>
> int len, i;
>
> u8 tmp[MEMORY_BANKS_MAX * 16]; /* Up to 64-bit address + 64-bit size */
>
> +unsigned int numa_node;
>
> if (banks > MEMORY_BANKS_MAX) {
>
> printf("%s: num banks %d exceeds hardcoded limit %d."
>
> @@ -444,6 +457,12 @@int fdt_fixup_memory_banks(void *blob, u64 start[],
> u64 size[], int banks)
>
> if (nodeoffset < 0)
>
> return nodeoffset;
>
> +const __be32* numa_node_prop = fdt_getprop(blob, nodeoffset,
> "numa-node-id", &len);
>
> +if (numa_node_prop != NULL && len == sizeof(__be32)) {
>
> +numa_node = fdt32_ld(numa_node_prop);
>
> +}
>
> +else numa_node = 0;
>
> +
>
> err = fdt_setprop(blob, nodeoffset, "device_type", "memory",
>
> sizeof("memory"));
>
> if (err < 0) {
>
> @@ -453,8 +472,27 @@int fdt_fixup_memory_banks(void *blob, u64 start[],
> u64 size[], int banks)
>
> }
>
> for (i = 0; i < banks; i++) {
>
> - if (start[i] == 0 && size[i] == 0)
>
> +/* clear node information */
>
> +unsigned int node;
>
> +recheck:
>
> +if (start[i]== 0 && size[i] == 0)
>
> break;
>
> +node = (start[i] >> 56) & 0xFF;
>
> +start[i] = start[i] & 0x00FFFFFFFFFFFFFF;
>
> +/* for the moment, just ignore the banks that are not in
>
> +* memory NUMA node */
>
> +if (node != numa_node) {
>
> +/* remove the bank from the list */
>
> +int j;
>
> +for (j=i; j < banks-1; j++) {
>
> +start[j] = start[j+1];
>
> +size[j] = size[j+1];
>
> +}
>
> +start[j]=0;
>
> +size[j]=0;
>
> +banks--;
>
> +goto recheck;
>
> +}
>
> }
>
> banks = i;
>
> @@ -470,6 +508,7 @@int fdt_fixup_memory_banks(void *blob, u64 start[],
> u64 size[], int banks)
>
> "reg", fdt_strerror(err));
>
> return err;
>
> }
>
> +
>
> return 0;
>
> }
>
> *diff --git a/configs/qemu_arm64_defconfig b/configs/qemu_arm64_defconfig*
>
> *index f6e586627a..0fdc22d71d 100644*
>
> *--- a/configs/qemu_arm64_defconfig*
>
> *+++ b/configs/qemu_arm64_defconfig*
>
> @@ -1,7 +1,7 @@
>
> CONFIG_ARM=y
>
> CONFIG_POSITION_INDEPENDENT=y
>
> CONFIG_ARCH_QEMU=y
>
> -CONFIG_NR_DRAM_BANKS=1
>
> +CONFIG_NR_DRAM_BANKS=32
>
> CONFIG_ENV_SIZE=0x40000
>
> CONFIG_ENV_SECT_SIZE=0x40000
>
> CONFIG_AHCI=y
>
> *diff --git a/include/asm-generic/u-boot.h b/include/asm-generic/u-boot.h*
>
> *index 637de0c455..3cf45124ec 100644*
>
> *--- a/include/asm-generic/u-boot.h*
>
> *+++ b/include/asm-generic/u-boot.h*
>
> @@ -71,6 +71,8 @@struct bd_info {
>
> struct {/* RAM configuration */
>
> phys_addr_t start;
>
> phys_size_t size;
>
> +unsigned numa_node;
>
> +unsigned pad; /* just to make sure we do not cause alignment */
>
> } bi_dram[CONFIG_NR_DRAM_BANKS];
>
> };
>
> *diff --git a/lib/fdtdec.c b/lib/fdtdec.c*
>
> *index 4b097fb588..b1934edc8d 100644*
>
> *--- a/lib/fdtdec.c*
>
> *+++ b/lib/fdtdec.c*
>
> @@ -1064,77 +1064,101 @@int fdtdec_decode_display_timing(const void
> *blob, int parent, int index,
>
> return ret;
>
> }
>
> +ofnode get_next_memory_node(ofnode mem)
>
> +{
>
> +do {
>
> +mem = ofnode_by_prop_value(mem, "device_type", "memory", 7);
>
> +} while (!ofnode_is_available(mem));
>
> +
>
> +return mem;
>
> +}
>
> +
>
> int fdtdec_setup_mem_size_base(void)
>
> {
>
> int ret;
>
> +int reg;
>
> ofnode mem;
>
> struct resource res;
>
> +phys_addr_t base = ~0;
>
> +phys_size_t size = 0;;
>
> - mem = ofnode_path("/memory");
>
> - if (!ofnode_valid(mem)) {
>
> - debug("%s: Missing /memory node\n", __func__);
>
> - return -EINVAL;
>
> - }
>
> +for (mem = get_next_memory_node(mem); ofnode_valid(mem); mem =
> get_next_memory_node(mem)) {
>
> - ret = ofnode_read_resource(mem, 0, &res);
>
> - if (ret != 0) {
>
> - debug("%s: Unable to decode first memory bank\n", __func__);
>
> - return -EINVAL;
>
> +for(reg = 0, ret = 0; ret == 0 ; reg++) {
>
> +ret = ofnode_read_resource(mem, reg, &res);
>
> +if (ret != 0)
>
> +break;
>
> +if ((phys_addr_t)res.start < base)
>
> +base = (phys_addr_t)res.start;
>
> +size += (phys_size_t)(res.end - res.start + 1);
>
> +}
>
> }
>
> +
>
> +gd->ram_base = (unsigned long)base;
>
> +gd->ram_size = (phys_size_t)size;
>
> - gd->ram_size = (phys_size_t)(res.end - res.start + 1);
>
> - gd->ram_base = (unsigned long)res.start;
>
> +debug("%s: Initial DRAM base %llx\n", __func__,
>
> +(unsigned long long)gd->ram_base);
>
> debug("%s: Initial DRAM size %llx\n", __func__,
>
> (unsigned long long)gd->ram_size);
>
> return 0;
>
> }
>
> -ofnode get_next_memory_node(ofnode mem)
>
> -{
>
> - do {
>
> - mem = ofnode_by_prop_value(mem, "device_type", "memory", 7);
>
> - } while (!ofnode_is_available(mem));
>
> -
>
> - return mem;
>
> -}
>
> -
>
> int fdtdec_setup_memory_banksize(void)
>
> {
>
> int bank, ret, reg = 0;
>
> struct resource res;
>
> ofnode mem = ofnode_null();
>
> +const __be32* numa_node_prop = NULL;
>
> +int len;
>
> +int numa_node = -1;
>
> +int count = 0;
>
> +
>
> +for (mem = get_next_memory_node(mem); ofnode_valid(mem); mem =
> get_next_memory_node(mem)) {
>
> - mem = get_next_memory_node(mem);
>
> - if (!ofnode_valid(mem)) {
>
> - debug("%s: Missing /memory node\n", __func__);
>
> - return -EINVAL;
>
> +count++;
>
> +
>
> +numa_node_prop = ofnode_get_property(mem, "numa-node-id", &len);
>
> +if (numa_node_prop != NULL && len == sizeof(__be32)) {
>
> +numa_node = of_read_number(numa_node_prop, 1);
>
> }
>
> +else numa_node = 0;
>
> - for (bank = 0; bank < CONFIG_NR_DRAM_BANKS; bank++) {
>
> - ret = ofnode_read_resource(mem, reg++, &res);
>
> - if (ret < 0) {
>
> - reg = 0;
>
> - mem = get_next_memory_node(mem);
>
> - if (!ofnode_valid(mem))
>
> - break;
>
> +debug("Found memory for node %d\n", numa_node);
>
> - ret = ofnode_read_resource(mem, reg++, &res);
>
> - if (ret < 0)
>
> - break;
>
> - }
>
> +ret = 0;
>
> +for(reg = 0; ret == 0 && bank < CONFIG_NR_DRAM_BANKS; reg++) {
>
> +ret = ofnode_read_resource(mem, reg, &res);
>
> if (ret != 0)
>
> - return -EINVAL;
>
> +break;
>
> gd->bd->bi_dram[bank].start = (phys_addr_t)res.start;
>
> gd->bd->bi_dram[bank].size =
>
> (phys_size_t)(res.end - res.start + 1);
>
> +gd->bd->bi_dram[bank].numa_node = numa_node;
>
> - debug("%s: DRAM Bank #%d: start = 0x%llx, size = 0x%llx\n",
>
> +debug("%s: DRAM Bank #%d: start = 0x%llx, size = 0x%llx"
>
> +" name_node = %d\n",
>
> __func__, bank,
>
> (unsigned long long)gd->bd->bi_dram[bank].start,
>
> - (unsigned long long)gd->bd->bi_dram[bank].size);
>
> +(unsigned long long)gd->bd->bi_dram[bank].size,
>
> +gd->bd->bi_dram[bank].numa_node);
>
> +
>
> +bank++;
>
> +}
>
> +
>
> +
>
> +}
>
> +
>
> +if (count == 0) {
>
> +debug("%s: Missing /memory node\n", __func__);
>
> +return -EINVAL;
>
> +}
>
> +if (bank >= CONFIG_NR_DRAM_BANKS) {
>
> +printf("Too many DT memory nodes for CONFIG_NR_DRAM_BANKS=%d\n",
>
> +CONFIG_NR_DRAM_BANKS);
>
> }
>
> return 0;
>
>
> On Wed, 7 Jul 2021 at 13:00, François Ozog <francois.ozog@linaro.org
> <mailto:francois.ozog@linaro.org>> wrote:
>
>
>
>     On Wed, 7 Jul 2021 at 12:16, AKASHI Takahiro
>     <takahiro.akashi@linaro.org <mailto:takahiro.akashi@linaro.org>> wrote:
>
>         On Wed, Jul 07, 2021 at 11:37:19AM +0200, Fran??ois Ozog wrote:
>          > On Wed, 7 Jul 2021 at 09:40, François Ozog
>         <francois.ozog@linaro.org <mailto:francois.ozog@linaro.org>> wrote:
>          >
>          > > On Wed, 7 Jul 2021 at 05:59, Heinrich Schuchardt
>         <xypron.glpk@gmx.de <mailto:xypron.glpk@gmx.de>>
>          > > wrote:
>          > > >
>          > > > Am 7. Juli 2021 05:18:20 MESZ schrieb Heinrich Schuchardt <
>          > > xypron.glpk@gmx.de <mailto:xypron.glpk@gmx.de>>:
>          > > > >Am 7. Juli 2021 03:44:35 MESZ schrieb AKASHI Takahiro
>          > > > ><takahiro.akashi@linaro.org
>         <mailto:takahiro.akashi@linaro.org>>:
>          > > > >>François,
>          > > > >>
>          > > > >>On Tue, Jul 06, 2021 at 08:10:08PM +0200, Heinrich
>         Schuchardt wrote:
>          > > > >>> On 7/6/21 6:13 PM, François Ozog wrote:
>          > > > >>> > Hi Heinrich, U-Boot 2021-07rc5 does not take into
>         account memory
>          > > > >>> > description when using Qemu 5.2 NUMA configuration
>         to adapt memory
>          > > > >>map
>          > > > >>> > (kernel_addr_r...):
>          > > > >>> >
>          > > > >>> >         -smp 4 \
>          > > > >>> >           -m 8G,slots=2,maxmem=16G \
>          > > > >>> >          -object memory-backend-ram,size=4G,id=m0 \
>          > > > >>> >          -object memory-backend-ram,size=4G,id=m1 \
>          > > > >>> >          -numa node,cpus=0-1,nodeid=0,memdev=m0 \
>          > > > >>> >          -numa node,cpus=2-3,nodeid=1,memdev=m1
>          > > > >>> >
>          > > > >>> > kernel_addr_r is still 0x4040000 and thus you can't
>         use it to
>          > > > >>bootefi.
>          > > > >>> >
>          > > > >>> > fdt addr 0x13ede6de0; fdt print
>          > > > >>> >
>          > > > >>> > Displays fdt while I think it should not.
>          > > > >>> >
>          > > > >>> > If I load the kernel at dram.start, the load works
>         but not boot
>          > > > >>> >
>          > > > >>> > U-Boot 2021.07 (Jul 06 2021 - 13:26:43 +0000)
>          > > > >>> >
>          > > > >>> >
>          > > > >>> > DRAM:4 GiB
>          > > > >>> >
>          > > > >>> > Flash: 64 MiB
>          > > > >>> >
>          > > > >>> > Loading Environment from Flash... OK
>          > > > >>> >
>          > > > >>> > In:pl011@9000000
>          > > > >>> >
>          > > > >>> > Out: pl011@9000000
>          > > > >>> >
>          > > > >>> > Err: pl011@9000000
>          > > > >>> >
>          > > > >>> > Net: eth0: virtio-net#32
>          > > > >>> >
>          > > > >>> > Hit any key to stop autoboot:0
>          > > > >>> >
>          > > > >>> > =>
>          > > > >>> >
>          > > > >>> > => bdinfo
>          > > > >>> >
>          > > > >>> > boot_params = 0x0000000000000000
>          > > > >>> >
>          > > > >>> > DRAM bank = 0x0000000000000000
>          > > > >>> >
>          > > > >>> > -> start= 0x0000000140000000
>          > > > >>> >
>          > > > >>> > -> size = 0x0000000100000000
>          > > > >>> >
>          > > > >>> > flashstart= 0x0000000000000000
>          > > > >>> >
>          > > > >>> > flashsize = 0x0000000004000000
>          > > > >>> >
>          > > > >>> > flashoffset = 0x00000000000bc990
>          > > > >>> >
>          > > > >>> > baudrate= 115200 bps
>          > > > >>> >
>          > > > >>> > relocaddr = 0x000000013ff27000
>          > > > >>> >
>          > > > >>> > reloc off = 0x000000013ff27000
>          > > > >>> >
>          > > > >>> > Build = 64-bit
>          > > > >>> >
>          > > > >>> > current eth = virtio-net#32
>          > > > >>> >
>          > > > >>> > ethaddr = 52:52:52:52:52:52
>          > > > >>> >
>          > > > >>> > IP addr = <NULL>
>          > > > >>> >
>          > > > >>> > fdt_blob= 0x000000013ede6de0
>          > > > >>> >
>          > > > >>> > new_fdt = 0x000000013ede6de0
>          > > > >>> >
>          > > > >>> > fdt_size= 0x0000000000100000
>          > > > >>> >
>          > > > >>> > lmb_dump_all:
>          > > > >>> >
>          > > > >>> > memory.cnt= 0x1
>          > > > >>> >
>          > > > >>> > memory.reg[0x0].base = 0x140000000
>          > > > >>> >
>          > > > >>> > .size = 0x100000000
>          > > > >>> >
>          > > > >>> >
>          > > > >>> > reserved.cnt= 0x0
>          > > > >>> >
>          > > > >>> > arch_number = 0x0000000000000000
>          > > > >>> >
>          > > > >>> > TLB addr= 0x000000013fff0000
>          > > > >>> >
>          > > > >>> > irq_sp= 0x000000013ede6dd0
>          > > > >>> >
>          > > > >>> > sp start= 0x000000013ede6dd0
>          > > > >>> >
>          > > > >>> > Early malloc usage: 3a8 / 2000
>          > > > >>> >
>          > > > >>> > => load virtio 0:1 0x140000000 /oskit.efi
>          > > > >>> >
>          > > > >>> > 853424 bytes read in 1 ms (813.9 MiB/s)
>          > > > >>> >
>          > > > >>> > => bootefi0x140000000 0x13ede6dd0
>          > > > >>> >
>          > > > >>> > ERROR: Failed to register WaitForKey event
>          > > > >>> >
>          > > > >>> > Setting OsIndications failed
>          > > > >>> >
>          > > > >>> > Error: Cannot initialize UEFI sub-system, r = 9
>          > > > >>> >
>          > > > >>> >
>          > > > >>> > I think there is a need to calculate memory map
>         based on previous
>          > > > >>> > firmware (TFA, QEMU can be considered as previous
>         frimware)
>          > > > >>information
>          > > > >>> > (DT or blob_list).
>          > > > >>> >
>          > > > >>> > What do you think ?
>          > > > >>> >
>          > > > >>> > Cheers
>          > > > >>> >
>          > > > >>> > FF
>          > > > >>> >
>          > > > >>> > --
>          > > > >>> >
>          > > > >>> > François-Frédéric Ozog | /Director Business
>         Development/
>          > > > >>> > T: +33.67221.6485
>          > > > >>> > francois.ozog@linaro.org
>         <mailto:francois.ozog@linaro.org>
>         <mailto:francois.ozog@linaro.org <mailto:francois.ozog@linaro.org>>
>          > > > >>| Skype: ffozog
>          > > > >>> >
>          > > > >>> >
>          > > > >>>
>          > > > >>> The kernel load address is hard coded here:
>          > > > >>> include/configs/qemu-arm.h:41:
>         "kernel_addr_r=0x40400000\0" \
>          > > > >>>
>          > > > >>> bdinfo shows:
>          > > > >>> DRAM start = 0x140000000
>          > > > >>> DRAM size  = 0x100000000
>          > > > >>>
>          > > > >>> fdt addr $fdt_addr
>          > > > >>> fdt printf
>          > > > >>>
>          > > > >>> shows two memory areas. One at 40000000, one at
>         140000000.
>          > > > >>
>          > > > >>(This shows that U-Boot receives a correct memory map
>         via dtb.)
>          > > > >>
>          > > > >>Is this a NUMA machine, isn't it? Why should we care of
>         which
>          > > > >>memory region be used here? Please note that this is a
>         virtual
>          > > > >machine,
>          > > > >>there is no practical difference between two regions.
>          > > > >>
>          > > > >>The root problem is that U-Boot did not recognize there
>         were two
>          > > > >>memory regions. We can fix this issue in either way:
>          > > > >>
>          > > > >>1)
>          > > > >>diff --git a/configs/qemu_arm64_defconfig
>          > > > >>b/configs/qemu_arm64_defconfig
>          > > > >>index f6e586627a8e..b70ffae8bf6e 100644
>          > > > >>--- a/configs/qemu_arm64_defconfig
>          > > > >>+++ b/configs/qemu_arm64_defconfig
>          > > > >>@@ -1,7 +1,7 @@
>          > > > >> CONFIG_ARM=y
>          > > > >> CONFIG_POSITION_INDEPENDENT=y
>          > > > >> CONFIG_ARCH_QEMU=y
>          > > > >>-CONFIG_NR_DRAM_BANKS=1
>          > > > >>+CONFIG_NR_DRAM_BANKS=2
>          > > > >> CONFIG_ENV_SIZE=0x40000
>          > > > >> CONFIG_ENV_SECT_SIZE=0x40000
>          > > > >> CONFIG_AHCI=y
>          > > > >>
>          > > > >>2)
>          > > > >>diff --git a/lib/fdtdec.c b/lib/fdtdec.c
>          > > > >>index 4b097fb588ed..4067ea2dead6 100644
>          > > > >>--- a/lib/fdtdec.c
>          > > > >>+++ b/lib/fdtdec.c
>          > > > >>@@ -1111,7 +1111,7 @@ int
>         fdtdec_setup_memory_banksize(void)
>          > > > >>                return -EINVAL;
>          > > > >>        }
>          > > > >>
>          > > > >>-       for (bank = 0; bank < CONFIG_NR_DRAM_BANKS;
>         bank++) {
>          > > > >>+       for (bank = 0; ; bank++) {
>          > > > >>                ret = ofnode_read_resource(mem, reg++,
>         &res);
>          > > > >>                if (ret < 0) {
>          > > > >>                        reg = 0;
>          > > > >>
>          > > > >>   (fdtdec_setup_memory_banksize() is called in
>         dram_init_banksize().)
>          > > > >>
>          > > > >>
>          > > > >>(2) seems much better, but I don't know why we had to use
>          > > > >>CONFIG_NR_DRAM_BANKS here.
>          > > > >>
>          > >
>          > > 2) alone does not work as other places in the code refer to
>          > > CONFIG_NR_DRAM_BANKS. Setting ...BANKS to 32 makes my code
>         work and
>          > > bdinfo seems now correct:
>          > >
>          > => bdinfo
>          > > boot_params = 0x0000000000000000
>          > > DRAM bank   = 0x0000000000000000
>          > > -> start    = 0x0000000140000000
>          > > -> size     = 0x0000000100000000
>          > > DRAM bank   = 0x0000000000000001
>          > > -> start    = 0x0000000040000000
>          > > -> size     = 0x0000000100000000
>          > > flashstart  = 0x0000000000000000
>          > > flashsize   = 0x0000000004000000
>          > > flashoffset = 0x00000000000bcb88
>          > > baudrate    = 115200 bps
>          > > relocaddr   = 0x000000013ff27000
>          > > reloc off   = 0x000000013ff27000
>          > > Build       = 64-bit
>          > > current eth = virtio-net#32
>          > > ethaddr     = 52:52:52:52:52:52
>          > > IP addr     = <NULL>
>          > > fdt_blob    = 0x000000013ede6cf0
>          > > new_fdt     = 0x000000013ede6cf0
>          > > fdt_size    = 0x0000000000100000
>          > > lmb_dump_all:
>          > >     memory.cnt   = 0x1
>          > >     memory.reg[0x0].base   = 0x40000000
>          > >   .size   = 0x200000000
>          > >     reserved.cnt   = 0x1
>          > >     reserved.reg[0x0].base = 0x13ede58f0
>          > >     .size = 0x121a710
>          > > arch_number = 0x0000000000000000
>          > > TLB addr    = 0x000000013fff0000
>          > > irq_sp      = 0x000000013ede6ce0
>          > > sp start    = 0x000000013ede6ce0
>          > > Early malloc usage: 3a8 / 2000
>          > >
>          > > May I suggest you propose a combined patch Akashi-san? If
>         we assume
>          > > NUMA systems to be tested up to 8 nodes to mimic real existing
>          > > enterprise hardware and up to 4 memory slots (say for
>         memory hot
>          > > plugging tests) what about a default value of 32?
>         Alternatively, we
>          > > could set this value to a much higher one if the costs are
>         negligible.
>          > >
>          > >
>          > > Well, lets not rush as there are other twists:
>          >
>          > the 4G bank in node 1 is marked BootServicesData in the UEFI
>         GetMemoryMap
>          > which I assume is not the case. EDK2 reports it as
>         ConventionalMemory.
>          >
>          > The root cause seem to be gd->ramtop not being setup properly.
>          >
>          > Further analysis shows that the DT passed to the booted EFI
>         payload does
>          > not seem to be correct:
>          >
>          > DT fragment passed to U-Boot
>          >
>          > memory@140000000 {
>          > numa-node-id = <0x00000001>;
>          > reg = <0x00000001 0x40000000 0x00000001 0x00000000>;
>          > device_type = "memory";
>          > };
>          > memory@40000000 {
>          > numa-node-id = <0x00000000>;
>          > reg = <0x00000000 0x40000000 0x00000001 0x00000000>;
>          > device_type = "memory";
>          > };
>          >
>          > DT passed to payload (as per my debug code):
>          >
>          > memory@140000000: memory
>          >
>          >     numa-node-id 1
>          >
>          >     reg (len= 32)
>          >
>          >          140000000 100000000
>          >
>          >          40000000 100000000
>          >
>          > memory@40000000: memory
>          >
>          >     numa-node-id 0
>          >
>          >     reg (len= 16)
>          >
>          >          40000000 100000000
>          >
>          > I am investigating this further...
>
>         You should check the logic of fdt_fixup_memory_banks()
>         which is called this way:
>            efi_dt_fixup()
>              image_setup_libfdt()
>                arch_fixup_fdt()
>                  fdt_fixup_memory_banks()
>
>         What it does is to put *all* the memory regions unconditionally as
>         a single "reg" array into the *first-detected* "memory" node,
>         which is
>         "memory@140000000" in this case.
>         It means that this function doesn't respect NUMA configuration.
>
>     Thanks.
>
>     tweaking ram_top to be the correct in
>     https://elixir.bootlin.com/u-boot/latest/source/lib/efi_loader/efi_memory.c#L732
>     <https://elixir.bootlin.com/u-boot/latest/source/lib/efi_loader/efi_memory.c#L732>
>     results in memory on node 1 to be considered as ConventionalMemory
>     but U-Boot places all code and data at the end of node 1 while it
>     should position it on the current node. That said it is an
>     acceptable work around for my test case.
>
>     Bottom line, we need to introduce NUMA node management in memory
>     management all over the place.
>     It is unclear if there is a business case for that. I'll ask LEDGE
>     members...
>
>         -Takahiro Akashi
>
>
>          > > >>In this case, other occurrences of CONFIG_NR_DRAM_BANKS
>         in this file
>          > > > >>should be replaced with a variable for it.
>          > > > >>
>          > > > >>> Your use case is well beyond the typical U-Boot
>         usage. So I guess it
>          > > > >>> will be up to Linaro to provide the necessary patches:
>          > > > >>>
>          > > > >>> * determine the active CPU
>          > > > >>> * determine the RAM assigned to the active CPU according
>          > > > >>>   to the numa-node-id in the device-tree
>          > > > >>> * make sure that U-Boot only uses the memory of the
>         active CPU
>          > > > >>>   internally
>          > > > >>> * make sure that the UEFI memory map contains a compliant
>          > > > >description
>          > > > >>> * possibly, dynamically set up the environment variables
>          > > > >>>
>          > > > >>> +CC Tuomas Tynkkynen (maintainer for
>         qemu_arm64_defconfig)
>          > > > >>
>          > > > >>For (1), we'd better have a different config, or increase
>          > > > >>the value of CONFIG_NR_DRAM_BANKS to a bigger number?
>          > > > >
>          > > > >Is the system configured such that each CPU can access
>         the others CPU's
>          > > > >RAM when entering U-Boot?
>          > > > >
>          > > > >Best regards
>          > > > >
>          > > > >Heinrich
>          > > > >
>          > > >
>          > > > At least the comments for this patch sound as if on a
>         physical system
>          > > cross NUMA node memory access is only available after full SMP
>          > > initialization:
>          > > >
>          > > >
>          > >
>         https://patchwork.kernel.org/project/linux-acpi/patch/20180625130552.5636-1-lorenzo.pieralisi@arm.com/
>         <https://patchwork.kernel.org/project/linux-acpi/patch/20180625130552.5636-1-lorenzo.pieralisi@arm.com/>
>          > > >
>          > > > QEMU may be less restrictive.
>          > > >
>          > > > QEMU allows the node distance to be 255 indicating that
>         cross node
>          > > access is infeasible.
>          > > >
>          > > > Best regards
>          > > >
>          > > > Heinrich
>          > > >
>          > > > >>
>          > > > >>-Takahiro Akashi
>          > > > >>
>          > > > >>
>          > > > >>> Best regards
>          > > > >>>
>          > > > >>> Heinrich
>          > > >
>          > >
>          > >
>          > > --
>          > > François-Frédéric Ozog | Director Business Development
>          > > T: +33.67221.6485
>          > > francois.ozog@linaro.org <mailto:francois.ozog@linaro.org>
>         | Skype: ffozog
>          > >
>          >
>          >
>          > --
>          > François-Frédéric Ozog | *Director Business Development*
>          > T: +33.67221.6485
>          > francois.ozog@linaro.org <mailto:francois.ozog@linaro.org> |
>         Skype: ffozog
>
>
>
>     --
>
>     François-Frédéric Ozog | /Director Business Development/
>     T: +33.67221.6485
>     francois.ozog@linaro.org <mailto:francois.ozog@linaro.org>
>     | Skype: ffozog
>
>
>
>
> --
>
> François-Frédéric Ozog | /Director Business Development/
> T: +33.67221.6485
> francois.ozog@linaro.org <mailto:francois.ozog@linaro.org> | Skype: ffozog
>
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: QEMU NUMA and U-Boot
  2021-07-07 17:39                   ` Heinrich Schuchardt
@ 2022-03-23  7:29                     ` François Ozog
  2022-03-23  8:20                       ` Mark Kettenis
  0 siblings, 1 reply; 13+ messages in thread
From: François Ozog @ 2022-03-23  7:29 UTC (permalink / raw)
  To: Heinrich Schuchardt
  Cc: AKASHI Takahiro, Ilias Apalodimas, Tuomas Tynkkynen, U-Boot Mailing List

Le mer. 7 juil. 2021 à 19:39, Heinrich Schuchardt <xypron.glpk@gmx.de> a
écrit :

>
>
> On 7/7/21 5:15 PM, François Ozog wrote:
> > top posting what now works for me:
> > - changed calculation of memory size to loop through different memory
> nodes
> > - added numa_node to banks
> > - filter out banks that do not match the target mixup node (do not want
> > to change ABI to add node information)
> >
> > That's not satisfying overall but at least my code works with NUMA on
> Qemu.
>
> Do we expect real hardware with NUMA to be using U-Boot?

with new Apple M1 Ultra and NVidia Grace I think that we have examples of
what we will find in some automotive Domain Controller Units.
Are there other signs for NUMA requirements for U-Boot?


>
> If you have real ARM hardware in NUMA configuration, can the boot CPU
> access memory of dormant CPUs when U-Boot is entered?
>
> Best regards
>
> Heinrich
>
> >
> >
> > *diff --git a/Kconfig b/Kconfig*
> >
> > *index f8c1a77bed..4d3ab8cb49 100644*
> >
> > *--- a/Kconfig*
> >
> > *+++ b/Kconfig*
> >
> > @@ -192,7 +192,9 @@config NR_DRAM_BANKS
> >
> > default 1 if ARCH_SUNXI || ARCH_OWL
> >
> > default 4
> >
> > help
> >
> > - This defines the number of DRAM banks.
> >
> > +This defines the number of DRAM banks. For qemu with NUMA, you
> >
> > +may want to set this value to <slots> * <possible memdev>.
> >
> > +for instance, for a 2 slot with 4 memdevs set NR_DRAM_BANKS to 8.
> >
> > config SYS_BOOT_GET_CMDLINE
> >
> > bool "Enable kernel command line setup"
> >
> > *diff --git a/arch/arm/lib/bootm-fdt.c b/arch/arm/lib/bootm-fdt.c*
> >
> > *index 29020bd1c6..2b28ab8108 100644*
> >
> > *--- a/arch/arm/lib/bootm-fdt.c*
> >
> > *+++ b/arch/arm/lib/bootm-fdt.c*
> >
> > @@ -42,12 +42,17 @@int arch_fixup_fdt(void *blob)
> >
> > u64 size[CONFIG_NR_DRAM_BANKS];
> >
> > for (bank = 0; bank < CONFIG_NR_DRAM_BANKS; bank++) {
> >
> > +unsigned char node = bd->bi_dram[bank].numa_node;
> >
> > start[bank] = bd->bi_dram[bank].start;
> >
> > size[bank] = bd->bi_dram[bank].size;
> >
> > #ifdef CONFIG_ARMV7_NONSEC
> >
> > ret = armv7_apply_memory_carveout(&start[bank], &size[bank]);
> >
> > if (ret)
> >
> > return ret;
> >
> > +#endif
> >
> > +#ifdef CONFIG_OF_LIBFDT
> >
> > +/* add node info for the fdt_fixup_memory below */
> >
> > +start[bank] = (((phys_addr_t)node) << 56) | bd->bi_dram[bank].start;
> >
> > #endif
> >
> > }
> >
> > *diff --git a/common/fdt_support.c b/common/fdt_support.c*
> >
> > *index a9a32df1e7..3bca2ba888 100644*
> >
> > *--- a/common/fdt_support.c*
> >
> > *+++ b/common/fdt_support.c*
> >
> > @@ -415,16 +415,29 @@static int fdt_pack_reg(const void *fdt, void *buf,
> > u64 *address, u64 *size,
> >
> > return p - (char *)buf;
> >
> > }
> >
> > +static inline uint32_t fdt32_ld(const fdt32_t *p)
> >
> > +{
> >
> > +const uint8_t *bp = (const uint8_t *)p;
> >
> > +
> >
> > +return ((uint32_t)bp[0] << 24)
> >
> > + | ((uint32_t)bp[1] << 16)
> >
> > + | ((uint32_t)bp[2] << 8)
> >
> > + | bp[3];
> >
> > +}
> >
> > #if CONFIG_NR_DRAM_BANKS > 4
> >
> > #define MEMORY_BANKS_MAX CONFIG_NR_DRAM_BANKS
> >
> > #else
> >
> > #define MEMORY_BANKS_MAX 4
> >
> > #endif
> >
> > +/* NUMA has yet to be properly handled
> >
> > + * This code appends memory to the first memory node that matches the
> > NUMA node.
> >
> > + */
> >
> > int fdt_fixup_memory_banks(void *blob, u64 start[], u64 size[], int
> banks)
> >
> > {
> >
> > int err, nodeoffset;
> >
> > int len, i;
> >
> > u8 tmp[MEMORY_BANKS_MAX * 16]; /* Up to 64-bit address + 64-bit size */
> >
> > +unsigned int numa_node;
> >
> > if (banks > MEMORY_BANKS_MAX) {
> >
> > printf("%s: num banks %d exceeds hardcoded limit %d."
> >
> > @@ -444,6 +457,12 @@int fdt_fixup_memory_banks(void *blob, u64 start[],
> > u64 size[], int banks)
> >
> > if (nodeoffset < 0)
> >
> > return nodeoffset;
> >
> > +const __be32* numa_node_prop = fdt_getprop(blob, nodeoffset,
> > "numa-node-id", &len);
> >
> > +if (numa_node_prop != NULL && len == sizeof(__be32)) {
> >
> > +numa_node = fdt32_ld(numa_node_prop);
> >
> > +}
> >
> > +else numa_node = 0;
> >
> > +
> >
> > err = fdt_setprop(blob, nodeoffset, "device_type", "memory",
> >
> > sizeof("memory"));
> >
> > if (err < 0) {
> >
> > @@ -453,8 +472,27 @@int fdt_fixup_memory_banks(void *blob, u64 start[],
> > u64 size[], int banks)
> >
> > }
> >
> > for (i = 0; i < banks; i++) {
> >
> > - if (start[i] == 0 && size[i] == 0)
> >
> > +/* clear node information */
> >
> > +unsigned int node;
> >
> > +recheck:
> >
> > +if (start[i]== 0 && size[i] == 0)
> >
> > break;
> >
> > +node = (start[i] >> 56) & 0xFF;
> >
> > +start[i] = start[i] & 0x00FFFFFFFFFFFFFF;
> >
> > +/* for the moment, just ignore the banks that are not in
> >
> > +* memory NUMA node */
> >
> > +if (node != numa_node) {
> >
> > +/* remove the bank from the list */
> >
> > +int j;
> >
> > +for (j=i; j < banks-1; j++) {
> >
> > +start[j] = start[j+1];
> >
> > +size[j] = size[j+1];
> >
> > +}
> >
> > +start[j]=0;
> >
> > +size[j]=0;
> >
> > +banks--;
> >
> > +goto recheck;
> >
> > +}
> >
> > }
> >
> > banks = i;
> >
> > @@ -470,6 +508,7 @@int fdt_fixup_memory_banks(void *blob, u64 start[],
> > u64 size[], int banks)
> >
> > "reg", fdt_strerror(err));
> >
> > return err;
> >
> > }
> >
> > +
> >
> > return 0;
> >
> > }
> >
> > *diff --git a/configs/qemu_arm64_defconfig
> b/configs/qemu_arm64_defconfig*
> >
> > *index f6e586627a..0fdc22d71d 100644*
> >
> > *--- a/configs/qemu_arm64_defconfig*
> >
> > *+++ b/configs/qemu_arm64_defconfig*
> >
> > @@ -1,7 +1,7 @@
> >
> > CONFIG_ARM=y
> >
> > CONFIG_POSITION_INDEPENDENT=y
> >
> > CONFIG_ARCH_QEMU=y
> >
> > -CONFIG_NR_DRAM_BANKS=1
> >
> > +CONFIG_NR_DRAM_BANKS=32
> >
> > CONFIG_ENV_SIZE=0x40000
> >
> > CONFIG_ENV_SECT_SIZE=0x40000
> >
> > CONFIG_AHCI=y
> >
> > *diff --git a/include/asm-generic/u-boot.h
> b/include/asm-generic/u-boot.h*
> >
> > *index 637de0c455..3cf45124ec 100644*
> >
> > *--- a/include/asm-generic/u-boot.h*
> >
> > *+++ b/include/asm-generic/u-boot.h*
> >
> > @@ -71,6 +71,8 @@struct bd_info {
> >
> > struct {/* RAM configuration */
> >
> > phys_addr_t start;
> >
> > phys_size_t size;
> >
> > +unsigned numa_node;
> >
> > +unsigned pad; /* just to make sure we do not cause alignment */
> >
> > } bi_dram[CONFIG_NR_DRAM_BANKS];
> >
> > };
> >
> > *diff --git a/lib/fdtdec.c b/lib/fdtdec.c*
> >
> > *index 4b097fb588..b1934edc8d 100644*
> >
> > *--- a/lib/fdtdec.c*
> >
> > *+++ b/lib/fdtdec.c*
> >
> > @@ -1064,77 +1064,101 @@int fdtdec_decode_display_timing(const void
> > *blob, int parent, int index,
> >
> > return ret;
> >
> > }
> >
> > +ofnode get_next_memory_node(ofnode mem)
> >
> > +{
> >
> > +do {
> >
> > +mem = ofnode_by_prop_value(mem, "device_type", "memory", 7);
> >
> > +} while (!ofnode_is_available(mem));
> >
> > +
> >
> > +return mem;
> >
> > +}
> >
> > +
> >
> > int fdtdec_setup_mem_size_base(void)
> >
> > {
> >
> > int ret;
> >
> > +int reg;
> >
> > ofnode mem;
> >
> > struct resource res;
> >
> > +phys_addr_t base = ~0;
> >
> > +phys_size_t size = 0;;
> >
> > - mem = ofnode_path("/memory");
> >
> > - if (!ofnode_valid(mem)) {
> >
> > - debug("%s: Missing /memory node\n", __func__);
> >
> > - return -EINVAL;
> >
> > - }
> >
> > +for (mem = get_next_memory_node(mem); ofnode_valid(mem); mem =
> > get_next_memory_node(mem)) {
> >
> > - ret = ofnode_read_resource(mem, 0, &res);
> >
> > - if (ret != 0) {
> >
> > - debug("%s: Unable to decode first memory bank\n", __func__);
> >
> > - return -EINVAL;
> >
> > +for(reg = 0, ret = 0; ret == 0 ; reg++) {
> >
> > +ret = ofnode_read_resource(mem, reg, &res);
> >
> > +if (ret != 0)
> >
> > +break;
> >
> > +if ((phys_addr_t)res.start < base)
> >
> > +base = (phys_addr_t)res.start;
> >
> > +size += (phys_size_t)(res.end - res.start + 1);
> >
> > +}
> >
> > }
> >
> > +
> >
> > +gd->ram_base = (unsigned long)base;
> >
> > +gd->ram_size = (phys_size_t)size;
> >
> > - gd->ram_size = (phys_size_t)(res.end - res.start + 1);
> >
> > - gd->ram_base = (unsigned long)res.start;
> >
> > +debug("%s: Initial DRAM base %llx\n", __func__,
> >
> > +(unsigned long long)gd->ram_base);
> >
> > debug("%s: Initial DRAM size %llx\n", __func__,
> >
> > (unsigned long long)gd->ram_size);
> >
> > return 0;
> >
> > }
> >
> > -ofnode get_next_memory_node(ofnode mem)
> >
> > -{
> >
> > - do {
> >
> > - mem = ofnode_by_prop_value(mem, "device_type", "memory", 7);
> >
> > - } while (!ofnode_is_available(mem));
> >
> > -
> >
> > - return mem;
> >
> > -}
> >
> > -
> >
> > int fdtdec_setup_memory_banksize(void)
> >
> > {
> >
> > int bank, ret, reg = 0;
> >
> > struct resource res;
> >
> > ofnode mem = ofnode_null();
> >
> > +const __be32* numa_node_prop = NULL;
> >
> > +int len;
> >
> > +int numa_node = -1;
> >
> > +int count = 0;
> >
> > +
> >
> > +for (mem = get_next_memory_node(mem); ofnode_valid(mem); mem =
> > get_next_memory_node(mem)) {
> >
> > - mem = get_next_memory_node(mem);
> >
> > - if (!ofnode_valid(mem)) {
> >
> > - debug("%s: Missing /memory node\n", __func__);
> >
> > - return -EINVAL;
> >
> > +count++;
> >
> > +
> >
> > +numa_node_prop = ofnode_get_property(mem, "numa-node-id", &len);
> >
> > +if (numa_node_prop != NULL && len == sizeof(__be32)) {
> >
> > +numa_node = of_read_number(numa_node_prop, 1);
> >
> > }
> >
> > +else numa_node = 0;
> >
> > - for (bank = 0; bank < CONFIG_NR_DRAM_BANKS; bank++) {
> >
> > - ret = ofnode_read_resource(mem, reg++, &res);
> >
> > - if (ret < 0) {
> >
> > - reg = 0;
> >
> > - mem = get_next_memory_node(mem);
> >
> > - if (!ofnode_valid(mem))
> >
> > - break;
> >
> > +debug("Found memory for node %d\n", numa_node);
> >
> > - ret = ofnode_read_resource(mem, reg++, &res);
> >
> > - if (ret < 0)
> >
> > - break;
> >
> > - }
> >
> > +ret = 0;
> >
> > +for(reg = 0; ret == 0 && bank < CONFIG_NR_DRAM_BANKS; reg++) {
> >
> > +ret = ofnode_read_resource(mem, reg, &res);
> >
> > if (ret != 0)
> >
> > - return -EINVAL;
> >
> > +break;
> >
> > gd->bd->bi_dram[bank].start = (phys_addr_t)res.start;
> >
> > gd->bd->bi_dram[bank].size =
> >
> > (phys_size_t)(res.end - res.start + 1);
> >
> > +gd->bd->bi_dram[bank].numa_node = numa_node;
> >
> > - debug("%s: DRAM Bank #%d: start = 0x%llx, size = 0x%llx\n",
> >
> > +debug("%s: DRAM Bank #%d: start = 0x%llx, size = 0x%llx"
> >
> > +" name_node = %d\n",
> >
> > __func__, bank,
> >
> > (unsigned long long)gd->bd->bi_dram[bank].start,
> >
> > - (unsigned long long)gd->bd->bi_dram[bank].size);
> >
> > +(unsigned long long)gd->bd->bi_dram[bank].size,
> >
> > +gd->bd->bi_dram[bank].numa_node);
> >
> > +
> >
> > +bank++;
> >
> > +}
> >
> > +
> >
> > +
> >
> > +}
> >
> > +
> >
> > +if (count == 0) {
> >
> > +debug("%s: Missing /memory node\n", __func__);
> >
> > +return -EINVAL;
> >
> > +}
> >
> > +if (bank >= CONFIG_NR_DRAM_BANKS) {
> >
> > +printf("Too many DT memory nodes for CONFIG_NR_DRAM_BANKS=%d\n",
> >
> > +CONFIG_NR_DRAM_BANKS);
> >
> > }
> >
> > return 0;
> >
> >
> > On Wed, 7 Jul 2021 at 13:00, François Ozog <francois.ozog@linaro.org
> > <mailto:francois.ozog@linaro.org>> wrote:
> >
> >
> >
> >     On Wed, 7 Jul 2021 at 12:16, AKASHI Takahiro
> >     <takahiro.akashi@linaro.org <mailto:takahiro.akashi@linaro.org>>
> wrote:
> >
> >         On Wed, Jul 07, 2021 at 11:37:19AM +0200, Fran??ois Ozog wrote:
> >          > On Wed, 7 Jul 2021 at 09:40, François Ozog
> >         <francois.ozog@linaro.org <mailto:francois.ozog@linaro.org>>
> wrote:
> >          >
> >          > > On Wed, 7 Jul 2021 at 05:59, Heinrich Schuchardt
> >         <xypron.glpk@gmx.de <mailto:xypron.glpk@gmx.de>>
> >          > > wrote:
> >          > > >
> >          > > > Am 7. Juli 2021 05:18:20 MESZ schrieb Heinrich Schuchardt
> <
> >          > > xypron.glpk@gmx.de <mailto:xypron.glpk@gmx.de>>:
> >          > > > >Am 7. Juli 2021 03:44:35 MESZ schrieb AKASHI Takahiro
> >          > > > ><takahiro.akashi@linaro.org
> >         <mailto:takahiro.akashi@linaro.org>>:
> >          > > > >>François,
> >          > > > >>
> >          > > > >>On Tue, Jul 06, 2021 at 08:10:08PM +0200, Heinrich
> >         Schuchardt wrote:
> >          > > > >>> On 7/6/21 6:13 PM, François Ozog wrote:
> >          > > > >>> > Hi Heinrich, U-Boot 2021-07rc5 does not take into
> >         account memory
> >          > > > >>> > description when using Qemu 5.2 NUMA configuration
> >         to adapt memory
> >          > > > >>map
> >          > > > >>> > (kernel_addr_r...):
> >          > > > >>> >
> >          > > > >>> >         -smp 4 \
> >          > > > >>> >           -m 8G,slots=2,maxmem=16G \
> >          > > > >>> >          -object memory-backend-ram,size=4G,id=m0 \
> >          > > > >>> >          -object memory-backend-ram,size=4G,id=m1 \
> >          > > > >>> >          -numa node,cpus=0-1,nodeid=0,memdev=m0 \
> >          > > > >>> >          -numa node,cpus=2-3,nodeid=1,memdev=m1
> >          > > > >>> >
> >          > > > >>> > kernel_addr_r is still 0x4040000 and thus you can't
> >         use it to
> >          > > > >>bootefi.
> >          > > > >>> >
> >          > > > >>> > fdt addr 0x13ede6de0; fdt print
> >          > > > >>> >
> >          > > > >>> > Displays fdt while I think it should not.
> >          > > > >>> >
> >          > > > >>> > If I load the kernel at dram.start, the load works
> >         but not boot
> >          > > > >>> >
> >          > > > >>> > U-Boot 2021.07 (Jul 06 2021 - 13:26:43 +0000)
> >          > > > >>> >
> >          > > > >>> >
> >          > > > >>> > DRAM:4 GiB
> >          > > > >>> >
> >          > > > >>> > Flash: 64 MiB
> >          > > > >>> >
> >          > > > >>> > Loading Environment from Flash... OK
> >          > > > >>> >
> >          > > > >>> > In:pl011@9000000
> >          > > > >>> >
> >          > > > >>> > Out: pl011@9000000
> >          > > > >>> >
> >          > > > >>> > Err: pl011@9000000
> >          > > > >>> >
> >          > > > >>> > Net: eth0: virtio-net#32
> >          > > > >>> >
> >          > > > >>> > Hit any key to stop autoboot:0
> >          > > > >>> >
> >          > > > >>> > =>
> >          > > > >>> >
> >          > > > >>> > => bdinfo
> >          > > > >>> >
> >          > > > >>> > boot_params = 0x0000000000000000
> >          > > > >>> >
> >          > > > >>> > DRAM bank = 0x0000000000000000
> >          > > > >>> >
> >          > > > >>> > -> start= 0x0000000140000000
> >          > > > >>> >
> >          > > > >>> > -> size = 0x0000000100000000
> >          > > > >>> >
> >          > > > >>> > flashstart= 0x0000000000000000
> >          > > > >>> >
> >          > > > >>> > flashsize = 0x0000000004000000
> >          > > > >>> >
> >          > > > >>> > flashoffset = 0x00000000000bc990
> >          > > > >>> >
> >          > > > >>> > baudrate= 115200 bps
> >          > > > >>> >
> >          > > > >>> > relocaddr = 0x000000013ff27000
> >          > > > >>> >
> >          > > > >>> > reloc off = 0x000000013ff27000
> >          > > > >>> >
> >          > > > >>> > Build = 64-bit
> >          > > > >>> >
> >          > > > >>> > current eth = virtio-net#32
> >          > > > >>> >
> >          > > > >>> > ethaddr = 52:52:52:52:52:52
> >          > > > >>> >
> >          > > > >>> > IP addr = <NULL>
> >          > > > >>> >
> >          > > > >>> > fdt_blob= 0x000000013ede6de0
> >          > > > >>> >
> >          > > > >>> > new_fdt = 0x000000013ede6de0
> >          > > > >>> >
> >          > > > >>> > fdt_size= 0x0000000000100000
> >          > > > >>> >
> >          > > > >>> > lmb_dump_all:
> >          > > > >>> >
> >          > > > >>> > memory.cnt= 0x1
> >          > > > >>> >
> >          > > > >>> > memory.reg[0x0].base = 0x140000000
> >          > > > >>> >
> >          > > > >>> > .size = 0x100000000
> >          > > > >>> >
> >          > > > >>> >
> >          > > > >>> > reserved.cnt= 0x0
> >          > > > >>> >
> >          > > > >>> > arch_number = 0x0000000000000000
> >          > > > >>> >
> >          > > > >>> > TLB addr= 0x000000013fff0000
> >          > > > >>> >
> >          > > > >>> > irq_sp= 0x000000013ede6dd0
> >          > > > >>> >
> >          > > > >>> > sp start= 0x000000013ede6dd0
> >          > > > >>> >
> >          > > > >>> > Early malloc usage: 3a8 / 2000
> >          > > > >>> >
> >          > > > >>> > => load virtio 0:1 0x140000000 /oskit.efi
> >          > > > >>> >
> >          > > > >>> > 853424 bytes read in 1 ms (813.9 MiB/s)
> >          > > > >>> >
> >          > > > >>> > => bootefi0x140000000 0x13ede6dd0
> >          > > > >>> >
> >          > > > >>> > ERROR: Failed to register WaitForKey event
> >          > > > >>> >
> >          > > > >>> > Setting OsIndications failed
> >          > > > >>> >
> >          > > > >>> > Error: Cannot initialize UEFI sub-system, r = 9
> >          > > > >>> >
> >          > > > >>> >
> >          > > > >>> > I think there is a need to calculate memory map
> >         based on previous
> >          > > > >>> > firmware (TFA, QEMU can be considered as previous
> >         frimware)
> >          > > > >>information
> >          > > > >>> > (DT or blob_list).
> >          > > > >>> >
> >          > > > >>> > What do you think ?
> >          > > > >>> >
> >          > > > >>> > Cheers
> >          > > > >>> >
> >          > > > >>> > FF
> >          > > > >>> >
> >          > > > >>> > --
> >          > > > >>> >
> >          > > > >>> > François-Frédéric Ozog | /Director Business
> >         Development/
> >          > > > >>> > T: +33.67221.6485
> >          > > > >>> > francois.ozog@linaro.org
> >         <mailto:francois.ozog@linaro.org>
> >         <mailto:francois.ozog@linaro.org <mailto:
> francois.ozog@linaro.org>>
> >          > > > >>| Skype: ffozog
> >          > > > >>> >
> >          > > > >>> >
> >          > > > >>>
> >          > > > >>> The kernel load address is hard coded here:
> >          > > > >>> include/configs/qemu-arm.h:41:
> >         "kernel_addr_r=0x40400000\0" \
> >          > > > >>>
> >          > > > >>> bdinfo shows:
> >          > > > >>> DRAM start = 0x140000000
> >          > > > >>> DRAM size  = 0x100000000
> >          > > > >>>
> >          > > > >>> fdt addr $fdt_addr
> >          > > > >>> fdt printf
> >          > > > >>>
> >          > > > >>> shows two memory areas. One at 40000000, one at
> >         140000000.
> >          > > > >>
> >          > > > >>(This shows that U-Boot receives a correct memory map
> >         via dtb.)
> >          > > > >>
> >          > > > >>Is this a NUMA machine, isn't it? Why should we care of
> >         which
> >          > > > >>memory region be used here? Please note that this is a
> >         virtual
> >          > > > >machine,
> >          > > > >>there is no practical difference between two regions.
> >          > > > >>
> >          > > > >>The root problem is that U-Boot did not recognize there
> >         were two
> >          > > > >>memory regions. We can fix this issue in either way:
> >          > > > >>
> >          > > > >>1)
> >          > > > >>diff --git a/configs/qemu_arm64_defconfig
> >          > > > >>b/configs/qemu_arm64_defconfig
> >          > > > >>index f6e586627a8e..b70ffae8bf6e 100644
> >          > > > >>--- a/configs/qemu_arm64_defconfig
> >          > > > >>+++ b/configs/qemu_arm64_defconfig
> >          > > > >>@@ -1,7 +1,7 @@
> >          > > > >> CONFIG_ARM=y
> >          > > > >> CONFIG_POSITION_INDEPENDENT=y
> >          > > > >> CONFIG_ARCH_QEMU=y
> >          > > > >>-CONFIG_NR_DRAM_BANKS=1
> >          > > > >>+CONFIG_NR_DRAM_BANKS=2
> >          > > > >> CONFIG_ENV_SIZE=0x40000
> >          > > > >> CONFIG_ENV_SECT_SIZE=0x40000
> >          > > > >> CONFIG_AHCI=y
> >          > > > >>
> >          > > > >>2)
> >          > > > >>diff --git a/lib/fdtdec.c b/lib/fdtdec.c
> >          > > > >>index 4b097fb588ed..4067ea2dead6 100644
> >          > > > >>--- a/lib/fdtdec.c
> >          > > > >>+++ b/lib/fdtdec.c
> >          > > > >>@@ -1111,7 +1111,7 @@ int
> >         fdtdec_setup_memory_banksize(void)
> >          > > > >>                return -EINVAL;
> >          > > > >>        }
> >          > > > >>
> >          > > > >>-       for (bank = 0; bank < CONFIG_NR_DRAM_BANKS;
> >         bank++) {
> >          > > > >>+       for (bank = 0; ; bank++) {
> >          > > > >>                ret = ofnode_read_resource(mem, reg++,
> >         &res);
> >          > > > >>                if (ret < 0) {
> >          > > > >>                        reg = 0;
> >          > > > >>
> >          > > > >>   (fdtdec_setup_memory_banksize() is called in
> >         dram_init_banksize().)
> >          > > > >>
> >          > > > >>
> >          > > > >>(2) seems much better, but I don't know why we had to
> use
> >          > > > >>CONFIG_NR_DRAM_BANKS here.
> >          > > > >>
> >          > >
> >          > > 2) alone does not work as other places in the code refer to
> >          > > CONFIG_NR_DRAM_BANKS. Setting ...BANKS to 32 makes my code
> >         work and
> >          > > bdinfo seems now correct:
> >          > >
> >          > => bdinfo
> >          > > boot_params = 0x0000000000000000
> >          > > DRAM bank   = 0x0000000000000000
> >          > > -> start    = 0x0000000140000000
> >          > > -> size     = 0x0000000100000000
> >          > > DRAM bank   = 0x0000000000000001
> >          > > -> start    = 0x0000000040000000
> >          > > -> size     = 0x0000000100000000
> >          > > flashstart  = 0x0000000000000000
> >          > > flashsize   = 0x0000000004000000
> >          > > flashoffset = 0x00000000000bcb88
> >          > > baudrate    = 115200 bps
> >          > > relocaddr   = 0x000000013ff27000
> >          > > reloc off   = 0x000000013ff27000
> >          > > Build       = 64-bit
> >          > > current eth = virtio-net#32
> >          > > ethaddr     = 52:52:52:52:52:52
> >          > > IP addr     = <NULL>
> >          > > fdt_blob    = 0x000000013ede6cf0
> >          > > new_fdt     = 0x000000013ede6cf0
> >          > > fdt_size    = 0x0000000000100000
> >          > > lmb_dump_all:
> >          > >     memory.cnt   = 0x1
> >          > >     memory.reg[0x0].base   = 0x40000000
> >          > >   .size   = 0x200000000
> >          > >     reserved.cnt   = 0x1
> >          > >     reserved.reg[0x0].base = 0x13ede58f0
> >          > >     .size = 0x121a710
> >          > > arch_number = 0x0000000000000000
> >          > > TLB addr    = 0x000000013fff0000
> >          > > irq_sp      = 0x000000013ede6ce0
> >          > > sp start    = 0x000000013ede6ce0
> >          > > Early malloc usage: 3a8 / 2000
> >          > >
> >          > > May I suggest you propose a combined patch Akashi-san? If
> >         we assume
> >          > > NUMA systems to be tested up to 8 nodes to mimic real
> existing
> >          > > enterprise hardware and up to 4 memory slots (say for
> >         memory hot
> >          > > plugging tests) what about a default value of 32?
> >         Alternatively, we
> >          > > could set this value to a much higher one if the costs are
> >         negligible.
> >          > >
> >          > >
> >          > > Well, lets not rush as there are other twists:
> >          >
> >          > the 4G bank in node 1 is marked BootServicesData in the UEFI
> >         GetMemoryMap
> >          > which I assume is not the case. EDK2 reports it as
> >         ConventionalMemory.
> >          >
> >          > The root cause seem to be gd->ramtop not being setup properly.
> >          >
> >          > Further analysis shows that the DT passed to the booted EFI
> >         payload does
> >          > not seem to be correct:
> >          >
> >          > DT fragment passed to U-Boot
> >          >
> >          > memory@140000000 {
> >          > numa-node-id = <0x00000001>;
> >          > reg = <0x00000001 0x40000000 0x00000001 0x00000000>;
> >          > device_type = "memory";
> >          > };
> >          > memory@40000000 {
> >          > numa-node-id = <0x00000000>;
> >          > reg = <0x00000000 0x40000000 0x00000001 0x00000000>;
> >          > device_type = "memory";
> >          > };
> >          >
> >          > DT passed to payload (as per my debug code):
> >          >
> >          > memory@140000000: memory
> >          >
> >          >     numa-node-id 1
> >          >
> >          >     reg (len= 32)
> >          >
> >          >          140000000 100000000
> >          >
> >          >          40000000 100000000
> >          >
> >          > memory@40000000: memory
> >          >
> >          >     numa-node-id 0
> >          >
> >          >     reg (len= 16)
> >          >
> >          >          40000000 100000000
> >          >
> >          > I am investigating this further...
> >
> >         You should check the logic of fdt_fixup_memory_banks()
> >         which is called this way:
> >            efi_dt_fixup()
> >              image_setup_libfdt()
> >                arch_fixup_fdt()
> >                  fdt_fixup_memory_banks()
> >
> >         What it does is to put *all* the memory regions unconditionally
> as
> >         a single "reg" array into the *first-detected* "memory" node,
> >         which is
> >         "memory@140000000" in this case.
> >         It means that this function doesn't respect NUMA configuration.
> >
> >     Thanks.
> >
> >     tweaking ram_top to be the correct in
> >
> https://elixir.bootlin.com/u-boot/latest/source/lib/efi_loader/efi_memory.c#L732
> >     <
> https://elixir.bootlin.com/u-boot/latest/source/lib/efi_loader/efi_memory.c#L732
> >
> >     results in memory on node 1 to be considered as ConventionalMemory
> >     but U-Boot places all code and data at the end of node 1 while it
> >     should position it on the current node. That said it is an
> >     acceptable work around for my test case.
> >
> >     Bottom line, we need to introduce NUMA node management in memory
> >     management all over the place.
> >     It is unclear if there is a business case for that. I'll ask LEDGE
> >     members...
> >
> >         -Takahiro Akashi
> >
> >
> >          > > >>In this case, other occurrences of CONFIG_NR_DRAM_BANKS
> >         in this file
> >          > > > >>should be replaced with a variable for it.
> >          > > > >>
> >          > > > >>> Your use case is well beyond the typical U-Boot
> >         usage. So I guess it
> >          > > > >>> will be up to Linaro to provide the necessary patches:
> >          > > > >>>
> >          > > > >>> * determine the active CPU
> >          > > > >>> * determine the RAM assigned to the active CPU
> according
> >          > > > >>>   to the numa-node-id in the device-tree
> >          > > > >>> * make sure that U-Boot only uses the memory of the
> >         active CPU
> >          > > > >>>   internally
> >          > > > >>> * make sure that the UEFI memory map contains a
> compliant
> >          > > > >description
> >          > > > >>> * possibly, dynamically set up the environment
> variables
> >          > > > >>>
> >          > > > >>> +CC Tuomas Tynkkynen (maintainer for
> >         qemu_arm64_defconfig)
> >          > > > >>
> >          > > > >>For (1), we'd better have a different config, or
> increase
> >          > > > >>the value of CONFIG_NR_DRAM_BANKS to a bigger number?
> >          > > > >
> >          > > > >Is the system configured such that each CPU can access
> >         the others CPU's
> >          > > > >RAM when entering U-Boot?
> >          > > > >
> >          > > > >Best regards
> >          > > > >
> >          > > > >Heinrich
> >          > > > >
> >          > > >
> >          > > > At least the comments for this patch sound as if on a
> >         physical system
> >          > > cross NUMA node memory access is only available after full
> SMP
> >          > > initialization:
> >          > > >
> >          > > >
> >          > >
> >
> https://patchwork.kernel.org/project/linux-acpi/patch/20180625130552.5636-1-lorenzo.pieralisi@arm.com/
> >         <
> https://patchwork.kernel.org/project/linux-acpi/patch/20180625130552.5636-1-lorenzo.pieralisi@arm.com/
> >
> >          > > >
> >          > > > QEMU may be less restrictive.
> >          > > >
> >          > > > QEMU allows the node distance to be 255 indicating that
> >         cross node
> >          > > access is infeasible.
> >          > > >
> >          > > > Best regards
> >          > > >
> >          > > > Heinrich
> >          > > >
> >          > > > >>
> >          > > > >>-Takahiro Akashi
> >          > > > >>
> >          > > > >>
> >          > > > >>> Best regards
> >          > > > >>>
> >          > > > >>> Heinrich
> >          > > >
> >          > >
> >          > >
> >          > > --
> >          > > François-Frédéric Ozog | Director Business Development
> >          > > T: +33.67221.6485
> >          > > francois.ozog@linaro.org <mailto:francois.ozog@linaro.org>
> >         | Skype: ffozog
> >          > >
> >          >
> >          >
> >          > --
> >          > François-Frédéric Ozog | *Director Business Development*
> >          > T: +33.67221.6485
> >          > francois.ozog@linaro.org <mailto:francois.ozog@linaro.org> |
> >         Skype: ffozog
> >
> >
> >
> >     --
> >
> >     François-Frédéric Ozog | /Director Business Development/
> >     T: +33.67221.6485
> >     francois.ozog@linaro.org <mailto:francois.ozog@linaro.org>
> >     | Skype: ffozog
> >
> >
> >
> >
> > --
> >
> > François-Frédéric Ozog | /Director Business Development/
> > T: +33.67221.6485
> > francois.ozog@linaro.org <mailto:francois.ozog@linaro.org>
> | Skype: ffozog
> >
> >
>
-- 
François-Frédéric Ozog | *Director Business Development*
T: +33.67221.6485
francois.ozog@linaro.org | Skype: ffozog

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: QEMU NUMA and U-Boot
  2022-03-23  7:29                     ` François Ozog
@ 2022-03-23  8:20                       ` Mark Kettenis
  0 siblings, 0 replies; 13+ messages in thread
From: Mark Kettenis @ 2022-03-23  8:20 UTC (permalink / raw)
  To: François Ozog
  Cc: xypron.glpk, takahiro.akashi, ilias.apalodimas, tuomas.tynkkynen, u-boot

> From: François Ozog <francois.ozog@linaro.org>
> Date: Wed, 23 Mar 2022 08:29:21 +0100
> 
> Le mer. 7 juil. 2021 à 19:39, Heinrich Schuchardt <xypron.glpk@gmx.de> a
> écrit :
> 
> >
> >
> > On 7/7/21 5:15 PM, François Ozog wrote:
> > > top posting what now works for me:
> > > - changed calculation of memory size to loop through different memory
> > nodes
> > > - added numa_node to banks
> > > - filter out banks that do not match the target mixup node (do not want
> > > to change ABI to add node information)
> > >
> > > That's not satisfying overall but at least my code works with NUMA on
> > Qemu.
> >
> > Do we expect real hardware with NUMA to be using U-Boot?
> 
> with new Apple M1 Ultra and NVidia Grace I think that we have examples of
> what we will find in some automotive Domain Controller Units.
> Are there other signs for NUMA requirements for U-Boot?

I don't think U-Boot will care at all about NUMA on Apple M1 Ultra.
Everything should be set up already by the time U-Boot takes control
and I don't see U-Boot caring about slight differences in memory
access time between memory ranges.

And I bet the situation on NVidia Grace will be similar.

> > If you have real ARM hardware in NUMA configuration, can the boot CPU
> > access memory of dormant CPUs when U-Boot is entered?
> >
> > Best regards
> >
> > Heinrich
> >
> > >
> > >
> > > *diff --git a/Kconfig b/Kconfig*
> > >
> > > *index f8c1a77bed..4d3ab8cb49 100644*
> > >
> > > *--- a/Kconfig*
> > >
> > > *+++ b/Kconfig*
> > >
> > > @@ -192,7 +192,9 @@config NR_DRAM_BANKS
> > >
> > > default 1 if ARCH_SUNXI || ARCH_OWL
> > >
> > > default 4
> > >
> > > help
> > >
> > > - This defines the number of DRAM banks.
> > >
> > > +This defines the number of DRAM banks. For qemu with NUMA, you
> > >
> > > +may want to set this value to <slots> * <possible memdev>.
> > >
> > > +for instance, for a 2 slot with 4 memdevs set NR_DRAM_BANKS to 8.
> > >
> > > config SYS_BOOT_GET_CMDLINE
> > >
> > > bool "Enable kernel command line setup"
> > >
> > > *diff --git a/arch/arm/lib/bootm-fdt.c b/arch/arm/lib/bootm-fdt.c*
> > >
> > > *index 29020bd1c6..2b28ab8108 100644*
> > >
> > > *--- a/arch/arm/lib/bootm-fdt.c*
> > >
> > > *+++ b/arch/arm/lib/bootm-fdt.c*
> > >
> > > @@ -42,12 +42,17 @@int arch_fixup_fdt(void *blob)
> > >
> > > u64 size[CONFIG_NR_DRAM_BANKS];
> > >
> > > for (bank = 0; bank < CONFIG_NR_DRAM_BANKS; bank++) {
> > >
> > > +unsigned char node = bd->bi_dram[bank].numa_node;
> > >
> > > start[bank] = bd->bi_dram[bank].start;
> > >
> > > size[bank] = bd->bi_dram[bank].size;
> > >
> > > #ifdef CONFIG_ARMV7_NONSEC
> > >
> > > ret = armv7_apply_memory_carveout(&start[bank], &size[bank]);
> > >
> > > if (ret)
> > >
> > > return ret;
> > >
> > > +#endif
> > >
> > > +#ifdef CONFIG_OF_LIBFDT
> > >
> > > +/* add node info for the fdt_fixup_memory below */
> > >
> > > +start[bank] = (((phys_addr_t)node) << 56) | bd->bi_dram[bank].start;
> > >
> > > #endif
> > >
> > > }
> > >
> > > *diff --git a/common/fdt_support.c b/common/fdt_support.c*
> > >
> > > *index a9a32df1e7..3bca2ba888 100644*
> > >
> > > *--- a/common/fdt_support.c*
> > >
> > > *+++ b/common/fdt_support.c*
> > >
> > > @@ -415,16 +415,29 @@static int fdt_pack_reg(const void *fdt, void *buf,
> > > u64 *address, u64 *size,
> > >
> > > return p - (char *)buf;
> > >
> > > }
> > >
> > > +static inline uint32_t fdt32_ld(const fdt32_t *p)
> > >
> > > +{
> > >
> > > +const uint8_t *bp = (const uint8_t *)p;
> > >
> > > +
> > >
> > > +return ((uint32_t)bp[0] << 24)
> > >
> > > + | ((uint32_t)bp[1] << 16)
> > >
> > > + | ((uint32_t)bp[2] << 8)
> > >
> > > + | bp[3];
> > >
> > > +}
> > >
> > > #if CONFIG_NR_DRAM_BANKS > 4
> > >
> > > #define MEMORY_BANKS_MAX CONFIG_NR_DRAM_BANKS
> > >
> > > #else
> > >
> > > #define MEMORY_BANKS_MAX 4
> > >
> > > #endif
> > >
> > > +/* NUMA has yet to be properly handled
> > >
> > > + * This code appends memory to the first memory node that matches the
> > > NUMA node.
> > >
> > > + */
> > >
> > > int fdt_fixup_memory_banks(void *blob, u64 start[], u64 size[], int
> > banks)
> > >
> > > {
> > >
> > > int err, nodeoffset;
> > >
> > > int len, i;
> > >
> > > u8 tmp[MEMORY_BANKS_MAX * 16]; /* Up to 64-bit address + 64-bit size */
> > >
> > > +unsigned int numa_node;
> > >
> > > if (banks > MEMORY_BANKS_MAX) {
> > >
> > > printf("%s: num banks %d exceeds hardcoded limit %d."
> > >
> > > @@ -444,6 +457,12 @@int fdt_fixup_memory_banks(void *blob, u64 start[],
> > > u64 size[], int banks)
> > >
> > > if (nodeoffset < 0)
> > >
> > > return nodeoffset;
> > >
> > > +const __be32* numa_node_prop = fdt_getprop(blob, nodeoffset,
> > > "numa-node-id", &len);
> > >
> > > +if (numa_node_prop != NULL && len == sizeof(__be32)) {
> > >
> > > +numa_node = fdt32_ld(numa_node_prop);
> > >
> > > +}
> > >
> > > +else numa_node = 0;
> > >
> > > +
> > >
> > > err = fdt_setprop(blob, nodeoffset, "device_type", "memory",
> > >
> > > sizeof("memory"));
> > >
> > > if (err < 0) {
> > >
> > > @@ -453,8 +472,27 @@int fdt_fixup_memory_banks(void *blob, u64 start[],
> > > u64 size[], int banks)
> > >
> > > }
> > >
> > > for (i = 0; i < banks; i++) {
> > >
> > > - if (start[i] == 0 && size[i] == 0)
> > >
> > > +/* clear node information */
> > >
> > > +unsigned int node;
> > >
> > > +recheck:
> > >
> > > +if (start[i]== 0 && size[i] == 0)
> > >
> > > break;
> > >
> > > +node = (start[i] >> 56) & 0xFF;
> > >
> > > +start[i] = start[i] & 0x00FFFFFFFFFFFFFF;
> > >
> > > +/* for the moment, just ignore the banks that are not in
> > >
> > > +* memory NUMA node */
> > >
> > > +if (node != numa_node) {
> > >
> > > +/* remove the bank from the list */
> > >
> > > +int j;
> > >
> > > +for (j=i; j < banks-1; j++) {
> > >
> > > +start[j] = start[j+1];
> > >
> > > +size[j] = size[j+1];
> > >
> > > +}
> > >
> > > +start[j]=0;
> > >
> > > +size[j]=0;
> > >
> > > +banks--;
> > >
> > > +goto recheck;
> > >
> > > +}
> > >
> > > }
> > >
> > > banks = i;
> > >
> > > @@ -470,6 +508,7 @@int fdt_fixup_memory_banks(void *blob, u64 start[],
> > > u64 size[], int banks)
> > >
> > > "reg", fdt_strerror(err));
> > >
> > > return err;
> > >
> > > }
> > >
> > > +
> > >
> > > return 0;
> > >
> > > }
> > >
> > > *diff --git a/configs/qemu_arm64_defconfig
> > b/configs/qemu_arm64_defconfig*
> > >
> > > *index f6e586627a..0fdc22d71d 100644*
> > >
> > > *--- a/configs/qemu_arm64_defconfig*
> > >
> > > *+++ b/configs/qemu_arm64_defconfig*
> > >
> > > @@ -1,7 +1,7 @@
> > >
> > > CONFIG_ARM=y
> > >
> > > CONFIG_POSITION_INDEPENDENT=y
> > >
> > > CONFIG_ARCH_QEMU=y
> > >
> > > -CONFIG_NR_DRAM_BANKS=1
> > >
> > > +CONFIG_NR_DRAM_BANKS=32
> > >
> > > CONFIG_ENV_SIZE=0x40000
> > >
> > > CONFIG_ENV_SECT_SIZE=0x40000
> > >
> > > CONFIG_AHCI=y
> > >
> > > *diff --git a/include/asm-generic/u-boot.h
> > b/include/asm-generic/u-boot.h*
> > >
> > > *index 637de0c455..3cf45124ec 100644*
> > >
> > > *--- a/include/asm-generic/u-boot.h*
> > >
> > > *+++ b/include/asm-generic/u-boot.h*
> > >
> > > @@ -71,6 +71,8 @@struct bd_info {
> > >
> > > struct {/* RAM configuration */
> > >
> > > phys_addr_t start;
> > >
> > > phys_size_t size;
> > >
> > > +unsigned numa_node;
> > >
> > > +unsigned pad; /* just to make sure we do not cause alignment */
> > >
> > > } bi_dram[CONFIG_NR_DRAM_BANKS];
> > >
> > > };
> > >
> > > *diff --git a/lib/fdtdec.c b/lib/fdtdec.c*
> > >
> > > *index 4b097fb588..b1934edc8d 100644*
> > >
> > > *--- a/lib/fdtdec.c*
> > >
> > > *+++ b/lib/fdtdec.c*
> > >
> > > @@ -1064,77 +1064,101 @@int fdtdec_decode_display_timing(const void
> > > *blob, int parent, int index,
> > >
> > > return ret;
> > >
> > > }
> > >
> > > +ofnode get_next_memory_node(ofnode mem)
> > >
> > > +{
> > >
> > > +do {
> > >
> > > +mem = ofnode_by_prop_value(mem, "device_type", "memory", 7);
> > >
> > > +} while (!ofnode_is_available(mem));
> > >
> > > +
> > >
> > > +return mem;
> > >
> > > +}
> > >
> > > +
> > >
> > > int fdtdec_setup_mem_size_base(void)
> > >
> > > {
> > >
> > > int ret;
> > >
> > > +int reg;
> > >
> > > ofnode mem;
> > >
> > > struct resource res;
> > >
> > > +phys_addr_t base = ~0;
> > >
> > > +phys_size_t size = 0;;
> > >
> > > - mem = ofnode_path("/memory");
> > >
> > > - if (!ofnode_valid(mem)) {
> > >
> > > - debug("%s: Missing /memory node\n", __func__);
> > >
> > > - return -EINVAL;
> > >
> > > - }
> > >
> > > +for (mem = get_next_memory_node(mem); ofnode_valid(mem); mem =
> > > get_next_memory_node(mem)) {
> > >
> > > - ret = ofnode_read_resource(mem, 0, &res);
> > >
> > > - if (ret != 0) {
> > >
> > > - debug("%s: Unable to decode first memory bank\n", __func__);
> > >
> > > - return -EINVAL;
> > >
> > > +for(reg = 0, ret = 0; ret == 0 ; reg++) {
> > >
> > > +ret = ofnode_read_resource(mem, reg, &res);
> > >
> > > +if (ret != 0)
> > >
> > > +break;
> > >
> > > +if ((phys_addr_t)res.start < base)
> > >
> > > +base = (phys_addr_t)res.start;
> > >
> > > +size += (phys_size_t)(res.end - res.start + 1);
> > >
> > > +}
> > >
> > > }
> > >
> > > +
> > >
> > > +gd->ram_base = (unsigned long)base;
> > >
> > > +gd->ram_size = (phys_size_t)size;
> > >
> > > - gd->ram_size = (phys_size_t)(res.end - res.start + 1);
> > >
> > > - gd->ram_base = (unsigned long)res.start;
> > >
> > > +debug("%s: Initial DRAM base %llx\n", __func__,
> > >
> > > +(unsigned long long)gd->ram_base);
> > >
> > > debug("%s: Initial DRAM size %llx\n", __func__,
> > >
> > > (unsigned long long)gd->ram_size);
> > >
> > > return 0;
> > >
> > > }
> > >
> > > -ofnode get_next_memory_node(ofnode mem)
> > >
> > > -{
> > >
> > > - do {
> > >
> > > - mem = ofnode_by_prop_value(mem, "device_type", "memory", 7);
> > >
> > > - } while (!ofnode_is_available(mem));
> > >
> > > -
> > >
> > > - return mem;
> > >
> > > -}
> > >
> > > -
> > >
> > > int fdtdec_setup_memory_banksize(void)
> > >
> > > {
> > >
> > > int bank, ret, reg = 0;
> > >
> > > struct resource res;
> > >
> > > ofnode mem = ofnode_null();
> > >
> > > +const __be32* numa_node_prop = NULL;
> > >
> > > +int len;
> > >
> > > +int numa_node = -1;
> > >
> > > +int count = 0;
> > >
> > > +
> > >
> > > +for (mem = get_next_memory_node(mem); ofnode_valid(mem); mem =
> > > get_next_memory_node(mem)) {
> > >
> > > - mem = get_next_memory_node(mem);
> > >
> > > - if (!ofnode_valid(mem)) {
> > >
> > > - debug("%s: Missing /memory node\n", __func__);
> > >
> > > - return -EINVAL;
> > >
> > > +count++;
> > >
> > > +
> > >
> > > +numa_node_prop = ofnode_get_property(mem, "numa-node-id", &len);
> > >
> > > +if (numa_node_prop != NULL && len == sizeof(__be32)) {
> > >
> > > +numa_node = of_read_number(numa_node_prop, 1);
> > >
> > > }
> > >
> > > +else numa_node = 0;
> > >
> > > - for (bank = 0; bank < CONFIG_NR_DRAM_BANKS; bank++) {
> > >
> > > - ret = ofnode_read_resource(mem, reg++, &res);
> > >
> > > - if (ret < 0) {
> > >
> > > - reg = 0;
> > >
> > > - mem = get_next_memory_node(mem);
> > >
> > > - if (!ofnode_valid(mem))
> > >
> > > - break;
> > >
> > > +debug("Found memory for node %d\n", numa_node);
> > >
> > > - ret = ofnode_read_resource(mem, reg++, &res);
> > >
> > > - if (ret < 0)
> > >
> > > - break;
> > >
> > > - }
> > >
> > > +ret = 0;
> > >
> > > +for(reg = 0; ret == 0 && bank < CONFIG_NR_DRAM_BANKS; reg++) {
> > >
> > > +ret = ofnode_read_resource(mem, reg, &res);
> > >
> > > if (ret != 0)
> > >
> > > - return -EINVAL;
> > >
> > > +break;
> > >
> > > gd->bd->bi_dram[bank].start = (phys_addr_t)res.start;
> > >
> > > gd->bd->bi_dram[bank].size =
> > >
> > > (phys_size_t)(res.end - res.start + 1);
> > >
> > > +gd->bd->bi_dram[bank].numa_node = numa_node;
> > >
> > > - debug("%s: DRAM Bank #%d: start = 0x%llx, size = 0x%llx\n",
> > >
> > > +debug("%s: DRAM Bank #%d: start = 0x%llx, size = 0x%llx"
> > >
> > > +" name_node = %d\n",
> > >
> > > __func__, bank,
> > >
> > > (unsigned long long)gd->bd->bi_dram[bank].start,
> > >
> > > - (unsigned long long)gd->bd->bi_dram[bank].size);
> > >
> > > +(unsigned long long)gd->bd->bi_dram[bank].size,
> > >
> > > +gd->bd->bi_dram[bank].numa_node);
> > >
> > > +
> > >
> > > +bank++;
> > >
> > > +}
> > >
> > > +
> > >
> > > +
> > >
> > > +}
> > >
> > > +
> > >
> > > +if (count == 0) {
> > >
> > > +debug("%s: Missing /memory node\n", __func__);
> > >
> > > +return -EINVAL;
> > >
> > > +}
> > >
> > > +if (bank >= CONFIG_NR_DRAM_BANKS) {
> > >
> > > +printf("Too many DT memory nodes for CONFIG_NR_DRAM_BANKS=%d\n",
> > >
> > > +CONFIG_NR_DRAM_BANKS);
> > >
> > > }
> > >
> > > return 0;
> > >
> > >
> > > On Wed, 7 Jul 2021 at 13:00, François Ozog <francois.ozog@linaro.org
> > > <mailto:francois.ozog@linaro.org>> wrote:
> > >
> > >
> > >
> > >     On Wed, 7 Jul 2021 at 12:16, AKASHI Takahiro
> > >     <takahiro.akashi@linaro.org <mailto:takahiro.akashi@linaro.org>>
> > wrote:
> > >
> > >         On Wed, Jul 07, 2021 at 11:37:19AM +0200, Fran??ois Ozog wrote:
> > >          > On Wed, 7 Jul 2021 at 09:40, François Ozog
> > >         <francois.ozog@linaro.org <mailto:francois.ozog@linaro.org>>
> > wrote:
> > >          >
> > >          > > On Wed, 7 Jul 2021 at 05:59, Heinrich Schuchardt
> > >         <xypron.glpk@gmx.de <mailto:xypron.glpk@gmx.de>>
> > >          > > wrote:
> > >          > > >
> > >          > > > Am 7. Juli 2021 05:18:20 MESZ schrieb Heinrich Schuchardt
> > <
> > >          > > xypron.glpk@gmx.de <mailto:xypron.glpk@gmx.de>>:
> > >          > > > >Am 7. Juli 2021 03:44:35 MESZ schrieb AKASHI Takahiro
> > >          > > > ><takahiro.akashi@linaro.org
> > >         <mailto:takahiro.akashi@linaro.org>>:
> > >          > > > >>François,
> > >          > > > >>
> > >          > > > >>On Tue, Jul 06, 2021 at 08:10:08PM +0200, Heinrich
> > >         Schuchardt wrote:
> > >          > > > >>> On 7/6/21 6:13 PM, François Ozog wrote:
> > >          > > > >>> > Hi Heinrich, U-Boot 2021-07rc5 does not take into
> > >         account memory
> > >          > > > >>> > description when using Qemu 5.2 NUMA configuration
> > >         to adapt memory
> > >          > > > >>map
> > >          > > > >>> > (kernel_addr_r...):
> > >          > > > >>> >
> > >          > > > >>> >         -smp 4 \
> > >          > > > >>> >           -m 8G,slots=2,maxmem=16G \
> > >          > > > >>> >          -object memory-backend-ram,size=4G,id=m0 \
> > >          > > > >>> >          -object memory-backend-ram,size=4G,id=m1 \
> > >          > > > >>> >          -numa node,cpus=0-1,nodeid=0,memdev=m0 \
> > >          > > > >>> >          -numa node,cpus=2-3,nodeid=1,memdev=m1
> > >          > > > >>> >
> > >          > > > >>> > kernel_addr_r is still 0x4040000 and thus you can't
> > >         use it to
> > >          > > > >>bootefi.
> > >          > > > >>> >
> > >          > > > >>> > fdt addr 0x13ede6de0; fdt print
> > >          > > > >>> >
> > >          > > > >>> > Displays fdt while I think it should not.
> > >          > > > >>> >
> > >          > > > >>> > If I load the kernel at dram.start, the load works
> > >         but not boot
> > >          > > > >>> >
> > >          > > > >>> > U-Boot 2021.07 (Jul 06 2021 - 13:26:43 +0000)
> > >          > > > >>> >
> > >          > > > >>> >
> > >          > > > >>> > DRAM:4 GiB
> > >          > > > >>> >
> > >          > > > >>> > Flash: 64 MiB
> > >          > > > >>> >
> > >          > > > >>> > Loading Environment from Flash... OK
> > >          > > > >>> >
> > >          > > > >>> > In:pl011@9000000
> > >          > > > >>> >
> > >          > > > >>> > Out: pl011@9000000
> > >          > > > >>> >
> > >          > > > >>> > Err: pl011@9000000
> > >          > > > >>> >
> > >          > > > >>> > Net: eth0: virtio-net#32
> > >          > > > >>> >
> > >          > > > >>> > Hit any key to stop autoboot:0
> > >          > > > >>> >
> > >          > > > >>> > =>
> > >          > > > >>> >
> > >          > > > >>> > => bdinfo
> > >          > > > >>> >
> > >          > > > >>> > boot_params = 0x0000000000000000
> > >          > > > >>> >
> > >          > > > >>> > DRAM bank = 0x0000000000000000
> > >          > > > >>> >
> > >          > > > >>> > -> start= 0x0000000140000000
> > >          > > > >>> >
> > >          > > > >>> > -> size = 0x0000000100000000
> > >          > > > >>> >
> > >          > > > >>> > flashstart= 0x0000000000000000
> > >          > > > >>> >
> > >          > > > >>> > flashsize = 0x0000000004000000
> > >          > > > >>> >
> > >          > > > >>> > flashoffset = 0x00000000000bc990
> > >          > > > >>> >
> > >          > > > >>> > baudrate= 115200 bps
> > >          > > > >>> >
> > >          > > > >>> > relocaddr = 0x000000013ff27000
> > >          > > > >>> >
> > >          > > > >>> > reloc off = 0x000000013ff27000
> > >          > > > >>> >
> > >          > > > >>> > Build = 64-bit
> > >          > > > >>> >
> > >          > > > >>> > current eth = virtio-net#32
> > >          > > > >>> >
> > >          > > > >>> > ethaddr = 52:52:52:52:52:52
> > >          > > > >>> >
> > >          > > > >>> > IP addr = <NULL>
> > >          > > > >>> >
> > >          > > > >>> > fdt_blob= 0x000000013ede6de0
> > >          > > > >>> >
> > >          > > > >>> > new_fdt = 0x000000013ede6de0
> > >          > > > >>> >
> > >          > > > >>> > fdt_size= 0x0000000000100000
> > >          > > > >>> >
> > >          > > > >>> > lmb_dump_all:
> > >          > > > >>> >
> > >          > > > >>> > memory.cnt= 0x1
> > >          > > > >>> >
> > >          > > > >>> > memory.reg[0x0].base = 0x140000000
> > >          > > > >>> >
> > >          > > > >>> > .size = 0x100000000
> > >          > > > >>> >
> > >          > > > >>> >
> > >          > > > >>> > reserved.cnt= 0x0
> > >          > > > >>> >
> > >          > > > >>> > arch_number = 0x0000000000000000
> > >          > > > >>> >
> > >          > > > >>> > TLB addr= 0x000000013fff0000
> > >          > > > >>> >
> > >          > > > >>> > irq_sp= 0x000000013ede6dd0
> > >          > > > >>> >
> > >          > > > >>> > sp start= 0x000000013ede6dd0
> > >          > > > >>> >
> > >          > > > >>> > Early malloc usage: 3a8 / 2000
> > >          > > > >>> >
> > >          > > > >>> > => load virtio 0:1 0x140000000 /oskit.efi
> > >          > > > >>> >
> > >          > > > >>> > 853424 bytes read in 1 ms (813.9 MiB/s)
> > >          > > > >>> >
> > >          > > > >>> > => bootefi0x140000000 0x13ede6dd0
> > >          > > > >>> >
> > >          > > > >>> > ERROR: Failed to register WaitForKey event
> > >          > > > >>> >
> > >          > > > >>> > Setting OsIndications failed
> > >          > > > >>> >
> > >          > > > >>> > Error: Cannot initialize UEFI sub-system, r = 9
> > >          > > > >>> >
> > >          > > > >>> >
> > >          > > > >>> > I think there is a need to calculate memory map
> > >         based on previous
> > >          > > > >>> > firmware (TFA, QEMU can be considered as previous
> > >         frimware)
> > >          > > > >>information
> > >          > > > >>> > (DT or blob_list).
> > >          > > > >>> >
> > >          > > > >>> > What do you think ?
> > >          > > > >>> >
> > >          > > > >>> > Cheers
> > >          > > > >>> >
> > >          > > > >>> > FF
> > >          > > > >>> >
> > >          > > > >>> > --
> > >          > > > >>> >
> > >          > > > >>> > François-Frédéric Ozog | /Director Business
> > >         Development/
> > >          > > > >>> > T: +33.67221.6485
> > >          > > > >>> > francois.ozog@linaro.org
> > >         <mailto:francois.ozog@linaro.org>
> > >         <mailto:francois.ozog@linaro.org <mailto:
> > francois.ozog@linaro.org>>
> > >          > > > >>| Skype: ffozog
> > >          > > > >>> >
> > >          > > > >>> >
> > >          > > > >>>
> > >          > > > >>> The kernel load address is hard coded here:
> > >          > > > >>> include/configs/qemu-arm.h:41:
> > >         "kernel_addr_r=0x40400000\0" \
> > >          > > > >>>
> > >          > > > >>> bdinfo shows:
> > >          > > > >>> DRAM start = 0x140000000
> > >          > > > >>> DRAM size  = 0x100000000
> > >          > > > >>>
> > >          > > > >>> fdt addr $fdt_addr
> > >          > > > >>> fdt printf
> > >          > > > >>>
> > >          > > > >>> shows two memory areas. One at 40000000, one at
> > >         140000000.
> > >          > > > >>
> > >          > > > >>(This shows that U-Boot receives a correct memory map
> > >         via dtb.)
> > >          > > > >>
> > >          > > > >>Is this a NUMA machine, isn't it? Why should we care of
> > >         which
> > >          > > > >>memory region be used here? Please note that this is a
> > >         virtual
> > >          > > > >machine,
> > >          > > > >>there is no practical difference between two regions.
> > >          > > > >>
> > >          > > > >>The root problem is that U-Boot did not recognize there
> > >         were two
> > >          > > > >>memory regions. We can fix this issue in either way:
> > >          > > > >>
> > >          > > > >>1)
> > >          > > > >>diff --git a/configs/qemu_arm64_defconfig
> > >          > > > >>b/configs/qemu_arm64_defconfig
> > >          > > > >>index f6e586627a8e..b70ffae8bf6e 100644
> > >          > > > >>--- a/configs/qemu_arm64_defconfig
> > >          > > > >>+++ b/configs/qemu_arm64_defconfig
> > >          > > > >>@@ -1,7 +1,7 @@
> > >          > > > >> CONFIG_ARM=y
> > >          > > > >> CONFIG_POSITION_INDEPENDENT=y
> > >          > > > >> CONFIG_ARCH_QEMU=y
> > >          > > > >>-CONFIG_NR_DRAM_BANKS=1
> > >          > > > >>+CONFIG_NR_DRAM_BANKS=2
> > >          > > > >> CONFIG_ENV_SIZE=0x40000
> > >          > > > >> CONFIG_ENV_SECT_SIZE=0x40000
> > >          > > > >> CONFIG_AHCI=y
> > >          > > > >>
> > >          > > > >>2)
> > >          > > > >>diff --git a/lib/fdtdec.c b/lib/fdtdec.c
> > >          > > > >>index 4b097fb588ed..4067ea2dead6 100644
> > >          > > > >>--- a/lib/fdtdec.c
> > >          > > > >>+++ b/lib/fdtdec.c
> > >          > > > >>@@ -1111,7 +1111,7 @@ int
> > >         fdtdec_setup_memory_banksize(void)
> > >          > > > >>                return -EINVAL;
> > >          > > > >>        }
> > >          > > > >>
> > >          > > > >>-       for (bank = 0; bank < CONFIG_NR_DRAM_BANKS;
> > >         bank++) {
> > >          > > > >>+       for (bank = 0; ; bank++) {
> > >          > > > >>                ret = ofnode_read_resource(mem, reg++,
> > >         &res);
> > >          > > > >>                if (ret < 0) {
> > >          > > > >>                        reg = 0;
> > >          > > > >>
> > >          > > > >>   (fdtdec_setup_memory_banksize() is called in
> > >         dram_init_banksize().)
> > >          > > > >>
> > >          > > > >>
> > >          > > > >>(2) seems much better, but I don't know why we had to
> > use
> > >          > > > >>CONFIG_NR_DRAM_BANKS here.
> > >          > > > >>
> > >          > >
> > >          > > 2) alone does not work as other places in the code refer to
> > >          > > CONFIG_NR_DRAM_BANKS. Setting ...BANKS to 32 makes my code
> > >         work and
> > >          > > bdinfo seems now correct:
> > >          > >
> > >          > => bdinfo
> > >          > > boot_params = 0x0000000000000000
> > >          > > DRAM bank   = 0x0000000000000000
> > >          > > -> start    = 0x0000000140000000
> > >          > > -> size     = 0x0000000100000000
> > >          > > DRAM bank   = 0x0000000000000001
> > >          > > -> start    = 0x0000000040000000
> > >          > > -> size     = 0x0000000100000000
> > >          > > flashstart  = 0x0000000000000000
> > >          > > flashsize   = 0x0000000004000000
> > >          > > flashoffset = 0x00000000000bcb88
> > >          > > baudrate    = 115200 bps
> > >          > > relocaddr   = 0x000000013ff27000
> > >          > > reloc off   = 0x000000013ff27000
> > >          > > Build       = 64-bit
> > >          > > current eth = virtio-net#32
> > >          > > ethaddr     = 52:52:52:52:52:52
> > >          > > IP addr     = <NULL>
> > >          > > fdt_blob    = 0x000000013ede6cf0
> > >          > > new_fdt     = 0x000000013ede6cf0
> > >          > > fdt_size    = 0x0000000000100000
> > >          > > lmb_dump_all:
> > >          > >     memory.cnt   = 0x1
> > >          > >     memory.reg[0x0].base   = 0x40000000
> > >          > >   .size   = 0x200000000
> > >          > >     reserved.cnt   = 0x1
> > >          > >     reserved.reg[0x0].base = 0x13ede58f0
> > >          > >     .size = 0x121a710
> > >          > > arch_number = 0x0000000000000000
> > >          > > TLB addr    = 0x000000013fff0000
> > >          > > irq_sp      = 0x000000013ede6ce0
> > >          > > sp start    = 0x000000013ede6ce0
> > >          > > Early malloc usage: 3a8 / 2000
> > >          > >
> > >          > > May I suggest you propose a combined patch Akashi-san? If
> > >         we assume
> > >          > > NUMA systems to be tested up to 8 nodes to mimic real
> > existing
> > >          > > enterprise hardware and up to 4 memory slots (say for
> > >         memory hot
> > >          > > plugging tests) what about a default value of 32?
> > >         Alternatively, we
> > >          > > could set this value to a much higher one if the costs are
> > >         negligible.
> > >          > >
> > >          > >
> > >          > > Well, lets not rush as there are other twists:
> > >          >
> > >          > the 4G bank in node 1 is marked BootServicesData in the UEFI
> > >         GetMemoryMap
> > >          > which I assume is not the case. EDK2 reports it as
> > >         ConventionalMemory.
> > >          >
> > >          > The root cause seem to be gd->ramtop not being setup properly.
> > >          >
> > >          > Further analysis shows that the DT passed to the booted EFI
> > >         payload does
> > >          > not seem to be correct:
> > >          >
> > >          > DT fragment passed to U-Boot
> > >          >
> > >          > memory@140000000 {
> > >          > numa-node-id = <0x00000001>;
> > >          > reg = <0x00000001 0x40000000 0x00000001 0x00000000>;
> > >          > device_type = "memory";
> > >          > };
> > >          > memory@40000000 {
> > >          > numa-node-id = <0x00000000>;
> > >          > reg = <0x00000000 0x40000000 0x00000001 0x00000000>;
> > >          > device_type = "memory";
> > >          > };
> > >          >
> > >          > DT passed to payload (as per my debug code):
> > >          >
> > >          > memory@140000000: memory
> > >          >
> > >          >     numa-node-id 1
> > >          >
> > >          >     reg (len= 32)
> > >          >
> > >          >          140000000 100000000
> > >          >
> > >          >          40000000 100000000
> > >          >
> > >          > memory@40000000: memory
> > >          >
> > >          >     numa-node-id 0
> > >          >
> > >          >     reg (len= 16)
> > >          >
> > >          >          40000000 100000000
> > >          >
> > >          > I am investigating this further...
> > >
> > >         You should check the logic of fdt_fixup_memory_banks()
> > >         which is called this way:
> > >            efi_dt_fixup()
> > >              image_setup_libfdt()
> > >                arch_fixup_fdt()
> > >                  fdt_fixup_memory_banks()
> > >
> > >         What it does is to put *all* the memory regions unconditionally
> > as
> > >         a single "reg" array into the *first-detected* "memory" node,
> > >         which is
> > >         "memory@140000000" in this case.
> > >         It means that this function doesn't respect NUMA configuration.
> > >
> > >     Thanks.
> > >
> > >     tweaking ram_top to be the correct in
> > >
> > https://elixir.bootlin.com/u-boot/latest/source/lib/efi_loader/efi_memory.c#L732
> > >     <
> > https://elixir.bootlin.com/u-boot/latest/source/lib/efi_loader/efi_memory.c#L732
> > >
> > >     results in memory on node 1 to be considered as ConventionalMemory
> > >     but U-Boot places all code and data at the end of node 1 while it
> > >     should position it on the current node. That said it is an
> > >     acceptable work around for my test case.
> > >
> > >     Bottom line, we need to introduce NUMA node management in memory
> > >     management all over the place.
> > >     It is unclear if there is a business case for that. I'll ask LEDGE
> > >     members...
> > >
> > >         -Takahiro Akashi
> > >
> > >
> > >          > > >>In this case, other occurrences of CONFIG_NR_DRAM_BANKS
> > >         in this file
> > >          > > > >>should be replaced with a variable for it.
> > >          > > > >>
> > >          > > > >>> Your use case is well beyond the typical U-Boot
> > >         usage. So I guess it
> > >          > > > >>> will be up to Linaro to provide the necessary patches:
> > >          > > > >>>
> > >          > > > >>> * determine the active CPU
> > >          > > > >>> * determine the RAM assigned to the active CPU
> > according
> > >          > > > >>>   to the numa-node-id in the device-tree
> > >          > > > >>> * make sure that U-Boot only uses the memory of the
> > >         active CPU
> > >          > > > >>>   internally
> > >          > > > >>> * make sure that the UEFI memory map contains a
> > compliant
> > >          > > > >description
> > >          > > > >>> * possibly, dynamically set up the environment
> > variables
> > >          > > > >>>
> > >          > > > >>> +CC Tuomas Tynkkynen (maintainer for
> > >         qemu_arm64_defconfig)
> > >          > > > >>
> > >          > > > >>For (1), we'd better have a different config, or
> > increase
> > >          > > > >>the value of CONFIG_NR_DRAM_BANKS to a bigger number?
> > >          > > > >
> > >          > > > >Is the system configured such that each CPU can access
> > >         the others CPU's
> > >          > > > >RAM when entering U-Boot?
> > >          > > > >
> > >          > > > >Best regards
> > >          > > > >
> > >          > > > >Heinrich
> > >          > > > >
> > >          > > >
> > >          > > > At least the comments for this patch sound as if on a
> > >         physical system
> > >          > > cross NUMA node memory access is only available after full
> > SMP
> > >          > > initialization:
> > >          > > >
> > >          > > >
> > >          > >
> > >
> > https://patchwork.kernel.org/project/linux-acpi/patch/20180625130552.5636-1-lorenzo.pieralisi@arm.com/
> > >         <
> > https://patchwork.kernel.org/project/linux-acpi/patch/20180625130552.5636-1-lorenzo.pieralisi@arm.com/
> > >
> > >          > > >
> > >          > > > QEMU may be less restrictive.
> > >          > > >
> > >          > > > QEMU allows the node distance to be 255 indicating that
> > >         cross node
> > >          > > access is infeasible.
> > >          > > >
> > >          > > > Best regards
> > >          > > >
> > >          > > > Heinrich
> > >          > > >
> > >          > > > >>
> > >          > > > >>-Takahiro Akashi
> > >          > > > >>
> > >          > > > >>
> > >          > > > >>> Best regards
> > >          > > > >>>
> > >          > > > >>> Heinrich
> > >          > > >
> > >          > >
> > >          > >
> > >          > > --
> > >          > > François-Frédéric Ozog | Director Business Development
> > >          > > T: +33.67221.6485
> > >          > > francois.ozog@linaro.org <mailto:francois.ozog@linaro.org>
> > >         | Skype: ffozog
> > >          > >
> > >          >
> > >          >
> > >          > --
> > >          > François-Frédéric Ozog | *Director Business Development*
> > >          > T: +33.67221.6485
> > >          > francois.ozog@linaro.org <mailto:francois.ozog@linaro.org> |
> > >         Skype: ffozog
> > >
> > >
> > >
> > >     --
> > >
> > >     François-Frédéric Ozog | /Director Business Development/
> > >     T: +33.67221.6485
> > >     francois.ozog@linaro.org <mailto:francois.ozog@linaro.org>
> > >     | Skype: ffozog
> > >
> > >
> > >
> > >
> > > --
> > >
> > > François-Frédéric Ozog | /Director Business Development/
> > > T: +33.67221.6485
> > > francois.ozog@linaro.org <mailto:francois.ozog@linaro.org>
> > | Skype: ffozog
> > >
> > >
> >
> -- 
> François-Frédéric Ozog | *Director Business Development*
> T: +33.67221.6485
> francois.ozog@linaro.org | Skype: ffozog
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2022-03-23  8:20 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CAHFG_=WTxz24QAPgfSfS_uz7X6wZMnjqwjTkbi0eO1Vk_TGkiQ@mail.gmail.com>
2021-07-06 18:10 ` QEMU NUMA and U-Boot Heinrich Schuchardt
2021-07-07  1:44   ` AKASHI Takahiro
2021-07-07  3:18     ` Heinrich Schuchardt
2021-07-07  3:58       ` Heinrich Schuchardt
2021-07-07  3:59       ` Heinrich Schuchardt
2021-07-07  7:40         ` François Ozog
2021-07-07  9:37           ` François Ozog
2021-07-07 10:16             ` AKASHI Takahiro
2021-07-07 11:00               ` François Ozog
2021-07-07 15:15                 ` François Ozog
2021-07-07 17:39                   ` Heinrich Schuchardt
2022-03-23  7:29                     ` François Ozog
2022-03-23  8:20                       ` Mark Kettenis

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.