QEMU-Devel Archive on lore.kernel.org
* [Qemu-devel] [PATCH v9 00/11] Build ACPI Heterogeneous Memory Attribute Table (HMAT)
@ 2019-08-09  6:57 Tao
  2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 01/11] hw/arm: simplify arm_load_dtb Tao
                   ` (12 more replies)
  0 siblings, 13 replies; 34+ messages in thread
From: Tao @ 2019-08-09  6:57 UTC
  To: imammedo, eblake, ehabkost
  Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel, daniel,
	jonathan.cameron, dan.j.williams

This series builds the ACPI Heterogeneous Memory Attribute Table (HMAT)
from the command line. The HMAT describes memory attributes, such as
memory side cache attributes and latency and bandwidth details, associated
with Memory Proximity Domains. Software is expected to use the HMAT
information as a hint for optimization.

In the Linux kernel, the code in drivers/acpi/hmat/hmat.c parses and reports
the platform's HMAT tables.
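For readers unfamiliar with the table layout, here is a minimal sketch of walking HMAT sub-structures, roughly what any HMAT parser has to do before interpreting individual entries. It assumes the ACPI 6.3 layout as I understand it: a 36-byte standard ACPI header plus 4 reserved bytes, followed by structures that each begin with a 2-byte type, 2 reserved bytes and a 4-byte length. The helper names (hmat_count(), rd16(), rd32()) are hypothetical; this is not the code in drivers/acpi/hmat/hmat.c or in this series.

```c
#include <stdint.h>
#include <string.h>

/* Sub-structure types defined for the HMAT in ACPI 6.3. */
enum {
    HMAT_TYPE_PROXIMITY = 0, /* Memory Proximity Domain Attributes */
    HMAT_TYPE_LOCALITY  = 1, /* System Locality Latency and Bandwidth Info */
    HMAT_TYPE_CACHE     = 2, /* Memory Side Cache Information */
};

/* Assumed: 36-byte standard ACPI header plus 4 reserved bytes. */
#define HMAT_FIRST_STRUCT_OFFSET 40

/* ACPI tables are little-endian; read fields byte by byte. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Count the sub-structures of each known type in a raw HMAT buffer.
 * Returns 0 on success, -1 if a length field is malformed or the
 * structures do not exactly fill the table.
 */
static int hmat_count(const uint8_t *tbl, size_t tbl_len, unsigned counts[3])
{
    size_t off = HMAT_FIRST_STRUCT_OFFSET;

    memset(counts, 0, 3 * sizeof(counts[0]));
    if (tbl_len < off) {
        return -1;
    }
    while (off + 8 <= tbl_len) {
        uint16_t type = rd16(tbl + off);
        uint32_t len = rd32(tbl + off + 4);

        /* Length must cover the 8-byte header and stay in bounds. */
        if (len < 8 || len > tbl_len - off) {
            return -1;
        }
        if (type <= HMAT_TYPE_CACHE) {
            counts[type]++;
        }
        off += len;
    }
    return off == tbl_len ? 0 : -1;
}
```

Unknown types are skipped rather than rejected, so a consumer written this way tolerates structure types added by later ACPI revisions.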

Link to the v8 series:
https://patchwork.kernel.org/cover/11066983/

Changelog:
v9:
    - Change the CLI input to make it more user friendly (Daniel Black):
      use latency=NUM[p|n|u]s and bandwidth=NUM[M|G|P](B/s) as input and
      drop the base-lat and base-bw inputs.
v8:
    - Rebase to upstream
    - Add a check for whether numa->numa_state is NULL in pxb_dev_realize_common
    - Use nb_nodes in spapr_populate_memory() (RESEND to fix) (Igor)
v7:
    - Defer patches 11-13 of v6, because kernel driver support for _HMA
      has not been implemented yet
    - Drop HMAT_LB_MEM_CACHE_LAST_LEVEL, which is not used in
      ACPI 6.3 (Jonathan)
    - Add a bit mask to the flags of hmat-lb (Jonathan)
    - Add a macro to indicate whether the type is latency or bandwidth (Jonathan)
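To illustrate the v9 latency syntax, here is a minimal, hypothetical parser that converts NUM[p|n|u]s into picoseconds. The function name, the choice of picoseconds as the base unit and the rule that a unit suffix is mandatory are all assumptions made for this sketch; it is not the parser used by this series. Bandwidth parsing is left out because the scale of the M/G/P suffixes (decimal or binary) is not stated here.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: parse "NUM[p|n|u]s" (e.g. "10ns") into picoseconds.
 * Returns 0 on success, -1 on malformed input or overflow.
 */
static int parse_latency_ps(const char *str, uint64_t *out_ps)
{
    char *end;
    uint64_t mult;
    unsigned long long num;

    /* strtoull() accepts signs and leading spaces; reject them here. */
    if (str[0] < '0' || str[0] > '9') {
        return -1;
    }
    errno = 0;
    num = strtoull(str, &end, 10);
    if (errno != 0) {
        return -1; /* value does not fit in unsigned long long */
    }
    if (strcmp(end, "ps") == 0) {
        mult = 1;
    } else if (strcmp(end, "ns") == 0) {
        mult = 1000;
    } else if (strcmp(end, "us") == 0) {
        mult = 1000000;
    } else {
        return -1; /* unit suffix is mandatory in this sketch */
    }
    if (num > UINT64_MAX / mult) {
        return -1; /* result would overflow 64 bits */
    }
    *out_ps = (uint64_t)num * mult;
    return 0;
}
```

Normalizing to the smallest accepted unit keeps every accepted input exact, so "10ns" and "10000ps" compare equal without floating point.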

Liu Jingqi (5):
  hmat acpi: Build Memory Proximity Domain Attributes Structure(s)
  hmat acpi: Build System Locality Latency and Bandwidth Information
    Structure(s)
  hmat acpi: Build Memory Side Cache Information Structure(s)
  numa: Extend the CLI to provide memory latency and bandwidth
    information
  numa: Extend the CLI to provide memory side cache information

Tao Xu (6):
  hw/arm: simplify arm_load_dtb
  numa: move numa global variable nb_numa_nodes into MachineState
  numa: move numa global variable have_numa_distance into MachineState
  numa: move numa global variable numa_info into MachineState
  numa: Extend CLI to provide initiator information for numa nodes
  tests/bios-tables-test: add test cases for ACPI HMAT

 exec.c                              |   5 +-
 hw/acpi/Kconfig                     |   5 +
 hw/acpi/Makefile.objs               |   1 +
 hw/acpi/aml-build.c                 |   9 +-
 hw/acpi/hmat.c                      | 256 +++++++++++++++++++++
 hw/acpi/hmat.h                      | 106 +++++++++
 hw/arm/aspeed.c                     |   5 +-
 hw/arm/boot.c                       |  20 +-
 hw/arm/collie.c                     |   8 +-
 hw/arm/cubieboard.c                 |   5 +-
 hw/arm/exynos4_boards.c             |   7 +-
 hw/arm/highbank.c                   |   8 +-
 hw/arm/imx25_pdk.c                  |   5 +-
 hw/arm/integratorcp.c               |   8 +-
 hw/arm/kzm.c                        |   5 +-
 hw/arm/mainstone.c                  |   5 +-
 hw/arm/mcimx6ul-evk.c               |   5 +-
 hw/arm/mcimx7d-sabre.c              |   5 +-
 hw/arm/musicpal.c                   |   8 +-
 hw/arm/nseries.c                    |   5 +-
 hw/arm/omap_sx1.c                   |   5 +-
 hw/arm/palm.c                       |  10 +-
 hw/arm/raspi.c                      |   6 +-
 hw/arm/realview.c                   |   5 +-
 hw/arm/sabrelite.c                  |   5 +-
 hw/arm/sbsa-ref.c                   |  12 +-
 hw/arm/spitz.c                      |   5 +-
 hw/arm/tosa.c                       |   8 +-
 hw/arm/versatilepb.c                |   5 +-
 hw/arm/vexpress.c                   |   5 +-
 hw/arm/virt-acpi-build.c            |  19 +-
 hw/arm/virt.c                       |  17 +-
 hw/arm/xilinx_zynq.c                |   8 +-
 hw/arm/xlnx-versal-virt.c           |   7 +-
 hw/arm/xlnx-zcu102.c                |   5 +-
 hw/arm/z2.c                         |   8 +-
 hw/core/machine-hmp-cmds.c          |  12 +-
 hw/core/machine.c                   |  38 ++-
 hw/core/numa.c                      | 345 +++++++++++++++++++++++++---
 hw/i386/acpi-build.c                |   7 +-
 hw/i386/pc.c                        |  13 +-
 hw/mem/pc-dimm.c                    |   2 +
 hw/pci-bridge/pci_expander_bridge.c |   8 +-
 hw/ppc/spapr.c                      |  29 +--
 hw/ppc/spapr_pci.c                  |   4 +-
 include/hw/acpi/aml-build.h         |   2 +-
 include/hw/arm/boot.h               |   4 +-
 include/hw/boards.h                 |   1 +
 include/qemu/typedefs.h             |   2 +
 include/sysemu/numa.h               |  30 ++-
 include/sysemu/sysemu.h             |  23 ++
 qapi/machine.json                   | 178 +++++++++++++-
 qemu-options.hx                     |  83 ++++++-
 tests/bios-tables-test.c            |  43 ++++
 54 files changed, 1189 insertions(+), 246 deletions(-)
 create mode 100644 hw/acpi/hmat.c
 create mode 100644 hw/acpi/hmat.h

-- 
2.20.1




* [Qemu-devel] [PATCH v9 01/11] hw/arm: simplify arm_load_dtb
  2019-08-09  6:57 [Qemu-devel] [PATCH v9 00/11] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao
@ 2019-08-09  6:57 ` Tao
  2019-08-13 21:55   ` Alistair Francis
  2019-08-13 21:55   ` Eduardo Habkost
  2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 02/11] numa: move numa global variable nb_numa_nodes into MachineState Tao
                   ` (11 subsequent siblings)
  12 siblings, 2 replies; 34+ messages in thread
From: Tao @ 2019-08-09  6:57 UTC
  To: imammedo, eblake, ehabkost
  Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel, daniel,
	jonathan.cameron, dan.j.williams

From: Tao Xu <tao3.xu@intel.com>

In struct arm_boot_info, kernel_filename, initrd_filename and
kernel_cmdline are copied from MachineState. This patch adds
MachineState as a parameter to arm_load_dtb() and arm_load_kernel(),
and moves the code that copies kernel_filename, initrd_filename and
kernel_cmdline into arm_load_kernel().

Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Liu Jingqi <jingqi.liu@intel.com>
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

No changes in v9
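The refactoring can be summarized with simplified stand-in types. struct machine_state, struct boot_info, load_kernel() and example_board_init() are hypothetical reductions of MachineState, struct arm_boot_info, arm_load_kernel() and a board init function; the CPU argument and the rest of the real loader are omitted.

```c
#include <stddef.h>

/* Hypothetical subset of QEMU's MachineState. */
struct machine_state {
    const char *kernel_filename;
    const char *kernel_cmdline;
    const char *initrd_filename;
};

/* Hypothetical subset of struct arm_boot_info. */
struct boot_info {
    const char *kernel_filename;
    const char *kernel_cmdline;
    const char *initrd_filename;
    int board_id;
};

/*
 * After the change, the loader copies the boot parameters from the
 * machine itself; the rest of the real loader is omitted here.
 */
static void load_kernel(const struct machine_state *ms, struct boot_info *info)
{
    info->kernel_filename = ms->kernel_filename;
    info->kernel_cmdline = ms->kernel_cmdline;
    info->initrd_filename = ms->initrd_filename;
}

/* Board init in the new style: only board-specific fields are set. */
static void example_board_init(const struct machine_state *ms,
                               struct boot_info *binfo)
{
    binfo->board_id = 0x208;
    load_kernel(ms, binfo);
}
```

The three boot fields now have a single owner: boards such as collie or tosa no longer keep their own copies, and arm_load_dtb() reads the command line directly from MachineState.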
---
 hw/arm/aspeed.c           |  5 +----
 hw/arm/boot.c             | 14 ++++++++------
 hw/arm/collie.c           |  8 +-------
 hw/arm/cubieboard.c       |  5 +----
 hw/arm/exynos4_boards.c   |  7 ++-----
 hw/arm/highbank.c         |  8 +-------
 hw/arm/imx25_pdk.c        |  5 +----
 hw/arm/integratorcp.c     |  8 +-------
 hw/arm/kzm.c              |  5 +----
 hw/arm/mainstone.c        |  5 +----
 hw/arm/mcimx6ul-evk.c     |  5 +----
 hw/arm/mcimx7d-sabre.c    |  5 +----
 hw/arm/musicpal.c         |  8 +-------
 hw/arm/nseries.c          |  5 +----
 hw/arm/omap_sx1.c         |  5 +----
 hw/arm/palm.c             | 10 ++--------
 hw/arm/raspi.c            |  6 +-----
 hw/arm/realview.c         |  5 +----
 hw/arm/sabrelite.c        |  5 +----
 hw/arm/sbsa-ref.c         |  3 +--
 hw/arm/spitz.c            |  5 +----
 hw/arm/tosa.c             |  8 +-------
 hw/arm/versatilepb.c      |  5 +----
 hw/arm/vexpress.c         |  5 +----
 hw/arm/virt.c             |  8 +++-----
 hw/arm/xilinx_zynq.c      |  8 +-------
 hw/arm/xlnx-versal-virt.c |  7 ++-----
 hw/arm/xlnx-zcu102.c      |  5 +----
 hw/arm/z2.c               |  8 +-------
 include/hw/arm/boot.h     |  4 ++--
 30 files changed, 43 insertions(+), 147 deletions(-)

diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
index 843b708247..f8733b86b9 100644
--- a/hw/arm/aspeed.c
+++ b/hw/arm/aspeed.c
@@ -241,9 +241,6 @@ static void aspeed_board_init(MachineState *machine,
         write_boot_rom(drive0, FIRMWARE_ADDR, fl->size, &error_abort);
     }
 
-    aspeed_board_binfo.kernel_filename = machine->kernel_filename;
-    aspeed_board_binfo.initrd_filename = machine->initrd_filename;
-    aspeed_board_binfo.kernel_cmdline = machine->kernel_cmdline;
     aspeed_board_binfo.ram_size = ram_size;
     aspeed_board_binfo.loader_start = sc->info->memmap[ASPEED_SDRAM];
     aspeed_board_binfo.nb_cpus = bmc->soc.num_cpus;
@@ -252,7 +249,7 @@ static void aspeed_board_init(MachineState *machine,
         cfg->i2c_init(bmc);
     }
 
-    arm_load_kernel(ARM_CPU(first_cpu), &aspeed_board_binfo);
+    arm_load_kernel(ARM_CPU(first_cpu), machine, &aspeed_board_binfo);
 }
 
 static void palmetto_bmc_i2c_init(AspeedBoardState *bmc)
diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index c2b89b3bb9..ba604f8277 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -524,7 +524,7 @@ static void fdt_add_psci_node(void *fdt)
 }
 
 int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
-                 hwaddr addr_limit, AddressSpace *as)
+                 hwaddr addr_limit, AddressSpace *as, MachineState *ms)
 {
     void *fdt = NULL;
     int size, rc, n = 0;
@@ -627,9 +627,9 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
         qemu_fdt_add_subnode(fdt, "/chosen");
     }
 
-    if (binfo->kernel_cmdline && *binfo->kernel_cmdline) {
+    if (ms->kernel_cmdline && *ms->kernel_cmdline) {
         rc = qemu_fdt_setprop_string(fdt, "/chosen", "bootargs",
-                                     binfo->kernel_cmdline);
+                                     ms->kernel_cmdline);
         if (rc < 0) {
             fprintf(stderr, "couldn't set /chosen/bootargs\n");
             goto fail;
@@ -1261,7 +1261,7 @@ static void arm_setup_firmware_boot(ARMCPU *cpu, struct arm_boot_info *info)
      */
 }
 
-void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
+void arm_load_kernel(ARMCPU *cpu, MachineState *ms, struct arm_boot_info *info)
 {
     CPUState *cs;
     AddressSpace *as = arm_boot_address_space(cpu, info);
@@ -1282,7 +1282,9 @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
      * doesn't support secure.
      */
     assert(!(info->secure_board_setup && kvm_enabled()));
-
+    info->kernel_filename = ms->kernel_filename;
+    info->kernel_cmdline = ms->kernel_cmdline;
+    info->initrd_filename = ms->initrd_filename;
     info->dtb_filename = qemu_opt_get(qemu_get_machine_opts(), "dtb");
     info->dtb_limit = 0;
 
@@ -1294,7 +1296,7 @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
     }
 
     if (!info->skip_dtb_autoload && have_dtb(info)) {
-        if (arm_load_dtb(info->dtb_start, info, info->dtb_limit, as) < 0) {
+        if (arm_load_dtb(info->dtb_start, info, info->dtb_limit, as, ms) < 0) {
             exit(1);
         }
     }
diff --git a/hw/arm/collie.c b/hw/arm/collie.c
index 3db3c56004..72bc8f26e5 100644
--- a/hw/arm/collie.c
+++ b/hw/arm/collie.c
@@ -26,9 +26,6 @@ static struct arm_boot_info collie_binfo = {
 
 static void collie_init(MachineState *machine)
 {
-    const char *kernel_filename = machine->kernel_filename;
-    const char *kernel_cmdline = machine->kernel_cmdline;
-    const char *initrd_filename = machine->initrd_filename;
     StrongARMState *s;
     DriveInfo *dinfo;
     MemoryRegion *sysmem = get_system_memory();
@@ -47,11 +44,8 @@ static void collie_init(MachineState *machine)
 
     sysbus_create_simple("scoop", 0x40800000, NULL);
 
-    collie_binfo.kernel_filename = kernel_filename;
-    collie_binfo.kernel_cmdline = kernel_cmdline;
-    collie_binfo.initrd_filename = initrd_filename;
     collie_binfo.board_id = 0x208;
-    arm_load_kernel(s->cpu, &collie_binfo);
+    arm_load_kernel(s->cpu, machine, &collie_binfo);
 }
 
 static void collie_machine_init(MachineClass *mc)
diff --git a/hw/arm/cubieboard.c b/hw/arm/cubieboard.c
index f7c8a5985a..d992fa087a 100644
--- a/hw/arm/cubieboard.c
+++ b/hw/arm/cubieboard.c
@@ -72,10 +72,7 @@ static void cubieboard_init(MachineState *machine)
     /* TODO create and connect IDE devices for ide_drive_get() */
 
     cubieboard_binfo.ram_size = machine->ram_size;
-    cubieboard_binfo.kernel_filename = machine->kernel_filename;
-    cubieboard_binfo.kernel_cmdline = machine->kernel_cmdline;
-    cubieboard_binfo.initrd_filename = machine->initrd_filename;
-    arm_load_kernel(&s->a10->cpu, &cubieboard_binfo);
+    arm_load_kernel(&s->a10->cpu, machine, &cubieboard_binfo);
 }
 
 static void cubieboard_machine_init(MachineClass *mc)
diff --git a/hw/arm/exynos4_boards.c b/hw/arm/exynos4_boards.c
index ac0b0dc2a9..da402d5216 100644
--- a/hw/arm/exynos4_boards.c
+++ b/hw/arm/exynos4_boards.c
@@ -120,9 +120,6 @@ exynos4_boards_init_common(MachineState *machine,
     exynos4_board_binfo.board_id = exynos4_board_id[board_type];
     exynos4_board_binfo.smp_bootreg_addr =
             exynos4_board_smp_bootreg_addr[board_type];
-    exynos4_board_binfo.kernel_filename = machine->kernel_filename;
-    exynos4_board_binfo.initrd_filename = machine->initrd_filename;
-    exynos4_board_binfo.kernel_cmdline = machine->kernel_cmdline;
     exynos4_board_binfo.gic_cpu_if_addr =
             EXYNOS4210_SMP_PRIVATE_BASE_ADDR + 0x100;
 
@@ -141,7 +138,7 @@ static void nuri_init(MachineState *machine)
 {
     exynos4_boards_init_common(machine, EXYNOS4_BOARD_NURI);
 
-    arm_load_kernel(ARM_CPU(first_cpu), &exynos4_board_binfo);
+    arm_load_kernel(ARM_CPU(first_cpu), machine, &exynos4_board_binfo);
 }
 
 static void smdkc210_init(MachineState *machine)
@@ -151,7 +148,7 @@ static void smdkc210_init(MachineState *machine)
 
     lan9215_init(SMDK_LAN9118_BASE_ADDR,
             qemu_irq_invert(s->soc.irq_table[exynos4210_get_irq(37, 1)]));
-    arm_load_kernel(ARM_CPU(first_cpu), &exynos4_board_binfo);
+    arm_load_kernel(ARM_CPU(first_cpu), machine, &exynos4_board_binfo);
 }
 
 static void nuri_class_init(ObjectClass *oc, void *data)
diff --git a/hw/arm/highbank.c b/hw/arm/highbank.c
index def0f1ce6a..1a35b6d82f 100644
--- a/hw/arm/highbank.c
+++ b/hw/arm/highbank.c
@@ -234,9 +234,6 @@ enum cxmachines {
 static void calxeda_init(MachineState *machine, enum cxmachines machine_id)
 {
     ram_addr_t ram_size = machine->ram_size;
-    const char *kernel_filename = machine->kernel_filename;
-    const char *kernel_cmdline = machine->kernel_cmdline;
-    const char *initrd_filename = machine->initrd_filename;
     DeviceState *dev = NULL;
     SysBusDevice *busdev;
     qemu_irq pic[128];
@@ -388,9 +385,6 @@ static void calxeda_init(MachineState *machine, enum cxmachines machine_id)
     /* TODO create and connect IDE devices for ide_drive_get() */
 
     highbank_binfo.ram_size = ram_size;
-    highbank_binfo.kernel_filename = kernel_filename;
-    highbank_binfo.kernel_cmdline = kernel_cmdline;
-    highbank_binfo.initrd_filename = initrd_filename;
     /* highbank requires a dtb in order to boot, and the dtb will override
      * the board ID. The following value is ignored, so set it to -1 to be
      * clear that the value is meaningless.
@@ -410,7 +404,7 @@ static void calxeda_init(MachineState *machine, enum cxmachines machine_id)
                     "may not boot.");
     }
 
-    arm_load_kernel(ARM_CPU(first_cpu), &highbank_binfo);
+    arm_load_kernel(ARM_CPU(first_cpu), machine, &highbank_binfo);
 }
 
 static void highbank_init(MachineState *machine)
diff --git a/hw/arm/imx25_pdk.c b/hw/arm/imx25_pdk.c
index 5d673e47bc..c76fc2bd94 100644
--- a/hw/arm/imx25_pdk.c
+++ b/hw/arm/imx25_pdk.c
@@ -116,9 +116,6 @@ static void imx25_pdk_init(MachineState *machine)
     }
 
     imx25_pdk_binfo.ram_size = machine->ram_size;
-    imx25_pdk_binfo.kernel_filename = machine->kernel_filename;
-    imx25_pdk_binfo.kernel_cmdline = machine->kernel_cmdline;
-    imx25_pdk_binfo.initrd_filename = machine->initrd_filename;
     imx25_pdk_binfo.loader_start = FSL_IMX25_SDRAM0_ADDR;
     imx25_pdk_binfo.board_id = 1771,
     imx25_pdk_binfo.nb_cpus = 1;
@@ -129,7 +126,7 @@ static void imx25_pdk_init(MachineState *machine)
      * fail.
      */
     if (!qtest_enabled()) {
-        arm_load_kernel(&s->soc.cpu, &imx25_pdk_binfo);
+        arm_load_kernel(&s->soc.cpu, machine, &imx25_pdk_binfo);
     }
 }
 
diff --git a/hw/arm/integratorcp.c b/hw/arm/integratorcp.c
index 200c0107f0..4d9e9c9e49 100644
--- a/hw/arm/integratorcp.c
+++ b/hw/arm/integratorcp.c
@@ -578,9 +578,6 @@ static struct arm_boot_info integrator_binfo = {
 static void integratorcp_init(MachineState *machine)
 {
     ram_addr_t ram_size = machine->ram_size;
-    const char *kernel_filename = machine->kernel_filename;
-    const char *kernel_cmdline = machine->kernel_cmdline;
-    const char *initrd_filename = machine->initrd_filename;
     Object *cpuobj;
     ARMCPU *cpu;
     MemoryRegion *address_space_mem = get_system_memory();
@@ -650,10 +647,7 @@ static void integratorcp_init(MachineState *machine)
     sysbus_create_simple("pl110", 0xc0000000, pic[22]);
 
     integrator_binfo.ram_size = ram_size;
-    integrator_binfo.kernel_filename = kernel_filename;
-    integrator_binfo.kernel_cmdline = kernel_cmdline;
-    integrator_binfo.initrd_filename = initrd_filename;
-    arm_load_kernel(cpu, &integrator_binfo);
+    arm_load_kernel(cpu, machine, &integrator_binfo);
 }
 
 static void integratorcp_machine_init(MachineClass *mc)
diff --git a/hw/arm/kzm.c b/hw/arm/kzm.c
index 59d2102dc5..5ff419a555 100644
--- a/hw/arm/kzm.c
+++ b/hw/arm/kzm.c
@@ -126,13 +126,10 @@ static void kzm_init(MachineState *machine)
     }
 
     kzm_binfo.ram_size = machine->ram_size;
-    kzm_binfo.kernel_filename = machine->kernel_filename;
-    kzm_binfo.kernel_cmdline = machine->kernel_cmdline;
-    kzm_binfo.initrd_filename = machine->initrd_filename;
     kzm_binfo.nb_cpus = 1;
 
     if (!qtest_enabled()) {
-        arm_load_kernel(&s->soc.cpu, &kzm_binfo);
+        arm_load_kernel(&s->soc.cpu, machine, &kzm_binfo);
     }
 }
 
diff --git a/hw/arm/mainstone.c b/hw/arm/mainstone.c
index cd1f904c6c..c76cfb5dd1 100644
--- a/hw/arm/mainstone.c
+++ b/hw/arm/mainstone.c
@@ -177,11 +177,8 @@ static void mainstone_common_init(MemoryRegion *address_space_mem,
     smc91c111_init(&nd_table[0], MST_ETH_PHYS,
                     qdev_get_gpio_in(mst_irq, ETHERNET_IRQ));
 
-    mainstone_binfo.kernel_filename = machine->kernel_filename;
-    mainstone_binfo.kernel_cmdline = machine->kernel_cmdline;
-    mainstone_binfo.initrd_filename = machine->initrd_filename;
     mainstone_binfo.board_id = arm_id;
-    arm_load_kernel(mpu->cpu, &mainstone_binfo);
+    arm_load_kernel(mpu->cpu, machine, &mainstone_binfo);
 }
 
 static void mainstone_init(MachineState *machine)
diff --git a/hw/arm/mcimx6ul-evk.c b/hw/arm/mcimx6ul-evk.c
index 1f6f4aed97..e8a9b03069 100644
--- a/hw/arm/mcimx6ul-evk.c
+++ b/hw/arm/mcimx6ul-evk.c
@@ -39,9 +39,6 @@ static void mcimx6ul_evk_init(MachineState *machine)
         .loader_start = FSL_IMX6UL_MMDC_ADDR,
         .board_id = -1,
         .ram_size = machine->ram_size,
-        .kernel_filename = machine->kernel_filename,
-        .kernel_cmdline = machine->kernel_cmdline,
-        .initrd_filename = machine->initrd_filename,
         .nb_cpus = machine->smp.cpus,
     };
 
@@ -71,7 +68,7 @@ static void mcimx6ul_evk_init(MachineState *machine)
     }
 
     if (!qtest_enabled()) {
-        arm_load_kernel(&s->soc.cpu, &boot_info);
+        arm_load_kernel(&s->soc.cpu, machine, &boot_info);
     }
 }
 
diff --git a/hw/arm/mcimx7d-sabre.c b/hw/arm/mcimx7d-sabre.c
index 72eab03a0c..3123d8767f 100644
--- a/hw/arm/mcimx7d-sabre.c
+++ b/hw/arm/mcimx7d-sabre.c
@@ -42,9 +42,6 @@ static void mcimx7d_sabre_init(MachineState *machine)
         .loader_start = FSL_IMX7_MMDC_ADDR,
         .board_id = -1,
         .ram_size = machine->ram_size,
-        .kernel_filename = machine->kernel_filename,
-        .kernel_cmdline = machine->kernel_cmdline,
-        .initrd_filename = machine->initrd_filename,
         .nb_cpus = machine->smp.cpus,
     };
 
@@ -74,7 +71,7 @@ static void mcimx7d_sabre_init(MachineState *machine)
     }
 
     if (!qtest_enabled()) {
-        arm_load_kernel(&s->soc.cpu[0], &boot_info);
+        arm_load_kernel(&s->soc.cpu[0], machine, &boot_info);
     }
 }
 
diff --git a/hw/arm/musicpal.c b/hw/arm/musicpal.c
index 95d56f3208..a53ee12737 100644
--- a/hw/arm/musicpal.c
+++ b/hw/arm/musicpal.c
@@ -1568,9 +1568,6 @@ static struct arm_boot_info musicpal_binfo = {
 
 static void musicpal_init(MachineState *machine)
 {
-    const char *kernel_filename = machine->kernel_filename;
-    const char *kernel_cmdline = machine->kernel_cmdline;
-    const char *initrd_filename = machine->initrd_filename;
     ARMCPU *cpu;
     qemu_irq pic[32];
     DeviceState *dev;
@@ -1699,10 +1696,7 @@ static void musicpal_init(MachineState *machine)
     sysbus_connect_irq(s, 0, pic[MP_AUDIO_IRQ]);
 
     musicpal_binfo.ram_size = MP_RAM_DEFAULT_SIZE;
-    musicpal_binfo.kernel_filename = kernel_filename;
-    musicpal_binfo.kernel_cmdline = kernel_cmdline;
-    musicpal_binfo.initrd_filename = initrd_filename;
-    arm_load_kernel(cpu, &musicpal_binfo);
+    arm_load_kernel(cpu, machine, &musicpal_binfo);
 }
 
 static void musicpal_machine_init(MachineClass *mc)
diff --git a/hw/arm/nseries.c b/hw/arm/nseries.c
index 4a79f5c88b..31dd2f1b51 100644
--- a/hw/arm/nseries.c
+++ b/hw/arm/nseries.c
@@ -1358,10 +1358,7 @@ static void n8x0_init(MachineState *machine,
 
     if (machine->kernel_filename) {
         /* Or at the linux loader.  */
-        binfo->kernel_filename = machine->kernel_filename;
-        binfo->kernel_cmdline = machine->kernel_cmdline;
-        binfo->initrd_filename = machine->initrd_filename;
-        arm_load_kernel(s->mpu->cpu, binfo);
+        arm_load_kernel(s->mpu->cpu, machine, binfo);
 
         qemu_register_reset(n8x0_boot_init, s);
     }
diff --git a/hw/arm/omap_sx1.c b/hw/arm/omap_sx1.c
index cae78d0a36..3cc2817f06 100644
--- a/hw/arm/omap_sx1.c
+++ b/hw/arm/omap_sx1.c
@@ -196,10 +196,7 @@ static void sx1_init(MachineState *machine, const int version)
     }
 
     /* Load the kernel.  */
-    sx1_binfo.kernel_filename = machine->kernel_filename;
-    sx1_binfo.kernel_cmdline = machine->kernel_cmdline;
-    sx1_binfo.initrd_filename = machine->initrd_filename;
-    arm_load_kernel(mpu->cpu, &sx1_binfo);
+    arm_load_kernel(mpu->cpu, machine, &sx1_binfo);
 
     /* TODO: fix next line */
     //~ qemu_console_resize(ds, 640, 480);
diff --git a/hw/arm/palm.c b/hw/arm/palm.c
index 9eb9612bce..67ab30b5bc 100644
--- a/hw/arm/palm.c
+++ b/hw/arm/palm.c
@@ -186,9 +186,6 @@ static struct arm_boot_info palmte_binfo = {
 
 static void palmte_init(MachineState *machine)
 {
-    const char *kernel_filename = machine->kernel_filename;
-    const char *kernel_cmdline = machine->kernel_cmdline;
-    const char *initrd_filename = machine->initrd_filename;
     MemoryRegion *address_space_mem = get_system_memory();
     struct omap_mpu_state_s *mpu;
     int flash_size = 0x00800000;
@@ -248,16 +245,13 @@ static void palmte_init(MachineState *machine)
         }
     }
 
-    if (!rom_loaded && !kernel_filename && !qtest_enabled()) {
+    if (!rom_loaded && !machine->kernel_filename && !qtest_enabled()) {
         fprintf(stderr, "Kernel or ROM image must be specified\n");
         exit(1);
     }
 
     /* Load the kernel.  */
-    palmte_binfo.kernel_filename = kernel_filename;
-    palmte_binfo.kernel_cmdline = kernel_cmdline;
-    palmte_binfo.initrd_filename = initrd_filename;
-    arm_load_kernel(mpu->cpu, &palmte_binfo);
+    arm_load_kernel(mpu->cpu, machine, &palmte_binfo);
 }
 
 static void palmte_machine_init(MachineClass *mc)
diff --git a/hw/arm/raspi.c b/hw/arm/raspi.c
index 5b2620acb4..74c062d05e 100644
--- a/hw/arm/raspi.c
+++ b/hw/arm/raspi.c
@@ -157,13 +157,9 @@ static void setup_boot(MachineState *machine, int version, size_t ram_size)
 
         binfo.entry = firmware_addr;
         binfo.firmware_loaded = true;
-    } else {
-        binfo.kernel_filename = machine->kernel_filename;
-        binfo.kernel_cmdline = machine->kernel_cmdline;
-        binfo.initrd_filename = machine->initrd_filename;
     }
 
-    arm_load_kernel(ARM_CPU(first_cpu), &binfo);
+    arm_load_kernel(ARM_CPU(first_cpu), machine, &binfo);
 }
 
 static void raspi_init(MachineState *machine, int version)
diff --git a/hw/arm/realview.c b/hw/arm/realview.c
index 7c56c8d2ed..5a3e65ddd6 100644
--- a/hw/arm/realview.c
+++ b/hw/arm/realview.c
@@ -350,13 +350,10 @@ static void realview_init(MachineState *machine,
     memory_region_add_subregion(sysmem, SMP_BOOT_ADDR, ram_hack);
 
     realview_binfo.ram_size = ram_size;
-    realview_binfo.kernel_filename = machine->kernel_filename;
-    realview_binfo.kernel_cmdline = machine->kernel_cmdline;
-    realview_binfo.initrd_filename = machine->initrd_filename;
     realview_binfo.nb_cpus = smp_cpus;
     realview_binfo.board_id = realview_board_id[board_type];
     realview_binfo.loader_start = (board_type == BOARD_PB_A8 ? 0x70000000 : 0);
-    arm_load_kernel(ARM_CPU(first_cpu), &realview_binfo);
+    arm_load_kernel(ARM_CPU(first_cpu), machine, &realview_binfo);
 }
 
 static void realview_eb_init(MachineState *machine)
diff --git a/hw/arm/sabrelite.c b/hw/arm/sabrelite.c
index 934f4c9261..8f4b68e14c 100644
--- a/hw/arm/sabrelite.c
+++ b/hw/arm/sabrelite.c
@@ -102,16 +102,13 @@ static void sabrelite_init(MachineState *machine)
     }
 
     sabrelite_binfo.ram_size = machine->ram_size;
-    sabrelite_binfo.kernel_filename = machine->kernel_filename;
-    sabrelite_binfo.kernel_cmdline = machine->kernel_cmdline;
-    sabrelite_binfo.initrd_filename = machine->initrd_filename;
     sabrelite_binfo.nb_cpus = machine->smp.cpus;
     sabrelite_binfo.secure_boot = true;
     sabrelite_binfo.write_secondary_boot = sabrelite_write_secondary;
     sabrelite_binfo.secondary_cpu_reset_hook = sabrelite_reset_secondary;
 
     if (!qtest_enabled()) {
-        arm_load_kernel(&s->soc.cpu[0], &sabrelite_binfo);
+        arm_load_kernel(&s->soc.cpu[0], machine, &sabrelite_binfo);
     }
 }
 
diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
index 9c67d5c6f9..2aba3c58c5 100644
--- a/hw/arm/sbsa-ref.c
+++ b/hw/arm/sbsa-ref.c
@@ -709,13 +709,12 @@ static void sbsa_ref_init(MachineState *machine)
     create_pcie(sms, pic);
 
     sms->bootinfo.ram_size = machine->ram_size;
-    sms->bootinfo.kernel_filename = machine->kernel_filename;
     sms->bootinfo.nb_cpus = smp_cpus;
     sms->bootinfo.board_id = -1;
     sms->bootinfo.loader_start = sbsa_ref_memmap[SBSA_MEM].base;
     sms->bootinfo.get_dtb = sbsa_ref_dtb;
     sms->bootinfo.firmware_loaded = firmware_loaded;
-    arm_load_kernel(ARM_CPU(first_cpu), &sms->bootinfo);
+    arm_load_kernel(ARM_CPU(first_cpu), machine, &sms->bootinfo);
 }
 
 static uint64_t sbsa_ref_cpu_mp_affinity(SBSAMachineState *sms, int idx)
diff --git a/hw/arm/spitz.c b/hw/arm/spitz.c
index 723cf5d592..42338696b3 100644
--- a/hw/arm/spitz.c
+++ b/hw/arm/spitz.c
@@ -951,11 +951,8 @@ static void spitz_common_init(MachineState *machine,
         /* A 4.0 GB microdrive is permanently sitting in CF slot 0.  */
         spitz_microdrive_attach(mpu, 0);
 
-    spitz_binfo.kernel_filename = machine->kernel_filename;
-    spitz_binfo.kernel_cmdline = machine->kernel_cmdline;
-    spitz_binfo.initrd_filename = machine->initrd_filename;
     spitz_binfo.board_id = arm_id;
-    arm_load_kernel(mpu->cpu, &spitz_binfo);
+    arm_load_kernel(mpu->cpu, machine, &spitz_binfo);
     sl_bootparam_write(SL_PXA_PARAM_BASE);
 }
 
diff --git a/hw/arm/tosa.c b/hw/arm/tosa.c
index 7843d68d46..3a1de81278 100644
--- a/hw/arm/tosa.c
+++ b/hw/arm/tosa.c
@@ -218,9 +218,6 @@ static struct arm_boot_info tosa_binfo = {
 
 static void tosa_init(MachineState *machine)
 {
-    const char *kernel_filename = machine->kernel_filename;
-    const char *kernel_cmdline = machine->kernel_cmdline;
-    const char *initrd_filename = machine->initrd_filename;
     MemoryRegion *address_space_mem = get_system_memory();
     MemoryRegion *rom = g_new(MemoryRegion, 1);
     PXA2xxState *mpu;
@@ -245,11 +242,8 @@ static void tosa_init(MachineState *machine)
 
     tosa_tg_init(mpu);
 
-    tosa_binfo.kernel_filename = kernel_filename;
-    tosa_binfo.kernel_cmdline = kernel_cmdline;
-    tosa_binfo.initrd_filename = initrd_filename;
     tosa_binfo.board_id = 0x208;
-    arm_load_kernel(mpu->cpu, &tosa_binfo);
+    arm_load_kernel(mpu->cpu, machine, &tosa_binfo);
     sl_bootparam_write(SL_PXA_PARAM_BASE);
 }
 
diff --git a/hw/arm/versatilepb.c b/hw/arm/versatilepb.c
index e5857117ac..d3c3c00f55 100644
--- a/hw/arm/versatilepb.c
+++ b/hw/arm/versatilepb.c
@@ -373,11 +373,8 @@ static void versatile_init(MachineState *machine, int board_id)
     }
 
     versatile_binfo.ram_size = machine->ram_size;
-    versatile_binfo.kernel_filename = machine->kernel_filename;
-    versatile_binfo.kernel_cmdline = machine->kernel_cmdline;
-    versatile_binfo.initrd_filename = machine->initrd_filename;
     versatile_binfo.board_id = board_id;
-    arm_load_kernel(cpu, &versatile_binfo);
+    arm_load_kernel(cpu, machine, &versatile_binfo);
 }
 
 static void vpb_init(MachineState *machine)
diff --git a/hw/arm/vexpress.c b/hw/arm/vexpress.c
index 5d932c27c0..4673a88a8d 100644
--- a/hw/arm/vexpress.c
+++ b/hw/arm/vexpress.c
@@ -707,9 +707,6 @@ static void vexpress_common_init(MachineState *machine)
     }
 
     daughterboard->bootinfo.ram_size = machine->ram_size;
-    daughterboard->bootinfo.kernel_filename = machine->kernel_filename;
-    daughterboard->bootinfo.kernel_cmdline = machine->kernel_cmdline;
-    daughterboard->bootinfo.initrd_filename = machine->initrd_filename;
     daughterboard->bootinfo.nb_cpus = machine->smp.cpus;
     daughterboard->bootinfo.board_id = VEXPRESS_BOARD_ID;
     daughterboard->bootinfo.loader_start = daughterboard->loader_start;
@@ -719,7 +716,7 @@ static void vexpress_common_init(MachineState *machine)
     daughterboard->bootinfo.modify_dtb = vexpress_modify_dtb;
     /* When booting Linux we should be in secure state if the CPU has one. */
     daughterboard->bootinfo.secure_boot = vms->secure;
-    arm_load_kernel(ARM_CPU(first_cpu), &daughterboard->bootinfo);
+    arm_load_kernel(ARM_CPU(first_cpu), machine, &daughterboard->bootinfo);
 }
 
 static bool vexpress_get_secure(Object *obj, Error **errp)
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index d9496c9363..6ffb80bf5b 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1364,6 +1364,7 @@ void virt_machine_done(Notifier *notifier, void *data)
 {
     VirtMachineState *vms = container_of(notifier, VirtMachineState,
                                          machine_done);
+    MachineState *ms = MACHINE(vms);
     ARMCPU *cpu = ARM_CPU(first_cpu);
     struct arm_boot_info *info = &vms->bootinfo;
     AddressSpace *as = arm_boot_address_space(cpu, info);
@@ -1381,7 +1382,7 @@ void virt_machine_done(Notifier *notifier, void *data)
                                        vms->memmap[VIRT_PLATFORM_BUS].size,
                                        vms->irqmap[VIRT_PLATFORM_BUS]);
     }
-    if (arm_load_dtb(info->dtb_start, info, info->dtb_limit, as) < 0) {
+    if (arm_load_dtb(info->dtb_start, info, info->dtb_limit, as, ms) < 0) {
         exit(1);
     }
 
@@ -1707,16 +1708,13 @@ static void machvirt_init(MachineState *machine)
     create_platform_bus(vms, pic);
 
     vms->bootinfo.ram_size = machine->ram_size;
-    vms->bootinfo.kernel_filename = machine->kernel_filename;
-    vms->bootinfo.kernel_cmdline = machine->kernel_cmdline;
-    vms->bootinfo.initrd_filename = machine->initrd_filename;
     vms->bootinfo.nb_cpus = smp_cpus;
     vms->bootinfo.board_id = -1;
     vms->bootinfo.loader_start = vms->memmap[VIRT_MEM].base;
     vms->bootinfo.get_dtb = machvirt_dtb;
     vms->bootinfo.skip_dtb_autoload = true;
     vms->bootinfo.firmware_loaded = firmware_loaded;
-    arm_load_kernel(ARM_CPU(first_cpu), &vms->bootinfo);
+    arm_load_kernel(ARM_CPU(first_cpu), machine, &vms->bootinfo);
 
     vms->machine_done.notify = virt_machine_done;
     qemu_add_machine_init_done_notifier(&vms->machine_done);
diff --git a/hw/arm/xilinx_zynq.c b/hw/arm/xilinx_zynq.c
index 89da34808b..c14774e542 100644
--- a/hw/arm/xilinx_zynq.c
+++ b/hw/arm/xilinx_zynq.c
@@ -158,9 +158,6 @@ static inline void zynq_init_spi_flashes(uint32_t base_addr, qemu_irq irq,
 static void zynq_init(MachineState *machine)
 {
     ram_addr_t ram_size = machine->ram_size;
-    const char *kernel_filename = machine->kernel_filename;
-    const char *kernel_cmdline = machine->kernel_cmdline;
-    const char *initrd_filename = machine->initrd_filename;
     ARMCPU *cpu;
     MemoryRegion *address_space_mem = get_system_memory();
     MemoryRegion *ext_ram = g_new(MemoryRegion, 1);
@@ -303,16 +300,13 @@ static void zynq_init(MachineState *machine)
     sysbus_mmio_map(busdev, 0, 0xF8007000);
 
     zynq_binfo.ram_size = ram_size;
-    zynq_binfo.kernel_filename = kernel_filename;
-    zynq_binfo.kernel_cmdline = kernel_cmdline;
-    zynq_binfo.initrd_filename = initrd_filename;
     zynq_binfo.nb_cpus = 1;
     zynq_binfo.board_id = 0xd32;
     zynq_binfo.loader_start = 0;
     zynq_binfo.board_setup_addr = BOARD_SETUP_ADDR;
     zynq_binfo.write_board_setup = zynq_write_board_setup;
 
-    arm_load_kernel(ARM_CPU(first_cpu), &zynq_binfo);
+    arm_load_kernel(ARM_CPU(first_cpu), machine, &zynq_binfo);
 }
 
 static void zynq_machine_init(MachineClass *mc)
diff --git a/hw/arm/xlnx-versal-virt.c b/hw/arm/xlnx-versal-virt.c
index f95fde2309..462493c467 100644
--- a/hw/arm/xlnx-versal-virt.c
+++ b/hw/arm/xlnx-versal-virt.c
@@ -441,14 +441,11 @@ static void versal_virt_init(MachineState *machine)
                                         0, &s->soc.fpd.apu.mr, 0);
 
     s->binfo.ram_size = machine->ram_size;
-    s->binfo.kernel_filename = machine->kernel_filename;
-    s->binfo.kernel_cmdline = machine->kernel_cmdline;
-    s->binfo.initrd_filename = machine->initrd_filename;
     s->binfo.loader_start = 0x0;
     s->binfo.get_dtb = versal_virt_get_dtb;
     s->binfo.modify_dtb = versal_virt_modify_dtb;
     if (machine->kernel_filename) {
-        arm_load_kernel(s->soc.fpd.apu.cpu[0], &s->binfo);
+        arm_load_kernel(s->soc.fpd.apu.cpu[0], machine, &s->binfo);
     } else {
         AddressSpace *as = arm_boot_address_space(s->soc.fpd.apu.cpu[0],
                                                   &s->binfo);
@@ -457,7 +454,7 @@ static void versal_virt_init(MachineState *machine)
         s->binfo.loader_start = 0x1000;
         s->binfo.dtb_limit = 0x1000000;
         if (arm_load_dtb(s->binfo.loader_start,
-                         &s->binfo, s->binfo.dtb_limit, as) < 0) {
+                         &s->binfo, s->binfo.dtb_limit, as, machine) < 0) {
             exit(EXIT_FAILURE);
         }
     }
diff --git a/hw/arm/xlnx-zcu102.c b/hw/arm/xlnx-zcu102.c
index 044d3394c0..53cfe7c1f1 100644
--- a/hw/arm/xlnx-zcu102.c
+++ b/hw/arm/xlnx-zcu102.c
@@ -171,11 +171,8 @@ static void xlnx_zcu102_init(MachineState *machine)
     /* TODO create and connect IDE devices for ide_drive_get() */
 
     xlnx_zcu102_binfo.ram_size = ram_size;
-    xlnx_zcu102_binfo.kernel_filename = machine->kernel_filename;
-    xlnx_zcu102_binfo.kernel_cmdline = machine->kernel_cmdline;
-    xlnx_zcu102_binfo.initrd_filename = machine->initrd_filename;
     xlnx_zcu102_binfo.loader_start = 0;
-    arm_load_kernel(s->soc.boot_cpu_ptr, &xlnx_zcu102_binfo);
+    arm_load_kernel(s->soc.boot_cpu_ptr, machine, &xlnx_zcu102_binfo);
 }
 
 static void xlnx_zcu102_machine_instance_init(Object *obj)
diff --git a/hw/arm/z2.c b/hw/arm/z2.c
index 44aa748d39..2f21421683 100644
--- a/hw/arm/z2.c
+++ b/hw/arm/z2.c
@@ -296,9 +296,6 @@ static const TypeInfo aer915_info = {
 
 static void z2_init(MachineState *machine)
 {
-    const char *kernel_filename = machine->kernel_filename;
-    const char *kernel_cmdline = machine->kernel_cmdline;
-    const char *initrd_filename = machine->initrd_filename;
     MemoryRegion *address_space_mem = get_system_memory();
     uint32_t sector_len = 0x10000;
     PXA2xxState *mpu;
@@ -352,11 +349,8 @@ static void z2_init(MachineState *machine)
     qdev_connect_gpio_out(mpu->gpio, Z2_GPIO_LCD_CS,
                           qemu_allocate_irq(z2_lcd_cs, z2_lcd, 0));
 
-    z2_binfo.kernel_filename = kernel_filename;
-    z2_binfo.kernel_cmdline = kernel_cmdline;
-    z2_binfo.initrd_filename = initrd_filename;
     z2_binfo.board_id = 0x6dd;
-    arm_load_kernel(mpu->cpu, &z2_binfo);
+    arm_load_kernel(mpu->cpu, machine, &z2_binfo);
 }
 
 static void z2_machine_init(MachineClass *mc)
diff --git a/include/hw/arm/boot.h b/include/hw/arm/boot.h
index c48cc4c2bc..2673abe81f 100644
--- a/include/hw/arm/boot.h
+++ b/include/hw/arm/boot.h
@@ -133,7 +133,7 @@ struct arm_boot_info {
  * before sysbus-fdt arm_register_platform_bus_fdt_creator. Indeed the
  * machine init done notifiers are called in registration reverse order.
  */
-void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info);
+void arm_load_kernel(ARMCPU *cpu, MachineState *ms, struct arm_boot_info *info);
 
 AddressSpace *arm_boot_address_space(ARMCPU *cpu,
                                      const struct arm_boot_info *info);
@@ -160,7 +160,7 @@ AddressSpace *arm_boot_address_space(ARMCPU *cpu,
  * Note: Must not be called unless have_dtb(binfo) is true.
  */
 int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
-                 hwaddr addr_limit, AddressSpace *as);
+                 hwaddr addr_limit, AddressSpace *as, MachineState *ms);
 
 /* Write a secure board setup routine with a dummy handler for SMCs */
 void arm_write_secure_board_setup_dummy_smc(ARMCPU *cpu,
-- 
2.20.1




* [Qemu-devel] [PATCH v9 02/11] numa: move numa global variable nb_numa_nodes into MachineState
From: Tao @ 2019-08-09  6:57 UTC
  To: imammedo, eblake, ehabkost
  Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel, daniel,
	jonathan.cameron, dan.j.williams

From: Tao Xu <tao3.xu@intel.com>

Add struct NumaState to MachineState and move the existing NUMA global
nb_numa_nodes (renamed to "num_nodes") into NumaState. NumaState is only
allocated for machine types whose MachineClass sets numa_mem_supported,
so that flag decides which machines support NUMA.
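As a rough illustration of the ownership change described above, here is a minimal C sketch, not QEMU code. The structs are hypothetical, trimmed-down stand-ins for NumaState, MachineClass and MachineState, the helper names are invented for illustration, and plain calloc()/free() stand in for g_new0()/g_free(). It models the pattern in the diff: NumaState exists only when numa_mem_supported is set, readers that may run on non-NUMA machines check numa_state for NULL, and -numa options are rejected on machines without it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down stand-ins for QEMU's types: only the
 * fields this sketch needs are modelled. */
typedef struct NumaState {
    int num_nodes;              /* replaces the old global nb_numa_nodes */
} NumaState;

typedef struct MachineClass {
    bool numa_mem_supported;    /* set by machine types that accept -numa */
} MachineClass;

typedef struct MachineState {
    const MachineClass *mc;
    NumaState *numa_state;      /* stays NULL if NUMA is unsupported */
} MachineState;

/* Like machine_initfn(): allocate NumaState only for NUMA-capable machines. */
static void machine_init_sketch(MachineState *ms, const MachineClass *mc)
{
    ms->mc = mc;
    ms->numa_state = NULL;
    if (mc->numa_mem_supported) {
        ms->numa_state = calloc(1, sizeof(*ms->numa_state));
        if (!ms->numa_state) {
            abort();            /* g_new0() also aborts on OOM */
        }
    }
}

/* Like set_numa_options()/parse_numa_node(): reject -numa on machines
 * without NumaState, otherwise count one more node. */
static bool add_numa_node_sketch(MachineState *ms)
{
    if (!ms->mc->numa_mem_supported) {
        fprintf(stderr, "NUMA is not supported by this machine-type\n");
        return false;
    }
    ms->numa_state->num_nodes++;
    return true;
}

/* Like the NULL-safe reads added in exec.c and hmp_info_numa(). */
static int machine_num_nodes(const MachineState *ms)
{
    return ms->numa_state ? ms->numa_state->num_nodes : 0;
}

/* Like virt_get_default_cpu_node_id(): round-robin CPUs over nodes. Only
 * valid once a node exists (machine_numa_finish_cpu_init() asserts the
 * same), otherwise the modulo would divide by zero. */
static int default_cpu_node_id(const MachineState *ms, int idx)
{
    assert(ms->numa_state && ms->numa_state->num_nodes > 0);
    return idx % ms->numa_state->num_nodes;
}

/* Like machine_finalize(): free(NULL) is a no-op, so no check is needed. */
static void machine_finalize_sketch(MachineState *ms)
{
    free(ms->numa_state);
    ms->numa_state = NULL;
}
```

Keeping num_nodes behind a pointer that is NULL on non-NUMA machines is why the patch adds NULL checks in shared paths such as exec.c, numa.c and pci_expander_bridge.c, while code for boards whose class sets numa_mem_supported (for example sbsa-ref after this patch) dereferences it directly.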

Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

No changes in v9
---
 exec.c                              |  5 ++-
 hw/acpi/aml-build.c                 |  3 +-
 hw/arm/boot.c                       |  4 +-
 hw/arm/sbsa-ref.c                   |  4 +-
 hw/arm/virt-acpi-build.c            | 10 +++--
 hw/arm/virt.c                       |  4 +-
 hw/core/machine-hmp-cmds.c          | 12 ++++--
 hw/core/machine.c                   | 14 +++++--
 hw/core/numa.c                      | 60 +++++++++++++++++------------
 hw/i386/acpi-build.c                |  2 +-
 hw/i386/pc.c                        |  9 +++--
 hw/mem/pc-dimm.c                    |  2 +
 hw/pci-bridge/pci_expander_bridge.c |  8 +++-
 hw/ppc/spapr.c                      | 19 ++++-----
 include/hw/acpi/aml-build.h         |  2 +-
 include/hw/boards.h                 |  1 +
 include/sysemu/numa.h               | 10 ++++-
 17 files changed, 110 insertions(+), 59 deletions(-)

diff --git a/exec.c b/exec.c
index 3e78de3b8f..4fd6ec2bd0 100644
--- a/exec.c
+++ b/exec.c
@@ -1749,6 +1749,7 @@ long qemu_minrampagesize(void)
     long hpsize = LONG_MAX;
     long mainrampagesize;
     Object *memdev_root;
+    MachineState *ms = MACHINE(qdev_get_machine());
 
     mainrampagesize = qemu_mempath_getpagesize(mem_path);
 
@@ -1776,7 +1777,9 @@ long qemu_minrampagesize(void)
      * so if its page size is smaller we have got to report that size instead.
      */
     if (hpsize > mainrampagesize &&
-        (nb_numa_nodes == 0 || numa_info[0].node_memdev == NULL)) {
+        (ms->numa_state == NULL ||
+         ms->numa_state->num_nodes == 0 ||
+         numa_info[0].node_memdev == NULL)) {
         static bool warned;
         if (!warned) {
             error_report("Huge page support disabled (n/a for main memory).");
diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 555c24f21d..63c1cae8c9 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1726,10 +1726,11 @@ void build_srat_memory(AcpiSratMemoryAffinity *numamem, uint64_t base,
  * ACPI spec 5.2.17 System Locality Distance Information Table
  * (Revision 2.0 or later)
  */
-void build_slit(GArray *table_data, BIOSLinker *linker)
+void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms)
 {
     int slit_start, i, j;
     slit_start = table_data->len;
+    int nb_numa_nodes = ms->numa_state->num_nodes;
 
     acpi_data_push(table_data, sizeof(AcpiTableHeader));
 
diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index ba604f8277..d02d2dae85 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -598,9 +598,9 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
     }
     g_strfreev(node_path);
 
-    if (nb_numa_nodes > 0) {
+    if (ms->numa_state != NULL && ms->numa_state->num_nodes > 0) {
         mem_base = binfo->loader_start;
-        for (i = 0; i < nb_numa_nodes; i++) {
+        for (i = 0; i < ms->numa_state->num_nodes; i++) {
             mem_len = numa_info[i].node_mem;
             rc = fdt_add_memory_node(fdt, acells, mem_base,
                                      scells, mem_len, i);
diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
index 2aba3c58c5..22847909bf 100644
--- a/hw/arm/sbsa-ref.c
+++ b/hw/arm/sbsa-ref.c
@@ -144,6 +144,7 @@ static void create_fdt(SBSAMachineState *sms)
 {
     void *fdt = create_device_tree(&sms->fdt_size);
     const MachineState *ms = MACHINE(sms);
+    int nb_numa_nodes = ms->numa_state->num_nodes;
     int cpu;
 
     if (!fdt) {
@@ -760,7 +761,7 @@ sbsa_ref_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
 static int64_t
 sbsa_ref_get_default_cpu_node_id(const MachineState *ms, int idx)
 {
-    return idx % nb_numa_nodes;
+    return idx % ms->numa_state->num_nodes;
 }
 
 static void sbsa_ref_instance_init(Object *obj)
@@ -787,6 +788,7 @@ static void sbsa_ref_class_init(ObjectClass *oc, void *data)
     mc->possible_cpu_arch_ids = sbsa_ref_possible_cpu_arch_ids;
     mc->cpu_index_to_instance_props = sbsa_ref_cpu_index_to_props;
     mc->get_default_cpu_node_id = sbsa_ref_get_default_cpu_node_id;
+    mc->numa_mem_supported = true;
 }
 
 static const TypeInfo sbsa_ref_info = {
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 0afb372769..a2cc4b84fe 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -516,7 +516,8 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
     int i, srat_start;
     uint64_t mem_base;
     MachineClass *mc = MACHINE_GET_CLASS(vms);
-    const CPUArchIdList *cpu_list = mc->possible_cpu_arch_ids(MACHINE(vms));
+    MachineState *ms = MACHINE(vms);
+    const CPUArchIdList *cpu_list = mc->possible_cpu_arch_ids(ms);
 
     srat_start = table_data->len;
     srat = acpi_data_push(table_data, sizeof(*srat));
@@ -532,7 +533,7 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
     }
 
     mem_base = vms->memmap[VIRT_MEM].base;
-    for (i = 0; i < nb_numa_nodes; ++i) {
+    for (i = 0; i < ms->numa_state->num_nodes; ++i) {
         if (numa_info[i].node_mem > 0) {
             numamem = acpi_data_push(table_data, sizeof(*numamem));
             build_srat_memory(numamem, mem_base, numa_info[i].node_mem, i,
@@ -758,6 +759,7 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
     GArray *table_offsets;
     unsigned dsdt, xsdt;
     GArray *tables_blob = tables->table_data;
+    MachineState *ms = MACHINE(vms);
 
     table_offsets = g_array_new(false, true /* clear */,
                                         sizeof(uint32_t));
@@ -792,12 +794,12 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
     acpi_add_table(table_offsets, tables_blob);
     build_spcr(tables_blob, tables->linker, vms);
 
-    if (nb_numa_nodes > 0) {
+    if (ms->numa_state->num_nodes > 0) {
         acpi_add_table(table_offsets, tables_blob);
         build_srat(tables_blob, tables->linker, vms);
         if (have_numa_distance) {
             acpi_add_table(table_offsets, tables_blob);
-            build_slit(tables_blob, tables->linker);
+            build_slit(tables_blob, tables->linker, ms);
         }
     }
 
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 6ffb80bf5b..c72b8fd3a7 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -199,6 +199,8 @@ static bool cpu_type_valid(const char *cpu)
 
 static void create_fdt(VirtMachineState *vms)
 {
+    MachineState *ms = MACHINE(vms);
+    int nb_numa_nodes = ms->numa_state->num_nodes;
     void *fdt = create_device_tree(&vms->fdt_size);
 
     if (!fdt) {
@@ -1842,7 +1844,7 @@ virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
 
 static int64_t virt_get_default_cpu_node_id(const MachineState *ms, int idx)
 {
-    return idx % nb_numa_nodes;
+    return idx % ms->numa_state->num_nodes;
 }
 
 static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
diff --git a/hw/core/machine-hmp-cmds.c b/hw/core/machine-hmp-cmds.c
index 1f66bda346..6a1a2599d8 100644
--- a/hw/core/machine-hmp-cmds.c
+++ b/hw/core/machine-hmp-cmds.c
@@ -139,15 +139,21 @@ void hmp_info_memdev(Monitor *mon, const QDict *qdict)
 
 void hmp_info_numa(Monitor *mon, const QDict *qdict)
 {
-    int i;
+    int i, nb_numa_nodes;
     NumaNodeMem *node_mem;
     CpuInfoList *cpu_list, *cpu;
+    MachineState *ms = MACHINE(qdev_get_machine());
+
+    nb_numa_nodes = ms->numa_state ? ms->numa_state->num_nodes : 0;
+    monitor_printf(mon, "%d nodes\n", nb_numa_nodes);
+    if (!nb_numa_nodes) {
+        return;
+    }
 
     cpu_list = qmp_query_cpus(&error_abort);
     node_mem = g_new0(NumaNodeMem, nb_numa_nodes);
 
-    query_numa_node_mem(node_mem);
-    monitor_printf(mon, "%d nodes\n", nb_numa_nodes);
+    query_numa_node_mem(node_mem, ms);
     for (i = 0; i < nb_numa_nodes; i++) {
         monitor_printf(mon, "node %d cpus:", i);
         for (cpu = cpu_list; cpu; cpu = cpu->next) {
diff --git a/hw/core/machine.c b/hw/core/machine.c
index 32d1ca9abc..3c55470103 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -953,6 +953,9 @@ static void machine_initfn(Object *obj)
                                         NULL);
     }
 
+    if (mc->numa_mem_supported) {
+        ms->numa_state = g_new0(NumaState, 1);
+    }
 
     /* Register notifier when init is done for sysbus sanity checks */
     ms->sysbus_notifier.notify = machine_init_notify;
@@ -973,6 +976,7 @@ static void machine_finalize(Object *obj)
     g_free(ms->firmware);
     g_free(ms->device_memory);
     g_free(ms->nvdimms_state);
+    g_free(ms->numa_state);
 }
 
 bool machine_usb(MachineState *machine)
@@ -1047,7 +1051,7 @@ static void machine_numa_finish_cpu_init(MachineState *machine)
     MachineClass *mc = MACHINE_GET_CLASS(machine);
     const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(machine);
 
-    assert(nb_numa_nodes);
+    assert(machine->numa_state->num_nodes);
     for (i = 0; i < possible_cpus->len; i++) {
         if (possible_cpus->cpus[i].props.has_node_id) {
             break;
@@ -1093,9 +1097,11 @@ void machine_run_board_init(MachineState *machine)
 {
     MachineClass *machine_class = MACHINE_GET_CLASS(machine);
 
-    numa_complete_configuration(machine);
-    if (nb_numa_nodes) {
-        machine_numa_finish_cpu_init(machine);
+    if (machine_class->numa_mem_supported) {
+        numa_complete_configuration(machine);
+        if (machine->numa_state->num_nodes) {
+            machine_numa_finish_cpu_init(machine);
+        }
     }
 
     /* If the machine supports the valid_cpu_types check and the user
diff --git a/hw/core/numa.c b/hw/core/numa.c
index a11431483c..4d5e308bf1 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -50,7 +50,6 @@ static int have_mem;
 static int max_numa_nodeid; /* Highest specified NUMA node ID, plus one.
                              * For all nodes, nodeid < max_numa_nodeid
                              */
-int nb_numa_nodes;
 bool have_numa_distance;
 NodeInfo numa_info[MAX_NODES];
 
@@ -67,7 +66,7 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
     if (node->has_nodeid) {
         nodenr = node->nodeid;
     } else {
-        nodenr = nb_numa_nodes;
+        nodenr = ms->numa_state->num_nodes;
     }
 
     if (nodenr >= MAX_NODES) {
@@ -133,10 +132,11 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
     }
     numa_info[nodenr].present = true;
     max_numa_nodeid = MAX(max_numa_nodeid, nodenr + 1);
-    nb_numa_nodes++;
+    ms->numa_state->num_nodes++;
 }
 
-static void parse_numa_distance(NumaDistOptions *dist, Error **errp)
+static
+void parse_numa_distance(MachineState *ms, NumaDistOptions *dist, Error **errp)
 {
     uint16_t src = dist->src;
     uint16_t dst = dist->dst;
@@ -174,6 +174,12 @@ static void parse_numa_distance(NumaDistOptions *dist, Error **errp)
 void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
 {
     Error *err = NULL;
+    MachineClass *mc = MACHINE_GET_CLASS(ms);
+
+    if (!mc->numa_mem_supported) {
+        error_setg(errp, "NUMA is not supported by this machine-type");
+        goto end;
+    }
 
     switch (object->type) {
     case NUMA_OPTIONS_TYPE_NODE:
@@ -183,7 +189,7 @@ void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
         }
         break;
     case NUMA_OPTIONS_TYPE_DIST:
-        parse_numa_distance(&object->u.dist, &err);
+        parse_numa_distance(ms, &object->u.dist, &err);
         if (err) {
             goto end;
         }
@@ -248,10 +254,11 @@ end:
  * distance from a node to itself is always NUMA_DISTANCE_MIN,
  * so providing it is never necessary.
  */
-static void validate_numa_distance(void)
+static void validate_numa_distance(MachineState *ms)
 {
     int src, dst;
     bool is_asymmetrical = false;
+    int nb_numa_nodes = ms->numa_state->num_nodes;
 
     for (src = 0; src < nb_numa_nodes; src++) {
         for (dst = src; dst < nb_numa_nodes; dst++) {
@@ -289,7 +296,7 @@ static void validate_numa_distance(void)
     }
 }
 
-static void complete_init_numa_distance(void)
+static void complete_init_numa_distance(MachineState *ms)
 {
     int src, dst;
 
@@ -298,8 +305,8 @@ static void complete_init_numa_distance(void)
      * there would not be any missing distance except local node, which
      * is verified by validate_numa_distance above.
      */
-    for (src = 0; src < nb_numa_nodes; src++) {
-        for (dst = 0; dst < nb_numa_nodes; dst++) {
+    for (src = 0; src < ms->numa_state->num_nodes; src++) {
+        for (dst = 0; dst < ms->numa_state->num_nodes; dst++) {
             if (numa_info[src].distance[dst] == 0) {
                 if (src == dst) {
                     numa_info[src].distance[dst] = NUMA_DISTANCE_MIN;
@@ -365,7 +372,7 @@ void numa_complete_configuration(MachineState *ms)
      *
      * Enable NUMA implicitly by adding a new NUMA node automatically.
      */
-    if (ms->ram_slots > 0 && nb_numa_nodes == 0 &&
+    if (ms->ram_slots > 0 && ms->numa_state->num_nodes == 0 &&
         mc->auto_enable_numa_with_memhp) {
             NumaNodeOptions node = { };
             parse_numa_node(ms, &node, &error_abort);
@@ -383,26 +390,27 @@ void numa_complete_configuration(MachineState *ms)
     }
 
     /* This must be always true if all nodes are present: */
-    assert(nb_numa_nodes == max_numa_nodeid);
+    assert(ms->numa_state->num_nodes == max_numa_nodeid);
 
-    if (nb_numa_nodes > 0) {
+    if (ms->numa_state->num_nodes > 0) {
         uint64_t numa_total;
 
-        if (nb_numa_nodes > MAX_NODES) {
-            nb_numa_nodes = MAX_NODES;
+        if (ms->numa_state->num_nodes > MAX_NODES) {
+            ms->numa_state->num_nodes = MAX_NODES;
         }
 
         /* If no memory size is given for any node, assume the default case
          * and distribute the available memory equally across all nodes
          */
-        for (i = 0; i < nb_numa_nodes; i++) {
+        for (i = 0; i < ms->numa_state->num_nodes; i++) {
             if (numa_info[i].node_mem != 0) {
                 break;
             }
         }
-        if (i == nb_numa_nodes) {
+        if (i == ms->numa_state->num_nodes) {
             assert(mc->numa_auto_assign_ram);
-            mc->numa_auto_assign_ram(mc, numa_info, nb_numa_nodes, ram_size);
+            mc->numa_auto_assign_ram(mc, numa_info,
+                                     ms->numa_state->num_nodes, ram_size);
             if (!qtest_enabled()) {
                 warn_report("Default splitting of RAM between nodes is deprecated,"
                             " Use '-numa node,memdev' to explictly define RAM"
@@ -411,7 +419,7 @@ void numa_complete_configuration(MachineState *ms)
         }
 
         numa_total = 0;
-        for (i = 0; i < nb_numa_nodes; i++) {
+        for (i = 0; i < ms->numa_state->num_nodes; i++) {
             numa_total += numa_info[i].node_mem;
         }
         if (numa_total != ram_size) {
@@ -435,10 +443,10 @@ void numa_complete_configuration(MachineState *ms)
          */
         if (have_numa_distance) {
             /* Validate enough NUMA distance information was provided. */
-            validate_numa_distance();
+            validate_numa_distance(ms);
 
             /* Validation succeeded, now fill in any missing distances. */
-            complete_init_numa_distance();
+            complete_init_numa_distance(ms);
         }
     }
 }
@@ -505,14 +513,16 @@ void memory_region_allocate_system_memory(MemoryRegion *mr, Object *owner,
 {
     uint64_t addr = 0;
     int i;
+    MachineState *ms = MACHINE(qdev_get_machine());
 
-    if (nb_numa_nodes == 0 || !have_memdevs) {
+    if (ms->numa_state == NULL ||
+        ms->numa_state->num_nodes == 0 || !have_memdevs) {
         allocate_system_memory_nonnuma(mr, owner, name, ram_size);
         return;
     }
 
     memory_region_init(mr, owner, name, ram_size);
-    for (i = 0; i < nb_numa_nodes; i++) {
+    for (i = 0; i < ms->numa_state->num_nodes; i++) {
         uint64_t size = numa_info[i].node_mem;
         HostMemoryBackend *backend = numa_info[i].node_memdev;
         if (!backend) {
@@ -570,16 +580,16 @@ static void numa_stat_memory_devices(NumaNodeMem node_mem[])
     qapi_free_MemoryDeviceInfoList(info_list);
 }
 
-void query_numa_node_mem(NumaNodeMem node_mem[])
+void query_numa_node_mem(NumaNodeMem node_mem[], MachineState *ms)
 {
     int i;
 
-    if (nb_numa_nodes <= 0) {
+    if (ms->numa_state == NULL || ms->numa_state->num_nodes <= 0) {
         return;
     }
 
     numa_stat_memory_devices(node_mem);
-    for (i = 0; i < nb_numa_nodes; i++) {
+    for (i = 0; i < ms->numa_state->num_nodes; i++) {
         node_mem[i].node_mem += numa_info[i].node_mem;
     }
 }
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index f3fdfefcd5..d4c092358d 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2694,7 +2694,7 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
         build_srat(tables_blob, tables->linker, machine);
         if (have_numa_distance) {
             acpi_add_table(table_offsets, tables_blob);
-            build_slit(tables_blob, tables->linker);
+            build_slit(tables_blob, tables->linker, machine);
         }
     }
     if (acpi_get_mcfg(&mcfg)) {
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 549c437050..b2cc618fbf 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -996,6 +996,8 @@ static FWCfgState *bochs_bios_init(AddressSpace *as, PCMachineState *pcms)
     int i;
     const CPUArchIdList *cpus;
     MachineClass *mc = MACHINE_GET_CLASS(pcms);
+    MachineState *ms = MACHINE(pcms);
+    int nb_numa_nodes = ms->numa_state->num_nodes;
 
     fw_cfg = fw_cfg_init_io_dma(FW_CFG_IO_BASE, FW_CFG_IO_BASE + 4, as);
     fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus);
@@ -1759,12 +1761,13 @@ void pc_machine_done(Notifier *notifier, void *data)
 void pc_guest_info_init(PCMachineState *pcms)
 {
     int i;
+    MachineState *ms = MACHINE(pcms);
 
     pcms->apic_xrupt_override = kvm_allows_irq0_override();
-    pcms->numa_nodes = nb_numa_nodes;
+    pcms->numa_nodes = ms->numa_state->num_nodes;
     pcms->node_mem = g_malloc0(pcms->numa_nodes *
                                     sizeof *pcms->node_mem);
-    for (i = 0; i < nb_numa_nodes; i++) {
+    for (i = 0; i < ms->numa_state->num_nodes; i++) {
         pcms->node_mem[i] = numa_info[i].node_mem;
     }
 
@@ -2847,7 +2850,7 @@ static int64_t pc_get_default_cpu_node_id(const MachineState *ms, int idx)
    x86_topo_ids_from_apicid(ms->possible_cpus->cpus[idx].arch_id,
                             pcms->smp_dies, ms->smp.cores,
                             ms->smp.threads, &topo);
-   return topo.pkg_id % nb_numa_nodes;
+   return topo.pkg_id % ms->numa_state->num_nodes;
 }
 
 static const CPUArchIdList *pc_possible_cpu_arch_ids(MachineState *ms)
diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
index 29c785799c..6b4de33f60 100644
--- a/hw/mem/pc-dimm.c
+++ b/hw/mem/pc-dimm.c
@@ -168,6 +168,8 @@ static void pc_dimm_realize(DeviceState *dev, Error **errp)
 {
     PCDIMMDevice *dimm = PC_DIMM(dev);
     PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
+    MachineState *ms = MACHINE(qdev_get_machine());
+    int nb_numa_nodes = ms->numa_state->num_nodes;
 
     if (!dimm->hostmem) {
         error_setg(errp, "'" PC_DIMM_MEMDEV_PROP "' property is not set");
diff --git a/hw/pci-bridge/pci_expander_bridge.c b/hw/pci-bridge/pci_expander_bridge.c
index aecf3d7ddf..d83d89d701 100644
--- a/hw/pci-bridge/pci_expander_bridge.c
+++ b/hw/pci-bridge/pci_expander_bridge.c
@@ -212,9 +212,15 @@ static void pxb_dev_realize_common(PCIDevice *dev, bool pcie, Error **errp)
     PCIBus *bus;
     const char *dev_name = NULL;
     Error *local_err = NULL;
+    MachineState *ms = MACHINE(qdev_get_machine());
+
+    if (ms->numa_state == NULL) {
+        error_setg(errp, "NUMA is not supported by this machine-type");
+        return;
+    }
 
     if (pxb->numa_node != NUMA_NODE_UNASSIGNED &&
-        pxb->numa_node >= nb_numa_nodes) {
+        pxb->numa_node >= ms->numa_state->num_nodes) {
         error_setg(errp, "Illegal numa node %d", pxb->numa_node);
         return;
     }
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 821f0d4a49..358d670485 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -331,7 +331,7 @@ static int spapr_fixup_cpu_dt(void *fdt, SpaprMachineState *spapr)
             return ret;
         }
 
-        if (nb_numa_nodes > 1) {
+        if (ms->numa_state->num_nodes > 1) {
             ret = spapr_fixup_cpu_numa_dt(fdt, offset, cpu);
             if (ret < 0) {
                 return ret;
@@ -351,9 +351,9 @@ static int spapr_fixup_cpu_dt(void *fdt, SpaprMachineState *spapr)
 
 static hwaddr spapr_node0_size(MachineState *machine)
 {
-    if (nb_numa_nodes) {
+    if (machine->numa_state->num_nodes) {
         int i;
-        for (i = 0; i < nb_numa_nodes; ++i) {
+        for (i = 0; i < machine->numa_state->num_nodes; ++i) {
             if (numa_info[i].node_mem) {
                 return MIN(pow2floor(numa_info[i].node_mem),
                            machine->ram_size);
@@ -398,12 +398,12 @@ static int spapr_populate_memory(SpaprMachineState *spapr, void *fdt)
 {
     MachineState *machine = MACHINE(spapr);
     hwaddr mem_start, node_size;
-    int i, nb_nodes = nb_numa_nodes;
+    int i, nb_nodes = machine->numa_state->num_nodes;
     NodeInfo *nodes = numa_info;
     NodeInfo ramnode;
 
     /* No NUMA nodes, assume there is just one node with whole RAM */
-    if (!nb_numa_nodes) {
+    if (!nb_nodes) {
         nb_nodes = 1;
         ramnode.node_mem = machine->ram_size;
         nodes = &ramnode;
@@ -554,7 +554,7 @@ static void spapr_populate_cpu_dt(CPUState *cs, void *fdt, int offset,
     _FDT((fdt_setprop(fdt, offset, "ibm,pft-size",
                       pft_size_prop, sizeof(pft_size_prop))));
 
-    if (nb_numa_nodes > 1) {
+    if (ms->numa_state->num_nodes > 1) {
         _FDT(spapr_fixup_cpu_numa_dt(fdt, offset, cpu));
     }
 
@@ -861,6 +861,7 @@ static int spapr_populate_drmem_v1(SpaprMachineState *spapr, void *fdt,
 static int spapr_populate_drconf_memory(SpaprMachineState *spapr, void *fdt)
 {
     MachineState *machine = MACHINE(spapr);
+    int nb_numa_nodes = machine->numa_state->num_nodes;
     int ret, i, offset;
     uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE;
     uint32_t prop_lmb_size[] = {0, cpu_to_be32(lmb_size)};
@@ -1750,7 +1751,7 @@ static void spapr_machine_reset(MachineState *machine)
      * The final value of spapr->gpu_numa_id is going to be written to
      * max-associativity-domains in spapr_build_fdt().
      */
-    spapr->gpu_numa_id = MAX(1, nb_numa_nodes);
+    spapr->gpu_numa_id = MAX(1, machine->numa_state->num_nodes);
     qemu_devices_reset();
 
     /*
@@ -2537,7 +2538,7 @@ static void spapr_validate_node_memory(MachineState *machine, Error **errp)
         return;
     }
 
-    for (i = 0; i < nb_numa_nodes; i++) {
+    for (i = 0; i < machine->numa_state->num_nodes; i++) {
         if (numa_info[i].node_mem % SPAPR_MEMORY_BLOCK_SIZE) {
             error_setg(errp,
                        "Node %d memory size 0x%" PRIx64
@@ -4139,7 +4140,7 @@ spapr_cpu_index_to_props(MachineState *machine, unsigned cpu_index)
 
 static int64_t spapr_get_default_cpu_node_id(const MachineState *ms, int idx)
 {
-    return idx / ms->smp.cores % nb_numa_nodes;
+    return idx / ms->smp.cores % ms->numa_state->num_nodes;
 }
 
 static const CPUArchIdList *spapr_possible_cpu_arch_ids(MachineState *machine)
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 1a563ad756..991cf05134 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -414,7 +414,7 @@ build_append_gas_from_struct(GArray *table, const struct AcpiGenericAddress *s)
 void build_srat_memory(AcpiSratMemoryAffinity *numamem, uint64_t base,
                        uint64_t len, int node, MemoryAffinityFlags flags);
 
-void build_slit(GArray *table_data, BIOSLinker *linker);
+void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms);
 
 void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f,
                 const char *oem_id, const char *oem_table_id);
diff --git a/include/hw/boards.h b/include/hw/boards.h
index a71d1a53a5..2eb9a0b4e0 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -298,6 +298,7 @@ struct MachineState {
     CPUArchIdList *possible_cpus;
     CpuTopology smp;
     struct NVDIMMState *nvdimms_state;
+    struct NumaState *numa_state;
 };
 
 #define DEFINE_MACHINE(namestr, machine_initfn) \
diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index 01a263eba2..3e8dbf20c1 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -6,7 +6,6 @@
 #include "sysemu/hostmem.h"
 #include "hw/boards.h"
 
-extern int nb_numa_nodes;   /* Number of NUMA nodes */
 extern bool have_numa_distance;
 
 struct NodeInfo {
@@ -23,10 +22,17 @@ struct NumaNodeMem {
 
 extern NodeInfo numa_info[MAX_NODES];
 
+struct NumaState {
+    /* Number of NUMA nodes */
+    int num_nodes;
+
+};
+typedef struct NumaState NumaState;
+
 void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp);
 void parse_numa_opts(MachineState *ms);
 void numa_complete_configuration(MachineState *ms);
-void query_numa_node_mem(NumaNodeMem node_mem[]);
+void query_numa_node_mem(NumaNodeMem node_mem[], MachineState *ms);
 extern QemuOptsList qemu_numa_opts;
 void numa_legacy_auto_assign_ram(MachineClass *mc, NodeInfo *nodes,
                                  int nb_nodes, ram_addr_t size);
-- 
2.20.1




* [Qemu-devel] [PATCH v9 03/11] numa: move numa global variable have_numa_distance into MachineState
  2019-08-09  6:57 [Qemu-devel] [PATCH v9 00/11] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao
  2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 01/11] hw/arm: simplify arm_load_dtb Tao
  2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 02/11] numa: move numa global variable nb_numa_nodes into MachineState Tao
@ 2019-08-09  6:57 ` " Tao
  2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 04/11] numa: move numa global variable numa_info " Tao
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 34+ messages in thread
From: Tao @ 2019-08-09  6:57 UTC (permalink / raw)
  To: imammedo, eblake, ehabkost
  Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel, daniel,
	jonathan.cameron, dan.j.williams

From: Tao Xu <tao3.xu@intel.com>

Move existing numa global have_numa_distance into NumaState.
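A minimal sketch of the resulting pattern follows. It assumes trimmed-down, hypothetical stand-ins for MachineState and NumaState that keep only the fields relevant here, plus hypothetical helper names record_distance() and should_build_slit(). The option parser sets the flag on the machine's NumaState, and table builders such as the SLIT path read it back from that same object instead of from a global.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for the QEMU structures. */
typedef struct NumaState {
    int num_nodes;            /* Number of NUMA nodes */
    bool have_numa_distance;  /* Set once any "-numa dist" option is parsed */
} NumaState;

typedef struct MachineState {
    NumaState *numa_state;    /* Owned by the machine instead of a global */
} MachineState;

/* Parser side: previously wrote the global have_numa_distance. */
static void record_distance(MachineState *ms)
{
    assert(ms != NULL && ms->numa_state != NULL);
    ms->numa_state->have_numa_distance = true;
}

/* Consumer side (e.g. SLIT generation): reads the per-machine flag. */
static bool should_build_slit(const MachineState *ms)
{
    assert(ms != NULL && ms->numa_state != NULL);
    return ms->numa_state->num_nodes > 0 &&
           ms->numa_state->have_numa_distance;
}
```

Because the flag now travels with the machine, two machine instances can no longer see each other's distance configuration.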

Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Liu Jingqi <jingqi.liu@intel.com>
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

No changes in v9
---
 hw/arm/sbsa-ref.c        | 2 +-
 hw/arm/virt-acpi-build.c | 2 +-
 hw/arm/virt.c            | 2 +-
 hw/core/numa.c           | 5 ++---
 hw/i386/acpi-build.c     | 2 +-
 include/sysemu/numa.h    | 4 ++--
 6 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
index 22847909bf..7e4c471717 100644
--- a/hw/arm/sbsa-ref.c
+++ b/hw/arm/sbsa-ref.c
@@ -158,7 +158,7 @@ static void create_fdt(SBSAMachineState *sms)
     qemu_fdt_setprop_cell(fdt, "/", "#address-cells", 0x2);
     qemu_fdt_setprop_cell(fdt, "/", "#size-cells", 0x2);
 
-    if (have_numa_distance) {
+    if (ms->numa_state->have_numa_distance) {
         int size = nb_numa_nodes * nb_numa_nodes * 3 * sizeof(uint32_t);
         uint32_t *matrix = g_malloc0(size);
         int idx, i, j;
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index a2cc4b84fe..461a44b5b0 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -797,7 +797,7 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
     if (ms->numa_state->num_nodes > 0) {
         acpi_add_table(table_offsets, tables_blob);
         build_srat(tables_blob, tables->linker, vms);
-        if (have_numa_distance) {
+        if (ms->numa_state->have_numa_distance) {
             acpi_add_table(table_offsets, tables_blob);
             build_slit(tables_blob, tables->linker, ms);
         }
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index c72b8fd3a7..6f0170cf1d 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -232,7 +232,7 @@ static void create_fdt(VirtMachineState *vms)
                                 "clk24mhz");
     qemu_fdt_setprop_cell(fdt, "/apb-pclk", "phandle", vms->clock_phandle);
 
-    if (have_numa_distance) {
+    if (nb_numa_nodes > 0 && ms->numa_state->have_numa_distance) {
         int size = nb_numa_nodes * nb_numa_nodes * 3 * sizeof(uint32_t);
         uint32_t *matrix = g_malloc0(size);
         int idx, i, j;
diff --git a/hw/core/numa.c b/hw/core/numa.c
index 4d5e308bf1..2142ec29e8 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -50,7 +50,6 @@ static int have_mem;
 static int max_numa_nodeid; /* Highest specified NUMA node ID, plus one.
                              * For all nodes, nodeid < max_numa_nodeid
                              */
-bool have_numa_distance;
 NodeInfo numa_info[MAX_NODES];
 
 
@@ -168,7 +167,7 @@ void parse_numa_distance(MachineState *ms, NumaDistOptions *dist, Error **errp)
     }
 
     numa_info[src].distance[dst] = val;
-    have_numa_distance = true;
+    ms->numa_state->have_numa_distance = true;
 }
 
 void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
@@ -441,7 +440,7 @@ void numa_complete_configuration(MachineState *ms)
          * asymmetric. In this case, the distances for both directions
          * of all node pairs are required.
          */
-        if (have_numa_distance) {
+        if (ms->numa_state->have_numa_distance) {
             /* Validate enough NUMA distance information was provided. */
             validate_numa_distance(ms);
 
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index d4c092358d..081a8fc116 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2692,7 +2692,7 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
     if (pcms->numa_nodes) {
         acpi_add_table(table_offsets, tables_blob);
         build_srat(tables_blob, tables->linker, machine);
-        if (have_numa_distance) {
+        if (machine->numa_state->have_numa_distance) {
             acpi_add_table(table_offsets, tables_blob);
             build_slit(tables_blob, tables->linker, machine);
         }
diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index 3e8dbf20c1..2e5e998adb 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -6,8 +6,6 @@
 #include "sysemu/hostmem.h"
 #include "hw/boards.h"
 
-extern bool have_numa_distance;
-
 struct NodeInfo {
     uint64_t node_mem;
     struct HostMemoryBackend *node_memdev;
@@ -26,6 +24,8 @@ struct NumaState {
     /* Number of NUMA nodes */
     int num_nodes;
 
+    /* Allow setting NUMA distance for different NUMA nodes */
+    bool have_numa_distance;
 };
 typedef struct NumaState NumaState;
 
-- 
2.20.1




* [Qemu-devel] [PATCH v9 04/11] numa: move numa global variable numa_info into MachineState
  2019-08-09  6:57 [Qemu-devel] [PATCH v9 00/11] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao
                   ` (2 preceding siblings ...)
  2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 03/11] numa: move numa global variable have_numa_distance " Tao
@ 2019-08-09  6:57 ` " Tao
  2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 05/11] numa: Extend CLI to provide initiator information for numa nodes Tao
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 34+ messages in thread
From: Tao @ 2019-08-09  6:57 UTC (permalink / raw)
  To: imammedo, eblake, ehabkost
  Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel, daniel,
	jonathan.cameron, dan.j.williams

From: Tao Xu <tao3.xu@intel.com>

Move existing numa global numa_info (renamed as "nodes") into NumaState.
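A minimal sketch of how callers reach the node table after this change. It assumes hypothetical, trimmed-down versions of NodeInfo, NumaState and MachineState (not the full QEMU definitions) and MAX_NODES = 128. The total_node_mem() helper is illustrative only. It uses a local numa_info alias for ms->numa_state->nodes, the same shortcut this patch adds to several functions in hw/core/numa.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_NODES 128  /* assumption: mirrors QEMU's limit at the time */

typedef struct NodeInfo {
    uint64_t node_mem;
    bool present;
    uint8_t distance[MAX_NODES];
} NodeInfo;

typedef struct NumaState {
    int num_nodes;
    bool have_numa_distance;
    NodeInfo nodes[MAX_NODES];   /* formerly the global numa_info[] */
} NumaState;

typedef struct MachineState {
    NumaState *numa_state;
} MachineState;

/* Hypothetical helper: sum the memory assigned to all configured nodes. */
static uint64_t total_node_mem(const MachineState *ms)
{
    /* Local alias, as the patch does in hw/core/numa.c */
    const NodeInfo *numa_info = ms->numa_state->nodes;
    uint64_t total = 0;
    int i;

    assert(ms->numa_state->num_nodes <= MAX_NODES);
    for (i = 0; i < ms->numa_state->num_nodes; i++) {
        total += numa_info[i].node_mem;
    }
    return total;
}
```

Entries beyond num_nodes are ignored, and each machine's table is independent of any other machine's.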

Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

No changes in v9
---
 exec.c                   |  2 +-
 hw/acpi/aml-build.c      |  6 ++++--
 hw/arm/boot.c            |  2 +-
 hw/arm/sbsa-ref.c        |  3 ++-
 hw/arm/virt-acpi-build.c |  7 ++++---
 hw/arm/virt.c            |  3 ++-
 hw/core/numa.c           | 15 +++++++++------
 hw/i386/pc.c             |  4 ++--
 hw/ppc/spapr.c           | 10 +++++-----
 hw/ppc/spapr_pci.c       |  4 +++-
 include/sysemu/numa.h    |  5 +++--
 11 files changed, 36 insertions(+), 25 deletions(-)

diff --git a/exec.c b/exec.c
index 4fd6ec2bd0..de87d3548b 100644
--- a/exec.c
+++ b/exec.c
@@ -1779,7 +1779,7 @@ long qemu_minrampagesize(void)
     if (hpsize > mainrampagesize &&
         (ms->numa_state == NULL ||
          ms->numa_state->num_nodes == 0 ||
-         numa_info[0].node_memdev == NULL)) {
+         ms->numa_state->nodes[0].node_memdev == NULL)) {
         static bool warned;
         if (!warned) {
             error_report("Huge page support disabled (n/a for main memory).");
diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 63c1cae8c9..26ccc1a3e2 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1737,8 +1737,10 @@ void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms)
     build_append_int_noprefix(table_data, nb_numa_nodes, 8);
     for (i = 0; i < nb_numa_nodes; i++) {
         for (j = 0; j < nb_numa_nodes; j++) {
-            assert(numa_info[i].distance[j]);
-            build_append_int_noprefix(table_data, numa_info[i].distance[j], 1);
+            assert(ms->numa_state->nodes[i].distance[j]);
+            build_append_int_noprefix(table_data,
+                                      ms->numa_state->nodes[i].distance[j],
+                                      1);
         }
     }
 
diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index d02d2dae85..6472aa441e 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -601,7 +601,7 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
     if (ms->numa_state != NULL && ms->numa_state->num_nodes > 0) {
         mem_base = binfo->loader_start;
         for (i = 0; i < ms->numa_state->num_nodes; i++) {
-            mem_len = numa_info[i].node_mem;
+            mem_len = ms->numa_state->nodes[i].node_mem;
             rc = fdt_add_memory_node(fdt, acells, mem_base,
                                      scells, mem_len, i);
             if (rc < 0) {
diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
index 7e4c471717..3a243e6a53 100644
--- a/hw/arm/sbsa-ref.c
+++ b/hw/arm/sbsa-ref.c
@@ -168,7 +168,8 @@ static void create_fdt(SBSAMachineState *sms)
                 idx = (i * nb_numa_nodes + j) * 3;
                 matrix[idx + 0] = cpu_to_be32(i);
                 matrix[idx + 1] = cpu_to_be32(j);
-                matrix[idx + 2] = cpu_to_be32(numa_info[i].distance[j]);
+                matrix[idx + 2] =
+                    cpu_to_be32(ms->numa_state->nodes[i].distance[j]);
             }
         }
 
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 461a44b5b0..89899ec4c1 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -534,11 +534,12 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
 
     mem_base = vms->memmap[VIRT_MEM].base;
     for (i = 0; i < ms->numa_state->num_nodes; ++i) {
-        if (numa_info[i].node_mem > 0) {
+        if (ms->numa_state->nodes[i].node_mem > 0) {
             numamem = acpi_data_push(table_data, sizeof(*numamem));
-            build_srat_memory(numamem, mem_base, numa_info[i].node_mem, i,
+            build_srat_memory(numamem, mem_base,
+                              ms->numa_state->nodes[i].node_mem, i,
                               MEM_AFFINITY_ENABLED);
-            mem_base += numa_info[i].node_mem;
+            mem_base += ms->numa_state->nodes[i].node_mem;
         }
     }
 
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 6f0170cf1d..46f39e20bc 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -242,7 +242,8 @@ static void create_fdt(VirtMachineState *vms)
                 idx = (i * nb_numa_nodes + j) * 3;
                 matrix[idx + 0] = cpu_to_be32(i);
                 matrix[idx + 1] = cpu_to_be32(j);
-                matrix[idx + 2] = cpu_to_be32(numa_info[i].distance[j]);
+                matrix[idx + 2] =
+                    cpu_to_be32(ms->numa_state->nodes[i].distance[j]);
             }
         }
 
diff --git a/hw/core/numa.c b/hw/core/numa.c
index 2142ec29e8..8fcbba05d6 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -50,8 +50,6 @@ static int have_mem;
 static int max_numa_nodeid; /* Highest specified NUMA node ID, plus one.
                              * For all nodes, nodeid < max_numa_nodeid
                              */
-NodeInfo numa_info[MAX_NODES];
-
 
 static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
                             Error **errp)
@@ -61,6 +59,7 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
     uint16List *cpus = NULL;
     MachineClass *mc = MACHINE_GET_CLASS(ms);
     unsigned int max_cpus = ms->smp.max_cpus;
+    NodeInfo *numa_info = ms->numa_state->nodes;
 
     if (node->has_nodeid) {
         nodenr = node->nodeid;
@@ -140,6 +139,7 @@ void parse_numa_distance(MachineState *ms, NumaDistOptions *dist, Error **errp)
     uint16_t src = dist->src;
     uint16_t dst = dist->dst;
     uint8_t val = dist->val;
+    NodeInfo *numa_info = ms->numa_state->nodes;
 
     if (src >= MAX_NODES || dst >= MAX_NODES) {
         error_setg(errp, "Parameter '%s' expects an integer between 0 and %d",
@@ -198,7 +198,7 @@ void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
             error_setg(&err, "Missing mandatory node-id property");
             goto end;
         }
-        if (!numa_info[object->u.cpu.node_id].present) {
+        if (!ms->numa_state->nodes[object->u.cpu.node_id].present) {
             error_setg(&err, "Invalid node-id=%" PRId64 ", NUMA node must be "
                 "defined with -numa node,nodeid=ID before it's used with "
                 "-numa cpu,node-id=ID", object->u.cpu.node_id);
@@ -258,6 +258,7 @@ static void validate_numa_distance(MachineState *ms)
     int src, dst;
     bool is_asymmetrical = false;
     int nb_numa_nodes = ms->numa_state->num_nodes;
+    NodeInfo *numa_info = ms->numa_state->nodes;
 
     for (src = 0; src < nb_numa_nodes; src++) {
         for (dst = src; dst < nb_numa_nodes; dst++) {
@@ -298,6 +299,7 @@ static void validate_numa_distance(MachineState *ms)
 static void complete_init_numa_distance(MachineState *ms)
 {
     int src, dst;
+    NodeInfo *numa_info = ms->numa_state->nodes;
 
     /* Fixup NUMA distance by symmetric policy because if it is an
      * asymmetric distance table, it should be a complete table and
@@ -357,6 +359,7 @@ void numa_complete_configuration(MachineState *ms)
 {
     int i;
     MachineClass *mc = MACHINE_GET_CLASS(ms);
+    NodeInfo *numa_info = ms->numa_state->nodes;
 
     /*
      * If memory hotplug is enabled (slots > 0) but without '-numa'
@@ -522,8 +525,8 @@ void memory_region_allocate_system_memory(MemoryRegion *mr, Object *owner,
 
     memory_region_init(mr, owner, name, ram_size);
     for (i = 0; i < ms->numa_state->num_nodes; i++) {
-        uint64_t size = numa_info[i].node_mem;
-        HostMemoryBackend *backend = numa_info[i].node_memdev;
+        uint64_t size = ms->numa_state->nodes[i].node_mem;
+        HostMemoryBackend *backend = ms->numa_state->nodes[i].node_memdev;
         if (!backend) {
             continue;
         }
@@ -589,7 +592,7 @@ void query_numa_node_mem(NumaNodeMem node_mem[], MachineState *ms)
 
     numa_stat_memory_devices(node_mem);
     for (i = 0; i < ms->numa_state->num_nodes; i++) {
-        node_mem[i].node_mem += numa_info[i].node_mem;
+        node_mem[i].node_mem += ms->numa_state->nodes[i].node_mem;
     }
 }
 
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index b2cc618fbf..c3f5a70a56 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1040,7 +1040,7 @@ static FWCfgState *bochs_bios_init(AddressSpace *as, PCMachineState *pcms)
     }
     for (i = 0; i < nb_numa_nodes; i++) {
         numa_fw_cfg[pcms->apic_id_limit + 1 + i] =
-            cpu_to_le64(numa_info[i].node_mem);
+            cpu_to_le64(ms->numa_state->nodes[i].node_mem);
     }
     fw_cfg_add_bytes(fw_cfg, FW_CFG_NUMA, numa_fw_cfg,
                      (1 + pcms->apic_id_limit + nb_numa_nodes) *
@@ -1768,7 +1768,7 @@ void pc_guest_info_init(PCMachineState *pcms)
     pcms->node_mem = g_malloc0(pcms->numa_nodes *
                                     sizeof *pcms->node_mem);
     for (i = 0; i < ms->numa_state->num_nodes; i++) {
-        pcms->node_mem[i] = numa_info[i].node_mem;
+        pcms->node_mem[i] = ms->numa_state->nodes[i].node_mem;
     }
 
     pcms->machine_done.notify = pc_machine_done;
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 358d670485..f607ca567b 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -354,8 +354,8 @@ static hwaddr spapr_node0_size(MachineState *machine)
     if (machine->numa_state->num_nodes) {
         int i;
         for (i = 0; i < machine->numa_state->num_nodes; ++i) {
-            if (numa_info[i].node_mem) {
-                return MIN(pow2floor(numa_info[i].node_mem),
+            if (machine->numa_state->nodes[i].node_mem) {
+                return MIN(pow2floor(machine->numa_state->nodes[i].node_mem),
                            machine->ram_size);
             }
         }
@@ -399,7 +399,7 @@ static int spapr_populate_memory(SpaprMachineState *spapr, void *fdt)
     MachineState *machine = MACHINE(spapr);
     hwaddr mem_start, node_size;
     int i, nb_nodes = machine->numa_state->num_nodes;
-    NodeInfo *nodes = numa_info;
+    NodeInfo *nodes = machine->numa_state->nodes;
     NodeInfo ramnode;
 
     /* No NUMA nodes, assume there is just one node with whole RAM */
@@ -2539,11 +2539,11 @@ static void spapr_validate_node_memory(MachineState *machine, Error **errp)
     }
 
     for (i = 0; i < machine->numa_state->num_nodes; i++) {
-        if (numa_info[i].node_mem % SPAPR_MEMORY_BLOCK_SIZE) {
+        if (machine->numa_state->nodes[i].node_mem % SPAPR_MEMORY_BLOCK_SIZE) {
             error_setg(errp,
                        "Node %d memory size 0x%" PRIx64
                        " is not aligned to %" PRIu64 " MiB",
-                       i, numa_info[i].node_mem,
+                       i, machine->numa_state->nodes[i].node_mem,
                        SPAPR_MEMORY_BLOCK_SIZE / MiB);
             return;
         }
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 9003fe9010..f05d82eee7 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -1818,6 +1818,7 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
     SysBusDevice *s = SYS_BUS_DEVICE(dev);
     SpaprPhbState *sphb = SPAPR_PCI_HOST_BRIDGE(s);
     PCIHostState *phb = PCI_HOST_BRIDGE(s);
+    MachineState *ms = MACHINE(spapr);
     char *namebuf;
     int i;
     PCIBus *bus;
@@ -1870,7 +1871,8 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
     }
 
     if (sphb->numa_node != -1 &&
-        (sphb->numa_node >= MAX_NODES || !numa_info[sphb->numa_node].present)) {
+        (sphb->numa_node >= MAX_NODES ||
+         !ms->numa_state->nodes[sphb->numa_node].present)) {
         error_setg(errp, "Invalid NUMA node ID for PCI host bridge");
         return;
     }
diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index 2e5e998adb..76da3016db 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -18,14 +18,15 @@ struct NumaNodeMem {
     uint64_t node_plugged_mem;
 };
 
-extern NodeInfo numa_info[MAX_NODES];
-
 struct NumaState {
     /* Number of NUMA nodes */
     int num_nodes;
 
     /* Allow setting NUMA distance for different NUMA nodes */
     bool have_numa_distance;
+
+    /* NUMA nodes information */
+    NodeInfo nodes[MAX_NODES];
 };
 typedef struct NumaState NumaState;
 
-- 
2.20.1




* [Qemu-devel] [PATCH v9 05/11] numa: Extend CLI to provide initiator information for numa nodes
  2019-08-09  6:57 [Qemu-devel] [PATCH v9 00/11] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao
                   ` (3 preceding siblings ...)
  2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 04/11] numa: move numa global variable numa_info " Tao
@ 2019-08-09  6:57 ` Tao
  2019-08-13 15:00   ` Igor Mammedov
  2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 06/11] hmat acpi: Build Memory Proximity Domain Attributes Structure(s) Tao
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 34+ messages in thread
From: Tao @ 2019-08-09  6:57 UTC (permalink / raw)
  To: imammedo, eblake, ehabkost
  Cc: Jingqi Liu, tao3.xu, fan.du, qemu-devel, daniel,
	jonathan.cameron, dan.j.williams

From: Tao Xu <tao3.xu@intel.com>

In ACPI 6.3, section 5.2.27 Heterogeneous Memory Attribute Table (HMAT),
an initiator represents a processor that accesses memory. In section
5.2.27.3, Memory Proximity Domain Attributes Structure, the attached
initiator is defined as the initiator to which the memory controller
responsible for a memory proximity domain is directly attached. With
attached initiator information, the topology of heterogeneous memory can
be described.

Extend the CLI of the "-numa node" option to specify the initiator NUMA
node ID. In the Linux kernel, the code in drivers/acpi/hmat/hmat.c parses
and reports the platform's HMAT tables.
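The consistency rules the patch enforces can be sketched as three small checks. A node's initiator may be set only once (or re-set to the same value). A node that receives a CPU becomes its own initiator. After CPU placement, every referenced initiator must be a node that has CPUs. This is a minimal standalone sketch: the trimmed NodeInfo, the NODES limit and the helper names are hypothetical, and the sketch returns false where QEMU reports an error. Unlike the code shown in the patch, it also bounds-checks the initiator ID before using it as an index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NODES 4  /* hypothetical small node limit for the sketch */

typedef struct NodeInfo {
    bool has_cpu;
    bool initiator_valid;
    uint16_t initiator;
} NodeInfo;

/* "-numa node,nodeid=N,initiator=I": reject a conflicting second value. */
static bool set_node_initiator(NodeInfo *nodes, uint16_t n, uint16_t init)
{
    assert(n < NODES);
    if (nodes[n].initiator_valid && nodes[n].initiator != init) {
        return false;
    }
    nodes[n].initiator_valid = true;
    nodes[n].initiator = init;
    return true;
}

/* A CPU placed in node N makes N its own initiator. */
static bool attach_cpu(NodeInfo *nodes, uint16_t n)
{
    assert(n < NODES);
    if (nodes[n].initiator_valid && nodes[n].initiator != n) {
        return false;
    }
    nodes[n].initiator_valid = true;
    nodes[n].has_cpu = true;
    nodes[n].initiator = n;
    return true;
}

/* Final pass: every referenced initiator must be a node with CPUs. */
static bool initiators_consistent(const NodeInfo *nodes, int num_nodes)
{
    int i;

    assert(num_nodes <= NODES);
    for (i = 0; i < num_nodes; i++) {
        if (!nodes[i].initiator_valid) {
            continue;
        }
        if (nodes[i].initiator >= NODES ||
            !nodes[nodes[i].initiator].has_cpu) {
            return false;
        }
    }
    return true;
}
```

The call order mirrors QEMU's: node options are parsed first, then CPUs are assigned to nodes, then the final pass runs.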

Reviewed-by: Jingqi Liu <Jingqi.liu@intel.com>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

No changes in v9
---
 hw/core/machine.c     | 24 ++++++++++++++++++++++++
 hw/core/numa.c        | 13 +++++++++++++
 include/sysemu/numa.h |  3 +++
 qapi/machine.json     |  6 +++++-
 qemu-options.hx       | 27 +++++++++++++++++++++++----
 5 files changed, 68 insertions(+), 5 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 3c55470103..113184a9df 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -640,6 +640,7 @@ void machine_set_cpu_numa_node(MachineState *machine,
                                const CpuInstanceProperties *props, Error **errp)
 {
     MachineClass *mc = MACHINE_GET_CLASS(machine);
+    NodeInfo *numa_info = machine->numa_state->nodes;
     bool match = false;
     int i;
 
@@ -709,6 +710,16 @@ void machine_set_cpu_numa_node(MachineState *machine,
         match = true;
         slot->props.node_id = props->node_id;
         slot->props.has_node_id = props->has_node_id;
+
+        if (numa_info[props->node_id].initiator_valid &&
+            (props->node_id != numa_info[props->node_id].initiator)) {
+            error_setg(errp, "The initiator of CPU NUMA node %" PRId64
+                       " should be itself.", props->node_id);
+            return;
+        }
+        numa_info[props->node_id].initiator_valid = true;
+        numa_info[props->node_id].has_cpu = true;
+        numa_info[props->node_id].initiator = props->node_id;
     }
 
     if (!match) {
@@ -1050,6 +1061,7 @@ static void machine_numa_finish_cpu_init(MachineState *machine)
     GString *s = g_string_new(NULL);
     MachineClass *mc = MACHINE_GET_CLASS(machine);
     const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(machine);
+    NodeInfo *numa_info = machine->numa_state->nodes;
 
     assert(machine->numa_state->num_nodes);
     for (i = 0; i < possible_cpus->len; i++) {
@@ -1083,6 +1095,18 @@ static void machine_numa_finish_cpu_init(MachineState *machine)
             machine_set_cpu_numa_node(machine, &props, &error_fatal);
         }
     }
+
+    for (i = 0; i < machine->numa_state->num_nodes; i++) {
+        if (numa_info[i].initiator_valid &&
+            !numa_info[numa_info[i].initiator].has_cpu) {
+            error_report("The initiator-id %"PRIu16 " of NUMA node %d"
+                         " does not exist.", numa_info[i].initiator, i);
+            error_printf("\n");
+
+            exit(1);
+        }
+    }
+
     if (s->len && !qtest_enabled()) {
         warn_report("CPU(s) not present in any NUMA nodes: %s",
                     s->str);
diff --git a/hw/core/numa.c b/hw/core/numa.c
index 8fcbba05d6..cfb6339810 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -128,6 +128,19 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
         numa_info[nodenr].node_mem = object_property_get_uint(o, "size", NULL);
         numa_info[nodenr].node_memdev = MEMORY_BACKEND(o);
     }
+
+    if (node->has_initiator) {
+        if (numa_info[nodenr].initiator_valid &&
+            (node->initiator != numa_info[nodenr].initiator)) {
+            error_setg(errp, "The initiator of NUMA node %" PRIu16 " has been "
+                       "set to node %" PRIu16, nodenr,
+                       numa_info[nodenr].initiator);
+            return;
+        }
+
+        numa_info[nodenr].initiator_valid = true;
+        numa_info[nodenr].initiator = node->initiator;
+    }
     numa_info[nodenr].present = true;
     max_numa_nodeid = MAX(max_numa_nodeid, nodenr + 1);
     ms->numa_state->num_nodes++;
diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index 76da3016db..46ad06e000 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -10,6 +10,9 @@ struct NodeInfo {
     uint64_t node_mem;
     struct HostMemoryBackend *node_memdev;
     bool present;
+    bool has_cpu;
+    bool initiator_valid;
+    uint16_t initiator;
     uint8_t distance[MAX_NODES];
 };
 
diff --git a/qapi/machine.json b/qapi/machine.json
index 6db8a7e2ec..05e367d26a 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -414,6 +414,9 @@
 # @memdev: memory backend object.  If specified for one node,
 #          it must be specified for all nodes.
 #
+# @initiator: the initiator NUMA node ID that is closest (as in directly
+#             attached) to this NUMA node (since 4.2)
+#
 # Since: 2.1
 ##
 { 'struct': 'NumaNodeOptions',
@@ -421,7 +424,8 @@
    '*nodeid': 'uint16',
    '*cpus':   ['uint16'],
    '*mem':    'size',
-   '*memdev': 'str' }}
+   '*memdev': 'str',
+   '*initiator': 'uint16' }}
 
 ##
 # @NumaDistOptions:
diff --git a/qemu-options.hx b/qemu-options.hx
index 9621e934c0..c480781992 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -161,14 +161,14 @@ If any on the three values is given, the total number of CPUs @var{n} can be omi
 ETEXI
 
 DEF("numa", HAS_ARG, QEMU_OPTION_numa,
-    "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node]\n"
-    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node]\n"
+    "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
+    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
     "-numa dist,src=source,dst=destination,val=distance\n"
     "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n",
     QEMU_ARCH_ALL)
 STEXI
-@item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}]
-@itemx -numa node[,memdev=@var{id}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}]
+@item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
+@itemx -numa node[,memdev=@var{id}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
 @itemx -numa dist,src=@var{source},dst=@var{destination},val=@var{distance}
 @itemx -numa cpu,node-id=@var{node}[,socket-id=@var{x}][,core-id=@var{y}][,thread-id=@var{z}]
 @findex -numa
@@ -215,6 +215,25 @@ split equally between them.
 @samp{mem} and @samp{memdev} are mutually exclusive. Furthermore,
 if one node uses @samp{memdev}, all of them have to use it.
 
+@samp{initiator} indicates the initiator NUMA @var{initiator} that is
+closest (as in directly attached) to this NUMA @var{node}.
+
+For example, the following options define 2 NUMA nodes: node 0 has CPUs,
+while node 1 has only memory and its initiator is node 0. Note that
+because node 0 has CPUs, its initiator defaults to itself and must be
+itself.
+@example
+-M pc \
+-m 2G,slots=2,maxmem=4G \
+-object memory-backend-ram,size=1G,id=m0 \
+-object memory-backend-ram,size=1G,id=m1 \
+-numa node,nodeid=0,memdev=m0 \
+-numa node,nodeid=1,memdev=m1,initiator=0 \
+-smp 2,sockets=2,maxcpus=2  \
+-numa cpu,node-id=0,socket-id=0 \
+-numa cpu,node-id=0,socket-id=1
+@end example
+
 @var{source} and @var{destination} are NUMA node IDs.
 @var{distance} is the NUMA distance from @var{source} to @var{destination}.
 The distance from a node to itself is always 10. If any pair of nodes is
-- 
2.20.1




* [Qemu-devel] [PATCH v9 06/11] hmat acpi: Build Memory Proximity Domain Attributes Structure(s)
  2019-08-09  6:57 [Qemu-devel] [PATCH v9 00/11] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao
                   ` (4 preceding siblings ...)
  2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 05/11] numa: Extend CLI to provide initiator information for numa nodes Tao
@ 2019-08-09  6:57 ` Tao
  2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 07/11] hmat acpi: Build System Locality Latency and Bandwidth Information Structure(s) Tao
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 34+ messages in thread
From: Tao @ 2019-08-09  6:57 UTC (permalink / raw)
  To: imammedo, eblake, ehabkost
  Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel, daniel,
	Jonathan Cameron, dan.j.williams

From: Liu Jingqi <jingqi.liu@intel.com>

HMAT is defined in ACPI 6.3: 5.2.27 Heterogeneous Memory Attribute Table
(HMAT). The specification is available at the link below:
http://www.uefi.org/sites/default/files/resources/ACPI_6_3_final_Jan30.pdf

It describes the memory attributes, such as memory side cache
attributes and bandwidth and latency details, related to the
Memory Proximity Domain. The software is expected to use this
information as a hint for optimization.

The Memory Proximity Domain Attributes Structure describes the attributes
of a memory subsystem and its associativity with a processor proximity
domain, as well as hints for memory usage.
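To make the layout concrete, here is a standalone sketch that serializes one Memory Proximity Domain Attributes Structure into a byte buffer. It uses the same field order and sizes as build_hmat_mpda() in this patch, in ACPI's little-endian byte order. The put_le() and build_mpda() helpers are hypothetical; QEMU itself appends fields with build_append_int_noprefix() on a GArray. The HMAT_PROX_INIT_VALID value of 0x1 is an assumption based on bit 0 of the Flags field in ACPI 6.3 Table 5-141, since hw/acpi/hmat.h is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MPDA_LEN 40               /* Length field value, Table 5-141 */
#define HMAT_PROX_INIT_VALID 0x1  /* assumption: Flags bit 0 */

/* Write 'size' bytes of 'value' at buf[off] in little-endian order. */
static size_t put_le(uint8_t *buf, size_t off, uint64_t value, size_t size)
{
    size_t i;

    for (i = 0; i < size; i++) {
        buf[off + i] = (uint8_t)(value >> (8 * i));
    }
    return off + size;
}

/* Serialize one Memory Proximity Domain Attributes Structure. */
static size_t build_mpda(uint8_t buf[MPDA_LEN], uint16_t flags,
                         uint32_t initiator, uint32_t mem_node)
{
    size_t off = 0;

    off = put_le(buf, off, 0, 2);          /* Type 0: MPDA */
    off = put_le(buf, off, 0, 2);          /* Reserved */
    off = put_le(buf, off, MPDA_LEN, 4);   /* Length */
    off = put_le(buf, off, flags, 2);      /* Flags */
    off = put_le(buf, off, 0, 2);          /* Reserved */
    off = put_le(buf, off, initiator, 4);  /* Attached initiator PD */
    off = put_le(buf, off, mem_node, 4);   /* Memory PD */
    off = put_le(buf, off, 0, 4);          /* Reserved */
    off = put_le(buf, off, 0, 8);          /* Reserved (formerly base address) */
    off = put_le(buf, off, 0, 8);          /* Reserved (formerly range length) */
    assert(off == MPDA_LEN);
    return off;
}
```

The field sizes add up to 2+2+4+2+2+4+4+4+8+8 = 40 bytes, which matches the Length value written into the structure.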

In the Linux kernel, the code in drivers/acpi/hmat/hmat.c parses and
reports the platform's HMAT tables.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

No changes in v9
---
 hw/acpi/Kconfig       |   5 +++
 hw/acpi/Makefile.objs |   1 +
 hw/acpi/hmat.c        | 101 ++++++++++++++++++++++++++++++++++++++++++
 hw/acpi/hmat.h        |  45 +++++++++++++++++++
 hw/i386/acpi-build.c  |   3 ++
 5 files changed, 155 insertions(+)
 create mode 100644 hw/acpi/hmat.c
 create mode 100644 hw/acpi/hmat.h

diff --git a/hw/acpi/Kconfig b/hw/acpi/Kconfig
index 7c59cf900b..039bb99efa 100644
--- a/hw/acpi/Kconfig
+++ b/hw/acpi/Kconfig
@@ -7,6 +7,7 @@ config ACPI_X86
     select ACPI_NVDIMM
     select ACPI_CPU_HOTPLUG
     select ACPI_MEMORY_HOTPLUG
+    select ACPI_HMAT
 
 config ACPI_X86_ICH
     bool
@@ -31,3 +32,7 @@ config ACPI_VMGENID
     bool
     default y
     depends on PC
+
+config ACPI_HMAT
+    bool
+    depends on ACPI
diff --git a/hw/acpi/Makefile.objs b/hw/acpi/Makefile.objs
index 9bb2101e3b..c05019b059 100644
--- a/hw/acpi/Makefile.objs
+++ b/hw/acpi/Makefile.objs
@@ -6,6 +6,7 @@ common-obj-$(CONFIG_ACPI_MEMORY_HOTPLUG) += memory_hotplug.o
 common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu.o
 common-obj-$(CONFIG_ACPI_NVDIMM) += nvdimm.o
 common-obj-$(CONFIG_ACPI_VMGENID) += vmgenid.o
+common-obj-$(CONFIG_ACPI_HMAT) += hmat.o
 common-obj-$(call lnot,$(CONFIG_ACPI_X86)) += acpi-stub.o
 
 common-obj-y += acpi_interface.o
diff --git a/hw/acpi/hmat.c b/hw/acpi/hmat.c
new file mode 100644
index 0000000000..abf99b1adc
--- /dev/null
+++ b/hw/acpi/hmat.c
@@ -0,0 +1,101 @@
+/*
+ * HMAT ACPI Implementation
+ *
+ * Copyright(C) 2019 Intel Corporation.
+ *
+ * Author:
+ *  Liu Jingqi <jingqi.liu@linux.intel.com>
+ *  Tao Xu <tao3.xu@intel.com>
+ *
+ * HMAT is defined in ACPI 6.3: 5.2.27 Heterogeneous Memory Attribute Table
+ * (HMAT)
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>
+ */
+
+#include "qemu/osdep.h"
+#include "sysemu/numa.h"
+#include "hw/acpi/hmat.h"
+
+/*
+ * ACPI 6.3:
+ * 5.2.27.3 Memory Proximity Domain Attributes Structure: Table 5-141
+ */
+static void build_hmat_mpda(GArray *table_data, uint16_t flags, int initiator,
+                           int mem_node)
+{
+
+    /* Memory Proximity Domain Attributes Structure */
+    /* Type */
+    build_append_int_noprefix(table_data, 0, 2);
+    /* Reserved */
+    build_append_int_noprefix(table_data, 0, 2);
+    /* Length */
+    build_append_int_noprefix(table_data, 40, 4);
+    /* Flags */
+    build_append_int_noprefix(table_data, flags, 2);
+    /* Reserved */
+    build_append_int_noprefix(table_data, 0, 2);
+    /* Proximity Domain for the Attached Initiator */
+    build_append_int_noprefix(table_data, initiator, 4);
+    /* Proximity Domain for the Memory */
+    build_append_int_noprefix(table_data, mem_node, 4);
+    /* Reserved */
+    build_append_int_noprefix(table_data, 0, 4);
+    /*
+     * Reserved:
+     * Previously defined as the Start Address of the System Physical
+     * Address Range. Deprecated since ACPI Spec 6.3.
+     */
+    build_append_int_noprefix(table_data, 0, 8);
+    /*
+     * Reserved:
+     * Previously defined as the Range Length of the region in bytes.
+     * Deprecated since ACPI Spec 6.3.
+     */
+    build_append_int_noprefix(table_data, 0, 8);
+}
+
+/* Build HMAT sub table structures */
+static void hmat_build_table_structs(GArray *table_data, NumaState *nstat)
+{
+    uint16_t flags;
+    int i;
+
+    for (i = 0; i < nstat->num_nodes; i++) {
+        flags = 0;
+
+        if (nstat->nodes[i].initiator_valid) {
+            flags |= HMAT_PROX_INIT_VALID;
+        }
+
+        build_hmat_mpda(table_data, flags, nstat->nodes[i].initiator, i);
+    }
+}
+
+void build_hmat(GArray *table_data, BIOSLinker *linker, NumaState *nstat)
+{
+    uint64_t hmat_start;
+
+    hmat_start = table_data->len;
+
+    /* Reserve space for the HMAT header (36-byte ACPI header + 4 reserved) */
+    acpi_data_push(table_data, 40);
+
+    hmat_build_table_structs(table_data, nstat);
+
+    build_header(linker, table_data,
+                 (void *)(table_data->data + hmat_start),
+                 "HMAT", table_data->len - hmat_start, 2, NULL, NULL);
+}
diff --git a/hw/acpi/hmat.h b/hw/acpi/hmat.h
new file mode 100644
index 0000000000..574cfba60a
--- /dev/null
+++ b/hw/acpi/hmat.h
@@ -0,0 +1,45 @@
+/*
+ * HMAT ACPI Implementation Header
+ *
+ * Copyright(C) 2019 Intel Corporation.
+ *
+ * Author:
+ *  Liu jingqi <jingqi.liu@linux.intel.com>
+ *  Tao Xu <tao3.xu@intel.com>
+ *
+ * HMAT is defined in ACPI 6.3: 5.2.27 Heterogeneous Memory Attribute Table
+ * (HMAT)
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>
+ */
+
+#ifndef HMAT_H
+#define HMAT_H
+
+#include "hw/acpi/acpi-defs.h"
+#include "hw/acpi/acpi.h"
+#include "hw/acpi/bios-linker-loader.h"
+#include "hw/acpi/aml-build.h"
+
+/*
+ * ACPI 6.3: 5.2.27.3 Memory Proximity Domain Attributes Structure,
+ * Table 5-141, Field "flag", Bit [0]: set to 1 to indicate that data in
+ * the Proximity Domain for the Attached Initiator field is valid.
+ * Other bits reserved.
+ */
+#define HMAT_PROX_INIT_VALID 0x1
+
+void build_hmat(GArray *table_data, BIOSLinker *linker, NumaState *nstat);
+
+#endif
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 081a8fc116..90ad0dff99 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -66,6 +66,7 @@
 #include "hw/i386/intel_iommu.h"
 
 #include "hw/acpi/ipmi.h"
+#include "hw/acpi/hmat.h"
 
 /* These are used to size the ACPI tables for -M pc-i440fx-1.7 and
  * -M pc-i440fx-2.0.  Even if the actual amount of AML generated grows
@@ -2696,6 +2697,8 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
             acpi_add_table(table_offsets, tables_blob);
             build_slit(tables_blob, tables->linker, machine);
         }
+        acpi_add_table(table_offsets, tables_blob);
+        build_hmat(tables_blob, tables->linker, machine->numa_state);
     }
     if (acpi_get_mcfg(&mcfg)) {
         acpi_add_table(table_offsets, tables_blob);
-- 
2.20.1
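
The Memory Proximity Domain Attributes Structure emitted by build_hmat_mpda() above is a fixed 40-byte record. The following is a minimal standalone sketch of that byte layout, not QEMU code: append_le() is a hypothetical stand-in for build_append_int_noprefix(), assumed to append the low bytes of a value in little-endian order, and sketch_hmat_mpda() is likewise a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HMAT_PROX_INIT_VALID 0x1

/* Hypothetical stand-in for build_append_int_noprefix(): write the low
 * `size` bytes of `value` at buf[off] in little-endian order. */
static size_t append_le(uint8_t *buf, size_t off, uint64_t value, int size)
{
    for (int i = 0; i < size; i++) {
        buf[off + i] = (uint8_t)(value >> (8 * i));
    }
    return off + size;
}

/* Serialize one Memory Proximity Domain Attributes Structure. */
static size_t sketch_hmat_mpda(uint8_t *buf, uint16_t flags,
                               uint32_t initiator, uint32_t mem_node)
{
    size_t off = 0;

    off = append_le(buf, off, 0, 2);          /* Type 0 */
    off = append_le(buf, off, 0, 2);          /* Reserved */
    off = append_le(buf, off, 40, 4);         /* Length */
    off = append_le(buf, off, flags, 2);      /* Flags */
    off = append_le(buf, off, 0, 2);          /* Reserved */
    off = append_le(buf, off, initiator, 4);  /* Attached initiator domain */
    off = append_le(buf, off, mem_node, 4);   /* Memory proximity domain */
    off = append_le(buf, off, 0, 4);          /* Reserved */
    off = append_le(buf, off, 0, 8);          /* Reserved (was start address) */
    off = append_le(buf, off, 0, 8);          /* Reserved (was range length) */
    assert(off == 40);                        /* fixed size per Table 5-141 */
    return off;
}
```

With the initiator flag set for node 1 attached to initiator 0, the Length field sits at offset 4, Flags at offset 8 and the memory proximity domain at offset 16.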




* [Qemu-devel] [PATCH v9 07/11] hmat acpi: Build System Locality Latency and Bandwidth Information Structure(s)
  2019-08-09  6:57 [Qemu-devel] [PATCH v9 00/11] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao
                   ` (5 preceding siblings ...)
  2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 06/11] hmat acpi: Build Memory Proximity Domain Attributes Structure(s) Tao
@ 2019-08-09  6:57 ` Tao
  2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 08/11] hmat acpi: Build Memory Side Cache " Tao
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 34+ messages in thread
From: Tao @ 2019-08-09  6:57 UTC (permalink / raw)
  To: imammedo, eblake, ehabkost
  Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel, daniel,
	Jonathan Cameron, dan.j.williams

From: Liu Jingqi <jingqi.liu@intel.com>

This structure describes the memory access latency and bandwidth
information from various memory access initiator proximity domains.
The latency and bandwidth numbers represented in this structure
correspond to rated latency and bandwidth for the platform.
Software can use this information as a hint for optimization.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

No changes in v9.
---
 hw/acpi/hmat.c          | 95 ++++++++++++++++++++++++++++++++++++++++-
 hw/acpi/hmat.h          | 41 ++++++++++++++++++
 include/qemu/typedefs.h |  1 +
 include/sysemu/numa.h   |  3 ++
 include/sysemu/sysemu.h | 21 +++++++++
 5 files changed, 160 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/hmat.c b/hw/acpi/hmat.c
index abf99b1adc..431818dc82 100644
--- a/hw/acpi/hmat.c
+++ b/hw/acpi/hmat.c
@@ -67,11 +67,81 @@ static void build_hmat_mpda(GArray *table_data, uint16_t flags, int initiator,
     build_append_int_noprefix(table_data, 0, 8);
 }
 
+/*
+ * ACPI 6.3: 5.2.27.4 System Locality Latency and Bandwidth Information
+ * Structure: Table 5-142
+ */
+static void build_hmat_lb(GArray *table_data, HMAT_LB_Info *hmat_lb,
+                          uint32_t num_initiator, uint32_t num_target,
+                          uint32_t *initiator_pxm, int type)
+{
+    uint32_t s = num_initiator;
+    uint32_t t = num_target;
+    uint8_t m, n;
+    uint8_t mask = 0x0f;
+    int i;
+
+    /* Type */
+    build_append_int_noprefix(table_data, 1, 2);
+    /* Reserved */
+    build_append_int_noprefix(table_data, 0, 2);
+    /* Length */
+    build_append_int_noprefix(table_data, 32 + 4 * s + 4 * t + 2 * s * t, 4);
+    /* Flags: Bits [3:0] Memory Hierarchy, Bits[7:4] Reserved */
+    build_append_int_noprefix(table_data, hmat_lb->hierarchy & mask, 1);
+    /* Data Type */
+    build_append_int_noprefix(table_data, hmat_lb->data_type, 1);
+    /* Reserved */
+    build_append_int_noprefix(table_data, 0, 2);
+    /* Number of Initiator Proximity Domains (s) */
+    build_append_int_noprefix(table_data, s, 4);
+    /* Number of Target Proximity Domains (t) */
+    build_append_int_noprefix(table_data, t, 4);
+    /* Reserved */
+    build_append_int_noprefix(table_data, 0, 4);
+
+    /* Entry Base Unit */
+    if (HMAT_IS_LATENCY(type)) {
+        build_append_int_noprefix(table_data, hmat_lb->base_lat, 8);
+    } else {
+        build_append_int_noprefix(table_data, hmat_lb->base_bw, 8);
+    }
+
+    /* Initiator Proximity Domain List */
+    for (i = 0; i < s; i++) {
+        build_append_int_noprefix(table_data, initiator_pxm[i], 4);
+    }
+
+    /* Target Proximity Domain List */
+    for (i = 0; i < t; i++) {
+        build_append_int_noprefix(table_data, i, 4);
+    }
+
+    /* Latency or Bandwidth Entries */
+    for (i = 0; i < s; i++) {
+        m = initiator_pxm[i];
+        for (n = 0; n < t; n++) {
+            uint16_t entry;
+
+            if (HMAT_IS_LATENCY(type)) {
+                entry = hmat_lb->latency[m][n];
+            } else {
+                entry = hmat_lb->bandwidth[m][n];
+            }
+
+            build_append_int_noprefix(table_data, entry, 2);
+        }
+    }
+}
+
 /* Build HMAT sub table structures */
 static void hmat_build_table_structs(GArray *table_data, NumaState *nstat)
 {
     uint16_t flags;
-    int i;
+    uint32_t num_initiator = 0;
+    uint32_t initiator_pxm[MAX_NODES];
+    int i, hrchy, type;
+    HMAT_LB_Info *numa_hmat_lb;
 
     for (i = 0; i < nstat->num_nodes; i++) {
         flags = 0;
@@ -82,6 +152,29 @@ static void hmat_build_table_structs(GArray *table_data, NumaState *nstat)
 
         build_hmat_mpda(table_data, flags, nstat->nodes[i].initiator, i);
     }
+
+    for (i = 0; i < nstat->num_nodes; i++) {
+        if (nstat->nodes[i].has_cpu) {
+            initiator_pxm[num_initiator++] = i;
+        }
+    }
+
+    /*
+     * ACPI 6.3: 5.2.27.4 System Locality Latency and Bandwidth Information
+     * Structure: Table 5-142
+     */
+    for (hrchy = HMAT_LB_MEM_MEMORY;
+         hrchy <= HMAT_LB_MEM_CACHE_3RD_LEVEL; hrchy++) {
+        for (type = HMAT_LB_DATA_ACCESS_LATENCY;
+             type <= HMAT_LB_DATA_WRITE_BANDWIDTH; type++) {
+            numa_hmat_lb = nstat->hmat_lb[hrchy][type];
+
+            if (numa_hmat_lb) {
+                build_hmat_lb(table_data, numa_hmat_lb, num_initiator,
+                              nstat->num_nodes, initiator_pxm, type);
+            }
+        }
+    }
 }
 
 void build_hmat(GArray *table_data, BIOSLinker *linker, NumaState *nstat)
diff --git a/hw/acpi/hmat.h b/hw/acpi/hmat.h
index 574cfba60a..5f050781e6 100644
--- a/hw/acpi/hmat.h
+++ b/hw/acpi/hmat.h
@@ -40,6 +40,47 @@
  */
 #define HMAT_PROX_INIT_VALID 0x1
 
+#define HMAT_IS_LATENCY(type) (type <= HMAT_LB_DATA_WRITE_LATENCY)
+
+struct HMAT_LB_Info {
+    /*
+     * Indicates total number of Proximity Domains
+     * that can initiate memory access requests.
+     */
+    uint32_t    num_initiator;
+    /*
+     * Indicates total number of Proximity Domains
+     * that can act as target.
+     */
+    uint32_t    num_target;
+    /*
+     * Indicates whether this describes memory or the
+     * specified level of memory side cache.
+     */
+    uint8_t     hierarchy;
+    /*
+     * The type of data: access/read/write
+     * latency or bandwidth.
+     */
+    uint8_t     data_type;
+    /* The base unit for latency in picoseconds. */
+    uint64_t    base_lat;
+    /* The base unit for bandwidth in megabytes per second(MB/s). */
+    uint64_t    base_bw;
+    /*
+     * latency[i][j]:
+     * Indicates the latency based on base_lat
+     * from Initiator Proximity Domain i to Target Proximity Domain j.
+     */
+    uint16_t    latency[MAX_NODES][MAX_NODES];
+    /*
+     * bandwidth[i][j]:
+     * Indicates the bandwidth based on base_bw
+     * from Initiator Proximity Domain i to Target Proximity Domain j.
+     */
+    uint16_t    bandwidth[MAX_NODES][MAX_NODES];
+};
+
 void build_hmat(GArray *table_data, BIOSLinker *linker, NumaState *nstat);
 
 #endif
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index fcdaae58c4..c0257e936b 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -33,6 +33,7 @@ typedef struct FWCfgEntry FWCfgEntry;
 typedef struct FWCfgIoState FWCfgIoState;
 typedef struct FWCfgMemState FWCfgMemState;
 typedef struct FWCfgState FWCfgState;
+typedef struct HMAT_LB_Info HMAT_LB_Info;
 typedef struct HVFX86EmulatorState HVFX86EmulatorState;
 typedef struct I2CBus I2CBus;
 typedef struct I2SCodec I2SCodec;
diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index 46ad06e000..85ddad99b4 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -30,6 +30,9 @@ struct NumaState {
 
     /* NUMA nodes information */
     NodeInfo nodes[MAX_NODES];
+
+    /* NUMA nodes HMAT Locality Latency and Bandwidth Information */
+    HMAT_LB_Info *hmat_lb[HMAT_LB_LEVELS][HMAT_LB_TYPES];
 };
 typedef struct NumaState NumaState;
 
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 984c439ac9..fc638f06cd 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -126,6 +126,27 @@ extern int mem_prealloc;
 #define NUMA_DISTANCE_MAX         254
 #define NUMA_DISTANCE_UNREACHABLE 255
 
+/* the value of AcpiHmatLBInfo flags */
+enum {
+    HMAT_LB_MEM_MEMORY           = 0,
+    HMAT_LB_MEM_CACHE_1ST_LEVEL  = 1,
+    HMAT_LB_MEM_CACHE_2ND_LEVEL  = 2,
+    HMAT_LB_MEM_CACHE_3RD_LEVEL  = 3,
+};
+
+/* the value of AcpiHmatLBInfo data type */
+enum {
+    HMAT_LB_DATA_ACCESS_LATENCY   = 0,
+    HMAT_LB_DATA_READ_LATENCY     = 1,
+    HMAT_LB_DATA_WRITE_LATENCY    = 2,
+    HMAT_LB_DATA_ACCESS_BANDWIDTH = 3,
+    HMAT_LB_DATA_READ_BANDWIDTH   = 4,
+    HMAT_LB_DATA_WRITE_BANDWIDTH  = 5,
+};
+
+#define HMAT_LB_LEVELS    (HMAT_LB_MEM_CACHE_3RD_LEVEL + 1)
+#define HMAT_LB_TYPES     (HMAT_LB_DATA_WRITE_BANDWIDTH + 1)
+
 #define MAX_OPTION_ROMS 16
 typedef struct QEMUOptionRom {
     const char *name;
-- 
2.20.1
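
build_hmat_lb() above sizes each System Locality Latency and Bandwidth Information Structure as 32 + 4s + 4t + 2st bytes: a 32-byte fixed part, the s initiator and t target proximity-domain lists (4 bytes per entry) and an s x t matrix of 2-byte entries. A minimal standalone sketch under stated assumptions (a hypothetical append_le() in place of build_append_int_noprefix(), and a hypothetical SKETCH_MAX_NODES limit in place of MAX_NODES) shows how the length and the entry offsets line up. As in build_hmat_lb(), row i of the matrix is looked up by the initiator's node id, and every node is listed as a target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_NODES 4   /* hypothetical small limit for the sketch */

/* Hypothetical stand-in for build_append_int_noprefix(). */
static size_t append_le(uint8_t *buf, size_t off, uint64_t value, int size)
{
    for (int i = 0; i < size; i++) {
        buf[off + i] = (uint8_t)(value >> (8 * i));
    }
    return off + size;
}

/* Fixed part + initiator list + target list + s x t matrix of uint16. */
static uint32_t hmat_lb_length(uint32_t s, uint32_t t)
{
    return 32 + 4 * s + 4 * t + 2 * s * t;
}

static size_t sketch_hmat_lb(uint8_t *buf, uint8_t hierarchy,
                             uint8_t data_type, uint64_t base_unit,
                             const uint32_t *initiator_pxm,
                             uint32_t s, uint32_t t,
                             uint16_t entries[SKETCH_MAX_NODES][SKETCH_MAX_NODES])
{
    size_t off = 0;

    assert(s <= SKETCH_MAX_NODES && t <= SKETCH_MAX_NODES);
    off = append_le(buf, off, 1, 2);                    /* Type 1 */
    off = append_le(buf, off, 0, 2);                    /* Reserved */
    off = append_le(buf, off, hmat_lb_length(s, t), 4); /* Length */
    off = append_le(buf, off, hierarchy & 0x0f, 1);     /* Flags [3:0] */
    off = append_le(buf, off, data_type, 1);            /* Data Type */
    off = append_le(buf, off, 0, 2);                    /* Reserved */
    off = append_le(buf, off, s, 4);                    /* Initiators (s) */
    off = append_le(buf, off, t, 4);                    /* Targets (t) */
    off = append_le(buf, off, 0, 4);                    /* Reserved */
    off = append_le(buf, off, base_unit, 8);            /* Entry Base Unit */
    for (uint32_t i = 0; i < s; i++) {
        off = append_le(buf, off, initiator_pxm[i], 4);
    }
    for (uint32_t j = 0; j < t; j++) {
        off = append_le(buf, off, j, 4);                /* every node is a target */
    }
    for (uint32_t i = 0; i < s; i++) {
        for (uint32_t j = 0; j < t; j++) {
            off = append_le(buf, off, entries[initiator_pxm[i]][j], 2);
        }
    }
    return off;
}
```

For one initiator and two targets the structure is 48 bytes, and the two 16-bit entries start at offset 44.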




* [Qemu-devel] [PATCH v9 08/11] hmat acpi: Build Memory Side Cache Information Structure(s)
  2019-08-09  6:57 [Qemu-devel] [PATCH v9 00/11] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao
                   ` (6 preceding siblings ...)
  2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 07/11] hmat acpi: Build System Locality Latency and Bandwidth Information Structure(s) Tao
@ 2019-08-09  6:57 ` " Tao
  2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 09/11] numa: Extend the CLI to provide memory latency and bandwidth information Tao
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 34+ messages in thread
From: Tao @ 2019-08-09  6:57 UTC (permalink / raw)
  To: imammedo, eblake, ehabkost
  Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel, daniel,
	Jonathan Cameron, dan.j.williams

From: Liu Jingqi <jingqi.liu@intel.com>

This structure describes memory side cache information for memory
proximity domains if the memory side cache is present and the
physical device forms the memory side cache.
Software can use this information to place data in memory
effectively and maximize the performance of system memory
that uses the memory side cache.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

No changes in v9
---
 hw/acpi/hmat.c          | 64 ++++++++++++++++++++++++++++++++++++++++-
 hw/acpi/hmat.h          | 17 +++++++++++
 include/qemu/typedefs.h |  1 +
 include/sysemu/numa.h   |  3 ++
 include/sysemu/sysemu.h |  2 ++
 5 files changed, 86 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/hmat.c b/hw/acpi/hmat.c
index 431818dc82..01a6552d51 100644
--- a/hw/acpi/hmat.c
+++ b/hw/acpi/hmat.c
@@ -134,14 +134,63 @@ static void build_hmat_lb(GArray *table_data, HMAT_LB_Info *hmat_lb,
     }
 }
 
+/* ACPI 6.3: 5.2.27.5 Memory Side Cache Information Structure: Table 5-143 */
+static void build_hmat_cache(GArray *table_data, HMAT_Cache_Info *hmat_cache)
+{
+    /*
+     * Cache Attributes: Bits [3:0] – Total Cache Levels
+     * for this Memory Proximity Domain
+     */
+    uint32_t cache_attr = hmat_cache->total_levels & 0xF;
+
+    /* Bits [7:4] : Cache Level described in this structure */
+    cache_attr |= (hmat_cache->level & 0xF) << 4;
+
+    /* Bits [11:8] - Cache Associativity */
+    cache_attr |= (hmat_cache->associativity & 0xF) << 8;
+
+    /* Bits [15:12] - Write Policy */
+    cache_attr |= (hmat_cache->write_policy & 0xF) << 12;
+
+    /* Bits [31:16] - Cache Line size in bytes */
+    cache_attr |= (uint32_t)(hmat_cache->line_size & 0xFFFF) << 16;
+
+    /* Type */
+    build_append_int_noprefix(table_data, 2, 2);
+    /* Reserved */
+    build_append_int_noprefix(table_data, 0, 2);
+    /* Length */
+    build_append_int_noprefix(table_data, 32, 4);
+    /* Proximity Domain for the Memory */
+    build_append_int_noprefix(table_data, hmat_cache->mem_proximity, 4);
+    /* Reserved */
+    build_append_int_noprefix(table_data, 0, 4);
+    /* Memory Side Cache Size */
+    build_append_int_noprefix(table_data, hmat_cache->size, 8);
+    /* Cache Attributes */
+    build_append_int_noprefix(table_data, cache_attr, 4);
+    /* Reserved */
+    build_append_int_noprefix(table_data, 0, 2);
+    /*
+     * Number of SMBIOS handles (n)
+     * Linux kernel uses Memory Side Cache Information Structure
+     * without SMBIOS entries for now, so set Number of SMBIOS handles
+     * as 0.
+     */
+    build_append_int_noprefix(table_data, 0, 2);
+}
+
 /* Build HMAT sub table structures */
 static void hmat_build_table_structs(GArray *table_data, NumaState *nstat)
 {
     uint16_t flags;
     uint32_t num_initiator = 0;
     uint32_t initiator_pxm[MAX_NODES];
-    int i, hrchy, type;
+    int i, hrchy, type, level;
     HMAT_LB_Info *numa_hmat_lb;
+    HMAT_Cache_Info *numa_hmat_cache;
 
     for (i = 0; i < nstat->num_nodes; i++) {
         flags = 0;
@@ -175,6 +224,19 @@ static void hmat_build_table_structs(GArray *table_data, NumaState *nstat)
             }
         }
     }
+
+    /*
+     * ACPI 6.3: 5.2.27.5 Memory Side Cache Information Structure:
+     * Table 5-143
+     */
+    for (i = 0; i < nstat->num_nodes; i++) {
+        for (level = 0; level <= MAX_HMAT_CACHE_LEVEL; level++) {
+            numa_hmat_cache = nstat->hmat_cache[i][level];
+            if (numa_hmat_cache) {
+                build_hmat_cache(table_data, numa_hmat_cache);
+            }
+        }
+    }
 }
 
 void build_hmat(GArray *table_data, BIOSLinker *linker, NumaState *nstat)
diff --git a/hw/acpi/hmat.h b/hw/acpi/hmat.h
index 5f050781e6..6c32f12e78 100644
--- a/hw/acpi/hmat.h
+++ b/hw/acpi/hmat.h
@@ -81,6 +81,23 @@ struct HMAT_LB_Info {
     uint16_t    bandwidth[MAX_NODES][MAX_NODES];
 };
 
+struct HMAT_Cache_Info {
+    /* The memory proximity domain to which the memory belongs. */
+    uint32_t    mem_proximity;
+    /* Size of memory side cache in bytes. */
+    uint64_t    size;
+    /* Total cache levels for this memory proximity domain. */
+    uint8_t     total_levels;
+    /* Cache level described in this structure. */
+    uint8_t     level;
+    /* Cache Associativity: None/Direct Mapped/Complex Cache Indexing */
+    uint8_t     associativity;
+    /* Write Policy: None/Write Back(WB)/Write Through(WT) */
+    uint8_t     write_policy;
+    /* Cache Line size in bytes. */
+    uint16_t    line_size;
+};
+
 void build_hmat(GArray *table_data, BIOSLinker *linker, NumaState *nstat);
 
 #endif
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index c0257e936b..d971f5109e 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -33,6 +33,7 @@ typedef struct FWCfgEntry FWCfgEntry;
 typedef struct FWCfgIoState FWCfgIoState;
 typedef struct FWCfgMemState FWCfgMemState;
 typedef struct FWCfgState FWCfgState;
+typedef struct HMAT_Cache_Info HMAT_Cache_Info;
 typedef struct HMAT_LB_Info HMAT_LB_Info;
 typedef struct HVFX86EmulatorState HVFX86EmulatorState;
 typedef struct I2CBus I2CBus;
diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index 85ddad99b4..1ed3362917 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -33,6 +33,9 @@ struct NumaState {
 
     /* NUMA nodes HMAT Locality Latency and Bandwidth Information */
     HMAT_LB_Info *hmat_lb[HMAT_LB_LEVELS][HMAT_LB_TYPES];
+
+    /* Memory Side Cache Information Structure */
+    HMAT_Cache_Info *hmat_cache[MAX_NODES][MAX_HMAT_CACHE_LEVEL + 1];
 };
 typedef struct NumaState NumaState;
 
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index fc638f06cd..45525ff8ae 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -144,6 +144,8 @@ enum {
     HMAT_LB_DATA_WRITE_BANDWIDTH  = 5,
 };
 
+#define MAX_HMAT_CACHE_LEVEL        3
+
 #define HMAT_LB_LEVELS    (HMAT_LB_MEM_CACHE_3RD_LEVEL + 1)
 #define HMAT_LB_TYPES     (HMAT_LB_DATA_WRITE_BANDWIDTH + 1)
 
-- 
2.20.1
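
build_hmat_cache() above packs five fields into the 32-bit Cache Attributes word. The helpers below are a hypothetical standalone sketch of that packing, not QEMU code. The numeric encodings used with them (1 = direct mapped, 1 = write back) assume the None/Direct Mapped/Complex Cache Indexing and None/WB/WT orderings listed in the HMAT_Cache_Info comments map to 0, 1, 2.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: pack the "Cache Attributes" field of the Memory
 * Side Cache Information Structure the way build_hmat_cache() does.
 *   Bits [3:0]   total cache levels for the memory proximity domain
 *   Bits [7:4]   cache level described by this structure
 *   Bits [11:8]  cache associativity
 *   Bits [15:12] write policy
 *   Bits [31:16] cache line size in bytes
 */
static uint32_t hmat_cache_attr(uint8_t total_levels, uint8_t level,
                                uint8_t associativity, uint8_t write_policy,
                                uint16_t line_size)
{
    uint32_t attr;

    /* The level being described cannot exceed the total number of levels. */
    assert(level >= 1 && level <= total_levels);

    attr = total_levels & 0xFu;
    attr |= (uint32_t)(level & 0xFu) << 4;
    attr |= (uint32_t)(associativity & 0xFu) << 8;
    attr |= (uint32_t)(write_policy & 0xFu) << 12;
    /* Cast before shifting so a line size >= 0x8000 cannot overflow int. */
    attr |= (uint32_t)line_size << 16;
    return attr;
}

/* Hypothetical helper: extract one bit field back out of the packed word. */
static unsigned hmat_cache_attr_field(uint32_t attr, unsigned shift,
                                      uint32_t mask)
{
    return (attr >> shift) & mask;
}
```

Under those assumptions, a single-level, direct-mapped, write-back cache with 64-byte lines packs to 0x00401111.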




* [Qemu-devel] [PATCH v9 09/11] numa: Extend the CLI to provide memory latency and bandwidth information
  2019-08-09  6:57 [Qemu-devel] [PATCH v9 00/11] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao
                   ` (7 preceding siblings ...)
  2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 08/11] hmat acpi: Build Memory Side Cache " Tao
@ 2019-08-09  6:57 ` Tao
  2019-08-12  5:13   ` Daniel Black
  2019-08-13 15:11   ` Eric Blake
  2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 10/11] numa: Extend the CLI to provide memory side cache information Tao
                   ` (3 subsequent siblings)
  12 siblings, 2 replies; 34+ messages in thread
From: Tao @ 2019-08-09  6:57 UTC (permalink / raw)
  To: imammedo, eblake, ehabkost
  Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel, daniel,
	jonathan.cameron, dan.j.williams

From: Liu Jingqi <jingqi.liu@intel.com>

Add -numa hmat-lb option to provide System Locality Latency and
Bandwidth Information. These memory attributes help to build
System Locality Latency and Bandwidth Information Structure(s)
in ACPI Heterogeneous Memory Attribute Table (HMAT).

Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

Changes in v9:
    - change the CLI input format to make it more user friendly (Daniel Black):
    use latency=NUM[p|n|u]s and bandwidth=NUM[M|G|P](B/s) as input and drop
    the base-lat and base-bw input.
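
The v9 CLI takes latency=NUM[p|n|u]s and bandwidth=NUM[M|G|P] with an optional unit suffix, and parse_numa_hmat_lb() turns the suffix into the Entry Base Unit (picoseconds for latency, MB/s for bandwidth) while storing NUM itself as the 16-bit entry. The sketch below reproduces that mapping with plain strtoul() instead of qemu_strtoui(); parse_latency() and parse_bandwidth() are hypothetical names, and it assumes "P" means 2^30 MB/s (binary multiples, consistent with G = 2^10 MB/s).

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical sketch: parse "NUM[p|n|u]s". On success stores NUM in *value
 * and the base unit in picoseconds in *base and returns 0; returns -1 for a
 * malformed string, an unknown unit, or a value not below UINT16_MAX.
 */
static int parse_latency(const char *str, uint16_t *value, uint64_t *base)
{
    char *end;
    unsigned long v;

    assert(str && value && base);
    errno = 0;
    v = strtoul(str, &end, 10);
    if (end == str || errno != 0 || v >= UINT16_MAX) {
        return -1;
    }
    if (*end == '\0') {
        *base = 1;                      /* no suffix: picoseconds */
    } else if (end[1] == 's' && end[2] == '\0') {
        switch (end[0]) {
        case 'p':
            *base = 1;
            break;
        case 'n':
            *base = 1000;               /* PICO_PER_NSEC */
            break;
        case 'u':
            *base = 1000000;            /* PICO_PER_USEC */
            break;
        default:
            return -1;
        }
    } else {
        return -1;
    }
    *value = (uint16_t)v;
    return 0;
}

/*
 * Hypothetical sketch: parse "NUM[M|G|P]" with the base unit in MB/s.
 * Only the first suffix character is examined, as in the patch.
 */
static int parse_bandwidth(const char *str, uint16_t *value, uint64_t *base)
{
    char *end;
    unsigned long v;

    assert(str && value && base);
    errno = 0;
    v = strtoul(str, &end, 10);
    if (end == str || errno != 0 || v >= UINT16_MAX) {
        return -1;
    }
    switch (toupper((unsigned char)*end)) {
    case '\0':
    case 'M':
        *base = 1;
        break;
    case 'G':
        *base = UINT64_C(1) << 10;
        break;
    case 'P':
        *base = UINT64_C(1) << 30;      /* assumption: 1 PB = 2^30 MB */
        break;
    default:
        return -1;
    }
    *value = (uint16_t)v;
    return 0;
}
```

Recording the unit separately from the number is what lets each entry fit in 16 bits. It is also why the patch rejects a later entry whose unit differs from the first one seen for the same table.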
---
 hw/acpi/hmat.h        |   3 +
 hw/core/numa.c        | 185 ++++++++++++++++++++++++++++++++++++++++++
 include/sysemu/numa.h |   2 +
 qapi/machine.json     |  95 +++++++++++++++++++++-
 qemu-options.hx       |  44 +++++++++-
 5 files changed, 326 insertions(+), 3 deletions(-)

diff --git a/hw/acpi/hmat.h b/hw/acpi/hmat.h
index 6c32f12e78..b7c1e02cf0 100644
--- a/hw/acpi/hmat.h
+++ b/hw/acpi/hmat.h
@@ -42,6 +42,9 @@
 
 #define HMAT_IS_LATENCY(type) (type <= HMAT_LB_DATA_WRITE_LATENCY)
 
+#define PICO_PER_USEC 1000000
+#define PICO_PER_NSEC 1000
+
 struct HMAT_LB_Info {
     /*
      * Indicates total number of Proximity Domains
diff --git a/hw/core/numa.c b/hw/core/numa.c
index cfb6339810..9a494145f3 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -37,6 +37,7 @@
 #include "qemu/option.h"
 #include "qemu/config-file.h"
 #include "qemu/cutils.h"
+#include "hw/acpi/hmat.h"
 
 QemuOptsList qemu_numa_opts = {
     .name = "numa",
@@ -183,6 +184,184 @@ void parse_numa_distance(MachineState *ms, NumaDistOptions *dist, Error **errp)
     ms->numa_state->have_numa_distance = true;
 }
 
+void parse_numa_hmat_lb(MachineState *ms, NumaHmatLBOptions *node,
+                        Error **errp)
+{
+    int nb_numa_nodes = ms->numa_state->num_nodes;
+    NodeInfo *numa_info = ms->numa_state->nodes;
+    HMAT_LB_Info *hmat_lb = NULL;
+    const char *endptr;
+    int ret;
+    uint32_t latency, bandwidth;
+    uint64_t base_lat = 0, base_bw = 0;
+
+    if (node->data_type <= HMATLB_DATA_TYPE_WRITE_LATENCY) {
+        if (!node->has_latency) {
+            error_setg(errp, "Missing 'latency' option.");
+            return;
+        }
+        if (node->has_bandwidth) {
+            error_setg(errp, "Invalid option 'bandwidth' since "
+                       "the data type is latency.");
+            return;
+        }
+    }
+
+    if (node->data_type >= HMATLB_DATA_TYPE_ACCESS_BANDWIDTH) {
+        if (!node->has_bandwidth) {
+            error_setg(errp, "Missing 'bandwidth' option.");
+            return;
+        }
+        if (node->has_latency) {
+            error_setg(errp, "Invalid option 'latency' since "
+                       "the data type is bandwidth.");
+            return;
+        }
+    }
+
+    if (node->initiator >= nb_numa_nodes) {
+        error_setg(errp, "Invalid initiator=%"
+                   PRIu16 ", it should be less than %d.",
+                   node->initiator, nb_numa_nodes);
+        return;
+    }
+    if (!numa_info[node->initiator].has_cpu) {
+        error_setg(errp, "Invalid initiator=%"
+                   PRIu16 ", it isn't an initiator proximity domain.",
+                   node->initiator);
+        return;
+    }
+
+    if (node->target >= nb_numa_nodes) {
+        error_setg(errp, "Invalid target=%"
+                   PRIu16 ", it should be less than %d.",
+                   node->target, nb_numa_nodes);
+        return;
+    }
+    if (!numa_info[node->target].initiator_valid) {
+        error_setg(errp, "Invalid target=%"
+                   PRIu16 ", it doesn't have a valid initiator "
+                   "proximity domain.",
+                   node->target);
+        return;
+    }
+
+    if (node->has_latency) {
+        hmat_lb = ms->numa_state->hmat_lb[node->hierarchy][node->data_type];
+
+        if (!hmat_lb) {
+            hmat_lb = g_malloc0(sizeof(*hmat_lb));
+            ms->numa_state->hmat_lb[node->hierarchy][node->data_type] = hmat_lb;
+        } else if (hmat_lb->latency[node->initiator][node->target]) {
+            error_setg(errp, "Duplicate configuration of the latency for "
+                       "initiator=%" PRIu16 " and target=%" PRIu16 ".",
+                       node->initiator, node->target);
+            return;
+        }
+
+        ret = qemu_strtoui(node->latency, &endptr, 10, &latency);
+        if (ret < 0) {
+            error_setg(errp, "Invalid latency %s", node->latency);
+            return;
+        }
+
+        if (*endptr == '\0') {
+            base_lat = 1;
+        } else if (*(endptr + 1) == 's' && *(endptr + 2) == '\0') {
+            switch (*endptr) {
+            case 'p':
+                base_lat = 1;
+                break;
+            case 'n':
+                base_lat = PICO_PER_NSEC;
+                break;
+            case 'u':
+                base_lat = PICO_PER_USEC;
+                break;
+            default:
+                error_setg(errp, "Invalid latency unit %s, "
+                           "valid units are \"ps\" \"ns\" \"us\"",
+                           node->latency);
+                return;
+            }
+        } else {
+            error_setg(errp, "Invalid latency unit %s, "
+                "valid units are \"ps\" \"ns\" \"us\"", node->latency);
+            return;
+        }
+
+        /* The first entry fixes the base unit; later ones must match it. */
+        if (hmat_lb->base_lat == 0) {
+            hmat_lb->base_lat = base_lat;
+        } else if (hmat_lb->base_lat != base_lat) {
+            error_setg(errp, "Invalid latency unit %s,"
+                " please unify the units.", node->latency);
+            return;
+        }
+
+        if (latency >= UINT16_MAX) {
+            error_setg(errp, "Latency value %s overflows, max value"
+                " is %" PRIu16, node->latency, UINT16_MAX - 1);
+            return;
+        }
+
+        hmat_lb->latency[node->initiator][node->target] = latency;
+    }
+
+    if (node->has_bandwidth) {
+        hmat_lb = ms->numa_state->hmat_lb[node->hierarchy][node->data_type];
+
+        if (!hmat_lb) {
+            hmat_lb = g_malloc0(sizeof(*hmat_lb));
+            ms->numa_state->hmat_lb[node->hierarchy][node->data_type] = hmat_lb;
+        } else if (hmat_lb->bandwidth[node->initiator][node->target]) {
+            error_setg(errp, "Duplicate configuration of the bandwidth for "
+                       "initiator=%" PRIu16 " and target=%" PRIu16 ".",
+                       node->initiator, node->target);
+            return;
+        }
+
+        ret = qemu_strtoui(node->bandwidth, &endptr, 10, &bandwidth);
+        if (ret < 0) {
+            error_setg(errp, "Invalid bandwidth %s", node->bandwidth);
+            return;
+        }
+
+        switch (toupper(*endptr)) {
+        case '\0':
+        case 'M':
+            base_bw = 1;
+            break;
+        case 'G':
+            base_bw = UINT64_C(1) << 10;
+            break;
+        case 'P':
+            base_bw = UINT64_C(1) << 30;
+            break;
+        }
+
+        if (base_bw == 0) {
+            error_setg(errp, "Invalid bandwidth unit %s,"
+                " valid units are \"M\" \"G\" \"P\"", node->bandwidth);
+            return;
+        }
+
+        /* The first entry fixes the base unit; later ones must match it. */
+        if (hmat_lb->base_bw == 0) {
+            hmat_lb->base_bw = base_bw;
+        } else if (hmat_lb->base_bw != base_bw) {
+            error_setg(errp, "Invalid bandwidth unit %s,"
+                " please unify the units.", node->bandwidth);
+            return;
+        }
+
+        if (bandwidth >= UINT16_MAX) {
+            error_setg(errp, "Bandwidth value %s overflows, max value"
+                " is %" PRIu16, node->bandwidth, UINT16_MAX - 1);
+            return;
+        }
+
+        hmat_lb->bandwidth[node->initiator][node->target] = bandwidth;
+    }
+
+    if (hmat_lb) {
+        hmat_lb->hierarchy = node->hierarchy;
+        hmat_lb->data_type = node->data_type;
+    }
+}
+
 void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
 {
     Error *err = NULL;
@@ -221,6 +400,12 @@ void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
         machine_set_cpu_numa_node(ms, qapi_NumaCpuOptions_base(&object->u.cpu),
                                   &err);
         break;
+    case NUMA_OPTIONS_TYPE_HMAT_LB:
+        parse_numa_hmat_lb(ms, &object->u.hmat_lb, &err);
+        if (err) {
+            goto end;
+        }
+        break;
     default:
         abort();
     }
diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index 1ed3362917..f0857b7ee6 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -41,6 +41,8 @@ typedef struct NumaState NumaState;
 
 void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp);
 void parse_numa_opts(MachineState *ms);
+void parse_numa_hmat_lb(MachineState *ms, NumaHmatLBOptions *node,
+                        Error **errp);
 void numa_complete_configuration(MachineState *ms);
 void query_numa_node_mem(NumaNodeMem node_mem[], MachineState *ms);
 extern QemuOptsList qemu_numa_opts;
diff --git a/qapi/machine.json b/qapi/machine.json
index 05e367d26a..f19c761e52 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -377,10 +377,12 @@
 #
 # @cpu: property based CPU(s) to node mapping (Since: 2.10)
 #
+# @hmat-lb: memory latency and bandwidth information (Since: 4.2)
+#
 # Since: 2.1
 ##
 { 'enum': 'NumaOptionsType',
-  'data': [ 'node', 'dist', 'cpu' ] }
+  'data': [ 'node', 'dist', 'cpu', 'hmat-lb' ] }
 
 ##
 # @NumaOptions:
@@ -395,7 +397,8 @@
   'data': {
     'node': 'NumaNodeOptions',
     'dist': 'NumaDistOptions',
-    'cpu': 'NumaCpuOptions' }}
+    'cpu': 'NumaCpuOptions',
+    'hmat-lb': 'NumaHmatLBOptions' }}
 
 ##
 # @NumaNodeOptions:
@@ -504,6 +507,94 @@
    'base': 'CpuInstanceProperties',
    'data' : {} }
 
+##
+# @HmatLBMemoryHierarchy:
+#
+# The memory hierarchy in the System Locality Latency
+# and Bandwidth Information Structure of HMAT (Heterogeneous
+# Memory Attribute Table)
+#
+# For more information on @HmatLBMemoryHierarchy, see section
+# 5.2.27.4, Table 5-142, field "Flags" of the ACPI 6.3 spec.
+#
+# @memory: the structure represents the memory performance
+#
+# @first-level: first level memory side cache
+#
+# @second-level: second level memory side cache
+#
+# @third-level: third level memory side cache
+#
+# Since: 4.2
+##
+{ 'enum': 'HmatLBMemoryHierarchy',
+  'data': [ 'memory', 'first-level', 'second-level', 'third-level' ] }
+
+##
+# @HmatLBDataType:
+#
+# Data type in the System Locality Latency
+# and Bandwidth Information Structure of HMAT (Heterogeneous
+# Memory Attribute Table)
+#
+# For more information on @HmatLBDataType, see section
+# 5.2.27.4, Table 5-142, field "Data Type" of the ACPI 6.3 spec.
+#
+# @access-latency: access latency (picoseconds)
+#
+# @read-latency: read latency (picoseconds)
+#
+# @write-latency: write latency (picoseconds)
+#
+# @access-bandwidth: access bandwidth (MB/s)
+#
+# @read-bandwidth: read bandwidth (MB/s)
+#
+# @write-bandwidth: write bandwidth (MB/s)
+#
+# Since: 4.2
+##
+{ 'enum': 'HmatLBDataType',
+  'data': [ 'access-latency', 'read-latency', 'write-latency',
+            'access-bandwidth', 'read-bandwidth', 'write-bandwidth' ] }
+
+##
+# @NumaHmatLBOptions:
+#
+# Set the system locality latency and bandwidth information
+# between initiator and target proximity domains.
+#
+# For more information about @NumaHmatLBOptions, see chapter 5.2.27.4,
+# Table 5-142 of the ACPI 6.3 spec.
+#
+# @initiator: the Initiator Proximity Domain.
+#
+# @target: the Target Proximity Domain.
+#
+# @hierarchy: the memory hierarchy; indicates whether the structure describes
+#             the performance of memory or of a memory side cache.
+#
+# @data-type: the type of data: access/read/write latency or bandwidth
+#             (hit latency or hit bandwidth for a memory side cache).
+#
+# @latency: the latency from @initiator to @target proximity domain;
+#           the unit is "ps" (picoseconds), "ns" (nanoseconds) or
+#           "us" (microseconds).
+#
+# @bandwidth: the bandwidth between @initiator and @target proximity
+#             domain; the unit is "M" (MB/s), "G" (GB/s) or "P" (PB/s).
+#
+# Since: 4.2
+##
+{ 'struct': 'NumaHmatLBOptions',
+    'data': {
+    'initiator': 'uint16',
+    'target': 'uint16',
+    'hierarchy': 'HmatLBMemoryHierarchy',
+    'data-type': 'HmatLBDataType',
+    '*latency': 'str',
+    '*bandwidth': 'str' }}
+
 ##
 # @HostMemPolicy:
 #
diff --git a/qemu-options.hx b/qemu-options.hx
index c480781992..cda4607f3a 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -164,16 +164,19 @@ DEF("numa", HAS_ARG, QEMU_OPTION_numa,
     "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
     "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
     "-numa dist,src=source,dst=destination,val=distance\n"
-    "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n",
+    "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n"
+    "-numa hmat-lb,initiator=node,target=node,hierarchy=memory|first-level|second-level|third-level,data-type=access-latency|read-latency|write-latency[,latency=lat][,bandwidth=bw]\n",
     QEMU_ARCH_ALL)
 STEXI
 @item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
 @itemx -numa node[,memdev=@var{id}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
 @itemx -numa dist,src=@var{source},dst=@var{destination},val=@var{distance}
 @itemx -numa cpu,node-id=@var{node}[,socket-id=@var{x}][,core-id=@var{y}][,thread-id=@var{z}]
+@itemx -numa hmat-lb,initiator=@var{node},target=@var{node},hierarchy=@var{str},data-type=@var{str}[,latency=@var{lat}][,bandwidth=@var{bw}]
 @findex -numa
 Define a NUMA node and assign RAM and VCPUs to it.
 Set the NUMA distance from a source node to a destination node.
+Set the ACPI Heterogeneous Memory Attributes for the given nodes.
 
 Legacy VCPU assignment uses @samp{cpus} option where
 @var{firstcpu} and @var{lastcpu} are CPU indexes. Each
@@ -250,6 +253,45 @@ specified resources, it just assigns existing resources to NUMA
 nodes. This means that one still has to use the @option{-m},
 @option{-smp} options to allocate RAM and VCPUs respectively.
 
+Use @samp{hmat-lb} to set System Locality Latency and Bandwidth Information
+between initiator and target NUMA nodes in the ACPI Heterogeneous Memory Attribute Table (HMAT).
+An initiator NUMA node can create memory requests; it usually contains one or more processors.
+A target NUMA node contains addressable memory.
+
+In the @samp{hmat-lb} option, @var{node} is a NUMA node ID. @var{str} of 'hierarchy'
+is the memory hierarchy of the target NUMA node: if @var{str} is 'memory', the structure
+represents the memory performance; if @var{str} is 'first-level', 'second-level' or 'third-level',
+the structure represents the aggregated performance of memory side caches for each domain.
+@var{str} of 'data-type' is the type of data represented by this structure instance:
+if 'hierarchy' is 'memory', 'data-type' is the access, read or write latency (picoseconds)
+or the access, read or write bandwidth (MB/s) of the target memory; if 'hierarchy' is
+'first-level', 'second-level' or 'third-level', 'data-type' is the access, read or write hit latency (picoseconds)
+or the access, read or write hit bandwidth (MB/s) of the target memory side cache.
+
+@var{lat} of 'latency' is the latency value; the possible values and units are
+NUM[ps|ns|us] (picoseconds|nanoseconds|microseconds), and the default unit is 'ps'. @var{bw}
+of 'bandwidth' is the bandwidth value; the possible values and units are NUM[M|G|P], meaning that
+the bandwidth value are NUM MB/s, GB/s or PB/s. Note that max NUM is 65534,
+if NUM is 0, means the corresponding latency or bandwidth information is not provided.
+
+For example, the following option assigns NUMA node 0 and 1. Node 0 has 2 cpus and
+a ram, node 1 has only a ram. The processors in node 0 access memory in node
+0 with access-latency 5 nanoseconds, access-bandwidth is 200 MB/s;
+The processors in NUMA node 0 access memory in NUMA node 1 with access-latency 10
+nanoseconds, access-bandwidth is 100 MB/s.
+@example
+-m 2G \
+-object memory-backend-ram,size=1024M,policy=bind,host-nodes=0,id=ram-node0 -numa node,nodeid=0,memdev=ram-node0 \
+-object memory-backend-ram,size=1024M,policy=bind,host-nodes=1,id=ram-node1 -numa node,nodeid=1,memdev=ram-node1 \
+-smp 2 \
+-numa cpu,node-id=0,socket-id=0 \
+-numa cpu,node-id=0,socket-id=1 \
+-numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=5ns \
+-numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M \
+-numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,latency=10ns \
+-numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=100M \
+@end example
+
 ETEXI
 
 DEF("add-fd", HAS_ARG, QEMU_OPTION_add_fd,
-- 
2.20.1
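
The 'latency' syntax documented in this patch (NUM with an optional ps, ns or us suffix, 'ps' by default, NUM at most 65534, 0 meaning "not provided") can be illustrated with a small standalone parser. This is a sketch only: parse_hmat_latency() and HMAT_LB_MAX_NUM are hypothetical names, not QEMU code, and the real parsing inside parse_numa_hmat_lb() is not reproduced here. The bandwidth suffixes (M, G, P) could use the same table-driven approach, but the documentation does not say whether G and P are decimal or binary multiples, so they are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical limit taken from the documentation: NUM is at most 65534. */
#define HMAT_LB_MAX_NUM 65534

struct hmat_lat_unit {
    const char *suffix;     /* unit suffix accepted after NUM */
    uint64_t ps_per_unit;   /* multiplier to picoseconds */
};

/* The empty suffix covers the documented default unit (picoseconds). */
static const struct hmat_lat_unit lat_units[] = {
    { "", 1 }, { "ps", 1 }, { "ns", 1000 }, { "us", 1000000 },
};

/*
 * Hypothetical helper, not QEMU code: parse a latency string such as
 * "5ns" into NUM and the equivalent number of picoseconds.
 * Returns 0 on success, -1 on a malformed value, an unknown unit, or
 * NUM > HMAT_LB_MAX_NUM. NUM == 0 ("not provided") is accepted.
 */
static int parse_hmat_latency(const char *str, uint16_t *num, uint64_t *ps)
{
    unsigned long long n;
    char *end;
    size_t i;

    assert(str && num && ps);
    if (str[0] < '0' || str[0] > '9') {
        return -1;          /* reject "", "-5" or " 5", which strtoull accepts */
    }
    errno = 0;
    n = strtoull(str, &end, 10);
    if (errno != 0 || n > HMAT_LB_MAX_NUM) {
        return -1;
    }
    for (i = 0; i < sizeof(lat_units) / sizeof(lat_units[0]); i++) {
        if (strcmp(end, lat_units[i].suffix) == 0) {
            *num = (uint16_t)n;
            *ps = (uint64_t)n * lat_units[i].ps_per_unit;
            return 0;
        }
    }
    return -1;              /* unknown unit suffix, e.g. "ms" */
}
```

With this sketch, "5ns" gives NUM=5 (5000 ps), a bare "300" is read as 300 ps, and "70000ns" or "5ms" is rejected.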

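The HmatLBMemoryHierarchy and HmatLBDataType enums added here follow ACPI 6.3 Table 5-142, so their member order doubles as the on-table encoding: memory=0 through third-level=3 in bits 3:0 of the Flags byte, and access-latency=0 through write-bandwidth=5 in the Data Type byte. The sketch below shows that mapping. The enum and function names are hypothetical, not QEMU code. The pairing rule in hmat_lb_values_match() (latency data types take 'latency=', bandwidth data types take 'bandwidth=') is an assumption drawn from the option documentation, not from the patch code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirrors of the QAPI enums; values follow ACPI 6.3 Table 5-142. */
enum hmat_lb_hierarchy {
    HMAT_LB_MEMORY = 0,         /* performance of the memory itself */
    HMAT_LB_FIRST_LEVEL = 1,    /* first level memory side cache */
    HMAT_LB_SECOND_LEVEL = 2,
    HMAT_LB_THIRD_LEVEL = 3,
};

enum hmat_lb_data_type {
    HMAT_LB_ACCESS_LATENCY = 0,
    HMAT_LB_READ_LATENCY = 1,
    HMAT_LB_WRITE_LATENCY = 2,
    HMAT_LB_ACCESS_BANDWIDTH = 3,
    HMAT_LB_READ_BANDWIDTH = 4,
    HMAT_LB_WRITE_BANDWIDTH = 5,
};

/* Flags byte: bits 3:0 carry the memory hierarchy (other bits reserved in 6.3). */
static uint8_t hmat_lb_flags(enum hmat_lb_hierarchy h)
{
    assert(h <= HMAT_LB_THIRD_LEVEL);
    return (uint8_t)h & 0x0f;
}

static bool hmat_lb_is_latency(enum hmat_lb_data_type t)
{
    return t <= HMAT_LB_WRITE_LATENCY;
}

/*
 * Assumed rule: a latency data type needs latency= and no bandwidth=,
 * a bandwidth data type needs bandwidth= and no latency=.
 */
static bool hmat_lb_values_match(enum hmat_lb_data_type t,
                                 bool has_latency, bool has_bandwidth)
{
    return hmat_lb_is_latency(t) ? (has_latency && !has_bandwidth)
                                 : (has_bandwidth && !has_latency);
}
```

For instance, "hierarchy=memory,data-type=access-bandwidth,bandwidth=200M" from the example maps to Flags 0 and Data Type 3 and passes the pairing check.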


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Qemu-devel] [PATCH v9 10/11] numa: Extend the CLI to provide memory side cache information
  2019-08-09  6:57 [Qemu-devel] [PATCH v9 00/11] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao
                   ` (8 preceding siblings ...)
  2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 09/11] numa: Extend the CLI to provide memory latency and bandwidth information Tao
@ 2019-08-09  6:57 ` Tao
  2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 11/11] tests/bios-tables-test: add test cases for ACPI HMAT Tao
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 34+ messages in thread
From: Tao @ 2019-08-09  6:57 UTC (permalink / raw)
  To: imammedo, eblake, ehabkost
  Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel, daniel,
	jonathan.cameron, dan.j.williams

From: Liu Jingqi <jingqi.liu@intel.com>

Add the -numa hmat-cache option to provide memory side cache information.
These memory attributes are used to build the Memory Side Cache Information
Structure(s) in the ACPI Heterogeneous Memory Attribute Table (HMAT).

Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

No changes in v9.
---
 hw/core/numa.c        | 67 +++++++++++++++++++++++++++++++++++
 include/sysemu/numa.h |  2 ++
 qapi/machine.json     | 81 +++++++++++++++++++++++++++++++++++++++++--
 qemu-options.hx       | 14 +++++++-
 4 files changed, 161 insertions(+), 3 deletions(-)

diff --git a/hw/core/numa.c b/hw/core/numa.c
index 9a494145f3..2caf2937aa 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -362,6 +362,67 @@ void parse_numa_hmat_lb(MachineState *ms, NumaHmatLBOptions *node,
     }
 }
 
+void parse_numa_hmat_cache(MachineState *ms, NumaHmatCacheOptions *node,
+                           Error **errp)
+{
+    int nb_numa_nodes = ms->numa_state->num_nodes;
+    HMAT_Cache_Info *hmat_cache = NULL;
+
+    if (node->node_id >= nb_numa_nodes) {
+        error_setg(errp, "Invalid node-id=%" PRIu32
+                   ", it should be less than %d.",
+                   node->node_id, nb_numa_nodes);
+        return;
+    }
+
+    if (node->total > MAX_HMAT_CACHE_LEVEL) {
+        error_setg(errp, "Invalid total=%" PRIu8
+                   ", it should be less than or equal to %d.",
+                   node->total, MAX_HMAT_CACHE_LEVEL);
+        return;
+    }
+    if (node->level > node->total) {
+        error_setg(errp, "Invalid level=%" PRIu8
+                   ", it should be less than or equal to"
+                   " total=%" PRIu8 ".",
+                   node->level, node->total);
+        return;
+    }
+    if (ms->numa_state->hmat_cache[node->node_id][node->level]) {
+        error_setg(errp, "Duplicate configuration of the side cache for "
+                   "node-id=%" PRIu32 " and level=%" PRIu8 ".",
+                   node->node_id, node->level);
+        return;
+    }
+
+    if ((node->level > 1) &&
+        ms->numa_state->hmat_cache[node->node_id][node->level - 1] &&
+        (node->size >=
+            ms->numa_state->hmat_cache[node->node_id][node->level - 1]->size)) {
+        error_setg(errp, "Invalid size=0x%" PRIx64
+                   ", the size of level=%" PRIu8
+                   " should be less than the size(0x%" PRIx64
+                   ") of level=%" PRIu8 ".",
+                   node->size, node->level,
+                   ms->numa_state->hmat_cache[node->node_id]
+                                             [node->level - 1]->size,
+                   node->level - 1);
+        return;
+    }
+
+    hmat_cache = g_malloc0(sizeof(*hmat_cache));
+
+    hmat_cache->mem_proximity = node->node_id;
+    hmat_cache->size = node->size;
+    hmat_cache->total_levels = node->total;
+    hmat_cache->level = node->level;
+    hmat_cache->associativity = node->assoc;
+    hmat_cache->write_policy = node->policy;
+    hmat_cache->line_size = node->line;
+
+    ms->numa_state->hmat_cache[node->node_id][node->level] = hmat_cache;
+}
+
 void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
 {
     Error *err = NULL;
@@ -406,6 +467,12 @@ void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
             goto end;
         }
         break;
+    case NUMA_OPTIONS_TYPE_HMAT_CACHE:
+        parse_numa_hmat_cache(ms, &object->u.hmat_cache, &err);
+        if (err) {
+            goto end;
+        }
+        break;
     default:
         abort();
     }
diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index f0857b7ee6..9009bbdee3 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -43,6 +43,8 @@ void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp);
 void parse_numa_opts(MachineState *ms);
 void parse_numa_hmat_lb(MachineState *ms, NumaHmatLBOptions *node,
                         Error **errp);
+void parse_numa_hmat_cache(MachineState *ms, NumaHmatCacheOptions *node,
+                           Error **errp);
 void numa_complete_configuration(MachineState *ms);
 void query_numa_node_mem(NumaNodeMem node_mem[], MachineState *ms);
 extern QemuOptsList qemu_numa_opts;
diff --git a/qapi/machine.json b/qapi/machine.json
index f19c761e52..e0bc862657 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -379,10 +379,12 @@
 #
 # @hmat-lb: memory latency and bandwidth information (Since: 4.2)
 #
+# @hmat-cache: memory side cache information (Since: 4.2)
+#
 # Since: 2.1
 ##
 { 'enum': 'NumaOptionsType',
-  'data': [ 'node', 'dist', 'cpu', 'hmat-lb' ] }
+  'data': [ 'node', 'dist', 'cpu', 'hmat-lb', 'hmat-cache' ] }
 
 ##
 # @NumaOptions:
@@ -398,7 +400,8 @@
     'node': 'NumaNodeOptions',
     'dist': 'NumaDistOptions',
     'cpu': 'NumaCpuOptions',
-    'hmat-lb': 'NumaHmatLBOptions' }}
+    'hmat-lb': 'NumaHmatLBOptions',
+    'hmat-cache': 'NumaHmatCacheOptions' }}
 
 ##
 # @NumaNodeOptions:
@@ -595,6 +598,80 @@
     '*latency': 'str',
     '*bandwidth': 'str' }}
 
+##
+# @HmatCacheAssociativity:
+#
+# Cache associativity in the Memory Side Cache
+# Information Structure of HMAT
+#
+# For more information about @HmatCacheAssociativity, see chapter 5.2.27.5,
+# Table 5-143 of the ACPI 6.3 spec.
+#
+# @none: None
+#
+# @direct: Direct Mapped
+#
+# @complex: Complex Cache Indexing (implementation specific)
+#
+# Since: 4.2
+##
+{ 'enum': 'HmatCacheAssociativity',
+  'data': [ 'none', 'direct', 'complex' ] }
+
+##
+# @HmatCacheWritePolicy:
+#
+# Cache write policy in the Memory Side Cache
+# Information Structure of HMAT
+#
+# For more information about @HmatCacheWritePolicy, see chapter 5.2.27.5,
+# Table 5-143, field "Cache Attributes" of the ACPI 6.3 spec.
+#
+# @none: None
+#
+# @write-back: Write Back (WB)
+#
+# @write-through: Write Through (WT)
+#
+# Since: 4.2
+##
+{ 'enum': 'HmatCacheWritePolicy',
+  'data': [ 'none', 'write-back', 'write-through' ] }
+
+##
+# @NumaHmatCacheOptions:
+#
+# Set the memory side cache information for a given memory domain.
+#
+# For more information about @NumaHmatCacheOptions, see chapter 5.2.27.5,
+# Table 5-143, field "Cache Attributes" of the ACPI 6.3 spec.
+#
+# @node-id: the memory proximity domain to which the memory belongs.
+#
+# @size: the size of the memory side cache in bytes.
+#
+# @total: the total number of cache levels for this memory proximity domain.
+#
+# @level: the cache level described in this structure.
+#
+# @assoc: the cache associativity: none, direct-mapped or complex cache indexing.
+#
+# @policy: the write policy: none, write-back or write-through.
+#
+# @line: the cache line size in bytes.
+#
+# Since: 4.2
+##
+{ 'struct': 'NumaHmatCacheOptions',
+  'data': {
+   'node-id': 'uint32',
+   'size': 'size',
+   'total': 'uint8',
+   'level': 'uint8',
+   'assoc': 'HmatCacheAssociativity',
+   'policy': 'HmatCacheWritePolicy',
+   'line': 'uint16' }}
+
 ##
 # @HostMemPolicy:
 #
diff --git a/qemu-options.hx b/qemu-options.hx
index cda4607f3a..600023d578 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -165,7 +165,8 @@ DEF("numa", HAS_ARG, QEMU_OPTION_numa,
     "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
     "-numa dist,src=source,dst=destination,val=distance\n"
     "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n"
-    "-numa hmat-lb,initiator=node,target=node,hierarchy=memory|first-level|second-level|third-level,data-type=access-latency|read-latency|write-latency[,latency=lat][,bandwidth=bw]\n",
+    "-numa hmat-lb,initiator=node,target=node,hierarchy=memory|first-level|second-level|third-level,data-type=access-latency|read-latency|write-latency[,latency=lat][,bandwidth=bw]\n"
+    "-numa hmat-cache,node-id=node,size=size,total=total,level=level[,assoc=none|direct|complex][,policy=none|write-back|write-through][,line=size]\n",
     QEMU_ARCH_ALL)
 STEXI
 @item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
@@ -173,6 +174,7 @@ STEXI
 @itemx -numa dist,src=@var{source},dst=@var{destination},val=@var{distance}
 @itemx -numa cpu,node-id=@var{node}[,socket-id=@var{x}][,core-id=@var{y}][,thread-id=@var{z}]
 @itemx -numa hmat-lb,initiator=@var{node},target=@var{node},hierarchy=@var{str},data-type=@var{str}[,latency=@var{lat}][,bandwidth=@var{bw}]
+@itemx -numa hmat-cache,node-id=@var{node},size=@var{size},total=@var{total},level=@var{level}[,assoc=@var{str}][,policy=@var{str}][,line=@var{size}]
 @findex -numa
 Define a NUMA node and assign RAM and VCPUs to it.
 Set the NUMA distance from a source node to a destination node.
@@ -274,11 +276,19 @@ is bandwidth value, the possible value and units are NUM[M|G|P], mean that
 the bandwidth value are NUM MB/s, GB/s or PB/s. Note that max NUM is 65534,
 if NUM is 0, means the corresponding latency or bandwidth information is not provided.
 
+In the @samp{hmat-cache} option, 'node-id' is the ID of the NUMA node the memory belongs to.
+'size' is the size of the memory side cache in bytes. 'total' is the total number of cache levels.
+'level' is the cache level described by this structure. 'assoc' is the cache associativity:
+'none', 'direct' (direct-mapped) or 'complex' (complex cache indexing). 'policy' is the write
+policy: 'none', 'write-back' or 'write-through'. 'line' is the cache line size in bytes.
+
 For example, the following option assigns NUMA node 0 and 1. Node 0 has 2 cpus and
 a ram, node 1 has only a ram. The processors in node 0 access memory in node
 0 with access-latency 5 nanoseconds, access-bandwidth is 200 MB/s;
 The processors in NUMA node 0 access memory in NUMA node 1 with access-latency 10
 nanoseconds, access-bandwidth is 100 MB/s.
+For the memory side cache information, NUMA nodes 0 and 1 each have a single-level,
+direct-mapped, write-back memory side cache of 0x20000 bytes with an 8-byte cache line:
 @example
 -m 2G \
 -object memory-backend-ram,size=1024M,policy=bind,host-nodes=0,id=ram-node0 -numa node,nodeid=0,memdev=ram-node0 \
@@ -290,6 +300,8 @@ nanoseconds, access-bandwidth is 100 MB/s.
 -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M \
 -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,latency=10ns \
 -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=100M \
+-numa hmat-cache,node-id=0,size=0x20000,total=1,level=1,assoc=direct,policy=write-back,line=8 \
+-numa hmat-cache,node-id=1,size=0x20000,total=1,level=1,assoc=direct,policy=write-back,line=8 \
 @end example
 
 ETEXI
-- 
2.20.1
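
The values accepted by -numa hmat-cache end up in the Memory Side Cache Information Structure. As a worked example, the sketch below packs them into the 32-bit Cache Attributes field using the bit layout of ACPI 6.3 Table 5-143: bits 3:0 total cache levels, 7:4 cache level, 11:8 associativity, 15:12 write policy, 31:16 cache line size. hmat_cache_attributes() and the enum names are hypothetical, not QEMU code. The enum values mirror the member order of the HmatCacheAssociativity and HmatCacheWritePolicy QAPI enums, which matches the ACPI encodings.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirrors of the QAPI enums; values equal the ACPI 6.3 encodings. */
enum hmat_cache_assoc {
    HMAT_CACHE_ASSOC_NONE = 0,
    HMAT_CACHE_ASSOC_DIRECT = 1,        /* direct mapped */
    HMAT_CACHE_ASSOC_COMPLEX = 2,       /* complex cache indexing */
};

enum hmat_cache_policy {
    HMAT_CACHE_POLICY_NONE = 0,
    HMAT_CACHE_POLICY_WRITE_BACK = 1,
    HMAT_CACHE_POLICY_WRITE_THROUGH = 2,
};

/*
 * Hypothetical helper, not QEMU code: build the Cache Attributes field
 * (ACPI 6.3, Table 5-143) from the -numa hmat-cache option values.
 */
static uint32_t hmat_cache_attributes(uint8_t total, uint8_t level,
                                      enum hmat_cache_assoc assoc,
                                      enum hmat_cache_policy policy,
                                      uint16_t line)
{
    /* total and level are 4-bit fields */
    assert(total <= 0xf && level <= 0xf);

    return (uint32_t)total |                /* bits  3:0  */
           ((uint32_t)level << 4) |         /* bits  7:4  */
           ((uint32_t)assoc << 8) |         /* bits 11:8  */
           ((uint32_t)policy << 12) |       /* bits 15:12 */
           ((uint32_t)line << 16);          /* bits 31:16 */
}
```

For the documented example (total=1, level=1, assoc=direct, policy=write-back, line=8) this yields 0x00081111.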




* [Qemu-devel] [PATCH v9 11/11] tests/bios-tables-test: add test cases for ACPI HMAT
  2019-08-09  6:57 [Qemu-devel] [PATCH v9 00/11] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao
                   ` (9 preceding siblings ...)
  2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 10/11] numa: Extend the CLI to provide memory side cache information Tao
@ 2019-08-09  6:57 ` Tao
  2019-08-09 11:11 ` [Qemu-devel] [PATCH v9 00/11] Build ACPI Heterogeneous Memory Attribute Table (HMAT) no-reply
  2019-08-13  8:53 ` Tao Xu
  12 siblings, 0 replies; 34+ messages in thread
From: Tao @ 2019-08-09  6:57 UTC (permalink / raw)
  To: imammedo, eblake, ehabkost
  Cc: Jingqi Liu, tao3.xu, fan.du, qemu-devel, daniel,
	jonathan.cameron, dan.j.williams

From: Tao Xu <tao3.xu@intel.com>

Now that the ACPI HMAT has been introduced, QEMU builds HMAT tables for
heterogeneous memory with the boot option '-numa node'.

Add test cases on PC and Q35 machines with 2 NUMA nodes.
Because HMAT is generated when NUMA is enabled, the following
expected tables need to be added for this test:
  tests/acpi-test-data/pc/*.acpihmat
  tests/acpi-test-data/pc/HMAT.*
  tests/acpi-test-data/q35/*.acpihmat
  tests/acpi-test-data/q35/HMAT.*

Reviewed-by: Jingqi Liu <Jingqi.liu@intel.com>
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

Changes in v9:
    - update the test case
---
 tests/bios-tables-test.c | 43 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/tests/bios-tables-test.c b/tests/bios-tables-test.c
index a356ac3489..294f097e52 100644
--- a/tests/bios-tables-test.c
+++ b/tests/bios-tables-test.c
@@ -871,6 +871,47 @@ static void test_acpi_piix4_tcg_dimm_pxm(void)
     test_acpi_tcg_dimm_pxm(MACHINE_PC);
 }
 
+static void test_acpi_tcg_acpi_hmat(const char *machine)
+{
+    test_data data;
+
+    memset(&data, 0, sizeof(data));
+    data.machine = machine;
+    data.variant = ".acpihmat";
+    test_acpi_one(" -smp 2,sockets=2"
+                  " -m 128M,slots=2,maxmem=1G"
+                  " -object memory-backend-ram,size=64M,id=m0"
+                  " -object memory-backend-ram,size=64M,id=m1"
+                  " -numa node,nodeid=0,memdev=m0"
+                  " -numa node,nodeid=1,memdev=m1,initiator=0"
+                  " -numa cpu,node-id=0,socket-id=0"
+                  " -numa cpu,node-id=0,socket-id=1"
+                  " -numa hmat-lb,initiator=0,target=0,hierarchy=memory,"
+                  "data-type=access-latency,latency=5ns"
+                  " -numa hmat-lb,initiator=0,target=0,hierarchy=memory,"
+                  "data-type=access-bandwidth,bandwidth=500M"
+                  " -numa hmat-lb,initiator=0,target=1,hierarchy=memory,"
+                  "data-type=access-latency,latency=10ns"
+                  " -numa hmat-lb,initiator=0,target=1,hierarchy=memory,"
+                  "data-type=access-bandwidth,bandwidth=100M"
+                  " -numa hmat-cache,node-id=0,size=0x20000,total=1,level=1"
+                  ",assoc=direct,policy=write-back,line=8"
+                  " -numa hmat-cache,node-id=1,size=0x20000,total=1,level=1"
+                  ",assoc=direct,policy=write-back,line=8",
+                  &data);
+    free_test_data(&data);
+}
+
+static void test_acpi_q35_tcg_acpi_hmat(void)
+{
+    test_acpi_tcg_acpi_hmat(MACHINE_Q35);
+}
+
+static void test_acpi_piix4_tcg_acpi_hmat(void)
+{
+    test_acpi_tcg_acpi_hmat(MACHINE_PC);
+}
+
 static void test_acpi_virt_tcg(void)
 {
     test_data data = {
@@ -915,6 +956,8 @@ int main(int argc, char *argv[])
         qtest_add_func("acpi/q35/numamem", test_acpi_q35_tcg_numamem);
         qtest_add_func("acpi/piix4/dimmpxm", test_acpi_piix4_tcg_dimm_pxm);
         qtest_add_func("acpi/q35/dimmpxm", test_acpi_q35_tcg_dimm_pxm);
+        qtest_add_func("acpi/piix4/acpihmat", test_acpi_piix4_tcg_acpi_hmat);
+        qtest_add_func("acpi/q35/acpihmat", test_acpi_q35_tcg_acpi_hmat);
     } else if (strcmp(arch, "aarch64") == 0) {
         qtest_add_func("acpi/virt", test_acpi_virt_tcg);
     }
-- 
2.20.1




* Re: [Qemu-devel] [PATCH v9 00/11] Build ACPI Heterogeneous Memory Attribute Table (HMAT)
  2019-08-09  6:57 [Qemu-devel] [PATCH v9 00/11] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao
                   ` (10 preceding siblings ...)
  2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 11/11] tests/bios-tables-test: add test cases for ACPI HMAT Tao
@ 2019-08-09 11:11 ` no-reply
  2019-08-13  8:53 ` Tao Xu
  12 siblings, 0 replies; 34+ messages in thread
From: no-reply @ 2019-08-09 11:11 UTC (permalink / raw)
  To: tao3.xu
  Cc: ehabkost, jingqi.liu, tao3.xu, fan.du, qemu-devel, daniel,
	jonathan.cameron, imammedo, dan.j.williams

Patchew URL: https://patchew.org/QEMU/20190809065731.9097-1-tao3.xu@intel.com/



Hi,

This series failed the asan build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-fedora V=1 NETWORK=1
time make docker-test-debug@fedora TARGET_LIST=x86_64-softmmu J=14 NETWORK=1
=== TEST SCRIPT END ===

PASS 1 fdc-test /x86_64/fdc/cmos
PASS 2 fdc-test /x86_64/fdc/no_media_on_start
PASS 3 fdc-test /x86_64/fdc/read_without_media
==13440==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 4 fdc-test /x86_64/fdc/media_change
PASS 5 fdc-test /x86_64/fdc/sense_interrupt
PASS 6 fdc-test /x86_64/fdc/relative_seek
---
PASS 32 test-opts-visitor /visitor/opts/range/beyond
PASS 33 test-opts-visitor /visitor/opts/dict/unvisited
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-coroutine -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-coroutine" 
==13485==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==13485==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffd3f64a000; bottom 0x7f35643f8000; size: 0x00c7db252000 (858375135232)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 1 test-coroutine /basic/no-dangling-access
---
PASS 13 test-aio /aio/event/wait/no-flush-cb
PASS 12 fdc-test /x86_64/fdc/read_no_dma_19
PASS 13 fdc-test /x86_64/fdc/fuzz-registers
==13504==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 14 test-aio /aio/timer/schedule
PASS 15 test-aio /aio/coroutine/queue-chaining
PASS 16 test-aio /aio-gsource/flush
---
PASS 26 test-aio /aio-gsource/event/flush
PASS 27 test-aio /aio-gsource/event/wait/no-flush-cb
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img tests/ide-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="ide-test" 
==13513==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 ide-test /x86_64/ide/identify
PASS 28 test-aio /aio-gsource/timer/schedule
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-aio-multithread -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-aio-multithread" 
==13522==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-aio-multithread /aio/multi/lifecycle
==13519==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 2 ide-test /x86_64/ide/flush
PASS 2 test-aio-multithread /aio/multi/schedule
==13540==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 3 test-aio-multithread /aio/multi/mutex/contended
PASS 3 ide-test /x86_64/ide/bmdma/simple_rw
==13556==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 4 ide-test /x86_64/ide/bmdma/trim
==13562==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 5 ide-test /x86_64/ide/bmdma/short_prdt
==13568==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 4 test-aio-multithread /aio/multi/mutex/handoff
PASS 6 ide-test /x86_64/ide/bmdma/one_sector_short_prdt
==13579==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 5 test-aio-multithread /aio/multi/mutex/mcs
PASS 7 ide-test /x86_64/ide/bmdma/long_prdt
==13590==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==13590==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffec7f0e000; bottom 0x7fcc715fe000; size: 0x003256910000 (216200708096)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 8 ide-test /x86_64/ide/bmdma/no_busmaster
---
PASS 6 test-throttle /throttle/detach_attach
PASS 7 test-throttle /throttle/config_functions
PASS 8 test-throttle /throttle/accounting
==13599==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 9 test-throttle /throttle/groups
PASS 10 test-throttle /throttle/config/enabled
PASS 11 test-throttle /throttle/config/conflicting
---
PASS 14 test-throttle /throttle/config/max
PASS 15 test-throttle /throttle/config/iops_size
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-thread-pool -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-thread-pool" 
==13608==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-thread-pool /thread-pool/submit
PASS 2 test-thread-pool /thread-pool/submit-aio
PASS 3 test-thread-pool /thread-pool/submit-co
PASS 4 test-thread-pool /thread-pool/submit-many
PASS 9 ide-test /x86_64/ide/flush/nodev
==13676==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 10 ide-test /x86_64/ide/flush/empty_drive
==13681==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 5 test-thread-pool /thread-pool/cancel
PASS 11 ide-test /x86_64/ide/flush/retry_pci
==13687==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 12 ide-test /x86_64/ide/flush/retry_isa
PASS 6 test-thread-pool /thread-pool/cancel-async
==13693==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-hbitmap -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-hbitmap" 
PASS 1 test-hbitmap /hbitmap/granularity
PASS 2 test-hbitmap /hbitmap/size/0
---
PASS 5 test-hbitmap /hbitmap/iter/partial
PASS 6 test-hbitmap /hbitmap/iter/granularity
PASS 7 test-hbitmap /hbitmap/iter/iter_and_reset
==13704==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 8 test-hbitmap /hbitmap/get/all
PASS 9 test-hbitmap /hbitmap/get/some
PASS 10 test-hbitmap /hbitmap/set/all
---
PASS 28 test-hbitmap /hbitmap/truncate/shrink/medium
PASS 29 test-hbitmap /hbitmap/truncate/shrink/large
PASS 30 test-hbitmap /hbitmap/meta/zero
==13710==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 15 ide-test /x86_64/ide/cdrom/dma
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img tests/ahci-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="ahci-test" 
==13724==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 ahci-test /x86_64/ahci/sanity
==13730==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 2 ahci-test /x86_64/ahci/pci_spec
==13736==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 3 ahci-test /x86_64/ahci/pci_enable
==13742==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 31 test-hbitmap /hbitmap/meta/one
PASS 32 test-hbitmap /hbitmap/meta/byte
PASS 33 test-hbitmap /hbitmap/meta/word
PASS 4 ahci-test /x86_64/ahci/hba_spec
==13748==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 34 test-hbitmap /hbitmap/meta/sector
PASS 35 test-hbitmap /hbitmap/serialize/align
PASS 5 ahci-test /x86_64/ahci/hba_enable
==13754==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 6 ahci-test /x86_64/ahci/identify
PASS 36 test-hbitmap /hbitmap/serialize/basic
PASS 37 test-hbitmap /hbitmap/serialize/part
---
PASS 42 test-hbitmap /hbitmap/next_dirty_area/next_dirty_area_1
PASS 43 test-hbitmap /hbitmap/next_dirty_area/next_dirty_area_4
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-bdrv-drain -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-bdrv-drain" 
==13760==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==13763==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-bdrv-drain /bdrv-drain/nested
PASS 2 test-bdrv-drain /bdrv-drain/multiparent
PASS 3 test-bdrv-drain /bdrv-drain/set_aio_context
---
PASS 38 test-bdrv-drain /bdrv-drain/detach/parent_cb
PASS 39 test-bdrv-drain /bdrv-drain/detach/driver_cb
PASS 40 test-bdrv-drain /bdrv-drain/attach/drain
==13793==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-bdrv-graph-mod -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-bdrv-graph-mod" 
==13813==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-bdrv-graph-mod /bdrv-graph-mod/update-perm-tree
PASS 2 test-bdrv-graph-mod /bdrv-graph-mod/should-update-child
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-blockjob -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-blockjob" 
==13818==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-blockjob /blockjob/ids
PASS 2 test-blockjob /blockjob/cancel/created
PASS 3 test-blockjob /blockjob/cancel/running
---
PASS 7 test-blockjob /blockjob/cancel/pending
PASS 8 test-blockjob /blockjob/cancel/concluded
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-blockjob-txn -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-blockjob-txn" 
==13823==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-blockjob-txn /single/success
PASS 2 test-blockjob-txn /single/failure
PASS 3 test-blockjob-txn /single/cancel
---
PASS 7 test-blockjob-txn /pair/fail-cancel-race
PASS 8 ahci-test /x86_64/ahci/reset
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-block-backend -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-block-backend" 
==13830==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-block-backend /block-backend/drain_aio_error
PASS 2 test-block-backend /block-backend/drain_all_aio_error
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-block-iothread -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-block-iothread" 
==13828==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==13835==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-block-iothread /sync-op/pread
PASS 2 test-block-iothread /sync-op/pwrite
PASS 3 test-block-iothread /sync-op/load_vmstate
---
PASS 14 test-block-iothread /propagate/basic
PASS 15 test-block-iothread /propagate/diamond
PASS 16 test-block-iothread /propagate/mirror
==13828==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7fff5445a000; bottom 0x7fd6e1dfe000; size: 0x00287265c000 (173717962752)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-image-locking -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-image-locking" 
==13861==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-image-locking /image-locking/basic
PASS 2 test-image-locking /image-locking/set-perm-abort
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-x86-cpuid -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-x86-cpuid" 
PASS 9 ahci-test /x86_64/ahci/io/pio/lba28/simple/zero
PASS 1 test-x86-cpuid /cpuid/topology/basic
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-xbzrle -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-xbzrle" 
==13868==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-xbzrle /xbzrle/uleb
PASS 2 test-xbzrle /xbzrle/encode_decode_zero
PASS 3 test-xbzrle /xbzrle/encode_decode_unchanged
PASS 4 test-xbzrle /xbzrle/encode_decode_1_byte
PASS 5 test-xbzrle /xbzrle/encode_decode_overflow
==13868==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffe4edd5000; bottom 0x7f1c215fe000; size: 0x00e22d7d7000 (971425804288)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 6 test-xbzrle /xbzrle/encode_decode
---
PASS 16 test-vmstate /vmstate/qtailq/save/saveq
PASS 17 test-vmstate /vmstate/qtailq/load/loadq
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-cutils -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-cutils" 
==13879==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-cutils /cutils/parse_uint/null
PASS 2 test-cutils /cutils/parse_uint/empty
PASS 3 test-cutils /cutils/parse_uint/whitespace
---
PASS 133 test-cutils /cutils/strtosz/erange
PASS 134 test-cutils /cutils/strtosz/metric
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-shift128 -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-shift128" 
==13879==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffd72179000; bottom 0x7f16af7fe000; size: 0x00e6c297b000 (991107198976)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 1 test-shift128 /host-utils/test_lshift
---
PASS 9 test-int128 /int128/int128_gt
PASS 10 test-int128 /int128/int128_rshift
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/rcutorture -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="rcutorture" 
==13906==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==13906==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffd5dfb0000; bottom 0x7f4068bfe000; size: 0x00bcf53b2000 (811568144384)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 1 rcutorture /rcu/torture/1reader
PASS 12 ahci-test /x86_64/ahci/io/pio/lba28/double/zero
==13939==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==13939==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffc3cfeb000; bottom 0x7fadbcbfe000; size: 0x004e803ed000 (337159049216)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 2 rcutorture /rcu/torture/10readers
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-rcu-list -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-rcu-list" 
PASS 13 ahci-test /x86_64/ahci/io/pio/lba28/double/low
==13952==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==13952==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7fff75158000; bottom 0x7fbef65fe000; size: 0x00407eb5a000 (277003739136)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 1 test-rcu-list /rcu/qlist/single-threaded
PASS 14 ahci-test /x86_64/ahci/io/pio/lba28/double/high
==13964==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 2 test-rcu-list /rcu/qlist/short-few
==13964==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7fffce48e000; bottom 0x7fc354924000; size: 0x003c79b6a000 (259740049408)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 15 ahci-test /x86_64/ahci/io/pio/lba28/long/zero
==13991==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==13991==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffeecff6000; bottom 0x7f82a7ffe000; size: 0x007c44ff8000 (533733539840)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 3 test-rcu-list /rcu/qlist/long-many
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-rcu-simpleq -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-rcu-simpleq" 
PASS 16 ahci-test /x86_64/ahci/io/pio/lba28/long/low
==14004==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==14004==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffefd34a000; bottom 0x7f0a3857c000; size: 0x00f4c4dce000 (1051274829824)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 1 test-rcu-simpleq /rcu/qsimpleq/single-threaded
PASS 17 ahci-test /x86_64/ahci/io/pio/lba28/long/high
==14016==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 2 test-rcu-simpleq /rcu/qsimpleq/short-few
PASS 18 ahci-test /x86_64/ahci/io/pio/lba28/short/zero
==14043==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 19 ahci-test /x86_64/ahci/io/pio/lba28/short/low
==14049==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 3 test-rcu-simpleq /rcu/qsimpleq/long-many
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-rcu-tailq -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-rcu-tailq" 
PASS 20 ahci-test /x86_64/ahci/io/pio/lba28/short/high
==14062==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==14062==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffc70bdd000; bottom 0x7f3ff7bfe000; size: 0x00bc78fdf000 (809483759616)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 1 test-rcu-tailq /rcu/qtailq/single-threaded
PASS 21 ahci-test /x86_64/ahci/io/pio/lba48/simple/zero
==14074==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==14074==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffd77fbe000; bottom 0x7fbbdfffe000; size: 0x004197fc0000 (281722748928)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 2 test-rcu-tailq /rcu/qtailq/short-few
PASS 22 ahci-test /x86_64/ahci/io/pio/lba48/simple/low
==14101==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==14101==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffff9430000; bottom 0x7fa565dfe000; size: 0x005a93632000 (389019803648)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 23 ahci-test /x86_64/ahci/io/pio/lba48/simple/high
PASS 3 test-rcu-tailq /rcu/qtailq/long-many
==14107==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-qdist -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-qdist" 
==14107==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffe0b205000; bottom 0x7f8d229fe000; size: 0x0070e8807000 (484937068544)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 1 test-qdist /qdist/none
---
PASS 8 test-qdist /qdist/binning/shrink
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-qht -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-qht" 
PASS 24 ahci-test /x86_64/ahci/io/pio/lba48/double/zero
==14122==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==14122==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffdafe87000; bottom 0x7fe5ae1fe000; size: 0x001801c89000 (103109136384)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 25 ahci-test /x86_64/ahci/io/pio/lba48/double/low
==14128==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==14128==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffc4b04f000; bottom 0x7f42945fe000; size: 0x00b9b6a51000 (797633220608)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 26 ahci-test /x86_64/ahci/io/pio/lba48/double/high
==14134==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==14134==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7fff7b03e000; bottom 0x7fccb2324000; size: 0x0032c8d1a000 (218117545984)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 27 ahci-test /x86_64/ahci/io/pio/lba48/long/zero
==14140==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==14140==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffcaaf87000; bottom 0x7f7ea3dfe000; size: 0x007e07189000 (541284929536)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 28 ahci-test /x86_64/ahci/io/pio/lba48/long/low
==14146==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==14146==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7fff504b5000; bottom 0x7f56d77fe000; size: 0x00a878cb7000 (723581104128)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 29 ahci-test /x86_64/ahci/io/pio/lba48/long/high
==14152==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 30 ahci-test /x86_64/ahci/io/pio/lba48/short/zero
==14158==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 31 ahci-test /x86_64/ahci/io/pio/lba48/short/low
==14164==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 32 ahci-test /x86_64/ahci/io/pio/lba48/short/high
==14170==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 33 ahci-test /x86_64/ahci/io/dma/lba28/fragmented
==14176==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 34 ahci-test /x86_64/ahci/io/dma/lba28/retry
==14182==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-qht /qht/mode/default
PASS 35 ahci-test /x86_64/ahci/io/dma/lba28/simple/zero
PASS 2 test-qht /qht/mode/resize
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-qht-par -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-qht-par" 
==14188==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 36 ahci-test /x86_64/ahci/io/dma/lba28/simple/low
PASS 1 test-qht-par /qht/parallel/2threads-0%updates-1s
==14204==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 37 ahci-test /x86_64/ahci/io/dma/lba28/simple/high
PASS 2 test-qht-par /qht/parallel/2threads-20%updates-1s
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-bitops -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-bitops" 
---
PASS 5 test-bitops /bitops/half_unshuffle32
PASS 6 test-bitops /bitops/half_unshuffle64
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-bitcnt -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-bitcnt" 
==14217==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-bitcnt /bitcnt/ctpop8
PASS 2 test-bitcnt /bitcnt/ctpop16
PASS 3 test-bitcnt /bitcnt/ctpop32
---
PASS 1 check-qom-interface /qom/interface/direct_impl
PASS 2 check-qom-interface /qom/interface/intermediate_impl
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/check-qom-proplist -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="check-qom-proplist" 
==14248==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 check-qom-proplist /qom/proplist/createlist
PASS 2 check-qom-proplist /qom/proplist/createv
PASS 3 check-qom-proplist /qom/proplist/createcmdline
---
PASS 4 test-write-threshold /write-threshold/not-trigger
PASS 5 test-write-threshold /write-threshold/trigger
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-crypto-hash -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-crypto-hash" 
==14266==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-crypto-hash /crypto/hash/iov
PASS 2 test-crypto-hash /crypto/hash/alloc
PASS 3 test-crypto-hash /crypto/hash/prealloc
---
PASS 15 test-crypto-secret /crypto/secret/crypt/missingiv
PASS 16 test-crypto-secret /crypto/secret/crypt/badiv
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-crypto-tlscredsx509 -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-crypto-tlscredsx509" 
==14297==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/perfectserver
PASS 2 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/perfectclient
PASS 41 ahci-test /x86_64/ahci/io/dma/lba28/long/zero
PASS 3 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/goodca1
==14307==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 42 ahci-test /x86_64/ahci/io/dma/lba28/long/low
PASS 4 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/goodca2
PASS 5 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/goodca3
---
PASS 7 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/badca2
PASS 8 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/badca3
PASS 9 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/goodserver1
==14313==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 10 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/goodserver2
PASS 43 ahci-test /x86_64/ahci/io/dma/lba28/long/high
==14319==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 11 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/goodserver3
PASS 44 ahci-test /x86_64/ahci/io/dma/lba28/short/zero
==14325==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 45 ahci-test /x86_64/ahci/io/dma/lba28/short/low
PASS 12 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/goodserver4
==14331==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 13 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/goodserver5
PASS 46 ahci-test /x86_64/ahci/io/dma/lba28/short/high
PASS 14 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/goodserver6
---
PASS 32 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/inactive1
PASS 33 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/inactive2
PASS 34 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/inactive3
==14337==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 35 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/chain1
PASS 36 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/chain2
PASS 37 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/missingca
---
PASS 39 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/missingclient
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-crypto-tlssession -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-crypto-tlssession" 
PASS 47 ahci-test /x86_64/ahci/io/dma/lba48/simple/zero
==14348==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-crypto-tlssession /qcrypto/tlssession/psk
PASS 48 ahci-test /x86_64/ahci/io/dma/lba48/simple/low
==14354==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 2 test-crypto-tlssession /qcrypto/tlssession/basicca
PASS 49 ahci-test /x86_64/ahci/io/dma/lba48/simple/high
PASS 3 test-crypto-tlssession /qcrypto/tlssession/differentca
==14360==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 50 ahci-test /x86_64/ahci/io/dma/lba48/double/zero
PASS 4 test-crypto-tlssession /qcrypto/tlssession/altname1
==14366==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 51 ahci-test /x86_64/ahci/io/dma/lba48/double/low
PASS 5 test-crypto-tlssession /qcrypto/tlssession/altname2
PASS 6 test-crypto-tlssession /qcrypto/tlssession/altname3
==14372==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 7 test-crypto-tlssession /qcrypto/tlssession/altname4
PASS 52 ahci-test /x86_64/ahci/io/dma/lba48/double/high
==14378==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 53 ahci-test /x86_64/ahci/io/dma/lba48/long/zero
PASS 8 test-crypto-tlssession /qcrypto/tlssession/altname5
PASS 9 test-crypto-tlssession /qcrypto/tlssession/altname6
PASS 10 test-crypto-tlssession /qcrypto/tlssession/wildcard1
==14384==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 11 test-crypto-tlssession /qcrypto/tlssession/wildcard2
PASS 12 test-crypto-tlssession /qcrypto/tlssession/wildcard3
PASS 54 ahci-test /x86_64/ahci/io/dma/lba48/long/low
==14390==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 13 test-crypto-tlssession /qcrypto/tlssession/wildcard4
PASS 14 test-crypto-tlssession /qcrypto/tlssession/wildcard5
PASS 55 ahci-test /x86_64/ahci/io/dma/lba48/long/high
PASS 15 test-crypto-tlssession /qcrypto/tlssession/wildcard6
PASS 16 test-crypto-tlssession /qcrypto/tlssession/cachain
==14396==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-qga -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-qga" 
PASS 56 ahci-test /x86_64/ahci/io/dma/lba48/short/zero
PASS 1 test-qga /qga/sync-delimited
---
PASS 7 test-qga /qga/get-fsinfo
PASS 8 test-qga /qga/get-memory-block-info
PASS 9 test-qga /qga/get-memory-blocks
==14409==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 10 test-qga /qga/file-ops
PASS 11 test-qga /qga/file-write-read
PASS 12 test-qga /qga/get-time
---
PASS 57 ahci-test /x86_64/ahci/io/dma/lba48/short/low
PASS 18 test-qga /qga/blacklist
PASS 19 test-qga /qga/config
==14416==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 20 test-qga /qga/guest-exec
PASS 21 test-qga /qga/guest-exec-invalid
PASS 58 ahci-test /x86_64/ahci/io/dma/lba48/short/high
==14429==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 22 test-qga /qga/guest-get-osinfo
PASS 23 test-qga /qga/guest-get-host-name
PASS 24 test-qga /qga/guest-get-timezone
---
PASS 5 test-authz-list /auth/list/explicit/deny
PASS 6 test-authz-list /auth/list/explicit/allow
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-authz-listfile -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-authz-listfile" 
==14453==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-authz-listfile /auth/list/complex
PASS 2 test-authz-listfile /auth/list/default/deny
PASS 3 test-authz-listfile /auth/list/default/allow
---
PASS 4 test-io-channel-file /io/channel/pipe/sync
PASS 5 test-io-channel-file /io/channel/pipe/async
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-io-channel-tls -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-io-channel-tls" 
==14524==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-io-channel-tls /qio/channel/tls/basic
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-io-channel-command -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-io-channel-command" 
PASS 61 ahci-test /x86_64/ahci/flush/simple
---
PASS 4 test-io-channel-command /io/channel/command/echo/async
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-io-channel-buffer -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-io-channel-buffer" 
PASS 1 test-io-channel-buffer /io/channel/buf
==14549==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-base64 -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-base64" 
PASS 1 test-base64 /util/base64/good
PASS 2 test-base64 /util/base64/embedded-nul
---
PASS 17 test-crypto-xts /crypto/xts/t-21-key-32-ptx-31/basic
PASS 18 test-crypto-xts /crypto/xts/t-21-key-32-ptx-31/unaligned
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-crypto-block -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-crypto-block" 
==14576==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-crypto-block /crypto/block/qcow
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-logging -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-logging" 
PASS 1 test-logging /logging/parse_range
PASS 2 test-logging /logging/parse_path
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-replication -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-replication" 
==14602==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==14600==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-replication /replication/primary/read
PASS 2 test-replication /replication/primary/write
PASS 3 test-replication /replication/primary/start
---
PASS 6 test-replication /replication/primary/get_error_all
PASS 63 ahci-test /x86_64/ahci/flush/migrate
PASS 7 test-replication /replication/secondary/read
==14614==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==14619==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 8 test-replication /replication/secondary/write
PASS 64 ahci-test /x86_64/ahci/migrate/sanity
==14628==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==14633==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==14602==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffe1fdde000; bottom 0x7fbea23fc000; size: 0x003f7d9e2000 (272690454528)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 9 test-replication /replication/secondary/start
PASS 65 ahci-test /x86_64/ahci/migrate/dma/simple
==14662==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==14667==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 10 test-replication /replication/secondary/stop
PASS 66 ahci-test /x86_64/ahci/migrate/dma/halted
==14676==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 11 test-replication /replication/secondary/do_checkpoint
==14682==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 12 test-replication /replication/secondary/get_error_all
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-bufferiszero -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-bufferiszero" 
PASS 67 ahci-test /x86_64/ahci/migrate/ncq/simple
==14695==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==14700==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 68 ahci-test /x86_64/ahci/migrate/ncq/halted
==14709==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 69 ahci-test /x86_64/ahci/cdrom/eject
==14714==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 70 ahci-test /x86_64/ahci/cdrom/dma/single
==14720==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 71 ahci-test /x86_64/ahci/cdrom/dma/multi
==14726==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 72 ahci-test /x86_64/ahci/cdrom/pio/single
==14732==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==14732==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffc8ec89000; bottom 0x7f701198a000; size: 0x008c7d2ff000 (603395715072)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 73 ahci-test /x86_64/ahci/cdrom/pio/multi
==14738==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 74 ahci-test /x86_64/ahci/cdrom/pio/bcl
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img tests/hd-geo-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="hd-geo-test" 
PASS 1 hd-geo-test /x86_64/hd-geo/ide/none
==14752==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 2 hd-geo-test /x86_64/hd-geo/ide/drive/cd_0
==14758==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 3 hd-geo-test /x86_64/hd-geo/ide/drive/mbr/blank
==14764==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 4 hd-geo-test /x86_64/hd-geo/ide/drive/mbr/lba
==14770==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 5 hd-geo-test /x86_64/hd-geo/ide/drive/mbr/chs
==14776==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 6 hd-geo-test /x86_64/hd-geo/ide/device/mbr/blank
==14782==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 7 hd-geo-test /x86_64/hd-geo/ide/device/mbr/lba
==14788==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 8 hd-geo-test /x86_64/hd-geo/ide/device/mbr/chs
==14794==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 9 hd-geo-test /x86_64/hd-geo/ide/device/user/chs
==14799==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 10 hd-geo-test /x86_64/hd-geo/ide/device/user/chst
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img tests/boot-order-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="boot-order-test" 
PASS 1 test-bufferiszero /cutils/bufferiszero
---
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==14884==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!

Looking for expected file 'tests/data/acpi/pc/FACP'
Using expected file 'tests/data/acpi/pc/FACP'
---
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==14890==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!

Looking for expected file 'tests/data/acpi/q35/FACP'
Using expected file 'tests/data/acpi/q35/FACP'
---
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==14896==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!

Looking for expected file 'tests/data/acpi/pc/FACP.bridge'
Looking for expected file 'tests/data/acpi/pc/FACP'
---
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==14902==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!

Looking for expected file 'tests/data/acpi/pc/FACP.ipmikcs'
Looking for expected file 'tests/data/acpi/pc/FACP'
---
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==14908==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!

Looking for expected file 'tests/data/acpi/pc/FACP.cphp'
Looking for expected file 'tests/data/acpi/pc/FACP'
---
Looking for expected file 'tests/data/acpi/pc/HMAT.cphp'
Looking for expected file 'tests/data/acpi/pc/HMAT'
**
ERROR:/tmp/qemu-test/src/tests/bios-tables-test.c:327:load_expected_aml: assertion failed: (exp_sdt.aml_file)
ERROR - Bail out! ERROR:/tmp/qemu-test/src/tests/bios-tables-test.c:327:load_expected_aml: assertion failed: (exp_sdt.aml_file)
make: *** [/tmp/qemu-test/src/tests/Makefile.include:899: check-qtest-x86_64] Error 1
make: *** Waiting for unfinished jobs....
Traceback (most recent call last):


The full log is available at
http://patchew.org/logs/20190809065731.9097-1-tao3.xu@intel.com/testing.asan/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH v9 09/11] numa: Extend the CLI to provide memory latency and bandwidth information
  2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 09/11] numa: Extend the CLI to provide memory latency and bandwidth information Tao
@ 2019-08-12  5:13   ` Daniel Black
  2019-08-12  6:11     ` Tao Xu
  2019-08-13 15:11   ` Eric Blake
  1 sibling, 1 reply; 34+ messages in thread
From: Daniel Black @ 2019-08-12  5:13 UTC (permalink / raw)
  To: Tao
  Cc: ehabkost, jingqi.liu, fan.du, qemu-devel, jonathan.cameron,
	imammedo, dan.j.williams



Tao Xu, Liu Jingqi,

Thanks for doing these updates.

On Fri,  9 Aug 2019 14:57:29 +0800
Tao <tao3.xu@intel.com> wrote:

> From: Liu Jingqi <jingqi.liu@intel.com>
> 
> Add -numa hmat-lb option to provide System Locality Latency and
> Bandwidth Information. These memory attributes help to build
> System Locality Latency and Bandwidth Information Structure(s)
> in ACPI Heterogeneous Memory Attribute Table (HMAT).
> 
> Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
> Signed-off-by: Tao Xu <tao3.xu@intel.com>
> ---
>  hw/acpi/hmat.h        |   3 +
>  hw/core/numa.c        | 185 ++++++++++++++++++++++++++++++++++++++++++
>  include/sysemu/numa.h |   2 +
>  qapi/machine.json     |  95 +++++++++++++++++++++-
>  qemu-options.hx       |  44 +++++++++-
>  5 files changed, 326 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/acpi/hmat.h b/hw/acpi/hmat.h
> index 6c32f12e78..b7c1e02cf0 100644
> --- a/hw/acpi/hmat.h
> +++ b/hw/acpi/hmat.h
> @@ -42,6 +42,9 @@
>  
>  #define HMAT_IS_LATENCY(type) (type <= HMAT_LB_DATA_WRITE_LATENCY)
>  
> +#define PICO_PER_USEC 1000000
> +#define PICO_PER_NSEC 1000
> +
>  struct HMAT_LB_Info {
>      /*
>       * Indicates total number of Proximity Domains
> diff --git a/hw/core/numa.c b/hw/core/numa.c
> index cfb6339810..9a494145f3 100644
> --- a/hw/core/numa.c
> +++ b/hw/core/numa.c
> @@ -37,6 +37,7 @@
>  #include "qemu/option.h"
>  #include "qemu/config-file.h"
>  #include "qemu/cutils.h"
> +#include "hw/acpi/hmat.h"
>  
>  QemuOptsList qemu_numa_opts = {
>      .name = "numa",
> @@ -183,6 +184,184 @@ void parse_numa_distance(MachineState *ms, NumaDistOptions *dist, Error **errp)
>      ms->numa_state->have_numa_distance = true;
>  }
>  
> +void parse_numa_hmat_lb(MachineState *ms, NumaHmatLBOptions *node,
> +                        Error **errp)
> +{
..

Optional; you could support not connected (0xffff) for latency/bandwidth in
this parsing.

> +        if (*endptr == '\0') {
> +            base_lat = 1;
> +        } else if (*(endptr + 1) == 's') {
> +            switch (*endptr) {
> +            case 'p':
> +                base_lat = 1;
> +                break;
> +            case 'n':
> +                base_lat = PICO_PER_NSEC;
> +                break;
> +            case 'u':

Glad you picked up my mismatch of "u/micro".

> +        } else {
> +            error_setg(errp, "Invalid latency unit %s,"
> +                "vaild units are \"ps\" \"ns\" \"us\"", node->latency);

typo "valid"

> +        } else if (hmat_lb->base_lat != base_lat) {
> +            error_setg(errp, "Invalid latency unit %s,"
> +                " please unify the units.", node->latency);

This error is misleading. Should be something like "all latencies must be
specified in the same units"
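For reference, the latency suffix handling discussed above can be sketched as a standalone C function. This is a minimal sketch, not the patch's code: `parse_latency` is a hypothetical helper, and it assumes, as the patch does, that a bare number means picoseconds. It also rejects trailing characters after the two-letter suffix, which the quoted patch may or may not do.

```c
#include <stdint.h>
#include <stdlib.h>

/* Picoseconds per unit, mirroring PICO_PER_NSEC / PICO_PER_USEC in the patch. */
#define PICO_PER_NSEC UINT64_C(1000)
#define PICO_PER_USEC UINT64_C(1000000)

/*
 * Hypothetical helper: split "NUM[ps|ns|us]" into a count and its unit in
 * picoseconds. A bare number is taken as picoseconds.
 * Returns 0 on success, -1 on a missing number or an unknown suffix.
 */
static int parse_latency(const char *str, uint64_t *value, uint64_t *unit_ps)
{
    char *endptr;

    *value = strtoull(str, &endptr, 10);
    if (endptr == str) {
        return -1;                  /* no digits */
    }
    if (*endptr == '\0') {
        *unit_ps = 1;               /* bare number: picoseconds */
        return 0;
    }
    /* The suffix must be exactly one letter followed by 's'. */
    if (endptr[1] != 's' || endptr[2] != '\0') {
        return -1;
    }
    switch (endptr[0]) {
    case 'p':
        *unit_ps = 1;
        break;
    case 'n':
        *unit_ps = PICO_PER_NSEC;
        break;
    case 'u':
        *unit_ps = PICO_PER_USEC;
        break;
    default:
        return -1;
    }
    return 0;
}
```

With this split, "123us" yields a count of 123 and a unit of 1000000 ps, which matches the Base:1000000 the kernel reports in the test log further down.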

> +        switch (toupper(*endptr)) {
> +        case '\0':
> +        case 'M':
> +            base_bw = 1;
> +            break;
> +        case 'G':
> +            base_bw = UINT64_C(1) << 10;
> +            break;

There was one more gap: Tera.

        case 'T':
           base_bw = UINT64_C(1) << 20;
           break;

> +        case 'P':
> +            base_bw = UINT64_C(1) << 20;
and:
               base_bw = UINT64_C(1) << 30;

> +            break;
> +        }


Currently Linux 5.3.0-rc3+ doesn't cope with a correctly scaled "bandwidth=2P", so
it may not be worth it:

[    2.092060] HMAT: Locality: Flags:00 Type:Access Bandwidth Initiator Domains:1 Target Domains:2 Base:1073741824
[    2.092326]   Initiator-Target[0-0]:-2147483648 MB/s

Values also need to be checked for overflow, e.g.:

 -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=4096T

[    2.047676] HMAT: Locality: Flags:00 Type:Access Bandwidth Initiator Domains:1 Target Domains:2 Base:1048576
[    2.048084]   Initiator-Target[0-0]:0 MB/s

Technically ACPI could support up to 4P with base/offset but you'd need to be a
lot trickier (i.e. base is highest common multiple of all entries and then see
if entry/base > 2^32-2 ) with base/entry values to arrive at this number.

This limitation should also be propagated to the docs and the commit message.
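A minimal sketch of bandwidth parsing with the corrected multipliers (including T) and an explicit overflow check. `parse_bandwidth_mbs` and its `limit` parameter are hypothetical; the limit is left to the caller because the right bound depends on the consumer (the kernel output above suggests 32-bit arithmetic somewhere). Like the quoted patch, only the first suffix character is inspected.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse "NUM[M|G|T|P]" into MB/s using the multipliers
 * suggested in the review (M = 1, G = 2^10, T = 2^20, P = 2^30).
 * A bare number is taken as MB/s. Only the first suffix character is
 * inspected. Returns false on an unknown suffix, on 64-bit overflow, or if
 * the result exceeds the caller-supplied limit.
 */
static bool parse_bandwidth_mbs(const char *str, uint64_t limit, uint64_t *mbs)
{
    char *endptr;
    uint64_t value = strtoull(str, &endptr, 10);
    uint64_t base;

    if (endptr == str) {
        return false;               /* no digits */
    }
    switch (toupper((unsigned char)*endptr)) {
    case '\0':
    case 'M':
        base = 1;
        break;
    case 'G':
        base = UINT64_C(1) << 10;
        break;
    case 'T':
        base = UINT64_C(1) << 20;
        break;
    case 'P':
        base = UINT64_C(1) << 30;
        break;
    default:
        return false;
    }
    /* Check the multiplication itself before comparing against the limit. */
    if (value > UINT64_MAX / base) {
        return false;
    }
    *mbs = value * base;
    return *mbs <= limit;
}
```

With `limit = UINT32_MAX`, "bandwidth=4096T" (2^32 MB/s) is rejected instead of silently wrapping to 0 as in the log above.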


> +        } else if (hmat_lb->base_lat != base_lat) {

Bug: Incorrectly copied - base_lat should be base_bw (twice)

> +            error_setg(errp, "Invalid bandwidth unit %s,"
> +                " please unify the units.", node->bandwidth);

This error is misleading. Should be something like "all bandwidths must be
specified in the same units"
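The base_lat/base_bw mix-up is easier to avoid if the field being compared is selected in one place. A minimal sketch; `struct lb_units` and `record_unit` are hypothetical stand-ins for the unit state the patch keeps in HMAT_LB_Info, and the error text follows the wording suggested above.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the unit state kept in HMAT_LB_Info; 0 = unset. */
struct lb_units {
    uint64_t base_lat;  /* picoseconds per latency unit */
    uint64_t base_bw;   /* MB/s per bandwidth unit */
};

/*
 * Record the unit of a newly parsed hmat-lb entry, or reject it if it differs
 * from the unit already in use. The compared field is chosen in exactly one
 * place, so latency and bandwidth units cannot be mixed up.
 */
static int record_unit(struct lb_units *u, bool is_latency, uint64_t base)
{
    uint64_t *stored = is_latency ? &u->base_lat : &u->base_bw;

    if (*stored == 0) {
        *stored = base;             /* first entry fixes the unit */
        return 0;
    }
    if (*stored != base) {
        fprintf(stderr, "all %s must be specified in the same units\n",
                is_latency ? "latencies" : "bandwidths");
        return -1;
    }
    return 0;
}
```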

> diff --git a/qemu-options.hx b/qemu-options.hx
> index c480781992..cda4607f3a 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx

> +@example
> +-m 2G \
> +-object memory-backend-ram,size=1024M,policy=bind,host-nodes=0,id=ram-node0 -numa node,nodeid=0,memdev=ram-node0 \
> +-object memory-backend-ram,size=1024M,policy=bind,host-nodes=1,id=ram-node1 -numa node,nodeid=1,memdev=ram-node1 \
> +-smp 2 \
> +-numa cpu,node-id=0,socket-id=0 \
> +-numa cpu,node-id=0,socket-id=1 \
> +-numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=5ns \
> +-numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M \
> +-numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,latency=10ns \
> +-numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=100M \
> +@end example

nit: remove the backslash on the last line

Is this a valid example? I get

qemu-system-x86_64: -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,latency=11us: Invalid target=1, it hasn't a valid initiator proximity domain.

(I tested with host-nodes=1 changed to 0 as local machine is single node)

Technically on [PATCH v9 07/11]
diff --git a/hw/acpi/hmat.c b/hw/acpi/hmat.c
index abf99b1adc..431818dc82 100644
--- a/hw/acpi/hmat.c
+++ b/hw/acpi/hmat.c
@@ -67,11 +67,81 @@ static void build_hmat_mpda(GArray *table_data, uint16_t flags, int initiator,
     build_append_int_noprefix(table_data, 0, 8);
 }
 
+/*
+ * ACPI 6.3: 5.2.27.4 System Locality Latency and Bandwidth Information
+ * Structure: Table 5-142

nit: 5-146

Test as follows:

qemu-system-x86_64   -kernel /home/dan/repos/linux/vmlinux   -nographic -append  console=ttyS0  \
   -m 2G -object memory-backend-ram,size=1024M,policy=bind,host-nodes=0,id=ram-node0 \
   -numa node,nodeid=0,memdev=ram-node0 \
   -object memory-backend-ram,size=1024M,policy=bind,host-nodes=0,id=ram-node1 \
   -numa node,nodeid=1,memdev=ram-node1 -smp 2 -numa cpu,node-id=0,socket-id=0 \
   -numa cpu,node-id=0,socket-id=1  \
   -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=123us \
   -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M \
   -numa hmat-cache,node-id=0,size=0x20000,total=1,level=1,assoc=direct,policy=write-back,line=8 \
   -numa hmat-cache,node-id=1,size=0x20000,total=1,level=1,assoc=direct,policy=write-back,line=8 \
| grep -A 5 HMAT


[    0.038912] ACPI: HMAT 0x000000007FFE16C5 000118 (v02 BOCHS  BXPCHMAT 00000001 BXPC 00000001)
[    0.040954] SRAT: PXM 0 -> APIC 0x00 -> Node 0
[    0.040999] SRAT: PXM 0 -> APIC 0x01 -> Node 0
[    0.041189] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff]
[    0.041250] ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0x3fffffff]
[    0.041276] ACPI: SRAT: Node 1 PXM 1 [mem 0x40000000-0x7fffffff]
--
[    1.984572] HMAT: Memory Flags:0001 Processor Domain:0 Memory Domain:0
[    1.984792] HMAT: Memory Flags:0000 Processor Domain:0 Memory Domain:1
[    1.985435] HMAT: Locality: Flags:00 Type:Access Latency Initiator Domains:1 Target Domains:2 Base:1000000
[    1.986424]   Initiator-Target[0-0]:123000 nsec
[    1.986664]   Initiator-Target[0-1]:0 nsec
[    1.986910] HMAT: Locality: Flags:00 Type:Access Bandwidth Initiator Domains:1 Target Domains:2 Base:1
[    1.987229]   Initiator-Target[0-0]:200 MB/s
[    1.987356]   Initiator-Target[0-1]:0 MB/s
[    1.987549] HMAT: Cache: Domain:0 Size:131072 Attrs:00081111 SMBIOS Handles:0
[    1.988393] HMAT: Cache: Domain:1 Size:131072 Attrs:00081111 SMBIOS Handles:0

Leaving the default latency/bandwidth as 0 is OK, as the spec says '0: the corresponding latency
or bandwidth information is not provided.' The kernel could potentially display this better.
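To make the special entry values concrete, here is a hedged sketch of how a consumer might decode one matrix entry. It assumes the ACPI 6.3 layout of 16-bit entries scaled by a 64-bit Entry Base Unit; `decode_hmat_entry` and the enum are hypothetical names, and 0xFFFF is treated as "not connected" as mentioned earlier in this review.

```c
#include <stdint.h>

/* Hypothetical classification of one HMAT latency/bandwidth matrix entry. */
enum hmat_entry_kind {
    HMAT_ENTRY_NOT_PROVIDED,    /* entry == 0: information not provided */
    HMAT_ENTRY_NOT_CONNECTED,   /* entry == 0xFFFF: not connected */
    HMAT_ENTRY_OVERFLOW,        /* entry * base does not fit in 64 bits */
    HMAT_ENTRY_VALUE,           /* *value holds entry * base */
};

/*
 * Decode a 16-bit matrix entry scaled by the 64-bit Entry Base Unit
 * (picoseconds for latency, MB/s for bandwidth). *value is written only
 * for HMAT_ENTRY_VALUE.
 */
static enum hmat_entry_kind decode_hmat_entry(uint16_t entry, uint64_t base,
                                              uint64_t *value)
{
    if (entry == 0) {
        return HMAT_ENTRY_NOT_PROVIDED;
    }
    if (entry == 0xFFFF) {
        return HMAT_ENTRY_NOT_CONNECTED;
    }
    if (base > UINT64_MAX / entry) {
        return HMAT_ENTRY_OVERFLOW;
    }
    *value = (uint64_t)entry * base;
    return HMAT_ENTRY_VALUE;
}
```

With the values from the log above (entry 123, base 1000000 ps), the result is 123000000 ps, i.e. the 123000 nsec the kernel reports; the unset [0-1] entries decode as "not provided" rather than as a real 0.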

Also note https://marc.info/?l=linux-acpi&m=156506549410279&w=2, submitted because
hmat_build_table_structs only calls build_hmat_mpda with flags=0 or HMAT_PROX_INIT_VALID (0x1), which is correct according to ACPI 6.3. An Ack (or a Nack if I'm wrong) there would be good, so that the kernel and this patch series work together.

for entire series:

Reviewed-by: Daniel Black <daniel@linux.ibm.com>




* Re: [Qemu-devel] [PATCH v9 09/11] numa: Extend the CLI to provide memory latency and bandwidth information
  2019-08-12  5:13   ` Daniel Black
@ 2019-08-12  6:11     ` Tao Xu
  0 siblings, 0 replies; 34+ messages in thread
From: Tao Xu @ 2019-08-12  6:11 UTC (permalink / raw)
  To: Daniel Black
  Cc: ehabkost, jingqi.liu, fan.du, qemu-devel, jonathan.cameron,
	imammedo, dan.j.williams

On 8/12/2019 1:13 PM, Daniel Black wrote:
> 
> 
> Tao Xu, Liu Jingqi,
> 
> Thanks for doing these updates.
> 
> On Fri,  9 Aug 2019 14:57:29 +0800
> Tao <tao3.xu@intel.com> wrote:
> 
>> From: Liu Jingqi <jingqi.liu@intel.com>
>>
>> Add -numa hmat-lb option to provide System Locality Latency and
>> Bandwidth Information. These memory attributes help to build
>> System Locality Latency and Bandwidth Information Structure(s)
>> in ACPI Heterogeneous Memory Attribute Table (HMAT).
>>
>> Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
>> Signed-off-by: Tao Xu <tao3.xu@intel.com>
>> ---
>>   hw/acpi/hmat.h        |   3 +
>>   hw/core/numa.c        | 185 ++++++++++++++++++++++++++++++++++++++++++
>>   include/sysemu/numa.h |   2 +
>>   qapi/machine.json     |  95 +++++++++++++++++++++-
>>   qemu-options.hx       |  44 +++++++++-
>>   5 files changed, 326 insertions(+), 3 deletions(-)
>>
>> diff --git a/hw/acpi/hmat.h b/hw/acpi/hmat.h
>> index 6c32f12e78..b7c1e02cf0 100644
>> --- a/hw/acpi/hmat.h
>> +++ b/hw/acpi/hmat.h
>> @@ -42,6 +42,9 @@
>>   
>>   #define HMAT_IS_LATENCY(type) (type <= HMAT_LB_DATA_WRITE_LATENCY)
>>   
>> +#define PICO_PER_USEC 1000000
>> +#define PICO_PER_NSEC 1000
>> +
>>   struct HMAT_LB_Info {
>>       /*
>>        * Indicates total number of Proximity Domains
>> diff --git a/hw/core/numa.c b/hw/core/numa.c
>> index cfb6339810..9a494145f3 100644
>> --- a/hw/core/numa.c
>> +++ b/hw/core/numa.c
>> @@ -37,6 +37,7 @@
>>   #include "qemu/option.h"
>>   #include "qemu/config-file.h"
>>   #include "qemu/cutils.h"
>> +#include "hw/acpi/hmat.h"
>>   
>>   QemuOptsList qemu_numa_opts = {
>>       .name = "numa",
>> @@ -183,6 +184,184 @@ void parse_numa_distance(MachineState *ms, NumaDistOptions *dist, Error **errp)
>>       ms->numa_state->have_numa_distance = true;
>>   }
>>   
>> +void parse_numa_hmat_lb(MachineState *ms, NumaHmatLBOptions *node,
>> +                        Error **errp)
>> +{
> ..
> 
> Optional; you could support not connected (0xffff) for latency/bandwidth in
> this parsing.
> 
>> +        if (*endptr == '\0') {
>> +            base_lat = 1;
>> +        } else if (*(endptr + 1) == 's') {
>> +            switch (*endptr) {
>> +            case 'p':
>> +                base_lat = 1;
>> +                break;
>> +            case 'n':
>> +                base_lat = PICO_PER_NSEC;
>> +                break;
>> +            case 'u':
> 
> Glad you picked up my mismatch of "u/micro".
> 
>> +        } else {
>> +            error_setg(errp, "Invalid latency unit %s,"
>> +                "vaild units are \"ps\" \"ns\" \"us\"", node->latency);
> 
> typo "valid"
> 
>> +        } else if (hmat_lb->base_lat != base_lat) {
>> +            error_setg(errp, "Invalid latency unit %s,"
>> +                " please unify the units.", node->latency);
> 
> This error is misleading. Should be something like "all latencies must be
> specified in the same units"
> 
>> +        switch (toupper(*endptr)) {
>> +        case '\0':
>> +        case 'M':
>> +            base_bw = 1;
>> +            break;
>> +        case 'G':
>> +            base_bw = UINT64_C(1) << 10;
>> +            break;
> 
> There was one more gap - Terra.
> 
>          case 'T':
>             base_bw = UINT64_C(1) << 20;
>             break;
> 
Oh, my mistake, I should use TB/s instead of PB/s.
Thank you for pointing this out.

>> +        case 'P':
>> +            base_bw = UINT64_C(1) << 20;
> and:
>                 base_bw = UINT64_C(1) << 30;
> 
>> +            break;
>> +        }
> 
> 
> Currently Linux 5.3.0-rc3+ doesn't cope with real corrected "bandwidth=2P" so
> maybe not worth it.
>
> [    2.092060] HMAT: Locality: Flags:00 Type:Access Bandwidth Initiator Domains:1 Target Domains:2 Base:1073741824
> [    2.092326]   Initiator-Target[0-0]:-2147483648 MB/s
> 
> On values, testing for overflow is required. e.g:
> 
>   -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=4096T
> 
> [    2.047676] HMAT: Locality: Flags:00 Type:Access Bandwidth Initiator Domains:1 Target Domains:2 Base:1048576
> [    2.048084]   Initiator-Target[0-0]:0 MB/s
> 
> Technically ACPI could support up to 4P with base/offset but you'd need to be a
> lot trickier (i.e. base is highest common multiple of all entries and then see
> if entry/base > 2^32-2 ) with base/entry values to arrive at this number.
> 
> +docs/commit message propagation of this.
>

I agree. I also tested the overflow case. Thank you for your suggestion; I
will add documentation for it.
> 
>> +        } else if (hmat_lb->base_lat != base_lat) {
> 
> Bug: Incorrectly copied - base_lat should be base_bw (twice)
> 

My mistake, I will correct it.
>> +            error_setg(errp, "Invalid bandwidth unit %s,"
>> +                " please unify the units.", node->bandwidth);
> 
> This error is misleading. Should be something like "all bandwidths must be
> specified in the same units"
> 
>> diff --git a/qemu-options.hx b/qemu-options.hx
>> index c480781992..cda4607f3a 100644
>> --- a/qemu-options.hx
>> +++ b/qemu-options.hx
> 
>> +@example
>> +-m 2G \
>> +-object memory-backend-ram,size=1024M,policy=bind,host-nodes=0,id=ram-node0 -numa node,nodeid=0,memdev=ram-node0 \
>> +-object memory-backend-ram,size=1024M,policy=bind,host-nodes=1,id=ram-node1 -numa node,nodeid=1,memdev=ram-node1 \
>> +-smp 2 \
>> +-numa cpu,node-id=0,socket-id=0 \
>> +-numa cpu,node-id=0,socket-id=1 \
>> +-numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=5ns \
>> +-numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M \
>> +-numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,latency=10ns \
>> +-numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=100M \
>> +@end example
> 
> nit: remove slash on last line
> 
> Is this a valid example? I get
> 
> qemu-system-x86_64: -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,latency=11us: Invalid target=1, it hasn't a valid initiator proximity domain.
> 
> (I tested with host-nodes=1 changed to 0 as local machine is single node)
> 

I forgot to update the example. It should be changed as follows:

   -numa node,nodeid=0,memdev=ram-node0 \
   -numa node,nodeid=1,memdev=ram-node1,initiator=1 \

> Technically on [PATCH v9 07/11]
> diff --git a/hw/acpi/hmat.c b/hw/acpi/hmat.c
> index abf99b1adc..431818dc82 100644
> --- a/hw/acpi/hmat.c
> +++ b/hw/acpi/hmat.c
> @@ -67,11 +67,81 @@ static void build_hmat_mpda(GArray *table_data, uint16_t flags, int initiator,
>       build_append_int_noprefix(table_data, 0, 8);
>   }
>   
> +/*
> + * ACPI 6.3: 5.2.27.4 System Locality Latency and Bandwidth Information
> + * Structure: Table 5-142
> 
> nit: 5-146
> 
> Test as follows:
> 
> qemu-system-x86_64   -kernel /home/dan/repos/linux/vmlinux   -nographic -append  console=ttyS0  \
>     -m 2G -object memory-backend-ram,size=1024M,policy=bind,host-nodes=0,id=ram-node0 \
>     -numa node,nodeid=0,memdev=ram-node0 \
>     -object memory-backend-ram,size=1024M,policy=bind,host-nodes=0,id=ram-node1 \
>     -numa node,nodeid=1,memdev=ram-node1 -smp 2 -numa cpu,node-id=0,socket-id=0 \
>     -numa cpu,node-id=0,socket-id=1  \
>     -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=123us \
>     -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M \
>     -numa hmat-cache,node-id=0,size=0x20000,total=1,level=1,assoc=direct,policy=write-back,line=8 \
>     -numa hmat-cache,node-id=1,size=0x20000,total=1,level=1,assoc=direct,policy=write-back,line=8 \
> | grep -A 5 HMAT
> 
> 
> [    0.038912] ACPI: HMAT 0x000000007FFE16C5 000118 (v02 BOCHS  BXPCHMAT 00000001 BXPC 00000001)
> [    0.040954] SRAT: PXM 0 -> APIC 0x00 -> Node 0
> [    0.040999] SRAT: PXM 0 -> APIC 0x01 -> Node 0
> [    0.041189] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff]
> [    0.041250] ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0x3fffffff]
> [    0.041276] ACPI: SRAT: Node 1 PXM 1 [mem 0x40000000-0x7fffffff]
> --
> [    1.984572] HMAT: Memory Flags:0001 Processor Domain:0 Memory Domain:0
> [    1.984792] HMAT: Memory Flags:0000 Processor Domain:0 Memory Domain:1
> [    1.985435] HMAT: Locality: Flags:00 Type:Access Latency Initiator Domains:1 Target Domains:2 Base:1000000
> [    1.986424]   Initiator-Target[0-0]:123000 nsec
> [    1.986664]   Initiator-Target[0-1]:0 nsec
> [    1.986910] HMAT: Locality: Flags:00 Type:Access Bandwidth Initiator Domains:1 Target Domains:2 Base:1
> [    1.987229]   Initiator-Target[0-0]:200 MB/s
> [    1.987356]   Initiator-Target[0-1]:0 MB/s
> [    1.987549] HMAT: Cache: Domain:0 Size:131072 Attrs:00081111 SMBIOS Handles:0
> [    1.988393] HMAT: Cache: Domain:1 Size:131072 Attrs:00081111 SMBIOS Handles:0
> 
> Leaving default latency/bw as 0 if ok as spec says '0: the corresponding latency
> or bandwidth information is not provided.'. Potentially the kernel could display this better.

Yes, we set the default to 0 because the spec says so. The kernel code gives
no warning and just shows 0, but in qemu-options.hx we note this:

if NUM is 0, means the corresponding latency or bandwidth information is 
not provided.

> 
> Also note https://marc.info/?l=linux-acpi&m=156506549410279&w=2 submitted as
> hmat_build_table_structs only calls build_hmat_mpda with flags=0 or HMAT_PROX_INIT_VALID (0x1) which is right looking at ACPI-6.3. An Ack/(Nack if I'm wrong) there would be good to have both kernel and this patch series working together.

Sounds good!
> 
> for entire series:
> 
> Reviewed-by: Daniel Black <daniel@linux.ibm.com>
> 




* Re: [Qemu-devel] [PATCH v9 00/11] Build ACPI Heterogeneous Memory Attribute Table (HMAT)
  2019-08-09  6:57 [Qemu-devel] [PATCH v9 00/11] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao
                   ` (11 preceding siblings ...)
  2019-08-09 11:11 ` [Qemu-devel] [PATCH v9 00/11] Build ACPI Heterogeneous Memory Attribute Table (HMAT) no-reply
@ 2019-08-13  8:53 ` Tao Xu
  2019-08-14 20:57   ` Eduardo Habkost
  12 siblings, 1 reply; 34+ messages in thread
From: Tao Xu @ 2019-08-13  8:53 UTC (permalink / raw)
  To: imammedo, eblake, ehabkost
  Cc: jingqi.liu, fan.du, qemu-devel, daniel, jonathan.cameron, dan.j.williams

Hi Igor and Eduardo,

I am wondering if there are more comments on patches 1/11~4/11. These 4
patches are independent, and the series is big and has been under review
for a long time. Could patches 1/11~4/11 be queued first?

If there is anything else I need to do for this series, please tell me.

Tao

On 8/9/2019 2:57 PM, Tao wrote:
> This series of patches will build Heterogeneous Memory Attribute Table (HMAT)
> according to the command line. The ACPI HMAT describes the memory attributes,
> such as memory side cache attributes and bandwidth and latency details,
> related to the Memory Proximity Domain.
> The software is expected to use HMAT information as hint for optimization.
> 
> In the linux kernel, the codes in drivers/acpi/hmat/hmat.c parse and report
> the platform's HMAT tables.
> 
> The V8 patches link:
> https://patchwork.kernel.org/cover/11066983/
> 
> Changelog:
> v9:
>      - change the CLI input way, make it more user firendly (Daniel Black)
>      use latency=NUM[p|n|u]s and bandwidth=NUM[M|G|P](B/s) as input and drop
>      the base-lat and base-bw input.
> v8:
>      - rebase to upstream
>      - Add check if numa->numa_state is NULL in pxb_dev_realize_common
>      - Use nb_nodes in spapr_populate_memory() (RESEND to fix) (Igor)
> v7:
>      - Defer 11-13 of patch v6, because the driver of _HMA hasn't been
>        implemented in kernel driver
>      - Drop the HMAT_LB_MEM_CACHE_LAST_LEVEL which is not used in
>        ACPI 6.3 (Jonathan)
>      - Add bit mask in flags of hmat-lb (Jonathan)
>      - Add a marco to indicate the type is latency or bandwidth (Jonathan)
> 
> Liu Jingqi (5):
>    hmat acpi: Build Memory Proximity Domain Attributes Structure(s)
>    hmat acpi: Build System Locality Latency and Bandwidth Information
>      Structure(s)
>    hmat acpi: Build Memory Side Cache Information Structure(s)
>    numa: Extend the CLI to provide memory latency and bandwidth
>      information
>    numa: Extend the CLI to provide memory side cache information
> 
> Tao Xu (6):
>    hw/arm: simplify arm_load_dtb
>    numa: move numa global variable nb_numa_nodes into MachineState
>    numa: move numa global variable have_numa_distance into MachineState
>    numa: move numa global variable numa_info into MachineState
>    numa: Extend CLI to provide initiator information for numa nodes
>    tests/bios-tables-test: add test cases for ACPI HMAT
> 
>   exec.c                              |   5 +-
>   hw/acpi/Kconfig                     |   5 +
>   hw/acpi/Makefile.objs               |   1 +
>   hw/acpi/aml-build.c                 |   9 +-
>   hw/acpi/hmat.c                      | 256 +++++++++++++++++++++
>   hw/acpi/hmat.h                      | 106 +++++++++
>   hw/arm/aspeed.c                     |   5 +-
>   hw/arm/boot.c                       |  20 +-
>   hw/arm/collie.c                     |   8 +-
>   hw/arm/cubieboard.c                 |   5 +-
>   hw/arm/exynos4_boards.c             |   7 +-
>   hw/arm/highbank.c                   |   8 +-
>   hw/arm/imx25_pdk.c                  |   5 +-
>   hw/arm/integratorcp.c               |   8 +-
>   hw/arm/kzm.c                        |   5 +-
>   hw/arm/mainstone.c                  |   5 +-
>   hw/arm/mcimx6ul-evk.c               |   5 +-
>   hw/arm/mcimx7d-sabre.c              |   5 +-
>   hw/arm/musicpal.c                   |   8 +-
>   hw/arm/nseries.c                    |   5 +-
>   hw/arm/omap_sx1.c                   |   5 +-
>   hw/arm/palm.c                       |  10 +-
>   hw/arm/raspi.c                      |   6 +-
>   hw/arm/realview.c                   |   5 +-
>   hw/arm/sabrelite.c                  |   5 +-
>   hw/arm/sbsa-ref.c                   |  12 +-
>   hw/arm/spitz.c                      |   5 +-
>   hw/arm/tosa.c                       |   8 +-
>   hw/arm/versatilepb.c                |   5 +-
>   hw/arm/vexpress.c                   |   5 +-
>   hw/arm/virt-acpi-build.c            |  19 +-
>   hw/arm/virt.c                       |  17 +-
>   hw/arm/xilinx_zynq.c                |   8 +-
>   hw/arm/xlnx-versal-virt.c           |   7 +-
>   hw/arm/xlnx-zcu102.c                |   5 +-
>   hw/arm/z2.c                         |   8 +-
>   hw/core/machine-hmp-cmds.c          |  12 +-
>   hw/core/machine.c                   |  38 ++-
>   hw/core/numa.c                      | 345 +++++++++++++++++++++++++---
>   hw/i386/acpi-build.c                |   7 +-
>   hw/i386/pc.c                        |  13 +-
>   hw/mem/pc-dimm.c                    |   2 +
>   hw/pci-bridge/pci_expander_bridge.c |   8 +-
>   hw/ppc/spapr.c                      |  29 +--
>   hw/ppc/spapr_pci.c                  |   4 +-
>   include/hw/acpi/aml-build.h         |   2 +-
>   include/hw/arm/boot.h               |   4 +-
>   include/hw/boards.h                 |   1 +
>   include/qemu/typedefs.h             |   2 +
>   include/sysemu/numa.h               |  30 ++-
>   include/sysemu/sysemu.h             |  23 ++
>   qapi/machine.json                   | 178 +++++++++++++-
>   qemu-options.hx                     |  83 ++++++-
>   tests/bios-tables-test.c            |  43 ++++
>   54 files changed, 1189 insertions(+), 246 deletions(-)
>   create mode 100644 hw/acpi/hmat.c
>   create mode 100644 hw/acpi/hmat.h
> 




* Re: [Qemu-devel] [PATCH v9 05/11] numa: Extend CLI to provide initiator information for numa nodes
  2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 05/11] numa: Extend CLI to provide initiator information for numa nodes Tao
@ 2019-08-13 15:00   ` Igor Mammedov
  2019-08-14  2:24     ` Tao Xu
  2019-08-14  2:39     ` Dan Williams
  0 siblings, 2 replies; 34+ messages in thread
From: Igor Mammedov @ 2019-08-13 15:00 UTC (permalink / raw)
  To: Tao
  Cc: ehabkost, jingqi.liu, fan.du, qemu-devel, daniel,
	jonathan.cameron, dan.j.williams

On Fri,  9 Aug 2019 14:57:25 +0800
Tao <tao3.xu@intel.com> wrote:

> From: Tao Xu <tao3.xu@intel.com>
> 
> In ACPI 6.3 chapter 5.2.27 Heterogeneous Memory Attribute Table (HMAT),
> The initiator represents processor which access to memory. And in 5.2.27.3
> Memory Proximity Domain Attributes Structure, the attached initiator is
> defined as where the memory controller responsible for a memory proximity
> domain. With attached initiator information, the topology of heterogeneous
> memory can be described.
> 
> Extend CLI of "-numa node" option to indicate the initiator numa node-id.
> In the linux kernel, the codes in drivers/acpi/hmat/hmat.c parse and report
> the platform's HMAT tables.
> 
> Reviewed-by: Jingqi Liu <Jingqi.liu@intel.com>
> Suggested-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Tao Xu <tao3.xu@intel.com>
> ---
> 
> No changes in v9
> ---
>  hw/core/machine.c     | 24 ++++++++++++++++++++++++
>  hw/core/numa.c        | 13 +++++++++++++
>  include/sysemu/numa.h |  3 +++
>  qapi/machine.json     |  6 +++++-
>  qemu-options.hx       | 27 +++++++++++++++++++++++----
>  5 files changed, 68 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index 3c55470103..113184a9df 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -640,6 +640,7 @@ void machine_set_cpu_numa_node(MachineState *machine,
>                                 const CpuInstanceProperties *props, Error **errp)
>  {
>      MachineClass *mc = MACHINE_GET_CLASS(machine);
> +    NodeInfo *numa_info = machine->numa_state->nodes;
>      bool match = false;
>      int i;
>  
> @@ -709,6 +710,16 @@ void machine_set_cpu_numa_node(MachineState *machine,
>          match = true;
>          slot->props.node_id = props->node_id;
>          slot->props.has_node_id = props->has_node_id;
> +
> +        if (numa_info[props->node_id].initiator_valid &&
> +            (props->node_id != numa_info[props->node_id].initiator)) {
> +            error_setg(errp, "The initiator of CPU NUMA node %" PRId64
> +                       " should be itself.", props->node_id);
> +            return;
> +        }
> +        numa_info[props->node_id].initiator_valid = true;
> +        numa_info[props->node_id].has_cpu = true;
> +        numa_info[props->node_id].initiator = props->node_id;
>      }
>  
>      if (!match) {
> @@ -1050,6 +1061,7 @@ static void machine_numa_finish_cpu_init(MachineState *machine)
>      GString *s = g_string_new(NULL);
>      MachineClass *mc = MACHINE_GET_CLASS(machine);
>      const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(machine);
> +    NodeInfo *numa_info = machine->numa_state->nodes;
>  
>      assert(machine->numa_state->num_nodes);
>      for (i = 0; i < possible_cpus->len; i++) {
> @@ -1083,6 +1095,18 @@ static void machine_numa_finish_cpu_init(MachineState *machine)
>              machine_set_cpu_numa_node(machine, &props, &error_fatal);
>          }
>      }
> +
> +    for (i = 0; i < machine->numa_state->num_nodes; i++) {
> +        if (numa_info[i].initiator_valid &&
> +            !numa_info[numa_info[i].initiator].has_cpu) {
                          ^^^^^^^^^^^^^^^^^^^^^^ possible out-of-bounds read, see below

> +            error_report("The initiator-id %"PRIu16 " of NUMA node %d"
> +                         " does not exist.", numa_info[i].initiator, i);
> +            error_printf("\n");
> +
> +            exit(1);
> +        }
It only takes care of nodes that have CPUs, or memory-only nodes that have
an initiator explicitly provided on the CLI. It leaves open the possibility
of memory-only nodes without an initiator mixed with nodes that have one.
Is such a mixed configuration valid?
Should we forbid it?
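A minimal standalone sketch of such an end-of-parsing check, assuming the mixed configuration were to be rejected (the thread has not decided that). `struct node_info` is a hypothetical stand-in carrying only the three fields this patch adds to NodeInfo, and `check_initiators()` and its messages are illustrative, not QEMU code. It also range-checks `initiator` before using it as an index, which avoids the out-of-bounds read noted above; node IDs are assumed to be contiguous from 0.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the fields this patch adds to NodeInfo. */
struct node_info {
    bool has_cpu;
    bool initiator_valid;
    uint16_t initiator;
};

/*
 * Returns 0 if the initiator setup is consistent, -1 (with a message in
 * err) otherwise. Assumed policy: memory-only nodes either all name an
 * initiator or none of them do.
 */
static int check_initiators(const struct node_info *nodes, int num_nodes,
                            char *err, size_t errlen)
{
    bool any_explicit = false; /* a memory-only node names an initiator */
    bool any_missing = false;  /* a memory-only node names none */

    for (int i = 0; i < num_nodes; i++) {
        if (nodes[i].initiator_valid) {
            /* Range-check before indexing to avoid reading past the array. */
            if (nodes[i].initiator >= num_nodes) {
                snprintf(err, errlen, "initiator %u of node %d does not exist",
                         (unsigned)nodes[i].initiator, i);
                return -1;
            }
            if (!nodes[nodes[i].initiator].has_cpu) {
                snprintf(err, errlen, "initiator %u of node %d has no CPU",
                         (unsigned)nodes[i].initiator, i);
                return -1;
            }
        }
        if (!nodes[i].has_cpu) {
            if (nodes[i].initiator_valid) {
                any_explicit = true;
            } else {
                any_missing = true;
            }
        }
    }
    if (any_explicit && any_missing) {
        snprintf(err, errlen,
                 "either all memory-only nodes set an initiator or none do");
        return -1;
    }
    return 0;
}
```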

> +    }
> +
>      if (s->len && !qtest_enabled()) {
>          warn_report("CPU(s) not present in any NUMA nodes: %s",
>                      s->str);
> diff --git a/hw/core/numa.c b/hw/core/numa.c
> index 8fcbba05d6..cfb6339810 100644
> --- a/hw/core/numa.c
> +++ b/hw/core/numa.c
> @@ -128,6 +128,19 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
>          numa_info[nodenr].node_mem = object_property_get_uint(o, "size", NULL);
>          numa_info[nodenr].node_memdev = MEMORY_BACKEND(o);
>      }
> +
> +    if (node->has_initiator) {
> +        if (numa_info[nodenr].initiator_valid &&
> +            (node->initiator != numa_info[nodenr].initiator)) {
> +            error_setg(errp, "The initiator of NUMA node %" PRIu16 " has been "
> +                       "set to node %" PRIu16, nodenr,
> +                       numa_info[nodenr].initiator);
> +            return;
> +        }
> +
> +        numa_info[nodenr].initiator_valid = true;
> +        numa_info[nodenr].initiator = node->initiator;
                                             ^^^
Unvalidated user input? (It could lead to a read beyond the numa_info[]
boundaries in the previous hunk.)
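A sketch of the parse-time validation being asked for, under stated assumptions: `SKETCH_MAX_NODES` stands in for QEMU's MAX_NODES (its value here is assumed), and `initiator_arg_ok()` is a hypothetical helper. Only a range check is possible at this point; whether the initiator node exists and has CPUs can only be checked after all `-numa` options have been parsed, because the initiator may be declared later on the command line.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_MAX_NODES 128 /* assumed to mirror QEMU's MAX_NODES */

/*
 * Hypothetical parse-time check for "-numa node,...,initiator=N".
 * Only the range can be validated here; existence is checked later.
 */
static bool initiator_arg_ok(uint16_t nodenr, uint16_t initiator,
                             char *err, size_t errlen)
{
    if (initiator >= SKETCH_MAX_NODES) {
        snprintf(err, errlen,
                 "initiator %u of NUMA node %u must be less than %d",
                 (unsigned)initiator, (unsigned)nodenr, SKETCH_MAX_NODES);
        return false;
    }
    return true;
}
```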

> +    }
>      numa_info[nodenr].present = true;
>      max_numa_nodeid = MAX(max_numa_nodeid, nodenr + 1);
>      ms->numa_state->num_nodes++;
> diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
> index 76da3016db..46ad06e000 100644
> --- a/include/sysemu/numa.h
> +++ b/include/sysemu/numa.h
> @@ -10,6 +10,9 @@ struct NodeInfo {
>      uint64_t node_mem;
>      struct HostMemoryBackend *node_memdev;
>      bool present;
> +    bool has_cpu;
> +    bool initiator_valid;
> +    uint16_t initiator;
>      uint8_t distance[MAX_NODES];
>  };
>  
> diff --git a/qapi/machine.json b/qapi/machine.json
> index 6db8a7e2ec..05e367d26a 100644
> --- a/qapi/machine.json
> +++ b/qapi/machine.json
> @@ -414,6 +414,9 @@
>  # @memdev: memory backend object.  If specified for one node,
>  #          it must be specified for all nodes.
>  #
> +# @initiator: the initiator numa nodeid that is closest (as in directly
> +#             attached) to this numa node (since 4.2)
Well, it's pretty unclear what this doc comment means (unless the reader knows
the specific part of the ACPI spec well).

I suggest rephrasing it into something more understandable for unaware
readers (plus a possible reference to the spec for those who are interested
in the spec definition, since this doc is meant for developers).

> +#
>  # Since: 2.1
>  ##
>  { 'struct': 'NumaNodeOptions',
> @@ -421,7 +424,8 @@
>     '*nodeid': 'uint16',
>     '*cpus':   ['uint16'],
>     '*mem':    'size',
> -   '*memdev': 'str' }}
> +   '*memdev': 'str',
> +   '*initiator': 'uint16' }}
>  
>  ##
>  # @NumaDistOptions:
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 9621e934c0..c480781992 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -161,14 +161,14 @@ If any on the three values is given, the total number of CPUs @var{n} can be omi
>  ETEXI
>  
>  DEF("numa", HAS_ARG, QEMU_OPTION_numa,
> -    "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node]\n"
> -    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node]\n"
> +    "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
> +    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
>      "-numa dist,src=source,dst=destination,val=distance\n"
>      "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n",
>      QEMU_ARCH_ALL)
>  STEXI
> -@item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}]
> -@itemx -numa node[,memdev=@var{id}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}]
> +@item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
> +@itemx -numa node[,memdev=@var{id}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
>  @itemx -numa dist,src=@var{source},dst=@var{destination},val=@var{distance}
>  @itemx -numa cpu,node-id=@var{node}[,socket-id=@var{x}][,core-id=@var{y}][,thread-id=@var{z}]
>  @findex -numa
> @@ -215,6 +215,25 @@ split equally between them.
>  @samp{mem} and @samp{memdev} are mutually exclusive. Furthermore,
>  if one node uses @samp{memdev}, all of them have to use it.
>  
> +@samp{initiator} indicate the initiator NUMA @var{initiator} that is
                                  ^^^^^^^       ^^^^^^^^^^^^^^
The above will render as "initiator NUMA initiator"; was that your intention?

> +closest (as in directly attached) to this NUMA @var{node}.
Again, I suggest replacing the spec language with something more user-friendly
(this time without a spec reference, as this text is geared toward end users).

> +For example, the following option assigns 2 NUMA nodes, node 0 has CPU.
The following example creates a machine with 2 NUMA ...

> +node 1 has only memory, and its initiator is node 0. Note that because
> +node 0 has CPU, by default the initiator of node 0 is itself and must be
> +itself.
> +@example
> +-M pc \
> +-m 2G,slots=2,maxmem=4G \
> +-object memory-backend-ram,size=1G,id=m0 \
> +-object memory-backend-ram,size=1G,id=m1 \
> +-numa node,nodeid=0,memdev=m0 \
> +-numa node,nodeid=1,memdev=m1,initiator=0 \
> +-smp 2,sockets=2,maxcpus=2  \
> +-numa cpu,node-id=0,socket-id=0 \
> +-numa cpu,node-id=0,socket-id=1
> +@end example
> +
>  @var{source} and @var{destination} are NUMA node IDs.
>  @var{distance} is the NUMA distance from @var{source} to @var{destination}.
>  The distance from a node to itself is always 10. If any pair of nodes is




* Re: [Qemu-devel] [PATCH v9 09/11] numa: Extend the CLI to provide memory latency and bandwidth information
  2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 09/11] numa: Extend the CLI to provide memory latency and bandwidth information Tao
  2019-08-12  5:13   ` Daniel Black
@ 2019-08-13 15:11   ` Eric Blake
  2019-08-14  2:58     ` Tao Xu
  1 sibling, 1 reply; 34+ messages in thread
From: Eric Blake @ 2019-08-13 15:11 UTC (permalink / raw)
  To: Tao, imammedo, ehabkost
  Cc: jingqi.liu, fan.du, qemu-devel, daniel, Markus Armbruster,
	jonathan.cameron, dan.j.williams


On 8/9/19 1:57 AM, Tao wrote:
> From: Liu Jingqi <jingqi.liu@intel.com>
> 
> Add -numa hmat-lb option to provide System Locality Latency and
> Bandwidth Information. These memory attributes help to build
> System Locality Latency and Bandwidth Information Structure(s)
> in ACPI Heterogeneous Memory Attribute Table (HMAT).
> 
> Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
> Signed-off-by: Tao Xu <tao3.xu@intel.com>
> ---
> 
> Changes in v9:
>     - change the CLI input way, make it more user friendly (Daniel Black)
>     use latency=NUM[p|n|u]s and bandwidth=NUM[M|G|P](B/s) as input and drop
>     the base-lat and base-bw input.

Why are you hand-rolling yet another scaling parser instead of reusing
one that's already in-tree?

> +++ b/hw/core/numa.c

> +void parse_numa_hmat_lb(MachineState *ms, NumaHmatLBOptions *node,
> +                        Error **errp)
> +{

> +    if (node->has_latency) {
> +        hmat_lb = ms->numa_state->hmat_lb[node->hierarchy][node->data_type];
> +
> +        if (!hmat_lb) {
> +            hmat_lb = g_malloc0(sizeof(*hmat_lb));
> +            ms->numa_state->hmat_lb[node->hierarchy][node->data_type] = hmat_lb;
> +        } else if (hmat_lb->latency[node->initiator][node->target]) {
> +            error_setg(errp, "Duplicate configuration of the latency for "
> +                       "initiator=%" PRIu16 " and target=%" PRIu16 ".",
> +                       node->initiator, node->target);
> +            return;
> +        }
> +
> +        ret = qemu_strtoui(node->latency, &endptr, 10, &latency);
> +        if (ret < 0) {
> +            error_setg(errp, "Invalid latency %s", node->latency);
> +            return;
> +        }
> +
> +        if (*endptr == '\0') {
> +            base_lat = 1;
> +        } else if (*(endptr + 1) == 's') {
> +            switch (*endptr) {
> +            case 'p':
> +                base_lat = 1;
> +                break;
> +            case 'n':
> +                base_lat = PICO_PER_NSEC;
> +                break;
> +            case 'u':
> +                base_lat = PICO_PER_USEC;
> +                break;

Hmm - this is a different scaling than any of our existing parsers
(which assume multiples k/M/G..., not subdivisions u/n/p)
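For comparison, a standalone sketch of a stricter latency parser (hypothetical `parse_latency_ps()`, not an existing QEMU helper). It keeps the patch's units (a bare number or `ps` means picoseconds, `ns` = 1000 ps, `us` = 1000000 ps) but compares the whole suffix, so input such as `5nsX`, which the quoted hunk appears to accept as 5 ns, is rejected; it also checks the scaled result for overflow.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: parse "NUM", "NUMps", "NUMns" or "NUMus" into
 * picoseconds. Returns 0 on success, -EINVAL on bad syntax, -ERANGE on
 * overflow.
 */
static int parse_latency_ps(const char *str, uint64_t *out)
{
    char *end;
    unsigned long long val;
    uint64_t scale;

    if (str[0] < '0' || str[0] > '9') {
        return -EINVAL; /* reject empty input, signs and leading spaces */
    }
    errno = 0;
    val = strtoull(str, &end, 10);
    if (errno == ERANGE) {
        return -ERANGE;
    }
    if (*end == '\0' || strcmp(end, "ps") == 0) {
        scale = 1;
    } else if (strcmp(end, "ns") == 0) {
        scale = 1000;
    } else if (strcmp(end, "us") == 0) {
        scale = 1000 * 1000;
    } else {
        return -EINVAL; /* unknown unit or trailing characters */
    }
    if (val > UINT64_MAX / scale) {
        return -ERANGE;
    }
    *out = (uint64_t)val * scale;
    return 0;
}
```

On the command line this would run before anything reaches QMP, so the QMP side only ever sees a plain integer in picoseconds.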


> +    if (node->has_bandwidth) {
> +        hmat_lb = ms->numa_state->hmat_lb[node->hierarchy][node->data_type];
> +
> +        if (!hmat_lb) {
> +            hmat_lb = g_malloc0(sizeof(*hmat_lb));
> +            ms->numa_state->hmat_lb[node->hierarchy][node->data_type] = hmat_lb;
> +        } else if (hmat_lb->bandwidth[node->initiator][node->target]) {
> +            error_setg(errp, "Duplicate configuration of the bandwidth for "
> +                       "initiator=%" PRIu16 " and target=%" PRIu16 ".",
> +                       node->initiator, node->target);
> +            return;
> +        }
> +
> +        ret = qemu_strtoui(node->bandwidth, &endptr, 10, &bandwidth);
> +        if (ret < 0) {
> +            error_setg(errp, "Invalid bandwidth %s", node->bandwidth);
> +            return;
> +        }
> +
> +        switch (toupper(*endptr)) {
> +        case '\0':
> +        case 'M':
> +            base_bw = 1;
> +            break;
> +        case 'G':
> +            base_bw = UINT64_C(1) << 10;
> +            break;
> +        case 'P':
> +            base_bw = UINT64_C(1) << 20;
> +            break;

But this one, in addition to being wrong (P is 1<<30, not 1<<20), should
definitely be reusing qemu_strtosz_metric() or similar (look in
util/cutils.c).
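A standalone sketch of the bandwidth scaling with the multiplier corrected (hypothetical `parse_bandwidth_mbps()`; it does not call qemu_strtosz_metric() or any other in-tree helper). The base unit stays MB/s as in the patch, and binary steps are assumed to match the patch's shifts, so G = 1 << 10, T = 1 << 20 and P = 1 << 30. The `T` suffix and the strict optional `B/s` tail are additions for illustration only.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: parse "NUM[M|G|T|P][B/s]" into MB/s.
 * Each step up from M multiplies by 1024 (a binary-multiple assumption).
 * Returns 0 on success, -EINVAL on bad syntax, -ERANGE on overflow.
 */
static int parse_bandwidth_mbps(const char *str, uint64_t *out)
{
    static const char units[] = "MGTP"; /* shift = 10 * index */
    char *end;
    unsigned long long val;
    unsigned shift = 0;

    if (str[0] < '0' || str[0] > '9') {
        return -EINVAL;
    }
    errno = 0;
    val = strtoull(str, &end, 10);
    if (errno == ERANGE) {
        return -ERANGE;
    }
    if (*end != '\0') {
        const char *u = strchr(units, *end); /* case-sensitive on purpose */
        if (u == NULL) {
            return -EINVAL;
        }
        shift = 10 * (unsigned)(u - units); /* M=0, G=10, T=20, P=30 */
        end++;
    }
    if (*end != '\0' && strcmp(end, "B/s") != 0) {
        return -EINVAL; /* only an exact "B/s" tail is accepted */
    }
    if (val > (UINT64_MAX >> shift)) {
        return -ERANGE;
    }
    *out = (uint64_t)val << shift;
    return 0;
}
```

Reusing an in-tree parser, as suggested, would avoid maintaining this table by hand; the sketch only makes the corrected P step concrete.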


> +++ b/qapi/machine.json
> @@ -377,10 +377,12 @@
>  #
>  # @cpu: property based CPU(s) to node mapping (Since: 2.10)
>  #
> +# @hmat-lb: memory latency and bandwidth information (Since: 4.2)
> +#
>  # Since: 2.1
>  ##
>  { 'enum': 'NumaOptionsType',
> -  'data': [ 'node', 'dist', 'cpu' ] }
> +  'data': [ 'node', 'dist', 'cpu', 'hmat-lb' ] }
>  

> +##
> +# @HmatLBDataType:
> +#
> +# Data type in the System Locality Latency
> +# and Bandwidth Information Structure of HMAT (Heterogeneous
> +# Memory Attribute Table)
> +#
> +# For more information of @HmatLBDataType see
> +# the chapter 5.2.27.4: Table 5-142:  Field "Data Type" of ACPI 6.3 spec.
> +#
> +# @access-latency: access latency (picoseconds)
> +#
> +# @read-latency: read latency (picoseconds)
> +#
> +# @write-latency: write latency (picoseconds)
> +#
> +# @access-bandwidth: access bandwidth (MB/s)
> +#
> +# @read-bandwidth: read bandwidth (MB/s)
> +#
> +# @write-bandwidth: write bandwidth (MB/s)

Are these really the best scales?


> +
> +##
> +# @NumaHmatLBOptions:
> +#
> +# Set the system locality latency and bandwidth information
> +# between Initiator and Target proximity Domains.
> +#
> +# For more information of @NumaHmatLBOptions see
> +# the chapter 5.2.27.4: Table 5-142 of ACPI 6.3 spec.
> +#
> +# @initiator: the Initiator Proximity Domain.
> +#
> +# @target: the Target Proximity Domain.
> +#
> +# @hierarchy: the Memory Hierarchy. Indicates the performance
> +#             of memory or side cache.
> +#
> +# @data-type: presents the type of data, access/read/write
> +#             latency or hit latency.
> +#
> +# @latency: the value of latency from @initiator to @target proximity domain,
> +#           the latency units are "ps(picosecond)", "ns(nanosecond)" or
> +#           "us(microsecond)".
> +#
> +# @bandwidth: the value of bandwidth between @initiator and @target proximity
> +#             domain, the bandwidth units are "MB(/s)","GB(/s)" or "PB(/s)".
> +#
> +# Since: 4.2
> +##
> +{ 'struct': 'NumaHmatLBOptions',
> +    'data': {
> +    'initiator': 'uint16',
> +    'target': 'uint16',
> +    'hierarchy': 'HmatLBMemoryHierarchy',
> +    'data-type': 'HmatLBDataType',
> +    '*latency': 'str',
> +    '*bandwidth': 'str' }}

...and then parsing strings instead of taking raw integers?  Parsing
strings is okay for HMP, but for QMP, our goal should be a single
representation with no additional sugar on top.  Latency and bandwidth
should be int in a single scale.


> +++ b/qemu-options.hx
> @@ -164,16 +164,19 @@ DEF("numa", HAS_ARG, QEMU_OPTION_numa,
>      "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
>      "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
>      "-numa dist,src=source,dst=destination,val=distance\n"
> -    "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n",
> +    "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n"
> +    "-numa hmat-lb,initiator=node,target=node,hierarchy=memory|first-level|second-level|third-level,data-type=access-latency|read-latency|write-latency[,latency=lat][,bandwidth=bw]\n",

Command-line parsing can then take human-written scaled numbers, and
pre-convert them into the single scale accepted by the QMP interface.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



* Re: [Qemu-devel] [PATCH v9 01/11] hw/arm: simplify arm_load_dtb
  2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 01/11] hw/arm: simplify arm_load_dtb Tao
@ 2019-08-13 21:55   ` Alistair Francis
  2019-08-14  1:19     ` Andrew Jeffery
  2019-08-13 21:55   ` Eduardo Habkost
  1 sibling, 1 reply; 34+ messages in thread
From: Alistair Francis @ 2019-08-13 21:55 UTC (permalink / raw)
  To: Tao
  Cc: Eduardo Habkost, jingqi.liu, fan.du,
	qemu-devel@nongnu.org Developers, daniel, Jonathan Cameron,
	Igor Mammedov, dan.j.williams

On Fri, Aug 9, 2019 at 12:01 AM Tao <tao3.xu@intel.com> wrote:
>
> From: Tao Xu <tao3.xu@intel.com>
>
> In struct arm_boot_info, kernel_filename, initrd_filename and
> kernel_cmdline are copied from from MachineState. This patch add
> MachineState as a parameter into arm_load_dtb() and move the copy chunk
> of kernel_filename, initrd_filename and kernel_cmdline into
> arm_load_kernel().
>
> Reviewed-by: Igor Mammedov <imammedo@redhat.com>
> Reviewed-by: Liu Jingqi <jingqi.liu@intel.com>
> Suggested-by: Igor Mammedov <imammedo@redhat.com>
> Signed-off-by: Tao Xu <tao3.xu@intel.com>

Reviewed-by: Alistair Francis <alistair.francis@wdc.com>

Alistair

> ---
>
> No changes in v9
> ---
>  hw/arm/aspeed.c           |  5 +----
>  hw/arm/boot.c             | 14 ++++++++------
>  hw/arm/collie.c           |  8 +-------
>  hw/arm/cubieboard.c       |  5 +----
>  hw/arm/exynos4_boards.c   |  7 ++-----
>  hw/arm/highbank.c         |  8 +-------
>  hw/arm/imx25_pdk.c        |  5 +----
>  hw/arm/integratorcp.c     |  8 +-------
>  hw/arm/kzm.c              |  5 +----
>  hw/arm/mainstone.c        |  5 +----
>  hw/arm/mcimx6ul-evk.c     |  5 +----
>  hw/arm/mcimx7d-sabre.c    |  5 +----
>  hw/arm/musicpal.c         |  8 +-------
>  hw/arm/nseries.c          |  5 +----
>  hw/arm/omap_sx1.c         |  5 +----
>  hw/arm/palm.c             | 10 ++--------
>  hw/arm/raspi.c            |  6 +-----
>  hw/arm/realview.c         |  5 +----
>  hw/arm/sabrelite.c        |  5 +----
>  hw/arm/sbsa-ref.c         |  3 +--
>  hw/arm/spitz.c            |  5 +----
>  hw/arm/tosa.c             |  8 +-------
>  hw/arm/versatilepb.c      |  5 +----
>  hw/arm/vexpress.c         |  5 +----
>  hw/arm/virt.c             |  8 +++-----
>  hw/arm/xilinx_zynq.c      |  8 +-------
>  hw/arm/xlnx-versal-virt.c |  7 ++-----
>  hw/arm/xlnx-zcu102.c      |  5 +----
>  hw/arm/z2.c               |  8 +-------
>  include/hw/arm/boot.h     |  4 ++--
>  30 files changed, 43 insertions(+), 147 deletions(-)
>
> diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
> index 843b708247..f8733b86b9 100644
> --- a/hw/arm/aspeed.c
> +++ b/hw/arm/aspeed.c
> @@ -241,9 +241,6 @@ static void aspeed_board_init(MachineState *machine,
>          write_boot_rom(drive0, FIRMWARE_ADDR, fl->size, &error_abort);
>      }
>
> -    aspeed_board_binfo.kernel_filename = machine->kernel_filename;
> -    aspeed_board_binfo.initrd_filename = machine->initrd_filename;
> -    aspeed_board_binfo.kernel_cmdline = machine->kernel_cmdline;
>      aspeed_board_binfo.ram_size = ram_size;
>      aspeed_board_binfo.loader_start = sc->info->memmap[ASPEED_SDRAM];
>      aspeed_board_binfo.nb_cpus = bmc->soc.num_cpus;
> @@ -252,7 +249,7 @@ static void aspeed_board_init(MachineState *machine,
>          cfg->i2c_init(bmc);
>      }
>
> -    arm_load_kernel(ARM_CPU(first_cpu), &aspeed_board_binfo);
> +    arm_load_kernel(ARM_CPU(first_cpu), machine, &aspeed_board_binfo);
>  }
>
>  static void palmetto_bmc_i2c_init(AspeedBoardState *bmc)
> diff --git a/hw/arm/boot.c b/hw/arm/boot.c
> index c2b89b3bb9..ba604f8277 100644
> --- a/hw/arm/boot.c
> +++ b/hw/arm/boot.c
> @@ -524,7 +524,7 @@ static void fdt_add_psci_node(void *fdt)
>  }
>
>  int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
> -                 hwaddr addr_limit, AddressSpace *as)
> +                 hwaddr addr_limit, AddressSpace *as, MachineState *ms)
>  {
>      void *fdt = NULL;
>      int size, rc, n = 0;
> @@ -627,9 +627,9 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
>          qemu_fdt_add_subnode(fdt, "/chosen");
>      }
>
> -    if (binfo->kernel_cmdline && *binfo->kernel_cmdline) {
> +    if (ms->kernel_cmdline && *ms->kernel_cmdline) {
>          rc = qemu_fdt_setprop_string(fdt, "/chosen", "bootargs",
> -                                     binfo->kernel_cmdline);
> +                                     ms->kernel_cmdline);
>          if (rc < 0) {
>              fprintf(stderr, "couldn't set /chosen/bootargs\n");
>              goto fail;
> @@ -1261,7 +1261,7 @@ static void arm_setup_firmware_boot(ARMCPU *cpu, struct arm_boot_info *info)
>       */
>  }
>
> -void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
> +void arm_load_kernel(ARMCPU *cpu, MachineState *ms, struct arm_boot_info *info)
>  {
>      CPUState *cs;
>      AddressSpace *as = arm_boot_address_space(cpu, info);
> @@ -1282,7 +1282,9 @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
>       * doesn't support secure.
>       */
>      assert(!(info->secure_board_setup && kvm_enabled()));
> -
> +    info->kernel_filename = ms->kernel_filename;
> +    info->kernel_cmdline = ms->kernel_cmdline;
> +    info->initrd_filename = ms->initrd_filename;
>      info->dtb_filename = qemu_opt_get(qemu_get_machine_opts(), "dtb");
>      info->dtb_limit = 0;
>
> @@ -1294,7 +1296,7 @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
>      }
>
>      if (!info->skip_dtb_autoload && have_dtb(info)) {
> -        if (arm_load_dtb(info->dtb_start, info, info->dtb_limit, as) < 0) {
> +        if (arm_load_dtb(info->dtb_start, info, info->dtb_limit, as, ms) < 0) {
>              exit(1);
>          }
>      }
> diff --git a/hw/arm/collie.c b/hw/arm/collie.c
> index 3db3c56004..72bc8f26e5 100644
> --- a/hw/arm/collie.c
> +++ b/hw/arm/collie.c
> @@ -26,9 +26,6 @@ static struct arm_boot_info collie_binfo = {
>
>  static void collie_init(MachineState *machine)
>  {
> -    const char *kernel_filename = machine->kernel_filename;
> -    const char *kernel_cmdline = machine->kernel_cmdline;
> -    const char *initrd_filename = machine->initrd_filename;
>      StrongARMState *s;
>      DriveInfo *dinfo;
>      MemoryRegion *sysmem = get_system_memory();
> @@ -47,11 +44,8 @@ static void collie_init(MachineState *machine)
>
>      sysbus_create_simple("scoop", 0x40800000, NULL);
>
> -    collie_binfo.kernel_filename = kernel_filename;
> -    collie_binfo.kernel_cmdline = kernel_cmdline;
> -    collie_binfo.initrd_filename = initrd_filename;
>      collie_binfo.board_id = 0x208;
> -    arm_load_kernel(s->cpu, &collie_binfo);
> +    arm_load_kernel(s->cpu, machine, &collie_binfo);
>  }
>
>  static void collie_machine_init(MachineClass *mc)
> diff --git a/hw/arm/cubieboard.c b/hw/arm/cubieboard.c
> index f7c8a5985a..d992fa087a 100644
> --- a/hw/arm/cubieboard.c
> +++ b/hw/arm/cubieboard.c
> @@ -72,10 +72,7 @@ static void cubieboard_init(MachineState *machine)
>      /* TODO create and connect IDE devices for ide_drive_get() */
>
>      cubieboard_binfo.ram_size = machine->ram_size;
> -    cubieboard_binfo.kernel_filename = machine->kernel_filename;
> -    cubieboard_binfo.kernel_cmdline = machine->kernel_cmdline;
> -    cubieboard_binfo.initrd_filename = machine->initrd_filename;
> -    arm_load_kernel(&s->a10->cpu, &cubieboard_binfo);
> +    arm_load_kernel(&s->a10->cpu, machine, &cubieboard_binfo);
>  }
>
>  static void cubieboard_machine_init(MachineClass *mc)
> diff --git a/hw/arm/exynos4_boards.c b/hw/arm/exynos4_boards.c
> index ac0b0dc2a9..da402d5216 100644
> --- a/hw/arm/exynos4_boards.c
> +++ b/hw/arm/exynos4_boards.c
> @@ -120,9 +120,6 @@ exynos4_boards_init_common(MachineState *machine,
>      exynos4_board_binfo.board_id = exynos4_board_id[board_type];
>      exynos4_board_binfo.smp_bootreg_addr =
>              exynos4_board_smp_bootreg_addr[board_type];
> -    exynos4_board_binfo.kernel_filename = machine->kernel_filename;
> -    exynos4_board_binfo.initrd_filename = machine->initrd_filename;
> -    exynos4_board_binfo.kernel_cmdline = machine->kernel_cmdline;
>      exynos4_board_binfo.gic_cpu_if_addr =
>              EXYNOS4210_SMP_PRIVATE_BASE_ADDR + 0x100;
>
> @@ -141,7 +138,7 @@ static void nuri_init(MachineState *machine)
>  {
>      exynos4_boards_init_common(machine, EXYNOS4_BOARD_NURI);
>
> -    arm_load_kernel(ARM_CPU(first_cpu), &exynos4_board_binfo);
> +    arm_load_kernel(ARM_CPU(first_cpu), machine, &exynos4_board_binfo);
>  }
>
>  static void smdkc210_init(MachineState *machine)
> @@ -151,7 +148,7 @@ static void smdkc210_init(MachineState *machine)
>
>      lan9215_init(SMDK_LAN9118_BASE_ADDR,
>              qemu_irq_invert(s->soc.irq_table[exynos4210_get_irq(37, 1)]));
> -    arm_load_kernel(ARM_CPU(first_cpu), &exynos4_board_binfo);
> +    arm_load_kernel(ARM_CPU(first_cpu), machine, &exynos4_board_binfo);
>  }
>
>  static void nuri_class_init(ObjectClass *oc, void *data)
> diff --git a/hw/arm/highbank.c b/hw/arm/highbank.c
> index def0f1ce6a..1a35b6d82f 100644
> --- a/hw/arm/highbank.c
> +++ b/hw/arm/highbank.c
> @@ -234,9 +234,6 @@ enum cxmachines {
>  static void calxeda_init(MachineState *machine, enum cxmachines machine_id)
>  {
>      ram_addr_t ram_size = machine->ram_size;
> -    const char *kernel_filename = machine->kernel_filename;
> -    const char *kernel_cmdline = machine->kernel_cmdline;
> -    const char *initrd_filename = machine->initrd_filename;
>      DeviceState *dev = NULL;
>      SysBusDevice *busdev;
>      qemu_irq pic[128];
> @@ -388,9 +385,6 @@ static void calxeda_init(MachineState *machine, enum cxmachines machine_id)
>      /* TODO create and connect IDE devices for ide_drive_get() */
>
>      highbank_binfo.ram_size = ram_size;
> -    highbank_binfo.kernel_filename = kernel_filename;
> -    highbank_binfo.kernel_cmdline = kernel_cmdline;
> -    highbank_binfo.initrd_filename = initrd_filename;
>      /* highbank requires a dtb in order to boot, and the dtb will override
>       * the board ID. The following value is ignored, so set it to -1 to be
>       * clear that the value is meaningless.
> @@ -410,7 +404,7 @@ static void calxeda_init(MachineState *machine, enum cxmachines machine_id)
>                      "may not boot.");
>      }
>
> -    arm_load_kernel(ARM_CPU(first_cpu), &highbank_binfo);
> +    arm_load_kernel(ARM_CPU(first_cpu), machine, &highbank_binfo);
>  }
>
>  static void highbank_init(MachineState *machine)
> diff --git a/hw/arm/imx25_pdk.c b/hw/arm/imx25_pdk.c
> index 5d673e47bc..c76fc2bd94 100644
> --- a/hw/arm/imx25_pdk.c
> +++ b/hw/arm/imx25_pdk.c
> @@ -116,9 +116,6 @@ static void imx25_pdk_init(MachineState *machine)
>      }
>
>      imx25_pdk_binfo.ram_size = machine->ram_size;
> -    imx25_pdk_binfo.kernel_filename = machine->kernel_filename;
> -    imx25_pdk_binfo.kernel_cmdline = machine->kernel_cmdline;
> -    imx25_pdk_binfo.initrd_filename = machine->initrd_filename;
>      imx25_pdk_binfo.loader_start = FSL_IMX25_SDRAM0_ADDR;
>      imx25_pdk_binfo.board_id = 1771,
>      imx25_pdk_binfo.nb_cpus = 1;
> @@ -129,7 +126,7 @@ static void imx25_pdk_init(MachineState *machine)
>       * fail.
>       */
>      if (!qtest_enabled()) {
> -        arm_load_kernel(&s->soc.cpu, &imx25_pdk_binfo);
> +        arm_load_kernel(&s->soc.cpu, machine, &imx25_pdk_binfo);
>      }
>  }
>
> diff --git a/hw/arm/integratorcp.c b/hw/arm/integratorcp.c
> index 200c0107f0..4d9e9c9e49 100644
> --- a/hw/arm/integratorcp.c
> +++ b/hw/arm/integratorcp.c
> @@ -578,9 +578,6 @@ static struct arm_boot_info integrator_binfo = {
>  static void integratorcp_init(MachineState *machine)
>  {
>      ram_addr_t ram_size = machine->ram_size;
> -    const char *kernel_filename = machine->kernel_filename;
> -    const char *kernel_cmdline = machine->kernel_cmdline;
> -    const char *initrd_filename = machine->initrd_filename;
>      Object *cpuobj;
>      ARMCPU *cpu;
>      MemoryRegion *address_space_mem = get_system_memory();
> @@ -650,10 +647,7 @@ static void integratorcp_init(MachineState *machine)
>      sysbus_create_simple("pl110", 0xc0000000, pic[22]);
>
>      integrator_binfo.ram_size = ram_size;
> -    integrator_binfo.kernel_filename = kernel_filename;
> -    integrator_binfo.kernel_cmdline = kernel_cmdline;
> -    integrator_binfo.initrd_filename = initrd_filename;
> -    arm_load_kernel(cpu, &integrator_binfo);
> +    arm_load_kernel(cpu, machine, &integrator_binfo);
>  }
>
>  static void integratorcp_machine_init(MachineClass *mc)
> diff --git a/hw/arm/kzm.c b/hw/arm/kzm.c
> index 59d2102dc5..5ff419a555 100644
> --- a/hw/arm/kzm.c
> +++ b/hw/arm/kzm.c
> @@ -126,13 +126,10 @@ static void kzm_init(MachineState *machine)
>      }
>
>      kzm_binfo.ram_size = machine->ram_size;
> -    kzm_binfo.kernel_filename = machine->kernel_filename;
> -    kzm_binfo.kernel_cmdline = machine->kernel_cmdline;
> -    kzm_binfo.initrd_filename = machine->initrd_filename;
>      kzm_binfo.nb_cpus = 1;
>
>      if (!qtest_enabled()) {
> -        arm_load_kernel(&s->soc.cpu, &kzm_binfo);
> +        arm_load_kernel(&s->soc.cpu, machine, &kzm_binfo);
>      }
>  }
>
> diff --git a/hw/arm/mainstone.c b/hw/arm/mainstone.c
> index cd1f904c6c..c76cfb5dd1 100644
> --- a/hw/arm/mainstone.c
> +++ b/hw/arm/mainstone.c
> @@ -177,11 +177,8 @@ static void mainstone_common_init(MemoryRegion *address_space_mem,
>      smc91c111_init(&nd_table[0], MST_ETH_PHYS,
>                      qdev_get_gpio_in(mst_irq, ETHERNET_IRQ));
>
> -    mainstone_binfo.kernel_filename = machine->kernel_filename;
> -    mainstone_binfo.kernel_cmdline = machine->kernel_cmdline;
> -    mainstone_binfo.initrd_filename = machine->initrd_filename;
>      mainstone_binfo.board_id = arm_id;
> -    arm_load_kernel(mpu->cpu, &mainstone_binfo);
> +    arm_load_kernel(mpu->cpu, machine, &mainstone_binfo);
>  }
>
>  static void mainstone_init(MachineState *machine)
> diff --git a/hw/arm/mcimx6ul-evk.c b/hw/arm/mcimx6ul-evk.c
> index 1f6f4aed97..e8a9b03069 100644
> --- a/hw/arm/mcimx6ul-evk.c
> +++ b/hw/arm/mcimx6ul-evk.c
> @@ -39,9 +39,6 @@ static void mcimx6ul_evk_init(MachineState *machine)
>          .loader_start = FSL_IMX6UL_MMDC_ADDR,
>          .board_id = -1,
>          .ram_size = machine->ram_size,
> -        .kernel_filename = machine->kernel_filename,
> -        .kernel_cmdline = machine->kernel_cmdline,
> -        .initrd_filename = machine->initrd_filename,
>          .nb_cpus = machine->smp.cpus,
>      };
>
> @@ -71,7 +68,7 @@ static void mcimx6ul_evk_init(MachineState *machine)
>      }
>
>      if (!qtest_enabled()) {
> -        arm_load_kernel(&s->soc.cpu, &boot_info);
> +        arm_load_kernel(&s->soc.cpu, machine, &boot_info);
>      }
>  }
>
> diff --git a/hw/arm/mcimx7d-sabre.c b/hw/arm/mcimx7d-sabre.c
> index 72eab03a0c..3123d8767f 100644
> --- a/hw/arm/mcimx7d-sabre.c
> +++ b/hw/arm/mcimx7d-sabre.c
> @@ -42,9 +42,6 @@ static void mcimx7d_sabre_init(MachineState *machine)
>          .loader_start = FSL_IMX7_MMDC_ADDR,
>          .board_id = -1,
>          .ram_size = machine->ram_size,
> -        .kernel_filename = machine->kernel_filename,
> -        .kernel_cmdline = machine->kernel_cmdline,
> -        .initrd_filename = machine->initrd_filename,
>          .nb_cpus = machine->smp.cpus,
>      };
>
> @@ -74,7 +71,7 @@ static void mcimx7d_sabre_init(MachineState *machine)
>      }
>
>      if (!qtest_enabled()) {
> -        arm_load_kernel(&s->soc.cpu[0], &boot_info);
> +        arm_load_kernel(&s->soc.cpu[0], machine, &boot_info);
>      }
>  }
>
> diff --git a/hw/arm/musicpal.c b/hw/arm/musicpal.c
> index 95d56f3208..a53ee12737 100644
> --- a/hw/arm/musicpal.c
> +++ b/hw/arm/musicpal.c
> @@ -1568,9 +1568,6 @@ static struct arm_boot_info musicpal_binfo = {
>
>  static void musicpal_init(MachineState *machine)
>  {
> -    const char *kernel_filename = machine->kernel_filename;
> -    const char *kernel_cmdline = machine->kernel_cmdline;
> -    const char *initrd_filename = machine->initrd_filename;
>      ARMCPU *cpu;
>      qemu_irq pic[32];
>      DeviceState *dev;
> @@ -1699,10 +1696,7 @@ static void musicpal_init(MachineState *machine)
>      sysbus_connect_irq(s, 0, pic[MP_AUDIO_IRQ]);
>
>      musicpal_binfo.ram_size = MP_RAM_DEFAULT_SIZE;
> -    musicpal_binfo.kernel_filename = kernel_filename;
> -    musicpal_binfo.kernel_cmdline = kernel_cmdline;
> -    musicpal_binfo.initrd_filename = initrd_filename;
> -    arm_load_kernel(cpu, &musicpal_binfo);
> +    arm_load_kernel(cpu, machine, &musicpal_binfo);
>  }
>
>  static void musicpal_machine_init(MachineClass *mc)
> diff --git a/hw/arm/nseries.c b/hw/arm/nseries.c
> index 4a79f5c88b..31dd2f1b51 100644
> --- a/hw/arm/nseries.c
> +++ b/hw/arm/nseries.c
> @@ -1358,10 +1358,7 @@ static void n8x0_init(MachineState *machine,
>
>      if (machine->kernel_filename) {
>          /* Or at the linux loader.  */
> -        binfo->kernel_filename = machine->kernel_filename;
> -        binfo->kernel_cmdline = machine->kernel_cmdline;
> -        binfo->initrd_filename = machine->initrd_filename;
> -        arm_load_kernel(s->mpu->cpu, binfo);
> +        arm_load_kernel(s->mpu->cpu, machine, binfo);
>
>          qemu_register_reset(n8x0_boot_init, s);
>      }
> diff --git a/hw/arm/omap_sx1.c b/hw/arm/omap_sx1.c
> index cae78d0a36..3cc2817f06 100644
> --- a/hw/arm/omap_sx1.c
> +++ b/hw/arm/omap_sx1.c
> @@ -196,10 +196,7 @@ static void sx1_init(MachineState *machine, const int version)
>      }
>
>      /* Load the kernel.  */
> -    sx1_binfo.kernel_filename = machine->kernel_filename;
> -    sx1_binfo.kernel_cmdline = machine->kernel_cmdline;
> -    sx1_binfo.initrd_filename = machine->initrd_filename;
> -    arm_load_kernel(mpu->cpu, &sx1_binfo);
> +    arm_load_kernel(mpu->cpu, machine, &sx1_binfo);
>
>      /* TODO: fix next line */
>      //~ qemu_console_resize(ds, 640, 480);
> diff --git a/hw/arm/palm.c b/hw/arm/palm.c
> index 9eb9612bce..67ab30b5bc 100644
> --- a/hw/arm/palm.c
> +++ b/hw/arm/palm.c
> @@ -186,9 +186,6 @@ static struct arm_boot_info palmte_binfo = {
>
>  static void palmte_init(MachineState *machine)
>  {
> -    const char *kernel_filename = machine->kernel_filename;
> -    const char *kernel_cmdline = machine->kernel_cmdline;
> -    const char *initrd_filename = machine->initrd_filename;
>      MemoryRegion *address_space_mem = get_system_memory();
>      struct omap_mpu_state_s *mpu;
>      int flash_size = 0x00800000;
> @@ -248,16 +245,13 @@ static void palmte_init(MachineState *machine)
>          }
>      }
>
> -    if (!rom_loaded && !kernel_filename && !qtest_enabled()) {
> +    if (!rom_loaded && !machine->kernel_filename && !qtest_enabled()) {
>          fprintf(stderr, "Kernel or ROM image must be specified\n");
>          exit(1);
>      }
>
>      /* Load the kernel.  */
> -    palmte_binfo.kernel_filename = kernel_filename;
> -    palmte_binfo.kernel_cmdline = kernel_cmdline;
> -    palmte_binfo.initrd_filename = initrd_filename;
> -    arm_load_kernel(mpu->cpu, &palmte_binfo);
> +    arm_load_kernel(mpu->cpu, machine, &palmte_binfo);
>  }
>
>  static void palmte_machine_init(MachineClass *mc)
> diff --git a/hw/arm/raspi.c b/hw/arm/raspi.c
> index 5b2620acb4..74c062d05e 100644
> --- a/hw/arm/raspi.c
> +++ b/hw/arm/raspi.c
> @@ -157,13 +157,9 @@ static void setup_boot(MachineState *machine, int version, size_t ram_size)
>
>          binfo.entry = firmware_addr;
>          binfo.firmware_loaded = true;
> -    } else {
> -        binfo.kernel_filename = machine->kernel_filename;
> -        binfo.kernel_cmdline = machine->kernel_cmdline;
> -        binfo.initrd_filename = machine->initrd_filename;
>      }
>
> -    arm_load_kernel(ARM_CPU(first_cpu), &binfo);
> +    arm_load_kernel(ARM_CPU(first_cpu), machine, &binfo);
>  }
>
>  static void raspi_init(MachineState *machine, int version)
> diff --git a/hw/arm/realview.c b/hw/arm/realview.c
> index 7c56c8d2ed..5a3e65ddd6 100644
> --- a/hw/arm/realview.c
> +++ b/hw/arm/realview.c
> @@ -350,13 +350,10 @@ static void realview_init(MachineState *machine,
>      memory_region_add_subregion(sysmem, SMP_BOOT_ADDR, ram_hack);
>
>      realview_binfo.ram_size = ram_size;
> -    realview_binfo.kernel_filename = machine->kernel_filename;
> -    realview_binfo.kernel_cmdline = machine->kernel_cmdline;
> -    realview_binfo.initrd_filename = machine->initrd_filename;
>      realview_binfo.nb_cpus = smp_cpus;
>      realview_binfo.board_id = realview_board_id[board_type];
>      realview_binfo.loader_start = (board_type == BOARD_PB_A8 ? 0x70000000 : 0);
> -    arm_load_kernel(ARM_CPU(first_cpu), &realview_binfo);
> +    arm_load_kernel(ARM_CPU(first_cpu), machine, &realview_binfo);
>  }
>
>  static void realview_eb_init(MachineState *machine)
> diff --git a/hw/arm/sabrelite.c b/hw/arm/sabrelite.c
> index 934f4c9261..8f4b68e14c 100644
> --- a/hw/arm/sabrelite.c
> +++ b/hw/arm/sabrelite.c
> @@ -102,16 +102,13 @@ static void sabrelite_init(MachineState *machine)
>      }
>
>      sabrelite_binfo.ram_size = machine->ram_size;
> -    sabrelite_binfo.kernel_filename = machine->kernel_filename;
> -    sabrelite_binfo.kernel_cmdline = machine->kernel_cmdline;
> -    sabrelite_binfo.initrd_filename = machine->initrd_filename;
>      sabrelite_binfo.nb_cpus = machine->smp.cpus;
>      sabrelite_binfo.secure_boot = true;
>      sabrelite_binfo.write_secondary_boot = sabrelite_write_secondary;
>      sabrelite_binfo.secondary_cpu_reset_hook = sabrelite_reset_secondary;
>
>      if (!qtest_enabled()) {
> -        arm_load_kernel(&s->soc.cpu[0], &sabrelite_binfo);
> +        arm_load_kernel(&s->soc.cpu[0], machine, &sabrelite_binfo);
>      }
>  }
>
> diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
> index 9c67d5c6f9..2aba3c58c5 100644
> --- a/hw/arm/sbsa-ref.c
> +++ b/hw/arm/sbsa-ref.c
> @@ -709,13 +709,12 @@ static void sbsa_ref_init(MachineState *machine)
>      create_pcie(sms, pic);
>
>      sms->bootinfo.ram_size = machine->ram_size;
> -    sms->bootinfo.kernel_filename = machine->kernel_filename;
>      sms->bootinfo.nb_cpus = smp_cpus;
>      sms->bootinfo.board_id = -1;
>      sms->bootinfo.loader_start = sbsa_ref_memmap[SBSA_MEM].base;
>      sms->bootinfo.get_dtb = sbsa_ref_dtb;
>      sms->bootinfo.firmware_loaded = firmware_loaded;
> -    arm_load_kernel(ARM_CPU(first_cpu), &sms->bootinfo);
> +    arm_load_kernel(ARM_CPU(first_cpu), machine, &sms->bootinfo);
>  }
>
>  static uint64_t sbsa_ref_cpu_mp_affinity(SBSAMachineState *sms, int idx)
> diff --git a/hw/arm/spitz.c b/hw/arm/spitz.c
> index 723cf5d592..42338696b3 100644
> --- a/hw/arm/spitz.c
> +++ b/hw/arm/spitz.c
> @@ -951,11 +951,8 @@ static void spitz_common_init(MachineState *machine,
>          /* A 4.0 GB microdrive is permanently sitting in CF slot 0.  */
>          spitz_microdrive_attach(mpu, 0);
>
> -    spitz_binfo.kernel_filename = machine->kernel_filename;
> -    spitz_binfo.kernel_cmdline = machine->kernel_cmdline;
> -    spitz_binfo.initrd_filename = machine->initrd_filename;
>      spitz_binfo.board_id = arm_id;
> -    arm_load_kernel(mpu->cpu, &spitz_binfo);
> +    arm_load_kernel(mpu->cpu, machine, &spitz_binfo);
>      sl_bootparam_write(SL_PXA_PARAM_BASE);
>  }
>
> diff --git a/hw/arm/tosa.c b/hw/arm/tosa.c
> index 7843d68d46..3a1de81278 100644
> --- a/hw/arm/tosa.c
> +++ b/hw/arm/tosa.c
> @@ -218,9 +218,6 @@ static struct arm_boot_info tosa_binfo = {
>
>  static void tosa_init(MachineState *machine)
>  {
> -    const char *kernel_filename = machine->kernel_filename;
> -    const char *kernel_cmdline = machine->kernel_cmdline;
> -    const char *initrd_filename = machine->initrd_filename;
>      MemoryRegion *address_space_mem = get_system_memory();
>      MemoryRegion *rom = g_new(MemoryRegion, 1);
>      PXA2xxState *mpu;
> @@ -245,11 +242,8 @@ static void tosa_init(MachineState *machine)
>
>      tosa_tg_init(mpu);
>
> -    tosa_binfo.kernel_filename = kernel_filename;
> -    tosa_binfo.kernel_cmdline = kernel_cmdline;
> -    tosa_binfo.initrd_filename = initrd_filename;
>      tosa_binfo.board_id = 0x208;
> -    arm_load_kernel(mpu->cpu, &tosa_binfo);
> +    arm_load_kernel(mpu->cpu, machine, &tosa_binfo);
>      sl_bootparam_write(SL_PXA_PARAM_BASE);
>  }
>
> diff --git a/hw/arm/versatilepb.c b/hw/arm/versatilepb.c
> index e5857117ac..d3c3c00f55 100644
> --- a/hw/arm/versatilepb.c
> +++ b/hw/arm/versatilepb.c
> @@ -373,11 +373,8 @@ static void versatile_init(MachineState *machine, int board_id)
>      }
>
>      versatile_binfo.ram_size = machine->ram_size;
> -    versatile_binfo.kernel_filename = machine->kernel_filename;
> -    versatile_binfo.kernel_cmdline = machine->kernel_cmdline;
> -    versatile_binfo.initrd_filename = machine->initrd_filename;
>      versatile_binfo.board_id = board_id;
> -    arm_load_kernel(cpu, &versatile_binfo);
> +    arm_load_kernel(cpu, machine, &versatile_binfo);
>  }
>
>  static void vpb_init(MachineState *machine)
> diff --git a/hw/arm/vexpress.c b/hw/arm/vexpress.c
> index 5d932c27c0..4673a88a8d 100644
> --- a/hw/arm/vexpress.c
> +++ b/hw/arm/vexpress.c
> @@ -707,9 +707,6 @@ static void vexpress_common_init(MachineState *machine)
>      }
>
>      daughterboard->bootinfo.ram_size = machine->ram_size;
> -    daughterboard->bootinfo.kernel_filename = machine->kernel_filename;
> -    daughterboard->bootinfo.kernel_cmdline = machine->kernel_cmdline;
> -    daughterboard->bootinfo.initrd_filename = machine->initrd_filename;
>      daughterboard->bootinfo.nb_cpus = machine->smp.cpus;
>      daughterboard->bootinfo.board_id = VEXPRESS_BOARD_ID;
>      daughterboard->bootinfo.loader_start = daughterboard->loader_start;
> @@ -719,7 +716,7 @@ static void vexpress_common_init(MachineState *machine)
>      daughterboard->bootinfo.modify_dtb = vexpress_modify_dtb;
>      /* When booting Linux we should be in secure state if the CPU has one. */
>      daughterboard->bootinfo.secure_boot = vms->secure;
> -    arm_load_kernel(ARM_CPU(first_cpu), &daughterboard->bootinfo);
> +    arm_load_kernel(ARM_CPU(first_cpu), machine, &daughterboard->bootinfo);
>  }
>
>  static bool vexpress_get_secure(Object *obj, Error **errp)
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index d9496c9363..6ffb80bf5b 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -1364,6 +1364,7 @@ void virt_machine_done(Notifier *notifier, void *data)
>  {
>      VirtMachineState *vms = container_of(notifier, VirtMachineState,
>                                           machine_done);
> +    MachineState *ms = MACHINE(vms);
>      ARMCPU *cpu = ARM_CPU(first_cpu);
>      struct arm_boot_info *info = &vms->bootinfo;
>      AddressSpace *as = arm_boot_address_space(cpu, info);
> @@ -1381,7 +1382,7 @@ void virt_machine_done(Notifier *notifier, void *data)
>                                         vms->memmap[VIRT_PLATFORM_BUS].size,
>                                         vms->irqmap[VIRT_PLATFORM_BUS]);
>      }
> -    if (arm_load_dtb(info->dtb_start, info, info->dtb_limit, as) < 0) {
> +    if (arm_load_dtb(info->dtb_start, info, info->dtb_limit, as, ms) < 0) {
>          exit(1);
>      }
>
> @@ -1707,16 +1708,13 @@ static void machvirt_init(MachineState *machine)
>      create_platform_bus(vms, pic);
>
>      vms->bootinfo.ram_size = machine->ram_size;
> -    vms->bootinfo.kernel_filename = machine->kernel_filename;
> -    vms->bootinfo.kernel_cmdline = machine->kernel_cmdline;
> -    vms->bootinfo.initrd_filename = machine->initrd_filename;
>      vms->bootinfo.nb_cpus = smp_cpus;
>      vms->bootinfo.board_id = -1;
>      vms->bootinfo.loader_start = vms->memmap[VIRT_MEM].base;
>      vms->bootinfo.get_dtb = machvirt_dtb;
>      vms->bootinfo.skip_dtb_autoload = true;
>      vms->bootinfo.firmware_loaded = firmware_loaded;
> -    arm_load_kernel(ARM_CPU(first_cpu), &vms->bootinfo);
> +    arm_load_kernel(ARM_CPU(first_cpu), machine, &vms->bootinfo);
>
>      vms->machine_done.notify = virt_machine_done;
>      qemu_add_machine_init_done_notifier(&vms->machine_done);
> diff --git a/hw/arm/xilinx_zynq.c b/hw/arm/xilinx_zynq.c
> index 89da34808b..c14774e542 100644
> --- a/hw/arm/xilinx_zynq.c
> +++ b/hw/arm/xilinx_zynq.c
> @@ -158,9 +158,6 @@ static inline void zynq_init_spi_flashes(uint32_t base_addr, qemu_irq irq,
>  static void zynq_init(MachineState *machine)
>  {
>      ram_addr_t ram_size = machine->ram_size;
> -    const char *kernel_filename = machine->kernel_filename;
> -    const char *kernel_cmdline = machine->kernel_cmdline;
> -    const char *initrd_filename = machine->initrd_filename;
>      ARMCPU *cpu;
>      MemoryRegion *address_space_mem = get_system_memory();
>      MemoryRegion *ext_ram = g_new(MemoryRegion, 1);
> @@ -303,16 +300,13 @@ static void zynq_init(MachineState *machine)
>      sysbus_mmio_map(busdev, 0, 0xF8007000);
>
>      zynq_binfo.ram_size = ram_size;
> -    zynq_binfo.kernel_filename = kernel_filename;
> -    zynq_binfo.kernel_cmdline = kernel_cmdline;
> -    zynq_binfo.initrd_filename = initrd_filename;
>      zynq_binfo.nb_cpus = 1;
>      zynq_binfo.board_id = 0xd32;
>      zynq_binfo.loader_start = 0;
>      zynq_binfo.board_setup_addr = BOARD_SETUP_ADDR;
>      zynq_binfo.write_board_setup = zynq_write_board_setup;
>
> -    arm_load_kernel(ARM_CPU(first_cpu), &zynq_binfo);
> +    arm_load_kernel(ARM_CPU(first_cpu), machine, &zynq_binfo);
>  }
>
>  static void zynq_machine_init(MachineClass *mc)
> diff --git a/hw/arm/xlnx-versal-virt.c b/hw/arm/xlnx-versal-virt.c
> index f95fde2309..462493c467 100644
> --- a/hw/arm/xlnx-versal-virt.c
> +++ b/hw/arm/xlnx-versal-virt.c
> @@ -441,14 +441,11 @@ static void versal_virt_init(MachineState *machine)
>                                          0, &s->soc.fpd.apu.mr, 0);
>
>      s->binfo.ram_size = machine->ram_size;
> -    s->binfo.kernel_filename = machine->kernel_filename;
> -    s->binfo.kernel_cmdline = machine->kernel_cmdline;
> -    s->binfo.initrd_filename = machine->initrd_filename;
>      s->binfo.loader_start = 0x0;
>      s->binfo.get_dtb = versal_virt_get_dtb;
>      s->binfo.modify_dtb = versal_virt_modify_dtb;
>      if (machine->kernel_filename) {
> -        arm_load_kernel(s->soc.fpd.apu.cpu[0], &s->binfo);
> +        arm_load_kernel(s->soc.fpd.apu.cpu[0], machine, &s->binfo);
>      } else {
>          AddressSpace *as = arm_boot_address_space(s->soc.fpd.apu.cpu[0],
>                                                    &s->binfo);
> @@ -457,7 +454,7 @@ static void versal_virt_init(MachineState *machine)
>          s->binfo.loader_start = 0x1000;
>          s->binfo.dtb_limit = 0x1000000;
>          if (arm_load_dtb(s->binfo.loader_start,
> -                         &s->binfo, s->binfo.dtb_limit, as) < 0) {
> +                         &s->binfo, s->binfo.dtb_limit, as, machine) < 0) {
>              exit(EXIT_FAILURE);
>          }
>      }
> diff --git a/hw/arm/xlnx-zcu102.c b/hw/arm/xlnx-zcu102.c
> index 044d3394c0..53cfe7c1f1 100644
> --- a/hw/arm/xlnx-zcu102.c
> +++ b/hw/arm/xlnx-zcu102.c
> @@ -171,11 +171,8 @@ static void xlnx_zcu102_init(MachineState *machine)
>      /* TODO create and connect IDE devices for ide_drive_get() */
>
>      xlnx_zcu102_binfo.ram_size = ram_size;
> -    xlnx_zcu102_binfo.kernel_filename = machine->kernel_filename;
> -    xlnx_zcu102_binfo.kernel_cmdline = machine->kernel_cmdline;
> -    xlnx_zcu102_binfo.initrd_filename = machine->initrd_filename;
>      xlnx_zcu102_binfo.loader_start = 0;
> -    arm_load_kernel(s->soc.boot_cpu_ptr, &xlnx_zcu102_binfo);
> +    arm_load_kernel(s->soc.boot_cpu_ptr, machine, &xlnx_zcu102_binfo);
>  }
>
>  static void xlnx_zcu102_machine_instance_init(Object *obj)
> diff --git a/hw/arm/z2.c b/hw/arm/z2.c
> index 44aa748d39..2f21421683 100644
> --- a/hw/arm/z2.c
> +++ b/hw/arm/z2.c
> @@ -296,9 +296,6 @@ static const TypeInfo aer915_info = {
>
>  static void z2_init(MachineState *machine)
>  {
> -    const char *kernel_filename = machine->kernel_filename;
> -    const char *kernel_cmdline = machine->kernel_cmdline;
> -    const char *initrd_filename = machine->initrd_filename;
>      MemoryRegion *address_space_mem = get_system_memory();
>      uint32_t sector_len = 0x10000;
>      PXA2xxState *mpu;
> @@ -352,11 +349,8 @@ static void z2_init(MachineState *machine)
>      qdev_connect_gpio_out(mpu->gpio, Z2_GPIO_LCD_CS,
>                            qemu_allocate_irq(z2_lcd_cs, z2_lcd, 0));
>
> -    z2_binfo.kernel_filename = kernel_filename;
> -    z2_binfo.kernel_cmdline = kernel_cmdline;
> -    z2_binfo.initrd_filename = initrd_filename;
>      z2_binfo.board_id = 0x6dd;
> -    arm_load_kernel(mpu->cpu, &z2_binfo);
> +    arm_load_kernel(mpu->cpu, machine, &z2_binfo);
>  }
>
>  static void z2_machine_init(MachineClass *mc)
> diff --git a/include/hw/arm/boot.h b/include/hw/arm/boot.h
> index c48cc4c2bc..2673abe81f 100644
> --- a/include/hw/arm/boot.h
> +++ b/include/hw/arm/boot.h
> @@ -133,7 +133,7 @@ struct arm_boot_info {
>   * before sysbus-fdt arm_register_platform_bus_fdt_creator. Indeed the
>   * machine init done notifiers are called in registration reverse order.
>   */
> -void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info);
> +void arm_load_kernel(ARMCPU *cpu, MachineState *ms, struct arm_boot_info *info);
>
>  AddressSpace *arm_boot_address_space(ARMCPU *cpu,
>                                       const struct arm_boot_info *info);
> @@ -160,7 +160,7 @@ AddressSpace *arm_boot_address_space(ARMCPU *cpu,
>   * Note: Must not be called unless have_dtb(binfo) is true.
>   */
>  int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
> -                 hwaddr addr_limit, AddressSpace *as);
> +                 hwaddr addr_limit, AddressSpace *as, MachineState *ms);
>
>  /* Write a secure board setup routine with a dummy handler for SMCs */
>  void arm_write_secure_board_setup_dummy_smc(ARMCPU *cpu,
> --
> 2.20.1
>
>



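A minimal sketch of the refactor pattern in "hw/arm: simplify arm_load_dtb", assuming simplified stand-in types. `DemoMachineState`, `DemoBootInfo`, `demo_load_kernel()`, `demo_dtb_wants_bootargs()` and `demo_board_init()` are hypothetical names, not QEMU APIs. The sketch shows three things. Board code fills in only its board-specific fields. The loader copies `kernel_filename`, `kernel_cmdline` and `initrd_filename` from the machine state in one place. The bootargs condition now reads the command line from the machine state rather than from the boot info.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the MachineState fields this patch touches. */
typedef struct DemoMachineState {
    const char *kernel_filename;
    const char *kernel_cmdline;
    const char *initrd_filename;
    uint64_t ram_size;
} DemoMachineState;

/* Hypothetical stand-in for struct arm_boot_info. */
typedef struct DemoBootInfo {
    uint64_t ram_size;
    int board_id;
    /* Filled in by demo_load_kernel(); board code no longer sets these. */
    const char *kernel_filename;
    const char *kernel_cmdline;
    const char *initrd_filename;
    /* Sketch-only: whether the DTB step would set /chosen/bootargs. */
    bool bootargs_set;
} DemoBootInfo;

/* Mirrors the new arm_load_dtb() condition: bootargs come from the
 * machine state and are set only for a non-NULL, non-empty command line. */
static bool demo_dtb_wants_bootargs(const DemoBootInfo *binfo,
                                    const DemoMachineState *ms)
{
    (void)binfo; /* unused in this simplified sketch */
    return ms->kernel_cmdline != NULL && ms->kernel_cmdline[0] != '\0';
}

/* Mirrors the new arm_load_kernel(): the three-field copy happens once here. */
static void demo_load_kernel(const DemoMachineState *ms, DemoBootInfo *info)
{
    assert(ms != NULL && info != NULL);
    info->kernel_filename = ms->kernel_filename;
    info->kernel_cmdline = ms->kernel_cmdline;
    info->initrd_filename = ms->initrd_filename;
    info->bootargs_set = demo_dtb_wants_bootargs(info, ms);
}

/* A board init now sets only board-specific fields, then hands the
 * machine state to the loader. */
static void demo_board_init(const DemoMachineState *machine,
                            DemoBootInfo *binfo)
{
    binfo->ram_size = machine->ram_size;
    binfo->board_id = 0x208; /* arbitrary example value */
    demo_load_kernel(machine, binfo);
}
```

Suppose `demo_board_init()` is called with a machine whose command line is `console=ttyAMA0`. The boot info then points at the machine's three strings, and `bootargs_set` is true. With an empty command line, `bootargs_set` stays false. One consequence of centralising the copy is visible in the sbsa-ref hunk: before this patch, that board copied only `kernel_filename`. It now receives `kernel_cmdline` and `initrd_filename` as well.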
* Re: [Qemu-devel] [PATCH v9 01/11] hw/arm: simplify arm_load_dtb
  2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 01/11] hw/arm: simplify arm_load_dtb Tao
  2019-08-13 21:55   ` Alistair Francis
@ 2019-08-13 21:55   ` Eduardo Habkost
  2019-08-14 13:08     ` Cédric Le Goater
  1 sibling, 1 reply; 34+ messages in thread
From: Eduardo Habkost @ 2019-08-13 21:55 UTC (permalink / raw)
  To: Tao
  Cc: Peter Maydell, imammedo, qemu-devel, daniel, Edgar E. Iglesias,
	Rob Herring, Andrey Smirnov, Joel Stanley, Alistair Francis,
	jingqi.liu, fan.du, Leif Lindholm, Beniamino Galvani, qemu-arm,
	Jan Kiszka, Cédric Le Goater, jonathan.cameron,
	dan.j.williams, Radoslaw Biernacki, Andrew Jeffery,
	Philippe Mathieu-Daudé,
	Andrew Baumann, Jean-Christophe Dubois, Igor Mitsyanko,
	Peter Chubb


CCing ARM maintainers.  I'd like to at least get one Acked-by from
them before queueing this on machine-next.


On Fri, Aug 09, 2019 at 02:57:21PM +0800, Tao wrote:
> From: Tao Xu <tao3.xu@intel.com>
> 
> In struct arm_boot_info, kernel_filename, initrd_filename and
> kernel_cmdline are copied from MachineState by the board code. This patch
> adds MachineState as a parameter to arm_load_kernel() and arm_load_dtb(),
> and moves the copying of kernel_filename, initrd_filename and
> kernel_cmdline into arm_load_kernel().
> 
> Reviewed-by: Igor Mammedov <imammedo@redhat.com>
> Reviewed-by: Liu Jingqi <jingqi.liu@intel.com>
> Suggested-by: Igor Mammedov <imammedo@redhat.com>
> Signed-off-by: Tao Xu <tao3.xu@intel.com>
> ---
> 
> No changes in v9
> ---
>  hw/arm/aspeed.c           |  5 +----
>  hw/arm/boot.c             | 14 ++++++++------
>  hw/arm/collie.c           |  8 +-------
>  hw/arm/cubieboard.c       |  5 +----
>  hw/arm/exynos4_boards.c   |  7 ++-----
>  hw/arm/highbank.c         |  8 +-------
>  hw/arm/imx25_pdk.c        |  5 +----
>  hw/arm/integratorcp.c     |  8 +-------
>  hw/arm/kzm.c              |  5 +----
>  hw/arm/mainstone.c        |  5 +----
>  hw/arm/mcimx6ul-evk.c     |  5 +----
>  hw/arm/mcimx7d-sabre.c    |  5 +----
>  hw/arm/musicpal.c         |  8 +-------
>  hw/arm/nseries.c          |  5 +----
>  hw/arm/omap_sx1.c         |  5 +----
>  hw/arm/palm.c             | 10 ++--------
>  hw/arm/raspi.c            |  6 +-----
>  hw/arm/realview.c         |  5 +----
>  hw/arm/sabrelite.c        |  5 +----
>  hw/arm/sbsa-ref.c         |  3 +--
>  hw/arm/spitz.c            |  5 +----
>  hw/arm/tosa.c             |  8 +-------
>  hw/arm/versatilepb.c      |  5 +----
>  hw/arm/vexpress.c         |  5 +----
>  hw/arm/virt.c             |  8 +++-----
>  hw/arm/xilinx_zynq.c      |  8 +-------
>  hw/arm/xlnx-versal-virt.c |  7 ++-----
>  hw/arm/xlnx-zcu102.c      |  5 +----
>  hw/arm/z2.c               |  8 +-------
>  include/hw/arm/boot.h     |  4 ++--
>  30 files changed, 43 insertions(+), 147 deletions(-)
> 
> diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
> index 843b708247..f8733b86b9 100644
> --- a/hw/arm/aspeed.c
> +++ b/hw/arm/aspeed.c
> @@ -241,9 +241,6 @@ static void aspeed_board_init(MachineState *machine,
>          write_boot_rom(drive0, FIRMWARE_ADDR, fl->size, &error_abort);
>      }
>  
> -    aspeed_board_binfo.kernel_filename = machine->kernel_filename;
> -    aspeed_board_binfo.initrd_filename = machine->initrd_filename;
> -    aspeed_board_binfo.kernel_cmdline = machine->kernel_cmdline;
>      aspeed_board_binfo.ram_size = ram_size;
>      aspeed_board_binfo.loader_start = sc->info->memmap[ASPEED_SDRAM];
>      aspeed_board_binfo.nb_cpus = bmc->soc.num_cpus;
> @@ -252,7 +249,7 @@ static void aspeed_board_init(MachineState *machine,
>          cfg->i2c_init(bmc);
>      }
>  
> -    arm_load_kernel(ARM_CPU(first_cpu), &aspeed_board_binfo);
> +    arm_load_kernel(ARM_CPU(first_cpu), machine, &aspeed_board_binfo);
>  }
>  
>  static void palmetto_bmc_i2c_init(AspeedBoardState *bmc)
> diff --git a/hw/arm/boot.c b/hw/arm/boot.c
> index c2b89b3bb9..ba604f8277 100644
> --- a/hw/arm/boot.c
> +++ b/hw/arm/boot.c
> @@ -524,7 +524,7 @@ static void fdt_add_psci_node(void *fdt)
>  }
>  
>  int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
> -                 hwaddr addr_limit, AddressSpace *as)
> +                 hwaddr addr_limit, AddressSpace *as, MachineState *ms)
>  {
>      void *fdt = NULL;
>      int size, rc, n = 0;
> @@ -627,9 +627,9 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
>          qemu_fdt_add_subnode(fdt, "/chosen");
>      }
>  
> -    if (binfo->kernel_cmdline && *binfo->kernel_cmdline) {
> +    if (ms->kernel_cmdline && *ms->kernel_cmdline) {
>          rc = qemu_fdt_setprop_string(fdt, "/chosen", "bootargs",
> -                                     binfo->kernel_cmdline);
> +                                     ms->kernel_cmdline);
>          if (rc < 0) {
>              fprintf(stderr, "couldn't set /chosen/bootargs\n");
>              goto fail;
> @@ -1261,7 +1261,7 @@ static void arm_setup_firmware_boot(ARMCPU *cpu, struct arm_boot_info *info)
>       */
>  }
>  
> -void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
> +void arm_load_kernel(ARMCPU *cpu, MachineState *ms, struct arm_boot_info *info)
>  {
>      CPUState *cs;
>      AddressSpace *as = arm_boot_address_space(cpu, info);
> @@ -1282,7 +1282,9 @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
>       * doesn't support secure.
>       */
>      assert(!(info->secure_board_setup && kvm_enabled()));
> -
> +    info->kernel_filename = ms->kernel_filename;
> +    info->kernel_cmdline = ms->kernel_cmdline;
> +    info->initrd_filename = ms->initrd_filename;
>      info->dtb_filename = qemu_opt_get(qemu_get_machine_opts(), "dtb");
>      info->dtb_limit = 0;
>  
> @@ -1294,7 +1296,7 @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
>      }
>  
>      if (!info->skip_dtb_autoload && have_dtb(info)) {
> -        if (arm_load_dtb(info->dtb_start, info, info->dtb_limit, as) < 0) {
> +        if (arm_load_dtb(info->dtb_start, info, info->dtb_limit, as, ms) < 0) {
>              exit(1);
>          }
>      }
> diff --git a/hw/arm/collie.c b/hw/arm/collie.c
> index 3db3c56004..72bc8f26e5 100644
> --- a/hw/arm/collie.c
> +++ b/hw/arm/collie.c
> @@ -26,9 +26,6 @@ static struct arm_boot_info collie_binfo = {
>  
>  static void collie_init(MachineState *machine)
>  {
> -    const char *kernel_filename = machine->kernel_filename;
> -    const char *kernel_cmdline = machine->kernel_cmdline;
> -    const char *initrd_filename = machine->initrd_filename;
>      StrongARMState *s;
>      DriveInfo *dinfo;
>      MemoryRegion *sysmem = get_system_memory();
> @@ -47,11 +44,8 @@ static void collie_init(MachineState *machine)
>  
>      sysbus_create_simple("scoop", 0x40800000, NULL);
>  
> -    collie_binfo.kernel_filename = kernel_filename;
> -    collie_binfo.kernel_cmdline = kernel_cmdline;
> -    collie_binfo.initrd_filename = initrd_filename;
>      collie_binfo.board_id = 0x208;
> -    arm_load_kernel(s->cpu, &collie_binfo);
> +    arm_load_kernel(s->cpu, machine, &collie_binfo);
>  }
>  
>  static void collie_machine_init(MachineClass *mc)
> diff --git a/hw/arm/cubieboard.c b/hw/arm/cubieboard.c
> index f7c8a5985a..d992fa087a 100644
> --- a/hw/arm/cubieboard.c
> +++ b/hw/arm/cubieboard.c
> @@ -72,10 +72,7 @@ static void cubieboard_init(MachineState *machine)
>      /* TODO create and connect IDE devices for ide_drive_get() */
>  
>      cubieboard_binfo.ram_size = machine->ram_size;
> -    cubieboard_binfo.kernel_filename = machine->kernel_filename;
> -    cubieboard_binfo.kernel_cmdline = machine->kernel_cmdline;
> -    cubieboard_binfo.initrd_filename = machine->initrd_filename;
> -    arm_load_kernel(&s->a10->cpu, &cubieboard_binfo);
> +    arm_load_kernel(&s->a10->cpu, machine, &cubieboard_binfo);
>  }
>  
>  static void cubieboard_machine_init(MachineClass *mc)
> diff --git a/hw/arm/exynos4_boards.c b/hw/arm/exynos4_boards.c
> index ac0b0dc2a9..da402d5216 100644
> --- a/hw/arm/exynos4_boards.c
> +++ b/hw/arm/exynos4_boards.c
> @@ -120,9 +120,6 @@ exynos4_boards_init_common(MachineState *machine,
>      exynos4_board_binfo.board_id = exynos4_board_id[board_type];
>      exynos4_board_binfo.smp_bootreg_addr =
>              exynos4_board_smp_bootreg_addr[board_type];
> -    exynos4_board_binfo.kernel_filename = machine->kernel_filename;
> -    exynos4_board_binfo.initrd_filename = machine->initrd_filename;
> -    exynos4_board_binfo.kernel_cmdline = machine->kernel_cmdline;
>      exynos4_board_binfo.gic_cpu_if_addr =
>              EXYNOS4210_SMP_PRIVATE_BASE_ADDR + 0x100;
>  
> @@ -141,7 +138,7 @@ static void nuri_init(MachineState *machine)
>  {
>      exynos4_boards_init_common(machine, EXYNOS4_BOARD_NURI);
>  
> -    arm_load_kernel(ARM_CPU(first_cpu), &exynos4_board_binfo);
> +    arm_load_kernel(ARM_CPU(first_cpu), machine, &exynos4_board_binfo);
>  }
>  
>  static void smdkc210_init(MachineState *machine)
> @@ -151,7 +148,7 @@ static void smdkc210_init(MachineState *machine)
>  
>      lan9215_init(SMDK_LAN9118_BASE_ADDR,
>              qemu_irq_invert(s->soc.irq_table[exynos4210_get_irq(37, 1)]));
> -    arm_load_kernel(ARM_CPU(first_cpu), &exynos4_board_binfo);
> +    arm_load_kernel(ARM_CPU(first_cpu), machine, &exynos4_board_binfo);
>  }
>  
>  static void nuri_class_init(ObjectClass *oc, void *data)
> diff --git a/hw/arm/highbank.c b/hw/arm/highbank.c
> index def0f1ce6a..1a35b6d82f 100644
> --- a/hw/arm/highbank.c
> +++ b/hw/arm/highbank.c
> @@ -234,9 +234,6 @@ enum cxmachines {
>  static void calxeda_init(MachineState *machine, enum cxmachines machine_id)
>  {
>      ram_addr_t ram_size = machine->ram_size;
> -    const char *kernel_filename = machine->kernel_filename;
> -    const char *kernel_cmdline = machine->kernel_cmdline;
> -    const char *initrd_filename = machine->initrd_filename;
>      DeviceState *dev = NULL;
>      SysBusDevice *busdev;
>      qemu_irq pic[128];
> @@ -388,9 +385,6 @@ static void calxeda_init(MachineState *machine, enum cxmachines machine_id)
>      /* TODO create and connect IDE devices for ide_drive_get() */
>  
>      highbank_binfo.ram_size = ram_size;
> -    highbank_binfo.kernel_filename = kernel_filename;
> -    highbank_binfo.kernel_cmdline = kernel_cmdline;
> -    highbank_binfo.initrd_filename = initrd_filename;
>      /* highbank requires a dtb in order to boot, and the dtb will override
>       * the board ID. The following value is ignored, so set it to -1 to be
>       * clear that the value is meaningless.
> @@ -410,7 +404,7 @@ static void calxeda_init(MachineState *machine, enum cxmachines machine_id)
>                      "may not boot.");
>      }
>  
> -    arm_load_kernel(ARM_CPU(first_cpu), &highbank_binfo);
> +    arm_load_kernel(ARM_CPU(first_cpu), machine, &highbank_binfo);
>  }
>  
>  static void highbank_init(MachineState *machine)
> diff --git a/hw/arm/imx25_pdk.c b/hw/arm/imx25_pdk.c
> index 5d673e47bc..c76fc2bd94 100644
> --- a/hw/arm/imx25_pdk.c
> +++ b/hw/arm/imx25_pdk.c
> @@ -116,9 +116,6 @@ static void imx25_pdk_init(MachineState *machine)
>      }
>  
>      imx25_pdk_binfo.ram_size = machine->ram_size;
> -    imx25_pdk_binfo.kernel_filename = machine->kernel_filename;
> -    imx25_pdk_binfo.kernel_cmdline = machine->kernel_cmdline;
> -    imx25_pdk_binfo.initrd_filename = machine->initrd_filename;
>      imx25_pdk_binfo.loader_start = FSL_IMX25_SDRAM0_ADDR;
>      imx25_pdk_binfo.board_id = 1771,
>      imx25_pdk_binfo.nb_cpus = 1;
> @@ -129,7 +126,7 @@ static void imx25_pdk_init(MachineState *machine)
>       * fail.
>       */
>      if (!qtest_enabled()) {
> -        arm_load_kernel(&s->soc.cpu, &imx25_pdk_binfo);
> +        arm_load_kernel(&s->soc.cpu, machine, &imx25_pdk_binfo);
>      }
>  }
>  
> diff --git a/hw/arm/integratorcp.c b/hw/arm/integratorcp.c
> index 200c0107f0..4d9e9c9e49 100644
> --- a/hw/arm/integratorcp.c
> +++ b/hw/arm/integratorcp.c
> @@ -578,9 +578,6 @@ static struct arm_boot_info integrator_binfo = {
>  static void integratorcp_init(MachineState *machine)
>  {
>      ram_addr_t ram_size = machine->ram_size;
> -    const char *kernel_filename = machine->kernel_filename;
> -    const char *kernel_cmdline = machine->kernel_cmdline;
> -    const char *initrd_filename = machine->initrd_filename;
>      Object *cpuobj;
>      ARMCPU *cpu;
>      MemoryRegion *address_space_mem = get_system_memory();
> @@ -650,10 +647,7 @@ static void integratorcp_init(MachineState *machine)
>      sysbus_create_simple("pl110", 0xc0000000, pic[22]);
>  
>      integrator_binfo.ram_size = ram_size;
> -    integrator_binfo.kernel_filename = kernel_filename;
> -    integrator_binfo.kernel_cmdline = kernel_cmdline;
> -    integrator_binfo.initrd_filename = initrd_filename;
> -    arm_load_kernel(cpu, &integrator_binfo);
> +    arm_load_kernel(cpu, machine, &integrator_binfo);
>  }
>  
>  static void integratorcp_machine_init(MachineClass *mc)
> diff --git a/hw/arm/kzm.c b/hw/arm/kzm.c
> index 59d2102dc5..5ff419a555 100644
> --- a/hw/arm/kzm.c
> +++ b/hw/arm/kzm.c
> @@ -126,13 +126,10 @@ static void kzm_init(MachineState *machine)
>      }
>  
>      kzm_binfo.ram_size = machine->ram_size;
> -    kzm_binfo.kernel_filename = machine->kernel_filename;
> -    kzm_binfo.kernel_cmdline = machine->kernel_cmdline;
> -    kzm_binfo.initrd_filename = machine->initrd_filename;
>      kzm_binfo.nb_cpus = 1;
>  
>      if (!qtest_enabled()) {
> -        arm_load_kernel(&s->soc.cpu, &kzm_binfo);
> +        arm_load_kernel(&s->soc.cpu, machine, &kzm_binfo);
>      }
>  }
>  
> diff --git a/hw/arm/mainstone.c b/hw/arm/mainstone.c
> index cd1f904c6c..c76cfb5dd1 100644
> --- a/hw/arm/mainstone.c
> +++ b/hw/arm/mainstone.c
> @@ -177,11 +177,8 @@ static void mainstone_common_init(MemoryRegion *address_space_mem,
>      smc91c111_init(&nd_table[0], MST_ETH_PHYS,
>                      qdev_get_gpio_in(mst_irq, ETHERNET_IRQ));
>  
> -    mainstone_binfo.kernel_filename = machine->kernel_filename;
> -    mainstone_binfo.kernel_cmdline = machine->kernel_cmdline;
> -    mainstone_binfo.initrd_filename = machine->initrd_filename;
>      mainstone_binfo.board_id = arm_id;
> -    arm_load_kernel(mpu->cpu, &mainstone_binfo);
> +    arm_load_kernel(mpu->cpu, machine, &mainstone_binfo);
>  }
>  
>  static void mainstone_init(MachineState *machine)
> diff --git a/hw/arm/mcimx6ul-evk.c b/hw/arm/mcimx6ul-evk.c
> index 1f6f4aed97..e8a9b03069 100644
> --- a/hw/arm/mcimx6ul-evk.c
> +++ b/hw/arm/mcimx6ul-evk.c
> @@ -39,9 +39,6 @@ static void mcimx6ul_evk_init(MachineState *machine)
>          .loader_start = FSL_IMX6UL_MMDC_ADDR,
>          .board_id = -1,
>          .ram_size = machine->ram_size,
> -        .kernel_filename = machine->kernel_filename,
> -        .kernel_cmdline = machine->kernel_cmdline,
> -        .initrd_filename = machine->initrd_filename,
>          .nb_cpus = machine->smp.cpus,
>      };
>  
> @@ -71,7 +68,7 @@ static void mcimx6ul_evk_init(MachineState *machine)
>      }
>  
>      if (!qtest_enabled()) {
> -        arm_load_kernel(&s->soc.cpu, &boot_info);
> +        arm_load_kernel(&s->soc.cpu, machine, &boot_info);
>      }
>  }
>  
> diff --git a/hw/arm/mcimx7d-sabre.c b/hw/arm/mcimx7d-sabre.c
> index 72eab03a0c..3123d8767f 100644
> --- a/hw/arm/mcimx7d-sabre.c
> +++ b/hw/arm/mcimx7d-sabre.c
> @@ -42,9 +42,6 @@ static void mcimx7d_sabre_init(MachineState *machine)
>          .loader_start = FSL_IMX7_MMDC_ADDR,
>          .board_id = -1,
>          .ram_size = machine->ram_size,
> -        .kernel_filename = machine->kernel_filename,
> -        .kernel_cmdline = machine->kernel_cmdline,
> -        .initrd_filename = machine->initrd_filename,
>          .nb_cpus = machine->smp.cpus,
>      };
>  
> @@ -74,7 +71,7 @@ static void mcimx7d_sabre_init(MachineState *machine)
>      }
>  
>      if (!qtest_enabled()) {
> -        arm_load_kernel(&s->soc.cpu[0], &boot_info);
> +        arm_load_kernel(&s->soc.cpu[0], machine, &boot_info);
>      }
>  }
>  
> diff --git a/hw/arm/musicpal.c b/hw/arm/musicpal.c
> index 95d56f3208..a53ee12737 100644
> --- a/hw/arm/musicpal.c
> +++ b/hw/arm/musicpal.c
> @@ -1568,9 +1568,6 @@ static struct arm_boot_info musicpal_binfo = {
>  
>  static void musicpal_init(MachineState *machine)
>  {
> -    const char *kernel_filename = machine->kernel_filename;
> -    const char *kernel_cmdline = machine->kernel_cmdline;
> -    const char *initrd_filename = machine->initrd_filename;
>      ARMCPU *cpu;
>      qemu_irq pic[32];
>      DeviceState *dev;
> @@ -1699,10 +1696,7 @@ static void musicpal_init(MachineState *machine)
>      sysbus_connect_irq(s, 0, pic[MP_AUDIO_IRQ]);
>  
>      musicpal_binfo.ram_size = MP_RAM_DEFAULT_SIZE;
> -    musicpal_binfo.kernel_filename = kernel_filename;
> -    musicpal_binfo.kernel_cmdline = kernel_cmdline;
> -    musicpal_binfo.initrd_filename = initrd_filename;
> -    arm_load_kernel(cpu, &musicpal_binfo);
> +    arm_load_kernel(cpu, machine, &musicpal_binfo);
>  }
>  
>  static void musicpal_machine_init(MachineClass *mc)
> diff --git a/hw/arm/nseries.c b/hw/arm/nseries.c
> index 4a79f5c88b..31dd2f1b51 100644
> --- a/hw/arm/nseries.c
> +++ b/hw/arm/nseries.c
> @@ -1358,10 +1358,7 @@ static void n8x0_init(MachineState *machine,
>  
>      if (machine->kernel_filename) {
>          /* Or at the linux loader.  */
> -        binfo->kernel_filename = machine->kernel_filename;
> -        binfo->kernel_cmdline = machine->kernel_cmdline;
> -        binfo->initrd_filename = machine->initrd_filename;
> -        arm_load_kernel(s->mpu->cpu, binfo);
> +        arm_load_kernel(s->mpu->cpu, machine, binfo);
>  
>          qemu_register_reset(n8x0_boot_init, s);
>      }
> diff --git a/hw/arm/omap_sx1.c b/hw/arm/omap_sx1.c
> index cae78d0a36..3cc2817f06 100644
> --- a/hw/arm/omap_sx1.c
> +++ b/hw/arm/omap_sx1.c
> @@ -196,10 +196,7 @@ static void sx1_init(MachineState *machine, const int version)
>      }
>  
>      /* Load the kernel.  */
> -    sx1_binfo.kernel_filename = machine->kernel_filename;
> -    sx1_binfo.kernel_cmdline = machine->kernel_cmdline;
> -    sx1_binfo.initrd_filename = machine->initrd_filename;
> -    arm_load_kernel(mpu->cpu, &sx1_binfo);
> +    arm_load_kernel(mpu->cpu, machine, &sx1_binfo);
>  
>      /* TODO: fix next line */
>      //~ qemu_console_resize(ds, 640, 480);
> diff --git a/hw/arm/palm.c b/hw/arm/palm.c
> index 9eb9612bce..67ab30b5bc 100644
> --- a/hw/arm/palm.c
> +++ b/hw/arm/palm.c
> @@ -186,9 +186,6 @@ static struct arm_boot_info palmte_binfo = {
>  
>  static void palmte_init(MachineState *machine)
>  {
> -    const char *kernel_filename = machine->kernel_filename;
> -    const char *kernel_cmdline = machine->kernel_cmdline;
> -    const char *initrd_filename = machine->initrd_filename;
>      MemoryRegion *address_space_mem = get_system_memory();
>      struct omap_mpu_state_s *mpu;
>      int flash_size = 0x00800000;
> @@ -248,16 +245,13 @@ static void palmte_init(MachineState *machine)
>          }
>      }
>  
> -    if (!rom_loaded && !kernel_filename && !qtest_enabled()) {
> +    if (!rom_loaded && !machine->kernel_filename && !qtest_enabled()) {
>          fprintf(stderr, "Kernel or ROM image must be specified\n");
>          exit(1);
>      }
>  
>      /* Load the kernel.  */
> -    palmte_binfo.kernel_filename = kernel_filename;
> -    palmte_binfo.kernel_cmdline = kernel_cmdline;
> -    palmte_binfo.initrd_filename = initrd_filename;
> -    arm_load_kernel(mpu->cpu, &palmte_binfo);
> +    arm_load_kernel(mpu->cpu, machine, &palmte_binfo);
>  }
>  
>  static void palmte_machine_init(MachineClass *mc)
> diff --git a/hw/arm/raspi.c b/hw/arm/raspi.c
> index 5b2620acb4..74c062d05e 100644
> --- a/hw/arm/raspi.c
> +++ b/hw/arm/raspi.c
> @@ -157,13 +157,9 @@ static void setup_boot(MachineState *machine, int version, size_t ram_size)
>  
>          binfo.entry = firmware_addr;
>          binfo.firmware_loaded = true;
> -    } else {
> -        binfo.kernel_filename = machine->kernel_filename;
> -        binfo.kernel_cmdline = machine->kernel_cmdline;
> -        binfo.initrd_filename = machine->initrd_filename;
>      }
>  
> -    arm_load_kernel(ARM_CPU(first_cpu), &binfo);
> +    arm_load_kernel(ARM_CPU(first_cpu), machine, &binfo);
>  }
>  
>  static void raspi_init(MachineState *machine, int version)
> diff --git a/hw/arm/realview.c b/hw/arm/realview.c
> index 7c56c8d2ed..5a3e65ddd6 100644
> --- a/hw/arm/realview.c
> +++ b/hw/arm/realview.c
> @@ -350,13 +350,10 @@ static void realview_init(MachineState *machine,
>      memory_region_add_subregion(sysmem, SMP_BOOT_ADDR, ram_hack);
>  
>      realview_binfo.ram_size = ram_size;
> -    realview_binfo.kernel_filename = machine->kernel_filename;
> -    realview_binfo.kernel_cmdline = machine->kernel_cmdline;
> -    realview_binfo.initrd_filename = machine->initrd_filename;
>      realview_binfo.nb_cpus = smp_cpus;
>      realview_binfo.board_id = realview_board_id[board_type];
>      realview_binfo.loader_start = (board_type == BOARD_PB_A8 ? 0x70000000 : 0);
> -    arm_load_kernel(ARM_CPU(first_cpu), &realview_binfo);
> +    arm_load_kernel(ARM_CPU(first_cpu), machine, &realview_binfo);
>  }
>  
>  static void realview_eb_init(MachineState *machine)
> diff --git a/hw/arm/sabrelite.c b/hw/arm/sabrelite.c
> index 934f4c9261..8f4b68e14c 100644
> --- a/hw/arm/sabrelite.c
> +++ b/hw/arm/sabrelite.c
> @@ -102,16 +102,13 @@ static void sabrelite_init(MachineState *machine)
>      }
>  
>      sabrelite_binfo.ram_size = machine->ram_size;
> -    sabrelite_binfo.kernel_filename = machine->kernel_filename;
> -    sabrelite_binfo.kernel_cmdline = machine->kernel_cmdline;
> -    sabrelite_binfo.initrd_filename = machine->initrd_filename;
>      sabrelite_binfo.nb_cpus = machine->smp.cpus;
>      sabrelite_binfo.secure_boot = true;
>      sabrelite_binfo.write_secondary_boot = sabrelite_write_secondary;
>      sabrelite_binfo.secondary_cpu_reset_hook = sabrelite_reset_secondary;
>  
>      if (!qtest_enabled()) {
> -        arm_load_kernel(&s->soc.cpu[0], &sabrelite_binfo);
> +        arm_load_kernel(&s->soc.cpu[0], machine, &sabrelite_binfo);
>      }
>  }
>  
> diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
> index 9c67d5c6f9..2aba3c58c5 100644
> --- a/hw/arm/sbsa-ref.c
> +++ b/hw/arm/sbsa-ref.c
> @@ -709,13 +709,12 @@ static void sbsa_ref_init(MachineState *machine)
>      create_pcie(sms, pic);
>  
>      sms->bootinfo.ram_size = machine->ram_size;
> -    sms->bootinfo.kernel_filename = machine->kernel_filename;
>      sms->bootinfo.nb_cpus = smp_cpus;
>      sms->bootinfo.board_id = -1;
>      sms->bootinfo.loader_start = sbsa_ref_memmap[SBSA_MEM].base;
>      sms->bootinfo.get_dtb = sbsa_ref_dtb;
>      sms->bootinfo.firmware_loaded = firmware_loaded;
> -    arm_load_kernel(ARM_CPU(first_cpu), &sms->bootinfo);
> +    arm_load_kernel(ARM_CPU(first_cpu), machine, &sms->bootinfo);
>  }
>  
>  static uint64_t sbsa_ref_cpu_mp_affinity(SBSAMachineState *sms, int idx)
> diff --git a/hw/arm/spitz.c b/hw/arm/spitz.c
> index 723cf5d592..42338696b3 100644
> --- a/hw/arm/spitz.c
> +++ b/hw/arm/spitz.c
> @@ -951,11 +951,8 @@ static void spitz_common_init(MachineState *machine,
>          /* A 4.0 GB microdrive is permanently sitting in CF slot 0.  */
>          spitz_microdrive_attach(mpu, 0);
>  
> -    spitz_binfo.kernel_filename = machine->kernel_filename;
> -    spitz_binfo.kernel_cmdline = machine->kernel_cmdline;
> -    spitz_binfo.initrd_filename = machine->initrd_filename;
>      spitz_binfo.board_id = arm_id;
> -    arm_load_kernel(mpu->cpu, &spitz_binfo);
> +    arm_load_kernel(mpu->cpu, machine, &spitz_binfo);
>      sl_bootparam_write(SL_PXA_PARAM_BASE);
>  }
>  
> diff --git a/hw/arm/tosa.c b/hw/arm/tosa.c
> index 7843d68d46..3a1de81278 100644
> --- a/hw/arm/tosa.c
> +++ b/hw/arm/tosa.c
> @@ -218,9 +218,6 @@ static struct arm_boot_info tosa_binfo = {
>  
>  static void tosa_init(MachineState *machine)
>  {
> -    const char *kernel_filename = machine->kernel_filename;
> -    const char *kernel_cmdline = machine->kernel_cmdline;
> -    const char *initrd_filename = machine->initrd_filename;
>      MemoryRegion *address_space_mem = get_system_memory();
>      MemoryRegion *rom = g_new(MemoryRegion, 1);
>      PXA2xxState *mpu;
> @@ -245,11 +242,8 @@ static void tosa_init(MachineState *machine)
>  
>      tosa_tg_init(mpu);
>  
> -    tosa_binfo.kernel_filename = kernel_filename;
> -    tosa_binfo.kernel_cmdline = kernel_cmdline;
> -    tosa_binfo.initrd_filename = initrd_filename;
>      tosa_binfo.board_id = 0x208;
> -    arm_load_kernel(mpu->cpu, &tosa_binfo);
> +    arm_load_kernel(mpu->cpu, machine, &tosa_binfo);
>      sl_bootparam_write(SL_PXA_PARAM_BASE);
>  }
>  
> diff --git a/hw/arm/versatilepb.c b/hw/arm/versatilepb.c
> index e5857117ac..d3c3c00f55 100644
> --- a/hw/arm/versatilepb.c
> +++ b/hw/arm/versatilepb.c
> @@ -373,11 +373,8 @@ static void versatile_init(MachineState *machine, int board_id)
>      }
>  
>      versatile_binfo.ram_size = machine->ram_size;
> -    versatile_binfo.kernel_filename = machine->kernel_filename;
> -    versatile_binfo.kernel_cmdline = machine->kernel_cmdline;
> -    versatile_binfo.initrd_filename = machine->initrd_filename;
>      versatile_binfo.board_id = board_id;
> -    arm_load_kernel(cpu, &versatile_binfo);
> +    arm_load_kernel(cpu, machine, &versatile_binfo);
>  }
>  
>  static void vpb_init(MachineState *machine)
> diff --git a/hw/arm/vexpress.c b/hw/arm/vexpress.c
> index 5d932c27c0..4673a88a8d 100644
> --- a/hw/arm/vexpress.c
> +++ b/hw/arm/vexpress.c
> @@ -707,9 +707,6 @@ static void vexpress_common_init(MachineState *machine)
>      }
>  
>      daughterboard->bootinfo.ram_size = machine->ram_size;
> -    daughterboard->bootinfo.kernel_filename = machine->kernel_filename;
> -    daughterboard->bootinfo.kernel_cmdline = machine->kernel_cmdline;
> -    daughterboard->bootinfo.initrd_filename = machine->initrd_filename;
>      daughterboard->bootinfo.nb_cpus = machine->smp.cpus;
>      daughterboard->bootinfo.board_id = VEXPRESS_BOARD_ID;
>      daughterboard->bootinfo.loader_start = daughterboard->loader_start;
> @@ -719,7 +716,7 @@ static void vexpress_common_init(MachineState *machine)
>      daughterboard->bootinfo.modify_dtb = vexpress_modify_dtb;
>      /* When booting Linux we should be in secure state if the CPU has one. */
>      daughterboard->bootinfo.secure_boot = vms->secure;
> -    arm_load_kernel(ARM_CPU(first_cpu), &daughterboard->bootinfo);
> +    arm_load_kernel(ARM_CPU(first_cpu), machine, &daughterboard->bootinfo);
>  }
>  
>  static bool vexpress_get_secure(Object *obj, Error **errp)
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index d9496c9363..6ffb80bf5b 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -1364,6 +1364,7 @@ void virt_machine_done(Notifier *notifier, void *data)
>  {
>      VirtMachineState *vms = container_of(notifier, VirtMachineState,
>                                           machine_done);
> +    MachineState *ms = MACHINE(vms);
>      ARMCPU *cpu = ARM_CPU(first_cpu);
>      struct arm_boot_info *info = &vms->bootinfo;
>      AddressSpace *as = arm_boot_address_space(cpu, info);
> @@ -1381,7 +1382,7 @@ void virt_machine_done(Notifier *notifier, void *data)
>                                         vms->memmap[VIRT_PLATFORM_BUS].size,
>                                         vms->irqmap[VIRT_PLATFORM_BUS]);
>      }
> -    if (arm_load_dtb(info->dtb_start, info, info->dtb_limit, as) < 0) {
> +    if (arm_load_dtb(info->dtb_start, info, info->dtb_limit, as, ms) < 0) {
>          exit(1);
>      }
>  
> @@ -1707,16 +1708,13 @@ static void machvirt_init(MachineState *machine)
>      create_platform_bus(vms, pic);
>  
>      vms->bootinfo.ram_size = machine->ram_size;
> -    vms->bootinfo.kernel_filename = machine->kernel_filename;
> -    vms->bootinfo.kernel_cmdline = machine->kernel_cmdline;
> -    vms->bootinfo.initrd_filename = machine->initrd_filename;
>      vms->bootinfo.nb_cpus = smp_cpus;
>      vms->bootinfo.board_id = -1;
>      vms->bootinfo.loader_start = vms->memmap[VIRT_MEM].base;
>      vms->bootinfo.get_dtb = machvirt_dtb;
>      vms->bootinfo.skip_dtb_autoload = true;
>      vms->bootinfo.firmware_loaded = firmware_loaded;
> -    arm_load_kernel(ARM_CPU(first_cpu), &vms->bootinfo);
> +    arm_load_kernel(ARM_CPU(first_cpu), machine, &vms->bootinfo);
>  
>      vms->machine_done.notify = virt_machine_done;
>      qemu_add_machine_init_done_notifier(&vms->machine_done);
> diff --git a/hw/arm/xilinx_zynq.c b/hw/arm/xilinx_zynq.c
> index 89da34808b..c14774e542 100644
> --- a/hw/arm/xilinx_zynq.c
> +++ b/hw/arm/xilinx_zynq.c
> @@ -158,9 +158,6 @@ static inline void zynq_init_spi_flashes(uint32_t base_addr, qemu_irq irq,
>  static void zynq_init(MachineState *machine)
>  {
>      ram_addr_t ram_size = machine->ram_size;
> -    const char *kernel_filename = machine->kernel_filename;
> -    const char *kernel_cmdline = machine->kernel_cmdline;
> -    const char *initrd_filename = machine->initrd_filename;
>      ARMCPU *cpu;
>      MemoryRegion *address_space_mem = get_system_memory();
>      MemoryRegion *ext_ram = g_new(MemoryRegion, 1);
> @@ -303,16 +300,13 @@ static void zynq_init(MachineState *machine)
>      sysbus_mmio_map(busdev, 0, 0xF8007000);
>  
>      zynq_binfo.ram_size = ram_size;
> -    zynq_binfo.kernel_filename = kernel_filename;
> -    zynq_binfo.kernel_cmdline = kernel_cmdline;
> -    zynq_binfo.initrd_filename = initrd_filename;
>      zynq_binfo.nb_cpus = 1;
>      zynq_binfo.board_id = 0xd32;
>      zynq_binfo.loader_start = 0;
>      zynq_binfo.board_setup_addr = BOARD_SETUP_ADDR;
>      zynq_binfo.write_board_setup = zynq_write_board_setup;
>  
> -    arm_load_kernel(ARM_CPU(first_cpu), &zynq_binfo);
> +    arm_load_kernel(ARM_CPU(first_cpu), machine, &zynq_binfo);
>  }
>  
>  static void zynq_machine_init(MachineClass *mc)
> diff --git a/hw/arm/xlnx-versal-virt.c b/hw/arm/xlnx-versal-virt.c
> index f95fde2309..462493c467 100644
> --- a/hw/arm/xlnx-versal-virt.c
> +++ b/hw/arm/xlnx-versal-virt.c
> @@ -441,14 +441,11 @@ static void versal_virt_init(MachineState *machine)
>                                          0, &s->soc.fpd.apu.mr, 0);
>  
>      s->binfo.ram_size = machine->ram_size;
> -    s->binfo.kernel_filename = machine->kernel_filename;
> -    s->binfo.kernel_cmdline = machine->kernel_cmdline;
> -    s->binfo.initrd_filename = machine->initrd_filename;
>      s->binfo.loader_start = 0x0;
>      s->binfo.get_dtb = versal_virt_get_dtb;
>      s->binfo.modify_dtb = versal_virt_modify_dtb;
>      if (machine->kernel_filename) {
> -        arm_load_kernel(s->soc.fpd.apu.cpu[0], &s->binfo);
> +        arm_load_kernel(s->soc.fpd.apu.cpu[0], machine, &s->binfo);
>      } else {
>          AddressSpace *as = arm_boot_address_space(s->soc.fpd.apu.cpu[0],
>                                                    &s->binfo);
> @@ -457,7 +454,7 @@ static void versal_virt_init(MachineState *machine)
>          s->binfo.loader_start = 0x1000;
>          s->binfo.dtb_limit = 0x1000000;
>          if (arm_load_dtb(s->binfo.loader_start,
> -                         &s->binfo, s->binfo.dtb_limit, as) < 0) {
> +                         &s->binfo, s->binfo.dtb_limit, as, machine) < 0) {
>              exit(EXIT_FAILURE);
>          }
>      }
> diff --git a/hw/arm/xlnx-zcu102.c b/hw/arm/xlnx-zcu102.c
> index 044d3394c0..53cfe7c1f1 100644
> --- a/hw/arm/xlnx-zcu102.c
> +++ b/hw/arm/xlnx-zcu102.c
> @@ -171,11 +171,8 @@ static void xlnx_zcu102_init(MachineState *machine)
>      /* TODO create and connect IDE devices for ide_drive_get() */
>  
>      xlnx_zcu102_binfo.ram_size = ram_size;
> -    xlnx_zcu102_binfo.kernel_filename = machine->kernel_filename;
> -    xlnx_zcu102_binfo.kernel_cmdline = machine->kernel_cmdline;
> -    xlnx_zcu102_binfo.initrd_filename = machine->initrd_filename;
>      xlnx_zcu102_binfo.loader_start = 0;
> -    arm_load_kernel(s->soc.boot_cpu_ptr, &xlnx_zcu102_binfo);
> +    arm_load_kernel(s->soc.boot_cpu_ptr, machine, &xlnx_zcu102_binfo);
>  }
>  
>  static void xlnx_zcu102_machine_instance_init(Object *obj)
> diff --git a/hw/arm/z2.c b/hw/arm/z2.c
> index 44aa748d39..2f21421683 100644
> --- a/hw/arm/z2.c
> +++ b/hw/arm/z2.c
> @@ -296,9 +296,6 @@ static const TypeInfo aer915_info = {
>  
>  static void z2_init(MachineState *machine)
>  {
> -    const char *kernel_filename = machine->kernel_filename;
> -    const char *kernel_cmdline = machine->kernel_cmdline;
> -    const char *initrd_filename = machine->initrd_filename;
>      MemoryRegion *address_space_mem = get_system_memory();
>      uint32_t sector_len = 0x10000;
>      PXA2xxState *mpu;
> @@ -352,11 +349,8 @@ static void z2_init(MachineState *machine)
>      qdev_connect_gpio_out(mpu->gpio, Z2_GPIO_LCD_CS,
>                            qemu_allocate_irq(z2_lcd_cs, z2_lcd, 0));
>  
> -    z2_binfo.kernel_filename = kernel_filename;
> -    z2_binfo.kernel_cmdline = kernel_cmdline;
> -    z2_binfo.initrd_filename = initrd_filename;
>      z2_binfo.board_id = 0x6dd;
> -    arm_load_kernel(mpu->cpu, &z2_binfo);
> +    arm_load_kernel(mpu->cpu, machine, &z2_binfo);
>  }
>  
>  static void z2_machine_init(MachineClass *mc)
> diff --git a/include/hw/arm/boot.h b/include/hw/arm/boot.h
> index c48cc4c2bc..2673abe81f 100644
> --- a/include/hw/arm/boot.h
> +++ b/include/hw/arm/boot.h
> @@ -133,7 +133,7 @@ struct arm_boot_info {
>   * before sysbus-fdt arm_register_platform_bus_fdt_creator. Indeed the
>   * machine init done notifiers are called in registration reverse order.
>   */
> -void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info);
> +void arm_load_kernel(ARMCPU *cpu, MachineState *ms, struct arm_boot_info *info);
>  
>  AddressSpace *arm_boot_address_space(ARMCPU *cpu,
>                                       const struct arm_boot_info *info);
> @@ -160,7 +160,7 @@ AddressSpace *arm_boot_address_space(ARMCPU *cpu,
>   * Note: Must not be called unless have_dtb(binfo) is true.
>   */
>  int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
> -                 hwaddr addr_limit, AddressSpace *as);
> +                 hwaddr addr_limit, AddressSpace *as, MachineState *ms);
>  
>  /* Write a secure board setup routine with a dummy handler for SMCs */
>  void arm_write_secure_board_setup_dummy_smc(ARMCPU *cpu,
> -- 
> 2.20.1
> 
> 

-- 
Eduardo


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH v9 01/11] hw/arm: simplify arm_load_dtb
  2019-08-13 21:55   ` Alistair Francis
@ 2019-08-14  1:19     ` Andrew Jeffery
  0 siblings, 0 replies; 34+ messages in thread
From: Andrew Jeffery @ 2019-08-14  1:19 UTC (permalink / raw)
  To: Alistair Francis, Tao
  Cc: Eduardo Habkost, jingqi.liu, fan.du,
	qemu-devel@nongnu.org Developers, daniel, Jonathan Cameron,
	Igor Mammedov, dan.j.williams



On Wed, 14 Aug 2019, at 07:30, Alistair Francis wrote:
> On Fri, Aug 9, 2019 at 12:01 AM Tao <tao3.xu@intel.com> wrote:
> >
> > From: Tao Xu <tao3.xu@intel.com>
> >
> > In struct arm_boot_info, kernel_filename, initrd_filename and
> > kernel_cmdline are copied from MachineState. This patch adds
> > MachineState as a parameter to arm_load_kernel() and arm_load_dtb(),
> > and moves the copying of kernel_filename, initrd_filename and
> > kernel_cmdline into arm_load_kernel().
> >
> > Reviewed-by: Igor Mammedov <imammedo@redhat.com>
> > Reviewed-by: Liu Jingqi <jingqi.liu@intel.com>
> > Suggested-by: Igor Mammedov <imammedo@redhat.com>
> > Signed-off-by: Tao Xu <tao3.xu@intel.com>
> 
> Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
> 
> Alistair
> 
> > ---
> >
> > No changes in v9
> > ---
> >  hw/arm/aspeed.c           |  5 +----

For the ASPEED machines:

Acked-by: Andrew Jeffery <andrew@aj.id.au>
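For readers skimming the thread, the refactor in the commit message can be pictured with a minimal, hypothetical model. `SketchMachineState`, `SketchBootInfo`, `sketch_arm_load_kernel()` and `sketch_board_init()` are stand-ins invented for illustration, not QEMU's real types or functions (those carry many more fields and also take the CPU). The sketch only shows the ownership change: board code stops copying the three strings, and the loader copies them from the machine state in one place.

```c
/* Hypothetical, trimmed-down stand-in for QEMU's MachineState. */
typedef struct {
    const char *kernel_filename;
    const char *kernel_cmdline;
    const char *initrd_filename;
} SketchMachineState;

/* Hypothetical, trimmed-down stand-in for struct arm_boot_info. */
typedef struct {
    const char *kernel_filename;
    const char *kernel_cmdline;
    const char *initrd_filename;
    int board_id;
} SketchBootInfo;

/* After the refactor the loader, not each board, copies the
 * command-line derived strings from the machine state. */
static void sketch_arm_load_kernel(const SketchMachineState *ms,
                                   SketchBootInfo *info)
{
    info->kernel_filename = ms->kernel_filename;
    info->kernel_cmdline = ms->kernel_cmdline;
    info->initrd_filename = ms->initrd_filename;
    /* The real function goes on to load the images; omitted here. */
}

/* A board init now sets only board-specific fields and then calls
 * the loader, mirroring the tosa/z2 hunks in the patch. */
static void sketch_board_init(const SketchMachineState *machine,
                              SketchBootInfo *binfo)
{
    binfo->board_id = 0x208;
    sketch_arm_load_kernel(machine, binfo);
}
```

In the patch itself this shows up as the deletion of the three `*_filename`/`kernel_cmdline` assignments in every board and the extra `machine` argument passed to `arm_load_kernel()`.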



* Re: [Qemu-devel] [PATCH v9 05/11] numa: Extend CLI to provide initiator information for numa nodes
  2019-08-13 15:00   ` Igor Mammedov
@ 2019-08-14  2:24     ` Tao Xu
  2019-08-16 14:47       ` Igor Mammedov
  2019-08-14  2:39     ` Dan Williams
  1 sibling, 1 reply; 34+ messages in thread
From: Tao Xu @ 2019-08-14  2:24 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: ehabkost, jingqi.liu, fan.du, qemu-devel, daniel,
	jonathan.cameron, dan.j.williams

On 8/13/2019 11:00 PM, Igor Mammedov wrote:
> On Fri,  9 Aug 2019 14:57:25 +0800
> Tao <tao3.xu@intel.com> wrote:
> 
>> From: Tao Xu <tao3.xu@intel.com>
>>
>> In ACPI 6.3 chapter 5.2.27 Heterogeneous Memory Attribute Table (HMAT),
>> the initiator represents a processor that accesses memory. In 5.2.27.3
>> Memory Proximity Domain Attributes Structure, the attached initiator is
>> defined as the initiator to which the memory controller responsible for a
>> memory proximity domain is directly attached. With attached initiator
>> information, the topology of heterogeneous memory can be described.
>>
>> Extend the CLI of the "-numa node" option to indicate the initiator NUMA node-id.
>> In the Linux kernel, the code in drivers/acpi/hmat/hmat.c parses and reports
>> the platform's HMAT tables.
>>
>> Reviewed-by: Jingqi Liu <Jingqi.liu@intel.com>
>> Suggested-by: Dan Williams <dan.j.williams@intel.com>
>> Signed-off-by: Tao Xu <tao3.xu@intel.com>
>> ---
>>
>> No changes in v9
>> ---
[...]
>> +
>> +    for (i = 0; i < machine->numa_state->num_nodes; i++) {
>> +        if (numa_info[i].initiator_valid &&
>> +            !numa_info[numa_info[i].initiator].has_cpu) {
>                            ^^^^^^^^^^^^^^^^^^^^^^ possible out of bounds read, see below
> 
I will add an error check "if (numa_info[i].initiator >= MAX_NODES)" when parsing the input.
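A minimal sketch of that parse-time check, assuming `MAX_NODES` is 128 (the value used in `include/sysemu/numa.h` at the time; treat it as an assumption) and using a hypothetical trimmed `SketchNodeInfo` rather than QEMU's `NodeInfo`. Rejecting an out-of-range initiator while parsing `-numa node` means the later `numa_info[numa_info[i].initiator]` lookup can no longer index past the array. The conflict check mirrors the one in the quoted `parse_numa_node()` hunk; the function name is illustrative.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_NODES 128 /* assumption: value of QEMU's MAX_NODES in 2019 */

/* Hypothetical subset of QEMU's NodeInfo. */
typedef struct {
    bool initiator_valid;
    uint16_t initiator;
} SketchNodeInfo;

/* Record "initiator=<initiator>" for node <nodenr>.
 * Returns 0 on success, or -1 after printing an error on bad input. */
static int sketch_set_initiator(SketchNodeInfo nodes[MAX_NODES],
                                uint16_t nodenr, uint16_t initiator)
{
    /* Range-check both ids before using either as an array index. */
    if (nodenr >= MAX_NODES || initiator >= MAX_NODES) {
        fprintf(stderr, "Invalid initiator %" PRIu16 " for NUMA node %" PRIu16
                ": node ids must be less than %d\n",
                initiator, nodenr, MAX_NODES);
        return -1;
    }
    /* Repeating the same initiator is fine; changing it is an error. */
    if (nodes[nodenr].initiator_valid &&
        nodes[nodenr].initiator != initiator) {
        fprintf(stderr, "The initiator of NUMA node %" PRIu16
                " has been set to node %" PRIu16 "\n",
                nodenr, nodes[nodenr].initiator);
        return -1;
    }
    nodes[nodenr].initiator_valid = true;
    nodes[nodenr].initiator = initiator;
    return 0;
}
```

Range checking alone does not prove that the initiator node exists or has CPUs; the later `has_cpu` loop still covers that, since a non-present node never gets `has_cpu` set.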
>> +            error_report("The initiator-id %"PRIu16 " of NUMA node %d"
>> +                         " does not exist.", numa_info[i].initiator, i);
>> +            error_printf("\n");
>> +
>> +            exit(1);
>> +        }
> It only takes care of nodes that have CPUs, or memory-only ones that have
> an initiator explicitly provided on the CLI, and leaves open the possibility
> of memory-only nodes without an initiator mixed with nodes that have one.
> Is it valid to have a mixed configuration?
> Should we forbid it?
> 
A mixed configuration may indeed trigger bugs in the future, because
these patches generate HMAT by default. In a mixed configuration, or
without any initiator setting, a memory-only node gets a "Flags" field
of 0, which means its Proximity Domain for the Attached Initiator field
is not valid.

There are three situations:

1) full configuration, for example:
-object memory-backend-ram,size=1G,id=m0 \
-object memory-backend-ram,size=1G,id=m1 \
-object memory-backend-ram,size=1G,id=m2 \
-numa node,nodeid=0,memdev=m0 \
-numa node,nodeid=1,memdev=m1,initiator=0 \
-numa node,nodeid=2,memdev=m2,initiator=0

2) mixed configuration, for example:
-object memory-backend-ram,size=1G,id=m0 \
-object memory-backend-ram,size=1G,id=m1 \
-object memory-backend-ram,size=1G,id=m2 \
-numa node,nodeid=0,memdev=m0 \
-numa node,nodeid=1,memdev=m1,initiator=0 \
-numa node,nodeid=2,memdev=m2

3) no configuration, for example:
-object memory-backend-ram,size=1G,id=m0 \
-object memory-backend-ram,size=1G,id=m1 \
-object memory-backend-ram,size=1G,id=m2 \
-numa node,nodeid=0,memdev=m0 \
-numa node,nodeid=1,memdev=m1 \
-numa node,nodeid=2,memdev=m2
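The three cases can be told apart mechanically by looking only at memory-only nodes. This sketch assumes a hypothetical `SketchNode` carrying the `has_cpu` and `initiator_valid` flags that this patch adds to `NodeInfo`, and that node 0 in the examples ends up holding the CPUs; the enum and function names are invented for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of NodeInfo: only the flags this patch adds. */
typedef struct {
    bool has_cpu;
    bool initiator_valid;
} SketchNode;

typedef enum {
    INITIATOR_CFG_FULL,  /* every memory-only node names an initiator */
    INITIATOR_CFG_MIXED, /* some memory-only nodes do, some do not */
    INITIATOR_CFG_NONE,  /* no memory-only node names an initiator */
} InitiatorCfg;

static InitiatorCfg classify_initiator_cfg(const SketchNode *nodes, size_t n)
{
    size_t mem_only = 0, with_initiator = 0;

    for (size_t i = 0; i < n; i++) {
        if (nodes[i].has_cpu) {
            continue; /* CPU nodes are their own initiator */
        }
        mem_only++;
        if (nodes[i].initiator_valid) {
            with_initiator++;
        }
    }
    /* Checked first, so a machine with no memory-only nodes
     * counts as (vacuously) fully configured. */
    if (with_initiator == mem_only) {
        return INITIATOR_CFG_FULL;
    }
    if (with_initiator == 0) {
        return INITIATOR_CFG_NONE;
    }
    return INITIATOR_CFG_MIXED;
}
```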

I have three ideas:

1. HMAT option. Add a machine option like "-machine,hmat=yes", so that QEMU
builds HMAT only when it is requested.

2. Default setting. A NUMA node without an initiator defaults to the NUMA
node that contains CPU 0 as its initiator.

3. Auto setting. Similar to numa_default_auto_assign_ram, automatically
assign initiators to the memory-only nodes, distributing them evenly.

That gives two possible solutions:

1) HMAT option + Default setting

2) HMAT option + Auto setting
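Ideas 2 and 3 differ only in how an unset initiator is chosen. A hedged sketch of both, over a hypothetical `SketchNode` that mirrors the `has_cpu`, `initiator_valid` and `initiator` fields from this patch. `assign_default()` follows idea 2 directly, with the node holding CPU 0 passed in by the caller. For idea 3, "evenly" is interpreted here as a round-robin over the nodes that have CPUs; that choice is an assumption, not something the proposal specifies.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of NodeInfo. */
typedef struct {
    bool has_cpu;
    bool initiator_valid;
    uint16_t initiator;
} SketchNode;

/* Idea 2: every memory-only node without an initiator gets cpu0_node. */
static void assign_default(SketchNode *nodes, size_t n, uint16_t cpu0_node)
{
    for (size_t i = 0; i < n; i++) {
        if (!nodes[i].has_cpu && !nodes[i].initiator_valid) {
            nodes[i].initiator = cpu0_node;
            nodes[i].initiator_valid = true;
        }
    }
}

/* Idea 3 (round-robin interpretation): each unset memory-only node takes
 * the next CPU node after the one used last, wrapping around. Nothing is
 * assigned if no node has CPUs. */
static void assign_auto(SketchNode *nodes, size_t n)
{
    size_t next = 0; /* where to start looking for the next CPU node */

    for (size_t i = 0; i < n; i++) {
        if (nodes[i].has_cpu || nodes[i].initiator_valid) {
            continue;
        }
        for (size_t k = 0; k < n; k++) {
            size_t c = (next + k) % n;
            if (nodes[c].has_cpu) {
                nodes[i].initiator = (uint16_t)c;
                nodes[i].initiator_valid = true;
                next = c + 1;
                break;
            }
        }
    }
}
```

Both helpers leave explicitly configured initiators untouched, so either could run after parsing without overriding the command line.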

>> +    }
>> +
>>       if (s->len && !qtest_enabled()) {
>>           warn_report("CPU(s) not present in any NUMA nodes: %s",
>>                       s->str);
>> diff --git a/hw/core/numa.c b/hw/core/numa.c
>> index 8fcbba05d6..cfb6339810 100644
>> --- a/hw/core/numa.c
>> +++ b/hw/core/numa.c
>> @@ -128,6 +128,19 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
>>           numa_info[nodenr].node_mem = object_property_get_uint(o, "size", NULL);
>>           numa_info[nodenr].node_memdev = MEMORY_BACKEND(o);
>>       }
>> +
>> +    if (node->has_initiator) {
>> +        if (numa_info[nodenr].initiator_valid &&
>> +            (node->initiator != numa_info[nodenr].initiator)) {
>> +            error_setg(errp, "The initiator of NUMA node %" PRIu16 " has been "
>> +                       "set to node %" PRIu16, nodenr,
>> +                       numa_info[nodenr].initiator);
>> +            return;
>> +        }
>> +
>> +        numa_info[nodenr].initiator_valid = true;
>> +        numa_info[nodenr].initiator = node->initiator;
>                                               ^^^
> Not validated user input? (which could lead to a read beyond numa_info[] boundaries
> in the previous hunk).
> 
>> +    }
>>       numa_info[nodenr].present = true;
>>       max_numa_nodeid = MAX(max_numa_nodeid, nodenr + 1);
>>       ms->numa_state->num_nodes++;
>> diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
>> index 76da3016db..46ad06e000 100644
>> --- a/include/sysemu/numa.h
>> +++ b/include/sysemu/numa.h
>> @@ -10,6 +10,9 @@ struct NodeInfo {
>>       uint64_t node_mem;
>>       struct HostMemoryBackend *node_memdev;
>>       bool present;
>> +    bool has_cpu;
>> +    bool initiator_valid;
>> +    uint16_t initiator;
>>       uint8_t distance[MAX_NODES];
>>   };
>>   
>> diff --git a/qapi/machine.json b/qapi/machine.json
>> index 6db8a7e2ec..05e367d26a 100644
>> --- a/qapi/machine.json
>> +++ b/qapi/machine.json
>> @@ -414,6 +414,9 @@
>>   # @memdev: memory backend object.  If specified for one node,
>>   #          it must be specified for all nodes.
>>   #
>> +# @initiator: the initiator numa nodeid that is closest (as in directly
>> +#             attached) to this numa node (since 4.2)
> Well, it's pretty unclear what the doc comment means (unless the reader knows
> the specific part of the ACPI spec well).
> 
> I suggest rephrasing it into something more understandable for unaware
> readers (+ a possible reference to the spec for those who are interested
> in the spec definition, since this doc is meant for developers).
> 
>> +#
>>   # Since: 2.1
>>   ##
>>   { 'struct': 'NumaNodeOptions',
>> @@ -421,7 +424,8 @@
>>      '*nodeid': 'uint16',
>>      '*cpus':   ['uint16'],
>>      '*mem':    'size',
>> -   '*memdev': 'str' }}
>> +   '*memdev': 'str',
>> +   '*initiator': 'uint16' }}
>>   
>>   ##
>>   # @NumaDistOptions:
>> diff --git a/qemu-options.hx b/qemu-options.hx
>> index 9621e934c0..c480781992 100644
>> --- a/qemu-options.hx
>> +++ b/qemu-options.hx
>> @@ -161,14 +161,14 @@ If any on the three values is given, the total number of CPUs @var{n} can be omi
>>   ETEXI
>>   
>>   DEF("numa", HAS_ARG, QEMU_OPTION_numa,
>> -    "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node]\n"
>> -    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node]\n"
>> +    "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
>> +    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
>>       "-numa dist,src=source,dst=destination,val=distance\n"
>>       "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n",
>>       QEMU_ARCH_ALL)
>>   STEXI
>> -@item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}]
>> -@itemx -numa node[,memdev=@var{id}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}]
>> +@item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
>> +@itemx -numa node[,memdev=@var{id}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
>>   @itemx -numa dist,src=@var{source},dst=@var{destination},val=@var{distance}
>>   @itemx -numa cpu,node-id=@var{node}[,socket-id=@var{x}][,core-id=@var{y}][,thread-id=@var{z}]
>>   @findex -numa
>> @@ -215,6 +215,25 @@ split equally between them.
>>   @samp{mem} and @samp{memdev} are mutually exclusive. Furthermore,
>>   if one node uses @samp{memdev}, all of them have to use it.
>>   
>> +@samp{initiator} indicate the initiator NUMA @var{initiator} that is
>                                    ^^^^^^^       ^^^^^^^^^^^^^^
> The above will result in "initiator NUMA initiator"; was that your intention?
> 
>> +closest (as in directly attached) to this NUMA @var{node}.
> Again, I suggest replacing the spec language with something more user-friendly
> (this time without a spec reference, as it's geared toward end users).
> 
>> +For example, the following option assigns 2 NUMA nodes, node 0 has CPU.
> Following example creates a machine with 2 NUMA ...
> 
>> +node 1 has only memory, and its' initiator is node 0. Note that because
>> +node 0 has CPU, by default the initiator of node 0 is itself and must be
>> +itself.
>> +@example
>> +-M pc \
>> +-m 2G,slots=2,maxmem=4G \
>> +-object memory-backend-ram,size=1G,id=m0 \
>> +-object memory-backend-ram,size=1G,id=m1 \
>> +-numa node,nodeid=0,memdev=m0 \
>> +-numa node,nodeid=1,memdev=m1,initiator=0 \
>> +-smp 2,sockets=2,maxcpus=2  \
>> +-numa cpu,node-id=0,socket-id=0 \
>> +-numa cpu,node-id=0,socket-id=1 \
>> +@end example
>> +
>>   @var{source} and @var{destination} are NUMA node IDs.
>>   @var{distance} is the NUMA distance from @var{source} to @var{destination}.
>>   The distance from a node to itself is always 10. If any pair of nodes is
> 
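Taken together, the quoted hunks enforce two rules after parsing: an initiator must be a node that has CPUs, and a node that has CPUs must be its own initiator. A sketch of that post-parse validation with the bounds check discussed in this thread folded in, again over a hypothetical `SketchNode` rather than QEMU's `NodeInfo`; the function name is illustrative and the messages only loosely follow the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical subset of NodeInfo. */
typedef struct {
    bool has_cpu;
    bool initiator_valid;
    uint16_t initiator;
} SketchNode;

/* Returns 0 if every recorded initiator is consistent, -1 otherwise. */
static int validate_initiators(const SketchNode *nodes, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!nodes[i].initiator_valid) {
            continue;
        }
        size_t ini = nodes[i].initiator;

        /* Check the index before using it: this is the out-of-bounds
         * read pointed out in review. */
        if (ini >= n) {
            fprintf(stderr, "NUMA node %zu: initiator %zu is out of range\n",
                    i, ini);
            return -1;
        }
        /* Rule 1: an initiator must be a node that has CPUs. */
        if (!nodes[ini].has_cpu) {
            fprintf(stderr, "The initiator-id %zu of NUMA node %zu"
                    " does not exist.\n", ini, i);
            return -1;
        }
        /* Rule 2: a node with CPUs must be its own initiator. */
        if (nodes[i].has_cpu && ini != i) {
            fprintf(stderr, "The initiator of CPU NUMA node %zu"
                    " should be itself.\n", i);
            return -1;
        }
    }
    return 0;
}
```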




* Re: [Qemu-devel] [PATCH v9 05/11] numa: Extend CLI to provide initiator information for numa nodes
  2019-08-13 15:00   ` Igor Mammedov
  2019-08-14  2:24     ` Tao Xu
@ 2019-08-14  2:39     ` Dan Williams
  2019-08-14  5:13       ` Tao Xu
  1 sibling, 1 reply; 34+ messages in thread
From: Dan Williams @ 2019-08-14  2:39 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Eduardo Habkost, Jingqi Liu, Tao, Du, Fan, Qemu Developers,
	daniel, Jonathan Cameron

On Tue, Aug 13, 2019 at 8:00 AM Igor Mammedov <imammedo@redhat.com> wrote:
>
> On Fri,  9 Aug 2019 14:57:25 +0800
> Tao <tao3.xu@intel.com> wrote:
>
> > From: Tao Xu <tao3.xu@intel.com>
> >
> > In ACPI 6.3 chapter 5.2.27 Heterogeneous Memory Attribute Table (HMAT),
> > the initiator represents a processor that accesses memory. In 5.2.27.3
> > Memory Proximity Domain Attributes Structure, the attached initiator is
> > defined as the initiator to which the memory controller responsible for a
> > memory proximity domain is directly attached. With attached initiator
> > information, the topology of heterogeneous memory can be described.
> >
> > Extend the CLI of the "-numa node" option to indicate the initiator NUMA node-id.
> > In the Linux kernel, the code in drivers/acpi/hmat/hmat.c parses and reports
> > the platform's HMAT tables.
> >
> > Reviewed-by: Jingqi Liu <Jingqi.liu@intel.com>
> > Suggested-by: Dan Williams <dan.j.williams@intel.com>
> > Signed-off-by: Tao Xu <tao3.xu@intel.com>
> > ---
> >
> > No changes in v9
> > ---
> >  hw/core/machine.c     | 24 ++++++++++++++++++++++++
> >  hw/core/numa.c        | 13 +++++++++++++
> >  include/sysemu/numa.h |  3 +++
> >  qapi/machine.json     |  6 +++++-
> >  qemu-options.hx       | 27 +++++++++++++++++++++++----
> >  5 files changed, 68 insertions(+), 5 deletions(-)
> >
> > diff --git a/hw/core/machine.c b/hw/core/machine.c
> > index 3c55470103..113184a9df 100644
> > --- a/hw/core/machine.c
> > +++ b/hw/core/machine.c
> > @@ -640,6 +640,7 @@ void machine_set_cpu_numa_node(MachineState *machine,
> >                                 const CpuInstanceProperties *props, Error **errp)
> >  {
> >      MachineClass *mc = MACHINE_GET_CLASS(machine);
> > +    NodeInfo *numa_info = machine->numa_state->nodes;
> >      bool match = false;
> >      int i;
> >
> > @@ -709,6 +710,16 @@ void machine_set_cpu_numa_node(MachineState *machine,
> >          match = true;
> >          slot->props.node_id = props->node_id;
> >          slot->props.has_node_id = props->has_node_id;
> > +
> > +        if (numa_info[props->node_id].initiator_valid &&
> > +            (props->node_id != numa_info[props->node_id].initiator)) {
> > +            error_setg(errp, "The initiator of CPU NUMA node %" PRId64
> > +                       " should be itself.", props->node_id);
> > +            return;
> > +        }
> > +        numa_info[props->node_id].initiator_valid = true;
> > +        numa_info[props->node_id].has_cpu = true;
> > +        numa_info[props->node_id].initiator = props->node_id;
> >      }
> >
> >      if (!match) {
> > @@ -1050,6 +1061,7 @@ static void machine_numa_finish_cpu_init(MachineState *machine)
> >      GString *s = g_string_new(NULL);
> >      MachineClass *mc = MACHINE_GET_CLASS(machine);
> >      const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(machine);
> > +    NodeInfo *numa_info = machine->numa_state->nodes;
> >
> >      assert(machine->numa_state->num_nodes);
> >      for (i = 0; i < possible_cpus->len; i++) {
> > @@ -1083,6 +1095,18 @@ static void machine_numa_finish_cpu_init(MachineState *machine)
> >              machine_set_cpu_numa_node(machine, &props, &error_fatal);
> >          }
> >      }
> > +
> > +    for (i = 0; i < machine->numa_state->num_nodes; i++) {
> > +        if (numa_info[i].initiator_valid &&
> > +            !numa_info[numa_info[i].initiator].has_cpu) {
>                           ^^^^^^^^^^^^^^^^^^^^^^ possible out of bounds read, see below
>
> > +            error_report("The initiator-id %"PRIu16 " of NUMA node %d"
> > +                         " does not exist.", numa_info[i].initiator, i);
> > +            error_printf("\n");
> > +
> > +            exit(1);
> > +        }
> it only takes care of nodes that have cpus, or memory-only ones that have
> initiator explicitly provided on CLI. And leaves possibility to have
> memory-only nodes without initiator mixed with nodes that have initiator.
> Is it valid to have mixed configuration?
> Should we forbid it?

The spec talks about the "Proximity Domain for the Attached Initiator"
field only being valid if the memory controller for the memory can be
identified by an initiator id in the SRAT. So I expect the only way to
define a memory proximity domain without this local initiator is to
allow specifying a node-id that does not have an entry in the SRAT.

That would be a useful feature for testing OS HMAT parsing behavior,
and may match platforms that exist in practice.

>
> > +    }
> > +
> >      if (s->len && !qtest_enabled()) {
> >          warn_report("CPU(s) not present in any NUMA nodes: %s",
> >                      s->str);
> > diff --git a/hw/core/numa.c b/hw/core/numa.c
> > index 8fcbba05d6..cfb6339810 100644
> > --- a/hw/core/numa.c
> > +++ b/hw/core/numa.c
> > @@ -128,6 +128,19 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
> >          numa_info[nodenr].node_mem = object_property_get_uint(o, "size", NULL);
> >          numa_info[nodenr].node_memdev = MEMORY_BACKEND(o);
> >      }
> > +
> > +    if (node->has_initiator) {
> > +        if (numa_info[nodenr].initiator_valid &&
> > +            (node->initiator != numa_info[nodenr].initiator)) {
> > +            error_setg(errp, "The initiator of NUMA node %" PRIu16 " has been "
> > +                       "set to node %" PRIu16, nodenr,
> > +                       numa_info[nodenr].initiator);
> > +            return;
> > +        }
> > +
> > +        numa_info[nodenr].initiator_valid = true;
> > +        numa_info[nodenr].initiator = node->initiator;
>                                              ^^^
> unvalidated user input? (which could lead to a read beyond numa_info[] boundaries
> in previous hunk).
>
> > +    }
> >      numa_info[nodenr].present = true;
> >      max_numa_nodeid = MAX(max_numa_nodeid, nodenr + 1);
> >      ms->numa_state->num_nodes++;
> > diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
> > index 76da3016db..46ad06e000 100644
> > --- a/include/sysemu/numa.h
> > +++ b/include/sysemu/numa.h
> > @@ -10,6 +10,9 @@ struct NodeInfo {
> >      uint64_t node_mem;
> >      struct HostMemoryBackend *node_memdev;
> >      bool present;
> > +    bool has_cpu;
> > +    bool initiator_valid;
> > +    uint16_t initiator;
> >      uint8_t distance[MAX_NODES];
> >  };
> >
> > diff --git a/qapi/machine.json b/qapi/machine.json
> > index 6db8a7e2ec..05e367d26a 100644
> > --- a/qapi/machine.json
> > +++ b/qapi/machine.json
> > @@ -414,6 +414,9 @@
> >  # @memdev: memory backend object.  If specified for one node,
> >  #          it must be specified for all nodes.
> >  #
> > +# @initiator: the initiator numa nodeid that is closest (as in directly
> > +#             attached) to this numa node (since 4.2)
> well, it's pretty unclear what doc comment means (unless reader knows well
> specific part of ACPI spec)
>
> suggest to rephrase to something more understandable for unaware
> readers (+ possible reference to spec for those who are interested
> in the spec definition, since this doc is meant for developers).
>
> > +#
> >  # Since: 2.1
> >  ##
> >  { 'struct': 'NumaNodeOptions',
> > @@ -421,7 +424,8 @@
> >     '*nodeid': 'uint16',
> >     '*cpus':   ['uint16'],
> >     '*mem':    'size',
> > -   '*memdev': 'str' }}
> > +   '*memdev': 'str',
> > +   '*initiator': 'uint16' }}
> >
> >  ##
> >  # @NumaDistOptions:
> > diff --git a/qemu-options.hx b/qemu-options.hx
> > index 9621e934c0..c480781992 100644
> > --- a/qemu-options.hx
> > +++ b/qemu-options.hx
> > @@ -161,14 +161,14 @@ If any on the three values is given, the total number of CPUs @var{n} can be omi
> >  ETEXI
> >
> >  DEF("numa", HAS_ARG, QEMU_OPTION_numa,
> > -    "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node]\n"
> > -    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node]\n"
> > +    "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
> > +    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
> >      "-numa dist,src=source,dst=destination,val=distance\n"
> >      "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n",
> >      QEMU_ARCH_ALL)
> >  STEXI
> > -@item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}]
> > -@itemx -numa node[,memdev=@var{id}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}]
> > +@item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
> > +@itemx -numa node[,memdev=@var{id}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
> >  @itemx -numa dist,src=@var{source},dst=@var{destination},val=@var{distance}
> >  @itemx -numa cpu,node-id=@var{node}[,socket-id=@var{x}][,core-id=@var{y}][,thread-id=@var{z}]
> >  @findex -numa
> > @@ -215,6 +215,25 @@ split equally between them.
> >  @samp{mem} and @samp{memdev} are mutually exclusive. Furthermore,
> >  if one node uses @samp{memdev}, all of them have to use it.
> >
> > +@samp{initiator} indicate the initiator NUMA @var{initiator} that is
>                                   ^^^^^^^       ^^^^^^^^^^^^^^
> above will result in "initiator NUMA initiator", was it your intention?
>
> > +closest (as in directly attached) to this NUMA @var{node}.
> Again suggest replace spec language with something more user friendly
> (this time without spec reference as it's geared for end user)
>
> > +For example, the following option assigns 2 NUMA nodes, node 0 has CPU.
> Following example creates a machine with 2 NUMA ...
>
> > +node 1 has only memory, and its' initiator is node 0. Note that because
> > +node 0 has CPU, by default the initiator of node 0 is itself and must be
> > +itself.
> > +@example
> > +-M pc \
> > +-m 2G,slots=2,maxmem=4G \
> > +-object memory-backend-ram,size=1G,id=m0 \
> > +-object memory-backend-ram,size=1G,id=m1 \
> > +-numa node,nodeid=0,memdev=m0 \
> > +-numa node,nodeid=1,memdev=m1,initiator=0 \
> > +-smp 2,sockets=2,maxcpus=2  \
> > +-numa cpu,node-id=0,socket-id=0 \
> > +-numa cpu,node-id=0,socket-id=1 \
> > +@end example
> > +
> >  @var{source} and @var{destination} are NUMA node IDs.
> >  @var{distance} is the NUMA distance from @var{source} to @var{destination}.
> >  The distance from a node to itself is always 10. If any pair of nodes is
>
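Igor's out-of-bounds concern and the "does the initiator node exist" check could both be handled by range-checking the user-supplied initiator when it is parsed, and only then dereferencing it at finish time. The following is a minimal sketch, not the patch's code: `NodeInfoSketch`, `SKETCH_MAX_NODES` and both helper names are hypothetical stand-ins for QEMU's `NodeInfo`, `MAX_NODES`, `parse_numa_node()` and `machine_numa_finish_cpu_init()`, and errors are reported through return codes rather than `Error **errp`. Memory-only nodes that never get an initiator are simply skipped, which leaves room for the configuration Dan describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_NODES 128 /* hypothetical stand-in for QEMU's MAX_NODES */

typedef struct {
    bool present;
    bool has_cpu;
    bool initiator_valid;
    uint16_t initiator;
} NodeInfoSketch;

/* Parse-time step: reject IDs that cannot index the node array and reject
 * conflicting re-definitions. Returns 0 on success, -1 on error. */
static int sketch_set_initiator(NodeInfoSketch *nodes, uint16_t nodenr,
                                uint16_t initiator)
{
    if (nodenr >= SKETCH_MAX_NODES || initiator >= SKETCH_MAX_NODES) {
        return -1;
    }
    if (nodes[nodenr].initiator_valid &&
        nodes[nodenr].initiator != initiator) {
        return -1;
    }
    nodes[nodenr].initiator_valid = true;
    nodes[nodenr].initiator = initiator;
    return 0;
}

/* Finish step: every recorded initiator must be a present node with CPUs.
 * The initiator ID was bounded above, so nodes[initiator] is a safe read.
 * Returns the first offending node ID, or -1 if the configuration is valid. */
static int sketch_check_initiators(const NodeInfoSketch *nodes, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!nodes[i].initiator_valid) {
            continue; /* no initiator given: nothing to check */
        }
        const NodeInfoSketch *init = &nodes[nodes[i].initiator];
        if (!init->present || !init->has_cpu) {
            return (int)i;
        }
    }
    return -1;
}
```

Bounding the ID at parse time is what makes the later `nodes[nodes[i].initiator]` read safe; the finish-time loop then only has to check that the initiator node is real and has CPUs.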



* Re: [Qemu-devel] [PATCH v9 09/11] numa: Extend the CLI to provide memory latency and bandwidth information
  2019-08-13 15:11   ` Eric Blake
@ 2019-08-14  2:58     ` Tao Xu
  0 siblings, 0 replies; 34+ messages in thread
From: Tao Xu @ 2019-08-14  2:58 UTC (permalink / raw)
  To: Eric Blake, daniel
  Cc: ehabkost, jingqi.liu, fan.du, qemu-devel, Markus Armbruster,
	jonathan.cameron, imammedo, dan.j.williams

On 8/13/2019 11:11 PM, Eric Blake wrote:
> On 8/9/19 1:57 AM, Tao wrote:
>> From: Liu Jingqi <jingqi.liu@intel.com>
>>
>> Add -numa hmat-lb option to provide System Locality Latency and
>> Bandwidth Information. These memory attributes help to build
>> System Locality Latency and Bandwidth Information Structure(s)
>> in ACPI Heterogeneous Memory Attribute Table (HMAT).
>>
>> Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
>> Signed-off-by: Tao Xu <tao3.xu@intel.com>
>> ---
>>
>> Changes in v9:
>>      - change the CLI input way, make it more user friendly (Daniel Black)
>>      use latency=NUM[p|n|u]s and bandwidth=NUM[M|G|P](B/s) as input and drop
>>      the base-lat and base-bw input.
> 
> Why are you hand-rolling yet another scaling parser instead of reusing
> one that's already in-tree?

Because there is no time-scaling parser in tree, and the QMP 'size' type
uses kb as the default unit. This is tricky because each entry in the
HMAT is small (max 0xffff), and we also need to store the unit in the
HMAT.

But as you mention below, 'str' is not a good choice for QMP.
So, what about the following solution:

For bandwidth, reuse qemu_strtosz_MiB() (because the smallest unit is
MB/s). For latency, write time-scaling parsers named
"qemu_strtotime_ps()" and "qemu_strtotime_ns()" in util/cutils.c. Then
use them to pre-convert the values into a single scale (which the QMP
interface can use).
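A latency parser along the lines proposed above might look like the following. It is only a sketch: `parse_latency_ps()` is a hypothetical name (the proposed `qemu_strtotime_ps()` is not an existing util/cutils.c function in this thread), it uses plain `strtoull()` instead of QEMU's own string helpers, and it treats a bare number as picoseconds, matching the v9 patch's behaviour.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PS_PER_NS UINT64_C(1000)
#define PS_PER_US UINT64_C(1000000)

/*
 * Hypothetical: parse "NUM", "NUMps", "NUMns" or "NUMus" into picoseconds.
 * Returns 0 and stores *ps on success, -EINVAL on malformed input,
 * -ERANGE if the scaled value does not fit in 64 bits.
 */
static int parse_latency_ps(const char *str, uint64_t *ps)
{
    char *end;
    unsigned long long val;
    uint64_t mul;

    /* Require a leading digit: rejects "", signs and leading whitespace. */
    if (!str || *str < '0' || *str > '9') {
        return -EINVAL;
    }
    errno = 0;
    val = strtoull(str, &end, 10);
    if (errno == ERANGE) {
        return -ERANGE;
    }
    if (*end == '\0' || strcmp(end, "ps") == 0) {
        mul = 1;
    } else if (strcmp(end, "ns") == 0) {
        mul = PS_PER_NS;
    } else if (strcmp(end, "us") == 0) {
        mul = PS_PER_US;
    } else {
        return -EINVAL; /* unknown or trailing garbage */
    }
    if (val > UINT64_MAX / mul) {
        return -ERANGE;
    }
    *ps = (uint64_t)val * mul;
    return 0;
}
```

Unlike the v9 code, which only checks that the character after the unit letter is 's', this compares the whole suffix, so input such as "10nsx" is rejected.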

Finally, when building the HMAT, we automatically split the data into a
common base unit and per-entry values, and report an error on overflow.
This way the HMAT can support values as large as possible.

I am wondering if this solution is OK.
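The "split into a base unit and entries" step can be sketched as follows. This is one possible approach, not the patch's: the hypothetical `hmat_split_base()` picks the largest power of ten that divides every non-zero value exactly, so no precision is lost, and fails if any resulting entry does not fit in 16 bits. It assumes the whole 0..0xFFFF range is usable for entries; the ACPI 6.3 HMAT description should be checked for any reserved entry values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: choose a base unit for n values (e.g. latencies in ps)
 * so that values[i] == entries[i] * *base exactly, with every entry <= 0xFFFF.
 * The base is the largest power of ten dividing every non-zero value.
 * Returns 0 on success; on -1 (overflow), *base and entries[] are untouched.
 */
static int hmat_split_base(const uint64_t *values, size_t n,
                           uint64_t *base, uint16_t *entries)
{
    uint64_t b = 0; /* 0 means "no non-zero value seen yet" */

    for (size_t i = 0; i < n; i++) {
        uint64_t v = values[i];
        uint64_t pow10 = 1;

        if (v == 0) {
            continue; /* zero is representable under any base */
        }
        while (v % 10 == 0) {
            v /= 10;
            pow10 *= 10;
        }
        if (b == 0 || pow10 < b) {
            b = pow10; /* powers of ten nest, so the minimum divides all */
        }
    }
    if (b == 0) {
        b = 1;
    }

    /* Check that every entry fits before writing anything. */
    for (size_t i = 0; i < n; i++) {
        if (values[i] / b > UINT16_MAX) {
            return -1;
        }
    }
    for (size_t i = 0; i < n; i++) {
        entries[i] = (uint16_t)(values[i] / b);
    }
    *base = b;
    return 0;
}
```

A power-of-ten base keeps the stored unit human-readable; using the GCD of all values instead would allow an even larger base, and therefore a larger representable range, at the cost of an odd-looking unit.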
> 
>> +++ b/hw/core/numa.c
> 
>> +void parse_numa_hmat_lb(MachineState *ms, NumaHmatLBOptions *node,
>> +                        Error **errp)
>> +{
> 
>> +    if (node->has_latency) {
>> +        hmat_lb = ms->numa_state->hmat_lb[node->hierarchy][node->data_type];
>> +
>> +        if (!hmat_lb) {
>> +            hmat_lb = g_malloc0(sizeof(*hmat_lb));
>> +            ms->numa_state->hmat_lb[node->hierarchy][node->data_type] = hmat_lb;
>> +        } else if (hmat_lb->latency[node->initiator][node->target]) {
>> +            error_setg(errp, "Duplicate configuration of the latency for "
>> +                       "initiator=%" PRIu16 " and target=%" PRIu16 ".",
>> +                       node->initiator, node->target);
>> +            return;
>> +        }
>> +
>> +        ret = qemu_strtoui(node->latency, &endptr, 10, &latency);
>> +        if (ret < 0) {
>> +            error_setg(errp, "Invalid latency %s", node->latency);
>> +            return;
>> +        }
>> +
>> +        if (*endptr == '\0') {
>> +            base_lat = 1;
>> +        } else if (*(endptr + 1) == 's') {
>> +            switch (*endptr) {
>> +            case 'p':
>> +                base_lat = 1;
>> +                break;
>> +            case 'n':
>> +                base_lat = PICO_PER_NSEC;
>> +                break;
>> +            case 'u':
>> +                base_lat = PICO_PER_USEC;
>> +                break;
> 
> Hmm - this is a different scaling than any of our existing parsers
> (which assume multiples k/M/G..., not subdivisions u/n/s)
> 
> 
>> +    if (node->has_bandwidth) {
>> +        hmat_lb = ms->numa_state->hmat_lb[node->hierarchy][node->data_type];
>> +
>> +        if (!hmat_lb) {
>> +            hmat_lb = g_malloc0(sizeof(*hmat_lb));
>> +            ms->numa_state->hmat_lb[node->hierarchy][node->data_type] = hmat_lb;
>> +        } else if (hmat_lb->bandwidth[node->initiator][node->target]) {
>> +            error_setg(errp, "Duplicate configuration of the bandwidth for "
>> +                       "initiator=%" PRIu16 " and target=%" PRIu16 ".",
>> +                       node->initiator, node->target);
>> +            return;
>> +        }
>> +
>> +        ret = qemu_strtoui(node->bandwidth, &endptr, 10, &bandwidth);
>> +        if (ret < 0) {
>> +            error_setg(errp, "Invalid bandwidth %s", node->bandwidth);
>> +            return;
>> +        }
>> +
>> +        switch (toupper(*endptr)) {
>> +        case '\0':
>> +        case 'M':
>> +            base_bw = 1;
>> +            break;
>> +        case 'G':
>> +            base_bw = UINT64_C(1) << 10;
>> +            break;
>> +        case 'P':
>> +            base_bw = UINT64_C(1) << 20;
>> +            break;
> 
> But this one, in addition to being wrong (P is 1<<30, not 1<<20), should
> definitely be reusing qemu_strtosz_metric() or similar (look in
> util/cutils.c).
> 
> 
>> +++ b/qapi/machine.json
>> @@ -377,10 +377,12 @@
>>   #
>>   # @cpu: property based CPU(s) to node mapping (Since: 2.10)
>>   #
>> +# @hmat-lb: memory latency and bandwidth information (Since: 4.2)
>> +#
>>   # Since: 2.1
>>   ##
>>   { 'enum': 'NumaOptionsType',
>> -  'data': [ 'node', 'dist', 'cpu' ] }
>> +  'data': [ 'node', 'dist', 'cpu', 'hmat-lb' ] }
>>   
> 
>> +##
>> +# @HmatLBDataType:
>> +#
>> +# Data type in the System Locality Latency
>> +# and Bandwidth Information Structure of HMAT (Heterogeneous
>> +# Memory Attribute Table)
>> +#
>> +# For more information of @HmatLBDataType see
>> +# the chapter 5.2.27.4: Table 5-142:  Field "Data Type" of ACPI 6.3 spec.
>> +#
>> +# @access-latency: access latency (picoseconds)
>> +#
>> +# @read-latency: read latency (picoseconds)
>> +#
>> +# @write-latency: write latency (picoseconds)
>> +#
>> +# @access-bandwidth: access bandwidth (MB/s)
>> +#
>> +# @read-bandwidth: read bandwidth (MB/s)
>> +#
>> +# @write-bandwidth: write bandwidth (MB/s)
> 
> Are these really the best scales?
> 
> 
>> +
>> +##
>> +# @NumaHmatLBOptions:
>> +#
>> +# Set the system locality latency and bandwidth information
>> +# between Initiator and Target proximity Domains.
>> +#
>> +# For more information of @NumaHmatLBOptions see
>> +# the chapter 5.2.27.4: Table 5-142 of ACPI 6.3 spec.
>> +#
>> +# @initiator: the Initiator Proximity Domain.
>> +#
>> +# @target: the Target Proximity Domain.
>> +#
>> +# @hierarchy: the Memory Hierarchy. Indicates the performance
>> +#             of memory or side cache.
>> +#
>> +# @data-type: presents the type of data, access/read/write
>> +#             latency or hit latency.
>> +#
>> +# @latency: the value of latency from @initiator to @target proximity domain,
>> +#           the latency units are "ps(picosecond)", "ns(nanosecond)" or
>> +#           "us(microsecond)".
>> +#
>> +# @bandwidth: the value of bandwidth between @initiator and @target proximity
>> +#             domain, the bandwidth units are "MB(/s)","GB(/s)" or "PB(/s)".
>> +#
>> +# Since: 4.2
>> +##
>> +{ 'struct': 'NumaHmatLBOptions',
>> +    'data': {
>> +    'initiator': 'uint16',
>> +    'target': 'uint16',
>> +    'hierarchy': 'HmatLBMemoryHierarchy',
>> +    'data-type': 'HmatLBDataType',
>> +    '*latency': 'str',
>> +    '*bandwidth': 'str' }}
> 
> ...and then parsing strings instead of taking raw integers?  Parsing
> strings is okay for HMP, but for QMP, our goal should be a single
> representation with no additional sugar on top.  Latency and bandwidth
> should be int in a single scale.
> 
> 
>> +++ b/qemu-options.hx
>> @@ -164,16 +164,19 @@ DEF("numa", HAS_ARG, QEMU_OPTION_numa,
>>       "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
>>       "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
>>       "-numa dist,src=source,dst=destination,val=distance\n"
>> -    "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n",
>> +    "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n"
>> +    "-numa hmat-lb,initiator=node,target=node,hierarchy=memory|first-level|second-level|third-level,data-type=access-latency|read-latency|write-latency[,latency=lat][,bandwidth=bw]\n",
> 
> Command-line parsing can then take human-written scaled numbers, and
> pre-convert them into the single scale accepted by the QMP interface.
> 
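On the command-line side, the pre-conversion Eric describes could reduce a human-written bandwidth string to one integer in MB/s before it reaches QMP. The sketch below is illustrative only: `parse_bandwidth_mbps()` is hypothetical, it keeps the v9 patch's binary multipliers relative to MB/s (so P is 1 << 30, per Eric's correction, with T filled in as 1 << 20), and in QEMU proper the suffix handling would come from the existing helpers in util/cutils.c instead.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical: parse "NUM[M|G|T|P][B/s]" (unit letter case-insensitive,
 * bare NUM means MB/s) into MB/s. Returns 0 on success, -EINVAL on malformed
 * input, -ERANGE if the result does not fit in 64 bits.
 */
static int parse_bandwidth_mbps(const char *str, uint64_t *mbps)
{
    char *end;
    unsigned long long val;
    uint64_t mul;

    if (!str || !isdigit((unsigned char)*str)) {
        return -EINVAL;
    }
    errno = 0;
    val = strtoull(str, &end, 10);
    if (errno == ERANGE) {
        return -ERANGE;
    }
    switch (toupper((unsigned char)*end)) {
    case '\0':
    case 'M':
        mul = 1;
        break;
    case 'G':
        mul = UINT64_C(1) << 10;
        break;
    case 'T':
        mul = UINT64_C(1) << 20;
        break;
    case 'P':
        mul = UINT64_C(1) << 30;
        break;
    default:
        return -EINVAL;
    }
    if (*end != '\0') {
        end++; /* skip the unit letter */
        if (*end != '\0' && strcmp(end, "B/s") != 0) {
            return -EINVAL; /* only an optional "B/s" may follow */
        }
    }
    if (val > UINT64_MAX / mul) {
        return -ERANGE;
    }
    *mbps = (uint64_t)val * mul;
    return 0;
}
```

Keeping the QMP field a plain integer in MB/s, and doing all suffix handling in a CLI-only helper like this, matches the split Eric suggests between human input and the machine interface.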




* Re: [Qemu-devel] [PATCH v9 05/11] numa: Extend CLI to provide initiator information for numa nodes
  2019-08-14  2:39     ` Dan Williams
@ 2019-08-14  5:13       ` Tao Xu
  2019-08-14 21:29         ` Dan Williams
  0 siblings, 1 reply; 34+ messages in thread
From: Tao Xu @ 2019-08-14  5:13 UTC (permalink / raw)
  To: Dan Williams, Igor Mammedov
  Cc: Eduardo Habkost, Jingqi Liu, Du, Fan, Qemu Developers, daniel,
	Jonathan Cameron

On 8/14/2019 10:39 AM, Dan Williams wrote:
> On Tue, Aug 13, 2019 at 8:00 AM Igor Mammedov <imammedo@redhat.com> wrote:
>>
>> On Fri,  9 Aug 2019 14:57:25 +0800
>> Tao <tao3.xu@intel.com> wrote:
>>
>>> From: Tao Xu <tao3.xu@intel.com>
>>>
>>> In ACPI 6.3 chapter 5.2.27 Heterogeneous Memory Attribute Table (HMAT),
>>> an initiator represents a processor that accesses memory. In 5.2.27.3
>>> Memory Proximity Domain Attributes Structure, the attached initiator is
>>> defined as the initiator to which the memory controller responsible for
>>> a memory proximity domain is directly attached. With attached initiator
>>> information, the topology of heterogeneous memory can be described.
>>>
>>> Extend the CLI of the "-numa node" option to specify the initiator NUMA
>>> node-id. In the Linux kernel, the code in drivers/acpi/hmat/hmat.c
>>> parses and reports the platform's HMAT tables.
>>>
>>> Reviewed-by: Jingqi Liu <Jingqi.liu@intel.com>
>>> Suggested-by: Dan Williams <dan.j.williams@intel.com>
>>> Signed-off-by: Tao Xu <tao3.xu@intel.com>
>>> ---
>>>
>>> No changes in v9
>>> ---
>>>   hw/core/machine.c     | 24 ++++++++++++++++++++++++
>>>   hw/core/numa.c        | 13 +++++++++++++
>>>   include/sysemu/numa.h |  3 +++
>>>   qapi/machine.json     |  6 +++++-
>>>   qemu-options.hx       | 27 +++++++++++++++++++++++----
>>>   5 files changed, 68 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/hw/core/machine.c b/hw/core/machine.c
>>> index 3c55470103..113184a9df 100644
>>> --- a/hw/core/machine.c
>>> +++ b/hw/core/machine.c
>>> @@ -640,6 +640,7 @@ void machine_set_cpu_numa_node(MachineState *machine,
>>>                                  const CpuInstanceProperties *props, Error **errp)
>>>   {
>>>       MachineClass *mc = MACHINE_GET_CLASS(machine);
>>> +    NodeInfo *numa_info = machine->numa_state->nodes;
>>>       bool match = false;
>>>       int i;
>>>
>>> @@ -709,6 +710,16 @@ void machine_set_cpu_numa_node(MachineState *machine,
>>>           match = true;
>>>           slot->props.node_id = props->node_id;
>>>           slot->props.has_node_id = props->has_node_id;
>>> +
>>> +        if (numa_info[props->node_id].initiator_valid &&
>>> +            (props->node_id != numa_info[props->node_id].initiator)) {
>>> +            error_setg(errp, "The initiator of CPU NUMA node %" PRId64
>>> +                       " should be itself.", props->node_id);
>>> +            return;
>>> +        }
>>> +        numa_info[props->node_id].initiator_valid = true;
>>> +        numa_info[props->node_id].has_cpu = true;
>>> +        numa_info[props->node_id].initiator = props->node_id;
>>>       }
>>>
>>>       if (!match) {
>>> @@ -1050,6 +1061,7 @@ static void machine_numa_finish_cpu_init(MachineState *machine)
>>>       GString *s = g_string_new(NULL);
>>>       MachineClass *mc = MACHINE_GET_CLASS(machine);
>>>       const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(machine);
>>> +    NodeInfo *numa_info = machine->numa_state->nodes;
>>>
>>>       assert(machine->numa_state->num_nodes);
>>>       for (i = 0; i < possible_cpus->len; i++) {
>>> @@ -1083,6 +1095,18 @@ static void machine_numa_finish_cpu_init(MachineState *machine)
>>>               machine_set_cpu_numa_node(machine, &props, &error_fatal);
>>>           }
>>>       }
>>> +
>>> +    for (i = 0; i < machine->numa_state->num_nodes; i++) {
>>> +        if (numa_info[i].initiator_valid &&
>>> +            !numa_info[numa_info[i].initiator].has_cpu) {
>>                           ^^^^^^^^^^^^^^^^^^^^^^ possible out of bounds read, see below
>>
>>> +            error_report("The initiator-id %"PRIu16 " of NUMA node %d"
>>> +                         " does not exist.", numa_info[i].initiator, i);
>>> +            error_printf("\n");
>>> +
>>> +            exit(1);
>>> +        }
>> it only takes care of nodes that have cpus, or memory-only ones that have
>> initiator explicitly provided on CLI. And leaves possibility to have
>> memory-only nodes without initiator mixed with nodes that have initiator.
>> Is it valid to have mixed configuration?
>> Should we forbid it?
> 
> The spec talks about the "Proximity Domain for the Attached Initiator"
> field only being valid if the memory controller for the memory can be
> identified by an initiator id in the SRAT. So I expect the only way to
> define a memory proximity domain without this local initiator is to
> allow specifying a node-id that does not have an entry in the SRAT.
> 
Hi Dan,

So there may be situations in which the Attached Initiator field is not
valid? If so, I would allow the user to specify an invalid initiator.

> That would be a useful feature for testing OS HMAT parsing behavior,
> and may match platforms that exist in practice.
> 
>>
>>> +    }
>>> +
>>>       if (s->len && !qtest_enabled()) {
>>>           warn_report("CPU(s) not present in any NUMA nodes: %s",
>>>                       s->str);
>>> diff --git a/hw/core/numa.c b/hw/core/numa.c
>>> index 8fcbba05d6..cfb6339810 100644
>>> --- a/hw/core/numa.c
>>> +++ b/hw/core/numa.c
>>> @@ -128,6 +128,19 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
>>>           numa_info[nodenr].node_mem = object_property_get_uint(o, "size", NULL);
>>>           numa_info[nodenr].node_memdev = MEMORY_BACKEND(o);
>>>       }
>>> +
>>> +    if (node->has_initiator) {
>>> +        if (numa_info[nodenr].initiator_valid &&
>>> +            (node->initiator != numa_info[nodenr].initiator)) {
>>> +            error_setg(errp, "The initiator of NUMA node %" PRIu16 " has been "
>>> +                       "set to node %" PRIu16, nodenr,
>>> +                       numa_info[nodenr].initiator);
>>> +            return;
>>> +        }
>>> +
>>> +        numa_info[nodenr].initiator_valid = true;
>>> +        numa_info[nodenr].initiator = node->initiator;
>>                                               ^^^
>> unvalidated user input? (which could lead to a read beyond numa_info[] boundaries
>> in previous hunk).
>>
>>> +    }
>>>       numa_info[nodenr].present = true;
>>>       max_numa_nodeid = MAX(max_numa_nodeid, nodenr + 1);
>>>       ms->numa_state->num_nodes++;
>>> diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
>>> index 76da3016db..46ad06e000 100644
>>> --- a/include/sysemu/numa.h
>>> +++ b/include/sysemu/numa.h
>>> @@ -10,6 +10,9 @@ struct NodeInfo {
>>>       uint64_t node_mem;
>>>       struct HostMemoryBackend *node_memdev;
>>>       bool present;
>>> +    bool has_cpu;
>>> +    bool initiator_valid;
>>> +    uint16_t initiator;
>>>       uint8_t distance[MAX_NODES];
>>>   };
>>>
>>> diff --git a/qapi/machine.json b/qapi/machine.json
>>> index 6db8a7e2ec..05e367d26a 100644
>>> --- a/qapi/machine.json
>>> +++ b/qapi/machine.json
>>> @@ -414,6 +414,9 @@
>>>   # @memdev: memory backend object.  If specified for one node,
>>>   #          it must be specified for all nodes.
>>>   #
>>> +# @initiator: the initiator numa nodeid that is closest (as in directly
>>> +#             attached) to this numa node (since 4.2)
>> well, it's pretty unclear what doc comment means (unless reader knows well
>> specific part of ACPI spec)
>>
>> suggest to rephrase to something more understandable for unaware
>> readers (+ possible reference to spec for those who are interested
>> in the spec definition, since this doc is meant for developers).
>>
>>> +#
>>>   # Since: 2.1
>>>   ##
>>>   { 'struct': 'NumaNodeOptions',
>>> @@ -421,7 +424,8 @@
>>>      '*nodeid': 'uint16',
>>>      '*cpus':   ['uint16'],
>>>      '*mem':    'size',
>>> -   '*memdev': 'str' }}
>>> +   '*memdev': 'str',
>>> +   '*initiator': 'uint16' }}
>>>
>>>   ##
>>>   # @NumaDistOptions:
>>> diff --git a/qemu-options.hx b/qemu-options.hx
>>> index 9621e934c0..c480781992 100644
>>> --- a/qemu-options.hx
>>> +++ b/qemu-options.hx
>>> @@ -161,14 +161,14 @@ If any on the three values is given, the total number of CPUs @var{n} can be omi
>>>   ETEXI
>>>
>>>   DEF("numa", HAS_ARG, QEMU_OPTION_numa,
>>> -    "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node]\n"
>>> -    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node]\n"
>>> +    "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
>>> +    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
>>>       "-numa dist,src=source,dst=destination,val=distance\n"
>>>       "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n",
>>>       QEMU_ARCH_ALL)
>>>   STEXI
>>> -@item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}]
>>> -@itemx -numa node[,memdev=@var{id}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}]
>>> +@item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
>>> +@itemx -numa node[,memdev=@var{id}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
>>>   @itemx -numa dist,src=@var{source},dst=@var{destination},val=@var{distance}
>>>   @itemx -numa cpu,node-id=@var{node}[,socket-id=@var{x}][,core-id=@var{y}][,thread-id=@var{z}]
>>>   @findex -numa
>>> @@ -215,6 +215,25 @@ split equally between them.
>>>   @samp{mem} and @samp{memdev} are mutually exclusive. Furthermore,
>>>   if one node uses @samp{memdev}, all of them have to use it.
>>>
>>> +@samp{initiator} indicate the initiator NUMA @var{initiator} that is
>>                                    ^^^^^^^       ^^^^^^^^^^^^^^
>> above will result in "initiator NUMA initiator", was it your intention?
>>
>>> +closest (as in directly attached) to this NUMA @var{node}.
>> Again suggest replace spec language with something more user friendly
>> (this time without spec reference as it's geared for end user)
>>
>>> +For example, the following option assigns 2 NUMA nodes, node 0 has CPU.
>> Following example creates a machine with 2 NUMA ...
>>
>>> +node 1 has only memory, and its' initiator is node 0. Note that because
>>> +node 0 has CPU, by default the initiator of node 0 is itself and must be
>>> +itself.
>>> +@example
>>> +-M pc \
>>> +-m 2G,slots=2,maxmem=4G \
>>> +-object memory-backend-ram,size=1G,id=m0 \
>>> +-object memory-backend-ram,size=1G,id=m1 \
>>> +-numa node,nodeid=0,memdev=m0 \
>>> +-numa node,nodeid=1,memdev=m1,initiator=0 \
>>> +-smp 2,sockets=2,maxcpus=2  \
>>> +-numa cpu,node-id=0,socket-id=0 \
>>> +-numa cpu,node-id=0,socket-id=1 \
>>> +@end example
>>> +
>>>   @var{source} and @var{destination} are NUMA node IDs.
>>>   @var{distance} is the NUMA distance from @var{source} to @var{destination}.
>>>   The distance from a node to itself is always 10. If any pair of nodes is
>>
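If memory-only nodes are allowed to have no attached initiator, the HMAT builder would need to leave the "initiator valid" flag clear for them rather than emit a bogus initiator. A hedged sketch of that mapping follows. The struct is a simplified, hypothetical view of the ACPI 6.3 Memory Proximity Domain Attributes Structure (only the Flags, attached-initiator and memory proximity domain fields, no header or reserved fields), and it assumes that bit 0 of Flags marks the attached-initiator field as valid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed: Flags bit 0 = "attached initiator proximity domain is valid". */
#define MPDA_FLAG_INITIATOR_VALID ((uint16_t)(1u << 0))

/* Simplified, hypothetical view of the structure. */
typedef struct {
    uint16_t flags;
    uint32_t initiator_pd; /* proximity domain of the attached initiator */
    uint32_t memory_pd;    /* proximity domain of the memory */
} MpdaSketch;

static MpdaSketch build_mpda_sketch(uint32_t memory_pd, bool initiator_valid,
                                    uint32_t initiator_pd)
{
    MpdaSketch s = { .flags = 0, .initiator_pd = 0, .memory_pd = memory_pd };

    if (initiator_valid) {
        s.flags |= MPDA_FLAG_INITIATOR_VALID;
        s.initiator_pd = initiator_pd;
    }
    /* Otherwise the flag stays clear and the initiator field stays zero,
     * telling the OS not to interpret it. */
    return s;
}
```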




* Re: [Qemu-devel] [PATCH v9 01/11] hw/arm: simplify arm_load_dtb
  2019-08-13 21:55   ` Eduardo Habkost
@ 2019-08-14 13:08     ` Cédric Le Goater
  0 siblings, 0 replies; 34+ messages in thread
From: Cédric Le Goater @ 2019-08-14 13:08 UTC (permalink / raw)
  To: Eduardo Habkost, Tao
  Cc: Peter Maydell, imammedo, qemu-devel, daniel, Edgar E. Iglesias,
	Rob Herring, Andrey Smirnov, Joel Stanley, Alistair Francis,
	jingqi.liu, fan.du, Leif Lindholm, Beniamino Galvani, qemu-arm,
	Jan Kiszka, jonathan.cameron, dan.j.williams, Radoslaw Biernacki,
	Andrew Jeffery, Philippe Mathieu-Daudé,
	Andrew Baumann, Jean-Christophe Dubois, Igor Mitsyanko,
	Peter Chubb

On 13/08/2019 23:55, Eduardo Habkost wrote:
> 
> CCing ARM maintainers.  I'd like to at least get one Acked-by from
> them before queueing this on machine-next.
> 
> 
> On Fri, Aug 09, 2019 at 02:57:21PM +0800, Tao wrote:
>> From: Tao Xu <tao3.xu@intel.com>
>>
>> In struct arm_boot_info, kernel_filename, initrd_filename and
>> kernel_cmdline are copied from MachineState. This patch adds
>> MachineState as a parameter to arm_load_dtb() and moves the copying
>> of kernel_filename, initrd_filename and kernel_cmdline into
>> arm_load_kernel().
>>
>> Reviewed-by: Igor Mammedov <imammedo@redhat.com>
>> Reviewed-by: Liu Jingqi <jingqi.liu@intel.com>
>> Suggested-by: Igor Mammedov <imammedo@redhat.com>
>> Signed-off-by: Tao Xu <tao3.xu@intel.com>
>> ---
>>
>> No changes in v9
>> ---
>>  hw/arm/aspeed.c           |  5 +----
>>  hw/arm/boot.c             | 14 ++++++++------
>>  hw/arm/collie.c           |  8 +-------
>>  hw/arm/cubieboard.c       |  5 +----
>>  hw/arm/exynos4_boards.c   |  7 ++-----
>>  hw/arm/highbank.c         |  8 +-------
>>  hw/arm/imx25_pdk.c        |  5 +----
>>  hw/arm/integratorcp.c     |  8 +-------
>>  hw/arm/kzm.c              |  5 +----
>>  hw/arm/mainstone.c        |  5 +----
>>  hw/arm/mcimx6ul-evk.c     |  5 +----
>>  hw/arm/mcimx7d-sabre.c    |  5 +----
>>  hw/arm/musicpal.c         |  8 +-------
>>  hw/arm/nseries.c          |  5 +----
>>  hw/arm/omap_sx1.c         |  5 +----
>>  hw/arm/palm.c             | 10 ++--------
>>  hw/arm/raspi.c            |  6 +-----
>>  hw/arm/realview.c         |  5 +----
>>  hw/arm/sabrelite.c        |  5 +----
>>  hw/arm/sbsa-ref.c         |  3 +--
>>  hw/arm/spitz.c            |  5 +----
>>  hw/arm/tosa.c             |  8 +-------
>>  hw/arm/versatilepb.c      |  5 +----
>>  hw/arm/vexpress.c         |  5 +----
>>  hw/arm/virt.c             |  8 +++-----
>>  hw/arm/xilinx_zynq.c      |  8 +-------
>>  hw/arm/xlnx-versal-virt.c |  7 ++-----
>>  hw/arm/xlnx-zcu102.c      |  5 +----
>>  hw/arm/z2.c               |  8 +-------
>>  include/hw/arm/boot.h     |  4 ++--
>>  30 files changed, 43 insertions(+), 147 deletions(-)
>>
>> diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
>> index 843b708247..f8733b86b9 100644
>> --- a/hw/arm/aspeed.c
>> +++ b/hw/arm/aspeed.c
>> @@ -241,9 +241,6 @@ static void aspeed_board_init(MachineState *machine,
>>          write_boot_rom(drive0, FIRMWARE_ADDR, fl->size, &error_abort);
>>      }
>>  
>> -    aspeed_board_binfo.kernel_filename = machine->kernel_filename;
>> -    aspeed_board_binfo.initrd_filename = machine->initrd_filename;
>> -    aspeed_board_binfo.kernel_cmdline = machine->kernel_cmdline;
>>      aspeed_board_binfo.ram_size = ram_size;
>>      aspeed_board_binfo.loader_start = sc->info->memmap[ASPEED_SDRAM];
>>      aspeed_board_binfo.nb_cpus = bmc->soc.num_cpus;
>> @@ -252,7 +249,7 @@ static void aspeed_board_init(MachineState *machine,
>>          cfg->i2c_init(bmc);
>>      }
>>  
>> -    arm_load_kernel(ARM_CPU(first_cpu), &aspeed_board_binfo);
>> +    arm_load_kernel(ARM_CPU(first_cpu), machine, &aspeed_board_binfo);
>>  }

It looks OK to me.

Note that the Aspeed machine uses machine->kernel_filename to detect
whether it is running without a bootloader, which would normally do some
special setup such as unlocking devices. In that case, we emulate the same
behaviour.

Acked-by: Cédric Le Goater <clg@kaod.org>

Thanks,

C.

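A minimal sketch of the check Cédric describes, assuming hypothetical, trimmed-down types (MiniMachineState, MiniBoardState) and a single "devices unlocked" flag standing in for whatever setup the real Aspeed model performs. It is not the actual hw/arm/aspeed.c code. It only shows why the patch is harmless here: the board keeps reading kernel_filename from the machine state directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for QEMU's MachineState. */
typedef struct {
    const char *kernel_filename;   /* set by -kernel, NULL otherwise */
} MiniMachineState;

/* Hypothetical board state; only tracks whether devices were unlocked. */
typedef struct {
    bool devices_unlocked;
} MiniBoardState;

/* A -kernel image means no firmware/bootloader runs before the guest. */
static bool booting_without_bootloader(const MiniMachineState *ms)
{
    return ms->kernel_filename != NULL;
}

/* Board init reads the filename straight from the machine state, so the
 * arm_load_kernel() signature change does not affect this check. */
static void board_init_sketch(const MiniMachineState *ms, MiniBoardState *bs)
{
    assert(ms != NULL && bs != NULL);
    bs->devices_unlocked = false;
    if (booting_without_bootloader(ms)) {
        /* Emulate the setup the bootloader would otherwise have done. */
        bs->devices_unlocked = true;
    }
}
```
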
>>  static void palmetto_bmc_i2c_init(AspeedBoardState *bmc)
>> diff --git a/hw/arm/boot.c b/hw/arm/boot.c
>> index c2b89b3bb9..ba604f8277 100644
>> --- a/hw/arm/boot.c
>> +++ b/hw/arm/boot.c
>> @@ -524,7 +524,7 @@ static void fdt_add_psci_node(void *fdt)
>>  }
>>  
>>  int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
>> -                 hwaddr addr_limit, AddressSpace *as)
>> +                 hwaddr addr_limit, AddressSpace *as, MachineState *ms)
>>  {
>>      void *fdt = NULL;
>>      int size, rc, n = 0;
>> @@ -627,9 +627,9 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
>>          qemu_fdt_add_subnode(fdt, "/chosen");
>>      }
>>  
>> -    if (binfo->kernel_cmdline && *binfo->kernel_cmdline) {
>> +    if (ms->kernel_cmdline && *ms->kernel_cmdline) {
>>          rc = qemu_fdt_setprop_string(fdt, "/chosen", "bootargs",
>> -                                     binfo->kernel_cmdline);
>> +                                     ms->kernel_cmdline);
>>          if (rc < 0) {
>>              fprintf(stderr, "couldn't set /chosen/bootargs\n");
>>              goto fail;
>> @@ -1261,7 +1261,7 @@ static void arm_setup_firmware_boot(ARMCPU *cpu, struct arm_boot_info *info)
>>       */
>>  }
>>  
>> -void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
>> +void arm_load_kernel(ARMCPU *cpu, MachineState *ms, struct arm_boot_info *info)
>>  {
>>      CPUState *cs;
>>      AddressSpace *as = arm_boot_address_space(cpu, info);
>> @@ -1282,7 +1282,9 @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
>>       * doesn't support secure.
>>       */
>>      assert(!(info->secure_board_setup && kvm_enabled()));
>> -
>> +    info->kernel_filename = ms->kernel_filename;
>> +    info->kernel_cmdline = ms->kernel_cmdline;
>> +    info->initrd_filename = ms->initrd_filename;
>>      info->dtb_filename = qemu_opt_get(qemu_get_machine_opts(), "dtb");
>>      info->dtb_limit = 0;
>>  
>> @@ -1294,7 +1296,7 @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
>>      }
>>  
>>      if (!info->skip_dtb_autoload && have_dtb(info)) {
>> -        if (arm_load_dtb(info->dtb_start, info, info->dtb_limit, as) < 0) {
>> +        if (arm_load_dtb(info->dtb_start, info, info->dtb_limit, as, ms) < 0) {
>>              exit(1);
>>          }
>>      }
>> diff --git a/hw/arm/collie.c b/hw/arm/collie.c
>> index 3db3c56004..72bc8f26e5 100644
>> --- a/hw/arm/collie.c
>> +++ b/hw/arm/collie.c
>> @@ -26,9 +26,6 @@ static struct arm_boot_info collie_binfo = {
>>  
>>  static void collie_init(MachineState *machine)
>>  {
>> -    const char *kernel_filename = machine->kernel_filename;
>> -    const char *kernel_cmdline = machine->kernel_cmdline;
>> -    const char *initrd_filename = machine->initrd_filename;
>>      StrongARMState *s;
>>      DriveInfo *dinfo;
>>      MemoryRegion *sysmem = get_system_memory();
>> @@ -47,11 +44,8 @@ static void collie_init(MachineState *machine)
>>  
>>      sysbus_create_simple("scoop", 0x40800000, NULL);
>>  
>> -    collie_binfo.kernel_filename = kernel_filename;
>> -    collie_binfo.kernel_cmdline = kernel_cmdline;
>> -    collie_binfo.initrd_filename = initrd_filename;
>>      collie_binfo.board_id = 0x208;
>> -    arm_load_kernel(s->cpu, &collie_binfo);
>> +    arm_load_kernel(s->cpu, machine, &collie_binfo);
>>  }
>>  
>>  static void collie_machine_init(MachineClass *mc)
>> diff --git a/hw/arm/cubieboard.c b/hw/arm/cubieboard.c
>> index f7c8a5985a..d992fa087a 100644
>> --- a/hw/arm/cubieboard.c
>> +++ b/hw/arm/cubieboard.c
>> @@ -72,10 +72,7 @@ static void cubieboard_init(MachineState *machine)
>>      /* TODO create and connect IDE devices for ide_drive_get() */
>>  
>>      cubieboard_binfo.ram_size = machine->ram_size;
>> -    cubieboard_binfo.kernel_filename = machine->kernel_filename;
>> -    cubieboard_binfo.kernel_cmdline = machine->kernel_cmdline;
>> -    cubieboard_binfo.initrd_filename = machine->initrd_filename;
>> -    arm_load_kernel(&s->a10->cpu, &cubieboard_binfo);
>> +    arm_load_kernel(&s->a10->cpu, machine, &cubieboard_binfo);
>>  }
>>  
>>  static void cubieboard_machine_init(MachineClass *mc)
>> diff --git a/hw/arm/exynos4_boards.c b/hw/arm/exynos4_boards.c
>> index ac0b0dc2a9..da402d5216 100644
>> --- a/hw/arm/exynos4_boards.c
>> +++ b/hw/arm/exynos4_boards.c
>> @@ -120,9 +120,6 @@ exynos4_boards_init_common(MachineState *machine,
>>      exynos4_board_binfo.board_id = exynos4_board_id[board_type];
>>      exynos4_board_binfo.smp_bootreg_addr =
>>              exynos4_board_smp_bootreg_addr[board_type];
>> -    exynos4_board_binfo.kernel_filename = machine->kernel_filename;
>> -    exynos4_board_binfo.initrd_filename = machine->initrd_filename;
>> -    exynos4_board_binfo.kernel_cmdline = machine->kernel_cmdline;
>>      exynos4_board_binfo.gic_cpu_if_addr =
>>              EXYNOS4210_SMP_PRIVATE_BASE_ADDR + 0x100;
>>  
>> @@ -141,7 +138,7 @@ static void nuri_init(MachineState *machine)
>>  {
>>      exynos4_boards_init_common(machine, EXYNOS4_BOARD_NURI);
>>  
>> -    arm_load_kernel(ARM_CPU(first_cpu), &exynos4_board_binfo);
>> +    arm_load_kernel(ARM_CPU(first_cpu), machine, &exynos4_board_binfo);
>>  }
>>  
>>  static void smdkc210_init(MachineState *machine)
>> @@ -151,7 +148,7 @@ static void smdkc210_init(MachineState *machine)
>>  
>>      lan9215_init(SMDK_LAN9118_BASE_ADDR,
>>              qemu_irq_invert(s->soc.irq_table[exynos4210_get_irq(37, 1)]));
>> -    arm_load_kernel(ARM_CPU(first_cpu), &exynos4_board_binfo);
>> +    arm_load_kernel(ARM_CPU(first_cpu), machine, &exynos4_board_binfo);
>>  }
>>  
>>  static void nuri_class_init(ObjectClass *oc, void *data)
>> diff --git a/hw/arm/highbank.c b/hw/arm/highbank.c
>> index def0f1ce6a..1a35b6d82f 100644
>> --- a/hw/arm/highbank.c
>> +++ b/hw/arm/highbank.c
>> @@ -234,9 +234,6 @@ enum cxmachines {
>>  static void calxeda_init(MachineState *machine, enum cxmachines machine_id)
>>  {
>>      ram_addr_t ram_size = machine->ram_size;
>> -    const char *kernel_filename = machine->kernel_filename;
>> -    const char *kernel_cmdline = machine->kernel_cmdline;
>> -    const char *initrd_filename = machine->initrd_filename;
>>      DeviceState *dev = NULL;
>>      SysBusDevice *busdev;
>>      qemu_irq pic[128];
>> @@ -388,9 +385,6 @@ static void calxeda_init(MachineState *machine, enum cxmachines machine_id)
>>      /* TODO create and connect IDE devices for ide_drive_get() */
>>  
>>      highbank_binfo.ram_size = ram_size;
>> -    highbank_binfo.kernel_filename = kernel_filename;
>> -    highbank_binfo.kernel_cmdline = kernel_cmdline;
>> -    highbank_binfo.initrd_filename = initrd_filename;
>>      /* highbank requires a dtb in order to boot, and the dtb will override
>>       * the board ID. The following value is ignored, so set it to -1 to be
>>       * clear that the value is meaningless.
>> @@ -410,7 +404,7 @@ static void calxeda_init(MachineState *machine, enum cxmachines machine_id)
>>                      "may not boot.");
>>      }
>>  
>> -    arm_load_kernel(ARM_CPU(first_cpu), &highbank_binfo);
>> +    arm_load_kernel(ARM_CPU(first_cpu), machine, &highbank_binfo);
>>  }
>>  
>>  static void highbank_init(MachineState *machine)
>> diff --git a/hw/arm/imx25_pdk.c b/hw/arm/imx25_pdk.c
>> index 5d673e47bc..c76fc2bd94 100644
>> --- a/hw/arm/imx25_pdk.c
>> +++ b/hw/arm/imx25_pdk.c
>> @@ -116,9 +116,6 @@ static void imx25_pdk_init(MachineState *machine)
>>      }
>>  
>>      imx25_pdk_binfo.ram_size = machine->ram_size;
>> -    imx25_pdk_binfo.kernel_filename = machine->kernel_filename;
>> -    imx25_pdk_binfo.kernel_cmdline = machine->kernel_cmdline;
>> -    imx25_pdk_binfo.initrd_filename = machine->initrd_filename;
>>      imx25_pdk_binfo.loader_start = FSL_IMX25_SDRAM0_ADDR;
>>      imx25_pdk_binfo.board_id = 1771,
>>      imx25_pdk_binfo.nb_cpus = 1;
>> @@ -129,7 +126,7 @@ static void imx25_pdk_init(MachineState *machine)
>>       * fail.
>>       */
>>      if (!qtest_enabled()) {
>> -        arm_load_kernel(&s->soc.cpu, &imx25_pdk_binfo);
>> +        arm_load_kernel(&s->soc.cpu, machine, &imx25_pdk_binfo);
>>      }
>>  }
>>  
>> diff --git a/hw/arm/integratorcp.c b/hw/arm/integratorcp.c
>> index 200c0107f0..4d9e9c9e49 100644
>> --- a/hw/arm/integratorcp.c
>> +++ b/hw/arm/integratorcp.c
>> @@ -578,9 +578,6 @@ static struct arm_boot_info integrator_binfo = {
>>  static void integratorcp_init(MachineState *machine)
>>  {
>>      ram_addr_t ram_size = machine->ram_size;
>> -    const char *kernel_filename = machine->kernel_filename;
>> -    const char *kernel_cmdline = machine->kernel_cmdline;
>> -    const char *initrd_filename = machine->initrd_filename;
>>      Object *cpuobj;
>>      ARMCPU *cpu;
>>      MemoryRegion *address_space_mem = get_system_memory();
>> @@ -650,10 +647,7 @@ static void integratorcp_init(MachineState *machine)
>>      sysbus_create_simple("pl110", 0xc0000000, pic[22]);
>>  
>>      integrator_binfo.ram_size = ram_size;
>> -    integrator_binfo.kernel_filename = kernel_filename;
>> -    integrator_binfo.kernel_cmdline = kernel_cmdline;
>> -    integrator_binfo.initrd_filename = initrd_filename;
>> -    arm_load_kernel(cpu, &integrator_binfo);
>> +    arm_load_kernel(cpu, machine, &integrator_binfo);
>>  }
>>  
>>  static void integratorcp_machine_init(MachineClass *mc)
>> diff --git a/hw/arm/kzm.c b/hw/arm/kzm.c
>> index 59d2102dc5..5ff419a555 100644
>> --- a/hw/arm/kzm.c
>> +++ b/hw/arm/kzm.c
>> @@ -126,13 +126,10 @@ static void kzm_init(MachineState *machine)
>>      }
>>  
>>      kzm_binfo.ram_size = machine->ram_size;
>> -    kzm_binfo.kernel_filename = machine->kernel_filename;
>> -    kzm_binfo.kernel_cmdline = machine->kernel_cmdline;
>> -    kzm_binfo.initrd_filename = machine->initrd_filename;
>>      kzm_binfo.nb_cpus = 1;
>>  
>>      if (!qtest_enabled()) {
>> -        arm_load_kernel(&s->soc.cpu, &kzm_binfo);
>> +        arm_load_kernel(&s->soc.cpu, machine, &kzm_binfo);
>>      }
>>  }
>>  
>> diff --git a/hw/arm/mainstone.c b/hw/arm/mainstone.c
>> index cd1f904c6c..c76cfb5dd1 100644
>> --- a/hw/arm/mainstone.c
>> +++ b/hw/arm/mainstone.c
>> @@ -177,11 +177,8 @@ static void mainstone_common_init(MemoryRegion *address_space_mem,
>>      smc91c111_init(&nd_table[0], MST_ETH_PHYS,
>>                      qdev_get_gpio_in(mst_irq, ETHERNET_IRQ));
>>  
>> -    mainstone_binfo.kernel_filename = machine->kernel_filename;
>> -    mainstone_binfo.kernel_cmdline = machine->kernel_cmdline;
>> -    mainstone_binfo.initrd_filename = machine->initrd_filename;
>>      mainstone_binfo.board_id = arm_id;
>> -    arm_load_kernel(mpu->cpu, &mainstone_binfo);
>> +    arm_load_kernel(mpu->cpu, machine, &mainstone_binfo);
>>  }
>>  
>>  static void mainstone_init(MachineState *machine)
>> diff --git a/hw/arm/mcimx6ul-evk.c b/hw/arm/mcimx6ul-evk.c
>> index 1f6f4aed97..e8a9b03069 100644
>> --- a/hw/arm/mcimx6ul-evk.c
>> +++ b/hw/arm/mcimx6ul-evk.c
>> @@ -39,9 +39,6 @@ static void mcimx6ul_evk_init(MachineState *machine)
>>          .loader_start = FSL_IMX6UL_MMDC_ADDR,
>>          .board_id = -1,
>>          .ram_size = machine->ram_size,
>> -        .kernel_filename = machine->kernel_filename,
>> -        .kernel_cmdline = machine->kernel_cmdline,
>> -        .initrd_filename = machine->initrd_filename,
>>          .nb_cpus = machine->smp.cpus,
>>      };
>>  
>> @@ -71,7 +68,7 @@ static void mcimx6ul_evk_init(MachineState *machine)
>>      }
>>  
>>      if (!qtest_enabled()) {
>> -        arm_load_kernel(&s->soc.cpu, &boot_info);
>> +        arm_load_kernel(&s->soc.cpu, machine, &boot_info);
>>      }
>>  }
>>  
>> diff --git a/hw/arm/mcimx7d-sabre.c b/hw/arm/mcimx7d-sabre.c
>> index 72eab03a0c..3123d8767f 100644
>> --- a/hw/arm/mcimx7d-sabre.c
>> +++ b/hw/arm/mcimx7d-sabre.c
>> @@ -42,9 +42,6 @@ static void mcimx7d_sabre_init(MachineState *machine)
>>          .loader_start = FSL_IMX7_MMDC_ADDR,
>>          .board_id = -1,
>>          .ram_size = machine->ram_size,
>> -        .kernel_filename = machine->kernel_filename,
>> -        .kernel_cmdline = machine->kernel_cmdline,
>> -        .initrd_filename = machine->initrd_filename,
>>          .nb_cpus = machine->smp.cpus,
>>      };
>>  
>> @@ -74,7 +71,7 @@ static void mcimx7d_sabre_init(MachineState *machine)
>>      }
>>  
>>      if (!qtest_enabled()) {
>> -        arm_load_kernel(&s->soc.cpu[0], &boot_info);
>> +        arm_load_kernel(&s->soc.cpu[0], machine, &boot_info);
>>      }
>>  }
>>  
>> diff --git a/hw/arm/musicpal.c b/hw/arm/musicpal.c
>> index 95d56f3208..a53ee12737 100644
>> --- a/hw/arm/musicpal.c
>> +++ b/hw/arm/musicpal.c
>> @@ -1568,9 +1568,6 @@ static struct arm_boot_info musicpal_binfo = {
>>  
>>  static void musicpal_init(MachineState *machine)
>>  {
>> -    const char *kernel_filename = machine->kernel_filename;
>> -    const char *kernel_cmdline = machine->kernel_cmdline;
>> -    const char *initrd_filename = machine->initrd_filename;
>>      ARMCPU *cpu;
>>      qemu_irq pic[32];
>>      DeviceState *dev;
>> @@ -1699,10 +1696,7 @@ static void musicpal_init(MachineState *machine)
>>      sysbus_connect_irq(s, 0, pic[MP_AUDIO_IRQ]);
>>  
>>      musicpal_binfo.ram_size = MP_RAM_DEFAULT_SIZE;
>> -    musicpal_binfo.kernel_filename = kernel_filename;
>> -    musicpal_binfo.kernel_cmdline = kernel_cmdline;
>> -    musicpal_binfo.initrd_filename = initrd_filename;
>> -    arm_load_kernel(cpu, &musicpal_binfo);
>> +    arm_load_kernel(cpu, machine, &musicpal_binfo);
>>  }
>>  
>>  static void musicpal_machine_init(MachineClass *mc)
>> diff --git a/hw/arm/nseries.c b/hw/arm/nseries.c
>> index 4a79f5c88b..31dd2f1b51 100644
>> --- a/hw/arm/nseries.c
>> +++ b/hw/arm/nseries.c
>> @@ -1358,10 +1358,7 @@ static void n8x0_init(MachineState *machine,
>>  
>>      if (machine->kernel_filename) {
>>          /* Or at the linux loader.  */
>> -        binfo->kernel_filename = machine->kernel_filename;
>> -        binfo->kernel_cmdline = machine->kernel_cmdline;
>> -        binfo->initrd_filename = machine->initrd_filename;
>> -        arm_load_kernel(s->mpu->cpu, binfo);
>> +        arm_load_kernel(s->mpu->cpu, machine, binfo);
>>  
>>          qemu_register_reset(n8x0_boot_init, s);
>>      }
>> diff --git a/hw/arm/omap_sx1.c b/hw/arm/omap_sx1.c
>> index cae78d0a36..3cc2817f06 100644
>> --- a/hw/arm/omap_sx1.c
>> +++ b/hw/arm/omap_sx1.c
>> @@ -196,10 +196,7 @@ static void sx1_init(MachineState *machine, const int version)
>>      }
>>  
>>      /* Load the kernel.  */
>> -    sx1_binfo.kernel_filename = machine->kernel_filename;
>> -    sx1_binfo.kernel_cmdline = machine->kernel_cmdline;
>> -    sx1_binfo.initrd_filename = machine->initrd_filename;
>> -    arm_load_kernel(mpu->cpu, &sx1_binfo);
>> +    arm_load_kernel(mpu->cpu, machine, &sx1_binfo);
>>  
>>      /* TODO: fix next line */
>>      //~ qemu_console_resize(ds, 640, 480);
>> diff --git a/hw/arm/palm.c b/hw/arm/palm.c
>> index 9eb9612bce..67ab30b5bc 100644
>> --- a/hw/arm/palm.c
>> +++ b/hw/arm/palm.c
>> @@ -186,9 +186,6 @@ static struct arm_boot_info palmte_binfo = {
>>  
>>  static void palmte_init(MachineState *machine)
>>  {
>> -    const char *kernel_filename = machine->kernel_filename;
>> -    const char *kernel_cmdline = machine->kernel_cmdline;
>> -    const char *initrd_filename = machine->initrd_filename;
>>      MemoryRegion *address_space_mem = get_system_memory();
>>      struct omap_mpu_state_s *mpu;
>>      int flash_size = 0x00800000;
>> @@ -248,16 +245,13 @@ static void palmte_init(MachineState *machine)
>>          }
>>      }
>>  
>> -    if (!rom_loaded && !kernel_filename && !qtest_enabled()) {
>> +    if (!rom_loaded && !machine->kernel_filename && !qtest_enabled()) {
>>          fprintf(stderr, "Kernel or ROM image must be specified\n");
>>          exit(1);
>>      }
>>  
>>      /* Load the kernel.  */
>> -    palmte_binfo.kernel_filename = kernel_filename;
>> -    palmte_binfo.kernel_cmdline = kernel_cmdline;
>> -    palmte_binfo.initrd_filename = initrd_filename;
>> -    arm_load_kernel(mpu->cpu, &palmte_binfo);
>> +    arm_load_kernel(mpu->cpu, machine, &palmte_binfo);
>>  }
>>  
>>  static void palmte_machine_init(MachineClass *mc)
>> diff --git a/hw/arm/raspi.c b/hw/arm/raspi.c
>> index 5b2620acb4..74c062d05e 100644
>> --- a/hw/arm/raspi.c
>> +++ b/hw/arm/raspi.c
>> @@ -157,13 +157,9 @@ static void setup_boot(MachineState *machine, int version, size_t ram_size)
>>  
>>          binfo.entry = firmware_addr;
>>          binfo.firmware_loaded = true;
>> -    } else {
>> -        binfo.kernel_filename = machine->kernel_filename;
>> -        binfo.kernel_cmdline = machine->kernel_cmdline;
>> -        binfo.initrd_filename = machine->initrd_filename;
>>      }
>>  
>> -    arm_load_kernel(ARM_CPU(first_cpu), &binfo);
>> +    arm_load_kernel(ARM_CPU(first_cpu), machine, &binfo);
>>  }
>>  
>>  static void raspi_init(MachineState *machine, int version)
>> diff --git a/hw/arm/realview.c b/hw/arm/realview.c
>> index 7c56c8d2ed..5a3e65ddd6 100644
>> --- a/hw/arm/realview.c
>> +++ b/hw/arm/realview.c
>> @@ -350,13 +350,10 @@ static void realview_init(MachineState *machine,
>>      memory_region_add_subregion(sysmem, SMP_BOOT_ADDR, ram_hack);
>>  
>>      realview_binfo.ram_size = ram_size;
>> -    realview_binfo.kernel_filename = machine->kernel_filename;
>> -    realview_binfo.kernel_cmdline = machine->kernel_cmdline;
>> -    realview_binfo.initrd_filename = machine->initrd_filename;
>>      realview_binfo.nb_cpus = smp_cpus;
>>      realview_binfo.board_id = realview_board_id[board_type];
>>      realview_binfo.loader_start = (board_type == BOARD_PB_A8 ? 0x70000000 : 0);
>> -    arm_load_kernel(ARM_CPU(first_cpu), &realview_binfo);
>> +    arm_load_kernel(ARM_CPU(first_cpu), machine, &realview_binfo);
>>  }
>>  
>>  static void realview_eb_init(MachineState *machine)
>> diff --git a/hw/arm/sabrelite.c b/hw/arm/sabrelite.c
>> index 934f4c9261..8f4b68e14c 100644
>> --- a/hw/arm/sabrelite.c
>> +++ b/hw/arm/sabrelite.c
>> @@ -102,16 +102,13 @@ static void sabrelite_init(MachineState *machine)
>>      }
>>  
>>      sabrelite_binfo.ram_size = machine->ram_size;
>> -    sabrelite_binfo.kernel_filename = machine->kernel_filename;
>> -    sabrelite_binfo.kernel_cmdline = machine->kernel_cmdline;
>> -    sabrelite_binfo.initrd_filename = machine->initrd_filename;
>>      sabrelite_binfo.nb_cpus = machine->smp.cpus;
>>      sabrelite_binfo.secure_boot = true;
>>      sabrelite_binfo.write_secondary_boot = sabrelite_write_secondary;
>>      sabrelite_binfo.secondary_cpu_reset_hook = sabrelite_reset_secondary;
>>  
>>      if (!qtest_enabled()) {
>> -        arm_load_kernel(&s->soc.cpu[0], &sabrelite_binfo);
>> +        arm_load_kernel(&s->soc.cpu[0], machine, &sabrelite_binfo);
>>      }
>>  }
>>  
>> diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
>> index 9c67d5c6f9..2aba3c58c5 100644
>> --- a/hw/arm/sbsa-ref.c
>> +++ b/hw/arm/sbsa-ref.c
>> @@ -709,13 +709,12 @@ static void sbsa_ref_init(MachineState *machine)
>>      create_pcie(sms, pic);
>>  
>>      sms->bootinfo.ram_size = machine->ram_size;
>> -    sms->bootinfo.kernel_filename = machine->kernel_filename;
>>      sms->bootinfo.nb_cpus = smp_cpus;
>>      sms->bootinfo.board_id = -1;
>>      sms->bootinfo.loader_start = sbsa_ref_memmap[SBSA_MEM].base;
>>      sms->bootinfo.get_dtb = sbsa_ref_dtb;
>>      sms->bootinfo.firmware_loaded = firmware_loaded;
>> -    arm_load_kernel(ARM_CPU(first_cpu), &sms->bootinfo);
>> +    arm_load_kernel(ARM_CPU(first_cpu), machine, &sms->bootinfo);
>>  }
>>  
>>  static uint64_t sbsa_ref_cpu_mp_affinity(SBSAMachineState *sms, int idx)
>> diff --git a/hw/arm/spitz.c b/hw/arm/spitz.c
>> index 723cf5d592..42338696b3 100644
>> --- a/hw/arm/spitz.c
>> +++ b/hw/arm/spitz.c
>> @@ -951,11 +951,8 @@ static void spitz_common_init(MachineState *machine,
>>          /* A 4.0 GB microdrive is permanently sitting in CF slot 0.  */
>>          spitz_microdrive_attach(mpu, 0);
>>  
>> -    spitz_binfo.kernel_filename = machine->kernel_filename;
>> -    spitz_binfo.kernel_cmdline = machine->kernel_cmdline;
>> -    spitz_binfo.initrd_filename = machine->initrd_filename;
>>      spitz_binfo.board_id = arm_id;
>> -    arm_load_kernel(mpu->cpu, &spitz_binfo);
>> +    arm_load_kernel(mpu->cpu, machine, &spitz_binfo);
>>      sl_bootparam_write(SL_PXA_PARAM_BASE);
>>  }
>>  
>> diff --git a/hw/arm/tosa.c b/hw/arm/tosa.c
>> index 7843d68d46..3a1de81278 100644
>> --- a/hw/arm/tosa.c
>> +++ b/hw/arm/tosa.c
>> @@ -218,9 +218,6 @@ static struct arm_boot_info tosa_binfo = {
>>  
>>  static void tosa_init(MachineState *machine)
>>  {
>> -    const char *kernel_filename = machine->kernel_filename;
>> -    const char *kernel_cmdline = machine->kernel_cmdline;
>> -    const char *initrd_filename = machine->initrd_filename;
>>      MemoryRegion *address_space_mem = get_system_memory();
>>      MemoryRegion *rom = g_new(MemoryRegion, 1);
>>      PXA2xxState *mpu;
>> @@ -245,11 +242,8 @@ static void tosa_init(MachineState *machine)
>>  
>>      tosa_tg_init(mpu);
>>  
>> -    tosa_binfo.kernel_filename = kernel_filename;
>> -    tosa_binfo.kernel_cmdline = kernel_cmdline;
>> -    tosa_binfo.initrd_filename = initrd_filename;
>>      tosa_binfo.board_id = 0x208;
>> -    arm_load_kernel(mpu->cpu, &tosa_binfo);
>> +    arm_load_kernel(mpu->cpu, machine, &tosa_binfo);
>>      sl_bootparam_write(SL_PXA_PARAM_BASE);
>>  }
>>  
>> diff --git a/hw/arm/versatilepb.c b/hw/arm/versatilepb.c
>> index e5857117ac..d3c3c00f55 100644
>> --- a/hw/arm/versatilepb.c
>> +++ b/hw/arm/versatilepb.c
>> @@ -373,11 +373,8 @@ static void versatile_init(MachineState *machine, int board_id)
>>      }
>>  
>>      versatile_binfo.ram_size = machine->ram_size;
>> -    versatile_binfo.kernel_filename = machine->kernel_filename;
>> -    versatile_binfo.kernel_cmdline = machine->kernel_cmdline;
>> -    versatile_binfo.initrd_filename = machine->initrd_filename;
>>      versatile_binfo.board_id = board_id;
>> -    arm_load_kernel(cpu, &versatile_binfo);
>> +    arm_load_kernel(cpu, machine, &versatile_binfo);
>>  }
>>  
>>  static void vpb_init(MachineState *machine)
>> diff --git a/hw/arm/vexpress.c b/hw/arm/vexpress.c
>> index 5d932c27c0..4673a88a8d 100644
>> --- a/hw/arm/vexpress.c
>> +++ b/hw/arm/vexpress.c
>> @@ -707,9 +707,6 @@ static void vexpress_common_init(MachineState *machine)
>>      }
>>  
>>      daughterboard->bootinfo.ram_size = machine->ram_size;
>> -    daughterboard->bootinfo.kernel_filename = machine->kernel_filename;
>> -    daughterboard->bootinfo.kernel_cmdline = machine->kernel_cmdline;
>> -    daughterboard->bootinfo.initrd_filename = machine->initrd_filename;
>>      daughterboard->bootinfo.nb_cpus = machine->smp.cpus;
>>      daughterboard->bootinfo.board_id = VEXPRESS_BOARD_ID;
>>      daughterboard->bootinfo.loader_start = daughterboard->loader_start;
>> @@ -719,7 +716,7 @@ static void vexpress_common_init(MachineState *machine)
>>      daughterboard->bootinfo.modify_dtb = vexpress_modify_dtb;
>>      /* When booting Linux we should be in secure state if the CPU has one. */
>>      daughterboard->bootinfo.secure_boot = vms->secure;
>> -    arm_load_kernel(ARM_CPU(first_cpu), &daughterboard->bootinfo);
>> +    arm_load_kernel(ARM_CPU(first_cpu), machine, &daughterboard->bootinfo);
>>  }
>>  
>>  static bool vexpress_get_secure(Object *obj, Error **errp)
>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>> index d9496c9363..6ffb80bf5b 100644
>> --- a/hw/arm/virt.c
>> +++ b/hw/arm/virt.c
>> @@ -1364,6 +1364,7 @@ void virt_machine_done(Notifier *notifier, void *data)
>>  {
>>      VirtMachineState *vms = container_of(notifier, VirtMachineState,
>>                                           machine_done);
>> +    MachineState *ms = MACHINE(vms);
>>      ARMCPU *cpu = ARM_CPU(first_cpu);
>>      struct arm_boot_info *info = &vms->bootinfo;
>>      AddressSpace *as = arm_boot_address_space(cpu, info);
>> @@ -1381,7 +1382,7 @@ void virt_machine_done(Notifier *notifier, void *data)
>>                                         vms->memmap[VIRT_PLATFORM_BUS].size,
>>                                         vms->irqmap[VIRT_PLATFORM_BUS]);
>>      }
>> -    if (arm_load_dtb(info->dtb_start, info, info->dtb_limit, as) < 0) {
>> +    if (arm_load_dtb(info->dtb_start, info, info->dtb_limit, as, ms) < 0) {
>>          exit(1);
>>      }
>>  
>> @@ -1707,16 +1708,13 @@ static void machvirt_init(MachineState *machine)
>>      create_platform_bus(vms, pic);
>>  
>>      vms->bootinfo.ram_size = machine->ram_size;
>> -    vms->bootinfo.kernel_filename = machine->kernel_filename;
>> -    vms->bootinfo.kernel_cmdline = machine->kernel_cmdline;
>> -    vms->bootinfo.initrd_filename = machine->initrd_filename;
>>      vms->bootinfo.nb_cpus = smp_cpus;
>>      vms->bootinfo.board_id = -1;
>>      vms->bootinfo.loader_start = vms->memmap[VIRT_MEM].base;
>>      vms->bootinfo.get_dtb = machvirt_dtb;
>>      vms->bootinfo.skip_dtb_autoload = true;
>>      vms->bootinfo.firmware_loaded = firmware_loaded;
>> -    arm_load_kernel(ARM_CPU(first_cpu), &vms->bootinfo);
>> +    arm_load_kernel(ARM_CPU(first_cpu), machine, &vms->bootinfo);
>>  
>>      vms->machine_done.notify = virt_machine_done;
>>      qemu_add_machine_init_done_notifier(&vms->machine_done);
>> diff --git a/hw/arm/xilinx_zynq.c b/hw/arm/xilinx_zynq.c
>> index 89da34808b..c14774e542 100644
>> --- a/hw/arm/xilinx_zynq.c
>> +++ b/hw/arm/xilinx_zynq.c
>> @@ -158,9 +158,6 @@ static inline void zynq_init_spi_flashes(uint32_t base_addr, qemu_irq irq,
>>  static void zynq_init(MachineState *machine)
>>  {
>>      ram_addr_t ram_size = machine->ram_size;
>> -    const char *kernel_filename = machine->kernel_filename;
>> -    const char *kernel_cmdline = machine->kernel_cmdline;
>> -    const char *initrd_filename = machine->initrd_filename;
>>      ARMCPU *cpu;
>>      MemoryRegion *address_space_mem = get_system_memory();
>>      MemoryRegion *ext_ram = g_new(MemoryRegion, 1);
>> @@ -303,16 +300,13 @@ static void zynq_init(MachineState *machine)
>>      sysbus_mmio_map(busdev, 0, 0xF8007000);
>>  
>>      zynq_binfo.ram_size = ram_size;
>> -    zynq_binfo.kernel_filename = kernel_filename;
>> -    zynq_binfo.kernel_cmdline = kernel_cmdline;
>> -    zynq_binfo.initrd_filename = initrd_filename;
>>      zynq_binfo.nb_cpus = 1;
>>      zynq_binfo.board_id = 0xd32;
>>      zynq_binfo.loader_start = 0;
>>      zynq_binfo.board_setup_addr = BOARD_SETUP_ADDR;
>>      zynq_binfo.write_board_setup = zynq_write_board_setup;
>>  
>> -    arm_load_kernel(ARM_CPU(first_cpu), &zynq_binfo);
>> +    arm_load_kernel(ARM_CPU(first_cpu), machine, &zynq_binfo);
>>  }
>>  
>>  static void zynq_machine_init(MachineClass *mc)
>> diff --git a/hw/arm/xlnx-versal-virt.c b/hw/arm/xlnx-versal-virt.c
>> index f95fde2309..462493c467 100644
>> --- a/hw/arm/xlnx-versal-virt.c
>> +++ b/hw/arm/xlnx-versal-virt.c
>> @@ -441,14 +441,11 @@ static void versal_virt_init(MachineState *machine)
>>                                          0, &s->soc.fpd.apu.mr, 0);
>>  
>>      s->binfo.ram_size = machine->ram_size;
>> -    s->binfo.kernel_filename = machine->kernel_filename;
>> -    s->binfo.kernel_cmdline = machine->kernel_cmdline;
>> -    s->binfo.initrd_filename = machine->initrd_filename;
>>      s->binfo.loader_start = 0x0;
>>      s->binfo.get_dtb = versal_virt_get_dtb;
>>      s->binfo.modify_dtb = versal_virt_modify_dtb;
>>      if (machine->kernel_filename) {
>> -        arm_load_kernel(s->soc.fpd.apu.cpu[0], &s->binfo);
>> +        arm_load_kernel(s->soc.fpd.apu.cpu[0], machine, &s->binfo);
>>      } else {
>>          AddressSpace *as = arm_boot_address_space(s->soc.fpd.apu.cpu[0],
>>                                                    &s->binfo);
>> @@ -457,7 +454,7 @@ static void versal_virt_init(MachineState *machine)
>>          s->binfo.loader_start = 0x1000;
>>          s->binfo.dtb_limit = 0x1000000;
>>          if (arm_load_dtb(s->binfo.loader_start,
>> -                         &s->binfo, s->binfo.dtb_limit, as) < 0) {
>> +                         &s->binfo, s->binfo.dtb_limit, as, machine) < 0) {
>>              exit(EXIT_FAILURE);
>>          }
>>      }
>> diff --git a/hw/arm/xlnx-zcu102.c b/hw/arm/xlnx-zcu102.c
>> index 044d3394c0..53cfe7c1f1 100644
>> --- a/hw/arm/xlnx-zcu102.c
>> +++ b/hw/arm/xlnx-zcu102.c
>> @@ -171,11 +171,8 @@ static void xlnx_zcu102_init(MachineState *machine)
>>      /* TODO create and connect IDE devices for ide_drive_get() */
>>  
>>      xlnx_zcu102_binfo.ram_size = ram_size;
>> -    xlnx_zcu102_binfo.kernel_filename = machine->kernel_filename;
>> -    xlnx_zcu102_binfo.kernel_cmdline = machine->kernel_cmdline;
>> -    xlnx_zcu102_binfo.initrd_filename = machine->initrd_filename;
>>      xlnx_zcu102_binfo.loader_start = 0;
>> -    arm_load_kernel(s->soc.boot_cpu_ptr, &xlnx_zcu102_binfo);
>> +    arm_load_kernel(s->soc.boot_cpu_ptr, machine, &xlnx_zcu102_binfo);
>>  }
>>  
>>  static void xlnx_zcu102_machine_instance_init(Object *obj)
>> diff --git a/hw/arm/z2.c b/hw/arm/z2.c
>> index 44aa748d39..2f21421683 100644
>> --- a/hw/arm/z2.c
>> +++ b/hw/arm/z2.c
>> @@ -296,9 +296,6 @@ static const TypeInfo aer915_info = {
>>  
>>  static void z2_init(MachineState *machine)
>>  {
>> -    const char *kernel_filename = machine->kernel_filename;
>> -    const char *kernel_cmdline = machine->kernel_cmdline;
>> -    const char *initrd_filename = machine->initrd_filename;
>>      MemoryRegion *address_space_mem = get_system_memory();
>>      uint32_t sector_len = 0x10000;
>>      PXA2xxState *mpu;
>> @@ -352,11 +349,8 @@ static void z2_init(MachineState *machine)
>>      qdev_connect_gpio_out(mpu->gpio, Z2_GPIO_LCD_CS,
>>                            qemu_allocate_irq(z2_lcd_cs, z2_lcd, 0));
>>  
>> -    z2_binfo.kernel_filename = kernel_filename;
>> -    z2_binfo.kernel_cmdline = kernel_cmdline;
>> -    z2_binfo.initrd_filename = initrd_filename;
>>      z2_binfo.board_id = 0x6dd;
>> -    arm_load_kernel(mpu->cpu, &z2_binfo);
>> +    arm_load_kernel(mpu->cpu, machine, &z2_binfo);
>>  }
>>  
>>  static void z2_machine_init(MachineClass *mc)
>> diff --git a/include/hw/arm/boot.h b/include/hw/arm/boot.h
>> index c48cc4c2bc..2673abe81f 100644
>> --- a/include/hw/arm/boot.h
>> +++ b/include/hw/arm/boot.h
>> @@ -133,7 +133,7 @@ struct arm_boot_info {
>>   * before sysbus-fdt arm_register_platform_bus_fdt_creator. Indeed the
>>   * machine init done notifiers are called in registration reverse order.
>>   */
>> -void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info);
>> +void arm_load_kernel(ARMCPU *cpu, MachineState *ms, struct arm_boot_info *info);
>>  
>>  AddressSpace *arm_boot_address_space(ARMCPU *cpu,
>>                                       const struct arm_boot_info *info);
>> @@ -160,7 +160,7 @@ AddressSpace *arm_boot_address_space(ARMCPU *cpu,
>>   * Note: Must not be called unless have_dtb(binfo) is true.
>>   */
>>  int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
>> -                 hwaddr addr_limit, AddressSpace *as);
>> +                 hwaddr addr_limit, AddressSpace *as, MachineState *ms);
>>  
>>  /* Write a secure board setup routine with a dummy handler for SMCs */
>>  void arm_write_secure_board_setup_dummy_smc(ARMCPU *cpu,
>> -- 
>> 2.20.1
>>
>>
> 



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH v9 00/11] Build ACPI Heterogeneous Memory Attribute Table (HMAT)
  2019-08-13  8:53 ` Tao Xu
@ 2019-08-14 20:57   ` Eduardo Habkost
  2019-08-15  0:53     ` Tao Xu
  0 siblings, 1 reply; 34+ messages in thread
From: Eduardo Habkost @ 2019-08-14 20:57 UTC (permalink / raw)
  To: Tao Xu
  Cc: jingqi.liu, fan.du, qemu-devel, daniel, jonathan.cameron,
	imammedo, dan.j.williams

On Tue, Aug 13, 2019 at 04:53:33PM +0800, Tao Xu wrote:
> Hi Igor and Eduardo,
> 
> I am wondering if there are more comments about patch 1/11~4/11? Because
> these 4 patch are independent and the patch series are big and pushing for a
> long time. Could the patch 1/11~4/11 be ready for queuing firstly?

Now that I got a few Acked-bys for patch 1/4, I plan to queue
patches 1-4 in machine-next soon.

-- 
Eduardo



* Re: [Qemu-devel] [PATCH v9 05/11] numa: Extend CLI to provide initiator information for numa nodes
  2019-08-14  5:13       ` Tao Xu
@ 2019-08-14 21:29         ` Dan Williams
  2019-08-15  1:56           ` Tao Xu
  0 siblings, 1 reply; 34+ messages in thread
From: Dan Williams @ 2019-08-14 21:29 UTC (permalink / raw)
  To: Tao Xu
  Cc: Eduardo Habkost, Jingqi Liu, Du, Fan, Qemu Developers, daniel,
	Jonathan Cameron, Igor Mammedov

On Tue, Aug 13, 2019 at 10:14 PM Tao Xu <tao3.xu@intel.com> wrote:
>
> On 8/14/2019 10:39 AM, Dan Williams wrote:
> > On Tue, Aug 13, 2019 at 8:00 AM Igor Mammedov <imammedo@redhat.com> wrote:
> >>
> >> On Fri,  9 Aug 2019 14:57:25 +0800
> >> Tao <tao3.xu@intel.com> wrote:
> >>
> >>> From: Tao Xu <tao3.xu@intel.com>
> >>>
> >>> In ACPI 6.3 chapter 5.2.27 Heterogeneous Memory Attribute Table (HMAT),
> >>> The initiator represents processor which access to memory. And in 5.2.27.3
> >>> Memory Proximity Domain Attributes Structure, the attached initiator is
> >>> defined as where the memory controller responsible for a memory proximity
> >>> domain. With attached initiator information, the topology of heterogeneous
> >>> memory can be described.
> >>>
> >>> Extend CLI of "-numa node" option to indicate the initiator numa node-id.
> >>> In the linux kernel, the codes in drivers/acpi/hmat/hmat.c parse and report
> >>> the platform's HMAT tables.
> >>>
> >>> Reviewed-by: Jingqi Liu <Jingqi.liu@intel.com>
> >>> Suggested-by: Dan Williams <dan.j.williams@intel.com>
> >>> Signed-off-by: Tao Xu <tao3.xu@intel.com>
> >>> ---
> >>>
> >>> No changes in v9
> >>> ---
> >>>   hw/core/machine.c     | 24 ++++++++++++++++++++++++
> >>>   hw/core/numa.c        | 13 +++++++++++++
> >>>   include/sysemu/numa.h |  3 +++
> >>>   qapi/machine.json     |  6 +++++-
> >>>   qemu-options.hx       | 27 +++++++++++++++++++++++----
> >>>   5 files changed, 68 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/hw/core/machine.c b/hw/core/machine.c
> >>> index 3c55470103..113184a9df 100644
> >>> --- a/hw/core/machine.c
> >>> +++ b/hw/core/machine.c
> >>> @@ -640,6 +640,7 @@ void machine_set_cpu_numa_node(MachineState *machine,
> >>>                                  const CpuInstanceProperties *props, Error **errp)
> >>>   {
> >>>       MachineClass *mc = MACHINE_GET_CLASS(machine);
> >>> +    NodeInfo *numa_info = machine->numa_state->nodes;
> >>>       bool match = false;
> >>>       int i;
> >>>
> >>> @@ -709,6 +710,16 @@ void machine_set_cpu_numa_node(MachineState *machine,
> >>>           match = true;
> >>>           slot->props.node_id = props->node_id;
> >>>           slot->props.has_node_id = props->has_node_id;
> >>> +
> >>> +        if (numa_info[props->node_id].initiator_valid &&
> >>> +            (props->node_id != numa_info[props->node_id].initiator)) {
> >>> +            error_setg(errp, "The initiator of CPU NUMA node %" PRId64
> >>> +                       " should be itself.", props->node_id);
> >>> +            return;
> >>> +        }
> >>> +        numa_info[props->node_id].initiator_valid = true;
> >>> +        numa_info[props->node_id].has_cpu = true;
> >>> +        numa_info[props->node_id].initiator = props->node_id;
> >>>       }
> >>>
> >>>       if (!match) {
> >>> @@ -1050,6 +1061,7 @@ static void machine_numa_finish_cpu_init(MachineState *machine)
> >>>       GString *s = g_string_new(NULL);
> >>>       MachineClass *mc = MACHINE_GET_CLASS(machine);
> >>>       const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(machine);
> >>> +    NodeInfo *numa_info = machine->numa_state->nodes;
> >>>
> >>>       assert(machine->numa_state->num_nodes);
> >>>       for (i = 0; i < possible_cpus->len; i++) {
> >>> @@ -1083,6 +1095,18 @@ static void machine_numa_finish_cpu_init(MachineState *machine)
> >>>               machine_set_cpu_numa_node(machine, &props, &error_fatal);
> >>>           }
> >>>       }
> >>> +
> >>> +    for (i = 0; i < machine->numa_state->num_nodes; i++) {
> >>> +        if (numa_info[i].initiator_valid &&
> >>> +            !numa_info[numa_info[i].initiator].has_cpu) {
> >>                            ^^^^^^^^^^^^^^^^^^^^^^ possible out of bounds read, see bellow
> >>
> >>> +            error_report("The initiator-id %"PRIu16 " of NUMA node %d"
> >>> +                         " does not exist.", numa_info[i].initiator, i);
> >>> +            error_printf("\n");
> >>> +
> >>> +            exit(1);
> >>> +        }
> >> it takes care only about nodes that have cpus or memory-only ones that have
> >> initiator explicitly provided on CLI. And leaves possibility to have
> >> memory-only nodes without initiator mixed with nodes that have initiator.
> >> Is it valid to have mixed configuration?
> >> Should we forbid it?
> >
> > The spec talks about the "Proximity Domain for the Attached Initiator"
> > field only being valid if the memory controller for the memory can be
> > identified by an initiator id in the SRAT. So I expect the only way to
> > define a memory proximity domain without this local initiator is to
> > allow specifying a node-id that does not have an entry in the SRAT.
> >
> Hi Dan,
>
> So there may be a situation for the Attached Initiator field is not
> valid? If true, I would allow user to input Initiator invalid.

Yes it's something the OS needs to consider because the platform may
not be able to meet the constraint that a single initiator is
associated with the memory controller for a given memory target. In
retrospect it would have been nice if the spec reserved 0xffffffff for
this purpose, but it seems "not in SRAT" is the only way to identify
memory that is not attached to any single initiator.

> > That would be a useful feature for testing OS HMAT parsing behavior,
> > and may match platforms that exist in practice.



* Re: [Qemu-devel] [PATCH v9 00/11] Build ACPI Heterogeneous Memory Attribute Table (HMAT)
  2019-08-14 20:57   ` Eduardo Habkost
@ 2019-08-15  0:53     ` Tao Xu
  0 siblings, 0 replies; 34+ messages in thread
From: Tao Xu @ 2019-08-15  0:53 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Liu, Jingqi, Du, Fan, qemu-devel, daniel, jonathan.cameron,
	imammedo, Williams, Dan J

On 8/15/2019 4:57 AM, Eduardo Habkost wrote:
> On Tue, Aug 13, 2019 at 04:53:33PM +0800, Tao Xu wrote:
>> Hi Igor and Eduardo,
>>
>> I am wondering if there are more comments about patch 1/11~4/11? Because
>> these 4 patch are independent and the patch series are big and pushing for a
>> long time. Could the patch 1/11~4/11 be ready for queuing firstly?
> 
> Now that I got a few Acked-bys for patch 1/4, I plan to queue
> patches 1-4 in machine-next soon.
> 
Thank you very much!



* Re: [Qemu-devel] [PATCH v9 05/11] numa: Extend CLI to provide initiator information for numa nodes
  2019-08-14 21:29         ` Dan Williams
@ 2019-08-15  1:56           ` Tao Xu
  2019-08-15  2:31             ` Dan Williams
  0 siblings, 1 reply; 34+ messages in thread
From: Tao Xu @ 2019-08-15  1:56 UTC (permalink / raw)
  To: Dan Williams
  Cc: Eduardo Habkost, Liu, Jingqi, Du, Fan, Qemu Developers, daniel,
	Jonathan Cameron, Igor Mammedov

On 8/15/2019 5:29 AM, Dan Williams wrote:
> On Tue, Aug 13, 2019 at 10:14 PM Tao Xu <tao3.xu@intel.com> wrote:
>>
>> On 8/14/2019 10:39 AM, Dan Williams wrote:
>>> On Tue, Aug 13, 2019 at 8:00 AM Igor Mammedov <imammedo@redhat.com> wrote:
>>>>
>>>> On Fri,  9 Aug 2019 14:57:25 +0800
>>>> Tao <tao3.xu@intel.com> wrote:
>>>>
>>>>> From: Tao Xu <tao3.xu@intel.com>
>>>>>
[...]
>>>>> +    for (i = 0; i < machine->numa_state->num_nodes; i++) {
>>>>> +        if (numa_info[i].initiator_valid &&
>>>>> +            !numa_info[numa_info[i].initiator].has_cpu) {
>>>>                             ^^^^^^^^^^^^^^^^^^^^^^ possible out of bounds read, see bellow
>>>>
>>>>> +            error_report("The initiator-id %"PRIu16 " of NUMA node %d"
>>>>> +                         " does not exist.", numa_info[i].initiator, i);
>>>>> +            error_printf("\n");
>>>>> +
>>>>> +            exit(1);
>>>>> +        }
>>>> it takes care only about nodes that have cpus or memory-only ones that have
>>>> initiator explicitly provided on CLI. And leaves possibility to have
>>>> memory-only nodes without initiator mixed with nodes that have initiator.
>>>> Is it valid to have mixed configuration?
>>>> Should we forbid it?
>>>
>>> The spec talks about the "Proximity Domain for the Attached Initiator"
>>> field only being valid if the memory controller for the memory can be
>>> identified by an initiator id in the SRAT. So I expect the only way to
>>> define a memory proximity domain without this local initiator is to
>>> allow specifying a node-id that does not have an entry in the SRAT.
>>>
>> Hi Dan,
>>
>> So there may be a situation for the Attached Initiator field is not
>> valid? If true, I would allow user to input Initiator invalid.
> 
> Yes it's something the OS needs to consider because the platform may
> not be able to meet the constraint that a single initiator is
> associated with the memory controller for a given memory target. In
> retrospect it would have been nice if the spec reserved 0xffffffff for
> this purpose, but it seems "not in SRAT" is the only way to identify
> memory that is not attached to any single initiator.
> 
But as far as I know, QEMU can't emulate a NUMA node "not in SRAT". I am
wondering whether it would be enough to just mark the initiator as invalid?




* Re: [Qemu-devel] [PATCH v9 05/11] numa: Extend CLI to provide initiator information for numa nodes
  2019-08-15  1:56           ` Tao Xu
@ 2019-08-15  2:31             ` Dan Williams
  2019-08-16 14:57               ` Igor Mammedov
  0 siblings, 1 reply; 34+ messages in thread
From: Dan Williams @ 2019-08-15  2:31 UTC (permalink / raw)
  To: Tao Xu
  Cc: Eduardo Habkost, Liu, Jingqi, Du, Fan, Qemu Developers, daniel,
	Jonathan Cameron, Igor Mammedov

On Wed, Aug 14, 2019 at 6:57 PM Tao Xu <tao3.xu@intel.com> wrote:
>
> On 8/15/2019 5:29 AM, Dan Williams wrote:
> > On Tue, Aug 13, 2019 at 10:14 PM Tao Xu <tao3.xu@intel.com> wrote:
> >>
> >> On 8/14/2019 10:39 AM, Dan Williams wrote:
> >>> On Tue, Aug 13, 2019 at 8:00 AM Igor Mammedov <imammedo@redhat.com> wrote:
> >>>>
> >>>> On Fri,  9 Aug 2019 14:57:25 +0800
> >>>> Tao <tao3.xu@intel.com> wrote:
> >>>>
> >>>>> From: Tao Xu <tao3.xu@intel.com>
> >>>>>
> [...]
> >>>>> +    for (i = 0; i < machine->numa_state->num_nodes; i++) {
> >>>>> +        if (numa_info[i].initiator_valid &&
> >>>>> +            !numa_info[numa_info[i].initiator].has_cpu) {
> >>>>                             ^^^^^^^^^^^^^^^^^^^^^^ possible out of bounds read, see bellow
> >>>>
> >>>>> +            error_report("The initiator-id %"PRIu16 " of NUMA node %d"
> >>>>> +                         " does not exist.", numa_info[i].initiator, i);
> >>>>> +            error_printf("\n");
> >>>>> +
> >>>>> +            exit(1);
> >>>>> +        }
> >>>> it takes care only about nodes that have cpus or memory-only ones that have
> >>>> initiator explicitly provided on CLI. And leaves possibility to have
> >>>> memory-only nodes without initiator mixed with nodes that have initiator.
> >>>> Is it valid to have mixed configuration?
> >>>> Should we forbid it?
> >>>
> >>> The spec talks about the "Proximity Domain for the Attached Initiator"
> >>> field only being valid if the memory controller for the memory can be
> >>> identified by an initiator id in the SRAT. So I expect the only way to
> >>> define a memory proximity domain without this local initiator is to
> >>> allow specifying a node-id that does not have an entry in the SRAT.
> >>>
> >> Hi Dan,
> >>
> >> So there may be a situation for the Attached Initiator field is not
> >> valid? If true, I would allow user to input Initiator invalid.
> >
> > Yes it's something the OS needs to consider because the platform may
> > not be able to meet the constraint that a single initiator is
> > associated with the memory controller for a given memory target. In
> > retrospect it would have been nice if the spec reserved 0xffffffff for
> > this purpose, but it seems "not in SRAT" is the only way to identify
> > memory that is not attached to any single initiator.
> >
> But As far as I konw, QEMU can't emulate a NUMA node "not in SRAT". I am
> wondering if it is effective only set Initiator invalid?

You don't need to emulate a NUMA node not in SRAT. Just put a number
in this HMAT entry larger than the largest proximity domain number
found in the SRAT.
>
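A minimal C sketch of the suggestion above, assuming the caller already has the list of proximity domains that were emitted in SRAT: choose an "attached initiator" value one greater than the largest of them, so by construction it cannot match any SRAT entry. The helper name and its inputs are hypothetical illustrations, not existing QEMU code.

```c
#include <stdint.h>

/*
 * Hypothetical helper: return a proximity domain number that does not
 * appear in SRAT, by taking the largest SRAT domain plus one.
 * Assumes count > 0 and that the largest domain is below UINT32_MAX
 * (otherwise the "+ 1" would wrap to 0).
 */
static uint32_t sketch_unattached_initiator_domain(const uint32_t *srat_domains,
                                                   int count)
{
    uint32_t max = srat_domains[0];

    for (int i = 1; i < count; i++) {
        if (srat_domains[i] > max) {
            max = srat_domains[i];
        }
    }
    return max + 1;  /* larger than every SRAT domain, so not in SRAT */
}
```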



* Re: [Qemu-devel] [PATCH v9 05/11] numa: Extend CLI to provide initiator information for numa nodes
  2019-08-14  2:24     ` Tao Xu
@ 2019-08-16 14:47       ` Igor Mammedov
  0 siblings, 0 replies; 34+ messages in thread
From: Igor Mammedov @ 2019-08-16 14:47 UTC (permalink / raw)
  To: Tao Xu
  Cc: ehabkost, jingqi.liu, fan.du, qemu-devel, daniel,
	jonathan.cameron, dan.j.williams

On Wed, 14 Aug 2019 10:24:03 +0800
Tao Xu <tao3.xu@intel.com> wrote:

> On 8/13/2019 11:00 PM, Igor Mammedov wrote:
> > On Fri,  9 Aug 2019 14:57:25 +0800
> > Tao <tao3.xu@intel.com> wrote:
> >   
> >> From: Tao Xu <tao3.xu@intel.com>
> >>
> >> In ACPI 6.3 chapter 5.2.27 Heterogeneous Memory Attribute Table (HMAT),
> >> The initiator represents processor which access to memory. And in 5.2.27.3
> >> Memory Proximity Domain Attributes Structure, the attached initiator is
> >> defined as where the memory controller responsible for a memory proximity
> >> domain. With attached initiator information, the topology of heterogeneous
> >> memory can be described.
> >>
> >> Extend CLI of "-numa node" option to indicate the initiator numa node-id.
> >> In the linux kernel, the codes in drivers/acpi/hmat/hmat.c parse and report
> >> the platform's HMAT tables.
> >>
> >> Reviewed-by: Jingqi Liu <Jingqi.liu@intel.com>
> >> Suggested-by: Dan Williams <dan.j.williams@intel.com>
> >> Signed-off-by: Tao Xu <tao3.xu@intel.com>

see comments below.

PS:
I'll continue reviewing the series in a week when I'm back.

> >> ---
> >>
> >> No changes in v9
> >> ---  
> [...]
> >> +
> >> +    for (i = 0; i < machine->numa_state->num_nodes; i++) {
> >> +        if (numa_info[i].initiator_valid &&
> >> +            !numa_info[numa_info[i].initiator].has_cpu) {  
> >                            ^^^^^^^^^^^^^^^^^^^^^^ possible out of bounds read, see bellow
> >   
> I will add a error "if (numa_info[i].initiator >= MAX_NODES)" when input.

it would be better to validate user input instead, at the place pointed out below
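A minimal sketch of validating the initiator when the option is parsed, so that the later numa_info[numa_info[i].initiator] lookup can never index past the array. SketchNodeInfo, SKETCH_MAX_NODES (assumed to mirror QEMU's MAX_NODES) and sketch_set_initiator() are hypothetical stand-ins for the patch's code, and the caller is assumed to have already range-checked nodenr.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for QEMU's MAX_NODES (assumed value). */
#define SKETCH_MAX_NODES 128

/* Cut-down stand-in for the patch's NodeInfo: only the new fields. */
typedef struct {
    bool has_cpu;
    bool initiator_valid;
    uint16_t initiator;
} SketchNodeInfo;

/*
 * Hypothetical parse-time check: reject an out-of-range initiator before
 * it is stored, and reject a conflicting second assignment.
 * Returns 0 on success, -1 on error (message written to errbuf).
 */
static int sketch_set_initiator(SketchNodeInfo *nodes, uint16_t nodenr,
                                uint16_t initiator, char *errbuf, size_t len)
{
    if (initiator >= SKETCH_MAX_NODES) {
        snprintf(errbuf, len, "initiator %u of NUMA node %u must be < %d",
                 (unsigned)initiator, (unsigned)nodenr, SKETCH_MAX_NODES);
        return -1;
    }
    if (nodes[nodenr].initiator_valid &&
        nodes[nodenr].initiator != initiator) {
        snprintf(errbuf, len, "initiator of NUMA node %u already set to %u",
                 (unsigned)nodenr, (unsigned)nodes[nodenr].initiator);
        return -1;
    }
    nodes[nodenr].initiator_valid = true;
    nodes[nodenr].initiator = initiator;
    return 0;
}
```

Doing the range check here, rather than at the later lookup, means every stored initiator is already a valid array index by the time machine_numa_finish_cpu_init() walks the nodes.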

> >> +            error_report("The initiator-id %"PRIu16 " of NUMA node %d"
> >> +                         " does not exist.", numa_info[i].initiator, i);
> >> +            error_printf("\n");
> >> +
> >> +            exit(1);
> >> +        }  
> > it takes care only about nodes that have cpus or memory-only ones that have
> > initiator explicitly provided on CLI. And leaves possibility to have
> > memory-only nodes without initiator mixed with nodes that have initiator.
> > Is it valid to have mixed configuration?
> > Should we forbid it?
> >   
> Mixed configuration may indeed trigger bug in the future. Because in 
> this patches we default generate HMAT. But mixed configuration situation 
> or without initiator setting will let mem-only node "Flags" field 0, 
> then the Proximity Domain for the Attached Initiator field is not
> valid.
> 
> List are three situations:
> 
> 1) full configuration, just like
> -object memory-backend-ram,size=1G,id=m0 \
> -object memory-backend-ram,size=1G,id=m1 \
> -object memory-backend-ram,size=1G,id=m2 \
> -numa node,nodeid=0,memdev=m0 \
> -numa node,nodeid=1,memdev=m1,initiator=0 \
> -numa node,nodeid=2,memdev=m2,initiator=0
> 
> 2) mixed configuration, just like
> -object memory-backend-ram,size=1G,id=m0 \
> -object memory-backend-ram,size=1G,id=m1 \
> -object memory-backend-ram,size=1G,id=m2 \
> -numa node,nodeid=0,memdev=m0 \
> -numa node,nodeid=1,memdev=m1,initiator=0 \
> -numa node,nodeid=2,memdev=m2
> 
> 3) no configuration, just like
> -object memory-backend-ram,size=1G,id=m0 \
> -object memory-backend-ram,size=1G,id=m1 \
> -object memory-backend-ram,size=1G,id=m2 \
> -numa node,nodeid=0,memdev=m0 \
> -numa node,nodeid=1,memdev=m1 \
> -numa node,nodeid=2,memdev=m2
> 
> I have 3 ideas:
> 
> 1. HMAT option. Add a machine option like "-machine,hmat=yes", then qemu 
> can have HMAT.
I'd go with it. HMAT, even if it's broken, won't affect anything unless requested
by the user, so we could polish the implementation and experiment with it with
little risk of breaking something.


> 2. Default setting. The numa without initiator default set numa node 
> which has cpu 0 as initiator.
> 
> 3. Auto setting. intelligent auto configuration like 
> numa_default_auto_assign_ram, auto set initiator of the memory-only 
> nodes averagely.
numa_default_auto_assign_ram is deprecated.
Usually auto_something bites us back in the long term
when we need to change related code, so we end up with a bunch of
compat code and the maintenance burden that it introduces
(the same applies to made-up defaults, i.e. ones not dictated by the spec).

> 
> Therefore, there are 2 different solution:
> 
> 1) HMAT option + Default setting
> 
> 2) HMAT option + Auto setting
> 
> >> +    }
> >> +
> >>       if (s->len && !qtest_enabled()) {
> >>           warn_report("CPU(s) not present in any NUMA nodes: %s",
> >>                       s->str);
> >> diff --git a/hw/core/numa.c b/hw/core/numa.c
> >> index 8fcbba05d6..cfb6339810 100644
> >> --- a/hw/core/numa.c
> >> +++ b/hw/core/numa.c
> >> @@ -128,6 +128,19 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
> >>           numa_info[nodenr].node_mem = object_property_get_uint(o, "size", NULL);
> >>           numa_info[nodenr].node_memdev = MEMORY_BACKEND(o);
> >>       }
> >> +
> >> +    if (node->has_initiator) {
> >> +        if (numa_info[nodenr].initiator_valid &&
> >> +            (node->initiator != numa_info[nodenr].initiator)) {
> >> +            error_setg(errp, "The initiator of NUMA node %" PRIu16 " has been "
> >> +                       "set to node %" PRIu16, nodenr,
> >> +                       numa_info[nodenr].initiator);
> >> +            return;
> >> +        }
> >> +
> >> +        numa_info[nodenr].initiator_valid = true;
> >> +        numa_info[nodenr].initiator = node->initiator;  
> >                                               ^^^
> > not validated  user input? (which could lead to read beyond numa_info[] boundaries
> > in previous hunk).
> >   
> >> +    }
> >>       numa_info[nodenr].present = true;
> >>       max_numa_nodeid = MAX(max_numa_nodeid, nodenr + 1);
> >>       ms->numa_state->num_nodes++;
> >> diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
> >> index 76da3016db..46ad06e000 100644
> >> --- a/include/sysemu/numa.h
> >> +++ b/include/sysemu/numa.h
> >> @@ -10,6 +10,9 @@ struct NodeInfo {
> >>       uint64_t node_mem;
> >>       struct HostMemoryBackend *node_memdev;
> >>       bool present;
> >> +    bool has_cpu;
> >> +    bool initiator_valid;
> >> +    uint16_t initiator;
> >>       uint8_t distance[MAX_NODES];
> >>   };
> >>   
> >> diff --git a/qapi/machine.json b/qapi/machine.json
> >> index 6db8a7e2ec..05e367d26a 100644
> >> --- a/qapi/machine.json
> >> +++ b/qapi/machine.json
> >> @@ -414,6 +414,9 @@
> >>   # @memdev: memory backend object.  If specified for one node,
> >>   #          it must be specified for all nodes.
> >>   #
> >> +# @initiator: the initiator numa nodeid that is closest (as in directly
> >> +#             attached) to this numa node (since 4.2)  
> > well, it's pretty unclear what doc comment means (unless reader knows well
> > specific part of ACPI spec)
> > 
> > suggest to rephrase to something more understandable for unaware
> > readers (+ possible reference to spec for those who is interested
> > in spec definition since this doc is meant for developers).
> >   
> >> +#
> >>   # Since: 2.1
> >>   ##
> >>   { 'struct': 'NumaNodeOptions',
> >> @@ -421,7 +424,8 @@
> >>      '*nodeid': 'uint16',
> >>      '*cpus':   ['uint16'],
> >>      '*mem':    'size',
> >> -   '*memdev': 'str' }}
> >> +   '*memdev': 'str',
> >> +   '*initiator': 'uint16' }}
> >>   
> >>   ##
> >>   # @NumaDistOptions:
> >> diff --git a/qemu-options.hx b/qemu-options.hx
> >> index 9621e934c0..c480781992 100644
> >> --- a/qemu-options.hx
> >> +++ b/qemu-options.hx
> >> @@ -161,14 +161,14 @@ If any on the three values is given, the total number of CPUs @var{n} can be omi
> >>   ETEXI
> >>   
> >>   DEF("numa", HAS_ARG, QEMU_OPTION_numa,
> >> -    "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node]\n"
> >> -    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node]\n"
> >> +    "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
> >> +    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
> >>       "-numa dist,src=source,dst=destination,val=distance\n"
> >>       "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n",
> >>       QEMU_ARCH_ALL)
> >>   STEXI
> >> -@item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}]
> >> -@itemx -numa node[,memdev=@var{id}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}]
> >> +@item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
> >> +@itemx -numa node[,memdev=@var{id}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
> >>   @itemx -numa dist,src=@var{source},dst=@var{destination},val=@var{distance}
> >>   @itemx -numa cpu,node-id=@var{node}[,socket-id=@var{x}][,core-id=@var{y}][,thread-id=@var{z}]
> >>   @findex -numa
> >> @@ -215,6 +215,25 @@ split equally between them.
> >>   @samp{mem} and @samp{memdev} are mutually exclusive. Furthermore,
> >>   if one node uses @samp{memdev}, all of them have to use it.
> >>   
> >> +@samp{initiator} indicate the initiator NUMA @var{initiator} that is  
> >                                    ^^^^^^^       ^^^^^^^^^^^^^^
> > above will result in "initiator NUMA initiator", was it your intention?
> >   
> >> +closest (as in directly attached) to this NUMA @var{node}.  
> > Again suggest replace spec language with something more user friendly
> > (this time without spec reference as it's geared for end user)
> >   
> >> +For example, the following option assigns 2 NUMA nodes, node 0 has CPU.  
> > Following example creates a machine with 2 NUMA ...
> >   
> >> +node 1 has only memory, and its' initiator is node 0. Note that because
> >> +node 0 has CPU, by default the initiator of node 0 is itself and must be
> >> +itself.
> >> +@example
> >> +-M pc \
> >> +-m 2G,slots=2,maxmem=4G \
> >> +-object memory-backend-ram,size=1G,id=m0 \
> >> +-object memory-backend-ram,size=1G,id=m1 \
> >> +-numa node,nodeid=0,memdev=m0 \
> >> +-numa node,nodeid=1,memdev=m1,initiator=0 \
> >> +-smp 2,sockets=2,maxcpus=2  \
> >> +-numa cpu,node-id=0,socket-id=0 \
> >> +-numa cpu,node-id=0,socket-id=1 \
> >> +@end example
> >> +
> >>   @var{source} and @var{destination} are NUMA node IDs.
> >>   @var{distance} is the NUMA distance from @var{source} to @var{destination}.
> >>   The distance from a node to itself is always 10. If any pair of nodes is  
> >   
> 
> 




* Re: [Qemu-devel] [PATCH v9 05/11] numa: Extend CLI to provide initiator information for numa nodes
  2019-08-15  2:31             ` Dan Williams
@ 2019-08-16 14:57               ` Igor Mammedov
  2019-08-20  8:34                 ` Tao Xu
  0 siblings, 1 reply; 34+ messages in thread
From: Igor Mammedov @ 2019-08-16 14:57 UTC (permalink / raw)
  To: Dan Williams
  Cc: Eduardo Habkost, Liu, Jingqi, Tao Xu, Du, Fan, Qemu Developers,
	daniel, Jonathan Cameron

On Wed, 14 Aug 2019 19:31:27 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> On Wed, Aug 14, 2019 at 6:57 PM Tao Xu <tao3.xu@intel.com> wrote:
> >
> > On 8/15/2019 5:29 AM, Dan Williams wrote:  
> > > On Tue, Aug 13, 2019 at 10:14 PM Tao Xu <tao3.xu@intel.com> wrote:  
> > >>
> > >> On 8/14/2019 10:39 AM, Dan Williams wrote:  
> > >>> On Tue, Aug 13, 2019 at 8:00 AM Igor Mammedov <imammedo@redhat.com> wrote:  
> > >>>>
> > >>>> On Fri,  9 Aug 2019 14:57:25 +0800
> > >>>> Tao <tao3.xu@intel.com> wrote:
> > >>>>  
> > >>>>> From: Tao Xu <tao3.xu@intel.com>
> > >>>>>  
> > [...]  
> > >>>>> +    for (i = 0; i < machine->numa_state->num_nodes; i++) {
> > >>>>> +        if (numa_info[i].initiator_valid &&
> > >>>>> +            !numa_info[numa_info[i].initiator].has_cpu) {  
> > >>>>                             ^^^^^^^^^^^^^^^^^^^^^^ possible out of bounds read, see bellow
> > >>>>  
> > >>>>> +            error_report("The initiator-id %"PRIu16 " of NUMA node %d"
> > >>>>> +                         " does not exist.", numa_info[i].initiator, i);
> > >>>>> +            error_printf("\n");
> > >>>>> +
> > >>>>> +            exit(1);
> > >>>>> +        }  
> > >>>> it takes care only about nodes that have cpus or memory-only ones that have
> > >>>> initiator explicitly provided on CLI. And leaves possibility to have
> > >>>> memory-only nodes without initiator mixed with nodes that have initiator.
> > >>>> Is it valid to have mixed configuration?
> > >>>> Should we forbid it?  
> > >>>
> > >>> The spec talks about the "Proximity Domain for the Attached Initiator"
> > >>> field only being valid if the memory controller for the memory can be
> > >>> identified by an initiator id in the SRAT. So I expect the only way to
> > >>> define a memory proximity domain without this local initiator is to
> > >>> allow specifying a node-id that does not have an entry in the SRAT.
> > >>>  
> > >> Hi Dan,
> > >>
> > >> So there may be a situation for the Attached Initiator field is not
> > >> valid? If true, I would allow user to input Initiator invalid.  
> > >
> > > Yes it's something the OS needs to consider because the platform may
> > > not be able to meet the constraint that a single initiator is
> > > associated with the memory controller for a given memory target. In
> > > retrospect it would have been nice if the spec reserved 0xffffffff for
> > > this purpose, but it seems "not in SRAT" is the only way to identify
> > > memory that is not attached to any single initiator.
> > >  
> > But As far as I konw, QEMU can't emulate a NUMA node "not in SRAT". I am
> > wondering if it is effective only set Initiator invalid?  
> 
> You don't need to emulate a NUMA node not in SRAT. Just put a number
> in this HMAT entry larger than the largest proximity domain number
> found in the SRAT.
> >  
> 

So the behavior is really not defined in the spec
(well, I wasn't able to convince myself that the above behavior is in the spec).

In this case I'd go with a strict check for now, not allowing an invalid initiator
(we can easily relax the check later and allow it to point to nonsense, but not the other way around).



* Re: [Qemu-devel] [PATCH v9 05/11] numa: Extend CLI to provide initiator information for numa nodes
  2019-08-16 14:57               ` Igor Mammedov
@ 2019-08-20  8:34                 ` Tao Xu
  0 siblings, 0 replies; 34+ messages in thread
From: Tao Xu @ 2019-08-20  8:34 UTC (permalink / raw)
  To: Igor Mammedov, Dan Williams
  Cc: Eduardo Habkost, Liu, Jingqi, Du, Fan, Qemu Developers, daniel,
	Jonathan Cameron

On 8/16/2019 10:57 PM, Igor Mammedov wrote:
> On Wed, 14 Aug 2019 19:31:27 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
>> On Wed, Aug 14, 2019 at 6:57 PM Tao Xu <tao3.xu@intel.com> wrote:
>>>
>>> On 8/15/2019 5:29 AM, Dan Williams wrote:
>>>> On Tue, Aug 13, 2019 at 10:14 PM Tao Xu <tao3.xu@intel.com> wrote:
>>>>>
>>>>> On 8/14/2019 10:39 AM, Dan Williams wrote:
>>>>>> On Tue, Aug 13, 2019 at 8:00 AM Igor Mammedov <imammedo@redhat.com> wrote:
>>>>>>>
>>>>>>> On Fri,  9 Aug 2019 14:57:25 +0800
>>>>>>> Tao <tao3.xu@intel.com> wrote:
>>>>>>>   
>>>>>>>> From: Tao Xu <tao3.xu@intel.com>
>>>>>>>>   
>>> [...]
>>>>>>>> +    for (i = 0; i < machine->numa_state->num_nodes; i++) {
>>>>>>>> +        if (numa_info[i].initiator_valid &&
>>>>>>>> +            !numa_info[numa_info[i].initiator].has_cpu) {
>>>>>>>                              ^^^^^^^^^^^^^^^^^^^^^^ possible out of bounds read, see bellow
>>>>>>>   
>>>>>>>> +            error_report("The initiator-id %"PRIu16 " of NUMA node %d"
>>>>>>>> +                         " does not exist.", numa_info[i].initiator, i);
>>>>>>>> +            error_printf("\n");
>>>>>>>> +
>>>>>>>> +            exit(1);
>>>>>>>> +        }
>>>>>>> it takes care only about nodes that have cpus or memory-only ones that have
>>>>>>> initiator explicitly provided on CLI. And leaves possibility to have
>>>>>>> memory-only nodes without initiator mixed with nodes that have initiator.
>>>>>>> Is it valid to have mixed configuration?
>>>>>>> Should we forbid it?
>>>>>>
>>>>>> The spec talks about the "Proximity Domain for the Attached Initiator"
>>>>>> field only being valid if the memory controller for the memory can be
>>>>>> identified by an initiator id in the SRAT. So I expect the only way to
>>>>>> define a memory proximity domain without this local initiator is to
>>>>>> allow specifying a node-id that does not have an entry in the SRAT.
>>>>>>   
>>>>> Hi Dan,
>>>>>
>>>>> So there may be a situation for the Attached Initiator field is not
>>>>> valid? If true, I would allow user to input Initiator invalid.
>>>>
>>>> Yes it's something the OS needs to consider because the platform may
>>>> not be able to meet the constraint that a single initiator is
>>>> associated with the memory controller for a given memory target. In
>>>> retrospect it would have been nice if the spec reserved 0xffffffff for
>>>> this purpose, but it seems "not in SRAT" is the only way to identify
>>>> memory that is not attached to any single initiator.
>>>>   
>>> But As far as I konw, QEMU can't emulate a NUMA node "not in SRAT". I am
>>> wondering if it is effective only set Initiator invalid?
>>
>> You don't need to emulate a NUMA node not in SRAT. Just put a number
>> in this HMAT entry larger than the largest proximity domain number
>> found in the SRAT.
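The suggestion amounts to picking any proximity domain number above the largest one in SRAT. A minimal sketch, assuming the SRAT domains are available as an array; `pick_unattached_pxm()` is a hypothetical helper, and whether the spec actually sanctions this use is questioned elsewhere in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: store in *out a proximity domain number that does
 * not appear in srat_pxms[], namely the largest SRAT domain plus one.
 * Returns false if no larger 32-bit value exists.
 */
static bool pick_unattached_pxm(const uint32_t *srat_pxms, size_t n,
                                uint32_t *out)
{
    uint32_t max = 0;

    if (n == 0) {
        *out = 0;          /* empty SRAT: any number is absent */
        return true;
    }
    for (size_t k = 0; k < n; k++) {
        if (srat_pxms[k] > max) {
            max = srat_pxms[k];
        }
    }
    if (max == UINT32_MAX) {
        return false;      /* max + 1 would wrap around to 0 */
    }
    *out = max + 1;
    return true;
}
```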
>>>   
>>
> 
> So the behavior is not really defined in the spec
> (at least, I wasn't able to convince myself that the above behavior is in the spec).
> 
> In this case I'd go with a strict check for now, not allowing an invalid initiator
> (we can easily relax the check later and allow it to point to nonsense, but not the other way around)
> 

To avoid misunderstanding, let me summarize the solution. If anything is
wrong, please tell me:

1)
-machine hmat=yes \
-object memory-backend-ram,size=1G,id=m0 \
-object memory-backend-ram,size=1G,id=m1 \
-object memory-backend-ram,size=1G,id=m2 \
-numa node,nodeid=0,memdev=m0 \
-numa node,nodeid=1,memdev=m1,initiator=0 \
-numa node,nodeid=2,memdev=m2,initiator=0 \
-numa cpu,node-id=0,socket-id=0 \
-numa cpu,node-id=0,socket-id=1

then QEMU can build the HMAT.

2)
If the initiators are configured like this:

-numa node,nodeid=0,memdev=m0 \
-numa node,nodeid=1,memdev=m1,initiator=0 \
-numa node,nodeid=2,memdev=m2

then QEMU fails to boot and shows an error message.

3)
If the initiators are configured like this:

-numa node,nodeid=0,memdev=m0 \
-numa node,nodeid=1,memdev=m1,initiator=0 \
-numa node,nodeid=2,memdev=m2,initiator=1

then QEMU boots, and the initiator of nodeid=2 is considered invalid.
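The three cases can be encoded as a small check. This is a minimal sketch of one reading of the summary, assuming node 0 carries the CPUs from the two `-numa cpu` lines in every case and that an initiator-id naming a nonexistent node is still rejected. `NumaNode` and `check_hmat_initiators()` are hypothetical names, not code from this series. Case 3 is accepted here, which is the opposite of the strict check suggested earlier in the thread, so this only illustrates the proposal, not a settled policy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of a -numa node option. */
typedef struct {
    bool has_cpu;        /* node received CPUs via -numa cpu */
    bool initiator_set;  /* initiator= was given */
    uint16_t initiator;
} NumaNode;

/*
 * Returns false when QEMU should refuse to boot. On success, valid[i]
 * says whether node i's initiator would be reported as valid.
 */
static bool check_hmat_initiators(const NumaNode *nodes, int nb_nodes,
                                  bool *valid)
{
    bool any_set = false, any_unset = false;

    /* Case 2: memory-only nodes must either all set initiator= or none. */
    for (int i = 0; i < nb_nodes; i++) {
        if (!nodes[i].has_cpu) {
            if (nodes[i].initiator_set) {
                any_set = true;
            } else {
                any_unset = true;
            }
        }
    }
    if (any_set && any_unset) {
        return false;
    }

    for (int i = 0; i < nb_nodes; i++) {
        if (nodes[i].has_cpu) {
            valid[i] = true;       /* a CPU node is its own initiator */
        } else if (!nodes[i].initiator_set) {
            valid[i] = false;      /* no memory-only node set initiator= */
        } else if (nodes[i].initiator >= nb_nodes) {
            return false;          /* refers to a node that does not exist */
        } else {
            /* Cases 1 and 3: valid only if the referenced node has CPUs. */
            valid[i] = nodes[nodes[i].initiator].has_cpu;
        }
    }
    return true;
}
```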




Thread overview: 34+ messages
2019-08-09  6:57 [Qemu-devel] [PATCH v9 00/11] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao
2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 01/11] hw/arm: simplify arm_load_dtb Tao
2019-08-13 21:55   ` Alistair Francis
2019-08-14  1:19     ` Andrew Jeffery
2019-08-13 21:55   ` Eduardo Habkost
2019-08-14 13:08     ` Cédric Le Goater
2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 02/11] numa: move numa global variable nb_numa_nodes into MachineState Tao
2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 03/11] numa: move numa global variable have_numa_distance " Tao
2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 04/11] numa: move numa global variable numa_info " Tao
2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 05/11] numa: Extend CLI to provide initiator information for numa nodes Tao
2019-08-13 15:00   ` Igor Mammedov
2019-08-14  2:24     ` Tao Xu
2019-08-16 14:47       ` Igor Mammedov
2019-08-14  2:39     ` Dan Williams
2019-08-14  5:13       ` Tao Xu
2019-08-14 21:29         ` Dan Williams
2019-08-15  1:56           ` Tao Xu
2019-08-15  2:31             ` Dan Williams
2019-08-16 14:57               ` Igor Mammedov
2019-08-20  8:34                 ` Tao Xu
2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 06/11] hmat acpi: Build Memory Proximity Domain Attributes Structure(s) Tao
2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 07/11] hmat acpi: Build System Locality Latency and Bandwidth Information Structure(s) Tao
2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 08/11] hmat acpi: Build Memory Side Cache " Tao
2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 09/11] numa: Extend the CLI to provide memory latency and bandwidth information Tao
2019-08-12  5:13   ` Daniel Black
2019-08-12  6:11     ` Tao Xu
2019-08-13 15:11   ` Eric Blake
2019-08-14  2:58     ` Tao Xu
2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 10/11] numa: Extend the CLI to provide memory side cache information Tao
2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 11/11] tests/bios-tables-test: add test cases for ACPI HMAT Tao
2019-08-09 11:11 ` [Qemu-devel] [PATCH v9 00/11] Build ACPI Heterogeneous Memory Attribute Table (HMAT) no-reply
2019-08-13  8:53 ` Tao Xu
2019-08-14 20:57   ` Eduardo Habkost
2019-08-15  0:53     ` Tao Xu
