qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [PATCH v5 0/8] Build ACPI Heterogeneous Memory Attribute Table (HMAT)
@ 2019-06-14 15:56 Tao Xu
  2019-06-14 15:56 ` [Qemu-devel] [PATCH v5 1/8] hw/arm: simplify arm_load_dtb Tao Xu
                   ` (8 more replies)
  0 siblings, 9 replies; 25+ messages in thread
From: Tao Xu @ 2019-06-14 15:56 UTC (permalink / raw)
  To: imammedo, eblake, ehabkost; +Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel

This patch series builds the ACPI Heterogeneous Memory Attribute Table (HMAT)
from options given on the command line. The ACPI HMAT describes the memory
attributes, such as memory side cache attributes and bandwidth and latency
details, related to the System Physical Address (SPA) Memory Ranges.
Software is expected to use this information as a hint for optimization.

Link to the v4 patches:
https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg01644.html

Changelog:
v5:
    - split out patches 1-6/11 and 8/11 of v4 so that the Memory Subsystem
    Address Range Structure(s) and System Locality Latency and Bandwidth
    Information Structure(s) are built first.
    - add 1/8 of patch v5 to simplify arm_load_dtb() (Igor)
    - drop the helper machine_num_numa_nodes() and use
    machine->numa_state->num_nodes (and numa_state->nodes) directly (Igor)
    - Add more descriptions from ACPI spec (Igor)
    - Add the reason for using a stub (Igor)
    - Use GArray for NUMA memory ranges data (Igor)
    - Separate hmat_build_lb() (Igor)
    - Drop all global variables and use local variables instead (Igor)
    - Add error message when base unit < 10
    - Update the hmat-lb option example by using '-numa cpu'
    and '-numa memdev' (Igor)

v4:
    - send the patch of "move numa global variables into MachineState"
    together with HMAT patches.
    https://lists.gnu.org/archive/html/qemu-devel/2019-04/msg03662.html
    - split patch 1/8 of v3 into two patches: 4/11 introduces
    build_mem_ranges() and 5/11 builds HMAT (Igor)
    - use build_append_int_noprefix() to build parts of ACPI table in
    all patches (Igor)
    - Split 8/8 of patch v3 into two parts, 10/11 introduces NFIT
    generalizations (build_acpi_aml_common), and use it in 11/11 to
    simplify hmat_build_aml (Igor)
    - use MachineState instead of PCMachineState to build HMAT more
    generically (Igor)
    - move the 7/8 v3 patch into the former patches
    - update the version tag from 4.0 to 4.1
v3:
    - rebase the fixing patch into the jingqi's patches (Eric)
    - update the version tag from 3.10 to 4.0 (Eric)
v2:
  Per Igor and Eric's comments, fix some coding style and small issues:
    - update the version number in qapi/misc.json
    - include the expansion of the acronym HMAT in qapi/misc.json
    - correct spelling mistakes in qapi/misc.json and qemu-options.hx
    - fix the comment style in hw/i386/acpi-build.c
    and hw/acpi/hmat.h
    - remove some unnecessary header files in hw/acpi/hmat.c
    - use hardcoded numbers from spec to generate
    Memory Subsystem Address Range Structure in hw/acpi/hmat.c
    - drop the struct AcpiHmat and AcpiHmatSpaRange
    in hw/acpi/hmat.h
    - rewrite NFIT code to build _HMA method

Liu Jingqi (3):
  hmat acpi: Build Memory Subsystem Address Range Structure(s) in ACPI
    HMAT
  hmat acpi: Build System Locality Latency and Bandwidth Information
    Structure(s) in ACPI HMAT
  numa: Extend the command-line to provide memory latency and bandwidth
    information

Tao Xu (5):
  hw/arm: simplify arm_load_dtb
  numa: move numa global variable nb_numa_nodes into MachineState
  numa: move numa global variable have_numa_distance into MachineState
  numa: move numa global variable numa_info into MachineState
  acpi: introduce AcpiDeviceIfClass.build_mem_ranges hook

 exec.c                               |   5 +-
 hw/acpi/Kconfig                      |   5 +
 hw/acpi/Makefile.objs                |   1 +
 hw/acpi/aml-build.c                  |   9 +-
 hw/acpi/hmat.c                       | 252 +++++++++++++++++++++++++++
 hw/acpi/hmat.h                       |  82 +++++++++
 hw/acpi/piix4.c                      |   1 +
 hw/arm/aspeed.c                      |   5 +-
 hw/arm/boot.c                        |  20 ++-
 hw/arm/collie.c                      |   8 +-
 hw/arm/cubieboard.c                  |   5 +-
 hw/arm/exynos4_boards.c              |   7 +-
 hw/arm/highbank.c                    |   8 +-
 hw/arm/imx25_pdk.c                   |   5 +-
 hw/arm/integratorcp.c                |   8 +-
 hw/arm/kzm.c                         |   5 +-
 hw/arm/mainstone.c                   |   5 +-
 hw/arm/mcimx6ul-evk.c                |   5 +-
 hw/arm/mcimx7d-sabre.c               |   5 +-
 hw/arm/musicpal.c                    |   8 +-
 hw/arm/nseries.c                     |   5 +-
 hw/arm/omap_sx1.c                    |   5 +-
 hw/arm/palm.c                        |  10 +-
 hw/arm/raspi.c                       |   6 +-
 hw/arm/realview.c                    |   5 +-
 hw/arm/sabrelite.c                   |   5 +-
 hw/arm/spitz.c                       |   5 +-
 hw/arm/tosa.c                        |   8 +-
 hw/arm/versatilepb.c                 |   5 +-
 hw/arm/vexpress.c                    |   5 +-
 hw/arm/virt-acpi-build.c             |  17 +-
 hw/arm/virt.c                        |  16 +-
 hw/arm/xilinx_zynq.c                 |   8 +-
 hw/arm/xlnx-versal-virt.c            |   7 +-
 hw/arm/xlnx-zcu102.c                 |   5 +-
 hw/arm/z2.c                          |   8 +-
 hw/core/machine.c                    |  16 +-
 hw/i386/acpi-build.c                 | 140 +++++++++------
 hw/i386/pc.c                         |  11 +-
 hw/isa/lpc_ich9.c                    |   1 +
 hw/mem/pc-dimm.c                     |   2 +
 hw/pci-bridge/pci_expander_bridge.c  |   2 +
 hw/ppc/spapr.c                       |  23 ++-
 hw/ppc/spapr_pci.c                   |   2 +
 include/hw/acpi/acpi_dev_interface.h |   4 +
 include/hw/acpi/aml-build.h          |   2 +-
 include/hw/arm/boot.h                |   4 +-
 include/hw/boards.h                  |   2 +
 include/hw/i386/pc.h                 |   1 +
 include/qemu/typedefs.h              |   1 +
 include/sysemu/numa.h                |  37 +++-
 include/sysemu/sysemu.h              |  24 +++
 monitor.c                            |  11 +-
 numa.c                               | 219 +++++++++++++++++++----
 qapi/misc.json                       |  94 +++++++++-
 qemu-options.hx                      |  45 ++++-
 stubs/Makefile.objs                  |   1 +
 stubs/pc_build_mem_ranges.c          |  14 ++
 58 files changed, 961 insertions(+), 264 deletions(-)
 create mode 100644 hw/acpi/hmat.c
 create mode 100644 hw/acpi/hmat.h
 create mode 100644 stubs/pc_build_mem_ranges.c

-- 
2.20.1




* [Qemu-devel] [PATCH v5 1/8] hw/arm: simplify arm_load_dtb
  2019-06-14 15:56 [Qemu-devel] [PATCH v5 0/8] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
@ 2019-06-14 15:56 ` Tao Xu
  2019-06-27 12:42   ` Igor Mammedov
  2019-06-14 15:56 ` [Qemu-devel] [PATCH v5 2/8] numa: move numa global variable nb_numa_nodes into MachineState Tao Xu
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 25+ messages in thread
From: Tao Xu @ 2019-06-14 15:56 UTC (permalink / raw)
  To: imammedo, eblake, ehabkost; +Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel

In struct arm_boot_info, kernel_filename, initrd_filename and
kernel_cmdline are copied from MachineState. This patch adds
MachineState as a parameter to arm_load_kernel() and arm_load_dtb(),
and moves the copying of kernel_filename, initrd_filename and
kernel_cmdline out of the board code into arm_load_kernel().
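
As a rough illustration of the refactoring described above, the following
minimal sketch models the pattern outside of QEMU. The two structs are
hypothetical, trimmed-down stand-ins for MachineState and struct
arm_boot_info, and load_kernel_sketch() and board_init_sketch() are
illustrative names, not QEMU functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for QEMU's MachineState (only the fields used here). */
typedef struct MachineState {
    const char *kernel_filename;
    const char *kernel_cmdline;
    const char *initrd_filename;
} MachineState;

/* Hypothetical stand-in for struct arm_boot_info. */
struct arm_boot_info {
    const char *kernel_filename;
    const char *kernel_cmdline;
    const char *initrd_filename;
    int board_id;
};

/*
 * Models the common loader after the change: it receives the machine
 * state and fills in the three boot strings itself, so callers no
 * longer have to copy them.
 */
static void load_kernel_sketch(const MachineState *ms,
                               struct arm_boot_info *info)
{
    assert(ms != NULL && info != NULL);
    info->kernel_filename = ms->kernel_filename;
    info->kernel_cmdline = ms->kernel_cmdline;
    info->initrd_filename = ms->initrd_filename;
}

/*
 * Models a board init function after the change: it sets only the
 * board-specific fields and passes the machine state through.
 */
static void board_init_sketch(const MachineState *machine,
                              struct arm_boot_info *binfo)
{
    binfo->board_id = 0x208; /* example value, as used by collie in the diff */
    load_kernel_sketch(machine, binfo);
}
```

Calling board_init_sketch() with a populated machine state leaves the boot
info pointing at the same three strings, which is the behaviour each board
previously implemented by hand.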

Reviewed-by: Liu Jingqi <jingqi.liu@intel.com>
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
---
 hw/arm/aspeed.c           |  5 +----
 hw/arm/boot.c             | 14 ++++++++------
 hw/arm/collie.c           |  8 +-------
 hw/arm/cubieboard.c       |  5 +----
 hw/arm/exynos4_boards.c   |  7 ++-----
 hw/arm/highbank.c         |  8 +-------
 hw/arm/imx25_pdk.c        |  5 +----
 hw/arm/integratorcp.c     |  8 +-------
 hw/arm/kzm.c              |  5 +----
 hw/arm/mainstone.c        |  5 +----
 hw/arm/mcimx6ul-evk.c     |  5 +----
 hw/arm/mcimx7d-sabre.c    |  5 +----
 hw/arm/musicpal.c         |  8 +-------
 hw/arm/nseries.c          |  5 +----
 hw/arm/omap_sx1.c         |  5 +----
 hw/arm/palm.c             | 10 ++--------
 hw/arm/raspi.c            |  6 +-----
 hw/arm/realview.c         |  5 +----
 hw/arm/sabrelite.c        |  5 +----
 hw/arm/spitz.c            |  5 +----
 hw/arm/tosa.c             |  8 +-------
 hw/arm/versatilepb.c      |  5 +----
 hw/arm/vexpress.c         |  5 +----
 hw/arm/virt.c             |  8 +++-----
 hw/arm/xilinx_zynq.c      |  8 +-------
 hw/arm/xlnx-versal-virt.c |  7 ++-----
 hw/arm/xlnx-zcu102.c      |  5 +----
 hw/arm/z2.c               |  8 +-------
 include/hw/arm/boot.h     |  4 ++--
 29 files changed, 42 insertions(+), 145 deletions(-)

diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
index 33070a6df8..8b9fb606c0 100644
--- a/hw/arm/aspeed.c
+++ b/hw/arm/aspeed.c
@@ -226,9 +226,6 @@ static void aspeed_board_init(MachineState *machine,
         write_boot_rom(drive0, FIRMWARE_ADDR, fl->size, &error_abort);
     }
 
-    aspeed_board_binfo.kernel_filename = machine->kernel_filename;
-    aspeed_board_binfo.initrd_filename = machine->initrd_filename;
-    aspeed_board_binfo.kernel_cmdline = machine->kernel_cmdline;
     aspeed_board_binfo.ram_size = ram_size;
     aspeed_board_binfo.loader_start = sc->info->sdram_base;
 
@@ -236,7 +233,7 @@ static void aspeed_board_init(MachineState *machine,
         cfg->i2c_init(bmc);
     }
 
-    arm_load_kernel(ARM_CPU(first_cpu), &aspeed_board_binfo);
+    arm_load_kernel(ARM_CPU(first_cpu), machine, &aspeed_board_binfo);
 }
 
 static void palmetto_bmc_i2c_init(AspeedBoardState *bmc)
diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index 7279185bd9..30acdbe824 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -523,7 +523,7 @@ static void fdt_add_psci_node(void *fdt)
 }
 
 int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
-                 hwaddr addr_limit, AddressSpace *as)
+                 hwaddr addr_limit, AddressSpace *as, MachineState *ms)
 {
     void *fdt = NULL;
     int size, rc, n = 0;
@@ -626,9 +626,9 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
         qemu_fdt_add_subnode(fdt, "/chosen");
     }
 
-    if (binfo->kernel_cmdline && *binfo->kernel_cmdline) {
+    if (ms->kernel_cmdline && *ms->kernel_cmdline) {
         rc = qemu_fdt_setprop_string(fdt, "/chosen", "bootargs",
-                                     binfo->kernel_cmdline);
+                                     ms->kernel_cmdline);
         if (rc < 0) {
             fprintf(stderr, "couldn't set /chosen/bootargs\n");
             goto fail;
@@ -1201,7 +1201,7 @@ static void arm_setup_firmware_boot(ARMCPU *cpu, struct arm_boot_info *info)
      */
 }
 
-void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
+void arm_load_kernel(ARMCPU *cpu, MachineState *ms, struct arm_boot_info *info)
 {
     CPUState *cs;
     AddressSpace *as = arm_boot_address_space(cpu, info);
@@ -1222,7 +1222,9 @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
      * doesn't support secure.
      */
     assert(!(info->secure_board_setup && kvm_enabled()));
-
+    info->kernel_filename = ms->kernel_filename;
+    info->kernel_cmdline = ms->kernel_cmdline;
+    info->initrd_filename = ms->initrd_filename;
     info->dtb_filename = qemu_opt_get(qemu_get_machine_opts(), "dtb");
     info->dtb_limit = 0;
 
@@ -1234,7 +1236,7 @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
     }
 
     if (!info->skip_dtb_autoload && have_dtb(info)) {
-        if (arm_load_dtb(info->dtb_start, info, info->dtb_limit, as) < 0) {
+        if (arm_load_dtb(info->dtb_start, info, info->dtb_limit, as, ms) < 0) {
             exit(1);
         }
     }
diff --git a/hw/arm/collie.c b/hw/arm/collie.c
index 3db3c56004..72bc8f26e5 100644
--- a/hw/arm/collie.c
+++ b/hw/arm/collie.c
@@ -26,9 +26,6 @@ static struct arm_boot_info collie_binfo = {
 
 static void collie_init(MachineState *machine)
 {
-    const char *kernel_filename = machine->kernel_filename;
-    const char *kernel_cmdline = machine->kernel_cmdline;
-    const char *initrd_filename = machine->initrd_filename;
     StrongARMState *s;
     DriveInfo *dinfo;
     MemoryRegion *sysmem = get_system_memory();
@@ -47,11 +44,8 @@ static void collie_init(MachineState *machine)
 
     sysbus_create_simple("scoop", 0x40800000, NULL);
 
-    collie_binfo.kernel_filename = kernel_filename;
-    collie_binfo.kernel_cmdline = kernel_cmdline;
-    collie_binfo.initrd_filename = initrd_filename;
     collie_binfo.board_id = 0x208;
-    arm_load_kernel(s->cpu, &collie_binfo);
+    arm_load_kernel(s->cpu, machine, &collie_binfo);
 }
 
 static void collie_machine_init(MachineClass *mc)
diff --git a/hw/arm/cubieboard.c b/hw/arm/cubieboard.c
index 84187d3916..2f82a77dbd 100644
--- a/hw/arm/cubieboard.c
+++ b/hw/arm/cubieboard.c
@@ -73,10 +73,7 @@ static void cubieboard_init(MachineState *machine)
     /* TODO create and connect IDE devices for ide_drive_get() */
 
     cubieboard_binfo.ram_size = machine->ram_size;
-    cubieboard_binfo.kernel_filename = machine->kernel_filename;
-    cubieboard_binfo.kernel_cmdline = machine->kernel_cmdline;
-    cubieboard_binfo.initrd_filename = machine->initrd_filename;
-    arm_load_kernel(&s->a10->cpu, &cubieboard_binfo);
+    arm_load_kernel(&s->a10->cpu, machine, &cubieboard_binfo);
 }
 
 static void cubieboard_machine_init(MachineClass *mc)
diff --git a/hw/arm/exynos4_boards.c b/hw/arm/exynos4_boards.c
index 71f58586c1..25c1fb40a9 100644
--- a/hw/arm/exynos4_boards.c
+++ b/hw/arm/exynos4_boards.c
@@ -121,9 +121,6 @@ exynos4_boards_init_common(MachineState *machine,
     exynos4_board_binfo.board_id = exynos4_board_id[board_type];
     exynos4_board_binfo.smp_bootreg_addr =
             exynos4_board_smp_bootreg_addr[board_type];
-    exynos4_board_binfo.kernel_filename = machine->kernel_filename;
-    exynos4_board_binfo.initrd_filename = machine->initrd_filename;
-    exynos4_board_binfo.kernel_cmdline = machine->kernel_cmdline;
     exynos4_board_binfo.gic_cpu_if_addr =
             EXYNOS4210_SMP_PRIVATE_BASE_ADDR + 0x100;
 
@@ -142,7 +139,7 @@ static void nuri_init(MachineState *machine)
 {
     exynos4_boards_init_common(machine, EXYNOS4_BOARD_NURI);
 
-    arm_load_kernel(ARM_CPU(first_cpu), &exynos4_board_binfo);
+    arm_load_kernel(ARM_CPU(first_cpu), machine, &exynos4_board_binfo);
 }
 
 static void smdkc210_init(MachineState *machine)
@@ -152,7 +149,7 @@ static void smdkc210_init(MachineState *machine)
 
     lan9215_init(SMDK_LAN9118_BASE_ADDR,
             qemu_irq_invert(s->soc.irq_table[exynos4210_get_irq(37, 1)]));
-    arm_load_kernel(ARM_CPU(first_cpu), &exynos4_board_binfo);
+    arm_load_kernel(ARM_CPU(first_cpu), machine, &exynos4_board_binfo);
 }
 
 static void nuri_class_init(ObjectClass *oc, void *data)
diff --git a/hw/arm/highbank.c b/hw/arm/highbank.c
index a89a1d3a7c..0b2603b774 100644
--- a/hw/arm/highbank.c
+++ b/hw/arm/highbank.c
@@ -233,9 +233,6 @@ enum cxmachines {
 static void calxeda_init(MachineState *machine, enum cxmachines machine_id)
 {
     ram_addr_t ram_size = machine->ram_size;
-    const char *kernel_filename = machine->kernel_filename;
-    const char *kernel_cmdline = machine->kernel_cmdline;
-    const char *initrd_filename = machine->initrd_filename;
     DeviceState *dev = NULL;
     SysBusDevice *busdev;
     qemu_irq pic[128];
@@ -386,9 +383,6 @@ static void calxeda_init(MachineState *machine, enum cxmachines machine_id)
     /* TODO create and connect IDE devices for ide_drive_get() */
 
     highbank_binfo.ram_size = ram_size;
-    highbank_binfo.kernel_filename = kernel_filename;
-    highbank_binfo.kernel_cmdline = kernel_cmdline;
-    highbank_binfo.initrd_filename = initrd_filename;
     /* highbank requires a dtb in order to boot, and the dtb will override
      * the board ID. The following value is ignored, so set it to -1 to be
      * clear that the value is meaningless.
@@ -408,7 +402,7 @@ static void calxeda_init(MachineState *machine, enum cxmachines machine_id)
                     "may not boot.");
     }
 
-    arm_load_kernel(ARM_CPU(first_cpu), &highbank_binfo);
+    arm_load_kernel(ARM_CPU(first_cpu), machine, &highbank_binfo);
 }
 
 static void highbank_init(MachineState *machine)
diff --git a/hw/arm/imx25_pdk.c b/hw/arm/imx25_pdk.c
index a0423ffb67..5101201f53 100644
--- a/hw/arm/imx25_pdk.c
+++ b/hw/arm/imx25_pdk.c
@@ -117,9 +117,6 @@ static void imx25_pdk_init(MachineState *machine)
     }
 
     imx25_pdk_binfo.ram_size = machine->ram_size;
-    imx25_pdk_binfo.kernel_filename = machine->kernel_filename;
-    imx25_pdk_binfo.kernel_cmdline = machine->kernel_cmdline;
-    imx25_pdk_binfo.initrd_filename = machine->initrd_filename;
     imx25_pdk_binfo.loader_start = FSL_IMX25_SDRAM0_ADDR;
     imx25_pdk_binfo.board_id = 1771,
     imx25_pdk_binfo.nb_cpus = 1;
@@ -130,7 +127,7 @@ static void imx25_pdk_init(MachineState *machine)
      * fail.
      */
     if (!qtest_enabled()) {
-        arm_load_kernel(&s->soc.cpu, &imx25_pdk_binfo);
+        arm_load_kernel(&s->soc.cpu, machine, &imx25_pdk_binfo);
     }
 }
 
diff --git a/hw/arm/integratorcp.c b/hw/arm/integratorcp.c
index d18caab8bd..95df650d8e 100644
--- a/hw/arm/integratorcp.c
+++ b/hw/arm/integratorcp.c
@@ -579,9 +579,6 @@ static struct arm_boot_info integrator_binfo = {
 static void integratorcp_init(MachineState *machine)
 {
     ram_addr_t ram_size = machine->ram_size;
-    const char *kernel_filename = machine->kernel_filename;
-    const char *kernel_cmdline = machine->kernel_cmdline;
-    const char *initrd_filename = machine->initrd_filename;
     Object *cpuobj;
     ARMCPU *cpu;
     MemoryRegion *address_space_mem = get_system_memory();
@@ -651,10 +648,7 @@ static void integratorcp_init(MachineState *machine)
     sysbus_create_simple("pl110", 0xc0000000, pic[22]);
 
     integrator_binfo.ram_size = ram_size;
-    integrator_binfo.kernel_filename = kernel_filename;
-    integrator_binfo.kernel_cmdline = kernel_cmdline;
-    integrator_binfo.initrd_filename = initrd_filename;
-    arm_load_kernel(cpu, &integrator_binfo);
+    arm_load_kernel(cpu, machine, &integrator_binfo);
 }
 
 static void integratorcp_machine_init(MachineClass *mc)
diff --git a/hw/arm/kzm.c b/hw/arm/kzm.c
index 44cba8782b..a867d06ec7 100644
--- a/hw/arm/kzm.c
+++ b/hw/arm/kzm.c
@@ -127,13 +127,10 @@ static void kzm_init(MachineState *machine)
     }
 
     kzm_binfo.ram_size = machine->ram_size;
-    kzm_binfo.kernel_filename = machine->kernel_filename;
-    kzm_binfo.kernel_cmdline = machine->kernel_cmdline;
-    kzm_binfo.initrd_filename = machine->initrd_filename;
     kzm_binfo.nb_cpus = 1;
 
     if (!qtest_enabled()) {
-        arm_load_kernel(&s->soc.cpu, &kzm_binfo);
+        arm_load_kernel(&s->soc.cpu, machine, &kzm_binfo);
     }
 }
 
diff --git a/hw/arm/mainstone.c b/hw/arm/mainstone.c
index cd1f904c6c..c76cfb5dd1 100644
--- a/hw/arm/mainstone.c
+++ b/hw/arm/mainstone.c
@@ -177,11 +177,8 @@ static void mainstone_common_init(MemoryRegion *address_space_mem,
     smc91c111_init(&nd_table[0], MST_ETH_PHYS,
                     qdev_get_gpio_in(mst_irq, ETHERNET_IRQ));
 
-    mainstone_binfo.kernel_filename = machine->kernel_filename;
-    mainstone_binfo.kernel_cmdline = machine->kernel_cmdline;
-    mainstone_binfo.initrd_filename = machine->initrd_filename;
     mainstone_binfo.board_id = arm_id;
-    arm_load_kernel(mpu->cpu, &mainstone_binfo);
+    arm_load_kernel(mpu->cpu, machine, &mainstone_binfo);
 }
 
 static void mainstone_init(MachineState *machine)
diff --git a/hw/arm/mcimx6ul-evk.c b/hw/arm/mcimx6ul-evk.c
index fb2b015bf6..1f0fed37c0 100644
--- a/hw/arm/mcimx6ul-evk.c
+++ b/hw/arm/mcimx6ul-evk.c
@@ -40,9 +40,6 @@ static void mcimx6ul_evk_init(MachineState *machine)
         .loader_start = FSL_IMX6UL_MMDC_ADDR,
         .board_id = -1,
         .ram_size = machine->ram_size,
-        .kernel_filename = machine->kernel_filename,
-        .kernel_cmdline = machine->kernel_cmdline,
-        .initrd_filename = machine->initrd_filename,
         .nb_cpus = smp_cpus,
     };
 
@@ -72,7 +69,7 @@ static void mcimx6ul_evk_init(MachineState *machine)
     }
 
     if (!qtest_enabled()) {
-        arm_load_kernel(&s->soc.cpu[0], &boot_info);
+        arm_load_kernel(&s->soc.cpu[0], machine, &boot_info);
     }
 }
 
diff --git a/hw/arm/mcimx7d-sabre.c b/hw/arm/mcimx7d-sabre.c
index 9c5f0e70c3..accc731cf9 100644
--- a/hw/arm/mcimx7d-sabre.c
+++ b/hw/arm/mcimx7d-sabre.c
@@ -43,9 +43,6 @@ static void mcimx7d_sabre_init(MachineState *machine)
         .loader_start = FSL_IMX7_MMDC_ADDR,
         .board_id = -1,
         .ram_size = machine->ram_size,
-        .kernel_filename = machine->kernel_filename,
-        .kernel_cmdline = machine->kernel_cmdline,
-        .initrd_filename = machine->initrd_filename,
         .nb_cpus = smp_cpus,
     };
 
@@ -75,7 +72,7 @@ static void mcimx7d_sabre_init(MachineState *machine)
     }
 
     if (!qtest_enabled()) {
-        arm_load_kernel(&s->soc.cpu[0], &boot_info);
+        arm_load_kernel(&s->soc.cpu[0], machine, &boot_info);
     }
 }
 
diff --git a/hw/arm/musicpal.c b/hw/arm/musicpal.c
index 5645997b56..e4ec017d15 100644
--- a/hw/arm/musicpal.c
+++ b/hw/arm/musicpal.c
@@ -1569,9 +1569,6 @@ static struct arm_boot_info musicpal_binfo = {
 
 static void musicpal_init(MachineState *machine)
 {
-    const char *kernel_filename = machine->kernel_filename;
-    const char *kernel_cmdline = machine->kernel_cmdline;
-    const char *initrd_filename = machine->initrd_filename;
     ARMCPU *cpu;
     qemu_irq pic[32];
     DeviceState *dev;
@@ -1700,10 +1697,7 @@ static void musicpal_init(MachineState *machine)
     sysbus_connect_irq(s, 0, pic[MP_AUDIO_IRQ]);
 
     musicpal_binfo.ram_size = MP_RAM_DEFAULT_SIZE;
-    musicpal_binfo.kernel_filename = kernel_filename;
-    musicpal_binfo.kernel_cmdline = kernel_cmdline;
-    musicpal_binfo.initrd_filename = initrd_filename;
-    arm_load_kernel(cpu, &musicpal_binfo);
+    arm_load_kernel(cpu, machine, &musicpal_binfo);
 }
 
 static void musicpal_machine_init(MachineClass *mc)
diff --git a/hw/arm/nseries.c b/hw/arm/nseries.c
index 4a79f5c88b..31dd2f1b51 100644
--- a/hw/arm/nseries.c
+++ b/hw/arm/nseries.c
@@ -1358,10 +1358,7 @@ static void n8x0_init(MachineState *machine,
 
     if (machine->kernel_filename) {
         /* Or at the linux loader.  */
-        binfo->kernel_filename = machine->kernel_filename;
-        binfo->kernel_cmdline = machine->kernel_cmdline;
-        binfo->initrd_filename = machine->initrd_filename;
-        arm_load_kernel(s->mpu->cpu, binfo);
+        arm_load_kernel(s->mpu->cpu, machine, binfo);
 
         qemu_register_reset(n8x0_boot_init, s);
     }
diff --git a/hw/arm/omap_sx1.c b/hw/arm/omap_sx1.c
index cae78d0a36..3cc2817f06 100644
--- a/hw/arm/omap_sx1.c
+++ b/hw/arm/omap_sx1.c
@@ -196,10 +196,7 @@ static void sx1_init(MachineState *machine, const int version)
     }
 
     /* Load the kernel.  */
-    sx1_binfo.kernel_filename = machine->kernel_filename;
-    sx1_binfo.kernel_cmdline = machine->kernel_cmdline;
-    sx1_binfo.initrd_filename = machine->initrd_filename;
-    arm_load_kernel(mpu->cpu, &sx1_binfo);
+    arm_load_kernel(mpu->cpu, machine, &sx1_binfo);
 
     /* TODO: fix next line */
     //~ qemu_console_resize(ds, 640, 480);
diff --git a/hw/arm/palm.c b/hw/arm/palm.c
index 9eb9612bce..67ab30b5bc 100644
--- a/hw/arm/palm.c
+++ b/hw/arm/palm.c
@@ -186,9 +186,6 @@ static struct arm_boot_info palmte_binfo = {
 
 static void palmte_init(MachineState *machine)
 {
-    const char *kernel_filename = machine->kernel_filename;
-    const char *kernel_cmdline = machine->kernel_cmdline;
-    const char *initrd_filename = machine->initrd_filename;
     MemoryRegion *address_space_mem = get_system_memory();
     struct omap_mpu_state_s *mpu;
     int flash_size = 0x00800000;
@@ -248,16 +245,13 @@ static void palmte_init(MachineState *machine)
         }
     }
 
-    if (!rom_loaded && !kernel_filename && !qtest_enabled()) {
+    if (!rom_loaded && !machine->kernel_filename && !qtest_enabled()) {
         fprintf(stderr, "Kernel or ROM image must be specified\n");
         exit(1);
     }
 
     /* Load the kernel.  */
-    palmte_binfo.kernel_filename = kernel_filename;
-    palmte_binfo.kernel_cmdline = kernel_cmdline;
-    palmte_binfo.initrd_filename = initrd_filename;
-    arm_load_kernel(mpu->cpu, &palmte_binfo);
+    arm_load_kernel(mpu->cpu, machine, &palmte_binfo);
 }
 
 static void palmte_machine_init(MachineClass *mc)
diff --git a/hw/arm/raspi.c b/hw/arm/raspi.c
index 8c249fcabb..b6d78e6ff3 100644
--- a/hw/arm/raspi.c
+++ b/hw/arm/raspi.c
@@ -158,13 +158,9 @@ static void setup_boot(MachineState *machine, int version, size_t ram_size)
 
         binfo.entry = firmware_addr;
         binfo.firmware_loaded = true;
-    } else {
-        binfo.kernel_filename = machine->kernel_filename;
-        binfo.kernel_cmdline = machine->kernel_cmdline;
-        binfo.initrd_filename = machine->initrd_filename;
     }
 
-    arm_load_kernel(ARM_CPU(first_cpu), &binfo);
+    arm_load_kernel(ARM_CPU(first_cpu), machine, &binfo);
 }
 
 static void raspi_init(MachineState *machine, int version)
diff --git a/hw/arm/realview.c b/hw/arm/realview.c
index d42a76e7a1..3876b4acae 100644
--- a/hw/arm/realview.c
+++ b/hw/arm/realview.c
@@ -350,13 +350,10 @@ static void realview_init(MachineState *machine,
     memory_region_add_subregion(sysmem, SMP_BOOT_ADDR, ram_hack);
 
     realview_binfo.ram_size = ram_size;
-    realview_binfo.kernel_filename = machine->kernel_filename;
-    realview_binfo.kernel_cmdline = machine->kernel_cmdline;
-    realview_binfo.initrd_filename = machine->initrd_filename;
     realview_binfo.nb_cpus = smp_cpus;
     realview_binfo.board_id = realview_board_id[board_type];
     realview_binfo.loader_start = (board_type == BOARD_PB_A8 ? 0x70000000 : 0);
-    arm_load_kernel(ARM_CPU(first_cpu), &realview_binfo);
+    arm_load_kernel(ARM_CPU(first_cpu), machine, &realview_binfo);
 }
 
 static void realview_eb_init(MachineState *machine)
diff --git a/hw/arm/sabrelite.c b/hw/arm/sabrelite.c
index f1b00de229..81547dec98 100644
--- a/hw/arm/sabrelite.c
+++ b/hw/arm/sabrelite.c
@@ -103,16 +103,13 @@ static void sabrelite_init(MachineState *machine)
     }
 
     sabrelite_binfo.ram_size = machine->ram_size;
-    sabrelite_binfo.kernel_filename = machine->kernel_filename;
-    sabrelite_binfo.kernel_cmdline = machine->kernel_cmdline;
-    sabrelite_binfo.initrd_filename = machine->initrd_filename;
     sabrelite_binfo.nb_cpus = smp_cpus;
     sabrelite_binfo.secure_boot = true;
     sabrelite_binfo.write_secondary_boot = sabrelite_write_secondary;
     sabrelite_binfo.secondary_cpu_reset_hook = sabrelite_reset_secondary;
 
     if (!qtest_enabled()) {
-        arm_load_kernel(&s->soc.cpu[0], &sabrelite_binfo);
+        arm_load_kernel(&s->soc.cpu[0], machine, &sabrelite_binfo);
     }
 }
 
diff --git a/hw/arm/spitz.c b/hw/arm/spitz.c
index 723cf5d592..42338696b3 100644
--- a/hw/arm/spitz.c
+++ b/hw/arm/spitz.c
@@ -951,11 +951,8 @@ static void spitz_common_init(MachineState *machine,
         /* A 4.0 GB microdrive is permanently sitting in CF slot 0.  */
         spitz_microdrive_attach(mpu, 0);
 
-    spitz_binfo.kernel_filename = machine->kernel_filename;
-    spitz_binfo.kernel_cmdline = machine->kernel_cmdline;
-    spitz_binfo.initrd_filename = machine->initrd_filename;
     spitz_binfo.board_id = arm_id;
-    arm_load_kernel(mpu->cpu, &spitz_binfo);
+    arm_load_kernel(mpu->cpu, machine, &spitz_binfo);
     sl_bootparam_write(SL_PXA_PARAM_BASE);
 }
 
diff --git a/hw/arm/tosa.c b/hw/arm/tosa.c
index 7843d68d46..3a1de81278 100644
--- a/hw/arm/tosa.c
+++ b/hw/arm/tosa.c
@@ -218,9 +218,6 @@ static struct arm_boot_info tosa_binfo = {
 
 static void tosa_init(MachineState *machine)
 {
-    const char *kernel_filename = machine->kernel_filename;
-    const char *kernel_cmdline = machine->kernel_cmdline;
-    const char *initrd_filename = machine->initrd_filename;
     MemoryRegion *address_space_mem = get_system_memory();
     MemoryRegion *rom = g_new(MemoryRegion, 1);
     PXA2xxState *mpu;
@@ -245,11 +242,8 @@ static void tosa_init(MachineState *machine)
 
     tosa_tg_init(mpu);
 
-    tosa_binfo.kernel_filename = kernel_filename;
-    tosa_binfo.kernel_cmdline = kernel_cmdline;
-    tosa_binfo.initrd_filename = initrd_filename;
     tosa_binfo.board_id = 0x208;
-    arm_load_kernel(mpu->cpu, &tosa_binfo);
+    arm_load_kernel(mpu->cpu, machine, &tosa_binfo);
     sl_bootparam_write(SL_PXA_PARAM_BASE);
 }
 
diff --git a/hw/arm/versatilepb.c b/hw/arm/versatilepb.c
index f471fb7025..b95110ae2d 100644
--- a/hw/arm/versatilepb.c
+++ b/hw/arm/versatilepb.c
@@ -374,11 +374,8 @@ static void versatile_init(MachineState *machine, int board_id)
     }
 
     versatile_binfo.ram_size = machine->ram_size;
-    versatile_binfo.kernel_filename = machine->kernel_filename;
-    versatile_binfo.kernel_cmdline = machine->kernel_cmdline;
-    versatile_binfo.initrd_filename = machine->initrd_filename;
     versatile_binfo.board_id = board_id;
-    arm_load_kernel(cpu, &versatile_binfo);
+    arm_load_kernel(cpu, machine, &versatile_binfo);
 }
 
 static void vpb_init(MachineState *machine)
diff --git a/hw/arm/vexpress.c b/hw/arm/vexpress.c
index 2b3b0c2334..16f0382731 100644
--- a/hw/arm/vexpress.c
+++ b/hw/arm/vexpress.c
@@ -703,9 +703,6 @@ static void vexpress_common_init(MachineState *machine)
     }
 
     daughterboard->bootinfo.ram_size = machine->ram_size;
-    daughterboard->bootinfo.kernel_filename = machine->kernel_filename;
-    daughterboard->bootinfo.kernel_cmdline = machine->kernel_cmdline;
-    daughterboard->bootinfo.initrd_filename = machine->initrd_filename;
     daughterboard->bootinfo.nb_cpus = smp_cpus;
     daughterboard->bootinfo.board_id = VEXPRESS_BOARD_ID;
     daughterboard->bootinfo.loader_start = daughterboard->loader_start;
@@ -715,7 +712,7 @@ static void vexpress_common_init(MachineState *machine)
     daughterboard->bootinfo.modify_dtb = vexpress_modify_dtb;
     /* When booting Linux we should be in secure state if the CPU has one. */
     daughterboard->bootinfo.secure_boot = vms->secure;
-    arm_load_kernel(ARM_CPU(first_cpu), &daughterboard->bootinfo);
+    arm_load_kernel(ARM_CPU(first_cpu), machine, &daughterboard->bootinfo);
 }
 
 static bool vexpress_get_secure(Object *obj, Error **errp)
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index bf54f10b51..e2ce7a2841 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1358,6 +1358,7 @@ void virt_machine_done(Notifier *notifier, void *data)
 {
     VirtMachineState *vms = container_of(notifier, VirtMachineState,
                                          machine_done);
+    MachineState *ms = MACHINE(vms);
     ARMCPU *cpu = ARM_CPU(first_cpu);
     struct arm_boot_info *info = &vms->bootinfo;
     AddressSpace *as = arm_boot_address_space(cpu, info);
@@ -1375,7 +1376,7 @@ void virt_machine_done(Notifier *notifier, void *data)
                                        vms->memmap[VIRT_PLATFORM_BUS].size,
                                        vms->irqmap[VIRT_PLATFORM_BUS]);
     }
-    if (arm_load_dtb(info->dtb_start, info, info->dtb_limit, as) < 0) {
+    if (arm_load_dtb(info->dtb_start, info, info->dtb_limit, as, ms) < 0) {
         exit(1);
     }
 
@@ -1699,16 +1700,13 @@ static void machvirt_init(MachineState *machine)
     create_platform_bus(vms, pic);
 
     vms->bootinfo.ram_size = machine->ram_size;
-    vms->bootinfo.kernel_filename = machine->kernel_filename;
-    vms->bootinfo.kernel_cmdline = machine->kernel_cmdline;
-    vms->bootinfo.initrd_filename = machine->initrd_filename;
     vms->bootinfo.nb_cpus = smp_cpus;
     vms->bootinfo.board_id = -1;
     vms->bootinfo.loader_start = vms->memmap[VIRT_MEM].base;
     vms->bootinfo.get_dtb = machvirt_dtb;
     vms->bootinfo.skip_dtb_autoload = true;
     vms->bootinfo.firmware_loaded = firmware_loaded;
-    arm_load_kernel(ARM_CPU(first_cpu), &vms->bootinfo);
+    arm_load_kernel(ARM_CPU(first_cpu), machine, &vms->bootinfo);
 
     vms->machine_done.notify = virt_machine_done;
     qemu_add_machine_init_done_notifier(&vms->machine_done);
diff --git a/hw/arm/xilinx_zynq.c b/hw/arm/xilinx_zynq.c
index 198e3f9763..2487bd7ea5 100644
--- a/hw/arm/xilinx_zynq.c
+++ b/hw/arm/xilinx_zynq.c
@@ -159,9 +159,6 @@ static inline void zynq_init_spi_flashes(uint32_t base_addr, qemu_irq irq,
 static void zynq_init(MachineState *machine)
 {
     ram_addr_t ram_size = machine->ram_size;
-    const char *kernel_filename = machine->kernel_filename;
-    const char *kernel_cmdline = machine->kernel_cmdline;
-    const char *initrd_filename = machine->initrd_filename;
     ARMCPU *cpu;
     MemoryRegion *address_space_mem = get_system_memory();
     MemoryRegion *ext_ram = g_new(MemoryRegion, 1);
@@ -304,16 +301,13 @@ static void zynq_init(MachineState *machine)
     sysbus_mmio_map(busdev, 0, 0xF8007000);
 
     zynq_binfo.ram_size = ram_size;
-    zynq_binfo.kernel_filename = kernel_filename;
-    zynq_binfo.kernel_cmdline = kernel_cmdline;
-    zynq_binfo.initrd_filename = initrd_filename;
     zynq_binfo.nb_cpus = 1;
     zynq_binfo.board_id = 0xd32;
     zynq_binfo.loader_start = 0;
     zynq_binfo.board_setup_addr = BOARD_SETUP_ADDR;
     zynq_binfo.write_board_setup = zynq_write_board_setup;
 
-    arm_load_kernel(ARM_CPU(first_cpu), &zynq_binfo);
+    arm_load_kernel(ARM_CPU(first_cpu), machine, &zynq_binfo);
 }
 
 static void zynq_machine_init(MachineClass *mc)
diff --git a/hw/arm/xlnx-versal-virt.c b/hw/arm/xlnx-versal-virt.c
index f95fde2309..462493c467 100644
--- a/hw/arm/xlnx-versal-virt.c
+++ b/hw/arm/xlnx-versal-virt.c
@@ -441,14 +441,11 @@ static void versal_virt_init(MachineState *machine)
                                         0, &s->soc.fpd.apu.mr, 0);
 
     s->binfo.ram_size = machine->ram_size;
-    s->binfo.kernel_filename = machine->kernel_filename;
-    s->binfo.kernel_cmdline = machine->kernel_cmdline;
-    s->binfo.initrd_filename = machine->initrd_filename;
     s->binfo.loader_start = 0x0;
     s->binfo.get_dtb = versal_virt_get_dtb;
     s->binfo.modify_dtb = versal_virt_modify_dtb;
     if (machine->kernel_filename) {
-        arm_load_kernel(s->soc.fpd.apu.cpu[0], &s->binfo);
+        arm_load_kernel(s->soc.fpd.apu.cpu[0], machine, &s->binfo);
     } else {
         AddressSpace *as = arm_boot_address_space(s->soc.fpd.apu.cpu[0],
                                                   &s->binfo);
@@ -457,7 +454,7 @@ static void versal_virt_init(MachineState *machine)
         s->binfo.loader_start = 0x1000;
         s->binfo.dtb_limit = 0x1000000;
         if (arm_load_dtb(s->binfo.loader_start,
-                         &s->binfo, s->binfo.dtb_limit, as) < 0) {
+                         &s->binfo, s->binfo.dtb_limit, as, machine) < 0) {
             exit(EXIT_FAILURE);
         }
     }
diff --git a/hw/arm/xlnx-zcu102.c b/hw/arm/xlnx-zcu102.c
index c802f26fbd..6a455f8d49 100644
--- a/hw/arm/xlnx-zcu102.c
+++ b/hw/arm/xlnx-zcu102.c
@@ -172,11 +172,8 @@ static void xlnx_zcu102_init(MachineState *machine)
     /* TODO create and connect IDE devices for ide_drive_get() */
 
     xlnx_zcu102_binfo.ram_size = ram_size;
-    xlnx_zcu102_binfo.kernel_filename = machine->kernel_filename;
-    xlnx_zcu102_binfo.kernel_cmdline = machine->kernel_cmdline;
-    xlnx_zcu102_binfo.initrd_filename = machine->initrd_filename;
     xlnx_zcu102_binfo.loader_start = 0;
-    arm_load_kernel(s->soc.boot_cpu_ptr, &xlnx_zcu102_binfo);
+    arm_load_kernel(s->soc.boot_cpu_ptr, machine, &xlnx_zcu102_binfo);
 }
 
 static void xlnx_zcu102_machine_instance_init(Object *obj)
diff --git a/hw/arm/z2.c b/hw/arm/z2.c
index 44aa748d39..2f21421683 100644
--- a/hw/arm/z2.c
+++ b/hw/arm/z2.c
@@ -296,9 +296,6 @@ static const TypeInfo aer915_info = {
 
 static void z2_init(MachineState *machine)
 {
-    const char *kernel_filename = machine->kernel_filename;
-    const char *kernel_cmdline = machine->kernel_cmdline;
-    const char *initrd_filename = machine->initrd_filename;
     MemoryRegion *address_space_mem = get_system_memory();
     uint32_t sector_len = 0x10000;
     PXA2xxState *mpu;
@@ -352,11 +349,8 @@ static void z2_init(MachineState *machine)
     qdev_connect_gpio_out(mpu->gpio, Z2_GPIO_LCD_CS,
                           qemu_allocate_irq(z2_lcd_cs, z2_lcd, 0));
 
-    z2_binfo.kernel_filename = kernel_filename;
-    z2_binfo.kernel_cmdline = kernel_cmdline;
-    z2_binfo.initrd_filename = initrd_filename;
     z2_binfo.board_id = 0x6dd;
-    arm_load_kernel(mpu->cpu, &z2_binfo);
+    arm_load_kernel(mpu->cpu, machine, &z2_binfo);
 }
 
 static void z2_machine_init(MachineClass *mc)
diff --git a/include/hw/arm/boot.h b/include/hw/arm/boot.h
index c48cc4c2bc..2673abe81f 100644
--- a/include/hw/arm/boot.h
+++ b/include/hw/arm/boot.h
@@ -133,7 +133,7 @@ struct arm_boot_info {
  * before sysbus-fdt arm_register_platform_bus_fdt_creator. Indeed the
  * machine init done notifiers are called in registration reverse order.
  */
-void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info);
+void arm_load_kernel(ARMCPU *cpu, MachineState *ms, struct arm_boot_info *info);
 
 AddressSpace *arm_boot_address_space(ARMCPU *cpu,
                                      const struct arm_boot_info *info);
@@ -160,7 +160,7 @@ AddressSpace *arm_boot_address_space(ARMCPU *cpu,
  * Note: Must not be called unless have_dtb(binfo) is true.
  */
 int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
-                 hwaddr addr_limit, AddressSpace *as);
+                 hwaddr addr_limit, AddressSpace *as, MachineState *ms);
 
 /* Write a secure board setup routine with a dummy handler for SMCs */
 void arm_write_secure_board_setup_dummy_smc(ARMCPU *cpu,
-- 
2.20.1




* [Qemu-devel] [PATCH v5 2/8] numa: move numa global variable nb_numa_nodes into MachineState
  2019-06-14 15:56 [Qemu-devel] [PATCH v5 0/8] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
  2019-06-14 15:56 ` [Qemu-devel] [PATCH v5 1/8] hw/arm: simplify arm_load_dtb Tao Xu
@ 2019-06-14 15:56 ` Tao Xu
  2019-06-28 11:02   ` Igor Mammedov
  2019-06-14 15:56 ` [Qemu-devel] [PATCH v5 3/8] numa: move numa global variable have_numa_distance " Tao Xu
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 25+ messages in thread
From: Tao Xu @ 2019-06-14 15:56 UTC (permalink / raw)
  To: imammedo, eblake, ehabkost; +Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel

Add struct NumaState to MachineState and move the existing NUMA global
nb_numa_nodes (renamed to "num_nodes") into NumaState. Also add the field
numa_supported to MachineClass to indicate which machine types support NUMA.
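
A minimal sketch of the ownership pattern described above, using simplified
stand-in types rather than the real QEMU definitions. It assumes, as in the
diff below, that NumaState is allocated only when the machine class sets
numa_supported, so generic code has to tolerate a NULL numa_state. The
names machine_init_sketch(), machine_finalize_sketch() and numa_node_count()
and the mc back-pointer are hypothetical; the series itself open-codes the
NULL check instead of using a helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Simplified stand-ins for the QEMU types; not the real definitions. */
typedef struct NumaState {
    int num_nodes;              /* number of NUMA nodes */
} NumaState;

typedef struct MachineClass {
    bool numa_supported;        /* set by NUMA-capable machine classes */
} MachineClass;

typedef struct MachineState {
    const MachineClass *mc;     /* hypothetical; QEMU uses MACHINE_GET_CLASS() */
    NumaState *numa_state;      /* NULL when the class does not support NUMA */
} MachineState;

/* Modelled on machine_initfn(): allocate only for NUMA-capable classes. */
static void machine_init_sketch(MachineState *ms, const MachineClass *mc)
{
    ms->mc = mc;
    ms->numa_state = mc->numa_supported ? calloc(1, sizeof(NumaState)) : NULL;
}

/* Modelled on machine_finalize(): free(NULL) is a no-op, so no check needed. */
static void machine_finalize_sketch(MachineState *ms)
{
    free(ms->numa_state);
    ms->numa_state = NULL;
}

/* Hypothetical helper: NULL-safe node count for machine-independent code. */
static int numa_node_count(const MachineState *ms)
{
    return ms->numa_state ? ms->numa_state->num_nodes : 0;
}
```

Callers that run only on NUMA-capable machines (board code, for example) can
dereference numa_state directly; callers shared by all machine types would
go through a NULL check like the one in numa_node_count().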

Reviewed-by: Liu Jingqi <jingqi.liu@intel.com>
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

Changes in v4 -> v5:
    - drop the helper machine_num_numa_nodes() and use
    machine->numa_state->num_nodes directly (Igor)
    - remove the unnecessary header include (Igor)
---
 exec.c                              |  5 ++-
 hw/acpi/aml-build.c                 |  3 +-
 hw/arm/boot.c                       |  4 +-
 hw/arm/virt-acpi-build.c            |  8 +++-
 hw/arm/virt.c                       |  5 ++-
 hw/core/machine.c                   | 14 +++++--
 hw/i386/acpi-build.c                |  2 +-
 hw/i386/pc.c                        |  7 +++-
 hw/mem/pc-dimm.c                    |  2 +
 hw/pci-bridge/pci_expander_bridge.c |  2 +
 hw/ppc/spapr.c                      | 19 +++++++---
 hw/ppc/spapr_pci.c                  |  1 +
 include/hw/acpi/aml-build.h         |  2 +-
 include/hw/boards.h                 |  2 +
 include/sysemu/numa.h               | 13 +++++--
 monitor.c                           | 11 +++++-
 numa.c                              | 59 ++++++++++++++++++-----------
 17 files changed, 112 insertions(+), 47 deletions(-)

diff --git a/exec.c b/exec.c
index 4e734770c2..c7eb4af42d 100644
--- a/exec.c
+++ b/exec.c
@@ -1733,6 +1733,7 @@ long qemu_minrampagesize(void)
     long hpsize = LONG_MAX;
     long mainrampagesize;
     Object *memdev_root;
+    MachineState *ms = MACHINE(qdev_get_machine());
 
     mainrampagesize = qemu_mempath_getpagesize(mem_path);
 
@@ -1760,7 +1761,9 @@ long qemu_minrampagesize(void)
      * so if its page size is smaller we have got to report that size instead.
      */
     if (hpsize > mainrampagesize &&
-        (nb_numa_nodes == 0 || numa_info[0].node_memdev == NULL)) {
+        (ms->numa_state == NULL ||
+         ms->numa_state->num_nodes == 0 ||
+         numa_info[0].node_memdev == NULL)) {
         static bool warned;
         if (!warned) {
             error_report("Huge page support disabled (n/a for main memory).");
diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 555c24f21d..63c1cae8c9 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1726,10 +1726,11 @@ void build_srat_memory(AcpiSratMemoryAffinity *numamem, uint64_t base,
  * ACPI spec 5.2.17 System Locality Distance Information Table
  * (Revision 2.0 or later)
  */
-void build_slit(GArray *table_data, BIOSLinker *linker)
+void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms)
 {
     int slit_start, i, j;
     slit_start = table_data->len;
+    int nb_numa_nodes = ms->numa_state->num_nodes;
 
     acpi_data_push(table_data, sizeof(AcpiTableHeader));
 
diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index 30acdbe824..2af881e0f4 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -597,9 +597,9 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
     }
     g_strfreev(node_path);
 
-    if (nb_numa_nodes > 0) {
+    if (ms->numa_state != NULL && ms->numa_state->num_nodes > 0) {
         mem_base = binfo->loader_start;
-        for (i = 0; i < nb_numa_nodes; i++) {
+        for (i = 0; i < ms->numa_state->num_nodes; i++) {
             mem_len = numa_info[i].node_mem;
             rc = fdt_add_memory_node(fdt, acells, mem_base,
                                      scells, mem_len, i);
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 4a64f9985c..9a22ce679c 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -517,7 +517,9 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
     int i, srat_start;
     uint64_t mem_base;
     MachineClass *mc = MACHINE_GET_CLASS(vms);
-    const CPUArchIdList *cpu_list = mc->possible_cpu_arch_ids(MACHINE(vms));
+    MachineState *ms = MACHINE(vms);
+    const CPUArchIdList *cpu_list = mc->possible_cpu_arch_ids(ms);
+    int nb_numa_nodes = ms->numa_state->num_nodes;
 
     srat_start = table_data->len;
     srat = acpi_data_push(table_data, sizeof(*srat));
@@ -759,6 +761,8 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
     GArray *table_offsets;
     unsigned dsdt, xsdt;
     GArray *tables_blob = tables->table_data;
+    MachineState *ms = MACHINE(vms);
+    int nb_numa_nodes = ms->numa_state->num_nodes;
 
     table_offsets = g_array_new(false, true /* clear */,
                                         sizeof(uint32_t));
@@ -798,7 +802,7 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
         build_srat(tables_blob, tables->linker, vms);
         if (have_numa_distance) {
             acpi_add_table(table_offsets, tables_blob);
-            build_slit(tables_blob, tables->linker);
+            build_slit(tables_blob, tables->linker, ms);
         }
     }
 
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index e2ce7a2841..025ad484c5 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -196,6 +196,8 @@ static bool cpu_type_valid(const char *cpu)
 
 static void create_fdt(VirtMachineState *vms)
 {
+    MachineState *ms = MACHINE(vms);
+    int nb_numa_nodes = ms->numa_state->num_nodes;
     void *fdt = create_device_tree(&vms->fdt_size);
 
     if (!fdt) {
@@ -1834,7 +1836,7 @@ virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
 
 static int64_t virt_get_default_cpu_node_id(const MachineState *ms, int idx)
 {
-    return idx % nb_numa_nodes;
+    return idx % ms->numa_state->num_nodes;
 }
 
 static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
@@ -1940,6 +1942,7 @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
     mc->kvm_type = virt_kvm_type;
     assert(!mc->get_hotplug_handler);
     mc->get_hotplug_handler = virt_machine_get_hotplug_handler;
+    mc->numa_supported = true;
     hc->plug = virt_machine_device_plug_cb;
 }
 
diff --git a/hw/core/machine.c b/hw/core/machine.c
index f1a0f45f9c..14b29de0a9 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -877,6 +877,9 @@ static void machine_initfn(Object *obj)
                                         NULL);
     }
 
+    if (mc->numa_supported) {
+        ms->numa_state = g_new0(NumaState, 1);
+    }
 
     /* Register notifier when init is done for sysbus sanity checks */
     ms->sysbus_notifier.notify = machine_init_notify;
@@ -897,6 +900,7 @@ static void machine_finalize(Object *obj)
     g_free(ms->firmware);
     g_free(ms->device_memory);
     g_free(ms->nvdimms_state);
+    g_free(ms->numa_state);
 }
 
 bool machine_usb(MachineState *machine)
@@ -968,7 +972,7 @@ static void machine_numa_finish_cpu_init(MachineState *machine)
     MachineClass *mc = MACHINE_GET_CLASS(machine);
     const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(machine);
 
-    assert(nb_numa_nodes);
+    assert(machine->numa_state->num_nodes);
     for (i = 0; i < possible_cpus->len; i++) {
         if (possible_cpus->cpus[i].props.has_node_id) {
             break;
@@ -1014,9 +1018,11 @@ void machine_run_board_init(MachineState *machine)
 {
     MachineClass *machine_class = MACHINE_GET_CLASS(machine);
 
-    numa_complete_configuration(machine);
-    if (nb_numa_nodes) {
-        machine_numa_finish_cpu_init(machine);
+    if (machine_class->numa_supported) {
+        numa_complete_configuration(machine);
+        if (machine->numa_state->num_nodes) {
+            machine_numa_finish_cpu_init(machine);
+        }
     }
 
     /* If the machine supports the valid_cpu_types check and the user
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 85dc1640bc..0d58335560 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2669,7 +2669,7 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
         build_srat(tables_blob, tables->linker, machine);
         if (have_numa_distance) {
             acpi_add_table(table_offsets, tables_blob);
-            build_slit(tables_blob, tables->linker);
+            build_slit(tables_blob, tables->linker, machine);
         }
     }
     if (acpi_get_mcfg(&mcfg)) {
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 1b08b56362..5bab78e137 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -997,6 +997,8 @@ static FWCfgState *bochs_bios_init(AddressSpace *as, PCMachineState *pcms)
     int i;
     const CPUArchIdList *cpus;
     MachineClass *mc = MACHINE_GET_CLASS(pcms);
+    MachineState *ms = MACHINE(pcms);
+    int nb_numa_nodes = ms->numa_state->num_nodes;
 
     fw_cfg = fw_cfg_init_io_dma(FW_CFG_IO_BASE, FW_CFG_IO_BASE + 4, as);
     fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus);
@@ -1673,6 +1675,8 @@ void pc_machine_done(Notifier *notifier, void *data)
 void pc_guest_info_init(PCMachineState *pcms)
 {
     int i;
+    MachineState *ms = MACHINE(pcms);
+    int nb_numa_nodes = ms->numa_state->num_nodes;
 
     pcms->apic_xrupt_override = kvm_allows_irq0_override();
     pcms->numa_nodes = nb_numa_nodes;
@@ -2656,7 +2660,7 @@ static int64_t pc_get_default_cpu_node_id(const MachineState *ms, int idx)
    assert(idx < ms->possible_cpus->len);
    x86_topo_ids_from_apicid(ms->possible_cpus->cpus[idx].arch_id,
                             smp_cores, smp_threads, &topo);
-   return topo.pkg_id % nb_numa_nodes;
+   return topo.pkg_id % ms->numa_state->num_nodes;
 }
 
 static const CPUArchIdList *pc_possible_cpu_arch_ids(MachineState *ms)
@@ -2750,6 +2754,7 @@ static void pc_machine_class_init(ObjectClass *oc, void *data)
     nc->nmi_monitor_handler = x86_nmi;
     mc->default_cpu_type = TARGET_DEFAULT_CPU_TYPE;
     mc->nvdimm_supported = true;
+    mc->numa_supported = true;
 
     object_class_property_add(oc, PC_MACHINE_DEVMEM_REGION_SIZE, "int",
         pc_machine_get_device_memory_region_size, NULL,
diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
index 152400b1fc..19e7626590 100644
--- a/hw/mem/pc-dimm.c
+++ b/hw/mem/pc-dimm.c
@@ -160,6 +160,8 @@ static void pc_dimm_realize(DeviceState *dev, Error **errp)
 {
     PCDIMMDevice *dimm = PC_DIMM(dev);
     PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
+    MachineState *ms = MACHINE(qdev_get_machine());
+    int nb_numa_nodes = ms->numa_state->num_nodes;
 
     if (!dimm->hostmem) {
         error_setg(errp, "'" PC_DIMM_MEMDEV_PROP "' property is not set");
diff --git a/hw/pci-bridge/pci_expander_bridge.c b/hw/pci-bridge/pci_expander_bridge.c
index ca66bc721a..a76a00a6d5 100644
--- a/hw/pci-bridge/pci_expander_bridge.c
+++ b/hw/pci-bridge/pci_expander_bridge.c
@@ -211,6 +211,8 @@ static void pxb_dev_realize_common(PCIDevice *dev, bool pcie, Error **errp)
     PCIBus *bus;
     const char *dev_name = NULL;
     Error *local_err = NULL;
+    MachineState *ms = MACHINE(qdev_get_machine());
+    int nb_numa_nodes = ms->numa_state->num_nodes;
 
     if (pxb->numa_node != NUMA_NODE_UNASSIGNED &&
         pxb->numa_node >= nb_numa_nodes) {
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index e2b33e5890..07a02db99e 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -290,6 +290,8 @@ static int spapr_fixup_cpu_dt(void *fdt, SpaprMachineState *spapr)
     CPUState *cs;
     char cpu_model[32];
     uint32_t pft_size_prop[] = {0, cpu_to_be32(spapr->htab_shift)};
+    MachineState *ms = MACHINE(spapr);
+    int nb_numa_nodes = ms->numa_state->num_nodes;
 
     CPU_FOREACH(cs) {
         PowerPCCPU *cpu = POWERPC_CPU(cs);
@@ -344,6 +346,7 @@ static int spapr_fixup_cpu_dt(void *fdt, SpaprMachineState *spapr)
 
 static hwaddr spapr_node0_size(MachineState *machine)
 {
+    int nb_numa_nodes = machine->numa_state->num_nodes;
     if (nb_numa_nodes) {
         int i;
         for (i = 0; i < nb_numa_nodes; ++i) {
@@ -391,18 +394,18 @@ static int spapr_populate_memory(SpaprMachineState *spapr, void *fdt)
 {
     MachineState *machine = MACHINE(spapr);
     hwaddr mem_start, node_size;
-    int i, nb_nodes = nb_numa_nodes;
+    int i;
     NodeInfo *nodes = numa_info;
     NodeInfo ramnode;
 
     /* No NUMA nodes, assume there is just one node with whole RAM */
-    if (!nb_numa_nodes) {
-        nb_nodes = 1;
+    if (!machine->numa_state->num_nodes) {
+        machine->numa_state->num_nodes = 1;
         ramnode.node_mem = machine->ram_size;
         nodes = &ramnode;
     }
 
-    for (i = 0, mem_start = 0; i < nb_nodes; ++i) {
+    for (i = 0, mem_start = 0; i < machine->numa_state->num_nodes; ++i) {
         if (!nodes[i].node_mem) {
             continue;
         }
@@ -444,6 +447,8 @@ static void spapr_populate_cpu_dt(CPUState *cs, void *fdt, int offset,
     PowerPCCPU *cpu = POWERPC_CPU(cs);
     CPUPPCState *env = &cpu->env;
     PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cs);
+    MachineState *ms = MACHINE(spapr);
+    int nb_numa_nodes = ms->numa_state->num_nodes;
     int index = spapr_get_vcpu_id(cpu);
     uint32_t segs[] = {cpu_to_be32(28), cpu_to_be32(40),
                        0xffffffff, 0xffffffff};
@@ -852,6 +857,7 @@ static int spapr_populate_drmem_v1(SpaprMachineState *spapr, void *fdt,
 static int spapr_populate_drconf_memory(SpaprMachineState *spapr, void *fdt)
 {
     MachineState *machine = MACHINE(spapr);
+    int nb_numa_nodes = machine->numa_state->num_nodes;
     int ret, i, offset;
     uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE;
     uint32_t prop_lmb_size[] = {0, cpu_to_be32(lmb_size)};
@@ -1696,6 +1702,7 @@ static void spapr_machine_reset(void)
 {
     MachineState *machine = MACHINE(qdev_get_machine());
     SpaprMachineState *spapr = SPAPR_MACHINE(machine);
+    int nb_numa_nodes = machine->numa_state->num_nodes;
     PowerPCCPU *first_ppc_cpu;
     uint32_t rtas_limit;
     hwaddr rtas_addr, fdt_addr;
@@ -2513,6 +2520,7 @@ static void spapr_create_lmb_dr_connectors(SpaprMachineState *spapr)
 static void spapr_validate_node_memory(MachineState *machine, Error **errp)
 {
     int i;
+    int nb_numa_nodes = machine->numa_state->num_nodes;
 
     if (machine->ram_size % SPAPR_MEMORY_BLOCK_SIZE) {
         error_setg(errp, "Memory size 0x" RAM_ADDR_FMT
@@ -4115,7 +4123,7 @@ spapr_cpu_index_to_props(MachineState *machine, unsigned cpu_index)
 
 static int64_t spapr_get_default_cpu_node_id(const MachineState *ms, int idx)
 {
-    return idx / smp_cores % nb_numa_nodes;
+    return idx / smp_cores % ms->numa_state->num_nodes;
 }
 
 static const CPUArchIdList *spapr_possible_cpu_arch_ids(MachineState *machine)
@@ -4319,6 +4327,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
     smc->update_dt_enabled = true;
     mc->default_cpu_type = POWERPC_CPU_TYPE_NAME("power9_v2.0");
     mc->has_hotpluggable_cpus = true;
+    mc->numa_supported = true;
     smc->resize_hpt_default = SPAPR_RESIZE_HPT_ENABLED;
     fwc->get_dev_path = spapr_get_fw_dev_path;
     nc->nmi_monitor_handler = spapr_nmi;
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 9cf2c41b8c..d6fd018dd4 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -1638,6 +1638,7 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
     SysBusDevice *s = SYS_BUS_DEVICE(dev);
     SpaprPhbState *sphb = SPAPR_PCI_HOST_BRIDGE(s);
     PCIHostState *phb = PCI_HOST_BRIDGE(s);
+    MachineState *ms = MACHINE(spapr);
     char *namebuf;
     int i;
     PCIBus *bus;
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 1a563ad756..991cf05134 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -414,7 +414,7 @@ build_append_gas_from_struct(GArray *table, const struct AcpiGenericAddress *s)
 void build_srat_memory(AcpiSratMemoryAffinity *numamem, uint64_t base,
                        uint64_t len, int node, MemoryAffinityFlags flags);
 
-void build_slit(GArray *table_data, BIOSLinker *linker);
+void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms);
 
 void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f,
                 const char *oem_id, const char *oem_table_id);
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 6ff02bf3e4..8375a07940 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -210,6 +210,7 @@ struct MachineClass {
     bool ignore_boot_device_suffixes;
     bool smbus_no_migration_support;
     bool nvdimm_supported;
+    bool numa_supported;
 
     HotplugHandler *(*get_hotplug_handler)(MachineState *machine,
                                            DeviceState *dev);
@@ -273,6 +274,7 @@ struct MachineState {
     AccelState *accelerator;
     CPUArchIdList *possible_cpus;
     struct NVDIMMState *nvdimms_state;
+    struct NumaState *numa_state;
 };
 
 #define DEFINE_MACHINE(namestr, machine_initfn) \
diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index b6ac7de43e..3c4b2d2909 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -6,7 +6,6 @@
 #include "sysemu/hostmem.h"
 #include "hw/boards.h"
 
-extern int nb_numa_nodes;   /* Number of NUMA nodes */
 extern bool have_numa_distance;
 
 struct NodeInfo {
@@ -16,15 +15,23 @@ struct NodeInfo {
     uint8_t distance[MAX_NODES];
 };
 
+extern NodeInfo numa_info[MAX_NODES];
+
 struct NumaNodeMem {
     uint64_t node_mem;
     uint64_t node_plugged_mem;
 };
 
-extern NodeInfo numa_info[MAX_NODES];
+struct NumaState {
+    /* Number of NUMA nodes */
+    int num_nodes;
+
+};
+typedef struct NumaState NumaState;
+
 void parse_numa_opts(MachineState *ms);
 void numa_complete_configuration(MachineState *ms);
-void query_numa_node_mem(NumaNodeMem node_mem[]);
+void query_numa_node_mem(NumaNodeMem node_mem[], MachineState *ms);
 extern QemuOptsList qemu_numa_opts;
 void numa_legacy_auto_assign_ram(MachineClass *mc, NodeInfo *nodes,
                                  int nb_nodes, ram_addr_t size);
diff --git a/monitor.c b/monitor.c
index 6428eb3b7e..08ef28450e 100644
--- a/monitor.c
+++ b/monitor.c
@@ -1922,14 +1922,21 @@ static void hmp_info_mtree(Monitor *mon, const QDict *qdict)
 
 static void hmp_info_numa(Monitor *mon, const QDict *qdict)
 {
-    int i;
+    int i, nb_numa_nodes;
     NumaNodeMem *node_mem;
     CpuInfoList *cpu_list, *cpu;
+    MachineState *ms = MACHINE(qdev_get_machine());
+
+    if (ms->numa_state == NULL) {
+        monitor_printf(mon, "%d nodes\n", 0);
+        return;
+    }
 
+    nb_numa_nodes = ms->numa_state->num_nodes;
     cpu_list = qmp_query_cpus(&error_abort);
     node_mem = g_new0(NumaNodeMem, nb_numa_nodes);
 
-    query_numa_node_mem(node_mem);
+    query_numa_node_mem(node_mem, ms);
     monitor_printf(mon, "%d nodes\n", nb_numa_nodes);
     for (i = 0; i < nb_numa_nodes; i++) {
         monitor_printf(mon, "node %d cpus:", i);
diff --git a/numa.c b/numa.c
index 955ec0c830..d678b71607 100644
--- a/numa.c
+++ b/numa.c
@@ -52,7 +52,6 @@ static int have_memdevs = -1;
 static int max_numa_nodeid; /* Highest specified NUMA node ID, plus one.
                              * For all nodes, nodeid < max_numa_nodeid
                              */
-int nb_numa_nodes;
 bool have_numa_distance;
 NodeInfo numa_info[MAX_NODES];
 
@@ -68,7 +67,7 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
     if (node->has_nodeid) {
         nodenr = node->nodeid;
     } else {
-        nodenr = nb_numa_nodes;
+        nodenr = ms->numa_state->num_nodes;
     }
 
     if (nodenr >= MAX_NODES) {
@@ -136,10 +135,11 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
     }
     numa_info[nodenr].present = true;
     max_numa_nodeid = MAX(max_numa_nodeid, nodenr + 1);
-    nb_numa_nodes++;
+    ms->numa_state->num_nodes++;
 }
 
-static void parse_numa_distance(NumaDistOptions *dist, Error **errp)
+static
+void parse_numa_distance(MachineState *ms, NumaDistOptions *dist, Error **errp)
 {
     uint16_t src = dist->src;
     uint16_t dst = dist->dst;
@@ -178,6 +178,12 @@ static
 void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
 {
     Error *err = NULL;
+    MachineClass *mc = MACHINE_GET_CLASS(ms);
+
+    if (!mc->numa_supported) {
+        error_setg(errp, "NUMA is not supported by this machine-type");
+        goto end;
+    }
 
     switch (object->type) {
     case NUMA_OPTIONS_TYPE_NODE:
@@ -187,7 +193,7 @@ void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
         }
         break;
     case NUMA_OPTIONS_TYPE_DIST:
-        parse_numa_distance(&object->u.dist, &err);
+        parse_numa_distance(ms, &object->u.dist, &err);
         if (err) {
             goto end;
         }
@@ -252,10 +258,11 @@ end:
  * distance from a node to itself is always NUMA_DISTANCE_MIN,
  * so providing it is never necessary.
  */
-static void validate_numa_distance(void)
+static void validate_numa_distance(MachineState *ms)
 {
     int src, dst;
     bool is_asymmetrical = false;
+    int nb_numa_nodes = ms->numa_state->num_nodes;
 
     for (src = 0; src < nb_numa_nodes; src++) {
         for (dst = src; dst < nb_numa_nodes; dst++) {
@@ -293,9 +300,10 @@ static void validate_numa_distance(void)
     }
 }
 
-static void complete_init_numa_distance(void)
+static void complete_init_numa_distance(MachineState *ms)
 {
     int src, dst;
+    int nb_numa_nodes = ms->numa_state->num_nodes;
 
     /* Fixup NUMA distance by symmetric policy because if it is an
      * asymmetric distance table, it should be a complete table and
@@ -369,7 +377,7 @@ void numa_complete_configuration(MachineState *ms)
      *
      * Enable NUMA implicitly by adding a new NUMA node automatically.
      */
-    if (ms->ram_slots > 0 && nb_numa_nodes == 0 &&
+    if (ms->ram_slots > 0 && ms->numa_state->num_nodes == 0 &&
         mc->auto_enable_numa_with_memhp) {
             NumaNodeOptions node = { };
             parse_numa_node(ms, &node, &error_abort);
@@ -387,30 +395,33 @@ void numa_complete_configuration(MachineState *ms)
     }
 
     /* This must be always true if all nodes are present: */
-    assert(nb_numa_nodes == max_numa_nodeid);
+    assert(ms->numa_state->num_nodes == max_numa_nodeid);
 
-    if (nb_numa_nodes > 0) {
+    if (ms->numa_state->num_nodes > 0) {
         uint64_t numa_total;
 
-        if (nb_numa_nodes > MAX_NODES) {
-            nb_numa_nodes = MAX_NODES;
+        if (ms->numa_state->num_nodes > MAX_NODES) {
+            ms->numa_state->num_nodes = MAX_NODES;
         }
 
         /* If no memory size is given for any node, assume the default case
          * and distribute the available memory equally across all nodes
          */
-        for (i = 0; i < nb_numa_nodes; i++) {
+        for (i = 0; i < ms->numa_state->num_nodes; i++) {
             if (numa_info[i].node_mem != 0) {
                 break;
             }
         }
-        if (i == nb_numa_nodes) {
+        if (i == ms->numa_state->num_nodes) {
             assert(mc->numa_auto_assign_ram);
-            mc->numa_auto_assign_ram(mc, numa_info, nb_numa_nodes, ram_size);
+            mc->numa_auto_assign_ram(mc,
+                                     numa_info,
+                                     ms->numa_state->num_nodes,
+                                     ram_size);
         }
 
         numa_total = 0;
-        for (i = 0; i < nb_numa_nodes; i++) {
+        for (i = 0; i < ms->numa_state->num_nodes; i++) {
             numa_total += numa_info[i].node_mem;
         }
         if (numa_total != ram_size) {
@@ -434,10 +445,10 @@ void numa_complete_configuration(MachineState *ms)
          */
         if (have_numa_distance) {
             /* Validate enough NUMA distance information was provided. */
-            validate_numa_distance();
+            validate_numa_distance(ms);
 
             /* Validation succeeded, now fill in any missing distances. */
-            complete_init_numa_distance();
+            complete_init_numa_distance(ms);
         }
     }
 }
@@ -513,14 +524,16 @@ void memory_region_allocate_system_memory(MemoryRegion *mr, Object *owner,
 {
     uint64_t addr = 0;
     int i;
+    MachineState *ms = MACHINE(qdev_get_machine());
 
-    if (nb_numa_nodes == 0 || !have_memdevs) {
+    if (ms->numa_state == NULL ||
+        ms->numa_state->num_nodes == 0 || !have_memdevs) {
         allocate_system_memory_nonnuma(mr, owner, name, ram_size);
         return;
     }
 
     memory_region_init(mr, owner, name, ram_size);
-    for (i = 0; i < nb_numa_nodes; i++) {
+    for (i = 0; i < ms->numa_state->num_nodes; i++) {
         uint64_t size = numa_info[i].node_mem;
         HostMemoryBackend *backend = numa_info[i].node_memdev;
         if (!backend) {
@@ -578,16 +591,16 @@ static void numa_stat_memory_devices(NumaNodeMem node_mem[])
     qapi_free_MemoryDeviceInfoList(info_list);
 }
 
-void query_numa_node_mem(NumaNodeMem node_mem[])
+void query_numa_node_mem(NumaNodeMem node_mem[], MachineState *ms)
 {
     int i;
 
-    if (nb_numa_nodes <= 0) {
+    if (ms->numa_state == NULL || ms->numa_state->num_nodes <= 0) {
         return;
     }
 
     numa_stat_memory_devices(node_mem);
-    for (i = 0; i < nb_numa_nodes; i++) {
+    for (i = 0; i < ms->numa_state->num_nodes; i++) {
         node_mem[i].node_mem += numa_info[i].node_mem;
     }
 }
-- 
2.20.1




* [Qemu-devel] [PATCH v5 3/8] numa: move numa global variable have_numa_distance into MachineState
  2019-06-14 15:56 [Qemu-devel] [PATCH v5 0/8] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
  2019-06-14 15:56 ` [Qemu-devel] [PATCH v5 1/8] hw/arm: simplify arm_load_dtb Tao Xu
  2019-06-14 15:56 ` [Qemu-devel] [PATCH v5 2/8] numa: move numa global variable nb_numa_nodes into MachineState Tao Xu
@ 2019-06-14 15:56 ` Tao Xu
  2019-06-14 15:56 ` [Qemu-devel] [PATCH v5 4/8] numa: move numa global variable numa_info " Tao Xu
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 25+ messages in thread
From: Tao Xu @ 2019-06-14 15:56 UTC (permalink / raw)
  To: imammedo, eblake, ehabkost; +Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel

Move existing numa global have_numa_distance into NumaState.

Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Liu Jingqi <jingqi.liu@intel.com>
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

Changes in v4 -> v5:
    - Simplify commit message (Igor)
---
 hw/arm/virt-acpi-build.c | 2 +-
 hw/arm/virt.c            | 2 +-
 hw/i386/acpi-build.c     | 2 +-
 include/sysemu/numa.h    | 4 ++--
 numa.c                   | 4 ++--
 5 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 9a22ce679c..9d2edd8023 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -800,7 +800,7 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
     if (nb_numa_nodes > 0) {
         acpi_add_table(table_offsets, tables_blob);
         build_srat(tables_blob, tables->linker, vms);
-        if (have_numa_distance) {
+        if (ms->numa_state->have_numa_distance) {
             acpi_add_table(table_offsets, tables_blob);
             build_slit(tables_blob, tables->linker, ms);
         }
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 025ad484c5..d147cceab6 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -229,7 +229,7 @@ static void create_fdt(VirtMachineState *vms)
                                 "clk24mhz");
     qemu_fdt_setprop_cell(fdt, "/apb-pclk", "phandle", vms->clock_phandle);
 
-    if (have_numa_distance) {
+    if (nb_numa_nodes > 0 && ms->numa_state->have_numa_distance) {
         int size = nb_numa_nodes * nb_numa_nodes * 3 * sizeof(uint32_t);
         uint32_t *matrix = g_malloc0(size);
         int idx, i, j;
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 0d58335560..055e677c30 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2667,7 +2667,7 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
     if (pcms->numa_nodes) {
         acpi_add_table(table_offsets, tables_blob);
         build_srat(tables_blob, tables->linker, machine);
-        if (have_numa_distance) {
+        if (machine->numa_state->have_numa_distance) {
             acpi_add_table(table_offsets, tables_blob);
             build_slit(tables_blob, tables->linker, machine);
         }
diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index 3c4b2d2909..08a86080c4 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -6,8 +6,6 @@
 #include "sysemu/hostmem.h"
 #include "hw/boards.h"
 
-extern bool have_numa_distance;
-
 struct NodeInfo {
     uint64_t node_mem;
     struct HostMemoryBackend *node_memdev;
@@ -26,6 +24,8 @@ struct NumaState {
     /* Number of NUMA nodes */
     int num_nodes;
 
+    /* Allow setting NUMA distance for different NUMA nodes */
+    bool have_numa_distance;
 };
 typedef struct NumaState NumaState;
 
diff --git a/numa.c b/numa.c
index d678b71607..9432d42ad0 100644
--- a/numa.c
+++ b/numa.c
@@ -171,7 +171,7 @@ void parse_numa_distance(MachineState *ms, NumaDistOptions *dist, Error **errp)
     }
 
     numa_info[src].distance[dst] = val;
-    have_numa_distance = true;
+    ms->numa_state->have_numa_distance = true;
 }
 
 static
@@ -443,7 +443,7 @@ void numa_complete_configuration(MachineState *ms)
          * asymmetric. In this case, the distances for both directions
          * of all node pairs are required.
          */
-        if (have_numa_distance) {
+        if (ms->numa_state->have_numa_distance) {
             /* Validate enough NUMA distance information was provided. */
             validate_numa_distance(ms);
 
-- 
2.20.1




* [Qemu-devel] [PATCH v5 4/8] numa: move numa global variable numa_info into MachineState
  2019-06-14 15:56 [Qemu-devel] [PATCH v5 0/8] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
                   ` (2 preceding siblings ...)
  2019-06-14 15:56 ` [Qemu-devel] [PATCH v5 3/8] numa: move numa global variable have_numa_distance " Tao Xu
@ 2019-06-14 15:56 ` Tao Xu
  2019-06-28 11:20   ` Igor Mammedov
  2019-06-14 15:56 ` [Qemu-devel] [PATCH v5 5/8] acpi: introduce AcpiDeviceIfClass.build_mem_ranges hook Tao Xu
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 25+ messages in thread
From: Tao Xu @ 2019-06-14 15:56 UTC (permalink / raw)
  To: imammedo, eblake, ehabkost; +Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel

Move existing numa global numa_info (renamed as "nodes") into NumaState.

Reviewed-by: Liu Jingqi <jingqi.liu@intel.com>
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

Changes in v4 -> v5:
    - Use ms->numa_state->nodes directly and avoid dereferencing
    ms->numa_state before checking it where it may be NULL (Igor)
---
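
For illustration, the NULL-safe access order mentioned in the changelog above
can be sketched in isolation. This is only a sketch: the types are reduced
stand-ins rather than the real QEMU definitions, MAX_NODES is an assumed
value, and first_node_memdev_missing() is a hypothetical helper name used
only here. It mirrors the order of the checks in qemu_minrampagesize().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NODES 128  /* assumed value for this sketch */

/* Reduced stand-ins for the QEMU types; not the real definitions. */
typedef struct NodeInfo {
    uint64_t node_mem;
    void *node_memdev;
} NodeInfo;

typedef struct NumaState {
    int num_nodes;
    NodeInfo nodes[MAX_NODES];
} NumaState;

typedef struct MachineState {
    NumaState *numa_state;  /* NULL when the machine has no NUMA state */
} MachineState;

/*
 * Hypothetical helper: true when node 0 has no memory backend.
 * numa_state is tested before any member access, so short-circuit
 * evaluation never reaches nodes[] through a NULL pointer.
 */
bool first_node_memdev_missing(const MachineState *ms)
{
    return ms->numa_state == NULL ||
           ms->numa_state->num_nodes == 0 ||
           ms->numa_state->nodes[0].node_memdev == NULL;
}
```

The point of the ordering is that a local alias such as
"NodeInfo *numa_info = ms->numa_state->nodes;" is only safe in functions
where ms->numa_state is already known to be non-NULL.
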
 exec.c                   |  2 +-
 hw/acpi/aml-build.c      |  6 ++++--
 hw/arm/boot.c            |  2 +-
 hw/arm/virt-acpi-build.c |  7 ++++---
 hw/arm/virt.c            |  1 +
 hw/i386/pc.c             |  4 ++--
 hw/ppc/spapr.c           |  4 +++-
 hw/ppc/spapr_pci.c       |  1 +
 include/sysemu/numa.h    |  3 +++
 numa.c                   | 15 +++++++++------
 10 files changed, 29 insertions(+), 16 deletions(-)

diff --git a/exec.c b/exec.c
index c7eb4af42d..0e30926588 100644
--- a/exec.c
+++ b/exec.c
@@ -1763,7 +1763,7 @@ long qemu_minrampagesize(void)
     if (hpsize > mainrampagesize &&
         (ms->numa_state == NULL ||
          ms->numa_state->num_nodes == 0 ||
-         numa_info[0].node_memdev == NULL)) {
+         ms->numa_state->nodes[0].node_memdev == NULL)) {
         static bool warned;
         if (!warned) {
             error_report("Huge page support disabled (n/a for main memory).");
diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 63c1cae8c9..26ccc1a3e2 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1737,8 +1737,10 @@ void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms)
     build_append_int_noprefix(table_data, nb_numa_nodes, 8);
     for (i = 0; i < nb_numa_nodes; i++) {
         for (j = 0; j < nb_numa_nodes; j++) {
-            assert(numa_info[i].distance[j]);
-            build_append_int_noprefix(table_data, numa_info[i].distance[j], 1);
+            assert(ms->numa_state->nodes[i].distance[j]);
+            build_append_int_noprefix(table_data,
+                                      ms->numa_state->nodes[i].distance[j],
+                                      1);
         }
     }
 
diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index 2af881e0f4..0c1572d118 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -600,7 +600,7 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
     if (ms->numa_state != NULL && ms->numa_state->num_nodes > 0) {
         mem_base = binfo->loader_start;
         for (i = 0; i < ms->numa_state->num_nodes; i++) {
-            mem_len = numa_info[i].node_mem;
+            mem_len = ms->numa_state->nodes[i].node_mem;
             rc = fdt_add_memory_node(fdt, acells, mem_base,
                                      scells, mem_len, i);
             if (rc < 0) {
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 9d2edd8023..422bbed2d3 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -536,11 +536,12 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
 
     mem_base = vms->memmap[VIRT_MEM].base;
     for (i = 0; i < nb_numa_nodes; ++i) {
-        if (numa_info[i].node_mem > 0) {
+        if (ms->numa_state->nodes[i].node_mem > 0) {
             numamem = acpi_data_push(table_data, sizeof(*numamem));
-            build_srat_memory(numamem, mem_base, numa_info[i].node_mem, i,
+            build_srat_memory(numamem, mem_base,
+                              ms->numa_state->nodes[i].node_mem, i,
                               MEM_AFFINITY_ENABLED);
-            mem_base += numa_info[i].node_mem;
+            mem_base += ms->numa_state->nodes[i].node_mem;
         }
     }
 
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index d147cceab6..d3904d74dc 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -233,6 +233,7 @@ static void create_fdt(VirtMachineState *vms)
         int size = nb_numa_nodes * nb_numa_nodes * 3 * sizeof(uint32_t);
         uint32_t *matrix = g_malloc0(size);
         int idx, i, j;
+        NodeInfo *numa_info = ms->numa_state->nodes;
 
         for (i = 0; i < nb_numa_nodes; i++) {
             for (j = 0; j < nb_numa_nodes; j++) {
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 5bab78e137..4cc84c5050 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1041,7 +1041,7 @@ static FWCfgState *bochs_bios_init(AddressSpace *as, PCMachineState *pcms)
     }
     for (i = 0; i < nb_numa_nodes; i++) {
         numa_fw_cfg[pcms->apic_id_limit + 1 + i] =
-            cpu_to_le64(numa_info[i].node_mem);
+            cpu_to_le64(ms->numa_state->nodes[i].node_mem);
     }
     fw_cfg_add_bytes(fw_cfg, FW_CFG_NUMA, numa_fw_cfg,
                      (1 + pcms->apic_id_limit + nb_numa_nodes) *
@@ -1683,7 +1683,7 @@ void pc_guest_info_init(PCMachineState *pcms)
     pcms->node_mem = g_malloc0(pcms->numa_nodes *
                                     sizeof *pcms->node_mem);
     for (i = 0; i < nb_numa_nodes; i++) {
-        pcms->node_mem[i] = numa_info[i].node_mem;
+        pcms->node_mem[i] = ms->numa_state->nodes[i].node_mem;
     }
 
     pcms->machine_done.notify = pc_machine_done;
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 07a02db99e..3f2e6e0f5f 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -349,6 +349,7 @@ static hwaddr spapr_node0_size(MachineState *machine)
     int nb_numa_nodes = machine->numa_state->num_nodes;
     if (nb_numa_nodes) {
         int i;
+        NodeInfo *numa_info = machine->numa_state->nodes;
         for (i = 0; i < nb_numa_nodes; ++i) {
             if (numa_info[i].node_mem) {
                 return MIN(pow2floor(numa_info[i].node_mem),
@@ -395,7 +396,7 @@ static int spapr_populate_memory(SpaprMachineState *spapr, void *fdt)
     MachineState *machine = MACHINE(spapr);
     hwaddr mem_start, node_size;
     int i;
-    NodeInfo *nodes = numa_info;
+    NodeInfo *nodes = machine->numa_state->nodes;
     NodeInfo ramnode;
 
     /* No NUMA nodes, assume there is just one node with whole RAM */
@@ -2521,6 +2522,7 @@ static void spapr_validate_node_memory(MachineState *machine, Error **errp)
 {
     int i;
     int nb_numa_nodes = machine->numa_state->num_nodes;
+    NodeInfo *numa_info = machine->numa_state->nodes;
 
     if (machine->ram_size % SPAPR_MEMORY_BLOCK_SIZE) {
         error_setg(errp, "Memory size 0x" RAM_ADDR_FMT
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index d6fd018dd4..9d4ebd60de 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -1639,6 +1639,7 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
     SpaprPhbState *sphb = SPAPR_PCI_HOST_BRIDGE(s);
     PCIHostState *phb = PCI_HOST_BRIDGE(s);
     MachineState *ms = MACHINE(spapr);
+    NodeInfo *numa_info = ms->numa_state->nodes;
     char *namebuf;
     int i;
     PCIBus *bus;
diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index 08a86080c4..437eb21fef 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -26,6 +26,9 @@ struct NumaState {
 
     /* Allow setting NUMA distance for different NUMA nodes */
     bool have_numa_distance;
+
+    /* NUMA nodes information */
+    NodeInfo nodes[MAX_NODES];
 };
 typedef struct NumaState NumaState;
 
diff --git a/numa.c b/numa.c
index 9432d42ad0..d23e130bce 100644
--- a/numa.c
+++ b/numa.c
@@ -52,9 +52,6 @@ static int have_memdevs = -1;
 static int max_numa_nodeid; /* Highest specified NUMA node ID, plus one.
                              * For all nodes, nodeid < max_numa_nodeid
                              */
-bool have_numa_distance;
-NodeInfo numa_info[MAX_NODES];
-
 
 static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
                             Error **errp)
@@ -63,6 +60,7 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
     uint16_t nodenr;
     uint16List *cpus = NULL;
     MachineClass *mc = MACHINE_GET_CLASS(ms);
+    NodeInfo *numa_info = ms->numa_state->nodes;
 
     if (node->has_nodeid) {
         nodenr = node->nodeid;
@@ -144,6 +142,7 @@ void parse_numa_distance(MachineState *ms, NumaDistOptions *dist, Error **errp)
     uint16_t src = dist->src;
     uint16_t dst = dist->dst;
     uint8_t val = dist->val;
+    NodeInfo *numa_info = ms->numa_state->nodes;
 
     if (src >= MAX_NODES || dst >= MAX_NODES) {
         error_setg(errp, "Parameter '%s' expects an integer between 0 and %d",
@@ -203,7 +202,7 @@ void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
             error_setg(&err, "Missing mandatory node-id property");
             goto end;
         }
-        if (!numa_info[object->u.cpu.node_id].present) {
+        if (!ms->numa_state->nodes[object->u.cpu.node_id].present) {
             error_setg(&err, "Invalid node-id=%" PRId64 ", NUMA node must be "
                 "defined with -numa node,nodeid=ID before it's used with "
                 "-numa cpu,node-id=ID", object->u.cpu.node_id);
@@ -263,6 +262,7 @@ static void validate_numa_distance(MachineState *ms)
     int src, dst;
     bool is_asymmetrical = false;
     int nb_numa_nodes = ms->numa_state->num_nodes;
+    NodeInfo *numa_info = ms->numa_state->nodes;
 
     for (src = 0; src < nb_numa_nodes; src++) {
         for (dst = src; dst < nb_numa_nodes; dst++) {
@@ -304,6 +304,7 @@ static void complete_init_numa_distance(MachineState *ms)
 {
     int src, dst;
     int nb_numa_nodes = ms->numa_state->num_nodes;
+    NodeInfo *numa_info = ms->numa_state->nodes;
 
     /* Fixup NUMA distance by symmetric policy because if it is an
      * asymmetric distance table, it should be a complete table and
@@ -363,6 +364,7 @@ void numa_complete_configuration(MachineState *ms)
 {
     int i;
     MachineClass *mc = MACHINE_GET_CLASS(ms);
+    NodeInfo *numa_info = ms->numa_state->nodes;
 
     /*
      * If memory hotplug is enabled (slots > 0) but without '-numa'
@@ -534,8 +536,8 @@ void memory_region_allocate_system_memory(MemoryRegion *mr, Object *owner,
 
     memory_region_init(mr, owner, name, ram_size);
     for (i = 0; i < ms->numa_state->num_nodes; i++) {
-        uint64_t size = numa_info[i].node_mem;
-        HostMemoryBackend *backend = numa_info[i].node_memdev;
+        uint64_t size = ms->numa_state->nodes[i].node_mem;
+        HostMemoryBackend *backend = ms->numa_state->nodes[i].node_memdev;
         if (!backend) {
             continue;
         }
@@ -594,6 +596,7 @@ static void numa_stat_memory_devices(NumaNodeMem node_mem[])
 void query_numa_node_mem(NumaNodeMem node_mem[], MachineState *ms)
 {
     int i;
+    NodeInfo *numa_info = ms->numa_state->nodes;
 
     if (ms->numa_state == NULL || ms->numa_state->num_nodes <= 0) {
         return;
-- 
2.20.1




* [Qemu-devel] [PATCH v5 5/8] acpi: introduce AcpiDeviceIfClass.build_mem_ranges hook
  2019-06-14 15:56 [Qemu-devel] [PATCH v5 0/8] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
                   ` (3 preceding siblings ...)
  2019-06-14 15:56 ` [Qemu-devel] [PATCH v5 4/8] numa: move numa global variable numa_info " Tao Xu
@ 2019-06-14 15:56 ` Tao Xu
  2019-07-01 10:59   ` Igor Mammedov
  2019-06-14 15:56 ` [Qemu-devel] [PATCH v5 6/8] hmat acpi: Build Memory Subsystem Address Range Structure(s) in ACPI HMAT Tao Xu
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 25+ messages in thread
From: Tao Xu @ 2019-06-14 15:56 UTC (permalink / raw)
  To: imammedo, eblake, ehabkost; +Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel

Add a build_mem_ranges callback to AcpiDeviceIfClass and use it to
generate the NUMA memory ranges for SRAT and HMAT.

Suggested-by: Igor Mammedov <imammedo@redhat.com>
Co-developed-by: Liu Jingqi <jingqi.liu@intel.com>
Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

Changes in v4 -> v5:
    - Add the missing 'mem_len > 0' check in pc_build_mem_ranges() (Igor)
    - Correct the description of build_mem_ranges
    in AcpiDeviceIfClass (Igor)
    - Use GArray for NUMA memory ranges data (Igor)
    - Explain why the stub is needed (Igor)
---
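
As a rough illustration of what the callback computes, the hole-cutting logic
can be sketched as a standalone function. This is a simplified sketch under
stated assumptions, not the code in this patch: split_node_mem() and
push_range() are hypothetical names, the output is a caller-provided array
instead of a GArray, and below_4g stands in for pcms->below_4g_mem_size.

```c
#include <stddef.h>
#include <stdint.h>

#define KiB 1024ULL
#define MiB (1024 * KiB)
#define HOLE_640K_START (640 * KiB)
#define HOLE_640K_END   (1 * MiB)

typedef struct NumaMemRange {
    uint64_t base;
    uint64_t length;
    uint32_t node;
} NumaMemRange;

/* Append one range if it is non-empty and fits; return the new count. */
static size_t push_range(NumaMemRange *out, size_t n, size_t cap,
                         uint64_t base, uint64_t len, uint32_t node)
{
    if (len > 0 && n < cap) {
        out[n].base = base;
        out[n].length = len;
        out[n].node = node;
        n++;
    }
    return n;
}

/*
 * Hypothetical helper: lay the per-node RAM sizes out back to back,
 * skipping the 640K-1M hole and the hole between below_4g and 4G.
 * Returns the number of ranges written to out[].
 */
size_t split_node_mem(const uint64_t *node_mem, size_t num_nodes,
                      uint64_t below_4g, NumaMemRange *out, size_t cap)
{
    uint64_t mem_base, mem_len, next_base = 0;
    size_t n = 0;

    for (size_t i = 0; i < num_nodes; i++) {
        uint32_t node = (uint32_t)i;

        mem_base = next_base;
        mem_len = node_mem[i];
        next_base = mem_base + mem_len;

        /* Cut out the 640K hole */
        if (mem_base <= HOLE_640K_START && next_base > HOLE_640K_START) {
            mem_len -= next_base - HOLE_640K_START;
            n = push_range(out, n, cap, mem_base, mem_len, node);

            /* Rare case: 640K < RAM < 1M */
            if (next_base <= HOLE_640K_END) {
                next_base = HOLE_640K_END;
                continue;
            }
            mem_base = HOLE_640K_END;
            mem_len = next_base - HOLE_640K_END;
        }

        /* Cut out the hole below 4G; the remainder continues at 4G */
        if (mem_base <= below_4g && next_base > below_4g) {
            mem_len -= next_base - below_4g;
            n = push_range(out, n, cap, mem_base, mem_len, node);
            mem_base = 1ULL << 32;
            mem_len = next_base - below_4g;
            next_base = mem_base + mem_len;
        }
        n = push_range(out, n, cap, mem_base, mem_len, node);
    }
    return n;
}
```

With two nodes of 2 GiB each and below_4g set to 3 GiB, the sketch produces
four ranges: two for node 0 (below and above the 640K hole) and two for
node 1 (below 3 GiB and starting at 4 GiB).
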
 hw/acpi/piix4.c                      |   1 +
 hw/i386/acpi-build.c                 | 133 +++++++++++++++++----------
 hw/isa/lpc_ich9.c                    |   1 +
 include/hw/acpi/acpi_dev_interface.h |   4 +
 include/hw/i386/pc.h                 |   1 +
 include/sysemu/numa.h                |  12 +++
 stubs/Makefile.objs                  |   1 +
 stubs/pc_build_mem_ranges.c          |  14 +++
 8 files changed, 120 insertions(+), 47 deletions(-)
 create mode 100644 stubs/pc_build_mem_ranges.c

diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c
index ec4e186cec..bc078c1ad7 100644
--- a/hw/acpi/piix4.c
+++ b/hw/acpi/piix4.c
@@ -702,6 +702,7 @@ static void piix4_pm_class_init(ObjectClass *klass, void *data)
     adevc->ospm_status = piix4_ospm_status;
     adevc->send_event = piix4_send_gpe;
     adevc->madt_cpu = pc_madt_cpu_entry;
+    adevc->build_mem_ranges = pc_build_mem_ranges;
 }
 
 static const TypeInfo piix4_pm_info = {
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 055e677c30..44dd447fa5 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2279,18 +2279,89 @@ build_tpm2(GArray *table_data, BIOSLinker *linker, GArray *tcpalog)
 #define HOLE_640K_START  (640 * KiB)
 #define HOLE_640K_END   (1 * MiB)
 
+void pc_build_mem_ranges(AcpiDeviceIf *adev, MachineState *ms)
+{
+    uint64_t mem_len, mem_base, next_base;
+    int i;
+    PCMachineState *pcms = PC_MACHINE(ms);
+    NumaState *nstat = ms->numa_state;
+    NumaMemRange *mem_range;
+    nstat->mem_ranges_num = 0;
+    next_base = 0;
+
+    /*
+     * the memory map is a bit tricky, it contains at least one hole
+     * from 640k-1M and possibly another one from 3.5G-4G.
+     */
+
+    for (i = 0; i < pcms->numa_nodes; ++i) {
+        mem_base = next_base;
+        mem_len = pcms->node_mem[i];
+        next_base = mem_base + mem_len;
+
+        /* Cut out the 640K hole */
+        if (mem_base <= HOLE_640K_START &&
+            next_base > HOLE_640K_START) {
+            mem_len -= next_base - HOLE_640K_START;
+            if (mem_len > 0) {
+                mem_range = acpi_data_push(nstat->mem_ranges,
+                                           sizeof *mem_range);
+                mem_range->base = mem_base;
+                mem_range->length = mem_len;
+                mem_range->node = i;
+                nstat->mem_ranges_num++;
+            }
+
+            /* Check for the rare case: 640K < RAM < 1M */
+            if (next_base <= HOLE_640K_END) {
+                next_base = HOLE_640K_END;
+                continue;
+            }
+            mem_base = HOLE_640K_END;
+            mem_len = next_base - HOLE_640K_END;
+        }
+
+        /* Cut out the ACPI_PCI hole */
+        if (mem_base <= pcms->below_4g_mem_size &&
+            next_base > pcms->below_4g_mem_size) {
+            mem_len -= next_base - pcms->below_4g_mem_size;
+            if (mem_len > 0) {
+                mem_range = acpi_data_push(nstat->mem_ranges,
+                                           sizeof *mem_range);
+                mem_range->base = mem_base;
+                mem_range->length = mem_len;
+                mem_range->node = i;
+                nstat->mem_ranges_num++;
+            }
+            mem_base = 1ULL << 32;
+            mem_len = next_base - pcms->below_4g_mem_size;
+            next_base = mem_base + mem_len;
+        }
+        if (mem_len > 0) {
+            mem_range = acpi_data_push(nstat->mem_ranges,
+                                       sizeof *mem_range);
+            mem_range->base = mem_base;
+            mem_range->length = mem_len;
+            mem_range->node = i;
+            nstat->mem_ranges_num++;
+        }
+    }
+}
+
 static void
 build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
 {
     AcpiSystemResourceAffinityTable *srat;
     AcpiSratMemoryAffinity *numamem;
 
-    int i;
-    int srat_start, numa_start, slots;
-    uint64_t mem_len, mem_base, next_base;
+    int i, srat_start, numa_start, slots;
     MachineClass *mc = MACHINE_GET_CLASS(machine);
     const CPUArchIdList *apic_ids = mc->possible_cpu_arch_ids(machine);
     PCMachineState *pcms = PC_MACHINE(machine);
+    AcpiDeviceIfClass *adevc = ACPI_DEVICE_IF_GET_CLASS(pcms->acpi_dev);
+    AcpiDeviceIf *adev = ACPI_DEVICE_IF(pcms->acpi_dev);
+    NumaState *nstat = machine->numa_state;
+    NumaMemRange *mem_range;
     ram_addr_t hotplugabble_address_space_size =
         object_property_get_int(OBJECT(pcms), PC_MACHINE_DEVMEM_REGION_SIZE,
                                 NULL);
@@ -2327,57 +2398,25 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
         }
     }
 
+    if (pcms->numa_nodes && !nstat->mem_ranges_num) {
+        nstat->mem_ranges = g_array_new(false, true /* clear */,
+                                        sizeof *mem_range);
+        adevc->build_mem_ranges(adev, machine);
+    }
 
-    /* the memory map is a bit tricky, it contains at least one hole
-     * from 640k-1M and possibly another one from 3.5G-4G.
-     */
-    next_base = 0;
     numa_start = table_data->len;
 
-    for (i = 1; i < pcms->numa_nodes + 1; ++i) {
-        mem_base = next_base;
-        mem_len = pcms->node_mem[i - 1];
-        next_base = mem_base + mem_len;
-
-        /* Cut out the 640K hole */
-        if (mem_base <= HOLE_640K_START &&
-            next_base > HOLE_640K_START) {
-            mem_len -= next_base - HOLE_640K_START;
-            if (mem_len > 0) {
-                numamem = acpi_data_push(table_data, sizeof *numamem);
-                build_srat_memory(numamem, mem_base, mem_len, i - 1,
-                                  MEM_AFFINITY_ENABLED);
-            }
-
-            /* Check for the rare case: 640K < RAM < 1M */
-            if (next_base <= HOLE_640K_END) {
-                next_base = HOLE_640K_END;
-                continue;
-            }
-            mem_base = HOLE_640K_END;
-            mem_len = next_base - HOLE_640K_END;
-        }
-
-        /* Cut out the ACPI_PCI hole */
-        if (mem_base <= pcms->below_4g_mem_size &&
-            next_base > pcms->below_4g_mem_size) {
-            mem_len -= next_base - pcms->below_4g_mem_size;
-            if (mem_len > 0) {
-                numamem = acpi_data_push(table_data, sizeof *numamem);
-                build_srat_memory(numamem, mem_base, mem_len, i - 1,
-                                  MEM_AFFINITY_ENABLED);
-            }
-            mem_base = 1ULL << 32;
-            mem_len = next_base - pcms->below_4g_mem_size;
-            next_base = mem_base + mem_len;
-        }
-
-        if (mem_len > 0) {
+    for (i = 0; i < nstat->mem_ranges_num; i++) {
+        mem_range = &g_array_index(nstat->mem_ranges, NumaMemRange, i);
+        if (mem_range->length > 0) {
             numamem = acpi_data_push(table_data, sizeof *numamem);
-            build_srat_memory(numamem, mem_base, mem_len, i - 1,
+            build_srat_memory(numamem, mem_range->base,
+                              mem_range->length,
+                              mem_range->node,
                               MEM_AFFINITY_ENABLED);
         }
     }
+
     slots = (table_data->len - numa_start) / sizeof *numamem;
     for (; slots < pcms->numa_nodes + 2; slots++) {
         numamem = acpi_data_push(table_data, sizeof *numamem);
diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
index 35d17246e9..20d919c63d 100644
--- a/hw/isa/lpc_ich9.c
+++ b/hw/isa/lpc_ich9.c
@@ -801,6 +801,7 @@ static void ich9_lpc_class_init(ObjectClass *klass, void *data)
     adevc->ospm_status = ich9_pm_ospm_status;
     adevc->send_event = ich9_send_gpe;
     adevc->madt_cpu = pc_madt_cpu_entry;
+    adevc->build_mem_ranges = pc_build_mem_ranges;
 }
 
 static const TypeInfo ich9_lpc_info = {
diff --git a/include/hw/acpi/acpi_dev_interface.h b/include/hw/acpi/acpi_dev_interface.h
index 43ff119179..5956b5ea33 100644
--- a/include/hw/acpi/acpi_dev_interface.h
+++ b/include/hw/acpi/acpi_dev_interface.h
@@ -39,6 +39,8 @@ void acpi_send_event(DeviceState *dev, AcpiEventStatusBits event);
  *           for CPU indexed by @uid in @apic_ids array,
  *           returned structure types are:
  *           0 - Local APIC, 9 - Local x2APIC, 0xB - GICC
+ * build_mem_ranges: build memory ranges of ACPI SRAT (except misc
+ * and hotplug SRAT ranges) and HMAT
  *
  * Interface is designed for providing unified interface
  * to generic ACPI functionality that could be used without
@@ -54,5 +56,7 @@ typedef struct AcpiDeviceIfClass {
     void (*send_event)(AcpiDeviceIf *adev, AcpiEventStatusBits ev);
     void (*madt_cpu)(AcpiDeviceIf *adev, int uid,
                      const CPUArchIdList *apic_ids, GArray *entry);
+    void (*build_mem_ranges)(AcpiDeviceIf *adev, MachineState *ms);
+
 } AcpiDeviceIfClass;
 #endif
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 5d5636241e..21b9ac3d11 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -281,6 +281,7 @@ void pc_system_firmware_init(PCMachineState *pcms, MemoryRegion *rom_memory);
 /* acpi-build.c */
 void pc_madt_cpu_entry(AcpiDeviceIf *adev, int uid,
                        const CPUArchIdList *apic_ids, GArray *entry);
+void pc_build_mem_ranges(AcpiDeviceIf *adev, MachineState *ms);
 
 /* e820 types */
 #define E820_RAM        1
diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index 437eb21fef..e3c85b77bc 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -20,6 +20,12 @@ struct NumaNodeMem {
     uint64_t node_plugged_mem;
 };
 
+typedef struct NumaMemRange {
+    uint64_t base;
+    uint64_t length;
+    uint32_t node;
+} NumaMemRange;
+
 struct NumaState {
     /* Number of NUMA nodes */
     int num_nodes;
@@ -29,6 +35,12 @@ struct NumaState {
 
     /* NUMA nodes information */
     NodeInfo nodes[MAX_NODES];
+
+    /* Number of NUMA memory ranges */
+    uint32_t mem_ranges_num;
+
+    /* NUMA memory ranges */
+    GArray *mem_ranges;
 };
 typedef struct NumaState NumaState;
 
diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs
index 9c7393b08c..4f0cdc1a45 100644
--- a/stubs/Makefile.objs
+++ b/stubs/Makefile.objs
@@ -33,6 +33,7 @@ stub-obj-y += qmp_memory_device.o
 stub-obj-y += target-monitor-defs.o
 stub-obj-y += target-get-monitor-def.o
 stub-obj-y += pc_madt_cpu_entry.o
+stub-obj-y += pc_build_mem_ranges.o
 stub-obj-y += vmgenid.o
 stub-obj-y += xen-common.o
 stub-obj-y += xen-hvm.o
diff --git a/stubs/pc_build_mem_ranges.c b/stubs/pc_build_mem_ranges.c
new file mode 100644
index 0000000000..997cdfe00b
--- /dev/null
+++ b/stubs/pc_build_mem_ranges.c
@@ -0,0 +1,14 @@
+/*
+ * Stub for pc_build_mem_ranges().
+ * piix4 is used not only by pc but also by mips and other targets. Since
+ * hw/acpi/piix4.c assigns pc_build_mem_ranges to the build_mem_ranges
+ * callback of AcpiDeviceIfClass, this empty stub lets the other
+ * architectures build successfully.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/i386/pc.h"
+
+void pc_build_mem_ranges(AcpiDeviceIf *adev, MachineState *ms)
+{
+}
-- 
2.20.1




* [Qemu-devel] [PATCH v5 6/8] hmat acpi: Build Memory Subsystem Address Range Structure(s) in ACPI HMAT
  2019-06-14 15:56 [Qemu-devel] [PATCH v5 0/8] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
                   ` (4 preceding siblings ...)
  2019-06-14 15:56 ` [Qemu-devel] [PATCH v5 5/8] acpi: introduce AcpiDeviceIfClass.build_mem_ranges hook Tao Xu
@ 2019-06-14 15:56 ` Tao Xu
  2019-06-27 15:56   ` Jonathan Cameron
  2019-07-01 11:25   ` Igor Mammedov
  2019-06-14 15:56 ` [Qemu-devel] [PATCH v5 7/8] hmat acpi: Build System Locality Latency and Bandwidth Information " Tao Xu
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 25+ messages in thread
From: Tao Xu @ 2019-06-14 15:56 UTC (permalink / raw)
  To: imammedo, eblake, ehabkost; +Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel

From: Liu Jingqi <jingqi.liu@intel.com>

HMAT is defined in ACPI 6.2: 5.2.27 Heterogeneous Memory Attribute Table (HMAT).
The specification is available at:
http://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf

It describes the memory attributes, such as memory side cache
attributes and bandwidth and latency details, related to the
System Physical Address (SPA) Memory Ranges. The software is
expected to use this information as a hint for optimization.

The Memory Subsystem Address Range Structure describes the System
Physical Address (SPA) range occupied by a memory subsystem and its
association with a processor proximity domain, as well as hints for
memory usage.

Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

Changes in v4 -> v5:
    - Add more descriptions from ACPI spec (Igor)
    - Remove all dependencies on PCMachineState (Igor)
---
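
For reference, the 40-byte layout that build_hmat_spa() emits can be sketched
as a standalone encoder. This is a sketch under assumptions, not the code in
this patch: hmat_spa_encode() and put_le() are hypothetical names, and a
fixed byte buffer replaces the GArray. Fields are written little-endian, as
build_append_int_noprefix() does.

```c
#include <stddef.h>
#include <stdint.h>

#define HMAT_SPA_LEN 40  /* structure length written in the Length field */

/* Write the low `size` bytes of `value` in little-endian order. */
static size_t put_le(uint8_t *buf, size_t off, uint64_t value, int size)
{
    for (int i = 0; i < size; i++) {
        buf[off++] = (uint8_t)(value >> (8 * i));
    }
    return off;
}

/*
 * Hypothetical encoder for one Memory Subsystem Address Range
 * Structure. As in build_hmat_spa(), the same node number is used for
 * the processor and the memory proximity domain.
 * Returns the number of bytes written.
 */
size_t hmat_spa_encode(uint8_t buf[HMAT_SPA_LEN], uint16_t flags,
                       uint64_t base, uint64_t length, uint32_t node)
{
    size_t off = 0;

    off = put_le(buf, off, 0, 2);             /* Type */
    off = put_le(buf, off, 0, 2);             /* Reserved */
    off = put_le(buf, off, HMAT_SPA_LEN, 4);  /* Length */
    off = put_le(buf, off, flags, 2);         /* Flags */
    off = put_le(buf, off, 0, 2);             /* Reserved */
    off = put_le(buf, off, node, 4);          /* Processor Proximity Domain */
    off = put_le(buf, off, node, 4);          /* Memory Proximity Domain */
    off = put_le(buf, off, 0, 4);             /* Reserved */
    off = put_le(buf, off, base, 8);          /* SPA Range Base */
    off = put_le(buf, off, length, 8);        /* SPA Range Length */
    return off;
}
```

The field sizes add up to 2 + 2 + 4 + 2 + 2 + 4 + 4 + 4 + 8 + 8 = 40 bytes,
which matches the value placed in the Length field.
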
 hw/acpi/Kconfig       |   5 ++
 hw/acpi/Makefile.objs |   1 +
 hw/acpi/hmat.c        | 153 ++++++++++++++++++++++++++++++++++++++++++
 hw/acpi/hmat.h        |  43 ++++++++++++
 hw/core/machine.c     |   2 +
 hw/i386/acpi-build.c  |   3 +
 include/sysemu/numa.h |   2 +
 numa.c                |   6 ++
 8 files changed, 215 insertions(+)
 create mode 100644 hw/acpi/hmat.c
 create mode 100644 hw/acpi/hmat.h

diff --git a/hw/acpi/Kconfig b/hw/acpi/Kconfig
index 7c59cf900b..039bb99efa 100644
--- a/hw/acpi/Kconfig
+++ b/hw/acpi/Kconfig
@@ -7,6 +7,7 @@ config ACPI_X86
     select ACPI_NVDIMM
     select ACPI_CPU_HOTPLUG
     select ACPI_MEMORY_HOTPLUG
+    select ACPI_HMAT
 
 config ACPI_X86_ICH
     bool
@@ -31,3 +32,7 @@ config ACPI_VMGENID
     bool
     default y
     depends on PC
+
+config ACPI_HMAT
+    bool
+    depends on ACPI
diff --git a/hw/acpi/Makefile.objs b/hw/acpi/Makefile.objs
index 661a9b8c2f..20cc2fb124 100644
--- a/hw/acpi/Makefile.objs
+++ b/hw/acpi/Makefile.objs
@@ -6,6 +6,7 @@ common-obj-$(CONFIG_ACPI_MEMORY_HOTPLUG) += memory_hotplug.o
 common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu.o
 common-obj-$(CONFIG_ACPI_NVDIMM) += nvdimm.o
 common-obj-$(CONFIG_ACPI_VMGENID) += vmgenid.o
+common-obj-$(CONFIG_ACPI_HMAT) += hmat.o
 common-obj-$(call lnot,$(CONFIG_ACPI_X86)) += acpi-stub.o
 
 common-obj-y += acpi_interface.o
diff --git a/hw/acpi/hmat.c b/hw/acpi/hmat.c
new file mode 100644
index 0000000000..6fd434c4d9
--- /dev/null
+++ b/hw/acpi/hmat.c
@@ -0,0 +1,153 @@
+/*
+ * HMAT ACPI Implementation
+ *
+ * Copyright(C) 2019 Intel Corporation.
+ *
+ * Author:
+ *  Liu Jingqi <jingqi.liu@linux.intel.com>
+ *  Tao Xu <tao3.xu@intel.com>
+ *
+ * HMAT is defined in ACPI 6.2: 5.2.27 Heterogeneous Memory Attribute Table
+ * (HMAT)
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>
+ */
+
+#include "qemu/osdep.h"
+#include "sysemu/numa.h"
+#include "hw/acpi/hmat.h"
+#include "hw/mem/pc-dimm.h"
+
+/* ACPI 6.2: 5.2.27.3 Memory Subsystem Address Range Structure: Table 5-141 */
+static void build_hmat_spa(GArray *table_data, uint16_t flags,
+                           uint64_t base, uint64_t length, int node)
+{
+
+    /* Memory Subsystem Address Range Structure */
+    /* Type */
+    build_append_int_noprefix(table_data, 0, 2);
+    /* Reserved */
+    build_append_int_noprefix(table_data, 0, 2);
+    /* Length */
+    build_append_int_noprefix(table_data, 40, 4);
+    /* Flags */
+    build_append_int_noprefix(table_data, flags, 2);
+    /* Reserved */
+    build_append_int_noprefix(table_data, 0, 2);
+    /* Processor Proximity Domain */
+    build_append_int_noprefix(table_data, node, 4);
+    /* Memory Proximity Domain */
+    build_append_int_noprefix(table_data, node, 4);
+    /* Reserved */
+    build_append_int_noprefix(table_data, 0, 4);
+    /* System Physical Address Range Base */
+    build_append_int_noprefix(table_data, base, 8);
+    /* System Physical Address Range Length */
+    build_append_int_noprefix(table_data, length, 8);
+}
+
+static int pc_dimm_device_list(Object *obj, void *opaque)
+{
+    GSList **list = opaque;
+
+    if (object_dynamic_cast(obj, TYPE_PC_DIMM)) {
+        DeviceState *dev = DEVICE(obj);
+        if (dev->realized) { /* only realized memory devices matter */
+            *list = g_slist_append(*list, DEVICE(obj));
+        }
+    }
+
+    object_child_foreach(obj, pc_dimm_device_list, opaque);
+    return 0;
+}
+
+/* Build HMAT sub table structures */
+static void hmat_build_table_structs(GArray *table_data, MachineState *ms)
+{
+    GSList *device_list = NULL;
+    uint16_t flags;
+    uint64_t mem_base, mem_len;
+    int i;
+    NumaState *nstat = ms->numa_state;
+    NumaMemRange *mem_range;
+
+    Object *obj = object_resolve_path_type("", TYPE_ACPI_DEVICE_IF, NULL);
+    AcpiDeviceIfClass *adevc = ACPI_DEVICE_IF_GET_CLASS(obj);
+    AcpiDeviceIf *adev = ACPI_DEVICE_IF(obj);
+
+    /*
+     * ACPI 6.2: 5.2.27.3 Memory Subsystem Address Range Structure:
+     * Table 5-141. The Proximity Domain of System Physical Address
+     * ranges defined in the HMAT, NFIT and SRAT tables shall match
+     * each other.
+     */
+    if (nstat->num_nodes && !nstat->mem_ranges_num) {
+        nstat->mem_ranges = g_array_new(false, true /* clear */,
+                                        sizeof *mem_range);
+        adevc->build_mem_ranges(adev, ms);
+    }
+
+    for (i = 0; i < nstat->mem_ranges_num; i++) {
+        mem_range = &g_array_index(nstat->mem_ranges, NumaMemRange, i);
+        flags = 0;
+
+        if (nstat->nodes[mem_range->node].is_initiator) {
+            flags |= HMAT_SPA_PROC_VALID;
+        }
+        if (nstat->nodes[mem_range->node].is_target) {
+            flags |= HMAT_SPA_MEM_VALID;
+        }
+
+        build_hmat_spa(table_data, flags, mem_range->base,
+                       mem_range->length,
+                       mem_range->node);
+    }
+
+    /* Build HMAT SPA structures for PC-DIMM devices. */
+    object_child_foreach(OBJECT(ms), pc_dimm_device_list, &device_list);
+
+    for (; device_list; device_list = device_list->next) {
+        PCDIMMDevice *dimm = device_list->data;
+        mem_base = object_property_get_uint(OBJECT(dimm), PC_DIMM_ADDR_PROP,
+                                            NULL);
+        mem_len = object_property_get_uint(OBJECT(dimm), PC_DIMM_SIZE_PROP,
+                                           NULL);
+        i = object_property_get_uint(OBJECT(dimm), PC_DIMM_NODE_PROP, NULL);
+        flags = 0;
+
+        if (nstat->nodes[i].is_initiator) {
+            flags |= HMAT_SPA_PROC_VALID;
+        }
+        if (nstat->nodes[i].is_target) {
+            flags |= HMAT_SPA_MEM_VALID;
+        }
+        build_hmat_spa(table_data, flags, mem_base, mem_len, i);
+    }
+}
+
+void build_hmat(GArray *table_data, BIOSLinker *linker, MachineState *ms)
+{
+    uint64_t hmat_start;
+
+    hmat_start = table_data->len;
+
+    /* reserve space for HMAT header */
+    acpi_data_push(table_data, 40);
+
+    hmat_build_table_structs(table_data, ms);
+
+    build_header(linker, table_data,
+                 (void *)(table_data->data + hmat_start),
+                 "HMAT", table_data->len - hmat_start, 1, NULL, NULL);
+}
diff --git a/hw/acpi/hmat.h b/hw/acpi/hmat.h
new file mode 100644
index 0000000000..e24b673fad
--- /dev/null
+++ b/hw/acpi/hmat.h
@@ -0,0 +1,43 @@
+/*
+ * HMAT ACPI Implementation Header
+ *
+ * Copyright(C) 2019 Intel Corporation.
+ *
+ * Author:
+ *  Liu Jingqi <jingqi.liu@linux.intel.com>
+ *  Tao Xu <tao3.xu@intel.com>
+ *
+ * HMAT is defined in ACPI 6.2.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>
+ */
+
+#ifndef HMAT_H
+#define HMAT_H
+
+#include "hw/acpi/acpi-defs.h"
+#include "hw/acpi/acpi.h"
+#include "hw/acpi/bios-linker-loader.h"
+#include "hw/acpi/aml-build.h"
+
+/* the values of AcpiHmatSpaRange flag */
+enum {
+    HMAT_SPA_PROC_VALID       = 0x1,
+    HMAT_SPA_MEM_VALID        = 0x2,
+    HMAT_SPA_RESERVATION_HINT = 0x4,
+};
+
+void build_hmat(GArray *table_data, BIOSLinker *linker, MachineState *ms);
+
+#endif
diff --git a/hw/core/machine.c b/hw/core/machine.c
index 14b29de0a9..2ad09ec23e 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -646,6 +646,7 @@ void machine_set_cpu_numa_node(MachineState *machine,
                                const CpuInstanceProperties *props, Error **errp)
 {
     MachineClass *mc = MACHINE_GET_CLASS(machine);
+    NodeInfo *numa_info = machine->numa_state->nodes;
     bool match = false;
     int i;
 
@@ -706,6 +707,7 @@ void machine_set_cpu_numa_node(MachineState *machine,
         match = true;
         slot->props.node_id = props->node_id;
         slot->props.has_node_id = props->has_node_id;
+        numa_info[props->node_id].is_initiator = true;
     }
 
     if (!match) {
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 44dd447fa5..6584eac76e 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -66,6 +66,7 @@
 #include "hw/i386/intel_iommu.h"
 
 #include "hw/acpi/ipmi.h"
+#include "hw/acpi/hmat.h"
 
 /* These are used to size the ACPI tables for -M pc-i440fx-1.7 and
  * -M pc-i440fx-2.0.  Even if the actual amount of AML generated grows
@@ -2710,6 +2711,8 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
             acpi_add_table(table_offsets, tables_blob);
             build_slit(tables_blob, tables->linker, machine);
         }
+        acpi_add_table(table_offsets, tables_blob);
+        build_hmat(tables_blob, tables->linker, machine);
     }
     if (acpi_get_mcfg(&mcfg)) {
         acpi_add_table(table_offsets, tables_blob);
diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index e3c85b77bc..13cff59112 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -10,6 +10,8 @@ struct NodeInfo {
     uint64_t node_mem;
     struct HostMemoryBackend *node_memdev;
     bool present;
+    bool is_initiator;
+    bool is_target;
     uint8_t distance[MAX_NODES];
 };
 
diff --git a/numa.c b/numa.c
index d23e130bce..5556d118c3 100644
--- a/numa.c
+++ b/numa.c
@@ -102,6 +102,10 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
         }
     }
 
+    if (node->cpus) {
+        numa_info[nodenr].is_initiator = true;
+    }
+
     if (node->has_mem && node->has_memdev) {
         error_setg(errp, "cannot specify both mem= and memdev=");
         return;
@@ -118,6 +122,7 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
 
     if (node->has_mem) {
         numa_info[nodenr].node_mem = node->mem;
+        numa_info[nodenr].is_target = true;
     }
     if (node->has_memdev) {
         Object *o;
@@ -130,6 +135,7 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
         object_ref(o);
         numa_info[nodenr].node_mem = object_property_get_uint(o, "size", NULL);
         numa_info[nodenr].node_memdev = MEMORY_BACKEND(o);
+        numa_info[nodenr].is_target = true;
     }
     numa_info[nodenr].present = true;
     max_numa_nodeid = MAX(max_numa_nodeid, nodenr + 1);
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [Qemu-devel] [PATCH v5 7/8] hmat acpi: Build System Locality Latency and Bandwidth Information Structure(s) in ACPI HMAT
  2019-06-14 15:56 [Qemu-devel] [PATCH v5 0/8] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
                   ` (5 preceding siblings ...)
  2019-06-14 15:56 ` [Qemu-devel] [PATCH v5 6/8] hmat acpi: Build Memory Subsystem Address Range Structure(s) in ACPI HMAT Tao Xu
@ 2019-06-14 15:56 ` Tao Xu
  2019-06-14 15:56 ` [Qemu-devel] [PATCH v5 8/8] numa: Extend the command-line to provide memory latency and bandwidth information Tao Xu
  2019-07-01 13:37 ` [Qemu-devel] [PATCH v5 0/8] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Igor Mammedov
  8 siblings, 0 replies; 25+ messages in thread
From: Tao Xu @ 2019-06-14 15:56 UTC (permalink / raw)
  To: imammedo, eblake, ehabkost; +Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel

From: Liu Jingqi <jingqi.liu@intel.com>

This structure describes the memory access latency and bandwidth
information from various memory access initiator proximity domains.
The latency and bandwidth numbers represented in this structure
correspond to rated latency and bandwidth for the platform.
Software can use this information as a hint for optimization.
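
To make the layout concrete, here is a minimal sketch of the size arithmetic used by build_hmat_lb() in this patch: a 32-byte fixed part, two lists of 4-byte proximity domain numbers, and an s x t matrix of 2-byte entries written initiator by initiator. The function names hmat_lb_length() and hmat_lb_entry_offset() and the macro LB_FIXED_LEN are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes before the Initiator Proximity Domain List:
 * 2+2+4 (type, reserved, length) + 1+1+2 (flags, data type, reserved)
 * + 4+4+4 (s, t, reserved) + 8 (entry base unit) = 32.
 */
#define LB_FIXED_LEN 32u

/* Total structure length for s initiator and t target domains. */
uint32_t hmat_lb_length(uint32_t s, uint32_t t)
{
    return LB_FIXED_LEN + 4 * s + 4 * t + 2 * s * t;
}

/* Byte offset of matrix entry (i, j), counted from the structure start.
 * Entries are stored row by row: all targets for initiator 0 come first.
 */
uint32_t hmat_lb_entry_offset(uint32_t s, uint32_t t, uint32_t i, uint32_t j)
{
    assert(i < s && j < t);
    return LB_FIXED_LEN + 4 * s + 4 * t + 2 * (i * t + j);
}
```

For one initiator and two targets this gives 32 + 4 + 8 + 4 = 48 bytes, and the last entry always ends exactly at the reported length.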

Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

Changes in v5 -> v4:
    - Separate hmat_build_lb() (Igor)
    - Add more descriptions from ACPI spec (Igor)
    - Drop all global variables and use local variables instead (Igor)
---
 hw/acpi/hmat.c          | 101 +++++++++++++++++++++++++++++++++++++++-
 hw/acpi/hmat.h          |  39 ++++++++++++++++
 include/qemu/typedefs.h |   1 +
 include/sysemu/numa.h   |   3 ++
 include/sysemu/sysemu.h |  24 ++++++++++
 5 files changed, 167 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/hmat.c b/hw/acpi/hmat.c
index 6fd434c4d9..7da674825f 100644
--- a/hw/acpi/hmat.c
+++ b/hw/acpi/hmat.c
@@ -57,6 +57,74 @@ static void build_hmat_spa(GArray *table_data, uint16_t flags,
     build_append_int_noprefix(table_data, length, 8);
 }
 
+/*
+ * ACPI 6.2: 5.2.27.4 System Locality Latency and Bandwidth Information
+ * Structure: Table 5-142
+ */
+static void build_hmat_lb(GArray *table_data, HMAT_LB_Info *numa_hmat_lb,
+                          uint32_t num_initiator, uint32_t num_target,
+                          uint32_t *initiator_pxm, uint32_t *target_pxm,
+                          int type)
+{
+    uint32_t s = num_initiator;
+    uint32_t t = num_target;
+    uint8_t m, n;
+    int i, j;
+
+    /* Type */
+    build_append_int_noprefix(table_data, 1, 2);
+    /* Reserved */
+    build_append_int_noprefix(table_data, 0, 2);
+    /* Length */
+    build_append_int_noprefix(table_data, 32 + 4 * s + 4 * t + 2 * s * t, 4);
+    /* Flags */
+    build_append_int_noprefix(table_data, numa_hmat_lb->hierarchy, 1);
+    /* Data Type */
+    build_append_int_noprefix(table_data, numa_hmat_lb->data_type, 1);
+    /* Reserved */
+    build_append_int_noprefix(table_data, 0, 2);
+    /* Number of Initiator Proximity Domains (s) */
+    build_append_int_noprefix(table_data, s, 4);
+    /* Number of Target Proximity Domains (t) */
+    build_append_int_noprefix(table_data, t, 4);
+    /* Reserved */
+    build_append_int_noprefix(table_data, 0, 4);
+
+    /* Entry Base Unit */
+    if (type <= HMAT_LB_DATA_WRITE_LATENCY) {
+        build_append_int_noprefix(table_data, numa_hmat_lb->base_lat, 8);
+    } else {
+        build_append_int_noprefix(table_data, numa_hmat_lb->base_bw, 8);
+    }
+
+    /* Initiator Proximity Domain List */
+    for (i = 0; i < s; i++) {
+        build_append_int_noprefix(table_data, initiator_pxm[i], 4);
+    }
+
+    /* Target Proximity Domain List */
+    for (i = 0; i < t; i++) {
+        build_append_int_noprefix(table_data, target_pxm[i], 4);
+    }
+
+    /* Latency or Bandwidth Entries */
+    for (i = 0; i < s; i++) {
+        m = initiator_pxm[i];
+        for (j = 0; j < t; j++) {
+            n = target_pxm[j];
+            uint16_t entry;
+
+            if (type <= HMAT_LB_DATA_WRITE_LATENCY) {
+                entry = numa_hmat_lb->latency[m][n] * numa_hmat_lb->base_lat;
+            } else {
+                entry = numa_hmat_lb->bandwidth[m][n] * numa_hmat_lb->base_bw;
+            }
+
+            build_append_int_noprefix(table_data, entry, 2);
+        }
+    }
+}
+
 static int pc_dimm_device_list(Object *obj, void *opaque)
 {
     GSList **list = opaque;
@@ -77,10 +145,13 @@ static void hmat_build_table_structs(GArray *table_data, MachineState *ms)
 {
     GSList *device_list = NULL;
     uint16_t flags;
+    uint32_t num_initiator = 0, num_target = 0;
+    uint32_t initiator_pxm[MAX_NODES], target_pxm[MAX_NODES];
     uint64_t mem_base, mem_len;
-    int i;
+    int i, hrchy, type;
     NumaState *nstat = ms->numa_state;
     NumaMemRange *mem_range;
+    HMAT_LB_Info *numa_hmat_lb;
 
     Object *obj = object_resolve_path_type("", TYPE_ACPI_DEVICE_IF, NULL);
     AcpiDeviceIfClass *adevc = ACPI_DEVICE_IF_GET_CLASS(obj);
@@ -134,6 +205,34 @@ static void hmat_build_table_structs(GArray *table_data, MachineState *ms)
         }
         build_hmat_spa(table_data, flags, mem_base, mem_len, i);
     }
+
+    if (!num_initiator && !num_target) {
+        for (i = 0; i < nstat->num_nodes; i++) {
+            if (nstat->nodes[i].is_initiator) {
+                initiator_pxm[num_initiator++] = i;
+            }
+            if (nstat->nodes[i].is_target) {
+                target_pxm[num_target++] = i;
+            }
+        }
+    }
+
+    /*
+     * ACPI 6.2: 5.2.27.4 System Locality Latency and Bandwidth Information
+     * Structure: Table 5-142
+     */
+    for (hrchy = HMAT_LB_MEM_MEMORY;
+         hrchy <= HMAT_LB_MEM_CACHE_3RD_LEVEL; hrchy++) {
+        for (type = HMAT_LB_DATA_ACCESS_LATENCY;
+             type <= HMAT_LB_DATA_WRITE_BANDWIDTH; type++) {
+            numa_hmat_lb = nstat->hmat_lb[hrchy][type];
+
+            if (numa_hmat_lb) {
+                build_hmat_lb(table_data, numa_hmat_lb, num_initiator,
+                              num_target, initiator_pxm, target_pxm, type);
+            }
+        }
+    }
 }
 
 void build_hmat(GArray *table_data, BIOSLinker *linker, MachineState *ms)
diff --git a/hw/acpi/hmat.h b/hw/acpi/hmat.h
index e24b673fad..914a5e3b91 100644
--- a/hw/acpi/hmat.h
+++ b/hw/acpi/hmat.h
@@ -38,6 +38,45 @@ enum {
     HMAT_SPA_RESERVATION_HINT = 0x4,
 };
 
+struct HMAT_LB_Info {
+    /*
+     * Indicates total number of Proximity Domains
+     * that can initiate memory access requests.
+     */
+    uint32_t    num_initiator;
+    /*
+     * Indicates total number of Proximity Domains
+     * that can act as target.
+     */
+    uint32_t    num_target;
+    /*
+     * Indicates it's memory or
+     * the specified level memory side cache.
+     */
+    uint8_t     hierarchy;
+    /*
+     * Present the type of data,
+     * access/read/write latency or bandwidth.
+     */
+    uint8_t     data_type;
+    /* The base unit for latency in nanoseconds. */
+    uint64_t    base_lat;
+    /* The base unit for bandwidth in megabytes per second(MB/s). */
+    uint64_t    base_bw;
+    /*
+     * latency[i][j]:
+     * Indicates the latency based on base_lat
+     * from Initiator Proximity Domain i to Target Proximity Domain j.
+     */
+    uint16_t    latency[MAX_NODES][MAX_NODES];
+    /*
+     * bandwidth[i][j]:
+     * Indicates the bandwidth based on base_bw
+     * from Initiator Proximity Domain i to Target Proximity Domain j.
+     */
+    uint16_t    bandwidth[MAX_NODES][MAX_NODES];
+};
+
 void build_hmat(GArray *table_data, BIOSLinker *linker, MachineState *ms);
 
 #endif
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index fcdaae58c4..c0257e936b 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -33,6 +33,7 @@ typedef struct FWCfgEntry FWCfgEntry;
 typedef struct FWCfgIoState FWCfgIoState;
 typedef struct FWCfgMemState FWCfgMemState;
 typedef struct FWCfgState FWCfgState;
+typedef struct HMAT_LB_Info HMAT_LB_Info;
 typedef struct HVFX86EmulatorState HVFX86EmulatorState;
 typedef struct I2CBus I2CBus;
 typedef struct I2SCodec I2SCodec;
diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index 13cff59112..026dbeb78c 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -43,6 +43,9 @@ struct NumaState {
 
     /* NUMA memory ranges */
     GArray *mem_ranges;
+
+    /* NUMA nodes HMAT Locality Latency and Bandwidth Information */
+    HMAT_LB_Info *hmat_lb[HMAT_LB_LEVELS][HMAT_LB_TYPES];
 };
 typedef struct NumaState NumaState;
 
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 61579ae71e..85c584c531 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -124,6 +124,30 @@ extern int mem_prealloc;
 #define NUMA_DISTANCE_MAX         254
 #define NUMA_DISTANCE_UNREACHABLE 255
 
+/* the value of AcpiHmatLBInfo flags */
+enum {
+    HMAT_LB_MEM_MEMORY           = 0,
+    HMAT_LB_MEM_CACHE_LAST_LEVEL = 1,
+    HMAT_LB_MEM_CACHE_1ST_LEVEL  = 2,
+    HMAT_LB_MEM_CACHE_2ND_LEVEL  = 3,
+    HMAT_LB_MEM_CACHE_3RD_LEVEL  = 4,
+};
+
+/* the value of AcpiHmatLBInfo data type */
+enum {
+    HMAT_LB_DATA_ACCESS_LATENCY   = 0,
+    HMAT_LB_DATA_READ_LATENCY     = 1,
+    HMAT_LB_DATA_WRITE_LATENCY    = 2,
+    HMAT_LB_DATA_ACCESS_BANDWIDTH = 3,
+    HMAT_LB_DATA_READ_BANDWIDTH   = 4,
+    HMAT_LB_DATA_WRITE_BANDWIDTH  = 5,
+};
+
+#define MAX_HMAT_CACHE_LEVEL        3
+
+#define HMAT_LB_LEVELS    (HMAT_LB_MEM_CACHE_3RD_LEVEL + 1)
+#define HMAT_LB_TYPES     (HMAT_LB_DATA_WRITE_BANDWIDTH + 1)
+
 #define MAX_OPTION_ROMS 16
 typedef struct QEMUOptionRom {
     const char *name;
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [Qemu-devel] [PATCH v5 8/8] numa: Extend the command-line to provide memory latency and bandwidth information
  2019-06-14 15:56 [Qemu-devel] [PATCH v5 0/8] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
                   ` (6 preceding siblings ...)
  2019-06-14 15:56 ` [Qemu-devel] [PATCH v5 7/8] hmat acpi: Build System Locality Latency and Bandwidth Information " Tao Xu
@ 2019-06-14 15:56 ` Tao Xu
  2019-07-01 13:37 ` [Qemu-devel] [PATCH v5 0/8] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Igor Mammedov
  8 siblings, 0 replies; 25+ messages in thread
From: Tao Xu @ 2019-06-14 15:56 UTC (permalink / raw)
  To: imammedo, eblake, ehabkost; +Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel

From: Liu Jingqi <jingqi.liu@intel.com>

Add -numa hmat-lb option to provide System Locality Latency and
Bandwidth Information. These memory attributes help to build
System Locality Latency and Bandwidth Information Structure(s)
in ACPI Heterogeneous Memory Attribute Table (HMAT).
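
The option parser in this patch first checks that the supplied values agree with the chosen data type: latency types require 'latency' and reject the bandwidth options, and the reverse holds for bandwidth types. The sketch below is a simplified model of that rule only; struct lb_opts, the LB_* enumerators and lb_opts_check() are hypothetical names, and the real code reports errors through error_setg() rather than returning strings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the data types, in the same order as the patch. */
enum lb_data_type {
    LB_ACCESS_LATENCY,
    LB_READ_LATENCY,
    LB_WRITE_LATENCY,
    LB_ACCESS_BANDWIDTH,
    LB_READ_BANDWIDTH,
    LB_WRITE_BANDWIDTH,
};

/* Hypothetical view of which optional values were given. */
struct lb_opts {
    enum lb_data_type data_type;
    bool has_latency;
    bool has_bandwidth;
    bool has_base_lat;
    bool has_base_bw;
};

/* Returns NULL if the combination is consistent, else a short reason. */
const char *lb_opts_check(const struct lb_opts *o)
{
    assert(o != NULL);

    if (o->data_type <= LB_WRITE_LATENCY) {
        if (!o->has_latency) {
            return "missing 'latency'";
        }
        if (o->has_bandwidth || o->has_base_bw) {
            return "bandwidth option given for a latency data type";
        }
    } else {
        if (!o->has_bandwidth) {
            return "missing 'bandwidth'";
        }
        if (o->has_latency || o->has_base_lat) {
            return "latency option given for a bandwidth data type";
        }
    }
    return NULL;
}
```

The range checks on initiator and target, the duplicate-entry check and the minimum base unit of 10 are left out of this sketch.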

Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

Changes in v5 -> v4:
    - Add error message when base unit < 10
    - Add more descriptions about option hmat-lb (Igor)
    - Fix some spelling errors
    - Update the hmat-lb option example by using '-numa cpu'
    and '-numa memdev' (Igor)
---
 numa.c          | 135 ++++++++++++++++++++++++++++++++++++++++++++++++
 qapi/misc.json  |  94 ++++++++++++++++++++++++++++++++-
 qemu-options.hx |  45 +++++++++++++++-
 3 files changed, 271 insertions(+), 3 deletions(-)

diff --git a/numa.c b/numa.c
index 5556d118c3..ca9d99743a 100644
--- a/numa.c
+++ b/numa.c
@@ -40,6 +40,7 @@
 #include "qemu/option.h"
 #include "qemu/config-file.h"
 #include "qemu/cutils.h"
+#include "hw/acpi/hmat.h"
 
 QemuOptsList qemu_numa_opts = {
     .name = "numa",
@@ -179,6 +180,134 @@ void parse_numa_distance(MachineState *ms, NumaDistOptions *dist, Error **errp)
     ms->numa_state->have_numa_distance = true;
 }
 
+static void parse_numa_hmat_lb(MachineState *ms, NumaHmatLBOptions *node,
+                               Error **errp)
+{
+    int nb_numa_nodes = ms->numa_state->num_nodes;
+    NodeInfo *numa_info = ms->numa_state->nodes;
+    HMAT_LB_Info *hmat_lb = NULL;
+
+    if (node->data_type <= HMATLB_DATA_TYPE_WRITE_LATENCY) {
+        if (!node->has_latency) {
+            error_setg(errp, "Missing 'latency' option.");
+            return;
+        }
+        if (node->has_bandwidth) {
+            error_setg(errp, "Invalid option 'bandwidth' since "
+                       "the data type is latency.");
+            return;
+        }
+        if (node->has_base_bw) {
+            error_setg(errp, "Invalid option 'base_bw' since "
+                       "the data type is latency.");
+            return;
+        }
+    }
+
+    if (node->data_type >= HMATLB_DATA_TYPE_ACCESS_BANDWIDTH) {
+        if (!node->has_bandwidth) {
+            error_setg(errp, "Missing 'bandwidth' option.");
+            return;
+        }
+        if (node->has_latency) {
+            error_setg(errp, "Invalid option 'latency' since "
+                       "the data type is bandwidth.");
+            return;
+        }
+        if (node->has_base_lat) {
+            error_setg(errp, "Invalid option 'base_lat' since "
+                       "the data type is bandwidth.");
+            return;
+        }
+    }
+
+    if (node->initiator >= nb_numa_nodes) {
+        error_setg(errp, "Invalid initiator=%"
+                   PRIu16 ", it should be less than %d.",
+                   node->initiator, nb_numa_nodes);
+        return;
+    }
+    if (!numa_info[node->initiator].is_initiator) {
+        error_setg(errp, "Invalid initiator=%"
+                   PRIu16 ", it isn't an initiator proximity domain.",
+                   node->initiator);
+        return;
+    }
+
+    if (node->target >= nb_numa_nodes) {
+        error_setg(errp, "Invalid target=%"
+                   PRIu16 ", it should be less than %d.",
+                   node->target, nb_numa_nodes);
+        return;
+    }
+    if (!numa_info[node->target].is_target) {
+        error_setg(errp, "Invalid target=%"
+                   PRIu16 ", it isn't a target proximity domain.",
+                   node->target);
+        return;
+    }
+
+    if (node->has_latency) {
+        hmat_lb = ms->numa_state->hmat_lb[node->hierarchy][node->data_type];
+
+        if (!hmat_lb) {
+            hmat_lb = g_malloc0(sizeof(*hmat_lb));
+            ms->numa_state->hmat_lb[node->hierarchy][node->data_type] = hmat_lb;
+        } else if (hmat_lb->latency[node->initiator][node->target]) {
+            error_setg(errp, "Duplicate configuration of the latency for "
+                       "initiator=%" PRIu16 " and target=%" PRIu16 ".",
+                       node->initiator, node->target);
+            return;
+        }
+
+        /* Only the first time of setting the base unit is valid. */
+        if ((hmat_lb->base_lat == 0) && (node->has_base_lat)) {
+            if (node->base_lat >= 10) {
+                hmat_lb->base_lat = node->base_lat;
+            } else {
+                error_setg(errp, "The minimum latency base unit is 10.");
+                return;
+            }
+        }
+
+        hmat_lb->latency[node->initiator][node->target] = node->latency;
+    }
+
+    if (node->has_bandwidth) {
+        hmat_lb = ms->numa_state->hmat_lb[node->hierarchy][node->data_type];
+
+        if (!hmat_lb) {
+            hmat_lb = g_malloc0(sizeof(*hmat_lb));
+            ms->numa_state->hmat_lb[node->hierarchy][node->data_type] = hmat_lb;
+        } else if (hmat_lb->bandwidth[node->initiator][node->target]) {
+            error_setg(errp, "Duplicate configuration of the bandwidth for "
+                       "initiator=%" PRIu16 " and target=%" PRIu16 ".",
+                       node->initiator, node->target);
+            return;
+        }
+
+        /* Only the first time of setting the base unit is valid. */
+        if (hmat_lb->base_bw == 0) {
+            if (!node->has_base_bw) {
+                error_setg(errp, "Missing 'base-bw' option");
+                return;
+            } else if (node->base_bw < 10) {
+                error_setg(errp, "The minimum bandwidth base unit is 10.");
+                return;
+            } else {
+                hmat_lb->base_bw = node->base_bw;
+            }
+        }
+
+        hmat_lb->bandwidth[node->initiator][node->target] = node->bandwidth;
+    }
+
+    if (hmat_lb) {
+        hmat_lb->hierarchy = node->hierarchy;
+        hmat_lb->data_type = node->data_type;
+    }
+}
+
 static
 void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
 {
@@ -218,6 +347,12 @@ void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
         machine_set_cpu_numa_node(ms, qapi_NumaCpuOptions_base(&object->u.cpu),
                                   &err);
         break;
+    case NUMA_OPTIONS_TYPE_HMAT_LB:
+        parse_numa_hmat_lb(ms, &object->u.hmat_lb, &err);
+        if (err) {
+            goto end;
+        }
+        break;
     default:
         abort();
     }
diff --git a/qapi/misc.json b/qapi/misc.json
index 8b3ca4fdd3..a3fe411137 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -2539,10 +2539,12 @@
 #
 # @cpu: property based CPU(s) to node mapping (Since: 2.10)
 #
+# @hmat-lb: memory latency and bandwidth information (Since: 4.1)
+#
 # Since: 2.1
 ##
 { 'enum': 'NumaOptionsType',
-  'data': [ 'node', 'dist', 'cpu' ] }
+  'data': [ 'node', 'dist', 'cpu', 'hmat-lb' ] }
 
 ##
 # @NumaOptions:
@@ -2557,7 +2559,8 @@
   'data': {
     'node': 'NumaNodeOptions',
     'dist': 'NumaDistOptions',
-    'cpu': 'NumaCpuOptions' }}
+    'cpu': 'NumaCpuOptions',
+    'hmat-lb': 'NumaHmatLBOptions' }}
 
 ##
 # @NumaNodeOptions:
@@ -2620,6 +2623,93 @@
    'base': 'CpuInstanceProperties',
    'data' : {} }
 
+##
+# @HmatLBMemoryHierarchy:
+#
+# The memory hierarchy in the System Locality Latency
+# and Bandwidth Information Structure of HMAT (Heterogeneous
+# Memory Attribute Table)
+#
+# @memory: the structure represents the memory performance
+#
+# @last-level: last level of memory side cache
+#
+# @first-level: first level of memory side cache
+#
+# @second-level: second level of memory side cache
+#
+# @third-level: third level of memory side cache
+#
+# Since: 4.1
+##
+{ 'enum': 'HmatLBMemoryHierarchy',
+  'data': [ 'memory', 'last-level', 'first-level',
+            'second-level', 'third-level' ] }
+
+##
+# @HmatLBDataType:
+#
+# Data type in the System Locality Latency
+# and Bandwidth Information Structure of HMAT (Heterogeneous
+# Memory Attribute Table)
+#
+# @access-latency: access latency (nanoseconds)
+#
+# @read-latency: read latency (nanoseconds)
+#
+# @write-latency: write latency (nanoseconds)
+#
+# @access-bandwidth: access bandwidth (MB/s)
+#
+# @read-bandwidth: read bandwidth (MB/s)
+#
+# @write-bandwidth: write bandwidth (MB/s)
+#
+# Since: 4.1
+##
+{ 'enum': 'HmatLBDataType',
+  'data': [ 'access-latency', 'read-latency', 'write-latency',
+            'access-bandwidth', 'read-bandwidth', 'write-bandwidth' ] }
+
+##
+# @NumaHmatLBOptions:
+#
+# Set the system locality latency and bandwidth information
+# between Initiator and Target proximity Domains.
+#
+# @initiator: the Initiator Proximity Domain.
+#
+# @target: the Target Proximity Domain.
+#
+# @hierarchy: the Memory Hierarchy. Indicates the performance
+#             of memory or side cache.
+#
+# @data-type: the type of data: access/read/write
+#             latency or bandwidth.
+#
+# @base-lat: the base unit for latency in nanoseconds.
+#
+# @base-bw: the base unit for bandwidth in megabytes per second(MB/s).
+#
+# @latency: the value of latency based on Base Unit from @initiator
+#           to @target proximity domain.
+#
+# @bandwidth: the value of bandwidth based on Base Unit between
+#             @initiator and @target proximity domain.
+#
+# Since: 4.1
+##
+{ 'struct': 'NumaHmatLBOptions',
+    'data': {
+    'initiator': 'uint16',
+    'target': 'uint16',
+    'hierarchy': 'HmatLBMemoryHierarchy',
+    'data-type': 'HmatLBDataType',
+    '*base-lat': 'uint64',
+    '*base-bw': 'uint64',
+    '*latency': 'uint16',
+    '*bandwidth': 'uint16' }}
+
 ##
 # @HostMemPolicy:
 #
diff --git a/qemu-options.hx b/qemu-options.hx
index 0d8beb4afd..4179be516f 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -163,16 +163,19 @@ DEF("numa", HAS_ARG, QEMU_OPTION_numa,
     "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node]\n"
     "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node]\n"
     "-numa dist,src=source,dst=destination,val=distance\n"
-    "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n",
+    "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n"
+    "-numa hmat-lb,initiator=node,target=node,hierarchy=memory|last-level|first-level|second-level|third-level,data-type=access-latency|read-latency|write-latency|access-bandwidth|read-bandwidth|write-bandwidth[,base-lat=blat][,base-bw=bbw][,latency=lat][,bandwidth=bw]\n",
     QEMU_ARCH_ALL)
 STEXI
 @item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}]
 @itemx -numa node[,memdev=@var{id}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}]
 @itemx -numa dist,src=@var{source},dst=@var{destination},val=@var{distance}
 @itemx -numa cpu,node-id=@var{node}[,socket-id=@var{x}][,core-id=@var{y}][,thread-id=@var{z}]
+@itemx -numa hmat-lb,initiator=@var{node},target=@var{node},hierarchy=@var{str},data-type=@var{str}[,base-lat=@var{blat}][,base-bw=@var{bbw}][,latency=@var{lat}][,bandwidth=@var{bw}]
 @findex -numa
 Define a NUMA node and assign RAM and VCPUs to it.
 Set the NUMA distance from a source node to a destination node.
+Set the ACPI Heterogeneous Memory Attributes for the given nodes.
 
 Legacy VCPU assignment uses @samp{cpus} option where
 @var{firstcpu} and @var{lastcpu} are CPU indexes. Each
@@ -230,6 +233,46 @@ specified resources, it just assigns existing resources to NUMA
 nodes. This means that one still has to use the @option{-m},
 @option{-smp} options to allocate RAM and VCPUs respectively.
 
+Use @samp{hmat-lb} to set System Locality Latency and Bandwidth Information
+between initiator and target NUMA nodes in the ACPI Heterogeneous Memory
+Attribute Table (HMAT).
+An initiator NUMA node can create memory requests and usually includes one or more processors.
+A target NUMA node contains addressable memory.
+
+In @samp{hmat-lb} option, @var{node} are NUMA node IDs. @var{str} of 'hierarchy'
+is the memory hierarchy of the target NUMA node: if @var{str} is 'memory', the structure
+represents the memory performance; if @var{str} is 'last-level|first-level|second-level|third-level',
+this structure represents aggregated performance of memory side caches for each domain.
+@var{str} of 'data-type' is the type of data represented by this structure instance:
+if 'hierarchy' is 'memory', 'data-type' is 'access|read|write' latency (nanoseconds)
+or 'access|read|write' bandwidth (MB/s) of the target memory; if 'hierarchy' is
+'last-level|first-level|second-level|third-level', 'data-type' is 'access|read|write' hit latency (nanoseconds)
+or 'access|read|write' hit bandwidth of the target memory side cache. @var{blat}
+or @var{bbw} is the base unit used for normalizing the matrix entry values
+(which store the latency or bandwidth values). The base unit for latency
+is in nanoseconds; the base unit for bandwidth is in megabytes per second (MB/s).
+Note: because the minimum matrix entry value is 10, the base unit corresponds
+to a value of 10. @var{blat} and @var{bbw} must be integers. @var{lat} or
+@var{bw} is the latency/bandwidth value.
+
+For example, the following options define NUMA nodes 0 and 1. Node 0 has 2 CPUs and
+RAM, node 1 has only RAM. The processors in node 0 access memory in node
+0 with access-latency 5 nanoseconds (base latency is 10) and access-bandwidth 5 MB/s (base bandwidth is 20);
+the processors in NUMA node 0 access memory in NUMA node 1 with access-latency 10
+nanoseconds (base latency is 10) and access-bandwidth 10 MB/s (base bandwidth is 20):
+@example
+-m 2G \
+-object memory-backend-ram,size=1024M,policy=bind,host-nodes=0,id=ram-node0 -numa node,nodeid=0,memdev=ram-node0 \
+-object memory-backend-ram,size=1024M,policy=bind,host-nodes=1,id=ram-node1 -numa node,nodeid=1,memdev=ram-node1 \
+-smp 2 \
+-numa cpu,node-id=0,socket-id=0 \
+-numa cpu,node-id=0,socket-id=1 \
+-numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,base-lat=10,latency=5 \
+-numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,base-bw=20,bandwidth=5 \
+-numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,base-lat=10,latency=10 \
+-numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,base-bw=20,bandwidth=10
+@end example
+
 ETEXI
 
 DEF("add-fd", HAS_ARG, QEMU_OPTION_add_fd,
-- 
2.20.1




* Re: [Qemu-devel] [PATCH v5 1/8] hw/arm: simplify arm_load_dtb
  2019-06-14 15:56 ` [Qemu-devel] [PATCH v5 1/8] hw/arm: simplify arm_load_dtb Tao Xu
@ 2019-06-27 12:42   ` Igor Mammedov
  0 siblings, 0 replies; 25+ messages in thread
From: Igor Mammedov @ 2019-06-27 12:42 UTC (permalink / raw)
  To: Tao Xu; +Cc: jingqi.liu, fan.du, ehabkost, qemu-devel

On Fri, 14 Jun 2019 23:56:19 +0800
Tao Xu <tao3.xu@intel.com> wrote:

> In struct arm_boot_info, kernel_filename, initrd_filename and
> kernel_cmdline are copied from MachineState. This patch adds
> MachineState as a parameter to arm_load_dtb() and moves the copying
> of kernel_filename, initrd_filename and kernel_cmdline into
> arm_load_kernel().
> 
> Reviewed-by: Liu Jingqi <jingqi.liu@intel.com>
> Suggested-by: Igor Mammedov <imammedo@redhat.com>
> Signed-off-by: Tao Xu <tao3.xu@intel.com>

Reviewed-by: Igor Mammedov <imammedo@redhat.com>

> ---
>  hw/arm/aspeed.c           |  5 +----
>  hw/arm/boot.c             | 14 ++++++++------
>  hw/arm/collie.c           |  8 +-------
>  hw/arm/cubieboard.c       |  5 +----
>  hw/arm/exynos4_boards.c   |  7 ++-----
>  hw/arm/highbank.c         |  8 +-------
>  hw/arm/imx25_pdk.c        |  5 +----
>  hw/arm/integratorcp.c     |  8 +-------
>  hw/arm/kzm.c              |  5 +----
>  hw/arm/mainstone.c        |  5 +----
>  hw/arm/mcimx6ul-evk.c     |  5 +----
>  hw/arm/mcimx7d-sabre.c    |  5 +----
>  hw/arm/musicpal.c         |  8 +-------
>  hw/arm/nseries.c          |  5 +----
>  hw/arm/omap_sx1.c         |  5 +----
>  hw/arm/palm.c             | 10 ++--------
>  hw/arm/raspi.c            |  6 +-----
>  hw/arm/realview.c         |  5 +----
>  hw/arm/sabrelite.c        |  5 +----
>  hw/arm/spitz.c            |  5 +----
>  hw/arm/tosa.c             |  8 +-------
>  hw/arm/versatilepb.c      |  5 +----
>  hw/arm/vexpress.c         |  5 +----
>  hw/arm/virt.c             |  8 +++-----
>  hw/arm/xilinx_zynq.c      |  8 +-------
>  hw/arm/xlnx-versal-virt.c |  7 ++-----
>  hw/arm/xlnx-zcu102.c      |  5 +----
>  hw/arm/z2.c               |  8 +-------
>  include/hw/arm/boot.h     |  4 ++--
>  29 files changed, 42 insertions(+), 145 deletions(-)
> 
> diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
> index 33070a6df8..8b9fb606c0 100644
> --- a/hw/arm/aspeed.c
> +++ b/hw/arm/aspeed.c
> @@ -226,9 +226,6 @@ static void aspeed_board_init(MachineState *machine,
>          write_boot_rom(drive0, FIRMWARE_ADDR, fl->size, &error_abort);
>      }
>  
> -    aspeed_board_binfo.kernel_filename = machine->kernel_filename;
> -    aspeed_board_binfo.initrd_filename = machine->initrd_filename;
> -    aspeed_board_binfo.kernel_cmdline = machine->kernel_cmdline;
>      aspeed_board_binfo.ram_size = ram_size;
>      aspeed_board_binfo.loader_start = sc->info->sdram_base;
>  
> @@ -236,7 +233,7 @@ static void aspeed_board_init(MachineState *machine,
>          cfg->i2c_init(bmc);
>      }
>  
> -    arm_load_kernel(ARM_CPU(first_cpu), &aspeed_board_binfo);
> +    arm_load_kernel(ARM_CPU(first_cpu), machine, &aspeed_board_binfo);
>  }
>  
>  static void palmetto_bmc_i2c_init(AspeedBoardState *bmc)
> diff --git a/hw/arm/boot.c b/hw/arm/boot.c
> index 7279185bd9..30acdbe824 100644
> --- a/hw/arm/boot.c
> +++ b/hw/arm/boot.c
> @@ -523,7 +523,7 @@ static void fdt_add_psci_node(void *fdt)
>  }
>  
>  int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
> -                 hwaddr addr_limit, AddressSpace *as)
> +                 hwaddr addr_limit, AddressSpace *as, MachineState *ms)
>  {
>      void *fdt = NULL;
>      int size, rc, n = 0;
> @@ -626,9 +626,9 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
>          qemu_fdt_add_subnode(fdt, "/chosen");
>      }
>  
> -    if (binfo->kernel_cmdline && *binfo->kernel_cmdline) {
> +    if (ms->kernel_cmdline && *ms->kernel_cmdline) {
>          rc = qemu_fdt_setprop_string(fdt, "/chosen", "bootargs",
> -                                     binfo->kernel_cmdline);
> +                                     ms->kernel_cmdline);
>          if (rc < 0) {
>              fprintf(stderr, "couldn't set /chosen/bootargs\n");
>              goto fail;
> @@ -1201,7 +1201,7 @@ static void arm_setup_firmware_boot(ARMCPU *cpu, struct arm_boot_info *info)
>       */
>  }
>  
> -void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
> +void arm_load_kernel(ARMCPU *cpu, MachineState *ms, struct arm_boot_info *info)
>  {
>      CPUState *cs;
>      AddressSpace *as = arm_boot_address_space(cpu, info);
> @@ -1222,7 +1222,9 @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
>       * doesn't support secure.
>       */
>      assert(!(info->secure_board_setup && kvm_enabled()));
> -
> +    info->kernel_filename = ms->kernel_filename;
> +    info->kernel_cmdline = ms->kernel_cmdline;
> +    info->initrd_filename = ms->initrd_filename;
>      info->dtb_filename = qemu_opt_get(qemu_get_machine_opts(), "dtb");
>      info->dtb_limit = 0;
>  
> @@ -1234,7 +1236,7 @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
>      }
>  
>      if (!info->skip_dtb_autoload && have_dtb(info)) {
> -        if (arm_load_dtb(info->dtb_start, info, info->dtb_limit, as) < 0) {
> +        if (arm_load_dtb(info->dtb_start, info, info->dtb_limit, as, ms) < 0) {
>              exit(1);
>          }
>      }
> diff --git a/hw/arm/collie.c b/hw/arm/collie.c
> index 3db3c56004..72bc8f26e5 100644
> --- a/hw/arm/collie.c
> +++ b/hw/arm/collie.c
> @@ -26,9 +26,6 @@ static struct arm_boot_info collie_binfo = {
>  
>  static void collie_init(MachineState *machine)
>  {
> -    const char *kernel_filename = machine->kernel_filename;
> -    const char *kernel_cmdline = machine->kernel_cmdline;
> -    const char *initrd_filename = machine->initrd_filename;
>      StrongARMState *s;
>      DriveInfo *dinfo;
>      MemoryRegion *sysmem = get_system_memory();
> @@ -47,11 +44,8 @@ static void collie_init(MachineState *machine)
>  
>      sysbus_create_simple("scoop", 0x40800000, NULL);
>  
> -    collie_binfo.kernel_filename = kernel_filename;
> -    collie_binfo.kernel_cmdline = kernel_cmdline;
> -    collie_binfo.initrd_filename = initrd_filename;
>      collie_binfo.board_id = 0x208;
> -    arm_load_kernel(s->cpu, &collie_binfo);
> +    arm_load_kernel(s->cpu, machine, &collie_binfo);
>  }
>  
>  static void collie_machine_init(MachineClass *mc)
> diff --git a/hw/arm/cubieboard.c b/hw/arm/cubieboard.c
> index 84187d3916..2f82a77dbd 100644
> --- a/hw/arm/cubieboard.c
> +++ b/hw/arm/cubieboard.c
> @@ -73,10 +73,7 @@ static void cubieboard_init(MachineState *machine)
>      /* TODO create and connect IDE devices for ide_drive_get() */
>  
>      cubieboard_binfo.ram_size = machine->ram_size;
> -    cubieboard_binfo.kernel_filename = machine->kernel_filename;
> -    cubieboard_binfo.kernel_cmdline = machine->kernel_cmdline;
> -    cubieboard_binfo.initrd_filename = machine->initrd_filename;
> -    arm_load_kernel(&s->a10->cpu, &cubieboard_binfo);
> +    arm_load_kernel(&s->a10->cpu, machine, &cubieboard_binfo);
>  }
>  
>  static void cubieboard_machine_init(MachineClass *mc)
> diff --git a/hw/arm/exynos4_boards.c b/hw/arm/exynos4_boards.c
> index 71f58586c1..25c1fb40a9 100644
> --- a/hw/arm/exynos4_boards.c
> +++ b/hw/arm/exynos4_boards.c
> @@ -121,9 +121,6 @@ exynos4_boards_init_common(MachineState *machine,
>      exynos4_board_binfo.board_id = exynos4_board_id[board_type];
>      exynos4_board_binfo.smp_bootreg_addr =
>              exynos4_board_smp_bootreg_addr[board_type];
> -    exynos4_board_binfo.kernel_filename = machine->kernel_filename;
> -    exynos4_board_binfo.initrd_filename = machine->initrd_filename;
> -    exynos4_board_binfo.kernel_cmdline = machine->kernel_cmdline;
>      exynos4_board_binfo.gic_cpu_if_addr =
>              EXYNOS4210_SMP_PRIVATE_BASE_ADDR + 0x100;
>  
> @@ -142,7 +139,7 @@ static void nuri_init(MachineState *machine)
>  {
>      exynos4_boards_init_common(machine, EXYNOS4_BOARD_NURI);
>  
> -    arm_load_kernel(ARM_CPU(first_cpu), &exynos4_board_binfo);
> +    arm_load_kernel(ARM_CPU(first_cpu), machine, &exynos4_board_binfo);
>  }
>  
>  static void smdkc210_init(MachineState *machine)
> @@ -152,7 +149,7 @@ static void smdkc210_init(MachineState *machine)
>  
>      lan9215_init(SMDK_LAN9118_BASE_ADDR,
>              qemu_irq_invert(s->soc.irq_table[exynos4210_get_irq(37, 1)]));
> -    arm_load_kernel(ARM_CPU(first_cpu), &exynos4_board_binfo);
> +    arm_load_kernel(ARM_CPU(first_cpu), machine, &exynos4_board_binfo);
>  }
>  
>  static void nuri_class_init(ObjectClass *oc, void *data)
> diff --git a/hw/arm/highbank.c b/hw/arm/highbank.c
> index a89a1d3a7c..0b2603b774 100644
> --- a/hw/arm/highbank.c
> +++ b/hw/arm/highbank.c
> @@ -233,9 +233,6 @@ enum cxmachines {
>  static void calxeda_init(MachineState *machine, enum cxmachines machine_id)
>  {
>      ram_addr_t ram_size = machine->ram_size;
> -    const char *kernel_filename = machine->kernel_filename;
> -    const char *kernel_cmdline = machine->kernel_cmdline;
> -    const char *initrd_filename = machine->initrd_filename;
>      DeviceState *dev = NULL;
>      SysBusDevice *busdev;
>      qemu_irq pic[128];
> @@ -386,9 +383,6 @@ static void calxeda_init(MachineState *machine, enum cxmachines machine_id)
>      /* TODO create and connect IDE devices for ide_drive_get() */
>  
>      highbank_binfo.ram_size = ram_size;
> -    highbank_binfo.kernel_filename = kernel_filename;
> -    highbank_binfo.kernel_cmdline = kernel_cmdline;
> -    highbank_binfo.initrd_filename = initrd_filename;
>      /* highbank requires a dtb in order to boot, and the dtb will override
>       * the board ID. The following value is ignored, so set it to -1 to be
>       * clear that the value is meaningless.
> @@ -408,7 +402,7 @@ static void calxeda_init(MachineState *machine, enum cxmachines machine_id)
>                      "may not boot.");
>      }
>  
> -    arm_load_kernel(ARM_CPU(first_cpu), &highbank_binfo);
> +    arm_load_kernel(ARM_CPU(first_cpu), machine, &highbank_binfo);
>  }
>  
>  static void highbank_init(MachineState *machine)
> diff --git a/hw/arm/imx25_pdk.c b/hw/arm/imx25_pdk.c
> index a0423ffb67..5101201f53 100644
> --- a/hw/arm/imx25_pdk.c
> +++ b/hw/arm/imx25_pdk.c
> @@ -117,9 +117,6 @@ static void imx25_pdk_init(MachineState *machine)
>      }
>  
>      imx25_pdk_binfo.ram_size = machine->ram_size;
> -    imx25_pdk_binfo.kernel_filename = machine->kernel_filename;
> -    imx25_pdk_binfo.kernel_cmdline = machine->kernel_cmdline;
> -    imx25_pdk_binfo.initrd_filename = machine->initrd_filename;
>      imx25_pdk_binfo.loader_start = FSL_IMX25_SDRAM0_ADDR;
>      imx25_pdk_binfo.board_id = 1771,
>      imx25_pdk_binfo.nb_cpus = 1;
> @@ -130,7 +127,7 @@ static void imx25_pdk_init(MachineState *machine)
>       * fail.
>       */
>      if (!qtest_enabled()) {
> -        arm_load_kernel(&s->soc.cpu, &imx25_pdk_binfo);
> +        arm_load_kernel(&s->soc.cpu, machine, &imx25_pdk_binfo);
>      }
>  }
>  
> diff --git a/hw/arm/integratorcp.c b/hw/arm/integratorcp.c
> index d18caab8bd..95df650d8e 100644
> --- a/hw/arm/integratorcp.c
> +++ b/hw/arm/integratorcp.c
> @@ -579,9 +579,6 @@ static struct arm_boot_info integrator_binfo = {
>  static void integratorcp_init(MachineState *machine)
>  {
>      ram_addr_t ram_size = machine->ram_size;
> -    const char *kernel_filename = machine->kernel_filename;
> -    const char *kernel_cmdline = machine->kernel_cmdline;
> -    const char *initrd_filename = machine->initrd_filename;
>      Object *cpuobj;
>      ARMCPU *cpu;
>      MemoryRegion *address_space_mem = get_system_memory();
> @@ -651,10 +648,7 @@ static void integratorcp_init(MachineState *machine)
>      sysbus_create_simple("pl110", 0xc0000000, pic[22]);
>  
>      integrator_binfo.ram_size = ram_size;
> -    integrator_binfo.kernel_filename = kernel_filename;
> -    integrator_binfo.kernel_cmdline = kernel_cmdline;
> -    integrator_binfo.initrd_filename = initrd_filename;
> -    arm_load_kernel(cpu, &integrator_binfo);
> +    arm_load_kernel(cpu, machine, &integrator_binfo);
>  }
>  
>  static void integratorcp_machine_init(MachineClass *mc)
> diff --git a/hw/arm/kzm.c b/hw/arm/kzm.c
> index 44cba8782b..a867d06ec7 100644
> --- a/hw/arm/kzm.c
> +++ b/hw/arm/kzm.c
> @@ -127,13 +127,10 @@ static void kzm_init(MachineState *machine)
>      }
>  
>      kzm_binfo.ram_size = machine->ram_size;
> -    kzm_binfo.kernel_filename = machine->kernel_filename;
> -    kzm_binfo.kernel_cmdline = machine->kernel_cmdline;
> -    kzm_binfo.initrd_filename = machine->initrd_filename;
>      kzm_binfo.nb_cpus = 1;
>  
>      if (!qtest_enabled()) {
> -        arm_load_kernel(&s->soc.cpu, &kzm_binfo);
> +        arm_load_kernel(&s->soc.cpu, machine, &kzm_binfo);
>      }
>  }
>  
> diff --git a/hw/arm/mainstone.c b/hw/arm/mainstone.c
> index cd1f904c6c..c76cfb5dd1 100644
> --- a/hw/arm/mainstone.c
> +++ b/hw/arm/mainstone.c
> @@ -177,11 +177,8 @@ static void mainstone_common_init(MemoryRegion *address_space_mem,
>      smc91c111_init(&nd_table[0], MST_ETH_PHYS,
>                      qdev_get_gpio_in(mst_irq, ETHERNET_IRQ));
>  
> -    mainstone_binfo.kernel_filename = machine->kernel_filename;
> -    mainstone_binfo.kernel_cmdline = machine->kernel_cmdline;
> -    mainstone_binfo.initrd_filename = machine->initrd_filename;
>      mainstone_binfo.board_id = arm_id;
> -    arm_load_kernel(mpu->cpu, &mainstone_binfo);
> +    arm_load_kernel(mpu->cpu, machine, &mainstone_binfo);
>  }
>  
>  static void mainstone_init(MachineState *machine)
> diff --git a/hw/arm/mcimx6ul-evk.c b/hw/arm/mcimx6ul-evk.c
> index fb2b015bf6..1f0fed37c0 100644
> --- a/hw/arm/mcimx6ul-evk.c
> +++ b/hw/arm/mcimx6ul-evk.c
> @@ -40,9 +40,6 @@ static void mcimx6ul_evk_init(MachineState *machine)
>          .loader_start = FSL_IMX6UL_MMDC_ADDR,
>          .board_id = -1,
>          .ram_size = machine->ram_size,
> -        .kernel_filename = machine->kernel_filename,
> -        .kernel_cmdline = machine->kernel_cmdline,
> -        .initrd_filename = machine->initrd_filename,
>          .nb_cpus = smp_cpus,
>      };
>  
> @@ -72,7 +69,7 @@ static void mcimx6ul_evk_init(MachineState *machine)
>      }
>  
>      if (!qtest_enabled()) {
> -        arm_load_kernel(&s->soc.cpu[0], &boot_info);
> +        arm_load_kernel(&s->soc.cpu[0], machine, &boot_info);
>      }
>  }
>  
> diff --git a/hw/arm/mcimx7d-sabre.c b/hw/arm/mcimx7d-sabre.c
> index 9c5f0e70c3..accc731cf9 100644
> --- a/hw/arm/mcimx7d-sabre.c
> +++ b/hw/arm/mcimx7d-sabre.c
> @@ -43,9 +43,6 @@ static void mcimx7d_sabre_init(MachineState *machine)
>          .loader_start = FSL_IMX7_MMDC_ADDR,
>          .board_id = -1,
>          .ram_size = machine->ram_size,
> -        .kernel_filename = machine->kernel_filename,
> -        .kernel_cmdline = machine->kernel_cmdline,
> -        .initrd_filename = machine->initrd_filename,
>          .nb_cpus = smp_cpus,
>      };
>  
> @@ -75,7 +72,7 @@ static void mcimx7d_sabre_init(MachineState *machine)
>      }
>  
>      if (!qtest_enabled()) {
> -        arm_load_kernel(&s->soc.cpu[0], &boot_info);
> +        arm_load_kernel(&s->soc.cpu[0], machine, &boot_info);
>      }
>  }
>  
> diff --git a/hw/arm/musicpal.c b/hw/arm/musicpal.c
> index 5645997b56..e4ec017d15 100644
> --- a/hw/arm/musicpal.c
> +++ b/hw/arm/musicpal.c
> @@ -1569,9 +1569,6 @@ static struct arm_boot_info musicpal_binfo = {
>  
>  static void musicpal_init(MachineState *machine)
>  {
> -    const char *kernel_filename = machine->kernel_filename;
> -    const char *kernel_cmdline = machine->kernel_cmdline;
> -    const char *initrd_filename = machine->initrd_filename;
>      ARMCPU *cpu;
>      qemu_irq pic[32];
>      DeviceState *dev;
> @@ -1700,10 +1697,7 @@ static void musicpal_init(MachineState *machine)
>      sysbus_connect_irq(s, 0, pic[MP_AUDIO_IRQ]);
>  
>      musicpal_binfo.ram_size = MP_RAM_DEFAULT_SIZE;
> -    musicpal_binfo.kernel_filename = kernel_filename;
> -    musicpal_binfo.kernel_cmdline = kernel_cmdline;
> -    musicpal_binfo.initrd_filename = initrd_filename;
> -    arm_load_kernel(cpu, &musicpal_binfo);
> +    arm_load_kernel(cpu, machine, &musicpal_binfo);
>  }
>  
>  static void musicpal_machine_init(MachineClass *mc)
> diff --git a/hw/arm/nseries.c b/hw/arm/nseries.c
> index 4a79f5c88b..31dd2f1b51 100644
> --- a/hw/arm/nseries.c
> +++ b/hw/arm/nseries.c
> @@ -1358,10 +1358,7 @@ static void n8x0_init(MachineState *machine,
>  
>      if (machine->kernel_filename) {
>          /* Or at the linux loader.  */
> -        binfo->kernel_filename = machine->kernel_filename;
> -        binfo->kernel_cmdline = machine->kernel_cmdline;
> -        binfo->initrd_filename = machine->initrd_filename;
> -        arm_load_kernel(s->mpu->cpu, binfo);
> +        arm_load_kernel(s->mpu->cpu, machine, binfo);
>  
>          qemu_register_reset(n8x0_boot_init, s);
>      }
> diff --git a/hw/arm/omap_sx1.c b/hw/arm/omap_sx1.c
> index cae78d0a36..3cc2817f06 100644
> --- a/hw/arm/omap_sx1.c
> +++ b/hw/arm/omap_sx1.c
> @@ -196,10 +196,7 @@ static void sx1_init(MachineState *machine, const int version)
>      }
>  
>      /* Load the kernel.  */
> -    sx1_binfo.kernel_filename = machine->kernel_filename;
> -    sx1_binfo.kernel_cmdline = machine->kernel_cmdline;
> -    sx1_binfo.initrd_filename = machine->initrd_filename;
> -    arm_load_kernel(mpu->cpu, &sx1_binfo);
> +    arm_load_kernel(mpu->cpu, machine, &sx1_binfo);
>  
>      /* TODO: fix next line */
>      //~ qemu_console_resize(ds, 640, 480);
> diff --git a/hw/arm/palm.c b/hw/arm/palm.c
> index 9eb9612bce..67ab30b5bc 100644
> --- a/hw/arm/palm.c
> +++ b/hw/arm/palm.c
> @@ -186,9 +186,6 @@ static struct arm_boot_info palmte_binfo = {
>  
>  static void palmte_init(MachineState *machine)
>  {
> -    const char *kernel_filename = machine->kernel_filename;
> -    const char *kernel_cmdline = machine->kernel_cmdline;
> -    const char *initrd_filename = machine->initrd_filename;
>      MemoryRegion *address_space_mem = get_system_memory();
>      struct omap_mpu_state_s *mpu;
>      int flash_size = 0x00800000;
> @@ -248,16 +245,13 @@ static void palmte_init(MachineState *machine)
>          }
>      }
>  
> -    if (!rom_loaded && !kernel_filename && !qtest_enabled()) {
> +    if (!rom_loaded && !machine->kernel_filename && !qtest_enabled()) {
>          fprintf(stderr, "Kernel or ROM image must be specified\n");
>          exit(1);
>      }
>  
>      /* Load the kernel.  */
> -    palmte_binfo.kernel_filename = kernel_filename;
> -    palmte_binfo.kernel_cmdline = kernel_cmdline;
> -    palmte_binfo.initrd_filename = initrd_filename;
> -    arm_load_kernel(mpu->cpu, &palmte_binfo);
> +    arm_load_kernel(mpu->cpu, machine, &palmte_binfo);
>  }
>  
>  static void palmte_machine_init(MachineClass *mc)
> diff --git a/hw/arm/raspi.c b/hw/arm/raspi.c
> index 8c249fcabb..b6d78e6ff3 100644
> --- a/hw/arm/raspi.c
> +++ b/hw/arm/raspi.c
> @@ -158,13 +158,9 @@ static void setup_boot(MachineState *machine, int version, size_t ram_size)
>  
>          binfo.entry = firmware_addr;
>          binfo.firmware_loaded = true;
> -    } else {
> -        binfo.kernel_filename = machine->kernel_filename;
> -        binfo.kernel_cmdline = machine->kernel_cmdline;
> -        binfo.initrd_filename = machine->initrd_filename;
>      }
>  
> -    arm_load_kernel(ARM_CPU(first_cpu), &binfo);
> +    arm_load_kernel(ARM_CPU(first_cpu), machine, &binfo);
>  }
>  
>  static void raspi_init(MachineState *machine, int version)
> diff --git a/hw/arm/realview.c b/hw/arm/realview.c
> index d42a76e7a1..3876b4acae 100644
> --- a/hw/arm/realview.c
> +++ b/hw/arm/realview.c
> @@ -350,13 +350,10 @@ static void realview_init(MachineState *machine,
>      memory_region_add_subregion(sysmem, SMP_BOOT_ADDR, ram_hack);
>  
>      realview_binfo.ram_size = ram_size;
> -    realview_binfo.kernel_filename = machine->kernel_filename;
> -    realview_binfo.kernel_cmdline = machine->kernel_cmdline;
> -    realview_binfo.initrd_filename = machine->initrd_filename;
>      realview_binfo.nb_cpus = smp_cpus;
>      realview_binfo.board_id = realview_board_id[board_type];
>      realview_binfo.loader_start = (board_type == BOARD_PB_A8 ? 0x70000000 : 0);
> -    arm_load_kernel(ARM_CPU(first_cpu), &realview_binfo);
> +    arm_load_kernel(ARM_CPU(first_cpu), machine, &realview_binfo);
>  }
>  
>  static void realview_eb_init(MachineState *machine)
> diff --git a/hw/arm/sabrelite.c b/hw/arm/sabrelite.c
> index f1b00de229..81547dec98 100644
> --- a/hw/arm/sabrelite.c
> +++ b/hw/arm/sabrelite.c
> @@ -103,16 +103,13 @@ static void sabrelite_init(MachineState *machine)
>      }
>  
>      sabrelite_binfo.ram_size = machine->ram_size;
> -    sabrelite_binfo.kernel_filename = machine->kernel_filename;
> -    sabrelite_binfo.kernel_cmdline = machine->kernel_cmdline;
> -    sabrelite_binfo.initrd_filename = machine->initrd_filename;
>      sabrelite_binfo.nb_cpus = smp_cpus;
>      sabrelite_binfo.secure_boot = true;
>      sabrelite_binfo.write_secondary_boot = sabrelite_write_secondary;
>      sabrelite_binfo.secondary_cpu_reset_hook = sabrelite_reset_secondary;
>  
>      if (!qtest_enabled()) {
> -        arm_load_kernel(&s->soc.cpu[0], &sabrelite_binfo);
> +        arm_load_kernel(&s->soc.cpu[0], machine, &sabrelite_binfo);
>      }
>  }
>  
> diff --git a/hw/arm/spitz.c b/hw/arm/spitz.c
> index 723cf5d592..42338696b3 100644
> --- a/hw/arm/spitz.c
> +++ b/hw/arm/spitz.c
> @@ -951,11 +951,8 @@ static void spitz_common_init(MachineState *machine,
>          /* A 4.0 GB microdrive is permanently sitting in CF slot 0.  */
>          spitz_microdrive_attach(mpu, 0);
>  
> -    spitz_binfo.kernel_filename = machine->kernel_filename;
> -    spitz_binfo.kernel_cmdline = machine->kernel_cmdline;
> -    spitz_binfo.initrd_filename = machine->initrd_filename;
>      spitz_binfo.board_id = arm_id;
> -    arm_load_kernel(mpu->cpu, &spitz_binfo);
> +    arm_load_kernel(mpu->cpu, machine, &spitz_binfo);
>      sl_bootparam_write(SL_PXA_PARAM_BASE);
>  }
>  
> diff --git a/hw/arm/tosa.c b/hw/arm/tosa.c
> index 7843d68d46..3a1de81278 100644
> --- a/hw/arm/tosa.c
> +++ b/hw/arm/tosa.c
> @@ -218,9 +218,6 @@ static struct arm_boot_info tosa_binfo = {
>  
>  static void tosa_init(MachineState *machine)
>  {
> -    const char *kernel_filename = machine->kernel_filename;
> -    const char *kernel_cmdline = machine->kernel_cmdline;
> -    const char *initrd_filename = machine->initrd_filename;
>      MemoryRegion *address_space_mem = get_system_memory();
>      MemoryRegion *rom = g_new(MemoryRegion, 1);
>      PXA2xxState *mpu;
> @@ -245,11 +242,8 @@ static void tosa_init(MachineState *machine)
>  
>      tosa_tg_init(mpu);
>  
> -    tosa_binfo.kernel_filename = kernel_filename;
> -    tosa_binfo.kernel_cmdline = kernel_cmdline;
> -    tosa_binfo.initrd_filename = initrd_filename;
>      tosa_binfo.board_id = 0x208;
> -    arm_load_kernel(mpu->cpu, &tosa_binfo);
> +    arm_load_kernel(mpu->cpu, machine, &tosa_binfo);
>      sl_bootparam_write(SL_PXA_PARAM_BASE);
>  }
>  
> diff --git a/hw/arm/versatilepb.c b/hw/arm/versatilepb.c
> index f471fb7025..b95110ae2d 100644
> --- a/hw/arm/versatilepb.c
> +++ b/hw/arm/versatilepb.c
> @@ -374,11 +374,8 @@ static void versatile_init(MachineState *machine, int board_id)
>      }
>  
>      versatile_binfo.ram_size = machine->ram_size;
> -    versatile_binfo.kernel_filename = machine->kernel_filename;
> -    versatile_binfo.kernel_cmdline = machine->kernel_cmdline;
> -    versatile_binfo.initrd_filename = machine->initrd_filename;
>      versatile_binfo.board_id = board_id;
> -    arm_load_kernel(cpu, &versatile_binfo);
> +    arm_load_kernel(cpu, machine, &versatile_binfo);
>  }
>  
>  static void vpb_init(MachineState *machine)
> diff --git a/hw/arm/vexpress.c b/hw/arm/vexpress.c
> index 2b3b0c2334..16f0382731 100644
> --- a/hw/arm/vexpress.c
> +++ b/hw/arm/vexpress.c
> @@ -703,9 +703,6 @@ static void vexpress_common_init(MachineState *machine)
>      }
>  
>      daughterboard->bootinfo.ram_size = machine->ram_size;
> -    daughterboard->bootinfo.kernel_filename = machine->kernel_filename;
> -    daughterboard->bootinfo.kernel_cmdline = machine->kernel_cmdline;
> -    daughterboard->bootinfo.initrd_filename = machine->initrd_filename;
>      daughterboard->bootinfo.nb_cpus = smp_cpus;
>      daughterboard->bootinfo.board_id = VEXPRESS_BOARD_ID;
>      daughterboard->bootinfo.loader_start = daughterboard->loader_start;
> @@ -715,7 +712,7 @@ static void vexpress_common_init(MachineState *machine)
>      daughterboard->bootinfo.modify_dtb = vexpress_modify_dtb;
>      /* When booting Linux we should be in secure state if the CPU has one. */
>      daughterboard->bootinfo.secure_boot = vms->secure;
> -    arm_load_kernel(ARM_CPU(first_cpu), &daughterboard->bootinfo);
> +    arm_load_kernel(ARM_CPU(first_cpu), machine, &daughterboard->bootinfo);
>  }
>  
>  static bool vexpress_get_secure(Object *obj, Error **errp)
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index bf54f10b51..e2ce7a2841 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -1358,6 +1358,7 @@ void virt_machine_done(Notifier *notifier, void *data)
>  {
>      VirtMachineState *vms = container_of(notifier, VirtMachineState,
>                                           machine_done);
> +    MachineState *ms = MACHINE(vms);
>      ARMCPU *cpu = ARM_CPU(first_cpu);
>      struct arm_boot_info *info = &vms->bootinfo;
>      AddressSpace *as = arm_boot_address_space(cpu, info);
> @@ -1375,7 +1376,7 @@ void virt_machine_done(Notifier *notifier, void *data)
>                                         vms->memmap[VIRT_PLATFORM_BUS].size,
>                                         vms->irqmap[VIRT_PLATFORM_BUS]);
>      }
> -    if (arm_load_dtb(info->dtb_start, info, info->dtb_limit, as) < 0) {
> +    if (arm_load_dtb(info->dtb_start, info, info->dtb_limit, as, ms) < 0) {
>          exit(1);
>      }
>  
> @@ -1699,16 +1700,13 @@ static void machvirt_init(MachineState *machine)
>      create_platform_bus(vms, pic);
>  
>      vms->bootinfo.ram_size = machine->ram_size;
> -    vms->bootinfo.kernel_filename = machine->kernel_filename;
> -    vms->bootinfo.kernel_cmdline = machine->kernel_cmdline;
> -    vms->bootinfo.initrd_filename = machine->initrd_filename;
>      vms->bootinfo.nb_cpus = smp_cpus;
>      vms->bootinfo.board_id = -1;
>      vms->bootinfo.loader_start = vms->memmap[VIRT_MEM].base;
>      vms->bootinfo.get_dtb = machvirt_dtb;
>      vms->bootinfo.skip_dtb_autoload = true;
>      vms->bootinfo.firmware_loaded = firmware_loaded;
> -    arm_load_kernel(ARM_CPU(first_cpu), &vms->bootinfo);
> +    arm_load_kernel(ARM_CPU(first_cpu), machine, &vms->bootinfo);
>  
>      vms->machine_done.notify = virt_machine_done;
>      qemu_add_machine_init_done_notifier(&vms->machine_done);
> diff --git a/hw/arm/xilinx_zynq.c b/hw/arm/xilinx_zynq.c
> index 198e3f9763..2487bd7ea5 100644
> --- a/hw/arm/xilinx_zynq.c
> +++ b/hw/arm/xilinx_zynq.c
> @@ -159,9 +159,6 @@ static inline void zynq_init_spi_flashes(uint32_t base_addr, qemu_irq irq,
>  static void zynq_init(MachineState *machine)
>  {
>      ram_addr_t ram_size = machine->ram_size;
> -    const char *kernel_filename = machine->kernel_filename;
> -    const char *kernel_cmdline = machine->kernel_cmdline;
> -    const char *initrd_filename = machine->initrd_filename;
>      ARMCPU *cpu;
>      MemoryRegion *address_space_mem = get_system_memory();
>      MemoryRegion *ext_ram = g_new(MemoryRegion, 1);
> @@ -304,16 +301,13 @@ static void zynq_init(MachineState *machine)
>      sysbus_mmio_map(busdev, 0, 0xF8007000);
>  
>      zynq_binfo.ram_size = ram_size;
> -    zynq_binfo.kernel_filename = kernel_filename;
> -    zynq_binfo.kernel_cmdline = kernel_cmdline;
> -    zynq_binfo.initrd_filename = initrd_filename;
>      zynq_binfo.nb_cpus = 1;
>      zynq_binfo.board_id = 0xd32;
>      zynq_binfo.loader_start = 0;
>      zynq_binfo.board_setup_addr = BOARD_SETUP_ADDR;
>      zynq_binfo.write_board_setup = zynq_write_board_setup;
>  
> -    arm_load_kernel(ARM_CPU(first_cpu), &zynq_binfo);
> +    arm_load_kernel(ARM_CPU(first_cpu), machine, &zynq_binfo);
>  }
>  
>  static void zynq_machine_init(MachineClass *mc)
> diff --git a/hw/arm/xlnx-versal-virt.c b/hw/arm/xlnx-versal-virt.c
> index f95fde2309..462493c467 100644
> --- a/hw/arm/xlnx-versal-virt.c
> +++ b/hw/arm/xlnx-versal-virt.c
> @@ -441,14 +441,11 @@ static void versal_virt_init(MachineState *machine)
>                                          0, &s->soc.fpd.apu.mr, 0);
>  
>      s->binfo.ram_size = machine->ram_size;
> -    s->binfo.kernel_filename = machine->kernel_filename;
> -    s->binfo.kernel_cmdline = machine->kernel_cmdline;
> -    s->binfo.initrd_filename = machine->initrd_filename;
>      s->binfo.loader_start = 0x0;
>      s->binfo.get_dtb = versal_virt_get_dtb;
>      s->binfo.modify_dtb = versal_virt_modify_dtb;
>      if (machine->kernel_filename) {
> -        arm_load_kernel(s->soc.fpd.apu.cpu[0], &s->binfo);
> +        arm_load_kernel(s->soc.fpd.apu.cpu[0], machine, &s->binfo);
>      } else {
>          AddressSpace *as = arm_boot_address_space(s->soc.fpd.apu.cpu[0],
>                                                    &s->binfo);
> @@ -457,7 +454,7 @@ static void versal_virt_init(MachineState *machine)
>          s->binfo.loader_start = 0x1000;
>          s->binfo.dtb_limit = 0x1000000;
>          if (arm_load_dtb(s->binfo.loader_start,
> -                         &s->binfo, s->binfo.dtb_limit, as) < 0) {
> +                         &s->binfo, s->binfo.dtb_limit, as, machine) < 0) {
>              exit(EXIT_FAILURE);
>          }
>      }
> diff --git a/hw/arm/xlnx-zcu102.c b/hw/arm/xlnx-zcu102.c
> index c802f26fbd..6a455f8d49 100644
> --- a/hw/arm/xlnx-zcu102.c
> +++ b/hw/arm/xlnx-zcu102.c
> @@ -172,11 +172,8 @@ static void xlnx_zcu102_init(MachineState *machine)
>      /* TODO create and connect IDE devices for ide_drive_get() */
>  
>      xlnx_zcu102_binfo.ram_size = ram_size;
> -    xlnx_zcu102_binfo.kernel_filename = machine->kernel_filename;
> -    xlnx_zcu102_binfo.kernel_cmdline = machine->kernel_cmdline;
> -    xlnx_zcu102_binfo.initrd_filename = machine->initrd_filename;
>      xlnx_zcu102_binfo.loader_start = 0;
> -    arm_load_kernel(s->soc.boot_cpu_ptr, &xlnx_zcu102_binfo);
> +    arm_load_kernel(s->soc.boot_cpu_ptr, machine, &xlnx_zcu102_binfo);
>  }
>  
>  static void xlnx_zcu102_machine_instance_init(Object *obj)
> diff --git a/hw/arm/z2.c b/hw/arm/z2.c
> index 44aa748d39..2f21421683 100644
> --- a/hw/arm/z2.c
> +++ b/hw/arm/z2.c
> @@ -296,9 +296,6 @@ static const TypeInfo aer915_info = {
>  
>  static void z2_init(MachineState *machine)
>  {
> -    const char *kernel_filename = machine->kernel_filename;
> -    const char *kernel_cmdline = machine->kernel_cmdline;
> -    const char *initrd_filename = machine->initrd_filename;
>      MemoryRegion *address_space_mem = get_system_memory();
>      uint32_t sector_len = 0x10000;
>      PXA2xxState *mpu;
> @@ -352,11 +349,8 @@ static void z2_init(MachineState *machine)
>      qdev_connect_gpio_out(mpu->gpio, Z2_GPIO_LCD_CS,
>                            qemu_allocate_irq(z2_lcd_cs, z2_lcd, 0));
>  
> -    z2_binfo.kernel_filename = kernel_filename;
> -    z2_binfo.kernel_cmdline = kernel_cmdline;
> -    z2_binfo.initrd_filename = initrd_filename;
>      z2_binfo.board_id = 0x6dd;
> -    arm_load_kernel(mpu->cpu, &z2_binfo);
> +    arm_load_kernel(mpu->cpu, machine, &z2_binfo);
>  }
>  
>  static void z2_machine_init(MachineClass *mc)
> diff --git a/include/hw/arm/boot.h b/include/hw/arm/boot.h
> index c48cc4c2bc..2673abe81f 100644
> --- a/include/hw/arm/boot.h
> +++ b/include/hw/arm/boot.h
> @@ -133,7 +133,7 @@ struct arm_boot_info {
>   * before sysbus-fdt arm_register_platform_bus_fdt_creator. Indeed the
>   * machine init done notifiers are called in registration reverse order.
>   */
> -void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info);
> +void arm_load_kernel(ARMCPU *cpu, MachineState *ms, struct arm_boot_info *info);
>  
>  AddressSpace *arm_boot_address_space(ARMCPU *cpu,
>                                       const struct arm_boot_info *info);
> @@ -160,7 +160,7 @@ AddressSpace *arm_boot_address_space(ARMCPU *cpu,
>   * Note: Must not be called unless have_dtb(binfo) is true.
>   */
>  int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
> -                 hwaddr addr_limit, AddressSpace *as);
> +                 hwaddr addr_limit, AddressSpace *as, MachineState *ms);
>  
>  /* Write a secure board setup routine with a dummy handler for SMCs */
>  void arm_write_secure_board_setup_dummy_smc(ARMCPU *cpu,




* Re: [Qemu-devel] [PATCH v5 6/8] hmat acpi: Build Memory Subsystem Address Range Structure(s) in ACPI HMAT
  2019-06-14 15:56 ` [Qemu-devel] [PATCH v5 6/8] hmat acpi: Build Memory Subsystem Address Range Structure(s) in ACPI HMAT Tao Xu
@ 2019-06-27 15:56   ` Jonathan Cameron
  2019-07-01  0:58     ` Tao Xu
  2019-07-01 11:25   ` Igor Mammedov
  1 sibling, 1 reply; 25+ messages in thread
From: Jonathan Cameron @ 2019-06-27 15:56 UTC (permalink / raw)
  To: Tao Xu; +Cc: ehabkost, jingqi.liu, fan.du, qemu-devel, imammedo

On Fri, 14 Jun 2019 23:56:24 +0800
Tao Xu <tao3.xu@intel.com> wrote:

> From: Liu Jingqi <jingqi.liu@intel.com>
> 
> HMAT is defined in ACPI 6.2: 5.2.27 Heterogeneous Memory Attribute Table (HMAT).
> The specification is available at the link below:
> http://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf
> 
> It describes the memory attributes, such as memory side cache
> attributes and bandwidth and latency details, related to the
> System Physical Address (SPA) Memory Ranges. The software is
> expected to use this information as a hint for optimization.
> 
> This structure describes the System Physical Address(SPA) range
> occupied by memory subsystem and its associativity with processor
> proximity domain as well as hint for memory usage.
> 
> Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
> Signed-off-by: Tao Xu <tao3.xu@intel.com>

Hi Tao,

Apologies if I missed an earlier discussion on this...

It's probably not letting any secrets out to say that there are very few
real hardware systems out there using the 6.2 version of HMAT.

Does it make sense to implement it rather than the somewhat tidied
up version in ACPI 6.3?

I would go so far as to say that one of the pushes behind making those
changes was that they shouldn't have much impact, as no one was shipping
firmware using the 6.2 version.  So is there any chance we can avoid
QEMU effectively doing so, or at least defaulting to doing so?

I'm entirely in favor of the patch set in general btw, as it's much
more useful than having to override with a hand-crafted table when
testing unusual topologies.

Thanks,

Jonathan 

> ---
> 
> Changes in v5 -> v4:
>     - Add more descriptions from ACPI spec (Igor)
>     - Remove all the dependency on PCMachineState (Igor)
> ---
>  hw/acpi/Kconfig       |   5 ++
>  hw/acpi/Makefile.objs |   1 +
>  hw/acpi/hmat.c        | 153 ++++++++++++++++++++++++++++++++++++++++++
>  hw/acpi/hmat.h        |  43 ++++++++++++
>  hw/core/machine.c     |   2 +
>  hw/i386/acpi-build.c  |   3 +
>  include/sysemu/numa.h |   2 +
>  numa.c                |   6 ++
>  8 files changed, 215 insertions(+)
>  create mode 100644 hw/acpi/hmat.c
>  create mode 100644 hw/acpi/hmat.h
> 
> diff --git a/hw/acpi/Kconfig b/hw/acpi/Kconfig
> index 7c59cf900b..039bb99efa 100644
> --- a/hw/acpi/Kconfig
> +++ b/hw/acpi/Kconfig
> @@ -7,6 +7,7 @@ config ACPI_X86
>      select ACPI_NVDIMM
>      select ACPI_CPU_HOTPLUG
>      select ACPI_MEMORY_HOTPLUG
> +    select ACPI_HMAT
>  
>  config ACPI_X86_ICH
>      bool
> @@ -31,3 +32,7 @@ config ACPI_VMGENID
>      bool
>      default y
>      depends on PC
> +
> +config ACPI_HMAT
> +    bool
> +    depends on ACPI
> diff --git a/hw/acpi/Makefile.objs b/hw/acpi/Makefile.objs
> index 661a9b8c2f..20cc2fb124 100644
> --- a/hw/acpi/Makefile.objs
> +++ b/hw/acpi/Makefile.objs
> @@ -6,6 +6,7 @@ common-obj-$(CONFIG_ACPI_MEMORY_HOTPLUG) += memory_hotplug.o
>  common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu.o
>  common-obj-$(CONFIG_ACPI_NVDIMM) += nvdimm.o
>  common-obj-$(CONFIG_ACPI_VMGENID) += vmgenid.o
> +common-obj-$(CONFIG_ACPI_HMAT) += hmat.o
>  common-obj-$(call lnot,$(CONFIG_ACPI_X86)) += acpi-stub.o
>  
>  common-obj-y += acpi_interface.o
> diff --git a/hw/acpi/hmat.c b/hw/acpi/hmat.c
> new file mode 100644
> index 0000000000..6fd434c4d9
> --- /dev/null
> +++ b/hw/acpi/hmat.c
> @@ -0,0 +1,153 @@
> +/*
> + * HMAT ACPI Implementation
> + *
> + * Copyright(C) 2019 Intel Corporation.
> + *
> + * Author:
> + *  Liu jingqi <jingqi.liu@linux.intel.com>
> + *  Tao Xu <tao3.xu@intel.com>
> + *
> + * HMAT is defined in ACPI 6.2: 5.2.27 Heterogeneous Memory Attribute Table
> + * (HMAT)
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, see <http://www.gnu.org/licenses/>
> + */
> +
> +#include "qemu/osdep.h"
> +#include "sysemu/numa.h"
> +#include "hw/acpi/hmat.h"
> +#include "hw/mem/pc-dimm.h"
> +
> +/* ACPI 6.2: 5.2.27.3 Memory Subsystem Address Range Structure: Table 5-141 */
> +static void build_hmat_spa(GArray *table_data, uint16_t flags,
> +                           uint64_t base, uint64_t length, int node)
> +{
> +
> +    /* Memory Subsystem Address Range Structure */
> +    /* Type */
> +    build_append_int_noprefix(table_data, 0, 2);
> +    /* Reserved */
> +    build_append_int_noprefix(table_data, 0, 2);
> +    /* Length */
> +    build_append_int_noprefix(table_data, 40, 4);
> +    /* Flags */
> +    build_append_int_noprefix(table_data, flags, 2);
> +    /* Reserved */
> +    build_append_int_noprefix(table_data, 0, 2);
> +    /* Process Proximity Domain */
> +    build_append_int_noprefix(table_data, node, 4);
> +    /* Memory Proximity Domain */
> +    build_append_int_noprefix(table_data, node, 4);
> +    /* Reserved */
> +    build_append_int_noprefix(table_data, 0, 4);
> +    /* System Physical Address Range Base */

The SPA range base and length fields got scrapped in ACPI 6.3 because
they didn't actually provide any useful information that isn't already
available from somewhere else (SRAT mainly).

> +    build_append_int_noprefix(table_data, base, 8);
> +    /* System Physical Address Range Length */
> +    build_append_int_noprefix(table_data, length, 8);
> +}
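
For comparison, here is a rough standalone sketch of how the same 40-byte
structure looks when laid out per ACPI 6.3, where the two 8-byte SPA fields
become reserved. This is not QEMU code: put_le() and hmat_mpda_63() are
hypothetical names, and a plain byte buffer stands in for the GArray and
build_append_int_noprefix() used by the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write `size` bytes of `value`, little-endian. */
static size_t put_le(uint8_t *buf, size_t off, uint64_t value, int size)
{
    assert(size >= 1 && size <= 8);
    for (int i = 0; i < size; i++) {
        buf[off + i] = (uint8_t)(value >> (8 * i));
    }
    return off + size;
}

/*
 * Sketch of the ACPI 6.3 layout of this structure. The total length is
 * still 40 bytes; the former SPA Range Base/Length fields are written
 * as zero because 6.3 marks them reserved.
 */
static size_t hmat_mpda_63(uint8_t buf[40], uint16_t flags,
                           uint32_t initiator, uint32_t mem_node)
{
    size_t off = 0;

    off = put_le(buf, off, 0, 2);         /* Type */
    off = put_le(buf, off, 0, 2);         /* Reserved */
    off = put_le(buf, off, 40, 4);        /* Length */
    off = put_le(buf, off, flags, 2);     /* Flags */
    off = put_le(buf, off, 0, 2);         /* Reserved */
    off = put_le(buf, off, initiator, 4); /* Initiator proximity domain */
    off = put_le(buf, off, mem_node, 4);  /* Memory proximity domain */
    off = put_le(buf, off, 0, 4);         /* Reserved */
    off = put_le(buf, off, 0, 8);         /* Reserved (was SPA Range Base) */
    off = put_le(buf, off, 0, 8);         /* Reserved (was SPA Range Length) */
    return off;
}
```

With this layout the address ranges themselves would come only from SRAT,
so the per-DIMM loop below would presumably not be needed for HMAT at all.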
> +
> +static int pc_dimm_device_list(Object *obj, void *opaque)
> +{
> +    GSList **list = opaque;
> +
> +    if (object_dynamic_cast(obj, TYPE_PC_DIMM)) {
> +        DeviceState *dev = DEVICE(obj);
> +        if (dev->realized) { /* only realized memory devices matter */
> +            *list = g_slist_append(*list, DEVICE(obj));
> +        }
> +    }
> +
> +    object_child_foreach(obj, pc_dimm_device_list, opaque);
> +    return 0;
> +}
> +
> +/* Build HMAT sub table structures */
> +static void hmat_build_table_structs(GArray *table_data, MachineState *ms)
> +{
> +    GSList *device_list = NULL;
> +    uint16_t flags;
> +    uint64_t mem_base, mem_len;
> +    int i;
> +    NumaState *nstat = ms->numa_state;
> +    NumaMemRange *mem_range;
> +
> +    Object *obj = object_resolve_path_type("", TYPE_ACPI_DEVICE_IF, NULL);
> +    AcpiDeviceIfClass *adevc = ACPI_DEVICE_IF_GET_CLASS(obj);
> +    AcpiDeviceIf *adev = ACPI_DEVICE_IF(obj);
> +
> +    /*
> +     * ACPI 6.2: 5.2.27.3 Memory Subsystem Address Range Structure:
> +     * Table 5-141. The Proximity Domain of System Physical Address
> +     * ranges defined in the HMAT, NFIT and SRAT tables shall match
> +     * each other.
> +     */
> +    if (nstat->num_nodes && !nstat->mem_ranges_num) {
> +        nstat->mem_ranges = g_array_new(false, true /* clear */,
> +                                        sizeof *mem_range);
> +        adevc->build_mem_ranges(adev, ms);
> +    }
> +
> +    for (i = 0; i < nstat->mem_ranges_num; i++) {
> +        mem_range = &g_array_index(nstat->mem_ranges, NumaMemRange, i);
> +        flags = 0;
> +
> +        if (nstat->nodes[mem_range->node].is_initiator) {
> +            flags |= HMAT_SPA_PROC_VALID;
> +        }
> +        if (nstat->nodes[mem_range->node].is_target) {
> +            flags |= HMAT_SPA_MEM_VALID;
> +        }
> +
> +        build_hmat_spa(table_data, flags, mem_range->base,
> +                       mem_range->length,
> +                       mem_range->node);
> +    }
> +
> +    /* Build HMAT SPA structures for PC-DIMM devices. */
> +    object_child_foreach(OBJECT(ms), pc_dimm_device_list, &device_list);
> +
> +    for (; device_list; device_list = device_list->next) {
> +        PCDIMMDevice *dimm = device_list->data;
> +        mem_base = object_property_get_uint(OBJECT(dimm), PC_DIMM_ADDR_PROP,
> +                                            NULL);
> +        mem_len = object_property_get_uint(OBJECT(dimm), PC_DIMM_SIZE_PROP,
> +                                           NULL);
> +        i = object_property_get_uint(OBJECT(dimm), PC_DIMM_NODE_PROP, NULL);
> +        flags = 0;
> +
> +        if (nstat->nodes[i].is_initiator) {
> +            flags |= HMAT_SPA_PROC_VALID;
> +        }
> +        if (nstat->nodes[i].is_target) {
> +            flags |= HMAT_SPA_MEM_VALID;
> +        }
> +        build_hmat_spa(table_data, flags, mem_base, mem_len, i);
> +    }
> +}
> +
> +void build_hmat(GArray *table_data, BIOSLinker *linker, MachineState *ms)
> +{
> +    uint64_t hmat_start;
> +
> +    hmat_start = table_data->len;
> +
> +    /* reserve space for HMAT header  */
> +    acpi_data_push(table_data, 40);
> +
> +    hmat_build_table_structs(table_data, ms);
> +
> +    build_header(linker, table_data,
> +                 (void *)(table_data->data + hmat_start),
> +                 "HMAT", table_data->len - hmat_start, 1, NULL, NULL);
> +}
> diff --git a/hw/acpi/hmat.h b/hw/acpi/hmat.h
> new file mode 100644
> index 0000000000..e24b673fad
> --- /dev/null
> +++ b/hw/acpi/hmat.h
> @@ -0,0 +1,43 @@
> +/*
> + * HMAT ACPI Implementation Header
> + *
> + * Copyright(C) 2019 Intel Corporation.
> + *
> + * Author:
> + *  Liu jingqi <jingqi.liu@linux.intel.com>
> + *  Tao Xu <tao3.xu@intel.com>
> + *
> + * HMAT is defined in ACPI 6.2.
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, see <http://www.gnu.org/licenses/>
> + */
> +
> +#ifndef HMAT_H
> +#define HMAT_H
> +
> +#include "hw/acpi/acpi-defs.h"
> +#include "hw/acpi/acpi.h"
> +#include "hw/acpi/bios-linker-loader.h"
> +#include "hw/acpi/aml-build.h"
> +
> +/* the values of AcpiHmatSpaRange flag */
> +enum {
> +    HMAT_SPA_PROC_VALID       = 0x1,
> +    HMAT_SPA_MEM_VALID        = 0x2,
> +    HMAT_SPA_RESERVATION_HINT = 0x4,

Only the first bit (HMAT_SPA_PROC_VALID) ended up being kept for ACPI 6.3.

> +};
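
If both spec versions had to be supported, one possible approach would be
to mask the flags down to what each version defines. A minimal standalone
sketch, assuming a version selector that does not exist in the patch
(enum hmat_spec and hmat_flags_for_spec() are hypothetical names; the flag
values are copied from the enum above):

```c
#include <assert.h>
#include <stdint.h>

/* Flag values as defined in the patch's hw/acpi/hmat.h. */
enum {
    HMAT_SPA_PROC_VALID       = 0x1,
    HMAT_SPA_MEM_VALID        = 0x2,
    HMAT_SPA_RESERVATION_HINT = 0x4,
};

/* Hypothetical selector; not part of the patch. */
enum hmat_spec { HMAT_SPEC_ACPI_6_2, HMAT_SPEC_ACPI_6_3 };

/* Keep only the flag bits that the chosen spec version defines. */
static uint16_t hmat_flags_for_spec(uint16_t flags, enum hmat_spec spec)
{
    assert(spec == HMAT_SPEC_ACPI_6_2 || spec == HMAT_SPEC_ACPI_6_3);

    if (spec == HMAT_SPEC_ACPI_6_3) {
        /* 6.3 keeps bit 0 only; bits 1 and 2 are reserved. */
        return flags & HMAT_SPA_PROC_VALID;
    }
    return flags & (HMAT_SPA_PROC_VALID | HMAT_SPA_MEM_VALID |
                    HMAT_SPA_RESERVATION_HINT);
}
```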
> +
> +void build_hmat(GArray *table_data, BIOSLinker *linker, MachineState *ms);
> +
> +#endif
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index 14b29de0a9..2ad09ec23e 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -646,6 +646,7 @@ void machine_set_cpu_numa_node(MachineState *machine,
>                                 const CpuInstanceProperties *props, Error **errp)
>  {
>      MachineClass *mc = MACHINE_GET_CLASS(machine);
> +    NodeInfo *numa_info = machine->numa_state->nodes;
>      bool match = false;
>      int i;
>  
> @@ -706,6 +707,7 @@ void machine_set_cpu_numa_node(MachineState *machine,
>          match = true;
>          slot->props.node_id = props->node_id;
>          slot->props.has_node_id = props->has_node_id;
> +        numa_info[props->node_id].is_initiator = true;
>      }
>  
>      if (!match) {
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 44dd447fa5..6584eac76e 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -66,6 +66,7 @@
>  #include "hw/i386/intel_iommu.h"
>  
>  #include "hw/acpi/ipmi.h"
> +#include "hw/acpi/hmat.h"
>  
>  /* These are used to size the ACPI tables for -M pc-i440fx-1.7 and
>   * -M pc-i440fx-2.0.  Even if the actual amount of AML generated grows
> @@ -2710,6 +2711,8 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
>              acpi_add_table(table_offsets, tables_blob);
>              build_slit(tables_blob, tables->linker, machine);
>          }
> +        acpi_add_table(table_offsets, tables_blob);
> +        build_hmat(tables_blob, tables->linker, machine);
>      }
>      if (acpi_get_mcfg(&mcfg)) {
>          acpi_add_table(table_offsets, tables_blob);
> diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
> index e3c85b77bc..13cff59112 100644
> --- a/include/sysemu/numa.h
> +++ b/include/sysemu/numa.h
> @@ -10,6 +10,8 @@ struct NodeInfo {
>      uint64_t node_mem;
>      struct HostMemoryBackend *node_memdev;
>      bool present;
> +    bool is_initiator;
> +    bool is_target;
>      uint8_t distance[MAX_NODES];
>  };
>  
> diff --git a/numa.c b/numa.c
> index d23e130bce..5556d118c3 100644
> --- a/numa.c
> +++ b/numa.c
> @@ -102,6 +102,10 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
>          }
>      }
>  
> +    if (node->cpus) {
> +        numa_info[nodenr].is_initiator = true;
> +    }
> +
>      if (node->has_mem && node->has_memdev) {
>          error_setg(errp, "cannot specify both mem= and memdev=");
>          return;
> @@ -118,6 +122,7 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
>  
>      if (node->has_mem) {
>          numa_info[nodenr].node_mem = node->mem;
> +        numa_info[nodenr].is_target = true;
>      }
>      if (node->has_memdev) {
>          Object *o;
> @@ -130,6 +135,7 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
>          object_ref(o);
>          numa_info[nodenr].node_mem = object_property_get_uint(o, "size", NULL);
>          numa_info[nodenr].node_memdev = MEMORY_BACKEND(o);
> +        numa_info[nodenr].is_target = true;
>      }
>      numa_info[nodenr].present = true;
>      max_numa_nodeid = MAX(max_numa_nodeid, nodenr + 1);





* Re: [Qemu-devel] [PATCH v5 2/8] numa: move numa global variable nb_numa_nodes into MachineState
  2019-06-14 15:56 ` [Qemu-devel] [PATCH v5 2/8] numa: move numa global variable nb_numa_nodes into MachineState Tao Xu
@ 2019-06-28 11:02   ` Igor Mammedov
  2019-07-01  1:57     ` Tao Xu
  0 siblings, 1 reply; 25+ messages in thread
From: Igor Mammedov @ 2019-06-28 11:02 UTC (permalink / raw)
  To: Tao Xu; +Cc: jingqi.liu, fan.du, ehabkost, qemu-devel

On Fri, 14 Jun 2019 23:56:20 +0800
Tao Xu <tao3.xu@intel.com> wrote:

> Add struct NumaState to MachineState and move the existing NUMA global
> nb_numa_nodes (renamed to "num_nodes") into NumaState. Also add the field
> numa_supported to MachineClass to indicate which machines support NUMA.
> 
> Reviewed-by: Liu Jingqi <jingqi.liu@intel.com>
You are not supposed to keep Reviewed-by tags on a respin
unless the changes that were made are trivial.
(PS: this applies to the whole series.)

> Suggested-by: Igor Mammedov <imammedo@redhat.com>
> Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
> Signed-off-by: Tao Xu <tao3.xu@intel.com>

In many places you are following the pattern:
  int nb_numa_nodes = ms->numa_state->num_nodes;

even if the local variable is only used once or twice within the function.
Please use ms->numa_state->num_nodes directly instead, if it doesn't hurt
readability.
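
As a rough illustration of the direct form (a standalone sketch, not QEMU
code: the two structs are minimal stand-ins for the real types, and
default_cpu_node_id() is a hypothetical name modelled on
virt_get_default_cpu_node_id() from this patch):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-ins for the QEMU types; only the fields used here. */
typedef struct NumaState {
    int num_nodes;
} NumaState;

typedef struct MachineState {
    NumaState *numa_state; /* NULL on machines without NUMA support */
} MachineState;

/*
 * The node count is read once, so it is taken straight from the machine
 * state rather than being copied into a local nb_numa_nodes first.
 */
static int64_t default_cpu_node_id(const MachineState *ms, int idx)
{
    assert(ms->numa_state != NULL && ms->numa_state->num_nodes > 0);
    return idx % ms->numa_state->num_nodes;
}
```

A local copy only seems worth keeping where the value is referenced many
times and the longer expression would make the code harder to read.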

more comments below...
> ---
> 
> Changes in v5 -> v4:
>     - drop the helper machine_num_numa_nodes() and use
>     machine->numa_state->num_nodes directly (Igor)
>     - remove the unnecessary header include (Igor)
> ---
>  exec.c                              |  5 ++-
>  hw/acpi/aml-build.c                 |  3 +-
>  hw/arm/boot.c                       |  4 +-
>  hw/arm/virt-acpi-build.c            |  8 +++-
>  hw/arm/virt.c                       |  5 ++-
>  hw/core/machine.c                   | 14 +++++--
>  hw/i386/acpi-build.c                |  2 +-
>  hw/i386/pc.c                        |  7 +++-
>  hw/mem/pc-dimm.c                    |  2 +
>  hw/pci-bridge/pci_expander_bridge.c |  2 +
>  hw/ppc/spapr.c                      | 19 +++++++---
>  hw/ppc/spapr_pci.c                  |  1 +
>  include/hw/acpi/aml-build.h         |  2 +-
>  include/hw/boards.h                 |  2 +
>  include/sysemu/numa.h               | 13 +++++--
>  monitor.c                           | 11 +++++-
>  numa.c                              | 59 ++++++++++++++++++-----------
>  17 files changed, 112 insertions(+), 47 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index 4e734770c2..c7eb4af42d 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -1733,6 +1733,7 @@ long qemu_minrampagesize(void)
>      long hpsize = LONG_MAX;
>      long mainrampagesize;
>      Object *memdev_root;
> +    MachineState *ms = MACHINE(qdev_get_machine());
>  
>      mainrampagesize = qemu_mempath_getpagesize(mem_path);
>  
> @@ -1760,7 +1761,9 @@ long qemu_minrampagesize(void)
>       * so if its page size is smaller we have got to report that size instead.
>       */
>      if (hpsize > mainrampagesize &&
> -        (nb_numa_nodes == 0 || numa_info[0].node_memdev == NULL)) {
> +        (ms->numa_state == NULL ||
> +         ms->numa_state->num_nodes == 0 ||
> +         numa_info[0].node_memdev == NULL)) {
>          static bool warned;
>          if (!warned) {
>              error_report("Huge page support disabled (n/a for main memory).");
> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> index 555c24f21d..63c1cae8c9 100644
> --- a/hw/acpi/aml-build.c
> +++ b/hw/acpi/aml-build.c
> @@ -1726,10 +1726,11 @@ void build_srat_memory(AcpiSratMemoryAffinity *numamem, uint64_t base,
>   * ACPI spec 5.2.17 System Locality Distance Information Table
>   * (Revision 2.0 or later)
>   */
> -void build_slit(GArray *table_data, BIOSLinker *linker)
> +void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms)
>  {
>      int slit_start, i, j;
>      slit_start = table_data->len;
> +    int nb_numa_nodes = ms->numa_state->num_nodes;
>  
>      acpi_data_push(table_data, sizeof(AcpiTableHeader));
>  
> diff --git a/hw/arm/boot.c b/hw/arm/boot.c
> index 30acdbe824..2af881e0f4 100644
> --- a/hw/arm/boot.c
> +++ b/hw/arm/boot.c
> @@ -597,9 +597,9 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
>      }
>      g_strfreev(node_path);
>  
> -    if (nb_numa_nodes > 0) {
> +    if (ms->numa_state != NULL && ms->numa_state->num_nodes > 0) {
>          mem_base = binfo->loader_start;
> -        for (i = 0; i < nb_numa_nodes; i++) {
> +        for (i = 0; i < ms->numa_state->num_nodes; i++) {
>              mem_len = numa_info[i].node_mem;
>              rc = fdt_add_memory_node(fdt, acells, mem_base,
>                                       scells, mem_len, i);
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index 4a64f9985c..9a22ce679c 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -517,7 +517,9 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
>      int i, srat_start;
>      uint64_t mem_base;
>      MachineClass *mc = MACHINE_GET_CLASS(vms);
> -    const CPUArchIdList *cpu_list = mc->possible_cpu_arch_ids(MACHINE(vms));
> +    MachineState *ms = MACHINE(vms);
> +    const CPUArchIdList *cpu_list = mc->possible_cpu_arch_ids(ms);
> +    int nb_numa_nodes = ms->numa_state->num_nodes;
>  
>      srat_start = table_data->len;
>      srat = acpi_data_push(table_data, sizeof(*srat));
> @@ -759,6 +761,8 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
>      GArray *table_offsets;
>      unsigned dsdt, xsdt;
>      GArray *tables_blob = tables->table_data;
> +    MachineState *ms = MACHINE(vms);
> +    int nb_numa_nodes = ms->numa_state->num_nodes;
>  
>      table_offsets = g_array_new(false, true /* clear */,
>                                          sizeof(uint32_t));
> @@ -798,7 +802,7 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
>          build_srat(tables_blob, tables->linker, vms);
>          if (have_numa_distance) {
>              acpi_add_table(table_offsets, tables_blob);
> -            build_slit(tables_blob, tables->linker);
> +            build_slit(tables_blob, tables->linker, ms);
>          }
>      }
>  
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index e2ce7a2841..025ad484c5 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -196,6 +196,8 @@ static bool cpu_type_valid(const char *cpu)
>  
>  static void create_fdt(VirtMachineState *vms)
>  {
> +    MachineState *ms = MACHINE(vms);
> +    int nb_numa_nodes = ms->numa_state->num_nodes;
>      void *fdt = create_device_tree(&vms->fdt_size);
>  
>      if (!fdt) {
> @@ -1834,7 +1836,7 @@ virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
>  
>  static int64_t virt_get_default_cpu_node_id(const MachineState *ms, int idx)
>  {
> -    return idx % nb_numa_nodes;
> +    return idx % ms->numa_state->num_nodes;
>  }
>  
>  static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
> @@ -1940,6 +1942,7 @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
>      mc->kvm_type = virt_kvm_type;
>      assert(!mc->get_hotplug_handler);
>      mc->get_hotplug_handler = virt_machine_get_hotplug_handler;
> +    mc->numa_supported = true;
>      hc->plug = virt_machine_device_plug_cb;
>  }
>  
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index f1a0f45f9c..14b29de0a9 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -877,6 +877,9 @@ static void machine_initfn(Object *obj)
>                                          NULL);
>      }
>  
> +    if (mc->numa_supported) {
> +        ms->numa_state = g_new0(NumaState, 1);
> +    }
>  
>      /* Register notifier when init is done for sysbus sanity checks */
>      ms->sysbus_notifier.notify = machine_init_notify;
> @@ -897,6 +900,7 @@ static void machine_finalize(Object *obj)
>      g_free(ms->firmware);
>      g_free(ms->device_memory);
>      g_free(ms->nvdimms_state);
> +    g_free(ms->numa_state);
>  }
>  
>  bool machine_usb(MachineState *machine)
> @@ -968,7 +972,7 @@ static void machine_numa_finish_cpu_init(MachineState *machine)
>      MachineClass *mc = MACHINE_GET_CLASS(machine);
>      const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(machine);
>  
> -    assert(nb_numa_nodes);
> +    assert(machine->numa_state->num_nodes);
>      for (i = 0; i < possible_cpus->len; i++) {
>          if (possible_cpus->cpus[i].props.has_node_id) {
>              break;
> @@ -1014,9 +1018,11 @@ void machine_run_board_init(MachineState *machine)
>  {
>      MachineClass *machine_class = MACHINE_GET_CLASS(machine);
>  
> -    numa_complete_configuration(machine);
> -    if (nb_numa_nodes) {
> -        machine_numa_finish_cpu_init(machine);
> +    if (machine_class->numa_supported) {
> +        numa_complete_configuration(machine);
> +        if (machine->numa_state->num_nodes) {
> +            machine_numa_finish_cpu_init(machine);
> +        }
>      }
>  
>      /* If the machine supports the valid_cpu_types check and the user
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 85dc1640bc..0d58335560 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -2669,7 +2669,7 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
>          build_srat(tables_blob, tables->linker, machine);
>          if (have_numa_distance) {
>              acpi_add_table(table_offsets, tables_blob);
> -            build_slit(tables_blob, tables->linker);
> +            build_slit(tables_blob, tables->linker, machine);
>          }
>      }
>      if (acpi_get_mcfg(&mcfg)) {
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 1b08b56362..5bab78e137 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -997,6 +997,8 @@ static FWCfgState *bochs_bios_init(AddressSpace *as, PCMachineState *pcms)
>      int i;
>      const CPUArchIdList *cpus;
>      MachineClass *mc = MACHINE_GET_CLASS(pcms);
> +    MachineState *ms = MACHINE(pcms);
> +    int nb_numa_nodes = ms->numa_state->num_nodes;
>  
>      fw_cfg = fw_cfg_init_io_dma(FW_CFG_IO_BASE, FW_CFG_IO_BASE + 4, as);
>      fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus);
> @@ -1673,6 +1675,8 @@ void pc_machine_done(Notifier *notifier, void *data)
>  void pc_guest_info_init(PCMachineState *pcms)
>  {
>      int i;
> +    MachineState *ms = MACHINE(pcms);
> +    int nb_numa_nodes = ms->numa_state->num_nodes;
>  
>      pcms->apic_xrupt_override = kvm_allows_irq0_override();
>      pcms->numa_nodes = nb_numa_nodes;
> @@ -2656,7 +2660,7 @@ static int64_t pc_get_default_cpu_node_id(const MachineState *ms, int idx)
>     assert(idx < ms->possible_cpus->len);
>     x86_topo_ids_from_apicid(ms->possible_cpus->cpus[idx].arch_id,
>                              smp_cores, smp_threads, &topo);
> -   return topo.pkg_id % nb_numa_nodes;
> +   return topo.pkg_id % ms->numa_state->num_nodes;
>  }
>  
>  static const CPUArchIdList *pc_possible_cpu_arch_ids(MachineState *ms)
> @@ -2750,6 +2754,7 @@ static void pc_machine_class_init(ObjectClass *oc, void *data)
>      nc->nmi_monitor_handler = x86_nmi;
>      mc->default_cpu_type = TARGET_DEFAULT_CPU_TYPE;
>      mc->nvdimm_supported = true;
> +    mc->numa_supported = true;
>  
>      object_class_property_add(oc, PC_MACHINE_DEVMEM_REGION_SIZE, "int",
>          pc_machine_get_device_memory_region_size, NULL,
> diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
> index 152400b1fc..19e7626590 100644
> --- a/hw/mem/pc-dimm.c
> +++ b/hw/mem/pc-dimm.c
> @@ -160,6 +160,8 @@ static void pc_dimm_realize(DeviceState *dev, Error **errp)
>  {
>      PCDIMMDevice *dimm = PC_DIMM(dev);
>      PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
> +    MachineState *ms = MACHINE(qdev_get_machine());
> +    int nb_numa_nodes = ms->numa_state->num_nodes;
>  
>      if (!dimm->hostmem) {
>          error_setg(errp, "'" PC_DIMM_MEMDEV_PROP "' property is not set");
> diff --git a/hw/pci-bridge/pci_expander_bridge.c b/hw/pci-bridge/pci_expander_bridge.c
> index ca66bc721a..a76a00a6d5 100644
> --- a/hw/pci-bridge/pci_expander_bridge.c
> +++ b/hw/pci-bridge/pci_expander_bridge.c
> @@ -211,6 +211,8 @@ static void pxb_dev_realize_common(PCIDevice *dev, bool pcie, Error **errp)
>      PCIBus *bus;
>      const char *dev_name = NULL;
>      Error *local_err = NULL;
> +    MachineState *ms = MACHINE(qdev_get_machine());
> +    int nb_numa_nodes = ms->numa_state->num_nodes;
>  
>      if (pxb->numa_node != NUMA_NODE_UNASSIGNED &&
>          pxb->numa_node >= nb_numa_nodes) {
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index e2b33e5890..07a02db99e 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -290,6 +290,8 @@ static int spapr_fixup_cpu_dt(void *fdt, SpaprMachineState *spapr)
>      CPUState *cs;
>      char cpu_model[32];
>      uint32_t pft_size_prop[] = {0, cpu_to_be32(spapr->htab_shift)};
> +    MachineState *ms = MACHINE(spapr);
> +    int nb_numa_nodes = ms->numa_state->num_nodes;
>  
>      CPU_FOREACH(cs) {
>          PowerPCCPU *cpu = POWERPC_CPU(cs);
> @@ -344,6 +346,7 @@ static int spapr_fixup_cpu_dt(void *fdt, SpaprMachineState *spapr)
>  
>  static hwaddr spapr_node0_size(MachineState *machine)
>  {
> +    int nb_numa_nodes = machine->numa_state->num_nodes;
>      if (nb_numa_nodes) {
>          int i;
>          for (i = 0; i < nb_numa_nodes; ++i) {
> @@ -391,18 +394,18 @@ static int spapr_populate_memory(SpaprMachineState *spapr, void *fdt)
>  {
>      MachineState *machine = MACHINE(spapr);
>      hwaddr mem_start, node_size;
> -    int i, nb_nodes = nb_numa_nodes;
> +    int i;
>      NodeInfo *nodes = numa_info;
>      NodeInfo ramnode;
>  
>      /* No NUMA nodes, assume there is just one node with whole RAM */
> -    if (!nb_numa_nodes) {
> -        nb_nodes = 1;
> +    if (!machine->numa_state->num_nodes) {
> +        machine->numa_state->num_nodes = 1;
>          ramnode.node_mem = machine->ram_size;
>          nodes = &ramnode;
>      }
>  
> -    for (i = 0, mem_start = 0; i < nb_nodes; ++i) {
> +    for (i = 0, mem_start = 0; i < machine->numa_state->num_nodes; ++i) {
>          if (!nodes[i].node_mem) {
>              continue;
>          }
> @@ -444,6 +447,8 @@ static void spapr_populate_cpu_dt(CPUState *cs, void *fdt, int offset,
>      PowerPCCPU *cpu = POWERPC_CPU(cs);
>      CPUPPCState *env = &cpu->env;
>      PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cs);
> +    MachineState *ms = MACHINE(spapr);
> +    int nb_numa_nodes = ms->numa_state->num_nodes;
>      int index = spapr_get_vcpu_id(cpu);
>      uint32_t segs[] = {cpu_to_be32(28), cpu_to_be32(40),
>                         0xffffffff, 0xffffffff};
> @@ -852,6 +857,7 @@ static int spapr_populate_drmem_v1(SpaprMachineState *spapr, void *fdt,
>  static int spapr_populate_drconf_memory(SpaprMachineState *spapr, void *fdt)
>  {
>      MachineState *machine = MACHINE(spapr);
> +    int nb_numa_nodes = machine->numa_state->num_nodes;
>      int ret, i, offset;
>      uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE;
>      uint32_t prop_lmb_size[] = {0, cpu_to_be32(lmb_size)};
> @@ -1696,6 +1702,7 @@ static void spapr_machine_reset(void)
>  {
>      MachineState *machine = MACHINE(qdev_get_machine());
>      SpaprMachineState *spapr = SPAPR_MACHINE(machine);
> +    int nb_numa_nodes = machine->numa_state->num_nodes;
>      PowerPCCPU *first_ppc_cpu;
>      uint32_t rtas_limit;
>      hwaddr rtas_addr, fdt_addr;
> @@ -2513,6 +2520,7 @@ static void spapr_create_lmb_dr_connectors(SpaprMachineState *spapr)
>  static void spapr_validate_node_memory(MachineState *machine, Error **errp)
>  {
>      int i;
> +    int nb_numa_nodes = machine->numa_state->num_nodes;
>  
>      if (machine->ram_size % SPAPR_MEMORY_BLOCK_SIZE) {
>          error_setg(errp, "Memory size 0x" RAM_ADDR_FMT
> @@ -4115,7 +4123,7 @@ spapr_cpu_index_to_props(MachineState *machine, unsigned cpu_index)
>  
>  static int64_t spapr_get_default_cpu_node_id(const MachineState *ms, int idx)
>  {
> -    return idx / smp_cores % nb_numa_nodes;
> +    return idx / smp_cores % ms->numa_state->num_nodes;
>  }
>  
>  static const CPUArchIdList *spapr_possible_cpu_arch_ids(MachineState *machine)
> @@ -4319,6 +4327,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>      smc->update_dt_enabled = true;
>      mc->default_cpu_type = POWERPC_CPU_TYPE_NAME("power9_v2.0");
>      mc->has_hotpluggable_cpus = true;
> +    mc->numa_supported = true;
>      smc->resize_hpt_default = SPAPR_RESIZE_HPT_ENABLED;
>      fwc->get_dev_path = spapr_get_fw_dev_path;
>      nc->nmi_monitor_handler = spapr_nmi;
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index 9cf2c41b8c..d6fd018dd4 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -1638,6 +1638,7 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
>      SysBusDevice *s = SYS_BUS_DEVICE(dev);
>      SpaprPhbState *sphb = SPAPR_PCI_HOST_BRIDGE(s);
>      PCIHostState *phb = PCI_HOST_BRIDGE(s);
> +    MachineState *ms = MACHINE(spapr);
why do you do it?

>      char *namebuf;
>      int i;
>      PCIBus *bus;
> diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
> index 1a563ad756..991cf05134 100644
> --- a/include/hw/acpi/aml-build.h
> +++ b/include/hw/acpi/aml-build.h
> @@ -414,7 +414,7 @@ build_append_gas_from_struct(GArray *table, const struct AcpiGenericAddress *s)
>  void build_srat_memory(AcpiSratMemoryAffinity *numamem, uint64_t base,
>                         uint64_t len, int node, MemoryAffinityFlags flags);
>  
> -void build_slit(GArray *table_data, BIOSLinker *linker);
> +void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms);
>  
>  void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f,
>                  const char *oem_id, const char *oem_table_id);
> diff --git a/include/hw/boards.h b/include/hw/boards.h
> index 6ff02bf3e4..8375a07940 100644
> --- a/include/hw/boards.h
> +++ b/include/hw/boards.h
> @@ -210,6 +210,7 @@ struct MachineClass {
>      bool ignore_boot_device_suffixes;
>      bool smbus_no_migration_support;
>      bool nvdimm_supported;
> +    bool numa_supported;
>  
>      HotplugHandler *(*get_hotplug_handler)(MachineState *machine,
>                                             DeviceState *dev);
> @@ -273,6 +274,7 @@ struct MachineState {
>      AccelState *accelerator;
>      CPUArchIdList *possible_cpus;
>      struct NVDIMMState *nvdimms_state;
> +    struct NumaState *numa_state;
>  };
>  
>  #define DEFINE_MACHINE(namestr, machine_initfn) \
> diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
> index b6ac7de43e..3c4b2d2909 100644
> --- a/include/sysemu/numa.h
> +++ b/include/sysemu/numa.h
> @@ -6,7 +6,6 @@
>  #include "sysemu/hostmem.h"
>  #include "hw/boards.h"
>  
> -extern int nb_numa_nodes;   /* Number of NUMA nodes */
>  extern bool have_numa_distance;
>  
>  struct NodeInfo {
> @@ -16,15 +15,23 @@ struct NodeInfo {
>      uint8_t distance[MAX_NODES];
>  };
>  
> +extern NodeInfo numa_info[MAX_NODES];
> +
random move? 

>  struct NumaNodeMem {
>      uint64_t node_mem;
>      uint64_t node_plugged_mem;
>  };
>  
> -extern NodeInfo numa_info[MAX_NODES];
> +struct NumaState {
> +    /* Number of NUMA nodes */
> +    int num_nodes;
> +
> +};
> +typedef struct NumaState NumaState;
> +
>  void parse_numa_opts(MachineState *ms);
>  void numa_complete_configuration(MachineState *ms);
> -void query_numa_node_mem(NumaNodeMem node_mem[]);
> +void query_numa_node_mem(NumaNodeMem node_mem[], MachineState *ms);
>  extern QemuOptsList qemu_numa_opts;
>  void numa_legacy_auto_assign_ram(MachineClass *mc, NodeInfo *nodes,
>                                   int nb_nodes, ram_addr_t size);
> diff --git a/monitor.c b/monitor.c
> index 6428eb3b7e..08ef28450e 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -1922,14 +1922,21 @@ static void hmp_info_mtree(Monitor *mon, const QDict *qdict)
>  
>  static void hmp_info_numa(Monitor *mon, const QDict *qdict)
>  {
> -    int i;
> +    int i, nb_numa_nodes;
>      NumaNodeMem *node_mem;
>      CpuInfoList *cpu_list, *cpu;
> +    MachineState *ms = MACHINE(qdev_get_machine());
> +
> +    if (ms->numa_state == NULL) {
> +        monitor_printf(mon, "%d nodes\n", 0);
> +        return;
> +    }
I'd suggest not duplicating the monitor_printf() call;
something like this:

nb_numa_nodes = ms->numa_state ? ms->numa_state->num_nodes : 0;
monitor_printf(mon, "%d nodes\n", nb_numa_nodes);
if (!nb_numa_nodes) {
    return;
}
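
The same pattern as a self-contained sketch, for illustration only. The
types below are simplified stand-ins rather than the real QEMU structures,
and numa_node_count() and format_numa_header() are hypothetical names;
snprintf() takes the place of monitor_printf() so the sketch builds
outside the QEMU tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-ins for the QEMU types; only the fields used here. */
typedef struct NumaState {
    int num_nodes;              /* Number of NUMA nodes */
} NumaState;

typedef struct MachineState {
    NumaState *numa_state;      /* NULL if the machine type has no NUMA */
} MachineState;

/* Hypothetical helper: a missing NumaState counts as zero nodes. */
static int numa_node_count(const MachineState *ms)
{
    return ms->numa_state ? ms->numa_state->num_nodes : 0;
}

/*
 * Hypothetical model of the top of hmp_info_numa(): the header line is
 * formatted exactly once, whether or not NUMA is configured. Returns the
 * number of nodes the caller still has to report on.
 */
static int format_numa_header(const MachineState *ms, char *buf, size_t len)
{
    int nb_numa_nodes = numa_node_count(ms);

    assert(buf != NULL);
    snprintf(buf, len, "%d nodes\n", nb_numa_nodes);
    return nb_numa_nodes;
}
```

A caller would return early when the result is 0 and only then go on to
allocate node_mem and query the CPU list.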


>  
> +    nb_numa_nodes = ms->numa_state->num_nodes;
>      cpu_list = qmp_query_cpus(&error_abort);
>      node_mem = g_new0(NumaNodeMem, nb_numa_nodes);
>  
> -    query_numa_node_mem(node_mem);
> +    query_numa_node_mem(node_mem, ms);
>      monitor_printf(mon, "%d nodes\n", nb_numa_nodes);
>      for (i = 0; i < nb_numa_nodes; i++) {
>          monitor_printf(mon, "node %d cpus:", i);
> diff --git a/numa.c b/numa.c
> index 955ec0c830..d678b71607 100644
> --- a/numa.c
> +++ b/numa.c
> @@ -52,7 +52,6 @@ static int have_memdevs = -1;
>  static int max_numa_nodeid; /* Highest specified NUMA node ID, plus one.
>                               * For all nodes, nodeid < max_numa_nodeid
>                               */
> -int nb_numa_nodes;
>  bool have_numa_distance;
>  NodeInfo numa_info[MAX_NODES];
>  
> @@ -68,7 +67,7 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
>      if (node->has_nodeid) {
>          nodenr = node->nodeid;
>      } else {
> -        nodenr = nb_numa_nodes;
> +        nodenr = ms->numa_state->num_nodes;
>      }
>  
>      if (nodenr >= MAX_NODES) {
> @@ -136,10 +135,11 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
>      }
>      numa_info[nodenr].present = true;
>      max_numa_nodeid = MAX(max_numa_nodeid, nodenr + 1);
> -    nb_numa_nodes++;
> +    ms->numa_state->num_nodes++;
>  }
>  
> -static void parse_numa_distance(NumaDistOptions *dist, Error **errp)
> +static
> +void parse_numa_distance(MachineState *ms, NumaDistOptions *dist, Error **errp)
>  {
>      uint16_t src = dist->src;
>      uint16_t dst = dist->dst;
> @@ -178,6 +178,12 @@ static
>  void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
>  {
>      Error *err = NULL;
> +    MachineClass *mc = MACHINE_GET_CLASS(ms);
> +
> +    if (!mc->numa_supported) {
> +        error_setg(errp, "NUMA is not supported by this machine-type");
> +        goto end;
> +    }
>  
>      switch (object->type) {
>      case NUMA_OPTIONS_TYPE_NODE:
> @@ -187,7 +193,7 @@ void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
>          }
>          break;
>      case NUMA_OPTIONS_TYPE_DIST:
> -        parse_numa_distance(&object->u.dist, &err);
> +        parse_numa_distance(ms, &object->u.dist, &err);
>          if (err) {
>              goto end;
>          }
> @@ -252,10 +258,11 @@ end:
>   * distance from a node to itself is always NUMA_DISTANCE_MIN,
>   * so providing it is never necessary.
>   */
> -static void validate_numa_distance(void)
> +static void validate_numa_distance(MachineState *ms)
>  {
>      int src, dst;
>      bool is_asymmetrical = false;
> +    int nb_numa_nodes = ms->numa_state->num_nodes;
>  
>      for (src = 0; src < nb_numa_nodes; src++) {
>          for (dst = src; dst < nb_numa_nodes; dst++) {
> @@ -293,9 +300,10 @@ static void validate_numa_distance(void)
>      }
>  }
>  
> -static void complete_init_numa_distance(void)
> +static void complete_init_numa_distance(MachineState *ms)
>  {
>      int src, dst;
> +    int nb_numa_nodes = ms->numa_state->num_nodes;
>  
>      /* Fixup NUMA distance by symmetric policy because if it is an
>       * asymmetric distance table, it should be a complete table and
> @@ -369,7 +377,7 @@ void numa_complete_configuration(MachineState *ms)
>       *
>       * Enable NUMA implicitly by adding a new NUMA node automatically.
>       */
> -    if (ms->ram_slots > 0 && nb_numa_nodes == 0 &&
> +    if (ms->ram_slots > 0 && ms->numa_state->num_nodes == 0 &&
>          mc->auto_enable_numa_with_memhp) {
>              NumaNodeOptions node = { };
>              parse_numa_node(ms, &node, &error_abort);
> @@ -387,30 +395,33 @@ void numa_complete_configuration(MachineState *ms)
>      }
>  
>      /* This must be always true if all nodes are present: */
> -    assert(nb_numa_nodes == max_numa_nodeid);
> +    assert(ms->numa_state->num_nodes == max_numa_nodeid);
>  
> -    if (nb_numa_nodes > 0) {
> +    if (ms->numa_state->num_nodes > 0) {
>          uint64_t numa_total;
>  
> -        if (nb_numa_nodes > MAX_NODES) {
> -            nb_numa_nodes = MAX_NODES;
> +        if (ms->numa_state->num_nodes > MAX_NODES) {
> +            ms->numa_state->num_nodes = MAX_NODES;
>          }
>  
>          /* If no memory size is given for any node, assume the default case
>           * and distribute the available memory equally across all nodes
>           */
> -        for (i = 0; i < nb_numa_nodes; i++) {
> +        for (i = 0; i < ms->numa_state->num_nodes; i++) {
>              if (numa_info[i].node_mem != 0) {
>                  break;
>              }
>          }
> -        if (i == nb_numa_nodes) {
> +        if (i == ms->numa_state->num_nodes) {
>              assert(mc->numa_auto_assign_ram);
> -            mc->numa_auto_assign_ram(mc, numa_info, nb_numa_nodes, ram_size);
> +            mc->numa_auto_assign_ram(mc,
> +                                     numa_info,
> +                                     ms->numa_state->num_nodes,
> +                                     ram_size);
>          }
>  
>          numa_total = 0;
> -        for (i = 0; i < nb_numa_nodes; i++) {
> +        for (i = 0; i < ms->numa_state->num_nodes; i++) {
>              numa_total += numa_info[i].node_mem;
>          }
>          if (numa_total != ram_size) {
> @@ -434,10 +445,10 @@ void numa_complete_configuration(MachineState *ms)
>           */
>          if (have_numa_distance) {
>              /* Validate enough NUMA distance information was provided. */
> -            validate_numa_distance();
> +            validate_numa_distance(ms);
>  
>              /* Validation succeeded, now fill in any missing distances. */
> -            complete_init_numa_distance();
> +            complete_init_numa_distance(ms);
>          }
>      }
>  }
> @@ -513,14 +524,16 @@ void memory_region_allocate_system_memory(MemoryRegion *mr, Object *owner,
>  {
>      uint64_t addr = 0;
>      int i;
> +    MachineState *ms = MACHINE(qdev_get_machine());
>  
> -    if (nb_numa_nodes == 0 || !have_memdevs) {
> +    if (ms->numa_state == NULL ||
> +        ms->numa_state->num_nodes == 0 || !have_memdevs) {
>          allocate_system_memory_nonnuma(mr, owner, name, ram_size);
>          return;
>      }
>  
>      memory_region_init(mr, owner, name, ram_size);
> -    for (i = 0; i < nb_numa_nodes; i++) {
> +    for (i = 0; i < ms->numa_state->num_nodes; i++) {
>          uint64_t size = numa_info[i].node_mem;
>          HostMemoryBackend *backend = numa_info[i].node_memdev;
>          if (!backend) {
> @@ -578,16 +591,16 @@ static void numa_stat_memory_devices(NumaNodeMem node_mem[])
>      qapi_free_MemoryDeviceInfoList(info_list);
>  }
>  
> -void query_numa_node_mem(NumaNodeMem node_mem[])
> +void query_numa_node_mem(NumaNodeMem node_mem[], MachineState *ms)
>  {
>      int i;
>  
> -    if (nb_numa_nodes <= 0) {
> +    if (ms->numa_state == NULL || ms->numa_state->num_nodes <= 0) {
>          return;
>      }
>  
>      numa_stat_memory_devices(node_mem);
> -    for (i = 0; i < nb_numa_nodes; i++) {
> +    for (i = 0; i < ms->numa_state->num_nodes; i++) {
>          node_mem[i].node_mem += numa_info[i].node_mem;
>      }
>  }



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v5 4/8] numa: move numa global variable numa_info into MachineState
  2019-06-14 15:56 ` [Qemu-devel] [PATCH v5 4/8] numa: move numa global variable numa_info " Tao Xu
@ 2019-06-28 11:20   ` Igor Mammedov
  2019-07-01  2:01     ` Tao Xu
  0 siblings, 1 reply; 25+ messages in thread
From: Igor Mammedov @ 2019-06-28 11:20 UTC (permalink / raw)
  To: Tao Xu; +Cc: jingqi.liu, fan.du, ehabkost, qemu-devel

On Fri, 14 Jun 2019 23:56:22 +0800
Tao Xu <tao3.xu@intel.com> wrote:

> Move existing numa global numa_info (renamed as "nodes") into NumaState.
> 
> Reviewed-by: Liu Jingqi <jingqi.liu@intel.com>
> Suggested-by: Igor Mammedov <imammedo@redhat.com>
> Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
> Signed-off-by: Tao Xu <tao3.xu@intel.com>
> ---
> 
> Changes in v5 -> v4:
>     - Directly use ms->numa_state->nodes and not dereferencing
>     ms->numa_state in the first place when ms->numa_state is possible
>     NULL (Igor)

Same as in the previous patch:
use ms->numa_state->nodes directly wherever possible, without an
intermediate local variable.

> ---
>  exec.c                   |  2 +-
>  hw/acpi/aml-build.c      |  6 ++++--
>  hw/arm/boot.c            |  2 +-
>  hw/arm/virt-acpi-build.c |  7 ++++---
>  hw/arm/virt.c            |  1 +
>  hw/i386/pc.c             |  4 ++--
>  hw/ppc/spapr.c           |  4 +++-
>  hw/ppc/spapr_pci.c       |  1 +
>  include/sysemu/numa.h    |  3 +++
>  numa.c                   | 15 +++++++++------
>  10 files changed, 29 insertions(+), 16 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index c7eb4af42d..0e30926588 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -1763,7 +1763,7 @@ long qemu_minrampagesize(void)
>      if (hpsize > mainrampagesize &&
>          (ms->numa_state == NULL ||
>           ms->numa_state->num_nodes == 0 ||
> -         numa_info[0].node_memdev == NULL)) {
> +         ms->numa_state->nodes[0].node_memdev == NULL)) {
>          static bool warned;
>          if (!warned) {
>              error_report("Huge page support disabled (n/a for main memory).");
> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> index 63c1cae8c9..26ccc1a3e2 100644
> --- a/hw/acpi/aml-build.c
> +++ b/hw/acpi/aml-build.c
> @@ -1737,8 +1737,10 @@ void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms)
>      build_append_int_noprefix(table_data, nb_numa_nodes, 8);
>      for (i = 0; i < nb_numa_nodes; i++) {
>          for (j = 0; j < nb_numa_nodes; j++) {
> -            assert(numa_info[i].distance[j]);
> -            build_append_int_noprefix(table_data, numa_info[i].distance[j], 1);
> +            assert(ms->numa_state->nodes[i].distance[j]);
> +            build_append_int_noprefix(table_data,
> +                                      ms->numa_state->nodes[i].distance[j],
> +                                      1);
>          }
>      }
>  
> diff --git a/hw/arm/boot.c b/hw/arm/boot.c
> index 2af881e0f4..0c1572d118 100644
> --- a/hw/arm/boot.c
> +++ b/hw/arm/boot.c
> @@ -600,7 +600,7 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
>      if (ms->numa_state != NULL && ms->numa_state->num_nodes > 0) {
>          mem_base = binfo->loader_start;
>          for (i = 0; i < ms->numa_state->num_nodes; i++) {
> -            mem_len = numa_info[i].node_mem;
> +            mem_len = ms->numa_state->nodes[i].node_mem;
>              rc = fdt_add_memory_node(fdt, acells, mem_base,
>                                       scells, mem_len, i);
>              if (rc < 0) {
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index 9d2edd8023..422bbed2d3 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -536,11 +536,12 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
>  
>      mem_base = vms->memmap[VIRT_MEM].base;
>      for (i = 0; i < nb_numa_nodes; ++i) {
> -        if (numa_info[i].node_mem > 0) {
> +        if (ms->numa_state->nodes[i].node_mem > 0) {
>              numamem = acpi_data_push(table_data, sizeof(*numamem));
> -            build_srat_memory(numamem, mem_base, numa_info[i].node_mem, i,
> +            build_srat_memory(numamem, mem_base,
> +                              ms->numa_state->nodes[i].node_mem, i,
>                                MEM_AFFINITY_ENABLED);
> -            mem_base += numa_info[i].node_mem;
> +            mem_base += ms->numa_state->nodes[i].node_mem;
>          }
>      }
>  
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index d147cceab6..d3904d74dc 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -233,6 +233,7 @@ static void create_fdt(VirtMachineState *vms)
>          int size = nb_numa_nodes * nb_numa_nodes * 3 * sizeof(uint32_t);
>          uint32_t *matrix = g_malloc0(size);
>          int idx, i, j;
> +        NodeInfo *numa_info = ms->numa_state->nodes;
>
>          for (i = 0; i < nb_numa_nodes; i++) {
>              for (j = 0; j < nb_numa_nodes; j++) {
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 5bab78e137..4cc84c5050 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -1041,7 +1041,7 @@ static FWCfgState *bochs_bios_init(AddressSpace *as, PCMachineState *pcms)
>      }
>      for (i = 0; i < nb_numa_nodes; i++) {
>          numa_fw_cfg[pcms->apic_id_limit + 1 + i] =
> -            cpu_to_le64(numa_info[i].node_mem);
> +            cpu_to_le64(ms->numa_state->nodes[i].node_mem);
>      }
>      fw_cfg_add_bytes(fw_cfg, FW_CFG_NUMA, numa_fw_cfg,
>                       (1 + pcms->apic_id_limit + nb_numa_nodes) *
> @@ -1683,7 +1683,7 @@ void pc_guest_info_init(PCMachineState *pcms)
>      pcms->node_mem = g_malloc0(pcms->numa_nodes *
>                                      sizeof *pcms->node_mem);
>      for (i = 0; i < nb_numa_nodes; i++) {
> -        pcms->node_mem[i] = numa_info[i].node_mem;
> +        pcms->node_mem[i] = ms->numa_state->nodes[i].node_mem;
>      }
>  
>      pcms->machine_done.notify = pc_machine_done;
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 07a02db99e..3f2e6e0f5f 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -349,6 +349,7 @@ static hwaddr spapr_node0_size(MachineState *machine)
>      int nb_numa_nodes = machine->numa_state->num_nodes;
>      if (nb_numa_nodes) {
>          int i;
> +        NodeInfo *numa_info = machine->numa_state->nodes;
>          for (i = 0; i < nb_numa_nodes; ++i) {
>              if (numa_info[i].node_mem) {
>                  return MIN(pow2floor(numa_info[i].node_mem),
> @@ -395,7 +396,7 @@ static int spapr_populate_memory(SpaprMachineState *spapr, void *fdt)
>      MachineState *machine = MACHINE(spapr);
>      hwaddr mem_start, node_size;
>      int i;
> -    NodeInfo *nodes = numa_info;
> +    NodeInfo *nodes = machine->numa_state->nodes;
>      NodeInfo ramnode;
>  
>      /* No NUMA nodes, assume there is just one node with whole RAM */
> @@ -2521,6 +2522,7 @@ static void spapr_validate_node_memory(MachineState *machine, Error **errp)
>  {
>      int i;
>      int nb_numa_nodes = machine->numa_state->num_nodes;
> +    NodeInfo *numa_info = machine->numa_state->nodes;
>  
>      if (machine->ram_size % SPAPR_MEMORY_BLOCK_SIZE) {
>          error_setg(errp, "Memory size 0x" RAM_ADDR_FMT
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index d6fd018dd4..9d4ebd60de 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -1639,6 +1639,7 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
>      SpaprPhbState *sphb = SPAPR_PCI_HOST_BRIDGE(s);
>      PCIHostState *phb = PCI_HOST_BRIDGE(s);
>      MachineState *ms = MACHINE(spapr);
> +    NodeInfo *numa_info = ms->numa_state->nodes;
>      char *namebuf;
>      int i;
>      PCIBus *bus;
> diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
> index 08a86080c4..437eb21fef 100644
> --- a/include/sysemu/numa.h
> +++ b/include/sysemu/numa.h
> @@ -26,6 +26,9 @@ struct NumaState {
>  
>      /* Allow setting NUMA distance for different NUMA nodes */
>      bool have_numa_distance;
> +
> +    /* NUMA nodes information */
> +    NodeInfo nodes[MAX_NODES];
>  };
>  typedef struct NumaState NumaState;

Shouldn't you also remove the global numa_info declaration from the header?

> diff --git a/numa.c b/numa.c
> index 9432d42ad0..d23e130bce 100644
> --- a/numa.c
> +++ b/numa.c
> @@ -52,9 +52,6 @@ static int have_memdevs = -1;
>  static int max_numa_nodeid; /* Highest specified NUMA node ID, plus one.
>                               * For all nodes, nodeid < max_numa_nodeid
>                               */
> -bool have_numa_distance;
> -NodeInfo numa_info[MAX_NODES];
> -
>  
>  static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
>                              Error **errp)
> @@ -63,6 +60,7 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
>      uint16_t nodenr;
>      uint16List *cpus = NULL;
>      MachineClass *mc = MACHINE_GET_CLASS(ms);
> +    NodeInfo *numa_info = ms->numa_state->nodes;
>  
>      if (node->has_nodeid) {
>          nodenr = node->nodeid;
> @@ -144,6 +142,7 @@ void parse_numa_distance(MachineState *ms, NumaDistOptions *dist, Error **errp)
>      uint16_t src = dist->src;
>      uint16_t dst = dist->dst;
>      uint8_t val = dist->val;
> +    NodeInfo *numa_info = ms->numa_state->nodes;
>  
>      if (src >= MAX_NODES || dst >= MAX_NODES) {
>          error_setg(errp, "Parameter '%s' expects an integer between 0 and %d",
> @@ -203,7 +202,7 @@ void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
>              error_setg(&err, "Missing mandatory node-id property");
>              goto end;
>          }
> -        if (!numa_info[object->u.cpu.node_id].present) {
> +        if (!ms->numa_state->nodes[object->u.cpu.node_id].present) {
>              error_setg(&err, "Invalid node-id=%" PRId64 ", NUMA node must be "
>                  "defined with -numa node,nodeid=ID before it's used with "
>                  "-numa cpu,node-id=ID", object->u.cpu.node_id);
> @@ -263,6 +262,7 @@ static void validate_numa_distance(MachineState *ms)
>      int src, dst;
>      bool is_asymmetrical = false;
>      int nb_numa_nodes = ms->numa_state->num_nodes;
> +    NodeInfo *numa_info = ms->numa_state->nodes;
>  
>      for (src = 0; src < nb_numa_nodes; src++) {
>          for (dst = src; dst < nb_numa_nodes; dst++) {
> @@ -304,6 +304,7 @@ static void complete_init_numa_distance(MachineState *ms)
>  {
>      int src, dst;
>      int nb_numa_nodes = ms->numa_state->num_nodes;
> +    NodeInfo *numa_info = ms->numa_state->nodes;
>  
>      /* Fixup NUMA distance by symmetric policy because if it is an
>       * asymmetric distance table, it should be a complete table and
> @@ -363,6 +364,7 @@ void numa_complete_configuration(MachineState *ms)
>  {
>      int i;
>      MachineClass *mc = MACHINE_GET_CLASS(ms);
> +    NodeInfo *numa_info = ms->numa_state->nodes;
>  
>      /*
>       * If memory hotplug is enabled (slots > 0) but without '-numa'
> @@ -534,8 +536,8 @@ void memory_region_allocate_system_memory(MemoryRegion *mr, Object *owner,
>  
>      memory_region_init(mr, owner, name, ram_size);
>      for (i = 0; i < ms->numa_state->num_nodes; i++) {
> -        uint64_t size = numa_info[i].node_mem;
> -        HostMemoryBackend *backend = numa_info[i].node_memdev;
> +        uint64_t size = ms->numa_state->nodes[i].node_mem;
> +        HostMemoryBackend *backend = ms->numa_state->nodes[i].node_memdev;
>          if (!backend) {
>              continue;
>          }
> @@ -594,6 +596,7 @@ static void numa_stat_memory_devices(NumaNodeMem node_mem[])
>  void query_numa_node_mem(NumaNodeMem node_mem[], MachineState *ms)
>  {
>      int i;
> +    NodeInfo *numa_info = ms->numa_state->nodes;
Look at the line below: there you check ms->numa_state for NULL, yet
right above that check you dereference it without caring whether it is NULL.
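
A minimal sketch of the ordering meant here, with simplified stand-in types
instead of the real QEMU ones. MAX_NODES is set to an arbitrary small value
and query_node_mem_sketch() is a hypothetical name; the sketch also leaves
out the numa_stat_memory_devices() step. The point is only that nothing
reaches through ms->numa_state before the NULL check, and that the nodes
array is used directly instead of through a local alias.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NODES 8             /* stand-in value for this sketch only */

typedef struct NodeInfo {
    uint64_t node_mem;
} NodeInfo;

typedef struct NumaState {
    int num_nodes;
    NodeInfo nodes[MAX_NODES];
} NumaState;

typedef struct MachineState {
    NumaState *numa_state;      /* may be NULL */
} MachineState;

typedef struct NumaNodeMem {
    uint64_t node_mem;
} NumaNodeMem;

static void query_node_mem_sketch(NumaNodeMem node_mem[],
                                  const MachineState *ms)
{
    int i;

    assert(node_mem != NULL);

    /* Check first; no ms->numa_state->... expression appears above this. */
    if (ms->numa_state == NULL || ms->numa_state->num_nodes <= 0) {
        return;
    }

    /* Safe from here on: numa_state is known to be non-NULL. */
    for (i = 0; i < ms->numa_state->num_nodes; i++) {
        node_mem[i].node_mem += ms->numa_state->nodes[i].node_mem;
    }
}
```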

>  
>      if (ms->numa_state == NULL || ms->numa_state->num_nodes <= 0) {
>          return;




* Re: [Qemu-devel] [PATCH v5 6/8] hmat acpi: Build Memory Subsystem Address Range Structure(s) in ACPI HMAT
  2019-06-27 15:56   ` Jonathan Cameron
@ 2019-07-01  0:58     ` Tao Xu
  0 siblings, 0 replies; 25+ messages in thread
From: Tao Xu @ 2019-07-01  0:58 UTC (permalink / raw)
  To: Jonathan Cameron; +Cc: ehabkost, jingqi.liu, fan.du, qemu-devel, imammedo

On 6/27/2019 11:56 PM, Jonathan Cameron wrote:
> On Fri, 14 Jun 2019 23:56:24 +0800
> Tao Xu <tao3.xu@intel.com> wrote:
> 
>> From: Liu Jingqi <jingqi.liu@intel.com>
>>
>> HMAT is defined in ACPI 6.2: 5.2.27 Heterogeneous Memory Attribute Table (HMAT).
>> The specification references below link:
>> http://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf
>>
>> It describes the memory attributes, such as memory side cache
>> attributes and bandwidth and latency details, related to the
>> System Physical Address (SPA) Memory Ranges. The software is
>> expected to use this information as hint for optimization.
>>
>> This structure describes the System Physical Address(SPA) range
>> occupied by memory subsystem and its associativity with processor
>> proximity domain as well as hint for memory usage.
>>
>> Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
>> Signed-off-by: Tao Xu <tao3.xu@intel.com>
> 
> Hi Tao,
> 
> Apologies if I missed an earlier discussion on this...
> 
> It's probably not letting any secrets out to say that there are very few
> real hardware systems out there using the 6.2 version of HMAT.
> 
> Does it make sense to implement it rather than the somewhat tidied
> up version in ACPI 6.3?
> 
> I would go so far as to say that one of the pushes behind making those
> changes was that it shouldn't have much impact as no one was shipping
> a firmware using the 6.2 version.  So any chance we can avoid
> qemu effectively doing so, or at least defaulting to doing so?
> 
> I'm entirely in favor of the patch set in general btw as it's much
> more useful than having to override with a hand crafted table, when
> wanting to test unusual topologies.
> 
> Thanks,
> 
> Jonathan
> 
Thanks for your suggestion. After discussion, we have decided to target
ACPI 6.3 in the next version.





* Re: [Qemu-devel] [PATCH v5 2/8] numa: move numa global variable nb_numa_nodes into MachineState
  2019-06-28 11:02   ` Igor Mammedov
@ 2019-07-01  1:57     ` Tao Xu
  0 siblings, 0 replies; 25+ messages in thread
From: Tao Xu @ 2019-07-01  1:57 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: jingqi.liu, fan.du, ehabkost, qemu-devel

On 6/28/2019 7:02 PM, Igor Mammedov wrote:
> On Fri, 14 Jun 2019 23:56:20 +0800
> Tao Xu <tao3.xu@intel.com> wrote:
> 
>> Add struct NumaState in MachineState and move existing numa global
>> nb_numa_nodes(renamed as "num_nodes") into NumaState. And add variable
>> numa_supported into MachineClass to decide which submachines support NUMA.
>>
>> Reviewed-by: Liu Jingqi <jingqi.liu@intel.com>
> you are not supposed to keep Reviewed-bys on respin
> unless changes that were made are trivial.
> (
> PS:
>   it applies to the whole series
> )
> 
>> Suggested-by: Igor Mammedov <imammedo@redhat.com>
>> Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
>> Signed-off-by: Tao Xu <tao3.xu@intel.com>
> 
> In many places you are following pattern:
>    int nb_numa_nodes = ms->numa_state->num_nodes;
> 
> even if local variable is only used once or twice within the function.
> Pls use ms->numa_state->num_nodes directly instead if it doesn't hurt
> readability.

Thank you for your comments. I will improve it.
> 
> more comments below...
>> ---
>>
>> Changes in v5 -> v4:
>>      - drop the helper machine_num_numa_nodes() and use
>>      machine->numa_state->num_nodes directly (Igor)
>>      - remove the unnecessary header include (Igor)
>> ---
>>   exec.c                              |  5 ++-
>>   hw/acpi/aml-build.c                 |  3 +-
>>   hw/arm/boot.c                       |  4 +-
>>   hw/arm/virt-acpi-build.c            |  8 +++-
>>   hw/arm/virt.c                       |  5 ++-
>>   hw/core/machine.c                   | 14 +++++--
>>   hw/i386/acpi-build.c                |  2 +-
>>   hw/i386/pc.c                        |  7 +++-
>>   hw/mem/pc-dimm.c                    |  2 +
>>   hw/pci-bridge/pci_expander_bridge.c |  2 +
>>   hw/ppc/spapr.c                      | 19 +++++++---
>>   hw/ppc/spapr_pci.c                  |  1 +
>>   include/hw/acpi/aml-build.h         |  2 +-
>>   include/hw/boards.h                 |  2 +
>>   include/sysemu/numa.h               | 13 +++++--
>>   monitor.c                           | 11 +++++-
>>   numa.c                              | 59 ++++++++++++++++++-----------
>>   17 files changed, 112 insertions(+), 47 deletions(-)
>>
>> diff --git a/exec.c b/exec.c
>> index 4e734770c2..c7eb4af42d 100644
>> --- a/exec.c
>> +++ b/exec.c
>> @@ -1733,6 +1733,7 @@ long qemu_minrampagesize(void)
>>       long hpsize = LONG_MAX;
>>       long mainrampagesize;
>>       Object *memdev_root;
>> +    MachineState *ms = MACHINE(qdev_get_machine());
>>   
>>       mainrampagesize = qemu_mempath_getpagesize(mem_path);
>>   
>> @@ -1760,7 +1761,9 @@ long qemu_minrampagesize(void)
>>        * so if its page size is smaller we have got to report that size instead.
>>        */
>>       if (hpsize > mainrampagesize &&
>> -        (nb_numa_nodes == 0 || numa_info[0].node_memdev == NULL)) {
>> +        (ms->numa_state == NULL ||
>> +         ms->numa_state->num_nodes == 0 ||
>> +         numa_info[0].node_memdev == NULL)) {
>>           static bool warned;
>>           if (!warned) {
>>               error_report("Huge page support disabled (n/a for main memory).");
>> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
>> index 555c24f21d..63c1cae8c9 100644
>> --- a/hw/acpi/aml-build.c
>> +++ b/hw/acpi/aml-build.c
>> @@ -1726,10 +1726,11 @@ void build_srat_memory(AcpiSratMemoryAffinity *numamem, uint64_t base,
>>    * ACPI spec 5.2.17 System Locality Distance Information Table
>>    * (Revision 2.0 or later)
>>    */
>> -void build_slit(GArray *table_data, BIOSLinker *linker)
>> +void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms)
>>   {
>>       int slit_start, i, j;
>>       slit_start = table_data->len;
>> +    int nb_numa_nodes = ms->numa_state->num_nodes;
>>   
>>       acpi_data_push(table_data, sizeof(AcpiTableHeader));
>>   
>> diff --git a/hw/arm/boot.c b/hw/arm/boot.c
>> index 30acdbe824..2af881e0f4 100644
>> --- a/hw/arm/boot.c
>> +++ b/hw/arm/boot.c
>> @@ -597,9 +597,9 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
>>       }
>>       g_strfreev(node_path);
>>   
>> -    if (nb_numa_nodes > 0) {
>> +    if (ms->numa_state != NULL && ms->numa_state->num_nodes > 0) {
>>           mem_base = binfo->loader_start;
>> -        for (i = 0; i < nb_numa_nodes; i++) {
>> +        for (i = 0; i < ms->numa_state->num_nodes; i++) {
>>               mem_len = numa_info[i].node_mem;
>>               rc = fdt_add_memory_node(fdt, acells, mem_base,
>>                                        scells, mem_len, i);
>> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
>> index 4a64f9985c..9a22ce679c 100644
>> --- a/hw/arm/virt-acpi-build.c
>> +++ b/hw/arm/virt-acpi-build.c
>> @@ -517,7 +517,9 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
>>       int i, srat_start;
>>       uint64_t mem_base;
>>       MachineClass *mc = MACHINE_GET_CLASS(vms);
>> -    const CPUArchIdList *cpu_list = mc->possible_cpu_arch_ids(MACHINE(vms));
>> +    MachineState *ms = MACHINE(vms);
>> +    const CPUArchIdList *cpu_list = mc->possible_cpu_arch_ids(ms);
>> +    int nb_numa_nodes = ms->numa_state->num_nodes;
>>   
>>       srat_start = table_data->len;
>>       srat = acpi_data_push(table_data, sizeof(*srat));
>> @@ -759,6 +761,8 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
>>       GArray *table_offsets;
>>       unsigned dsdt, xsdt;
>>       GArray *tables_blob = tables->table_data;
>> +    MachineState *ms = MACHINE(vms);
>> +    int nb_numa_nodes = ms->numa_state->num_nodes;
>>   
>>       table_offsets = g_array_new(false, true /* clear */,
>>                                           sizeof(uint32_t));
>> @@ -798,7 +802,7 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
>>           build_srat(tables_blob, tables->linker, vms);
>>           if (have_numa_distance) {
>>               acpi_add_table(table_offsets, tables_blob);
>> -            build_slit(tables_blob, tables->linker);
>> +            build_slit(tables_blob, tables->linker, ms);
>>           }
>>       }
>>   
>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>> index e2ce7a2841..025ad484c5 100644
>> --- a/hw/arm/virt.c
>> +++ b/hw/arm/virt.c
>> @@ -196,6 +196,8 @@ static bool cpu_type_valid(const char *cpu)
>>   
>>   static void create_fdt(VirtMachineState *vms)
>>   {
>> +    MachineState *ms = MACHINE(vms);
>> +    int nb_numa_nodes = ms->numa_state->num_nodes;
>>       void *fdt = create_device_tree(&vms->fdt_size);
>>   
>>       if (!fdt) {
>> @@ -1834,7 +1836,7 @@ virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
>>   
>>   static int64_t virt_get_default_cpu_node_id(const MachineState *ms, int idx)
>>   {
>> -    return idx % nb_numa_nodes;
>> +    return idx % ms->numa_state->num_nodes;
>>   }
>>   
>>   static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
>> @@ -1940,6 +1942,7 @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
>>       mc->kvm_type = virt_kvm_type;
>>       assert(!mc->get_hotplug_handler);
>>       mc->get_hotplug_handler = virt_machine_get_hotplug_handler;
>> +    mc->numa_supported = true;
>>       hc->plug = virt_machine_device_plug_cb;
>>   }
>>   
>> diff --git a/hw/core/machine.c b/hw/core/machine.c
>> index f1a0f45f9c..14b29de0a9 100644
>> --- a/hw/core/machine.c
>> +++ b/hw/core/machine.c
>> @@ -877,6 +877,9 @@ static void machine_initfn(Object *obj)
>>                                           NULL);
>>       }
>>   
>> +    if (mc->numa_supported) {
>> +        ms->numa_state = g_new0(NumaState, 1);
>> +    }
>>   
>>       /* Register notifier when init is done for sysbus sanity checks */
>>       ms->sysbus_notifier.notify = machine_init_notify;
>> @@ -897,6 +900,7 @@ static void machine_finalize(Object *obj)
>>       g_free(ms->firmware);
>>       g_free(ms->device_memory);
>>       g_free(ms->nvdimms_state);
>> +    g_free(ms->numa_state);
>>   }
>>   
>>   bool machine_usb(MachineState *machine)
>> @@ -968,7 +972,7 @@ static void machine_numa_finish_cpu_init(MachineState *machine)
>>       MachineClass *mc = MACHINE_GET_CLASS(machine);
>>       const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(machine);
>>   
>> -    assert(nb_numa_nodes);
>> +    assert(machine->numa_state->num_nodes);
>>       for (i = 0; i < possible_cpus->len; i++) {
>>           if (possible_cpus->cpus[i].props.has_node_id) {
>>               break;
>> @@ -1014,9 +1018,11 @@ void machine_run_board_init(MachineState *machine)
>>   {
>>       MachineClass *machine_class = MACHINE_GET_CLASS(machine);
>>   
>> -    numa_complete_configuration(machine);
>> -    if (nb_numa_nodes) {
>> -        machine_numa_finish_cpu_init(machine);
>> +    if (machine_class->numa_supported) {
>> +        numa_complete_configuration(machine);
>> +        if (machine->numa_state->num_nodes) {
>> +            machine_numa_finish_cpu_init(machine);
>> +        }
>>       }
>>   
>>       /* If the machine supports the valid_cpu_types check and the user
>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
>> index 85dc1640bc..0d58335560 100644
>> --- a/hw/i386/acpi-build.c
>> +++ b/hw/i386/acpi-build.c
>> @@ -2669,7 +2669,7 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
>>           build_srat(tables_blob, tables->linker, machine);
>>           if (have_numa_distance) {
>>               acpi_add_table(table_offsets, tables_blob);
>> -            build_slit(tables_blob, tables->linker);
>> +            build_slit(tables_blob, tables->linker, machine);
>>           }
>>       }
>>       if (acpi_get_mcfg(&mcfg)) {
>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>> index 1b08b56362..5bab78e137 100644
>> --- a/hw/i386/pc.c
>> +++ b/hw/i386/pc.c
>> @@ -997,6 +997,8 @@ static FWCfgState *bochs_bios_init(AddressSpace *as, PCMachineState *pcms)
>>       int i;
>>       const CPUArchIdList *cpus;
>>       MachineClass *mc = MACHINE_GET_CLASS(pcms);
>> +    MachineState *ms = MACHINE(pcms);
>> +    int nb_numa_nodes = ms->numa_state->num_nodes;
>>   
>>       fw_cfg = fw_cfg_init_io_dma(FW_CFG_IO_BASE, FW_CFG_IO_BASE + 4, as);
>>       fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus);
>> @@ -1673,6 +1675,8 @@ void pc_machine_done(Notifier *notifier, void *data)
>>   void pc_guest_info_init(PCMachineState *pcms)
>>   {
>>       int i;
>> +    MachineState *ms = MACHINE(pcms);
>> +    int nb_numa_nodes = ms->numa_state->num_nodes;
>>   
>>       pcms->apic_xrupt_override = kvm_allows_irq0_override();
>>       pcms->numa_nodes = nb_numa_nodes;
>> @@ -2656,7 +2660,7 @@ static int64_t pc_get_default_cpu_node_id(const MachineState *ms, int idx)
>>      assert(idx < ms->possible_cpus->len);
>>      x86_topo_ids_from_apicid(ms->possible_cpus->cpus[idx].arch_id,
>>                               smp_cores, smp_threads, &topo);
>> -   return topo.pkg_id % nb_numa_nodes;
>> +   return topo.pkg_id % ms->numa_state->num_nodes;
>>   }
>>   
>>   static const CPUArchIdList *pc_possible_cpu_arch_ids(MachineState *ms)
>> @@ -2750,6 +2754,7 @@ static void pc_machine_class_init(ObjectClass *oc, void *data)
>>       nc->nmi_monitor_handler = x86_nmi;
>>       mc->default_cpu_type = TARGET_DEFAULT_CPU_TYPE;
>>       mc->nvdimm_supported = true;
>> +    mc->numa_supported = true;
>>   
>>       object_class_property_add(oc, PC_MACHINE_DEVMEM_REGION_SIZE, "int",
>>           pc_machine_get_device_memory_region_size, NULL,
>> diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
>> index 152400b1fc..19e7626590 100644
>> --- a/hw/mem/pc-dimm.c
>> +++ b/hw/mem/pc-dimm.c
>> @@ -160,6 +160,8 @@ static void pc_dimm_realize(DeviceState *dev, Error **errp)
>>   {
>>       PCDIMMDevice *dimm = PC_DIMM(dev);
>>       PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
>> +    MachineState *ms = MACHINE(qdev_get_machine());
>> +    int nb_numa_nodes = ms->numa_state->num_nodes;
>>   
>>       if (!dimm->hostmem) {
>>           error_setg(errp, "'" PC_DIMM_MEMDEV_PROP "' property is not set");
>> diff --git a/hw/pci-bridge/pci_expander_bridge.c b/hw/pci-bridge/pci_expander_bridge.c
>> index ca66bc721a..a76a00a6d5 100644
>> --- a/hw/pci-bridge/pci_expander_bridge.c
>> +++ b/hw/pci-bridge/pci_expander_bridge.c
>> @@ -211,6 +211,8 @@ static void pxb_dev_realize_common(PCIDevice *dev, bool pcie, Error **errp)
>>       PCIBus *bus;
>>       const char *dev_name = NULL;
>>       Error *local_err = NULL;
>> +    MachineState *ms = MACHINE(qdev_get_machine());
>> +    int nb_numa_nodes = ms->numa_state->num_nodes;
>>   
>>       if (pxb->numa_node != NUMA_NODE_UNASSIGNED &&
>>           pxb->numa_node >= nb_numa_nodes) {
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index e2b33e5890..07a02db99e 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -290,6 +290,8 @@ static int spapr_fixup_cpu_dt(void *fdt, SpaprMachineState *spapr)
>>       CPUState *cs;
>>       char cpu_model[32];
>>       uint32_t pft_size_prop[] = {0, cpu_to_be32(spapr->htab_shift)};
>> +    MachineState *ms = MACHINE(spapr);
>> +    int nb_numa_nodes = ms->numa_state->num_nodes;
>>   
>>       CPU_FOREACH(cs) {
>>           PowerPCCPU *cpu = POWERPC_CPU(cs);
>> @@ -344,6 +346,7 @@ static int spapr_fixup_cpu_dt(void *fdt, SpaprMachineState *spapr)
>>   
>>   static hwaddr spapr_node0_size(MachineState *machine)
>>   {
>> +    int nb_numa_nodes = machine->numa_state->num_nodes;
>>       if (nb_numa_nodes) {
>>           int i;
>>           for (i = 0; i < nb_numa_nodes; ++i) {
>> @@ -391,18 +394,18 @@ static int spapr_populate_memory(SpaprMachineState *spapr, void *fdt)
>>   {
>>       MachineState *machine = MACHINE(spapr);
>>       hwaddr mem_start, node_size;
>> -    int i, nb_nodes = nb_numa_nodes;
>> +    int i;
>>       NodeInfo *nodes = numa_info;
>>       NodeInfo ramnode;
>>   
>>       /* No NUMA nodes, assume there is just one node with whole RAM */
>> -    if (!nb_numa_nodes) {
>> -        nb_nodes = 1;
>> +    if (!machine->numa_state->num_nodes) {
>> +        machine->numa_state->num_nodes = 1;
>>           ramnode.node_mem = machine->ram_size;
>>           nodes = &ramnode;
>>       }
>>   
>> -    for (i = 0, mem_start = 0; i < nb_nodes; ++i) {
>> +    for (i = 0, mem_start = 0; i < machine->numa_state->num_nodes; ++i) {
>>           if (!nodes[i].node_mem) {
>>               continue;
>>           }
>> @@ -444,6 +447,8 @@ static void spapr_populate_cpu_dt(CPUState *cs, void *fdt, int offset,
>>       PowerPCCPU *cpu = POWERPC_CPU(cs);
>>       CPUPPCState *env = &cpu->env;
>>       PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cs);
>> +    MachineState *ms = MACHINE(spapr);
>> +    int nb_numa_nodes = ms->numa_state->num_nodes;
>>       int index = spapr_get_vcpu_id(cpu);
>>       uint32_t segs[] = {cpu_to_be32(28), cpu_to_be32(40),
>>                          0xffffffff, 0xffffffff};
>> @@ -852,6 +857,7 @@ static int spapr_populate_drmem_v1(SpaprMachineState *spapr, void *fdt,
>>   static int spapr_populate_drconf_memory(SpaprMachineState *spapr, void *fdt)
>>   {
>>       MachineState *machine = MACHINE(spapr);
>> +    int nb_numa_nodes = machine->numa_state->num_nodes;
>>       int ret, i, offset;
>>       uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE;
>>       uint32_t prop_lmb_size[] = {0, cpu_to_be32(lmb_size)};
>> @@ -1696,6 +1702,7 @@ static void spapr_machine_reset(void)
>>   {
>>       MachineState *machine = MACHINE(qdev_get_machine());
>>       SpaprMachineState *spapr = SPAPR_MACHINE(machine);
>> +    int nb_numa_nodes = machine->numa_state->num_nodes;
>>       PowerPCCPU *first_ppc_cpu;
>>       uint32_t rtas_limit;
>>       hwaddr rtas_addr, fdt_addr;
>> @@ -2513,6 +2520,7 @@ static void spapr_create_lmb_dr_connectors(SpaprMachineState *spapr)
>>   static void spapr_validate_node_memory(MachineState *machine, Error **errp)
>>   {
>>       int i;
>> +    int nb_numa_nodes = machine->numa_state->num_nodes;
>>   
>>       if (machine->ram_size % SPAPR_MEMORY_BLOCK_SIZE) {
>>           error_setg(errp, "Memory size 0x" RAM_ADDR_FMT
>> @@ -4115,7 +4123,7 @@ spapr_cpu_index_to_props(MachineState *machine, unsigned cpu_index)
>>   
>>   static int64_t spapr_get_default_cpu_node_id(const MachineState *ms, int idx)
>>   {
>> -    return idx / smp_cores % nb_numa_nodes;
>> +    return idx / smp_cores % ms->numa_state->num_nodes;
>>   }
>>   
>>   static const CPUArchIdList *spapr_possible_cpu_arch_ids(MachineState *machine)
>> @@ -4319,6 +4327,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>>       smc->update_dt_enabled = true;
>>       mc->default_cpu_type = POWERPC_CPU_TYPE_NAME("power9_v2.0");
>>       mc->has_hotpluggable_cpus = true;
>> +    mc->numa_supported = true;
>>       smc->resize_hpt_default = SPAPR_RESIZE_HPT_ENABLED;
>>       fwc->get_dev_path = spapr_get_fw_dev_path;
>>       nc->nmi_monitor_handler = spapr_nmi;
>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
>> index 9cf2c41b8c..d6fd018dd4 100644
>> --- a/hw/ppc/spapr_pci.c
>> +++ b/hw/ppc/spapr_pci.c
>> @@ -1638,6 +1638,7 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
>>       SysBusDevice *s = SYS_BUS_DEVICE(dev);
>>       SpaprPhbState *sphb = SPAPR_PCI_HOST_BRIDGE(s);
>>       PCIHostState *phb = PCI_HOST_BRIDGE(s);
>> +    MachineState *ms = MACHINE(spapr);
> why do you do it?
> 
>>       char *namebuf;
>>       int i;
>>       PCIBus *bus;
>> diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
>> index 1a563ad756..991cf05134 100644
>> --- a/include/hw/acpi/aml-build.h
>> +++ b/include/hw/acpi/aml-build.h
>> @@ -414,7 +414,7 @@ build_append_gas_from_struct(GArray *table, const struct AcpiGenericAddress *s)
>>   void build_srat_memory(AcpiSratMemoryAffinity *numamem, uint64_t base,
>>                          uint64_t len, int node, MemoryAffinityFlags flags);
>>   
>> -void build_slit(GArray *table_data, BIOSLinker *linker);
>> +void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms);
>>   
>>   void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f,
>>                   const char *oem_id, const char *oem_table_id);
>> diff --git a/include/hw/boards.h b/include/hw/boards.h
>> index 6ff02bf3e4..8375a07940 100644
>> --- a/include/hw/boards.h
>> +++ b/include/hw/boards.h
>> @@ -210,6 +210,7 @@ struct MachineClass {
>>       bool ignore_boot_device_suffixes;
>>       bool smbus_no_migration_support;
>>       bool nvdimm_supported;
>> +    bool numa_supported;
>>   
>>       HotplugHandler *(*get_hotplug_handler)(MachineState *machine,
>>                                              DeviceState *dev);
>> @@ -273,6 +274,7 @@ struct MachineState {
>>       AccelState *accelerator;
>>       CPUArchIdList *possible_cpus;
>>       struct NVDIMMState *nvdimms_state;
>> +    struct NumaState *numa_state;
>>   };
>>   
>>   #define DEFINE_MACHINE(namestr, machine_initfn) \
>> diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
>> index b6ac7de43e..3c4b2d2909 100644
>> --- a/include/sysemu/numa.h
>> +++ b/include/sysemu/numa.h
>> @@ -6,7 +6,6 @@
>>   #include "sysemu/hostmem.h"
>>   #include "hw/boards.h"
>>   
>> -extern int nb_numa_nodes;   /* Number of NUMA nodes */
>>   extern bool have_numa_distance;
>>   
>>   struct NodeInfo {
>> @@ -16,15 +15,23 @@ struct NodeInfo {
>>       uint8_t distance[MAX_NODES];
>>   };
>>   
>> +extern NodeInfo numa_info[MAX_NODES];
>> +
> random move?
> 

Sorry, I made a mistake here. I will be more careful.
>>   struct NumaNodeMem {
>>       uint64_t node_mem;
>>       uint64_t node_plugged_mem;
>>   };
>>   
>> -extern NodeInfo numa_info[MAX_NODES];
>> +struct NumaState {
>> +    /* Number of NUMA nodes */
>> +    int num_nodes;
>> +
>> +};
>> +typedef struct NumaState NumaState;
>> +
>>   void parse_numa_opts(MachineState *ms);
>>   void numa_complete_configuration(MachineState *ms);
>> -void query_numa_node_mem(NumaNodeMem node_mem[]);
>> +void query_numa_node_mem(NumaNodeMem node_mem[], MachineState *ms);
>>   extern QemuOptsList qemu_numa_opts;
>>   void numa_legacy_auto_assign_ram(MachineClass *mc, NodeInfo *nodes,
>>                                    int nb_nodes, ram_addr_t size);
>> diff --git a/monitor.c b/monitor.c
>> index 6428eb3b7e..08ef28450e 100644
>> --- a/monitor.c
>> +++ b/monitor.c
>> @@ -1922,14 +1922,21 @@ static void hmp_info_mtree(Monitor *mon, const QDict *qdict)
>>   
>>   static void hmp_info_numa(Monitor *mon, const QDict *qdict)
>>   {
>> -    int i;
>> +    int i, nb_numa_nodes;
>>       NumaNodeMem *node_mem;
>>       CpuInfoList *cpu_list, *cpu;
>> +    MachineState *ms = MACHINE(qdev_get_machine());
>> +
>> +    if (ms->numa_state == NULL) {
>> +        monitor_printf(mon, "%d nodes\n", 0);
>> +        return;
>> +    }
> I suggest not duplicating monitor_printf;
> something like this:
> 
> nb_numa_nodes = ms->numa_state ? ms->numa_state->num_nodes : 0;
> monitor_printf(mon, "%d nodes\n", nb_numa_nodes);
> if(!nb_numa_nodes)
>    return;
> 
> 
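A minimal, self-contained sketch of the pattern suggested above. The
types are simplified stand-ins, not the real QEMU definitions, and
numa_node_count() and info_numa_summary() are hypothetical helpers used
only for illustration. The summary line is written into a buffer instead
of going through monitor_printf() so that the snippet has no QEMU
dependencies.

```c
#include <stddef.h>
#include <stdio.h>

/* Stand-in types for illustration only; not the real QEMU definitions. */
typedef struct NumaState {
    int num_nodes;
} NumaState;

typedef struct MachineState {
    NumaState *numa_state;   /* NULL when the machine type has no NUMA support */
} MachineState;

/* Return the node count, treating a missing NumaState as zero nodes. */
static int numa_node_count(const MachineState *ms)
{
    return ms->numa_state ? ms->numa_state->num_nodes : 0;
}

/*
 * Hypothetical stand-in for the start of hmp_info_numa(): format the
 * summary line exactly once, then return early when there are no nodes.
 * Returns the number of nodes.
 */
static int info_numa_summary(const MachineState *ms, char *buf, size_t len)
{
    int nb_numa_nodes = numa_node_count(ms);

    snprintf(buf, len, "%d nodes\n", nb_numa_nodes);
    if (!nb_numa_nodes) {
        return 0;
    }
    /* Per-node reporting would follow here. */
    return nb_numa_nodes;
}
```

With this shape there is a single place that prints the node count, and
the NULL check on numa_state is confined to one expression.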
>>   
>> +    nb_numa_nodes = ms->numa_state->num_nodes;
>>       cpu_list = qmp_query_cpus(&error_abort);
>>       node_mem = g_new0(NumaNodeMem, nb_numa_nodes);
>>   
>> -    query_numa_node_mem(node_mem);
>> +    query_numa_node_mem(node_mem, ms);
>>       monitor_printf(mon, "%d nodes\n", nb_numa_nodes);
>>       for (i = 0; i < nb_numa_nodes; i++) {
>>           monitor_printf(mon, "node %d cpus:", i);
>> diff --git a/numa.c b/numa.c
>> index 955ec0c830..d678b71607 100644
>> --- a/numa.c
>> +++ b/numa.c
>> @@ -52,7 +52,6 @@ static int have_memdevs = -1;
>>   static int max_numa_nodeid; /* Highest specified NUMA node ID, plus one.
>>                                * For all nodes, nodeid < max_numa_nodeid
>>                                */
>> -int nb_numa_nodes;
>>   bool have_numa_distance;
>>   NodeInfo numa_info[MAX_NODES];
>>   
>> @@ -68,7 +67,7 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
>>       if (node->has_nodeid) {
>>           nodenr = node->nodeid;
>>       } else {
>> -        nodenr = nb_numa_nodes;
>> +        nodenr = ms->numa_state->num_nodes;
>>       }
>>   
>>       if (nodenr >= MAX_NODES) {
>> @@ -136,10 +135,11 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
>>       }
>>       numa_info[nodenr].present = true;
>>       max_numa_nodeid = MAX(max_numa_nodeid, nodenr + 1);
>> -    nb_numa_nodes++;
>> +    ms->numa_state->num_nodes++;
>>   }
>>   
>> -static void parse_numa_distance(NumaDistOptions *dist, Error **errp)
>> +static
>> +void parse_numa_distance(MachineState *ms, NumaDistOptions *dist, Error **errp)
>>   {
>>       uint16_t src = dist->src;
>>       uint16_t dst = dist->dst;
>> @@ -178,6 +178,12 @@ static
>>   void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
>>   {
>>       Error *err = NULL;
>> +    MachineClass *mc = MACHINE_GET_CLASS(ms);
>> +
>> +    if (!mc->numa_supported) {
>> +        error_setg(errp, "NUMA is not supported by this machine-type");
>> +        goto end;
>> +    }
>>   
>>       switch (object->type) {
>>       case NUMA_OPTIONS_TYPE_NODE:
>> @@ -187,7 +193,7 @@ void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
>>           }
>>           break;
>>       case NUMA_OPTIONS_TYPE_DIST:
>> -        parse_numa_distance(&object->u.dist, &err);
>> +        parse_numa_distance(ms, &object->u.dist, &err);
>>           if (err) {
>>               goto end;
>>           }
>> @@ -252,10 +258,11 @@ end:
>>    * distance from a node to itself is always NUMA_DISTANCE_MIN,
>>    * so providing it is never necessary.
>>    */
>> -static void validate_numa_distance(void)
>> +static void validate_numa_distance(MachineState *ms)
>>   {
>>       int src, dst;
>>       bool is_asymmetrical = false;
>> +    int nb_numa_nodes = ms->numa_state->num_nodes;
>>   
>>       for (src = 0; src < nb_numa_nodes; src++) {
>>           for (dst = src; dst < nb_numa_nodes; dst++) {
>> @@ -293,9 +300,10 @@ static void validate_numa_distance(void)
>>       }
>>   }
>>   
>> -static void complete_init_numa_distance(void)
>> +static void complete_init_numa_distance(MachineState *ms)
>>   {
>>       int src, dst;
>> +    int nb_numa_nodes = ms->numa_state->num_nodes;
>>   
>>       /* Fixup NUMA distance by symmetric policy because if it is an
>>        * asymmetric distance table, it should be a complete table and
>> @@ -369,7 +377,7 @@ void numa_complete_configuration(MachineState *ms)
>>        *
>>        * Enable NUMA implicitly by adding a new NUMA node automatically.
>>        */
>> -    if (ms->ram_slots > 0 && nb_numa_nodes == 0 &&
>> +    if (ms->ram_slots > 0 && ms->numa_state->num_nodes == 0 &&
>>           mc->auto_enable_numa_with_memhp) {
>>               NumaNodeOptions node = { };
>>               parse_numa_node(ms, &node, &error_abort);
>> @@ -387,30 +395,33 @@ void numa_complete_configuration(MachineState *ms)
>>       }
>>   
>>       /* This must be always true if all nodes are present: */
>> -    assert(nb_numa_nodes == max_numa_nodeid);
>> +    assert(ms->numa_state->num_nodes == max_numa_nodeid);
>>   
>> -    if (nb_numa_nodes > 0) {
>> +    if (ms->numa_state->num_nodes > 0) {
>>           uint64_t numa_total;
>>   
>> -        if (nb_numa_nodes > MAX_NODES) {
>> -            nb_numa_nodes = MAX_NODES;
>> +        if (ms->numa_state->num_nodes > MAX_NODES) {
>> +            ms->numa_state->num_nodes = MAX_NODES;
>>           }
>>   
>>           /* If no memory size is given for any node, assume the default case
>>            * and distribute the available memory equally across all nodes
>>            */
>> -        for (i = 0; i < nb_numa_nodes; i++) {
>> +        for (i = 0; i < ms->numa_state->num_nodes; i++) {
>>               if (numa_info[i].node_mem != 0) {
>>                   break;
>>               }
>>           }
>> -        if (i == nb_numa_nodes) {
>> +        if (i == ms->numa_state->num_nodes) {
>>               assert(mc->numa_auto_assign_ram);
>> -            mc->numa_auto_assign_ram(mc, numa_info, nb_numa_nodes, ram_size);
>> +            mc->numa_auto_assign_ram(mc,
>> +                                     numa_info,
>> +                                     ms->numa_state->num_nodes,
>> +                                     ram_size);
>>           }
>>   
>>           numa_total = 0;
>> -        for (i = 0; i < nb_numa_nodes; i++) {
>> +        for (i = 0; i < ms->numa_state->num_nodes; i++) {
>>               numa_total += numa_info[i].node_mem;
>>           }
>>           if (numa_total != ram_size) {
>> @@ -434,10 +445,10 @@ void numa_complete_configuration(MachineState *ms)
>>            */
>>           if (have_numa_distance) {
>>               /* Validate enough NUMA distance information was provided. */
>> -            validate_numa_distance();
>> +            validate_numa_distance(ms);
>>   
>>               /* Validation succeeded, now fill in any missing distances. */
>> -            complete_init_numa_distance();
>> +            complete_init_numa_distance(ms);
>>           }
>>       }
>>   }
>> @@ -513,14 +524,16 @@ void memory_region_allocate_system_memory(MemoryRegion *mr, Object *owner,
>>   {
>>       uint64_t addr = 0;
>>       int i;
>> +    MachineState *ms = MACHINE(qdev_get_machine());
>>   
>> -    if (nb_numa_nodes == 0 || !have_memdevs) {
>> +    if (ms->numa_state == NULL ||
>> +        ms->numa_state->num_nodes == 0 || !have_memdevs) {
>>           allocate_system_memory_nonnuma(mr, owner, name, ram_size);
>>           return;
>>       }
>>   
>>       memory_region_init(mr, owner, name, ram_size);
>> -    for (i = 0; i < nb_numa_nodes; i++) {
>> +    for (i = 0; i < ms->numa_state->num_nodes; i++) {
>>           uint64_t size = numa_info[i].node_mem;
>>           HostMemoryBackend *backend = numa_info[i].node_memdev;
>>           if (!backend) {
>> @@ -578,16 +591,16 @@ static void numa_stat_memory_devices(NumaNodeMem node_mem[])
>>       qapi_free_MemoryDeviceInfoList(info_list);
>>   }
>>   
>> -void query_numa_node_mem(NumaNodeMem node_mem[])
>> +void query_numa_node_mem(NumaNodeMem node_mem[], MachineState *ms)
>>   {
>>       int i;
>>   
>> -    if (nb_numa_nodes <= 0) {
>> +    if (ms->numa_state == NULL || ms->numa_state->num_nodes <= 0) {
>>           return;
>>       }
>>   
>>       numa_stat_memory_devices(node_mem);
>> -    for (i = 0; i < nb_numa_nodes; i++) {
>> +    for (i = 0; i < ms->numa_state->num_nodes; i++) {
>>           node_mem[i].node_mem += numa_info[i].node_mem;
>>       }
>>   }
> 




* Re: [Qemu-devel] [PATCH v5 4/8] numa: move numa global variable numa_info into MachineState
  2019-06-28 11:20   ` Igor Mammedov
@ 2019-07-01  2:01     ` Tao Xu
  0 siblings, 0 replies; 25+ messages in thread
From: Tao Xu @ 2019-07-01  2:01 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: jingqi.liu, fan.du, ehabkost, qemu-devel

On 6/28/2019 7:20 PM, Igor Mammedov wrote:
> On Fri, 14 Jun 2019 23:56:22 +0800
> Tao Xu <tao3.xu@intel.com> wrote:
> 
>> Move existing numa global numa_info (renamed as "nodes") into NumaState.
>>
>> Reviewed-by: Liu Jingqi <jingqi.liu@intel.com>
>> Suggested-by: Igor Mammedov <imammedo@redhat.com>
>> Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
>> Signed-off-by: Tao Xu <tao3.xu@intel.com>
>> ---
>>
>> Changes in v4 -> v5:
>>      - Use ms->numa_state->nodes directly, and avoid dereferencing
>>      ms->numa_state up front where ms->numa_state may be
>>      NULL (Igor)
> 
> the same as in the previous patch:
> use ms->numa_state->nodes directly wherever possible, without an
> intermediate local variable
> 
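To illustrate the review comment above, here is a minimal sketch that
reads the node array through ms->numa_state->nodes without a local
alias. NodeInfo, NumaState and MachineState are simplified stand-ins
(not the real QEMU definitions), the MAX_NODES value is illustrative,
and numa_total_mem() is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_NODES 128   /* illustrative value for this sketch */

/* Stand-in types for illustration only; not the real QEMU definitions. */
typedef struct NodeInfo {
    uint64_t node_mem;
} NodeInfo;

typedef struct NumaState {
    int num_nodes;               /* number of NUMA nodes */
    NodeInfo nodes[MAX_NODES];   /* per-node information */
} NumaState;

typedef struct MachineState {
    NumaState *numa_state;       /* NULL when NUMA is not supported */
} MachineState;

/* Sum the memory of all nodes, accessing the array directly via ms. */
static uint64_t numa_total_mem(const MachineState *ms)
{
    uint64_t total = 0;
    int i;

    if (ms->numa_state == NULL) {
        return 0;
    }
    for (i = 0; i < ms->numa_state->num_nodes; i++) {
        total += ms->numa_state->nodes[i].node_mem;
    }
    return total;
}
```

Going through ms at every use keeps the NULL check and the access next
to each other, so no alias is initialized before numa_state has been
checked.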
>> ---
>>   exec.c                   |  2 +-
>>   hw/acpi/aml-build.c      |  6 ++++--
>>   hw/arm/boot.c            |  2 +-
>>   hw/arm/virt-acpi-build.c |  7 ++++---
>>   hw/arm/virt.c            |  1 +
>>   hw/i386/pc.c             |  4 ++--
>>   hw/ppc/spapr.c           |  4 +++-
>>   hw/ppc/spapr_pci.c       |  1 +
>>   include/sysemu/numa.h    |  3 +++
>>   numa.c                   | 15 +++++++++------
>>   10 files changed, 29 insertions(+), 16 deletions(-)
>>
>> diff --git a/exec.c b/exec.c
>> index c7eb4af42d..0e30926588 100644
>> --- a/exec.c
>> +++ b/exec.c
>> @@ -1763,7 +1763,7 @@ long qemu_minrampagesize(void)
>>       if (hpsize > mainrampagesize &&
>>           (ms->numa_state == NULL ||
>>            ms->numa_state->num_nodes == 0 ||
>> -         numa_info[0].node_memdev == NULL)) {
>> +         ms->numa_state->nodes[0].node_memdev == NULL)) {
>>           static bool warned;
>>           if (!warned) {
>>               error_report("Huge page support disabled (n/a for main memory).");
>> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
>> index 63c1cae8c9..26ccc1a3e2 100644
>> --- a/hw/acpi/aml-build.c
>> +++ b/hw/acpi/aml-build.c
>> @@ -1737,8 +1737,10 @@ void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms)
>>       build_append_int_noprefix(table_data, nb_numa_nodes, 8);
>>       for (i = 0; i < nb_numa_nodes; i++) {
>>           for (j = 0; j < nb_numa_nodes; j++) {
>> -            assert(numa_info[i].distance[j]);
>> -            build_append_int_noprefix(table_data, numa_info[i].distance[j], 1);
>> +            assert(ms->numa_state->nodes[i].distance[j]);
>> +            build_append_int_noprefix(table_data,
>> +                                      ms->numa_state->nodes[i].distance[j],
>> +                                      1);
>>           }
>>       }
>>   
>> diff --git a/hw/arm/boot.c b/hw/arm/boot.c
>> index 2af881e0f4..0c1572d118 100644
>> --- a/hw/arm/boot.c
>> +++ b/hw/arm/boot.c
>> @@ -600,7 +600,7 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
>>       if (ms->numa_state != NULL && ms->numa_state->num_nodes > 0) {
>>           mem_base = binfo->loader_start;
>>           for (i = 0; i < ms->numa_state->num_nodes; i++) {
>> -            mem_len = numa_info[i].node_mem;
>> +            mem_len = ms->numa_state->nodes[i].node_mem;
>>               rc = fdt_add_memory_node(fdt, acells, mem_base,
>>                                        scells, mem_len, i);
>>               if (rc < 0) {
>> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
>> index 9d2edd8023..422bbed2d3 100644
>> --- a/hw/arm/virt-acpi-build.c
>> +++ b/hw/arm/virt-acpi-build.c
>> @@ -536,11 +536,12 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
>>   
>>       mem_base = vms->memmap[VIRT_MEM].base;
>>       for (i = 0; i < nb_numa_nodes; ++i) {
>> -        if (numa_info[i].node_mem > 0) {
>> +        if (ms->numa_state->nodes[i].node_mem > 0) {
>>               numamem = acpi_data_push(table_data, sizeof(*numamem));
>> -            build_srat_memory(numamem, mem_base, numa_info[i].node_mem, i,
>> +            build_srat_memory(numamem, mem_base,
>> +                              ms->numa_state->nodes[i].node_mem, i,
>>                                 MEM_AFFINITY_ENABLED);
>> -            mem_base += numa_info[i].node_mem;
>> +            mem_base += ms->numa_state->nodes[i].node_mem;
>>           }
>>       }
>>   
>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>> index d147cceab6..d3904d74dc 100644
>> --- a/hw/arm/virt.c
>> +++ b/hw/arm/virt.c
>> @@ -233,6 +233,7 @@ static void create_fdt(VirtMachineState *vms)
>>           int size = nb_numa_nodes * nb_numa_nodes * 3 * sizeof(uint32_t);
>>           uint32_t *matrix = g_malloc0(size);
>>           int idx, i, j;
>> +        NodeInfo *numa_info = ms->numa_state->nodes;
>>
>>           for (i = 0; i < nb_numa_nodes; i++) {
>>               for (j = 0; j < nb_numa_nodes; j++) {
>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>> index 5bab78e137..4cc84c5050 100644
>> --- a/hw/i386/pc.c
>> +++ b/hw/i386/pc.c
>> @@ -1041,7 +1041,7 @@ static FWCfgState *bochs_bios_init(AddressSpace *as, PCMachineState *pcms)
>>       }
>>       for (i = 0; i < nb_numa_nodes; i++) {
>>           numa_fw_cfg[pcms->apic_id_limit + 1 + i] =
>> -            cpu_to_le64(numa_info[i].node_mem);
>> +            cpu_to_le64(ms->numa_state->nodes[i].node_mem);
>>       }
>>       fw_cfg_add_bytes(fw_cfg, FW_CFG_NUMA, numa_fw_cfg,
>>                        (1 + pcms->apic_id_limit + nb_numa_nodes) *
>> @@ -1683,7 +1683,7 @@ void pc_guest_info_init(PCMachineState *pcms)
>>       pcms->node_mem = g_malloc0(pcms->numa_nodes *
>>                                       sizeof *pcms->node_mem);
>>       for (i = 0; i < nb_numa_nodes; i++) {
>> -        pcms->node_mem[i] = numa_info[i].node_mem;
>> +        pcms->node_mem[i] = ms->numa_state->nodes[i].node_mem;
>>       }
>>   
>>       pcms->machine_done.notify = pc_machine_done;
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 07a02db99e..3f2e6e0f5f 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -349,6 +349,7 @@ static hwaddr spapr_node0_size(MachineState *machine)
>>       int nb_numa_nodes = machine->numa_state->num_nodes;
>>       if (nb_numa_nodes) {
>>           int i;
>> +        NodeInfo *numa_info = machine->numa_state->nodes;
>>           for (i = 0; i < nb_numa_nodes; ++i) {
>>               if (numa_info[i].node_mem) {
>>                   return MIN(pow2floor(numa_info[i].node_mem),
>> @@ -395,7 +396,7 @@ static int spapr_populate_memory(SpaprMachineState *spapr, void *fdt)
>>       MachineState *machine = MACHINE(spapr);
>>       hwaddr mem_start, node_size;
>>       int i;
>> -    NodeInfo *nodes = numa_info;
>> +    NodeInfo *nodes = machine->numa_state->nodes;
>>       NodeInfo ramnode;
>>   
>>       /* No NUMA nodes, assume there is just one node with whole RAM */
>> @@ -2521,6 +2522,7 @@ static void spapr_validate_node_memory(MachineState *machine, Error **errp)
>>   {
>>       int i;
>>       int nb_numa_nodes = machine->numa_state->num_nodes;
>> +    NodeInfo *numa_info = machine->numa_state->nodes;
>>   
>>       if (machine->ram_size % SPAPR_MEMORY_BLOCK_SIZE) {
>>           error_setg(errp, "Memory size 0x" RAM_ADDR_FMT
>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
>> index d6fd018dd4..9d4ebd60de 100644
>> --- a/hw/ppc/spapr_pci.c
>> +++ b/hw/ppc/spapr_pci.c
>> @@ -1639,6 +1639,7 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
>>       SpaprPhbState *sphb = SPAPR_PCI_HOST_BRIDGE(s);
>>       PCIHostState *phb = PCI_HOST_BRIDGE(s);
>>       MachineState *ms = MACHINE(spapr);
>> +    NodeInfo *numa_info = ms->numa_state->nodes;
>>       char *namebuf;
>>       int i;
>>       PCIBus *bus;
>> diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
>> index 08a86080c4..437eb21fef 100644
>> --- a/include/sysemu/numa.h
>> +++ b/include/sysemu/numa.h
>> @@ -26,6 +26,9 @@ struct NumaState {
>>   
>>       /* Allow setting NUMA distance for different NUMA nodes */
>>       bool have_numa_distance;
>> +
>> +    /* NUMA nodes information */
>> +    NodeInfo nodes[MAX_NODES];
>>   };
>>   typedef struct NumaState NumaState;
> 
> Shouldn't you remove global numa_info var from header as well?
> 
>> diff --git a/numa.c b/numa.c
>> index 9432d42ad0..d23e130bce 100644
>> --- a/numa.c
>> +++ b/numa.c
>> @@ -52,9 +52,6 @@ static int have_memdevs = -1;
>>   static int max_numa_nodeid; /* Highest specified NUMA node ID, plus one.
>>                                * For all nodes, nodeid < max_numa_nodeid
>>                                */
>> -bool have_numa_distance;
>> -NodeInfo numa_info[MAX_NODES];
>> -
>>   
>>   static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
>>                               Error **errp)
>> @@ -63,6 +60,7 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
>>       uint16_t nodenr;
>>       uint16List *cpus = NULL;
>>       MachineClass *mc = MACHINE_GET_CLASS(ms);
>> +    NodeInfo *numa_info = ms->numa_state->nodes;
>>   
>>       if (node->has_nodeid) {
>>           nodenr = node->nodeid;
>> @@ -144,6 +142,7 @@ void parse_numa_distance(MachineState *ms, NumaDistOptions *dist, Error **errp)
>>       uint16_t src = dist->src;
>>       uint16_t dst = dist->dst;
>>       uint8_t val = dist->val;
>> +    NodeInfo *numa_info = ms->numa_state->nodes;
>>   
>>       if (src >= MAX_NODES || dst >= MAX_NODES) {
>>           error_setg(errp, "Parameter '%s' expects an integer between 0 and %d",
>> @@ -203,7 +202,7 @@ void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
>>               error_setg(&err, "Missing mandatory node-id property");
>>               goto end;
>>           }
>> -        if (!numa_info[object->u.cpu.node_id].present) {
>> +        if (!ms->numa_state->nodes[object->u.cpu.node_id].present) {
>>               error_setg(&err, "Invalid node-id=%" PRId64 ", NUMA node must be "
>>                   "defined with -numa node,nodeid=ID before it's used with "
>>                   "-numa cpu,node-id=ID", object->u.cpu.node_id);
>> @@ -263,6 +262,7 @@ static void validate_numa_distance(MachineState *ms)
>>       int src, dst;
>>       bool is_asymmetrical = false;
>>       int nb_numa_nodes = ms->numa_state->num_nodes;
>> +    NodeInfo *numa_info = ms->numa_state->nodes;
>>   
>>       for (src = 0; src < nb_numa_nodes; src++) {
>>           for (dst = src; dst < nb_numa_nodes; dst++) {
>> @@ -304,6 +304,7 @@ static void complete_init_numa_distance(MachineState *ms)
>>   {
>>       int src, dst;
>>       int nb_numa_nodes = ms->numa_state->num_nodes;
>> +    NodeInfo *numa_info = ms->numa_state->nodes;
>>   
>>       /* Fixup NUMA distance by symmetric policy because if it is an
>>        * asymmetric distance table, it should be a complete table and
>> @@ -363,6 +364,7 @@ void numa_complete_configuration(MachineState *ms)
>>   {
>>       int i;
>>       MachineClass *mc = MACHINE_GET_CLASS(ms);
>> +    NodeInfo *numa_info = ms->numa_state->nodes;
>>   
>>       /*
>>        * If memory hotplug is enabled (slots > 0) but without '-numa'
>> @@ -534,8 +536,8 @@ void memory_region_allocate_system_memory(MemoryRegion *mr, Object *owner,
>>   
>>       memory_region_init(mr, owner, name, ram_size);
>>       for (i = 0; i < ms->numa_state->num_nodes; i++) {
>> -        uint64_t size = numa_info[i].node_mem;
>> -        HostMemoryBackend *backend = numa_info[i].node_memdev;
>> +        uint64_t size = ms->numa_state->nodes[i].node_mem;
>> +        HostMemoryBackend *backend = ms->numa_state->nodes[i].node_memdev;
>>           if (!backend) {
>>               continue;
>>           }
>> @@ -594,6 +596,7 @@ static void numa_stat_memory_devices(NumaNodeMem node_mem[])
>>   void query_numa_node_mem(NumaNodeMem node_mem[], MachineState *ms)
>>   {
>>       int i;
>> +    NodeInfo *numa_info = ms->numa_state->nodes;
> Look at the line below: there you check ms->numa_state for NULL, yet
> right above that check you dereference it without caring whether it is NULL.
> 

Thanks. I will correct the mistake here.
>>   
>>       if (ms->numa_state == NULL || ms->numa_state->num_nodes <= 0) {
>>           return;
> 
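
A rough sketch of the corrected ordering, using simplified stand-in types
(the structs below are placeholders rather than the real QEMU definitions,
and query_numa_node_mem_sketch() with its int return value is hypothetical,
added only so the behaviour can be checked). The nodes pointer is taken only
after the NULL check has passed:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NODES 128

/* Simplified stand-ins for the QEMU types; only the fields used here. */
typedef struct NodeInfo { uint64_t node_mem; } NodeInfo;
typedef struct NumaState { int num_nodes; NodeInfo nodes[MAX_NODES]; } NumaState;
typedef struct MachineState { NumaState *numa_state; } MachineState;
typedef struct NumaNodeMem { uint64_t node_mem; } NumaNodeMem;

/* Hypothetical variant: returns how many entries of node_mem[] were filled. */
int query_numa_node_mem_sketch(NumaNodeMem node_mem[], MachineState *ms)
{
    NodeInfo *numa_info;
    int i;

    assert(ms != NULL);

    /* Check first ... */
    if (ms->numa_state == NULL || ms->numa_state->num_nodes <= 0) {
        return 0;
    }

    /* ... and only then look inside numa_state. */
    numa_info = ms->numa_state->nodes;
    for (i = 0; i < ms->numa_state->num_nodes; i++) {
        node_mem[i].node_mem = numa_info[i].node_mem;
    }
    return i;
}
```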




* Re: [Qemu-devel] [PATCH v5 5/8] acpi: introduce AcpiDeviceIfClass.build_mem_ranges hook
  2019-06-14 15:56 ` [Qemu-devel] [PATCH v5 5/8] acpi: introduce AcpiDeviceIfClass.build_mem_ranges hook Tao Xu
@ 2019-07-01 10:59   ` Igor Mammedov
  2019-07-02  1:12     ` Tao Xu
  0 siblings, 1 reply; 25+ messages in thread
From: Igor Mammedov @ 2019-07-01 10:59 UTC (permalink / raw)
  To: Tao Xu; +Cc: jingqi.liu, fan.du, ehabkost, qemu-devel

On Fri, 14 Jun 2019 23:56:23 +0800
Tao Xu <tao3.xu@intel.com> wrote:

> Add build_mem_ranges callback to AcpiDeviceIfClass and use
> it for generating SRAT and HMAT numa memory ranges.
> 
> Suggested-by: Igor Mammedov <imammedo@redhat.com>
> Co-developed-by: Liu Jingqi <jingqi.liu@intel.com>
> Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
> Signed-off-by: Tao Xu <tao3.xu@intel.com>
> ---
> 
> Changes in v5 -> v4:
>     - Add the missing if 'mem_len > 0' in pc_build_mem_ranges() (Igor)
>     - Correct the descriptions of build_mem_ranges
>     in AcpiDeviceIfClass (Igor)
>     - Use GArray for NUMA memory ranges data (Igor)
>     - Add the reason of using stub (Igor)
> ---
>  hw/acpi/piix4.c                      |   1 +
>  hw/i386/acpi-build.c                 | 133 +++++++++++++++++----------
>  hw/isa/lpc_ich9.c                    |   1 +
>  include/hw/acpi/acpi_dev_interface.h |   4 +
>  include/hw/i386/pc.h                 |   1 +
>  include/sysemu/numa.h                |  12 +++
>  stubs/Makefile.objs                  |   1 +
>  stubs/pc_build_mem_ranges.c          |  14 +++
>  8 files changed, 120 insertions(+), 47 deletions(-)
>  create mode 100644 stubs/pc_build_mem_ranges.c
> 
> diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c
> index ec4e186cec..bc078c1ad7 100644
> --- a/hw/acpi/piix4.c
> +++ b/hw/acpi/piix4.c
> @@ -702,6 +702,7 @@ static void piix4_pm_class_init(ObjectClass *klass, void *data)
>      adevc->ospm_status = piix4_ospm_status;
>      adevc->send_event = piix4_send_gpe;
>      adevc->madt_cpu = pc_madt_cpu_entry;
> +    adevc->build_mem_ranges = pc_build_mem_ranges;
>  }
>  
>  static const TypeInfo piix4_pm_info = {
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 055e677c30..44dd447fa5 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -2279,18 +2279,89 @@ build_tpm2(GArray *table_data, BIOSLinker *linker, GArray *tcpalog)
>  #define HOLE_640K_START  (640 * KiB)
>  #define HOLE_640K_END   (1 * MiB)
>  
> +void pc_build_mem_ranges(AcpiDeviceIf *adev, MachineState *ms)
> +{
> +    uint64_t mem_len, mem_base, next_base;
> +    int i;
> +    PCMachineState *pcms = PC_MACHINE(ms);
> +    NumaState *nstat = ms->numa_state;
> +    NumaMemRange *mem_range;
> +    nstat->mem_ranges_num = 0;
> +    next_base = 0;
> +
> +    /*
> +     * the memory map is a bit tricky, it contains at least one hole
> +     * from 640k-1M and possibly another one from 3.5G-4G.
> +     */
> +
> +    for (i = 0; i < pcms->numa_nodes; ++i) {
> +        mem_base = next_base;
> +        mem_len = pcms->node_mem[i];
> +        next_base = mem_base + mem_len;
> +
> +        /* Cut out the 640K hole */
> +        if (mem_base <= HOLE_640K_START &&
> +            next_base > HOLE_640K_START) {
> +            mem_len -= next_base - HOLE_640K_START;
> +            if (mem_len > 0) {
> +                mem_range = acpi_data_push(nstat->mem_ranges,
> +                                           sizeof *mem_range);
> +                mem_range->base = mem_base;
> +                mem_range->length = mem_len;
> +                mem_range->node = i;
> +                nstat->mem_ranges_num++;
> +            }
> +
> +            /* Check for the rare case: 640K < RAM < 1M */
> +            if (next_base <= HOLE_640K_END) {
> +                next_base = HOLE_640K_END;
> +                continue;
> +            }
> +            mem_base = HOLE_640K_END;
> +            mem_len = next_base - HOLE_640K_END;
> +        }
> +
> +        /* Cut out the ACPI_PCI hole */
> +        if (mem_base <= pcms->below_4g_mem_size &&
> +            next_base > pcms->below_4g_mem_size) {
> +            mem_len -= next_base - pcms->below_4g_mem_size;
> +            if (mem_len > 0) {
> +                mem_range = acpi_data_push(nstat->mem_ranges,
> +                                           sizeof *mem_range);
> +                mem_range->base = mem_base;
> +                mem_range->length = mem_len;
> +                mem_range->node = i;
> +                nstat->mem_ranges_num++;
> +            }
> +            mem_base = 1ULL << 32;
> +            mem_len = next_base - pcms->below_4g_mem_size;
> +            next_base = mem_base + mem_len;
> +        }
> +        if (mem_len > 0) {
> +            mem_range = acpi_data_push(nstat->mem_ranges,
> +                                       sizeof *mem_range);
> +            mem_range->base = mem_base;
> +            mem_range->length = mem_len;
> +            mem_range->node = i;
> +            nstat->mem_ranges_num++;
> +        }
> +    }
> +}
> +
>  static void
>  build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
>  {
>      AcpiSystemResourceAffinityTable *srat;
>      AcpiSratMemoryAffinity *numamem;
>  
> -    int i;
> -    int srat_start, numa_start, slots;
> -    uint64_t mem_len, mem_base, next_base;
> +    int i, srat_start, numa_start, slots;
>      MachineClass *mc = MACHINE_GET_CLASS(machine);
>      const CPUArchIdList *apic_ids = mc->possible_cpu_arch_ids(machine);
>      PCMachineState *pcms = PC_MACHINE(machine);
> +    AcpiDeviceIfClass *adevc = ACPI_DEVICE_IF_GET_CLASS(pcms->acpi_dev);
> +    AcpiDeviceIf *adev = ACPI_DEVICE_IF(pcms->acpi_dev);
> +    NumaState *nstat = machine->numa_state;
> +    NumaMemRange *mem_range;
>      ram_addr_t hotplugabble_address_space_size =
>          object_property_get_int(OBJECT(pcms), PC_MACHINE_DEVMEM_REGION_SIZE,
>                                  NULL);
> @@ -2327,57 +2398,25 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
>          }
>      }
>  
> +    if (pcms->numa_nodes && !nstat->mem_ranges_num) {
I suggest dropping the nstat->mem_ranges_num field and using
nstat->mem_ranges->len instead. Also, it's probably better to initialize
nstat->mem_ranges in the same place where ms->numa_state is initialized,
so that platform-specific code won't need to duplicate it.
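
Roughly what I have in mind, as a standalone sketch only. RangeArray is a
minimal stand-in for GArray, and numa_state_new(), range_array_append(),
mem_ranges_need_build() and numa_state_free() are hypothetical names, not
existing QEMU functions. The array is allocated once together with the NUMA
state, and "has it been built yet?" is answered by the array's own length:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct NumaMemRange {
    uint64_t base;
    uint64_t length;
    uint32_t node;
} NumaMemRange;

/* Minimal stand-in for GArray: the element count lives in the array. */
typedef struct RangeArray {
    NumaMemRange *data;
    size_t len;
    size_t cap;
} RangeArray;

typedef struct NumaState {
    int num_nodes;
    RangeArray *mem_ranges;   /* no separate mem_ranges_num field */
} NumaState;

/* Generic init: mem_ranges is created where the NUMA state is created. */
NumaState *numa_state_new(void)
{
    NumaState *s = calloc(1, sizeof(*s));
    if (!s) {
        abort();
    }
    s->mem_ranges = calloc(1, sizeof(*s->mem_ranges));
    if (!s->mem_ranges) {
        abort();
    }
    return s;
}

void range_array_append(RangeArray *a, NumaMemRange r)
{
    assert(a != NULL);
    if (a->len == a->cap) {
        size_t cap = a->cap ? a->cap * 2 : 4;
        NumaMemRange *p = realloc(a->data, cap * sizeof(*p));
        if (!p) {
            abort();
        }
        a->data = p;
        a->cap = cap;
    }
    a->data[a->len++] = r;
}

/* Platform code only asks whether the ranges still need to be built. */
int mem_ranges_need_build(const NumaState *s)
{
    return s->num_nodes && s->mem_ranges->len == 0;
}

void numa_state_free(NumaState *s)
{
    free(s->mem_ranges->data);
    free(s->mem_ranges);
    free(s);
}
```

With this shape, neither build_srat() nor the HMAT code would have to call
g_array_new() themselves.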

> +        nstat->mem_ranges = g_array_new(false, true /* clear */,
> +                                        sizeof *mem_range);
> +        adevc->build_mem_ranges(adev, machine);
> +    }
>  
> -    /* the memory map is a bit tricky, it contains at least one hole
> -     * from 640k-1M and possibly another one from 3.5G-4G.
> -     */
> -    next_base = 0;
>      numa_start = table_data->len;
>  
> -    for (i = 1; i < pcms->numa_nodes + 1; ++i) {
> -        mem_base = next_base;
> -        mem_len = pcms->node_mem[i - 1];
> -        next_base = mem_base + mem_len;
> -
> -        /* Cut out the 640K hole */
> -        if (mem_base <= HOLE_640K_START &&
> -            next_base > HOLE_640K_START) {
> -            mem_len -= next_base - HOLE_640K_START;
> -            if (mem_len > 0) {
> -                numamem = acpi_data_push(table_data, sizeof *numamem);
> -                build_srat_memory(numamem, mem_base, mem_len, i - 1,
> -                                  MEM_AFFINITY_ENABLED);
> -            }
> -
> -            /* Check for the rare case: 640K < RAM < 1M */
> -            if (next_base <= HOLE_640K_END) {
> -                next_base = HOLE_640K_END;
> -                continue;
> -            }
> -            mem_base = HOLE_640K_END;
> -            mem_len = next_base - HOLE_640K_END;
> -        }
> -
> -        /* Cut out the ACPI_PCI hole */
> -        if (mem_base <= pcms->below_4g_mem_size &&
> -            next_base > pcms->below_4g_mem_size) {
> -            mem_len -= next_base - pcms->below_4g_mem_size;
> -            if (mem_len > 0) {
> -                numamem = acpi_data_push(table_data, sizeof *numamem);
> -                build_srat_memory(numamem, mem_base, mem_len, i - 1,
> -                                  MEM_AFFINITY_ENABLED);
> -            }
> -            mem_base = 1ULL << 32;
> -            mem_len = next_base - pcms->below_4g_mem_size;
> -            next_base = mem_base + mem_len;
> -        }
> -
> -        if (mem_len > 0) {
> +    for (i = 0; i < nstat->mem_ranges_num; i++) {
> +        mem_range = &g_array_index(nstat->mem_ranges, NumaMemRange, i);
> +        if (mem_range->length > 0) {
Why do we have this condition?
I'd assume adevc->build_mem_ranges() shouldn't return empty ranges.
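
To illustrate, here is a rough standalone port of the hole-cutting loop.
build_mem_ranges_sketch(), push_range() and the fixed-size output array are
hypothetical stand-ins for the GArray-based code. Empty ranges are filtered
in exactly one place on the producer side, so a consumer can use every entry
without looking at its length:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KiB 1024ULL
#define MiB (1024ULL * KiB)
#define HOLE_640K_START (640 * KiB)
#define HOLE_640K_END   (1 * MiB)

typedef struct NumaMemRange {
    uint64_t base;
    uint64_t length;
    uint32_t node;
} NumaMemRange;

/* Append a range only if it is non-empty; returns the new element count. */
static size_t push_range(NumaMemRange *out, size_t n, size_t max,
                         uint64_t base, uint64_t len, uint32_t node)
{
    if (len == 0) {
        return n;
    }
    assert(n < max);
    out[n].base = base;
    out[n].length = len;
    out[n].node = node;
    return n + 1;
}

/* Returns the number of ranges written to out[]. */
size_t build_mem_ranges_sketch(const uint64_t *node_mem, int nodes,
                               uint64_t below_4g_mem_size,
                               NumaMemRange *out, size_t max)
{
    uint64_t mem_base, mem_len, next_base = 0;
    size_t n = 0;
    int i;

    for (i = 0; i < nodes; i++) {
        mem_base = next_base;
        mem_len = node_mem[i];
        next_base = mem_base + mem_len;

        /* Cut out the 640K hole */
        if (mem_base <= HOLE_640K_START && next_base > HOLE_640K_START) {
            mem_len -= next_base - HOLE_640K_START;
            n = push_range(out, n, max, mem_base, mem_len, (uint32_t)i);

            /* Check for the rare case: 640K < RAM < 1M */
            if (next_base <= HOLE_640K_END) {
                next_base = HOLE_640K_END;
                continue;
            }
            mem_base = HOLE_640K_END;
            mem_len = next_base - HOLE_640K_END;
        }

        /* Cut out the PCI hole below 4G */
        if (mem_base <= below_4g_mem_size && next_base > below_4g_mem_size) {
            mem_len -= next_base - below_4g_mem_size;
            n = push_range(out, n, max, mem_base, mem_len, (uint32_t)i);
            mem_base = 1ULL << 32;
            mem_len = next_base - below_4g_mem_size;
            next_base = mem_base + mem_len;
        }

        n = push_range(out, n, max, mem_base, mem_len, (uint32_t)i);
    }
    return n;
}
```

For example, two nodes of 2 GiB each with 3 GiB below 4G should give four
ranges: two for node 0 (split at the 640K hole) and two for node 1 (split at
the PCI hole), none of them empty.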

>              numamem = acpi_data_push(table_data, sizeof *numamem);
> -            build_srat_memory(numamem, mem_base, mem_len, i - 1,
> +            build_srat_memory(numamem, mem_range->base,
> +                              mem_range->length,
> +                              mem_range->node,
>                                MEM_AFFINITY_ENABLED);
>          }
>      }
> +
>      slots = (table_data->len - numa_start) / sizeof *numamem;
>      for (; slots < pcms->numa_nodes + 2; slots++) {
>          numamem = acpi_data_push(table_data, sizeof *numamem);
> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
> index 35d17246e9..20d919c63d 100644
> --- a/hw/isa/lpc_ich9.c
> +++ b/hw/isa/lpc_ich9.c
> @@ -801,6 +801,7 @@ static void ich9_lpc_class_init(ObjectClass *klass, void *data)
>      adevc->ospm_status = ich9_pm_ospm_status;
>      adevc->send_event = ich9_send_gpe;
>      adevc->madt_cpu = pc_madt_cpu_entry;
> +    adevc->build_mem_ranges = pc_build_mem_ranges;
>  }
>  
>  static const TypeInfo ich9_lpc_info = {
> diff --git a/include/hw/acpi/acpi_dev_interface.h b/include/hw/acpi/acpi_dev_interface.h
> index 43ff119179..5956b5ea33 100644
> --- a/include/hw/acpi/acpi_dev_interface.h
> +++ b/include/hw/acpi/acpi_dev_interface.h
> @@ -39,6 +39,8 @@ void acpi_send_event(DeviceState *dev, AcpiEventStatusBits event);
>   *           for CPU indexed by @uid in @apic_ids array,
>   *           returned structure types are:
>   *           0 - Local APIC, 9 - Local x2APIC, 0xB - GICC
> + * build_mem_ranges: build memory ranges of ACPI SRAT (except misc
> + * and hotplug SRAT ranges) and HMAT
>   *
>   * Interface is designed for providing unified interface
>   * to generic ACPI functionality that could be used without
> @@ -54,5 +56,7 @@ typedef struct AcpiDeviceIfClass {
>      void (*send_event)(AcpiDeviceIf *adev, AcpiEventStatusBits ev);
>      void (*madt_cpu)(AcpiDeviceIf *adev, int uid,
>                       const CPUArchIdList *apic_ids, GArray *entry);
> +    void (*build_mem_ranges)(AcpiDeviceIf *adev, MachineState *ms);
> +
>  } AcpiDeviceIfClass;
>  #endif
> diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
> index 5d5636241e..21b9ac3d11 100644
> --- a/include/hw/i386/pc.h
> +++ b/include/hw/i386/pc.h
> @@ -281,6 +281,7 @@ void pc_system_firmware_init(PCMachineState *pcms, MemoryRegion *rom_memory);
>  /* acpi-build.c */
>  void pc_madt_cpu_entry(AcpiDeviceIf *adev, int uid,
>                         const CPUArchIdList *apic_ids, GArray *entry);
> +void pc_build_mem_ranges(AcpiDeviceIf *adev, MachineState *ms);
>  
>  /* e820 types */
>  #define E820_RAM        1
> diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
> index 437eb21fef..e3c85b77bc 100644
> --- a/include/sysemu/numa.h
> +++ b/include/sysemu/numa.h
> @@ -20,6 +20,12 @@ struct NumaNodeMem {
>      uint64_t node_plugged_mem;
>  };
>  
> +typedef struct NumaMemRange {
> +    uint64_t base;
> +    uint64_t length;
> +    uint32_t node;
> +} NumaMemRange;
> +
>  struct NumaState {
>      /* Number of NUMA nodes */
>      int num_nodes;
> @@ -29,6 +35,12 @@ struct NumaState {
>  
>      /* NUMA nodes information */
>      NodeInfo nodes[MAX_NODES];
> +
> +    /* Number of NUMA memory ranges */
> +    uint32_t mem_ranges_num;
> +
> +    /* NUMA memory ranges */
> +    GArray *mem_ranges;
>  };
>  typedef struct NumaState NumaState;
>  
> diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs
> index 9c7393b08c..4f0cdc1a45 100644
> --- a/stubs/Makefile.objs
> +++ b/stubs/Makefile.objs
> @@ -33,6 +33,7 @@ stub-obj-y += qmp_memory_device.o
>  stub-obj-y += target-monitor-defs.o
>  stub-obj-y += target-get-monitor-def.o
>  stub-obj-y += pc_madt_cpu_entry.o
> +stub-obj-y += pc_build_mem_ranges.o
>  stub-obj-y += vmgenid.o
>  stub-obj-y += xen-common.o
>  stub-obj-y += xen-hvm.o
> diff --git a/stubs/pc_build_mem_ranges.c b/stubs/pc_build_mem_ranges.c
> new file mode 100644
> index 0000000000..997cdfe00b
> --- /dev/null
> +++ b/stubs/pc_build_mem_ranges.c
> @@ -0,0 +1,14 @@
> +/*
> + * Stub for pc_build_mem_ranges().
> + * piix4 is used not only pc, but also mips and etc. In order to add
> + * build_mem_ranges callback to AcpiDeviceIfClass and use pc_build_mem_ranges
> + * in hw/acpi/piix4.c, pc_build_mem_ranges() stub is added to make other arch
> + * can compile successfully.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "hw/i386/pc.h"
> +
> +void pc_build_mem_ranges(AcpiDeviceIf *adev, MachineState *ms)
> +{
> +}




* Re: [Qemu-devel] [PATCH v5 6/8] hmat acpi: Build Memory Subsystem Address Range Structure(s) in ACPI HMAT
  2019-06-14 15:56 ` [Qemu-devel] [PATCH v5 6/8] hmat acpi: Build Memory Subsystem Address Range Structure(s) in ACPI HMAT Tao Xu
  2019-06-27 15:56   ` Jonathan Cameron
@ 2019-07-01 11:25   ` Igor Mammedov
  2019-07-02  1:14     ` Tao Xu
  2019-07-02  8:50     ` Tao Xu
  1 sibling, 2 replies; 25+ messages in thread
From: Igor Mammedov @ 2019-07-01 11:25 UTC (permalink / raw)
  To: Tao Xu; +Cc: jingqi.liu, fan.du, ehabkost, qemu-devel

On Fri, 14 Jun 2019 23:56:24 +0800
Tao Xu <tao3.xu@intel.com> wrote:

> From: Liu Jingqi <jingqi.liu@intel.com>
> 
> HMAT is defined in ACPI 6.2: 5.2.27 Heterogeneous Memory Attribute Table (HMAT).
> The specification references below link:
> http://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf
> 
> It describes the memory attributes, such as memory side cache
> attributes and bandwidth and latency details, related to the
> System Physical Address (SPA) Memory Ranges. The software is
> expected to use this information as hint for optimization.
> 
> This structure describes the System Physical Address(SPA) range
> occupied by memory subsystem and its associativity with processor
> proximity domain as well as hint for memory usage.
> 
> Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
> Signed-off-by: Tao Xu <tao3.xu@intel.com>
> ---
> 
> Changes in v5 -> v4:
>     - Add more descriptions from ACPI spec (Igor)
>     - Remove all the dependency on PCMachineState (Igor)
> ---
>  hw/acpi/Kconfig       |   5 ++
>  hw/acpi/Makefile.objs |   1 +
>  hw/acpi/hmat.c        | 153 ++++++++++++++++++++++++++++++++++++++++++
>  hw/acpi/hmat.h        |  43 ++++++++++++
>  hw/core/machine.c     |   2 +
>  hw/i386/acpi-build.c  |   3 +
>  include/sysemu/numa.h |   2 +
>  numa.c                |   6 ++
>  8 files changed, 215 insertions(+)
>  create mode 100644 hw/acpi/hmat.c
>  create mode 100644 hw/acpi/hmat.h
> 
> diff --git a/hw/acpi/Kconfig b/hw/acpi/Kconfig
> index 7c59cf900b..039bb99efa 100644
> --- a/hw/acpi/Kconfig
> +++ b/hw/acpi/Kconfig
> @@ -7,6 +7,7 @@ config ACPI_X86
>      select ACPI_NVDIMM
>      select ACPI_CPU_HOTPLUG
>      select ACPI_MEMORY_HOTPLUG
> +    select ACPI_HMAT
>  
>  config ACPI_X86_ICH
>      bool
> @@ -31,3 +32,7 @@ config ACPI_VMGENID
>      bool
>      default y
>      depends on PC
> +
> +config ACPI_HMAT
> +    bool
> +    depends on ACPI
> diff --git a/hw/acpi/Makefile.objs b/hw/acpi/Makefile.objs
> index 661a9b8c2f..20cc2fb124 100644
> --- a/hw/acpi/Makefile.objs
> +++ b/hw/acpi/Makefile.objs
> @@ -6,6 +6,7 @@ common-obj-$(CONFIG_ACPI_MEMORY_HOTPLUG) += memory_hotplug.o
>  common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu.o
>  common-obj-$(CONFIG_ACPI_NVDIMM) += nvdimm.o
>  common-obj-$(CONFIG_ACPI_VMGENID) += vmgenid.o
> +common-obj-$(CONFIG_ACPI_HMAT) += hmat.o
>  common-obj-$(call lnot,$(CONFIG_ACPI_X86)) += acpi-stub.o
>  
>  common-obj-y += acpi_interface.o
> diff --git a/hw/acpi/hmat.c b/hw/acpi/hmat.c
> new file mode 100644
> index 0000000000..6fd434c4d9
> --- /dev/null
> +++ b/hw/acpi/hmat.c
> @@ -0,0 +1,153 @@
> +/*
> + * HMAT ACPI Implementation
> + *
> + * Copyright(C) 2019 Intel Corporation.
> + *
> + * Author:
> + *  Liu jingqi <jingqi.liu@linux.intel.com>
> + *  Tao Xu <tao3.xu@intel.com>
> + *
> + * HMAT is defined in ACPI 6.2: 5.2.27 Heterogeneous Memory Attribute Table
> + * (HMAT)
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, see <http://www.gnu.org/licenses/>
> + */
> +
> +#include "qemu/osdep.h"
> +#include "sysemu/numa.h"
> +#include "hw/acpi/hmat.h"
> +#include "hw/mem/pc-dimm.h"
> +
> +/* ACPI 6.2: 5.2.27.3 Memory Subsystem Address Range Structure: Table 5-141 */
> +static void build_hmat_spa(GArray *table_data, uint16_t flags,
> +                           uint64_t base, uint64_t length, int node)
> +{
> +
> +    /* Memory Subsystem Address Range Structure */
> +    /* Type */
> +    build_append_int_noprefix(table_data, 0, 2);
> +    /* Reserved */
> +    build_append_int_noprefix(table_data, 0, 2);
> +    /* Length */
> +    build_append_int_noprefix(table_data, 40, 4);
> +    /* Flags */
> +    build_append_int_noprefix(table_data, flags, 2);
> +    /* Reserved */
> +    build_append_int_noprefix(table_data, 0, 2);
> +    /* Processor Proximity Domain */
> +    build_append_int_noprefix(table_data, node, 4);
> +    /* Memory Proximity Domain */
> +    build_append_int_noprefix(table_data, node, 4);
> +    /* Reserved */
> +    build_append_int_noprefix(table_data, 0, 4);
> +    /* System Physical Address Range Base */
> +    build_append_int_noprefix(table_data, base, 8);
> +    /* System Physical Address Range Length */
> +    build_append_int_noprefix(table_data, length, 8);
> +}
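
As a sanity check of the hard-coded Length of 40, the field widths above can
be replayed in a small standalone sketch. append_le() and hmat_spa_sketch()
are hypothetical helpers, not QEMU API; append_le() only mimics, as far as I
understand it, the little-endian output of build_append_int_noprefix():

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Append value as a little-endian integer of size bytes; returns new offset. */
static size_t append_le(uint8_t *buf, size_t off, uint64_t value, int size)
{
    int i;

    assert(size >= 1 && size <= 8);
    for (i = 0; i < size; i++) {
        buf[off + i] = (uint8_t)(value >> (8 * i));
    }
    return off + (size_t)size;
}

/* Writes one structure into buf (at least 40 bytes); returns bytes written. */
size_t hmat_spa_sketch(uint8_t *buf, uint16_t flags, uint64_t base,
                       uint64_t length, uint32_t node)
{
    size_t off = 0;

    off = append_le(buf, off, 0, 2);       /* Type */
    off = append_le(buf, off, 0, 2);       /* Reserved */
    off = append_le(buf, off, 40, 4);      /* Length */
    off = append_le(buf, off, flags, 2);   /* Flags */
    off = append_le(buf, off, 0, 2);       /* Reserved */
    off = append_le(buf, off, node, 4);    /* Processor Proximity Domain */
    off = append_le(buf, off, node, 4);    /* Memory Proximity Domain */
    off = append_le(buf, off, 0, 4);       /* Reserved */
    off = append_le(buf, off, base, 8);    /* SPA Range Base */
    off = append_le(buf, off, length, 8);  /* SPA Range Length */
    return off;
}
```

The widths add up as 2 + 2 + 4 + 2 + 2 + 4 + 4 + 4 + 8 + 8 = 40, which
matches the Length field.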
> +
> +static int pc_dimm_device_list(Object *obj, void *opaque)
> +{
> +    GSList **list = opaque;
> +
> +    if (object_dynamic_cast(obj, TYPE_PC_DIMM)) {
> +        DeviceState *dev = DEVICE(obj);
> +        if (dev->realized) { /* only realized memory devices matter */
> +            *list = g_slist_append(*list, DEVICE(obj));
> +        }
> +    }
> +
> +    object_child_foreach(obj, pc_dimm_device_list, opaque);
> +    return 0;
> +}
> +
> +/* Build HMAT sub table structures */
> +static void hmat_build_table_structs(GArray *table_data, MachineState *ms)
> +{
> +    GSList *device_list = NULL;
> +    uint16_t flags;
> +    uint64_t mem_base, mem_len;
> +    int i;
> +    NumaState *nstat = ms->numa_state;
> +    NumaMemRange *mem_range;
> +
> +    Object *obj = object_resolve_path_type("", TYPE_ACPI_DEVICE_IF, NULL);
> +    AcpiDeviceIfClass *adevc = ACPI_DEVICE_IF_GET_CLASS(obj);
> +    AcpiDeviceIf *adev = ACPI_DEVICE_IF(obj);
> +
> +    /*
> +     * ACPI 6.2: 5.2.27.3 Memory Subsystem Address Range Structure:
> +     * Table 5-141. The Proximity Domain of System Physical Address
> +     * ranges defined in the HMAT, NFIT and SRAT tables shall match
> +     * each other.
> +     */
> +    if (nstat->num_nodes && !nstat->mem_ranges_num) {
> +        nstat->mem_ranges = g_array_new(false, true /* clear */,
> +                                        sizeof *mem_range);
> +        adevc->build_mem_ranges(adev, ms);
This is another place where you are trying to initialize nstat->mem_ranges.
Do the initialization in the generic NUMA init code instead.

> +    }
> +
> +    for (i = 0; i < nstat->mem_ranges_num; i++) {
> +        mem_range = &g_array_index(nstat->mem_ranges, NumaMemRange, i);
> +        flags = 0;
> +
> +        if (nstat->nodes[mem_range->node].is_initiator) {
> +            flags |= HMAT_SPA_PROC_VALID;
> +        }
> +        if (nstat->nodes[mem_range->node].is_target) {
> +            flags |= HMAT_SPA_MEM_VALID;
> +        }
> +
> +        build_hmat_spa(table_data, flags, mem_range->base,
> +                       mem_range->length,
> +                       mem_range->node);
> +    }
> +
> +    /* Build HMAT SPA structures for PC-DIMM devices. */
> +    object_child_foreach(OBJECT(ms), pc_dimm_device_list, &device_list);
> +
> +    for (; device_list; device_list = device_list->next) {
> +        PCDIMMDevice *dimm = device_list->data;
> +        mem_base = object_property_get_uint(OBJECT(dimm), PC_DIMM_ADDR_PROP,
> +                                            NULL);
> +        mem_len = object_property_get_uint(OBJECT(dimm), PC_DIMM_SIZE_PROP,
> +                                           NULL);
> +        i = object_property_get_uint(OBJECT(dimm), PC_DIMM_NODE_PROP, NULL);
> +        flags = 0;
> +
> +        if (nstat->nodes[i].is_initiator) {
> +            flags |= HMAT_SPA_PROC_VALID;
> +        }
> +        if (nstat->nodes[i].is_target) {
> +            flags |= HMAT_SPA_MEM_VALID;
> +        }
> +        build_hmat_spa(table_data, flags, mem_base, mem_len, i);
> +    }
Don't you need to free device_list at this point?
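
The loop above advances device_list itself, so the head is gone by the time
the loop ends. A rough sketch of the usual pattern, with a plain linked list
standing in for GSList (Node, list_prepend() and walk_then_free() are
hypothetical names used only for illustration): iterate with a separate
cursor and free through the preserved head.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for GSList. */
typedef struct Node {
    void *data;
    struct Node *next;
} Node;

Node *list_prepend(Node *head, void *data)
{
    Node *n = malloc(sizeof(*n));
    if (!n) {
        abort();
    }
    n->data = data;
    n->next = head;
    return n;
}

/*
 * Walk the list with a cursor so that head stays valid, then free all nodes.
 * Stores the number of visited elements in *visited and returns the number
 * of freed nodes.
 */
int walk_then_free(Node *head, int *visited)
{
    Node *it;
    int freed = 0;

    assert(visited != NULL);
    *visited = 0;
    for (it = head; it; it = it->next) {
        (*visited)++;              /* per-device work would go here */
    }

    while (head) {
        Node *next = head->next;
        free(head);                /* frees the list node, not node->data */
        head = next;
        freed++;
    }
    return freed;
}
```

With GLib the equivalent would be a loop over a separate GSList pointer,
followed by g_slist_free(device_list).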

> +}
> +
> +void build_hmat(GArray *table_data, BIOSLinker *linker, MachineState *ms)
> +{
> +    uint64_t hmat_start;
> +
> +    hmat_start = table_data->len;
> +
> +    /* reserve space for HMAT header  */
> +    acpi_data_push(table_data, 40);
> +
> +    hmat_build_table_structs(table_data, ms);
> +
> +    build_header(linker, table_data,
> +                 (void *)(table_data->data + hmat_start),
> +                 "HMAT", table_data->len - hmat_start, 1, NULL, NULL);
> +}
> diff --git a/hw/acpi/hmat.h b/hw/acpi/hmat.h
> new file mode 100644
> index 0000000000..e24b673fad
> --- /dev/null
> +++ b/hw/acpi/hmat.h
> @@ -0,0 +1,43 @@
> +/*
> + * HMAT ACPI Implementation Header
> + *
> + * Copyright(C) 2019 Intel Corporation.
> + *
> + * Author:
> + *  Liu jingqi <jingqi.liu@linux.intel.com>
> + *  Tao Xu <tao3.xu@intel.com>
> + *
> + * HMAT is defined in ACPI 6.2.
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, see <http://www.gnu.org/licenses/>
> + */
> +
> +#ifndef HMAT_H
> +#define HMAT_H
> +
> +#include "hw/acpi/acpi-defs.h"
> +#include "hw/acpi/acpi.h"
> +#include "hw/acpi/bios-linker-loader.h"
> +#include "hw/acpi/aml-build.h"
> +
> +/* the values of AcpiHmatSpaRange flag */
> +enum {
> +    HMAT_SPA_PROC_VALID       = 0x1,
> +    HMAT_SPA_MEM_VALID        = 0x2,
> +    HMAT_SPA_RESERVATION_HINT = 0x4,
> +};
> +
> +void build_hmat(GArray *table_data, BIOSLinker *linker, MachineState *ms);
> +
> +#endif
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index 14b29de0a9..2ad09ec23e 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -646,6 +646,7 @@ void machine_set_cpu_numa_node(MachineState *machine,
>                                 const CpuInstanceProperties *props, Error **errp)
>  {
>      MachineClass *mc = MACHINE_GET_CLASS(machine);
> +    NodeInfo *numa_info = machine->numa_state->nodes;
>      bool match = false;
>      int i;
>  
> @@ -706,6 +707,7 @@ void machine_set_cpu_numa_node(MachineState *machine,
>          match = true;
>          slot->props.node_id = props->node_id;
>          slot->props.has_node_id = props->has_node_id;
> +        numa_info[props->node_id].is_initiator = true;
>      }
>  
>      if (!match) {
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 44dd447fa5..6584eac76e 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -66,6 +66,7 @@
>  #include "hw/i386/intel_iommu.h"
>  
>  #include "hw/acpi/ipmi.h"
> +#include "hw/acpi/hmat.h"
>  
>  /* These are used to size the ACPI tables for -M pc-i440fx-1.7 and
>   * -M pc-i440fx-2.0.  Even if the actual amount of AML generated grows
> @@ -2710,6 +2711,8 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
>              acpi_add_table(table_offsets, tables_blob);
>              build_slit(tables_blob, tables->linker, machine);
>          }
> +        acpi_add_table(table_offsets, tables_blob);
> +        build_hmat(tables_blob, tables->linker, machine);
I'm not sure we should add it unconditionally.
Is this table used in any meaningful manner by the guest when
it's incomplete (i.e. populated only with SPA records)?

>      }
>      if (acpi_get_mcfg(&mcfg)) {
>          acpi_add_table(table_offsets, tables_blob);
> diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
> index e3c85b77bc..13cff59112 100644
> --- a/include/sysemu/numa.h
> +++ b/include/sysemu/numa.h
> @@ -10,6 +10,8 @@ struct NodeInfo {
>      uint64_t node_mem;
>      struct HostMemoryBackend *node_memdev;
>      bool present;
> +    bool is_initiator;
> +    bool is_target;
>      uint8_t distance[MAX_NODES];
>  };
>  
> diff --git a/numa.c b/numa.c
> index d23e130bce..5556d118c3 100644
> --- a/numa.c
> +++ b/numa.c
> @@ -102,6 +102,10 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
>          }
>      }
>  
> +    if (node->cpus) {
> +        numa_info[nodenr].is_initiator = true;
> +    }
> +
>      if (node->has_mem && node->has_memdev) {
>          error_setg(errp, "cannot specify both mem= and memdev=");
>          return;
> @@ -118,6 +122,7 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
>  
>      if (node->has_mem) {
>          numa_info[nodenr].node_mem = node->mem;
> +        numa_info[nodenr].is_target = true;
>      }
>      if (node->has_memdev) {
>          Object *o;
> @@ -130,6 +135,7 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
>          object_ref(o);
>          numa_info[nodenr].node_mem = object_property_get_uint(o, "size", NULL);
>          numa_info[nodenr].node_memdev = MEMORY_BACKEND(o);
> +        numa_info[nodenr].is_target = true;
>      }
>      numa_info[nodenr].present = true;
>      max_numa_nodeid = MAX(max_numa_nodeid, nodenr + 1);




* Re: [Qemu-devel] [PATCH v5 0/8] Build ACPI Heterogeneous Memory Attribute Table (HMAT)
  2019-06-14 15:56 [Qemu-devel] [PATCH v5 0/8] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
                   ` (7 preceding siblings ...)
  2019-06-14 15:56 ` [Qemu-devel] [PATCH v5 8/8] numa: Extend the command-line to provide memory latency and bandwidth information Tao Xu
@ 2019-07-01 13:37 ` Igor Mammedov
  2019-07-02  0:44   ` Tao Xu
  8 siblings, 1 reply; 25+ messages in thread
From: Igor Mammedov @ 2019-07-01 13:37 UTC (permalink / raw)
  To: Tao Xu; +Cc: jingqi.liu, fan.du, ehabkost, qemu-devel

On Fri, 14 Jun 2019 23:56:18 +0800
Tao Xu <tao3.xu@intel.com> wrote:

> This series of patches will build Heterogeneous Memory Attribute Table (HMAT)
> according to the command line. The ACPI HMAT describes the memory attributes,
> such as memory side cache attributes and bandwidth and latency details,
> related to the System Physical Address (SPA) Memory Ranges.
> The software is expected to use this information as hint for optimization.

In addition to the patches in this series, please consider adding a test case for
the ACPI table as the last patch. See tests/bios-tables-test.c for examples.


> The V4 patches link:
> https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg01644.html
> 
> Changelog:
> v5:
>     - split out patches 1-6/11 and 8/11 of v4 to build the Memory Subsystem
>     Address Range Structure(s) and System Locality Latency and Bandwidth
>     Information Structure(s) first.
>     - add 1/8 of patch v5 to simplify arm_load_dtb() (Igor)
>     - drop the helper machine_num_numa_nodes() and use
>     machine->numa_state->num_nodes (and numa_state->nodes) directly (Igor)
>     - Add more descriptions from ACPI spec (Igor)
>     - Add the reason of using stub (Igor)
>     - Use GArray for NUMA memory ranges data (Igor)
>     - Separate hmat_build_lb() (Igor)
>     - Drop all global variables and use local variables instead (Igor)
>     - Add error message when base unit < 10
>     - Update the hmat-lb option example by using '-numa cpu'
>     and '-numa memdev' (Igor)
> 
> v4:
>     - send the patch of "move numa global variables into MachineState"
>     together with HMAT patches.
>     https://lists.gnu.org/archive/html/qemu-devel/2019-04/msg03662.html
>     - split patch 1/8 of v3 into two patches: 4/11 introduces
>     build_mem_ranges() and 5/11 builds HMAT (Igor)
>     - use build_append_int_noprefix() to build parts of ACPI table in
>     all patches (Igor)
>     - Split 8/8 of patch v3 into two parts, 10/11 introduces NFIT
>     generalizations (build_acpi_aml_common), and use it in 11/11 to
>     simplify hmat_build_aml (Igor)
>     - use MachineState instead of PCMachineState to build HMAT more
>     generically (Igor)
>     - move the 7/8 v3 patch into the former patches
>     - update the version tag from 4.0 to 4.1
> v3:
>     - rebase the fixing patch into the jingqi's patches (Eric)
>     - update the version tag from 3.10 to 4.0 (Eric)
> v2:
>   Per Igor and Eric's comments, fix some coding style and small issues:
>     - update the version number in qapi/misc.json
>     - include the expansion of the acronym HMAT in qapi/misc.json
>     - correct spelling mistakes in qapi/misc.json and qemu-options.hx
>     - fix the comment style in hw/i386/acpi-build.c
>     and hw/acpi/hmat.h
>     - remove some unnecessary header files in hw/acpi/hmat.c
>     - use hardcoded numbers from the spec to generate the
>     Memory Subsystem Address Range Structure in hw/acpi/hmat.c
>     - drop the structs AcpiHmat and AcpiHmatSpaRange
>     in hw/acpi/hmat.h
>     - rewrite the NFIT code to build the _HMA method
> 
> Liu Jingqi (3):
>   hmat acpi: Build Memory Subsystem Address Range Structure(s) in ACPI
>     HMAT
>   hmat acpi: Build System Locality Latency and Bandwidth Information
>     Structure(s) in ACPI HMAT
>   numa: Extend the command-line to provide memory latency and bandwidth
>     information
> 
> Tao Xu (5):
>   hw/arm: simplify arm_load_dtb
>   numa: move numa global variable nb_numa_nodes into MachineState
>   numa: move numa global variable have_numa_distance into MachineState
>   numa: move numa global variable numa_info into MachineState
>   acpi: introduce AcpiDeviceIfClass.build_mem_ranges hook
> 
>  exec.c                               |   5 +-
>  hw/acpi/Kconfig                      |   5 +
>  hw/acpi/Makefile.objs                |   1 +
>  hw/acpi/aml-build.c                  |   9 +-
>  hw/acpi/hmat.c                       | 252 +++++++++++++++++++++++++++
>  hw/acpi/hmat.h                       |  82 +++++++++
>  hw/acpi/piix4.c                      |   1 +
>  hw/arm/aspeed.c                      |   5 +-
>  hw/arm/boot.c                        |  20 ++-
>  hw/arm/collie.c                      |   8 +-
>  hw/arm/cubieboard.c                  |   5 +-
>  hw/arm/exynos4_boards.c              |   7 +-
>  hw/arm/highbank.c                    |   8 +-
>  hw/arm/imx25_pdk.c                   |   5 +-
>  hw/arm/integratorcp.c                |   8 +-
>  hw/arm/kzm.c                         |   5 +-
>  hw/arm/mainstone.c                   |   5 +-
>  hw/arm/mcimx6ul-evk.c                |   5 +-
>  hw/arm/mcimx7d-sabre.c               |   5 +-
>  hw/arm/musicpal.c                    |   8 +-
>  hw/arm/nseries.c                     |   5 +-
>  hw/arm/omap_sx1.c                    |   5 +-
>  hw/arm/palm.c                        |  10 +-
>  hw/arm/raspi.c                       |   6 +-
>  hw/arm/realview.c                    |   5 +-
>  hw/arm/sabrelite.c                   |   5 +-
>  hw/arm/spitz.c                       |   5 +-
>  hw/arm/tosa.c                        |   8 +-
>  hw/arm/versatilepb.c                 |   5 +-
>  hw/arm/vexpress.c                    |   5 +-
>  hw/arm/virt-acpi-build.c             |  17 +-
>  hw/arm/virt.c                        |  16 +-
>  hw/arm/xilinx_zynq.c                 |   8 +-
>  hw/arm/xlnx-versal-virt.c            |   7 +-
>  hw/arm/xlnx-zcu102.c                 |   5 +-
>  hw/arm/z2.c                          |   8 +-
>  hw/core/machine.c                    |  16 +-
>  hw/i386/acpi-build.c                 | 140 +++++++++------
>  hw/i386/pc.c                         |  11 +-
>  hw/isa/lpc_ich9.c                    |   1 +
>  hw/mem/pc-dimm.c                     |   2 +
>  hw/pci-bridge/pci_expander_bridge.c  |   2 +
>  hw/ppc/spapr.c                       |  23 ++-
>  hw/ppc/spapr_pci.c                   |   2 +
>  include/hw/acpi/acpi_dev_interface.h |   4 +
>  include/hw/acpi/aml-build.h          |   2 +-
>  include/hw/arm/boot.h                |   4 +-
>  include/hw/boards.h                  |   2 +
>  include/hw/i386/pc.h                 |   1 +
>  include/qemu/typedefs.h              |   1 +
>  include/sysemu/numa.h                |  37 +++-
>  include/sysemu/sysemu.h              |  24 +++
>  monitor.c                            |  11 +-
>  numa.c                               | 219 +++++++++++++++++++----
>  qapi/misc.json                       |  94 +++++++++-
>  qemu-options.hx                      |  45 ++++-
>  stubs/Makefile.objs                  |   1 +
>  stubs/pc_build_mem_ranges.c          |  14 ++
>  58 files changed, 961 insertions(+), 264 deletions(-)
>  create mode 100644 hw/acpi/hmat.c
>  create mode 100644 hw/acpi/hmat.h
>  create mode 100644 stubs/pc_build_mem_ranges.c
> 




* Re: [Qemu-devel] [PATCH v5 0/8] Build ACPI Heterogeneous Memory Attribute Table (HMAT)
  2019-07-01 13:37 ` [Qemu-devel] [PATCH v5 0/8] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Igor Mammedov
@ 2019-07-02  0:44   ` Tao Xu
  0 siblings, 0 replies; 25+ messages in thread
From: Tao Xu @ 2019-07-02  0:44 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: jingqi.liu, fan.du, ehabkost, qemu-devel

On 7/1/2019 9:37 PM, Igor Mammedov wrote:
> On Fri, 14 Jun 2019 23:56:18 +0800
> Tao Xu <tao3.xu@intel.com> wrote:
> 
>> This series of patches will build Heterogeneous Memory Attribute Table (HMAT)
>> according to the command line. The ACPI HMAT describes the memory attributes,
>> such as memory side cache attributes and bandwidth and latency details,
>> related to the System Physical Address (SPA) Memory Ranges.
>> The software is expected to use this information as hint for optimization.
> 
> In addition to the patches in this series, please consider adding a test case for
> the ACPI table as the last patch. See tests/bios-tables-test.c for examples.
> 
> 
OK, I will add it.





* Re: [Qemu-devel] [PATCH v5 5/8] acpi: introduce AcpiDeviceIfClass.build_mem_ranges hook
  2019-07-01 10:59   ` Igor Mammedov
@ 2019-07-02  1:12     ` Tao Xu
  0 siblings, 0 replies; 25+ messages in thread
From: Tao Xu @ 2019-07-02  1:12 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: Liu, Jingqi, Du, Fan, ehabkost, qemu-devel

On 7/1/2019 6:59 PM, Igor Mammedov wrote:
> On Fri, 14 Jun 2019 23:56:23 +0800
> Tao Xu <tao3.xu@intel.com> wrote:
> 
>> Add build_mem_ranges callback to AcpiDeviceIfClass and use
>> it for generating SRAT and HMAT numa memory ranges.
>>
>> Suggested-by: Igor Mammedov <imammedo@redhat.com>
>> Co-developed-by: Liu Jingqi <jingqi.liu@intel.com>
>> Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
>> Signed-off-by: Tao Xu <tao3.xu@intel.com>
>> ---
>>
>> Changes in v4 -> v5:
>>      - Add the missing if 'mem_len > 0' in pc_build_mem_ranges() (Igor)
>>      - Correct the descriptions of build_mem_ranges
>>      in AcpiDeviceIfClass (Igor)
>>      - Use GArray for NUMA memory ranges data (Igor)
>>      - Add the reason of using stub (Igor)
>> ---
>>   hw/acpi/piix4.c                      |   1 +
>>   hw/i386/acpi-build.c                 | 133 +++++++++++++++++----------
>>   hw/isa/lpc_ich9.c                    |   1 +
>>   include/hw/acpi/acpi_dev_interface.h |   4 +
>>   include/hw/i386/pc.h                 |   1 +
>>   include/sysemu/numa.h                |  12 +++
>>   stubs/Makefile.objs                  |   1 +
>>   stubs/pc_build_mem_ranges.c          |  14 +++
>>   8 files changed, 120 insertions(+), 47 deletions(-)
>>   create mode 100644 stubs/pc_build_mem_ranges.c
>>
>> diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c
>> index ec4e186cec..bc078c1ad7 100644
>> --- a/hw/acpi/piix4.c
>> +++ b/hw/acpi/piix4.c
>> @@ -702,6 +702,7 @@ static void piix4_pm_class_init(ObjectClass *klass, void *data)
>>       adevc->ospm_status = piix4_ospm_status;
>>       adevc->send_event = piix4_send_gpe;
>>       adevc->madt_cpu = pc_madt_cpu_entry;
>> +    adevc->build_mem_ranges = pc_build_mem_ranges;
>>   }
>>   
>>   static const TypeInfo piix4_pm_info = {
>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
>> index 055e677c30..44dd447fa5 100644
>> --- a/hw/i386/acpi-build.c
>> +++ b/hw/i386/acpi-build.c
>> @@ -2279,18 +2279,89 @@ build_tpm2(GArray *table_data, BIOSLinker *linker, GArray *tcpalog)
>>   #define HOLE_640K_START  (640 * KiB)
>>   #define HOLE_640K_END   (1 * MiB)
>>   
>> +void pc_build_mem_ranges(AcpiDeviceIf *adev, MachineState *ms)
>> +{
>> +    uint64_t mem_len, mem_base, next_base;
>> +    int i;
>> +    PCMachineState *pcms = PC_MACHINE(ms);
>> +    NumaState *nstat = ms->numa_state;
>> +    NumaMemRange *mem_range;
>> +    nstat->mem_ranges_num = 0;
>> +    next_base = 0;
>> +
>> +    /*
>> +     * the memory map is a bit tricky, it contains at least one hole
>> +     * from 640k-1M and possibly another one from 3.5G-4G.
>> +     */
>> +
>> +    for (i = 0; i < pcms->numa_nodes; ++i) {
>> +        mem_base = next_base;
>> +        mem_len = pcms->node_mem[i];
>> +        next_base = mem_base + mem_len;
>> +
>> +        /* Cut out the 640K hole */
>> +        if (mem_base <= HOLE_640K_START &&
>> +            next_base > HOLE_640K_START) {
>> +            mem_len -= next_base - HOLE_640K_START;
>> +            if (mem_len > 0) {
>> +                mem_range = acpi_data_push(nstat->mem_ranges,
>> +                                           sizeof *mem_range);
>> +                mem_range->base = mem_base;
>> +                mem_range->length = mem_len;
>> +                mem_range->node = i;
>> +                nstat->mem_ranges_num++;
>> +            }
>> +
>> +            /* Check for the rare case: 640K < RAM < 1M */
>> +            if (next_base <= HOLE_640K_END) {
>> +                next_base = HOLE_640K_END;
>> +                continue;
>> +            }
>> +            mem_base = HOLE_640K_END;
>> +            mem_len = next_base - HOLE_640K_END;
>> +        }
>> +
>> +        /* Cut out the ACPI_PCI hole */
>> +        if (mem_base <= pcms->below_4g_mem_size &&
>> +            next_base > pcms->below_4g_mem_size) {
>> +            mem_len -= next_base - pcms->below_4g_mem_size;
>> +            if (mem_len > 0) {
>> +                mem_range = acpi_data_push(nstat->mem_ranges,
>> +                                           sizeof *mem_range);
>> +                mem_range->base = mem_base;
>> +                mem_range->length = mem_len;
>> +                mem_range->node = i;
>> +                nstat->mem_ranges_num++;
>> +            }
>> +            mem_base = 1ULL << 32;
>> +            mem_len = next_base - pcms->below_4g_mem_size;
>> +            next_base = mem_base + mem_len;
>> +        }
>> +        if (mem_len > 0) {
>> +            mem_range = acpi_data_push(nstat->mem_ranges,
>> +                                       sizeof *mem_range);
>> +            mem_range->base = mem_base;
>> +            mem_range->length = mem_len;
>> +            mem_range->node = i;
>> +            nstat->mem_ranges_num++;
>> +        }
>> +    }
>> +}
>> +
>>   static void
>>   build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
>>   {
>>       AcpiSystemResourceAffinityTable *srat;
>>       AcpiSratMemoryAffinity *numamem;
>>   
>> -    int i;
>> -    int srat_start, numa_start, slots;
>> -    uint64_t mem_len, mem_base, next_base;
>> +    int i, srat_start, numa_start, slots;
>>       MachineClass *mc = MACHINE_GET_CLASS(machine);
>>       const CPUArchIdList *apic_ids = mc->possible_cpu_arch_ids(machine);
>>       PCMachineState *pcms = PC_MACHINE(machine);
>> +    AcpiDeviceIfClass *adevc = ACPI_DEVICE_IF_GET_CLASS(pcms->acpi_dev);
>> +    AcpiDeviceIf *adev = ACPI_DEVICE_IF(pcms->acpi_dev);
>> +    NumaState *nstat = machine->numa_state;
>> +    NumaMemRange *mem_range;
>>       ram_addr_t hotplugabble_address_space_size =
>>           object_property_get_int(OBJECT(pcms), PC_MACHINE_DEVMEM_REGION_SIZE,
>>                                   NULL);
>> @@ -2327,57 +2398,25 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
>>           }
>>       }
>>   
>> +    if (pcms->numa_nodes && !nstat->mem_ranges_num) {
> I suggest dropping the nstat->mem_ranges_num field and using
> nstat->mem_ranges->len instead. Also, it's probably better to
> initialize nstat->mem_ranges in the same place where ms->numa_state
> is initialized, so that platform-specific code won't need to duplicate it.

OK, I will improve it.
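
For illustration, the suggestion can be sketched in plain C. This is a
minimal sketch, not QEMU code: RangeList is a hypothetical stand-in for
GArray (whose len field plays the same role), and the helper names are
made up for the example. The point is that the element count lives in the
array itself, so no separate counter can drift out of sync, and the list
is initialized once where its owner is created.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct MemRange {
    uint64_t base;
    uint64_t length;
    uint32_t node;
} MemRange;

/* Hypothetical growable array; 'len' mirrors the role of GArray's len. */
typedef struct RangeList {
    MemRange *data;
    size_t len;   /* number of valid entries, the only counter kept */
    size_t cap;   /* allocated capacity */
} RangeList;

/* Initialize once, in the same place the owning state is set up. */
static void range_list_init(RangeList *l)
{
    l->data = NULL;
    l->len = 0;
    l->cap = 0;
}

/* Append one range; returns 0 on success, -1 if allocation fails. */
static int range_list_push(RangeList *l, MemRange r)
{
    if (l->len == l->cap) {
        size_t cap = l->cap ? l->cap * 2 : 4;
        MemRange *p = realloc(l->data, cap * sizeof(*p));
        if (p == NULL) {
            return -1;
        }
        l->data = p;
        l->cap = cap;
    }
    l->data[l->len++] = r;
    return 0;
}

/* Release the storage and return the list to its empty state. */
static void range_list_free(RangeList *l)
{
    free(l->data);
    range_list_init(l);
}
```

A consumer then loops with "for (i = 0; i < list.len; i++)" and never needs
a second field such as mem_ranges_num.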
> 
>> +        nstat->mem_ranges = g_array_new(false, true /* clear */,
>> +                                        sizeof *mem_range);
>> +        adevc->build_mem_ranges(adev, machine);
>> +    }
>>   
>> -    /* the memory map is a bit tricky, it contains at least one hole
>> -     * from 640k-1M and possibly another one from 3.5G-4G.
>> -     */
>> -    next_base = 0;
>>       numa_start = table_data->len;
>>   
>> -    for (i = 1; i < pcms->numa_nodes + 1; ++i) {
>> -        mem_base = next_base;
>> -        mem_len = pcms->node_mem[i - 1];
>> -        next_base = mem_base + mem_len;
>> -
>> -        /* Cut out the 640K hole */
>> -        if (mem_base <= HOLE_640K_START &&
>> -            next_base > HOLE_640K_START) {
>> -            mem_len -= next_base - HOLE_640K_START;
>> -            if (mem_len > 0) {
>> -                numamem = acpi_data_push(table_data, sizeof *numamem);
>> -                build_srat_memory(numamem, mem_base, mem_len, i - 1,
>> -                                  MEM_AFFINITY_ENABLED);
>> -            }
>> -
>> -            /* Check for the rare case: 640K < RAM < 1M */
>> -            if (next_base <= HOLE_640K_END) {
>> -                next_base = HOLE_640K_END;
>> -                continue;
>> -            }
>> -            mem_base = HOLE_640K_END;
>> -            mem_len = next_base - HOLE_640K_END;
>> -        }
>> -
>> -        /* Cut out the ACPI_PCI hole */
>> -        if (mem_base <= pcms->below_4g_mem_size &&
>> -            next_base > pcms->below_4g_mem_size) {
>> -            mem_len -= next_base - pcms->below_4g_mem_size;
>> -            if (mem_len > 0) {
>> -                numamem = acpi_data_push(table_data, sizeof *numamem);
>> -                build_srat_memory(numamem, mem_base, mem_len, i - 1,
>> -                                  MEM_AFFINITY_ENABLED);
>> -            }
>> -            mem_base = 1ULL << 32;
>> -            mem_len = next_base - pcms->below_4g_mem_size;
>> -            next_base = mem_base + mem_len;
>> -        }
>> -
>> -        if (mem_len > 0) {
>> +    for (i = 0; i < nstat->mem_ranges_num; i++) {
>> +        mem_range = &g_array_index(nstat->mem_ranges, NumaMemRange, i);
>> +        if (mem_range->length > 0) {
> Why do we have this condition?
> I'd assume adevc->build_mem_ranges() shouldn't return empty ranges.
> 
OK, I will drop this condition.
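
To show why the consumer-side check is redundant, here is a standalone
sketch of the hole-cutting logic from pc_build_mem_ranges(), rewritten in
plain C under some assumptions: the output goes into a caller-supplied
fixed-size array instead of a GArray, and range_emit() and the function
signature are hypothetical. Every push goes through range_emit(), which
drops zero-length ranges, so the producer never returns an empty one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HOLE_640K_START (640ULL * 1024)
#define HOLE_640K_END   (1024ULL * 1024)

typedef struct MemRange {
    uint64_t base;
    uint64_t length;
    uint32_t node;
} MemRange;

/* Store a range only if it is non-empty and there is room; returns new count. */
static size_t range_emit(MemRange *out, size_t n, size_t max,
                         uint64_t base, uint64_t len, uint32_t node)
{
    if (len > 0 && n < max) {
        out[n].base = base;
        out[n].length = len;
        out[n].node = node;
        n++;
    }
    return n;
}

/* Lay nodes out back to back, skipping the 640K-1M hole and the hole
 * between below_4g and 4G. Returns the number of ranges written. */
static size_t build_mem_ranges(const uint64_t *node_mem, int nodes,
                               uint64_t below_4g, MemRange *out, size_t max)
{
    uint64_t next_base = 0;
    size_t n = 0;

    for (int i = 0; i < nodes; i++) {
        uint64_t mem_base = next_base;
        uint64_t mem_len = node_mem[i];
        next_base = mem_base + mem_len;

        /* Cut out the 640K hole */
        if (mem_base <= HOLE_640K_START && next_base > HOLE_640K_START) {
            mem_len -= next_base - HOLE_640K_START;
            n = range_emit(out, n, max, mem_base, mem_len, (uint32_t)i);
            /* Rare case: the node ends inside the hole */
            if (next_base <= HOLE_640K_END) {
                next_base = HOLE_640K_END;
                continue;
            }
            mem_base = HOLE_640K_END;
            mem_len = next_base - HOLE_640K_END;
        }

        /* Cut out the hole below 4G: the remainder continues at 4G */
        if (mem_base <= below_4g && next_base > below_4g) {
            mem_len -= next_base - below_4g;
            n = range_emit(out, n, max, mem_base, mem_len, (uint32_t)i);
            mem_base = 1ULL << 32;
            mem_len = next_base - below_4g;
            next_base = mem_base + mem_len;
        }

        n = range_emit(out, n, max, mem_base, mem_len, (uint32_t)i);
    }
    return n;
}
```

With the non-empty guarantee kept on the producer side, build_srat() can
pass each entry straight to build_srat_memory().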

>>               numamem = acpi_data_push(table_data, sizeof *numamem);
>> -            build_srat_memory(numamem, mem_base, mem_len, i - 1,
>> +            build_srat_memory(numamem, mem_range->base,
>> +                              mem_range->length,
>> +                              mem_range->node,
>>                                 MEM_AFFINITY_ENABLED);
>>           }
>>       }
>> +
>>       slots = (table_data->len - numa_start) / sizeof *numamem;
>>       for (; slots < pcms->numa_nodes + 2; slots++) {
>>           numamem = acpi_data_push(table_data, sizeof *numamem);
>> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
>> index 35d17246e9..20d919c63d 100644
>> --- a/hw/isa/lpc_ich9.c
>> +++ b/hw/isa/lpc_ich9.c
>> @@ -801,6 +801,7 @@ static void ich9_lpc_class_init(ObjectClass *klass, void *data)
>>       adevc->ospm_status = ich9_pm_ospm_status;
>>       adevc->send_event = ich9_send_gpe;
>>       adevc->madt_cpu = pc_madt_cpu_entry;
>> +    adevc->build_mem_ranges = pc_build_mem_ranges;
>>   }
>>   
>>   static const TypeInfo ich9_lpc_info = {
>> diff --git a/include/hw/acpi/acpi_dev_interface.h b/include/hw/acpi/acpi_dev_interface.h
>> index 43ff119179..5956b5ea33 100644
>> --- a/include/hw/acpi/acpi_dev_interface.h
>> +++ b/include/hw/acpi/acpi_dev_interface.h
>> @@ -39,6 +39,8 @@ void acpi_send_event(DeviceState *dev, AcpiEventStatusBits event);
>>    *           for CPU indexed by @uid in @apic_ids array,
>>    *           returned structure types are:
>>    *           0 - Local APIC, 9 - Local x2APIC, 0xB - GICC
>> + * build_mem_ranges: build memory ranges of ACPI SRAT (except misc
>> + * and hotplug SRAT ranges) and HMAT
>>    *
>>    * Interface is designed for providing unified interface
>>    * to generic ACPI functionality that could be used without
>> @@ -54,5 +56,7 @@ typedef struct AcpiDeviceIfClass {
>>       void (*send_event)(AcpiDeviceIf *adev, AcpiEventStatusBits ev);
>>       void (*madt_cpu)(AcpiDeviceIf *adev, int uid,
>>                        const CPUArchIdList *apic_ids, GArray *entry);
>> +    void (*build_mem_ranges)(AcpiDeviceIf *adev, MachineState *ms);
>> +
>>   } AcpiDeviceIfClass;
>>   #endif
>> diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
>> index 5d5636241e..21b9ac3d11 100644
>> --- a/include/hw/i386/pc.h
>> +++ b/include/hw/i386/pc.h
>> @@ -281,6 +281,7 @@ void pc_system_firmware_init(PCMachineState *pcms, MemoryRegion *rom_memory);
>>   /* acpi-build.c */
>>   void pc_madt_cpu_entry(AcpiDeviceIf *adev, int uid,
>>                          const CPUArchIdList *apic_ids, GArray *entry);
>> +void pc_build_mem_ranges(AcpiDeviceIf *adev, MachineState *ms);
>>   
>>   /* e820 types */
>>   #define E820_RAM        1
>> diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
>> index 437eb21fef..e3c85b77bc 100644
>> --- a/include/sysemu/numa.h
>> +++ b/include/sysemu/numa.h
>> @@ -20,6 +20,12 @@ struct NumaNodeMem {
>>       uint64_t node_plugged_mem;
>>   };
>>   
>> +typedef struct NumaMemRange {
>> +    uint64_t base;
>> +    uint64_t length;
>> +    uint32_t node;
>> +} NumaMemRange;
>> +
>>   struct NumaState {
>>       /* Number of NUMA nodes */
>>       int num_nodes;
>> @@ -29,6 +35,12 @@ struct NumaState {
>>   
>>       /* NUMA nodes information */
>>       NodeInfo nodes[MAX_NODES];
>> +
>> +    /* Number of NUMA memory ranges */
>> +    uint32_t mem_ranges_num;
>> +
>> +    /* NUMA memory ranges */
>> +    GArray *mem_ranges;
>>   };
>>   typedef struct NumaState NumaState;
>>   
>> diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs
>> index 9c7393b08c..4f0cdc1a45 100644
>> --- a/stubs/Makefile.objs
>> +++ b/stubs/Makefile.objs
>> @@ -33,6 +33,7 @@ stub-obj-y += qmp_memory_device.o
>>   stub-obj-y += target-monitor-defs.o
>>   stub-obj-y += target-get-monitor-def.o
>>   stub-obj-y += pc_madt_cpu_entry.o
>> +stub-obj-y += pc_build_mem_ranges.o
>>   stub-obj-y += vmgenid.o
>>   stub-obj-y += xen-common.o
>>   stub-obj-y += xen-hvm.o
>> diff --git a/stubs/pc_build_mem_ranges.c b/stubs/pc_build_mem_ranges.c
>> new file mode 100644
>> index 0000000000..997cdfe00b
>> --- /dev/null
>> +++ b/stubs/pc_build_mem_ranges.c
>> @@ -0,0 +1,14 @@
>> +/*
>> + * Stub for pc_build_mem_ranges().
>> + * piix4 is used not only pc, but also mips and etc. In order to add
>> + * build_mem_ranges callback to AcpiDeviceIfClass and use pc_build_mem_ranges
>> + * in hw/acpi/piix4.c, pc_build_mem_ranges() stub is added to make other arch
>> + * can compile successfully.
>> + */
>> +
>> +#include "qemu/osdep.h"
>> +#include "hw/i386/pc.h"
>> +
>> +void pc_build_mem_ranges(AcpiDeviceIf *adev, MachineState *ms)
>> +{
>> +}
> 




* Re: [Qemu-devel] [PATCH v5 6/8] hmat acpi: Build Memory Subsystem Address Range Structure(s) in ACPI HMAT
  2019-07-01 11:25   ` Igor Mammedov
@ 2019-07-02  1:14     ` Tao Xu
  2019-07-02  8:50     ` Tao Xu
  1 sibling, 0 replies; 25+ messages in thread
From: Tao Xu @ 2019-07-02  1:14 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: jingqi.liu, fan.du, ehabkost, qemu-devel

On 7/1/2019 7:25 PM, Igor Mammedov wrote:
> On Fri, 14 Jun 2019 23:56:24 +0800
> Tao Xu <tao3.xu@intel.com> wrote:
> 
>> From: Liu Jingqi <jingqi.liu@intel.com>
>>
>> HMAT is defined in ACPI 6.2, section 5.2.27 Heterogeneous Memory Attribute
>> Table (HMAT). The specification is available at:
>> http://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf
>>
>> HMAT describes memory attributes, such as memory side cache
>> attributes and bandwidth and latency details, related to the
>> System Physical Address (SPA) Memory Ranges. Software is
>> expected to use this information as a hint for optimization.
>>
>> The Memory Subsystem Address Range Structure describes the System
>> Physical Address (SPA) range occupied by a memory subsystem and its
>> association with a processor proximity domain, as well as hints for
>> memory usage.
>>
>> Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
>> Signed-off-by: Tao Xu <tao3.xu@intel.com>
>> ---
>>
>> Changes in v4 -> v5:
>>      - Add more descriptions from ACPI spec (Igor)
>>      - Remove all dependencies on PCMachineState (Igor)
>> ---
>>   hw/acpi/Kconfig       |   5 ++
>>   hw/acpi/Makefile.objs |   1 +
>>   hw/acpi/hmat.c        | 153 ++++++++++++++++++++++++++++++++++++++++++
>>   hw/acpi/hmat.h        |  43 ++++++++++++
>>   hw/core/machine.c     |   2 +
>>   hw/i386/acpi-build.c  |   3 +
>>   include/sysemu/numa.h |   2 +
>>   numa.c                |   6 ++
>>   8 files changed, 215 insertions(+)
>>   create mode 100644 hw/acpi/hmat.c
>>   create mode 100644 hw/acpi/hmat.h
>>
>> diff --git a/hw/acpi/Kconfig b/hw/acpi/Kconfig
>> index 7c59cf900b..039bb99efa 100644
>> --- a/hw/acpi/Kconfig
>> +++ b/hw/acpi/Kconfig
>> @@ -7,6 +7,7 @@ config ACPI_X86
>>       select ACPI_NVDIMM
>>       select ACPI_CPU_HOTPLUG
>>       select ACPI_MEMORY_HOTPLUG
>> +    select ACPI_HMAT
>>   
>>   config ACPI_X86_ICH
>>       bool
>> @@ -31,3 +32,7 @@ config ACPI_VMGENID
>>       bool
>>       default y
>>       depends on PC
>> +
>> +config ACPI_HMAT
>> +    bool
>> +    depends on ACPI
>> diff --git a/hw/acpi/Makefile.objs b/hw/acpi/Makefile.objs
>> index 661a9b8c2f..20cc2fb124 100644
>> --- a/hw/acpi/Makefile.objs
>> +++ b/hw/acpi/Makefile.objs
>> @@ -6,6 +6,7 @@ common-obj-$(CONFIG_ACPI_MEMORY_HOTPLUG) += memory_hotplug.o
>>   common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu.o
>>   common-obj-$(CONFIG_ACPI_NVDIMM) += nvdimm.o
>>   common-obj-$(CONFIG_ACPI_VMGENID) += vmgenid.o
>> +common-obj-$(CONFIG_ACPI_HMAT) += hmat.o
>>   common-obj-$(call lnot,$(CONFIG_ACPI_X86)) += acpi-stub.o
>>   
>>   common-obj-y += acpi_interface.o
>> diff --git a/hw/acpi/hmat.c b/hw/acpi/hmat.c
>> new file mode 100644
>> index 0000000000..6fd434c4d9
>> --- /dev/null
>> +++ b/hw/acpi/hmat.c
>> @@ -0,0 +1,153 @@
>> +/*
>> + * HMAT ACPI Implementation
>> + *
>> + * Copyright(C) 2019 Intel Corporation.
>> + *
>> + * Author:
>> + *  Liu jingqi <jingqi.liu@linux.intel.com>
>> + *  Tao Xu <tao3.xu@intel.com>
>> + *
>> + * HMAT is defined in ACPI 6.2: 5.2.27 Heterogeneous Memory Attribute Table
>> + * (HMAT)
>> + *
>> + * This library is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU Lesser General Public
>> + * License as published by the Free Software Foundation; either
>> + * version 2 of the License, or (at your option) any later version.
>> + *
>> + * This library is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> + * Lesser General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU Lesser General Public
>> + * License along with this library; if not, see <http://www.gnu.org/licenses/>
>> + */
>> +
>> +#include "qemu/osdep.h"
>> +#include "sysemu/numa.h"
>> +#include "hw/acpi/hmat.h"
>> +#include "hw/mem/pc-dimm.h"
>> +
>> +/* ACPI 6.2: 5.2.27.3 Memory Subsystem Address Range Structure: Table 5-141 */
>> +static void build_hmat_spa(GArray *table_data, uint16_t flags,
>> +                           uint64_t base, uint64_t length, int node)
>> +{
>> +
>> +    /* Memory Subsystem Address Range Structure */
>> +    /* Type */
>> +    build_append_int_noprefix(table_data, 0, 2);
>> +    /* Reserved */
>> +    build_append_int_noprefix(table_data, 0, 2);
>> +    /* Length */
>> +    build_append_int_noprefix(table_data, 40, 4);
>> +    /* Flags */
>> +    build_append_int_noprefix(table_data, flags, 2);
>> +    /* Reserved */
>> +    build_append_int_noprefix(table_data, 0, 2);
>> +    /* Processor Proximity Domain */
>> +    build_append_int_noprefix(table_data, node, 4);
>> +    /* Memory Proximity Domain */
>> +    build_append_int_noprefix(table_data, node, 4);
>> +    /* Reserved */
>> +    build_append_int_noprefix(table_data, 0, 4);
>> +    /* System Physical Address Range Base */
>> +    build_append_int_noprefix(table_data, base, 8);
>> +    /* System Physical Address Range Length */
>> +    build_append_int_noprefix(table_data, length, 8);
>> +}
>> +
>> +static int pc_dimm_device_list(Object *obj, void *opaque)
>> +{
>> +    GSList **list = opaque;
>> +
>> +    if (object_dynamic_cast(obj, TYPE_PC_DIMM)) {
>> +        DeviceState *dev = DEVICE(obj);
>> +        if (dev->realized) { /* only realized memory devices matter */
>> +            *list = g_slist_append(*list, DEVICE(obj));
>> +        }
>> +    }
>> +
>> +    object_child_foreach(obj, pc_dimm_device_list, opaque);
>> +    return 0;
>> +}
>> +
>> +/* Build HMAT sub table structures */
>> +static void hmat_build_table_structs(GArray *table_data, MachineState *ms)
>> +{
>> +    GSList *device_list = NULL;
>> +    uint16_t flags;
>> +    uint64_t mem_base, mem_len;
>> +    int i;
>> +    NumaState *nstat = ms->numa_state;
>> +    NumaMemRange *mem_range;
>> +
>> +    Object *obj = object_resolve_path_type("", TYPE_ACPI_DEVICE_IF, NULL);
>> +    AcpiDeviceIfClass *adevc = ACPI_DEVICE_IF_GET_CLASS(obj);
>> +    AcpiDeviceIf *adev = ACPI_DEVICE_IF(obj);
>> +
>> +    /*
>> +     * ACPI 6.2: 5.2.27.3 Memory Subsystem Address Range Structure:
>> +     * Table 5-141. The Proximity Domain of System Physical Address
>> +     * ranges defined in the HMAT, NFIT and SRAT tables shall match
>> +     * each other.
>> +     */
>> +    if (nstat->num_nodes && !nstat->mem_ranges_num) {
>> +        nstat->mem_ranges = g_array_new(false, true /* clear */,
>> +                                        sizeof *mem_range);
>> +        adevc->build_mem_ranges(adev, ms);
> This is another place where you are trying to initialize nstat->mem_ranges.
> Do the initialization in the generic NUMA init code instead.
> 
>> +    }
>> +
>> +    for (i = 0; i < nstat->mem_ranges_num; i++) {
>> +        mem_range = &g_array_index(nstat->mem_ranges, NumaMemRange, i);
>> +        flags = 0;
>> +
>> +        if (nstat->nodes[mem_range->node].is_initiator) {
>> +            flags |= HMAT_SPA_PROC_VALID;
>> +        }
>> +        if (nstat->nodes[mem_range->node].is_target) {
>> +            flags |= HMAT_SPA_MEM_VALID;
>> +        }
>> +
>> +        build_hmat_spa(table_data, flags, mem_range->base,
>> +                       mem_range->length,
>> +                       mem_range->node);
>> +    }
>> +
>> +    /* Build HMAT SPA structures for PC-DIMM devices. */
>> +    object_child_foreach(OBJECT(ms), pc_dimm_device_list, &device_list);
>> +
>> +    for (; device_list; device_list = device_list->next) {
>> +        PCDIMMDevice *dimm = device_list->data;
>> +        mem_base = object_property_get_uint(OBJECT(dimm), PC_DIMM_ADDR_PROP,
>> +                                            NULL);
>> +        mem_len = object_property_get_uint(OBJECT(dimm), PC_DIMM_SIZE_PROP,
>> +                                           NULL);
>> +        i = object_property_get_uint(OBJECT(dimm), PC_DIMM_NODE_PROP, NULL);
>> +        flags = 0;
>> +
>> +        if (nstat->nodes[i].is_initiator) {
>> +            flags |= HMAT_SPA_PROC_VALID;
>> +        }
>> +        if (nstat->nodes[i].is_target) {
>> +            flags |= HMAT_SPA_MEM_VALID;
>> +        }
>> +        build_hmat_spa(table_data, flags, mem_base, mem_len, i);
>> +    }
> Don't you need to free device_list at this point?
> 

Thank you for your suggestion, I will correct it.
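
A dependency-free sketch of the fix, under stated assumptions: Node,
list_prepend() and list_free() below are hypothetical stand-ins for GSList,
g_slist_prepend() and g_slist_free(); none of this is QEMU code. It
illustrates one detail: the loop in the patch advances device_list itself,
so once the loop ends the head pointer is gone. Walking with a separate
cursor keeps the head available for freeing.

```c
#include <stdlib.h>

/* Hypothetical stand-in for GSList. */
typedef struct Node {
    void *data;
    struct Node *next;
} Node;

/* Stand-in for g_slist_prepend(): returns the new head. */
static Node *list_prepend(Node *head, void *data)
{
    Node *n = malloc(sizeof(*n));
    if (!n) {
        abort();
    }
    n->data = data;
    n->next = head;
    return n;
}

/* Stand-in for g_slist_free(): frees the nodes, not the payloads.
 * Returns the number of nodes freed so callers can check it. */
static size_t list_free(Node *head)
{
    size_t freed = 0;
    while (head) {
        Node *next = head->next;
        free(head);
        head = next;
        freed++;
    }
    return freed;
}

/* Example visitor: bumps the int counter that the payload points to. */
static void count_visit(void *data)
{
    (*(int *)data)++;
}

/* Iterate with a cursor ('it') so 'head' is still valid afterwards. */
static size_t visit_then_free(Node *head, void (*visit)(void *data))
{
    for (Node *it = head; it; it = it->next) {
        visit(it->data);
    }
    return list_free(head);
}
```

In hmat_build_table_structs() the same shape would presumably mean iterating
with a local GSList pointer and calling g_slist_free(device_list) after the
loop.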
>> +}
>> +
>> +void build_hmat(GArray *table_data, BIOSLinker *linker, MachineState *ms)
>> +{
>> +    uint64_t hmat_start;
>> +
>> +    hmat_start = table_data->len;
>> +
>> +    /* reserve space for HMAT header  */
>> +    acpi_data_push(table_data, 40);
>> +
>> +    hmat_build_table_structs(table_data, ms);
>> +
>> +    build_header(linker, table_data,
>> +                 (void *)(table_data->data + hmat_start),
>> +                 "HMAT", table_data->len - hmat_start, 1, NULL, NULL);
>> +}
>> diff --git a/hw/acpi/hmat.h b/hw/acpi/hmat.h
>> new file mode 100644
>> index 0000000000..e24b673fad
>> --- /dev/null
>> +++ b/hw/acpi/hmat.h
>> @@ -0,0 +1,43 @@
>> +/*
>> + * HMAT ACPI Implementation Header
>> + *
>> + * Copyright(C) 2019 Intel Corporation.
>> + *
>> + * Author:
>> + *  Liu jingqi <jingqi.liu@linux.intel.com>
>> + *  Tao Xu <tao3.xu@intel.com>
>> + *
>> + * HMAT is defined in ACPI 6.2.
>> + *
>> + * This library is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU Lesser General Public
>> + * License as published by the Free Software Foundation; either
>> + * version 2 of the License, or (at your option) any later version.
>> + *
>> + * This library is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> + * Lesser General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU Lesser General Public
>> + * License along with this library; if not, see <http://www.gnu.org/licenses/>
>> + */
>> +
>> +#ifndef HMAT_H
>> +#define HMAT_H
>> +
>> +#include "hw/acpi/acpi-defs.h"
>> +#include "hw/acpi/acpi.h"
>> +#include "hw/acpi/bios-linker-loader.h"
>> +#include "hw/acpi/aml-build.h"
>> +
>> +/* the values of AcpiHmatSpaRange flag */
>> +enum {
>> +    HMAT_SPA_PROC_VALID       = 0x1,
>> +    HMAT_SPA_MEM_VALID        = 0x2,
>> +    HMAT_SPA_RESERVATION_HINT = 0x4,
>> +};
>> +
>> +void build_hmat(GArray *table_data, BIOSLinker *linker, MachineState *ms);
>> +
>> +#endif
>> diff --git a/hw/core/machine.c b/hw/core/machine.c
>> index 14b29de0a9..2ad09ec23e 100644
>> --- a/hw/core/machine.c
>> +++ b/hw/core/machine.c
>> @@ -646,6 +646,7 @@ void machine_set_cpu_numa_node(MachineState *machine,
>>                                  const CpuInstanceProperties *props, Error **errp)
>>   {
>>       MachineClass *mc = MACHINE_GET_CLASS(machine);
>> +    NodeInfo *numa_info = machine->numa_state->nodes;
>>       bool match = false;
>>       int i;
>>   
>> @@ -706,6 +707,7 @@ void machine_set_cpu_numa_node(MachineState *machine,
>>           match = true;
>>           slot->props.node_id = props->node_id;
>>           slot->props.has_node_id = props->has_node_id;
>> +        numa_info[props->node_id].is_initiator = true;
>>       }
>>   
>>       if (!match) {
>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
>> index 44dd447fa5..6584eac76e 100644
>> --- a/hw/i386/acpi-build.c
>> +++ b/hw/i386/acpi-build.c
>> @@ -66,6 +66,7 @@
>>   #include "hw/i386/intel_iommu.h"
>>   
>>   #include "hw/acpi/ipmi.h"
>> +#include "hw/acpi/hmat.h"
>>   
>>   /* These are used to size the ACPI tables for -M pc-i440fx-1.7 and
>>    * -M pc-i440fx-2.0.  Even if the actual amount of AML generated grows
>> @@ -2710,6 +2711,8 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
>>               acpi_add_table(table_offsets, tables_blob);
>>               build_slit(tables_blob, tables->linker, machine);
>>           }
>> +        acpi_add_table(table_offsets, tables_blob);
>> +        build_hmat(tables_blob, tables->linker, machine);
> I'm not sure if we should add it unconditionally.
> Is this table used in any meaningful manner by guest when
> it's incomplete (i.e. populated only with SPA records)?
> 
>>       }
>>       if (acpi_get_mcfg(&mcfg)) {
>>           acpi_add_table(table_offsets, tables_blob);
>> diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
>> index e3c85b77bc..13cff59112 100644
>> --- a/include/sysemu/numa.h
>> +++ b/include/sysemu/numa.h
>> @@ -10,6 +10,8 @@ struct NodeInfo {
>>       uint64_t node_mem;
>>       struct HostMemoryBackend *node_memdev;
>>       bool present;
>> +    bool is_initiator;
>> +    bool is_target;
>>       uint8_t distance[MAX_NODES];
>>   };
>>   
>> diff --git a/numa.c b/numa.c
>> index d23e130bce..5556d118c3 100644
>> --- a/numa.c
>> +++ b/numa.c
>> @@ -102,6 +102,10 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
>>           }
>>       }
>>   
>> +    if (node->cpus) {
>> +        numa_info[nodenr].is_initiator = true;
>> +    }
>> +
>>       if (node->has_mem && node->has_memdev) {
>>           error_setg(errp, "cannot specify both mem= and memdev=");
>>           return;
>> @@ -118,6 +122,7 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
>>   
>>       if (node->has_mem) {
>>           numa_info[nodenr].node_mem = node->mem;
>> +        numa_info[nodenr].is_target = true;
>>       }
>>       if (node->has_memdev) {
>>           Object *o;
>> @@ -130,6 +135,7 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
>>           object_ref(o);
>>           numa_info[nodenr].node_mem = object_property_get_uint(o, "size", NULL);
>>           numa_info[nodenr].node_memdev = MEMORY_BACKEND(o);
>> +        numa_info[nodenr].is_target = true;
>>       }
>>       numa_info[nodenr].present = true;
>>       max_numa_nodeid = MAX(max_numa_nodeid, nodenr + 1);
> 




* Re: [Qemu-devel] [PATCH v5 6/8] hmat acpi: Build Memory Subsystem Address Range Structure(s) in ACPI HMAT
  2019-07-01 11:25   ` Igor Mammedov
  2019-07-02  1:14     ` Tao Xu
@ 2019-07-02  8:50     ` Tao Xu
  2019-07-08  9:09       ` Igor Mammedov
  1 sibling, 1 reply; 25+ messages in thread
From: Tao Xu @ 2019-07-02  8:50 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: ehabkost, jingqi.liu, fan.du, qemu-devel, Jonathan Cameron, Dan Williams

On 7/1/2019 7:25 PM, Igor Mammedov wrote:
> On Fri, 14 Jun 2019 23:56:24 +0800
> Tao Xu <tao3.xu@intel.com> wrote:
> 
...
>> @@ -2710,6 +2711,8 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
>>               acpi_add_table(table_offsets, tables_blob);
>>               build_slit(tables_blob, tables->linker, machine);
>>           }
>> +        acpi_add_table(table_offsets, tables_blob);
>> +        build_hmat(tables_blob, tables->linker, machine);
> I'm not sure if we should add it unconditionally.
> Is this table used in any meaningful manner by guest when
> it's incomplete (i.e. populated only with SPA records)?
> 
Hi Igor,

Under ACPI 6.2, the Linux kernel uses it to show each memory range's
node ID (Proximity Domain). Under ACPI 6.3, the Linux kernel uses it to
show a NUMA node's closest initiator (Generic Initiator or Processor,
directly attached). This is useful for a memory-only NUMA node, because
with the SPA structure (renamed "Memory Proximity Domain Attributes
Structure" in ACPI 6.3) user space can learn the topology of the
hardware's heterogeneous memory. I think I should add a doc describing
the use case in QEMU.
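
For reference, the 40-byte record that build_hmat_spa() in this patch emits
can be sketched in standalone C. This is only an illustration: put_le() and
hmat_spa_encode() are hypothetical names standing in for
build_append_int_noprefix() and build_hmat_spa(), and the field order and
sizes are copied from the patch rather than re-checked against the spec.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag bits, as defined in the patch's hw/acpi/hmat.h. */
enum {
    HMAT_SPA_PROC_VALID       = 0x1,
    HMAT_SPA_MEM_VALID        = 0x2,
    HMAT_SPA_RESERVATION_HINT = 0x4,
};

#define HMAT_SPA_LEN 40 /* bytes per structure, as in the patch */

/* Hypothetical helper: store 'size' bytes of 'value', little-endian,
 * at buf[off] and return the offset just past them. */
static size_t put_le(uint8_t *buf, size_t off, uint64_t value, int size)
{
    for (int i = 0; i < size; i++) {
        buf[off + i] = (uint8_t)(value >> (8 * i));
    }
    return off + (size_t)size;
}

/* Hypothetical stand-in for build_hmat_spa(), writing to a flat buffer. */
static size_t hmat_spa_encode(uint8_t buf[HMAT_SPA_LEN], uint16_t flags,
                              uint64_t base, uint64_t length, uint32_t node)
{
    size_t off = 0;

    off = put_le(buf, off, 0, 2);            /* Type */
    off = put_le(buf, off, 0, 2);            /* Reserved */
    off = put_le(buf, off, HMAT_SPA_LEN, 4); /* Length */
    off = put_le(buf, off, flags, 2);        /* Flags */
    off = put_le(buf, off, 0, 2);            /* Reserved */
    off = put_le(buf, off, node, 4);         /* Processor Proximity Domain */
    off = put_le(buf, off, node, 4);         /* Memory Proximity Domain */
    off = put_le(buf, off, 0, 4);            /* Reserved */
    off = put_le(buf, off, base, 8);         /* SPA Range Base */
    off = put_le(buf, off, length, 8);       /* SPA Range Length */

    assert(off == HMAT_SPA_LEN);
    return off;
}
```

Note that, as in the patch, the same node number is written to both proximity
domain fields, and the flags decide which of them is valid.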

Therefore, the NUMA CLI may be lacking an input that indicates the
initiator of a memory-only NUMA node. Dan suggested that I add a new
parameter for that [1].

Maybe like:
-numa node,mem=4G,nodeid=2,initiator=0

[1] https://patchwork.kernel.org/cover/10934417/

Thanks

Tao


* Re: [Qemu-devel] [PATCH v5 6/8] hmat acpi: Build Memory Subsystem Address Range Structure(s) in ACPI HMAT
  2019-07-02  8:50     ` Tao Xu
@ 2019-07-08  9:09       ` Igor Mammedov
  2019-07-09  0:45         ` Tao Xu
  0 siblings, 1 reply; 25+ messages in thread
From: Igor Mammedov @ 2019-07-08  9:09 UTC (permalink / raw)
  To: Tao Xu
  Cc: ehabkost, jingqi.liu, fan.du, qemu-devel, Jonathan Cameron, Dan Williams

On Tue, 2 Jul 2019 16:50:24 +0800
Tao Xu <tao3.xu@intel.com> wrote:

> On 7/1/2019 7:25 PM, Igor Mammedov wrote:
> > On Fri, 14 Jun 2019 23:56:24 +0800
> > Tao Xu <tao3.xu@intel.com> wrote:
> >   
> ...
> >> @@ -2710,6 +2711,8 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
> >>               acpi_add_table(table_offsets, tables_blob);
> >>               build_slit(tables_blob, tables->linker, machine);
> >>           }
> >> +        acpi_add_table(table_offsets, tables_blob);
> >> +        build_hmat(tables_blob, tables->linker, machine);  
> > I'm not sure if we should add it unconditionally.
> > Is this table used in any meaningful manner by guest when
> > it's incomplete (i.e. populated only with SPA records)?
> >   
> Hi Igor,
> 
> Under ACPI 6.2, the Linux kernel uses it to show each memory range's
> node ID (Proximity Domain). Under ACPI 6.3, the Linux kernel uses it to
> show a NUMA node's closest initiator (Generic Initiator or Processor,
> directly attached). This is useful for a memory-only NUMA node, because
> with the SPA structure (renamed "Memory Proximity Domain Attributes
> Structure" in ACPI 6.3) user space can learn the topology of the
> hardware's heterogeneous memory. I think I should add a doc describing
> the use case in QEMU.
Could you point out to me the specific kernel code that parses and uses HMAT?

> 
> Therefore, the NUMA CLI may be lacking an input that indicates the
> initiator of a memory-only NUMA node. Dan suggested that I add a new
> parameter for that [1].
> 
> Maybe like:
> -numa node,mem=4G,nodeid=2,initiator=0
> 
> [1] https://patchwork.kernel.org/cover/10934417/
> 
> Thanks
> 
> Tao
> 




* Re: [Qemu-devel] [PATCH v5 6/8] hmat acpi: Build Memory Subsystem Address Range Structure(s) in ACPI HMAT
  2019-07-08  9:09       ` Igor Mammedov
@ 2019-07-09  0:45         ` Tao Xu
  0 siblings, 0 replies; 25+ messages in thread
From: Tao Xu @ 2019-07-09  0:45 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: ehabkost, Liu, Jingqi, Du, Fan, qemu-devel, Jonathan Cameron,
	Williams, Dan J

On 7/8/2019 5:09 PM, Igor Mammedov wrote:
> On Tue, 2 Jul 2019 16:50:24 +0800
> Tao Xu <tao3.xu@intel.com> wrote:
> 
>> On 7/1/2019 7:25 PM, Igor Mammedov wrote:
>>> On Fri, 14 Jun 2019 23:56:24 +0800
>>> Tao Xu <tao3.xu@intel.com> wrote:
>>>    
>> ...
>>>> @@ -2710,6 +2711,8 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
>>>>                acpi_add_table(table_offsets, tables_blob);
>>>>                build_slit(tables_blob, tables->linker, machine);
>>>>            }
>>>> +        acpi_add_table(table_offsets, tables_blob);
>>>> +        build_hmat(tables_blob, tables->linker, machine);
>>> I'm not sure if we should add it unconditionally.
>>> Is this table used in any meaningful manner by guest when
>>> it's incomplete (i.e. populated only with SPA records)?
>>>    
>> Hi Igor,
>>
>> Under ACPI 6.2, the Linux kernel uses it to show each memory range's
>> node ID (Proximity Domain). Under ACPI 6.3, the Linux kernel uses it to
>> show a NUMA node's closest initiator (Generic Initiator or Processor,
>> directly attached). This is useful for a memory-only NUMA node, because
>> with the SPA structure (renamed "Memory Proximity Domain Attributes
>> Structure" in ACPI 6.3) user space can learn the topology of the
>> hardware's heterogeneous memory. I think I should add a doc describing
>> the use case in QEMU.
> Could you point out to me the specific kernel code that parses and uses HMAT?
> 

OK, it is in drivers/acpi/hmat/hmat.c

>>
>> Therefore, the NUMA CLI may be lacking an input that indicates the
>> initiator of a memory-only NUMA node. Dan suggested that I add a new
>> parameter for that [1].
>>
>> Maybe like:
>> -numa node,mem=4G,nodeid=2,initiator=0
>>
>> [1] https://patchwork.kernel.org/cover/10934417/
>>
>> Thanks
>>
>> Tao
>>
> 




end of thread, other threads:[~2019-07-09  0:50 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-14 15:56 [Qemu-devel] [PATCH v5 0/8] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
2019-06-14 15:56 ` [Qemu-devel] [PATCH v5 1/8] hw/arm: simplify arm_load_dtb Tao Xu
2019-06-27 12:42   ` Igor Mammedov
2019-06-14 15:56 ` [Qemu-devel] [PATCH v5 2/8] numa: move numa global variable nb_numa_nodes into MachineState Tao Xu
2019-06-28 11:02   ` Igor Mammedov
2019-07-01  1:57     ` Tao Xu
2019-06-14 15:56 ` [Qemu-devel] [PATCH v5 3/8] numa: move numa global variable have_numa_distance " Tao Xu
2019-06-14 15:56 ` [Qemu-devel] [PATCH v5 4/8] numa: move numa global variable numa_info " Tao Xu
2019-06-28 11:20   ` Igor Mammedov
2019-07-01  2:01     ` Tao Xu
2019-06-14 15:56 ` [Qemu-devel] [PATCH v5 5/8] acpi: introduce AcpiDeviceIfClass.build_mem_ranges hook Tao Xu
2019-07-01 10:59   ` Igor Mammedov
2019-07-02  1:12     ` Tao Xu
2019-06-14 15:56 ` [Qemu-devel] [PATCH v5 6/8] hmat acpi: Build Memory Subsystem Address Range Structure(s) in ACPI HMAT Tao Xu
2019-06-27 15:56   ` Jonathan Cameron
2019-07-01  0:58     ` Tao Xu
2019-07-01 11:25   ` Igor Mammedov
2019-07-02  1:14     ` Tao Xu
2019-07-02  8:50     ` Tao Xu
2019-07-08  9:09       ` Igor Mammedov
2019-07-09  0:45         ` Tao Xu
2019-06-14 15:56 ` [Qemu-devel] [PATCH v5 7/8] hmat acpi: Build System Locality Latency and Bandwidth Information " Tao Xu
2019-06-14 15:56 ` [Qemu-devel] [PATCH v5 8/8] numa: Extend the command-line to provide memory latency and bandwidth information Tao Xu
2019-07-01 13:37 ` [Qemu-devel] [PATCH v5 0/8] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Igor Mammedov
2019-07-02  0:44   ` Tao Xu
