qemu-devel.nongnu.org archive mirror
* [PATCH v4 0/3] account for NVDIMM nodes during SRAT generation
@ 2020-05-28 22:34 Vishal Verma
  2020-05-28 22:34 ` [PATCH v4 1/3] diffs-allowed: add the SRAT AML to diffs-allowed Vishal Verma
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Vishal Verma @ 2020-05-28 22:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Thomas Huth, Xiao Guangrong, Michael S. Tsirkin,
	jingqi.liu, Dave Hansen, Eduardo Habkost, Vishal Verma,
	Paolo Bonzini, Igor Mammedov, Dan Williams, Richard Henderson

Changes since v3:
- Add the SRAT augmentation for ARM's virt-acpi-build as well (Igor)
- Update patches 1 and 3 for the test binaries to include ARM tests.

Changes since v2:
- Change a repetitive OBJECT(dev) to a stored 'Object' (Igor)
- No need to return 'numamem' back to build_srat (Igor)

Changes since v1:
- Use error_abort for getters (Igor)
- Free the device list (Igor)
- Refactor the NVDIMM related portion into hw/acpi/nvdimm.c (Igor)
- Rebase onto latest master
- Add Jingqi's Reviewed-by

On the command line, one can specify a NUMA node for NVDIMM devices. If
we set up the topology to give NVDIMMs their own nodes, i.e. nodes not
containing any CPUs or regular memory, QEMU doesn't populate SRAT memory
affinity structures for these nodes. However, the NFIT does reference
those proximity domains.

As a result, Linux, while parsing the SRAT, fails to initialize
node-related structures for these nodes, and they never end up in
node_possible_map. When these nodes are onlined at a later point (via
hotplug), this causes problems.

I've followed the instructions in bios-tables-test.c to update the
expected SRAT binary, and the tests (make check) pass. Patches 1 and 3
are the relevant ones for the binary update.

Patch 2 is the main patch which changes SRAT generation.


Vishal Verma (3):
  diffs-allowed: add the SRAT AML to diffs-allowed
  hw/acpi/nvdimm: add a helper to augment SRAT generation
  tests/acpi: update expected SRAT files

 hw/acpi/nvdimm.c                 |  23 +++++++++++++++++++++++
 hw/arm/virt-acpi-build.c         |   4 ++++
 hw/i386/acpi-build.c             |   5 +++++
 include/hw/mem/nvdimm.h          |   1 +
 tests/data/acpi/pc/SRAT.dimmpxm  | Bin 392 -> 392 bytes
 tests/data/acpi/q35/SRAT.dimmpxm | Bin 392 -> 392 bytes
 tests/data/acpi/virt/SRAT.memhp  | Bin 186 -> 226 bytes
 7 files changed, 33 insertions(+)

-- 
2.26.2




* [PATCH v4 1/3] diffs-allowed: add the SRAT AML to diffs-allowed
  2020-05-28 22:34 [PATCH v4 0/3] account for NVDIMM nodes during SRAT generation Vishal Verma
@ 2020-05-28 22:34 ` Vishal Verma
  2020-05-28 22:34 ` [PATCH v4 2/3] hw/acpi/nvdimm: add a helper to augment SRAT generation Vishal Verma
  2020-05-28 22:34 ` [PATCH v4 3/3] tests/acpi: update expected SRAT files Vishal Verma
  2 siblings, 0 replies; 8+ messages in thread
From: Vishal Verma @ 2020-05-28 22:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Thomas Huth, Xiao Guangrong, Michael S. Tsirkin,
	jingqi.liu, Dave Hansen, Eduardo Habkost, Vishal Verma,
	Paolo Bonzini, Igor Mammedov, Dan Williams, Richard Henderson

In anticipation of a change to the SRAT generation in qemu, add the AML
file to diffs-allowed.

Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
 tests/qtest/bios-tables-test-allowed-diff.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tests/qtest/bios-tables-test-allowed-diff.h b/tests/qtest/bios-tables-test-allowed-diff.h
index dfb8523c8b..e8f2766a63 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1 +1,4 @@
 /* List of comma-separated changed AML files to ignore */
+"tests/data/acpi/pc/SRAT.dimmpxm",
+"tests/data/acpi/q35/SRAT.dimmpxm",
+"tests/data/acpi/virt/SRAT.memhp",
-- 
2.26.2




* [PATCH v4 2/3] hw/acpi/nvdimm: add a helper to augment SRAT generation
  2020-05-28 22:34 [PATCH v4 0/3] account for NVDIMM nodes during SRAT generation Vishal Verma
  2020-05-28 22:34 ` [PATCH v4 1/3] diffs-allowed: add the SRAT AML to diffs-allowed Vishal Verma
@ 2020-05-28 22:34 ` Vishal Verma
  2020-06-04 10:33   ` Igor Mammedov
  2020-05-28 22:34 ` [PATCH v4 3/3] tests/acpi: update expected SRAT files Vishal Verma
  2 siblings, 1 reply; 8+ messages in thread
From: Vishal Verma @ 2020-05-28 22:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Thomas Huth, Xiao Guangrong, Michael S. Tsirkin,
	jingqi.liu, Dave Hansen, Eduardo Habkost, Vishal Verma,
	Paolo Bonzini, Igor Mammedov, Dan Williams, Richard Henderson

NVDIMMs can belong to their own proximity domains, as described by the
NFIT. In such cases, the SRAT needs to have Memory Affinity structures
for these NVDIMMs; otherwise Linux doesn't populate node data structures
properly during NUMA initialization. See the following for an example
failure case.

https://lore.kernel.org/linux-nvdimm/20200416225438.15208-1-vishal.l.verma@intel.com/

Introduce a new helper, nvdimm_build_srat(), and call it for both the
i386 and arm versions of 'build_srat()' to augment the SRAT with
memory affinity information for NVDIMMs.

The relevant command line options to exercise this are below. Nodes 0-1
contain CPUs and regular memory, and nodes 2-3 are the NVDIMM address
space.

  -numa node,nodeid=0,mem=2048M,
  -numa node,nodeid=1,mem=2048M,
  -numa node,nodeid=2,mem=0,
  -object memory-backend-file,id=nvmem0,share,mem-path=nvdimm-0,size=16384M,align=128M
  -device nvdimm,memdev=nvmem0,id=nv0,label-size=2M,node=2
  -numa node,nodeid=3,mem=0,
  -object memory-backend-file,id=nvmem1,share,mem-path=nvdimm-1,size=16384M,align=128M
  -device nvdimm,memdev=nvmem1,id=nv1,label-size=2M,node=3

Cc: Jingqi Liu <jingqi.liu@intel.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Jingqi Liu <jingqi.liu@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
 hw/acpi/nvdimm.c         | 23 +++++++++++++++++++++++
 hw/arm/virt-acpi-build.c |  4 ++++
 hw/i386/acpi-build.c     |  5 +++++
 include/hw/mem/nvdimm.h  |  1 +
 4 files changed, 33 insertions(+)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index 9316d12b70..8f7cc16add 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -28,6 +28,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu/uuid.h"
+#include "qapi/error.h"
 #include "hw/acpi/acpi.h"
 #include "hw/acpi/aml-build.h"
 #include "hw/acpi/bios-linker-loader.h"
@@ -1334,6 +1335,28 @@ static void nvdimm_build_ssdt(GArray *table_offsets, GArray *table_data,
     free_aml_allocator();
 }
 
+void nvdimm_build_srat(GArray *table_data)
+{
+    GSList *device_list, *list = nvdimm_get_device_list();
+
+    for (device_list = list; device_list; device_list = device_list->next) {
+        AcpiSratMemoryAffinity *numamem = NULL;
+        DeviceState *dev = device_list->data;
+        Object *obj = OBJECT(dev);
+        uint64_t addr, size;
+        int node;
+
+        node = object_property_get_int(obj, PC_DIMM_NODE_PROP, &error_abort);
+        addr = object_property_get_uint(obj, PC_DIMM_ADDR_PROP, &error_abort);
+        size = object_property_get_uint(obj, PC_DIMM_SIZE_PROP, &error_abort);
+
+        numamem = acpi_data_push(table_data, sizeof *numamem);
+        build_srat_memory(numamem, addr, size, node,
+                          MEM_AFFINITY_ENABLED | MEM_AFFINITY_NON_VOLATILE);
+    }
+    g_slist_free(list);
+}
+
 void nvdimm_build_acpi(GArray *table_offsets, GArray *table_data,
                        BIOSLinker *linker, NVDIMMState *state,
                        uint32_t ram_slots)
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 1b0a584c7b..2cbccd5fe2 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -539,6 +539,10 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
         }
     }
 
+    if (ms->nvdimms_state->is_enabled) {
+        nvdimm_build_srat(table_data);
+    }
+
     if (ms->device_memory) {
         numamem = acpi_data_push(table_data, sizeof *numamem);
         build_srat_memory(numamem, ms->device_memory->base,
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 2e15f6848e..d996525e2c 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2428,6 +2428,11 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
                               MEM_AFFINITY_ENABLED);
         }
     }
+
+    if (machine->nvdimms_state->is_enabled) {
+        nvdimm_build_srat(table_data);
+    }
+
     slots = (table_data->len - numa_start) / sizeof *numamem;
     for (; slots < pcms->numa_nodes + 2; slots++) {
         numamem = acpi_data_push(table_data, sizeof *numamem);
diff --git a/include/hw/mem/nvdimm.h b/include/hw/mem/nvdimm.h
index a3c08955e8..b67a1aedf6 100644
--- a/include/hw/mem/nvdimm.h
+++ b/include/hw/mem/nvdimm.h
@@ -155,6 +155,7 @@ typedef struct NVDIMMState NVDIMMState;
 void nvdimm_init_acpi_state(NVDIMMState *state, MemoryRegion *io,
                             struct AcpiGenericAddress dsm_io,
                             FWCfgState *fw_cfg, Object *owner);
+void nvdimm_build_srat(GArray *table_data);
 void nvdimm_build_acpi(GArray *table_offsets, GArray *table_data,
                        BIOSLinker *linker, NVDIMMState *state,
                        uint32_t ram_slots);
-- 
2.26.2




* [PATCH v4 3/3] tests/acpi: update expected SRAT files
  2020-05-28 22:34 [PATCH v4 0/3] account for NVDIMM nodes during SRAT generation Vishal Verma
  2020-05-28 22:34 ` [PATCH v4 1/3] diffs-allowed: add the SRAT AML to diffs-allowed Vishal Verma
  2020-05-28 22:34 ` [PATCH v4 2/3] hw/acpi/nvdimm: add a helper to augment SRAT generation Vishal Verma
@ 2020-05-28 22:34 ` Vishal Verma
  2 siblings, 0 replies; 8+ messages in thread
From: Vishal Verma @ 2020-05-28 22:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Thomas Huth, Xiao Guangrong, Michael S. Tsirkin,
	jingqi.liu, Dave Hansen, Eduardo Habkost, Vishal Verma,
	Paolo Bonzini, Igor Mammedov, Dan Williams, Richard Henderson

Update expected SRAT files for the change to account for NVDIMM NUMA
nodes in the SRAT.

AML diffs:

tests/data/acpi/pc/SRAT.dimmpxm:
--- /tmp/asl-3P2IL0.dsl	2020-05-28 15:11:02.326439263 -0600
+++ /tmp/asl-1N4IL0.dsl	2020-05-28 15:11:02.325439280 -0600
@@ -3,7 +3,7 @@
  * AML/ASL+ Disassembler version 20190509 (64-bit version)
  * Copyright (c) 2000 - 2019 Intel Corporation
  *
- * Disassembly of tests/data/acpi/pc/SRAT.dimmpxm, Thu May 28 15:11:02 2020
+ * Disassembly of /tmp/aml-4D4IL0, Thu May 28 15:11:02 2020
  *
  * ACPI Data Table [SRAT]
  *
@@ -13,7 +13,7 @@
 [000h 0000   4]                    Signature : "SRAT"    [System Resource Affinity Table]
 [004h 0004   4]                 Table Length : 00000188
 [008h 0008   1]                     Revision : 01
-[009h 0009   1]                     Checksum : 80
+[009h 0009   1]                     Checksum : 68
 [00Ah 0010   6]                       Oem ID : "BOCHS "
 [010h 0016   8]                 Oem Table ID : "BXPCSRAT"
 [018h 0024   4]                 Oem Revision : 00000001
@@ -140,15 +140,15 @@
 [138h 0312   1]                Subtable Type : 01 [Memory Affinity]
 [139h 0313   1]                       Length : 28

-[13Ah 0314   4]             Proximity Domain : 00000000
+[13Ah 0314   4]             Proximity Domain : 00000002
 [13Eh 0318   2]                    Reserved1 : 0000
-[140h 0320   8]                 Base Address : 0000000000000000
-[148h 0328   8]               Address Length : 0000000000000000
+[140h 0320   8]                 Base Address : 0000000108000000
+[148h 0328   8]               Address Length : 0000000008000000
 [150h 0336   4]                    Reserved2 : 00000000
-[154h 0340   4]        Flags (decoded below) : 00000000
-                                     Enabled : 0
+[154h 0340   4]        Flags (decoded below) : 00000005
+                                     Enabled : 1
                                Hot Pluggable : 0
-                                Non-Volatile : 0
+                                Non-Volatile : 1
 [158h 0344   8]                    Reserved3 : 0000000000000000

 [160h 0352   1]                Subtable Type : 01 [Memory Affinity]

tests/data/acpi/q35/SRAT.dimmpxm:
--- /tmp/asl-HW2LL0.dsl	2020-05-28 15:11:05.446384514 -0600
+++ /tmp/asl-8MYLL0.dsl	2020-05-28 15:11:05.445384532 -0600
@@ -3,7 +3,7 @@
  * AML/ASL+ Disassembler version 20190509 (64-bit version)
  * Copyright (c) 2000 - 2019 Intel Corporation
  *
- * Disassembly of tests/data/acpi/q35/SRAT.dimmpxm, Thu May 28 15:11:05 2020
+ * Disassembly of /tmp/aml-2CYLL0, Thu May 28 15:11:05 2020
  *
  * ACPI Data Table [SRAT]
  *
@@ -13,7 +13,7 @@
 [000h 0000   4]                    Signature : "SRAT"    [System Resource Affinity Table]
 [004h 0004   4]                 Table Length : 00000188
 [008h 0008   1]                     Revision : 01
-[009h 0009   1]                     Checksum : 80
+[009h 0009   1]                     Checksum : 68
 [00Ah 0010   6]                       Oem ID : "BOCHS "
 [010h 0016   8]                 Oem Table ID : "BXPCSRAT"
 [018h 0024   4]                 Oem Revision : 00000001
@@ -140,15 +140,15 @@
 [138h 0312   1]                Subtable Type : 01 [Memory Affinity]
 [139h 0313   1]                       Length : 28

-[13Ah 0314   4]             Proximity Domain : 00000000
+[13Ah 0314   4]             Proximity Domain : 00000002
 [13Eh 0318   2]                    Reserved1 : 0000
-[140h 0320   8]                 Base Address : 0000000000000000
-[148h 0328   8]               Address Length : 0000000000000000
+[140h 0320   8]                 Base Address : 0000000108000000
+[148h 0328   8]               Address Length : 0000000008000000
 [150h 0336   4]                    Reserved2 : 00000000
-[154h 0340   4]        Flags (decoded below) : 00000000
-                                     Enabled : 0
+[154h 0340   4]        Flags (decoded below) : 00000005
+                                     Enabled : 1
                                Hot Pluggable : 0
-                                Non-Volatile : 0
+                                Non-Volatile : 1
 [158h 0344   8]                    Reserved3 : 0000000000000000

 [160h 0352   1]                Subtable Type : 01 [Memory Affinity]

tests/data/acpi/virt/SRAT.memhp:
--- /tmp/asl-E32WL0.dsl	2020-05-28 15:19:56.976095582 -0600
+++ /tmp/asl-Y69WL0.dsl	2020-05-28 15:19:56.974095617 -0600
@@ -3,7 +3,7 @@
  * AML/ASL+ Disassembler version 20190509 (64-bit version)
  * Copyright (c) 2000 - 2019 Intel Corporation
  *
- * Disassembly of tests/data/acpi/virt/SRAT.memhp, Thu May 28 15:19:56 2020
+ * Disassembly of /tmp/aml-2CCXL0, Thu May 28 15:19:56 2020
  *
  * ACPI Data Table [SRAT]
  *
@@ -11,9 +11,9 @@
  */

 [000h 0000   4]                    Signature : "SRAT"    [System Resource Affinity Table]
-[004h 0004   4]                 Table Length : 000000BA
+[004h 0004   4]                 Table Length : 000000E2
 [008h 0008   1]                     Revision : 03
-[009h 0009   1]                     Checksum : 43
+[009h 0009   1]                     Checksum : 5C
 [00Ah 0010   6]                       Oem ID : "BOCHS "
 [010h 0016   8]                 Oem Table ID : "BXPCSRAT"
 [018h 0024   4]                 Oem Revision : 00000001
@@ -65,18 +65,32 @@

 [094h 0148   4]             Proximity Domain : 00000001
 [098h 0152   2]                    Reserved1 : 0000
-[09Ah 0154   8]                 Base Address : 0000000080000000
-[0A2h 0162   8]               Address Length : 00000000F0000000
+[09Ah 0154   8]                 Base Address : 0000000088000000
+[0A2h 0162   8]               Address Length : 0000000008000000
 [0AAh 0170   4]                    Reserved2 : 00000000
-[0AEh 0174   4]        Flags (decoded below) : 00000003
+[0AEh 0174   4]        Flags (decoded below) : 00000005
+                                     Enabled : 1
+                               Hot Pluggable : 0
+                                Non-Volatile : 1
+[0B2h 0178   8]                    Reserved3 : 0000000000000000
+
+[0BAh 0186   1]                Subtable Type : 01 [Memory Affinity]
+[0BBh 0187   1]                       Length : 28
+
+[0BCh 0188   4]             Proximity Domain : 00000001
+[0C0h 0192   2]                    Reserved1 : 0000
+[0C2h 0194   8]                 Base Address : 0000000080000000
+[0CAh 0202   8]               Address Length : 00000000F0000000
+[0D2h 0210   4]                    Reserved2 : 00000000
+[0D6h 0214   4]        Flags (decoded below) : 00000003
                                      Enabled : 1
                                Hot Pluggable : 1
                                 Non-Volatile : 0
-[0B2h 0178   8]                    Reserved3 : 0000000000000000
+[0DAh 0218   8]                    Reserved3 : 0000000000000000

Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
 tests/data/acpi/pc/SRAT.dimmpxm             | Bin 392 -> 392 bytes
 tests/data/acpi/q35/SRAT.dimmpxm            | Bin 392 -> 392 bytes
 tests/data/acpi/virt/SRAT.memhp             | Bin 186 -> 226 bytes
 tests/qtest/bios-tables-test-allowed-diff.h |   3 ---
 4 files changed, 3 deletions(-)

diff --git a/tests/data/acpi/pc/SRAT.dimmpxm b/tests/data/acpi/pc/SRAT.dimmpxm
index f5c0267ea24bb404b6b4e687390140378fbdc3f1..5a13c61b9041c6045c29643bf93a111fb1c0c76a 100644
GIT binary patch
delta 51
scmeBR?qKE$4ss0XU}Rum%-G0fz$nec00kUCF%aN@Pz(&LlS3Je0lmQmhyVZp

delta 51
icmeBR?qKE$4ss0XU}RumY}m+Uz$ndt8%z#mGzI{_tp$hx

diff --git a/tests/data/acpi/q35/SRAT.dimmpxm b/tests/data/acpi/q35/SRAT.dimmpxm
index f5c0267ea24bb404b6b4e687390140378fbdc3f1..5a13c61b9041c6045c29643bf93a111fb1c0c76a 100644
GIT binary patch
delta 51
scmeBR?qKE$4ss0XU}Rum%-G0fz$nec00kUCF%aN@Pz(&LlS3Je0lmQmhyVZp

delta 51
icmeBR?qKE$4ss0XU}RumY}m+Uz$ndt8%z#mGzI{_tp$hx

diff --git a/tests/data/acpi/virt/SRAT.memhp b/tests/data/acpi/virt/SRAT.memhp
index 1b57db2072e7f7e2085c4a427aa31c7383851b71..9a35adb40c6f7cd822e5af37abba8aad033617cb 100644
GIT binary patch
delta 43
rcmdnR_=u4!ILI;N5d#AQbIe4p$wD1K76@=aC<X@BiSc3+=gI;A(y0ha

delta 21
dcmaFFxQmf1ILI+%7Xt$Wv-3o*$rF#t0suzv27~|r

diff --git a/tests/qtest/bios-tables-test-allowed-diff.h b/tests/qtest/bios-tables-test-allowed-diff.h
index e8f2766a63..dfb8523c8b 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1,4 +1 @@
 /* List of comma-separated changed AML files to ignore */
-"tests/data/acpi/pc/SRAT.dimmpxm",
-"tests/data/acpi/q35/SRAT.dimmpxm",
-"tests/data/acpi/virt/SRAT.memhp",
-- 
2.26.2




* Re: [PATCH v4 2/3] hw/acpi/nvdimm: add a helper to augment SRAT generation
  2020-05-28 22:34 ` [PATCH v4 2/3] hw/acpi/nvdimm: add a helper to augment SRAT generation Vishal Verma
@ 2020-06-04 10:33   ` Igor Mammedov
  2020-06-05  0:54     ` Verma, Vishal L
  0 siblings, 1 reply; 8+ messages in thread
From: Igor Mammedov @ 2020-06-04 10:33 UTC (permalink / raw)
  To: Vishal Verma
  Cc: Laurent Vivier, Thomas Huth, Xiao Guangrong, Michael S. Tsirkin,
	jingqi.liu, Dave Hansen, qemu-devel, Paolo Bonzini, Dan Williams,
	Richard Henderson, Eduardo Habkost

On Thu, 28 May 2020 16:34:36 -0600
Vishal Verma <vishal.l.verma@intel.com> wrote:

> NVDIMMs can belong to their own proximity domains, as described by the
> NFIT. In such cases, the SRAT needs to have Memory Affinity structures
> in the SRAT for these NVDIMMs, otherwise Linux doesn't populate node
> data structures properly during NUMA initialization. See the following
> for an example failure case.
> 
> https://lore.kernel.org/linux-nvdimm/20200416225438.15208-1-vishal.l.verma@intel.com/
> 
> Introduce a new helper, nvdimm_build_srat(), and call it for both the
> i386 and arm versions of 'build_srat()' to augment the SRAT with
> memory affinity information for NVDIMMs.
> 
> The relevant command line options to exercise this are below. Nodes 0-1
> contain CPUs and regular memory, and nodes 2-3 are the NVDIMM address
> space.
> 
>   -numa node,nodeid=0,mem=2048M,
>   -numa node,nodeid=1,mem=2048M,

Please note that 'mem' is about to be disabled for new machine types in
favor of memdev, so this CLI won't work.
It would be nice to update the commit message with a memdev variant of the CLI.

>   -numa node,nodeid=2,mem=0,
>   -object memory-backend-file,id=nvmem0,share,mem-path=nvdimm-0,size=16384M,align=128M
>   -device nvdimm,memdev=nvmem0,id=nv0,label-size=2M,node=2
>   -numa node,nodeid=3,mem=0,
>   -object memory-backend-file,id=nvmem1,share,mem-path=nvdimm-1,size=16384M,align=128M
>   -device nvdimm,memdev=nvmem1,id=nv1,label-size=2M,node=3
> 
> Cc: Jingqi Liu <jingqi.liu@intel.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Reviewed-by: Jingqi Liu <jingqi.liu@intel.com>
> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>


Reviewed-by: Igor Mammedov <imammedo@redhat.com>

> ---
>  hw/acpi/nvdimm.c         | 23 +++++++++++++++++++++++
>  hw/arm/virt-acpi-build.c |  4 ++++
>  hw/i386/acpi-build.c     |  5 +++++
>  include/hw/mem/nvdimm.h  |  1 +
>  4 files changed, 33 insertions(+)
> 
> diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
> index 9316d12b70..8f7cc16add 100644
> --- a/hw/acpi/nvdimm.c
> +++ b/hw/acpi/nvdimm.c
> @@ -28,6 +28,7 @@
>  
>  #include "qemu/osdep.h"
>  #include "qemu/uuid.h"
> +#include "qapi/error.h"
>  #include "hw/acpi/acpi.h"
>  #include "hw/acpi/aml-build.h"
>  #include "hw/acpi/bios-linker-loader.h"
> @@ -1334,6 +1335,28 @@ static void nvdimm_build_ssdt(GArray *table_offsets, GArray *table_data,
>      free_aml_allocator();
>  }
>  
> +void nvdimm_build_srat(GArray *table_data)
> +{
> +    GSList *device_list, *list = nvdimm_get_device_list();
> +
> +    for (device_list = list; device_list; device_list = device_list->next) {
> +        AcpiSratMemoryAffinity *numamem = NULL;
> +        DeviceState *dev = device_list->data;
> +        Object *obj = OBJECT(dev);
> +        uint64_t addr, size;
> +        int node;
> +
> +        node = object_property_get_int(obj, PC_DIMM_NODE_PROP, &error_abort);
> +        addr = object_property_get_uint(obj, PC_DIMM_ADDR_PROP, &error_abort);
> +        size = object_property_get_uint(obj, PC_DIMM_SIZE_PROP, &error_abort);
> +
> +        numamem = acpi_data_push(table_data, sizeof *numamem);
> +        build_srat_memory(numamem, addr, size, node,
> +                          MEM_AFFINITY_ENABLED | MEM_AFFINITY_NON_VOLATILE);
> +    }
> +    g_slist_free(list);
> +}
> +
>  void nvdimm_build_acpi(GArray *table_offsets, GArray *table_data,
>                         BIOSLinker *linker, NVDIMMState *state,
>                         uint32_t ram_slots)
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index 1b0a584c7b..2cbccd5fe2 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -539,6 +539,10 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
>          }
>      }
>  
> +    if (ms->nvdimms_state->is_enabled) {
> +        nvdimm_build_srat(table_data);
> +    }
> +
>      if (ms->device_memory) {
>          numamem = acpi_data_push(table_data, sizeof *numamem);
>          build_srat_memory(numamem, ms->device_memory->base,
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 2e15f6848e..d996525e2c 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -2428,6 +2428,11 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
>                                MEM_AFFINITY_ENABLED);
>          }
>      }
> +
> +    if (machine->nvdimms_state->is_enabled) {
> +        nvdimm_build_srat(table_data);
> +    }
> +
>      slots = (table_data->len - numa_start) / sizeof *numamem;
>      for (; slots < pcms->numa_nodes + 2; slots++) {
>          numamem = acpi_data_push(table_data, sizeof *numamem);
> diff --git a/include/hw/mem/nvdimm.h b/include/hw/mem/nvdimm.h
> index a3c08955e8..b67a1aedf6 100644
> --- a/include/hw/mem/nvdimm.h
> +++ b/include/hw/mem/nvdimm.h
> @@ -155,6 +155,7 @@ typedef struct NVDIMMState NVDIMMState;
>  void nvdimm_init_acpi_state(NVDIMMState *state, MemoryRegion *io,
>                              struct AcpiGenericAddress dsm_io,
>                              FWCfgState *fw_cfg, Object *owner);
> +void nvdimm_build_srat(GArray *table_data);
>  void nvdimm_build_acpi(GArray *table_offsets, GArray *table_data,
>                         BIOSLinker *linker, NVDIMMState *state,
>                         uint32_t ram_slots);




* Re: [PATCH v4 2/3] hw/acpi/nvdimm: add a helper to augment SRAT generation
  2020-06-04 10:33   ` Igor Mammedov
@ 2020-06-05  0:54     ` Verma, Vishal L
  2020-06-05  8:23       ` Igor Mammedov
  0 siblings, 1 reply; 8+ messages in thread
From: Verma, Vishal L @ 2020-06-05  0:54 UTC (permalink / raw)
  To: imammedo
  Cc: lvivier, thuth, xiaoguangrong.eric, mst, Liu, Jingqi,
	dave.hansen, qemu-devel, ehabkost, pbonzini, Williams, Dan J,
	rth

On Thu, 2020-06-04 at 12:33 +0200, Igor Mammedov wrote:
> On Thu, 28 May 2020 16:34:36 -0600
> Vishal Verma <vishal.l.verma@intel.com> wrote:
> 
> > NVDIMMs can belong to their own proximity domains, as described by the
> > NFIT. In such cases, the SRAT needs to have Memory Affinity structures
> > in the SRAT for these NVDIMMs, otherwise Linux doesn't populate node
> > data structures properly during NUMA initialization. See the following
> > for an example failure case.
> > 
> > https://lore.kernel.org/linux-nvdimm/20200416225438.15208-1-vishal.l.verma@intel.com/
> > 
> > Introduce a new helper, nvdimm_build_srat(), and call it for both the
> > i386 and arm versions of 'build_srat()' to augment the SRAT with
> > memory affinity information for NVDIMMs.
> > 
> > The relevant command line options to exercise this are below. Nodes 0-1
> > contain CPUs and regular memory, and nodes 2-3 are the NVDIMM address
> > space.
> > 
> >   -numa node,nodeid=0,mem=2048M,
> >   -numa node,nodeid=1,mem=2048M,
> 
> pls note that 'mem' is about to be disabled for new machine types in favor of memdev
> so this CLI won't work.
> It would be nice to update commit message with memdev variant of CLI

I saw the warnings printed - I did try to use memdevs, but it didn't
quite work with my use case. I'm supplying mem=0 for the nodes that hold
the pmem/nvdimm devices: I want each device on a specific NUMA node,
without giving that node any regular memory aside from the nvdimm itself
(see nodes 4 and 5 below). And for some reason I couldn't do that with
memdevs.

Here is the full command line I'm using for example. I'd appreciate any
pointers on converting over to memdevs fully.

   qemu-system-x86_64 \
   -machine pc,accel=kvm,nvdimm, \
   -m 8192M,slots=4,maxmem=40960M \
   -smp 8,sockets=2,cores=2,threads=2 \
   -enable-kvm \
   -display none \
   -nographic \
   -drive file=root.img,format=raw,media=disk \
   -kernel ./mkosi.extra/boot/vmlinuz-5.7.0-00001-g87ad963bac23 \
   -initrd mkosi.extra/boot/initramfs-5.7.0-00001-g87ad963bac23.img \
   -append "selinux=0 audit=0 console=tty0 console=ttyS0 root=/dev/sda2 ignore_loglevel rw" \
   -device e1000,netdev=net0 \
   -netdev user,id=net0,hostfwd=tcp::10022-:22 \
   -snapshot \
   -numa node,nodeid=0,mem=2048M, \
   -numa cpu,node-id=0,socket-id=0 \
   -numa node,nodeid=1,mem=2048M, \
   -numa cpu,node-id=1,socket-id=1 \
   -numa node,nodeid=2,mem=2048M, \
   -numa node,nodeid=3,mem=2048M, \
   -numa node,nodeid=4,mem=0, \
   -object memory-backend-file,id=nvmem0,share,mem-path=nvdimm-0,size=16384M,align=1G \
   -device nvdimm,memdev=nvmem0,id=nv0,label-size=2M,node=4 \
   -numa node,nodeid=5,mem=0, \
   -object memory-backend-file,id=nvmem1,share,mem-path=nvdimm-1,size=16384M,align=1G \
   -device nvdimm,memdev=nvmem1,id=nv1,label-size=2M,node=5

> 
> >   -numa node,nodeid=2,mem=0,
> >   -object memory-backend-file,id=nvmem0,share,mem-path=nvdimm-0,size=16384M,align=128M
> >   -device nvdimm,memdev=nvmem0,id=nv0,label-size=2M,node=2
> >   -numa node,nodeid=3,mem=0,
> >   -object memory-backend-file,id=nvmem1,share,mem-path=nvdimm-1,size=16384M,align=128M
> >   -device nvdimm,memdev=nvmem1,id=nv1,label-size=2M,node=3
> > 
> > Cc: Jingqi Liu <jingqi.liu@intel.com>
> > Cc: Michael S. Tsirkin <mst@redhat.com>
> > Reviewed-by: Jingqi Liu <jingqi.liu@intel.com>
> > Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
> 
> Reviewed-by: Igor Mammedov <imammedo@redhat.com>

Thanks for the review Igor - I'm pretty unfamiliar with qemu development
- what are the next steps? Is there a certain maintainer/tree I could
watch for the inclusion of this?

> 
> > ---
> >  hw/acpi/nvdimm.c         | 23 +++++++++++++++++++++++
> >  hw/arm/virt-acpi-build.c |  4 ++++
> >  hw/i386/acpi-build.c     |  5 +++++
> >  include/hw/mem/nvdimm.h  |  1 +
> >  4 files changed, 33 insertions(+)
> > 
> > 


* Re: [PATCH v4 2/3] hw/acpi/nvdimm: add a helper to augment SRAT generation
  2020-06-05  0:54     ` Verma, Vishal L
@ 2020-06-05  8:23       ` Igor Mammedov
  2020-06-05 23:52         ` Verma, Vishal L
  0 siblings, 1 reply; 8+ messages in thread
From: Igor Mammedov @ 2020-06-05  8:23 UTC (permalink / raw)
  To: Verma, Vishal L
  Cc: lvivier, thuth, xiaoguangrong.eric, mst, Liu, Jingqi,
	dave.hansen, qemu-devel, ehabkost, pbonzini, Williams, Dan J,
	rth

On Fri, 5 Jun 2020 00:54:28 +0000
"Verma, Vishal L" <vishal.l.verma@intel.com> wrote:

> On Thu, 2020-06-04 at 12:33 +0200, Igor Mammedov wrote:
> > On Thu, 28 May 2020 16:34:36 -0600
> > Vishal Verma <vishal.l.verma@intel.com> wrote:
> >   
> > > NVDIMMs can belong to their own proximity domains, as described by the
> > > NFIT. In such cases, the SRAT needs to have Memory Affinity structures
> > > in the SRAT for these NVDIMMs, otherwise Linux doesn't populate node
> > > data structures properly during NUMA initialization. See the following
> > > for an example failure case.
> > > 
> > > https://lore.kernel.org/linux-nvdimm/20200416225438.15208-1-vishal.l.verma@intel.com/
> > > 
> > > Introduce a new helper, nvdimm_build_srat(), and call it for both the
> > > i386 and arm versions of 'build_srat()' to augment the SRAT with
> > > memory affinity information for NVDIMMs.
> > > 
> > > The relevant command line options to exercise this are below. Nodes 0-1
> > > contain CPUs and regular memory, and nodes 2-3 are the NVDIMM address
> > > space.
> > > 
> > >   -numa node,nodeid=0,mem=2048M,
> > >   -numa node,nodeid=1,mem=2048M,  
> > 
> > pls note that 'mem' is about to be disabled for new machine types in favor of memdev
> > so this CLI won't work.
> > It would be nice to update commit message with memdev variant of CLI  
> 
> I saw the warnings printed - I did try to use memdevs, but it didn't
> quite work with my use case. I'm supplying mem=0 for the pmem/nvdimm
> devices that I want to give a specific numa node, but not give them any
> more regular memory aside from the nvdimm itself (see nodes 4 and 5
> below). And for some reason I couldn't do that with memdevs.
It should work since 4.1.

Here is an example:
qemu-system-x86_64 -object memory-backend-ram,id=mem0,size=1G -m 1G \
 -numa node,memdev=mem0 -numa node -monitor stdio

QEMU 5.0.50 monitor - type 'help' for more information
(qemu) VNC server running on ::1:5900
info numa
2 nodes
node 0 cpus: 0
node 0 size: 1024 MB
node 0 plugged: 0 MB
node 1 cpus:
node 1 size: 0 MB
node 1 plugged: 0 MB
(qemu)
> 
> Here is the full command line I'm using for example. I'd appreciate any
> pointers on converting over to memdevs fully.
> 
>    qemu-system-x86_64 
>    -machine pc,accel=kvm,nvdimm, 
>    -m 8192M,slots=4,maxmem=40960M 
>    -smp 8,sockets=2,cores=2,threads=2 
>    -enable-kvm 
>    -display none 
>    -nographic 
>    -drive file=root.img,format=raw,media=disk 
>    -kernel ./mkosi.extra/boot/vmlinuz-5.7.0-00001-g87ad963bac23 
>    -initrd mkosi.extra/boot/initramfs-5.7.0-00001-g87ad963bac23.img 
>    -append selinux=0 audit=0 console=tty0 console=ttyS0 root=/dev/sda2 ignore_loglevel rw 
>    -device e1000,netdev=net0 
>    -netdev user,id=net0,hostfwd=tcp::10022-:22 
>    -snapshot 
>    -numa node,nodeid=0,mem=2048M, 
>    -numa cpu,node-id=0,socket-id=0 
>    -numa node,nodeid=1,mem=2048M, 
>    -numa cpu,node-id=1,socket-id=1 
>    -numa node,nodeid=2,mem=2048M, 
>    -numa node,nodeid=3,mem=2048M, 
>    -numa node,nodeid=4,mem=0, 
>    -object memory-backend-file,id=nvmem0,share,mem-path=nvdimm-0,size=16384M,align=1G 
>    -device nvdimm,memdev=nvmem0,id=nv0,label-size=2M,node=4 
>    -numa node,nodeid=5,mem=0, 
>    -object memory-backend-file,id=nvmem1,share,mem-path=nvdimm-1,size=16384M,align=1G 
>    -device nvdimm,memdev=nvmem1,id=nv1,label-size=2M,node=5 
> 
> >   
> > >   -numa node,nodeid=2,mem=0,
> > >   -object memory-backend-file,id=nvmem0,share,mem-path=nvdimm-0,size=16384M,align=128M
> > >   -device nvdimm,memdev=nvmem0,id=nv0,label-size=2M,node=2
> > >   -numa node,nodeid=3,mem=0,
> > >   -object memory-backend-file,id=nvmem1,share,mem-path=nvdimm-1,size=16384M,align=128M
> > >   -device nvdimm,memdev=nvmem1,id=nv1,label-size=2M,node=3
> > > 
> > > Cc: Jingqi Liu <jingqi.liu@intel.com>
> > > Cc: Michael S. Tsirkin <mst@redhat.com>
> > > Reviewed-by: Jingqi Liu <jingqi.liu@intel.com>
> > > Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>  
> > 
> > Reviewed-by: Igor Mammedov <imammedo@redhat.com>  
> 
> Thanks for the review Igor - I'm pretty unfamiliar with qemu development
> - what are the next steps? Is there a certain maintainer/tree I could
> watch for the inclusion of this?
> 
> >   
> > > ---
> > >  hw/acpi/nvdimm.c         | 23 +++++++++++++++++++++++
> > >  hw/arm/virt-acpi-build.c |  4 ++++
> > >  hw/i386/acpi-build.c     |  5 +++++
> > >  include/hw/mem/nvdimm.h  |  1 +
> > >  4 files changed, 33 insertions(+)
> > > 
> > >   




* Re: [PATCH v4 2/3] hw/acpi/nvdimm: add a helper to augment SRAT generation
  2020-06-05  8:23       ` Igor Mammedov
@ 2020-06-05 23:52         ` Verma, Vishal L
  0 siblings, 0 replies; 8+ messages in thread
From: Verma, Vishal L @ 2020-06-05 23:52 UTC (permalink / raw)
  To: imammedo
  Cc: lvivier, thuth, xiaoguangrong.eric, mst, Liu, Jingqi,
	dave.hansen, qemu-devel, ehabkost, pbonzini, Williams, Dan J,
	rth

On Fri, 2020-06-05 at 10:23 +0200, Igor Mammedov wrote:

> > > > The relevant command line options to exercise this are below. Nodes 0-1
> > > > contain CPUs and regular memory, and nodes 2-3 are the NVDIMM address
> > > > space.
> > > > 
> > > >   -numa node,nodeid=0,mem=2048M,
> > > >   -numa node,nodeid=1,mem=2048M,  
> > > 
> > > pls note that 'mem' is about to be disabled for new machine types in favor of memdev
> > > so this CLI won't work.
> > > It would be nice to update commit message with memdev variant of CLI  
> > 
> > I saw the warnings printed - I did try to use memdevs, but it didn't
> > quite work with my use case. I'm supplying mem=0 for the pmem/nvdimm
> > devices that I want to give a specific numa node, but not give them any
> > more regular memory aside from the nvdimm itself (see nodes 4 and 5
> > below). And for some reason I couldn't do that with memdevs.
> it should work since 4.1
> 
> here is example 
> qemu-system-x86_64 -object memory-backend-ram,id=mem0,size=1G -m 1G \
>  -numa node,memdev=mem0 -numa node -monitor stdio
> 
> QEMU 5.0.50 monitor - type 'help' for more information
> (qemu) VNC server running on ::1:5900
> info numa
> 2 nodes
> node 0 cpus: 0
> node 0 size: 1024 MB
> node 0 plugged: 0 MB
> node 1 cpus:
> node 1 size: 0 MB
> node 1 plugged: 0 MB
> (qemu)
> 

Perfect, got it working. Thanks, Igor!

I'll send a v5 with the updated commit message and add your Reviewed-by.

-Vishal


end of thread, other threads:[~2020-06-05 23:53 UTC | newest]

Thread overview: 8+ messages
2020-05-28 22:34 [PATCH v4 0/3] account for NVDIMM nodes during SRAT generation Vishal Verma
2020-05-28 22:34 ` [PATCH v4 1/3] diffs-allowed: add the SRAT AML to diffs-allowed Vishal Verma
2020-05-28 22:34 ` [PATCH v4 2/3] hw/acpi/nvdimm: add a helper to augment SRAT generation Vishal Verma
2020-06-04 10:33   ` Igor Mammedov
2020-06-05  0:54     ` Verma, Vishal L
2020-06-05  8:23       ` Igor Mammedov
2020-06-05 23:52         ` Verma, Vishal L
2020-05-28 22:34 ` [PATCH v4 3/3] tests/acpi: update expected SRAT files Vishal Verma
