* [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion and PCDIMM/NVDIMM support
@ 2019-02-20 22:39 Eric Auger
  2019-02-20 22:39 ` [Qemu-devel] [PATCH v7 01/17] hw/arm/boot: introduce fdt_add_memory_node helper Eric Auger
                   ` (18 more replies)
  0 siblings, 19 replies; 63+ messages in thread
From: Eric Auger @ 2019-02-20 22:39 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, imammedo, david
  Cc: dgilbert, david, drjones

This series aims to bump the 255GB RAM limit in machvirt and to
support device memory in general, and especially PCDIMM/NVDIMM.

In machvirt versions < 4.0, the initial RAM starts at 1GB and can
grow up to 255GB. From 256GB onwards we find IO regions such as the
additional GICv3 RDIST region, the high PCIe ECAM region and the high
PCIe MMIO region. The address map was 1TB in size, which corresponded
to the max IPA capacity KVM was able to manage.

Since Linux 4.20, the host kernel is able to support a larger and
dynamic IPA range, so the guest physical address space can extend
beyond 1TB. The max GPA size depends on the host kernel configuration
and the physical CPUs.

In this series we use this feature and allow the RAM to grow with
no limit other than the one imposed by the host kernel.

The RAM still starts at 1GB. First comes the initial RAM (-m) of size
ram_size, followed by the device memory (,maxmem) of size
maxram_size - ram_size. The device memory is potentially hotpluggable,
depending on the instantiated memory objects.

IO regions previously located between 256GB and 1TB are moved after
the RAM. Their offsets are dynamically computed, depending on ram_size
and maxram_size, and size alignment is enforced.

If the maxmem value is less than 255GB, the legacy memory map is
still used. The memory map change takes effect from machine version
4.0 onwards.

As we keep the initial RAM at the 1GB base address, we do not need to
make invasive changes in the EDK2 firmware. It seems nobody is eager
to take on that job at the moment.

Since the device memory is placed just after the initial RAM, it is
possible to use this feature while keeping a 1TB address map.

This series reuses/rebases patches initially submitted by Shameer
in [1] and Kwangwoo in [2] for the PC-DIMM and NV-DIMM parts.

Functionally, the series is split into 3 parts:
1) bump of the initial RAM limit [1 - 9] and change in
   the memory map
2) Support of PC-DIMM [10 - 13]
3) Support of NV-DIMM [14 - 17]

1) can be upstreamed before 2 and 2 can be upstreamed before 3.

Work is ongoing to model the whole memory as device memory.
However this move is not trivial and, to me, is independent of
the improvements brought by this series:
- if we were to use DIMMs for the initial RAM, those DIMMs would
  use slots. Although they would not be part of the ones provided
  through the ",slots" option, they are ACPI-limited resources.
- the DT and ACPI descriptions need to be reworked
- NUMA integration needs special care
- a special device memory object may be required to avoid consuming
  slots and to ease the FW description.

So I preferred to separate the concerns. This new implementation
based on device memory could be a candidate for another virt
version.

Best Regards

Eric

References:

[0] [RFC v2 0/6] hw/arm: Add support for non-contiguous iova regions
http://patchwork.ozlabs.org/cover/914694/

[1] [RFC PATCH 0/3] add nvdimm support on AArch64 virt platform
https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg04599.html

This series can be found at:
https://github.com/eauger/qemu/tree/v3.1.0-dimm-v7

History:

v6 -> v7:
- Addressed Peter's and Igor's comments (exceptions sent by email)
- Fixed the TCG case: device memory now also works with TCG, and the
  vcpu pamax is checked
- See individual logs for more details

v5 -> v6:
- mingw compilation issue fix
- kvm_arm_get_max_vm_phys_shift always returns the number of supported
  IPA bits
- new patch "hw/arm/virt: Rename highmem IO regions" that eases the review
  of "hw/arm/virt: Split the memory map description"
- "hw/arm/virt: Move memory map initialization into machvirt_init"
  squashed into the previous patch
- change alignment of IO regions beyond the RAM so that it matches their
  size

v4 -> v5:
- change in the memory map
- see individual logs

v3 -> v4:
- rebase on David's "pc-dimm: next bunch of cleanups" and
  "pc-dimm: pre_plug "slot" and "addr" assignment"
- the kvm-type option is not used anymore. We directly use the
  maxram_size and ram_size machine fields to compute the
  max IPA range. Migration is naturally handled as CLI
  options are kept between source and destination. This was
  suggested by David.
- device_memory_start and device_memory_size not stored
  anymore in vms->bootinfo
- I did not take into account two of Igor's comments: the one
  related to the refactoring of arm_load_dtb and the one
  related to the generation of the dtb after system_reset,
  which would contain nodes for hotplugged devices (we do
  not support hotplug at this stage)
- check the end-user does not attempt to hotplug a device
- addition of "vl: Set machine ram_size, maxram_size and
  ram_slots earlier"

v2 -> v3:
- fix pc_q35 and pc_piix compilation error
- Kwangwoo's email address is no longer valid, so it was removed

v1 -> v2:
- kvm_get_max_vm_phys_shift moved in arch specific file
- addition of NVDIMM part
- single series
- rebase on David's refactoring

v1:
- was "[RFC 0/6] KVM/ARM: Dynamic and larger GPA size"
- was "[RFC 0/5] ARM virt: Support PC-DIMM at 2TB"



Eric Auger (12):
  hw/arm/virt: Rename highmem IO regions
  hw/arm/virt: Split the memory map description
  hw/boards: Add a MachineState parameter to kvm_type callback
  kvm: add kvm_arm_get_max_vm_ipa_size
  vl: Set machine ram_size, maxram_size and ram_slots earlier
  hw/arm/virt: Dynamic memory map depending on RAM requirements
  hw/arm/virt: Implement kvm_type function for 4.0 machine
  hw/arm/virt: Bump the 255GB initial RAM limit
  hw/arm/virt: Add memory hotplug framework
  hw/arm/virt: Allocate device_memory
  hw/arm/boot: Expose the pmem nodes in the DT
  hw/arm/virt: Add nvdimm and nvdimm-persistence options

Kwangwoo Lee (2):
  nvdimm: use configurable ACPI IO base and size
  hw/arm/virt: Add nvdimm hot-plug infrastructure

Shameer Kolothum (3):
  hw/arm/boot: introduce fdt_add_memory_node helper
  hw/arm/boot: Expose the PC-DIMM nodes in the DT
  hw/arm/virt-acpi-build: Add PC-DIMM in SRAT

 accel/kvm/kvm-all.c             |   2 +-
 default-configs/arm-softmmu.mak |   4 +
 hw/acpi/nvdimm.c                |  31 ++-
 hw/arm/boot.c                   | 136 ++++++++++--
 hw/arm/virt-acpi-build.c        |  23 +-
 hw/arm/virt.c                   | 364 ++++++++++++++++++++++++++++----
 hw/i386/pc_piix.c               |   6 +-
 hw/i386/pc_q35.c                |   6 +-
 hw/ppc/mac_newworld.c           |   3 +-
 hw/ppc/mac_oldworld.c           |   2 +-
 hw/ppc/spapr.c                  |   2 +-
 include/hw/arm/virt.h           |  24 ++-
 include/hw/boards.h             |   5 +-
 include/hw/mem/nvdimm.h         |   4 +
 target/arm/kvm.c                |  10 +
 target/arm/kvm_arm.h            |  13 ++
 vl.c                            |   6 +-
 17 files changed, 556 insertions(+), 85 deletions(-)

-- 
2.20.1

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v7 01/17] hw/arm/boot: introduce fdt_add_memory_node helper
  2019-02-20 22:39 [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion and PCDIMM/NVDIMM support Eric Auger
@ 2019-02-20 22:39 ` Eric Auger
  2019-02-21 14:58   ` Igor Mammedov
  2019-02-20 22:39 ` [Qemu-devel] [PATCH v7 02/17] hw/arm/virt: Rename highmem IO regions Eric Auger
                   ` (17 subsequent siblings)
  18 siblings, 1 reply; 63+ messages in thread
From: Eric Auger @ 2019-02-20 22:39 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, imammedo, david
  Cc: dgilbert, david, drjones

From: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>

Introduce a helper to create a memory node.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>

---

v6 -> v7:
- msg error in the caller
- add comment about NUMA ID
---
 hw/arm/boot.c | 54 ++++++++++++++++++++++++++++++++-------------------
 1 file changed, 34 insertions(+), 20 deletions(-)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index d90af2f17d..a830655e1a 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -423,6 +423,32 @@ static void set_kernel_args_old(const struct arm_boot_info *info,
     }
 }
 
+static int fdt_add_memory_node(void *fdt, uint32_t acells, hwaddr mem_base,
+                               uint32_t scells, hwaddr mem_len,
+                               int numa_node_id)
+{
+    char *nodename;
+    int ret;
+
+    nodename = g_strdup_printf("/memory@%" PRIx64, mem_base);
+    qemu_fdt_add_subnode(fdt, nodename);
+    qemu_fdt_setprop_string(fdt, nodename, "device_type", "memory");
+    ret = qemu_fdt_setprop_sized_cells(fdt, nodename, "reg", acells, mem_base,
+                                       scells, mem_len);
+    if (ret < 0) {
+        goto out;
+    }
+
+    /* only set the NUMA ID if it is specified */
+    if (numa_node_id >= 0) {
+        ret = qemu_fdt_setprop_cell(fdt, nodename,
+                                    "numa-node-id", numa_node_id);
+    }
+out:
+    g_free(nodename);
+    return ret;
+}
+
 static void fdt_add_psci_node(void *fdt)
 {
     uint32_t cpu_suspend_fn;
@@ -502,7 +528,6 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
     void *fdt = NULL;
     int size, rc, n = 0;
     uint32_t acells, scells;
-    char *nodename;
     unsigned int i;
     hwaddr mem_base, mem_len;
     char **node_path;
@@ -576,35 +601,24 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
         mem_base = binfo->loader_start;
         for (i = 0; i < nb_numa_nodes; i++) {
             mem_len = numa_info[i].node_mem;
-            nodename = g_strdup_printf("/memory@%" PRIx64, mem_base);
-            qemu_fdt_add_subnode(fdt, nodename);
-            qemu_fdt_setprop_string(fdt, nodename, "device_type", "memory");
-            rc = qemu_fdt_setprop_sized_cells(fdt, nodename, "reg",
-                                              acells, mem_base,
-                                              scells, mem_len);
+            rc = fdt_add_memory_node(fdt, acells, mem_base,
+                                     scells, mem_len, i);
             if (rc < 0) {
-                fprintf(stderr, "couldn't set %s/reg for node %d\n", nodename,
-                        i);
+                fprintf(stderr, "couldn't add /memory@%"PRIx64" node\n",
+                        mem_base);
                 goto fail;
             }
 
-            qemu_fdt_setprop_cell(fdt, nodename, "numa-node-id", i);
             mem_base += mem_len;
-            g_free(nodename);
         }
     } else {
-        nodename = g_strdup_printf("/memory@%" PRIx64, binfo->loader_start);
-        qemu_fdt_add_subnode(fdt, nodename);
-        qemu_fdt_setprop_string(fdt, nodename, "device_type", "memory");
-
-        rc = qemu_fdt_setprop_sized_cells(fdt, nodename, "reg",
-                                          acells, binfo->loader_start,
-                                          scells, binfo->ram_size);
+        rc = fdt_add_memory_node(fdt, acells, binfo->loader_start,
+                                 scells, binfo->ram_size, -1);
         if (rc < 0) {
-            fprintf(stderr, "couldn't set %s reg\n", nodename);
+            fprintf(stderr, "couldn't add /memory@%"PRIx64" node\n",
+                    binfo->loader_start);
             goto fail;
         }
-        g_free(nodename);
     }
 
     rc = fdt_path_offset(fdt, "/chosen");
-- 
2.20.1


* [Qemu-devel] [PATCH v7 02/17] hw/arm/virt: Rename highmem IO regions
  2019-02-20 22:39 [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion and PCDIMM/NVDIMM support Eric Auger
  2019-02-20 22:39 ` [Qemu-devel] [PATCH v7 01/17] hw/arm/boot: introduce fdt_add_memory_node helper Eric Auger
@ 2019-02-20 22:39 ` Eric Auger
  2019-02-21 15:05   ` Igor Mammedov
  2019-02-20 22:39 ` [Qemu-devel] [PATCH v7 03/17] hw/arm/virt: Split the memory map description Eric Auger
                   ` (16 subsequent siblings)
  18 siblings, 1 reply; 63+ messages in thread
From: Eric Auger @ 2019-02-20 22:39 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, imammedo, david
  Cc: dgilbert, david, drjones

In preparation for a split of the memory map into a static
part and a dynamic part floating after the RAM, let's rename the
regions located after the RAM.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

---
v7: added Peter's R-b
v6: creation
---
 hw/arm/virt-acpi-build.c |  8 ++++----
 hw/arm/virt.c            | 21 +++++++++++----------
 include/hw/arm/virt.h    |  8 ++++----
 3 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 04b62c714d..829d2f0035 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -229,8 +229,8 @@ static void acpi_dsdt_add_pci(Aml *scope, const MemMapEntry *memmap,
                      size_pio));
 
     if (use_highmem) {
-        hwaddr base_mmio_high = memmap[VIRT_PCIE_MMIO_HIGH].base;
-        hwaddr size_mmio_high = memmap[VIRT_PCIE_MMIO_HIGH].size;
+        hwaddr base_mmio_high = memmap[VIRT_HIGH_PCIE_MMIO].base;
+        hwaddr size_mmio_high = memmap[VIRT_HIGH_PCIE_MMIO].size;
 
         aml_append(rbuf,
             aml_qword_memory(AML_POS_DECODE, AML_MIN_FIXED, AML_MAX_FIXED,
@@ -663,8 +663,8 @@ build_madt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
             gicr = acpi_data_push(table_data, sizeof(*gicr));
             gicr->type = ACPI_APIC_GENERIC_REDISTRIBUTOR;
             gicr->length = sizeof(*gicr);
-            gicr->base_address = cpu_to_le64(memmap[VIRT_GIC_REDIST2].base);
-            gicr->range_length = cpu_to_le32(memmap[VIRT_GIC_REDIST2].size);
+            gicr->base_address = cpu_to_le64(memmap[VIRT_HIGH_GIC_REDIST2].base);
+            gicr->range_length = cpu_to_le32(memmap[VIRT_HIGH_GIC_REDIST2].size);
         }
 
         if (its_class_name() && !vmc->no_its) {
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 99c2b6e60d..a1955e7764 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -150,10 +150,10 @@ static const MemMapEntry a15memmap[] = {
     [VIRT_PCIE_ECAM] =          { 0x3f000000, 0x01000000 },
     [VIRT_MEM] =                { 0x40000000, RAMLIMIT_BYTES },
     /* Additional 64 MB redist region (can contain up to 512 redistributors) */
-    [VIRT_GIC_REDIST2] =        { 0x4000000000ULL, 0x4000000 },
-    [VIRT_PCIE_ECAM_HIGH] =     { 0x4010000000ULL, 0x10000000 },
+    [VIRT_HIGH_GIC_REDIST2] =   { 0x4000000000ULL, 0x4000000 },
+    [VIRT_HIGH_PCIE_ECAM] =     { 0x4010000000ULL, 0x10000000 },
     /* Second PCIe window, 512GB wide at the 512GB boundary */
-    [VIRT_PCIE_MMIO_HIGH] =   { 0x8000000000ULL, 0x8000000000ULL },
+    [VIRT_HIGH_PCIE_MMIO] =     { 0x8000000000ULL, 0x8000000000ULL },
 };
 
 static const int a15irqmap[] = {
@@ -435,8 +435,8 @@ static void fdt_add_gic_node(VirtMachineState *vms)
                                          2, vms->memmap[VIRT_GIC_DIST].size,
                                          2, vms->memmap[VIRT_GIC_REDIST].base,
                                          2, vms->memmap[VIRT_GIC_REDIST].size,
-                                         2, vms->memmap[VIRT_GIC_REDIST2].base,
-                                         2, vms->memmap[VIRT_GIC_REDIST2].size);
+                                         2, vms->memmap[VIRT_HIGH_GIC_REDIST2].base,
+                                         2, vms->memmap[VIRT_HIGH_GIC_REDIST2].size);
         }
 
         if (vms->virt) {
@@ -584,7 +584,7 @@ static void create_gic(VirtMachineState *vms, qemu_irq *pic)
 
         if (nb_redist_regions == 2) {
             uint32_t redist1_capacity =
-                        vms->memmap[VIRT_GIC_REDIST2].size / GICV3_REDIST_SIZE;
+                    vms->memmap[VIRT_HIGH_GIC_REDIST2].size / GICV3_REDIST_SIZE;
 
             qdev_prop_set_uint32(gicdev, "redist-region-count[1]",
                 MIN(smp_cpus - redist0_count, redist1_capacity));
@@ -601,7 +601,8 @@ static void create_gic(VirtMachineState *vms, qemu_irq *pic)
     if (type == 3) {
         sysbus_mmio_map(gicbusdev, 1, vms->memmap[VIRT_GIC_REDIST].base);
         if (nb_redist_regions == 2) {
-            sysbus_mmio_map(gicbusdev, 2, vms->memmap[VIRT_GIC_REDIST2].base);
+            sysbus_mmio_map(gicbusdev, 2,
+                            vms->memmap[VIRT_HIGH_GIC_REDIST2].base);
         }
     } else {
         sysbus_mmio_map(gicbusdev, 1, vms->memmap[VIRT_GIC_CPU].base);
@@ -1088,8 +1089,8 @@ static void create_pcie(VirtMachineState *vms, qemu_irq *pic)
 {
     hwaddr base_mmio = vms->memmap[VIRT_PCIE_MMIO].base;
     hwaddr size_mmio = vms->memmap[VIRT_PCIE_MMIO].size;
-    hwaddr base_mmio_high = vms->memmap[VIRT_PCIE_MMIO_HIGH].base;
-    hwaddr size_mmio_high = vms->memmap[VIRT_PCIE_MMIO_HIGH].size;
+    hwaddr base_mmio_high = vms->memmap[VIRT_HIGH_PCIE_MMIO].base;
+    hwaddr size_mmio_high = vms->memmap[VIRT_HIGH_PCIE_MMIO].size;
     hwaddr base_pio = vms->memmap[VIRT_PCIE_PIO].base;
     hwaddr size_pio = vms->memmap[VIRT_PCIE_PIO].size;
     hwaddr base_ecam, size_ecam;
@@ -1418,7 +1419,7 @@ static void machvirt_init(MachineState *machine)
      */
     if (vms->gic_version == 3) {
         virt_max_cpus = vms->memmap[VIRT_GIC_REDIST].size / GICV3_REDIST_SIZE;
-        virt_max_cpus += vms->memmap[VIRT_GIC_REDIST2].size / GICV3_REDIST_SIZE;
+        virt_max_cpus += vms->memmap[VIRT_HIGH_GIC_REDIST2].size / GICV3_REDIST_SIZE;
     } else {
         virt_max_cpus = GIC_NCPU;
     }
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index 4cc57a7ef6..a27086d524 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -64,7 +64,7 @@ enum {
     VIRT_GIC_VCPU,
     VIRT_GIC_ITS,
     VIRT_GIC_REDIST,
-    VIRT_GIC_REDIST2,
+    VIRT_HIGH_GIC_REDIST2,
     VIRT_SMMU,
     VIRT_UART,
     VIRT_MMIO,
@@ -74,9 +74,9 @@ enum {
     VIRT_PCIE_MMIO,
     VIRT_PCIE_PIO,
     VIRT_PCIE_ECAM,
-    VIRT_PCIE_ECAM_HIGH,
+    VIRT_HIGH_PCIE_ECAM,
     VIRT_PLATFORM_BUS,
-    VIRT_PCIE_MMIO_HIGH,
+    VIRT_HIGH_PCIE_MMIO,
     VIRT_GPIO,
     VIRT_SECURE_UART,
     VIRT_SECURE_MEM,
@@ -128,7 +128,7 @@ typedef struct {
     int psci_conduit;
 } VirtMachineState;
 
-#define VIRT_ECAM_ID(high) (high ? VIRT_PCIE_ECAM_HIGH : VIRT_PCIE_ECAM)
+#define VIRT_ECAM_ID(high) (high ? VIRT_HIGH_PCIE_ECAM : VIRT_PCIE_ECAM)
 
 #define TYPE_VIRT_MACHINE   MACHINE_TYPE_NAME("virt")
 #define VIRT_MACHINE(obj) \
-- 
2.20.1


* [Qemu-devel] [PATCH v7 03/17] hw/arm/virt: Split the memory map description
  2019-02-20 22:39 [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion and PCDIMM/NVDIMM support Eric Auger
  2019-02-20 22:39 ` [Qemu-devel] [PATCH v7 01/17] hw/arm/boot: introduce fdt_add_memory_node helper Eric Auger
  2019-02-20 22:39 ` [Qemu-devel] [PATCH v7 02/17] hw/arm/virt: Rename highmem IO regions Eric Auger
@ 2019-02-20 22:39 ` Eric Auger
  2019-02-21 16:19   ` Igor Mammedov
  2019-02-22  7:34   ` Heyi Guo
  2019-02-20 22:39 ` [Qemu-devel] [PATCH v7 04/17] hw/boards: Add a MachineState parameter to kvm_type callback Eric Auger
                   ` (15 subsequent siblings)
  18 siblings, 2 replies; 63+ messages in thread
From: Eric Auger @ 2019-02-20 22:39 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, imammedo, david
  Cc: dgilbert, david, drjones

In preparation for introducing an extended memory map supporting more
RAM, let's split the memory map array into two parts:

- the former a15memmap, which contains the regions below and including
  the RAM
- extended_memmap, initialized only with entries located after the
  RAM. Only the size of each region is initialized there, since the
  base addresses will be dynamically computed, depending on the top of
  the RAM (initial RAM at the moment), with the same alignment as
  their size.

This split will allow the RAM size to grow without changing the
description of the high regions.

The patch also moves the memory map setup into machvirt_init().
The rationale is that the memory map will soon be affected by the
kvm_type() call, which happens after virt_instance_init() and
before machvirt_init().

The memory map is unchanged (the top of the initial RAM is still
256GiB). Then come the high IO regions with the same layout as before.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

---
v6 -> v7:
- s/a15memmap/base_memmap
- slight rewording of the commit message
- add "if there is less than 256GiB of RAM then the floating area
  starts at the 256GiB mark" in the comment associated to the floating
  memory map
- Added Peter's R-b

v5 -> v6
- removal of many macros in units.h
- introduce the virt_set_memmap helper
- new computation for offsets of high IO regions
- add comments
---
 hw/arm/virt.c         | 48 +++++++++++++++++++++++++++++++++++++------
 include/hw/arm/virt.h | 14 +++++++++----
 2 files changed, 52 insertions(+), 10 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index a1955e7764..12039a0367 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -29,6 +29,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/units.h"
 #include "qapi/error.h"
 #include "hw/sysbus.h"
 #include "hw/arm/arm.h"
@@ -121,7 +122,7 @@
  * Note that devices should generally be placed at multiples of 0x10000,
  * to accommodate guests using 64K pages.
  */
-static const MemMapEntry a15memmap[] = {
+static const MemMapEntry base_memmap[] = {
     /* Space up to 0x8000000 is reserved for a boot ROM */
     [VIRT_FLASH] =              {          0, 0x08000000 },
     [VIRT_CPUPERIPHS] =         { 0x08000000, 0x00020000 },
@@ -149,11 +150,21 @@ static const MemMapEntry a15memmap[] = {
     [VIRT_PCIE_PIO] =           { 0x3eff0000, 0x00010000 },
     [VIRT_PCIE_ECAM] =          { 0x3f000000, 0x01000000 },
     [VIRT_MEM] =                { 0x40000000, RAMLIMIT_BYTES },
+};
+
+/*
+ * Highmem IO Regions: This memory map is floating, located after the RAM.
+ * Each IO region offset will be dynamically computed, depending on the
+ * top of the RAM, so that its base get the same alignment as the size,
+ * ie. a 512GiB region will be aligned on a 512GiB boundary. If there is
+ * less than 256GiB of RAM, the floating area starts at the 256GiB mark.
+ */
+static MemMapEntry extended_memmap[] = {
     /* Additional 64 MB redist region (can contain up to 512 redistributors) */
-    [VIRT_HIGH_GIC_REDIST2] =   { 0x4000000000ULL, 0x4000000 },
-    [VIRT_HIGH_PCIE_ECAM] =     { 0x4010000000ULL, 0x10000000 },
-    /* Second PCIe window, 512GB wide at the 512GB boundary */
-    [VIRT_HIGH_PCIE_MMIO] =     { 0x8000000000ULL, 0x8000000000ULL },
+    [VIRT_HIGH_GIC_REDIST2] =   { 0x0, 64 * MiB },
+    [VIRT_HIGH_PCIE_ECAM] =     { 0x0, 256 * MiB },
+    /* Second PCIe window */
+    [VIRT_HIGH_PCIE_MMIO] =     { 0x0, 512 * GiB },
 };
 
 static const int a15irqmap[] = {
@@ -1354,6 +1365,30 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx)
     return arm_cpu_mp_affinity(idx, clustersz);
 }
 
+static void virt_set_memmap(VirtMachineState *vms)
+{
+    hwaddr base;
+    int i;
+
+    vms->memmap = extended_memmap;
+
+    for (i = 0; i < ARRAY_SIZE(base_memmap); i++) {
+        vms->memmap[i] = base_memmap[i];
+    }
+
+    vms->high_io_base = 256 * GiB; /* Top of the legacy initial RAM region */
+    base = vms->high_io_base;
+
+    for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
+        hwaddr size = extended_memmap[i].size;
+
+        base = ROUND_UP(base, size);
+        vms->memmap[i].base = base;
+        vms->memmap[i].size = size;
+        base += size;
+    }
+}
+
 static void machvirt_init(MachineState *machine)
 {
     VirtMachineState *vms = VIRT_MACHINE(machine);
@@ -1368,6 +1403,8 @@ static void machvirt_init(MachineState *machine)
     bool firmware_loaded = bios_name || drive_get(IF_PFLASH, 0, 0);
     bool aarch64 = true;
 
+    virt_set_memmap(vms);
+
     /* We can probe only here because during property set
      * KVM is not available yet
      */
@@ -1843,7 +1880,6 @@ static void virt_instance_init(Object *obj)
                                     "Valid values are none and smmuv3",
                                     NULL);
 
-    vms->memmap = a15memmap;
     vms->irqmap = a15irqmap;
 }
 
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index a27086d524..3dc7a6c5d5 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -64,7 +64,6 @@ enum {
     VIRT_GIC_VCPU,
     VIRT_GIC_ITS,
     VIRT_GIC_REDIST,
-    VIRT_HIGH_GIC_REDIST2,
     VIRT_SMMU,
     VIRT_UART,
     VIRT_MMIO,
@@ -74,12 +73,18 @@ enum {
     VIRT_PCIE_MMIO,
     VIRT_PCIE_PIO,
     VIRT_PCIE_ECAM,
-    VIRT_HIGH_PCIE_ECAM,
     VIRT_PLATFORM_BUS,
-    VIRT_HIGH_PCIE_MMIO,
     VIRT_GPIO,
     VIRT_SECURE_UART,
     VIRT_SECURE_MEM,
+    VIRT_LOWMEMMAP_LAST,
+};
+
+/* indices of IO regions located after the RAM */
+enum {
+    VIRT_HIGH_GIC_REDIST2 =  VIRT_LOWMEMMAP_LAST,
+    VIRT_HIGH_PCIE_ECAM,
+    VIRT_HIGH_PCIE_MMIO,
 };
 
 typedef enum VirtIOMMUType {
@@ -116,7 +121,7 @@ typedef struct {
     int32_t gic_version;
     VirtIOMMUType iommu;
     struct arm_boot_info bootinfo;
-    const MemMapEntry *memmap;
+    MemMapEntry *memmap;
     const int *irqmap;
     int smp_cpus;
     void *fdt;
@@ -126,6 +131,7 @@ typedef struct {
     uint32_t msi_phandle;
     uint32_t iommu_phandle;
     int psci_conduit;
+    hwaddr high_io_base;
 } VirtMachineState;
 
 #define VIRT_ECAM_ID(high) (high ? VIRT_HIGH_PCIE_ECAM : VIRT_PCIE_ECAM)
-- 
2.20.1


* [Qemu-devel] [PATCH v7 04/17] hw/boards: Add a MachineState parameter to kvm_type callback
  2019-02-20 22:39 [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion and PCDIMM/NVDIMM support Eric Auger
                   ` (2 preceding siblings ...)
  2019-02-20 22:39 ` [Qemu-devel] [PATCH v7 03/17] hw/arm/virt: Split the memory map description Eric Auger
@ 2019-02-20 22:39 ` Eric Auger
  2019-02-22 10:18   ` Igor Mammedov
  2019-02-20 22:39 ` [Qemu-devel] [PATCH v7 05/17] kvm: add kvm_arm_get_max_vm_ipa_size Eric Auger
                   ` (14 subsequent siblings)
  18 siblings, 1 reply; 63+ messages in thread
From: Eric Auger @ 2019-02-20 22:39 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, imammedo, david
  Cc: dgilbert, david, drjones

On ARM, the kvm_type will be resolved by querying the KVMState.
Let's add the MachineState handle to the callback so that we
can retrieve the KVMState handle: in kvm_init, when the callback
is called, the kvm_state variable is not yet set.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
[ppc parts]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

---
v6 -> v7:
- add a comment for kvm_type
- use machine instead of ms in the declaration
- add Peter's R-b
---
 accel/kvm/kvm-all.c   | 2 +-
 hw/ppc/mac_newworld.c | 3 +--
 hw/ppc/mac_oldworld.c | 2 +-
 hw/ppc/spapr.c        | 2 +-
 include/hw/boards.h   | 5 ++++-
 5 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index fd92b6f375..241db496c3 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1593,7 +1593,7 @@ static int kvm_init(MachineState *ms)
 
     kvm_type = qemu_opt_get(qemu_get_machine_opts(), "kvm-type");
     if (mc->kvm_type) {
-        type = mc->kvm_type(kvm_type);
+        type = mc->kvm_type(ms, kvm_type);
     } else if (kvm_type) {
         ret = -EINVAL;
         fprintf(stderr, "Invalid argument kvm-type=%s\n", kvm_type);
diff --git a/hw/ppc/mac_newworld.c b/hw/ppc/mac_newworld.c
index 98461052ac..97e8817145 100644
--- a/hw/ppc/mac_newworld.c
+++ b/hw/ppc/mac_newworld.c
@@ -564,8 +564,7 @@ static char *core99_fw_dev_path(FWPathProvider *p, BusState *bus,
 
     return NULL;
 }
-
-static int core99_kvm_type(const char *arg)
+static int core99_kvm_type(MachineState *machine, const char *arg)
 {
     /* Always force PR KVM */
     return 2;
diff --git a/hw/ppc/mac_oldworld.c b/hw/ppc/mac_oldworld.c
index 284431ddd6..cc1e463466 100644
--- a/hw/ppc/mac_oldworld.c
+++ b/hw/ppc/mac_oldworld.c
@@ -420,7 +420,7 @@ static char *heathrow_fw_dev_path(FWPathProvider *p, BusState *bus,
     return NULL;
 }
 
-static int heathrow_kvm_type(const char *arg)
+static int heathrow_kvm_type(MachineState *machine, const char *arg)
 {
     /* Always force PR KVM */
     return 2;
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index abf9ebce59..3d0811fa81 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2920,7 +2920,7 @@ static void spapr_machine_init(MachineState *machine)
     }
 }
 
-static int spapr_kvm_type(const char *vm_type)
+static int spapr_kvm_type(MachineState *machine, const char *vm_type)
 {
     if (!vm_type) {
         return 0;
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 05f9f45c3d..ed2fec82d5 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -156,6 +156,9 @@ typedef struct {
  *    should instead use "unimplemented-device" for all memory ranges where
  *    the guest will attempt to probe for a device that QEMU doesn't
  *    implement and a stub device is required.
+ * @kvm_type:
+ *    Return the type of KVM corresponding to the kvm-type string option or
+ *    computed based on other criteria such as the host kernel capabilities.
  */
 struct MachineClass {
     /*< private >*/
@@ -171,7 +174,7 @@ struct MachineClass {
     void (*init)(MachineState *state);
     void (*reset)(void);
     void (*hot_add_cpu)(const int64_t id, Error **errp);
-    int (*kvm_type)(const char *arg);
+    int (*kvm_type)(MachineState *machine, const char *arg);
 
     BlockInterfaceType block_default_type;
     int units_per_default_bus;
-- 
2.20.1


* [Qemu-devel] [PATCH v7 05/17] kvm: add kvm_arm_get_max_vm_ipa_size
  2019-02-20 22:39 [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion and PCDIMM/NVDIMM support Eric Auger
                   ` (3 preceding siblings ...)
  2019-02-20 22:39 ` [Qemu-devel] [PATCH v7 04/17] hw/boards: Add a MachineState parameter to kvm_type callback Eric Auger
@ 2019-02-20 22:39 ` Eric Auger
  2019-02-20 22:39 ` [Qemu-devel] [PATCH v7 06/17] vl: Set machine ram_size, maxram_size and ram_slots earlier Eric Auger
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 63+ messages in thread
From: Eric Auger @ 2019-02-20 22:39 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, imammedo, david
  Cc: dgilbert, david, drjones

Add the kvm_arm_get_max_vm_ipa_size() helper that returns the
number of bits in the IPA address space supported by KVM.

This capability needs to be known in order to create the VM with a
specific max IPA size (the kvm_type passed along with the
KVM_CREATE_VM ioctl).

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---
v6 -> v7:
- s/kvm_arm_get_max_vm_phys_shift/kvm_arm_get_max_vm_ipa_size
- reword the comment

v4 -> v5:
- return 40 if the host does not support the capability

v3 -> v4:
- s/s/ms in kvm_arm_get_max_vm_phys_shift function comment
- check KVM_CAP_ARM_VM_IPA_SIZE extension

v1 -> v2:
- put this in ARM specific code
---
 target/arm/kvm.c     | 10 ++++++++++
 target/arm/kvm_arm.h | 13 +++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index e00ccf9c98..79a79f0190 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -18,6 +18,7 @@
 #include "qemu/error-report.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/kvm.h"
+#include "sysemu/kvm_int.h"
 #include "kvm_arm.h"
 #include "cpu.h"
 #include "trace.h"
@@ -162,6 +163,15 @@ void kvm_arm_set_cpu_features_from_host(ARMCPU *cpu)
     env->features = arm_host_cpu_features.features;
 }
 
+int kvm_arm_get_max_vm_ipa_size(MachineState *ms)
+{
+    KVMState *s = KVM_STATE(ms->accelerator);
+    int ret;
+
+    ret = kvm_check_extension(s, KVM_CAP_ARM_VM_IPA_SIZE);
+    return ret > 0 ? ret : 40;
+}
+
 int kvm_arch_init(MachineState *ms, KVMState *s)
 {
     /* For ARM interrupt delivery is always asynchronous,
diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h
index 6393455b1d..2a07333c61 100644
--- a/target/arm/kvm_arm.h
+++ b/target/arm/kvm_arm.h
@@ -207,6 +207,14 @@ bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf);
  */
 void kvm_arm_set_cpu_features_from_host(ARMCPU *cpu);
 
+/**
+ * kvm_arm_get_max_vm_ipa_size - Returns the number of bits in the
+ * IPA address space supported by KVM
+ *
+ * @ms: Machine state handle
+ */
+int kvm_arm_get_max_vm_ipa_size(MachineState *ms);
+
 /**
  * kvm_arm_sync_mpstate_to_kvm
  * @cpu: ARMCPU
@@ -239,6 +247,11 @@ static inline void kvm_arm_set_cpu_features_from_host(ARMCPU *cpu)
     cpu->host_cpu_probe_failed = true;
 }
 
+static inline int kvm_arm_get_max_vm_ipa_size(MachineState *ms)
+{
+    return -ENOENT;
+}
+
 static inline int kvm_arm_vgic_probe(void)
 {
     return 0;
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v7 06/17] vl: Set machine ram_size, maxram_size and ram_slots earlier
  2019-02-20 22:39 [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion and PCDIMM/NVDIMM support Eric Auger
                   ` (4 preceding siblings ...)
  2019-02-20 22:39 ` [Qemu-devel] [PATCH v7 05/17] kvm: add kvm_arm_get_max_vm_ipa_size Eric Auger
@ 2019-02-20 22:39 ` Eric Auger
  2019-02-22 10:40   ` Igor Mammedov
  2019-02-20 22:39 ` [Qemu-devel] [PATCH v7 07/17] hw/arm/virt: Dynamic memory map depending on RAM requirements Eric Auger
                   ` (12 subsequent siblings)
  18 siblings, 1 reply; 63+ messages in thread
From: Eric Auger @ 2019-02-20 22:39 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, imammedo, david
  Cc: dgilbert, david, drjones

The machine RAM attributes will need to be analyzed during the
configure_accelerator() process. In particular, the arm64 machine's
kvm_type() callback will use them to know how many IPA/GPA bits are
needed to model the whole RAM range. So let's assign those machine
state fields before calling configure_accelerator().

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

---
v6 -> v7:
- add Peter's R-b

v4: new
---
 vl.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/vl.c b/vl.c
index 502857a176..fd0d51320d 100644
--- a/vl.c
+++ b/vl.c
@@ -4239,6 +4239,9 @@ int main(int argc, char **argv, char **envp)
     machine_opts = qemu_get_machine_opts();
     qemu_opt_foreach(machine_opts, machine_set_property, current_machine,
                      &error_fatal);
+    current_machine->ram_size = ram_size;
+    current_machine->maxram_size = maxram_size;
+    current_machine->ram_slots = ram_slots;
 
     configure_accelerator(current_machine, argv[0]);
 
@@ -4434,9 +4437,6 @@ int main(int argc, char **argv, char **envp)
     replay_checkpoint(CHECKPOINT_INIT);
     qdev_machine_init();
 
-    current_machine->ram_size = ram_size;
-    current_machine->maxram_size = maxram_size;
-    current_machine->ram_slots = ram_slots;
     current_machine->boot_order = boot_order;
 
     /* parse features once if machine provides default cpu_type */
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v7 07/17] hw/arm/virt: Dynamic memory map depending on RAM requirements
  2019-02-20 22:39 [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion and PCDIMM/NVDIMM support Eric Auger
                   ` (5 preceding siblings ...)
  2019-02-20 22:39 ` [Qemu-devel] [PATCH v7 06/17] vl: Set machine ram_size, maxram_size and ram_slots earlier Eric Auger
@ 2019-02-20 22:39 ` Eric Auger
  2019-02-22 12:57   ` Igor Mammedov
  2019-02-20 22:39 ` [Qemu-devel] [PATCH v7 08/17] hw/arm/virt: Implement kvm_type function for 4.0 machine Eric Auger
                   ` (11 subsequent siblings)
  18 siblings, 1 reply; 63+ messages in thread
From: Eric Auger @ 2019-02-20 22:39 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, imammedo, david
  Cc: dgilbert, david, drjones

Up to now the memory map has been static and the high IO region
base has always been 256GiB.

This patch modifies the virt_set_memmap() function, which freezes
the memory map, so that the high IO range base becomes floating,
located after the initial RAM and the device memory.

The function computes
- the base of the device memory,
- the size of the device memory and
- the highest GPA used in the memory map.

The two former will be used when defining the device memory region
while the latter will be used at VM creation to choose the requested
IPA size.

Setting all the existing highmem IO regions beyond the RAM
allows us to have a single contiguous RAM region (initial RAM and
possible hotpluggable device memory). That way we do not need
to make invasive changes in the EDK2 FW to support a dynamic
RAM base.

The user still cannot request an initial RAM size greater than 255GB.
We also handle the case where the maxmem or slots options are passed:
since no device memory is usable at the moment, we simply ignore
those settings.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 hw/arm/virt.c         | 47 ++++++++++++++++++++++++++++++++++---------
 include/hw/arm/virt.h |  3 +++
 2 files changed, 41 insertions(+), 9 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 12039a0367..9db602457b 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -107,8 +107,9 @@
  * of a terabyte of RAM will be doing it on a host with more than a
  * terabyte of physical address space.)
  */
-#define RAMLIMIT_GB 255
-#define RAMLIMIT_BYTES (RAMLIMIT_GB * 1024ULL * 1024 * 1024)
+#define RAMBASE GiB
+#define LEGACY_RAMLIMIT_GB 255
+#define LEGACY_RAMLIMIT_BYTES (LEGACY_RAMLIMIT_GB * GiB)
 
 /* Addresses and sizes of our components.
  * 0..128MB is space for a flash device so we can run bootrom code such as UEFI.
@@ -149,7 +150,7 @@ static const MemMapEntry base_memmap[] = {
     [VIRT_PCIE_MMIO] =          { 0x10000000, 0x2eff0000 },
     [VIRT_PCIE_PIO] =           { 0x3eff0000, 0x00010000 },
     [VIRT_PCIE_ECAM] =          { 0x3f000000, 0x01000000 },
-    [VIRT_MEM] =                { 0x40000000, RAMLIMIT_BYTES },
+    [VIRT_MEM] =                { RAMBASE, LEGACY_RAMLIMIT_BYTES },
 };
 
 /*
@@ -1367,16 +1368,48 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx)
 
 static void virt_set_memmap(VirtMachineState *vms)
 {
+    MachineState *ms = MACHINE(vms);
     hwaddr base;
     int i;
 
+    if (ms->maxram_size > ms->ram_size || ms->ram_slots > 0) {
+        error_report("mach-virt: does not support device memory: "
+                     "ignore maxmem and slots options");
+        ms->maxram_size = ms->ram_size;
+        ms->ram_slots = 0;
+    }
+    if (ms->ram_size > (ram_addr_t)LEGACY_RAMLIMIT_BYTES) {
+        error_report("mach-virt: cannot model more than %dGB RAM",
+                     LEGACY_RAMLIMIT_GB);
+        exit(1);
+    }
+
     vms->memmap = extended_memmap;
 
     for (i = 0; i < ARRAY_SIZE(base_memmap); i++) {
         vms->memmap[i] = base_memmap[i];
     }
 
-    vms->high_io_base = 256 * GiB; /* Top of the legacy initial RAM region */
+    /*
+     * We compute the base of the high IO region depending on the
+     * amount of initial and device memory. The device memory start/size
+     * is aligned on 1GiB. We never put the high IO region below 256GiB
+     * so that if maxram_size is < 255GiB we keep the legacy memory map.
+     * The device region size assumes 1GiB page max alignment per slot.
+     */
+    vms->device_memory_base = ROUND_UP(RAMBASE + ms->ram_size, GiB);
+    vms->device_memory_size = ms->maxram_size - ms->ram_size +
+                              ms->ram_slots * GiB;
+
+    vms->high_io_base = vms->device_memory_base +
+                        ROUND_UP(vms->device_memory_size, GiB);
+    if (vms->high_io_base < vms->device_memory_base) {
+        error_report("maxmem/slots too huge");
+        exit(EXIT_FAILURE);
+    }
+    if (vms->high_io_base < 256 * GiB) {
+        vms->high_io_base = 256 * GiB;
+    }
     base = vms->high_io_base;
 
     for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
@@ -1387,6 +1420,7 @@ static void virt_set_memmap(VirtMachineState *vms)
         vms->memmap[i].size = size;
         base += size;
     }
+    vms->highest_gpa = base - 1;
 }
 
 static void machvirt_init(MachineState *machine)
@@ -1470,11 +1504,6 @@ static void machvirt_init(MachineState *machine)
 
     vms->smp_cpus = smp_cpus;
 
-    if (machine->ram_size > vms->memmap[VIRT_MEM].size) {
-        error_report("mach-virt: cannot model more than %dGB RAM", RAMLIMIT_GB);
-        exit(1);
-    }
-
     if (vms->virt && kvm_enabled()) {
         error_report("mach-virt: KVM does not support providing "
                      "Virtualization extensions to the guest CPU");
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index 3dc7a6c5d5..acad0400d8 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -132,6 +132,9 @@ typedef struct {
     uint32_t iommu_phandle;
     int psci_conduit;
     hwaddr high_io_base;
+    hwaddr highest_gpa;
+    hwaddr device_memory_base;
+    hwaddr device_memory_size;
 } VirtMachineState;
 
 #define VIRT_ECAM_ID(high) (high ? VIRT_HIGH_PCIE_ECAM : VIRT_PCIE_ECAM)
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v7 08/17] hw/arm/virt: Implement kvm_type function for 4.0 machine
  2019-02-20 22:39 [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion and PCDIMM/NVDIMM support Eric Auger
                   ` (6 preceding siblings ...)
  2019-02-20 22:39 ` [Qemu-devel] [PATCH v7 07/17] hw/arm/virt: Dynamic memory map depending on RAM requirements Eric Auger
@ 2019-02-20 22:39 ` Eric Auger
  2019-02-22 12:45   ` Igor Mammedov
  2019-02-20 22:39 ` [Qemu-devel] [PATCH v7 09/17] hw/arm/virt: Bump the 255GB initial RAM limit Eric Auger
                   ` (10 subsequent siblings)
  18 siblings, 1 reply; 63+ messages in thread
From: Eric Auger @ 2019-02-20 22:39 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, imammedo, david
  Cc: dgilbert, david, drjones

This patch implements the machine class kvm_type() callback.
It returns the number of bits requested to implement the whole GPA
range, including the RAM and the IO regions located beyond it.
The returned value is passed through the KVM_CREATE_VM ioctl and
this allows KVM to set up the stage2 tables dynamically.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v6 -> v7:
- Introduce RAMBASE and add the LEGACY_ prefix in this patch
- use local variables with explicit names in virt_set_memmap:
  device_memory_base, device_memory_size
- add an extended_memmap field in the class

v5 -> v6:
- add some comments
- high IO region cannot start before 256GiB
---
 hw/arm/virt.c         | 50 ++++++++++++++++++++++++++++++++++++++++++-
 include/hw/arm/virt.h |  2 ++
 2 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 9db602457b..ad3a0ad73d 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1437,7 +1437,14 @@ static void machvirt_init(MachineState *machine)
     bool firmware_loaded = bios_name || drive_get(IF_PFLASH, 0, 0);
     bool aarch64 = true;
 
-    virt_set_memmap(vms);
+    /*
+     * In accelerated mode, the memory map is computed in kvm_type(),
+     * if set, to create a VM with the right number of IPA bits.
+     */
+
+    if (!mc->kvm_type || !kvm_enabled()) {
+        virt_set_memmap(vms);
+    }
 
     /* We can probe only here because during property set
      * KVM is not available yet
@@ -1814,6 +1821,36 @@ static HotplugHandler *virt_machine_get_hotplug_handler(MachineState *machine,
     return NULL;
 }
 
+/*
+ * for arm64 kvm_type [7-0] encodes the requested number of bits
+ * in the IPA address space
+ */
+static int virt_kvm_type(MachineState *ms, const char *type_str)
+{
+    VirtMachineState *vms = VIRT_MACHINE(ms);
+    int max_vm_pa_size = kvm_arm_get_max_vm_ipa_size(ms);
+    int requested_pa_size;
+
+    /* we freeze the memory map to compute the highest gpa */
+    virt_set_memmap(vms);
+
+    requested_pa_size = 64 - clz64(vms->highest_gpa);
+
+    if (requested_pa_size > max_vm_pa_size) {
+        error_report("-m and ,maxmem option values "
+                     "require an IPA range (%d bits) larger than "
+                     "the one supported by the host (%d bits)",
+                     requested_pa_size, max_vm_pa_size);
+        exit(1);
+    }
+    /*
+     * By default we return 0 which corresponds to an implicit legacy
+     * 40b IPA setting. Otherwise we return the actual requested PA
+     * logsize
+     */
+    return requested_pa_size > 40 ? requested_pa_size : 0;
+}
+
 static void virt_machine_class_init(ObjectClass *oc, void *data)
 {
     MachineClass *mc = MACHINE_CLASS(oc);
@@ -1838,6 +1875,7 @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
     mc->cpu_index_to_instance_props = virt_cpu_index_to_props;
     mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-a15");
     mc->get_default_cpu_node_id = virt_get_default_cpu_node_id;
+    mc->kvm_type = virt_kvm_type;
     assert(!mc->get_hotplug_handler);
     mc->get_hotplug_handler = virt_machine_get_hotplug_handler;
     hc->plug = virt_machine_device_plug_cb;
@@ -1909,6 +1947,12 @@ static void virt_instance_init(Object *obj)
                                     "Valid values are none and smmuv3",
                                     NULL);
 
+    if (vmc->no_extended_memmap) {
+        vms->extended_memmap = false;
+    } else {
+        vms->extended_memmap = true;
+    }
+
     vms->irqmap = a15irqmap;
 }
 
@@ -1939,8 +1983,12 @@ DEFINE_VIRT_MACHINE_AS_LATEST(4, 0)
 
 static void virt_machine_3_1_options(MachineClass *mc)
 {
+    VirtMachineClass *vmc = VIRT_MACHINE_CLASS(OBJECT_CLASS(mc));
     virt_machine_4_0_options(mc);
     compat_props_add(mc->compat_props, hw_compat_3_1, hw_compat_3_1_len);
+
+    /* extended memory map is enabled from 4.0 onwards */
+    vmc->no_extended_memmap = true;
 }
 DEFINE_VIRT_MACHINE(3, 1)
 
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index acad0400d8..7798462cb0 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -106,6 +106,7 @@ typedef struct {
     bool claim_edge_triggered_timers;
     bool smbios_old_sys_ver;
     bool no_highmem_ecam;
+    bool no_extended_memmap;
 } VirtMachineClass;
 
 typedef struct {
@@ -135,6 +136,7 @@ typedef struct {
     hwaddr highest_gpa;
     hwaddr device_memory_base;
     hwaddr device_memory_size;
+    bool extended_memmap;
 } VirtMachineState;
 
 #define VIRT_ECAM_ID(high) (high ? VIRT_HIGH_PCIE_ECAM : VIRT_PCIE_ECAM)
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v7 09/17] hw/arm/virt: Bump the 255GB initial RAM limit
  2019-02-20 22:39 [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion and PCDIMM/NVDIMM support Eric Auger
                   ` (7 preceding siblings ...)
  2019-02-20 22:39 ` [Qemu-devel] [PATCH v7 08/17] hw/arm/virt: Implement kvm_type function for 4.0 machine Eric Auger
@ 2019-02-20 22:39 ` Eric Auger
  2019-02-20 22:39 ` [Qemu-devel] [PATCH v7 10/17] hw/arm/virt: Add memory hotplug framework Eric Auger
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 63+ messages in thread
From: Eric Auger @ 2019-02-20 22:39 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, imammedo, david
  Cc: dgilbert, david, drjones

Now that we have the extended memory map (high IO regions beyond the
scalable RAM) and dynamic IPA range support at the KVM/ARM level,
we can bump the legacy 255GB initial RAM limit. The actual maximum
RAM size now depends on the physical CPU and the host kernel.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---
v6 -> v7
- handle TCG case
- set_memmap modifications moved to previous patches
---
 hw/arm/virt.c | 54 ++++++++++++++++++++++++++++-----------------------
 1 file changed, 30 insertions(+), 24 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index ad3a0ad73d..5b656f9db5 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -59,6 +59,7 @@
 #include "qapi/visitor.h"
 #include "standard-headers/linux/input.h"
 #include "hw/arm/smmuv3.h"
+#include "target/arm/internals.h"
 
 #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
     static void virt_##major##_##minor##_class_init(ObjectClass *oc, \
@@ -93,21 +94,8 @@
 
 #define PLATFORM_BUS_NUM_IRQS 64
 
-/* RAM limit in GB. Since VIRT_MEM starts at the 1GB mark, this means
- * RAM can go up to the 256GB mark, leaving 256GB of the physical
- * address space unallocated and free for future use between 256G and 512G.
- * If we need to provide more RAM to VMs in the future then we need to:
- *  * allocate a second bank of RAM starting at 2TB and working up
- *  * fix the DT and ACPI table generation code in QEMU to correctly
- *    report two split lumps of RAM to the guest
- *  * fix KVM in the host kernel to allow guests with >40 bit address spaces
- * (We don't want to fill all the way up to 512GB with RAM because
- * we might want it for non-RAM purposes later. Conversely it seems
- * reasonable to assume that anybody configuring a VM with a quarter
- * of a terabyte of RAM will be doing it on a host with more than a
- * terabyte of physical address space.)
- */
 #define RAMBASE GiB
+/* Legacy RAM limit in GB (< version 4.0) */
 #define LEGACY_RAMLIMIT_GB 255
 #define LEGACY_RAMLIMIT_BYTES (LEGACY_RAMLIMIT_GB * GiB)
 
@@ -1372,16 +1360,18 @@ static void virt_set_memmap(VirtMachineState *vms)
     hwaddr base;
     int i;
 
-    if (ms->maxram_size > ms->ram_size || ms->ram_slots > 0) {
-        error_report("mach-virt: does not support device memory: "
-                     "ignore maxmem and slots options");
-        ms->maxram_size = ms->ram_size;
-        ms->ram_slots = 0;
-    }
-    if (ms->ram_size > (ram_addr_t)LEGACY_RAMLIMIT_BYTES) {
-        error_report("mach-virt: cannot model more than %dGB RAM",
-                     LEGACY_RAMLIMIT_GB);
-        exit(1);
+    if (!vms->extended_memmap) {
+        if (ms->maxram_size > ms->ram_size || ms->ram_slots > 0) {
+            error_report("mach-virt: does not support device memory: "
+                         "ignore maxmem and slots options");
+            ms->maxram_size = ms->ram_size;
+            ms->ram_slots = 0;
+        }
+        if (ms->ram_size > (ram_addr_t)LEGACY_RAMLIMIT_BYTES) {
+            error_report("mach-virt: cannot model more than %dGB RAM",
+                         LEGACY_RAMLIMIT_GB);
+            exit(1);
+        }
     }
 
     vms->memmap = extended_memmap;
@@ -1598,6 +1588,22 @@ static void machvirt_init(MachineState *machine)
     fdt_add_timer_nodes(vms);
     fdt_add_cpu_nodes(vms);
 
+    if (!kvm_enabled()) {
+        ARMCPU *cpu = ARM_CPU(first_cpu);
+        bool aarch64 = object_property_get_bool(OBJECT(cpu), "aarch64", NULL);
+
+        if (aarch64 && vms->highmem) {
+            int requested_pa_size, pamax = arm_pamax(cpu);
+
+            requested_pa_size = 64 - clz64(vms->highest_gpa);
+            if (pamax < requested_pa_size) {
+                error_report("VCPU supports less PA bits (%d) than requested "
+                            "by the memory map (%d)", pamax, requested_pa_size);
+                exit(1);
+            }
+        }
+    }
+
     memory_region_allocate_system_memory(ram, NULL, "mach-virt.ram",
                                          machine->ram_size);
     memory_region_add_subregion(sysmem, vms->memmap[VIRT_MEM].base, ram);
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v7 10/17] hw/arm/virt: Add memory hotplug framework
  2019-02-20 22:39 [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion and PCDIMM/NVDIMM support Eric Auger
                   ` (8 preceding siblings ...)
  2019-02-20 22:39 ` [Qemu-devel] [PATCH v7 09/17] hw/arm/virt: Bump the 255GB initial RAM limit Eric Auger
@ 2019-02-20 22:39 ` Eric Auger
  2019-02-22 13:25   ` Igor Mammedov
  2019-02-20 22:39 ` [Qemu-devel] [PATCH v7 11/17] hw/arm/boot: Expose the PC-DIMM nodes in the DT Eric Auger
                   ` (8 subsequent siblings)
  18 siblings, 1 reply; 63+ messages in thread
From: Eric Auger @ 2019-02-20 22:39 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, imammedo, david
  Cc: dgilbert, david, drjones

This patch adds the memory hot-plug/hot-unplug infrastructure
in machvirt. It is still not enabled, as no device memory is allocated yet.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Kwangwoo Lee <kwangwoo.lee@sk.com>

---
v4 -> v5:
- change in pc_dimm_pre_plug signature
- CONFIG_MEM_HOTPLUG replaced by CONFIG_MEM_DEVICE and CONFIG_DIMM

v3 -> v4:
- check the memory device is not hotplugged

v2 -> v3:
- change in pc_dimm_plug()'s signature
- add pc_dimm_pre_plug call

v1 -> v2:
- s/virt_dimm_plug|unplug/virt_memory_plug|unplug
- s/pc_dimm_memory_plug/pc_dimm_plug
- reworded title and commit message
- added pre_plug cb
- don't handle get_memory_region failure anymore
---
 default-configs/arm-softmmu.mak |  2 ++
 hw/arm/virt.c                   | 64 ++++++++++++++++++++++++++++++++-
 2 files changed, 65 insertions(+), 1 deletion(-)

diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
index 734ca721e9..0a78421f72 100644
--- a/default-configs/arm-softmmu.mak
+++ b/default-configs/arm-softmmu.mak
@@ -163,3 +163,5 @@ CONFIG_PCI_EXPRESS_DESIGNWARE=y
 CONFIG_STRONGARM=y
 CONFIG_HIGHBANK=y
 CONFIG_MUSICPAL=y
+CONFIG_MEM_DEVICE=y
+CONFIG_DIMM=y
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 5b656f9db5..470ca0ce2d 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -60,6 +60,8 @@
 #include "standard-headers/linux/input.h"
 #include "hw/arm/smmuv3.h"
 #include "target/arm/internals.h"
+#include "hw/mem/pc-dimm.h"
+#include "hw/mem/nvdimm.h"
 
 #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
     static void virt_##major##_##minor##_class_init(ObjectClass *oc, \
@@ -1804,6 +1806,49 @@ static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
     return ms->possible_cpus;
 }
 
+static void virt_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
+                                 Error **errp)
+{
+    const bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
+
+    if (dev->hotplugged) {
+        error_setg(errp, "memory hotplug is not supported");
+    }
+
+    if (is_nvdimm) {
+        error_setg(errp, "nvdimm is not yet supported");
+        return;
+    }
+
+    pc_dimm_pre_plug(PC_DIMM(dev), MACHINE(hotplug_dev), NULL, errp);
+}
+
+static void virt_memory_plug(HotplugHandler *hotplug_dev,
+                             DeviceState *dev, Error **errp)
+{
+    VirtMachineState *vms = VIRT_MACHINE(hotplug_dev);
+    Error *local_err = NULL;
+
+    pc_dimm_plug(PC_DIMM(dev), MACHINE(vms), &local_err);
+
+    error_propagate(errp, local_err);
+}
+
+static void virt_memory_unplug(HotplugHandler *hotplug_dev,
+                               DeviceState *dev, Error **errp)
+{
+    pc_dimm_unplug(PC_DIMM(dev), MACHINE(hotplug_dev));
+    object_unparent(OBJECT(dev));
+}
+
+static void virt_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
+                                            DeviceState *dev, Error **errp)
+{
+    if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+        virt_memory_pre_plug(hotplug_dev, dev, errp);
+    }
+}
+
 static void virt_machine_device_plug_cb(HotplugHandler *hotplug_dev,
                                         DeviceState *dev, Error **errp)
 {
@@ -1815,12 +1860,27 @@ static void virt_machine_device_plug_cb(HotplugHandler *hotplug_dev,
                                      SYS_BUS_DEVICE(dev));
         }
     }
+    if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+            virt_memory_plug(hotplug_dev, dev, errp);
+    }
+}
+
+static void virt_machine_device_unplug_cb(HotplugHandler *hotplug_dev,
+                                          DeviceState *dev, Error **errp)
+{
+    if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+        virt_memory_unplug(hotplug_dev, dev, errp);
+    } else {
+        error_setg(errp, "device unplug request for unsupported device"
+                   " type: %s", object_get_typename(OBJECT(dev)));
+    }
 }
 
 static HotplugHandler *virt_machine_get_hotplug_handler(MachineState *machine,
                                                         DeviceState *dev)
 {
-    if (object_dynamic_cast(OBJECT(dev), TYPE_SYS_BUS_DEVICE)) {
+    if (object_dynamic_cast(OBJECT(dev), TYPE_SYS_BUS_DEVICE) ||
+       (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM))) {
         return HOTPLUG_HANDLER(machine);
     }
 
@@ -1884,7 +1944,9 @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
     mc->kvm_type = virt_kvm_type;
     assert(!mc->get_hotplug_handler);
     mc->get_hotplug_handler = virt_machine_get_hotplug_handler;
+    hc->pre_plug = virt_machine_device_pre_plug_cb;
     hc->plug = virt_machine_device_plug_cb;
+    hc->unplug = virt_machine_device_unplug_cb;
 }
 
 static void virt_instance_init(Object *obj)
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v7 11/17] hw/arm/boot: Expose the PC-DIMM nodes in the DT
  2019-02-20 22:39 [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion and PCDIMM/NVDIMM support Eric Auger
                   ` (9 preceding siblings ...)
  2019-02-20 22:39 ` [Qemu-devel] [PATCH v7 10/17] hw/arm/virt: Add memory hotplug framework Eric Auger
@ 2019-02-20 22:39 ` Eric Auger
  2019-02-22 13:30   ` Igor Mammedov
  2019-02-20 22:39 ` [Qemu-devel] [PATCH v7 12/17] hw/arm/virt-acpi-build: Add PC-DIMM in SRAT Eric Auger
                   ` (7 subsequent siblings)
  18 siblings, 1 reply; 63+ messages in thread
From: Eric Auger @ 2019-02-20 22:39 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, imammedo, david
  Cc: dgilbert, david, drjones

From: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>

This patch adds memory nodes corresponding to PC-DIMM regions.

NV_DIMM and ACPI_NVDIMM configs are not yet set for ARM so we
don't need to care about NV-DIMM at this stage.

Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>

---
v6 -> v7:
- rework the error messages, use a switch/case
v3 -> v4:
- get rid of @base and @len in fdt_add_hotpluggable_memory_nodes

v1 -> v2:
- added qapi_free_MemoryDeviceInfoList and simplify the loop
---
 hw/arm/boot.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index a830655e1a..255aaca0cf 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -19,6 +19,7 @@
 #include "sysemu/numa.h"
 #include "hw/boards.h"
 #include "hw/loader.h"
+#include "hw/mem/memory-device.h"
 #include "elf.h"
 #include "sysemu/device_tree.h"
 #include "qemu/config-file.h"
@@ -522,6 +523,41 @@ static void fdt_add_psci_node(void *fdt)
     qemu_fdt_setprop_cell(fdt, "/psci", "migrate", migrate_fn);
 }
 
+static int fdt_add_hotpluggable_memory_nodes(void *fdt,
+                                             uint32_t acells, uint32_t scells) {
+    MemoryDeviceInfoList *info, *info_list = qmp_memory_device_list();
+    MemoryDeviceInfo *mi;
+    int ret = 0;
+
+    for (info = info_list; info != NULL; info = info->next) {
+        mi = info->value;
+        switch (mi->type) {
+        case MEMORY_DEVICE_INFO_KIND_DIMM:
+        {
+            PCDIMMDeviceInfo *di = mi->u.dimm.data;
+
+            ret = fdt_add_memory_node(fdt, acells, di->addr,
+                                      scells, di->size, di->node);
+            if (ret) {
+                fprintf(stderr,
+                        "couldn't add PCDIMM /memory@%"PRIx64" node\n",
+                        di->addr);
+                goto out;
+            }
+            break;
+        }
+        default:
+            fprintf(stderr, "%s memory nodes are not yet supported\n",
+                    MemoryDeviceInfoKind_str(mi->type));
+            ret = -ENOENT;
+            goto out;
+        }
+    }
+out:
+    qapi_free_MemoryDeviceInfoList(info_list);
+    return ret;
+}
+
 int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
                  hwaddr addr_limit, AddressSpace *as)
 {
@@ -621,6 +657,12 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
         }
     }
 
+    rc = fdt_add_hotpluggable_memory_nodes(fdt, acells, scells);
+    if (rc < 0) {
+            fprintf(stderr, "couldn't add hotpluggable memory nodes\n");
+            goto fail;
+    }
+
     rc = fdt_path_offset(fdt, "/chosen");
     if (rc < 0) {
         qemu_fdt_add_subnode(fdt, "/chosen");
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v7 12/17] hw/arm/virt-acpi-build: Add PC-DIMM in SRAT
  2019-02-20 22:39 [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion and PCDIMM/NVDIMM support Eric Auger
                   ` (10 preceding siblings ...)
  2019-02-20 22:39 ` [Qemu-devel] [PATCH v7 11/17] hw/arm/boot: Expose the PC-DIMM nodes in the DT Eric Auger
@ 2019-02-20 22:39 ` Eric Auger
  2019-02-20 22:39 ` [Qemu-devel] [PATCH v7 13/17] hw/arm/virt: Allocate device_memory Eric Auger
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 63+ messages in thread
From: Eric Auger @ 2019-02-20 22:39 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, imammedo, david
  Cc: dgilbert, david, drjones

From: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>

Generate Memory Affinity Structures for PC-DIMM ranges.

Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>

---

v6 -> v7:
- add Igor's R-b

v5 -> v6:
- fix mingw compil issue

v4 -> v5:
- Align to x86 code and especially
  "pc: acpi: revert back to 1 SRAT entry for hotpluggable area"

v3 -> v4:
- do not use vms->bootinfo.device_memory_start/device_memory_size anymore

v1 -> v2:
- build_srat_hotpluggable_memory moved to aml-build
---
 hw/arm/virt-acpi-build.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 829d2f0035..781eafaf5e 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -516,6 +516,7 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
     int i, srat_start;
     uint64_t mem_base;
     MachineClass *mc = MACHINE_GET_CLASS(vms);
+    MachineState *ms = MACHINE(vms);
     const CPUArchIdList *cpu_list = mc->possible_cpu_arch_ids(MACHINE(vms));
 
     srat_start = table_data->len;
@@ -541,6 +542,14 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
         }
     }
 
+    if (ms->device_memory) {
+        numamem = acpi_data_push(table_data, sizeof *numamem);
+        build_srat_memory(numamem, ms->device_memory->base,
+                          memory_region_size(&ms->device_memory->mr),
+                          nb_numa_nodes - 1,
+                          MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED);
+    }
+
     build_header(linker, table_data, (void *)(table_data->data + srat_start),
                  "SRAT", table_data->len - srat_start, 3, NULL, NULL);
 }
-- 
2.20.1


* [Qemu-devel] [PATCH v7 13/17] hw/arm/virt: Allocate device_memory
  2019-02-20 22:39 [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion and PCDIMM/NVDIMM support Eric Auger
                   ` (11 preceding siblings ...)
  2019-02-20 22:39 ` [Qemu-devel] [PATCH v7 12/17] hw/arm/virt-acpi-build: Add PC-DIMM in SRAT Eric Auger
@ 2019-02-20 22:39 ` Eric Auger
  2019-02-22 13:48   ` Igor Mammedov
  2019-02-20 22:40 ` [Qemu-devel] [PATCH v7 14/17] nvdimm: use configurable ACPI IO base and size Eric Auger
                   ` (5 subsequent siblings)
  18 siblings, 1 reply; 63+ messages in thread
From: Eric Auger @ 2019-02-20 22:39 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, imammedo, david
  Cc: dgilbert, david, drjones

The device memory region is located after the initial RAM.
Its start and size are 1GiB aligned.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Kwangwoo Lee <kwangwoo.lee@sk.com>
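
As a sanity check of the placement rule described above, here is a
minimal sketch of the 1GiB alignment math — one plausible reading of
"start/size are 1GiB aligned"; ALIGN_UP and the function names are
local stand-ins, not QEMU's helpers:

```c
#include <stdint.h>

#define GiB (1ULL << 30)
/* align x up to the next multiple of the power-of-two a */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((uint64_t)(a) - 1))

/* Device memory starts at the first GiB boundary after initial RAM... */
static uint64_t device_memory_base(uint64_t ram_base, uint64_t ram_size)
{
    return ALIGN_UP(ram_base + ram_size, GiB);
}

/* ...and covers maxram_size - ram_size, rounded up to a whole GiB
 * (the patch additionally rejects a maxram_size that is not GiB
 * aligned, so in practice no rounding is needed there). */
static uint64_t device_memory_size(uint64_t maxram_size, uint64_t ram_size)
{
    return ALIGN_UP(maxram_size - ram_size, GiB);
}
```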

---
v6 -> v7:
- check the device memory top does not wrap
- check the device memory can fit the slots

v4 -> v5:
- device memory set after the initial RAM

v3 -> v4:
- remove bootinfo.device_memory_start/device_memory_size
- rename VIRT_HOTPLUG_MEM into VIRT_DEVICE_MEM
---
 hw/arm/virt.c | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 470ca0ce2d..33ad9b3f63 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -62,6 +62,7 @@
 #include "target/arm/internals.h"
 #include "hw/mem/pc-dimm.h"
 #include "hw/mem/nvdimm.h"
+#include "hw/acpi/acpi.h"
 
 #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
     static void virt_##major##_##minor##_class_init(ObjectClass *oc, \
@@ -1263,6 +1264,34 @@ static void create_secure_ram(VirtMachineState *vms,
     g_free(nodename);
 }
 
+static void create_device_memory(VirtMachineState *vms, MemoryRegion *sysmem)
+{
+    MachineState *ms = MACHINE(vms);
+
+    if (!vms->device_memory_size) {
+        return;
+    }
+
+    if (ms->ram_slots > ACPI_MAX_RAM_SLOTS) {
+        error_report("unsupported number of memory slots: %"PRIu64,
+                     ms->ram_slots);
+        exit(EXIT_FAILURE);
+    }
+
+    if (QEMU_ALIGN_UP(ms->maxram_size, GiB) != ms->maxram_size) {
+        error_report("maximum memory size must be GiB aligned");
+        exit(EXIT_FAILURE);
+    }
+
+    ms->device_memory = g_malloc0(sizeof(*ms->device_memory));
+    ms->device_memory->base = vms->device_memory_base;
+
+    memory_region_init(&ms->device_memory->mr, OBJECT(vms),
+                       "device-memory", vms->device_memory_size);
+    memory_region_add_subregion(sysmem, ms->device_memory->base,
+                                &ms->device_memory->mr);
+}
+
 static void *machvirt_dtb(const struct arm_boot_info *binfo, int *fdt_size)
 {
     const VirtMachineState *board = container_of(binfo, VirtMachineState,
@@ -1610,6 +1639,10 @@ static void machvirt_init(MachineState *machine)
                                          machine->ram_size);
     memory_region_add_subregion(sysmem, vms->memmap[VIRT_MEM].base, ram);
 
+    if (vms->extended_memmap) {
+        create_device_memory(vms, sysmem);
+    }
+
     create_flash(vms, sysmem, secure_sysmem ? secure_sysmem : sysmem);
 
     create_gic(vms, pic);
-- 
2.20.1


* [Qemu-devel] [PATCH v7 14/17] nvdimm: use configurable ACPI IO base and size
  2019-02-20 22:39 [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion and PCDIMM/NVDIMM support Eric Auger
                   ` (12 preceding siblings ...)
  2019-02-20 22:39 ` [Qemu-devel] [PATCH v7 13/17] hw/arm/virt: Allocate device_memory Eric Auger
@ 2019-02-20 22:40 ` Eric Auger
  2019-02-22 15:28   ` Igor Mammedov
  2019-02-20 22:40 ` [Qemu-devel] [PATCH v7 15/17] hw/arm/virt: Add nvdimm hot-plug infrastructure Eric Auger
                   ` (4 subsequent siblings)
  18 siblings, 1 reply; 63+ messages in thread
From: Eric Auger @ 2019-02-20 22:40 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, imammedo, david
  Cc: dgilbert, david, drjones

From: Kwangwoo Lee <kwangwoo.lee@sk.com>

This patch uses a configurable IO base and size to create the NPIO AML
for the ACPI NFIT. Since architectures such as AArch64 do not use
port-mapped IO, a configurable IO base is required to create a correct
mapping of the ACPI IO address and size.

Signed-off-by: Kwangwoo Lee <kwangwoo.lee@sk.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
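
The dsm_io field introduced here carries the region length as a bit
width, following ACPI's Generic Address Structure convention. A hedged
sketch of the round-trip the patch relies on (len << 3 at
initialization, >> 3 when sizing the memory region); the struct,
constants, and example base below are simplified stand-ins, not QEMU's
AcpiGenericAddress:

```c
#include <stdint.h>

enum { AS_SYSTEM_MEMORY = 0, AS_SYSTEM_IO = 1 };  /* space IDs, simplified */

typedef struct {
    uint8_t  space_id;
    uint8_t  bit_width;   /* region length in bits, i.e. bytes << 3 */
    uint64_t address;
} DsmIo;

/* Port IO on x86, system memory on machines like arm virt: only this
 * descriptor changes, the DSM buffer handling stays shared. */
static DsmIo dsm_io_make(uint8_t space_id, uint64_t base, unsigned len_bytes)
{
    DsmIo io = { space_id, (uint8_t)(len_bytes << 3), base };
    return io;
}

static unsigned dsm_io_len_bytes(const DsmIo *io)
{
    return io->bit_width >> 3;
}
```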

---
v6 -> v7:
- Use NvdimmDsmIO constant
- use AcpiGenericAddress instead of AcpiNVDIMMIOEntry

v2 -> v3:
- s/size/len in pc_piix.c and pc_q35.c
---
 hw/acpi/nvdimm.c        | 31 ++++++++++++++++++++++---------
 hw/i386/pc_piix.c       |  6 +++++-
 hw/i386/pc_q35.c        |  6 +++++-
 include/hw/mem/nvdimm.h |  4 ++++
 4 files changed, 36 insertions(+), 11 deletions(-)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index e53b2cb681..fddc790945 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -33,6 +33,9 @@
 #include "hw/nvram/fw_cfg.h"
 #include "hw/mem/nvdimm.h"
 
+const struct AcpiGenericAddress NvdimmDsmIO = { .space_id = AML_AS_SYSTEM_IO,
+        .bit_width = NVDIMM_ACPI_IO_LEN << 3, .address = NVDIMM_ACPI_IO_BASE};
+
 static int nvdimm_device_list(Object *obj, void *opaque)
 {
     GSList **list = opaque;
@@ -929,8 +932,8 @@ void nvdimm_init_acpi_state(AcpiNVDIMMState *state, MemoryRegion *io,
                             FWCfgState *fw_cfg, Object *owner)
 {
     memory_region_init_io(&state->io_mr, owner, &nvdimm_dsm_ops, state,
-                          "nvdimm-acpi-io", NVDIMM_ACPI_IO_LEN);
-    memory_region_add_subregion(io, NVDIMM_ACPI_IO_BASE, &state->io_mr);
+                          "nvdimm-acpi-io", state->dsm_io.bit_width >> 3);
+    memory_region_add_subregion(io, state->dsm_io.address, &state->io_mr);
 
     state->dsm_mem = g_array_new(false, true /* clear */, 1);
     acpi_data_push(state->dsm_mem, sizeof(NvdimmDsmIn));
@@ -959,12 +962,14 @@ void nvdimm_init_acpi_state(AcpiNVDIMMState *state, MemoryRegion *io,
 
 #define NVDIMM_QEMU_RSVD_UUID   "648B9CF2-CDA1-4312-8AD9-49C4AF32BD62"
 
-static void nvdimm_build_common_dsm(Aml *dev)
+static void nvdimm_build_common_dsm(Aml *dev,
+                                    AcpiNVDIMMState *acpi_nvdimm_state)
 {
     Aml *method, *ifctx, *function, *handle, *uuid, *dsm_mem, *elsectx2;
     Aml *elsectx, *unsupport, *unpatched, *expected_uuid, *uuid_invalid;
     Aml *pckg, *pckg_index, *pckg_buf, *field, *dsm_out_buf, *dsm_out_buf_size;
     uint8_t byte_list[1];
+    AmlRegionSpace rs;
 
     method = aml_method(NVDIMM_COMMON_DSM, 5, AML_SERIALIZED);
     uuid = aml_arg(0);
@@ -975,9 +980,16 @@ static void nvdimm_build_common_dsm(Aml *dev)
 
     aml_append(method, aml_store(aml_name(NVDIMM_ACPI_MEM_ADDR), dsm_mem));
 
+    if (acpi_nvdimm_state->dsm_io.space_id == AML_AS_SYSTEM_IO) {
+        rs = AML_SYSTEM_IO;
+    } else {
+        rs = AML_SYSTEM_MEMORY;
+    }
+
     /* map DSM memory and IO into ACPI namespace. */
-    aml_append(method, aml_operation_region(NVDIMM_DSM_IOPORT, AML_SYSTEM_IO,
-               aml_int(NVDIMM_ACPI_IO_BASE), NVDIMM_ACPI_IO_LEN));
+    aml_append(method, aml_operation_region(NVDIMM_DSM_IOPORT, rs,
+               aml_int(acpi_nvdimm_state->dsm_io.address),
+               acpi_nvdimm_state->dsm_io.bit_width >> 3));
     aml_append(method, aml_operation_region(NVDIMM_DSM_MEMORY,
                AML_SYSTEM_MEMORY, dsm_mem, sizeof(NvdimmDsmIn)));
 
@@ -1260,7 +1272,8 @@ static void nvdimm_build_nvdimm_devices(Aml *root_dev, uint32_t ram_slots)
 }
 
 static void nvdimm_build_ssdt(GArray *table_offsets, GArray *table_data,
-                              BIOSLinker *linker, GArray *dsm_dma_arrea,
+                              BIOSLinker *linker,
+                              AcpiNVDIMMState *acpi_nvdimm_state,
                               uint32_t ram_slots)
 {
     Aml *ssdt, *sb_scope, *dev;
@@ -1288,7 +1301,7 @@ static void nvdimm_build_ssdt(GArray *table_offsets, GArray *table_data,
      */
     aml_append(dev, aml_name_decl("_HID", aml_string("ACPI0012")));
 
-    nvdimm_build_common_dsm(dev);
+    nvdimm_build_common_dsm(dev, acpi_nvdimm_state);
 
     /* 0 is reserved for root device. */
     nvdimm_build_device_dsm(dev, 0);
@@ -1307,7 +1320,7 @@ static void nvdimm_build_ssdt(GArray *table_offsets, GArray *table_data,
                                                NVDIMM_ACPI_MEM_ADDR);
 
     bios_linker_loader_alloc(linker,
-                             NVDIMM_DSM_MEM_FILE, dsm_dma_arrea,
+                             NVDIMM_DSM_MEM_FILE, acpi_nvdimm_state->dsm_mem,
                              sizeof(NvdimmDsmIn), false /* high memory */);
     bios_linker_loader_add_pointer(linker,
         ACPI_BUILD_TABLE_FILE, mem_addr_offset, sizeof(uint32_t),
@@ -1329,7 +1342,7 @@ void nvdimm_build_acpi(GArray *table_offsets, GArray *table_data,
         return;
     }
 
-    nvdimm_build_ssdt(table_offsets, table_data, linker, state->dsm_mem,
+    nvdimm_build_ssdt(table_offsets, table_data, linker, state,
                       ram_slots);
 
     device_list = nvdimm_get_device_list();
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index fd0f2c268f..d0a262d106 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -298,7 +298,11 @@ static void pc_init1(MachineState *machine,
     }
 
     if (pcms->acpi_nvdimm_state.is_enabled) {
-        nvdimm_init_acpi_state(&pcms->acpi_nvdimm_state, system_io,
+        AcpiNVDIMMState *acpi_nvdimm_state = &pcms->acpi_nvdimm_state;
+
+        acpi_nvdimm_state->dsm_io = NvdimmDsmIO;
+
+        nvdimm_init_acpi_state(acpi_nvdimm_state, system_io,
                                pcms->fw_cfg, OBJECT(pcms));
     }
 }
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 4a175ea50e..21f594001f 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -330,7 +330,11 @@ static void pc_q35_init(MachineState *machine)
     pc_nic_init(pcmc, isa_bus, host_bus);
 
     if (pcms->acpi_nvdimm_state.is_enabled) {
-        nvdimm_init_acpi_state(&pcms->acpi_nvdimm_state, system_io,
+        AcpiNVDIMMState *acpi_nvdimm_state = &pcms->acpi_nvdimm_state;
+
+        acpi_nvdimm_state->dsm_io = NvdimmDsmIO;
+
+        nvdimm_init_acpi_state(acpi_nvdimm_state, system_io,
                                pcms->fw_cfg, OBJECT(pcms));
     }
 }
diff --git a/include/hw/mem/nvdimm.h b/include/hw/mem/nvdimm.h
index c5c9b3c7f8..ead51d958d 100644
--- a/include/hw/mem/nvdimm.h
+++ b/include/hw/mem/nvdimm.h
@@ -25,6 +25,7 @@
 
 #include "hw/mem/pc-dimm.h"
 #include "hw/acpi/bios-linker-loader.h"
+#include "hw/acpi/aml-build.h"
 
 #define NVDIMM_DEBUG 0
 #define nvdimm_debug(fmt, ...)                                \
@@ -123,6 +124,8 @@ struct NvdimmFitBuffer {
 };
 typedef struct NvdimmFitBuffer NvdimmFitBuffer;
 
+extern const struct AcpiGenericAddress NvdimmDsmIO;
+
 struct AcpiNVDIMMState {
     /* detect if NVDIMM support is enabled. */
     bool is_enabled;
@@ -140,6 +143,7 @@ struct AcpiNVDIMMState {
      */
     int32_t persistence;
     char    *persistence_string;
+    struct AcpiGenericAddress dsm_io;
 };
 typedef struct AcpiNVDIMMState AcpiNVDIMMState;
 
-- 
2.20.1


* [Qemu-devel] [PATCH v7 15/17] hw/arm/virt: Add nvdimm hot-plug infrastructure
  2019-02-20 22:39 [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion and PCDIMM/NVDIMM support Eric Auger
                   ` (13 preceding siblings ...)
  2019-02-20 22:40 ` [Qemu-devel] [PATCH v7 14/17] nvdimm: use configurable ACPI IO base and size Eric Auger
@ 2019-02-20 22:40 ` Eric Auger
  2019-02-22 15:36   ` Igor Mammedov
  2019-02-20 22:40 ` [Qemu-devel] [PATCH v7 16/17] hw/arm/boot: Expose the pmem nodes in the DT Eric Auger
                   ` (3 subsequent siblings)
  18 siblings, 1 reply; 63+ messages in thread
From: Eric Auger @ 2019-02-20 22:40 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, imammedo, david
  Cc: dgilbert, david, drjones

From: Kwangwoo Lee <kwangwoo.lee@sk.com>

Pre-plug and plug handlers are prepared for NVDIMM support.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Kwangwoo Lee <kwangwoo.lee@sk.com>
---
 default-configs/arm-softmmu.mak |  2 ++
 hw/arm/virt-acpi-build.c        |  6 ++++++
 hw/arm/virt.c                   | 22 ++++++++++++++++++++++
 include/hw/arm/virt.h           |  3 +++
 4 files changed, 33 insertions(+)

diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
index 0a78421f72..03dbebb197 100644
--- a/default-configs/arm-softmmu.mak
+++ b/default-configs/arm-softmmu.mak
@@ -165,3 +165,5 @@ CONFIG_HIGHBANK=y
 CONFIG_MUSICPAL=y
 CONFIG_MEM_DEVICE=y
 CONFIG_DIMM=y
+CONFIG_NVDIMM=y
+CONFIG_ACPI_NVDIMM=y
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 781eafaf5e..f086adfa82 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -784,6 +784,7 @@ static
 void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
 {
     VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(vms);
+    MachineState *ms = MACHINE(vms);
     GArray *table_offsets;
     unsigned dsdt, xsdt;
     GArray *tables_blob = tables->table_data;
@@ -824,6 +825,11 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
         }
     }
 
+    if (vms->acpi_nvdimm_state.is_enabled) {
+        nvdimm_build_acpi(table_offsets, tables_blob, tables->linker,
+                          &vms->acpi_nvdimm_state, ms->ram_slots);
+    }
+
     if (its_class_name() && !vmc->no_its) {
         acpi_add_table(table_offsets, tables_blob);
         build_iort(tables_blob, tables->linker, vms);
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 33ad9b3f63..1896920570 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -134,6 +134,7 @@ static const MemMapEntry base_memmap[] = {
     [VIRT_GPIO] =               { 0x09030000, 0x00001000 },
     [VIRT_SECURE_UART] =        { 0x09040000, 0x00001000 },
     [VIRT_SMMU] =               { 0x09050000, 0x00020000 },
+    [VIRT_ACPI_IO] =            { 0x09070000, 0x00010000 },
     [VIRT_MMIO] =               { 0x0a000000, 0x00000200 },
     /* ...repeating for a total of NUM_VIRTIO_TRANSPORTS, each of that size */
     [VIRT_PLATFORM_BUS] =       { 0x0c000000, 0x02000000 },
@@ -1675,6 +1676,18 @@ static void machvirt_init(MachineState *machine)
 
     create_platform_bus(vms, pic);
 
+    if (vms->acpi_nvdimm_state.is_enabled) {
+        AcpiNVDIMMState *acpi_nvdimm_state = &vms->acpi_nvdimm_state;
+
+        acpi_nvdimm_state->dsm_io.space_id = AML_AS_SYSTEM_MEMORY;
+        acpi_nvdimm_state->dsm_io.address =
+                vms->memmap[VIRT_ACPI_IO].base + NVDIMM_ACPI_IO_BASE;
+        acpi_nvdimm_state->dsm_io.bit_width = NVDIMM_ACPI_IO_LEN << 3;
+
+        nvdimm_init_acpi_state(acpi_nvdimm_state, sysmem,
+                               vms->fw_cfg, OBJECT(vms));
+    }
+
     vms->bootinfo.ram_size = machine->ram_size;
     vms->bootinfo.kernel_filename = machine->kernel_filename;
     vms->bootinfo.kernel_cmdline = machine->kernel_cmdline;
@@ -1860,10 +1873,19 @@ static void virt_memory_plug(HotplugHandler *hotplug_dev,
                              DeviceState *dev, Error **errp)
 {
     VirtMachineState *vms = VIRT_MACHINE(hotplug_dev);
+    bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
     Error *local_err = NULL;
 
     pc_dimm_plug(PC_DIMM(dev), MACHINE(vms), &local_err);
+    if (local_err) {
+        goto out;
+    }
 
+    if (is_nvdimm) {
+        nvdimm_plug(&vms->acpi_nvdimm_state);
+    }
+
+out:
     error_propagate(errp, local_err);
 }
 
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index 7798462cb0..bd9cf68311 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -37,6 +37,7 @@
 #include "hw/arm/arm.h"
 #include "sysemu/kvm.h"
 #include "hw/intc/arm_gicv3_common.h"
+#include "hw/mem/nvdimm.h"
 
 #define NUM_GICV2M_SPIS       64
 #define NUM_VIRTIO_TRANSPORTS 32
@@ -77,6 +78,7 @@ enum {
     VIRT_GPIO,
     VIRT_SECURE_UART,
     VIRT_SECURE_MEM,
+    VIRT_ACPI_IO,
     VIRT_LOWMEMMAP_LAST,
 };
 
@@ -137,6 +139,7 @@ typedef struct {
     hwaddr device_memory_base;
     hwaddr device_memory_size;
     bool extended_memmap;
+    AcpiNVDIMMState acpi_nvdimm_state;
 } VirtMachineState;
 
 #define VIRT_ECAM_ID(high) (high ? VIRT_HIGH_PCIE_ECAM : VIRT_PCIE_ECAM)
-- 
2.20.1


* [Qemu-devel] [PATCH v7 16/17] hw/arm/boot: Expose the pmem nodes in the DT
  2019-02-20 22:39 [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion and PCDIMM/NVDIMM support Eric Auger
                   ` (14 preceding siblings ...)
  2019-02-20 22:40 ` [Qemu-devel] [PATCH v7 15/17] hw/arm/virt: Add nvdimm hot-plug infrastructure Eric Auger
@ 2019-02-20 22:40 ` Eric Auger
  2019-02-20 22:40 ` [Qemu-devel] [PATCH v7 17/17] hw/arm/virt: Add nvdimm and nvdimm-persistence options Eric Auger
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 63+ messages in thread
From: Eric Auger @ 2019-02-20 22:40 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, imammedo, david
  Cc: dgilbert, david, drjones

In case of NV-DIMM slots, let's add /pmem DT nodes.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v6 -> v7
- does the same rework as for fdt_add_memory_node
---
 hw/arm/boot.c | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index 255aaca0cf..66caf005e5 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -450,6 +450,32 @@ out:
     return ret;
 }
 
+static int fdt_add_pmem_node(void *fdt, uint32_t acells, hwaddr mem_base,
+                             uint32_t scells, hwaddr mem_len,
+                             int numa_node_id)
+{
+    char *nodename;
+    int ret;
+
+    nodename = g_strdup_printf("/pmem@%" PRIx64, mem_base);
+    qemu_fdt_add_subnode(fdt, nodename);
+    qemu_fdt_setprop_string(fdt, nodename, "compatible", "pmem-region");
+    ret = qemu_fdt_setprop_sized_cells(fdt, nodename, "reg", acells, mem_base,
+                                       scells, mem_len);
+    if (ret < 0) {
+        goto out;
+    }
+
+    /* only set the NUMA ID if it is specified */
+    if (numa_node_id >= 0) {
+        ret = qemu_fdt_setprop_cell(fdt, nodename,
+                                    "numa-node-id", numa_node_id);
+    }
+out:
+    g_free(nodename);
+    return ret;
+}
+
 static void fdt_add_psci_node(void *fdt)
 {
     uint32_t cpu_suspend_fn;
@@ -546,6 +572,20 @@ static int fdt_add_hotpluggable_memory_nodes(void *fdt,
             }
             break;
         }
+        case MEMORY_DEVICE_INFO_KIND_NVDIMM:
+        {
+            PCDIMMDeviceInfo *di = mi->u.nvdimm.data;
+
+            ret = fdt_add_pmem_node(fdt, acells, di->addr,
+                                    scells, di->size, di->node);
+            if (ret) {
+                fprintf(stderr,
+                        "couldn't add NVDIMM /memory@%"PRIx64" node\n",
+                        di->addr);
+                goto out;
+            }
+            break;
+        }
         default:
             fprintf(stderr, "%s memory nodes are not yet supported\n",
                     MemoryDeviceInfoKind_str(mi->type));
-- 
2.20.1


* [Qemu-devel] [PATCH v7 17/17] hw/arm/virt: Add nvdimm and nvdimm-persistence options
  2019-02-20 22:39 [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion and PCDIMM/NVDIMM support Eric Auger
                   ` (15 preceding siblings ...)
  2019-02-20 22:40 ` [Qemu-devel] [PATCH v7 16/17] hw/arm/boot: Expose the pmem nodes in the DT Eric Auger
@ 2019-02-20 22:40 ` Eric Auger
  2019-02-22 15:48   ` Igor Mammedov
  2019-02-20 22:46 ` [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion and PCDIMM/NVDIMM support Auger Eric
  2019-02-22 16:27 ` Igor Mammedov
  18 siblings, 1 reply; 63+ messages in thread
From: Eric Auger @ 2019-02-20 22:40 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, imammedo, david
  Cc: dgilbert, david, drjones

The machine option nvdimm allows NVDIMM support to be turned on.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
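
The persistence strings map to the numeric values stored by
virt_set_nvdimm_persistence() below (3 for "cpu", 2 for "mem-ctrl").
A standalone sketch of that mapping — the function name is
illustrative:

```c
#include <string.h>

/* "cpu" (3) means persistence through the CPU cache hierarchy,
 * "mem-ctrl" (2) persistence at the memory controller, mirroring the
 * values the setter in this patch stores. */
static int nvdimm_persistence_value(const char *s)
{
    if (strcmp(s, "cpu") == 0) {
        return 3;
    }
    if (strcmp(s, "mem-ctrl") == 0) {
        return 2;
    }
    return -1;  /* unsupported; QEMU itself exits with an error here */
}
```

The option is then used on the command line along the lines of
`-machine virt,nvdimm=on,nvdimm-persistence=cpu`.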
---
 hw/arm/virt.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 57 insertions(+), 2 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 1896920570..c7e68e2428 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1814,6 +1814,47 @@ static void virt_set_iommu(Object *obj, const char *value, Error **errp)
     }
 }
 
+static bool virt_get_nvdimm(Object *obj, Error **errp)
+{
+    VirtMachineState *vms = VIRT_MACHINE(obj);
+
+    return vms->acpi_nvdimm_state.is_enabled;
+}
+
+static void virt_set_nvdimm(Object *obj, bool value, Error **errp)
+{
+    VirtMachineState *vms = VIRT_MACHINE(obj);
+
+    vms->acpi_nvdimm_state.is_enabled = value;
+}
+
+static char *virt_get_nvdimm_persistence(Object *obj, Error **errp)
+{
+    VirtMachineState *vms = VIRT_MACHINE(obj);
+
+    return g_strdup(vms->acpi_nvdimm_state.persistence_string);
+}
+
+static void virt_set_nvdimm_persistence(Object *obj, const char *value,
+                                        Error **errp)
+{
+    VirtMachineState *vms = VIRT_MACHINE(obj);
+    AcpiNVDIMMState *nvdimm_state = &vms->acpi_nvdimm_state;
+
+    if (strcmp(value, "cpu") == 0)
+        nvdimm_state->persistence = 3;
+    else if (strcmp(value, "mem-ctrl") == 0)
+        nvdimm_state->persistence = 2;
+    else {
+        error_report("-machine nvdimm-persistence=%s: unsupported option",
+                     value);
+        exit(EXIT_FAILURE);
+    }
+
+    g_free(nvdimm_state->persistence_string);
+    nvdimm_state->persistence_string = g_strdup(value);
+}
+
 static CpuInstanceProperties
 virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
 {
@@ -1856,13 +1897,14 @@ static void virt_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
                                  Error **errp)
 {
     const bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
+    VirtMachineState *vms = VIRT_MACHINE(hotplug_dev);
 
     if (dev->hotplugged) {
         error_setg(errp, "memory hotplug is not supported");
     }
 
-    if (is_nvdimm) {
-        error_setg(errp, "nvdimm is not yet supported");
+    if (is_nvdimm && !vms->acpi_nvdimm_state.is_enabled) {
+        error_setg(errp, "nvdimm is not enabled: missing 'nvdimm' in '-M'");
         return;
     }
 
@@ -2076,6 +2118,19 @@ static void virt_instance_init(Object *obj)
         vms->extended_memmap = true;
     }
 
+    object_property_add_bool(obj, "nvdimm",
+                             virt_get_nvdimm, virt_set_nvdimm, NULL);
+    object_property_set_description(obj, "nvdimm",
+                                         "Set on/off to enable/disable NVDIMM "
+                                         "instantiation", NULL);
+
+    object_property_add_str(obj, "nvdimm-persistence",
+                            virt_get_nvdimm_persistence,
+                            virt_set_nvdimm_persistence, NULL);
+    object_property_set_description(obj, "nvdimm-persistence",
+                                    "Set NVDIMM persistence. "
+                                    "Valid values are cpu and mem-ctrl", NULL);
+
     vms->irqmap = a15irqmap;
 }
 
-- 
2.20.1


* Re: [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion and PCDIMM/NVDIMM support
  2019-02-20 22:39 [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion and PCDIMM/NVDIMM support Eric Auger
                   ` (16 preceding siblings ...)
  2019-02-20 22:40 ` [Qemu-devel] [PATCH v7 17/17] hw/arm/virt: Add nvdimm and nvdimm-persistence options Eric Auger
@ 2019-02-20 22:46 ` Auger Eric
  2019-02-22 16:27 ` Igor Mammedov
  18 siblings, 0 replies; 63+ messages in thread
From: Auger Eric @ 2019-02-20 22:46 UTC (permalink / raw)
  To: eric.auger.pro, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, imammedo, david
  Cc: dgilbert, david, drjones

Hi Peter,

On 2/20/19 11:39 PM, Eric Auger wrote:
> This series aims to bump the 255GB RAM limit in machvirt and to
> support device memory in general, and especially PCDIMM/NVDIMM.
> 
> In machvirt versions < 4.0, the initial RAM starts at 1GB and can
> grow up to 255GB. From 256GB onwards we find IO regions such as the
> additional GICv3 RDIST region, high PCIe ECAM region and high PCIe
> MMIO region. The address map was 1TB large. This corresponded to
> the max IPA capacity KVM was able to manage.
> 
> Since 4.20, the host kernel is able to support a larger and dynamic
> IPA range. So the guest physical address can go beyond the 1TB. The
> max GPA size depends on the host kernel configuration and physical CPUs.
> 
> In this series we use this feature and allow the RAM to grow without
> any other limit than the one put by the host kernel.
> 
> The RAM still starts at 1GB. First comes the initial ram (-m) of size
> ram_size and then comes the device memory (,maxmem) of size
> maxram_size - ram_size. The device memory is potentially hotpluggable
> depending on the instantiated memory objects.
> 
> IO regions previously located between 256GB and 1TB are moved after
> the RAM. Their offset is dynamically computed, depends on ram_size
> and maxram_size. Size alignment is enforced.
> 
> In case maxmem value is inferior to 255GB, the legacy memory map
> still is used. The change of memory map becomes effective from 4.0
> onwards.
> 
> As we keep the initial RAM at 1GB base address, we do not need to do
> invasive changes in the EDK2 FW. It seems nobody is eager to do
> that job at the moment.
> 
> Device memory being put just after the initial RAM, it is possible
> to get access to this feature while keeping a 1TB address map.
> 
> This series reuses/rebases patches initially submitted by Shameer
> in [1] and Kwangwoo in [2] for the PC-DIMM and NV-DIMM parts.
> 
> Functionally, the series is split into 3 parts:
> 1) bump of the initial RAM limit [1 - 9] and change in
>    the memory map
I respinned the whole series, including the PCDIMM and NVDIMM parts, as
Igor did a first review pass on the latter. However, the first objective
is to get patches [1 - 9] upstreamed, as we discussed earlier. So please
consider those patches independently of the others.

Thanks

Eric
> 2) Support of PC-DIMM [10 - 13]
> 3) Support of NV-DIMM [14 - 17]
> 
> 1) can be upstreamed before 2 and 2 can be upstreamed before 3.
> 
> Work is ongoing to transform the whole memory as device memory.
> However this move is not trivial and to me, is independent on
> the improvements brought by this series:
> - if we were to use DIMM for initial RAM, those DIMMs would use
>   use slots. Although they would not be part of the ones provided
>   using the ",slots" options, they are ACPI limited resources.
> - DT and ACPI description needs to be reworked
> - NUMA integration needs special care
> - a special device memory object may be required to avoid consuming
>   slots and easing the FW description.
> 
> So I preferred to separate the concerns. This new implementation
> based on device memory could be candidate for another virt
> version.
> 
> Best Regards
> 
> Eric
> 
> References:
> 
> [0] [RFC v2 0/6] hw/arm: Add support for non-contiguous iova regions
> http://patchwork.ozlabs.org/cover/914694/
> 
> [1] [RFC PATCH 0/3] add nvdimm support on AArch64 virt platform
> https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg04599.html
> 
> This series can be found at:
> https://github.com/eauger/qemu/tree/v3.1.0-dimm-v7
> 
> History:
> 
> v6 -> v7:
> - Addressed Peter and Igor comments (exceptions sent my email)
> - Fixed TCG case. Now device memory works also for TCG and vcpu
>   pamax is checked
> - See individual logs for more details
> 
> v5 -> v6:
> - mingw compilation issue fix
> - kvm_arm_get_max_vm_phys_shift always returns the number of supported
>   IPA bits
> - new patch "hw/arm/virt: Rename highmem IO regions" that eases the review
>   of "hw/arm/virt: Split the memory map description"
> - "hw/arm/virt: Move memory map initialization into machvirt_init"
>   squashed into the previous patch
> - change alignment of IO regions beyond the RAM so that it matches their
>   size
> 
> v4 -> v5:
> - change in the memory map
> - see individual logs
> 
> v3 -> v4:
> - rebase on David's "pc-dimm: next bunch of cleanups" and
>   "pc-dimm: pre_plug "slot" and "addr" assignment"
> - kvm-type option not used anymore. We directly use
>   maxram_size and ram_size machine fields to compute the
>   MAX IPA range. Migration is naturally handled as CLI
>   option are kept between source and destination. This was
>   suggested by David.
> - device_memory_start and device_memory_size not stored
>   anymore in vms->bootinfo
> - I did not take into account 2 Igor's comments: the one
>   related to the refactoring of arm_load_dtb and the one
>   related to the generation of the dtb after system_reset
>   which would contain nodes of hotplugged devices (we do
>   not support hotplug at this stage)
> - check the end-user does not attempt to hotplug a device
> - addition of "vl: Set machine ram_size, maxram_size and
>   ram_slots earlier"
> 
> v2 -> v3:
> - fix pc_q35 and pc_piix compilation error
> - kwangwoo's email being not valid anymore, remove his address
> 
> v1 -> v2:
> - kvm_get_max_vm_phys_shift moved in arch specific file
> - addition of NVDIMM part
> - single series
> - rebase on David's refactoring
> 
> v1:
> - was "[RFC 0/6] KVM/ARM: Dynamic and larger GPA size"
> - was "[RFC 0/5] ARM virt: Support PC-DIMM at 2TB"
> 
> Best Regards
> 
> Eric
> 
> 
> Eric Auger (12):
>   hw/arm/virt: Rename highmem IO regions
>   hw/arm/virt: Split the memory map description
>   hw/boards: Add a MachineState parameter to kvm_type callback
>   kvm: add kvm_arm_get_max_vm_ipa_size
>   vl: Set machine ram_size, maxram_size and ram_slots earlier
>   hw/arm/virt: Dynamic memory map depending on RAM requirements
>   hw/arm/virt: Implement kvm_type function for 4.0 machine
>   hw/arm/virt: Bump the 255GB initial RAM limit
>   hw/arm/virt: Add memory hotplug framework
>   hw/arm/virt: Allocate device_memory
>   hw/arm/boot: Expose the pmem nodes in the DT
>   hw/arm/virt: Add nvdimm and nvdimm-persistence options
> 
> Kwangwoo Lee (2):
>   nvdimm: use configurable ACPI IO base and size
>   hw/arm/virt: Add nvdimm hot-plug infrastructure
> 
> Shameer Kolothum (3):
>   hw/arm/boot: introduce fdt_add_memory_node helper
>   hw/arm/boot: Expose the PC-DIMM nodes in the DT
>   hw/arm/virt-acpi-build: Add PC-DIMM in SRAT
> 
>  accel/kvm/kvm-all.c             |   2 +-
>  default-configs/arm-softmmu.mak |   4 +
>  hw/acpi/nvdimm.c                |  31 ++-
>  hw/arm/boot.c                   | 136 ++++++++++--
>  hw/arm/virt-acpi-build.c        |  23 +-
>  hw/arm/virt.c                   | 364 ++++++++++++++++++++++++++++----
>  hw/i386/pc_piix.c               |   6 +-
>  hw/i386/pc_q35.c                |   6 +-
>  hw/ppc/mac_newworld.c           |   3 +-
>  hw/ppc/mac_oldworld.c           |   2 +-
>  hw/ppc/spapr.c                  |   2 +-
>  include/hw/arm/virt.h           |  24 ++-
>  include/hw/boards.h             |   5 +-
>  include/hw/mem/nvdimm.h         |   4 +
>  target/arm/kvm.c                |  10 +
>  target/arm/kvm_arm.h            |  13 ++
>  vl.c                            |   6 +-
>  17 files changed, 556 insertions(+), 85 deletions(-)
> 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v7 01/17] hw/arm/boot: introduce fdt_add_memory_node helper
  2019-02-20 22:39 ` [Qemu-devel] [PATCH v7 01/17] hw/arm/boot: introduce fdt_add_memory_node helper Eric Auger
@ 2019-02-21 14:58   ` Igor Mammedov
  0 siblings, 0 replies; 63+ messages in thread
From: Igor Mammedov @ 2019-02-21 14:58 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, david, dgilbert, david, drjones

On Wed, 20 Feb 2019 23:39:47 +0100
Eric Auger <eric.auger@redhat.com> wrote:

> From: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> 
> We introduce a helper to create a memory node.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> 

Reviewed-by: Igor Mammedov <imammedo@redhat.com>

> ---
> 
> v6 -> v7:
> - msg error in the caller
> - add comment about NUMA ID
> ---
>  hw/arm/boot.c | 54 ++++++++++++++++++++++++++++++++-------------------
>  1 file changed, 34 insertions(+), 20 deletions(-)
> 
> diff --git a/hw/arm/boot.c b/hw/arm/boot.c
> index d90af2f17d..a830655e1a 100644
> --- a/hw/arm/boot.c
> +++ b/hw/arm/boot.c
> @@ -423,6 +423,32 @@ static void set_kernel_args_old(const struct arm_boot_info *info,
>      }
>  }
>  
> +static int fdt_add_memory_node(void *fdt, uint32_t acells, hwaddr mem_base,
> +                               uint32_t scells, hwaddr mem_len,
> +                               int numa_node_id)
> +{
> +    char *nodename;
> +    int ret;
> +
> +    nodename = g_strdup_printf("/memory@%" PRIx64, mem_base);
> +    qemu_fdt_add_subnode(fdt, nodename);
> +    qemu_fdt_setprop_string(fdt, nodename, "device_type", "memory");
> +    ret = qemu_fdt_setprop_sized_cells(fdt, nodename, "reg", acells, mem_base,
> +                                       scells, mem_len);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +
> +    /* only set the NUMA ID if it is specified */
> +    if (numa_node_id >= 0) {
> +        ret = qemu_fdt_setprop_cell(fdt, nodename,
> +                                    "numa-node-id", numa_node_id);
> +    }
> +out:
> +    g_free(nodename);
> +    return ret;
> +}
> +
>  static void fdt_add_psci_node(void *fdt)
>  {
>      uint32_t cpu_suspend_fn;
> @@ -502,7 +528,6 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
>      void *fdt = NULL;
>      int size, rc, n = 0;
>      uint32_t acells, scells;
> -    char *nodename;
>      unsigned int i;
>      hwaddr mem_base, mem_len;
>      char **node_path;
> @@ -576,35 +601,24 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
>          mem_base = binfo->loader_start;
>          for (i = 0; i < nb_numa_nodes; i++) {
>              mem_len = numa_info[i].node_mem;
> -            nodename = g_strdup_printf("/memory@%" PRIx64, mem_base);
> -            qemu_fdt_add_subnode(fdt, nodename);
> -            qemu_fdt_setprop_string(fdt, nodename, "device_type", "memory");
> -            rc = qemu_fdt_setprop_sized_cells(fdt, nodename, "reg",
> -                                              acells, mem_base,
> -                                              scells, mem_len);
> +            rc = fdt_add_memory_node(fdt, acells, mem_base,
> +                                     scells, mem_len, i);
>              if (rc < 0) {
> -                fprintf(stderr, "couldn't set %s/reg for node %d\n", nodename,
> -                        i);
> +                fprintf(stderr, "couldn't add /memory@%"PRIx64" node\n",
> +                        mem_base);
>                  goto fail;
>              }
>  
> -            qemu_fdt_setprop_cell(fdt, nodename, "numa-node-id", i);
>              mem_base += mem_len;
> -            g_free(nodename);
>          }
>      } else {
> -        nodename = g_strdup_printf("/memory@%" PRIx64, binfo->loader_start);
> -        qemu_fdt_add_subnode(fdt, nodename);
> -        qemu_fdt_setprop_string(fdt, nodename, "device_type", "memory");
> -
> -        rc = qemu_fdt_setprop_sized_cells(fdt, nodename, "reg",
> -                                          acells, binfo->loader_start,
> -                                          scells, binfo->ram_size);
> +        rc = fdt_add_memory_node(fdt, acells, binfo->loader_start,
> +                                 scells, binfo->ram_size, -1);
>          if (rc < 0) {
> -            fprintf(stderr, "couldn't set %s reg\n", nodename);
> +            fprintf(stderr, "couldn't add /memory@%"PRIx64" node\n",
> +                    binfo->loader_start);
>              goto fail;
>          }
> -        g_free(nodename);
>      }
>  
>      rc = fdt_path_offset(fdt, "/chosen");


* Re: [Qemu-devel] [PATCH v7 02/17] hw/arm/virt: Rename highmem IO regions
  2019-02-20 22:39 ` [Qemu-devel] [PATCH v7 02/17] hw/arm/virt: Rename highmem IO regions Eric Auger
@ 2019-02-21 15:05   ` Igor Mammedov
  0 siblings, 0 replies; 63+ messages in thread
From: Igor Mammedov @ 2019-02-21 15:05 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, david, dgilbert, david, drjones

On Wed, 20 Feb 2019 23:39:48 +0100
Eric Auger <eric.auger@redhat.com> wrote:

> In preparation for a split of the memory map into a static
> part and a dynamic part floating after the RAM, let's rename the
> regions located after the RAM.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

with indent and checkpatch warnings fixed

Reviewed-by: Igor Mammedov <imammedo@redhat.com>

> 
> ---
> v7: added Peter's R-b
> v6: creation
> ---
>  hw/arm/virt-acpi-build.c |  8 ++++----
>  hw/arm/virt.c            | 21 +++++++++++----------
>  include/hw/arm/virt.h    |  8 ++++----
>  3 files changed, 19 insertions(+), 18 deletions(-)
> 
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index 04b62c714d..829d2f0035 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -229,8 +229,8 @@ static void acpi_dsdt_add_pci(Aml *scope, const MemMapEntry *memmap,
>                       size_pio));
>  
>      if (use_highmem) {
> -        hwaddr base_mmio_high = memmap[VIRT_PCIE_MMIO_HIGH].base;
> -        hwaddr size_mmio_high = memmap[VIRT_PCIE_MMIO_HIGH].size;
> +        hwaddr base_mmio_high = memmap[VIRT_HIGH_PCIE_MMIO].base;
> +        hwaddr size_mmio_high = memmap[VIRT_HIGH_PCIE_MMIO].size;
>  
>          aml_append(rbuf,
>              aml_qword_memory(AML_POS_DECODE, AML_MIN_FIXED, AML_MAX_FIXED,
> @@ -663,8 +663,8 @@ build_madt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
>              gicr = acpi_data_push(table_data, sizeof(*gicr));
>              gicr->type = ACPI_APIC_GENERIC_REDISTRIBUTOR;
>              gicr->length = sizeof(*gicr);
> -            gicr->base_address = cpu_to_le64(memmap[VIRT_GIC_REDIST2].base);
> -            gicr->range_length = cpu_to_le32(memmap[VIRT_GIC_REDIST2].size);
> +            gicr->base_address = cpu_to_le64(memmap[VIRT_HIGH_GIC_REDIST2].base);
> +            gicr->range_length = cpu_to_le32(memmap[VIRT_HIGH_GIC_REDIST2].size);
>          }
>  
>          if (its_class_name() && !vmc->no_its) {
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 99c2b6e60d..a1955e7764 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -150,10 +150,10 @@ static const MemMapEntry a15memmap[] = {
>      [VIRT_PCIE_ECAM] =          { 0x3f000000, 0x01000000 },
>      [VIRT_MEM] =                { 0x40000000, RAMLIMIT_BYTES },
>      /* Additional 64 MB redist region (can contain up to 512 redistributors) */
> -    [VIRT_GIC_REDIST2] =        { 0x4000000000ULL, 0x4000000 },
> -    [VIRT_PCIE_ECAM_HIGH] =     { 0x4010000000ULL, 0x10000000 },
> +    [VIRT_HIGH_GIC_REDIST2] =   { 0x4000000000ULL, 0x4000000 },
> +    [VIRT_HIGH_PCIE_ECAM] =     { 0x4010000000ULL, 0x10000000 },
>      /* Second PCIe window, 512GB wide at the 512GB boundary */
> -    [VIRT_PCIE_MMIO_HIGH] =   { 0x8000000000ULL, 0x8000000000ULL },
> +    [VIRT_HIGH_PCIE_MMIO] =     { 0x8000000000ULL, 0x8000000000ULL },
>  };
>  
>  static const int a15irqmap[] = {
> @@ -435,8 +435,8 @@ static void fdt_add_gic_node(VirtMachineState *vms)
>                                           2, vms->memmap[VIRT_GIC_DIST].size,
>                                           2, vms->memmap[VIRT_GIC_REDIST].base,
>                                           2, vms->memmap[VIRT_GIC_REDIST].size,
> -                                         2, vms->memmap[VIRT_GIC_REDIST2].base,
> -                                         2, vms->memmap[VIRT_GIC_REDIST2].size);
> +                                         2, vms->memmap[VIRT_HIGH_GIC_REDIST2].base,
> +                                         2, vms->memmap[VIRT_HIGH_GIC_REDIST2].size);
>          }
>  
>          if (vms->virt) {
> @@ -584,7 +584,7 @@ static void create_gic(VirtMachineState *vms, qemu_irq *pic)
>  
>          if (nb_redist_regions == 2) {
>              uint32_t redist1_capacity =
> -                        vms->memmap[VIRT_GIC_REDIST2].size / GICV3_REDIST_SIZE;
> +                    vms->memmap[VIRT_HIGH_GIC_REDIST2].size / GICV3_REDIST_SIZE;
Is the indentation correct here? (It didn't look correct to begin with.)

Strangely, checkpatch didn't complain about it, but now that I've run it,
it does complain about a bunch of "line over 80 characters" warnings on this patch

>  
>              qdev_prop_set_uint32(gicdev, "redist-region-count[1]",
>                  MIN(smp_cpus - redist0_count, redist1_capacity));
> @@ -601,7 +601,8 @@ static void create_gic(VirtMachineState *vms, qemu_irq *pic)
>      if (type == 3) {
>          sysbus_mmio_map(gicbusdev, 1, vms->memmap[VIRT_GIC_REDIST].base);
>          if (nb_redist_regions == 2) {
> -            sysbus_mmio_map(gicbusdev, 2, vms->memmap[VIRT_GIC_REDIST2].base);
> +            sysbus_mmio_map(gicbusdev, 2,
> +                            vms->memmap[VIRT_HIGH_GIC_REDIST2].base);
>          }
>      } else {
>          sysbus_mmio_map(gicbusdev, 1, vms->memmap[VIRT_GIC_CPU].base);
> @@ -1088,8 +1089,8 @@ static void create_pcie(VirtMachineState *vms, qemu_irq *pic)
>  {
>      hwaddr base_mmio = vms->memmap[VIRT_PCIE_MMIO].base;
>      hwaddr size_mmio = vms->memmap[VIRT_PCIE_MMIO].size;
> -    hwaddr base_mmio_high = vms->memmap[VIRT_PCIE_MMIO_HIGH].base;
> -    hwaddr size_mmio_high = vms->memmap[VIRT_PCIE_MMIO_HIGH].size;
> +    hwaddr base_mmio_high = vms->memmap[VIRT_HIGH_PCIE_MMIO].base;
> +    hwaddr size_mmio_high = vms->memmap[VIRT_HIGH_PCIE_MMIO].size;
>      hwaddr base_pio = vms->memmap[VIRT_PCIE_PIO].base;
>      hwaddr size_pio = vms->memmap[VIRT_PCIE_PIO].size;
>      hwaddr base_ecam, size_ecam;
> @@ -1418,7 +1419,7 @@ static void machvirt_init(MachineState *machine)
>       */
>      if (vms->gic_version == 3) {
>          virt_max_cpus = vms->memmap[VIRT_GIC_REDIST].size / GICV3_REDIST_SIZE;
> -        virt_max_cpus += vms->memmap[VIRT_GIC_REDIST2].size / GICV3_REDIST_SIZE;
> +        virt_max_cpus += vms->memmap[VIRT_HIGH_GIC_REDIST2].size / GICV3_REDIST_SIZE;
>      } else {
>          virt_max_cpus = GIC_NCPU;
>      }
> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
> index 4cc57a7ef6..a27086d524 100644
> --- a/include/hw/arm/virt.h
> +++ b/include/hw/arm/virt.h
> @@ -64,7 +64,7 @@ enum {
>      VIRT_GIC_VCPU,
>      VIRT_GIC_ITS,
>      VIRT_GIC_REDIST,
> -    VIRT_GIC_REDIST2,
> +    VIRT_HIGH_GIC_REDIST2,
>      VIRT_SMMU,
>      VIRT_UART,
>      VIRT_MMIO,
> @@ -74,9 +74,9 @@ enum {
>      VIRT_PCIE_MMIO,
>      VIRT_PCIE_PIO,
>      VIRT_PCIE_ECAM,
> -    VIRT_PCIE_ECAM_HIGH,
> +    VIRT_HIGH_PCIE_ECAM,
>      VIRT_PLATFORM_BUS,
> -    VIRT_PCIE_MMIO_HIGH,
> +    VIRT_HIGH_PCIE_MMIO,
>      VIRT_GPIO,
>      VIRT_SECURE_UART,
>      VIRT_SECURE_MEM,
> @@ -128,7 +128,7 @@ typedef struct {
>      int psci_conduit;
>  } VirtMachineState;
>  
> -#define VIRT_ECAM_ID(high) (high ? VIRT_PCIE_ECAM_HIGH : VIRT_PCIE_ECAM)
> +#define VIRT_ECAM_ID(high) (high ? VIRT_HIGH_PCIE_ECAM : VIRT_PCIE_ECAM)
>  
>  #define TYPE_VIRT_MACHINE   MACHINE_TYPE_NAME("virt")
>  #define VIRT_MACHINE(obj) \


* Re: [Qemu-devel] [PATCH v7 03/17] hw/arm/virt: Split the memory map description
  2019-02-20 22:39 ` [Qemu-devel] [PATCH v7 03/17] hw/arm/virt: Split the memory map description Eric Auger
@ 2019-02-21 16:19   ` Igor Mammedov
  2019-02-21 17:21     ` Auger Eric
  2019-02-22  7:34   ` Heyi Guo
  1 sibling, 1 reply; 63+ messages in thread
From: Igor Mammedov @ 2019-02-21 16:19 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, david, dgilbert, david, drjones

On Wed, 20 Feb 2019 23:39:49 +0100
Eric Auger <eric.auger@redhat.com> wrote:

> In the prospect to introduce an extended memory map supporting more
> RAM, let's split the memory map array into two parts:
> 
> - the former a15memmap contains regions below and including the RAM

> - extended_memmap, only initialized with entries located after the RAM.
>   Only the size of the region is initialized there since their base
>   address will be dynamically computed, depending on the top of the
>   RAM (initial RAM at the moment), with same alignment as their size.
I can't parse this part or pinpoint what 'their' refers to; care to rephrase?


> This new split will allow to grow the RAM size without changing the
> description of the high regions.
> 
> The patch also moves the memory map setup
s/moves/makes/
s/$/dynamic and moves it/

> into machvirt_init().

> The rationale is the memory map will be soon affected by the

> kvm_type() call that happens after virt_instance_init() and
Is the dependency on kvm_type() still valid?
Shouldn't the split memmap work just fine for TCG as well?

> before machvirt_init().
> 
> The memory map is unchanged (the top of the initial RAM still is
> 256GiB). Then come the high IO regions with same layout as before.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
> 
> ---
> v6 -> v7:
> - s/a15memmap/base_memmap
> - slight rewording of the commit message
> - add "if there is less than 256GiB of RAM then the floating area
>   starts at the 256GiB mark" in the comment associated to the floating
>   memory map
> - Added Peter's R-b
> 
> v5 -> v6
> - removal of many macros in units.h
> - introduce the virt_set_memmap helper
> - new computation for offsets of high IO regions
> - add comments
> ---
>  hw/arm/virt.c         | 48 +++++++++++++++++++++++++++++++++++++------
>  include/hw/arm/virt.h | 14 +++++++++----
>  2 files changed, 52 insertions(+), 10 deletions(-)
> 
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index a1955e7764..12039a0367 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -29,6 +29,7 @@
>   */
>  
>  #include "qemu/osdep.h"
> +#include "qemu/units.h"
>  #include "qapi/error.h"
>  #include "hw/sysbus.h"
>  #include "hw/arm/arm.h"
> @@ -121,7 +122,7 @@
>   * Note that devices should generally be placed at multiples of 0x10000,
>   * to accommodate guests using 64K pages.
>   */
> -static const MemMapEntry a15memmap[] = {
> +static const MemMapEntry base_memmap[] = {
>      /* Space up to 0x8000000 is reserved for a boot ROM */
>      [VIRT_FLASH] =              {          0, 0x08000000 },
>      [VIRT_CPUPERIPHS] =         { 0x08000000, 0x00020000 },
> @@ -149,11 +150,21 @@ static const MemMapEntry a15memmap[] = {
>      [VIRT_PCIE_PIO] =           { 0x3eff0000, 0x00010000 },
>      [VIRT_PCIE_ECAM] =          { 0x3f000000, 0x01000000 },
>      [VIRT_MEM] =                { 0x40000000, RAMLIMIT_BYTES },
> +};
> +
> +/*
> + * Highmem IO Regions: This memory map is floating, located after the RAM.
> + * Each IO region offset will be dynamically computed, depending on the
s/IO region offset/MemMapEntry base (GPA)/
> + * top of the RAM, so that its base get the same alignment as the size,

> + * ie. a 512GiB region will be aligned on a 512GiB boundary. If there is
s/region/entry/

> + * less than 256GiB of RAM, the floating area starts at the 256GiB mark.
> + */
> +static MemMapEntry extended_memmap[] = {
>      /* Additional 64 MB redist region (can contain up to 512 redistributors) */
> -    [VIRT_HIGH_GIC_REDIST2] =   { 0x4000000000ULL, 0x4000000 },
> -    [VIRT_HIGH_PCIE_ECAM] =     { 0x4010000000ULL, 0x10000000 },
> -    /* Second PCIe window, 512GB wide at the 512GB boundary */
> -    [VIRT_HIGH_PCIE_MMIO] =     { 0x8000000000ULL, 0x8000000000ULL },
> +    [VIRT_HIGH_GIC_REDIST2] =   { 0x0, 64 * MiB },
> +    [VIRT_HIGH_PCIE_ECAM] =     { 0x0, 256 * MiB },
> +    /* Second PCIe window */
> +    [VIRT_HIGH_PCIE_MMIO] =     { 0x0, 512 * GiB },
>  };
>  
>  static const int a15irqmap[] = {
> @@ -1354,6 +1365,30 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx)
>      return arm_cpu_mp_affinity(idx, clustersz);
>  }
>  
> +static void virt_set_memmap(VirtMachineState *vms)
> +{
> +    hwaddr base;
> +    int i;
> +
> +    vms->memmap = extended_memmap;
I probably don't see something but ...

> +
> +    for (i = 0; i < ARRAY_SIZE(base_memmap); i++) {
> +        vms->memmap[i] = base_memmap[i];

ARRAY_SIZE(base_memmap) > 3
ARRAY_SIZE(extended_memmap) == 3
as a result, shouldn't we observe an out-of-bounds access at vms->memmap[i]
starting from i==3?

> +    }
> +
> +    vms->high_io_base = 256 * GiB; /* Top of the legacy initial RAM region */
> +    base = vms->high_io_base;
> +
> +    for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
Not sure why VIRT_LOWMEMMAP_LAST is needed at all; one could just continue
with the current 'i' value, provided extended_memmap wasn't corrupted by the
previous loop.
And does this loop ever execute? VIRT_LOWMEMMAP_LAST > ARRAY_SIZE(extended_memmap)

> +        hwaddr size = extended_memmap[i].size;
> +
> +        base = ROUND_UP(base, size);
> +        vms->memmap[i].base = base;
> +        vms->memmap[i].size = size;
> +        base += size;
> +    }
> +}
> +
>  static void machvirt_init(MachineState *machine)
>  {
>      VirtMachineState *vms = VIRT_MACHINE(machine);
> @@ -1368,6 +1403,8 @@ static void machvirt_init(MachineState *machine)
>      bool firmware_loaded = bios_name || drive_get(IF_PFLASH, 0, 0);
>      bool aarch64 = true;
>  
> +    virt_set_memmap(vms);
> +
>      /* We can probe only here because during property set
>       * KVM is not available yet
>       */
> @@ -1843,7 +1880,6 @@ static void virt_instance_init(Object *obj)
>                                      "Valid values are none and smmuv3",
>                                      NULL);
>  
> -    vms->memmap = a15memmap;
>      vms->irqmap = a15irqmap;
>  }
>  
> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
> index a27086d524..3dc7a6c5d5 100644
> --- a/include/hw/arm/virt.h
> +++ b/include/hw/arm/virt.h
> @@ -64,7 +64,6 @@ enum {
>      VIRT_GIC_VCPU,
>      VIRT_GIC_ITS,
>      VIRT_GIC_REDIST,
> -    VIRT_HIGH_GIC_REDIST2,
>      VIRT_SMMU,
>      VIRT_UART,
>      VIRT_MMIO,
> @@ -74,12 +73,18 @@ enum {
>      VIRT_PCIE_MMIO,
>      VIRT_PCIE_PIO,
>      VIRT_PCIE_ECAM,
> -    VIRT_HIGH_PCIE_ECAM,
>      VIRT_PLATFORM_BUS,
> -    VIRT_HIGH_PCIE_MMIO,
>      VIRT_GPIO,
>      VIRT_SECURE_UART,
>      VIRT_SECURE_MEM,
> +    VIRT_LOWMEMMAP_LAST,
> +};
> +
> +/* indices of IO regions located after the RAM */
> +enum {
> +    VIRT_HIGH_GIC_REDIST2 =  VIRT_LOWMEMMAP_LAST,
> +    VIRT_HIGH_PCIE_ECAM,
> +    VIRT_HIGH_PCIE_MMIO,
>  };
>  
>  typedef enum VirtIOMMUType {
> @@ -116,7 +121,7 @@ typedef struct {
>      int32_t gic_version;
>      VirtIOMMUType iommu;
>      struct arm_boot_info bootinfo;
> -    const MemMapEntry *memmap;
> +    MemMapEntry *memmap;
>      const int *irqmap;
>      int smp_cpus;
>      void *fdt;
> @@ -126,6 +131,7 @@ typedef struct {
>      uint32_t msi_phandle;
>      uint32_t iommu_phandle;
>      int psci_conduit;
> +    hwaddr high_io_base;
>  } VirtMachineState;
>  
>  #define VIRT_ECAM_ID(high) (high ? VIRT_HIGH_PCIE_ECAM : VIRT_PCIE_ECAM)


* Re: [Qemu-devel] [PATCH v7 03/17] hw/arm/virt: Split the memory map description
  2019-02-21 16:19   ` Igor Mammedov
@ 2019-02-21 17:21     ` Auger Eric
  2019-02-22 10:15       ` Igor Mammedov
  0 siblings, 1 reply; 63+ messages in thread
From: Auger Eric @ 2019-02-21 17:21 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: peter.maydell, drjones, david, qemu-devel,
	shameerali.kolothum.thodi, dgilbert, qemu-arm, david,
	eric.auger.pro

Hi Igor,
On 2/21/19 5:19 PM, Igor Mammedov wrote:
> On Wed, 20 Feb 2019 23:39:49 +0100
> Eric Auger <eric.auger@redhat.com> wrote:
> 
>> In the prospect to introduce an extended memory map supporting more
>> RAM, let's split the memory map array into two parts:
>>
>> - the former a15memmap contains regions below and including the RAM
> 
>> - extended_memmap, only initialized with entries located after the RAM.
>>   Only the size of the region is initialized there since their base
>>   address will be dynamically computed, depending on the top of the
>>   RAM (initial RAM at the moment), with same alignment as their size.
> can't parse this part and pinpoint what is 'their', care to rephrase?
Only the size of the High IO region entries is initialized (there are
currently 3 entries:  VIRT_HIGH_GIC_REDIST2, VIRT_HIGH_PCIE_ECAM,
VIRT_HIGH_PCIE_MMIO). The base address is dynamically computed so it is
not initialized.
> 
> 
>> This new split will allow to grow the RAM size without changing the
>> description of the high regions.
>>
>> The patch also moves the memory map setup
> s/moves/makes/
> s/$/dynamic and moves it/
> 
>> into machvirt_init().
> 
>> The rationale is the memory map will be soon affected by the
> 
>> kvm_type() call that happens after virt_instance_init() and
> is dependency on kvm_type() still valid,
> shouldn't split memmap work for TCG just fine as well?
See patch 08/17: in TCG mode the memory map will be "frozen" (set_memmap)
in machvirt_init(). Otherwise set_memmap is called from kvm_type().

Split memmap works both in TCG and in accelerated mode.

I will rephrase the commit message.
> 
>> before machvirt_init().
>>
>> The memory map is unchanged (the top of the initial RAM still is
>> 256GiB). Then come the high IO regions with same layout as before.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
>>
>> ---
>> v6 -> v7:
>> - s/a15memmap/base_memmap
>> - slight rewording of the commit message
>> - add "if there is less than 256GiB of RAM then the floating area
>>   starts at the 256GiB mark" in the comment associated to the floating
>>   memory map
>> - Added Peter's R-b
>>
>> v5 -> v6
>> - removal of many macros in units.h
>> - introduce the virt_set_memmap helper
>> - new computation for offsets of high IO regions
>> - add comments
>> ---
>>  hw/arm/virt.c         | 48 +++++++++++++++++++++++++++++++++++++------
>>  include/hw/arm/virt.h | 14 +++++++++----
>>  2 files changed, 52 insertions(+), 10 deletions(-)
>>
>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>> index a1955e7764..12039a0367 100644
>> --- a/hw/arm/virt.c
>> +++ b/hw/arm/virt.c
>> @@ -29,6 +29,7 @@
>>   */
>>  
>>  #include "qemu/osdep.h"
>> +#include "qemu/units.h"
>>  #include "qapi/error.h"
>>  #include "hw/sysbus.h"
>>  #include "hw/arm/arm.h"
>> @@ -121,7 +122,7 @@
>>   * Note that devices should generally be placed at multiples of 0x10000,
>>   * to accommodate guests using 64K pages.
>>   */
>> -static const MemMapEntry a15memmap[] = {
>> +static const MemMapEntry base_memmap[] = {
>>      /* Space up to 0x8000000 is reserved for a boot ROM */
>>      [VIRT_FLASH] =              {          0, 0x08000000 },
>>      [VIRT_CPUPERIPHS] =         { 0x08000000, 0x00020000 },
>> @@ -149,11 +150,21 @@ static const MemMapEntry a15memmap[] = {
>>      [VIRT_PCIE_PIO] =           { 0x3eff0000, 0x00010000 },
>>      [VIRT_PCIE_ECAM] =          { 0x3f000000, 0x01000000 },
>>      [VIRT_MEM] =                { 0x40000000, RAMLIMIT_BYTES },
>> +};
>> +
>> +/*
>> + * Highmem IO Regions: This memory map is floating, located after the RAM.
>> + * Each IO region offset will be dynamically computed, depending on the
> s/IO region offset/MemMapEntry base (GPA)/
>> + * top of the RAM, so that its base get the same alignment as the size,
> 
>> + * ie. a 512GiB region will be aligned on a 512GiB boundary. If there is
> s/region/entry/
> 
>> + * less than 256GiB of RAM, the floating area starts at the 256GiB mark.
>> + */
>> +static MemMapEntry extended_memmap[] = {
>>      /* Additional 64 MB redist region (can contain up to 512 redistributors) */
>> -    [VIRT_HIGH_GIC_REDIST2] =   { 0x4000000000ULL, 0x4000000 },
>> -    [VIRT_HIGH_PCIE_ECAM] =     { 0x4010000000ULL, 0x10000000 },
>> -    /* Second PCIe window, 512GB wide at the 512GB boundary */
>> -    [VIRT_HIGH_PCIE_MMIO] =     { 0x8000000000ULL, 0x8000000000ULL },
>> +    [VIRT_HIGH_GIC_REDIST2] =   { 0x0, 64 * MiB },
>> +    [VIRT_HIGH_PCIE_ECAM] =     { 0x0, 256 * MiB },
>> +    /* Second PCIe window */
>> +    [VIRT_HIGH_PCIE_MMIO] =     { 0x0, 512 * GiB },
>>  };
>>  
>>  static const int a15irqmap[] = {
>> @@ -1354,6 +1365,30 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx)
>>      return arm_cpu_mp_affinity(idx, clustersz);
>>  }
>>  
>> +static void virt_set_memmap(VirtMachineState *vms)
>> +{
>> +    hwaddr base;
>> +    int i;
>> +
>> +    vms->memmap = extended_memmap;
> I probably don't see something but ...
> 
>> +
>> +    for (i = 0; i < ARRAY_SIZE(base_memmap); i++) {
>> +        vms->memmap[i] = base_memmap[i];
> 
> ARRAY_SIZE(base_memmap) > 3
> ARRAY_SIZE(extended_memmap) == 3
> as result shouldn't we observe OOB at vms->memmap[i] access
> starting from i==3 ?
ARRAY_SIZE(extended_memmap) = ARRAY_SIZE(base_memmap) + 3

VIRT_HIGH_GIC_REDIST2 = VIRT_LOWMEMMAP_LAST is what you are missing.

/* indices of IO regions located after the RAM */
enum {
    VIRT_HIGH_GIC_REDIST2 =  VIRT_LOWMEMMAP_LAST,
    VIRT_HIGH_PCIE_ECAM,
    VIRT_HIGH_PCIE_MMIO,
};
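To make the point concrete, here is a small standalone sketch (not QEMU code; the _x-suffixed names and the reduced low-map enum are hypothetical): because the second enum's first member is pinned to VIRT_LOWMEMMAP_LAST, the designated initializers in the extended map force the array to span the low-map indices as well, so copying base_memmap into vms->memmap stays in bounds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE_X(a) (sizeof(a) / sizeof((a)[0]))

typedef struct { uint64_t base, size; } MemMapEntryX;

/* Reduced low-map enum: one real entry instead of QEMU's full list. */
enum { VIRT_MEM_X, VIRT_LOWMEMMAP_LAST_X };

/* High-map indices continue where the low map stopped. */
enum {
    VIRT_HIGH_GIC_REDIST2_X = VIRT_LOWMEMMAP_LAST_X,
    VIRT_HIGH_PCIE_ECAM_X,
    VIRT_HIGH_PCIE_MMIO_X,
};

/* Because the designated initializers below start at
 * VIRT_LOWMEMMAP_LAST_X, the array automatically gets
 * ARRAY_SIZE(low map) + 3 elements: the low-map slots exist too,
 * they are just zero-filled until virt_set_memmap() copies
 * base_memmap over them. */
static MemMapEntryX extended_memmap_x[] = {
    [VIRT_HIGH_GIC_REDIST2_X] = { 0, 64ULL << 20 },   /*  64 MiB */
    [VIRT_HIGH_PCIE_ECAM_X]   = { 0, 256ULL << 20 },  /* 256 MiB */
    [VIRT_HIGH_PCIE_MMIO_X]   = { 0, 512ULL << 30 },  /* 512 GiB */
};
```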

> 
>> +    }
>> +
>> +    vms->high_io_base = 256 * GiB; /* Top of the legacy initial RAM region */
>> +    base = vms->high_io_base;
>> +
>> +    for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
> not sure why VIRT_LOWMEMMAP_LAST is needed at all, one could just continue
> with current 'i' value, provided extended_memmap wasn't corrupted by previous
> loop.
Yep, maybe. But I think it is less error-prone this way if someone later
adds some intermediate manipulation of i.
> And does this loop ever executes? VIRT_LOWMEMMAP_LAST > ARRAY_SIZE(extended_memmap)
yes it does
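For reference, the placement rule the loop implements can be checked by hand with a standalone sketch (hypothetical _x names, not the QEMU implementation): each floating region's base is rounded up to its own size before being assigned, starting from the 256GiB mark.

```c
#include <assert.h>
#include <stdint.h>

#define ROUND_UP_X(n, d) ((((n) + (d) - 1) / (d)) * (d))
#define MiB_X (1ULL << 20)
#define GiB_X (1ULL << 30)

/* Sketch of the virt_set_memmap() placement loop: every floating
 * region gets a base aligned to its own size, so e.g. the 512GiB
 * second PCIe window always lands on a 512GiB boundary. */
void place_high_regions_x(uint64_t start, const uint64_t *sizes,
                          uint64_t *bases, int n)
{
    uint64_t base = start;
    int i;

    for (i = 0; i < n; i++) {
        base = ROUND_UP_X(base, sizes[i]); /* align base to the size */
        bases[i] = base;
        base += sizes[i];
    }
}
```

With the v7 sizes (64MiB, 256MiB, 512GiB) and a 256GiB start, this yields bases of 256GiB, 256GiB + 256MiB, and 512GiB, matching the layout described in the commit message.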

Thanks

Eric
> 
>> +        hwaddr size = extended_memmap[i].size;
>> +
>> +        base = ROUND_UP(base, size);
>> +        vms->memmap[i].base = base;
>> +        vms->memmap[i].size = size;
>> +        base += size;
>> +    }
>> +}
>> +
>>  static void machvirt_init(MachineState *machine)
>>  {
>>      VirtMachineState *vms = VIRT_MACHINE(machine);
>> @@ -1368,6 +1403,8 @@ static void machvirt_init(MachineState *machine)
>>      bool firmware_loaded = bios_name || drive_get(IF_PFLASH, 0, 0);
>>      bool aarch64 = true;
>>  
>> +    virt_set_memmap(vms);
>> +
>>      /* We can probe only here because during property set
>>       * KVM is not available yet
>>       */
>> @@ -1843,7 +1880,6 @@ static void virt_instance_init(Object *obj)
>>                                      "Valid values are none and smmuv3",
>>                                      NULL);
>>  
>> -    vms->memmap = a15memmap;
>>      vms->irqmap = a15irqmap;
>>  }
>>  
>> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
>> index a27086d524..3dc7a6c5d5 100644
>> --- a/include/hw/arm/virt.h
>> +++ b/include/hw/arm/virt.h
>> @@ -64,7 +64,6 @@ enum {
>>      VIRT_GIC_VCPU,
>>      VIRT_GIC_ITS,
>>      VIRT_GIC_REDIST,
>> -    VIRT_HIGH_GIC_REDIST2,
>>      VIRT_SMMU,
>>      VIRT_UART,
>>      VIRT_MMIO,
>> @@ -74,12 +73,18 @@ enum {
>>      VIRT_PCIE_MMIO,
>>      VIRT_PCIE_PIO,
>>      VIRT_PCIE_ECAM,
>> -    VIRT_HIGH_PCIE_ECAM,
>>      VIRT_PLATFORM_BUS,
>> -    VIRT_HIGH_PCIE_MMIO,
>>      VIRT_GPIO,
>>      VIRT_SECURE_UART,
>>      VIRT_SECURE_MEM,
>> +    VIRT_LOWMEMMAP_LAST,
>> +};
>> +
>> +/* indices of IO regions located after the RAM */
>> +enum {
>> +    VIRT_HIGH_GIC_REDIST2 =  VIRT_LOWMEMMAP_LAST,
>> +    VIRT_HIGH_PCIE_ECAM,
>> +    VIRT_HIGH_PCIE_MMIO,
>>  };
>>  
>>  typedef enum VirtIOMMUType {
>> @@ -116,7 +121,7 @@ typedef struct {
>>      int32_t gic_version;
>>      VirtIOMMUType iommu;
>>      struct arm_boot_info bootinfo;
>> -    const MemMapEntry *memmap;
>> +    MemMapEntry *memmap;
>>      const int *irqmap;
>>      int smp_cpus;
>>      void *fdt;
>> @@ -126,6 +131,7 @@ typedef struct {
>>      uint32_t msi_phandle;
>>      uint32_t iommu_phandle;
>>      int psci_conduit;
>> +    hwaddr high_io_base;
>>  } VirtMachineState;
>>  
>>  #define VIRT_ECAM_ID(high) (high ? VIRT_HIGH_PCIE_ECAM : VIRT_PCIE_ECAM)
> 
> 


* Re: [Qemu-devel] [PATCH v7 03/17] hw/arm/virt: Split the memory map description
  2019-02-20 22:39 ` [Qemu-devel] [PATCH v7 03/17] hw/arm/virt: Split the memory map description Eric Auger
  2019-02-21 16:19   ` Igor Mammedov
@ 2019-02-22  7:34   ` Heyi Guo
  2019-02-22  8:08     ` Auger Eric
  1 sibling, 1 reply; 63+ messages in thread
From: Heyi Guo @ 2019-02-22  7:34 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, imammedo, david
  Cc: drjones, dgilbert, david

Hi Eric,

Can't we still use one single memory map and update the base of every entry following VIRT_MEM, so that we don't need to split the memory map or the enumeration definition, nor copy a15memmap into the extended memmap?

Thanks,

Heyi


On 2019/2/21 6:39, Eric Auger wrote:
> In the prospect to introduce an extended memory map supporting more
> RAM, let's split the memory map array into two parts:
>
> - the former a15memmap contains regions below and including the RAM
> - extended_memmap, only initialized with entries located after the RAM.
>    Only the size of the region is initialized there since their base
>    address will be dynamically computed, depending on the top of the
>    RAM (initial RAM at the moment), with same alignment as their size.
>
> This new split will allow to grow the RAM size without changing the
> description of the high regions.
>
> The patch also moves the memory map setup into machvirt_init().
> The rationale is the memory map will be soon affected by the
> kvm_type() call that happens after virt_instance_init() and
> before machvirt_init().
>
> The memory map is unchanged (the top of the initial RAM still is
> 256GiB). Then come the high IO regions with same layout as before.
>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
>
> ---
> v6 -> v7:
> - s/a15memmap/base_memmap
> - slight rewording of the commit message
> - add "if there is less than 256GiB of RAM then the floating area
>    starts at the 256GiB mark" in the comment associated to the floating
>    memory map
> - Added Peter's R-b
>
> v5 -> v6
> - removal of many macros in units.h
> - introduce the virt_set_memmap helper
> - new computation for offsets of high IO regions
> - add comments
> ---
>   hw/arm/virt.c         | 48 +++++++++++++++++++++++++++++++++++++------
>   include/hw/arm/virt.h | 14 +++++++++----
>   2 files changed, 52 insertions(+), 10 deletions(-)
>
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index a1955e7764..12039a0367 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -29,6 +29,7 @@
>    */
>   
>   #include "qemu/osdep.h"
> +#include "qemu/units.h"
>   #include "qapi/error.h"
>   #include "hw/sysbus.h"
>   #include "hw/arm/arm.h"
> @@ -121,7 +122,7 @@
>    * Note that devices should generally be placed at multiples of 0x10000,
>    * to accommodate guests using 64K pages.
>    */
> -static const MemMapEntry a15memmap[] = {
> +static const MemMapEntry base_memmap[] = {
>       /* Space up to 0x8000000 is reserved for a boot ROM */
>       [VIRT_FLASH] =              {          0, 0x08000000 },
>       [VIRT_CPUPERIPHS] =         { 0x08000000, 0x00020000 },
> @@ -149,11 +150,21 @@ static const MemMapEntry a15memmap[] = {
>       [VIRT_PCIE_PIO] =           { 0x3eff0000, 0x00010000 },
>       [VIRT_PCIE_ECAM] =          { 0x3f000000, 0x01000000 },
>       [VIRT_MEM] =                { 0x40000000, RAMLIMIT_BYTES },
> +};
> +
> +/*
> + * Highmem IO Regions: This memory map is floating, located after the RAM.
> + * Each IO region offset will be dynamically computed, depending on the
> + * top of the RAM, so that its base get the same alignment as the size,
> + * ie. a 512GiB region will be aligned on a 512GiB boundary. If there is
> + * less than 256GiB of RAM, the floating area starts at the 256GiB mark.
> + */
> +static MemMapEntry extended_memmap[] = {
>       /* Additional 64 MB redist region (can contain up to 512 redistributors) */
> -    [VIRT_HIGH_GIC_REDIST2] =   { 0x4000000000ULL, 0x4000000 },
> -    [VIRT_HIGH_PCIE_ECAM] =     { 0x4010000000ULL, 0x10000000 },
> -    /* Second PCIe window, 512GB wide at the 512GB boundary */
> -    [VIRT_HIGH_PCIE_MMIO] =     { 0x8000000000ULL, 0x8000000000ULL },
> +    [VIRT_HIGH_GIC_REDIST2] =   { 0x0, 64 * MiB },
> +    [VIRT_HIGH_PCIE_ECAM] =     { 0x0, 256 * MiB },
> +    /* Second PCIe window */
> +    [VIRT_HIGH_PCIE_MMIO] =     { 0x0, 512 * GiB },
>   };
>   
>   static const int a15irqmap[] = {
> @@ -1354,6 +1365,30 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx)
>       return arm_cpu_mp_affinity(idx, clustersz);
>   }
>   
> +static void virt_set_memmap(VirtMachineState *vms)
> +{
> +    hwaddr base;
> +    int i;
> +
> +    vms->memmap = extended_memmap;
> +
> +    for (i = 0; i < ARRAY_SIZE(base_memmap); i++) {
> +        vms->memmap[i] = base_memmap[i];
> +    }
> +
> +    vms->high_io_base = 256 * GiB; /* Top of the legacy initial RAM region */
> +    base = vms->high_io_base;
> +
> +    for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
> +        hwaddr size = extended_memmap[i].size;
> +
> +        base = ROUND_UP(base, size);
> +        vms->memmap[i].base = base;
> +        vms->memmap[i].size = size;
> +        base += size;
> +    }
> +}
> +
>   static void machvirt_init(MachineState *machine)
>   {
>       VirtMachineState *vms = VIRT_MACHINE(machine);
> @@ -1368,6 +1403,8 @@ static void machvirt_init(MachineState *machine)
>       bool firmware_loaded = bios_name || drive_get(IF_PFLASH, 0, 0);
>       bool aarch64 = true;
>   
> +    virt_set_memmap(vms);
> +
>       /* We can probe only here because during property set
>        * KVM is not available yet
>        */
> @@ -1843,7 +1880,6 @@ static void virt_instance_init(Object *obj)
>                                       "Valid values are none and smmuv3",
>                                       NULL);
>   
> -    vms->memmap = a15memmap;
>       vms->irqmap = a15irqmap;
>   }
>   
> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
> index a27086d524..3dc7a6c5d5 100644
> --- a/include/hw/arm/virt.h
> +++ b/include/hw/arm/virt.h
> @@ -64,7 +64,6 @@ enum {
>       VIRT_GIC_VCPU,
>       VIRT_GIC_ITS,
>       VIRT_GIC_REDIST,
> -    VIRT_HIGH_GIC_REDIST2,
>       VIRT_SMMU,
>       VIRT_UART,
>       VIRT_MMIO,
> @@ -74,12 +73,18 @@ enum {
>       VIRT_PCIE_MMIO,
>       VIRT_PCIE_PIO,
>       VIRT_PCIE_ECAM,
> -    VIRT_HIGH_PCIE_ECAM,
>       VIRT_PLATFORM_BUS,
> -    VIRT_HIGH_PCIE_MMIO,
>       VIRT_GPIO,
>       VIRT_SECURE_UART,
>       VIRT_SECURE_MEM,
> +    VIRT_LOWMEMMAP_LAST,
> +};
> +
> +/* indices of IO regions located after the RAM */
> +enum {
> +    VIRT_HIGH_GIC_REDIST2 =  VIRT_LOWMEMMAP_LAST,
> +    VIRT_HIGH_PCIE_ECAM,
> +    VIRT_HIGH_PCIE_MMIO,
>   };
>   
>   typedef enum VirtIOMMUType {
> @@ -116,7 +121,7 @@ typedef struct {
>       int32_t gic_version;
>       VirtIOMMUType iommu;
>       struct arm_boot_info bootinfo;
> -    const MemMapEntry *memmap;
> +    MemMapEntry *memmap;
>       const int *irqmap;
>       int smp_cpus;
>       void *fdt;
> @@ -126,6 +131,7 @@ typedef struct {
>       uint32_t msi_phandle;
>       uint32_t iommu_phandle;
>       int psci_conduit;
> +    hwaddr high_io_base;
>   } VirtMachineState;
>   
>   #define VIRT_ECAM_ID(high) (high ? VIRT_HIGH_PCIE_ECAM : VIRT_PCIE_ECAM)

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v7 03/17] hw/arm/virt: Split the memory map description
  2019-02-22  7:34   ` Heyi Guo
@ 2019-02-22  8:08     ` Auger Eric
  0 siblings, 0 replies; 63+ messages in thread
From: Auger Eric @ 2019-02-22  8:08 UTC (permalink / raw)
  To: Heyi Guo, eric.auger.pro, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, imammedo, david
  Cc: drjones, dgilbert, david

Hi Heyi,

On 2/22/19 8:34 AM, Heyi Guo wrote:
> Hi Eric,
> 
> Can't we still use one single memory map and update the base of every
> entry following VIRT_MEM? That way we wouldn't need to split the memory
> map or the enumeration definition, nor copy a15memmap into the extended
> memmap.

I made the decision to have 2 separate arrays since we have:
- one array with static base address entries (initialized with both the
base address and the size)
- another array with floating base address entries (only the size is
initialized)

To me this makes it clearer what is static and what is dynamically
computed (same for the enum).

Thanks

Eric
> 
> Thanks,
> 
> Heyi
> 
> 
> On 2019/2/21 6:39, Eric Auger wrote:
>> In preparation for introducing an extended memory map supporting more
>> RAM, let's split the memory map array into two parts:
>>
>> - the former a15memmap contains regions below and including the RAM
>> - extended_memmap, only initialized with entries located after the RAM.
>>    Only the size of the region is initialized there since their base
>>    address will be dynamically computed, depending on the top of the
>>    RAM (initial RAM at the moment), with same alignment as their size.
>>
>> This new split will allow growing the RAM size without changing the
>> description of the high regions.
>>
>> The patch also moves the memory map setup into machvirt_init().
>> The rationale is the memory map will be soon affected by the
>> kvm_type() call that happens after virt_instance_init() and
>> before machvirt_init().
>>
>> The memory map is unchanged (the top of the initial RAM still is
>> 256GiB). Then come the high IO regions with same layout as before.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
>>
>> ---
>> v6 -> v7:
>> - s/a15memmap/base_memmap
>> - slight rewording of the commit message
>> - add "if there is less than 256GiB of RAM then the floating area
>>    starts at the 256GiB mark" in the comment associated to the floating
>>    memory map
>> - Added Peter's R-b
>>
>> v5 -> v6
>> - removal of many macros in units.h
>> - introduce the virt_set_memmap helper
>> - new computation for offsets of high IO regions
>> - add comments
>> ---
>>   hw/arm/virt.c         | 48 +++++++++++++++++++++++++++++++++++++------
>>   include/hw/arm/virt.h | 14 +++++++++----
>>   2 files changed, 52 insertions(+), 10 deletions(-)
>>
>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>> index a1955e7764..12039a0367 100644
>> --- a/hw/arm/virt.c
>> +++ b/hw/arm/virt.c
>> @@ -29,6 +29,7 @@
>>    */
>>     #include "qemu/osdep.h"
>> +#include "qemu/units.h"
>>   #include "qapi/error.h"
>>   #include "hw/sysbus.h"
>>   #include "hw/arm/arm.h"
>> @@ -121,7 +122,7 @@
>>    * Note that devices should generally be placed at multiples of
>> 0x10000,
>>    * to accommodate guests using 64K pages.
>>    */
>> -static const MemMapEntry a15memmap[] = {
>> +static const MemMapEntry base_memmap[] = {
>>       /* Space up to 0x8000000 is reserved for a boot ROM */
>>       [VIRT_FLASH] =              {          0, 0x08000000 },
>>       [VIRT_CPUPERIPHS] =         { 0x08000000, 0x00020000 },
>> @@ -149,11 +150,21 @@ static const MemMapEntry a15memmap[] = {
>>       [VIRT_PCIE_PIO] =           { 0x3eff0000, 0x00010000 },
>>       [VIRT_PCIE_ECAM] =          { 0x3f000000, 0x01000000 },
>>       [VIRT_MEM] =                { 0x40000000, RAMLIMIT_BYTES },
>> +};
>> +
>> +/*
>> + * Highmem IO Regions: This memory map is floating, located after the
>> RAM.
>> + * Each IO region offset will be dynamically computed, depending on the
>> + * top of the RAM, so that its base get the same alignment as the size,
>> + * ie. a 512GiB region will be aligned on a 512GiB boundary. If there is
>> + * less than 256GiB of RAM, the floating area starts at the 256GiB mark.
>> + */
>> +static MemMapEntry extended_memmap[] = {
>>       /* Additional 64 MB redist region (can contain up to 512
>> redistributors) */
>> -    [VIRT_HIGH_GIC_REDIST2] =   { 0x4000000000ULL, 0x4000000 },
>> -    [VIRT_HIGH_PCIE_ECAM] =     { 0x4010000000ULL, 0x10000000 },
>> -    /* Second PCIe window, 512GB wide at the 512GB boundary */
>> -    [VIRT_HIGH_PCIE_MMIO] =     { 0x8000000000ULL, 0x8000000000ULL },
>> +    [VIRT_HIGH_GIC_REDIST2] =   { 0x0, 64 * MiB },
>> +    [VIRT_HIGH_PCIE_ECAM] =     { 0x0, 256 * MiB },
>> +    /* Second PCIe window */
>> +    [VIRT_HIGH_PCIE_MMIO] =     { 0x0, 512 * GiB },
>>   };
>>     static const int a15irqmap[] = {
>> @@ -1354,6 +1365,30 @@ static uint64_t
>> virt_cpu_mp_affinity(VirtMachineState *vms, int idx)
>>       return arm_cpu_mp_affinity(idx, clustersz);
>>   }
>>   +static void virt_set_memmap(VirtMachineState *vms)
>> +{
>> +    hwaddr base;
>> +    int i;
>> +
>> +    vms->memmap = extended_memmap;
>> +
>> +    for (i = 0; i < ARRAY_SIZE(base_memmap); i++) {
>> +        vms->memmap[i] = base_memmap[i];
>> +    }
>> +
>> +    vms->high_io_base = 256 * GiB; /* Top of the legacy initial RAM
>> region */
>> +    base = vms->high_io_base;
>> +
>> +    for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap);
>> i++) {
>> +        hwaddr size = extended_memmap[i].size;
>> +
>> +        base = ROUND_UP(base, size);
>> +        vms->memmap[i].base = base;
>> +        vms->memmap[i].size = size;
>> +        base += size;
>> +    }
>> +}
>> +
>>   static void machvirt_init(MachineState *machine)
>>   {
>>       VirtMachineState *vms = VIRT_MACHINE(machine);
>> @@ -1368,6 +1403,8 @@ static void machvirt_init(MachineState *machine)
>>       bool firmware_loaded = bios_name || drive_get(IF_PFLASH, 0, 0);
>>       bool aarch64 = true;
>>   +    virt_set_memmap(vms);
>> +
>>       /* We can probe only here because during property set
>>        * KVM is not available yet
>>        */
>> @@ -1843,7 +1880,6 @@ static void virt_instance_init(Object *obj)
>>                                       "Valid values are none and smmuv3",
>>                                       NULL);
>>   -    vms->memmap = a15memmap;
>>       vms->irqmap = a15irqmap;
>>   }
>>   diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
>> index a27086d524..3dc7a6c5d5 100644
>> --- a/include/hw/arm/virt.h
>> +++ b/include/hw/arm/virt.h
>> @@ -64,7 +64,6 @@ enum {
>>       VIRT_GIC_VCPU,
>>       VIRT_GIC_ITS,
>>       VIRT_GIC_REDIST,
>> -    VIRT_HIGH_GIC_REDIST2,
>>       VIRT_SMMU,
>>       VIRT_UART,
>>       VIRT_MMIO,
>> @@ -74,12 +73,18 @@ enum {
>>       VIRT_PCIE_MMIO,
>>       VIRT_PCIE_PIO,
>>       VIRT_PCIE_ECAM,
>> -    VIRT_HIGH_PCIE_ECAM,
>>       VIRT_PLATFORM_BUS,
>> -    VIRT_HIGH_PCIE_MMIO,
>>       VIRT_GPIO,
>>       VIRT_SECURE_UART,
>>       VIRT_SECURE_MEM,
>> +    VIRT_LOWMEMMAP_LAST,
>> +};
>> +
>> +/* indices of IO regions located after the RAM */
>> +enum {
>> +    VIRT_HIGH_GIC_REDIST2 =  VIRT_LOWMEMMAP_LAST,
>> +    VIRT_HIGH_PCIE_ECAM,
>> +    VIRT_HIGH_PCIE_MMIO,
>>   };
>>     typedef enum VirtIOMMUType {
>> @@ -116,7 +121,7 @@ typedef struct {
>>       int32_t gic_version;
>>       VirtIOMMUType iommu;
>>       struct arm_boot_info bootinfo;
>> -    const MemMapEntry *memmap;
>> +    MemMapEntry *memmap;
>>       const int *irqmap;
>>       int smp_cpus;
>>       void *fdt;
>> @@ -126,6 +131,7 @@ typedef struct {
>>       uint32_t msi_phandle;
>>       uint32_t iommu_phandle;
>>       int psci_conduit;
>> +    hwaddr high_io_base;
>>   } VirtMachineState;
>>     #define VIRT_ECAM_ID(high) (high ? VIRT_HIGH_PCIE_ECAM :
>> VIRT_PCIE_ECAM)
> 
> 

^ permalink raw reply	[flat|nested] 63+ messages in thread
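As an aside, the base computation Eric describes can be checked with a small standalone sketch. This is not QEMU code: compute_bases() is a hypothetical helper mirroring the loop in virt_set_memmap(), and ROUND_UP, MiB and GiB are redefined locally to stand in for QEMU's macros. Starting from the 256GiB mark and aligning each floating region's base to its own size reproduces the fixed addresses that were hard-coded in the former a15memmap:

```c
#include <assert.h>
#include <stdint.h>

#define ROUND_UP(n, d) ((((n) + (d) - 1) / (d)) * (d))
#define MiB (1024ULL * 1024)
#define GiB (1024ULL * MiB)

/* Hypothetical helper mirroring the loop in virt_set_memmap():
 * each region base is rounded up to the region's own size. */
static void compute_bases(uint64_t start, const uint64_t *sizes,
                          uint64_t *bases, int n)
{
    uint64_t base = start;
    for (int i = 0; i < n; i++) {
        base = ROUND_UP(base, sizes[i]);  /* align base to size */
        bases[i] = base;
        base += sizes[i];
    }
}
```

With the extended_memmap sizes (64 MiB, 256 MiB, 512 GiB) and a start of 256 GiB, this yields 0x4000000000, 0x4010000000 and 0x8000000000, i.e. exactly the legacy VIRT_HIGH_GIC_REDIST2, VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO addresses.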

* Re: [Qemu-devel] [PATCH v7 03/17] hw/arm/virt: Split the memory map description
  2019-02-21 17:21     ` Auger Eric
@ 2019-02-22 10:15       ` Igor Mammedov
  2019-02-22 14:28         ` Auger Eric
  0 siblings, 1 reply; 63+ messages in thread
From: Igor Mammedov @ 2019-02-22 10:15 UTC (permalink / raw)
  To: Auger Eric
  Cc: peter.maydell, drjones, david, qemu-devel,
	shameerali.kolothum.thodi, dgilbert, qemu-arm, david,
	eric.auger.pro

On Thu, 21 Feb 2019 18:21:11 +0100
Auger Eric <eric.auger@redhat.com> wrote:

> Hi Igor,
> On 2/21/19 5:19 PM, Igor Mammedov wrote:
> > On Wed, 20 Feb 2019 23:39:49 +0100
> > Eric Auger <eric.auger@redhat.com> wrote:
> >   
> >> In preparation for introducing an extended memory map supporting more
> >> RAM, let's split the memory map array into two parts:  
> >>
> >> - the former a15memmap contains regions below and including the RAM  
> >   
> >> - extended_memmap, only initialized with entries located after the RAM.
> >>   Only the size of the region is initialized there since their base
> >>   address will be dynamically computed, depending on the top of the
> >>   RAM (initial RAM at the moment), with same alignment as their size.  
> > can't parse this part and pinpoint what is 'their', care to rephrase?  
> Only the size of the High IO region entries is initialized (there are
> currently 3 entries:  VIRT_HIGH_GIC_REDIST2, VIRT_HIGH_PCIE_ECAM,
> VIRT_HIGH_PCIE_MMIO). The base address is dynamically computed so it is
> not initialized.
> > 
> >   
> >> This new split will allow growing the RAM size without changing the
> >> description of the high regions.
> >>
> >> The patch also moves the memory map setup  
> > s/moves/makes/
> > s/$/dynamic and moves it/
> >   
> >> into machvirt_init().  
> >   
> >> The rationale is the memory map will be soon affected by the  
> >   
> >> kvm_type() call that happens after virt_instance_init() and  
> > is dependency on kvm_type() still valid,
> > shouldn't split memmap work for TCG just fine as well?  
> See 08/17: in TCG mode the memory map will be "frozen" (virt_set_memmap)
> in machvirt_init. Otherwise virt_set_memmap is called from kvm_type().
> 
> Split memmap works both in TCG and in accelerated mode.
> 
> I will rephrase the commit message.
> >   
> >> before machvirt_init().
> >>
> >> The memory map is unchanged (the top of the initial RAM still is
> >> 256GiB). Then come the high IO regions with same layout as before.
> >>
> >> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> >> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
> >>
> >> ---
> >> v6 -> v7:
> >> - s/a15memmap/base_memmap
> >> - slight rewording of the commit message
> >> - add "if there is less than 256GiB of RAM then the floating area
> >>   starts at the 256GiB mark" in the comment associated to the floating
> >>   memory map
> >> - Added Peter's R-b
> >>
> >> v5 -> v6
> >> - removal of many macros in units.h
> >> - introduce the virt_set_memmap helper
> >> - new computation for offsets of high IO regions
> >> - add comments
> >> ---
> >>  hw/arm/virt.c         | 48 +++++++++++++++++++++++++++++++++++++------
> >>  include/hw/arm/virt.h | 14 +++++++++----
> >>  2 files changed, 52 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> >> index a1955e7764..12039a0367 100644
> >> --- a/hw/arm/virt.c
> >> +++ b/hw/arm/virt.c
> >> @@ -29,6 +29,7 @@
> >>   */
> >>  
> >>  #include "qemu/osdep.h"
> >> +#include "qemu/units.h"
> >>  #include "qapi/error.h"
> >>  #include "hw/sysbus.h"
> >>  #include "hw/arm/arm.h"
> >> @@ -121,7 +122,7 @@
> >>   * Note that devices should generally be placed at multiples of 0x10000,
> >>   * to accommodate guests using 64K pages.
> >>   */
> >> -static const MemMapEntry a15memmap[] = {
> >> +static const MemMapEntry base_memmap[] = {
> >>      /* Space up to 0x8000000 is reserved for a boot ROM */
> >>      [VIRT_FLASH] =              {          0, 0x08000000 },
> >>      [VIRT_CPUPERIPHS] =         { 0x08000000, 0x00020000 },
> >> @@ -149,11 +150,21 @@ static const MemMapEntry a15memmap[] = {
> >>      [VIRT_PCIE_PIO] =           { 0x3eff0000, 0x00010000 },
> >>      [VIRT_PCIE_ECAM] =          { 0x3f000000, 0x01000000 },
> >>      [VIRT_MEM] =                { 0x40000000, RAMLIMIT_BYTES },
> >> +};
> >> +
> >> +/*
> >> + * Highmem IO Regions: This memory map is floating, located after the RAM.
> >> + * Each IO region offset will be dynamically computed, depending on the  
> > s/IO region offset/MemMapEntry base (GPA)/  
> >> + * top of the RAM, so that its base get the same alignment as the size,  
> >   
> >> + * ie. a 512GiB region will be aligned on a 512GiB boundary. If there is  
> > s/region/entry/
> >   
> >> + * less than 256GiB of RAM, the floating area starts at the 256GiB mark.
> >> + */
> >> +static MemMapEntry extended_memmap[] = {
> >>      /* Additional 64 MB redist region (can contain up to 512 redistributors) */
> >> -    [VIRT_HIGH_GIC_REDIST2] =   { 0x4000000000ULL, 0x4000000 },
> >> -    [VIRT_HIGH_PCIE_ECAM] =     { 0x4010000000ULL, 0x10000000 },
> >> -    /* Second PCIe window, 512GB wide at the 512GB boundary */
> >> -    [VIRT_HIGH_PCIE_MMIO] =     { 0x8000000000ULL, 0x8000000000ULL },
> >> +    [VIRT_HIGH_GIC_REDIST2] =   { 0x0, 64 * MiB },
> >> +    [VIRT_HIGH_PCIE_ECAM] =     { 0x0, 256 * MiB },
> >> +    /* Second PCIe window */
> >> +    [VIRT_HIGH_PCIE_MMIO] =     { 0x0, 512 * GiB },
> >>  };
> >>  
> >>  static const int a15irqmap[] = {
> >> @@ -1354,6 +1365,30 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx)
> >>      return arm_cpu_mp_affinity(idx, clustersz);
> >>  }
> >>  
> >> +static void virt_set_memmap(VirtMachineState *vms)
> >> +{
> >> +    hwaddr base;
> >> +    int i;
> >> +
> >> +    vms->memmap = extended_memmap;  
> > I probably don't see something but ...
> >   
> >> +
> >> +    for (i = 0; i < ARRAY_SIZE(base_memmap); i++) {
> >> +        vms->memmap[i] = base_memmap[i];  
> > 
> > ARRAY_SIZE(base_memmap) > 3
> > ARRAY_SIZE(extended_memmap) == 3
> > as result shouldn't we observe OOB at vms->memmap[i] access
> > starting from i==3 ?  
> ARRAY_SIZE(extended_memmap) = ARRAY_SIZE(base_memmap) + 3
> VIRT_HIGH_GIC_REDIST2 =  VIRT_LOWMEMMAP_LAST is what you miss.
Yep, that's the trick.
It is too subtle for my taste.
Is it possible to make extended_memmap sizing more explicit/trivial,
so one could see it right away without having to figure out how the
indices influence the array size?


> /* indices of IO regions located after the RAM */
> enum {
>     VIRT_HIGH_GIC_REDIST2 =  VIRT_LOWMEMMAP_LAST,
>     VIRT_HIGH_PCIE_ECAM,
>     VIRT_HIGH_PCIE_MMIO,
> };
> 
> >   
> >> +    }
> >> +
> >> +    vms->high_io_base = 256 * GiB; /* Top of the legacy initial RAM region */
> >> +    base = vms->high_io_base;
> >> +
> >> +    for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {  
> > not sure why VIRT_LOWMEMMAP_LAST is needed at all, one could just continue
> > with current 'i' value, provided extended_memmap wasn't corrupted by previous
> > loop.  
> Yep maybe. But I think it is less error prone like this if someone later
> on adds some intermediate manipulation on i.
> > And does this loop ever execute? VIRT_LOWMEMMAP_LAST > ARRAY_SIZE(extended_memmap)  
> yes it does
> 
> Thanks
> 
> Eric
> >   
> >> +        hwaddr size = extended_memmap[i].size;
> >> +
> >> +        base = ROUND_UP(base, size);
> >> +        vms->memmap[i].base = base;
> >> +        vms->memmap[i].size = size;
> >> +        base += size;
> >> +    }
> >> +}
> >> +
> >>  static void machvirt_init(MachineState *machine)
> >>  {
> >>      VirtMachineState *vms = VIRT_MACHINE(machine);
> >> @@ -1368,6 +1403,8 @@ static void machvirt_init(MachineState *machine)
> >>      bool firmware_loaded = bios_name || drive_get(IF_PFLASH, 0, 0);
> >>      bool aarch64 = true;
> >>  
> >> +    virt_set_memmap(vms);
> >> +
> >>      /* We can probe only here because during property set
> >>       * KVM is not available yet
> >>       */
> >> @@ -1843,7 +1880,6 @@ static void virt_instance_init(Object *obj)
> >>                                      "Valid values are none and smmuv3",
> >>                                      NULL);
> >>  
> >> -    vms->memmap = a15memmap;
> >>      vms->irqmap = a15irqmap;
> >>  }
> >>  
> >> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
> >> index a27086d524..3dc7a6c5d5 100644
> >> --- a/include/hw/arm/virt.h
> >> +++ b/include/hw/arm/virt.h
> >> @@ -64,7 +64,6 @@ enum {
> >>      VIRT_GIC_VCPU,
> >>      VIRT_GIC_ITS,
> >>      VIRT_GIC_REDIST,
> >> -    VIRT_HIGH_GIC_REDIST2,
> >>      VIRT_SMMU,
> >>      VIRT_UART,
> >>      VIRT_MMIO,
> >> @@ -74,12 +73,18 @@ enum {
> >>      VIRT_PCIE_MMIO,
> >>      VIRT_PCIE_PIO,
> >>      VIRT_PCIE_ECAM,
> >> -    VIRT_HIGH_PCIE_ECAM,
> >>      VIRT_PLATFORM_BUS,
> >> -    VIRT_HIGH_PCIE_MMIO,
> >>      VIRT_GPIO,
> >>      VIRT_SECURE_UART,
> >>      VIRT_SECURE_MEM,
> >> +    VIRT_LOWMEMMAP_LAST,
> >> +};
> >> +
> >> +/* indices of IO regions located after the RAM */
> >> +enum {
> >> +    VIRT_HIGH_GIC_REDIST2 =  VIRT_LOWMEMMAP_LAST,
> >> +    VIRT_HIGH_PCIE_ECAM,
> >> +    VIRT_HIGH_PCIE_MMIO,
> >>  };
> >>  
> >>  typedef enum VirtIOMMUType {
> >> @@ -116,7 +121,7 @@ typedef struct {
> >>      int32_t gic_version;
> >>      VirtIOMMUType iommu;
> >>      struct arm_boot_info bootinfo;
> >> -    const MemMapEntry *memmap;
> >> +    MemMapEntry *memmap;
> >>      const int *irqmap;
> >>      int smp_cpus;
> >>      void *fdt;
> >> @@ -126,6 +131,7 @@ typedef struct {
> >>      uint32_t msi_phandle;
> >>      uint32_t iommu_phandle;
> >>      int psci_conduit;
> >> +    hwaddr high_io_base;
> >>  } VirtMachineState;
> >>  
> >>  #define VIRT_ECAM_ID(high) (high ? VIRT_HIGH_PCIE_ECAM : VIRT_PCIE_ECAM)  
> > 
> >   

^ permalink raw reply	[flat|nested] 63+ messages in thread
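The array-sizing trick that Igor finds subtle can be reproduced in isolation. In the following toy sketch (not QEMU code; all names are illustrative), the second enum continues the numbering of the first, so the designated initializers at the high indices force the compiler to size the array to cover both ranges:

```c
#include <assert.h>

/* Toy reproduction of the enum trick used for extended_memmap:
 * the second enum starts where the first one ends, so indices are
 * contiguous across both. */
enum { LOW_A, LOW_B, LOW_C, LOWMEMMAP_LAST };
enum { HIGH_X = LOWMEMMAP_LAST, HIGH_Y };

typedef struct { unsigned long long base, size; } Entry;

/* Designated initializers at the high indices make the array length
 * equal to the highest index + 1, leaving room for the low entries
 * to be copied in at runtime. */
static Entry ext_map[] = {
    [HIGH_X] = { 0, 64 },
    [HIGH_Y] = { 0, 256 },
};
```

This is why ARRAY_SIZE(extended_memmap) equals ARRAY_SIZE(base_memmap) + the number of high regions, and why copying base_memmap into vms->memmap does not overrun the array.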

* Re: [Qemu-devel] [PATCH v7 04/17] hw/boards: Add a MachineState parameter to kvm_type callback
  2019-02-20 22:39 ` [Qemu-devel] [PATCH v7 04/17] hw/boards: Add a MachineState parameter to kvm_type callback Eric Auger
@ 2019-02-22 10:18   ` Igor Mammedov
  0 siblings, 0 replies; 63+ messages in thread
From: Igor Mammedov @ 2019-02-22 10:18 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, david, dgilbert, david, drjones

On Wed, 20 Feb 2019 23:39:50 +0100
Eric Auger <eric.auger@redhat.com> wrote:

> On ARM, the kvm_type will be resolved by querying the KVMState.
> Let's add the MachineState handle to the callback so that we
> can retrieve the KVMState handle. In kvm_init, when the callback
> is called, the kvm_state variable is not yet set.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> Acked-by: David Gibson <david@gibson.dropbear.id.au>
> [ppc parts]
> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

Reviewed-by: Igor Mammedov <imammedo@redhat.com>

> 
> ---
> v6 -> v7:
> - add a comment for kvm_type
> - use machine instead of ms in the declaration
> - add Peter's R-b
> ---
>  accel/kvm/kvm-all.c   | 2 +-
>  hw/ppc/mac_newworld.c | 3 +--
>  hw/ppc/mac_oldworld.c | 2 +-
>  hw/ppc/spapr.c        | 2 +-
>  include/hw/boards.h   | 5 ++++-
>  5 files changed, 8 insertions(+), 6 deletions(-)
> 
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index fd92b6f375..241db496c3 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -1593,7 +1593,7 @@ static int kvm_init(MachineState *ms)
>  
>      kvm_type = qemu_opt_get(qemu_get_machine_opts(), "kvm-type");
>      if (mc->kvm_type) {
> -        type = mc->kvm_type(kvm_type);
> +        type = mc->kvm_type(ms, kvm_type);
>      } else if (kvm_type) {
>          ret = -EINVAL;
>          fprintf(stderr, "Invalid argument kvm-type=%s\n", kvm_type);
> diff --git a/hw/ppc/mac_newworld.c b/hw/ppc/mac_newworld.c
> index 98461052ac..97e8817145 100644
> --- a/hw/ppc/mac_newworld.c
> +++ b/hw/ppc/mac_newworld.c
> @@ -564,8 +564,7 @@ static char *core99_fw_dev_path(FWPathProvider *p, BusState *bus,
>  
>      return NULL;
>  }
> -
> -static int core99_kvm_type(const char *arg)
> +static int core99_kvm_type(MachineState *machine, const char *arg)
>  {
>      /* Always force PR KVM */
>      return 2;
> diff --git a/hw/ppc/mac_oldworld.c b/hw/ppc/mac_oldworld.c
> index 284431ddd6..cc1e463466 100644
> --- a/hw/ppc/mac_oldworld.c
> +++ b/hw/ppc/mac_oldworld.c
> @@ -420,7 +420,7 @@ static char *heathrow_fw_dev_path(FWPathProvider *p, BusState *bus,
>      return NULL;
>  }
>  
> -static int heathrow_kvm_type(const char *arg)
> +static int heathrow_kvm_type(MachineState *machine, const char *arg)
>  {
>      /* Always force PR KVM */
>      return 2;
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index abf9ebce59..3d0811fa81 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2920,7 +2920,7 @@ static void spapr_machine_init(MachineState *machine)
>      }
>  }
>  
> -static int spapr_kvm_type(const char *vm_type)
> +static int spapr_kvm_type(MachineState *machine, const char *vm_type)
>  {
>      if (!vm_type) {
>          return 0;
> diff --git a/include/hw/boards.h b/include/hw/boards.h
> index 05f9f45c3d..ed2fec82d5 100644
> --- a/include/hw/boards.h
> +++ b/include/hw/boards.h
> @@ -156,6 +156,9 @@ typedef struct {
>   *    should instead use "unimplemented-device" for all memory ranges where
>   *    the guest will attempt to probe for a device that QEMU doesn't
>   *    implement and a stub device is required.
> + * @kvm_type:
> + *    Return the type of KVM corresponding to the kvm-type string option or
> + *    computed based on other criteria such as the host kernel capabilities.
>   */
>  struct MachineClass {
>      /*< private >*/
> @@ -171,7 +174,7 @@ struct MachineClass {
>      void (*init)(MachineState *state);
>      void (*reset)(void);
>      void (*hot_add_cpu)(const int64_t id, Error **errp);
> -    int (*kvm_type)(const char *arg);
> +    int (*kvm_type)(MachineState *machine, const char *arg);
>  
>      BlockInterfaceType block_default_type;
>      int units_per_default_bus;

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v7 06/17] vl: Set machine ram_size, maxram_size and ram_slots earlier
  2019-02-20 22:39 ` [Qemu-devel] [PATCH v7 06/17] vl: Set machine ram_size, maxram_size and ram_slots earlier Eric Auger
@ 2019-02-22 10:40   ` Igor Mammedov
  0 siblings, 0 replies; 63+ messages in thread
From: Igor Mammedov @ 2019-02-22 10:40 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, david, dgilbert, david, drjones

On Wed, 20 Feb 2019 23:39:52 +0100
Eric Auger <eric.auger@redhat.com> wrote:

> The machine RAM attributes will need to be analyzed during the
> configure_accelerator() process. In particular, the arm64 machine
> kvm_type() callback will use them to know how many IPA/GPA bits are
> needed to model the whole RAM range. So let's assign those machine
> state fields before calling configure_accelerator().
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

Reviewed-by: Igor Mammedov <imammedo@redhat.com>

> 
> ---
> v6 -> v7:
> - add Peter's R-b
> 
> v4: new
> ---
>  vl.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/vl.c b/vl.c
> index 502857a176..fd0d51320d 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -4239,6 +4239,9 @@ int main(int argc, char **argv, char **envp)
>      machine_opts = qemu_get_machine_opts();
>      qemu_opt_foreach(machine_opts, machine_set_property, current_machine,
>                       &error_fatal);
> +    current_machine->ram_size = ram_size;
> +    current_machine->maxram_size = maxram_size;
> +    current_machine->ram_slots = ram_slots;
>  
>      configure_accelerator(current_machine, argv[0]);
>  
> @@ -4434,9 +4437,6 @@ int main(int argc, char **argv, char **envp)
>      replay_checkpoint(CHECKPOINT_INIT);
>      qdev_machine_init();
>  
> -    current_machine->ram_size = ram_size;
> -    current_machine->maxram_size = maxram_size;
> -    current_machine->ram_slots = ram_slots;
>      current_machine->boot_order = boot_order;
>  
>      /* parse features once if machine provides default cpu_type */

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v7 08/17] hw/arm/virt: Implement kvm_type function for 4.0 machine
  2019-02-20 22:39 ` [Qemu-devel] [PATCH v7 08/17] hw/arm/virt: Implement kvm_type function for 4.0 machine Eric Auger
@ 2019-02-22 12:45   ` Igor Mammedov
  2019-02-22 14:01     ` Auger Eric
  0 siblings, 1 reply; 63+ messages in thread
From: Igor Mammedov @ 2019-02-22 12:45 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, david, dgilbert, david, drjones

On Wed, 20 Feb 2019 23:39:54 +0100
Eric Auger <eric.auger@redhat.com> wrote:

> This patch implements the machine class kvm_type() callback.
> It returns the number of bits requested to implement the whole GPA
> range including the RAM and IO regions located beyond.
> The returned value is passed through the KVM_CREATE_VM ioctl and
> this allows KVM to set the stage2 tables dynamically.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> 
> ---
> 
> v6 -> v7:
> - Introduce RAMBASE and rename add LEGACY_ prefix in that patch
> - use local variables with explicit names in virt_set_memmap:
>   device_memory_base, device_memory_size
> - add an extended_memmap field in the class
> 
> v5 -> v6:
> - add some comments
> - high IO region cannot start before 256GiB
> ---
>  hw/arm/virt.c         | 50 ++++++++++++++++++++++++++++++++++++++++++-
>  include/hw/arm/virt.h |  2 ++
>  2 files changed, 51 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 9db602457b..ad3a0ad73d 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -1437,7 +1437,14 @@ static void machvirt_init(MachineState *machine)
>      bool firmware_loaded = bios_name || drive_get(IF_PFLASH, 0, 0);
>      bool aarch64 = true;
>  
> -    virt_set_memmap(vms);
> +    /*
> +     * In accelerated mode, the memory map is computed in kvm_type(),
> +     * if set, to create a VM with the right number of IPA bits.
> +     */
> +
> +    if (!mc->kvm_type || !kvm_enabled()) {
> +        virt_set_memmap(vms);
> +    }
>  
>      /* We can probe only here because during property set
>       * KVM is not available yet
> @@ -1814,6 +1821,36 @@ static HotplugHandler *virt_machine_get_hotplug_handler(MachineState *machine,
>      return NULL;
>  }
>  
> +/*
> + * for arm64 kvm_type [7-0] encodes the requested number of bits
> + * in the IPA address space
> + */
> +static int virt_kvm_type(MachineState *ms, const char *type_str)
> +{
> +    VirtMachineState *vms = VIRT_MACHINE(ms);
> +    int max_vm_pa_size = kvm_arm_get_max_vm_ipa_size(ms);
> +    int requested_pa_size;
> +
> +    /* we freeze the memory map to compute the highest gpa */
> +    virt_set_memmap(vms);
> +
> +    requested_pa_size = 64 - clz64(vms->highest_gpa);
> +
> +    if (requested_pa_size > max_vm_pa_size) {
> +        error_report("-m and ,maxmem option values "
> +                     "require an IPA range (%d bits) larger than "
> +                     "the one supported by the host (%d bits)",
> +                     requested_pa_size, max_vm_pa_size);
> +       exit(1);
> +    }
> +    /*
> +     * By default we return 0 which corresponds to an implicit legacy
> +     * 40b IPA setting. Otherwise we return the actual requested PA
> +     * logsize
> +     */
> +    return requested_pa_size > 40 ? requested_pa_size : 0;
> +}
> +
>  static void virt_machine_class_init(ObjectClass *oc, void *data)
>  {
>      MachineClass *mc = MACHINE_CLASS(oc);
> @@ -1838,6 +1875,7 @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
>      mc->cpu_index_to_instance_props = virt_cpu_index_to_props;
>      mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-a15");
>      mc->get_default_cpu_node_id = virt_get_default_cpu_node_id;
> +    mc->kvm_type = virt_kvm_type;
>      assert(!mc->get_hotplug_handler);
>      mc->get_hotplug_handler = virt_machine_get_hotplug_handler;
>      hc->plug = virt_machine_device_plug_cb;
> @@ -1909,6 +1947,12 @@ static void virt_instance_init(Object *obj)
>                                      "Valid values are none and smmuv3",
>                                      NULL);
>  
> +    if (vmc->no_extended_memmap) {
> +        vms->extended_memmap = false;
> +    } else {
> +        vms->extended_memmap = true;
> +    }
> +
>      vms->irqmap = a15irqmap;
>  }
>  
> @@ -1939,8 +1983,12 @@ DEFINE_VIRT_MACHINE_AS_LATEST(4, 0)
>  
>  static void virt_machine_3_1_options(MachineClass *mc)
>  {
> +    VirtMachineClass *vmc = VIRT_MACHINE_CLASS(OBJECT_CLASS(mc));
>      virt_machine_4_0_options(mc);
>      compat_props_add(mc->compat_props, hw_compat_3_1, hw_compat_3_1_len);
> +
> +    /* extended memory map is enabled from 4.0 onwards */
> +    vmc->no_extended_memmap = true;
That probably was asked in v6 already,
do we really need this knob?

>  }
>  DEFINE_VIRT_MACHINE(3, 1)
>  
> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
> index acad0400d8..7798462cb0 100644
> --- a/include/hw/arm/virt.h
> +++ b/include/hw/arm/virt.h
> @@ -106,6 +106,7 @@ typedef struct {
>      bool claim_edge_triggered_timers;
>      bool smbios_old_sys_ver;
>      bool no_highmem_ecam;
> +    bool no_extended_memmap;
>  } VirtMachineClass;
>  
>  typedef struct {
> @@ -135,6 +136,7 @@ typedef struct {
>      hwaddr highest_gpa;
>      hwaddr device_memory_base;
>      hwaddr device_memory_size;
> +    bool extended_memmap;
>  } VirtMachineState;
>  
>  #define VIRT_ECAM_ID(high) (high ? VIRT_HIGH_PCIE_ECAM : VIRT_PCIE_ECAM)


* Re: [Qemu-devel] [PATCH v7 07/17] hw/arm/virt: Dynamic memory map depending on RAM requirements
  2019-02-20 22:39 ` [Qemu-devel] [PATCH v7 07/17] hw/arm/virt: Dynamic memory map depending on RAM requirements Eric Auger
@ 2019-02-22 12:57   ` Igor Mammedov
  2019-02-22 14:06     ` Auger Eric
  0 siblings, 1 reply; 63+ messages in thread
From: Igor Mammedov @ 2019-02-22 12:57 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, david, dgilbert, david, drjones

On Wed, 20 Feb 2019 23:39:53 +0100
Eric Auger <eric.auger@redhat.com> wrote:

> Up to now the memory map has been static and the high IO region
> base has always been 256GiB.
> 
> This patch modifies the virt_set_memmap() function, which freezes
> the memory map, so that the high IO range base becomes floating,
> located after the initial RAM and the device memory.
> 
> The function computes
> - the base of the device memory,
> - the size of the device memory and
> - the highest GPA used in the memory map.
> 
> The two former will be used when defining the device memory region
> while the latter will be used at VM creation to choose the requested
> IPA size.
> 
> Setting all the existing highmem IO regions beyond the RAM
> allows us to have a single contiguous RAM region (initial RAM and
> possible hotpluggable device memory). That way we do not need
> to do invasive changes in the EDK2 FW to support a dynamic
> RAM base.
> 
> Still the user cannot request an initial RAM size greater than 255GB.
> Also we handle the case where maxmem or slots options are passed,
> although no device memory is usable at the moment. In this case, we
> just ignore those settings.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> ---
>  hw/arm/virt.c         | 47 ++++++++++++++++++++++++++++++++++---------
>  include/hw/arm/virt.h |  3 +++
>  2 files changed, 41 insertions(+), 9 deletions(-)
> 
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 12039a0367..9db602457b 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -107,8 +107,9 @@
>   * of a terabyte of RAM will be doing it on a host with more than a
>   * terabyte of physical address space.)
>   */
> -#define RAMLIMIT_GB 255
> -#define RAMLIMIT_BYTES (RAMLIMIT_GB * 1024ULL * 1024 * 1024)
> +#define RAMBASE GiB
> +#define LEGACY_RAMLIMIT_GB 255
> +#define LEGACY_RAMLIMIT_BYTES (LEGACY_RAMLIMIT_GB * GiB)
>  
>  /* Addresses and sizes of our components.
>   * 0..128MB is space for a flash device so we can run bootrom code such as UEFI.
> @@ -149,7 +150,7 @@ static const MemMapEntry base_memmap[] = {
>      [VIRT_PCIE_MMIO] =          { 0x10000000, 0x2eff0000 },
>      [VIRT_PCIE_PIO] =           { 0x3eff0000, 0x00010000 },
>      [VIRT_PCIE_ECAM] =          { 0x3f000000, 0x01000000 },
> -    [VIRT_MEM] =                { 0x40000000, RAMLIMIT_BYTES },
> +    [VIRT_MEM] =                { RAMBASE, LEGACY_RAMLIMIT_BYTES },
>  };
>  
>  /*
> @@ -1367,16 +1368,48 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx)
>  
>  static void virt_set_memmap(VirtMachineState *vms)
>  {
> +    MachineState *ms = MACHINE(vms);
>      hwaddr base;
>      int i;
>  
> +    if (ms->maxram_size > ms->ram_size || ms->ram_slots > 0) {
> +        error_report("mach-virt: does not support device memory: "
> +                     "ignore maxmem and slots options");
> +        ms->maxram_size = ms->ram_size;
> +        ms->ram_slots = 0;
> +    }
> +    if (ms->ram_size > (ram_addr_t)LEGACY_RAMLIMIT_BYTES) {
> +        error_report("mach-virt: cannot model more than %dGB RAM",
> +                     LEGACY_RAMLIMIT_GB);
> +        exit(1);
> +    }
I'd drop these checks and amend the code below so that it inits the
device_memory logic when ram_slots > 0. It should simplify the
follow-up patches by dropping all the machine-version-specific parts.
It shouldn't break old machines, as the layout stays the same, but it
would allow starting an old machine with pc-dimms, which is fine from
a migration point of view since the target should also be able to
start QEMU with the same options.


> +
>      vms->memmap = extended_memmap;
>  
>      for (i = 0; i < ARRAY_SIZE(base_memmap); i++) {
>          vms->memmap[i] = base_memmap[i];
>      }
>  
> -    vms->high_io_base = 256 * GiB; /* Top of the legacy initial RAM region */
> +    /*
> +     * We compute the base of the high IO region depending on the
> +     * amount of initial and device memory. The device memory start/size
> +     * is aligned on 1GiB. We never put the high IO region below 256GiB
> +     * so that if maxram_size is < 255GiB we keep the legacy memory map.
> +     * The device region size assumes 1GiB page max alignment per slot.
> +     */
> +    vms->device_memory_base = ROUND_UP(RAMBASE + ms->ram_size, GiB);
> +    vms->device_memory_size = ms->maxram_size - ms->ram_size +
> +                              ms->ram_slots * GiB;
> +
> +    vms->high_io_base = vms->device_memory_base +
> +                        ROUND_UP(vms->device_memory_size, GiB);
> +    if (vms->high_io_base < vms->device_memory_base) {
> +        error_report("maxmem/slots too huge");
> +        exit(EXIT_FAILURE);
> +    }
> +    if (vms->high_io_base < 256 * GiB) {
> +        vms->high_io_base = 256 * GiB;
> +    }
>      base = vms->high_io_base;
>  
>      for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
> @@ -1387,6 +1420,7 @@ static void virt_set_memmap(VirtMachineState *vms)
>          vms->memmap[i].size = size;
>          base += size;
>      }
> +    vms->highest_gpa = base - 1;
>  }
>  
>  static void machvirt_init(MachineState *machine)
> @@ -1470,11 +1504,6 @@ static void machvirt_init(MachineState *machine)
>  
>      vms->smp_cpus = smp_cpus;
>  
> -    if (machine->ram_size > vms->memmap[VIRT_MEM].size) {
> -        error_report("mach-virt: cannot model more than %dGB RAM", RAMLIMIT_GB);
> -        exit(1);
> -    }
> -
>      if (vms->virt && kvm_enabled()) {
>          error_report("mach-virt: KVM does not support providing "
>                       "Virtualization extensions to the guest CPU");
> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
> index 3dc7a6c5d5..acad0400d8 100644
> --- a/include/hw/arm/virt.h
> +++ b/include/hw/arm/virt.h
> @@ -132,6 +132,9 @@ typedef struct {
>      uint32_t iommu_phandle;
>      int psci_conduit;
>      hwaddr high_io_base;
> +    hwaddr highest_gpa;
> +    hwaddr device_memory_base;
> +    hwaddr device_memory_size;
>  } VirtMachineState;
>  
>  #define VIRT_ECAM_ID(high) (high ? VIRT_HIGH_PCIE_ECAM : VIRT_PCIE_ECAM)


* Re: [Qemu-devel] [PATCH v7 10/17] hw/arm/virt: Add memory hotplug framework
  2019-02-20 22:39 ` [Qemu-devel] [PATCH v7 10/17] hw/arm/virt: Add memory hotplug framework Eric Auger
@ 2019-02-22 13:25   ` Igor Mammedov
  0 siblings, 0 replies; 63+ messages in thread
From: Igor Mammedov @ 2019-02-22 13:25 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, david, dgilbert, david, drjones

On Wed, 20 Feb 2019 23:39:56 +0100
Eric Auger <eric.auger@redhat.com> wrote:

> This patch adds the memory hot-plug/hot-unplug infrastructure
> in machvirt. It is still not enabled as no device memory is allocated.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> Signed-off-by: Kwangwoo Lee <kwangwoo.lee@sk.com>

with nit below fixed

Reviewed-by: Igor Mammedov <imammedo@redhat.com>

> 
> ---
> v4 -> v5:
> - change in pc_dimm_pre_plug signature
> - CONFIG_MEM_HOTPLUG replaced by CONFIG_MEM_DEVICE and CONFIG_DIMM
> 
> v3 -> v4:
> - check the memory device is not hotplugged
> 
> v2 -> v3:
> - change in pc_dimm_plug()'s signature
> - add pc_dimm_pre_plug call
> 
> v1 -> v2:
> - s/virt_dimm_plug|unplug/virt_memory_plug|unplug
> - s/pc_dimm_memory_plug/pc_dimm_plug
> - reworded title and commit message
> - added pre_plug cb
> - don't handle get_memory_region failure anymore
> ---
>  default-configs/arm-softmmu.mak |  2 ++
>  hw/arm/virt.c                   | 64 ++++++++++++++++++++++++++++++++-
>  2 files changed, 65 insertions(+), 1 deletion(-)
> 
> diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
> index 734ca721e9..0a78421f72 100644
> --- a/default-configs/arm-softmmu.mak
> +++ b/default-configs/arm-softmmu.mak
> @@ -163,3 +163,5 @@ CONFIG_PCI_EXPRESS_DESIGNWARE=y
>  CONFIG_STRONGARM=y
>  CONFIG_HIGHBANK=y
>  CONFIG_MUSICPAL=y
> +CONFIG_MEM_DEVICE=y
> +CONFIG_DIMM=y
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 5b656f9db5..470ca0ce2d 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -60,6 +60,8 @@
>  #include "standard-headers/linux/input.h"
>  #include "hw/arm/smmuv3.h"
>  #include "target/arm/internals.h"
> +#include "hw/mem/pc-dimm.h"
> +#include "hw/mem/nvdimm.h"
>  
>  #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
>      static void virt_##major##_##minor##_class_init(ObjectClass *oc, \
> @@ -1804,6 +1806,49 @@ static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
>      return ms->possible_cpus;
>  }
>  
> +static void virt_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> +                                 Error **errp)
> +{
> +    const bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
> +
> +    if (dev->hotplugged) {
> +        error_setg(errp, "memory hotplug is not supported");
> +    }
> +
> +    if (is_nvdimm) {
> +        error_setg(errp, "nvdimm is not yet supported");
> +        return;
> +    }
> +
> +    pc_dimm_pre_plug(PC_DIMM(dev), MACHINE(hotplug_dev), NULL, errp);
> +}
> +
> +static void virt_memory_plug(HotplugHandler *hotplug_dev,
> +                             DeviceState *dev, Error **errp)
> +{
> +    VirtMachineState *vms = VIRT_MACHINE(hotplug_dev);
> +    Error *local_err = NULL;
> +
> +    pc_dimm_plug(PC_DIMM(dev), MACHINE(vms), &local_err);
> +
> +    error_propagate(errp, local_err);
> +}
> +
> +static void virt_memory_unplug(HotplugHandler *hotplug_dev,
> +                               DeviceState *dev, Error **errp)
> +{
> +    pc_dimm_unplug(PC_DIMM(dev), MACHINE(hotplug_dev));
> +    object_unparent(OBJECT(dev));
> +}
> +
> +static void virt_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
> +                                            DeviceState *dev, Error **errp)
> +{
> +    if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> +        virt_memory_pre_plug(hotplug_dev, dev, errp);
> +    }
> +}
> +
>  static void virt_machine_device_plug_cb(HotplugHandler *hotplug_dev,
>                                          DeviceState *dev, Error **errp)
>  {
> @@ -1815,12 +1860,27 @@ static void virt_machine_device_plug_cb(HotplugHandler *hotplug_dev,
>                                       SYS_BUS_DEVICE(dev));
>          }
>      }
> +    if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> +            virt_memory_plug(hotplug_dev, dev, errp);
wrong indent

> +    }
> +}
> +
> +static void virt_machine_device_unplug_cb(HotplugHandler *hotplug_dev,
> +                                          DeviceState *dev, Error **errp)
> +{
> +    if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> +        virt_memory_unplug(hotplug_dev, dev, errp);
> +    } else {
> +        error_setg(errp, "device unplug request for unsupported device"
> +                   " type: %s", object_get_typename(OBJECT(dev)));
> +    }
>  }
>  
>  static HotplugHandler *virt_machine_get_hotplug_handler(MachineState *machine,
>                                                          DeviceState *dev)
>  {
> -    if (object_dynamic_cast(OBJECT(dev), TYPE_SYS_BUS_DEVICE)) {
> +    if (object_dynamic_cast(OBJECT(dev), TYPE_SYS_BUS_DEVICE) ||
> +       (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM))) {
>          return HOTPLUG_HANDLER(machine);
>      }
>  
> @@ -1884,7 +1944,9 @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
>      mc->kvm_type = virt_kvm_type;
>      assert(!mc->get_hotplug_handler);
>      mc->get_hotplug_handler = virt_machine_get_hotplug_handler;
> +    hc->pre_plug = virt_machine_device_pre_plug_cb;
>      hc->plug = virt_machine_device_plug_cb;
> +    hc->unplug = virt_machine_device_unplug_cb;
>  }
>  
>  static void virt_instance_init(Object *obj)


* Re: [Qemu-devel] [PATCH v7 11/17] hw/arm/boot: Expose the PC-DIMM nodes in the DT
  2019-02-20 22:39 ` [Qemu-devel] [PATCH v7 11/17] hw/arm/boot: Expose the PC-DIMM nodes in the DT Eric Auger
@ 2019-02-22 13:30   ` Igor Mammedov
  0 siblings, 0 replies; 63+ messages in thread
From: Igor Mammedov @ 2019-02-22 13:30 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, david, dgilbert, david, drjones

On Wed, 20 Feb 2019 23:39:57 +0100
Eric Auger <eric.auger@redhat.com> wrote:

> From: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> 
> This patch adds memory nodes corresponding to PC-DIMM regions.
> 
> NV_DIMM and ACPI_NVDIMM configs are not yet set for ARM so we
  ^^^
git grep says it doesn't exist

> don't need to care about NV-DIMM at this stage.
we use the NVDIMM term everywhere in QEMU, so fix it up just for consistency

> 
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
with all comments addressed

Reviewed-by: Igor Mammedov <imammedo@redhat.com>

> 
> ---
> v6 -> v7:
> - rework the error messages, use a switch/case
> v3 -> v4:
> - got rid of @base and @len in fdt_add_hotpluggable_memory_nodes
> 
> v1 -> v2:
> - added qapi_free_MemoryDeviceInfoList and simplify the loop
> ---
>  hw/arm/boot.c | 42 ++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 42 insertions(+)
> 
> diff --git a/hw/arm/boot.c b/hw/arm/boot.c
> index a830655e1a..255aaca0cf 100644
> --- a/hw/arm/boot.c
> +++ b/hw/arm/boot.c
> @@ -19,6 +19,7 @@
>  #include "sysemu/numa.h"
>  #include "hw/boards.h"
>  #include "hw/loader.h"
> +#include "hw/mem/memory-device.h"
>  #include "elf.h"
>  #include "sysemu/device_tree.h"
>  #include "qemu/config-file.h"
> @@ -522,6 +523,41 @@ static void fdt_add_psci_node(void *fdt)
>      qemu_fdt_setprop_cell(fdt, "/psci", "migrate", migrate_fn);
>  }
>  
> +static int fdt_add_hotpluggable_memory_nodes(void *fdt,
> +                                             uint32_t acells, uint32_t scells) {
> +    MemoryDeviceInfoList *info, *info_list = qmp_memory_device_list();
> +    MemoryDeviceInfo *mi;
> +    int ret = 0;
> +
> +    for (info = info_list; info != NULL; info = info->next) {
> +        mi = info->value;
> +        switch (mi->type) {
> +        case MEMORY_DEVICE_INFO_KIND_DIMM:
> +        {
> +            PCDIMMDeviceInfo *di = mi->u.dimm.data;
> +
> +            ret = fdt_add_memory_node(fdt, acells, di->addr,
> +                                      scells, di->size, di->node);
> +            if (ret) {
> +                fprintf(stderr,
> +                        "couldn't add PCDIMM /memory@%"PRIx64" node\n",
> +                        di->addr);
> +                goto out;
> +            }
> +            break;
> +        }
> +        default:
> +            fprintf(stderr, "%s memory nodes are not yet supported\n",
> +                    MemoryDeviceInfoKind_str(mi->type));
> +            ret = -ENOENT;
> +            goto out;
> +        }
> +    }
> +out:
> +    qapi_free_MemoryDeviceInfoList(info_list);
> +    return ret;
> +}
> +
>  int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
>                   hwaddr addr_limit, AddressSpace *as)
>  {
> @@ -621,6 +657,12 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
>          }
>      }
>  
> +    rc = fdt_add_hotpluggable_memory_nodes(fdt, acells, scells);
> +    if (rc < 0) {
> +            fprintf(stderr, "couldn't add hotpluggable memory nodes\n");
> +            goto fail;
wrong indent

> +    }
> +
>      rc = fdt_path_offset(fdt, "/chosen");
>      if (rc < 0) {
>          qemu_fdt_add_subnode(fdt, "/chosen");


* Re: [Qemu-devel] [PATCH v7 13/17] hw/arm/virt: Allocate device_memory
  2019-02-20 22:39 ` [Qemu-devel] [PATCH v7 13/17] hw/arm/virt: Allocate device_memory Eric Auger
@ 2019-02-22 13:48   ` Igor Mammedov
  2019-02-22 14:15     ` Auger Eric
  0 siblings, 1 reply; 63+ messages in thread
From: Igor Mammedov @ 2019-02-22 13:48 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, david, dgilbert, david, drjones

On Wed, 20 Feb 2019 23:39:59 +0100
Eric Auger <eric.auger@redhat.com> wrote:

> The device memory region is located after the initial RAM.
> Its start/size are 1GiB aligned.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Kwangwoo Lee <kwangwoo.lee@sk.com>
> 
> ---
> v6 -> v7:
> - check the device memory top does not wrap
> - check the device memory can fit the slots
> 
> v4 -> v5:
> - device memory set after the initial RAM
> 
> v3 -> v4:
> - remove bootinfo.device_memory_start/device_memory_size
> - rename VIRT_HOTPLUG_MEM into VIRT_DEVICE_MEM
> ---
>  hw/arm/virt.c | 33 +++++++++++++++++++++++++++++++++
>  1 file changed, 33 insertions(+)
> 
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 470ca0ce2d..33ad9b3f63 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -62,6 +62,7 @@
>  #include "target/arm/internals.h"
>  #include "hw/mem/pc-dimm.h"
>  #include "hw/mem/nvdimm.h"
> +#include "hw/acpi/acpi.h"
>  
>  #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
>      static void virt_##major##_##minor##_class_init(ObjectClass *oc, \
> @@ -1263,6 +1264,34 @@ static void create_secure_ram(VirtMachineState *vms,
>      g_free(nodename);
>  }
>  
> +static void create_device_memory(VirtMachineState *vms, MemoryRegion *sysmem)
> +{
> +    MachineState *ms = MACHINE(vms);
> +
> +    if (!vms->device_memory_size) {
having vms->device_memory_size seems like a duplicate; why not reuse
memory_region_size(MachineState::device_memory::mr) like we do elsewhere.

Also it would be better to keep all device_memory allocation/initialization
compact and close together, like it's done in pc/spapr.

> +        return;
> +    }
> +
> +    if (ms->ram_slots > ACPI_MAX_RAM_SLOTS) {
> +        error_report("unsupported number of memory slots: %"PRIu64,
> +                     ms->ram_slots);
> +        exit(EXIT_FAILURE);
> +    }
> +
> +    if (QEMU_ALIGN_UP(ms->maxram_size, GiB) != ms->maxram_size) {
> +        error_report("maximum memory size must be GiB aligned");
> +        exit(EXIT_FAILURE);
> +    }
> +
> +    ms->device_memory = g_malloc0(sizeof(*ms->device_memory));
> +    ms->device_memory->base = vms->device_memory_base;
> +
> +    memory_region_init(&ms->device_memory->mr, OBJECT(vms),
> +                       "device-memory", vms->device_memory_size);
> +    memory_region_add_subregion(sysmem, ms->device_memory->base,
> +                                &ms->device_memory->mr);
> +}
> +
>  static void *machvirt_dtb(const struct arm_boot_info *binfo, int *fdt_size)
>  {
>      const VirtMachineState *board = container_of(binfo, VirtMachineState,
> @@ -1610,6 +1639,10 @@ static void machvirt_init(MachineState *machine)
>                                           machine->ram_size);
>      memory_region_add_subregion(sysmem, vms->memmap[VIRT_MEM].base, ram);
>  
> +    if (vms->extended_memmap) {
> +        create_device_memory(vms, sysmem);
> +    }
> +
>      create_flash(vms, sysmem, secure_sysmem ? secure_sysmem : sysmem);
>  
>      create_gic(vms, pic);


* Re: [Qemu-devel] [PATCH v7 08/17] hw/arm/virt: Implement kvm_type function for 4.0 machine
  2019-02-22 12:45   ` Igor Mammedov
@ 2019-02-22 14:01     ` Auger Eric
  2019-02-22 14:39       ` Igor Mammedov
  0 siblings, 1 reply; 63+ messages in thread
From: Auger Eric @ 2019-02-22 14:01 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: eric.auger.pro, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, david, dgilbert, david, drjones

Hi Igor,

On 2/22/19 1:45 PM, Igor Mammedov wrote:
> On Wed, 20 Feb 2019 23:39:54 +0100
> Eric Auger <eric.auger@redhat.com> wrote:
> 
>> This patch implements the machine class kvm_type() callback.
>> It returns the number of bits requested to implement the whole GPA
>> range including the RAM and IO regions located beyond.
>> The returned value is passed through the KVM_CREATE_VM ioctl and
>> this allows KVM to set the stage2 tables dynamically.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>
>> ---
>>
>> v6 -> v7:
>> - Introduce RAMBASE and rename add LEGACY_ prefix in that patch
>> - use local variables with explicit names in virt_set_memmap:
>>   device_memory_base, device_memory_size
>> - add an extended_memmap field in the class
>>
>> v5 -> v6:
>> - add some comments
>> - high IO region cannot start before 256GiB
>> ---
>>  hw/arm/virt.c         | 50 ++++++++++++++++++++++++++++++++++++++++++-
>>  include/hw/arm/virt.h |  2 ++
>>  2 files changed, 51 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>> index 9db602457b..ad3a0ad73d 100644
>> --- a/hw/arm/virt.c
>> +++ b/hw/arm/virt.c
>> @@ -1437,7 +1437,14 @@ static void machvirt_init(MachineState *machine)
>>      bool firmware_loaded = bios_name || drive_get(IF_PFLASH, 0, 0);
>>      bool aarch64 = true;
>>  
>> -    virt_set_memmap(vms);
>> +    /*
>> +     * In accelerated mode, the memory map is computed in kvm_type(),
>> +     * if set, to create a VM with the right number of IPA bits.
>> +     */
>> +
>> +    if (!mc->kvm_type || !kvm_enabled()) {
>> +        virt_set_memmap(vms);
>> +    }
>>  
>>      /* We can probe only here because during property set
>>       * KVM is not available yet
>> @@ -1814,6 +1821,36 @@ static HotplugHandler *virt_machine_get_hotplug_handler(MachineState *machine,
>>      return NULL;
>>  }
>>  
>> +/*
>> + * for arm64 kvm_type [7-0] encodes the requested number of bits
>> + * in the IPA address space
>> + */
>> +static int virt_kvm_type(MachineState *ms, const char *type_str)
>> +{
>> +    VirtMachineState *vms = VIRT_MACHINE(ms);
>> +    int max_vm_pa_size = kvm_arm_get_max_vm_ipa_size(ms);
>> +    int requested_pa_size;
>> +
>> +    /* we freeze the memory map to compute the highest gpa */
>> +    virt_set_memmap(vms);
>> +
>> +    requested_pa_size = 64 - clz64(vms->highest_gpa);
>> +
>> +    if (requested_pa_size > max_vm_pa_size) {
>> +        error_report("-m and ,maxmem option values "
>> +                     "require an IPA range (%d bits) larger than "
>> +                     "the one supported by the host (%d bits)",
>> +                     requested_pa_size, max_vm_pa_size);
>> +       exit(1);
>> +    }
>> +    /*
>> +     * By default we return 0 which corresponds to an implicit legacy
>> +     * 40b IPA setting. Otherwise we return the actual requested PA
>> +     * logsize
>> +     */
>> +    return requested_pa_size > 40 ? requested_pa_size : 0;
>> +}
>> +
>>  static void virt_machine_class_init(ObjectClass *oc, void *data)
>>  {
>>      MachineClass *mc = MACHINE_CLASS(oc);
>> @@ -1838,6 +1875,7 @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
>>      mc->cpu_index_to_instance_props = virt_cpu_index_to_props;
>>      mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-a15");
>>      mc->get_default_cpu_node_id = virt_get_default_cpu_node_id;
>> +    mc->kvm_type = virt_kvm_type;
>>      assert(!mc->get_hotplug_handler);
>>      mc->get_hotplug_handler = virt_machine_get_hotplug_handler;
>>      hc->plug = virt_machine_device_plug_cb;
>> @@ -1909,6 +1947,12 @@ static void virt_instance_init(Object *obj)
>>                                      "Valid values are none and smmuv3",
>>                                      NULL);
>>  
>> +    if (vmc->no_extended_memmap) {
>> +        vms->extended_memmap = false;
>> +    } else {
>> +        vms->extended_memmap = true;
>> +    }
>> +
>>      vms->irqmap = a15irqmap;
>>  }
>>  
>> @@ -1939,8 +1983,12 @@ DEFINE_VIRT_MACHINE_AS_LATEST(4, 0)
>>  
>>  static void virt_machine_3_1_options(MachineClass *mc)
>>  {
>> +    VirtMachineClass *vmc = VIRT_MACHINE_CLASS(OBJECT_CLASS(mc));
>>      virt_machine_4_0_options(mc);
>>      compat_props_add(mc->compat_props, hw_compat_3_1, hw_compat_3_1_len);
>> +
>> +    /* extended memory map is enabled from 4.0 onwards */
>> +    vmc->no_extended_memmap = true;
> That was probably asked in v6 already,
> do we really need this knob?
Yes, the point was raised by Peter and I replied with another question ;-) "
But don't we want to forbid any pre-4.0 machvirt from running with more
than 255GiB RAM?
"
without this knob:
- pre-4.0 machines will gain the capability to support more than 255GB
initial RAM if the kernel supports dynamic IPA setting
- pre-4.0 machines will gain PCDIMM/NVDIMM support
- another concern: maxmem and slots were not checked previously. If,
for some reason, the user specified them without instantiating the
actual slots, they were simply ignored. This is no longer the case, as
both parameters are now used to compute the requested IPA range, so
such a command line may now fail.
So I thought it was clearer to disable all the above for pre-4.0
machines. However, if both of you agree, I will remove it.

thanks

Eric
> 
>>  }
>>  DEFINE_VIRT_MACHINE(3, 1)
>>  
>> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
>> index acad0400d8..7798462cb0 100644
>> --- a/include/hw/arm/virt.h
>> +++ b/include/hw/arm/virt.h
>> @@ -106,6 +106,7 @@ typedef struct {
>>      bool claim_edge_triggered_timers;
>>      bool smbios_old_sys_ver;
>>      bool no_highmem_ecam;
>> +    bool no_extended_memmap;
>>  } VirtMachineClass;
>>  
>>  typedef struct {
>> @@ -135,6 +136,7 @@ typedef struct {
>>      hwaddr highest_gpa;
>>      hwaddr device_memory_base;
>>      hwaddr device_memory_size;
>> +    bool extended_memmap;
>>  } VirtMachineState;
>>  
>>  #define VIRT_ECAM_ID(high) (high ? VIRT_HIGH_PCIE_ECAM : VIRT_PCIE_ECAM)
> 


* Re: [Qemu-devel] [PATCH v7 07/17] hw/arm/virt: Dynamic memory map depending on RAM requirements
  2019-02-22 12:57   ` Igor Mammedov
@ 2019-02-22 14:06     ` Auger Eric
  2019-02-22 14:23       ` Igor Mammedov
  0 siblings, 1 reply; 63+ messages in thread
From: Auger Eric @ 2019-02-22 14:06 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: eric.auger.pro, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, david, dgilbert, david, drjones

Hi Igor,

On 2/22/19 1:57 PM, Igor Mammedov wrote:
> On Wed, 20 Feb 2019 23:39:53 +0100
> Eric Auger <eric.auger@redhat.com> wrote:
> 
>> Up to now the memory map has been static and the high IO region
>> base has always been 256GiB.
>>
>> This patch modifies the virt_set_memmap() function, which freezes
>> the memory map, so that the high IO range base becomes floating,
>> located after the initial RAM and the device memory.
>>
>> The function computes
>> - the base of the device memory,
>> - the size of the device memory and
>> - the highest GPA used in the memory map.
>>
>> The two former will be used when defining the device memory region
>> while the latter will be used at VM creation to choose the requested
>> IPA size.
>>
>> Setting all the existing highmem IO regions beyond the RAM
>> allows us to have a single contiguous RAM region (initial RAM and
>> possible hotpluggable device memory). That way we do not need
>> to do invasive changes in the EDK2 FW to support a dynamic
>> RAM base.
>>
>> Still the user cannot request an initial RAM size greater than 255GB.
>> Also we handle the case where maxmem or slots options are passed,
>> although no device memory is usable at the moment. In this case, we
>> just ignore those settings.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> ---
>>  hw/arm/virt.c         | 47 ++++++++++++++++++++++++++++++++++---------
>>  include/hw/arm/virt.h |  3 +++
>>  2 files changed, 41 insertions(+), 9 deletions(-)
>>
>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>> index 12039a0367..9db602457b 100644
>> --- a/hw/arm/virt.c
>> +++ b/hw/arm/virt.c
>> @@ -107,8 +107,9 @@
>>   * of a terabyte of RAM will be doing it on a host with more than a
>>   * terabyte of physical address space.)
>>   */
>> -#define RAMLIMIT_GB 255
>> -#define RAMLIMIT_BYTES (RAMLIMIT_GB * 1024ULL * 1024 * 1024)
>> +#define RAMBASE GiB
>> +#define LEGACY_RAMLIMIT_GB 255
>> +#define LEGACY_RAMLIMIT_BYTES (LEGACY_RAMLIMIT_GB * GiB)
>>  
>>  /* Addresses and sizes of our components.
>>   * 0..128MB is space for a flash device so we can run bootrom code such as UEFI.
>> @@ -149,7 +150,7 @@ static const MemMapEntry base_memmap[] = {
>>      [VIRT_PCIE_MMIO] =          { 0x10000000, 0x2eff0000 },
>>      [VIRT_PCIE_PIO] =           { 0x3eff0000, 0x00010000 },
>>      [VIRT_PCIE_ECAM] =          { 0x3f000000, 0x01000000 },
>> -    [VIRT_MEM] =                { 0x40000000, RAMLIMIT_BYTES },
>> +    [VIRT_MEM] =                { RAMBASE, LEGACY_RAMLIMIT_BYTES },
>>  };
>>  
>>  /*
>> @@ -1367,16 +1368,48 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx)
>>  
>>  static void virt_set_memmap(VirtMachineState *vms)
>>  {
>> +    MachineState *ms = MACHINE(vms);
>>      hwaddr base;
>>      int i;
>>  
>> +    if (ms->maxram_size > ms->ram_size || ms->ram_slots > 0) {
>> +        error_report("mach-virt: does not support device memory: "
>> +                     "ignore maxmem and slots options");
>> +        ms->maxram_size = ms->ram_size;
>> +        ms->ram_slots = 0;
>> +    }
>> +    if (ms->ram_size > (ram_addr_t)LEGACY_RAMLIMIT_BYTES) {
>> +        error_report("mach-virt: cannot model more than %dGB RAM",
>> +                     LEGACY_RAMLIMIT_GB);
>> +        exit(1);
>> +    }
> I'd drop these checks and amend below code so that it would init device_memory
> logic when ram_slots > 0. It should simplify follow up patches by dropping
> all machine version specific parts.
I don't have sufficient knowledge of virtio-mem/virtio-pmem. Do they
also use slots?


> It shouldn't break old machines as layout stays the same but would allow to
> start old machine with pc-dimms which is fine from migration pov as target
> also should be able to start QEMU with the same options for migration to start.
> 
> 
>> +
>>      vms->memmap = extended_memmap;
>>  
>>      for (i = 0; i < ARRAY_SIZE(base_memmap); i++) {
>>          vms->memmap[i] = base_memmap[i];
>>      }
>>  
>> -    vms->high_io_base = 256 * GiB; /* Top of the legacy initial RAM region */
>> +    /*
>> +     * We compute the base of the high IO region depending on the
>> +     * amount of initial and device memory. The device memory start/size
>> +     * is aligned on 1GiB. We never put the high IO region below 256GiB
>> +     * so that if maxram_size is < 255GiB we keep the legacy memory map.
>> +     * The device region size assumes 1GiB page max alignment per slot.
>> +     */
>> +    vms->device_memory_base = ROUND_UP(RAMBASE + ms->ram_size, GiB);
>> +    vms->device_memory_size = ms->maxram_size - ms->ram_size +
>> +                              ms->ram_slots * GiB;
So does everyone agree on this device memory size computation? I would
like to make sure it is future-proof. Do I need to add a machine
option, as on x86, to enforce slot alignment, or is it OK as is?

Thanks

Eric
>> +
>> +    vms->high_io_base = vms->device_memory_base +
>> +                        ROUND_UP(vms->device_memory_size, GiB);
>> +    if (vms->high_io_base < vms->device_memory_base) {
>> +        error_report("maxmem/slots too huge");
>> +        exit(EXIT_FAILURE);
>> +    }
>> +    if (vms->high_io_base < 256 * GiB) {
>> +        vms->high_io_base = 256 * GiB;
>> +    }
>>      base = vms->high_io_base;
>>  
>>      for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
>> @@ -1387,6 +1420,7 @@ static void virt_set_memmap(VirtMachineState *vms)
>>          vms->memmap[i].size = size;
>>          base += size;
>>      }
>> +    vms->highest_gpa = base - 1;
>>  }
>>  
>>  static void machvirt_init(MachineState *machine)
>> @@ -1470,11 +1504,6 @@ static void machvirt_init(MachineState *machine)
>>  
>>      vms->smp_cpus = smp_cpus;
>>  
>> -    if (machine->ram_size > vms->memmap[VIRT_MEM].size) {
>> -        error_report("mach-virt: cannot model more than %dGB RAM", RAMLIMIT_GB);
>> -        exit(1);
>> -    }
>> -
>>      if (vms->virt && kvm_enabled()) {
>>          error_report("mach-virt: KVM does not support providing "
>>                       "Virtualization extensions to the guest CPU");
>> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
>> index 3dc7a6c5d5..acad0400d8 100644
>> --- a/include/hw/arm/virt.h
>> +++ b/include/hw/arm/virt.h
>> @@ -132,6 +132,9 @@ typedef struct {
>>      uint32_t iommu_phandle;
>>      int psci_conduit;
>>      hwaddr high_io_base;
>> +    hwaddr highest_gpa;
>> +    hwaddr device_memory_base;
>> +    hwaddr device_memory_size;
>>  } VirtMachineState;
>>  
>>  #define VIRT_ECAM_ID(high) (high ? VIRT_HIGH_PCIE_ECAM : VIRT_PCIE_ECAM)
> 


* Re: [Qemu-devel] [PATCH v7 13/17] hw/arm/virt: Allocate device_memory
  2019-02-22 13:48   ` Igor Mammedov
@ 2019-02-22 14:15     ` Auger Eric
  2019-02-22 14:58       ` Igor Mammedov
  0 siblings, 1 reply; 63+ messages in thread
From: Auger Eric @ 2019-02-22 14:15 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: eric.auger.pro, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, david, dgilbert, david, drjones

Hi Igor,

On 2/22/19 2:48 PM, Igor Mammedov wrote:
> On Wed, 20 Feb 2019 23:39:59 +0100
> Eric Auger <eric.auger@redhat.com> wrote:
> 
>> The device memory region is located after the initial RAM.
>> its start/size are 1GB aligned.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> Signed-off-by: Kwangwoo Lee <kwangwoo.lee@sk.com>
>>
>> ---
>> v6 -> v7:
>> - check the device memory top does not wrap
>> - check the device memory can fit the slots
>>
>> v4 -> v5:
>> - device memory set after the initial RAM
>>
>> v3 -> v4:
>> - remove bootinfo.device_memory_start/device_memory_size
>> - rename VIRT_HOTPLUG_MEM into VIRT_DEVICE_MEM
>> ---
>>  hw/arm/virt.c | 33 +++++++++++++++++++++++++++++++++
>>  1 file changed, 33 insertions(+)
>>
>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>> index 470ca0ce2d..33ad9b3f63 100644
>> --- a/hw/arm/virt.c
>> +++ b/hw/arm/virt.c
>> @@ -62,6 +62,7 @@
>>  #include "target/arm/internals.h"
>>  #include "hw/mem/pc-dimm.h"
>>  #include "hw/mem/nvdimm.h"
>> +#include "hw/acpi/acpi.h"
>>  
>>  #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
>>      static void virt_##major##_##minor##_class_init(ObjectClass *oc, \
>> @@ -1263,6 +1264,34 @@ static void create_secure_ram(VirtMachineState *vms,
>>      g_free(nodename);
>>  }
>>  
>> +static void create_device_memory(VirtMachineState *vms, MemoryRegion *sysmem)
>> +{
>> +    MachineState *ms = MACHINE(vms);
>> +
>> +    if (!vms->device_memory_size) {
> having vms->device_memory_size seems like duplicate, why not to reuse
> memory_region_size(MachineState::device_memory::mr) like we do elsewhere.
OK, so you mean allocating ms->device_memory in virt_set_memmap()? In that
case, I wonder whether all that code shouldn't land in virt_set_memmap() directly?
> 
> Also it would be better to all device_memory allocation/initialization
> compact and close to each other like it's done in pc/spapr.
I am not sure I get your point here. The alloc & init are done in this
function. Or do you mean I should rather move that into virt_set_memmap()?

Thanks

Eric
> 
>> +        return;
>> +    }
>> +
>> +    if (ms->ram_slots > ACPI_MAX_RAM_SLOTS) {
>> +        error_report("unsupported number of memory slots: %"PRIu64,
>> +                     ms->ram_slots);
>> +        exit(EXIT_FAILURE);
>> +    }
>> +
>> +    if (QEMU_ALIGN_UP(ms->maxram_size, GiB) != ms->maxram_size) {
>> +        error_report("maximum memory size must be GiB aligned");
>> +        exit(EXIT_FAILURE);
>> +    }
>> +
>> +    ms->device_memory = g_malloc0(sizeof(*ms->device_memory));
>> +    ms->device_memory->base = vms->device_memory_base;
>> +
>> +    memory_region_init(&ms->device_memory->mr, OBJECT(vms),
>> +                       "device-memory", vms->device_memory_size);
>> +    memory_region_add_subregion(sysmem, ms->device_memory->base,
>> +                                &ms->device_memory->mr);
>> +}
>> +
>>  static void *machvirt_dtb(const struct arm_boot_info *binfo, int *fdt_size)
>>  {
>>      const VirtMachineState *board = container_of(binfo, VirtMachineState,
>> @@ -1610,6 +1639,10 @@ static void machvirt_init(MachineState *machine)
>>                                           machine->ram_size);
>>      memory_region_add_subregion(sysmem, vms->memmap[VIRT_MEM].base, ram);
>>  
>> +    if (vms->extended_memmap) {
>> +        create_device_memory(vms, sysmem);
>> +    }
>> +
>>      create_flash(vms, sysmem, secure_sysmem ? secure_sysmem : sysmem);
>>  
>>      create_gic(vms, pic);
> 


* Re: [Qemu-devel] [PATCH v7 07/17] hw/arm/virt: Dynamic memory map depending on RAM requirements
  2019-02-22 14:06     ` Auger Eric
@ 2019-02-22 14:23       ` Igor Mammedov
  0 siblings, 0 replies; 63+ messages in thread
From: Igor Mammedov @ 2019-02-22 14:23 UTC (permalink / raw)
  To: Auger Eric
  Cc: eric.auger.pro, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, david, dgilbert, david, drjones

On Fri, 22 Feb 2019 15:06:14 +0100
Auger Eric <eric.auger@redhat.com> wrote:

> Hi Igor,
> 
> On 2/22/19 1:57 PM, Igor Mammedov wrote:
> > On Wed, 20 Feb 2019 23:39:53 +0100
> > Eric Auger <eric.auger@redhat.com> wrote:
> >   
> >> Up to now the memory map has been static and the high IO region
> >> base has always been 256GiB.
> >>
> >> This patch modifies the virt_set_memmap() function, which freezes
> >> the memory map, so that the high IO range base becomes floating,
> >> located after the initial RAM and the device memory.
> >>
> >> The function computes
> >> - the base of the device memory,
> >> - the size of the device memory and
> >> - the highest GPA used in the memory map.
> >>
> >> The two former will be used when defining the device memory region
> >> while the latter will be used at VM creation to choose the requested
> >> IPA size.
> >>
> >> Setting all the existing highmem IO regions beyond the RAM
> >> allows to have a single contiguous RAM region (initial RAM and
> >> possible hotpluggable device memory). That way we do not need
> >> to do invasive changes in the EDK2 FW to support a dynamic
> >> RAM base.
> >>
> >> Still the user cannot request an initial RAM size greater than 255GB.
> >> Also we handle the case where maxmem or slots options are passed,
> >> although no device memory is usable at the moment. In this case, we
> >> just ignore those settings.
> >>
> >> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> >> ---
> >>  hw/arm/virt.c         | 47 ++++++++++++++++++++++++++++++++++---------
> >>  include/hw/arm/virt.h |  3 +++
> >>  2 files changed, 41 insertions(+), 9 deletions(-)
> >>
> >> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> >> index 12039a0367..9db602457b 100644
> >> --- a/hw/arm/virt.c
> >> +++ b/hw/arm/virt.c
> >> @@ -107,8 +107,9 @@
> >>   * of a terabyte of RAM will be doing it on a host with more than a
> >>   * terabyte of physical address space.)
> >>   */
> >> -#define RAMLIMIT_GB 255
> >> -#define RAMLIMIT_BYTES (RAMLIMIT_GB * 1024ULL * 1024 * 1024)
> >> +#define RAMBASE GiB
> >> +#define LEGACY_RAMLIMIT_GB 255
> >> +#define LEGACY_RAMLIMIT_BYTES (LEGACY_RAMLIMIT_GB * GiB)
> >>  
> >>  /* Addresses and sizes of our components.
> >>   * 0..128MB is space for a flash device so we can run bootrom code such as UEFI.
> >> @@ -149,7 +150,7 @@ static const MemMapEntry base_memmap[] = {
> >>      [VIRT_PCIE_MMIO] =          { 0x10000000, 0x2eff0000 },
> >>      [VIRT_PCIE_PIO] =           { 0x3eff0000, 0x00010000 },
> >>      [VIRT_PCIE_ECAM] =          { 0x3f000000, 0x01000000 },
> >> -    [VIRT_MEM] =                { 0x40000000, RAMLIMIT_BYTES },
> >> +    [VIRT_MEM] =                { RAMBASE, LEGACY_RAMLIMIT_BYTES },
> >>  };
> >>  
> >>  /*
> >> @@ -1367,16 +1368,48 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx)
> >>  
> >>  static void virt_set_memmap(VirtMachineState *vms)
> >>  {
> >> +    MachineState *ms = MACHINE(vms);
> >>      hwaddr base;
> >>      int i;
> >>  
> >> +    if (ms->maxram_size > ms->ram_size || ms->ram_slots > 0) {
> >> +        error_report("mach-virt: does not support device memory: "
> >> +                     "ignore maxmem and slots options");
> >> +        ms->maxram_size = ms->ram_size;
> >> +        ms->ram_slots = 0;
> >> +    }
> >> +    if (ms->ram_size > (ram_addr_t)LEGACY_RAMLIMIT_BYTES) {
> >> +        error_report("mach-virt: cannot model more than %dGB RAM",
> >> +                     LEGACY_RAMLIMIT_GB);
> >> +        exit(1);
> >> +    }  
> > I'd drop these checks and amend below code so that it would init device_memory
> > logic when ram_slots > 0. It should simplify follow up patches by dropping
> > all machine version specific parts.  
> I don't have sufficient knowledge of virtio-mem/virtio-pmem. Do they
> also use slots?
If they don't, then they will have to take care of it for every machine
anyway, so I'd dismiss that from consideration.

> 
> 
> > It shouldn't break old machines as layout stays the same but would allow to
> > start old machine with pc-dimms which is fine from migration pov as target
> > also should be able to start QEMU with the same options for migration to start.
> > 
> >   
> >> +
> >>      vms->memmap = extended_memmap;
> >>  
> >>      for (i = 0; i < ARRAY_SIZE(base_memmap); i++) {
> >>          vms->memmap[i] = base_memmap[i];
> >>      }
> >>  
> >> -    vms->high_io_base = 256 * GiB; /* Top of the legacy initial RAM region */
> >> +    /*
> >> +     * We compute the base of the high IO region depending on the
> >> +     * amount of initial and device memory. The device memory start/size
> >> +     * is aligned on 1GiB. We never put the high IO region below 256GiB
> >> +     * so that if maxram_size is < 255GiB we keep the legacy memory map.
> >> +     * The device region size assumes 1GiB page max alignment per slot.
> >> +     */
> >> +    vms->device_memory_base = ROUND_UP(RAMBASE + ms->ram_size, GiB);
> >> +    vms->device_memory_size = ms->maxram_size - ms->ram_size +
> >> +                              ms->ram_slots * GiB;  
> So does everyone agree on this device memory size computation? I would
> like to make sure this is future proof. Do I need to add a machine
> option like on x86 to enforce slot alignment or is it OK?
The computation looks fine to me. Though I'd try not to duplicate the data
in vms->device_memory_base and vms->device_memory_size, as vms->device_memory
is sufficient; see my comment on 13/17.

> 
> Thanks
> 
> Eric
> >> +
> >> +    vms->high_io_base = vms->device_memory_base +
> >> +                        ROUND_UP(vms->device_memory_size, GiB);
> >> +    if (vms->high_io_base < vms->device_memory_base) {
> >> +        error_report("maxmem/slots too huge");
> >> +        exit(EXIT_FAILURE);
> >> +    }
> >> +    if (vms->high_io_base < 256 * GiB) {
> >> +        vms->high_io_base = 256 * GiB;
> >> +    }
> >>      base = vms->high_io_base;
> >>  
> >>      for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
> >> @@ -1387,6 +1420,7 @@ static void virt_set_memmap(VirtMachineState *vms)
> >>          vms->memmap[i].size = size;
> >>          base += size;
> >>      }
> >> +    vms->highest_gpa = base - 1;
> >>  }
> >>  
> >>  static void machvirt_init(MachineState *machine)
> >> @@ -1470,11 +1504,6 @@ static void machvirt_init(MachineState *machine)
> >>  
> >>      vms->smp_cpus = smp_cpus;
> >>  
> >> -    if (machine->ram_size > vms->memmap[VIRT_MEM].size) {
> >> -        error_report("mach-virt: cannot model more than %dGB RAM", RAMLIMIT_GB);
> >> -        exit(1);
> >> -    }
> >> -
> >>      if (vms->virt && kvm_enabled()) {
> >>          error_report("mach-virt: KVM does not support providing "
> >>                       "Virtualization extensions to the guest CPU");
> >> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
> >> index 3dc7a6c5d5..acad0400d8 100644
> >> --- a/include/hw/arm/virt.h
> >> +++ b/include/hw/arm/virt.h
> >> @@ -132,6 +132,9 @@ typedef struct {
> >>      uint32_t iommu_phandle;
> >>      int psci_conduit;
> >>      hwaddr high_io_base;
> >> +    hwaddr highest_gpa;
> >> +    hwaddr device_memory_base;
> >> +    hwaddr device_memory_size;
> >>  } VirtMachineState;
> >>  
> >>  #define VIRT_ECAM_ID(high) (high ? VIRT_HIGH_PCIE_ECAM : VIRT_PCIE_ECAM)  
> >   


* Re: [Qemu-devel] [PATCH v7 03/17] hw/arm/virt: Split the memory map description
  2019-02-22 10:15       ` Igor Mammedov
@ 2019-02-22 14:28         ` Auger Eric
  2019-02-22 14:51           ` Igor Mammedov
  0 siblings, 1 reply; 63+ messages in thread
From: Auger Eric @ 2019-02-22 14:28 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: peter.maydell, drjones, david, dgilbert,
	shameerali.kolothum.thodi, qemu-devel, qemu-arm, eric.auger.pro,
	david

Hi Igor,

On 2/22/19 11:15 AM, Igor Mammedov wrote:
> On Thu, 21 Feb 2019 18:21:11 +0100
> Auger Eric <eric.auger@redhat.com> wrote:
> 
>> Hi Igor,
>> On 2/21/19 5:19 PM, Igor Mammedov wrote:
>>> On Wed, 20 Feb 2019 23:39:49 +0100
>>> Eric Auger <eric.auger@redhat.com> wrote:
>>>   
>>>> In the prospect to introduce an extended memory map supporting more
>>>> RAM, let's split the memory map array into two parts:
>>>>
>>>> - the former a15memmap contains regions below and including the RAM  
>>>   
>>>> - extended_memmap, only initialized with entries located after the RAM.
>>>>   Only the size of the region is initialized there since their base
>>>>   address will be dynamically computed, depending on the top of the
>>>>   RAM (initial RAM at the moment), with same alignment as their size.  
>>> can't parse this part and pinpoint what is 'their', care to rephrase?  
>> Only the size of the High IO region entries is initialized (there are
>> currently 3 entries:  VIRT_HIGH_GIC_REDIST2, VIRT_HIGH_PCIE_ECAM,
>> VIRT_HIGH_PCIE_MMIO). The base address is dynamically computed so it is
>> not initialized.
>>>
>>>   
>>>> This new split will allow to grow the RAM size without changing the
>>>> description of the high regions.
>>>>
>>>> The patch also moves the memory map setup  
>>> s/moves/makes/
>>> s/$/dynamic and moves it/
>>>   
>>>> into machvirt_init().  
>>>   
>>>> The rationale is the memory map will be soon affected by the  
>>>   
>>>> kvm_type() call that happens after virt_instance_init() and  
>>> is dependency on kvm_type() still valid,
>>> shouldn't split memmap work for TCG just fine as well?  
>> See in 08/17: in TCG mode the memory map  will be "frozen" (set_memmap)
>> in machvirt_init. Otherwise set_memmap is called from kvm_type().
>>
>> Split memmap works both in TCG and in accelerated mode.
>>
>> I will rephrase the commit message.
>>>   
>>>> before machvirt_init().
>>>>
>>>> The memory map is unchanged (the top of the initial RAM still is
>>>> 256GiB). Then come the high IO regions with same layout as before.
>>>>
>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
>>>>
>>>> ---
>>>> v6 -> v7:
>>>> - s/a15memmap/base_memmap
>>>> - slight rewording of the commit message
>>>> - add "if there is less than 256GiB of RAM then the floating area
>>>>   starts at the 256GiB mark" in the comment associated to the floating
>>>>   memory map
>>>> - Added Peter's R-b
>>>>
>>>> v5 -> v6
>>>> - removal of many macros in units.h
>>>> - introduce the virt_set_memmap helper
>>>> - new computation for offsets of high IO regions
>>>> - add comments
>>>> ---
>>>>  hw/arm/virt.c         | 48 +++++++++++++++++++++++++++++++++++++------
>>>>  include/hw/arm/virt.h | 14 +++++++++----
>>>>  2 files changed, 52 insertions(+), 10 deletions(-)
>>>>
>>>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>>>> index a1955e7764..12039a0367 100644
>>>> --- a/hw/arm/virt.c
>>>> +++ b/hw/arm/virt.c
>>>> @@ -29,6 +29,7 @@
>>>>   */
>>>>  
>>>>  #include "qemu/osdep.h"
>>>> +#include "qemu/units.h"
>>>>  #include "qapi/error.h"
>>>>  #include "hw/sysbus.h"
>>>>  #include "hw/arm/arm.h"
>>>> @@ -121,7 +122,7 @@
>>>>   * Note that devices should generally be placed at multiples of 0x10000,
>>>>   * to accommodate guests using 64K pages.
>>>>   */
>>>> -static const MemMapEntry a15memmap[] = {
>>>> +static const MemMapEntry base_memmap[] = {
>>>>      /* Space up to 0x8000000 is reserved for a boot ROM */
>>>>      [VIRT_FLASH] =              {          0, 0x08000000 },
>>>>      [VIRT_CPUPERIPHS] =         { 0x08000000, 0x00020000 },
>>>> @@ -149,11 +150,21 @@ static const MemMapEntry a15memmap[] = {
>>>>      [VIRT_PCIE_PIO] =           { 0x3eff0000, 0x00010000 },
>>>>      [VIRT_PCIE_ECAM] =          { 0x3f000000, 0x01000000 },
>>>>      [VIRT_MEM] =                { 0x40000000, RAMLIMIT_BYTES },
>>>> +};
>>>> +
>>>> +/*
>>>> + * Highmem IO Regions: This memory map is floating, located after the RAM.
>>>> + * Each IO region offset will be dynamically computed, depending on the  
>>> s/IO region offset/MemMapEntry base (GPA)/  
>>>> + * top of the RAM, so that its base get the same alignment as the size,  
>>>   
>>>> + * ie. a 512GiB region will be aligned on a 512GiB boundary. If there is  
>>> s/region/entry/
>>>   
>>>> + * less than 256GiB of RAM, the floating area starts at the 256GiB mark.
>>>> + */
>>>> +static MemMapEntry extended_memmap[] = {
>>>>      /* Additional 64 MB redist region (can contain up to 512 redistributors) */
>>>> -    [VIRT_HIGH_GIC_REDIST2] =   { 0x4000000000ULL, 0x4000000 },
>>>> -    [VIRT_HIGH_PCIE_ECAM] =     { 0x4010000000ULL, 0x10000000 },
>>>> -    /* Second PCIe window, 512GB wide at the 512GB boundary */
>>>> -    [VIRT_HIGH_PCIE_MMIO] =     { 0x8000000000ULL, 0x8000000000ULL },
>>>> +    [VIRT_HIGH_GIC_REDIST2] =   { 0x0, 64 * MiB },
>>>> +    [VIRT_HIGH_PCIE_ECAM] =     { 0x0, 256 * MiB },
>>>> +    /* Second PCIe window */
>>>> +    [VIRT_HIGH_PCIE_MMIO] =     { 0x0, 512 * GiB },
>>>>  };
>>>>  
>>>>  static const int a15irqmap[] = {
>>>> @@ -1354,6 +1365,30 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx)
>>>>      return arm_cpu_mp_affinity(idx, clustersz);
>>>>  }
>>>>  
>>>> +static void virt_set_memmap(VirtMachineState *vms)
>>>> +{
>>>> +    hwaddr base;
>>>> +    int i;
>>>> +
>>>> +    vms->memmap = extended_memmap;  
>>> I probably don't see something but ...
>>>   
>>>> +
>>>> +    for (i = 0; i < ARRAY_SIZE(base_memmap); i++) {
>>>> +        vms->memmap[i] = base_memmap[i];  
>>>
>>> ARRAY_SIZE(base_memmap) > 3
>>> ARRAY_SIZE(extended_memmap) == 3
>>> as result shouldn't we observe OOB at vms->memmap[i] access
>>> starting from i==3 ?  
>> ARRAY_SIZE(extended_memmap) = ARRAY_SIZE(base_memmap) + 3
>> VIRT_HIGH_GIC_REDIST2 =  VIRT_LOWMEMMAP_LAST is what you miss.
> Yep, that's the trick.
> It is too much subtle for my taste,
> is it possible to make extended_memmap sizing more explicit/trivial,
> so one could see it right away without figuring out how indexes influence array size?
The issue is that if we explicitly size the array, we will need to change
the size whenever we add a new entry. I don't see any better solution, to
be honest. I can definitely add comments in the code about this sizing
aspect. Another solution is to merge the 2 arrays, as suggested by Heyi,
but I dislike the fact that one part would be initialized one way and the
other part another way.

Thanks

Eric

> 
> 
>> /* indices of IO regions located after the RAM */
>> enum {
>>     VIRT_HIGH_GIC_REDIST2 =  VIRT_LOWMEMMAP_LAST,
>>     VIRT_HIGH_PCIE_ECAM,
>>     VIRT_HIGH_PCIE_MMIO,
>> };
>>
>>>   
>>>> +    }
>>>> +
>>>> +    vms->high_io_base = 256 * GiB; /* Top of the legacy initial RAM region */
>>>> +    base = vms->high_io_base;
>>>> +
>>>> +    for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {  
>>> not sure why VIRT_LOWMEMMAP_LAST is needed at all, one could just continue
>>> with current 'i' value, provided extended_memmap wasn't corrupted by previous
>>> loop.  
>> Yep maybe. But I think it is less error prone like this if someone later
>> on adds some intermediate manipulation on i.
>>> And does this loop ever executes? VIRT_LOWMEMMAP_LAST > ARRAY_SIZE(extended_memmap)  
>> yes it does
>>
>> Thanks
>>
>> Eric
>>>   
>>>> +        hwaddr size = extended_memmap[i].size;
>>>> +
>>>> +        base = ROUND_UP(base, size);
>>>> +        vms->memmap[i].base = base;
>>>> +        vms->memmap[i].size = size;
>>>> +        base += size;
>>>> +    }
>>>> +}
>>>> +
>>>>  static void machvirt_init(MachineState *machine)
>>>>  {
>>>>      VirtMachineState *vms = VIRT_MACHINE(machine);
>>>> @@ -1368,6 +1403,8 @@ static void machvirt_init(MachineState *machine)
>>>>      bool firmware_loaded = bios_name || drive_get(IF_PFLASH, 0, 0);
>>>>      bool aarch64 = true;
>>>>  
>>>> +    virt_set_memmap(vms);
>>>> +
>>>>      /* We can probe only here because during property set
>>>>       * KVM is not available yet
>>>>       */
>>>> @@ -1843,7 +1880,6 @@ static void virt_instance_init(Object *obj)
>>>>                                      "Valid values are none and smmuv3",
>>>>                                      NULL);
>>>>  
>>>> -    vms->memmap = a15memmap;
>>>>      vms->irqmap = a15irqmap;
>>>>  }
>>>>  
>>>> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
>>>> index a27086d524..3dc7a6c5d5 100644
>>>> --- a/include/hw/arm/virt.h
>>>> +++ b/include/hw/arm/virt.h
>>>> @@ -64,7 +64,6 @@ enum {
>>>>      VIRT_GIC_VCPU,
>>>>      VIRT_GIC_ITS,
>>>>      VIRT_GIC_REDIST,
>>>> -    VIRT_HIGH_GIC_REDIST2,
>>>>      VIRT_SMMU,
>>>>      VIRT_UART,
>>>>      VIRT_MMIO,
>>>> @@ -74,12 +73,18 @@ enum {
>>>>      VIRT_PCIE_MMIO,
>>>>      VIRT_PCIE_PIO,
>>>>      VIRT_PCIE_ECAM,
>>>> -    VIRT_HIGH_PCIE_ECAM,
>>>>      VIRT_PLATFORM_BUS,
>>>> -    VIRT_HIGH_PCIE_MMIO,
>>>>      VIRT_GPIO,
>>>>      VIRT_SECURE_UART,
>>>>      VIRT_SECURE_MEM,
>>>> +    VIRT_LOWMEMMAP_LAST,
>>>> +};
>>>> +
>>>> +/* indices of IO regions located after the RAM */
>>>> +enum {
>>>> +    VIRT_HIGH_GIC_REDIST2 =  VIRT_LOWMEMMAP_LAST,
>>>> +    VIRT_HIGH_PCIE_ECAM,
>>>> +    VIRT_HIGH_PCIE_MMIO,
>>>>  };
>>>>  
>>>>  typedef enum VirtIOMMUType {
>>>> @@ -116,7 +121,7 @@ typedef struct {
>>>>      int32_t gic_version;
>>>>      VirtIOMMUType iommu;
>>>>      struct arm_boot_info bootinfo;
>>>> -    const MemMapEntry *memmap;
>>>> +    MemMapEntry *memmap;
>>>>      const int *irqmap;
>>>>      int smp_cpus;
>>>>      void *fdt;
>>>> @@ -126,6 +131,7 @@ typedef struct {
>>>>      uint32_t msi_phandle;
>>>>      uint32_t iommu_phandle;
>>>>      int psci_conduit;
>>>> +    hwaddr high_io_base;
>>>>  } VirtMachineState;
>>>>  
>>>>  #define VIRT_ECAM_ID(high) (high ? VIRT_HIGH_PCIE_ECAM : VIRT_PCIE_ECAM)  
>>>
>>>   
> 
> 


* Re: [Qemu-devel] [PATCH v7 08/17] hw/arm/virt: Implement kvm_type function for 4.0 machine
  2019-02-22 14:01     ` Auger Eric
@ 2019-02-22 14:39       ` Igor Mammedov
  2019-02-22 14:53         ` Auger Eric
  0 siblings, 1 reply; 63+ messages in thread
From: Igor Mammedov @ 2019-02-22 14:39 UTC (permalink / raw)
  To: Auger Eric
  Cc: peter.maydell, drjones, david, qemu-devel,
	shameerali.kolothum.thodi, dgilbert, qemu-arm, david,
	eric.auger.pro

On Fri, 22 Feb 2019 15:01:25 +0100
Auger Eric <eric.auger@redhat.com> wrote:

> Hi Igor,
> 
> On 2/22/19 1:45 PM, Igor Mammedov wrote:
> > On Wed, 20 Feb 2019 23:39:54 +0100
> > Eric Auger <eric.auger@redhat.com> wrote:
> >   
> >> This patch implements the machine class kvm_type() callback.
> >> It returns the number of bits requested to implement the whole GPA
> >> range including the RAM and IO regions located beyond.
> >> The returned value in passed though the KVM_CREATE_VM ioctl and
> >> this allows KVM to set the stage2 tables dynamically.
> >>
> >> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> >>
> >> ---
> >>
> >> v6 -> v7:
> >> - Introduce RAMBASE and rename add LEGACY_ prefix in that patch
> >> - use local variables with explicit names in virt_set_memmap:
> >>   device_memory_base, device_memory_size
> >> - add an extended_memmap field in the class
> >>
> >> v5 -> v6:
> >> - add some comments
> >> - high IO region cannot start before 256GiB
> >> ---
> >>  hw/arm/virt.c         | 50 ++++++++++++++++++++++++++++++++++++++++++-
> >>  include/hw/arm/virt.h |  2 ++
> >>  2 files changed, 51 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> >> index 9db602457b..ad3a0ad73d 100644
> >> --- a/hw/arm/virt.c
> >> +++ b/hw/arm/virt.c
> >> @@ -1437,7 +1437,14 @@ static void machvirt_init(MachineState *machine)
> >>      bool firmware_loaded = bios_name || drive_get(IF_PFLASH, 0, 0);
> >>      bool aarch64 = true;
> >>  
> >> -    virt_set_memmap(vms);
> >> +    /*
> >> +     * In accelerated mode, the memory map is computed in kvm_type(),
> >> +     * if set, to create a VM with the right number of IPA bits.
> >> +     */
> >> +
> >> +    if (!mc->kvm_type || !kvm_enabled()) {
> >> +        virt_set_memmap(vms);
> >> +    }
> >>  
> >>      /* We can probe only here because during property set
> >>       * KVM is not available yet
> >> @@ -1814,6 +1821,36 @@ static HotplugHandler *virt_machine_get_hotplug_handler(MachineState *machine,
> >>      return NULL;
> >>  }
> >>  
> >> +/*
> >> + * for arm64 kvm_type [7-0] encodes the requested number of bits
> >> + * in the IPA address space
> >> + */
> >> +static int virt_kvm_type(MachineState *ms, const char *type_str)
> >> +{
> >> +    VirtMachineState *vms = VIRT_MACHINE(ms);
> >> +    int max_vm_pa_size = kvm_arm_get_max_vm_ipa_size(ms);
> >> +    int requested_pa_size;
> >> +
> >> +    /* we freeze the memory map to compute the highest gpa */
> >> +    virt_set_memmap(vms);
> >> +
> >> +    requested_pa_size = 64 - clz64(vms->highest_gpa);
> >> +
> >> +    if (requested_pa_size > max_vm_pa_size) {
> >> +        error_report("-m and ,maxmem option values "
> >> +                     "require an IPA range (%d bits) larger than "
> >> +                     "the one supported by the host (%d bits)",
> >> +                     requested_pa_size, max_vm_pa_size);
> >> +       exit(1);
> >> +    }
> >> +    /*
> >> +     * By default we return 0 which corresponds to an implicit legacy
> >> +     * 40b IPA setting. Otherwise we return the actual requested PA
> >> +     * logsize
> >> +     */
> >> +    return requested_pa_size > 40 ? requested_pa_size : 0;
> >> +}
> >> +
> >>  static void virt_machine_class_init(ObjectClass *oc, void *data)
> >>  {
> >>      MachineClass *mc = MACHINE_CLASS(oc);
> >> @@ -1838,6 +1875,7 @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
> >>      mc->cpu_index_to_instance_props = virt_cpu_index_to_props;
> >>      mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-a15");
> >>      mc->get_default_cpu_node_id = virt_get_default_cpu_node_id;
> >> +    mc->kvm_type = virt_kvm_type;
> >>      assert(!mc->get_hotplug_handler);
> >>      mc->get_hotplug_handler = virt_machine_get_hotplug_handler;
> >>      hc->plug = virt_machine_device_plug_cb;
> >> @@ -1909,6 +1947,12 @@ static void virt_instance_init(Object *obj)
> >>                                      "Valid values are none and smmuv3",
> >>                                      NULL);
> >>  
> >> +    if (vmc->no_extended_memmap) {
> >> +        vms->extended_memmap = false;
> >> +    } else {
> >> +        vms->extended_memmap = true;
> >> +    }
> >> +
> >>      vms->irqmap = a15irqmap;
> >>  }
> >>  
> >> @@ -1939,8 +1983,12 @@ DEFINE_VIRT_MACHINE_AS_LATEST(4, 0)
> >>  
> >>  static void virt_machine_3_1_options(MachineClass *mc)
> >>  {
> >> +    VirtMachineClass *vmc = VIRT_MACHINE_CLASS(OBJECT_CLASS(mc));
> >>      virt_machine_4_0_options(mc);
> >>      compat_props_add(mc->compat_props, hw_compat_3_1, hw_compat_3_1_len);
> >> +
> >> +    /* extended memory map is enabled from 4.0 onwards */
> >> +    vmc->no_extended_memmap = true;  
> > That was probably asked in v6;
> > do we really need this knob?
> Yes, the point was raised by Peter and I replied with another question ;-) "
> But don't we want to forbid any pre-4.0 machvirt from running with more than
> 255GiB RAM?
> "
> without this knob:
> - pre-4.0 machines will gain the capability to support more than 255GB
> initial RAM if the kernel supports dynamic IPA setting
should be fine; you shouldn't be able to start an old QEMU with more than 255GB anyway

> - pre-4.0 machines will gain PCDIMM/NVDIMM support
ditto

> - another concern: maxmem and slots were not checked previously. If for
> some reason the user specified them without instantiating the actual
> slots, this was previously ignored. This is no longer the case, as both
> parameters are used to compute the requested IPA range. So this may
> fail now.
well, that's in the category of a broken setup even if QEMU didn't complain
about it before. The user should fix it instead of QEMU supporting madness.
(I think such an invariant doesn't even deserve a deprecation process.)

> So I thought it was clearer to disable all of the above for pre-4.0
> machines. However, if both of you agree, I will remove it.
> 
> thanks
> 
> Eric
> >   
> >>  }
> >>  DEFINE_VIRT_MACHINE(3, 1)
> >>  
> >> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
> >> index acad0400d8..7798462cb0 100644
> >> --- a/include/hw/arm/virt.h
> >> +++ b/include/hw/arm/virt.h
> >> @@ -106,6 +106,7 @@ typedef struct {
> >>      bool claim_edge_triggered_timers;
> >>      bool smbios_old_sys_ver;
> >>      bool no_highmem_ecam;
> >> +    bool no_extended_memmap;
> >>  } VirtMachineClass;
> >>  
> >>  typedef struct {
> >> @@ -135,6 +136,7 @@ typedef struct {
> >>      hwaddr highest_gpa;
> >>      hwaddr device_memory_base;
> >>      hwaddr device_memory_size;
> >> +    bool extended_memmap;
> >>  } VirtMachineState;
> >>  
> >>  #define VIRT_ECAM_ID(high) (high ? VIRT_HIGH_PCIE_ECAM : VIRT_PCIE_ECAM)  
> >   
> 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v7 03/17] hw/arm/virt: Split the memory map description
  2019-02-22 14:28         ` Auger Eric
@ 2019-02-22 14:51           ` Igor Mammedov
  0 siblings, 0 replies; 63+ messages in thread
From: Igor Mammedov @ 2019-02-22 14:51 UTC (permalink / raw)
  To: Auger Eric
  Cc: peter.maydell, drjones, david, dgilbert,
	shameerali.kolothum.thodi, qemu-devel, qemu-arm, eric.auger.pro,
	david

On Fri, 22 Feb 2019 15:28:20 +0100
Auger Eric <eric.auger@redhat.com> wrote:

> Hi Igor,
> 
> On 2/22/19 11:15 AM, Igor Mammedov wrote:
> > On Thu, 21 Feb 2019 18:21:11 +0100
> > Auger Eric <eric.auger@redhat.com> wrote:
> >   
> >> Hi Igor,
> >> On 2/21/19 5:19 PM, Igor Mammedov wrote:  
> >>> On Wed, 20 Feb 2019 23:39:49 +0100
> >>> Eric Auger <eric.auger@redhat.com> wrote:
> >>>     
> >>>> In the prospect to introduce an extended memory map supporting more
> >>>> RAM, let's split the memory map array into two parts:
> >>>>
> >>>> - the former a15memmap contains regions below and including the RAM    
> >>>     
> >>>> - extended_memmap, only initialized with entries located after the RAM.
> >>>>   Only the size of the region is initialized there since their base
> >>>>   address will be dynamically computed, depending on the top of the
> >>>>   RAM (initial RAM at the moment), with same alignment as their size.    
> >>> can't parse this part and pinpoint what is 'their', care to rephrase?    
> >> Only the size of the High IO region entries is initialized (there are
> >> currently 3 entries:  VIRT_HIGH_GIC_REDIST2, VIRT_HIGH_PCIE_ECAM,
> >> VIRT_HIGH_PCIE_MMIO). The base address is dynamically computed so it is
> >> not initialized.  
> >>>
> >>>     
> >>>> This new split will allow to grow the RAM size without changing the
> >>>> description of the high regions.
> >>>>
> >>>> The patch also moves the memory map setup    
> >>> s/moves/makes/
> >>> s/$/dynamic and moves it/
> >>>     
> >>>> into machvirt_init().    
> >>>     
> >>>> The rationale is the memory map will be soon affected by the    
> >>>     
> >>>> kvm_type() call that happens after virt_instance_init() and    
> >>> is the dependency on kvm_type() still valid?
> >>> shouldn't the split memmap work for TCG just as well?
>> See 08/17: in TCG mode the memory map will be "frozen" (set_memmap)
>> in machvirt_init. Otherwise set_memmap is called from kvm_type().
> >>
> >> Split memmap works both in TCG and in accelerated mode.
> >>
> >> I will rephrase the commit message.  
> >>>     
> >>>> before machvirt_init().
> >>>>
> >>>> The memory map is unchanged (the top of the initial RAM still is
> >>>> 256GiB). Then come the high IO regions with same layout as before.
> >>>>
> >>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> >>>> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
> >>>>
> >>>> ---
> >>>> v6 -> v7:
> >>>> - s/a15memmap/base_memmap
> >>>> - slight rewording of the commit message
> >>>> - add "if there is less than 256GiB of RAM then the floating area
> >>>>   starts at the 256GiB mark" in the comment associated to the floating
> >>>>   memory map
> >>>> - Added Peter's R-b
> >>>>
> >>>> v5 -> v6
> >>>> - removal of many macros in units.h
> >>>> - introduce the virt_set_memmap helper
> >>>> - new computation for offsets of high IO regions
> >>>> - add comments
> >>>> ---
> >>>>  hw/arm/virt.c         | 48 +++++++++++++++++++++++++++++++++++++------
> >>>>  include/hw/arm/virt.h | 14 +++++++++----
> >>>>  2 files changed, 52 insertions(+), 10 deletions(-)
> >>>>
> >>>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> >>>> index a1955e7764..12039a0367 100644
> >>>> --- a/hw/arm/virt.c
> >>>> +++ b/hw/arm/virt.c
> >>>> @@ -29,6 +29,7 @@
> >>>>   */
> >>>>  
> >>>>  #include "qemu/osdep.h"
> >>>> +#include "qemu/units.h"
> >>>>  #include "qapi/error.h"
> >>>>  #include "hw/sysbus.h"
> >>>>  #include "hw/arm/arm.h"
> >>>> @@ -121,7 +122,7 @@
> >>>>   * Note that devices should generally be placed at multiples of 0x10000,
> >>>>   * to accommodate guests using 64K pages.
> >>>>   */
> >>>> -static const MemMapEntry a15memmap[] = {
> >>>> +static const MemMapEntry base_memmap[] = {
> >>>>      /* Space up to 0x8000000 is reserved for a boot ROM */
> >>>>      [VIRT_FLASH] =              {          0, 0x08000000 },
> >>>>      [VIRT_CPUPERIPHS] =         { 0x08000000, 0x00020000 },
> >>>> @@ -149,11 +150,21 @@ static const MemMapEntry a15memmap[] = {
> >>>>      [VIRT_PCIE_PIO] =           { 0x3eff0000, 0x00010000 },
> >>>>      [VIRT_PCIE_ECAM] =          { 0x3f000000, 0x01000000 },
> >>>>      [VIRT_MEM] =                { 0x40000000, RAMLIMIT_BYTES },
> >>>> +};
> >>>> +
> >>>> +/*
> >>>> + * Highmem IO Regions: This memory map is floating, located after the RAM.
> >>>> + * Each IO region offset will be dynamically computed, depending on the    
> >>> s/IO region offset/MemMapEntry base (GPA)/    
> >>>> + * top of the RAM, so that its base get the same alignment as the size,    
> >>>     
> >>>> + * ie. a 512GiB region will be aligned on a 512GiB boundary. If there is    
> >>> s/region/entry/
> >>>     
> >>>> + * less than 256GiB of RAM, the floating area starts at the 256GiB mark.
> >>>> + */
> >>>> +static MemMapEntry extended_memmap[] = {
> >>>>      /* Additional 64 MB redist region (can contain up to 512 redistributors) */
> >>>> -    [VIRT_HIGH_GIC_REDIST2] =   { 0x4000000000ULL, 0x4000000 },
> >>>> -    [VIRT_HIGH_PCIE_ECAM] =     { 0x4010000000ULL, 0x10000000 },
> >>>> -    /* Second PCIe window, 512GB wide at the 512GB boundary */
> >>>> -    [VIRT_HIGH_PCIE_MMIO] =     { 0x8000000000ULL, 0x8000000000ULL },
> >>>> +    [VIRT_HIGH_GIC_REDIST2] =   { 0x0, 64 * MiB },
> >>>> +    [VIRT_HIGH_PCIE_ECAM] =     { 0x0, 256 * MiB },
> >>>> +    /* Second PCIe window */
> >>>> +    [VIRT_HIGH_PCIE_MMIO] =     { 0x0, 512 * GiB },
> >>>>  };
> >>>>  
> >>>>  static const int a15irqmap[] = {
> >>>> @@ -1354,6 +1365,30 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx)
> >>>>      return arm_cpu_mp_affinity(idx, clustersz);
> >>>>  }
> >>>>  
> >>>> +static void virt_set_memmap(VirtMachineState *vms)
> >>>> +{
> >>>> +    hwaddr base;
> >>>> +    int i;
> >>>> +
> >>>> +    vms->memmap = extended_memmap;    
> >>> I probably don't see something but ...
> >>>     
> >>>> +
> >>>> +    for (i = 0; i < ARRAY_SIZE(base_memmap); i++) {
> >>>> +        vms->memmap[i] = base_memmap[i];    
> >>>
> >>> ARRAY_SIZE(base_memmap) > 3
> >>> ARRAY_SIZE(extended_memmap) == 3
> >>> as result shouldn't we observe OOB at vms->memmap[i] access
> >>> starting from i==3 ?    
> >> ARRAY_SIZE(extended_memmap) = ARRAY_SIZE(base_memmap) + 3
> >> VIRT_HIGH_GIC_REDIST2 =  VIRT_LOWMEMMAP_LAST is what you miss.  
> > Yep, that's the trick.
> > It is too subtle for my taste;
> > is it possible to make the extended_memmap sizing more explicit/trivial,
> > so one could see it right away without figuring out how the indexes influence the array size?
> The issue is that if we explicitly size the array, we will need to change
> the size whenever we add a new entry. I don't see any better solution, to
> be honest. I can definitely add comments in the code about this sizing
> aspect. Another solution is to merge the 2 arrays as suggested by Heyi,
> but I dislike the fact that one part would be initialized one way and the
> other another way.
I'd go with a dynamically allocated memmap, where we would copy the base map
and then append the extended entries.

But considering how memmap is used throughout the code (treated like a map
structure instead of an array), that probably deserves its own series.
For now it's fine with extra comments, and maybe more explanation in the
commit message, so that an inattentive patch reader won't have to wonder
what's going on, or later do archaeology studies on QEMU :)

> Thanks
> 
> Eric
> 
> > 
> >   
> >> /* indices of IO regions located after the RAM */
> >> enum {
> >>     VIRT_HIGH_GIC_REDIST2 =  VIRT_LOWMEMMAP_LAST,
> >>     VIRT_HIGH_PCIE_ECAM,
> >>     VIRT_HIGH_PCIE_MMIO,
> >> };
> >>  
> >>>     
> >>>> +    }
> >>>> +
> >>>> +    vms->high_io_base = 256 * GiB; /* Top of the legacy initial RAM region */
> >>>> +    base = vms->high_io_base;
> >>>> +
> >>>> +    for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {    
> >>> not sure why VIRT_LOWMEMMAP_LAST is needed at all, one could just continue
> >>> with current 'i' value, provided extended_memmap wasn't corrupted by previous
> >>> loop.    
> >> Yep maybe. But I think it is less error prone like this if someone later
> >> on adds some intermediate manipulation on i.  
> >>> And does this loop ever executes? VIRT_LOWMEMMAP_LAST > ARRAY_SIZE(extended_memmap)    
> >> yes it does
> >>
> >> Thanks
> >>
> >> Eric  
> >>>     
> >>>> +        hwaddr size = extended_memmap[i].size;
> >>>> +
> >>>> +        base = ROUND_UP(base, size);
> >>>> +        vms->memmap[i].base = base;
> >>>> +        vms->memmap[i].size = size;
> >>>> +        base += size;
> >>>> +    }
> >>>> +}
> >>>> +
> >>>>  static void machvirt_init(MachineState *machine)
> >>>>  {
> >>>>      VirtMachineState *vms = VIRT_MACHINE(machine);
> >>>> @@ -1368,6 +1403,8 @@ static void machvirt_init(MachineState *machine)
> >>>>      bool firmware_loaded = bios_name || drive_get(IF_PFLASH, 0, 0);
> >>>>      bool aarch64 = true;
> >>>>  
> >>>> +    virt_set_memmap(vms);
> >>>> +
> >>>>      /* We can probe only here because during property set
> >>>>       * KVM is not available yet
> >>>>       */
> >>>> @@ -1843,7 +1880,6 @@ static void virt_instance_init(Object *obj)
> >>>>                                      "Valid values are none and smmuv3",
> >>>>                                      NULL);
> >>>>  
> >>>> -    vms->memmap = a15memmap;
> >>>>      vms->irqmap = a15irqmap;
> >>>>  }
> >>>>  
> >>>> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
> >>>> index a27086d524..3dc7a6c5d5 100644
> >>>> --- a/include/hw/arm/virt.h
> >>>> +++ b/include/hw/arm/virt.h
> >>>> @@ -64,7 +64,6 @@ enum {
> >>>>      VIRT_GIC_VCPU,
> >>>>      VIRT_GIC_ITS,
> >>>>      VIRT_GIC_REDIST,
> >>>> -    VIRT_HIGH_GIC_REDIST2,
> >>>>      VIRT_SMMU,
> >>>>      VIRT_UART,
> >>>>      VIRT_MMIO,
> >>>> @@ -74,12 +73,18 @@ enum {
> >>>>      VIRT_PCIE_MMIO,
> >>>>      VIRT_PCIE_PIO,
> >>>>      VIRT_PCIE_ECAM,
> >>>> -    VIRT_HIGH_PCIE_ECAM,
> >>>>      VIRT_PLATFORM_BUS,
> >>>> -    VIRT_HIGH_PCIE_MMIO,
> >>>>      VIRT_GPIO,
> >>>>      VIRT_SECURE_UART,
> >>>>      VIRT_SECURE_MEM,
> >>>> +    VIRT_LOWMEMMAP_LAST,
> >>>> +};
> >>>> +
> >>>> +/* indices of IO regions located after the RAM */
> >>>> +enum {
> >>>> +    VIRT_HIGH_GIC_REDIST2 =  VIRT_LOWMEMMAP_LAST,
> >>>> +    VIRT_HIGH_PCIE_ECAM,
> >>>> +    VIRT_HIGH_PCIE_MMIO,
> >>>>  };
> >>>>  
> >>>>  typedef enum VirtIOMMUType {
> >>>> @@ -116,7 +121,7 @@ typedef struct {
> >>>>      int32_t gic_version;
> >>>>      VirtIOMMUType iommu;
> >>>>      struct arm_boot_info bootinfo;
> >>>> -    const MemMapEntry *memmap;
> >>>> +    MemMapEntry *memmap;
> >>>>      const int *irqmap;
> >>>>      int smp_cpus;
> >>>>      void *fdt;
> >>>> @@ -126,6 +131,7 @@ typedef struct {
> >>>>      uint32_t msi_phandle;
> >>>>      uint32_t iommu_phandle;
> >>>>      int psci_conduit;
> >>>> +    hwaddr high_io_base;
> >>>>  } VirtMachineState;
> >>>>  
> >>>>  #define VIRT_ECAM_ID(high) (high ? VIRT_HIGH_PCIE_ECAM : VIRT_PCIE_ECAM)    
> >>>
> >>>     
> > 
> >   


* Re: [Qemu-devel] [PATCH v7 08/17] hw/arm/virt: Implement kvm_type function for 4.0 machine
  2019-02-22 14:39       ` Igor Mammedov
@ 2019-02-22 14:53         ` Auger Eric
  0 siblings, 0 replies; 63+ messages in thread
From: Auger Eric @ 2019-02-22 14:53 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: peter.maydell, drjones, david, qemu-devel,
	shameerali.kolothum.thodi, dgilbert, qemu-arm, david,
	eric.auger.pro

Hi Igor,

On 2/22/19 3:39 PM, Igor Mammedov wrote:
> On Fri, 22 Feb 2019 15:01:25 +0100
> Auger Eric <eric.auger@redhat.com> wrote:
> 
>> Hi Igor,
>>
>> On 2/22/19 1:45 PM, Igor Mammedov wrote:
>>> On Wed, 20 Feb 2019 23:39:54 +0100
>>> Eric Auger <eric.auger@redhat.com> wrote:
>>>   
>>>> This patch implements the machine class kvm_type() callback.
>>>> It returns the number of bits requested to implement the whole GPA
>>>> range including the RAM and IO regions located beyond.
>>>> The returned value in passed though the KVM_CREATE_VM ioctl and
>>>> this allows KVM to set the stage2 tables dynamically.
>>>>
>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>>
>>>> ---
>>>>
>>>> v6 -> v7:
>>>> - Introduce RAMBASE and rename add LEGACY_ prefix in that patch
>>>> - use local variables with explicit names in virt_set_memmap:
>>>>   device_memory_base, device_memory_size
>>>> - add an extended_memmap field in the class
>>>>
>>>> v5 -> v6:
>>>> - add some comments
>>>> - high IO region cannot start before 256GiB
>>>> ---
>>>>  hw/arm/virt.c         | 50 ++++++++++++++++++++++++++++++++++++++++++-
>>>>  include/hw/arm/virt.h |  2 ++
>>>>  2 files changed, 51 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>>>> index 9db602457b..ad3a0ad73d 100644
>>>> --- a/hw/arm/virt.c
>>>> +++ b/hw/arm/virt.c
>>>> @@ -1437,7 +1437,14 @@ static void machvirt_init(MachineState *machine)
>>>>      bool firmware_loaded = bios_name || drive_get(IF_PFLASH, 0, 0);
>>>>      bool aarch64 = true;
>>>>  
>>>> -    virt_set_memmap(vms);
>>>> +    /*
>>>> +     * In accelerated mode, the memory map is computed in kvm_type(),
>>>> +     * if set, to create a VM with the right number of IPA bits.
>>>> +     */
>>>> +
>>>> +    if (!mc->kvm_type || !kvm_enabled()) {
>>>> +        virt_set_memmap(vms);
>>>> +    }
>>>>  
>>>>      /* We can probe only here because during property set
>>>>       * KVM is not available yet
>>>> @@ -1814,6 +1821,36 @@ static HotplugHandler *virt_machine_get_hotplug_handler(MachineState *machine,
>>>>      return NULL;
>>>>  }
>>>>  
>>>> +/*
>>>> + * for arm64 kvm_type [7-0] encodes the requested number of bits
>>>> + * in the IPA address space
>>>> + */
>>>> +static int virt_kvm_type(MachineState *ms, const char *type_str)
>>>> +{
>>>> +    VirtMachineState *vms = VIRT_MACHINE(ms);
>>>> +    int max_vm_pa_size = kvm_arm_get_max_vm_ipa_size(ms);
>>>> +    int requested_pa_size;
>>>> +
>>>> +    /* we freeze the memory map to compute the highest gpa */
>>>> +    virt_set_memmap(vms);
>>>> +
>>>> +    requested_pa_size = 64 - clz64(vms->highest_gpa);
>>>> +
>>>> +    if (requested_pa_size > max_vm_pa_size) {
>>>> +        error_report("-m and ,maxmem option values "
>>>> +                     "require an IPA range (%d bits) larger than "
>>>> +                     "the one supported by the host (%d bits)",
>>>> +                     requested_pa_size, max_vm_pa_size);
>>>> +       exit(1);
>>>> +    }
>>>> +    /*
>>>> +     * By default we return 0 which corresponds to an implicit legacy
>>>> +     * 40b IPA setting. Otherwise we return the actual requested PA
>>>> +     * logsize
>>>> +     */
>>>> +    return requested_pa_size > 40 ? requested_pa_size : 0;
>>>> +}
>>>> +
>>>>  static void virt_machine_class_init(ObjectClass *oc, void *data)
>>>>  {
>>>>      MachineClass *mc = MACHINE_CLASS(oc);
>>>> @@ -1838,6 +1875,7 @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
>>>>      mc->cpu_index_to_instance_props = virt_cpu_index_to_props;
>>>>      mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-a15");
>>>>      mc->get_default_cpu_node_id = virt_get_default_cpu_node_id;
>>>> +    mc->kvm_type = virt_kvm_type;
>>>>      assert(!mc->get_hotplug_handler);
>>>>      mc->get_hotplug_handler = virt_machine_get_hotplug_handler;
>>>>      hc->plug = virt_machine_device_plug_cb;
>>>> @@ -1909,6 +1947,12 @@ static void virt_instance_init(Object *obj)
>>>>                                      "Valid values are none and smmuv3",
>>>>                                      NULL);
>>>>  
>>>> +    if (vmc->no_extended_memmap) {
>>>> +        vms->extended_memmap = false;
>>>> +    } else {
>>>> +        vms->extended_memmap = true;
>>>> +    }
>>>> +
>>>>      vms->irqmap = a15irqmap;
>>>>  }
>>>>  
>>>> @@ -1939,8 +1983,12 @@ DEFINE_VIRT_MACHINE_AS_LATEST(4, 0)
>>>>  
>>>>  static void virt_machine_3_1_options(MachineClass *mc)
>>>>  {
>>>> +    VirtMachineClass *vmc = VIRT_MACHINE_CLASS(OBJECT_CLASS(mc));
>>>>      virt_machine_4_0_options(mc);
>>>>      compat_props_add(mc->compat_props, hw_compat_3_1, hw_compat_3_1_len);
>>>> +
>>>> +    /* extended memory map is enabled from 4.0 onwards */
>>>> +    vmc->no_extended_memmap = true;  
>>> That was probably asked in v6;
>>> do we really need this knob?
>> Yes, the point was raised by Peter and I replied with another question ;-) "
>> But don't we want to forbid any pre-4.0 machvirt from running with more than
>> 255GiB RAM?
>> "
>> without this knob:
>> - pre-4.0 machines will gain the capability to support more than 255GB
>> initial RAM if the kernel supports dynamic IPA setting
> should be fine; you shouldn't be able to start an old QEMU with more than 255GB anyway
> 
>> - pre-4.0 machines will gain PCDIMM/NVDIMM support
> ditto
> 
>> - another concern: maxmem and slots were not checked previously. If for
>> some reason the user specified them without instantiating the actual
>> slots, this was previously ignored. This is no longer the case, as both
>> parameters are used to compute the requested IPA range. So this may
>> fail now.
> well, that's in the category of a broken setup even if QEMU didn't complain
> about it before. The user should fix it instead of QEMU supporting madness.
> (I think such an invariant doesn't even deserve a deprecation process.)

OK. So I will remove the knob in the next version of the series.

thanks

Eric
> 
>> So I thought it was clearer to disable all of the above for pre-4.0
>> machines. However, if both of you agree, I will remove it.
>>
>> thanks
>>
>> Eric
>>>   
>>>>  }
>>>>  DEFINE_VIRT_MACHINE(3, 1)
>>>>  
>>>> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
>>>> index acad0400d8..7798462cb0 100644
>>>> --- a/include/hw/arm/virt.h
>>>> +++ b/include/hw/arm/virt.h
>>>> @@ -106,6 +106,7 @@ typedef struct {
>>>>      bool claim_edge_triggered_timers;
>>>>      bool smbios_old_sys_ver;
>>>>      bool no_highmem_ecam;
>>>> +    bool no_extended_memmap;
>>>>  } VirtMachineClass;
>>>>  
>>>>  typedef struct {
>>>> @@ -135,6 +136,7 @@ typedef struct {
>>>>      hwaddr highest_gpa;
>>>>      hwaddr device_memory_base;
>>>>      hwaddr device_memory_size;
>>>> +    bool extended_memmap;
>>>>  } VirtMachineState;
>>>>  
>>>>  #define VIRT_ECAM_ID(high) (high ? VIRT_HIGH_PCIE_ECAM : VIRT_PCIE_ECAM)  
>>>   
>>
> 


* Re: [Qemu-devel] [PATCH v7 13/17] hw/arm/virt: Allocate device_memory
  2019-02-22 14:15     ` Auger Eric
@ 2019-02-22 14:58       ` Igor Mammedov
  0 siblings, 0 replies; 63+ messages in thread
From: Igor Mammedov @ 2019-02-22 14:58 UTC (permalink / raw)
  To: Auger Eric
  Cc: peter.maydell, drjones, david, qemu-devel,
	shameerali.kolothum.thodi, dgilbert, qemu-arm, david,
	eric.auger.pro

On Fri, 22 Feb 2019 15:15:36 +0100
Auger Eric <eric.auger@redhat.com> wrote:

> Hi Igor,
> 
> On 2/22/19 2:48 PM, Igor Mammedov wrote:
> > On Wed, 20 Feb 2019 23:39:59 +0100
> > Eric Auger <eric.auger@redhat.com> wrote:
> >   
> >> The device memory region is located after the initial RAM.
> >> its start/size are 1GB aligned.
> >>
> >> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> >> Signed-off-by: Kwangwoo Lee <kwangwoo.lee@sk.com>
> >>
> >> ---
> >> v6 -> v7:
> >> - check the device memory top does not wrap
> >> - check the device memory can fit the slots
> >>
> >> v4 -> v5:
> >> - device memory set after the initial RAM
> >>
> >> v3 -> v4:
> >> - remove bootinfo.device_memory_start/device_memory_size
> >> - rename VIRT_HOTPLUG_MEM into VIRT_DEVICE_MEM
> >> ---
> >>  hw/arm/virt.c | 33 +++++++++++++++++++++++++++++++++
> >>  1 file changed, 33 insertions(+)
> >>
> >> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> >> index 470ca0ce2d..33ad9b3f63 100644
> >> --- a/hw/arm/virt.c
> >> +++ b/hw/arm/virt.c
> >> @@ -62,6 +62,7 @@
> >>  #include "target/arm/internals.h"
> >>  #include "hw/mem/pc-dimm.h"
> >>  #include "hw/mem/nvdimm.h"
> >> +#include "hw/acpi/acpi.h"
> >>  
> >>  #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
> >>      static void virt_##major##_##minor##_class_init(ObjectClass *oc, \
> >> @@ -1263,6 +1264,34 @@ static void create_secure_ram(VirtMachineState *vms,
> >>      g_free(nodename);
> >>  }
> >>  
> >> +static void create_device_memory(VirtMachineState *vms, MemoryRegion *sysmem)
> >> +{
> >> +    MachineState *ms = MACHINE(vms);
> >> +
> >> +    if (!vms->device_memory_size) {  
> > having vms->device_memory_size seems like a duplicate; why not reuse
> > memory_region_size(MachineState::device_memory::mr) like we do elsewhere?
> OK, so you mean allocating the ms->device_memory in virt_set_memmap? In that
> case, I wonder if all that code shouldn't land in set_memmap directly?
> > 
> > Also it would be better to keep all device_memory allocation/initialization
> > compact and close together, like it's done in pc/spapr.
> I am not sure I get your point here. alloc & init are done in this
> function. Or do you mean I should rather move that into set_memmap?
I don't have a preference where to put it; it could be here or in set_memmap.
What I do care about is keeping all device_memory-related parts together
(calculating the device_memory base and size, and allocating/initializing the
structure itself). It's much easier to read when one doesn't have to jump
around the code and can see all the related code in one place.

> 
> Thanks
> 
> Eric
> >   
> >> +        return;
> >> +    }
> >> +
> >> +    if (ms->ram_slots > ACPI_MAX_RAM_SLOTS) {
> >> +        error_report("unsupported number of memory slots: %"PRIu64,
> >> +                     ms->ram_slots);
> >> +        exit(EXIT_FAILURE);
> >> +    }
> >> +
> >> +    if (QEMU_ALIGN_UP(ms->maxram_size, GiB) != ms->maxram_size) {
> >> +        error_report("maximum memory size must be GiB aligned");
> >> +        exit(EXIT_FAILURE);
> >> +    }
> >> +
> >> +    ms->device_memory = g_malloc0(sizeof(*ms->device_memory));
> >> +    ms->device_memory->base = vms->device_memory_base;
> >> +
> >> +    memory_region_init(&ms->device_memory->mr, OBJECT(vms),
> >> +                       "device-memory", vms->device_memory_size);
> >> +    memory_region_add_subregion(sysmem, ms->device_memory->base,
> >> +                                &ms->device_memory->mr);
> >> +}
> >> +
> >>  static void *machvirt_dtb(const struct arm_boot_info *binfo, int *fdt_size)
> >>  {
> >>      const VirtMachineState *board = container_of(binfo, VirtMachineState,
> >> @@ -1610,6 +1639,10 @@ static void machvirt_init(MachineState *machine)
> >>                                           machine->ram_size);
> >>      memory_region_add_subregion(sysmem, vms->memmap[VIRT_MEM].base, ram);
> >>  
> >> +    if (vms->extended_memmap) {
> >> +        create_device_memory(vms, sysmem);
> >> +    }
> >> +
> >>      create_flash(vms, sysmem, secure_sysmem ? secure_sysmem : sysmem);
> >>  
> >>      create_gic(vms, pic);  
> >   
> 


* Re: [Qemu-devel] [PATCH v7 14/17] nvdimm: use configurable ACPI IO base and size
  2019-02-20 22:40 ` [Qemu-devel] [PATCH v7 14/17] nvdimm: use configurable ACPI IO base and size Eric Auger
@ 2019-02-22 15:28   ` Igor Mammedov
  0 siblings, 0 replies; 63+ messages in thread
From: Igor Mammedov @ 2019-02-22 15:28 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, david, drjones, dgilbert, david

On Wed, 20 Feb 2019 23:40:00 +0100
Eric Auger <eric.auger@redhat.com> wrote:

> From: Kwangwoo Lee <kwangwoo.lee@sk.com>
> 
> This patch uses configurable IO base and size to create NPIO AML for
s/uses ... /makes IO base and size configurable/

> ACPI NFIT. Since a different architecture like AArch64 does not use
> port-mapped IO, a configurable IO base is required to create correct
> mapping of ACPI IO address and size.
> 
> Signed-off-by: Kwangwoo Lee <kwangwoo.lee@sk.com>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> 
> ---
> v6 -> v7:
> - Use NvdimmDsmIO constant
> - use AcpiGenericAddress instead of AcpiNVDIMMIOEntry
> 
> v2 -> v3:
> - s/size/len in pc_piix.c and pc_q35.c
> ---
>  hw/acpi/nvdimm.c        | 31 ++++++++++++++++++++++---------
>  hw/i386/pc_piix.c       |  6 +++++-
>  hw/i386/pc_q35.c        |  6 +++++-
>  include/hw/mem/nvdimm.h |  4 ++++
>  4 files changed, 36 insertions(+), 11 deletions(-)
> 
> diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
> index e53b2cb681..fddc790945 100644
> --- a/hw/acpi/nvdimm.c
> +++ b/hw/acpi/nvdimm.c
> @@ -33,6 +33,9 @@
>  #include "hw/nvram/fw_cfg.h"
>  #include "hw/mem/nvdimm.h"
>  
> +const struct AcpiGenericAddress NvdimmDsmIO = { .space_id = AML_AS_SYSTEM_IO,
> +        .bit_width = NVDIMM_ACPI_IO_LEN << 3, .address = NVDIMM_ACPI_IO_BASE};
it's target-specific data, so I'd put this part in an x86-specific place
(hw/i386/acpi-build.[ch] or pc.[ch])
and initialize it like you do in the next patch for virt-arm.

Maybe add a pc_/x86_ prefix to the variable name; also, we don't do CamelCase
with variable names, if I recall correctly.


> +
>  static int nvdimm_device_list(Object *obj, void *opaque)
>  {
>      GSList **list = opaque;
> @@ -929,8 +932,8 @@ void nvdimm_init_acpi_state(AcpiNVDIMMState *state, MemoryRegion *io,
>                              FWCfgState *fw_cfg, Object *owner)
>  {
>      memory_region_init_io(&state->io_mr, owner, &nvdimm_dsm_ops, state,
> -                          "nvdimm-acpi-io", NVDIMM_ACPI_IO_LEN);
> -    memory_region_add_subregion(io, NVDIMM_ACPI_IO_BASE, &state->io_mr);
> +                          "nvdimm-acpi-io", state->dsm_io.bit_width >> 3);
> +    memory_region_add_subregion(io, state->dsm_io.address, &state->io_mr);
>  
>      state->dsm_mem = g_array_new(false, true /* clear */, 1);
>      acpi_data_push(state->dsm_mem, sizeof(NvdimmDsmIn));
> @@ -959,12 +962,14 @@ void nvdimm_init_acpi_state(AcpiNVDIMMState *state, MemoryRegion *io,
>  
>  #define NVDIMM_QEMU_RSVD_UUID   "648B9CF2-CDA1-4312-8AD9-49C4AF32BD62"
>  
> -static void nvdimm_build_common_dsm(Aml *dev)
> +static void nvdimm_build_common_dsm(Aml *dev,
> +                                    AcpiNVDIMMState *acpi_nvdimm_state)
>  {
>      Aml *method, *ifctx, *function, *handle, *uuid, *dsm_mem, *elsectx2;
>      Aml *elsectx, *unsupport, *unpatched, *expected_uuid, *uuid_invalid;
>      Aml *pckg, *pckg_index, *pckg_buf, *field, *dsm_out_buf, *dsm_out_buf_size;
>      uint8_t byte_list[1];
> +    AmlRegionSpace rs;
>  
>      method = aml_method(NVDIMM_COMMON_DSM, 5, AML_SERIALIZED);
>      uuid = aml_arg(0);
> @@ -975,9 +980,16 @@ static void nvdimm_build_common_dsm(Aml *dev)
>  
>      aml_append(method, aml_store(aml_name(NVDIMM_ACPI_MEM_ADDR), dsm_mem));
>  
> +    if (acpi_nvdimm_state->dsm_io.space_id == AML_AS_SYSTEM_IO) {
> +        rs = AML_SYSTEM_IO;
> +    } else {
> +        rs = AML_SYSTEM_MEMORY;
> +    }
> +
>      /* map DSM memory and IO into ACPI namespace. */
> -    aml_append(method, aml_operation_region(NVDIMM_DSM_IOPORT, AML_SYSTEM_IO,
> -               aml_int(NVDIMM_ACPI_IO_BASE), NVDIMM_ACPI_IO_LEN));
> +    aml_append(method, aml_operation_region(NVDIMM_DSM_IOPORT, rs,
> +               aml_int(acpi_nvdimm_state->dsm_io.address),
> +               acpi_nvdimm_state->dsm_io.bit_width >> 3));
>      aml_append(method, aml_operation_region(NVDIMM_DSM_MEMORY,
>                 AML_SYSTEM_MEMORY, dsm_mem, sizeof(NvdimmDsmIn)));
>  
> @@ -1260,7 +1272,8 @@ static void nvdimm_build_nvdimm_devices(Aml *root_dev, uint32_t ram_slots)
>  }
>  
>  static void nvdimm_build_ssdt(GArray *table_offsets, GArray *table_data,
> -                              BIOSLinker *linker, GArray *dsm_dma_arrea,
> +                              BIOSLinker *linker,
> +                              AcpiNVDIMMState *acpi_nvdimm_state,
>                                uint32_t ram_slots)
>  {
>      Aml *ssdt, *sb_scope, *dev;
> @@ -1288,7 +1301,7 @@ static void nvdimm_build_ssdt(GArray *table_offsets, GArray *table_data,
>       */
>      aml_append(dev, aml_name_decl("_HID", aml_string("ACPI0012")));
>  
> -    nvdimm_build_common_dsm(dev);
> +    nvdimm_build_common_dsm(dev, acpi_nvdimm_state);
>  
>      /* 0 is reserved for root device. */
>      nvdimm_build_device_dsm(dev, 0);
> @@ -1307,7 +1320,7 @@ static void nvdimm_build_ssdt(GArray *table_offsets, GArray *table_data,
>                                                 NVDIMM_ACPI_MEM_ADDR);
>  
>      bios_linker_loader_alloc(linker,
> -                             NVDIMM_DSM_MEM_FILE, dsm_dma_arrea,
> +                             NVDIMM_DSM_MEM_FILE, acpi_nvdimm_state->dsm_mem,
>                               sizeof(NvdimmDsmIn), false /* high memory */);
>      bios_linker_loader_add_pointer(linker,
>          ACPI_BUILD_TABLE_FILE, mem_addr_offset, sizeof(uint32_t),
> @@ -1329,7 +1342,7 @@ void nvdimm_build_acpi(GArray *table_offsets, GArray *table_data,
>          return;
>      }
>  
> -    nvdimm_build_ssdt(table_offsets, table_data, linker, state->dsm_mem,
> +    nvdimm_build_ssdt(table_offsets, table_data, linker, state,
>                        ram_slots);
>  
>      device_list = nvdimm_get_device_list();
> diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
> index fd0f2c268f..d0a262d106 100644
> --- a/hw/i386/pc_piix.c
> +++ b/hw/i386/pc_piix.c
> @@ -298,7 +298,11 @@ static void pc_init1(MachineState *machine,
>      }
>  
>      if (pcms->acpi_nvdimm_state.is_enabled) {
> -        nvdimm_init_acpi_state(&pcms->acpi_nvdimm_state, system_io,
> +        AcpiNVDIMMState *acpi_nvdimm_state = &pcms->acpi_nvdimm_state;
> +
> +        acpi_nvdimm_state->dsm_io = NvdimmDsmIO;
> +
> +        nvdimm_init_acpi_state(acpi_nvdimm_state, system_io, 
>                                 pcms->fw_cfg, OBJECT(pcms));
I dislike the split initialization; I suggest passing the AcpiGenericAddress
as an argument to nvdimm_init_acpi_state() and assigning it to
acpi_nvdimm_state inside, so that we can see (almost) all of the
acpi_nvdimm_state initialization in one place.

>      }
>  }
> diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
> index 4a175ea50e..21f594001f 100644
> --- a/hw/i386/pc_q35.c
> +++ b/hw/i386/pc_q35.c
> @@ -330,7 +330,11 @@ static void pc_q35_init(MachineState *machine)
>      pc_nic_init(pcmc, isa_bus, host_bus);
>  
>      if (pcms->acpi_nvdimm_state.is_enabled) {
> -        nvdimm_init_acpi_state(&pcms->acpi_nvdimm_state, system_io,
> +        AcpiNVDIMMState *acpi_nvdimm_state = &pcms->acpi_nvdimm_state;
> +
> +        acpi_nvdimm_state->dsm_io = NvdimmDsmIO;
> +
> +        nvdimm_init_acpi_state(acpi_nvdimm_state, system_io,
>                                 pcms->fw_cfg, OBJECT(pcms));
>      }
>  }
> diff --git a/include/hw/mem/nvdimm.h b/include/hw/mem/nvdimm.h
> index c5c9b3c7f8..ead51d958d 100644
> --- a/include/hw/mem/nvdimm.h
> +++ b/include/hw/mem/nvdimm.h
> @@ -25,6 +25,7 @@
>  
>  #include "hw/mem/pc-dimm.h"
>  #include "hw/acpi/bios-linker-loader.h"
> +#include "hw/acpi/aml-build.h"
>  
>  #define NVDIMM_DEBUG 0
>  #define nvdimm_debug(fmt, ...)                                \
> @@ -123,6 +124,8 @@ struct NvdimmFitBuffer {
>  };
>  typedef struct NvdimmFitBuffer NvdimmFitBuffer;
>  
> +extern const struct AcpiGenericAddress NvdimmDsmIO;
> +
>  struct AcpiNVDIMMState {
>      /* detect if NVDIMM support is enabled. */
>      bool is_enabled;
> @@ -140,6 +143,7 @@ struct AcpiNVDIMMState {
>       */
>      int32_t persistence;
>      char    *persistence_string;
> +    struct AcpiGenericAddress dsm_io;
>  };
>  typedef struct AcpiNVDIMMState AcpiNVDIMMState;
>  

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v7 15/17] hw/arm/virt: Add nvdimm hot-plug infrastructure
  2019-02-20 22:40 ` [Qemu-devel] [PATCH v7 15/17] hw/arm/virt: Add nvdimm hot-plug infrastructure Eric Auger
@ 2019-02-22 15:36   ` Igor Mammedov
  0 siblings, 0 replies; 63+ messages in thread
From: Igor Mammedov @ 2019-02-22 15:36 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, david, drjones, dgilbert, david

On Wed, 20 Feb 2019 23:40:01 +0100
Eric Auger <eric.auger@redhat.com> wrote:

> From: Kwangwoo Lee <kwangwoo.lee@sk.com>
> 
> Pre-plug and plug handlers are prepared for NVDIMM support.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Kwangwoo Lee <kwangwoo.lee@sk.com>
> ---
>  default-configs/arm-softmmu.mak |  2 ++
>  hw/arm/virt-acpi-build.c        |  6 ++++++
>  hw/arm/virt.c                   | 22 ++++++++++++++++++++++
>  include/hw/arm/virt.h           |  3 +++
>  4 files changed, 33 insertions(+)
> 
> diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
> index 0a78421f72..03dbebb197 100644
> --- a/default-configs/arm-softmmu.mak
> +++ b/default-configs/arm-softmmu.mak
> @@ -165,3 +165,5 @@ CONFIG_HIGHBANK=y
>  CONFIG_MUSICPAL=y
>  CONFIG_MEM_DEVICE=y
>  CONFIG_DIMM=y
> +CONFIG_NVDIMM=y
> +CONFIG_ACPI_NVDIMM=y
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index 781eafaf5e..f086adfa82 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -784,6 +784,7 @@ static
>  void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
>  {
>      VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(vms);
> +    MachineState *ms = MACHINE(vms);
>      GArray *table_offsets;
>      unsigned dsdt, xsdt;
>      GArray *tables_blob = tables->table_data;
> @@ -824,6 +825,11 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
>          }
>      }
>  
> +    if (vms->acpi_nvdimm_state.is_enabled) {
> +        nvdimm_build_acpi(table_offsets, tables_blob, tables->linker,
> +                          &vms->acpi_nvdimm_state, ms->ram_slots);
> +    }
> +
>      if (its_class_name() && !vmc->no_its) {
>          acpi_add_table(table_offsets, tables_blob);
>          build_iort(tables_blob, tables->linker, vms);
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 33ad9b3f63..1896920570 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -134,6 +134,7 @@ static const MemMapEntry base_memmap[] = {
>      [VIRT_GPIO] =               { 0x09030000, 0x00001000 },
>      [VIRT_SECURE_UART] =        { 0x09040000, 0x00001000 },
>      [VIRT_SMMU] =               { 0x09050000, 0x00020000 },
> +    [VIRT_ACPI_IO] =            { 0x09070000, 0x00010000 },
>      [VIRT_MMIO] =               { 0x0a000000, 0x00000200 },
>      /* ...repeating for a total of NUM_VIRTIO_TRANSPORTS, each of that size */
>      [VIRT_PLATFORM_BUS] =       { 0x0c000000, 0x02000000 },
> @@ -1675,6 +1676,18 @@ static void machvirt_init(MachineState *machine)
>  
>      create_platform_bus(vms, pic);
>  
> +    if (vms->acpi_nvdimm_state.is_enabled) {
> +        AcpiNVDIMMState *acpi_nvdimm_state = &vms->acpi_nvdimm_state;
> +
> +        acpi_nvdimm_state->dsm_io.space_id = AML_AS_SYSTEM_MEMORY;
> +        acpi_nvdimm_state->dsm_io.address =
> +                vms->memmap[VIRT_ACPI_IO].base + NVDIMM_ACPI_IO_BASE;
> +        acpi_nvdimm_state->dsm_io.bit_width = NVDIMM_ACPI_IO_LEN << 3;
I'd prefer the following style (well, at least that's the direction I try
to push towards when dealing with ACPI):

           const AcpiGenericAddress dsmio = {
               .space_id = AML_AS_SYSTEM_MEMORY,
               .address = vms->memmap[VIRT_ACPI_IO].base + NVDIMM_ACPI_IO_BASE,
               .bit_width = NVDIMM_ACPI_IO_LEN << 3
           };

           nvdimm_init_acpi_state(&vms->acpi_nvdimm_state, sysmem, &dsmio,
                                  vms->fw_cfg, OBJECT(vms));

> +
> +        nvdimm_init_acpi_state(acpi_nvdimm_state, sysmem,
> +                               vms->fw_cfg, OBJECT(vms));
> +    }
> +
>      vms->bootinfo.ram_size = machine->ram_size;
>      vms->bootinfo.kernel_filename = machine->kernel_filename;
>      vms->bootinfo.kernel_cmdline = machine->kernel_cmdline;
> @@ -1860,10 +1873,19 @@ static void virt_memory_plug(HotplugHandler *hotplug_dev,
>                               DeviceState *dev, Error **errp)
>  {
>      VirtMachineState *vms = VIRT_MACHINE(hotplug_dev);
> +    bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
>      Error *local_err = NULL;
>  
>      pc_dimm_plug(PC_DIMM(dev), MACHINE(vms), &local_err);
> +    if (local_err) {
> +        goto out;
> +    }
>  
> +    if (is_nvdimm) {
> +        nvdimm_plug(&vms->acpi_nvdimm_state);
> +    }
> +
> +out:
>      error_propagate(errp, local_err);
>  }
>  
> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
> index 7798462cb0..bd9cf68311 100644
> --- a/include/hw/arm/virt.h
> +++ b/include/hw/arm/virt.h
> @@ -37,6 +37,7 @@
>  #include "hw/arm/arm.h"
>  #include "sysemu/kvm.h"
>  #include "hw/intc/arm_gicv3_common.h"
> +#include "hw/mem/nvdimm.h"
>  
>  #define NUM_GICV2M_SPIS       64
>  #define NUM_VIRTIO_TRANSPORTS 32
> @@ -77,6 +78,7 @@ enum {
>      VIRT_GPIO,
>      VIRT_SECURE_UART,
>      VIRT_SECURE_MEM,
> +    VIRT_ACPI_IO,
>      VIRT_LOWMEMMAP_LAST,
>  };
>  
> @@ -137,6 +139,7 @@ typedef struct {
>      hwaddr device_memory_base;
>      hwaddr device_memory_size;
>      bool extended_memmap;
> +    AcpiNVDIMMState acpi_nvdimm_state;
>  } VirtMachineState;
>  
>  #define VIRT_ECAM_ID(high) (high ? VIRT_HIGH_PCIE_ECAM : VIRT_PCIE_ECAM)

* Re: [Qemu-devel] [PATCH v7 17/17] hw/arm/virt: Add nvdimm and nvdimm-persistence options
  2019-02-20 22:40 ` [Qemu-devel] [PATCH v7 17/17] hw/arm/virt: Add nvdimm and nvdimm-persistence options Eric Auger
@ 2019-02-22 15:48   ` Igor Mammedov
  2019-02-22 15:57     ` Auger Eric
  0 siblings, 1 reply; 63+ messages in thread
From: Igor Mammedov @ 2019-02-22 15:48 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, david, dgilbert, david, drjones,
	Shivaprasad G Bhat

On Wed, 20 Feb 2019 23:40:03 +0100
Eric Auger <eric.auger@redhat.com> wrote:

> Machine option nvdimm allows to turn NVDIMM support on.

With virt-arm and spapr[1] trying to add nvdimm, I think it's time
to generalize and move acpi_nvdimm_state to generic Machine
instead of duplicating the same code in several machines.

1) [RFC PATCH 3/4] spapr: Add NVDIMM device support

> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> ---
[...]

* Re: [Qemu-devel] [PATCH v7 17/17] hw/arm/virt: Add nvdimm and nvdimm-persistence options
  2019-02-22 15:48   ` Igor Mammedov
@ 2019-02-22 15:57     ` Auger Eric
  0 siblings, 0 replies; 63+ messages in thread
From: Auger Eric @ 2019-02-22 15:57 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: peter.maydell, drjones, Shivaprasad G Bhat, david, qemu-devel,
	shameerali.kolothum.thodi, dgilbert, qemu-arm, david,
	eric.auger.pro

Hi Igor,

On 2/22/19 4:48 PM, Igor Mammedov wrote:
> On Wed, 20 Feb 2019 23:40:03 +0100
> Eric Auger <eric.auger@redhat.com> wrote:
> 
>> Machine option nvdimm allows to turn NVDIMM support on.
> 
> With virt-arm and spapr[1] trying to add nvdimm, I think it's time
> to generalize and move acpi_nvdimm_state to generic Machine
> instead of duplicating the same code in several machines.

OK I will go this way

Thanks a lot for your time!

Best Regards

Eric
> 
> 1) [RFC PATCH 3/4] spapr: Add NVDIMM device support
> 
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> ---
> [...]
> 
> 

* Re: [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion and PCDIMM/NVDIMM support
  2019-02-20 22:39 [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion and PCDIMM/NVDIMM support Eric Auger
                   ` (17 preceding siblings ...)
  2019-02-20 22:46 ` [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion and PCDIMM/NVDIMM support Auger Eric
@ 2019-02-22 16:27 ` Igor Mammedov
  2019-02-22 17:35   ` Auger Eric
  18 siblings, 1 reply; 63+ messages in thread
From: Igor Mammedov @ 2019-02-22 16:27 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, qemu-devel, qemu-arm, peter.maydell,
	shameerali.kolothum.thodi, david, drjones, dgilbert, david

On Wed, 20 Feb 2019 23:39:46 +0100
Eric Auger <eric.auger@redhat.com> wrote:

> This series aims to bump the 255GB RAM limit in machvirt and to
> support device memory in general, and especially PCDIMM/NVDIMM.
> 
> In machvirt versions < 4.0, the initial RAM starts at 1GB and can
> grow up to 255GB. From 256GB onwards we find IO regions such as the
> additional GICv3 RDIST region, high PCIe ECAM region and high PCIe
> MMIO region. The address map was 1TB large. This corresponded to
> the max IPA capacity KVM was able to manage.
> 
> Since 4.20, the host kernel is able to support a larger and dynamic
> IPA range. So the guest physical address can go beyond the 1TB. The
> max GPA size depends on the host kernel configuration and physical CPUs.
> 
> In this series we use this feature and allow the RAM to grow without
> any other limit than the one put by the host kernel.
> 
> The RAM still starts at 1GB. First comes the initial ram (-m) of size
> ram_size and then comes the device memory (,maxmem) of size
> maxram_size - ram_size. The device memory is potentially hotpluggable
> depending on the instantiated memory objects.
> 
> IO regions previously located between 256GB and 1TB are moved after
> the RAM. Their offset is dynamically computed, depends on ram_size
> and maxram_size. Size alignment is enforced.
> 
> In case maxmem value is inferior to 255GB, the legacy memory map
> still is used. The change of memory map becomes effective from 4.0
> onwards.
> 
> As we keep the initial RAM at 1GB base address, we do not need to do
> invasive changes in the EDK2 FW. It seems nobody is eager to do
> that job at the moment.
> 
> Device memory being put just after the initial RAM, it is possible
> to get access to this feature while keeping a 1TB address map.
> 
> This series reuses/rebases patches initially submitted by Shameer
> in [1] and Kwangwoo in [2] for the PC-DIMM and NV-DIMM parts.
> 
> Functionally, the series is split into 3 parts:
> 1) bump of the initial RAM limit [1 - 9] and change in
>    the memory map

> 2) Support of PC-DIMM [10 - 13]
Is this part complete ACPI-wise (for coldplug)? I haven't noticed any
DSDT AML nor E820 changes here, so ACPI-wise the pc-dimm shouldn't be
visible to the guest. It might be that DT is masking the problem,
but that won't work on ACPI-only guests.

Even though I've tried to make the memory hotplug ACPI parts not x86-specific,
I'm afraid they might be tightly coupled with hotplug support.
So here are 2 options: make the DSDT part work without hotplug, or
implement hotplug here. I think the former is just a waste of time
and we should just add hotplug. It should take relatively minor effort
since you already implemented most of the boilerplate here.

As for how to implement the ACPI HW part, I suggest borrowing the GED
device that the NEMU guys are trying to use, instead of the GPIO route
we use now for ACPI_POWER_BUTTON_DEVICE to deliver the event.
That way it would be easier to share this with their virt-x86
machine eventually.


> 3) Support of NV-DIMM [14 - 17]
The same might be true for NUMA, but I haven't dug that deep into
that part.

> 
> 1) can be upstreamed before 2 and 2 can be upstreamed before 3.
> 
> Work is ongoing to transform the whole memory as device memory.
> However this move is not trivial and to me, is independent on
> the improvements brought by this series:
> - if we were to use DIMM for initial RAM, those DIMMs would use
>   use slots. Although they would not be part of the ones provided
>   using the ",slots" options, they are ACPI limited resources.
> - DT and ACPI description needs to be reworked
> - NUMA integration needs special care
> - a special device memory object may be required to avoid consuming
>   slots and easing the FW description.
> 
> So I preferred to separate the concerns. This new implementation
> based on device memory could be candidate for another virt
> version.
> 
> Best Regards
> 
> Eric
> 
> References:
> 
> [0] [RFC v2 0/6] hw/arm: Add support for non-contiguous iova regions
> http://patchwork.ozlabs.org/cover/914694/
> 
> [1] [RFC PATCH 0/3] add nvdimm support on AArch64 virt platform
> https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg04599.html
> 
> This series can be found at:
> https://github.com/eauger/qemu/tree/v3.1.0-dimm-v7
> 
> History:
> 
> v6 -> v7:
> - Addressed Peter and Igor comments (exceptions sent my email)
> - Fixed TCG case. Now device memory works also for TCG and vcpu
>   pamax is checked
> - See individual logs for more details
> 
> v5 -> v6:
> - mingw compilation issue fix
> - kvm_arm_get_max_vm_phys_shift always returns the number of supported
>   IPA bits
> - new patch "hw/arm/virt: Rename highmem IO regions" that eases the review
>   of "hw/arm/virt: Split the memory map description"
> - "hw/arm/virt: Move memory map initialization into machvirt_init"
>   squashed into the previous patch
> - change alignment of IO regions beyond the RAM so that it matches their
>   size
> 
> v4 -> v5:
> - change in the memory map
> - see individual logs
> 
> v3 -> v4:
> - rebase on David's "pc-dimm: next bunch of cleanups" and
>   "pc-dimm: pre_plug "slot" and "addr" assignment"
> - kvm-type option not used anymore. We directly use
>   maxram_size and ram_size machine fields to compute the
>   MAX IPA range. Migration is naturally handled as CLI
>   option are kept between source and destination. This was
>   suggested by David.
> - device_memory_start and device_memory_size not stored
>   anymore in vms->bootinfo
> - I did not take into account 2 Igor's comments: the one
>   related to the refactoring of arm_load_dtb and the one
>   related to the generation of the dtb after system_reset
>   which would contain nodes of hotplugged devices (we do
>   not support hotplug at this stage)
> - check the end-user does not attempt to hotplug a device
> - addition of "vl: Set machine ram_size, maxram_size and
>   ram_slots earlier"
> 
> v2 -> v3:
> - fix pc_q35 and pc_piix compilation error
> - kwangwoo's email being not valid anymore, remove his address
> 
> v1 -> v2:
> - kvm_get_max_vm_phys_shift moved in arch specific file
> - addition of NVDIMM part
> - single series
> - rebase on David's refactoring
> 
> v1:
> - was "[RFC 0/6] KVM/ARM: Dynamic and larger GPA size"
> - was "[RFC 0/5] ARM virt: Support PC-DIMM at 2TB"
> 
> Best Regards
> 
> Eric
> 
> 
> Eric Auger (12):
>   hw/arm/virt: Rename highmem IO regions
>   hw/arm/virt: Split the memory map description
>   hw/boards: Add a MachineState parameter to kvm_type callback
>   kvm: add kvm_arm_get_max_vm_ipa_size
>   vl: Set machine ram_size, maxram_size and ram_slots earlier
>   hw/arm/virt: Dynamic memory map depending on RAM requirements
>   hw/arm/virt: Implement kvm_type function for 4.0 machine
>   hw/arm/virt: Bump the 255GB initial RAM limit
>   hw/arm/virt: Add memory hotplug framework
>   hw/arm/virt: Allocate device_memory
>   hw/arm/boot: Expose the pmem nodes in the DT
>   hw/arm/virt: Add nvdimm and nvdimm-persistence options
> 
> Kwangwoo Lee (2):
>   nvdimm: use configurable ACPI IO base and size
>   hw/arm/virt: Add nvdimm hot-plug infrastructure
> 
> Shameer Kolothum (3):
>   hw/arm/boot: introduce fdt_add_memory_node helper
>   hw/arm/boot: Expose the PC-DIMM nodes in the DT
>   hw/arm/virt-acpi-build: Add PC-DIMM in SRAT
> 
>  accel/kvm/kvm-all.c             |   2 +-
>  default-configs/arm-softmmu.mak |   4 +
>  hw/acpi/nvdimm.c                |  31 ++-
>  hw/arm/boot.c                   | 136 ++++++++++--
>  hw/arm/virt-acpi-build.c        |  23 +-
>  hw/arm/virt.c                   | 364 ++++++++++++++++++++++++++++----
>  hw/i386/pc_piix.c               |   6 +-
>  hw/i386/pc_q35.c                |   6 +-
>  hw/ppc/mac_newworld.c           |   3 +-
>  hw/ppc/mac_oldworld.c           |   2 +-
>  hw/ppc/spapr.c                  |   2 +-
>  include/hw/arm/virt.h           |  24 ++-
>  include/hw/boards.h             |   5 +-
>  include/hw/mem/nvdimm.h         |   4 +
>  target/arm/kvm.c                |  10 +
>  target/arm/kvm_arm.h            |  13 ++
>  vl.c                            |   6 +-
>  17 files changed, 556 insertions(+), 85 deletions(-)
> 

* Re: [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion and PCDIMM/NVDIMM support
  2019-02-22 16:27 ` Igor Mammedov
@ 2019-02-22 17:35   ` Auger Eric
  2019-02-25  9:42     ` Igor Mammedov
  0 siblings, 1 reply; 63+ messages in thread
From: Auger Eric @ 2019-02-22 17:35 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: peter.maydell, drjones, david, qemu-devel,
	shameerali.kolothum.thodi, dgilbert, qemu-arm, david,
	eric.auger.pro

Hi Igor,

On 2/22/19 5:27 PM, Igor Mammedov wrote:
> On Wed, 20 Feb 2019 23:39:46 +0100
> Eric Auger <eric.auger@redhat.com> wrote:
> 
>> This series aims to bump the 255GB RAM limit in machvirt and to
>> support device memory in general, and especially PCDIMM/NVDIMM.
>>
>> In machvirt versions < 4.0, the initial RAM starts at 1GB and can
>> grow up to 255GB. From 256GB onwards we find IO regions such as the
>> additional GICv3 RDIST region, high PCIe ECAM region and high PCIe
>> MMIO region. The address map was 1TB large. This corresponded to
>> the max IPA capacity KVM was able to manage.
>>
>> Since 4.20, the host kernel is able to support a larger and dynamic
>> IPA range. So the guest physical address can go beyond the 1TB. The
>> max GPA size depends on the host kernel configuration and physical CPUs.
>>
>> In this series we use this feature and allow the RAM to grow without
>> any other limit than the one put by the host kernel.
>>
>> The RAM still starts at 1GB. First comes the initial ram (-m) of size
>> ram_size and then comes the device memory (,maxmem) of size
>> maxram_size - ram_size. The device memory is potentially hotpluggable
>> depending on the instantiated memory objects.
>>
>> IO regions previously located between 256GB and 1TB are moved after
>> the RAM. Their offset is dynamically computed, depends on ram_size
>> and maxram_size. Size alignment is enforced.
>>
>> In case maxmem value is inferior to 255GB, the legacy memory map
>> still is used. The change of memory map becomes effective from 4.0
>> onwards.
>>
>> As we keep the initial RAM at 1GB base address, we do not need to do
>> invasive changes in the EDK2 FW. It seems nobody is eager to do
>> that job at the moment.
>>
>> Device memory being put just after the initial RAM, it is possible
>> to get access to this feature while keeping a 1TB address map.
>>
>> This series reuses/rebases patches initially submitted by Shameer
>> in [1] and Kwangwoo in [2] for the PC-DIMM and NV-DIMM parts.
>>
>> Functionally, the series is split into 3 parts:
>> 1) bump of the initial RAM limit [1 - 9] and change in
>>    the memory map
> 
>> 2) Support of PC-DIMM [10 - 13]
> Is this part complete ACPI wise (for coldplug)? I haven't noticed
> DSDT AML here no E820 changes, so ACPI wise pc-dimm shouldn't be
> visible to the guest. It might be that DT is masking problem
> but well, that won't work on ACPI only guests.

guest /proc/meminfo or "lshw -class memory" reflects the amount of memory
added with the DIMM slots. So it looks fine to me. Isn't E820 a pure x86
matter? What else would you expect in the DSDT? I understand hotplug
would require extra modifications, but I don't see anything else missing
for coldplug.
> Even though I've tried make mem hotplug ACPI parts not x86 specific,
> I'm afraid it might be tightly coupled with hotplug support.
> So here are 2 options make DSDT part work without hotplug or
> implement hotplug here. I think the former is just a waste of time
> and we should just add hotplug. It should take relatively minor effort
> since you already implemented most of boiler plate here.

Shameer sent an RFC series for supporting hotplug.

[RFC PATCH 0/4] ARM virt: ACPI memory hotplug support
https://patchwork.kernel.org/cover/10783589/

I tested PCDIMM hotplug (with ACPI) this afternoon and it seemed to be
OK, even after system_reset.

Note the hotplug kernel support on ARM is very recent. I would prefer to
dissociate both efforts if we want to have a chance of making coldplug for
4.0. Also, we have an issue with NVDIMM, since on reboot the guest does not
boot properly.

> 
> As for how to implement ACPI HW part, I suggest to borrow GED
> device that NEMU guys trying to use instead of GPIO route,
> like we do now for ACPI_POWER_BUTTON_DEVICE to deliver event.
> So that it would be easier to share this with their virt-x86
> machine eventually.
Sounds like a different approach than the one initiated by Shameer?

Thanks

Eric
> 
> 
>> 3) Support of NV-DIMM [14 - 17]
> The same might be true for NUMA but I haven't dug this deep in to
> that part.
> 
>>
>> 1) can be upstreamed before 2 and 2 can be upstreamed before 3.
>>
>> Work is ongoing to transform the whole memory as device memory.
>> However this move is not trivial and to me, is independent on
>> the improvements brought by this series:
>> - if we were to use DIMM for initial RAM, those DIMMs would use
>>   use slots. Although they would not be part of the ones provided
>>   using the ",slots" options, they are ACPI limited resources.
>> - DT and ACPI description needs to be reworked
>> - NUMA integration needs special care
>> - a special device memory object may be required to avoid consuming
>>   slots and easing the FW description.
>>
>> So I preferred to separate the concerns. This new implementation
>> based on device memory could be candidate for another virt
>> version.
>>
>> Best Regards
>>
>> Eric
>>
>> References:
>>
>> [0] [RFC v2 0/6] hw/arm: Add support for non-contiguous iova regions
>> http://patchwork.ozlabs.org/cover/914694/
>>
>> [1] [RFC PATCH 0/3] add nvdimm support on AArch64 virt platform
>> https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg04599.html
>>
>> This series can be found at:
>> https://github.com/eauger/qemu/tree/v3.1.0-dimm-v7
>>
>> History:
>>
>> v6 -> v7:
>> - Addressed Peter and Igor comments (exceptions sent my email)
>> - Fixed TCG case. Now device memory works also for TCG and vcpu
>>   pamax is checked
>> - See individual logs for more details
>>
>> v5 -> v6:
>> - mingw compilation issue fix
>> - kvm_arm_get_max_vm_phys_shift always returns the number of supported
>>   IPA bits
>> - new patch "hw/arm/virt: Rename highmem IO regions" that eases the review
>>   of "hw/arm/virt: Split the memory map description"
>> - "hw/arm/virt: Move memory map initialization into machvirt_init"
>>   squashed into the previous patch
>> - change alignment of IO regions beyond the RAM so that it matches their
>>   size
>>
>> v4 -> v5:
>> - change in the memory map
>> - see individual logs
>>
>> v3 -> v4:
>> - rebase on David's "pc-dimm: next bunch of cleanups" and
>>   "pc-dimm: pre_plug "slot" and "addr" assignment"
>> - kvm-type option not used anymore. We directly use
>>   maxram_size and ram_size machine fields to compute the
>>   MAX IPA range. Migration is naturally handled as CLI
>>   option are kept between source and destination. This was
>>   suggested by David.
>> - device_memory_start and device_memory_size not stored
>>   anymore in vms->bootinfo
>> - I did not take into account 2 Igor's comments: the one
>>   related to the refactoring of arm_load_dtb and the one
>>   related to the generation of the dtb after system_reset
>>   which would contain nodes of hotplugged devices (we do
>>   not support hotplug at this stage)
>> - check the end-user does not attempt to hotplug a device
>> - addition of "vl: Set machine ram_size, maxram_size and
>>   ram_slots earlier"
>>
>> v2 -> v3:
>> - fix pc_q35 and pc_piix compilation error
>> - kwangwoo's email being not valid anymore, remove his address
>>
>> v1 -> v2:
>> - kvm_get_max_vm_phys_shift moved in arch specific file
>> - addition of NVDIMM part
>> - single series
>> - rebase on David's refactoring
>>
>> v1:
>> - was "[RFC 0/6] KVM/ARM: Dynamic and larger GPA size"
>> - was "[RFC 0/5] ARM virt: Support PC-DIMM at 2TB"
>>
>> Best Regards
>>
>> Eric
>>
>>
>> Eric Auger (12):
>>   hw/arm/virt: Rename highmem IO regions
>>   hw/arm/virt: Split the memory map description
>>   hw/boards: Add a MachineState parameter to kvm_type callback
>>   kvm: add kvm_arm_get_max_vm_ipa_size
>>   vl: Set machine ram_size, maxram_size and ram_slots earlier
>>   hw/arm/virt: Dynamic memory map depending on RAM requirements
>>   hw/arm/virt: Implement kvm_type function for 4.0 machine
>>   hw/arm/virt: Bump the 255GB initial RAM limit
>>   hw/arm/virt: Add memory hotplug framework
>>   hw/arm/virt: Allocate device_memory
>>   hw/arm/boot: Expose the pmem nodes in the DT
>>   hw/arm/virt: Add nvdimm and nvdimm-persistence options
>>
>> Kwangwoo Lee (2):
>>   nvdimm: use configurable ACPI IO base and size
>>   hw/arm/virt: Add nvdimm hot-plug infrastructure
>>
>> Shameer Kolothum (3):
>>   hw/arm/boot: introduce fdt_add_memory_node helper
>>   hw/arm/boot: Expose the PC-DIMM nodes in the DT
>>   hw/arm/virt-acpi-build: Add PC-DIMM in SRAT
>>
>>  accel/kvm/kvm-all.c             |   2 +-
>>  default-configs/arm-softmmu.mak |   4 +
>>  hw/acpi/nvdimm.c                |  31 ++-
>>  hw/arm/boot.c                   | 136 ++++++++++--
>>  hw/arm/virt-acpi-build.c        |  23 +-
>>  hw/arm/virt.c                   | 364 ++++++++++++++++++++++++++++----
>>  hw/i386/pc_piix.c               |   6 +-
>>  hw/i386/pc_q35.c                |   6 +-
>>  hw/ppc/mac_newworld.c           |   3 +-
>>  hw/ppc/mac_oldworld.c           |   2 +-
>>  hw/ppc/spapr.c                  |   2 +-
>>  include/hw/arm/virt.h           |  24 ++-
>>  include/hw/boards.h             |   5 +-
>>  include/hw/mem/nvdimm.h         |   4 +
>>  target/arm/kvm.c                |  10 +
>>  target/arm/kvm_arm.h            |  13 ++
>>  vl.c                            |   6 +-
>>  17 files changed, 556 insertions(+), 85 deletions(-)
>>
> 
> 

* Re: [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion and PCDIMM/NVDIMM support
  2019-02-22 17:35   ` Auger Eric
@ 2019-02-25  9:42     ` Igor Mammedov
  2019-02-25 10:13       ` Shameerali Kolothum Thodi
  2019-02-26  8:40       ` Auger Eric
  0 siblings, 2 replies; 63+ messages in thread
From: Igor Mammedov @ 2019-02-25  9:42 UTC (permalink / raw)
  To: Auger Eric
  Cc: peter.maydell, drjones, david, qemu-devel,
	shameerali.kolothum.thodi, dgilbert, qemu-arm, david,
	eric.auger.pro

On Fri, 22 Feb 2019 18:35:26 +0100
Auger Eric <eric.auger@redhat.com> wrote:

> Hi Igor,
> 
> On 2/22/19 5:27 PM, Igor Mammedov wrote:
> > On Wed, 20 Feb 2019 23:39:46 +0100
> > Eric Auger <eric.auger@redhat.com> wrote:
> > 
> >> This series aims to bump the 255GB RAM limit in machvirt and to
> >> support device memory in general, and especially PCDIMM/NVDIMM.
> >>
> >> In machvirt versions < 4.0, the initial RAM starts at 1GB and can
> >> grow up to 255GB. From 256GB onwards we find IO regions such as the
> >> additional GICv3 RDIST region, high PCIe ECAM region and high PCIe
> >> MMIO region. The address map was 1TB large. This corresponded to
> >> the max IPA capacity KVM was able to manage.
> >>
> >> Since 4.20, the host kernel is able to support a larger and dynamic
> >> IPA range. So the guest physical address can go beyond the 1TB. The
> >> max GPA size depends on the host kernel configuration and physical CPUs.
> >>
> >> In this series we use this feature and allow the RAM to grow without
> >> any other limit than the one put by the host kernel.
> >>
> >> The RAM still starts at 1GB. First comes the initial ram (-m) of size
> >> ram_size and then comes the device memory (,maxmem) of size
> >> maxram_size - ram_size. The device memory is potentially hotpluggable
> >> depending on the instantiated memory objects.
> >>
> >> IO regions previously located between 256GB and 1TB are moved after
> >> the RAM. Their offset is dynamically computed, depends on ram_size
> >> and maxram_size. Size alignment is enforced.
> >>
> >> In case maxmem value is inferior to 255GB, the legacy memory map
> >> still is used. The change of memory map becomes effective from 4.0
> >> onwards.
> >>
> >> As we keep the initial RAM at 1GB base address, we do not need to do
> >> invasive changes in the EDK2 FW. It seems nobody is eager to do
> >> that job at the moment.
> >>
> >> Device memory being put just after the initial RAM, it is possible
> >> to get access to this feature while keeping a 1TB address map.
> >>
> >> This series reuses/rebases patches initially submitted by Shameer
> >> in [1] and Kwangwoo in [2] for the PC-DIMM and NV-DIMM parts.
> >>
> >> Functionally, the series is split into 3 parts:
> >> 1) bump of the initial RAM limit [1 - 9] and change in
> >>    the memory map
> > 
> >> 2) Support of PC-DIMM [10 - 13]
> > Is this part complete ACPI wise (for coldplug)? I haven't noticed
> > DSDT AML here no E820 changes, so ACPI wise pc-dimm shouldn't be
> > visible to the guest. It might be that DT is masking problem
> > but well, that won't work on ACPI only guests.
> 
> guest /proc/meminfo or "lshw -class memory" reflects the amount of mem
> added with the DIMM slots.
Question is how does it get there? Does it come from DT or from firmware
via UEFI interfaces?

> So it looks fine to me. Isn't E820 a pure x86 matter?
Sorry for misleading you; I meant UEFI GetMemoryMap().
On x86, I'm wary of adding PC-DIMMs to E820, which then gets exposed
via UEFI GetMemoryMap(): the guest kernel might start using that memory
as normal memory early at boot and later put it into ZONE_NORMAL, hence
making it non-hot-unpluggable. The same concern applies to DT-based
means of discovery.
(That's a guest issue, but it's easy to work around by not putting
hotpluggable memory into UEFI GetMemoryMap() or the DT, and letting the
DSDT describe it properly.) That way the memory doesn't get (ab)used by
firmware or early boot kernel stages and doesn't get locked up.

> What else would you expect in the dsdt?
Memory device descriptions; look for the code that adds PNP0C80 devices
with a _CRS describing the memory ranges.

> I understand hotplug
> would require extra modifications but I don't see anything else missing
> for coldplug.
> > Even though I've tried make mem hotplug ACPI parts not x86 specific,
> > I'm afraid it might be tightly coupled with hotplug support.
> > So here are 2 options make DSDT part work without hotplug or
> > implement hotplug here. I think the former is just a waste of time
> > and we should just add hotplug. It should take relatively minor effort
> > since you already implemented most of boiler plate here.
> 
> Shameer sent an RFC series for supporting hotplug.
> 
> [RFC PATCH 0/4] ARM virt: ACPI memory hotplug support
> https://patchwork.kernel.org/cover/10783589/
> 
> I tested PCDIMM hotplug (with ACPI) this afternoon and it seemed to be
> OK, even after system_reset.
> 
> Note the hotplug kernel support on ARM is very recent. I would prefer to
> dissociate both efforts if we want to get a chance making coldplug for
> 4.0. Also we have an issue for NVDIMM since on reboot the guest does not
> boot properly.
I guess we can merge an implementation that works on some kernel configs
[DT-based, I'd guess] and add the ACPI part later. Though that will be
a bit of a mess, as we do not version firmware parts (ACPI tables).

> > As for how to implement ACPI HW part, I suggest to borrow GED
> > device that NEMU guys trying to use instead of GPIO route,
> > like we do now for ACPI_POWER_BUTTON_DEVICE to deliver event.
> > So that it would be easier to share this with their virt-x86
> > machine eventually.
> Sounds like a different approach than the one initiated by Shameer?
ARM boards were the first to use the ACPI hw-reduced profile, so they
picked up the GPIO-based way of delivering hotplug events that was
available back then. The spec later introduced the Generic Event Device
for that purpose in the hw-reduced profile, which NEMU implemented [1],
so I'd use that rather than an ad-hoc GPIO mapping. I'd guess it will be
more compatible with various contemporary guests, and we could reuse the
same code for both the x86 and arm virt boards.

1) https://github.com/intel/nemu/blob/topic/virt-x86/hw/acpi/ged.c

> 
> Thanks
> 
> Eric
> > 
> > 
> >> 3) Support of NV-DIMM [14 - 17]
> > The same might be true for NUMA but I haven't dug this deep in to
> > that part.
> > 
> >>
> >> 1) can be upstreamed before 2 and 2 can be upstreamed before 3.
> >>
> >> Work is ongoing to transform the whole memory as device memory.
> >> However this move is not trivial and to me, is independent on
> >> the improvements brought by this series:
> >> - if we were to use DIMMs for initial RAM, those DIMMs would use
> >>   slots. Although they would not be part of the ones provided
> >>   using the ",slots" options, they are ACPI limited resources.
> >> - DT and ACPI description needs to be reworked
> >> - NUMA integration needs special care
> >> - a special device memory object may be required to avoid consuming
> >>   slots and easing the FW description.
> >>
> >> So I preferred to separate the concerns. This new implementation
> >> based on device memory could be candidate for another virt
> >> version.
> >>
> >> Best Regards
> >>
> >> Eric
> >>
> >> References:
> >>
> >> [0] [RFC v2 0/6] hw/arm: Add support for non-contiguous iova regions
> >> http://patchwork.ozlabs.org/cover/914694/
> >>
> >> [1] [RFC PATCH 0/3] add nvdimm support on AArch64 virt platform
> >> https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg04599.html
> >>
> >> This series can be found at:
> >> https://github.com/eauger/qemu/tree/v3.1.0-dimm-v7
> >>
> >> History:
> >>
> >> v6 -> v7:
> >> - Addressed Peter's and Igor's comments (exceptions sent by email)
> >> - Fixed TCG case. Now device memory works also for TCG and vcpu
> >>   pamax is checked
> >> - See individual logs for more details
> >>
> >> v5 -> v6:
> >> - mingw compilation issue fix
> >> - kvm_arm_get_max_vm_phys_shift always returns the number of supported
> >>   IPA bits
> >> - new patch "hw/arm/virt: Rename highmem IO regions" that eases the review
> >>   of "hw/arm/virt: Split the memory map description"
> >> - "hw/arm/virt: Move memory map initialization into machvirt_init"
> >>   squashed into the previous patch
> >> - change alignment of IO regions beyond the RAM so that it matches their
> >>   size
> >>
> >> v4 -> v5:
> >> - change in the memory map
> >> - see individual logs
> >>
> >> v3 -> v4:
> >> - rebase on David's "pc-dimm: next bunch of cleanups" and
> >>   "pc-dimm: pre_plug "slot" and "addr" assignment"
> >> - kvm-type option not used anymore. We directly use
> >>   maxram_size and ram_size machine fields to compute the
> >>   MAX IPA range. Migration is naturally handled as CLI
> >>   option are kept between source and destination. This was
> >>   suggested by David.
> >> - device_memory_start and device_memory_size not stored
> >>   anymore in vms->bootinfo
> >> - I did not take into account 2 Igor's comments: the one
> >>   related to the refactoring of arm_load_dtb and the one
> >>   related to the generation of the dtb after system_reset
> >>   which would contain nodes of hotplugged devices (we do
> >>   not support hotplug at this stage)
> >> - check the end-user does not attempt to hotplug a device
> >> - addition of "vl: Set machine ram_size, maxram_size and
> >>   ram_slots earlier"
> >>
> >> v2 -> v3:
> >> - fix pc_q35 and pc_piix compilation error
> >> - kwangwoo's email being not valid anymore, remove his address
> >>
> >> v1 -> v2:
> >> - kvm_get_max_vm_phys_shift moved in arch specific file
> >> - addition of NVDIMM part
> >> - single series
> >> - rebase on David's refactoring
> >>
> >> v1:
> >> - was "[RFC 0/6] KVM/ARM: Dynamic and larger GPA size"
> >> - was "[RFC 0/5] ARM virt: Support PC-DIMM at 2TB"
> >>
> >> Best Regards
> >>
> >> Eric
> >>
> >>
> >> Eric Auger (12):
> >>   hw/arm/virt: Rename highmem IO regions
> >>   hw/arm/virt: Split the memory map description
> >>   hw/boards: Add a MachineState parameter to kvm_type callback
> >>   kvm: add kvm_arm_get_max_vm_ipa_size
> >>   vl: Set machine ram_size, maxram_size and ram_slots earlier
> >>   hw/arm/virt: Dynamic memory map depending on RAM requirements
> >>   hw/arm/virt: Implement kvm_type function for 4.0 machine
> >>   hw/arm/virt: Bump the 255GB initial RAM limit
> >>   hw/arm/virt: Add memory hotplug framework
> >>   hw/arm/virt: Allocate device_memory
> >>   hw/arm/boot: Expose the pmem nodes in the DT
> >>   hw/arm/virt: Add nvdimm and nvdimm-persistence options
> >>
> >> Kwangwoo Lee (2):
> >>   nvdimm: use configurable ACPI IO base and size
> >>   hw/arm/virt: Add nvdimm hot-plug infrastructure
> >>
> >> Shameer Kolothum (3):
> >>   hw/arm/boot: introduce fdt_add_memory_node helper
> >>   hw/arm/boot: Expose the PC-DIMM nodes in the DT
> >>   hw/arm/virt-acpi-build: Add PC-DIMM in SRAT
> >>
> >>  accel/kvm/kvm-all.c             |   2 +-
> >>  default-configs/arm-softmmu.mak |   4 +
> >>  hw/acpi/nvdimm.c                |  31 ++-
> >>  hw/arm/boot.c                   | 136 ++++++++++--
> >>  hw/arm/virt-acpi-build.c        |  23 +-
> >>  hw/arm/virt.c                   | 364 ++++++++++++++++++++++++++++----
> >>  hw/i386/pc_piix.c               |   6 +-
> >>  hw/i386/pc_q35.c                |   6 +-
> >>  hw/ppc/mac_newworld.c           |   3 +-
> >>  hw/ppc/mac_oldworld.c           |   2 +-
> >>  hw/ppc/spapr.c                  |   2 +-
> >>  include/hw/arm/virt.h           |  24 ++-
> >>  include/hw/boards.h             |   5 +-
> >>  include/hw/mem/nvdimm.h         |   4 +
> >>  target/arm/kvm.c                |  10 +
> >>  target/arm/kvm_arm.h            |  13 ++
> >>  vl.c                            |   6 +-
> >>  17 files changed, 556 insertions(+), 85 deletions(-)
> >>
> > 
> > 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion and PCDIMM/NVDIMM support
  2019-02-25  9:42     ` Igor Mammedov
@ 2019-02-25 10:13       ` Shameerali Kolothum Thodi
  2019-02-26  8:40       ` Auger Eric
  1 sibling, 0 replies; 63+ messages in thread
From: Shameerali Kolothum Thodi @ 2019-02-25 10:13 UTC (permalink / raw)
  To: Igor Mammedov, Auger Eric
  Cc: peter.maydell, drjones, david, qemu-devel, dgilbert, qemu-arm,
	david, eric.auger.pro, Linuxarm

Hi Igor,

> -----Original Message-----
> From: Igor Mammedov [mailto:imammedo@redhat.com]
> Sent: 25 February 2019 09:42
> To: Auger Eric <eric.auger@redhat.com>
> Cc: peter.maydell@linaro.org; drjones@redhat.com; david@redhat.com;
> qemu-devel@nongnu.org; Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; dgilbert@redhat.com;
> qemu-arm@nongnu.org; david@gibson.dropbear.id.au;
> eric.auger.pro@gmail.com
> Subject: Re: [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion
> and PCDIMM/NVDIMM support
> 
> On Fri, 22 Feb 2019 18:35:26 +0100
> Auger Eric <eric.auger@redhat.com> wrote:

[...]

> > I understand hotplug
> > would require extra modifications but I don't see anything else missing
> > for coldplug.
> > > Even though I've tried make mem hotplug ACPI parts not x86 specific,
> > > I'm afraid it might be tightly coupled with hotplug support.
> > > So here are 2 options make DSDT part work without hotplug or
> > > implement hotplug here. I think the former is just a waste of time
> > > and we should just add hotplug. It should take relatively minor effort
> > > since you already implemented most of boiler plate here.
> >
> > Shameer sent an RFC series for supporting hotplug.
> >
> > [RFC PATCH 0/4] ARM virt: ACPI memory hotplug support
> > https://patchwork.kernel.org/cover/10783589/
> >
> > I tested PCDIMM hotplug (with ACPI) this afternoon and it seemed to be
> > OK, even after system_reset.
> >
> > Note the hotplug kernel support on ARM is very recent. I would prefer to
> > dissociate both efforts if we want to get a chance making coldplug for
> > 4.0. Also we have an issue for NVDIMM since on reboot the guest does not
> > boot properly.
> I guess we can merge an implementation that works on some kernel configs
> [DT based I'd guess], and add ACPI part later. Though that will be
> a bit of a mess as we do not version firmware parts (ACPI tables).
> 
> > > As for how to implement ACPI HW part, I suggest to borrow GED
> > > device that NEMU guys trying to use instead of GPIO route,
> > > like we do now for ACPI_POWER_BUTTON_DEVICE to deliver event.
> > > So that it would be easier to share this with their virt-x86
> > > machine eventually.
> > Sounds like a different approach than the one initiated by Shameer?
> ARM boards were first to use ACPI hw-reduced profile so they picked up
> available back then GPIO based way to deliver hotplug event, later spec
> introduced Generic Event Device for that means to use with hw-reduced
> profile, which NEMU implemented[1], so I'd use that rather than ad-hoc
> GPIO mapping. I'd guess it will more compatible with various contemporary
> guests and we could reuse the same code for both x86/arm virt boards)
> 
> 1) https://github.com/intel/nemu/blob/topic/virt-x86/hw/acpi/ged.c

Thanks for this suggestion. That looks like a definitely better way of
handling ACPI events. I will take a look and use GED for the next
revision of the hotplug series.

Thanks,
Shameer


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion and PCDIMM/NVDIMM support
  2019-02-25  9:42     ` Igor Mammedov
  2019-02-25 10:13       ` Shameerali Kolothum Thodi
@ 2019-02-26  8:40       ` Auger Eric
  2019-02-26 13:11         ` Auger Eric
  1 sibling, 1 reply; 63+ messages in thread
From: Auger Eric @ 2019-02-26  8:40 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: peter.maydell, drjones, david, qemu-devel,
	shameerali.kolothum.thodi, dgilbert, qemu-arm, david,
	eric.auger.pro

Hi Igor,

On 2/25/19 10:42 AM, Igor Mammedov wrote:
> On Fri, 22 Feb 2019 18:35:26 +0100
> Auger Eric <eric.auger@redhat.com> wrote:
> 
>> Hi Igor,
>>
>> On 2/22/19 5:27 PM, Igor Mammedov wrote:
>>> On Wed, 20 Feb 2019 23:39:46 +0100
>>> Eric Auger <eric.auger@redhat.com> wrote:
>>>
>>>> This series aims to bump the 255GB RAM limit in machvirt and to
>>>> support device memory in general, and especially PCDIMM/NVDIMM.
>>>>
>>>> In machvirt versions < 4.0, the initial RAM starts at 1GB and can
>>>> grow up to 255GB. From 256GB onwards we find IO regions such as the
>>>> additional GICv3 RDIST region, high PCIe ECAM region and high PCIe
>>>> MMIO region. The address map was 1TB large. This corresponded to
>>>> the max IPA capacity KVM was able to manage.
>>>>
>>>> Since 4.20, the host kernel is able to support a larger and dynamic
>>>> IPA range. So the guest physical address can go beyond the 1TB. The
>>>> max GPA size depends on the host kernel configuration and physical CPUs.
>>>>
>>>> In this series we use this feature and allow the RAM to grow without
>>>> any other limit than the one put by the host kernel.
>>>>
>>>> The RAM still starts at 1GB. First comes the initial ram (-m) of size
>>>> ram_size and then comes the device memory (,maxmem) of size
>>>> maxram_size - ram_size. The device memory is potentially hotpluggable
>>>> depending on the instantiated memory objects.
>>>>
>>>> IO regions previously located between 256GB and 1TB are moved after
>>>> the RAM. Their offset is dynamically computed, depends on ram_size
>>>> and maxram_size. Size alignment is enforced.
>>>>
>>>> In case maxmem value is inferior to 255GB, the legacy memory map
>>>> still is used. The change of memory map becomes effective from 4.0
>>>> onwards.
>>>>
>>>> As we keep the initial RAM at 1GB base address, we do not need to do
>>>> invasive changes in the EDK2 FW. It seems nobody is eager to do
>>>> that job at the moment.
>>>>
>>>> Device memory being put just after the initial RAM, it is possible
>>>> to get access to this feature while keeping a 1TB address map.
>>>>
>>>> This series reuses/rebases patches initially submitted by Shameer
>>>> in [1] and Kwangwoo in [2] for the PC-DIMM and NV-DIMM parts.
>>>>
>>>> Functionally, the series is split into 3 parts:
>>>> 1) bump of the initial RAM limit [1 - 9] and change in
>>>>    the memory map
>>>
>>>> 2) Support of PC-DIMM [10 - 13]
>>> Is this part complete ACPI wise (for coldplug)? I haven't noticed
>>> DSDT AML here no E820 changes, so ACPI wise pc-dimm shouldn't be
>>> visible to the guest. It might be that DT is masking problem
>>> but well, that won't work on ACPI only guests.
>>
>> guest /proc/meminfo or "lshw -class memory" reflects the amount of mem
>> added with the DIMM slots.
> Question is how does it get there? Does it come from DT or from firmware
> via UEFI interfaces?
> 
>> So it looks fine to me. Isn't E820 a pure x86 matter?
> sorry for misleading, I've meant is UEFI GetMemoryMap().
> On x86, I'm wary of adding PC-DIMMs to E820 which then gets exposed
> via UEFI GetMemoryMap() as guest kernel might start using it as normal
> memory early at boot and later put that memory into zone normal and hence
> make it non-hot-un-pluggable. The same concerns apply to DT based means
> of discovery.
> (That's guest issue but it's easy to workaround it not putting hotpluggable
> memory into UEFI GetMemoryMap() or DT and let DSDT describe it properly)
> That way memory doesn't get (ab)used by firmware or early boot kernel stages
> and doesn't get locked up.
> 
>> What else would you expect in the dsdt?
> Memory device descriptions, look for code that adds PNP0C80 with _CRS
> describing memory ranges

OK, thank you for the explanations. I will work on adding PNP0C80 then.
Does this mean that in ACPI mode we must not output the DT hotplug
memory nodes, or that, assuming PNP0C80 is properly described, it will
"override" the DT description?

> 
>> I understand hotplug
>> would require extra modifications but I don't see anything else missing
>> for coldplug.
>>> Even though I've tried make mem hotplug ACPI parts not x86 specific,
>>> I'm afraid it might be tightly coupled with hotplug support.
>>> So here are 2 options make DSDT part work without hotplug or
>>> implement hotplug here. I think the former is just a waste of time
>>> and we should just add hotplug. It should take relatively minor effort
>>> since you already implemented most of boiler plate here.
>>
>> Shameer sent an RFC series for supporting hotplug.
>>
>> [RFC PATCH 0/4] ARM virt: ACPI memory hotplug support
>> https://patchwork.kernel.org/cover/10783589/
>>
>> I tested PCDIMM hotplug (with ACPI) this afternoon and it seemed to be
>> OK, even after system_reset.
>>
>> Note the hotplug kernel support on ARM is very recent. I would prefer to
>> dissociate both efforts if we want to get a chance making coldplug for
>> 4.0. Also we have an issue for NVDIMM since on reboot the guest does not
>> boot properly.
> I guess we can merge an implementation that works on some kernel configs
> [DT based I'd guess], and add ACPI part later. Though that will be
> a bit of a mess as we do not version firmware parts (ACPI tables).
> 
>>> As for how to implement ACPI HW part, I suggest to borrow GED
>>> device that NEMU guys trying to use instead of GPIO route,
>>> like we do now for ACPI_POWER_BUTTON_DEVICE to deliver event.
>>> So that it would be easier to share this with their virt-x86
>>> machine eventually.
>> Sounds like a different approach than the one initiated by Shameer?
> ARM boards were first to use ACPI hw-reduced profile so they picked up
> available back then GPIO based way to deliver hotplug event, later spec
> introduced Generic Event Device for that means to use with hw-reduced
> profile, which NEMU implemented[1], so I'd use that rather than ad-hoc
> GPIO mapping. I'd guess it will more compatible with various contemporary
> guests and we could reuse the same code for both x86/arm virt boards)
> 
> 1) https://github.com/intel/nemu/blob/topic/virt-x86/hw/acpi/ged.c

That's really helpful for the ARM hotplug work. Thanks!

Eric

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion and PCDIMM/NVDIMM support
  2019-02-26  8:40       ` Auger Eric
@ 2019-02-26 13:11         ` Auger Eric
  2019-02-26 16:56           ` Igor Mammedov
  0 siblings, 1 reply; 63+ messages in thread
From: Auger Eric @ 2019-02-26 13:11 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: peter.maydell, drjones, david, qemu-devel,
	shameerali.kolothum.thodi, dgilbert, qemu-arm, david,
	eric.auger.pro

Hi Igor,

On 2/26/19 9:40 AM, Auger Eric wrote:
> Hi Igor,
> 
> On 2/25/19 10:42 AM, Igor Mammedov wrote:
>> On Fri, 22 Feb 2019 18:35:26 +0100
>> Auger Eric <eric.auger@redhat.com> wrote:
>>
>>> Hi Igor,
>>>
>>> On 2/22/19 5:27 PM, Igor Mammedov wrote:
>>>> On Wed, 20 Feb 2019 23:39:46 +0100
>>>> Eric Auger <eric.auger@redhat.com> wrote:
>>>>
>>>>> This series aims to bump the 255GB RAM limit in machvirt and to
>>>>> support device memory in general, and especially PCDIMM/NVDIMM.
>>>>>
>>>>> In machvirt versions < 4.0, the initial RAM starts at 1GB and can
>>>>> grow up to 255GB. From 256GB onwards we find IO regions such as the
>>>>> additional GICv3 RDIST region, high PCIe ECAM region and high PCIe
>>>>> MMIO region. The address map was 1TB large. This corresponded to
>>>>> the max IPA capacity KVM was able to manage.
>>>>>
>>>>> Since 4.20, the host kernel is able to support a larger and dynamic
>>>>> IPA range. So the guest physical address can go beyond the 1TB. The
>>>>> max GPA size depends on the host kernel configuration and physical CPUs.
>>>>>
>>>>> In this series we use this feature and allow the RAM to grow without
>>>>> any other limit than the one put by the host kernel.
>>>>>
>>>>> The RAM still starts at 1GB. First comes the initial ram (-m) of size
>>>>> ram_size and then comes the device memory (,maxmem) of size
>>>>> maxram_size - ram_size. The device memory is potentially hotpluggable
>>>>> depending on the instantiated memory objects.
>>>>>
>>>>> IO regions previously located between 256GB and 1TB are moved after
>>>>> the RAM. Their offset is dynamically computed, depends on ram_size
>>>>> and maxram_size. Size alignment is enforced.
>>>>>
>>>>> In case maxmem value is inferior to 255GB, the legacy memory map
>>>>> still is used. The change of memory map becomes effective from 4.0
>>>>> onwards.
>>>>>
>>>>> As we keep the initial RAM at 1GB base address, we do not need to do
>>>>> invasive changes in the EDK2 FW. It seems nobody is eager to do
>>>>> that job at the moment.
>>>>>
>>>>> Device memory being put just after the initial RAM, it is possible
>>>>> to get access to this feature while keeping a 1TB address map.
>>>>>
>>>>> This series reuses/rebases patches initially submitted by Shameer
>>>>> in [1] and Kwangwoo in [2] for the PC-DIMM and NV-DIMM parts.
>>>>>
>>>>> Functionally, the series is split into 3 parts:
>>>>> 1) bump of the initial RAM limit [1 - 9] and change in
>>>>>    the memory map
>>>>
>>>>> 2) Support of PC-DIMM [10 - 13]
>>>> Is this part complete ACPI wise (for coldplug)? I haven't noticed
>>>> DSDT AML here nor E820 changes, so ACPI wise pc-dimm shouldn't be
>>>> visible to the guest. It might be that DT is masking problem
>>>> but well, that won't work on ACPI only guests.
>>>
>>> guest /proc/meminfo or "lshw -class memory" reflects the amount of mem
>>> added with the DIMM slots.
>> Question is how does it get there? Does it come from DT or from firmware
>> via UEFI interfaces?
>>
>>> So it looks fine to me. Isn't E820 a pure x86 matter?
>> sorry for misleading, I've meant is UEFI GetMemoryMap().
>> On x86, I'm wary of adding PC-DIMMs to E820 which then gets exposed
>> via UEFI GetMemoryMap() as guest kernel might start using it as normal
>> memory early at boot and later put that memory into zone normal and hence
>> make it non-hot-un-pluggable. The same concerns apply to DT based means
>> of discovery.
>> (That's guest issue but it's easy to workaround it not putting hotpluggable
>> memory into UEFI GetMemoryMap() or DT and let DSDT describe it properly)
>> That way memory doesn't get (ab)used by firmware or early boot kernel stages
>> and doesn't get locked up.
>>
>>> What else would you expect in the dsdt?
>> Memory device descriptions, look for code that adds PNP0C80 with _CRS
>> describing memory ranges
> 
> OK thank you for the explanations. I will work on PNP0C80 addition then.
> Does it mean that in ACPI mode we must not output DT hotplug memory
> nodes or assuming that PNP0C80 is properly described, it will "override"
> DT description?

After further investigation, I think the pieces you pointed out are
added by Shameer's series, i.e. through the build_memory_hotplug_aml()
call. So I suggest we separate the concerns: this series brings support
for DIMM coldplug; hotplug, including all the relevant ACPI structures,
will be added later on by Shameer.
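For context, the PNP0C80 description Igor refers to has roughly the following shape. The sketch below just formats illustrative ASL text; it is not QEMU's AML-builder output, and the device name, base, and length values are made up for the example:

```python
def memory_device_asl(idx, base, length):
    """Format an illustrative coldplug memory-device node: a PNP0C80
    ACPI device whose _CRS advertises the DIMM's guest-physical range.
    Purely a sketch of the ASL shape, not generated QEMU AML."""
    return (
        f"Device (MEM{idx:02X}) {{\n"
        f"    Name (_HID, \"PNP0C80\")  // generic memory device\n"
        f"    Name (_CRS, ResourceTemplate () {{\n"
        f"        QWordMemory (ResourceConsumer, PosDecode, MinFixed, MaxFixed,\n"
        f"                     Cacheable, ReadWrite,\n"
        f"                     0x0,                     // granularity\n"
        f"                     {base:#x},               // range minimum\n"
        f"                     {base + length - 1:#x},  // range maximum\n"
        f"                     0x0,                     // translation\n"
        f"                     {length:#x})             // length\n"
        f"    }})\n"
        f"}}\n")
```

The guest OSPM enumerates such devices from the DSDT, which is what lets it tell hotpluggable DIMM ranges apart from the boot memory map.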

Thanks

Eric
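As a footnote on the quoted cover letter: the dynamic placement it describes (initial RAM at 1 GB, device memory right after it, the high IO regions relocated beyond the RAM with bases aligned on their size) can be sketched as below. The region names and sizes are illustrative, not the exact values from the series:

```python
GIB = 1 << 30
MIB = 1 << 20

def align_up(addr, align):
    """Round addr up to the next multiple of align (a power of two)."""
    return (addr + align - 1) & ~(align - 1)

def virt_extended_memmap(ram_size, maxram_size):
    """Sketch of the >255GB arm/virt layout: initial RAM at 1GiB,
    device memory (maxram_size - ram_size) right after it, then the
    relocated high IO regions, each aligned on its own size."""
    ram_base = 1 * GIB
    device_mem_base = align_up(ram_base + ram_size, GIB)
    top = device_mem_base + (maxram_size - ram_size)
    regions = {}
    # Illustrative sizes for the three relocated high IO regions.
    for name, size in (("HIGH_GIC_REDIST2", 64 * MIB),
                       ("HIGH_PCIE_ECAM", 256 * MIB),
                       ("HIGH_PCIE_MMIO", 512 * GIB)):
        base = align_up(top, size)  # size alignment is enforced
        regions[name] = (base, size)
        top = base + size
    return device_mem_base, regions
```

With this scheme the top of the address map grows with maxmem, which is why the required IPA size must be checked against what the host kernel can provide.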
> 
>>
>>> I understand hotplug
>>> would require extra modifications but I don't see anything else missing
>>> for coldplug.
>>>> Even though I've tried make mem hotplug ACPI parts not x86 specific,
>>>> I'm afraid it might be tightly coupled with hotplug support.
>>>> So here are 2 options make DSDT part work without hotplug or
>>>> implement hotplug here. I think the former is just a waste of time
>>>> and we should just add hotplug. It should take relatively minor effort
>>>> since you already implemented most of boiler plate here.
>>>
>>> Shameer sent an RFC series for supporting hotplug.
>>>
>>> [RFC PATCH 0/4] ARM virt: ACPI memory hotplug support
>>> https://patchwork.kernel.org/cover/10783589/
>>>
>>> I tested PCDIMM hotplug (with ACPI) this afternoon and it seemed to be
>>> OK, even after system_reset.
>>>
>>> Note the hotplug kernel support on ARM is very recent. I would prefer to
>>> dissociate both efforts if we want to get a chance making coldplug for
>>> 4.0. Also we have an issue for NVDIMM since on reboot the guest does not
>>> boot properly.
>> I guess we can merge an implementation that works on some kernel configs
>> [DT based I'd guess], and add ACPI part later. Though that will be
>> a bit of a mess as we do not version firmware parts (ACPI tables).
>>
>>>> As for how to implement ACPI HW part, I suggest to borrow GED
>>>> device that NEMU guys trying to use instead of GPIO route,
>>>> like we do now for ACPI_POWER_BUTTON_DEVICE to deliver event.
>>>> So that it would be easier to share this with their virt-x86
>>>> machine eventually.
>>> Sounds like a different approach than the one initiated by Shameer?
>> ARM boards were first to use ACPI hw-reduced profile so they picked up
>> available back then GPIO based way to deliver hotplug event, later spec
>> introduced Generic Event Device for that means to use with hw-reduced
>> profile, which NEMU implemented[1], so I'd use that rather than ad-hoc
>> GPIO mapping. I'd guess it will be more compatible with various contemporary
>> guests and we could reuse the same code for both x86/arm virt boards)
>>
>> 1) https://github.com/intel/nemu/blob/topic/virt-x86/hw/acpi/ged.c
> 
> That's really helpful for the ARM hotplug work. Thanks!
> 
> Eric
>>
>>>
>>> Thanks
>>>
>>> Eric
>>>>
>>>>
>>>>> 3) Support of NV-DIMM [14 - 17]
>>>> The same might be true for NUMA but I haven't dug this deep in to
>>>> that part.
>>>>
>>>>>
>>>>> 1) can be upstreamed before 2 and 2 can be upstreamed before 3.
>>>>>
>>>>> Work is ongoing to transform the whole memory as device memory.
>>>>> However this move is not trivial and to me, is independent on
>>>>> the improvements brought by this series:
>>>>> - if we were to use DIMM for initial RAM, those DIMMs would use
>>>>>   use slots. Although they would not be part of the ones provided
>>>>>   using the ",slots" options, they are ACPI limited resources.
>>>>> - DT and ACPI description needs to be reworked
>>>>> - NUMA integration needs special care
>>>>> - a special device memory object may be required to avoid consuming
>>>>>   slots and easing the FW description.
>>>>>
>>>>> So I preferred to separate the concerns. This new implementation
>>>>> based on device memory could be candidate for another virt
>>>>> version.
>>>>>
>>>>> Best Regards
>>>>>
>>>>> Eric
>>>>>
>>>>> References:
>>>>>
>>>>> [0] [RFC v2 0/6] hw/arm: Add support for non-contiguous iova regions
>>>>> http://patchwork.ozlabs.org/cover/914694/
>>>>>
>>>>> [1] [RFC PATCH 0/3] add nvdimm support on AArch64 virt platform
>>>>> https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg04599.html
>>>>>
>>>>> This series can be found at:
>>>>> https://github.com/eauger/qemu/tree/v3.1.0-dimm-v7
>>>>>
>>>>> History:
>>>>>
>>>>> v6 -> v7:
>>>>> - Addressed Peter and Igor comments (exceptions sent my email)
>>>>> - Fixed TCG case. Now device memory works also for TCG and vcpu
>>>>>   pamax is checked
>>>>> - See individual logs for more details
>>>>>
>>>>> v5 -> v6:
>>>>> - mingw compilation issue fix
>>>>> - kvm_arm_get_max_vm_phys_shift always returns the number of supported
>>>>>   IPA bits
>>>>> - new patch "hw/arm/virt: Rename highmem IO regions" that eases the review
>>>>>   of "hw/arm/virt: Split the memory map description"
>>>>> - "hw/arm/virt: Move memory map initialization into machvirt_init"
>>>>>   squashed into the previous patch
>>>>> - change alignment of IO regions beyond the RAM so that it matches their
>>>>>   size
>>>>>
>>>>> v4 -> v5:
>>>>> - change in the memory map
>>>>> - see individual logs
>>>>>
>>>>> v3 -> v4:
>>>>> - rebase on David's "pc-dimm: next bunch of cleanups" and
>>>>>   "pc-dimm: pre_plug "slot" and "addr" assignment"
>>>>> - kvm-type option not used anymore. We directly use
>>>>>   maxram_size and ram_size machine fields to compute the
>>>>>   MAX IPA range. Migration is naturally handled as CLI
>>>>>   option are kept between source and destination. This was
>>>>>   suggested by David.
>>>>> - device_memory_start and device_memory_size not stored
>>>>>   anymore in vms->bootinfo
>>>>> - I did not take into account 2 Igor's comments: the one
>>>>>   related to the refactoring of arm_load_dtb and the one
>>>>>   related to the generation of the dtb after system_reset
>>>>>   which would contain nodes of hotplugged devices (we do
>>>>>   not support hotplug at this stage)
>>>>> - check the end-user does not attempt to hotplug a device
>>>>> - addition of "vl: Set machine ram_size, maxram_size and
>>>>>   ram_slots earlier"
>>>>>
>>>>> v2 -> v3:
>>>>> - fix pc_q35 and pc_piix compilation error
>>>>> - kwangwoo's email being not valid anymore, remove his address
>>>>>
>>>>> v1 -> v2:
>>>>> - kvm_get_max_vm_phys_shift moved in arch specific file
>>>>> - addition of NVDIMM part
>>>>> - single series
>>>>> - rebase on David's refactoring
>>>>>
>>>>> v1:
>>>>> - was "[RFC 0/6] KVM/ARM: Dynamic and larger GPA size"
>>>>> - was "[RFC 0/5] ARM virt: Support PC-DIMM at 2TB"
>>>>>
>>>>> Best Regards
>>>>>
>>>>> Eric
>>>>>
>>>>>
>>>>> Eric Auger (12):
>>>>>   hw/arm/virt: Rename highmem IO regions
>>>>>   hw/arm/virt: Split the memory map description
>>>>>   hw/boards: Add a MachineState parameter to kvm_type callback
>>>>>   kvm: add kvm_arm_get_max_vm_ipa_size
>>>>>   vl: Set machine ram_size, maxram_size and ram_slots earlier
>>>>>   hw/arm/virt: Dynamic memory map depending on RAM requirements
>>>>>   hw/arm/virt: Implement kvm_type function for 4.0 machine
>>>>>   hw/arm/virt: Bump the 255GB initial RAM limit
>>>>>   hw/arm/virt: Add memory hotplug framework
>>>>>   hw/arm/virt: Allocate device_memory
>>>>>   hw/arm/boot: Expose the pmem nodes in the DT
>>>>>   hw/arm/virt: Add nvdimm and nvdimm-persistence options
>>>>>
>>>>> Kwangwoo Lee (2):
>>>>>   nvdimm: use configurable ACPI IO base and size
>>>>>   hw/arm/virt: Add nvdimm hot-plug infrastructure
>>>>>
>>>>> Shameer Kolothum (3):
>>>>>   hw/arm/boot: introduce fdt_add_memory_node helper
>>>>>   hw/arm/boot: Expose the PC-DIMM nodes in the DT
>>>>>   hw/arm/virt-acpi-build: Add PC-DIMM in SRAT
>>>>>
>>>>>  accel/kvm/kvm-all.c             |   2 +-
>>>>>  default-configs/arm-softmmu.mak |   4 +
>>>>>  hw/acpi/nvdimm.c                |  31 ++-
>>>>>  hw/arm/boot.c                   | 136 ++++++++++--
>>>>>  hw/arm/virt-acpi-build.c        |  23 +-
>>>>>  hw/arm/virt.c                   | 364 ++++++++++++++++++++++++++++----
>>>>>  hw/i386/pc_piix.c               |   6 +-
>>>>>  hw/i386/pc_q35.c                |   6 +-
>>>>>  hw/ppc/mac_newworld.c           |   3 +-
>>>>>  hw/ppc/mac_oldworld.c           |   2 +-
>>>>>  hw/ppc/spapr.c                  |   2 +-
>>>>>  include/hw/arm/virt.h           |  24 ++-
>>>>>  include/hw/boards.h             |   5 +-
>>>>>  include/hw/mem/nvdimm.h         |   4 +
>>>>>  target/arm/kvm.c                |  10 +
>>>>>  target/arm/kvm_arm.h            |  13 ++
>>>>>  vl.c                            |   6 +-
>>>>>  17 files changed, 556 insertions(+), 85 deletions(-)
>>>>>
>>>>
>>>>
>>


* Re: [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion and PCDIMM/NVDIMM support
  2019-02-26 13:11         ` Auger Eric
@ 2019-02-26 16:56           ` Igor Mammedov
  2019-02-26 17:53             ` Auger Eric
  0 siblings, 1 reply; 63+ messages in thread
From: Igor Mammedov @ 2019-02-26 16:56 UTC (permalink / raw)
  To: Auger Eric
  Cc: peter.maydell, drjones, david, qemu-devel,
	shameerali.kolothum.thodi, dgilbert, qemu-arm, david,
	eric.auger.pro

On Tue, 26 Feb 2019 14:11:58 +0100
Auger Eric <eric.auger@redhat.com> wrote:

> Hi Igor,
> 
> On 2/26/19 9:40 AM, Auger Eric wrote:
> > Hi Igor,
> > 
> > On 2/25/19 10:42 AM, Igor Mammedov wrote:
> >> On Fri, 22 Feb 2019 18:35:26 +0100
> >> Auger Eric <eric.auger@redhat.com> wrote:
> >>
> >>> Hi Igor,
> >>>
> >>> On 2/22/19 5:27 PM, Igor Mammedov wrote:
> >>>> On Wed, 20 Feb 2019 23:39:46 +0100
> >>>> Eric Auger <eric.auger@redhat.com> wrote:
> >>>>
> >>>>> This series aims to bump the 255GB RAM limit in machvirt and to
> >>>>> support device memory in general, and especially PCDIMM/NVDIMM.
> >>>>>
> >>>>> In machvirt versions < 4.0, the initial RAM starts at 1GB and can
> >>>>> grow up to 255GB. From 256GB onwards we find IO regions such as the
> >>>>> additional GICv3 RDIST region, high PCIe ECAM region and high PCIe
> >>>>> MMIO region. The address map was 1TB large. This corresponded to
> >>>>> the max IPA capacity KVM was able to manage.
> >>>>>
> >>>>> Since 4.20, the host kernel is able to support a larger and dynamic
> >>>>> IPA range. So the guest physical address can go beyond the 1TB. The
> >>>>> max GPA size depends on the host kernel configuration and physical CPUs.
> >>>>>
> >>>>> In this series we use this feature and allow the RAM to grow without
> >>>>> any other limit than the one put by the host kernel.
> >>>>>
> >>>>> The RAM still starts at 1GB. First comes the initial ram (-m) of size
> >>>>> ram_size and then comes the device memory (,maxmem) of size
> >>>>> maxram_size - ram_size. The device memory is potentially hotpluggable
> >>>>> depending on the instantiated memory objects.
> >>>>>
> >>>>> IO regions previously located between 256GB and 1TB are moved after
> >>>>> the RAM. Their offset is dynamically computed, depends on ram_size
> >>>>> and maxram_size. Size alignment is enforced.
> >>>>>
> >>>>> In case maxmem value is inferior to 255GB, the legacy memory map
> >>>>> still is used. The change of memory map becomes effective from 4.0
> >>>>> onwards.
> >>>>>
> >>>>> As we keep the initial RAM at 1GB base address, we do not need to do
> >>>>> invasive changes in the EDK2 FW. It seems nobody is eager to do
> >>>>> that job at the moment.
> >>>>>
> >>>>> Device memory being put just after the initial RAM, it is possible
> >>>>> to get access to this feature while keeping a 1TB address map.
> >>>>>
> >>>>> This series reuses/rebases patches initially submitted by Shameer
> >>>>> in [1] and Kwangwoo in [2] for the PC-DIMM and NV-DIMM parts.
> >>>>>
> >>>>> Functionally, the series is split into 3 parts:
> >>>>> 1) bump of the initial RAM limit [1 - 9] and change in
> >>>>>    the memory map
> >>>>
> >>>>> 2) Support of PC-DIMM [10 - 13]
> >>>> Is this part complete ACPI wise (for coldplug)? I haven't noticed
> >>>> DSDT AML here nor E820 changes, so ACPI wise pc-dimm shouldn't be
> >>>> visible to the guest. It might be that DT is masking problem
> >>>> but well, that won't work on ACPI only guests.
> >>>
> >>> guest /proc/meminfo or "lshw -class memory" reflects the amount of mem
> >>> added with the DIMM slots.
> >> Question is how does it get there? Does it come from DT or from firmware
> >> via UEFI interfaces?
> >>
> >>> So it looks fine to me. Isn't E820 a pure x86 matter?
> >> sorry for misleading, I've meant is UEFI GetMemoryMap().
> >> On x86, I'm wary of adding PC-DIMMs to E820 which then gets exposed
> >> via UEFI GetMemoryMap() as guest kernel might start using it as normal
> >> memory early at boot and later put that memory into zone normal and hence
> >> make it non-hot-un-pluggable. The same concerns apply to DT based means
> >> of discovery.
> >> (That's guest issue but it's easy to workaround it not putting hotpluggable
> >> memory into UEFI GetMemoryMap() or DT and let DSDT describe it properly)
> >> That way memory doesn't get (ab)used by firmware or early boot kernel stages
> >> and doesn't get locked up.
> >>
> >>> What else would you expect in the dsdt?
> >> Memory device descriptions, look for code that adds PNP0C80 with _CRS
> >> describing memory ranges
> > 
> > OK thank you for the explanations. I will work on PNP0C80 addition then.
> > Does it mean that in ACPI mode we must not output DT hotplug memory
> > nodes or assuming that PNP0C80 is properly described, it will "override"
> > DT description?
> 
> After further investigations, I think the pieces you pointed out are
> added by Shameer's series, ie. through the build_memory_hotplug_aml()
> call. So I suggest we separate the concerns: this series brings support
> for DIMM coldplug. hotplug, including all the relevant ACPI structures
> will be added later on by Shameer.

Maybe we should not put pc-dimms in the DT for this series until it
becomes clear whether that conflicts with ACPI in some way.


* Re: [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion and PCDIMM/NVDIMM support
  2019-02-26 16:56           ` Igor Mammedov
@ 2019-02-26 17:53             ` Auger Eric
  2019-02-27 10:10               ` Igor Mammedov
  0 siblings, 1 reply; 63+ messages in thread
From: Auger Eric @ 2019-02-26 17:53 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: peter.maydell, drjones, david, dgilbert,
	shameerali.kolothum.thodi, qemu-devel, qemu-arm, eric.auger.pro,
	david

Hi Igor,

On 2/26/19 5:56 PM, Igor Mammedov wrote:
> On Tue, 26 Feb 2019 14:11:58 +0100
> Auger Eric <eric.auger@redhat.com> wrote:
> 
>> Hi Igor,
>>
>> On 2/26/19 9:40 AM, Auger Eric wrote:
>>> Hi Igor,
>>>
>>> On 2/25/19 10:42 AM, Igor Mammedov wrote:
>>>> On Fri, 22 Feb 2019 18:35:26 +0100
>>>> Auger Eric <eric.auger@redhat.com> wrote:
>>>>
>>>>> Hi Igor,
>>>>>
>>>>> On 2/22/19 5:27 PM, Igor Mammedov wrote:
>>>>>> On Wed, 20 Feb 2019 23:39:46 +0100
>>>>>> Eric Auger <eric.auger@redhat.com> wrote:
>>>>>>
>>>>>>> This series aims to bump the 255GB RAM limit in machvirt and to
>>>>>>> support device memory in general, and especially PCDIMM/NVDIMM.
>>>>>>>
>>>>>>> In machvirt versions < 4.0, the initial RAM starts at 1GB and can
>>>>>>> grow up to 255GB. From 256GB onwards we find IO regions such as the
>>>>>>> additional GICv3 RDIST region, high PCIe ECAM region and high PCIe
>>>>>>> MMIO region. The address map was 1TB large. This corresponded to
>>>>>>> the max IPA capacity KVM was able to manage.
>>>>>>>
>>>>>>> Since 4.20, the host kernel is able to support a larger and dynamic
>>>>>>> IPA range. So the guest physical address can go beyond the 1TB. The
>>>>>>> max GPA size depends on the host kernel configuration and physical CPUs.
>>>>>>>
>>>>>>> In this series we use this feature and allow the RAM to grow without
>>>>>>> any other limit than the one put by the host kernel.
>>>>>>>
>>>>>>> The RAM still starts at 1GB. First comes the initial ram (-m) of size
>>>>>>> ram_size and then comes the device memory (,maxmem) of size
>>>>>>> maxram_size - ram_size. The device memory is potentially hotpluggable
>>>>>>> depending on the instantiated memory objects.
>>>>>>>
>>>>>>> IO regions previously located between 256GB and 1TB are moved after
>>>>>>> the RAM. Their offset is dynamically computed, depends on ram_size
>>>>>>> and maxram_size. Size alignment is enforced.
>>>>>>>
>>>>>>> In case maxmem value is inferior to 255GB, the legacy memory map
>>>>>>> still is used. The change of memory map becomes effective from 4.0
>>>>>>> onwards.
>>>>>>>
>>>>>>> As we keep the initial RAM at 1GB base address, we do not need to do
>>>>>>> invasive changes in the EDK2 FW. It seems nobody is eager to do
>>>>>>> that job at the moment.
>>>>>>>
>>>>>>> Device memory being put just after the initial RAM, it is possible
>>>>>>> to get access to this feature while keeping a 1TB address map.
>>>>>>>
>>>>>>> This series reuses/rebases patches initially submitted by Shameer
>>>>>>> in [1] and Kwangwoo in [2] for the PC-DIMM and NV-DIMM parts.
>>>>>>>
>>>>>>> Functionally, the series is split into 3 parts:
>>>>>>> 1) bump of the initial RAM limit [1 - 9] and change in
>>>>>>>    the memory map
>>>>>>
>>>>>>> 2) Support of PC-DIMM [10 - 13]
>>>>>> Is this part complete ACPI wise (for coldplug)? I haven't noticed
> >>>>>> DSDT AML here nor E820 changes, so ACPI wise pc-dimm shouldn't be
>>>>>> visible to the guest. It might be that DT is masking problem
>>>>>> but well, that won't work on ACPI only guests.
>>>>>
>>>>> guest /proc/meminfo or "lshw -class memory" reflects the amount of mem
>>>>> added with the DIMM slots.
>>>> Question is how does it get there? Does it come from DT or from firmware
>>>> via UEFI interfaces?
>>>>
>>>>> So it looks fine to me. Isn't E820 a pure x86 matter?
>>>> sorry for misleading, I've meant is UEFI GetMemoryMap().
> >>>> On x86, I'm wary of adding PC-DIMMs to E820 which then gets exposed
>>>> via UEFI GetMemoryMap() as guest kernel might start using it as normal
>>>> memory early at boot and later put that memory into zone normal and hence
>>>> make it non-hot-un-pluggable. The same concerns apply to DT based means
>>>> of discovery.
>>>> (That's guest issue but it's easy to workaround it not putting hotpluggable
>>>> memory into UEFI GetMemoryMap() or DT and let DSDT describe it properly)
>>>> That way memory doesn't get (ab)used by firmware or early boot kernel stages
>>>> and doesn't get locked up.
>>>>
>>>>> What else would you expect in the dsdt?
>>>> Memory device descriptions, look for code that adds PNP0C80 with _CRS
>>>> describing memory ranges
>>>
>>> OK thank you for the explanations. I will work on PNP0C80 addition then.
>>> Does it mean that in ACPI mode we must not output DT hotplug memory
>>> nodes or assuming that PNP0C80 is properly described, it will "override"
>>> DT description?
>>
>> After further investigations, I think the pieces you pointed out are
>> added by Shameer's series, ie. through the build_memory_hotplug_aml()
>> call. So I suggest we separate the concerns: this series brings support
>> for DIMM coldplug. hotplug, including all the relevant ACPI structures
>> will be added later on by Shameer.
> 
> Maybe we should not put pc-dimms in DT for this series until it gets clear
> if it doesn't conflict with ACPI in some way.

I guess you mean removing the DT hotpluggable memory nodes only in ACPI
mode? Otherwise you simply remove the DIMM feature, right?

I double checked, and if you remove the hotpluggable memory DT nodes in
ACPI mode:
- you no longer see the PCDIMM slots in guest /proc/meminfo. So I guess
you're right: if the DT nodes are available, that memory is considered
not unpluggable by the guest.
- you can still see the NVDIMM slots using "ndctl list -u", and you can
mount a DAX file system.

Hotplug/unplug is clearly not supported by this series, and any attempt
results in "memory hotplug is not supported". Is it really an issue if
the guest does not treat DIMM slot memory as hot-unpluggable? I am not
even sure the guest kernel would support unplugging that memory.
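The /proc/meminfo check described above can be automated in a guest-side test. A minimal sketch (the sample text below is made up, not output from a real guest):

```python
def mem_total_kib(meminfo_text):
    """Return MemTotal in KiB from /proc/meminfo-style text, e.g. to
    verify that coldplugged DIMM memory shows up in the guest total."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # Format is "MemTotal:  <value> kB"; field 1 is the value.
            return int(line.split()[1])
    raise ValueError("MemTotal not found")
```

Comparing this value before and after adding a DIMM on the command line is the simplest way to confirm the coldplugged memory is visible.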

If we want all the ACPI tables to be ready so that this memory is seen
as hot-unpluggable, we need some of Shameer's patches on top of this
series.

Also, don't DIMM slots already make sense in DT mode? Usually we accept
adding a feature in DT first and then in ACPI. For instance, we can
benefit from NVDIMM in DT mode, right? So, considering an incremental
approach, I would be in favour of keeping the DT nodes.
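For reference, the hotpluggable-memory DT nodes under discussion boil down to plain memory nodes. The sketch below emits the device-tree source shape they flatten from, assuming the usual 2-cell address / 2-cell size layout on arm64; the addresses are illustrative, and QEMU itself builds the flattened tree directly via libfdt rather than from source text:

```python
def memory_dt_node(base, size):
    """Illustrative device-tree source for one DIMM-backed memory node:
    a memory@<base> node whose reg property carries the GPA range.
    Sketch only, not the exact nodes generated by the series."""
    hi = lambda v: (v >> 32) & 0xffffffff  # upper address/size cell
    lo = lambda v: v & 0xffffffff          # lower address/size cell
    return (f"memory@{base:x} {{\n"
            f"    device_type = \"memory\";\n"
            f"    reg = <{hi(base):#x} {lo(base):#x} {hi(size):#x} {lo(size):#x}>;\n"
            f"}};\n")
```

Because such a node is indistinguishable from boot RAM, a DT-parsing guest will treat the range as ordinary memory, which is exactly the conflict with the ACPI description being debated here.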

Thanks

Eric
> 
> 
> 
> 


* Re: [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion and PCDIMM/NVDIMM support
  2019-02-26 17:53             ` Auger Eric
@ 2019-02-27 10:10               ` Igor Mammedov
  2019-02-27 10:27                 ` Auger Eric
  0 siblings, 1 reply; 63+ messages in thread
From: Igor Mammedov @ 2019-02-27 10:10 UTC (permalink / raw)
  To: Auger Eric
  Cc: peter.maydell, drjones, david, dgilbert,
	shameerali.kolothum.thodi, qemu-devel, qemu-arm, eric.auger.pro,
	david

On Tue, 26 Feb 2019 18:53:24 +0100
Auger Eric <eric.auger@redhat.com> wrote:

> Hi Igor,
> 
> On 2/26/19 5:56 PM, Igor Mammedov wrote:
> > On Tue, 26 Feb 2019 14:11:58 +0100
> > Auger Eric <eric.auger@redhat.com> wrote:
> >   
> >> Hi Igor,
> >>
> >> On 2/26/19 9:40 AM, Auger Eric wrote:  
> >>> Hi Igor,
> >>>
> >>> On 2/25/19 10:42 AM, Igor Mammedov wrote:  
> >>>> On Fri, 22 Feb 2019 18:35:26 +0100
> >>>> Auger Eric <eric.auger@redhat.com> wrote:
> >>>>  
> >>>>> Hi Igor,
> >>>>>
> >>>>> On 2/22/19 5:27 PM, Igor Mammedov wrote:  
> >>>>>> On Wed, 20 Feb 2019 23:39:46 +0100
> >>>>>> Eric Auger <eric.auger@redhat.com> wrote:
> >>>>>>  
> >>>>>>> This series aims to bump the 255GB RAM limit in machvirt and to
> >>>>>>> support device memory in general, and especially PCDIMM/NVDIMM.
> >>>>>>>
> >>>>>>> In machvirt versions < 4.0, the initial RAM starts at 1GB and can
> >>>>>>> grow up to 255GB. From 256GB onwards we find IO regions such as the
> >>>>>>> additional GICv3 RDIST region, high PCIe ECAM region and high PCIe
> >>>>>>> MMIO region. The address map was 1TB large. This corresponded to
> >>>>>>> the max IPA capacity KVM was able to manage.
> >>>>>>>
> >>>>>>> Since 4.20, the host kernel is able to support a larger and dynamic
> >>>>>>> IPA range. So the guest physical address can go beyond the 1TB. The
> >>>>>>> max GPA size depends on the host kernel configuration and physical CPUs.
> >>>>>>>
> >>>>>>> In this series we use this feature and allow the RAM to grow without
> >>>>>>> any other limit than the one put by the host kernel.
> >>>>>>>
> >>>>>>> The RAM still starts at 1GB. First comes the initial ram (-m) of size
> >>>>>>> ram_size and then comes the device memory (,maxmem) of size
> >>>>>>> maxram_size - ram_size. The device memory is potentially hotpluggable
> >>>>>>> depending on the instantiated memory objects.
> >>>>>>>
> >>>>>>> IO regions previously located between 256GB and 1TB are moved after
> >>>>>>> the RAM. Their offset is dynamically computed, depends on ram_size
> >>>>>>> and maxram_size. Size alignment is enforced.
> >>>>>>>
> >>>>>>> In case maxmem value is inferior to 255GB, the legacy memory map
> >>>>>>> still is used. The change of memory map becomes effective from 4.0
> >>>>>>> onwards.
> >>>>>>>
> >>>>>>> As we keep the initial RAM at 1GB base address, we do not need to do
> >>>>>>> invasive changes in the EDK2 FW. It seems nobody is eager to do
> >>>>>>> that job at the moment.
> >>>>>>>
> >>>>>>> Device memory being put just after the initial RAM, it is possible
> >>>>>>> to get access to this feature while keeping a 1TB address map.
> >>>>>>>
> >>>>>>> This series reuses/rebases patches initially submitted by Shameer
> >>>>>>> in [1] and Kwangwoo in [2] for the PC-DIMM and NV-DIMM parts.
> >>>>>>>
> >>>>>>> Functionally, the series is split into 3 parts:
> >>>>>>> 1) bump of the initial RAM limit [1 - 9] and change in
> >>>>>>>    the memory map  
> >>>>>>  
> >>>>>>> 2) Support of PC-DIMM [10 - 13]  
> >>>>>> Is this part complete ACPI wise (for coldplug)? I haven't noticed
> >>>>>> DSDT AML here nor E820 changes, so ACPI wise pc-dimm shouldn't be
> >>>>>> visible to the guest. It might be that DT is masking problem
> >>>>>> but well, that won't work on ACPI only guests.  
> >>>>>
> >>>>> guest /proc/meminfo or "lshw -class memory" reflects the amount of mem
> >>>>> added with the DIMM slots.  
> >>>> Question is how does it get there? Does it come from DT or from firmware
> >>>> via UEFI interfaces?
> >>>>  
> >>>>> So it looks fine to me. Isn't E820 a pure x86 matter?  
> >>>> sorry for misleading, I've meant is UEFI GetMemoryMap().
> >>>> On x86, I'm wary of adding PC-DIMMs to E820 which then gets exposed
> >>>> via UEFI GetMemoryMap() as guest kernel might start using it as normal
> >>>> memory early at boot and later put that memory into zone normal and hence
> >>>> make it non-hot-un-pluggable. The same concerns apply to DT based means
> >>>> of discovery.
> >>>> (That's guest issue but it's easy to workaround it not putting hotpluggable
> >>>> memory into UEFI GetMemoryMap() or DT and let DSDT describe it properly)
> >>>> That way memory doesn't get (ab)used by firmware or early boot kernel stages
> >>>> and doesn't get locked up.
> >>>>  
> >>>>> What else would you expect in the dsdt?  
> >>>> Memory device descriptions, look for code that adds PNP0C80 with _CRS
> >>>> describing memory ranges  
> >>>
> >>> OK thank you for the explanations. I will work on PNP0C80 addition then.
> >>> Does it mean that in ACPI mode we must not output DT hotplug memory
> >>> nodes or assuming that PNP0C80 is properly described, it will "override"
> >>> DT description?  
> >>
> >> After further investigations, I think the pieces you pointed out are
> >> added by Shameer's series, ie. through the build_memory_hotplug_aml()
> >> call. So I suggest we separate the concerns: this series brings support
> >> for DIMM coldplug. hotplug, including all the relevant ACPI structures
> >> will be added later on by Shameer.  
> > 
> > Maybe we should not put pc-dimms in DT for this series until it gets clear
> > if it doesn't conflict with ACPI in some way.  
> 
> I guess you mean removing the DT hotpluggable memory nodes only in ACPI
> mode? Otherwise you simply remove the DIMM feature, right?
Something like this, so the DT won't conflict with ACPI. Only we don't
have a switch for it yet; something like -machine fdt=on (with default off).
 
> I double checked and if you remove the hotpluggable memory DT nodes in
> ACPI mode:
> - you do not see the PCDIMM slots in guest /proc/meminfo anymore. So I
> guess you're right, if the DT nodes are available, that memory is
> considered as not unpluggable by the guest.
> - You can see the NVDIMM slots using ndctl list -u. You can mount a DAX
> system.
> 
> Hotplug/unplug is clearly not supported by this series and any attempt
> results in "memory hotplug is not supported". Is it really an issue if
> the guest does not consider DIMM slots as not hot-unpluggable memory? I
> am not even sure the guest kernel would support to unplug that memory.
> 
> In case we want all ACPI tables to be ready for making this memory seen
> as hot-unpluggable we need some Shameer's patches on top of this series.
Maybe we should push for this route (into 4.0); it's just several
patches after all. We could even merge them into your series (I'd guess
they would need to be rebased on top of your latest work).
 
> Also don't DIMM slots already make sense in DT mode. Usually we accept
> to add one feature in DT and then in ACPI. For instance we can benefit
Usually the two don't conflict with each other (at least I'm not aware
of it), but I see a problem with it in this case.

> from nvdimm in dt mode right? So, considering an incremental approach I
> would be in favour of keeping the DT nodes.
I'd guess it is the same as for DIMMs; ACPI support for NVDIMMs is much
more versatile.

I consider the target application of arm/virt to be a board that, in
most use cases, runs generic ACPI-capable guests in production, with
various DT-only guests as secondary ones. It's hard to make both use
cases happy with defaults (that's probably one of the reasons why the
'sbsa' board is being added).

So I'd give priority to ACPI-based arm/virt over DT when defaults are
considered.

> Thanks
> 
> Eric
> > 
> > 
> > 
> >   


* Re: [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion and PCDIMM/NVDIMM support
  2019-02-27 10:10               ` Igor Mammedov
@ 2019-02-27 10:27                 ` Auger Eric
  2019-02-27 10:41                   ` Shameerali Kolothum Thodi
  0 siblings, 1 reply; 63+ messages in thread
From: Auger Eric @ 2019-02-27 10:27 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: peter.maydell, drjones, david, dgilbert,
	shameerali.kolothum.thodi, qemu-devel, qemu-arm, eric.auger.pro,
	david

Hi Igor, Shameer,

On 2/27/19 11:10 AM, Igor Mammedov wrote:
> On Tue, 26 Feb 2019 18:53:24 +0100
> Auger Eric <eric.auger@redhat.com> wrote:
> 
>> Hi Igor,
>>
>> On 2/26/19 5:56 PM, Igor Mammedov wrote:
>>> On Tue, 26 Feb 2019 14:11:58 +0100
>>> Auger Eric <eric.auger@redhat.com> wrote:
>>>   
>>>> Hi Igor,
>>>>
>>>> On 2/26/19 9:40 AM, Auger Eric wrote:  
>>>>> Hi Igor,
>>>>>
>>>>> On 2/25/19 10:42 AM, Igor Mammedov wrote:  
>>>>>> On Fri, 22 Feb 2019 18:35:26 +0100
>>>>>> Auger Eric <eric.auger@redhat.com> wrote:
>>>>>>  
>>>>>>> Hi Igor,
>>>>>>>
>>>>>>> On 2/22/19 5:27 PM, Igor Mammedov wrote:  
>>>>>>>> On Wed, 20 Feb 2019 23:39:46 +0100
>>>>>>>> Eric Auger <eric.auger@redhat.com> wrote:
>>>>>>>>  
>>>>>>>>> This series aims to bump the 255GB RAM limit in machvirt and to
>>>>>>>>> support device memory in general, and especially PCDIMM/NVDIMM.
>>>>>>>>>
>>>>>>>>> In machvirt versions < 4.0, the initial RAM starts at 1GB and can
>>>>>>>>> grow up to 255GB. From 256GB onwards we find IO regions such as the
>>>>>>>>> additional GICv3 RDIST region, high PCIe ECAM region and high PCIe
>>>>>>>>> MMIO region. The address map was 1TB large. This corresponded to
>>>>>>>>> the max IPA capacity KVM was able to manage.
>>>>>>>>>
>>>>>>>>> Since 4.20, the host kernel is able to support a larger and dynamic
>>>>>>>>> IPA range. So the guest physical address can go beyond the 1TB. The
>>>>>>>>> max GPA size depends on the host kernel configuration and physical CPUs.
>>>>>>>>>
>>>>>>>>> In this series we use this feature and allow the RAM to grow without
>>>>>>>>> any other limit than the one put by the host kernel.
>>>>>>>>>
>>>>>>>>> The RAM still starts at 1GB. First comes the initial ram (-m) of size
>>>>>>>>> ram_size and then comes the device memory (,maxmem) of size
>>>>>>>>> maxram_size - ram_size. The device memory is potentially hotpluggable
>>>>>>>>> depending on the instantiated memory objects.
>>>>>>>>>
>>>>>>>>> IO regions previously located between 256GB and 1TB are moved after
>>>>>>>>> the RAM. Their offset is dynamically computed, depends on ram_size
>>>>>>>>> and maxram_size. Size alignment is enforced.
>>>>>>>>>
>>>>>>>>> In case maxmem value is inferior to 255GB, the legacy memory map
>>>>>>>>> still is used. The change of memory map becomes effective from 4.0
>>>>>>>>> onwards.
>>>>>>>>>
>>>>>>>>> As we keep the initial RAM at 1GB base address, we do not need to do
>>>>>>>>> invasive changes in the EDK2 FW. It seems nobody is eager to do
>>>>>>>>> that job at the moment.
>>>>>>>>>
>>>>>>>>> Device memory being put just after the initial RAM, it is possible
>>>>>>>>> to get access to this feature while keeping a 1TB address map.
>>>>>>>>>
>>>>>>>>> This series reuses/rebases patches initially submitted by Shameer
>>>>>>>>> in [1] and Kwangwoo in [2] for the PC-DIMM and NV-DIMM parts.
>>>>>>>>>
>>>>>>>>> Functionally, the series is split into 3 parts:
>>>>>>>>> 1) bump of the initial RAM limit [1 - 9] and change in
>>>>>>>>>    the memory map  
>>>>>>>>  
>>>>>>>>> 2) Support of PC-DIMM [10 - 13]  
>>>>>>>> Is this part complete ACPI wise (for coldplug)? I haven't noticed
>>>>>>>> DSDT AML here no E820 changes, so ACPI wise pc-dimm shouldn't be
>>>>>>>> visible to the guest. It might be that DT is masking problem
>>>>>>>> but well, that won't work on ACPI only guests.  
>>>>>>>
>>>>>>> guest /proc/meminfo or "lshw -class memory" reflects the amount of mem
>>>>>>> added with the DIMM slots.  
>>>>>> Question is how does it get there? Does it come from DT or from firmware
>>>>>> via UEFI interfaces?
>>>>>>  
>>>>>>> So it looks fine to me. Isn't E820 a pure x86 matter?  
>>>>>> sorry for misleading, I've meant is UEFI GetMemoryMap().
>>>>>> On x86, I'm wary of adding PC-DIMMs to E802 which then gets exposed
>>>>>> via UEFI GetMemoryMap() as guest kernel might start using it as normal
>>>>>> memory early at boot and later put that memory into zone normal and hence
>>>>>> make it non-hot-un-pluggable. The same concerns apply to DT based means
>>>>>> of discovery.
>>>>>> (That's guest issue but it's easy to workaround it not putting hotpluggable
>>>>>> memory into UEFI GetMemoryMap() or DT and let DSDT describe it properly)
>>>>>> That way memory doesn't get (ab)used by firmware or early boot kernel stages
>>>>>> and doesn't get locked up.
>>>>>>  
>>>>>>> What else would you expect in the dsdt?  
>>>>>> Memory device descriptions, look for code that adds PNP0C80 with _CRS
>>>>>> describing memory ranges  
>>>>>
>>>>> OK thank you for the explanations. I will work on PNP0C80 addition then.
>>>>> Does it mean that in ACPI mode we must not output DT hotplug memory
>>>>> nodes or assuming that PNP0C80 is properly described, it will "override"
>>>>> DT description?  
>>>>
>>>> After further investigations, I think the pieces you pointed out are
>>>> added by Shameer's series, ie. through the build_memory_hotplug_aml()
>>>> call. So I suggest we separate the concerns: this series brings support
>>>> for DIMM coldplug. hotplug, including all the relevant ACPI structures
>>>> will be added later on by Shameer.  
>>>
>>> Maybe we should not put pc-dimms in DT for this series until it gets clear
>>> if it doesn't conflict with ACPI in some way.  
>>
>> I guess you mean removing the DT hotpluggable memory nodes only in ACPI
>> mode? Otherwise you simply remove the DIMM feature, right?
> Something like this so DT won't get in conflict with ACPI.
> Only we don't have a switch for it something like, -machine fdt=on (with default off)
>  
>> I double checked and if you remove the hotpluggable memory DT nodes in
>> ACPI mode:
>> - you do not see the PCDIMM slots in guest /proc/meminfo anymore. So I
>> guess you're right, if the DT nodes are available, that memory is
>> considered as not unpluggable by the guest.
>> - You can see the NVDIMM slots using ndctl list -u. You can mount a DAX
>> system.
>>
>> Hotplug/unplug is clearly not supported by this series and any attempt
>> results in "memory hotplug is not supported". Is it really an issue if
>> the guest does not consider DIMM slots as not hot-unpluggable memory? I
>> am not even sure the guest kernel would support to unplug that memory.
>>
>> In case we want all ACPI tables to be ready for making this memory seen
>> as hot-unpluggable we need some Shameer's patches on top of this series.
> May be we should push for this way (into 4.0), it's just a several patches
> after all or even merge them in your series (I'd guess it would need to be
> rebased on top of your latest work)

Shameer, would you agree to merging PATCH 1 of your RFC hotplug series
(without the hw_reduced_acpi flag) into this series, and to isolating in
a second patch the acpi_memory_hotplug_init() and build_memory_hotplug_aml()
calls in the virt code?

That would leave the actual GED/GPIO integration.
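
For reference, the DSDT objects Igor pointed at earlier in the thread
(PNP0C80 memory devices whose _CRS describes the range) look roughly like
the following ASL. This is an illustrative sketch only — the device name,
addresses, and layout are made up here, not what build_memory_hotplug_aml()
actually emits:

```
// Illustrative ASL, not build_memory_hotplug_aml() output.
Device (MP00)
{
    Name (_HID, EisaId ("PNP0C80"))     // ACPI memory device
    Name (_CRS, ResourceTemplate ()
    {
        QWordMemory (ResourceConsumer, PosDecode, MinFixed, MaxFixed,
                     Cacheable, ReadWrite,
                     0x0000000000000000,  // granularity
                     0x0000008000000000,  // range minimum (512GB, example)
                     0x000000803FFFFFFF,  // range maximum
                     0x0000000000000000,  // translation
                     0x0000000040000000)  // length (1GB)
    })
    Method (_STA) { Return (0x0F) }      // present and enabled
}
```

The point of the discussion is that memory described only this way is
treated by the guest as hot(un)pluggable, whereas memory advertised via
the DT or the UEFI memory map gets used as ordinary boot RAM.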

Thanks

Eric
>  
>> Also don't DIMM slots already make sense in DT mode. Usually we accept
>> to add one feature in DT and then in ACPI. For instance we can benefit
> usually it doesn't conflict with each other (at least I'm not aware of it)
> but I see a problem with in this case.
> 
>> from nvdimm in dt mode right? So, considering an incremental approach I
>> would be in favour of keeping the DT nodes.
> I'd guess it is the same as for DIMMs, ACPI support for NVDIMMs is much
> more versatile.
> 
> I consider target application of arm/virt as a board that's used to
> run in production generic ACPI capable guest in most use cases and
> various DT only guests as secondary ones. It's hard to make
> both usecases be happy with defaults (that's probably  one of the
> reasons why 'sbsa' board is being added).
> 
> So I'd give priority to ACPI based arm/virt versus DT when defaults are
> considered.
> 
>> Thanks
>>
>> Eric
>>>
>>>
>>>
>>>   
> 


* Re: [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion and PCDIMM/NVDIMM support
  2019-02-27 10:27                 ` Auger Eric
@ 2019-02-27 10:41                   ` Shameerali Kolothum Thodi
  2019-02-27 17:51                     ` Igor Mammedov
  0 siblings, 1 reply; 63+ messages in thread
From: Shameerali Kolothum Thodi @ 2019-02-27 10:41 UTC (permalink / raw)
  To: Auger Eric, Igor Mammedov
  Cc: peter.maydell, drjones, david, dgilbert, qemu-devel, qemu-arm,
	eric.auger.pro, david, Linuxarm

Hi Eric,

> -----Original Message-----
> From: Auger Eric [mailto:eric.auger@redhat.com]
> Sent: 27 February 2019 10:27
> To: Igor Mammedov <imammedo@redhat.com>
> Cc: peter.maydell@linaro.org; drjones@redhat.com; david@redhat.com;
> dgilbert@redhat.com; Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; qemu-devel@nongnu.org;
> qemu-arm@nongnu.org; eric.auger.pro@gmail.com;
> david@gibson.dropbear.id.au
> Subject: Re: [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion
> and PCDIMM/NVDIMM support
> 
> Hi Igor, Shameer,
> 
> On 2/27/19 11:10 AM, Igor Mammedov wrote:
> > On Tue, 26 Feb 2019 18:53:24 +0100
> > Auger Eric <eric.auger@redhat.com> wrote:
> >
> >> Hi Igor,
> >>
> >> On 2/26/19 5:56 PM, Igor Mammedov wrote:
> >>> On Tue, 26 Feb 2019 14:11:58 +0100
> >>> Auger Eric <eric.auger@redhat.com> wrote:
> >>>
> >>>> Hi Igor,
> >>>>
> >>>> On 2/26/19 9:40 AM, Auger Eric wrote:
> >>>>> Hi Igor,
> >>>>>
> >>>>> On 2/25/19 10:42 AM, Igor Mammedov wrote:
> >>>>>> On Fri, 22 Feb 2019 18:35:26 +0100
> >>>>>> Auger Eric <eric.auger@redhat.com> wrote:
> >>>>>>
> >>>>>>> Hi Igor,
> >>>>>>>
> >>>>>>> On 2/22/19 5:27 PM, Igor Mammedov wrote:
> >>>>>>>> On Wed, 20 Feb 2019 23:39:46 +0100
> >>>>>>>> Eric Auger <eric.auger@redhat.com> wrote:
> >>>>>>>>
> >>>>>>>>> This series aims to bump the 255GB RAM limit in machvirt and to
> >>>>>>>>> support device memory in general, and especially
> PCDIMM/NVDIMM.
> >>>>>>>>>
> >>>>>>>>> In machvirt versions < 4.0, the initial RAM starts at 1GB and can
> >>>>>>>>> grow up to 255GB. From 256GB onwards we find IO regions such as
> the
> >>>>>>>>> additional GICv3 RDIST region, high PCIe ECAM region and high
> PCIe
> >>>>>>>>> MMIO region. The address map was 1TB large. This corresponded
> to
> >>>>>>>>> the max IPA capacity KVM was able to manage.
> >>>>>>>>>
> >>>>>>>>> Since 4.20, the host kernel is able to support a larger and dynamic
> >>>>>>>>> IPA range. So the guest physical address can go beyond the 1TB.
> The
> >>>>>>>>> max GPA size depends on the host kernel configuration and physical
> CPUs.
> >>>>>>>>>
> >>>>>>>>> In this series we use this feature and allow the RAM to grow
> without
> >>>>>>>>> any other limit than the one put by the host kernel.
> >>>>>>>>>
> >>>>>>>>> The RAM still starts at 1GB. First comes the initial ram (-m) of size
> >>>>>>>>> ram_size and then comes the device memory (,maxmem) of size
> >>>>>>>>> maxram_size - ram_size. The device memory is potentially
> hotpluggable
> >>>>>>>>> depending on the instantiated memory objects.
> >>>>>>>>>
> >>>>>>>>> IO regions previously located between 256GB and 1TB are moved
> after
> >>>>>>>>> the RAM. Their offset is dynamically computed, depends on
> ram_size
> >>>>>>>>> and maxram_size. Size alignment is enforced.
> >>>>>>>>>
> >>>>>>>>> In case maxmem value is inferior to 255GB, the legacy memory
> map
> >>>>>>>>> still is used. The change of memory map becomes effective from 4.0
> >>>>>>>>> onwards.
> >>>>>>>>>
> >>>>>>>>> As we keep the initial RAM at 1GB base address, we do not need to
> do
> >>>>>>>>> invasive changes in the EDK2 FW. It seems nobody is eager to do
> >>>>>>>>> that job at the moment.
> >>>>>>>>>
> >>>>>>>>> Device memory being put just after the initial RAM, it is possible
> >>>>>>>>> to get access to this feature while keeping a 1TB address map.
> >>>>>>>>>
> >>>>>>>>> This series reuses/rebases patches initially submitted by Shameer
> >>>>>>>>> in [1] and Kwangwoo in [2] for the PC-DIMM and NV-DIMM parts.
> >>>>>>>>>
> >>>>>>>>> Functionally, the series is split into 3 parts:
> >>>>>>>>> 1) bump of the initial RAM limit [1 - 9] and change in
> >>>>>>>>>    the memory map
> >>>>>>>>
> >>>>>>>>> 2) Support of PC-DIMM [10 - 13]
> >>>>>>>> Is this part complete ACPI wise (for coldplug)? I haven't noticed
> >>>>>>>> DSDT AML here no E820 changes, so ACPI wise pc-dimm shouldn't be
> >>>>>>>> visible to the guest. It might be that DT is masking problem
> >>>>>>>> but well, that won't work on ACPI only guests.
> >>>>>>>
> >>>>>>> guest /proc/meminfo or "lshw -class memory" reflects the amount of
> mem
> >>>>>>> added with the DIMM slots.
> >>>>>> Question is how does it get there? Does it come from DT or from
> firmware
> >>>>>> via UEFI interfaces?
> >>>>>>
> >>>>>>> So it looks fine to me. Isn't E820 a pure x86 matter?
> >>>>>> sorry for misleading, I've meant is UEFI GetMemoryMap().
> >>>>>> On x86, I'm wary of adding PC-DIMMs to E802 which then gets exposed
> >>>>>> via UEFI GetMemoryMap() as guest kernel might start using it as
> normal
> >>>>>> memory early at boot and later put that memory into zone normal and
> hence
> >>>>>> make it non-hot-un-pluggable. The same concerns apply to DT based
> means
> >>>>>> of discovery.
> >>>>>> (That's guest issue but it's easy to workaround it not putting
> hotpluggable
> >>>>>> memory into UEFI GetMemoryMap() or DT and let DSDT describe it
> properly)
> >>>>>> That way memory doesn't get (ab)used by firmware or early boot
> kernel stages
> >>>>>> and doesn't get locked up.
> >>>>>>
> >>>>>>> What else would you expect in the dsdt?
> >>>>>> Memory device descriptions, look for code that adds PNP0C80 with
> _CRS
> >>>>>> describing memory ranges
> >>>>>
> >>>>> OK thank you for the explanations. I will work on PNP0C80 addition then.
> >>>>> Does it mean that in ACPI mode we must not output DT hotplug memory
> >>>>> nodes or assuming that PNP0C80 is properly described, it will "override"
> >>>>> DT description?
> >>>>
> >>>> After further investigations, I think the pieces you pointed out are
> >>>> added by Shameer's series, ie. through the build_memory_hotplug_aml()
> >>>> call. So I suggest we separate the concerns: this series brings support
> >>>> for DIMM coldplug. hotplug, including all the relevant ACPI structures
> >>>> will be added later on by Shameer.
> >>>
> >>> Maybe we should not put pc-dimms in DT for this series until it gets clear
> >>> if it doesn't conflict with ACPI in some way.
> >>
> >> I guess you mean removing the DT hotpluggable memory nodes only in ACPI
> >> mode? Otherwise you simply remove the DIMM feature, right?
> > Something like this so DT won't get in conflict with ACPI.
> > Only we don't have a switch for it something like, -machine fdt=on (with
> default off)
> >
> >> I double checked and if you remove the hotpluggable memory DT nodes in
> >> ACPI mode:
> >> - you do not see the PCDIMM slots in guest /proc/meminfo anymore. So I
> >> guess you're right, if the DT nodes are available, that memory is
> >> considered as not unpluggable by the guest.
> >> - You can see the NVDIMM slots using ndctl list -u. You can mount a DAX
> >> system.
> >>
> >> Hotplug/unplug is clearly not supported by this series and any attempt
> >> results in "memory hotplug is not supported". Is it really an issue if
> >> the guest does not consider DIMM slots as not hot-unpluggable memory? I
> >> am not even sure the guest kernel would support to unplug that memory.
> >>
> >> In case we want all ACPI tables to be ready for making this memory seen
> >> as hot-unpluggable we need some Shameer's patches on top of this series.
> > May be we should push for this way (into 4.0), it's just a several patches
> > after all or even merge them in your series (I'd guess it would need to be
> > rebased on top of your latest work)
> 
> Shameer, would you agree if we merge PATCH 1 of your RFC hotplug series
> (without the reduced hw_reduced_acpi flag) in this series and isolate in
> a second PATCH the acpi_memory_hotplug_init() + build_memory_hotplug_aml
> called in virt code?

Sure, that’s fine with me. So what would you use for the event_handler_method in
build_memory_hotplug_aml()? GPO0 device?

Thanks,
Shameer

> Then would remain the GED/GPIO actual integration.
> 
> Thanks
> 
> Eric
> >
> >> Also don't DIMM slots already make sense in DT mode. Usually we accept
> >> to add one feature in DT and then in ACPI. For instance we can benefit
> > usually it doesn't conflict with each other (at least I'm not aware of it)
> > but I see a problem with in this case.
> >
> >> from nvdimm in dt mode right? So, considering an incremental approach I
> >> would be in favour of keeping the DT nodes.
> > I'd guess it is the same as for DIMMs, ACPI support for NVDIMMs is much
> > more versatile.
> >
> > I consider target application of arm/virt as a board that's used to
> > run in production generic ACPI capable guest in most use cases and
> > various DT only guests as secondary ones. It's hard to make
> > both usecases be happy with defaults (that's probably  one of the
> > reasons why 'sbsa' board is being added).
> >
> > So I'd give priority to ACPI based arm/virt versus DT when defaults are
> > considered.
> >
> >> Thanks
> >>
> >> Eric
> >>>
> >>>
> >>>
> >>>
> >


* Re: [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion and PCDIMM/NVDIMM support
  2019-02-27 10:41                   ` Shameerali Kolothum Thodi
@ 2019-02-27 17:51                     ` Igor Mammedov
  2019-02-28  7:48                       ` Auger Eric
  0 siblings, 1 reply; 63+ messages in thread
From: Igor Mammedov @ 2019-02-27 17:51 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi
  Cc: Auger Eric, peter.maydell, drjones, david, dgilbert, qemu-devel,
	qemu-arm, eric.auger.pro, david, Linuxarm

On Wed, 27 Feb 2019 10:41:45 +0000
Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> wrote:

> Hi Eric,
> 
> > -----Original Message-----
> > From: Auger Eric [mailto:eric.auger@redhat.com]
> > Sent: 27 February 2019 10:27
> > To: Igor Mammedov <imammedo@redhat.com>
> > Cc: peter.maydell@linaro.org; drjones@redhat.com; david@redhat.com;
> > dgilbert@redhat.com; Shameerali Kolothum Thodi
> > <shameerali.kolothum.thodi@huawei.com>; qemu-devel@nongnu.org;
> > qemu-arm@nongnu.org; eric.auger.pro@gmail.com;
> > david@gibson.dropbear.id.au
> > Subject: Re: [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion
> > and PCDIMM/NVDIMM support
> > 
> > Hi Igor, Shameer,
> > 
> > On 2/27/19 11:10 AM, Igor Mammedov wrote:  
> > > On Tue, 26 Feb 2019 18:53:24 +0100
> > > Auger Eric <eric.auger@redhat.com> wrote:
> > >  
> > >> Hi Igor,
> > >>
> > >> On 2/26/19 5:56 PM, Igor Mammedov wrote:  
> > >>> On Tue, 26 Feb 2019 14:11:58 +0100
> > >>> Auger Eric <eric.auger@redhat.com> wrote:
> > >>>  
> > >>>> Hi Igor,
> > >>>>
> > >>>> On 2/26/19 9:40 AM, Auger Eric wrote:  
> > >>>>> Hi Igor,
> > >>>>>
> > >>>>> On 2/25/19 10:42 AM, Igor Mammedov wrote:  
> > >>>>>> On Fri, 22 Feb 2019 18:35:26 +0100
> > >>>>>> Auger Eric <eric.auger@redhat.com> wrote:
> > >>>>>>  
> > >>>>>>> Hi Igor,
> > >>>>>>>
> > >>>>>>> On 2/22/19 5:27 PM, Igor Mammedov wrote:  
> > >>>>>>>> On Wed, 20 Feb 2019 23:39:46 +0100
> > >>>>>>>> Eric Auger <eric.auger@redhat.com> wrote:
> > >>>>>>>>  
> > >>>>>>>>> This series aims to bump the 255GB RAM limit in machvirt and to
> > >>>>>>>>> support device memory in general, and especially  
> > PCDIMM/NVDIMM.  
> > >>>>>>>>>
> > >>>>>>>>> In machvirt versions < 4.0, the initial RAM starts at 1GB and can
> > >>>>>>>>> grow up to 255GB. From 256GB onwards we find IO regions such as  
> > the  
> > >>>>>>>>> additional GICv3 RDIST region, high PCIe ECAM region and high  
> > PCIe  
> > >>>>>>>>> MMIO region. The address map was 1TB large. This corresponded  
> > to  
> > >>>>>>>>> the max IPA capacity KVM was able to manage.
> > >>>>>>>>>
> > >>>>>>>>> Since 4.20, the host kernel is able to support a larger and dynamic
> > >>>>>>>>> IPA range. So the guest physical address can go beyond the 1TB.  
> > The  
> > >>>>>>>>> max GPA size depends on the host kernel configuration and physical  
> > CPUs.  
> > >>>>>>>>>
> > >>>>>>>>> In this series we use this feature and allow the RAM to grow  
> > without  
> > >>>>>>>>> any other limit than the one put by the host kernel.
> > >>>>>>>>>
> > >>>>>>>>> The RAM still starts at 1GB. First comes the initial ram (-m) of size
> > >>>>>>>>> ram_size and then comes the device memory (,maxmem) of size
> > >>>>>>>>> maxram_size - ram_size. The device memory is potentially  
> > hotpluggable  
> > >>>>>>>>> depending on the instantiated memory objects.
> > >>>>>>>>>
> > >>>>>>>>> IO regions previously located between 256GB and 1TB are moved  
> > after  
> > >>>>>>>>> the RAM. Their offset is dynamically computed, depends on  
> > ram_size  
> > >>>>>>>>> and maxram_size. Size alignment is enforced.
> > >>>>>>>>>
> > >>>>>>>>> In case maxmem value is inferior to 255GB, the legacy memory  
> > map  
> > >>>>>>>>> still is used. The change of memory map becomes effective from 4.0
> > >>>>>>>>> onwards.
> > >>>>>>>>>
> > >>>>>>>>> As we keep the initial RAM at 1GB base address, we do not need to  
> > do  
> > >>>>>>>>> invasive changes in the EDK2 FW. It seems nobody is eager to do
> > >>>>>>>>> that job at the moment.
> > >>>>>>>>>
> > >>>>>>>>> Device memory being put just after the initial RAM, it is possible
> > >>>>>>>>> to get access to this feature while keeping a 1TB address map.
> > >>>>>>>>>
> > >>>>>>>>> This series reuses/rebases patches initially submitted by Shameer
> > >>>>>>>>> in [1] and Kwangwoo in [2] for the PC-DIMM and NV-DIMM parts.
> > >>>>>>>>>
> > >>>>>>>>> Functionally, the series is split into 3 parts:
> > >>>>>>>>> 1) bump of the initial RAM limit [1 - 9] and change in
> > >>>>>>>>>    the memory map  
> > >>>>>>>>  
> > >>>>>>>>> 2) Support of PC-DIMM [10 - 13]  
> > >>>>>>>> Is this part complete ACPI wise (for coldplug)? I haven't noticed
> > >>>>>>>> DSDT AML here no E820 changes, so ACPI wise pc-dimm shouldn't be
> > >>>>>>>> visible to the guest. It might be that DT is masking problem
> > >>>>>>>> but well, that won't work on ACPI only guests.  
> > >>>>>>>
> > >>>>>>> guest /proc/meminfo or "lshw -class memory" reflects the amount of  
> > mem  
> > >>>>>>> added with the DIMM slots.  
> > >>>>>> Question is how does it get there? Does it come from DT or from  
> > firmware  
> > >>>>>> via UEFI interfaces?
> > >>>>>>  
> > >>>>>>> So it looks fine to me. Isn't E820 a pure x86 matter?  
> > >>>>>> sorry for misleading, I've meant is UEFI GetMemoryMap().
> > >>>>>> On x86, I'm wary of adding PC-DIMMs to E802 which then gets exposed
> > >>>>>> via UEFI GetMemoryMap() as guest kernel might start using it as  
> > normal  
> > >>>>>> memory early at boot and later put that memory into zone normal and  
> > hence  
> > >>>>>> make it non-hot-un-pluggable. The same concerns apply to DT based  
> > means  
> > >>>>>> of discovery.
> > >>>>>> (That's guest issue but it's easy to workaround it not putting  
> > hotpluggable  
> > >>>>>> memory into UEFI GetMemoryMap() or DT and let DSDT describe it  
> > properly)  
> > >>>>>> That way memory doesn't get (ab)used by firmware or early boot  
> > kernel stages  
> > >>>>>> and doesn't get locked up.
> > >>>>>>  
> > >>>>>>> What else would you expect in the dsdt?  
> > >>>>>> Memory device descriptions, look for code that adds PNP0C80 with  
> > _CRS  
> > >>>>>> describing memory ranges  
> > >>>>>
> > >>>>> OK thank you for the explanations. I will work on PNP0C80 addition then.
> > >>>>> Does it mean that in ACPI mode we must not output DT hotplug memory
> > >>>>> nodes or assuming that PNP0C80 is properly described, it will "override"
> > >>>>> DT description?  
> > >>>>
> > >>>> After further investigations, I think the pieces you pointed out are
> > >>>> added by Shameer's series, ie. through the build_memory_hotplug_aml()
> > >>>> call. So I suggest we separate the concerns: this series brings support
> > >>>> for DIMM coldplug. hotplug, including all the relevant ACPI structures
> > >>>> will be added later on by Shameer.  
> > >>>
> > >>> Maybe we should not put pc-dimms in DT for this series until it gets clear
> > >>> if it doesn't conflict with ACPI in some way.  
> > >>
> > >> I guess you mean removing the DT hotpluggable memory nodes only in ACPI
> > >> mode? Otherwise you simply remove the DIMM feature, right?  
> > > Something like this so DT won't get in conflict with ACPI.
> > > Only we don't have a switch for it something like, -machine fdt=on (with  
> > default off)  
> > >  
> > >> I double checked and if you remove the hotpluggable memory DT nodes in
> > >> ACPI mode:
> > >> - you do not see the PCDIMM slots in guest /proc/meminfo anymore. So I
> > >> guess you're right, if the DT nodes are available, that memory is
> > >> considered as not unpluggable by the guest.
> > >> - You can see the NVDIMM slots using ndctl list -u. You can mount a DAX
> > >> system.
> > >>
> > >> Hotplug/unplug is clearly not supported by this series and any attempt
> > >> results in "memory hotplug is not supported". Is it really an issue if
> > >> the guest does not consider DIMM slots as not hot-unpluggable memory? I
> > >> am not even sure the guest kernel would support to unplug that memory.
> > >>
> > >> In case we want all ACPI tables to be ready for making this memory seen
> > >> as hot-unpluggable we need some Shameer's patches on top of this series.  
> > > May be we should push for this way (into 4.0), it's just a several patches
> > > after all or even merge them in your series (I'd guess it would need to be
> > > rebased on top of your latest work)  
> > 
> > Shameer, would you agree if we merge PATCH 1 of your RFC hotplug series
> > (without the reduced hw_reduced_acpi flag) in this series and isolate in
> > a second PATCH the acpi_memory_hotplug_init() + build_memory_hotplug_aml
> > called in virt code?  
Probably we can do that as a transitional step, since we need a working
MMIO interface in place for build_memory_hotplug_aml() to work, provided
it won't create migration issues (do we need VMSTATE_MEMORY_HOTPLUG for
the cold-plug case?).

What about a dummy initial GED (an empty device) that manages the MMIO
region only, to be filled in later with the remaining logic and the IRQ?
In that case the MMIO region and vmstate won't change (maybe), so it
won't cause ABI or migration issues.


> Sure, that’s fine with me. So what would you use for the event_handler_method in
> build_memory_hotplug_aml()? GPO0 device?

A method name not defined in the spec, so that it is never called, might do.


> 
> Thanks,
> Shameer
> 
> > Then would remain the GED/GPIO actual integration.
> > 
> > Thanks
> > 
> > Eric  
> > >  
> > >> Also don't DIMM slots already make sense in DT mode. Usually we accept
> > >> to add one feature in DT and then in ACPI. For instance we can benefit  
> > > usually it doesn't conflict with each other (at least I'm not aware of it)
> > > but I see a problem with in this case.
> > >  
> > >> from nvdimm in dt mode right? So, considering an incremental approach I
> > >> would be in favour of keeping the DT nodes.  
> > > I'd guess it is the same as for DIMMs, ACPI support for NVDIMMs is much
> > > more versatile.
> > >
> > > I consider target application of arm/virt as a board that's used to
> > > run in production generic ACPI capable guest in most use cases and
> > > various DT only guests as secondary ones. It's hard to make
> > > both usecases be happy with defaults (that's probably  one of the
> > > reasons why 'sbsa' board is being added).
> > >
> > > So I'd give priority to ACPI based arm/virt versus DT when defaults are
> > > considered.
> > >  
> > >> Thanks
> > >>
> > >> Eric  
> > >>>
> > >>>
> > >>>
> > >>>  
> > >  


* Re: [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion and PCDIMM/NVDIMM support
  2019-02-27 17:51                     ` Igor Mammedov
@ 2019-02-28  7:48                       ` Auger Eric
  2019-02-28 14:05                         ` Igor Mammedov
  0 siblings, 1 reply; 63+ messages in thread
From: Auger Eric @ 2019-02-28  7:48 UTC (permalink / raw)
  To: Igor Mammedov, Shameerali Kolothum Thodi
  Cc: peter.maydell, drjones, david, Linuxarm, qemu-devel, dgilbert,
	qemu-arm, david, eric.auger.pro

Hi Igor, Shameer,

On 2/27/19 6:51 PM, Igor Mammedov wrote:
> On Wed, 27 Feb 2019 10:41:45 +0000
> Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> wrote:
> 
>> Hi Eric,
>>
>>> -----Original Message-----
>>> From: Auger Eric [mailto:eric.auger@redhat.com]
>>> Sent: 27 February 2019 10:27
>>> To: Igor Mammedov <imammedo@redhat.com>
>>> Cc: peter.maydell@linaro.org; drjones@redhat.com; david@redhat.com;
>>> dgilbert@redhat.com; Shameerali Kolothum Thodi
>>> <shameerali.kolothum.thodi@huawei.com>; qemu-devel@nongnu.org;
>>> qemu-arm@nongnu.org; eric.auger.pro@gmail.com;
>>> david@gibson.dropbear.id.au
>>> Subject: Re: [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion
>>> and PCDIMM/NVDIMM support
>>>
>>> Hi Igor, Shameer,
>>>
>>> On 2/27/19 11:10 AM, Igor Mammedov wrote:  
>>>> On Tue, 26 Feb 2019 18:53:24 +0100
>>>> Auger Eric <eric.auger@redhat.com> wrote:
>>>>  
>>>>> Hi Igor,
>>>>>
>>>>> On 2/26/19 5:56 PM, Igor Mammedov wrote:  
>>>>>> On Tue, 26 Feb 2019 14:11:58 +0100
>>>>>> Auger Eric <eric.auger@redhat.com> wrote:
>>>>>>  
>>>>>>> Hi Igor,
>>>>>>>
>>>>>>> On 2/26/19 9:40 AM, Auger Eric wrote:  
>>>>>>>> Hi Igor,
>>>>>>>>
>>>>>>>> On 2/25/19 10:42 AM, Igor Mammedov wrote:  
>>>>>>>>> On Fri, 22 Feb 2019 18:35:26 +0100
>>>>>>>>> Auger Eric <eric.auger@redhat.com> wrote:
>>>>>>>>>  
>>>>>>>>>> Hi Igor,
>>>>>>>>>>
>>>>>>>>>> On 2/22/19 5:27 PM, Igor Mammedov wrote:  
>>>>>>>>>>> On Wed, 20 Feb 2019 23:39:46 +0100
>>>>>>>>>>> Eric Auger <eric.auger@redhat.com> wrote:
>>>>>>>>>>>  
>>>>>>>>>>>> This series aims to bump the 255GB RAM limit in machvirt and to
>>>>>>>>>>>> support device memory in general, and especially  
>>> PCDIMM/NVDIMM.  
>>>>>>>>>>>>
>>>>>>>>>>>> In machvirt versions < 4.0, the initial RAM starts at 1GB and can
>>>>>>>>>>>> grow up to 255GB. From 256GB onwards we find IO regions such as  
>>> the  
>>>>>>>>>>>> additional GICv3 RDIST region, high PCIe ECAM region and high  
>>> PCIe  
>>>>>>>>>>>> MMIO region. The address map was 1TB large. This corresponded  
>>> to  
>>>>>>>>>>>> the max IPA capacity KVM was able to manage.
>>>>>>>>>>>>
>>>>>>>>>>>> Since 4.20, the host kernel is able to support a larger and dynamic
>>>>>>>>>>>> IPA range. So the guest physical address can go beyond the 1TB.  
>>> The  
>>>>>>>>>>>> max GPA size depends on the host kernel configuration and physical  
>>> CPUs.  
>>>>>>>>>>>>
>>>>>>>>>>>> In this series we use this feature and allow the RAM to grow  
>>> without  
>>>>>>>>>>>> any other limit than the one put by the host kernel.
>>>>>>>>>>>>
>>>>>>>>>>>> The RAM still starts at 1GB. First comes the initial ram (-m) of size
>>>>>>>>>>>> ram_size and then comes the device memory (,maxmem) of size
>>>>>>>>>>>> maxram_size - ram_size. The device memory is potentially  
>>> hotpluggable  
>>>>>>>>>>>> depending on the instantiated memory objects.
>>>>>>>>>>>>
>>>>>>>>>>>> IO regions previously located between 256GB and 1TB are moved  
>>> after  
>>>>>>>>>>>> the RAM. Their offset is dynamically computed, depends on  
>>> ram_size  
>>>>>>>>>>>> and maxram_size. Size alignment is enforced.
>>>>>>>>>>>>
>>>>>>>>>>>> In case maxmem value is inferior to 255GB, the legacy memory  
>>> map  
>>>>>>>>>>>> still is used. The change of memory map becomes effective from 4.0
>>>>>>>>>>>> onwards.
>>>>>>>>>>>>
>>>>>>>>>>>> As we keep the initial RAM at 1GB base address, we do not need to  
>>> do  
>>>>>>>>>>>> invasive changes in the EDK2 FW. It seems nobody is eager to do
>>>>>>>>>>>> that job at the moment.
>>>>>>>>>>>>
>>>>>>>>>>>> Device memory being put just after the initial RAM, it is possible
>>>>>>>>>>>> to get access to this feature while keeping a 1TB address map.
>>>>>>>>>>>>
>>>>>>>>>>>> This series reuses/rebases patches initially submitted by Shameer
>>>>>>>>>>>> in [1] and Kwangwoo in [2] for the PC-DIMM and NV-DIMM parts.
>>>>>>>>>>>>
>>>>>>>>>>>> Functionally, the series is split into 3 parts:
>>>>>>>>>>>> 1) bump of the initial RAM limit [1 - 9] and change in
>>>>>>>>>>>>    the memory map  
>>>>>>>>>>>  
>>>>>>>>>>>> 2) Support of PC-DIMM [10 - 13]  
>>>>>>>>>>> Is this part complete ACPI wise (for coldplug)? I haven't noticed
>>>>>>>>>>> DSDT AML here no E820 changes, so ACPI wise pc-dimm shouldn't be
>>>>>>>>>>> visible to the guest. It might be that DT is masking problem
>>>>>>>>>>> but well, that won't work on ACPI only guests.  
>>>>>>>>>>
>>>>>>>>>> guest /proc/meminfo or "lshw -class memory" reflects the amount of mem
>>>>>>>>>> added with the DIMM slots.
>>>>>>>>> Question is how does it get there? Does it come from DT or from firmware
>>>>>>>>> via UEFI interfaces?
>>>>>>>>>  
>>>>>>>>>> So it looks fine to me. Isn't E820 a pure x86 matter?  
>>>>>>>>> sorry for misleading, I've meant is UEFI GetMemoryMap().
>>>>>>>>> On x86, I'm wary of adding PC-DIMMs to E820 which then gets exposed
>>>>>>>>> via UEFI GetMemoryMap() as guest kernel might start using it as normal
>>>>>>>>> memory early at boot and later put that memory into zone normal and hence
>>>>>>>>> make it non-hot-un-pluggable. The same concerns apply to DT based means
>>>>>>>>> of discovery.
>>>>>>>>> (That's guest issue but it's easy to workaround it not putting hotpluggable
>>>>>>>>> memory into UEFI GetMemoryMap() or DT and let DSDT describe it properly)
>>>>>>>>> That way memory doesn't get (ab)used by firmware or early boot kernel stages
>>>>>>>>> and doesn't get locked up.
>>>>>>>>>  
>>>>>>>>>> What else would you expect in the dsdt?  
>>>>>>>>> Memory device descriptions, look for code that adds PNP0C80 with _CRS
>>>>>>>>> describing memory ranges
>>>>>>>>
>>>>>>>> OK thank you for the explanations. I will work on PNP0C80 addition then.
>>>>>>>> Does it mean that in ACPI mode we must not output DT hotplug memory
>>>>>>>> nodes, or can we assume that, as long as PNP0C80 is properly described,
>>>>>>>> it will "override" the DT description?
>>>>>>>
>>>>>>> After further investigations, I think the pieces you pointed out are
>>>>>>> added by Shameer's series, ie. through the build_memory_hotplug_aml()
>>>>>>> call. So I suggest we separate the concerns: this series brings support
>>>>>>> for DIMM coldplug. Hotplug, including all the relevant ACPI structures,
>>>>>>> will be added later on by Shameer.
>>>>>>
>>>>>> Maybe we should not put pc-dimms in DT for this series until it gets clear
>>>>>> if it doesn't conflict with ACPI in some way.  
>>>>>
>>>>> I guess you mean removing the DT hotpluggable memory nodes only in ACPI
>>>>> mode? Otherwise you simply remove the DIMM feature, right?  
>>>> Something like this so DT won't get in conflict with ACPI.
>>>> Only we don't have a switch for it, something like -machine fdt=on (with default off)
>>>>  
>>>>> I double checked and if you remove the hotpluggable memory DT nodes in
>>>>> ACPI mode:
>>>>> - you do not see the PCDIMM slots in guest /proc/meminfo anymore. So I
>>>>> guess you're right, if the DT nodes are available, that memory is
>>>>> considered as not unpluggable by the guest.
>>>>> - You can see the NVDIMM slots using ndctl list -u. You can mount a DAX
>>>>> system.
>>>>>
>>>>> Hotplug/unplug is clearly not supported by this series and any attempt
>>>>> results in "memory hotplug is not supported". Is it really an issue if
>>>>> the guest does not consider DIMM slots as not hot-unpluggable memory? I
>>>>> am not even sure the guest kernel would support to unplug that memory.
>>>>>
>>>>> In case we want all ACPI tables to be ready for making this memory seen
>>>>> as hot-unpluggable we need some Shameer's patches on top of this series.  
>>>> Maybe we should push for this way (into 4.0), it's just several patches
>>>> after all, or even merge them in your series (I'd guess they would need to be
>>>> rebased on top of your latest work)
>>>
>>> Shameer, would you agree if we merge PATCH 1 of your RFC hotplug series
>>> (without the reduced hw_reduced_acpi flag) in this series and isolate in
>>> a second PATCH the acpi_memory_hotplug_init() + build_memory_hotplug_aml
>>> called in virt code?  
> probably we can do it as transitional step as we need working mmio interface
> in place for build_memory_hotplug_aml() to work, provided it won't create
> migration issues (do we need VMSTATE_MEMORY_HOTPLUG for cold-plug case?).
> 
> What about a dummy initial GED (empty device), that manages the mmio region only
> and then later it will be filled with the remaining logic and IRQ. In this case the mmio region
> and vmstate won't change (maybe) so it won't cause ABI or migration issues.


> 
> 
>> Sure, that’s fine with me. So what would you use for the event_handler_method in
>> build_memory_hotplug_aml()? GPO0 device?
> 
> a method name not defined in the spec, so that it won't be called, might do.

At this point the event_handler_method, ie. \_SB.GPO0._E02, is not
supposed to be called, right? So effectively we should be able to use
any other method name (unlinked to any GPIO/GED). I guess at this stage
only the PNP0C80 definition blocks + methods are used.

What still remains fuzzy for me is that, in the cold plug case, only the
mmio hotplug control region is read (apart from the slot selection of
course), and it returns 0 for addr/size and also for the flags, meaning
the slot is not enabled. So although the slots are advertised as
hotpluggable/enabled in the SRAT, I am not sure it actually makes any
difference to the OS whether the DSDT definition blocks are described or not.
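The cold-boot read pattern described above can be sketched as a toy model. This is loosely inspired by QEMU's x86 memory-hotplug IO interface; the register names and the flags bit used here are my assumptions for illustration, not the actual layout:

```python
# Toy model of the guest-side probe during PNP0C80 enumeration.
# Register names and the flags bit are illustrative assumptions.

class MemHotplugRegion:
    def __init__(self, slots):
        # Each slot: addr/size/enabled; empty slots stay zeroed.
        self.slots = [{"addr": 0, "size": 0, "enabled": False}
                      for _ in range(slots)]
        self.selected = 0

    def write_slot(self, n):
        # Slot selector register: subsequent reads refer to slot n.
        self.selected = n

    def read_addr(self):
        return self.slots[self.selected]["addr"]

    def read_size(self):
        return self.slots[self.selected]["size"]

    def read_flags(self):
        # bit 0: slot enabled (a pc-dimm occupies the slot)
        return 1 if self.slots[self.selected]["enabled"] else 0

def probe(region, nr_slots):
    """What OSPM does for each PNP0C80: select the slot, read its state."""
    state = []
    for n in range(nr_slots):
        region.write_slot(n)
        state.append((region.read_addr(), region.read_size(),
                      region.read_flags()))
    return state

# With no pc-dimm occupying any slot, every read returns 0,
# which is exactly the all-zero pattern discussed here.
region = MemHotplugRegion(slots=4)
print(probe(region, 4))
```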

To be honest I am afraid this is too late to add those additional
features for 4.0 now. This is going to jeopardize the first preliminary
part which is the introduction of the new memory map, allowing the
expansion of the initial RAM and paving the way for device memory
introduction. So I think I am going to resend the first 10 patches in a
standalone series. And we can iterate on the PCDIMM/NVDIMM parts
independently.
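As an aside, the "floating" map described in the cover letter boils down to simple arithmetic: initial RAM at 1GB, device memory right after it, and the previously fixed IO regions pushed past the end of device memory. A sketch, where the alignment value is an illustrative assumption rather than QEMU's actual constant:

```python
GIB = 1 << 30

def align_up(x, a):
    # Round x up to the next multiple of a (a must be a power of two).
    return (x + a - 1) & ~(a - 1)

def layout(ram_size, maxram_size, align=GIB):
    """Illustrative floating memory map: initial RAM at 1GiB, device
    memory after it, IO regions after device memory. The alignment
    granularity is an assumption, not QEMU's actual value."""
    ram_base = 1 * GIB
    device_mem_base = align_up(ram_base + ram_size, align)
    device_mem_size = maxram_size - ram_size
    io_base = align_up(device_mem_base + device_mem_size, align)
    return ram_base, device_mem_base, io_base

# e.g. -m 16G,maxmem=32G
print(layout(16 * GIB, 32 * GIB))
```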

Thanks

Eric
> 
> 
>>
>> Thanks,
>> Shameer
>>
>>> Then would remain the GED/GPIO actual integration.
>>>
>>> Thanks
>>>
>>> Eric  
>>>>  
>>>>> Also don't DIMM slots already make sense in DT mode? Usually we accept
>>>>> to add one feature in DT and then in ACPI. For instance we can benefit  
>>>> usually it doesn't conflict with each other (at least I'm not aware of it)
>>>> but I see a problem with it in this case.
>>>>  
>>>>> from nvdimm in dt mode right? So, considering an incremental approach I
>>>>> would be in favour of keeping the DT nodes.  
>>>> I'd guess it is the same as for DIMMs, ACPI support for NVDIMMs is much
>>>> more versatile.
>>>>
>>>> I consider the target application of arm/virt to be a board that's used
>>>> to run a generic ACPI capable guest in production in most use cases, and
>>>> various DT only guests as secondary ones. It's hard to make
>>>> both usecases happy with defaults (that's probably one of the
>>>> reasons why the 'sbsa' board is being added).
>>>>
>>>> So I'd give priority to ACPI based arm/virt versus DT when defaults are
>>>> considered.
>>>>  
>>>>> Thanks
>>>>>
>>>>> Eric  
>>>>>>
>>>>>>
>>>>>>
>>>>>>  
>>>>  
> 
> 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion and PCDIMM/NVDIMM support
  2019-02-28  7:48                       ` Auger Eric
@ 2019-02-28 14:05                         ` Igor Mammedov
  2019-03-01 14:18                           ` Auger Eric
  0 siblings, 1 reply; 63+ messages in thread
From: Igor Mammedov @ 2019-02-28 14:05 UTC (permalink / raw)
  To: Auger Eric
  Cc: Shameerali Kolothum Thodi, peter.maydell, drjones, david,
	Linuxarm, qemu-devel, dgilbert, qemu-arm, david, eric.auger.pro

On Thu, 28 Feb 2019 08:48:18 +0100
Auger Eric <eric.auger@redhat.com> wrote:

> Hi Igor, Shameer,
> 
> On 2/27/19 6:51 PM, Igor Mammedov wrote:
> > On Wed, 27 Feb 2019 10:41:45 +0000
> > Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> wrote:
> >   
> >> [..]
> > probably we can do it as transitional step as we need working mmio interface
> > in place for build_memory_hotplug_aml() to work, provided it won't create
> > migration issues (do we need VMSTATE_MEMORY_HOTPLUG for cold-plug case?).
> > 
> > What about dummy initial GED (empty device), that manages mmio region only
> > and then later it will be filled with remaining logic IRQ. In this case mmio region
> > and vmstate won't change (maybe) so it won't cause ABI or migration issues.  
> 
> 
> > 
> >   
> >> Sure, that’s fine with me. So what would you use for the event_handler_method in
> >> build_memory_hotplug_aml()? GPO0 device?  
> > 
> > a method name not defined in spec, so it won't be called might do.  
> 
> At this point the event_handler_method, ie. \_SB.GPO0._E02, is not
> supposed to be called, right? So effectivily we should be able to use
> any other method name (unlinked to any GPIO/GED). I guess at this stage
> only the PNP0C80 definition blocks + methods are used.
pretty much yes.
 
> What still remains fuzzy for me is in case of cold plug the mmio hotplug
> control region part only is read (despite the slot selection of course)
> and returns 0 for addr/size and also flags meaning the slot is not
> enabled.
If you mean the guest reads 0s then it looks broken, could you show a
trace log with mhp_* tracepoints enabled during a dimm hotplug.

> So despite the slots are advertised as hotpluggable/enabled in
> the SRAT; I am not sure for the OS it actually makes any difference
> whether the DSDT definition blocks are described or not.
SRAT isn't used for informing guests about the amount of present RAM,
it holds affinity information for present and possible RAM
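For reference, a minimal sketch of what the SRAT does carry. The flag bits follow the ACPI spec's Memory Affinity Structure (bit 0 Enabled, bit 1 Hot Pluggable); the address values here are made up for illustration:

```python
# Hypothetical SRAT memory affinity entries. Flag bit positions are per
# the ACPI Memory Affinity Structure; the ranges themselves are made up.
SRAT_ENABLED = 1 << 0
SRAT_HOT_PLUGGABLE = 1 << 1

srat = [
    # (base, length, proximity_domain, flags)
    (0x40000000, 16 << 30, 0, SRAT_ENABLED),                       # initial RAM
    (0x440000000, 16 << 30, 0, SRAT_ENABLED | SRAT_HOT_PLUGGABLE), # possible RAM
]

def hotpluggable_ranges(entries):
    """Ranges the OS should treat as possible (hot-pluggable) memory;
    whether memory is actually present there is discovered elsewhere."""
    return [(base, length) for base, length, _, flags in entries
            if flags & SRAT_HOT_PLUGGABLE]

print(hotpluggable_ranges(srat))
```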

> To be honest I am afraid this is too late to add those additional
> features for 4.0 now. This is going to jeopardize the first preliminary
> part which is the introduction of the new memory map, allowing the
> expansion of the initial RAM and paving the way for device memory
> introduction. So I think I am going to resend the first 10 patches in a
> standalone series. And we can iterate on the PCDIMM/NVDIMM parts
> independently.
sounds good to me, I'll try to review 1-10 today 
 
> Thanks
> 
> Eric

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion and PCDIMM/NVDIMM support
  2019-02-28 14:05                         ` Igor Mammedov
@ 2019-03-01 14:18                           ` Auger Eric
  2019-03-01 16:33                             ` Igor Mammedov
  0 siblings, 1 reply; 63+ messages in thread
From: Auger Eric @ 2019-03-01 14:18 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Shameerali Kolothum Thodi, peter.maydell, drjones, david,
	Linuxarm, qemu-devel, dgilbert, qemu-arm, david, eric.auger.pro

Hi Igor,

[..]

>  
>> What still remains fuzzy for me is in case of cold plug the mmio hotplug
>> control region part only is read (despite the slot selection of course)
>> and returns 0 for addr/size and also flags meaning the slot is not
>> enabled.
> If you mean guest reads 0s than it looks broken, could you show
> trace log with mhp_* tracepoints enabled during a dimm hotplug.

Please find the traces + cmd line on x86


/qemu-system-x86_64 -M
q35,usb=off,dump-guest-core=off,kernel_irqchip=split,nvdimm -cpu
Haswell,-hle,-rtm -smp 4,sockets=4,cores=1,threads=1 -m
16G,maxmem=32G,slots=4 -display none --enable-kvm -serial
tcp:localhost:4444,server -trace
events=/home/augere/UPSTREAM/qemu2/nvdimm.txt -qmp
unix:/home/augere/TEST/QEMU/qmp-sock,server,nowait -rtc
base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -realtime
mlock=off -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global
PIIX4_PM.disable_s4=1 -boot strict=on -machine kernel_irqchip=split
-object
memory-backend-file,id=mem3,share,mem-path=/home/augere/TEST/QEMU/nv-dimm-3,size=2G,align=128M
-device nvdimm,memdev=mem3,id=dimm3,label-size=2M -object
memory-backend-file,id=mem4,share,mem-path=/home/augere/TEST/QEMU/nv-dimm-4,size=2G,align=128M
-device nvdimm,memdev=mem4,id=dimm4,label-size=2M -device
virtio-blk-pci,bus=pcie.0,scsi=off,drive=drv0,id=virtio-disk0,bootindex=1,werror=stop,rerror=stop
-drive
file=/home/augere/VM/IMAGES/x86_64-vm1-f28.raw,format=raw,if=none,cache=writethrough,id=drv0
-device virtio-net-pci,bus=pcie.0,netdev=nic0,mac=6a:f5:10:b1:3d:d2
-netdev
tap,id=nic0,script=/home/augere/TEST/SCRIPTS/qemu-ifup,downscript=/home/augere/TEST/SCRIPTS/qemu-ifdown,vhost=on
-net none -d guest_errors

******************************************************************
ioctl(TUNSETIFF): Device or resource busy
qemu-system-x86_64: -serial tcp:localhost:4444,server: info: QEMU
waiting for connection on: disconnected:tcp:::1:4444,server
qemu-system-x86_64: warning: global PIIX4_PM.disable_s3=1 not used
qemu-system-x86_64: warning: global PIIX4_PM.disable_s4=1 not used
29556@1551449303.339464:mhp_acpi_write_slot set active slot: 0x0
29556@1551449303.339496:mhp_acpi_read_addr_hi slot[0x0] addr hi: 0x0
29556@1551449303.339505:mhp_acpi_read_addr_lo slot[0x0] addr lo: 0x0
29556@1551449303.339512:mhp_acpi_read_size_hi slot[0x0] size hi: 0x0
29556@1551449303.339520:mhp_acpi_read_size_lo slot[0x0] size lo: 0x0
29556@1551449303.339563:mhp_acpi_write_slot set active slot: 0x0
29556@1551449303.339574:mhp_acpi_read_flags slot[0x0] flags: 0x0
29556@1551449303.339621:mhp_acpi_write_slot set active slot: 0x1
29556@1551449303.339643:mhp_acpi_read_addr_hi slot[0x1] addr hi: 0x0
29556@1551449303.339651:mhp_acpi_read_addr_lo slot[0x1] addr lo: 0x0
29556@1551449303.339659:mhp_acpi_read_size_hi slot[0x1] size hi: 0x0
29556@1551449303.339667:mhp_acpi_read_size_lo slot[0x1] size lo: 0x0
29556@1551449303.339705:mhp_acpi_write_slot set active slot: 0x1
29556@1551449303.339713:mhp_acpi_read_flags slot[0x1] flags: 0x0
29556@1551449303.339757:mhp_acpi_write_slot set active slot: 0x2
29556@1551449303.339779:mhp_acpi_read_addr_hi slot[0x2] addr hi: 0x0
29556@1551449303.339787:mhp_acpi_read_addr_lo slot[0x2] addr lo: 0x0
29556@1551449303.339796:mhp_acpi_read_size_hi slot[0x2] size hi: 0x0
29556@1551449303.339804:mhp_acpi_read_size_lo slot[0x2] size lo: 0x0
29556@1551449303.339861:mhp_acpi_write_slot set active slot: 0x2
29556@1551449303.339870:mhp_acpi_read_flags slot[0x2] flags: 0x0
29556@1551449303.339916:mhp_acpi_write_slot set active slot: 0x3
29556@1551449303.339944:mhp_acpi_read_addr_hi slot[0x3] addr hi: 0x0
29556@1551449303.339954:mhp_acpi_read_addr_lo slot[0x3] addr lo: 0x0
29556@1551449303.339963:mhp_acpi_read_size_hi slot[0x3] size hi: 0x0
29556@1551449303.339971:mhp_acpi_read_size_lo slot[0x3] size lo: 0x0
29556@1551449303.340012:mhp_acpi_write_slot set active slot: 0x3
29556@1551449303.340020:mhp_acpi_read_flags slot[0x3] flags: 0x0
29556@1551449303.439695:mhp_acpi_write_slot set active slot: 0x0
29556@1551449303.439713:mhp_acpi_read_flags slot[0x0] flags: 0x0
29556@1551449303.439733:mhp_acpi_write_slot set active slot: 0x1
29556@1551449303.439740:mhp_acpi_read_flags slot[0x1] flags: 0x0
29556@1551449303.439759:mhp_acpi_write_slot set active slot: 0x2
29556@1551449303.439767:mhp_acpi_read_flags slot[0x2] flags: 0x0
29556@1551449303.439793:mhp_acpi_write_slot set active slot: 0x3
29556@1551449303.439801:mhp_acpi_read_flags slot[0x3] flags: 0x0
29556@1551449303.539590:mhp_acpi_write_slot set active slot: 0x0
29556@1551449303.539606:mhp_acpi_read_flags slot[0x0] flags: 0x0
29556@1551449303.539627:mhp_acpi_write_slot set active slot: 0x1
29556@1551449303.539634:mhp_acpi_read_flags slot[0x1] flags: 0x0
29556@1551449303.539652:mhp_acpi_write_slot set active slot: 0x2
29556@1551449303.539659:mhp_acpi_read_flags slot[0x2] flags: 0x0
29556@1551449303.539677:mhp_acpi_write_slot set active slot: 0x3
29556@1551449303.539684:mhp_acpi_read_flags slot[0x3] flags: 0x0

That's the only traces I get until I get the login prompt.
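For what it's worth, the all-zero pattern in the dump above is easy to check mechanically. This helper assumes only the line format visible in the dump and groups the flags reads by slot:

```python
import re

# Match "...:mhp_acpi_read_flags slot[0x1] flags: 0x0" style trace lines.
FLAGS_RE = re.compile(
    r"mhp_acpi_read_flags slot\[(0x[0-9a-f]+)\] flags: (0x[0-9a-f]+)")

def flags_by_slot(lines):
    """Collect the set of flags values observed per slot."""
    out = {}
    for line in lines:
        m = FLAGS_RE.search(line)
        if m:
            out.setdefault(int(m.group(1), 16), set()).add(int(m.group(2), 16))
    return out

sample = [
    "29556@1551449303.339574:mhp_acpi_read_flags slot[0x0] flags: 0x0",
    "29556@1551449303.339713:mhp_acpi_read_flags slot[0x1] flags: 0x0",
]
# Every slot only ever reads flags == 0, i.e. no slot reports a dimm.
print(flags_by_slot(sample))
```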

Thanks

Eric



^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion and PCDIMM/NVDIMM support
  2019-03-01 14:18                           ` Auger Eric
@ 2019-03-01 16:33                             ` Igor Mammedov
  2019-03-01 17:52                               ` Auger Eric
  0 siblings, 1 reply; 63+ messages in thread
From: Igor Mammedov @ 2019-03-01 16:33 UTC (permalink / raw)
  To: Auger Eric
  Cc: Shameerali Kolothum Thodi, peter.maydell, drjones, david,
	Linuxarm, qemu-devel, dgilbert, qemu-arm, david, eric.auger.pro

On Fri, 1 Mar 2019 15:18:14 +0100
Auger Eric <eric.auger@redhat.com> wrote:

> Hi Igor,
> 
> [..]
> 
> >    
> >> What still remains fuzzy for me is in case of cold plug the mmio hotplug
> >> control region part only is read (despite the slot selection of course)
> >> and returns 0 for addr/size and also flags meaning the slot is not
> >> enabled.  
> > If you mean guest reads 0s than it looks broken, could you show
> > trace log with mhp_* tracepoints enabled during a dimm hotplug.  
> 
> Please find the traces + cmd line on x86
I thought that you were talking about pc-dimms, so here it goes:

nvdimm is not part of the memory hotplug interface; nvdimms have their
own mmio region through which they let the guest read a completely
rebuilt NFIT table, and their own GPE._E04 event handler.

you see 0's in the trace because the guest enumerates all PNP0C80
devices in the DSDT to check for any present pc-dimm, and the mem
hotplug interface reports 0s since there is none.

PS:
In the ACPI spec there is an example of NVDIMMs where they also have an
associated memory device (PNP0C80), and that is somehow related
to nvdimm hotplug, but it's not described in sufficient detail,
so I honestly do not know what to do with it. Hence QEMU doesn't
have a PNP0C80 counterpart for nvdimm. To me it looks more like
a mistake in the spec, but that's a topic for another discussion.


> [..]
> 29556@1551449303.539659:mhp_acpi_read_flags slot[0x2] flags: 0x0
> 29556@1551449303.539677:mhp_acpi_write_slot set active slot: 0x3
> 29556@1551449303.539684:mhp_acpi_read_flags slot[0x3] flags: 0x0
> 
> Those are the only traces I get until the login prompt.
> 
> Thanks
> 
> Eric
> 
> 
> >   
> >> So although the slots are advertised as hotpluggable/enabled in
> >> the SRAT, I am not sure it actually makes any difference to the OS
> >> whether the DSDT definition blocks are described or not.  
> > SRAT isn't used for informing guests about the amount of present RAM;
> > it holds affinity information for present and possible RAM.
> >   
> >> To be honest I am afraid this is too late to add those additional
> >> features for 4.0 now. This is going to jeopardize the first preliminary
> >> part which is the introduction of the new memory map, allowing the
> >> expansion of the initial RAM and paving the way for device memory
> >> introduction. So I think I am going to resend the first 10 patches in a
> >> standalone series. And we can iterate on the PCDIMM/NVDIMM parts
> >> independently.  
> > sounds good to me, I'll try to review 1-10 today 
> >    
> >> Thanks
> >>
> >> Eric  
> >>>
> >>>     
> >>>>
> >>>> Thanks,
> >>>> Shameer
> >>>>    
> >>>>> Then would remain the GED/GPIO actual integration.
> >>>>>
> >>>>> Thanks
> >>>>>
> >>>>> Eric      
> >>>>>>      
> >>>>>>> Also, don't DIMM slots already make sense in DT mode? Usually we accept
> >>>>>>> adding a feature first in DT and then in ACPI. For instance we can benefit      
> >>>>>> usually they don't conflict with each other (at least I'm not aware of it)
> >>>>>> but I see a problem with it in this case.
> >>>>>>      
> >>>>>>> from nvdimm in dt mode, right? So, considering an incremental approach, I
> >>>>>>> would be in favour of keeping the DT nodes.      
> >>>>>> I'd guess it is the same as for DIMMs: ACPI support for NVDIMMs is much
> >>>>>> more versatile.
> >>>>>>
> >>>>>> I consider the target application of arm/virt to be a board that in
> >>>>>> most use cases runs a generic ACPI-capable guest in production, with
> >>>>>> various DT-only guests as secondary ones. It's hard to make
> >>>>>> both use cases happy with defaults (that's probably one of the
> >>>>>> reasons why the 'sbsa' board is being added).
> >>>>>>
> >>>>>> So I'd give priority to ACPI-based arm/virt over DT where defaults
> >>>>>> are concerned.
> >>>>>>      
> >>>>>>> Thanks
> >>>>>>>
> >>>>>>> Eric      
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>      
> >>>>>>      
> >>>
> >>>     
> >   

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion and PCDIMM/NVDIMM support
  2019-03-01 16:33                             ` Igor Mammedov
@ 2019-03-01 17:52                               ` Auger Eric
  0 siblings, 0 replies; 63+ messages in thread
From: Auger Eric @ 2019-03-01 17:52 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: peter.maydell, drjones, david, qemu-devel,
	Shameerali Kolothum Thodi, Linuxarm, qemu-arm, eric.auger.pro,
	dgilbert, david

Hi Igor,

On 3/1/19 5:33 PM, Igor Mammedov wrote:
> On Fri, 1 Mar 2019 15:18:14 +0100
> Auger Eric <eric.auger@redhat.com> wrote:
> 
>> Hi Igor,
>>
>> [..]
>>
>>>    
>>>> What still remains fuzzy for me is that, in the cold-plug case, only
>>>> the MMIO hotplug control region is read (besides the slot selection,
>>>> of course) and it returns 0 for addr/size and also for flags, meaning
>>>> the slot is not enabled.  
>>> If you mean the guest reads 0s, then it looks broken; could you show a
>>> trace log with mhp_* tracepoints enabled during a dimm hotplug?  
>>
>> Please find the traces + cmd line on x86
> I thought that you were talking about pc-dimms, so here it goes:
> 
> nvdimm is not part of the memory hotplug interface; NVDIMMs have their
> own MMIO region through which they let the guest read a completely
> rebuilt NFIT table, and their own GPE._E04 event handler.
> 
> You see 0s in the trace because the guest enumerates every PNP0C80 in
> the DSDT to check for any present pc-dimm, and the memory hotplug
> interface reports 0s since there are none.
> 
> PS:
> In the ACPI spec there is an example where NVDIMMs also have an
> associated memory device (PNP0C80) that is somehow related to
> nvdimm hotplug, but it is not described in sufficient detail,
> so I honestly do not know what to do with it. Hence QEMU doesn't
> have a PNP0C80 counterpart for nvdimm. To me it looks more like
> a mistake in the spec, but that's a topic for another discussion.

Oh my bad. Indeed with PCDIMM I can see the actual addresses being read.

Thank you for the explanation!

Have a nice WE.

Eric
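
PS: for anyone replaying the capture above, the repetitive per-slot trace
lines (format "<pid>@<timestamp>:<event> <message>") can be condensed into
per-event counts with standard tools. A sketch, using a hypothetical sample
file built from lines of the same shape as the log quoted in this thread:

```shell
# Sample lines in QEMU's stderr trace format: "<pid>@<timestamp>:<event> <message>"
cat > /tmp/mhp-trace.log <<'EOF'
29556@1551449303.339464:mhp_acpi_write_slot set active slot: 0x0
29556@1551449303.339574:mhp_acpi_read_flags slot[0x0] flags: 0x0
29556@1551449303.339621:mhp_acpi_write_slot set active slot: 0x1
EOF

# Count occurrences of each mhp_acpi_* event; since every read returns 0
# when no pc-dimm is present, the counts alone show the enumeration pattern.
grep -o 'mhp_acpi_[a-z_]*' /tmp/mhp-trace.log | sort | uniq -c
```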

> 
> 
>> /qemu-system-x86_64 -M
>> q35,usb=off,dump-guest-core=off,kernel_irqchip=split,nvdimm -cpu
>> Haswell,-hle,-rtm -smp 4,sockets=4,cores=1,threads=1 -m
>> 16G,maxmem=32G,slots=4 -display none --enable-kvm -serial
>> tcp:localhost:4444,server -trace
>> events=/home/augere/UPSTREAM/qemu2/nvdimm.txt -qmp
>> unix:/home/augere/TEST/QEMU/qmp-sock,server,nowait -rtc
>> base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -realtime
>> mlock=off -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global
>> PIIX4_PM.disable_s4=1 -boot strict=on -machine kernel_irqchip=split
>> -object
>> memory-backend-file,id=mem3,share,mem-path=/home/augere/TEST/QEMU/nv-dimm-3,size=2G,align=128M
>> -device nvdimm,memdev=mem3,id=dimm3,label-size=2M -object
>> memory-backend-file,id=mem4,share,mem-path=/home/augere/TEST/QEMU/nv-dimm-4,size=2G,align=128M
>> -device nvdimm,memdev=mem4,id=dimm4,label-size=2M -device
>> virtio-blk-pci,bus=pcie.0,scsi=off,drive=drv0,id=virtio-disk0,bootindex=1,werror=stop,rerror=stop
>> -drive
>> file=/home/augere/VM/IMAGES/x86_64-vm1-f28.raw,format=raw,if=none,cache=writethrough,id=drv0
>> -device virtio-net-pci,bus=pcie.0,netdev=nic0,mac=6a:f5:10:b1:3d:d2
>> -netdev
>> tap,id=nic0,script=/home/augere/TEST/SCRIPTS/qemu-ifup,downscript=/home/augere/TEST/SCRIPTS/qemu-ifdown,vhost=on
>> -net none -d guest_errors
>>
>> ******************************************************************
>> ioctl(TUNSETIFF): Device or resource busy
>> qemu-system-x86_64: -serial tcp:localhost:4444,server: info: QEMU
>> waiting for connection on: disconnected:tcp:::1:4444,server
>> qemu-system-x86_64: warning: global PIIX4_PM.disable_s3=1 not used
>> qemu-system-x86_64: warning: global PIIX4_PM.disable_s4=1 not used
>> 29556@1551449303.339464:mhp_acpi_write_slot set active slot: 0x0
>> 29556@1551449303.339496:mhp_acpi_read_addr_hi slot[0x0] addr hi: 0x0
>> 29556@1551449303.339505:mhp_acpi_read_addr_lo slot[0x0] addr lo: 0x0
>> 29556@1551449303.339512:mhp_acpi_read_size_hi slot[0x0] size hi: 0x0
>> 29556@1551449303.339520:mhp_acpi_read_size_lo slot[0x0] size lo: 0x0
>> 29556@1551449303.339563:mhp_acpi_write_slot set active slot: 0x0
>> 29556@1551449303.339574:mhp_acpi_read_flags slot[0x0] flags: 0x0
>> 29556@1551449303.339621:mhp_acpi_write_slot set active slot: 0x1
>> 29556@1551449303.339643:mhp_acpi_read_addr_hi slot[0x1] addr hi: 0x0
>> 29556@1551449303.339651:mhp_acpi_read_addr_lo slot[0x1] addr lo: 0x0
>> 29556@1551449303.339659:mhp_acpi_read_size_hi slot[0x1] size hi: 0x0
>> 29556@1551449303.339667:mhp_acpi_read_size_lo slot[0x1] size lo: 0x0
>> 29556@1551449303.339705:mhp_acpi_write_slot set active slot: 0x1
>> 29556@1551449303.339713:mhp_acpi_read_flags slot[0x1] flags: 0x0
>> 29556@1551449303.339757:mhp_acpi_write_slot set active slot: 0x2
>> 29556@1551449303.339779:mhp_acpi_read_addr_hi slot[0x2] addr hi: 0x0
>> 29556@1551449303.339787:mhp_acpi_read_addr_lo slot[0x2] addr lo: 0x0
>> 29556@1551449303.339796:mhp_acpi_read_size_hi slot[0x2] size hi: 0x0
>> 29556@1551449303.339804:mhp_acpi_read_size_lo slot[0x2] size lo: 0x0
>> 29556@1551449303.339861:mhp_acpi_write_slot set active slot: 0x2
>> 29556@1551449303.339870:mhp_acpi_read_flags slot[0x2] flags: 0x0
>> 29556@1551449303.339916:mhp_acpi_write_slot set active slot: 0x3
>> 29556@1551449303.339944:mhp_acpi_read_addr_hi slot[0x3] addr hi: 0x0
>> 29556@1551449303.339954:mhp_acpi_read_addr_lo slot[0x3] addr lo: 0x0
>> 29556@1551449303.339963:mhp_acpi_read_size_hi slot[0x3] size hi: 0x0
>> 29556@1551449303.339971:mhp_acpi_read_size_lo slot[0x3] size lo: 0x0
>> 29556@1551449303.340012:mhp_acpi_write_slot set active slot: 0x3
>> 29556@1551449303.340020:mhp_acpi_read_flags slot[0x3] flags: 0x0
>> 29556@1551449303.439695:mhp_acpi_write_slot set active slot: 0x0
>> 29556@1551449303.439713:mhp_acpi_read_flags slot[0x0] flags: 0x0
>> 29556@1551449303.439733:mhp_acpi_write_slot set active slot: 0x1
>> 29556@1551449303.439740:mhp_acpi_read_flags slot[0x1] flags: 0x0
>> 29556@1551449303.439759:mhp_acpi_write_slot set active slot: 0x2
>> 29556@1551449303.439767:mhp_acpi_read_flags slot[0x2] flags: 0x0
>> 29556@1551449303.439793:mhp_acpi_write_slot set active slot: 0x3
>> 29556@1551449303.439801:mhp_acpi_read_flags slot[0x3] flags: 0x0
>> 29556@1551449303.539590:mhp_acpi_write_slot set active slot: 0x0
>> 29556@1551449303.539606:mhp_acpi_read_flags slot[0x0] flags: 0x0
>> 29556@1551449303.539627:mhp_acpi_write_slot set active slot: 0x1
>> 29556@1551449303.539634:mhp_acpi_read_flags slot[0x1] flags: 0x0
>> 29556@1551449303.539652:mhp_acpi_write_slot set active slot: 0x2
>> 29556@1551449303.539659:mhp_acpi_read_flags slot[0x2] flags: 0x0
>> 29556@1551449303.539677:mhp_acpi_write_slot set active slot: 0x3
>> 29556@1551449303.539684:mhp_acpi_read_flags slot[0x3] flags: 0x0
>>
>> Those are the only traces I get until the login prompt.
>>
>> Thanks
>>
>> Eric
>>
>>
>>>   
>>>> So although the slots are advertised as hotpluggable/enabled in
>>>> the SRAT, I am not sure it actually makes any difference to the OS
>>>> whether the DSDT definition blocks are described or not.  
>>> SRAT isn't used for informing guests about the amount of present RAM;
>>> it holds affinity information for present and possible RAM.
>>>   
>>>> To be honest I am afraid this is too late to add those additional
>>>> features for 4.0 now. This is going to jeopardize the first preliminary
>>>> part which is the introduction of the new memory map, allowing the
>>>> expansion of the initial RAM and paving the way for device memory
>>>> introduction. So I think I am going to resend the first 10 patches in a
>>>> standalone series. And we can iterate on the PCDIMM/NVDIMM parts
>>>> independently.  
>>> sounds good to me, I'll try to review 1-10 today 
>>>    
>>>> Thanks
>>>>
>>>> Eric  
>>>>>
>>>>>     
>>>>>>
>>>>>> Thanks,
>>>>>> Shameer
>>>>>>    
>>>>>>> Then would remain the GED/GPIO actual integration.
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> Eric      
>>>>>>>>      
>>>>>>>>> Also, don't DIMM slots already make sense in DT mode? Usually we accept
>>>>>>>>> adding a feature first in DT and then in ACPI. For instance we can benefit      
>>>>>>>> usually they don't conflict with each other (at least I'm not aware of it)
>>>>>>>> but I see a problem with it in this case.
>>>>>>>>      
>>>>>>>>> from nvdimm in dt mode, right? So, considering an incremental approach, I
>>>>>>>>> would be in favour of keeping the DT nodes.      
>>>>>>>> I'd guess it is the same as for DIMMs: ACPI support for NVDIMMs is much
>>>>>>>> more versatile.
>>>>>>>>
>>>>>>>> I consider the target application of arm/virt to be a board that in
>>>>>>>> most use cases runs a generic ACPI-capable guest in production, with
>>>>>>>> various DT-only guests as secondary ones. It's hard to make
>>>>>>>> both use cases happy with defaults (that's probably one of the
>>>>>>>> reasons why the 'sbsa' board is being added).
>>>>>>>>
>>>>>>>> So I'd give priority to ACPI-based arm/virt over DT where defaults
>>>>>>>> are concerned.
>>>>>>>>      
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> Eric      
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>      
>>>>>>>>      
>>>>>
>>>>>     
>>>   
> 
> 

^ permalink raw reply	[flat|nested] 63+ messages in thread
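
As background on the traces quoted above: the `-trace events=FILE` option
on the command lines points QEMU at a plain-text file listing trace-event
name patterns, one per line, and glob patterns are accepted. A minimal file
enabling the memory-hotplug tracepoints seen in this thread might be created
like this (the path is hypothetical; the original used
/home/augere/UPSTREAM/qemu2/nvdimm.txt):

```shell
# One glob pattern covers all the ACPI memory-hotplug tracepoints
# (mhp_acpi_write_slot, mhp_acpi_read_flags, mhp_acpi_read_addr_lo, ...).
cat > /tmp/nvdimm.txt <<'EOF'
mhp_acpi_*
EOF

# Then start QEMU with (other options elided):
#   qemu-system-x86_64 ... -trace events=/tmp/nvdimm.txt ...
```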

end of thread, other threads:[~2019-03-01 17:52 UTC | newest]

Thread overview: 63+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-20 22:39 [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion and PCDIMM/NVDIMM support Eric Auger
2019-02-20 22:39 ` [Qemu-devel] [PATCH v7 01/17] hw/arm/boot: introduce fdt_add_memory_node helper Eric Auger
2019-02-21 14:58   ` Igor Mammedov
2019-02-20 22:39 ` [Qemu-devel] [PATCH v7 02/17] hw/arm/virt: Rename highmem IO regions Eric Auger
2019-02-21 15:05   ` Igor Mammedov
2019-02-20 22:39 ` [Qemu-devel] [PATCH v7 03/17] hw/arm/virt: Split the memory map description Eric Auger
2019-02-21 16:19   ` Igor Mammedov
2019-02-21 17:21     ` Auger Eric
2019-02-22 10:15       ` Igor Mammedov
2019-02-22 14:28         ` Auger Eric
2019-02-22 14:51           ` Igor Mammedov
2019-02-22  7:34   ` Heyi Guo
2019-02-22  8:08     ` Auger Eric
2019-02-20 22:39 ` [Qemu-devel] [PATCH v7 04/17] hw/boards: Add a MachineState parameter to kvm_type callback Eric Auger
2019-02-22 10:18   ` Igor Mammedov
2019-02-20 22:39 ` [Qemu-devel] [PATCH v7 05/17] kvm: add kvm_arm_get_max_vm_ipa_size Eric Auger
2019-02-20 22:39 ` [Qemu-devel] [PATCH v7 06/17] vl: Set machine ram_size, maxram_size and ram_slots earlier Eric Auger
2019-02-22 10:40   ` Igor Mammedov
2019-02-20 22:39 ` [Qemu-devel] [PATCH v7 07/17] hw/arm/virt: Dynamic memory map depending on RAM requirements Eric Auger
2019-02-22 12:57   ` Igor Mammedov
2019-02-22 14:06     ` Auger Eric
2019-02-22 14:23       ` Igor Mammedov
2019-02-20 22:39 ` [Qemu-devel] [PATCH v7 08/17] hw/arm/virt: Implement kvm_type function for 4.0 machine Eric Auger
2019-02-22 12:45   ` Igor Mammedov
2019-02-22 14:01     ` Auger Eric
2019-02-22 14:39       ` Igor Mammedov
2019-02-22 14:53         ` Auger Eric
2019-02-20 22:39 ` [Qemu-devel] [PATCH v7 09/17] hw/arm/virt: Bump the 255GB initial RAM limit Eric Auger
2019-02-20 22:39 ` [Qemu-devel] [PATCH v7 10/17] hw/arm/virt: Add memory hotplug framework Eric Auger
2019-02-22 13:25   ` Igor Mammedov
2019-02-20 22:39 ` [Qemu-devel] [PATCH v7 11/17] hw/arm/boot: Expose the PC-DIMM nodes in the DT Eric Auger
2019-02-22 13:30   ` Igor Mammedov
2019-02-20 22:39 ` [Qemu-devel] [PATCH v7 12/17] hw/arm/virt-acpi-build: Add PC-DIMM in SRAT Eric Auger
2019-02-20 22:39 ` [Qemu-devel] [PATCH v7 13/17] hw/arm/virt: Allocate device_memory Eric Auger
2019-02-22 13:48   ` Igor Mammedov
2019-02-22 14:15     ` Auger Eric
2019-02-22 14:58       ` Igor Mammedov
2019-02-20 22:40 ` [Qemu-devel] [PATCH v7 14/17] nvdimm: use configurable ACPI IO base and size Eric Auger
2019-02-22 15:28   ` Igor Mammedov
2019-02-20 22:40 ` [Qemu-devel] [PATCH v7 15/17] hw/arm/virt: Add nvdimm hot-plug infrastructure Eric Auger
2019-02-22 15:36   ` Igor Mammedov
2019-02-20 22:40 ` [Qemu-devel] [PATCH v7 16/17] hw/arm/boot: Expose the pmem nodes in the DT Eric Auger
2019-02-20 22:40 ` [Qemu-devel] [PATCH v7 17/17] hw/arm/virt: Add nvdimm and nvdimm-persistence options Eric Auger
2019-02-22 15:48   ` Igor Mammedov
2019-02-22 15:57     ` Auger Eric
2019-02-20 22:46 ` [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion and PCDIMM/NVDIMM support Auger Eric
2019-02-22 16:27 ` Igor Mammedov
2019-02-22 17:35   ` Auger Eric
2019-02-25  9:42     ` Igor Mammedov
2019-02-25 10:13       ` Shameerali Kolothum Thodi
2019-02-26  8:40       ` Auger Eric
2019-02-26 13:11         ` Auger Eric
2019-02-26 16:56           ` Igor Mammedov
2019-02-26 17:53             ` Auger Eric
2019-02-27 10:10               ` Igor Mammedov
2019-02-27 10:27                 ` Auger Eric
2019-02-27 10:41                   ` Shameerali Kolothum Thodi
2019-02-27 17:51                     ` Igor Mammedov
2019-02-28  7:48                       ` Auger Eric
2019-02-28 14:05                         ` Igor Mammedov
2019-03-01 14:18                           ` Auger Eric
2019-03-01 16:33                             ` Igor Mammedov
2019-03-01 17:52                               ` Auger Eric

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.